Why do we still use Ghostscript?

Late last year, I had a bit of a Twitter discussion about the fact that I can’t think of a good reason why Ghostscript is still a standard part of the Free Software desktop environment. The comments started from a security issue related to file access from within a PostScript program (i.e. a .ps file), and at the time I actually started drafting some of the content that is now becoming this post. I then shelved most of it because I was busy and it was no longer topical.

Then Tavis had to bring this back to the attention of the public, and so I’m back writing this.

To be able to answer the question I pose in the title we first have to define what Ghostscript is, and the short answer is: a PostScript renderer. Of course it’s a lot more than just that, but for the most part, that’s what it is. It deals with PostScript programs (or documents, if you prefer), and renders them into different formats. PostScript is rarely, if at all, used on modern desktops, not just because it’s overly complicated, but because it’s just not that useful in a world that has mostly settled on PDF, which is essentially a “compiled PostScript”.

Okay not quite. There are plenty of qualifications that go around that whole paragraph, but I think it matches the practicalities of the case fairly well.

PostScript has found a number of interesting niche uses though, a lot of which revolve around printing, because PostScript is the language that older (early?) printers used. I have not seen any modern printers speak PostScript though, at least since my Kyocera FS-1020, and even those that do tend to support alternative “languages” and raster formats. On the other hand, because PostScript was a “lingua franca” for printers, CUPS and other printer-related tooling still use PostScript as an intermediate language.

In a similar fashion, quite a few pieces of software that deal with faxes (yes, faxes) tend to make use of Ghostscript itself. I would know because I wrote one, under contract, a long time ago. The reason is frankly pragmatic: if you’re on the client side, you want Windows to “print to fax”, and having a virtual PostScript printer is very easy. At that point you want to convert the document into something that can be easily shoved down the fax software’s throat, which ends up being TIFF (because TIFF is, as I understand it, the closest encoding to the physical faxes). And Ghostscript is very good at doing that.

Indeed, I have used (and seen used) Ghostscript in many cases to basically combine a bunch of images into a single document, usually in TIFF or PDF format. It’s very good at doing that, if you know how to use it, or if you copy-paste from other people’s implementations.

Often, this is done through the command line, too, and the reason for that is to be found in the licenses used by various Ghostscript implementations and versions over time. Indeed, while many people think of Ghostscript as an open source Swiss Army Knife of document processing, it actually is dual-licensed. The Wikipedia page for the project shows eight variants, with at least four different licenses over time. The current options are AGPLv3 or the commercial paid-for license, and I can tell you that a lot of people (including the folks I worked under contract for) don’t really want to pay for that license, preferring instead the “arm’s length” aggregation of calling the binary rather than linking it in. Indeed, I wrote a .NET library to do just that. It’s optimized for (you guessed it right) TIFF files, because it was a component of an Internet Fax implementation.
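To make the “arm’s length” approach concrete, here is a minimal Python sketch of calling the gs binary instead of linking against libgs. It is not the .NET library mentioned above: the function names are made up for illustration, and it assumes a gs executable on the PATH. The switches themselves (-dSAFER, -dBATCH, -dNOPAUSE, -sDEVICE, -sOutputFile, -r) are regular Ghostscript options.

```python
import subprocess


def build_gs_command(inputs, output, device="tiffg4", resolution=None, gs="gs"):
    """Build the argument list for a non-interactive Ghostscript run.

    The function name and defaults are illustrative, not an existing API.
    """
    if not inputs:
        raise ValueError("at least one input file is required")
    for name in inputs:
        # Ghostscript would read these as a switch or an argument file.
        if name.startswith(("-", "@")):
            raise ValueError("refusing suspicious input name: %r" % name)
    cmd = [
        gs,
        "-dSAFER",    # restrict file access from within the PostScript program
        "-dBATCH",    # exit after the last input instead of prompting
        "-dNOPAUSE",  # do not wait for a keypress between pages
        "-sDEVICE=" + device,
        "-sOutputFile=" + output,
    ]
    if resolution is not None:
        cmd.append("-r%dx%d" % resolution)
    cmd.extend(inputs)
    return cmd


def convert(inputs, output, **options):
    """Run Ghostscript as a separate process; raises if it fails or hangs."""
    subprocess.run(build_gs_command(inputs, output, **options),
                   check=True, timeout=60)
```

Something like convert(["page1.ps", "page2.ps"], "fax.tiff") would then produce a multi-page TIFF, and passing device="pdfwrite" would produce a PDF instead, without the calling program ever linking against Ghostscript code.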

So where does this leave us?

Some ten years ago, when effectively every Free Software desktop PDF viewer was forking the XPDF source code to adapt it to whatever rendering engine they needed, it took a significant list of vulnerabilities that needed to be fixed time and time again for the Poppler project to take off, and create One PDF Rendering To Rule Them All. I think we need the same for Ghostscript. With a few differences.

The first difference is that I think we need to take a good look at what Ghostscript, and PostScript, are useful for in today’s desktops. Combining multiple images into a single document should _not_ require processing all the way to PostScript. There’s no reason to! Particularly not when the images are just JPEG files, and PDF can embed them directly. Having a tool that is good at combining multiple images into a PDF, with decent options for page size and alignment, would probably replace many of the usages of Ghostscript that I had in my own tools and scripts over the past few years.
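As a rough illustration of how little is needed when the inputs are JPEG files, here is a sketch in plain Python that wraps each JPEG, byte for byte, in a PDF image object using the DCTDecode filter, one page per image. The function names are hypothetical, and the sketch assumes 8-bit grayscale or RGB JPEGs; it has none of the page size and alignment options a real tool would want.

```python
import struct


def jpeg_size(data):
    """Return (width, height, components) from the first SOF segment."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG file")
    pos = 2
    while pos + 10 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError("malformed JPEG segment")
        marker = data[pos + 1]
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        # SOF0..SOF15 hold the dimensions; C4, C8 and CC are other segments.
        if 0xC0 <= marker <= 0xCF and marker not in (0xC4, 0xC8, 0xCC):
            _bits, height, width, comps = struct.unpack(
                ">BHHB", data[pos + 4:pos + 10])
            return width, height, comps
        pos += 2 + length
    raise ValueError("no SOF segment found")


def jpegs_to_pdf(jpegs, dpi=72.0):
    """Build a PDF (as bytes) with one page per JPEG, embedded unchanged."""
    colorspaces = {1: b"/DeviceGray", 3: b"/DeviceRGB"}
    bodies = [b"", b""]  # objects 1 (catalog) and 2 (page tree), filled below
    pages = []
    for data in jpegs:
        width, height, comps = jpeg_size(data)
        if comps not in colorspaces:
            raise ValueError("unsupported number of colour components")
        pw, ph = width * 72.0 / dpi, height * 72.0 / dpi  # size in points
        image_id = len(bodies) + 1
        bodies.append(
            b"<< /Type /XObject /Subtype /Image /Width %d /Height %d"
            b" /ColorSpace %s /BitsPerComponent 8 /Filter /DCTDecode"
            b" /Length %d >>\nstream\n"
            % (width, height, colorspaces[comps], len(data))
            + data + b"\nendstream")
        # Images are drawn in a unit square; scale it up to the full page.
        content = b"q %.2f 0 0 %.2f 0 0 cm /Im0 Do Q" % (pw, ph)
        bodies.append(b"<< /Length %d >>\nstream\n%s\nendstream"
                      % (len(content), content))
        bodies.append(
            b"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 %.2f %.2f]"
            b" /Resources << /XObject << /Im0 %d 0 R >> >>"
            b" /Contents %d 0 R >>" % (pw, ph, image_id, image_id + 1))
        pages.append(len(bodies))
    bodies[0] = b"<< /Type /Catalog /Pages 2 0 R >>"
    kids = b" ".join(b"%d 0 R" % n for n in pages)
    bodies[1] = b"<< /Type /Pages /Kids [%s] /Count %d >>" % (kids, len(pages))

    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for number, body in enumerate(bodies, start=1):
        offsets.append(len(out))  # byte offset of each object, for the xref
        out += b"%d 0 obj\n%s\nendobj\n" % (number, body)
    xref_at = len(out)
    out += b"xref\n0 %d\n0000000000 65535 f \n" % (len(bodies) + 1)
    for offset in offsets:
        out += b"%010d 00000 n \n" % offset
    out += (b"trailer\n<< /Size %d /Root 1 0 R >>\nstartxref\n%d\n%%%%EOF\n"
            % (len(bodies) + 1, xref_at))
    return bytes(out)
```

The image data is never decoded or re-encoded, which is the point: the only parsing involved is finding the width and height in the JPEG header, and there is no PostScript interpreter anywhere in the path.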

And while rendering PostScript for either display or print are similar enough tasks, I have some doubt the same code would work right for both. PostScript and Ghostscript are often used in _networked_ printing as well. In which case there’s a lot of processing of untrusted input — both for display and printing. Sandboxing – and possibly writing this in a language better suited to deal with untrusted input than C is – would go a long way to prevent problems there.
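As a very small step in that direction, the sketch below runs a converter as a child process with a scrubbed environment and hard limits on CPU time and output file size. This is not a real sandbox (there is no seccomp filter, namespace or chroot involved) and it only works on Unix-like systems; the helper name and the default limits are assumptions picked for illustration.

```python
import os
import resource
import subprocess
import sys


def run_limited(cmd, cpu_seconds=10, max_file_bytes=64 * 1024 * 1024,
                timeout=30, env=None):
    """Run cmd in a child process with CPU-time and file-size limits."""

    def cap(limit, value):
        # Never ask for more than the hard limit we already have.
        _soft, hard = resource.getrlimit(limit)
        if hard != resource.RLIM_INFINITY:
            value = min(value, hard)
        resource.setrlimit(limit, (value, value))

    def apply_limits():
        # Runs in the child, between fork() and exec().
        cap(resource.RLIMIT_CPU, cpu_seconds)
        cap(resource.RLIMIT_FSIZE, max_file_bytes)

    if env is None:
        env = {"PATH": os.defpath}  # do not leak the caller's environment
    return subprocess.run(cmd, preexec_fn=apply_limits, env=env,
                          stdin=subprocess.DEVNULL, capture_output=True,
                          timeout=timeout, check=False)


if __name__ == "__main__":
    result = run_limited([sys.executable, "-c", "print('ok')"])
    print(result.returncode, result.stdout)
```

A runaway or malicious document can then at most burn a few seconds of CPU and fill a bounded amount of disk, but it can still read whatever the user running the process can read, which is why a proper sandbox would still be needed on top of this.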

But there are a few other interesting topics that I want to point out on this. I can’t think of any good reason for _desktops_ to support PostScript out of the box in 2019. While I can still think of a lot of tools, particularly from the old timers, that use PostScript as an intermediate format, most people _in the world_ would use PDF nowadays to share documents, not PostScript. It’s kind of like sharing DVI files, which I have done before, but I now wonder why. While both formats might have advantages over PDF, in 2019 they definitely lost the format war. macOS might still support both (I don’t know), but Windows and Android definitely don’t, which makes them pretty useless to share knowledge with the world.

What I mean by that is that it’s probably high time that PostScript became an _optional_ component of the Free Software Desktop, one that users need to enable explicitly _if they ever need it_, just to limit the risks that come with accepting, displaying and thumbnailing full, Turing-complete programs masquerading as documents. Even Microsoft stopped running macros in Office documents by default, when they realized the type of footgun it had become.

Of course talk is cheap, and I should probably try to help directly myself. Unfortunately I don’t have much experience with graphics formats, besides maintaining unpaper, and that is not a particularly good result either: I tried using libav’s image loading, and it turns out it’s actually a mess. So I guess I should either invest my time in learning enough about building libraries for image processing, or poke around to see if someone wrote a good multi-format image processing library in, say, Rust.

Alternatively, if someone starts to work on this and wants some help with either reviewing the code, or with integrating the final output in places where Ghostscript is used, I’m happy to volunteer my time. I’m fairly sure I can convince my manager to let me do some of that work as a 20% project.

And what about imported libraries?

Following up on the previous blog post, here is also a list of projects that seem to like importing libraries, causing code duplication even for code that was designed to be shared.

  • cdrkit, again, contains a stripped down version of libdvdread, added, of course, by our beloved Jörg Schilling; bug #206939; additionally it contains a copy of cdparanoia code; bug #207029

  • ImageMagick comes with a copy of libltdl; bug #206937

  • not even KDE4 seems to have helped libkcal which even in its newest incarnation ships with an internal copy of libical, causing me to have three copies of it installed in my system;

  • libvncserver comes with a copy of liblzo2; actually there are two, one in libvncserver and one in libvncclient; even the source files are duplicated!; bug #206941

  • SDL_sound, Wine and LAME seem to share some mp3 decoding code, which seems to come originally from mpg123;

  • cmake couldn’t stay out of this, it comes with a copy of libform (which is part of ncurses); follow bug #206920

  • I’m not sure what it is, but DigiKam, Numeric (for Python) and numpy have a few functions in common; the latter two seem to have even more than that in common; bug #206931 for Numeric and numpy, and bug #206934 for DigiKam.

  • ghostscript comes with internal copies of zlib, libpng, jpeg and jasper; unfortunately jasper is also modified, for the other three there’s bug #206893; by the way, the copies are present in both the gs command and in the libgs library;

  • OpenOffice comes with loads of duplicated libraries; in particular, it comes with its own copy of icu libraries; see bug #206889

  • TiMidity++ comes with a copy of libmikmod; bug #206943

  • Korundum for KDE3 has a copy of qtruby embedded, somehow; I wonder if it isn’t a fluke of our buildsystem; bug #206936

  • gdb contains an internal copy of readline; bug #206947

  • tork contains a copy of some functions coming from readline; bug #206953

  • KTorrent contains a copy of GeoIP (and to think I removed the one in TorK as soon as I’ve spotted it); bug #206957

  • both ruby and php use an internal copy of – I think – oniguruma; I haven’t looked if it’s possible to add that as a system library and then use it; bug #206963

  • MPlayer seems to carry a copy of libogg together with tremor support; bug #206965

  • pkg-config ships with an internal copy of glib; bug #206966

  • tor has an internal copy of libevent’s async dns support; funny, as it links to libevent; bug #206969

  • gettext fails to find the system copy of libxml2, falling back to use the internal copy; at least it has the decency of using a proper commodity library; bug #207018

  • both Perl and Ruby have a default extension based on SDBM, a NDBM workalike; there seems not to be a shared version of it, so they just build the single source file in their own extensions directly, without hiding the symbols; beside the code re-use not being available, if a process loads both libperl and libruby, and in turn they load their sdbm extension, stuff’s gonna hurt;

  • enchant has an internal copy of Hunspell; probably due to the fact that old Hunspell built only static non-PIC libraries, and enchant uses plugins; bug #207025; upstream fixed this in their Subversion repository already;

  • gnome-vfs contains an internal copy of neon; funny as it depends on neon already, in the ebuild; bug #207031

  • KOffice’s Karbon contains an internal copy of gdk-pixbuf; bug #209561;

  • kdegraphics’s KViewShell contains an internal copy of djvulibre; bug #209565;

  • doxygen contains internal copies of zlib and libpng; bug #210237 ; this time I used a different method to identify it as doxygen does not export the symbols;

  • rsync contains an internal copy of zlib; bug #210244 ;

Unfortunately, checking the output of my script to make sure that what I’m reading is true data and not a false positive has become more difficult now, because of the presence of multiple Sun JDK versions; I have to add support for alternatives, so that different libraries implementing the same interface don’t show up as colliding (they are that way by design).
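For readers wondering how this kind of duplication can be spotted at all, the general idea can be sketched in a few lines of Python: collect the symbols each library exports and group the ones that show up in more than one file. This is not my actual script, just an illustration; the function names are made up, and the parser assumes the usual “address type name” lines that GNU nm -D --defined-only prints.

```python
from collections import defaultdict


def parse_nm_output(text):
    """Extract exported code and data symbol names from nm-style output."""
    symbols = set()
    for line in text.splitlines():
        fields = line.split()
        # Expect "address type name"; keep global text/data/bss/rodata only.
        if len(fields) >= 3 and fields[-2] in {"T", "D", "B", "R"}:
            symbols.add(fields[-1].split("@")[0])  # drop any version suffix
    return symbols


def find_shared_symbols(exports, min_shared=2):
    """Map each group of files to the symbols they all define.

    exports maps a file name to the set of symbols it exports. Groups
    sharing fewer than min_shared symbols are dropped as likely noise.
    """
    owners = defaultdict(set)
    for path, symbols in exports.items():
        for symbol in symbols:
            owners[symbol].add(path)
    groups = defaultdict(set)
    for symbol, paths in owners.items():
        if len(paths) > 1:
            groups[frozenset(paths)].add(symbol)
    return {files: names for files, names in groups.items()
            if len(names) >= min_shared}
```

Feeding it the exports of libgs and of the system zlib, for instance, would report the zlib functions as shared between the two files. Libraries that implement the same interface on purpose, like the JDK alternatives above, would have to be filtered out before or after this step.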

Being driven crazy by lilypond

When I updated Rosegarden in the Portage Tree, I started looking at lilypond too, because Rosegarden can make use of it for a few optional features (which were bound to a USE flag in the proaudio overlay, although I preferred to drop those USE flags as they are optional runtime dependencies, and Rosegarden is pretty explicit that you need those tools to enable special features anyway).

Unfortunately the first thing that stopped me from merging lilypond was a cyclic dependency between two different fontconfig versions, due to bug 178629. I decided to look into that as soon as I could, and so I did over the past few days.

Even if Ed’s patch actually fixes the building with recent fontconfig, I don’t seem to be able to get lilypond working. When I try to build anything with lilypond, I get a failure message from ghostscript, with both the GPL and the ESP package. I asked HombreMagique in #gentoo-it to provide me a package of lilypond built on his system (with the fontforge it actually requires in portage) and the result is the same: I can’t produce the PDF file, nor display the PS file itself.

I also tried producing an SVG, but GwenView does not display anything besides the title, and rsvg produces a PNG with a lot of Japanese glyphs. Producing a PNG directly from lilypond seems to involve ghostscript again, and I get no result.

The TeX backend produces an empty .tex file.

Trying lilypond 2.11 doesn’t help: it gives me even more errors, and I can’t even choose the backend anymore (-b does not work).

The obnoxious thing is that it seems to work for everybody I ask to try it! It just doesn’t want to work for me!

Update: seems like the problem is guile built with -ftree-vectorize, thanks hkBst for pointing me at that! :)