Late last year, I had a bit of a Twitter discussion about the fact that I can’t think of a good reason why Ghostscript is still a standard part of the Free Software desktop environment. The comments started from a security issue related to file access from within a PostScript program (i.e. a .ps file), and at the time I actually started drafting some of the content that is now becoming this post. I then shelved most of it because I was busy and it was not topical.
Then Tavis had to bring this back to the attention of the public, and so I’m back writing this.
To answer the question I pose in the title, we first have to define what Ghostscript is, and the short answer is: a PostScript renderer. Of course it’s a lot more than just that, but for the most part, that’s what it is. It deals with PostScript programs (or documents, if you prefer), and renders them into different formats. PostScript is rarely, if at all, used in modern desktops, not just because it’s overly complicated, but because it’s just not that useful in a world that has mostly settled on PDF, which is essentially a “compiled PostScript”.
Okay not quite. There are plenty of qualifications that go around that whole paragraph, but I think it matches the practicalities of the case fairly well.
PostScript has found a number of interesting niche uses though, a lot of which revolve around printing, because PostScript is the language that older (early?) printers used. I have not seen any modern printers speak PostScript though, at least after my Kyocera FS-1020, and even those that do tend to support alternative “languages” and raster formats. On the other hand, because PostScript was a “lingua franca” for printers, CUPS and other printer-related tooling still use PostScript as an intermediate language.
In a similar fashion, quite a few pieces of software that deal with faxes (yes, faxes) tend to make use of Ghostscript itself. I would know because I wrote one, under contract, a long time ago. The reason is frankly pragmatic: if you’re on the client side, you want Windows to “print to fax”, and having a virtual PostScript printer is very easy. At that point you want to convert the document into something that can be easily shoved down the fax software’s throat, which ends up being TIFF (because TIFF is, as I understand it, the closest encoding to the physical faxes). And Ghostscript is very good at doing that.
Indeed, I have used (and seen used) Ghostscript in many cases to basically combine a bunch of images into a single document, usually in TIFF or PDF format. It’s very good at doing that, if you know how to use it, or if you copy-paste from other people’s implementations.
Often this is done through the command line, too, and the reason is to be found in the licenses used by the various Ghostscript implementations and versions over time. Indeed, while many people think of Ghostscript as an open source Swiss Army Knife of document processing, it actually is dual-licensed. The Wikipedia page for the project shows eight variants, with at least four different licenses over time. The current options are AGPLv3 or the commercial paid-for license, and I can tell you that a lot of people (including the folks I worked under contract for) don’t really want to pay for that license, preferring instead the “arm’s length” aggregation of calling the binary rather than linking it in. Indeed, I wrote a .NET library to do just that. It’s optimized for (you guessed it right) TIFF files, because it was a component of an Internet Fax implementation.
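To make the “call the binary” approach concrete, here is a minimal Python sketch of it. This is not the .NET library mentioned above, just an illustration. The function names are made up for this example, and it assumes the Ghostscript executable is available as `gs` (on Windows it is usually named differently). The switches are standard Ghostscript ones: `tiffg4` is the fax-style Group 4 TIFF output device, and `-r204x196` is the usual “fine” fax resolution.

```python
import subprocess


def ghostscript_tiff_command(ps_path, tiff_path, gs_binary="gs"):
    """Build the argument list for a PostScript -> fax-style TIFF conversion.

    Hypothetical helper: the name and the default binary are assumptions.
    """
    return [
        gs_binary,
        "-dSAFER",             # restrict file access from the PostScript program
        "-dBATCH",             # exit after processing instead of going interactive
        "-dNOPAUSE",           # do not wait for a key press between pages
        "-sDEVICE=tiffg4",     # 1-bit, CCITT Group 4 compressed TIFF
        "-r204x196",           # "fine" fax resolution, in dots per inch
        f"-sOutputFile={tiff_path}",
        ps_path,
    ]


def postscript_to_fax_tiff(ps_path, tiff_path, gs_binary="gs"):
    """Run Ghostscript as a separate process; raises if it exits with an error."""
    subprocess.run(ghostscript_tiff_command(ps_path, tiff_path, gs_binary), check=True)
```

Calling `postscript_to_fax_tiff("document.ps", "document.tiff")` keeps Ghostscript in its own process, which is the whole point of the arm’s length arrangement: nothing is linked into the calling program.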
So where does this leave us?
Some ten years ago, when effectively every Free Software desktop PDF viewer was forking the XPDF source code to adapt it to whatever rendering engine it needed, it took a significant list of vulnerabilities that needed to be fixed time and time again for the Poppler project to take off and create One PDF Rendering To Rule Them All. I think we need the same for Ghostscript. With a few differences.
The first difference is that I think we need to take a good look at what Ghostscript, and PostScript, are useful for in today’s desktops. Combining multiple images into a single document should _not_ require processing all the way to PostScript. There’s no reason to! Particularly not when the images are just JPEG files, and PDF can embed them directly. Having a tool that is good at combining multiple images into a PDF, with decent options for page size and alignment, would probably replace many of the uses of Ghostscript that I had in my own tools and scripts over the past few years.
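To show how little is needed when the input is already JPEG, here is a rough Python sketch that wraps JPEG data in a PDF without decoding it, using PDF’s `DCTDecode` filter. It is a toy, not the tool I am wishing for: the function name is made up, it assumes 8-bit RGB JPEGs, it expects the caller to supply the pixel dimensions, and it simply makes each page as many points wide and tall as the image has pixels (so no page size or alignment options at all).

```python
def jpegs_to_pdf(images):
    """Wrap JPEGs in a PDF, one image per page, without re-encoding them.

    images: list of (jpeg_bytes, width_px, height_px) tuples.
    Assumes 8-bit RGB JPEGs; dimensions are trusted, not parsed from the data.
    """
    objects = []  # body of each indirect object, in object-number order

    def add(body):
        objects.append(body)
        return len(objects)  # PDF object numbers start at 1

    catalog = add(b"")  # placeholders, filled in once the pages are known
    pages = add(b"")
    kids = []
    for data, w, h in images:
        # The JPEG bytes go into the stream untouched; DCTDecode tells the viewer how to read them.
        image = add(
            b"<< /Type /XObject /Subtype /Image /Width %d /Height %d "
            b"/ColorSpace /DeviceRGB /BitsPerComponent 8 "
            b"/Filter /DCTDecode /Length %d >>\nstream\n" % (w, h, len(data))
            + data + b"\nendstream"
        )
        # Scale the unit square up to the image size and paint the image.
        ops = b"q %d 0 0 %d 0 0 cm /Im0 Do Q" % (w, h)
        content = add(b"<< /Length %d >>\nstream\n" % len(ops) + ops + b"\nendstream")
        kids.append(add(
            b"<< /Type /Page /Parent %d 0 R /MediaBox [0 0 %d %d] "
            b"/Resources << /XObject << /Im0 %d 0 R >> >> /Contents %d 0 R >>"
            % (pages, w, h, image, content)
        ))
    objects[catalog - 1] = b"<< /Type /Catalog /Pages %d 0 R >>" % pages
    objects[pages - 1] = b"<< /Type /Pages /Count %d /Kids [%s] >>" % (
        len(kids), b" ".join(b"%d 0 R" % k for k in kids))

    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for number, body in enumerate(objects, start=1):
        offsets.append(len(out))  # byte offset of each object, needed by the xref table
        out += b"%d 0 obj\n" % number + body + b"\nendobj\n"
    xref_at = len(out)
    out += b"xref\n0 %d\n0000000000 65535 f \n" % (len(objects) + 1)
    for offset in offsets:
        out += b"%010d 00000 n \n" % offset
    out += b"trailer\n<< /Size %d /Root %d 0 R >>\nstartxref\n%d\n%%%%EOF\n" % (
        len(objects) + 1, catalog, xref_at)
    return bytes(out)
```

A real tool would read the dimensions and colour space from the JPEG headers and handle other image formats, but the point stands: nothing here needs a PostScript interpreter.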
And while rendering PostScript for display and rendering it for print are similar enough tasks, I have some doubts that the same code would work right for both. PostScript and Ghostscript are often used in networked printing as well, in which case there’s a lot of processing of untrusted input, both for display and printing. Sandboxing – and possibly writing this in a language better suited to deal with untrusted input than C is – would go a long way to prevent problems there.
But there are a few other interesting topics that I want to point out on this. I can’t think of any good reason for desktops to support PostScript out of the box in 2019. While I can still think of a lot of tools, particularly from the old timers, that use PostScript as an intermediate format, most people _in the world_ would use PDF nowadays to share documents, not PostScript. It’s kind of like sharing DVI files, which I have done before, but I now wonder why. While both formats might have advantages over PDF, in 2019 they have definitely lost the format war. macOS might still support both (I don’t know), but Windows and Android definitely don’t, which makes them pretty useless for sharing knowledge with the world.
What I mean by that is that it’s probably high time that PostScript became an optional component of the Free Software Desktop, one that users need to enable explicitly if they ever need it, just to limit the risks that come with accepting, displaying and thumbnailing full, Turing-complete programs masquerading as documents. Even Microsoft stopped running macros in Office documents by default, when they realized the type of footgun it had become.
Of course talk is cheap, and I should probably try to help directly myself. Unfortunately I don’t have much experience with graphics formats, besides maintaining unpaper, and that is not a particularly good result either: I tried using libav’s image loading, and it turns out it’s actually a mess. So I guess I should either invest my time in learning enough about building libraries for image processing, or poke around to see if someone wrote a good multi-format image processing library in, say, Rust.
Alternatively, if someone starts to work on this and wants some help with either reviewing the code, or with integrating the final output in places where Ghostscript is used, I’m happy to volunteer my time. I’m fairly sure I can convince my manager to let me do some of that work as a 20% project.
(disclaimer, this is from memory, not from double-checking authoritative sources)
TIFF is more a “wrapper for N graphics encodings” than it is a graphics encoding. I believe one of those is a gray-scale sequence of pixels, line by line, going from left to right, which is pretty close to the on-the-wire fax encoding scheme. I’ve also seen two programs both emit TIFFs that can both be read by a third program, but neither can be read by the other. So, yeah…
With all that in mind, I don’t think PostScript should be necessary for a desktop and (if it is), it can most probably be sandboxed to “unable to open files or network connections” (yes, I have seen a web server implemented in PostScript, it was delightful and horrible, although perhaps not as horrible as a lisp interpreter in BCL).