Why do we still use Ghostscript?

Late last year, I had a bit of a Twitter discussion about the fact that I can’t think of a good reason why Ghostscript is still a standard part of the Free Software desktop environment. The comments started from a security issue related to file access from within a PostScript program (i.e. a .ps file), and at the time I started drafting some of the content that is now becoming this post. I then shelved most of it because I was busy and it was no longer topical.

Then Tavis had to bring this back to the attention of the public, and so I’m back writing this.

To answer the question I pose in the title, we first have to define what Ghostscript is, and the short answer is: a PostScript renderer. Of course it’s a lot more than just that, but for the most part, that’s what it is. It deals with PostScript programs (or documents, if you prefer), and renders them into different formats. PostScript is rarely, if at all, used on modern desktops, not just because it’s overly complicated, but because it’s just not that useful in a world that has mostly settled on PDF, which is essentially a “compiled PostScript”.

Okay, not quite. There are plenty of qualifications that go around that whole paragraph, but I think it matches the practicalities of the case fairly well.

PostScript has found a number of interesting niche uses though, a lot of which revolve around printing, because PostScript is the language that older (early?) printers used. I have not seen any modern printers speak PostScript though, at least after my Kyocera FS-1020, and even those that do tend to support alternative “languages” and raster formats. On the other hand, because PostScript was a “lingua franca” for printers, CUPS and other printer-related tooling still use PostScript as an intermediate language.

In a similar fashion, quite a few pieces of software that deal with faxes (yes, faxes) tend to make use of Ghostscript itself. I would know because I wrote one, under contract, a long time ago. The reason is frankly pragmatic: if you’re on the client side, you want Windows to “print to fax”, and having a virtual PostScript printer is very easy. At that point you want to convert the document into something that can be easily shoved down the fax software’s throat, which ends up being TIFF (because TIFF is, as I understand it, the closest encoding to the physical faxes). And Ghostscript is very good at doing that.

Indeed, I have used (and seen used) Ghostscript in many cases to basically combine a bunch of images into a single document, usually in TIFF or PDF format. It’s very good at doing that, if you know how to use it, or if you copy-paste from other people’s implementations.

Often, this is done through the command line, too, and the reason for that is to be found in the licenses used by the various Ghostscript implementations and versions over time. Indeed, while many people think of Ghostscript as an open source Swiss Army Knife of document processing, it actually is dual-licensed. The Wikipedia page for the project shows eight variants, with at least four different licenses over time. The current options are AGPLv3 or the commercial paid-for license, and I can tell you that a lot of people (including the folks I worked under contract for) don’t really want to pay for that license, preferring instead the “arm’s length” aggregation of calling the binary rather than linking it in. Indeed, I wrote a .NET library to do just that. It’s optimized for (you guessed it right) TIFF files, because it was a component of an Internet Fax implementation.
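
As a rough illustration of that “arm’s length” approach, here is a minimal Python sketch that builds a Ghostscript command line and runs it as a separate process. The function names are made up for this example and have nothing to do with the .NET library mentioned above; the switches (-dBATCH, -dNOPAUSE, -dSAFER, -sDEVICE, -sOutputFile) and the tiffg4 fax-style device are standard Ghostscript options, but the binary name `gs` on the PATH is an assumption.

```python
import shutil
import subprocess
from typing import List, Sequence


def build_gs_command(inputs: Sequence[str], output: str,
                     device: str = "tiffg4", gs_binary: str = "gs") -> List[str]:
    """Build the argument list for a non-interactive Ghostscript run.

    Hypothetical helper: only assembles arguments, does not run anything.
    """
    if not inputs:
        raise ValueError("at least one input file is required")
    return [
        gs_binary,
        "-q",                      # suppress the startup banner
        "-dBATCH",                 # exit after the last file instead of prompting
        "-dNOPAUSE",               # do not pause between pages
        "-dSAFER",                 # restrict file access from PostScript code
        f"-sDEVICE={device}",      # tiffg4 writes Group 4 (fax-style) TIFF
        f"-sOutputFile={output}",
        *inputs,
    ]


def run_gs(inputs: Sequence[str], output: str, device: str = "tiffg4") -> None:
    """Run Ghostscript as a child process; raises if it is missing or fails."""
    if shutil.which("gs") is None:
        raise FileNotFoundError("gs binary not found on PATH")
    subprocess.run(build_gs_command(inputs, output, device), check=True)
```

Calling `run_gs(["page1.ps", "page2.ps"], "fax.tiff")` would then produce a single multi-page TIFF, with the caller never linking against Ghostscript itself.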

So where does this leave us?

Ten years ago or so, when effectively every Free Software desktop PDF viewer was forking the XPDF source code to adapt it to whatever rendering engine it needed, it took a significant list of vulnerabilities that needed to be fixed time and time again for the Poppler project to take off and create One PDF Renderer To Rule Them All. I think we need the same for Ghostscript. With a few differences.

The first difference is that I think we need to take a good look at what Ghostscript, and PostScript, are useful for in today’s desktops. Combining multiple images in a single document should _not_ require processing all the way to PostScript. There’s no reason to! Particularly not when the images are just JPEG files, and PDF can embed them directly. Having a tool that is good at combining multiple images into a PDF, with decent options for page size and alignment, would probably replace many of the uses of Ghostscript that I had in my own tools and scripts over the past few years.
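
To show how little is needed for the JPEG case, here is a minimal sketch in pure Python that wraps JPEG data, untouched, into a PDF using the DCTDecode filter. It is an illustration under simplifying assumptions, not a replacement tool: the function names are invented, only grayscale and RGB JPEGs are handled, and each page is simply sized at one point per pixel, with none of the page size and alignment options a real tool would want.

```python
import struct
from typing import List, Tuple

# SOFn markers that carry the frame header (0xC4, 0xC8 and 0xCC are not SOF).
_SOF = {0xC0, 0xC1, 0xC2, 0xC3, 0xC5, 0xC6, 0xC7,
        0xC9, 0xCA, 0xCB, 0xCD, 0xCE, 0xCF}


def jpeg_info(data: bytes) -> Tuple[int, int, int]:
    """Return (width, height, components) read from the JPEG frame header."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG stream")
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError("malformed JPEG segment")
        marker = data[pos + 1]
        if marker == 0xFF:                  # fill byte before a marker
            pos += 1
            continue
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        if marker in _SOF:
            _prec, height, width, comps = struct.unpack(
                ">BHHB", data[pos + 4:pos + 10])
            return width, height, comps
        pos += 2 + length                   # skip marker plus segment
    raise ValueError("no frame header found")


def jpegs_to_pdf(jpegs: List[bytes]) -> bytes:
    """Build a minimal PDF with one page per JPEG, embedding the bytes as-is."""
    colorspaces = {1: "/DeviceGray", 3: "/DeviceRGB"}
    kids = " ".join(f"{3 + 3 * i} 0 R" for i in range(len(jpegs)))
    # objects[i] is the body of PDF object number i + 1.
    objects: List[bytes] = [
        b"<< /Type /Catalog /Pages 2 0 R >>",
        f"<< /Type /Pages /Count {len(jpegs)} /Kids [{kids}] >>".encode(),
    ]
    for i, data in enumerate(jpegs):
        w, h, comps = jpeg_info(data)
        if comps not in colorspaces:
            raise ValueError("only grayscale and RGB JPEGs are handled here")
        content, image = 4 + 3 * i, 5 + 3 * i
        draw = f"q {w} 0 0 {h} 0 0 cm /Im0 Do Q".encode()  # scale image to page
        objects.append(
            f"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 {w} {h}] "
            f"/Resources << /XObject << /Im0 {image} 0 R >> >> "
            f"/Contents {content} 0 R >>".encode())
        objects.append(
            b"<< /Length %d >>\nstream\n%s\nendstream" % (len(draw), draw))
        objects.append(
            f"<< /Type /XObject /Subtype /Image /Width {w} /Height {h} "
            f"/ColorSpace {colorspaces[comps]} /BitsPerComponent 8 "
            f"/Filter /DCTDecode /Length {len(data)} >>\nstream\n".encode()
            + data + b"\nendstream")
    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for num, body in enumerate(objects, start=1):
        offsets.append(len(out))            # byte offset for the xref table
        out += b"%d 0 obj\n" % num + body + b"\nendobj\n"
    xref = len(out)
    out += b"xref\n0 %d\n0000000000 65535 f \n" % (len(objects) + 1)
    for off in offsets:
        out += b"%010d 00000 n \n" % off
    out += (b"trailer\n<< /Size %d /Root 1 0 R >>\nstartxref\n%d\n%%%%EOF\n"
            % (len(objects) + 1, xref))
    return bytes(out)
```

Something like `open("out.pdf", "wb").write(jpegs_to_pdf([open(p, "rb").read() for p in paths]))` would write the result, where `paths` is a hypothetical list of JPEG file names. The image data is never decoded or re-encoded, which is the whole point.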

And while rendering PostScript for display and for print are similar enough tasks, I have some doubts that the same code would work right for both. PostScript and Ghostscript are often used in _networked_ printing as well, in which case there’s a lot of processing of untrusted input, both for display and printing. Sandboxing – and possibly writing this in a language better suited to deal with untrusted input than C is – would go a long way to prevent problems there.

But there are a few other interesting topics that I want to point out on this. I can’t think of any good reason for _desktops_ to support PostScript out of the box in 2019. While I can still think of a lot of tools, particularly from the old timers, that use PostScript as an intermediate format, most people _in the world_ would use PDF nowadays to share documents, not PostScript. It’s kind of like sharing DVI files, which I have done before, but I now wonder why. While both formats might have advantages over PDF, in 2019 they have definitely lost the format war. macOS might still support both (I don’t know), but Windows and Android definitely don’t, which makes them pretty useless for sharing knowledge with the world.

What I mean by that is that it’s probably high time that PostScript became an _optional_ component of the Free Software Desktop, one that users need to enable explicitly _if they ever need it_, just to limit the risks that come with accepting, displaying and thumbnailing full, Turing-complete programs masquerading as documents. Even Microsoft stopped running macros in Office documents by default, when they realized the type of footgun it had become.

Of course talk is cheap, and I should probably try to help directly myself. Unfortunately I don’t have much experience with graphics formats, besides maintaining unpaper, and that is not a particularly good result either: I tried using libav’s image loading, and it turns out it’s actually a mess. So I guess I should either invest my time in learning enough about building libraries for image processing, or poke around to see if someone wrote a good multi-format image processing library in, say, Rust.

Alternatively, if someone starts to work on this and wants some help with either reviewing the code, or with integrating the final output in places where Ghostscript is used, I’m happy to volunteer my time. I’m fairly sure I can convince my manager to let me do some of that work as a 20% project.

More about Reader and PDFs

Okay, thanks to Jeff, who commented on my previous post, I finally got the SD card working on Linux. If you ever have problems, enable CONFIG_SCSI_MULTI_LUN in the kernel. Tomorrow I’ll add a warning to the libprs500 ebuild if it’s unset.

Tonight I didn’t have much time, but I’ve found that a 9 x 12 cm page is just the right setting: in LaTeX it produces a page that is perfect for reading on the Reader.

Unfortunately texinfo is not as easy as LaTeX: even if I set the size of the page, it only reduces the size of the text inside it, and I can’t find how to reduce the actual page size in the PDF. While this is enough to use the zoom function of the Reader, you’ll have to repeat it for every page, and it gets boring. I’d very much like to crop the PDF file.

Unfortunately the only tool I found that can crop PDF files is ImageMagick’s convert. But convert acts on images, and that causes two problems: first, it takes up a huge amount of memory (gs converts a 15MB PDF file into a 400MB PGM file, and back again); second, it creates an image-only PDF file that, well, let’s just say that a 90×120 pixel (okay I got the wrong unit, it happens!) file weighs in at 24MB, and I remind you that I started from 15MB.

Pino (oKular developer) suggested I try pdftk, but as far as I can see from the documentation available online, it does not allow me to crop the pages. I’ve now found an interesting script that would add cropping data to PostScript files; if Ghostscript supports those, it would then allow me to convert the PS back to a cropped PDF. Tomorrow I’ll have to try.
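
For reference, here is a rough Python sketch of what such a crop could look like when done through Ghostscript directly. It converts the page size from centimetres to PostScript points (72 per inch) and builds a command that passes a /CropBox pdfmark to the pdfwrite device. This is an untested assumption rather than a confirmed recipe: it presumes pdfwrite honours the pdfmark for every page, that the `gs` binary is on the PATH, and the helper names and file names are invented for the example.

```python
from typing import List

POINTS_PER_INCH = 72.0
CM_PER_INCH = 2.54


def cm_to_points(cm: float) -> float:
    """Convert centimetres to PostScript points (1/72 inch)."""
    return cm / CM_PER_INCH * POINTS_PER_INCH


def crop_command(src: str, dst: str, width_cm: float, height_cm: float,
                 left_cm: float = 0.0, bottom_cm: float = 0.0) -> List[str]:
    """Build a Ghostscript command that sets a CropBox on all pages.

    Hypothetical helper; the crop rectangle is measured from the
    bottom-left corner of the page, as PDF coordinates are.
    """
    x0 = cm_to_points(left_cm)
    y0 = cm_to_points(bottom_cm)
    x1 = x0 + cm_to_points(width_cm)
    y1 = y0 + cm_to_points(height_cm)
    # pdfmark asking pdfwrite to apply the same CropBox to every page.
    mark = f"[/CropBox [{x0:.2f} {y0:.2f} {x1:.2f} {y1:.2f}] /PAGES pdfmark"
    return [
        "gs", "-q", "-dBATCH", "-dNOPAUSE", "-dSAFER",
        "-sDEVICE=pdfwrite",
        f"-sOutputFile={dst}",
        "-c", mark,      # run the PostScript snippet first...
        "-f", src,       # ...then process the input file
    ]
```

With `crop_command("manual.pdf", "manual-reader.pdf", 9, 12)` the crop rectangle comes out at roughly 255 by 340 points, which is the 9 x 12 cm page in PostScript units. Unlike the convert route, this would keep the text as text instead of turning every page into an image.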

And yes, tomorrow I’ll try to provide a few more photos of a book showing up on the Reader, most likely both a standard PDF and an ad-hoc generated copy of “The Not So Short Guide to LaTeX2e”.

On a totally different note, I was watching movie trailers on my Apple TV just now and… here is yet another movie that makes computers seem capable of anything: “Untraceable”. And people complain that CSI is unrealistic.