Reinventing lots of little wheels

You might remember my quick review of the Secure Programming with Static Analysis book. While overall I was expecting a much more practical view on how to maximise the gain from static analysis (like how to make sure that trying to get rid of false positives does not end up cluttering both the source and the produced object code), it had some quite important insights that I think are worth the read, and the price of the book.

One of these insights is an explanation of why Microsoft’s “secure” interfaces differ from the standard POSIX ones. Having a status code returned, to check whether the action completed, failed, or completed with truncation, is definitely more useful than being handed back one of the two pointers that were already provided as input. Similarly, the book shows some of the “secure wrappers” commonly used for replacing inherently insecure functions such as readlink().
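To make the idea concrete, here is a minimal sketch of such a wrapper around readlink(), which on its own neither NUL-terminates the buffer nor reports truncation. The names read_link_checked and link_status are made up for this post; they come neither from the book nor from any real library.

```c
#define _POSIX_C_SOURCE 200809L
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical status codes: completed, completed with truncation, failed. */
typedef enum {
    LINK_OK = 0,
    LINK_TRUNCATED = 1,
    LINK_ERROR = -1
} link_status;

/*
 * Hypothetical wrapper: read the target of the symlink at `path` into `buf`,
 * always leaving `buf` NUL-terminated, and report what happened through the
 * return value instead of a bare byte count.
 */
link_status read_link_checked(const char *path, char *buf, size_t bufsize)
{
    ssize_t n;

    if (path == NULL || buf == NULL || bufsize == 0)
        return LINK_ERROR;

    n = readlink(path, buf, bufsize);
    if (n < 0) {
        buf[0] = '\0';
        return LINK_ERROR;
    }

    if ((size_t)n >= bufsize) {
        /* The buffer was filled completely: the target may have been cut
         * short, and there is no room left for the terminator. */
        buf[bufsize - 1] = '\0';
        return LINK_TRUNCATED;
    }

    buf[n] = '\0';
    return LINK_OK;
}
```

The caller can then switch on the three outcomes rather than remembering that readlink() does not terminate the string. A buffer that is filled exactly is reported as truncated, which is the conservative choice.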

Now, on the whole, this is all good, but I noticed one thing while following the libvirt development mailing lists: people end up reinventing tons of little wheels all around. While I like the idea behind gnulib, and I even wrote an article about its use a long time ago, it starts to show a couple of shortcomings in my view. The first is that the same source code has to be bundled into a number of projects; while it’s ignored for the most part on modern systems that have the functions available, it’s still source code that is shipped around multiple times and that might have nasty problems. The second problem is that both on modern systems (when wrappers are involved) and on less-modern systems (or systems that comply with older versions of the various standards, such as Solaris or AIX) the same object code is added to multiple binaries, instead of being shared among them, increasing both the on-disk and in-memory sizes. It also puts the burden of verifying, and replacing, the interfaces on the individual programs rather than centralising it in a single project.

Why bother, given that then you might as well just port a subset of the GNU C library (or just use a ported uClibc), and at that point you might as well not use that operating system at all? Well, one of the problems with the current approach is felt even by users running Linux, Gentoo users in primis, as they feel the slowness of running ./configure and having to check for the same features every time (compare this old post of mine: the best way to make a configure script faster is to reduce the number of tests it has to perform!). Shouldn’t it be enough to assume that the interfaces are present, and leave it to the user to provide a replacement library if they are not?

This is after all the favourite approach of the FFmpeg project: if POSIX or C99 mandates the presence of an interface, then FFmpeg can use it; if it’s not available, it’s up to the user/developer/packager to provide the proper flags, include paths, extra libraries to have them available. Non-standard compiler features used are a different matter, of course.

But even if some sort of libgnucompat or libposixcompliant library solved that problem for other operating systems, it would not solve another problem that I’ve noticed applies to libvirt: reinventing wrappers, be they security wrappers or not. Indeed, if you look at the symbols exported by libvirt.so, you’ll easily see that there are sixteen functions with the virFile prefix that seem to be just convenience and security wrappers around common file operations. This reduces the amount of boilerplate code that libvirt developers have to write each time they use that particular feature, but then you realise that similar code is written by many other projects as well to deal with the same situation; this is where convenience libraries come into being, stuff like glib, for instance.
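To give an idea of the kind of boilerplate such wrappers hide, here is a sketch of a “read a whole file, but never more than a given size” helper. The name file_read_all and its signature are invented for illustration; this is not libvirt’s actual API, just the sort of function many projects end up writing on their own.

```c
#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical convenience wrapper: read the file at `path` into a freshly
 * allocated, NUL-terminated buffer, refusing files larger than `max_len`
 * bytes. Returns 0 on success and -1 on any failure; on success the caller
 * owns `*out` and must free() it. For simplicity the sketch allocates
 * `max_len + 1` bytes up front.
 */
int file_read_all(const char *path, size_t max_len, char **out, size_t *out_len)
{
    FILE *fp;
    char *buf;
    size_t len;
    int extra;

    if (path == NULL || out == NULL || max_len == (size_t)-1)
        return -1;
    *out = NULL;

    fp = fopen(path, "rb");
    if (fp == NULL)
        return -1;

    buf = malloc(max_len + 1);
    if (buf == NULL) {
        fclose(fp);
        return -1;
    }

    len = fread(buf, 1, max_len, fp);
    /* If the buffer is full, probe one more byte to detect oversized files. */
    extra = (len == max_len) ? fgetc(fp) : EOF;
    if (ferror(fp) || extra != EOF) {
        free(buf);
        fclose(fp);
        return -1;
    }
    fclose(fp);

    buf[len] = '\0';
    *out = buf;
    if (out_len != NULL)
        *out_len = len;
    return 0;
}
```

Every caller gets the size limit, the error paths and the terminator for free, which is exactly why this code gets rewritten, slightly differently, in project after project.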

Unfortunately, since there’s more than one way to skin a cat, there is no shortage of convenience libraries, even conflicting convenience libraries, out there. And nobody seems to agree on what the right way to do them is. For instance, I can very well appreciate the hatred of glib’s use of g-prefixed basic types, such as guint8 and gpointer, rather than keeping with the standard types that are available in C99, such as uint8_t. While these are not always available, it’d make much more sense to make those available rather than inventing your own, no? But let’s not dwell on that topic for now.

Some of the most common wrappers are also slowly making their way into the C libraries and the actual standards, although sometimes with not-too-bright results (the getline() function really could have used a nicer, more specific name), and other times with huge feuds between implementers (has anybody seen the strl*() functions in POSIX yet? or in glibc?).
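For reference, the strl*() family boils down to very little code, which is part of why so many projects carry their own copy. Below is a sketch following the BSD strlcpy() semantics as I understand them; the compat_ prefix is made up here to avoid clashing with a system-provided version.

```c
#include <stddef.h>
#include <string.h>

/*
 * Sketch of a strlcpy()-style copy under a hypothetical name: copy at most
 * `size - 1` bytes from `src` into `dst`, always NUL-terminate when
 * `size > 0`, and return the full length of `src`. A return value that is
 * greater than or equal to `size` means the copy was truncated.
 */
size_t compat_strlcpy(char *dst, const char *src, size_t size)
{
    size_t src_len = strlen(src);

    if (size > 0) {
        size_t n = (src_len < size - 1) ? src_len : size - 1;

        memcpy(dst, src, n);
        dst[n] = '\0';
    }
    return src_len;
}
```

Checking for truncation is then a single comparison, compat_strlcpy(dst, src, sizeof dst) >= sizeof dst, instead of the usual dance around strncpy() and its missing terminator.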

For all its defects, and those of the other autotools, libtool has probably produced one of the best wrappers out there: libltdl. With all its possible problems (and there are many), that library is well designed enough to be usable in at least three widely different configurations (as described), including the ability to bundle a copy of the library but still use the system copy if so asked (or even by default). Too bad this does not seem to happen with any other kind of wrapper library.

This seems to be the opposite of what happens within the Ruby community; maybe because creating and publishing a gem is so easy (much easier than the usual release process for C-based libraries and packages, or for most other compiled languages), we have a huge number of “code fragment” gems that provide one or two source files, with either a couple of classes or a handful of useful functions, which are then reused in multiple packages by the same author. Not that the Ruby way here is perfect (but it surely is better than other Ruby ways I ranted about before): one of the biggest problems is that many times you have multiple gems solving the same problem over and over again, as with testing systems.

I don’t hold much hope that developers can sit down together and settle on a single implementation of anything, but it sure would be nice if it happened. You’d then have the same code shared among all processes, with no duplication, with a lot of eyes looking for the possible faults and fixing them, and so on and so forth. Yes, it’s definitely a utopian point of view. Alas.