This Time Self-Hosted

Reinventing lots of little wheels

You might remember my quick review of the Secure Programming with Static Analysis book. While overall I was expecting a much more practical view on how to maximise the gain from static analysis (like how to make sure that trying to get rid of false positives does not end up cluttering both the source and the produced object code), it had some quite important insights that I think are worth the read, and the price of the book.

One of these insights is an explanation of why Microsoft’s “secure” interfaces differ from the standard POSIX ones. Having a status code returned, to check whether the action completed, failed, or completed with truncation, is definitely more useful than getting back one of the two pointers that were already provided as input. Similarly, the book shows some of the “secure wrappers” commonly used to replace inherently insecure functions such as readlink().
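To make the status-code idea concrete, here is a minimal sketch in C of a string copy that reports what happened instead of handing back the destination pointer. The names `copy_status` and `copy_string` are made up for illustration; this is neither the book's code nor Microsoft's interface, just the general shape of such a function.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical status codes; not taken from any real library. */
typedef enum {
    COPY_OK = 0,        /* the whole string was copied */
    COPY_TRUNCATED = 1, /* copied, but the source did not fit */
    COPY_ERROR = -1     /* invalid arguments, nothing was written */
} copy_status;

/* Copy src into dst (capacity dst_size bytes), always NUL-terminating. */
static copy_status copy_string(char *dst, size_t dst_size, const char *src)
{
    size_t len;

    if (dst == NULL || src == NULL || dst_size == 0)
        return COPY_ERROR;

    len = strlen(src);
    if (len >= dst_size) {
        /* Keep as much as fits and leave room for the terminator. */
        memcpy(dst, src, dst_size - 1);
        dst[dst_size - 1] = '\0';
        return COPY_TRUNCATED;
    }

    memcpy(dst, src, len + 1); /* includes the terminating NUL */
    return COPY_OK;
}
```

A caller can then tell the three outcomes apart with a single comparison, which is exactly what returning the destination pointer cannot give you.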

Now, on the whole, this is all good, but I noticed one thing while following the libvirt development mailing lists: people end up reinventing tons of little wheels all around. While I like the idea behind gnulib, and I even wrote an article about its use a long time ago, it starts to show a couple of shortcomings in my view. The first is that the same source code has to be bundled into a number of projects; while it’s usually ignored for the most part on modern systems that have the functions available, it’s still source code that is shipped around multiple times and that might have nasty problems. The second problem is that both on modern systems (when wrappers are involved) and on less-modern systems (or systems that comply with older versions of the various standards, such as Solaris, or AIX) the same object code is added to multiple binaries, instead of being shared among them, increasing both the on-disk and in-memory sizes. It also puts the burden of verifying, and replacing, interfaces on the individual programs rather than centralising it in a single project.

Why bother, given that then you might as well just port a subset of the GNU C library (or just use a ported uClibc), and at that point you might as well not use that operating system at all? Well, one of the problems with the current approach is felt even by users running Linux, Gentoo users in primis, as they feel the slowness of running ./configure and having to check for the same features every time (compare this old post of mine: the best way to make a configure script faster is to reduce the number of tests it has to perform!). Shouldn’t it be enough to assume that the interfaces are present, and leave it to the user to provide a replacement library if they are not?

This is after all the favourite approach of the FFmpeg project: if POSIX or C99 mandates the presence of an interface, then FFmpeg can use it; if it’s not available, it’s up to the user/developer/packager to provide the proper flags, include paths, extra libraries to have them available. Non-standard compiler features used are a different matter, of course.

But even if some sort of libgnucompat or libposixcompliant library solved the problem of dealing with other operating systems, it would not solve another problem that I’ve noticed applies to libvirt: reinventing wrappers, be they security wrappers or not. Indeed, if you look at the symbols exported by libvirt.so, you’ll easily see that there are sixteen functions with the virFile prefix that seem to be just convenience and security wrappers around common file operations. This reduces the amount of boilerplate code that libvirt developers have to write each time they have to use that particular feature, but then you realise that similar code is written by many other projects as well to deal with the same situation; this is where convenience libraries come into being, stuff like glib, for instance.
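For a feel of what such a wrapper looks like, here is a rough sketch of the kind of helper that keeps getting rewritten: a readlink() wrapper that sizes the buffer and terminates the string for the caller. The name `read_link_alloc` is hypothetical; this is not libvirt's or gnulib's code, only an illustration assuming a POSIX system.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not libvirt's or gnulib's API): return the target
 * of the symlink at path as a freshly allocated, NUL-terminated string,
 * or NULL on failure. The caller frees the result.
 */
static char *read_link_alloc(const char *path)
{
    size_t size = 64;

    assert(path != NULL);

    for (;;) {
        char *buf = malloc(size);
        ssize_t len;

        if (buf == NULL)
            return NULL;

        len = readlink(path, buf, size);
        if (len < 0) {
            free(buf);
            return NULL;
        }
        if ((size_t)len < size) {
            buf[len] = '\0'; /* readlink() does not terminate the string */
            return buf;
        }

        /* The result may have been truncated: retry with a larger buffer. */
        free(buf);
        if (size > (size_t)-1 / 2)
            return NULL;
        size *= 2;
    }
}
```

Every detail here (the missing terminator, the silent truncation, the retry loop) is something each project otherwise has to get right on its own, which is the argument for sharing one implementation.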

Unfortunately, since there’s more than one way to skin a cat, there is no shortage of convenience libraries, even conflicting convenience libraries, out there. And nobody seems to agree on what’s the right way to do them. For instance, I can very well appreciate the hatred of glib’s use of g-prefixed basic types, such as guint8 and gpointer, rather than keeping to the standard types that are available in C99, such as uint8_t. While these are not always available, it’d make much more sense to make those available rather than inventing your own, no? But let’s not dwell on that topic for now.

Some of the most common wrappers are also slowly getting into the C libraries and the actual standards, although sometimes with not-too-bright results (the getline() function really could have used a nicer, more specific name), and other times with huge feuds between implementers (has anybody seen the strl*() functions in POSIX yet? or glibc?).
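As a reminder of what getline() takes off your hands, here is a small sketch assuming a system that implements POSIX.1-2008. The `longest_line` function is just an illustrative name; the point is that the caller never sizes or grows the line buffer by hand.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

/* Return the length of the longest line in fp, excluding the newline. */
static size_t longest_line(FILE *fp)
{
    char *line = NULL;   /* getline() allocates and grows this buffer */
    size_t capacity = 0;
    size_t longest = 0;
    ssize_t len;

    assert(fp != NULL);

    while ((len = getline(&line, &capacity, fp)) != -1) {
        /* Do not count the trailing newline, if there is one. */
        if (len > 0 && line[len - 1] == '\n')
            len--;
        if ((size_t)len > longest)
            longest = (size_t)len;
    }

    free(line); /* one buffer is reused across all calls */
    return longest;
}
```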

For all its defects, which it shares with the other autotools, libtool has probably produced one of the best wrappers out there: libltdl. With all its possible problems (and there are many), that library is well designed enough to be usable in at least three widely different configurations, as described, including the ability to bundle a copy of the library but still use the system copy if so asked (or even by default). Too bad this does not seem to happen with any other kind of wrapper library.

This seems to be the opposite of the situation within the Ruby community. Maybe because creating and publishing a gem is so easy (much easier than the standard track of release publishing for C-based libraries and packages, or for most other compiled languages), we have a huge number of “code fragment” gems that provide one or two source files, with either a couple of classes or a handful of useful functions, which are then reused in multiple packages by the same author. Not that the Ruby way here is perfect (but it surely is better than other Ruby ways I ranted about before): one of the biggest problems is that many times you have multiple gems solving the same problem over and over again, as with testing systems.

I don’t hold much hope that developers can sit down together and decide to work on a single implementation of anything, but it sure would be nice if it happened. You’d then have the same code shared among all processes, with no duplication, with a lot of eyes looking for possible faults and fixing them, and so on and so forth. Yes, it’s definitely a utopian point of view. Alas.

Comments 2
  1. I think you need to point the finger at something else as well: the C programming language. I’m not aware of any modern programming language which requires such a large number of workarounds, is so prone to catastrophic security errors, portability problems etc. (Except C++, but that inherits the problems of C and adds its own stupidity on top).

     It constantly mystifies me why there is no reasonable, modern, fast replacement for C that is widely used. If you think it should be possible to do 10x better than C, and there are languages out there which are much better, why do we not use them and abandon C?

  2. Agreed… but to be honest there are _many_ replacements for C; the problem is that they tend to have a multitude of problems. If we count compiled, non-managed, non-interpreted code, I know only three alternatives that at least support direct access to the huge pool of C libraries (which is for most people _the_ requirement):

     – C++, as you said, adds its own stupidity; and since it does not really restrict the use you make of C++, you end up with the same problems and more, ugh!
     – trying to evolve C/C++ further we get things like D; unfortunately the still quite proprietary nature of that language made it quite unusable in the long run;
     – similarly, Apple is high on ObjectiveC; I haven’t tried it much, even though I know GCC has some basic support, but most of the new syntax and features seem to be available only as Apple proprietary extensions (ObjectiveC 2.0).

     There are then a few other sector-specific languages, but they are either compiled to bytecode and managed by a VM, or re-compiled:

     – Sun’s Java, which tries to learn from the mistakes of C++ but adds further problems, especially with performance on the non-supported architectures and operating systems;
     – Microsoft’s/ECMA’s C#, which I sincerely like as a language, as it feels like C++ done the right way, but which still has trouble with the royalty-free patent licensing, and still has trouble with the overhead due to compiled code;
     – GNOME’s Vala is something I want to look into myself; re-compiling into C should allow enough flexibility to use the huge pool of ready-made libraries, and the ability to use GObject to introspect the libraries directly is definitely useful.

     Unfortunately in all these cases, you still need to do something to handle the current libraries, and sometimes it even gets easier to port a library line by line from one language to another (think of TagLib#) rather than accessing an already-existing implementation. And since you’re much *much* higher on the abstraction level, you’re mostly unable to efficiently handle things such as video decoding (so you end up wrapping FFmpeg, and that’s not always as easy as said).

     I think the point here is that you have too much content available for C, and so much of that is solid enough not to introduce further security problems, and thus you can only replace it with something that still allows you to mix in C libraries, and gives you as much low-level access as C to do things on the lower level (assembler is one example, but even being able to manage the endianness of data structures is important in the network and multimedia domains; and being able to map on-disk, or on-network, data structures directly to memory rather than having to read-and-store is crucial performance-wise for so many uses).

     And let’s not even start with the fact that there is no real stable way to find out the correct ABI used by C libraries, making the FFI approach not work properly most of the time.
