Plugins aren’t always a good choice

I’ve been saying this for quite a while, probably one of the most on-topic post has been written a few months ago but there are some indications about it in posts about xine and other again.

I used to be an enthusiast about plugin interfaces; with time, though, I started having more and more doubts about their actual usefulness — it’s a tract I really much like in myself, I’m fine with reconsidering my own positions over time, deciding that I was wrong; it happened before with other things, like KDE (and C++ in general).

It’s not like I’m totally against the use of plugins altogether. I only think that they are expensive in more ways than one, and that their usefulness is often overstated, or tied to other kind of artificial limitations. For instance, dividing a software’s features over multiple plugins makes it easier for the binary distributions to package them, usually: they only have to ship a package with the main body of the software, and many for the plugins (one per plugin might actually be too much so sometimes they might be grouped). This works out pretty well for both the distribution and, usually, the user: the plugins that are not installed will not bring in extra dependencies, they won’t take time to load and they won’t use memory for either code nor data. It basically allows binary distribution to have a flexibility to compare with Gentoo’s USE flags (and similar options in almost any other source-based distribution).

But as I said this comes with costs, that might or might not be worth it in general. For instance, Luca wanted to implement plugins for feng similarly to what Apache and lighttpd have. I can understand his point: let’s not load code for the stuff we don’t have to deal with, which is more or less the same reason why Apache and lighttpd have modules; in the case of feng, if you don’t care about access log, why should you be loading the access load support at all? I can give you a couple of reasons:

  • because the complexity of managing a plugin to deal with the access log (or any other similar task) is higher than just having a piece of static code that handles that;
  • because the overhead of having a plugin loaded just to do that is higher than that of having the static code built in and not enabled into configuration.

The first problem is a result of the way a plugin interface is built: the main body of the software cannot know about its plugins in too specific ways. If the interface is a very generic plugin interface, you add some “hook locations” and then it’s the plugin’s task to find how to do its magic, not the software’s. There are some exceptions to this rule: if you have a plugin interface for handling protocols, like the KIO interface (and I think gvfs has the same) you get the protocol from the URL and call the correct plugin, but even then you’re leaving it to the plugin to deal with doing its magic. You can provide a way for the plugin to tell the main body what it needs and what it can do (like which functions it implements) but even that requires the plugins to be quite autonomous. And that means also being able to take care of allocating and freeing the resources as needed.

The second problem is not only tied to the cost of calling the dynamic linker dynamically to load the plugin and its eventual dependencies (which is a non-trivial amount of work, one has to say), also by the need for having code that deals with finding the modules to load, the loading of those modules, their initialisation, keeping a list of modules to call at any given interface point, and two more points: the PIC problem and the problem of less-than-page-sized segments. This last problem is often ignored, but it’s my main reason to dislike plugins when they are not warranted for other reasons. Given a page size of 4KiB (which is the norm on Linux for what I know), if the code is smaller than that size, it’ll still require a full page (it won’t pack with the rest of the software’s code areas); but at least code is disk-backed (if it’s PIC, of course), it’s worse for what concerns variable data, or variable relocated data, since those are not disk-backed, and it’s not rare that you’d be using a whole page for something like 100 bytes of actual variables.

In the case of the access log module that Luca wrote for feng, the statistics are as such:

flame@yamato feng % size modules/.libs/
   text    data     bss     dec     hex filename
   4792     704      16    5512    1588 modules/.libs/

Which results in two pages (8KiB) for bss and data segments, neither disk-backed, and two disk-backed pages for the executable code (text): 16KiB of addressable memory for a mapping that does not reach 6KiB, it’s a 10KiB overhead, which is much higher than 50%. And that’s the memory overhead alone. The whole overhead, as you might guess at this point, is usually within 12KiB (since you got three segments, and each can have at most one byte less than page size as overhead — it’s actually more complex than this but let’s assume this is true).

It really doesn’t sound like a huge overhead by itself, but you have to always judge it compared to the size of the plugin itself. In the case of feng’s access log, you got a very young plugin that lacks a lot of functionality, so one might say that with the time it’ll be worth it… so I’d like to show you the size statistics for the Apache modules on the very server my blog is hosted. Before doing so, though, I have to remind you one huge difference: feng is built with most optimisations turned off, while Apache is built optimised for size; they are both AMD64 though so the comparison is quite easy.

flame@vanguard ~ $ size /usr/lib64/apache2/modules/*.so | sort -n -k 4
   text    data     bss     dec     hex filename
   2529     792      16    3337     d09 /usr/lib64/apache2/modules/
   2960     808      16    3784     ec8 /usr/lib64/apache2/modules/
   3499     856      16    4371    1113 /usr/lib64/apache2/modules/
   3617     912      16    4545    11c1 /usr/lib64/apache2/modules/
   3773     808      24    4605    11fd /usr/lib64/apache2/modules/
   4035     888      16    4939    134b /usr/lib64/apache2/modules/
   4161     752      80    4993    1381 /usr/lib64/apache2/modules/
   4136     888      16    5040    13b0 /usr/lib64/apache2/modules/
   5129     952      24    6105    17d9 /usr/lib64/apache2/modules/
   6589    1056      16    7661    1ded /usr/lib64/apache2/modules/
   6826    1024      16    7866    1eba /usr/lib64/apache2/modules/
   7367    1040      16    8423    20e7 /usr/lib64/apache2/modules/
   7519    1064      16    8599    2197 /usr/lib64/apache2/modules/
   8583    1240      16    9839    266f /usr/lib64/apache2/modules/
  11006    1168      16   12190    2f9e /usr/lib64/apache2/modules/
  12269    1184      32   13485    34ad /usr/lib64/apache2/modules/
  12521    1672      24   14217    3789 /usr/lib64/apache2/modules/
  15935    1312      16   17263    436f /usr/lib64/apache2/modules/
  18150    1392     224   19766    4d36 /usr/lib64/apache2/modules/
  18358    2040      16   20414    4fbe /usr/lib64/apache2/modules/
  18996    1544      48   20588    506c /usr/lib64/apache2/modules/
  20406    1592      32   22030    560e /usr/lib64/apache2/modules/
  22593    1504     152   24249    5eb9 /usr/lib64/apache2/modules/
  26494    1376      16   27886    6cee /usr/lib64/apache2/modules/
  27576    1800      64   29440    7300 /usr/lib64/apache2/modules/
  54299    2096      80   56475    dc9b /usr/lib64/apache2/modules/
 268867   13152      80  282099   44df3 /usr/lib64/apache2/modules/
 288868   11520     280  300668   4967c /usr/lib64/apache2/modules/

The list is ordered for size of the whole plugin (summed up, not counting padding); the last three positions are definitely unsurprisingly, although it surprises me the sheer size of the two that are not part of Apache itself (and I start to wonder whether they link something in statically that I missed). The fact that the rewrite module was likely the most complex plugin in Apache’s distribution never left me.

As you can see, almost all plugins have vast overhead especially for what concerns the bss segment (all of them have at least 16 bytes used, and that warrants a whole page for them: 4080 bytes wasted each); the data segment is also interesting: only the two external ones have more than a page worth of variables (which also is suspicious to me). When all the plugins are loaded (like they most likely are right now as well on my server) there are at least 100KiB of overhead; just for the sheer fact that these are plugins and thus have their own address space. Might not sound like a lot of overhead indeed, since Apache is requesting so much memory already, especially with Passenger running, but it definitely doesn’t sound like a good thing for embedded systems.

Now I have no doubt that a lot of people like the fact that Apache has all of those as plugins as they can then use the same Apache build across different configurations without risking to have in memory more code and data than it’s actually needed, but is that right? While it’s obvious that it would be impossible to drop the plugin interface from Apache (since it’s used by third-party developers, more on that later), I would be glad if it was possible to build in the modules that come with Apache (given I can already choose which ones to build or not in Gentoo). Of course I also am using Apache with two configurations, and for instance the other one does not use the authentication system for anything, and this one is not using CGI, but is the overhead caused by the rest of modules worth the hassle, given that Apache already has a way to not initialise the unused built-ins?

I named above “third party developers” but I have to say now that it wasn’t really a proper definition, since it’s not just what third parties would do, it might very well be the original developers who might want to make use of plugins to develop separate projects for some (complex) features, and have different release handling altogether. For uses like that, the cost of plugins is often justifiable; and I am definitely not against having a plugin interface in feng. My main beef is when the plugins are created for functions that are part of the basic featureset of a software.

Another unfortunately not uncommon problem with plugins is that the interface might be skewed by bad design, like the case was (and is) for xine: when trying to open a file, it has to pass through all the plugins, so it loads all of them into memory, together with the libraries they depend on, to ask each of them to test the current file; since plugins cannot really be properly unloaded (and it’s not just a xine limitation) the memory will still be used, the libraries will still be mapped into memory (and relocated, causing copy on write, and thus, more memory) and at least half the point of using plugins has gone away (the ability to only load the code that is actually going to be used). Of course you’re left with the chance that an ABI break does not kill the whole program, but just the plugin, but that’s a very little advantage, given the cost involved in plugins handling. And the way xine was designed, it was definitely impossible to have third-party plugins developed properly.

And to finish off, I said before that plugins cannot be cleanly unloaded: the problem is not only that it’s difficult to have proper cleanup functions for plugins themselves (since often the allocated resources are stored within state variables), but also because some libraries (used as dependency) have no cleanup altogether, and they rely (erroneously) on the fact that they won’t be unloaded. And even when they know they could be unloaded, the PulseAudio libraries, for instance, have to remain loaded because there is no proper way to clean up Thread-Local Storage variables (and a re-load would be quite a problem). Which drives away another point of using plugins.

I leave the rest to you.