Plugins aren’t always a good choice

I’ve been saying this for quite a while. Probably the most on-topic post was written a few months ago, but there are hints of it in my posts about xine and elsewhere as well.

I used to be an enthusiast about plugin interfaces; with time, though, I started having more and more doubts about their actual usefulness. It’s a trait I quite like in myself: I’m fine with reconsidering my own positions over time and deciding that I was wrong. It happened before with other things, like KDE (and C++ in general).

It’s not that I’m against the use of plugins altogether. I only think that they are expensive in more ways than one, and that their usefulness is often overstated, or tied to other kinds of artificial limitations. For instance, dividing a program’s features over multiple plugins usually makes it easier for binary distributions to package them: they only have to ship one package with the main body of the software, and several more for the plugins (one per plugin might actually be too many, so sometimes they are grouped). This works out pretty well for both the distribution and, usually, the user: the plugins that are not installed will not bring in extra dependencies, won’t take time to load, and won’t use memory for either code or data. It basically gives binary distributions a flexibility comparable to Gentoo’s USE flags (and similar options in almost any other source-based distribution).

But as I said, this comes with costs that might or might not be worth it. For instance, Luca wanted to implement plugins for feng similar to what Apache and lighttpd have. I can understand his point: let’s not load code for the stuff we don’t have to deal with, which is more or less the same reason why Apache and lighttpd have modules. In the case of feng, if you don’t care about the access log, why should you be loading the access log support at all? I can give you a couple of reasons:

  • because the complexity of managing a plugin to deal with the access log (or any other similar task) is higher than just having a piece of static code that handles that;
  • because the overhead of having a plugin loaded just to do that is higher than that of having the static code built in and not enabled in the configuration.

The first problem is a result of the way a plugin interface is built: the main body of the software cannot know much about its plugins in particular. If the interface is a very generic plugin interface, you add some “hook locations” and then it’s the plugin’s task, not the software’s, to work out how to do its magic. There are some exceptions to this rule: if you have a plugin interface for handling protocols, like the KIO interface (and I think gvfs has the same), you get the protocol from the URL and call the correct plugin, but even then you’re leaving it to the plugin to do its magic. You can provide a way for the plugin to tell the main body what it needs and what it can do (like which functions it implements), but even that requires the plugins to be quite autonomous. And that also means being able to take care of allocating and freeing their resources as needed.
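
As a rough illustration of the “hook locations” idea, here is a minimal Python sketch of a generic plugin interface. Every name in it (`HookRegistry`, `AccessLogPlugin`, the `request_done` hook) is hypothetical and chosen only for this example; it is not feng’s or Apache’s actual API. The point is only that the core knows hook names, while the plugin has to manage its own state and cleanup.

```python
from collections import defaultdict


class HookRegistry:
    """Main body of the software: it only knows hook names, not plugins."""

    def __init__(self):
        self._hooks = defaultdict(list)

    def register(self, hook_name, callback):
        # Called by a plugin to say which hook it implements.
        self._hooks[hook_name].append(callback)

    def run(self, hook_name, *args):
        # The core calls every callback registered at this hook location,
        # without knowing what any of them actually does.
        return [callback(*args) for callback in self._hooks[hook_name]]


class AccessLogPlugin:
    """Hypothetical plugin: it owns and releases its own resources."""

    def __init__(self):
        self.lines = []  # stands in for an open log file

    def setup(self, registry):
        registry.register("request_done", self.on_request_done)

    def on_request_done(self, client, url):
        self.lines.append(f"{client} {url}")

    def teardown(self):
        # The core cannot do this for the plugin: it does not know
        # what the plugin allocated.
        self.lines.clear()


registry = HookRegistry()
plugin = AccessLogPlugin()
plugin.setup(registry)
registry.run("request_done", "192.0.2.1", "rtsp://example.org/stream")
logged = list(plugin.lines)
plugin.teardown()
```

Even in this toy form, the registry, the setup call and the teardown call are all code that a statically built-in access log would not need.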

The second problem is tied not only to the cost of calling the dynamic linker at runtime to load the plugin and any dependencies it has (which is a non-trivial amount of work, one has to say), but also to the need for code that deals with finding the modules to load, loading them, initialising them, and keeping a list of modules to call at any given interface point. There are two more points: the PIC problem and the problem of less-than-page-sized segments. This last problem is often ignored, but it’s my main reason to dislike plugins when they are not warranted for other reasons. Given a page size of 4KiB (which is the norm on Linux as far as I know), code smaller than that will still require a full page (it won’t pack with the rest of the software’s code areas). At least code is disk-backed (if it’s PIC, of course); it’s worse for variable data, or variable relocated data, since those are not disk-backed, and it’s not rare to use a whole page for something like 100 bytes of actual variables.

In the case of the access log module that Luca wrote for feng, the statistics are as follows:

flame@yamato feng % size modules/.libs/mod_accesslog.so
   text    data     bss     dec     hex filename
   4792     704      16    5512    1588 modules/.libs/mod_accesslog.so

This results in two pages (8KiB) for the bss and data segments, neither disk-backed, and two disk-backed pages for the executable code (text): 16KiB of addressable memory for a mapping that does not reach 6KiB. That is more than 10KiB of overhead, which is much higher than 50%. And that’s the memory overhead alone. The whole overhead, as you might guess at this point, is usually within 12KiB (since you have three segments, and each can have at most one byte less than the page size as overhead; it’s actually more complex than this, but let’s assume this is true).
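
The arithmetic above can be reproduced with a short Python sketch. It assumes, as the text does, 4KiB pages and that text, data and bss never share a page; the function names are made up for this example.

```python
PAGE_SIZE = 4096  # 4KiB pages, as assumed in the text


def pages_needed(size):
    """Number of whole pages needed to hold `size` bytes."""
    return -(-size // PAGE_SIZE)  # ceiling division


def mapping_overhead(text, data, bss):
    """Return (mapped_bytes, used_bytes, wasted_bytes).

    Assumption taken from the text: text, data and bss never share a page.
    """
    used = text + data + bss
    mapped = sum(pages_needed(s) for s in (text, data, bss)) * PAGE_SIZE
    return mapped, used, mapped - used


# Figures from the `size` output for mod_accesslog.so above.
mapped, used, wasted = mapping_overhead(text=4792, data=704, bss=16)
print(mapped, used, wasted)  # 16384 5512 10872
```

That is 10872 wasted bytes out of 16384 mapped, or about two thirds of the mapping, under those assumptions.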

It really doesn’t sound like a huge overhead by itself, but you always have to judge it against the size of the plugin itself. In the case of feng’s access log, you have a very young plugin that lacks a lot of functionality, so one might say that with time it’ll be worth it. So I’d like to show you the size statistics for the Apache modules on the very server my blog is hosted on. Before doing so, though, I have to remind you of one huge difference: feng is built with most optimisations turned off, while Apache is built optimised for size. Both are AMD64 builds, though, so the comparison is quite easy.

flame@vanguard ~ $ size /usr/lib64/apache2/modules/*.so | sort -n -k 4
   text    data     bss     dec     hex filename
   2529     792      16    3337     d09 /usr/lib64/apache2/modules/mod_authn_default.so
   2960     808      16    3784     ec8 /usr/lib64/apache2/modules/mod_authz_user.so
   3499     856      16    4371    1113 /usr/lib64/apache2/modules/mod_authn_file.so
   3617     912      16    4545    11c1 /usr/lib64/apache2/modules/mod_env.so
   3773     808      24    4605    11fd /usr/lib64/apache2/modules/mod_logio.so
   4035     888      16    4939    134b /usr/lib64/apache2/modules/mod_dir.so
   4161     752      80    4993    1381 /usr/lib64/apache2/modules/mod_unique_id.so
   4136     888      16    5040    13b0 /usr/lib64/apache2/modules/mod_actions.so
   5129     952      24    6105    17d9 /usr/lib64/apache2/modules/mod_authz_host.so
   6589    1056      16    7661    1ded /usr/lib64/apache2/modules/mod_file_cache.so
   6826    1024      16    7866    1eba /usr/lib64/apache2/modules/mod_expires.so
   7367    1040      16    8423    20e7 /usr/lib64/apache2/modules/mod_setenvif.so
   7519    1064      16    8599    2197 /usr/lib64/apache2/modules/mod_speling.so
   8583    1240      16    9839    266f /usr/lib64/apache2/modules/mod_alias.so
  11006    1168      16   12190    2f9e /usr/lib64/apache2/modules/mod_filter.so
  12269    1184      32   13485    34ad /usr/lib64/apache2/modules/mod_headers.so
  12521    1672      24   14217    3789 /usr/lib64/apache2/modules/mod_mime.so
  15935    1312      16   17263    436f /usr/lib64/apache2/modules/mod_deflate.so
  18150    1392     224   19766    4d36 /usr/lib64/apache2/modules/mod_log_config.so
  18358    2040      16   20414    4fbe /usr/lib64/apache2/modules/mod_mime_magic.so
  18996    1544      48   20588    506c /usr/lib64/apache2/modules/mod_cgi.so
  20406    1592      32   22030    560e /usr/lib64/apache2/modules/mod_mem_cache.so
  22593    1504     152   24249    5eb9 /usr/lib64/apache2/modules/mod_auth_digest.so
  26494    1376      16   27886    6cee /usr/lib64/apache2/modules/mod_negotiation.so
  27576    1800      64   29440    7300 /usr/lib64/apache2/modules/mod_cache.so
  54299    2096      80   56475    dc9b /usr/lib64/apache2/modules/mod_rewrite.so
 268867   13152      80  282099   44df3 /usr/lib64/apache2/modules/mod_security2.so
 288868   11520     280  300668   4967c /usr/lib64/apache2/modules/mod_passenger.so

The list is ordered by the size of the whole plugin (summed up, not counting padding). The last three positions are definitely unsurprising, although the sheer size of the two that are not part of Apache itself does surprise me (and I start to wonder whether they link something in statically that I missed). As for the rewrite module, I never doubted that it was the most complex plugin in Apache’s distribution.

As you can see, almost all plugins have vast overhead, especially in the bss segment (all of them use at least 16 bytes, and that warrants a whole page: 4080 bytes wasted for each one that uses only 16); the data segment is also interesting: only the two external ones have more than a page worth of variables (which is also suspicious to me). When all the plugins are loaded (as they most likely are right now on my server) there are at least 100KiB of overhead, just for the sheer fact that these are plugins and thus have their own memory mappings. That might not sound like a lot of overhead, since Apache is requesting so much memory already, especially with Passenger running, but it definitely doesn’t sound like a good thing for embedded systems.
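
To see how the padding adds up across modules, here is a small Python sketch that parses `size` output and sums the per-segment padding. It uses only three rows copied from the listing above, and it keeps the same assumption as the text (4KiB pages, each of text, data and bss padded separately); the function names are invented for this example.

```python
PAGE_SIZE = 4096  # assumed page size, as in the text

# Three rows copied from the `size` listing above (not the whole table).
SIZE_OUTPUT = """\
   text    data     bss     dec     hex filename
   2529     792      16    3337     d09 /usr/lib64/apache2/modules/mod_authn_default.so
  18150    1392     224   19766    4d36 /usr/lib64/apache2/modules/mod_log_config.so
  54299    2096      80   56475    dc9b /usr/lib64/apache2/modules/mod_rewrite.so
"""


def wasted_bytes(size):
    """Padding needed to round `size` up to a whole number of pages."""
    return -size % PAGE_SIZE


def total_waste(size_output):
    """Sum per-segment padding for every module in `size` output.

    Assumption from the text: text, data and bss are each padded separately.
    """
    total = 0
    for line in size_output.splitlines()[1:]:  # skip the header row
        text, data, bss = (int(field) for field in line.split()[:3])
        total += wasted_bytes(text) + wasted_bytes(data) + wasted_bytes(bss)
    return total


print(total_waste(SIZE_OUTPUT))  # 26918
```

For these three rows the padding comes to a little under 9KiB per module, regardless of whether the module is the smallest or one of the largest in the list.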

Now, I have no doubt that a lot of people like the fact that Apache has all of those as plugins, as they can then use the same Apache build across different configurations without risking having more code and data in memory than is actually needed. But is that right? While it’s obvious that it would be impossible to drop the plugin interface from Apache (since it’s used by third-party developers, more on that later), I would be glad if it were possible to build in the modules that come with Apache (given I can already choose which ones to build in Gentoo). Of course I am also using Apache with two configurations: the other one does not use the authentication system for anything, for instance, and this one is not using CGI. But is the overhead caused by the rest of the modules worth the hassle, given that Apache already has a way to not initialise the unused built-ins?

I mentioned “third-party developers” above, but I have to say now that it wasn’t really the right term, since it’s not just about what third parties would do: it might very well be the original developers who want to use plugins to develop some (complex) features as separate projects, with different release handling altogether. For uses like that, the cost of plugins is often justifiable, and I am definitely not against having a plugin interface in feng. My main beef is with plugins created for functions that are part of the basic feature set of a piece of software.

Another unfortunately not uncommon problem with plugins is that the interface might be skewed by bad design, as was (and is) the case for xine: when trying to open a file, it has to pass through all the plugins, so it loads all of them into memory, together with the libraries they depend on, to ask each of them to test the current file. Since plugins cannot really be properly unloaded (and it’s not just a xine limitation), the memory will still be used, the libraries will still be mapped into memory (and relocated, causing copy-on-write and thus more memory use), and at least half the point of using plugins has gone away (the ability to load only the code that is actually going to be used). Of course you’re left with the chance that an ABI break does not kill the whole program but just the plugin, but that’s a very small advantage given the cost involved in plugin handling. And the way xine was designed, it was definitely impossible to have third-party plugins developed properly.
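
The probe-everything pattern described above can be sketched in a few lines of Python. This is not xine’s actual code: the `Plugin` class and `open_by_probing` function are hypothetical, and the `loaded` flag merely stands in for the cost of mapping a shared object and its dependencies.

```python
class Plugin:
    """Hypothetical demuxer plugin; `loaded` stands in for dlopen() cost."""

    def __init__(self, name, magic):
        self.name = name
        self.magic = magic
        self.loaded = False

    def load(self):
        # In a real program this would map the shared object and all
        # the libraries it depends on; here we only record that it happened.
        self.loaded = True

    def can_handle(self, header):
        return header.startswith(self.magic)


def open_by_probing(plugins, header):
    """Probe-all design: every plugin is loaded just to ask it a question."""
    match = None
    for plugin in plugins:
        plugin.load()
        if match is None and plugin.can_handle(header):
            match = plugin
    return match


plugins = [
    Plugin("ogg", b"OggS"),
    Plugin("avi", b"RIFF"),
    Plugin("matroska", b"\x1a\x45\xdf\xa3"),
]
chosen = open_by_probing(plugins, b"RIFF....AVI ")
loaded = [p.name for p in plugins if p.loaded]
print(chosen.name, loaded)  # avi ['ogg', 'avi', 'matroska']
```

Only one plugin ends up being used, but all three have been loaded, and with no clean way to unload them they stay resident.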

And to finish off, I said before that plugins cannot be cleanly unloaded: the problem is not only that it’s difficult to have proper cleanup functions for the plugins themselves (since the allocated resources are often stored within state variables), but also that some libraries (used as dependencies) have no cleanup at all, and rely (erroneously) on the fact that they won’t be unloaded. And even when they know they could be unloaded, some, like the PulseAudio libraries, have to remain loaded because there is no proper way to clean up Thread-Local Storage variables (and a re-load would be quite a problem). Which takes away another point of using plugins.

I leave the rest to you.

15 thoughts on “Plugins aren’t always a good choice”

  1. I consider plugins a lesser evil. It’s just a matter of tradeoffs.

    If somebody wants to write some wacky or useful functionality for feng, now, he has to provide a patch that more often than not could be rejected because it doesn’t fit for one reason or another. I like feng to be as lean as possible and I really wouldn’t like to put lots of crazy or disputable stuff inside the main distribution.

    The accesslog plugin was written basically as a proof of concept and example. If somebody wants to contribute some feature he needs, he could provide a small patch for adding hooks and then maintain the functionality as a separate project.

    That said, I’ll make the optional features statically linkable through configure so everybody can get the best of both solutions:

    • kill the plugin loader and keep the functionality as built-in
    • stay with the loader and split plugins as shared objects
    • any other combination.

  2. Speaking of USE flags, when changing one that adds a plugin, would it be possible to just build the plugin instead of rebuilding the whole package? It would also be nice for things like Qt… the split ebuilds are a horrible idea.

  3. flame@yamato feng % size modules/.libs/mod_accesslog.so
       text    data     bss     dec     hex filename
       4792     704      16    5512    1588 modules/.libs/mod_accesslog.so

    Which results in two pages (8KiB) for bss and data segments

    Are you sure about that? .data and .bss are basically the same thing (one is from disk, one is zero filled, both are read/write).

    I looked at elf_x86_64.x and elf_x86_64.xs [Binutils -> /usr/lib64/binutils/*/*/ldscripts] on my AMD64 system and the GNU linker at least does merge .bss into .data pages. [Along with TLS apparently]

  4. @nico: no you cannot, it’s definitely not trivial.

    @andrew: as far as I can see, no: the final executables (and in this case the shared object) have different sections for .bss and .data (by definition) and as far as I know they are not packed together at runtime either. And definitely TLS and non-TLS pages cannot be packed and shared.

  5. I came here to make the same comment as andrew. If you compile and run the example at http://paste.factorcode.org… you will see that you are wrong: .bss and .data segments are loaded into the same memory page.

    For me, this gives: &a = 0x601038, &b = 0x601020

  6. Samuel, that happens for small variables, since it optimises stuff well, but it shouldn’t happen for bigger data/bss sections.

  7. Flameeyes: do you have any non-contrived example? From my experience, .data and .bss are always allocated consecutively from system startup files.

    Sure, I can build an example where some .bss entity requires an alignment of 4096 and thus the .bss gets allocated in a new page, but what you’re saying goes against everything I’ve seen.

    What does objdump --headers give on your modules? On most if not all binaries, you should have VMA(.bss) = VMA(.data) + sizeof(.data) + alignment-padding.

    Example on glibc:

     32 .data         00000d58  000000000034d040  000000000034d040  0014d040  2**5
     33 .bss          00004a68  000000000034dda0  000000000034dda0  0014dd98  2**5
    % python -c "print hex(0x000000000034d040+0x00000d58)"
    34dd98

    This corresponds to 0x34dda0 (.bss VMA) after the 2**5 bytes alignment.

  8. While .bss and .data can be the same, you forgot .rodata, which can’t share a page (at least if you want a security/stability benefit from it), so that still leaves at least 3 pages for most modules.

  9. Oh, and it would be interesting to read up on other OSes, I think it is quite possible that e.g. OpenBSD would even try to insert a whole guard page between these – after all it even does for mallocs last I heard.

  10. Reimar: .rodata is usually disk-backed and is equivalent to .text (and is usually merged with it). And it is shared between instances.

  11. “I would be glad if it was possible to build in the modules that come with Apache (given I can already choose which ones to build or not in Gentoo)”

    How would you design a common infrastructure which catered for both dynamically-loaded plugins and optional built-in features? I ask because I am currently involved in the early stages of designing a highly modular piece of server software; and whilst the current plan is to make heavy use of plugins, I have to admit that in reality there will be few, if any, developed by third parties.

    The project is a complete rewrite of some existing software. The current version uses a compile-time “builtin” system (like plugins, but can only be compiled directly into the main binary), with a lot of boilerplate per builtin (most of it frankly unnecessary: they were originally going to be fully-fledged external objects but this didn’t happen) and no ability for third parties to provide builtins. Each one is essentially an implementation of a common base class, but because they are all compiled in, the class names cannot overlap; because the symbols for the constructors all end up in the same binary, you have to know the details of all possible builtins at build time, and end up with something akin to a massive switch statement for instantiating the right derived class.

    Is it possible to design a plugin interface which allows for truly external, third-party plugins, with “official” plugins (optionally) compiled directly in, without having to worry about symbol collisions and special casing for constructing the compiled-in ones?

  12. Phil, I think I wrote some notes about that before, but I’ll probably write more when I’ll be back from London: this week is my vacation time :)

  13. I think the bad interface is ultimately the worst evil with plugins. It just takes ages to get a “generic” plugin interface right, and changes to the interface until it is right are always unpopular, so they may not be done altogether.

    It would usually be best if plugin interfaces were introduced very late in a project’s lifetime, when it is more clear what 3rd party functionality is actually needed and maintained. That way, a more adequate interface can be designed, which also has to stand up to built-in functionality.

    Until then, it makes so much more sense to provide a very specific plugin interface for limited functionality, which enforces a proper load-run-unload cycle, rather than some kind of generic “and here you can run whatever you want” thing.

  14. Err, doesn’t Apache already allow this (http://httpd.apache.org/doc…)? --enable-module vs --enable-module=shared, with static linking the default?

    (preview fails with “You don’t have permission to access /comments/preview on this server.”)

  15. Would it be correct to sum up the problems by saying “Modules eat more memory than they save and force people to use nonoptimal ways of solving a problem”?

    I think something you didn’t name that they can give a project is a simple way for new developers to join. They only have to learn the plugin interface instead of learning to understand the whole code.
