I’ve been saying this for quite a while; probably the most on-topic post was written a few months ago, but there are hints of it in my posts about xine and in others as well.
I used to be an enthusiast about plugin interfaces; with time, though, I started having more and more doubts about their actual usefulness. It’s a trait I quite like in myself: I’m fine with reconsidering my own positions over time and deciding that I was wrong. It happened before with other things, like KDE (and C++ in general).
It’s not like I’m totally against the use of plugins altogether. I only think that they are expensive in more ways than one, and that their usefulness is often overstated, or tied to other kinds of artificial limitations. For instance, dividing a piece of software’s features over multiple plugins usually makes it easier for binary distributions to package it: they only have to ship one package with the main body of the software, and several for the plugins (one per plugin might actually be too much, so sometimes they are grouped). This works out pretty well for both the distribution and, usually, the user: the plugins that are not installed will not bring in extra dependencies, they won’t take time to load and they won’t use memory for either code or data. It basically gives binary distributions a flexibility comparable to Gentoo’s USE flags (and similar options in almost any other source-based distribution).
But as I said, this comes with costs that might or might not be worth it in general. For instance, Luca wanted to implement plugins for feng similarly to what Apache and lighttpd have. I can understand his point: let’s not load code for the stuff we don’t have to deal with, which is more or less the same reason why Apache and lighttpd have modules; in the case of feng, if you don’t care about the access log, why should you be loading the access log support at all? I can give you a couple of reasons:
- because the complexity of managing a plugin to deal with the access log (or any other similar task) is higher than just having a piece of static code that handles that;
- because the overhead of having a plugin loaded just to do that is higher than that of having the static code built in and not enabled in the configuration.
The first problem is a result of the way a plugin interface is built: the main body of the software cannot know about its plugins in too specific ways. If the interface is a very generic plugin interface, you add some “hook locations” and then it’s the plugin’s task to find how to do its magic, not the software’s. There are some exceptions to this rule: if you have a plugin interface for handling protocols, like the KIO interface (and I think gvfs has the same) you get the protocol from the URL and call the correct plugin, but even then you’re leaving it to the plugin to deal with doing its magic. You can provide a way for the plugin to tell the main body what it needs and what it can do (like which functions it implements) but even that requires the plugins to be quite autonomous. And that means also being able to take care of allocating and freeing the resources as needed.
The second problem is tied not only to the cost of calling the dynamic linker at runtime to load the plugin and any dependencies it might have (which is a non-trivial amount of work, one has to say), but also to the need for code that deals with finding the modules to load, loading them, initialising them, and keeping a list of modules to call at any given interface point. There are two more points: the PIC problem and the problem of less-than-page-sized segments. This last problem is often ignored, but it’s my main reason to dislike plugins when they are not warranted for other reasons. Given a page size of 4KiB (which is the norm on Linux as far as I know), if the code is smaller than that size, it’ll still require a full page (it won’t pack with the rest of the software’s code areas). At least code is disk-backed (if it’s PIC, of course); it’s worse for variable data, or variable relocated data, since those are not disk-backed, and it’s not rare to use a whole page for something like 100 bytes of actual variables.
In the case of the access log module that Luca wrote for feng, the statistics are as such:
flame@yamato feng % size modules/.libs/mod_accesslog.so
text data bss dec hex filename
4792 704 16 5512 1588 modules/.libs/mod_accesslog.so
This results in two pages (8KiB) for the bss and data segments, neither disk-backed, and two disk-backed pages for the executable code (text): 16KiB of addressable memory for a mapping that does not reach 6KiB. That’s more than 10KiB of overhead, which is far more than 50% of the module’s actual size. And that’s the memory overhead alone. As you might guess at this point, this padding overhead generally stays within 12KiB (since you have three segments, and each can have at most one byte less than a page as overhead; it’s actually more complex than this, but let’s assume this is true).
It really doesn’t sound like a huge overhead by itself, but you always have to judge it against the size of the plugin itself. In the case of feng’s access log, you have a very young plugin that lacks a lot of functionality, so one might say that with time it’ll be worth it. So I’d like to show you the size statistics for the Apache modules on the very server my blog is hosted on. Before doing so, though, I have to remind you of one huge difference: feng is built with most optimisations turned off, while Apache is built optimised for size; they are both AMD64, though, so the comparison is quite easy.
flame@vanguard ~ $ size /usr/lib64/apache2/modules/*.so | sort -n -k 4
text data bss dec hex filename
2529 792 16 3337 d09 /usr/lib64/apache2/modules/mod_authn_default.so
2960 808 16 3784 ec8 /usr/lib64/apache2/modules/mod_authz_user.so
3499 856 16 4371 1113 /usr/lib64/apache2/modules/mod_authn_file.so
3617 912 16 4545 11c1 /usr/lib64/apache2/modules/mod_env.so
3773 808 24 4605 11fd /usr/lib64/apache2/modules/mod_logio.so
4035 888 16 4939 134b /usr/lib64/apache2/modules/mod_dir.so
4161 752 80 4993 1381 /usr/lib64/apache2/modules/mod_unique_id.so
4136 888 16 5040 13b0 /usr/lib64/apache2/modules/mod_actions.so
5129 952 24 6105 17d9 /usr/lib64/apache2/modules/mod_authz_host.so
6589 1056 16 7661 1ded /usr/lib64/apache2/modules/mod_file_cache.so
6826 1024 16 7866 1eba /usr/lib64/apache2/modules/mod_expires.so
7367 1040 16 8423 20e7 /usr/lib64/apache2/modules/mod_setenvif.so
7519 1064 16 8599 2197 /usr/lib64/apache2/modules/mod_speling.so
8583 1240 16 9839 266f /usr/lib64/apache2/modules/mod_alias.so
11006 1168 16 12190 2f9e /usr/lib64/apache2/modules/mod_filter.so
12269 1184 32 13485 34ad /usr/lib64/apache2/modules/mod_headers.so
12521 1672 24 14217 3789 /usr/lib64/apache2/modules/mod_mime.so
15935 1312 16 17263 436f /usr/lib64/apache2/modules/mod_deflate.so
18150 1392 224 19766 4d36 /usr/lib64/apache2/modules/mod_log_config.so
18358 2040 16 20414 4fbe /usr/lib64/apache2/modules/mod_mime_magic.so
18996 1544 48 20588 506c /usr/lib64/apache2/modules/mod_cgi.so
20406 1592 32 22030 560e /usr/lib64/apache2/modules/mod_mem_cache.so
22593 1504 152 24249 5eb9 /usr/lib64/apache2/modules/mod_auth_digest.so
26494 1376 16 27886 6cee /usr/lib64/apache2/modules/mod_negotiation.so
27576 1800 64 29440 7300 /usr/lib64/apache2/modules/mod_cache.so
54299 2096 80 56475 dc9b /usr/lib64/apache2/modules/mod_rewrite.so
268867 13152 80 282099 44df3 /usr/lib64/apache2/modules/mod_security2.so
288868 11520 280 300668 4967c /usr/lib64/apache2/modules/mod_passenger.so
The list is ordered by the size of the whole plugin (summed up, not counting padding). The last three positions are definitely unsurprising, although the sheer size of the two that are not part of Apache itself surprises me (and I start to wonder whether they link something in statically that I missed). I’ve long had the impression that the rewrite module was the most complex plugin in Apache’s distribution.
As you can see, almost all plugins have vast overhead, especially in the bss segment (all of them use at least 16 bytes, and that warrants a whole page for each: up to 4080 bytes wasted each). The data segment is also interesting: only the two external ones have more than a page worth of variables (which is also suspicious to me). When all the plugins are loaded (as they most likely are right now on my server) there are at least 100KiB of overhead, just for the sheer fact that these are plugins and thus have their own separately mapped segments. It might not sound like a lot of overhead, since Apache is requesting so much memory already, especially with Passenger running, but it definitely doesn’t sound like a good thing for embedded systems.
Now I have no doubt that a lot of people like the fact that Apache has all of those as plugins, as they can then use the same Apache build across different configurations without risking having more code and data in memory than is actually needed, but is that right? While it’s obvious that it would be impossible to drop the plugin interface from Apache (since it’s used by third-party developers, more on that later), I would be glad if it were possible to build in the modules that come with Apache (given I can already choose which ones to build or not in Gentoo). Of course I am also using Apache with two configurations: the other one does not use the authentication system for anything, and this one is not using CGI. But is the overhead caused by the rest of the modules worth the hassle, given that Apache already has a way to not initialise the unused built-ins?
I named “third-party developers” above, but I have to say now that it wasn’t really a proper definition, since it’s not just about what third parties would do: it might very well be the original developers who want to make use of plugins to develop some (complex) features as separate projects, with different release handling altogether. For uses like that, the cost of plugins is often justifiable, and I am definitely not against having a plugin interface in feng. My main beef is with plugins created for functions that are part of the basic feature set of a piece of software.
Another unfortunately not uncommon problem with plugins is that the interface might be skewed by bad design, as was (and is) the case for xine: when trying to open a file, it has to pass through all the plugins, so it loads all of them into memory, together with the libraries they depend on, to ask each of them to test the current file. Since plugins cannot really be properly unloaded (and it’s not just a xine limitation), the memory will still be used, the libraries will still be mapped into memory (and relocated, causing copy-on-write and thus more memory use), and at least half the point of using plugins has gone away (the ability to only load the code that is actually going to be used). Of course you’re left with the chance that an ABI break does not kill the whole program, but just the plugin; that’s a very small advantage, though, given the cost involved in plugin handling. And the way xine was designed, it was definitely impossible to have third-party plugins developed properly.
And to finish off, I said before that plugins cannot be cleanly unloaded: the problem is not only that it’s difficult to have proper cleanup functions for the plugins themselves (since the allocated resources are often stored within state variables), but also that some libraries (used as dependencies) have no cleanup at all, and they rely (erroneously) on the fact that they won’t be unloaded. And even when they know they could be unloaded, some libraries have to remain loaded anyway: the PulseAudio libraries, for instance, because there is no proper way to clean up Thread-Local Storage variables (and a re-load would be quite a problem). This takes away another point of using plugins.
I leave the rest to you.