I’ve been saying this for quite a while; probably the most on-topic post was written a few months ago, but there are hints of it in my posts about xine and in others as well.
I used to be an enthusiast of plugin interfaces; with time, though, I started having more and more doubts about their actual usefulness. It’s a trait I really like in myself: I’m fine with reconsidering my own positions over time and deciding that I was wrong. It has happened before with other things, like KDE (and C++ in general).
It’s not that I’m against the use of plugins altogether. I only think that they are expensive in more ways than one, and that their usefulness is often overstated, or tied to other kinds of artificial limitations. For instance, splitting a piece of software’s features over multiple plugins usually makes it easier for binary distributions to package it: they ship one package with the main body of the software, and several more for the plugins (one package per plugin might be too many, so sometimes they are grouped). This works out pretty well for both the distribution and, usually, the user: plugins that are not installed don’t bring in extra dependencies, don’t take time to load, and don’t use memory for either code or data. It basically gives binary distributions a flexibility comparable to Gentoo’s USE flags (and similar options in almost any other source-based distribution).
But as I said, this comes with costs that might or might not be worth it. For instance, Luca wanted to implement plugins for feng, similar to what Apache and lighttpd have. I can understand his point: let’s not load code for the stuff we don’t have to deal with, which is more or less the same reason why Apache and lighttpd have modules. In the case of feng, if you don’t care about the access log, why should you be loading the access log support at all? I can give you a couple of reasons:
- because the complexity of managing a plugin to deal with the access log (or any other similar task) is higher than just having a piece of static code that handles that;
- because the overhead of having a plugin loaded just to do that is higher than that of having the static code built in and not enabled in the configuration.
The first problem is a result of the way a plugin interface is built: the main body of the software cannot know about its plugins in too specific ways. If the interface is a very generic plugin interface, you add some “hook locations” and then it’s the plugin’s task to find how to do its magic, not the software’s. There are some exceptions to this rule: if you have a plugin interface for handling protocols, like the KIO interface (and I think gvfs has the same) you get the protocol from the URL and call the correct plugin, but even then you’re leaving it to the plugin to deal with doing its magic. You can provide a way for the plugin to tell the main body what it needs and what it can do (like which functions it implements) but even that requires the plugins to be quite autonomous. And that means also being able to take care of allocating and freeing the resources as needed.
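To make the shape of such a generic interface concrete, here is a minimal sketch in Python (chosen only for brevity). Everything in it is hypothetical: `HookRegistry`, `AccessLogPlugin` and the hook name `request_done` are made up for illustration, and none of it is the actual interface of feng, Apache or lighttpd. It only shows the core knowing about named hook locations, while the plugin declares what it implements and owns its own resources.

```python
from typing import Callable, Dict, List


class HookRegistry:
    """The core's view of plugins: named hook locations and nothing else."""

    def __init__(self) -> None:
        self._hooks: Dict[str, List[Callable[..., None]]] = {}
        self._plugins: List[object] = []

    def load(self, plugin) -> None:
        # The plugin sets itself up and tells the core which hooks it implements.
        plugin.init()
        for hook_name, callback in plugin.hooks().items():
            self._hooks.setdefault(hook_name, []).append(callback)
        self._plugins.append(plugin)

    def run(self, hook_name: str, *args) -> None:
        # The core calls whoever registered, without knowing what they do.
        for callback in self._hooks.get(hook_name, []):
            callback(*args)

    def unload_all(self) -> None:
        # Cleanup is the plugin's job too: the core cannot free
        # resources it does not know about.
        for plugin in reversed(self._plugins):
            plugin.cleanup()
        self._plugins.clear()
        self._hooks.clear()


class AccessLogPlugin:
    """Hypothetical plugin: keeps its own state, allocates and frees it."""

    def init(self) -> None:
        self.lines: List[str] = []

    def hooks(self) -> Dict[str, Callable[..., None]]:
        return {"request_done": self.log_request}

    def log_request(self, client: str, url: str, status: int) -> None:
        self.lines.append(f"{client} {url} {status}")

    def cleanup(self) -> None:
        self.lines = []


registry = HookRegistry()
plugin = AccessLogPlugin()
registry.load(plugin)
registry.run("request_done", "192.0.2.1", "rtsp://example.org/stream", 200)
registry.run("no_such_hook")  # nobody registered here: nothing happens
logged = list(plugin.lines)   # snapshot before the plugin cleans up
registry.unload_all()
```

Even in this toy form, the registry, the load and unload bookkeeping and the per-plugin state are all code that a built-in feature behind a configuration switch would not need.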
The second problem is tied not only to the cost of invoking the dynamic linker at runtime to load the plugin and any dependencies it has (which is a non-trivial amount of work, one has to say), but also to the need for code that finds the modules to load, loads them, initialises them, and keeps a list of modules to call at any given interface point. On top of that there are two more points: the PIC problem and the problem of less-than-page-sized segments. This last problem is often ignored, but it’s my main reason to dislike plugins when they are not warranted for other reasons. Given a page size of 4KiB (which is the norm on Linux as far as I know), code smaller than that will still require a full page (it won’t be packed with the rest of the software’s code areas). At least code is disk-backed (if it’s PIC, of course); it’s worse for variable data, or variable relocated data, since those are not disk-backed, and it’s not rare to use a whole page for something like 100 bytes of actual variables.
In the case of the access log module that Luca wrote for feng, the statistics are as follows:
flame@yamato feng % size modules/.libs/mod_accesslog.so
text data bss dec hex filename
4792 704 16 5512 1588 modules/.libs/mod_accesslog.so
This results in two pages (8KiB) for the bss and data segments, neither disk-backed, and two disk-backed pages for the executable code (text): 16KiB of addressable memory for a mapping that does not reach 6KiB. That’s more than 10KiB of overhead, roughly two thirds of what gets mapped and nearly twice the size of the module’s actual content. And that’s the memory overhead alone. The whole overhead, as you might guess at this point, is usually within 12KiB (since you have three segments, and each can have at most one byte less than a page as overhead; it’s actually more complex than this, but let’s assume this is true).
It really doesn’t sound like a huge overhead by itself, but you always have to judge it against the size of the plugin itself. In the case of feng’s access log, you have a very young plugin that lacks a lot of functionality, so one might say that with time it’ll be worth it… so I’d like to show you the size statistics for the Apache modules on the very server my blog is hosted on. Before doing so, though, I have to remind you of one huge difference: feng is built with most optimisations turned off, while Apache is built optimised for size; they are both AMD64, though, so the comparison is quite easy.
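The arithmetic for the access log module can be reproduced with a few lines of Python. This is only a sketch of the simplified model used here (each of the three segments padded to whole 4KiB pages on its own), not of what the loader really does; the function names are mine.

```python
PAGE_SIZE = 4096  # 4KiB pages, as assumed in the text


def pages_needed(size: int, page_size: int = PAGE_SIZE) -> int:
    """Number of whole pages needed to hold `size` bytes."""
    return -(-size // page_size)  # ceiling division


def mapping_overhead(text: int, data: int, bss: int) -> dict:
    """Simplified model: every segment is padded to whole pages separately."""
    mapped = sum(pages_needed(s) * PAGE_SIZE for s in (text, data, bss))
    used = text + data + bss
    return {"mapped": mapped, "used": used, "overhead": mapped - used}


# Figures taken from the `size` output for mod_accesslog.so.
result = mapping_overhead(text=4792, data=704, bss=16)
print(result)
# {'mapped': 16384, 'used': 5512, 'overhead': 10872}
```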
flame@vanguard ~ $ size /usr/lib64/apache2/modules/*.so | sort -n -k 4
text data bss dec hex filename
2529 792 16 3337 d09 /usr/lib64/apache2/modules/mod_authn_default.so
2960 808 16 3784 ec8 /usr/lib64/apache2/modules/mod_authz_user.so
3499 856 16 4371 1113 /usr/lib64/apache2/modules/mod_authn_file.so
3617 912 16 4545 11c1 /usr/lib64/apache2/modules/mod_env.so
3773 808 24 4605 11fd /usr/lib64/apache2/modules/mod_logio.so
4035 888 16 4939 134b /usr/lib64/apache2/modules/mod_dir.so
4161 752 80 4993 1381 /usr/lib64/apache2/modules/mod_unique_id.so
4136 888 16 5040 13b0 /usr/lib64/apache2/modules/mod_actions.so
5129 952 24 6105 17d9 /usr/lib64/apache2/modules/mod_authz_host.so
6589 1056 16 7661 1ded /usr/lib64/apache2/modules/mod_file_cache.so
6826 1024 16 7866 1eba /usr/lib64/apache2/modules/mod_expires.so
7367 1040 16 8423 20e7 /usr/lib64/apache2/modules/mod_setenvif.so
7519 1064 16 8599 2197 /usr/lib64/apache2/modules/mod_speling.so
8583 1240 16 9839 266f /usr/lib64/apache2/modules/mod_alias.so
11006 1168 16 12190 2f9e /usr/lib64/apache2/modules/mod_filter.so
12269 1184 32 13485 34ad /usr/lib64/apache2/modules/mod_headers.so
12521 1672 24 14217 3789 /usr/lib64/apache2/modules/mod_mime.so
15935 1312 16 17263 436f /usr/lib64/apache2/modules/mod_deflate.so
18150 1392 224 19766 4d36 /usr/lib64/apache2/modules/mod_log_config.so
18358 2040 16 20414 4fbe /usr/lib64/apache2/modules/mod_mime_magic.so
18996 1544 48 20588 506c /usr/lib64/apache2/modules/mod_cgi.so
20406 1592 32 22030 560e /usr/lib64/apache2/modules/mod_mem_cache.so
22593 1504 152 24249 5eb9 /usr/lib64/apache2/modules/mod_auth_digest.so
26494 1376 16 27886 6cee /usr/lib64/apache2/modules/mod_negotiation.so
27576 1800 64 29440 7300 /usr/lib64/apache2/modules/mod_cache.so
54299 2096 80 56475 dc9b /usr/lib64/apache2/modules/mod_rewrite.so
268867 13152 80 282099 44df3 /usr/lib64/apache2/modules/mod_security2.so
288868 11520 280 300668 4967c /usr/lib64/apache2/modules/mod_passenger.so
The list is ordered by the total size of each plugin (summed up, not counting padding). The last three positions are definitely unsurprising, although the sheer size of the two that are not part of Apache itself does surprise me (and I start to wonder whether they link something in statically that I missed). As for the rewrite module, I never doubted that it was likely the most complex plugin in Apache’s distribution.
As you can see, almost all plugins have vast overhead, especially in the bss segment (all of them use at least 16 bytes, which warrants a whole page: 4080 bytes wasted for each of those); the data segment is also interesting: only the two external ones have more than a page worth of variables (which also looks suspicious to me). When all the plugins are loaded (like they most likely are right now on my server) there are at least 100KiB of overhead, just for the sheer fact that these are plugins and thus have their own separate mappings. It might not sound like a lot of overhead, since Apache is requesting so much memory already, especially with Passenger running, but it definitely doesn’t sound like a good thing for embedded systems.
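For anyone who wants to check the total, the padding can be summed over the whole table. This is a sketch under the same simplified assumption as before (every segment rounded up to whole 4KiB pages on its own, which some of the comments below question); the numbers are copied from the `size` output and the helper name is mine.

```python
PAGE_SIZE = 4096  # 4KiB pages, as assumed in the text

# (text, data, bss) in bytes, one tuple per module, in table order.
MODULES = [
    (2529, 792, 16), (2960, 808, 16), (3499, 856, 16), (3617, 912, 16),
    (3773, 808, 24), (4035, 888, 16), (4161, 752, 80), (4136, 888, 16),
    (5129, 952, 24), (6589, 1056, 16), (6826, 1024, 16), (7367, 1040, 16),
    (7519, 1064, 16), (8583, 1240, 16), (11006, 1168, 16), (12269, 1184, 32),
    (12521, 1672, 24), (15935, 1312, 16), (18150, 1392, 224), (18358, 2040, 16),
    (18996, 1544, 48), (20406, 1592, 32), (22593, 1504, 152), (26494, 1376, 16),
    (27576, 1800, 64), (54299, 2096, 80), (268867, 13152, 80), (288868, 11520, 280),
]


def padding(size: int) -> int:
    """Bytes wasted when `size` bytes are rounded up to whole pages."""
    return -size % PAGE_SIZE


# Padding of the bss segments alone, then of all three segments together.
bss_waste = sum(padding(bss) for _, _, bss in MODULES)
total_waste = sum(padding(s) for module in MODULES for s in module)

print(f"bss padding:   {bss_waste / 1024:.1f} KiB")
print(f"total padding: {total_waste / 1024:.1f} KiB")
```

Under this model the bss padding alone comes to roughly 110KiB, and the padding of all three segments together to roughly 230KiB.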
Now, I have no doubt that a lot of people like the fact that Apache has all of those as plugins, since they can then use the same Apache build across different configurations without risking having more code and data in memory than is actually needed, but is that right? While it’s obvious that it would be impossible to drop the plugin interface from Apache (since it’s used by third-party developers, more on that later), I would be glad if it was possible to build in the modules that come with Apache (given I can already choose which ones to build or not in Gentoo). Of course I’m also using Apache with two configurations: the other one does not use the authentication system for anything, and this one is not using CGI. But is the overhead caused by the rest of the modules worth the hassle, given that Apache already has a way to not initialise unused built-ins?
I mentioned “third-party developers” above, but I have to say that it wasn’t really a proper definition: it’s not just about what third parties would do. It might very well be the original developers who want to use plugins to develop some (complex) features as separate projects, with different release handling altogether. For uses like that, the cost of plugins is often justifiable, and I am definitely not against having a plugin interface in feng. My main beef is with plugins created for functions that are part of the basic feature set of a piece of software.
Another, unfortunately not uncommon, problem with plugins is that the interface might be skewed by bad design, as was (and is) the case for xine: when trying to open a file, it has to pass through all the plugins, so it loads all of them into memory, together with the libraries they depend on, to ask each of them to test the current file. Since plugins cannot really be properly unloaded (and it’s not just a xine limitation), the memory will still be used and the libraries will still be mapped into memory (and relocated, causing copy-on-write and thus more memory use), so at least half the point of using plugins (the ability to load only the code that is actually going to be used) is gone. Of course you’re left with the chance that an ABI break kills just the plugin rather than the whole program, but that’s a very small advantage given the cost involved in handling plugins. And the way xine was designed, it was definitely impossible to have third-party plugins developed properly.
And to finish off, I said before that plugins cannot be cleanly unloaded. The problem is not only that it’s difficult to write proper cleanup functions for the plugins themselves (since the allocated resources are often stored within state variables), but also that some libraries (used as dependencies) have no cleanup at all, and rely (erroneously) on the fact that they won’t be unloaded. And even when they know they could be unloaded, some libraries have to remain loaded anyway: the PulseAudio libraries, for instance, because there is no proper way to clean up Thread-Local Storage variables (and a re-load would be quite a problem). Which takes away another reason for using plugins.
I leave the rest to you.
I consider plugins a lesser evil. It’s just a matter of tradeoffs.
If somebody wants to write some wacky or useful functionality for feng now, they have to provide a patch that more often than not could be rejected because it doesn’t fit for one reason or another. I like feng to be as lean as possible, and I really wouldn’t like to put lots of crazy or disputable stuff inside the main distribution.
The accesslog plugin was written basically as a proof of concept and example. If somebody wants to contribute some feature they need, they can provide a small patch adding hooks and then maintain the functionality as a separate project.
That said, I’ll make the optional features statically linkable through configure so everybody can get the best of both solutions:
- kill the plugin loader and keep the functionality as built-in
- stay with the loader and split plugins as shared objects
- any other combination.
Speaking of USE flags, when changing one that adds a plugin, would it be possible to just build the plugin instead of rebuilding the whole package? It would also be nice for things like Qt… the split ebuilds are a horrible idea.
Are you sure about that? .data and .bss are basically the same thing (one is from disk, one is zero filled, both are read/write).
I looked at elf_x86_64.x and elf_x86_64.xs [Binutils -> /usr/lib64/binutils/*/*/ldscripts] on my AMD64 system and the GNU linker at least does merge .bss into .data pages. [Along with TLS apparently]
@nico: no you cannot, it’s definitely not trivial.
@andrew: as far as I can see, no: the final executables (and in this case the shared object) have different sections for .bss and .data (by definition) and as far as I know they are not packed together at runtime either. And definitely TLS and non-TLS pages cannot be packed&shared.
I came here to make the same comment as andrew. If you compile and run the example at http://paste.factorcode.org… you will see that you are wrong: .bss and .data segments are loaded into the same memory page.
For me, this gives: &a = 0x601038, &b = 0x601020
Samuel, that happens for small variables, since it optimises stuff well, but it shouldn’t happen for bigger data/bss sections.
Flameeyes: do you have any non-contrived example? From my experience, .data and .bss are always allocated consecutively from system startup files.
Sure, I can build an example where some .bss entity requires an alignment of 4096 and thus the .bss gets allocated in a new page, but what you’re saying goes against everything I’ve seen.
What does @objdump --headers@ give on your modules? On most if not all binaries, you should have VMA(.bss) = VMA(.data) + sizeof(.data) + alignment-padding.
Example on glibc:
<typo:code>
 32 .data 00000d58 000000000034d040 000000000034d040 0014d040 2**5
 33 .bss  00004a68 000000000034dda0 000000000034dda0 0014dd98 2**5
% python -c "print(hex(0x000000000034d040+0x00000d58))"
0x34dd98
</typo:code>
This corresponds to 0x34dda0 (.bss VMA) after the 2**5 bytes alignment.
While .bss and .data _can_ be the same, you forgot .rodata, which _can’t_ share a page (at least if you want a security/stability benefit from it), so that still leaves at least 3 pages for most modules.
Oh, and it would be interesting to read up on other OSes, I think it is quite possible that e.g. OpenBSD would even try to insert a whole guard page between these – after all it even does for mallocs last I heard.
Reimar: .rodata is usually disk-backed and is equivalent to .text (and is usually merged with it). And it is shared between instances.
bq. I would be glad if it was possible to build in the modules that come with Apache (given I can already choose which ones to build or not in Gentoo)
How would you design a common infrastructure which catered for both dynamically-loaded plugins and optional built-in features? I ask because I am currently involved in the early stages of designing a highly modular piece of server software; and whilst the current plan is to make heavy use of plugins, I have to admit that in reality there will be few, if any, developed by third parties.
The project is a complete rewrite of some existing software. The current version uses a compile-time “builtin” system (like plugins, but can only be compiled directly into the main binary), with a lot of boilerplate per builtin (most of it frankly unnecessary: they were originally going to be fully-fledged external objects but this didn’t happen) and no ability for third parties to provide builtins. Each one is essentially an implementation of a common base class, but because they are all compiled in, the class names cannot overlap; because the symbols for the constructors all end up in the same binary, you have to know the details of all possible builtins at build time, and end up with something akin to a massive switch statement for instantiating the right derived class.
Is it possible to design a plugin interface which allows for truly external, third-party plugins, with “official” plugins (optionally) compiled directly in, without having to worry about symbol collisions and special casing for constructing the compiled-in ones?
Phil, I think I wrote some notes about that before, but I’ll probably write more when I’ll be back from London: this week is my vacation time 🙂
I think the bad interface is ultimately the worst evil with plugins. It just takes ages to get a “generic” plugin interface right, and changes to the interface until it is right are always unpopular, so they may not be done at all.
It would usually be best if plugin interfaces were introduced very late in a project’s lifetime, when it is clearer what 3rd party functionality is actually needed and maintained. That way, a more adequate interface can be designed, which also has to stand up to built-in functionality.
Until then, it makes so much more sense to provide a very specific plugin interface for limited functionality, which enforces a proper load-run-unload cycle, rather than some kind of generic “and here you can run whatever you want” thing.
Err, doesn’t apache “already allow this”:http://httpd.apache.org/doc… ? --enable-_module_ vs --enable-_module_=shared, with static linking the default?
(preview fails with “You don’t have permission to access /comments/preview on this server.”)
Would it be correct to sum up the problems by saying “Modules eat more memory than they save and force people to use nonoptimal ways of solving a problem”?
I think something you didn’t mention that they can give a project is a simple way for new developers to join. They only have to learn the plugin interface instead of learning to understand the whole code.