In a previous post of mine, Mart (leio) advocated the use of misdirected linking to split the libxml2 modules that are not RSS-hungry from the ones that create a lot of dirty pages. His concern is well founded, and I share it, since libxml2 is indeed a rather memory-hungry library. For my Firefox instance, this is what gets reported:
vmsize    rss clean   rss dirty   file
32 kb     0 kb        32 kb       /usr/lib64/libxml2.so.2.7.2
8 kb      0 kb        8 kb        /usr/lib64/libxml2.so.2.7.2
1396 kb   336 kb      0 kb        /usr/lib64/libxml2.so.2.7.2
While it is shared, it still takes 336KiB of resident memory, which is not too bad but not great either. But how would one split that library? You need to know libxml2’s interface a bit to understand this fully, so let’s just say that libxml2 has a modular design and offers a series of interfaces that are more or less tied together.
For instance, for my daily job I had to write a proprietary utility that uses libxml2’s XPath interface as well as the writer module, which allows for easy writing of XML files with a very nice interface (the work was done under Windows; building and using libxml2 was much easier than trying to get Codegear’s parser to work, or interfacing with Microsoft’s MSXML libraries). I disabled everything that was not needed for this to work, and reduced libxml2 to the minimum amount of needed code.
Software that only needs parsing wouldn’t need the writer module, and not every program requires DOM, SAX or push parsing, or XPath and XPointer, and so forth. To disable the extra parts there is a series of ./configure flags, but mapping those to USE flags is not really feasible since you’d be breaking ABI; besides, in my opinion a solution should be worked out with upstream.
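As a rough illustration of what such a trimmed build could look like, here is a sketch for a utility that only needs XPath and the writer. The flag names are the ones I know from libxml2’s configure script; whether this exact set is enough for a given release is an assumption, so check the help output first.

```shell
# Run from an unpacked libxml2 source tree.
# List the optional interfaces this release knows about.
./configure --help | grep -- '--with-'

# Start from the minimal library, then switch back on only the interfaces
# the program uses (here: XPath and the XML writer). Other interfaces they
# depend on may need to be enabled too, depending on the version.
./configure --with-minimum --with-xpath --with-writer
make
```

A library built this way no longer exports the disabled interfaces, which is exactly why tying these switches to USE flags would change the ABI under every other package.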
So what Mart suggested was breaking the library in half, with a “mini” version holding the non-memory-hungry interfaces and a second library holding the rest. My proposal here would be much bigger and would break the ABI a lot more, but it is also far more exhaustive: break up libxml2 into a series of small libraries, each representing one interface. Software needing one of them would link it in and be done with it. Besides breaking ABI, though, this would also break all the software using libxml2 even after a rebuild, which is very bad. Well, the solution is actually much easier than it sounds:
OUTPUT_FORMAT ( elf64-x86-64 )
GROUP ( AS_NEEDED (
    libxml2-sax2.so
    libxml2-schemas.so
    libxml2-schematron.so
    libxml2-writer.so
    libxml2-xpath.so
    ....
) )
This is an ldscript, which tells the linker what to do: save it as libxml2.so, and linking with -lxml2 will pull in just the libraries required for the interfaces the program uses. If you look at your /usr/lib, you already have quite a few of these, because Gentoo installs them for the libraries that are moved into /lib instead. This works around the inability to use misdirected linking for wrappers.
Now of course this trick does not work with every linker out there, but it works with GNU ld and with Sun’s linker, and those are the two for which --as-needed makes sense. If libxml2 were to break itself up into multiple libraries, a configure option could decide whether to install an ldscript wrapper or a library that does not rely on as-needed support, so that Linux, FreeBSD and Solaris (and others) would use the ldscript without adding further ELF files, and the rest would go with a compatibility method.
Please also note that using pkg-config for library discovery would make this easier still, with no wrappers at all, as libxml2.pc would just have to list all the interfaces in their