The battles and the losses

In the past years I picked up more than a couple of “battles” to improve Free Software quality all over. Some of these were controversial, like `--as-needed`, and some of them have just been lost causes (like trying to get rid of strict C++ requirements on server systems). All of them, though, were fought with the hope of improving the situation all over, and sometimes the few accomplishments were quite a satisfaction by themselves.

I always thought that my battle for --as-needed support was going to be controversial because it makes a lot of software require fixes, but strangely enough, the fallout has been reduced a lot. Most newly released software works out of the box with --as-needed, although there are some interesting exceptions, like GhostScript and libvirt. Among the positive exceptions there is for instance Luis R. Rodriguez, who made a new release of crda just to apply an --as-needed fix for a failure that was introduced in the previous release. It’s very refreshing to see that nowadays maintainers of core packages like these are concerned with these issues. I’m sure that when I started working on --as-needed nobody would have made a new point release just to address such an issue.

This makes it much more likely for me to work on adding the warning to the new --as-needed, and makes it even more important for me to find out why ld fails to link the PulseAudio libraries even though I’d have expected it to.

Another class of changes I’ve been working on, which has drawn more interest than I would have expected, is my work on cowstats which, for the sake of self-interest, drove most of the changes in the ALSA 1.0.19 release as far as the userland part of the packages is concerned (see my previous post on the matter).

In this case, I wish first to thank _notadev_ for sending me Linkers and Loaders, which is going to help me improve Ruby-Elf more and more; thanks! And since I’m speaking of Ruby-Elf, I finally decided its fate: it’ll stay. My reasoning is that, first of all, I was finally able to get it to work with both Ruby 1.8 and 1.9 by adding a single thin wrapper (which is going to be moved to Ruby-Bombe once I actually finish that), and most importantly, the code is there; I don’t want to start from scratch, there is no point in that, and I think that Ruby 1.9 and JRuby can learn from each other (the first by losing the Global Interpreter Lock, the other by trying to speed up its start time). And I could even decide to find time to write a C-based extension, as part of Ruby-Bombe, that takes care of byteswapping memory, maybe even using OpenMP.

Also, Ruby-Elf has been earning its keep with the collision detection script, which is hard to move to something different since it really is a thin wrapper around PostgreSQL queries, and I don’t really like dealing with SQL in C. Speaking of the collision detection script, I stand by my conclusion that software sucks (but proprietary software stinks too).

Unfortunately, while there are good signs on the issue of bundled libraries, like Lennart’s concerns with the internal copies of libltdl in both PulseAudio (now fixed) and libcanberra (also staged for removal), the whole issue is not solved yet: there are still packages in the tree with a huge amount of bundled libraries, like Avidemux and Ardour, and more scream to enter (and thankfully they don’t always do). ~~If you’d like to see the current list of collisions, I’ve uploaded the LZMA-compressed output of my script.~~ If you want, you can clone Ruby-Elf and send me patches to extend the suppression files, to remove further noise from the file.

At any rate, I’m going to continue my tinderboxing efforts while waiting for the new disks, and work on my log analyser again. The problem with that is that I really am slow at writing Python code, so I guess it would be much easier if I were to reimplement, in Ruby, the few extra functions I use out of Portage’s interface, or find a way to interface with Portage’s Python interface from Ruby. This is probably a good enough reason for me to stick with Ruby: sure, Python can be faster; sure, I could get better multithreading with C or Vala; but it takes me much less time to write these things in Ruby than it would in any of the other languages. I guess it’s a problem of mindset.

And on the other hand, if I have problems with Ruby I should probably just find time to improve the implementation; JRuby is evidence enough that my beef with the Ruby 1.9 runtime not supporting multithreading is an implementation issue and not a language issue.

Ruby-Elf and multiple compilers

I’ve written about supporting multiple compilers; I’ve written about testing stuff on OpenSolaris and I have written about wanting to support Sun extensions in Ruby-Elf.

Today I wish to write about the way I’m currently losing my head getting the Ruby-Elf testsuite to pass with compilers other than GCC. Since I’ve been implementing some new features for missingstatic upon request (by Emanuele “exg”), I decided to add some more tests, in particular covering the Sun and Intel compilers, which I decided to support for FFmpeg at least.

The new tests not only apply the already-present generic ELF tests (rewritten and improved, so that I can extend them much more quickly) to files built with ICC and Sun Studio under Linux/AMD64, but also add tests to check the nm(1)-like code against a catalogue of different symbols.

The results are interesting in my view:

  • Sun Studio does not generate .data.rel sections, it only generates a single .picdata section, which is not divided between read-only and read-write (which might have bad results with prelinking);
  • Sun Studio also emits uninitialised non-static TLS variables as common symbols rather than in .tbss (this honestly sounds like a mistake to me!);
  • the Intel C Compiler enables optimisation by default;
  • it also optimises out unused static symbols with -O0;
  • and even with __attribute__((used)) it optimises out static uninitialised variables (both TLS and non-TLS);
  • oh, and it appends a “.0” suffix to the name of unit-static data symbols (I guess to tell them apart from function-static symbols, which usually get a numeric suffix);
  • and last but not least: ICC emits neither a .data.rel section nor a .picdata section: everything goes into .data. This means that if you’re building something with ICC and expect cowstats to work on it, you’re out of luck; but it’s not just that, it also means that prelinking will not help you at all to reduce memory usage, just a bit to reduce startup time.

Fixing up some stuff for Sun Studio was easy, and cowstats will now work fine even on source code compiled with Sun Studio; taking care of ICC’s quirks was not so easy, and meant wasting quite some time.

On the other hand, there is one new feature to missingstatic: now it shows the nm(1)-like symbol near the symbols that are identified as missing the static modifier, this way you can tell if it’s a function, or constant, or a variable.

And of course, there are two manpages: missingstatic(1) and cowstats(1) (DocBook 5 rulez!) that describe the options and some of the workings of the two tools; hopefully I’ll write more documentation in the next weeks and that’ll help Ruby-Elf be accepted and used. Once I have enough documentation about it I might actually decide to release something. — I’m also considering the idea of routing --help to man like git commands do.

OpenSolaris Granularity

If you’ve been following my blog for a long time you know I already had to fight a couple of times with Solaris and VirtualBox to get a working Solaris virtual machine to test xine-lib and other software on.

I tried again yesterday to get one working: since Innotek was bought by Sun, VirtualBox’s support for Solaris has improved notably, to the point that they now emulate a different network card by default, one that works with Solaris (the network card had been the long-standing problem).

So I was able to install OpenSolaris, and thanks to Project Indiana I was able to check which packages were installed, to remove stuff I don’t need and add what I needed. Unfortunately I think the default granularity is a bit concerning. Compiz on a virtual machine?

The first thing I noticed is that updating a newly-installed system from the last released media requires downloading almost the size of the whole disk in updates: the disk is a simple 650MB CD image, and the updates were over 500MB. I suppose this is to be expected, but at that point, why not point to some updated media by default, considering that updating is far from trivial? Somehow I was unable to perform the update properly with the GUI package manager, and I had to use the command-line tools.

Also, removing superfluous packages is not an easy task, since the dependency tracking is not exactly the best out there: it’s not strange for a set of packages to fail to be removed because some of them are dependencies… of others in the very set being removed (this usually seems to be due to plug-ins; even after removing the plug-ins, it’d still cache the broken dependency and disallow me from removing the packages).

It’s not all, of course; for instance, finding the basic development tools in their package manager is a problem of its own: while looking for “automake” finds a package named SUNWgnu-automake, looking for “autoconf” finds nothing; the package is called SUNWaconf. I still haven’t been able to find pkg-config, although the system installs .pc files just fine.

I guess my best bet would be to remove almost everything out of the system’s own package manager and try prefixed Portage, but I just haven’t had the will to look into that yet. I hope it would also help with the version of GCC that Sun provides (3.4.3).

I got interested in Solaris again because, after a merge of Firefox 3.0.2, I noticed cowstats throwing up an error on an object file, and following up on that, I found out a couple of things:

  • cowstats didn’t manage unknown sections very well;
  • Firefox ships with some testcases for the Google-developed crash handler;
  • one of these testcases is an ELF ET_EXEC file (with .o extension) built for Solaris, that reports a standard SysV ABI (rather than a Sun-specific one), but still contains Sun-specific sections;
  • readelf from Binutils is not as solid as its counterpart from Elfutils.

Now cowstats should handle these corner-cases pretty well, but I want to enrich my testcases with some Solaris objects. Interestingly enough, in ruby-elf probably 80% of the size of an eventual tarball would be taken up by test data rather than actual code. I guess this is a side-effect of TDD, but also exactly why TDD-based code is usually more solid (every time I find an error of mine in ruby-elf, I tend to write a test for it).

Anyway, bottom line: I think Project Indiana would have done better by adapting RPM to their needs rather than inventing the package manager they invented, since theirs doesn’t seem to have any feature that’s lacking in Fedora, while it lacks quite a bit of other things.

My checklist when fixing packages

As I wrote I’ll be trying to write more documentation about what I do, rather than doing stuff. This is because I’m simply too tired, and I should rest and relax rather than stress myself.

So after playing some Lego Star Wars I decided to take a look at what I need to document for PAM. There was an easy bug to fix, so I decided to tackle that; while doing so I checked whether I was missing anything, and noticed that sys-libs/pam could use a debug USE flag. Unfortunately, not only does it not build with the debug USE flag enabled, it also fails with it disabled, because the configure file was written by someone who, yet again, fails at using AC_ARG_ENABLE.

But this was just one of the two things I noticed today and I wished to fix if I didn’t have to rest, so I decided to write here a small checklist I follow when I have to check or fix packages:

  • If the package is using autotools, I make sure they can be rebuilt with a simple autoreconf -i. Usually this fails when macros shipped in the m4 directory are not picked up (or something like that), or when the gettext version specification for autopoint is missing.
  • If the package supports out-of-source-tree builds, I create an “enterprise” directory and build from there (usually that involves a ../configure call). A lot of packages fail at this step because they assume that the source directory and the build directory are one and the same.
  • If the package uses assert() I make sure it works with assertions disabled (-DNDEBUG); it’s usually nice to tie this to the debug USE flag, to remove debugging code.
  • I check the resulting object files with cowstats (check Introducing cowstats for more information about this), and see if I can improve the situation with some trivial changes.
  • I check the resulting object files with missingstatic (another script in ruby-elf).
  • If the package uses automake I make sure the _LDFLAGS variables don’t list libraries to link to (those belong in _LDADD/_LIBADD; listing them in _LDFLAGS would break --as-needed).
  • I check for possible bundled libraries we don’t want to use.
  • I check for possible automagic dependencies that we don’t want in the ebuild.
  • I run a build with all the useful warnings enabled, and see if there is something that needs to be fixed.

Such a checklist, if followed from start to end, may generate a fair amount of patches that have to be sent upstream. It usually requires checking them against upstream’s development repositories too, so that the patches are not already obsolete.

As you can guess by now, it’s not exactly the quickest of tasks, and it depends a lot on the power of the development box I’m working on. Unfortunately, using a faster remote box does not always help because, even if Emacs’s tramp is quite neat, it does not make it easy to access the sources for editing. And having the sources locally while mounting them remotely doesn’t solve it either, as the build would then stall on fetching the sources.

My plans were to get either the YDL PowerStation or a dual quad-core Opteron system (yes, I know it’s overkill, but I don’t want to have to upgrade the system every three years). It wouldn’t have been that bad: I just needed to take a couple of jobs during summer and early fall, and I could have afforded them. Right now, though, the situation looks pretty bad. I’m not sure whether I can get a new job done before fall, and even though medical care in Italy is paid for by the government, there are a few expenses I had to cover (like, for instance, an over-quota charge on my Internet connection to download the software to view my CAT scans while I was in the hospital — long story, I’ll write about that another day), and the visit next Tuesday is in private practice (so I’ll have to pay for it).

If you care about a package, try to apply these checks on it, and see if upstream can pick up some improvements :) Every little drop is an advantage!

A good kind of cow

After all I wrote before, I’m sure at least some people might think that every COW is a bad COW, and that you should never use COW sections in your programs and libraries.

It’s not exactly this way. There are times when using a copy on write section like .bss is a good choice over the alternatives.

This is true for instance for buffers; there are mainly three ways to handle them: malloc()-allocated buffers on the heap, automatic buffers allocated on the stack, and static buffers that end up in .bss.

A buffer allocated on the stack has the main advantage of not having to be explicitly free()’d, but big buffers on the stack, especially if not well guarded, may cause security issues. Plus, they may require a big stack.

A buffer allocated on the heap through malloc() is more flexible, as you only request memory as needed, and free it as soon as it’s not needed anymore (for stack-based buffers, you need to wait for the end of the block, or create a block around the instructions that use the buffer). This reduces the memory footprint over time, but it has a little overhead in the malloc() and free() calls.

Another option is to use static arrays as buffers. Non-initialised static arrays are put into .bss, which is a copy-on-write section usually backed by the zero page (although I’m not yet sure how the changes to the zero page in Linux 2.6.24 affect this statement). The good thing about static buffers is that you don’t need to manage their lifetime, neither explicitly nor implicitly, as they are already allocated at the start of the program.

This is not good for libraries, as you might have a static buffer in .bss which is never used, but still takes up memory once other, used, .bss variables on the same pages are modified and trigger the copy-on-write. Things are better for simple fire-off programs, which start and terminate quickly (non-interactive programs).

It’s also important to note that, as a good design principle, libraries should always be designed to work in multi-threaded software, and static variables and arrays are not very useful there, unless they are all guarded by a mutex (which will reduce performance). For this reason, .bss is a bad thing for libraries in almost all cases.

For fire-off programs, as I said, this is less of a problem, as the buffer might just be used a couple of times during the life of the program, and if it’s reasonably sized, it might not even impact the program’s overall memory usage (even a single static variable, once changed, will cost you a whole 4KiB memory page, so adding a 100-byte variable will not change anything; it will change if you use a 4KiB, or bigger, static buffer).

So sometimes you just have to give up: the static buffer might be improving the performance of the program, so it just has to stay there. This is why I don’t really fight .bss too much; the only thing that I don’t think should ever go to .bss is tables: calculating them at runtime is useful only for single-task embedded systems, so there should be a way to opt out of that, using hardcoded tables calculated ahead of, or right at, build time.

Another good use of .bss is when the memory would be allocated at the start of the program anyway and freed only at exit. This is often the case in xine-ui, for instance, where there are big structures holding all the state information for a certain feature. That data cannot be shared between instances, and it lives so long that it’s just easier to allocate it all at once, rather than using up heap or stack for it. In xine-ui, especially, some structures used to be accessed through a pointer in .bss, which was set to an allocated memory area once the feature was activated, and freed either when the feature was temporarily not used anymore, or when the program exited; while you don’t always get the stream information, trying to save a few KiBs of memory by using heap memory might not be a good idea if you then have to access the data through a pointer, rather than having the structure in .bss and skipping the pointer indirection.

So, for this reason, while I’d be happy if we could find ways to avoid using COW sections at all and still be fast, I’m not targeting moving all the numbers to zero; I just want to make sure that there aren’t memory areas where space is simply wasted.

On the other hand, there are cases which show that something in the design of a program might just be way outside the sane league; for instance, the 10MB of .bss used by the quota support in xfsprogs is tremendously suspicious.

New patches! Just for our Gentoo users – and for developers of other distributions who want to take the patches ;) – I’ve added a few more patches to my overlay: giflib’s patch is now in sync with what upstream applied, moving a constant back to a variable as needed, while app-arch/libarchive and sys-apps/file got one patch each to reduce their COW memory usage. Both got good results from using character arrays inside structures.

A few statistics out of cowstats on media-libs libraries

Today is a short day for work: my connection is unstable and unusable, and tonight I’m out with a couple of friends of mine.

So I simply prepared a (complex) one-liner to gather some statistics about the media libraries installed on my system, out of their static archives (so it’s not the worst case).

qlist -I -v media-libs -C |
  while read pkg; do
    pkgname=${pkg##*/}; mkdir -p ${pkgname}; pushd ${pkgname}
    qlist $pkg | egrep '\.a$' |
      while read lib; do
        libname=$(basename $lib); mkdir -p $libname; pushd $libname
        ar x ${lib}
        ruby -I ~/devel/repos/ruby-elf ~/devel/repos/ruby-elf/cowstats.rb \
          --statistics --total *.o > ~/mytmpfs/libstats/${pkgname}:${libname}
        popd
      done
    popd
  done

The result of this, grepped around a bit gave me these statistics:

a52dec-0.7.4-r5:liba52.a:  Total 4593 bytes of variables in copy-on-write sections
alsa-lib-1.0.15:libasound.a:  Total 13164 bytes of variables in copy-on-write sections
alsa-lib-1.0.15:smixer-ac97.a:  Total 192 bytes of variables in copy-on-write sections
alsa-lib-1.0.15:smixer-hda.a:  Total 192 bytes of variables in copy-on-write sections
alsa-lib-1.0.15:smixer-python.a:  Total 2144 bytes of variables in copy-on-write sections
alsa-lib-1.0.15:smixer-sbase.a:  Total 104 bytes of variables in copy-on-write sections
alsa-oss-1.0.15:libalsatoss.a:  Total 28 bytes of variables in copy-on-write sections
alsa-oss-1.0.15:libaoss.a:  Total 128 bytes of variables in copy-on-write sections
alsa-oss-1.0.15:libossredir.a:  Total 112 bytes of variables in copy-on-write sections
audiofile-0.2.6-r3:libaudiofile.a:  Total 6408 bytes of variables in copy-on-write sections
faac-1.26-r1:libfaac.a:  Total 9612 bytes of variables in copy-on-write sections
faad2-2.6.1:libfaad.a:  Total 8138 bytes of variables in copy-on-write sections
flac-1.2.1-r2:libFLAC.a:  Total 1044 bytes of variables in copy-on-write sections
fontconfig-2.5.0-r1:libfontconfig.a:  Total 2196 bytes of variables in copy-on-write sections
gd-2.0.35:libgd.a:  Total 144508 bytes of variables in copy-on-write sections
giflib-4.1.6:libgif.a:  Total 1043 bytes of variables in copy-on-write sections
ilmbase-1.0.1:libHalf.a:  Total 1 bytes of variables in copy-on-write sections
ilmbase-1.0.1:libIex.a:  Total 8 bytes of variables in copy-on-write sections
ilmbase-1.0.1:libIlmThread.a:  Total 16 bytes of variables in copy-on-write sections
ilmbase-1.0.1:libImath.a:  Total 311 bytes of variables in copy-on-write sections
imlib-1.9.15-r2:libImlib.a:  Total 28 bytes of variables in copy-on-write sections
imlib2-1.4.0:id3.a:  Total 8 bytes of variables in copy-on-write sections
imlib2-1.4.0:libImlib2.a:  Total 17468 bytes of variables in copy-on-write sections
imlib2-1.4.0:xpm.a:  Total 8 bytes of variables in copy-on-write sections
jasper-1.900.1-r1:libjasper.a:  Total 13703 bytes of variables in copy-on-write sections
lcms-1.17:liblcms.a:  Total 11002 bytes of variables in copy-on-write sections
libao-0.8.8:libalsa09.a:  Total 96 bytes of variables in copy-on-write sections
libao-0.8.8:libao.a:  Total 590 bytes of variables in copy-on-write sections
libao-0.8.8:liboss.a:  Total 72 bytes of variables in copy-on-write sections
libao-0.8.8:libpulse.a:  Total 80 bytes of variables in copy-on-write sections
libart_lgpl-2.3.19-r1:libart_lgpl_2.a:  Total 16 bytes of variables in copy-on-write sections
libcddb-1.3.0-r1:libcddb.a:  Total 2472 bytes of variables in copy-on-write sections
libdvdcss-1.2.9-r1:libdvdcss.a:  Total 2064 bytes of variables in copy-on-write sections
libdvdread-0.9.7:libdvdread.a:  Total 2108 bytes of variables in copy-on-write sections
libexif-0.6.16-r1:libexif.a:  Total 51096 bytes of variables in copy-on-write sections
libgpod-0.6.0:libgpod.a:  Total 20 bytes of variables in copy-on-write sections
libid3tag-0.15.1b:libid3tag.a:  Total 1 bytes of variables in copy-on-write sections
liblrdf-0.4.0:liblrdf.a:  Total 57376 bytes of variables in copy-on-write sections
libmikmod-3.1.11-r5:libmikmod.a:  Total 7798 bytes of variables in copy-on-write sections
libmp4v2-  Total 288 bytes of variables in copy-on-write sections
libsdl-1.2.13:libSDL.a:  Total 49795 bytes of variables in copy-on-write sections
libsndfile-1.0.17-r1:libsndfile.a:  Total 18651 bytes of variables in copy-on-write sections
libsvg-0.1.4:libsvg.a:  Total 552 bytes of variables in copy-on-write sections
libvorbis-1.2.0:libvorbis.a:  Total 4 bytes of variables in copy-on-write sections
netpbm-10.40.0:libnetpbm.a:  Total 10612 bytes of variables in copy-on-write sections
openexr-1.6.1:libIlmImf.a:  Total 1205 bytes of variables in copy-on-write sections
raptor-1.4.16:libraptor.a:  Total 3365 bytes of variables in copy-on-write sections
sdl-gfx-2.0.16:libSDL_gfx.a:  Total 6184 bytes of variables in copy-on-write sections
sdl-image-1.2.6:libSDL_image.a:  Total 67259 bytes of variables in copy-on-write sections
sdl-mixer-1.2.8:libSDL_mixer.a:  Total 495 bytes of variables in copy-on-write sections
sdl-net-1.2.7:libSDL_net.a:  Total 7 bytes of variables in copy-on-write sections
sdl-pango-0.1.2:libSDL_Pango.a:  Total 44 bytes of variables in copy-on-write sections
sdl-ttf-2.0.9:libSDL_ttf.a:  Total 19 bytes of variables in copy-on-write sections
smpeg-0.4.4-r9:libsmpeg.a:  Total 117478 bytes of variables in copy-on-write sections
t1lib-5.1.1:libt1.a:  Total 45191 bytes of variables in copy-on-write sections
tiff-3.8.2-r3:libtiff.a:  Total 7746 bytes of variables in copy-on-write sections
x264-svn-20070924:libx264.a:  Total 16832 bytes of variables in copy-on-write sections
xvid-1.1.3-r2:libxvidcore.a:  Total 209384 bytes of variables in copy-on-write sections

As you can see, there’s quite some room for improvement, and remember that these statistics were taken on non-PIC objects; the PIC objects will probably have more, within .data.rel sections.

I’ll be writing a few more patches for this, trying to reduce the numbers as much as I can, even if sometimes it won’t bring much improvement to the actual shared library, either because of symbol relocations, or just because one or two static variables are enough to cause the 4KiB page for the COW section to be allocated.

My overlay now features patches to mark tables as constant for: libvorbis, libtheora, libpng, giflib, speex, flac, libdca and libmpcdec. I’ll add a few more either tonight when I come home or tomorrow, or anyway in the next weeks.

All the patches were sent to the respective upstream, so that they can be applied there and be available for everybody in the future.

Hopefully some of them might just be applied in Gentoo too, soon :)

Introducing cowstats

No, it’s not a script to gather statistics on Larry; it’s a tool to get statistics about copy-on-write pages.

I’ve been writing about memory usage, RSS memory and other such topics on my blog for quite a while, so if you want more in-depth information about them, please just look around. If I started linking here all the posts I’ve written on the topic (okay, the last one is not a blog post ;) ) I would probably spend the better part of the night digging them up (I only linked the most recent ones on the topic).

Trying to summarise for those who haven’t read my blog all this time, let’s start by saying that a lot of software, even free software, nowadays wastes memory. When I say waste, I mean it uses memory without a good reason to. I’m not saying that software which uses lots of memory to cache or precalculate stuff, and thus be faster, is wasting memory: that’s just using memory. I’m not even referring to memory leaks, which are usually just bugs in the code. I’m saying that a lot of software wastes memory when it could save it without losing performance.

The memory I declare wasted is memory that could be shared between processes, but isn’t. That’s a waste because you end up using twice the memory, or more, for the same goal, which is way sub-optimal. Ben Maurer (a GNOME contributor) wrote a nice script (which is in my overlay if you want it; I should finish fixing a couple of things in the ebuild and commit it to the main tree already, the deps are already there) that tells you, for a given process, how much memory is not shared between processes, the so-called “dirty RSS” (RSS stands for Resident Set Size; it’s the resident memory, the memory that the process is actually using from your RAM).

Dirty RSS is caused by “copy-on-write” pages. What is a page, and what is a copy-on-write page? Well, memory pages are the unit used to allocate memory to processes (and to threads, and kernel subsystems, but let’s not go too deep there); when a process is given a page, it usually also gets some permissions on it: it might be readable, writable or executable. Trying not to go too deep on this either (I could easily write a book on it; maybe I should, actually), the important thing is that read-only pages can easily be shared between processes, and can be mapped directly from a file on disk. This means that two processes can both use the same 4KB read-only page, using just 4KB of memory, while if the same content were in a writable page, the two processes would each have their own copy of it, requiring 8KB of memory. Maybe more importantly, if the page is mapped directly from a file on disk, when the kernel needs to make space for newly allocated memory, it can just drop the page and later re-load it from the original file, rather than writing it out to the swap file and loading it back from there.

To make it easier to load the data from files on disk, and to reduce memory usage, modern operating systems use copy-on-write. The pages are shared as long as they are unchanged from the original; when a process tries to change the content of a page, it’s copied into a new, writable page, and the process gets exclusive access to it, “eating” the memory. This is the reason why using PIC shared objects usually saves memory, but that’s another story entirely.

So we should reduce the amount of copy-on-write pages, favouring read-only, sharable pages. Great, but how? Well, the common way to do so is to make sure that you mark (in C) all constants as constant, rather than defining them as variables whose value you never change. Even better, mark them static and constant.

But it’s not so easy to check the whole codebase of a long-developed piece of software to mark everything constant, so there’s a need to analyse the software after the fact and identify what to work on. Up to now I used objdump (from binutils) to do so; it’s a nice tool for getting raw information about ELF files. It’s not easy, but I’ve grown used to it, so I can easily grok its output.

Focusing on ELF files, which are the executable and library files on Linux, FreeBSD and Solaris (plus other Unixes), the copy-on-write pages are those belonging, mostly, to these sections: .data, .data.rel and .bss (actually there are more sections, like .data.local, but let’s just consider those prefixes for now).

The .data section keeps the non-stack variables (which means anything declared as static but non-constant in C source) that were initialised in the source. This is probably the cause of most wasted memory: you define a static array in C, you don’t mark it properly constant (see this for string arrays), but you never touch it after definition.

The .data.rel section keeps the non-stack variables that need to be relocated at runtime. For instance, it might be a static structure containing a string, or a pointer to another structure or array. Often you can’t get rid of relocations, but they have a cost in terms of CPU time, and also a cost in memory usage, as a relocation will trigger the copy-on-write for sure… unless you use prelink, but as you’ll read at that link, it’s not always a complete solution. You can usually live with these, but if you can remove instances here, it’s a good thing.

The .bss section keeps the uninitialised non-stack variables: for instance, if you declare and define a static array but don’t fill it at once, it will be added to the .bss section. That section is mapped on the zero page (a page entirely initialised to zero, as the name suggests) with copy-on-write: as soon as you write to the variable, a new page is allocated, and thus memory is used. Usually, runtime-initialised tables fall into this section. It’s often possible to replace them (maybe optionally) with precalculated tables, saving memory at runtime.

My cowstats script analyses a series of object files (tomorrow I’ll work on an ar parser so that it can be run on static libraries; unfortunately it’s not possible to run it on executables or shared libraries, as they tend to hide the static symbols, which are the main cause of wasted memory), looks for the symbols present in those sections, and lists them for you; alternatively, it shows you some statistics (a simple table that tells you how many bytes are used in the three sections by the various object files it was called with). This way you can easily see which variables are causing copy-on-write pages to be requested, so that you can try to change them (or the code) to avoid wasting memory.

I wrote this script because Mike asked me if I had an automated way to identify which variables to work on, after a long series of patches (many of which I have to fix and re-submit) for FFmpeg to reduce its memory usage. It’s now available, as it’s simply a Ruby script using the ELF parser for Ruby I started last May. It’s nice to see that something I did some time ago for a completely different reason now comes in useful again ;)

I mailed the results for my current, partly-patched libavcodec; they are quite scary: over 1MB of copy-on-write pages. I’ll continue working to bring the numbers near to zero. Tomorrow I’ll also try to run the script on xine-lib’s objects, as well as xine-ui’s. It should be interesting.

Just as a test, I also tried running the script over libvorbis.a (extracting the files manually, as for now I have no way to access those archives through Ruby), and here are the results:

cowstats.rb: lookup.o: no .symtab section found
File name  | .data size | .bss size | .data.rel.* size
psy.o      |      22848 |         0 |                0
window.o   |      32640 |         0 |                0
floor1.o   |          0 |         8 |                0
analysis.o |          4 |         0 |                0
registry.o |         48 |         0 |                0
    55540 bytes of writable variables.
    8 bytes of non-initialised variables.
    0 bytes of variables needing runtime relocation.
  Total 55548 bytes of variables in copy-on-write sections

(The warning tells me that the lookup.o file has no symbols defined at all; the reason for this is that the whole file is under one big #ifdef; the binutils tools might be improved to avoid packing such files at all, as they can’t be used for anything, bearing no symbols… although it might be that they can still carry .init sections, I admit my ignorance here.)

Now, considering the focus of libvorbis (Vorbis decoding only), it’s scary to see almost 55KB of memory in writable pages; especially since, looking into it, I found that they are due to a few tables which are never modified but are not marked as constant.

The encoding library libvorbisenc is even worse:

File name   | .data size | .bss size | .data.rel.* size
vorbisenc.o |    1720896 |         0 |                0
    1720896 bytes of writable variables.
    0 bytes of non-initialised variables.
    0 bytes of variables needing runtime relocation.
  Total 1720896 bytes of variables in copy-on-write sections

Yes, that’s about 1.7MB of writable pages brought in by libvorbisenc for every process that uses it. And I’m unfortunately going to tell you that any xine frontend (Amarok included) might load libvorbisenc, as libavcodec has a Vorbis encoder which uses it. Not nice at all!

Tomorrow I’ll prepare a patch for libvorbis (at least) and see if Xiph won’t ignore me this time. Once the script is able to act on static libraries, I might just run it on all the ones on my system and identify those that really need to be worked on. Of course, this must not hinder my current jobs (I consider this in-depth look at memory usage part of my job, as I’m probably going to need it for a course I have to teach next month), as I really need money, especially to get a newer box before the end of the year; Enterprise is getting slow.

Mike, I hope you’re reading this blog; I tried to explain what I’ve been doing in the best way possible :)