Postmortem of a patch, or how do you find what changed?

Two days ago, Luca asked me to help him figure out what was going on with a patch for libav which he knew to be the right thing, but which was acting up in a fashion he didn’t understand: on his computer, it increased the size of the final shared object by 80KiB. While this number is certainly not outlandish for a library such as libavcodec, it does seem odd at first glance that a patch removing source code would increase the final size of the executable code.

My first wild guess, which (spoiler alert) turned out to be right, was that removing branches from the functions let GCC optimize them further and decide to inline them. But how to actually be sure? It’s time to get the right tools for the job: dev-ruby/ruby-elf, dev-util/dwarves and sys-devel/binutils enter the battlefield.

We’ve built libav with and without the patch on my server, and then rbelf-size told us more or less the same story:

% rbelf-size --diff libav-{pre,post}/avconv
        exec         data       rodata        relro          bss     overhead    allocated   filename
     6286266       170112      2093445       138872      5741920       105740     14536355   libav-pre/avconv
      +19456           +0         -592           +0           +0           +0       +18864 

Yes, there’s a bug in the command (the filename is missing from the diff row), I noticed. So there is a total increase of around 20KiB; how is it split up? Given this is a build that includes debug info, it’s easy to find out through codiff:

% codiff -f libav-{pre,post}/avconv

  avg_no_rnd_pixels8_9_c    | -163
  avg_no_rnd_pixels8_10_c   | -163
  avg_no_rnd_pixels8_8_c    | -158
  avg_h264_qpel16_mc03_10_c | +4338
  avg_h264_qpel16_mc01_10_c | +4336
  avg_h264_qpel16_mc11_10_c | +4330
  avg_h264_qpel16_mc31_10_c | +4330
  ff_dsputil_init           | +4390
 8 functions changed, 21724 bytes added, 484 bytes removed, diff: +21240


If you’re wondering why codiff reports more added code than rbelf-size did, it’s because the patch also deleted functions elsewhere, causing reductions in other parts of the library. Now we know that the three functions the patch touched did shrink, but five other functions grew by over 4KiB each. It’s time to find out why.

A common way to do this is to have GCC generate the assembly files (with the -S flag, since it normally doesn’t keep them) and compare the two. Due to the size of the dsputil translation unit, this turned out to be completely pointless: just the changes in the jump labels cause the whole file to be rewritten. So we rely instead on objdump, which gives us a full disassembly of the executable sections of the object file:

% objdump -d libav-pre/libavcodec/dsputil.o > dsputil-pre.s
% objdump -d libav-post/libavcodec/dsputil.o > dsputil-post.s
% diff -u dsputil-{pre,post}.s | diffstat
 unknown |245013 ++++++++++++++++++++++++++++++++--------------------------------
 1 file changed, 125163 insertions(+), 119850 deletions(-)

As you can see, a diff between these two files is going to be pointless: first because of the sheer size of the disassembled files, and second because each instruction is prefixed with its address offset, which means every single line will differ. So what to do? Well, first of all it’s useful to isolate one of the functions, so as to reduce the scope of the changes to check. I found out that there is a nice way to do so, relying on the way the function label appears in the disassembly:

% fgrep -A3 avg_h264_qpel16_mc03_10_c dsputil-pre.s
00000000000430f0 <avg_h264_qpel16_mc03_10_c>:
   430f0:       41 54                   push   %r12
   430f2:       49 89 fc                mov    %rdi,%r12
   430f5:       55                      push   %rbp

While it takes a while to come up with the correct syntax, it’s a simple sed command that can get you the data you need:

% sed -n -e '/<avg_h264_qpel16_mc03_10_c>:/,/^$/p' dsputil-pre.s > dsputil-func-pre.s
% sed -n -e '/<avg_h264_qpel16_mc03_10_c>:/,/^$/p' dsputil-post.s > dsputil-func-post.s
% diff -u dsputil-func-{pre,post}.s | diffstat
 dsputil-func-post.s | 1430 ++++++++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 1376 insertions(+), 54 deletions(-)

Okay that’s much better — but it’s still a lot of code to sift through, can’t we reduce it further? Well, actually… yes. My original guess was that some function was inlined; so let’s check for that. If a function is not inlined, it has to be called, the instruction for which, in this context, is callq. So let’s check if there are changes in the calls that happen:

% diff -u =(fgrep callq dsputil-func-pre.s) =(fgrep callq dsputil-func-post.s)
--- /tmp/zsh-flamehIkyD2        2013-01-24 05:53:33.880785706 -0800
+++ /tmp/zsh-flamebZp6ts        2013-01-24 05:53:33.883785509 -0800
@@ -1,7 +1,6 @@
-       e8 fc 71 fc ff          callq  a390 
-       e8 e5 71 fc ff          callq  a390 
-       e8 c6 71 fc ff          callq  a390 
-       e8 a7 71 fc ff          callq  a390 
-       e8 cd 40 fc ff          callq  72e0 <avg_pixels8_l2_10>
-       e8 a3 40 fc ff          callq  72e0 <avg_pixels8_l2_10>
-       e8 00 00 00 00          callq  43261 
+       e8 00 00 00 00          callq  8e670 
+       e8 71 bc f7 ff          callq  a390 
+       e8 52 bc f7 ff          callq  a390 
+       e8 33 bc f7 ff          callq  a390 
+       e8 14 bc f7 ff          callq  a390 
+       e8 00 00 00 00          callq  8f8d3 

Yes, I do use zsh. On the other hand, now that I look at the code above, I note that there’s a bug: it does not respect $TMPDIR, as it should have used /tmp/.private/flame as the base path. Dang!

So the quick check shows that avg_pixels8_l2_10 is no longer called — but does that account for the whole size? Let’s see if it changed:

% nm -S libav-{pre,post}/libavcodec/dsputil.o | fgrep avg_pixels8_l2_10
00000000000072e0 0000000000000112 t avg_pixels8_l2_10
00000000000072e0 0000000000000112 t avg_pixels8_l2_10

The size is unchanged: 0x112, or 274 bytes. The increase per function is about 4330 bytes, roughly 15 times the size of that single function. What does that mean, then? Well, a quick look around shows this piece of code:

        41 b9 20 00 00 00       mov    $0x20,%r9d
        41 b8 20 00 00 00       mov    $0x20,%r8d
        89 d9                   mov    %ebx,%ecx
        4c 89 e7                mov    %r12,%rdi
        c7 04 24 10 00 00 00    movl   $0x10,(%rsp)
        e8 cd 40 fc ff          callq  72e0 <avg_pixels8_l2_10>
        48 8d b4 24 80 00 00    lea    0x80(%rsp),%rsi
        49 8d 7c 24 10          lea    0x10(%r12),%rdi
        41 b9 20 00 00 00       mov    $0x20,%r9d
        41 b8 20 00 00 00       mov    $0x20,%r8d
        89 d9                   mov    %ebx,%ecx
        48 89 ea                mov    %rbp,%rdx
        c7 04 24 10 00 00 00    movl   $0x10,(%rsp)
        e8 a3 40 fc ff          callq  72e0 <avg_pixels8_l2_10>
        48 8b 84 24 b8 04 00    mov    0x4b8(%rsp),%rax
        64 48 33 04 25 28 00    xor    %fs:0x28,%rax
        00 00 
        75 0c                   jne    4325c 

This is just a fragment, but you can see two calls to the function in short succession (the xor against %fs:0x28 and the jne that follow them are the stack-protector check at the end of the function), and the same pattern repeats through the caller, which means the function gets called many times. Knowing that this function is involved in 10-bit processing, it becomes likely that the function gets called a fixed number of times per block, or something along those lines; remove the call overhead (as the function is inlined) and you can see how some twenty copies of that small function per caller account for the 4KiB.

So my guess was right, but incomplete: GCC not only inlined the function, but it also unrolled the loop, probably doing constant propagation in the process.

Is this it? Almost. The next step was to get some benchmark data when using the code, which was mostly Luca’s work (and I have next to no info on how he did that, to be entirely honest); the results on my server have been inconclusive, as the 2% loss that he originally registered was gone in further testing, and would anyway be vastly within the margin of error of a non-dedicated system (no, we weren’t using full-blown profiling tools for that).

While we don’t have any sound numbers about it, what we’re worried about is cache-starved architectures, such as Intel Atom, where the unrolling and inlining can easily cause a performance loss rather than a gain; this is why all of us developers facepalm in front of people using -funroll-all-loops and similar. I guess we’ll have to find an Atom system to do this kind of run on…

The importance of opaque types

I sincerely don’t remember whether I already discussed this before or not; if I didn’t, I’ll try to explain it here. When developing in C, C++ and other languages that support some kind of objects as types, you usually have two choices for a composite type: transparent or opaque. If the code using the type is able to see the content of the type, it’s a transparent type; if it cannot, it’s an opaque type.

There are, though, different grades of transparent and opaque types depending on the language and the way they get implemented; to simplify the topic for this post, I’ll limit myself to the C language (not the C++ language, be warned) and comment on the practices connected to opaque types.

In C, an opaque type is a structure whose content is unknown; this usually is declared in ways such as the following code, in a header:

struct MyOpaqueType;
typedef struct MyOpaqueType MyOpaqueType;

At that point, code including the header has some limitations compared to transparent types: not knowing the object’s size, you cannot declare objects of that type directly, only pointers to it, which also means you cannot dereference them or allocate new objects. For this reason, you need to provide functions to access and handle the type itself, including allocation and deallocation, and these functions cannot simply be inline functions, since they would need to access the content of the type to work.

All in all, you can see that the use of opaque types tends to be a big hit for performance: instead of a direct memory dereference you always need to pass through a function call (note that this looks like accessor functions in C++, but those are usually inline functions that get replaced at compile time with the dereference anyway), and you might even have to pass through the PLT (Procedure Linkage Table), which means further complication in getting at the type.

So why should you ever use opaque types? Well, they are very useful when you need to export the interface of a library: since users know neither the size nor the internal ordering of an opaque type, the library can change it without changing ABI, and thus without requiring a rebuild of the software using it. Repeat with me: changing the size of a transparent type, or the order of its contents, will break ABI.

And this gets particularly important when you’d like to reorder some structures so that you can remove padding holes (with tools like pahole from the dwarves package; see this as well if you want to understand what I mean). For this reason, you might sometimes prefer having slower, opaque types in the interface, instead of faster but riskier transparent types.

Another place where opaque types are definitely helpful is when designing a plugin interface, especially for software that was never designed as a library and thus has an in-flux API. Which is one more reason why I don’t think feng is ready for plugins just yet.

Security considerations: scanning for bundled libraries

My fight against bundled libraries might soon transcend the implementation limits of my ruby-elf script.

The script I’ve been using to find bundled libraries up to now was not originally designed with that in mind; the idea was to identify colliding symbols between different object files, so as to identify failure cases like xine’s aac decoder, hopefully before they become a nuisance to users like PHP’s did. Unfortunately, the amount of data the script generates because of bundled libraries makes it tremendously difficult to deal with in advance, so it can currently only be used post-mortem.

But as a security tool, I already stated it’s not enough, because it only checks symbols that are exported by shared objects (and, often mistakenly, by executables). To actually go deeper, one would have to look at one of two things: the .symtab entries in the ELF files (which are stripped out before installing), or the data that the compiler emits for each output file in the form of DWARF sections, when building with -g flags. The former can be done with the same tools I’ve been using up to now; the latter you can list with pfunct from dev-util/dwarves. Trust me, though: if the current database of optimistically suppressed symbols is difficult to deal with, a search using DWARF functions is likely to be unmanageable, at least with the same algorithm that I’m using for the collision-detection script.

Being able to identify bundled libraries in that kind of output is going to be tremendously tricky. My collision-detection script already finds collisions between executables like the ones from the MySQL (well, before Jorge’s fixes at least) and Samba packages, because they don’t use internal shared libraries; running it against the internal symbols list is going to be even worse, because it would then find equally-named internal functions (usage(), anybody?), statically-linked libraries (including system support libraries) and so on.

So there is little hope of tackling the issue this way, which makes the idea of finding all the bundled libraries in a system beforehand an inhuman task. On the other hand, that doesn’t mean I have to give up on the idea: we can still make use of that data to do some kind of post-mortem, once again, with some tricks. When it comes to vulnerabilities, there is usually a function, or a series of functions, involved; depending on how central those functions are in a library, more or fewer applications will be using the vulnerable codepath. While it’s not extremely difficult to track them down when the function is a direct API (just look for software having external references to that symbol), it’s quite another story with internal convenience functions, since they are called indirectly. For this reason, while some advisories do report the problematic symbols, most of the time the matter is just ignored.

We can, though, use that particular piece of information to track down extra vulnerable software that bundles the code. I’ve done that on request for Robert a couple of times with the data produced by the collision-detection scripts, but unfortunately it doesn’t help much, because it too can only check externally-defined API, just like a search for use would. How to solve the problem? Well, I could just not strip the files and read the data from .symtab to see whether the function is defined, and this might actually be what I’m going to do soonish; unfortunately, this creates a couple of issues that need to be taken care of.

The first is that the debug data is not exactly small; the second is that the chroots volume is under RAID1, so space is a concern. It’s already 100GB big with just 10% of it free, and if I’m not to strip data, it’s going to require even more space; I can probably split some of the data out into a throwaway chroots volume that I don’t have to keep on RAID1. If I split the debug data out with the splitdebug feature, it would be quite easy to deal with.

Unfortunately, this brings me to the second problem, or rather the second set of problems. The first is that ruby-elf does not currently support the debuglink facilities, but that’s easy to implement; after all, it’s just a note section with the name of the debug file. The second is nastier, and relates to the fact that the debuglink section created by Portage lists the basename of the file with the debug information, which is basically the same name as the original with a .debug suffix. The reason why this is not simply left implied is that if you look up the debuglink for libfoo.so, you’ll see the real name might be libfoo.so.2.4.6.debug; on the other hand, it’s far from trivial, since it still leaves something implied: the path to find the file in. By default, all tools take the same path as the executable file and prepend /usr/lib/debug to it. All well as long as there are no symlinks in the path, but if there are (like on multilib AMD64 systems), it starts to be a problem: accessing a shared object via /usr/lib/libfoo.so will try a read of /usr/lib/debug/usr/lib/libfoo.so.2.4.6.debug, which will not exist (it would be /usr/lib/debug/usr/lib64/libfoo.so.2.4.6.debug). I have to check whether it’s feasible to use a fully canonicalised path for the debuglink; on the other hand, that would assume that the root for the file is the same as the root of the system, which might not be the case. The third option is to use a debugroot-relative path, so that the debuglink would look like usr/lib64/libfoo.so.2.4.6.debug; unfortunately I have no clue how gdb and company would take a debuglink like that, and I have to check.

The problems don’t stop here, though. Since packages collide with one another when they try to install files with the same name (even when they are not really alternatives), I cannot rely on having all the packages installed in the tinderbox, which actually makes it even harder to analyse the symbol-collision dataset. So I should at least scan the data before the merge to the live filesystem is done, load it into a database indexed on a per-package, per-slot basis, and then search that data to identify the problems. Not an easy or quick solution.

Nor a complete one, to be honest: the .symtab method will not show symbols that are never emitted, like inlined functions. While we do want the unused symbols to be cut out, we still need the names of static inlined functions, since if a vulnerability is found in one of them, it has to be findable. I should check whether DWARF data is emitted for those, at least, but I wouldn’t be surprised if it wasn’t either. And the method does not cope at all with renamed symbols, or copied code… So there is still a long way to go before we can actually reassure users that all security issues are tracked down when found (and this is not limited to Gentoo, remember: Gentoo is the base tool I use to tackle the task, but the same problems affect basically every distribution out there).

Using dwarves for binaries’ data mining

I’ve written a lot about my linking-collisions script, which also shows the presence of internal copies of libraries in binaries. It might not be clear that this is just a side effect, and that the primary purpose of my script is not to find the included libraries, but rather to find possible collisions between two pieces of software with similar symbols and no link between them. This is what I found in Ghostscript bug #689698 and poppler bug #14451. Those are really bad things to happen, and that was my first reason for writing the script.

One reason why this script cannot be used with the discovery of internal copies of libraries as its main function is that it will not find internal copies if they have hidden visibility, which is a prerequisite for properly importing an internal copy of whatever library (skipping over the fact that such an import is not a good idea in the first place).

To find internal copies of libraries, the correct thing to do is to build all packages with almost full debug information (so -ggdb), and use the dwarf data in them to find the definition of functions. These definitions won’t disappear with hidden visibility so they can be relied upon.

Unfortunately, parsing DWARF data is a very complex matter, and I doubt I’ll ever add DWARF parsing support to ruby-elf, not unless I can find someone else to work on it with me. But there is already a toolset you can use for this: dwarves (dev-util/dwarves). I haven’t written a harvesting and analysis tool yet, and at the moment I’m just wasting a lot of CPU cycles scanning all the ELF files for single functions, but I’ll soon write something for that.

The pfunct tool in dwarves allows you to find a particular function in a binary file. I ran pfunct over all the ELF files in my system, looking for two functions so far: adler32 and png_free. The first is a common function from zlib; the latter is, obviously, a common function from libpng. Interestingly enough, I found two more packages that use an internal copy of zlib (one of which is included in an internal copy of libpng): rsync and doxygen.

It’s interesting to see a base system package like rsync suffering from this problem. It means that it’s not just uncommon libraries being bundled by little-used programs, but also widely known and accepted software including omnipresent libraries like zlib.

I’m now looking for internal copies of popt, which I’ve also seen imported more than a couple of times by software (cough distcc cough), and which is already a dependency of system packages. The problem is that DWARF parsing is slow, and it takes pfunct time to scan the whole system. That’s why I should use a separate harvest script and an analysis script.

Oh well, more analysis for the future 🙂 And eliasp, when I’ve got this script done, then I’ll likely accept your offer 😉

On adding new tools

Please excuse the mistakes I’m most likely going to make while writing this entry, it’s 9 AM and I didn’t sleep a single minute tonight.

Now that I’m back on track as a Gentoo developer, I decided it’s a good time to start moving a few things that are currently in my overlay into the big public Portage tree. Why? Well, there are a few tools that might come in handy to a few people, like pahole (now dwarves), vbindiff and picocom. I used to use these tools either for my last paid job or for xine-lib development; they are mostly development tools (besides picocom, which is an even lighter serial terminal emulator than minicom, quite nice to integrate with Konsole directly without using Serielle, if you want).

So this morning I decided to go with the first one: Arnaldo released the 1.0 version of the dwarves package (where pahole lives) last month, and it is now committed to the main tree as dev-util/dwarves. If you don’t know pahole, it’s a tool to inspect the effective representation of structures in a compiled program, highlighting the padding holes, so that they can be filled in by reordering the fields.

Probably later I’ll look into merging vbindiff or picocom; there are also a few patched ebuilds that need to be merged. Thanks to mcummings (and cartman from Pardus), at least dev-perl/Compress-Zlib-Raw is done 🙂 And thanks to Luca, libtheora is also patched not to cause memory corruption and crashes in xine-lib with some FSF videos.

Guys I hate the heat of this season!

Edit: I didn’t notice it before, but picocom is already in the tree; Vapier added it around the time I added it to my overlay. Oh well, one less package to take care of then 🙂

The first improvement in xine with Mercurial

So, after xine finally moved to Mercurial for xine-lib management, I’ve decided to start working on those things that required me to branch out, at least on some of them, that is; the first one I was able to tackle was ffmpeg_integration, which now works fine apart from the dist target.

And then I moved on to working on the structures, applying pahole to all the structures in libxine, even those comprising the public ABI of the library, as I could now break the ABI when needed, rather than limiting myself to the local structures of the plugins. Some of these changes applied to structures that are not part of the public ABI, so I ended up merging them into the main repository already; they will be present in 1.1.5, even if they are mostly byte-sized changes that nobody besides me should care about.

But then tonight I went looking for the 32 buttons that are (or were) present as an inline array in one of the video overlay structures; I was going to replace the inline array with a dynamic array, or with an array of pointers, so that the memory would be consumed only when actually used.

It was a sour surprise to find out that the array was never used at all in the code, nor in the frontends. By removing it, the size of the structure it belonged to dropped from 86KB to 40 bytes, and the video_overlay_s structure dropped from over 4MB to about 3KB. With it removed, there was also a 10MB cut in memory usage during xine-lib runtime playback, 13MB when playing an mp3 file:

I’m sorry, this used to have images of massif graphs for xine before and after the change, but unfortunately they got lost.

For starters, this does seem quite good, don’t you think? 🙂

Pahole and xine-lib

So, I’ve taken two days totally off from almost any kind of communication; I tried to relax, and now I feel a bit better. I still don’t think I’ve been able to do much good besides my work on Gentoo, but I’m not ready to give up yet; even if it’s hard to continue, I will continue. At least for a while.

Unfortunately my current UPS setup is not going to fly, X-Drum in #gentoo-it informed me last night of the incompatibility between PSUs with active PFC and UPSes with stepped sinusoidal wave.. and the new PSU I bought two months ago has active PFC. The result is that I need a new UPS, of the Smart UPS series from APC, which will cost me €420, sigh.

Talking about the topic in the subject: two days ago I analysed most of the xine-lib plugins with pahole (with a patch to fix an integer overflow in the offset counter; the author wasn’t expecting structures bigger than 64KiB, but in xine-lib this is not rare), and I have at least one piece of good news: FFmpeg decoding of non-MPEG video streams was taking 1MB of memory for a libmpeg2 buffer that was never going to be used. I’ve now fixed this so that the structure is only allocated and initialised when needed, so decoding will take 1MB less memory than before, in the next xine-lib release.

Unfortunately, I’ve found similar design mistakes in other structures, most of which are public and thus part of the libxine ABI, so I can’t fix them in the 1.1 series, not unless there are good reasons and good results to achieve by breaking it. But next week we’re going to move to Mercurial, thanks to Darren Salt and Reinhard Tartler, who are helping with the migration (for those wondering about hosting, it will likely be on Alioth, if they accept us), so I can branch out and fix the stuff, and then merge back into either 1.1 or 1.2 as we feel needed.

One of the structures I will surely be refactoring is the video overlay structure, which has a size of over 4MiB as it is; that explains why the function to initialise video overlay consumes 5MiB of memory right after xine initialisation, even when playing a sound file. By instantiating the structures only when really needed, and making sure that there aren’t holes around, it should be possible to drastically reduce the memory used up by xine.

Another thing on my TODO list is, as I said already, rewriting the plugin cache code. I will also try to provide a simple way to regenerate this cache globally, so that, for instance, it can be installed directly by the ebuild, without asking users to regenerate the cache by themselves, and shared in memory (through mmap) between users too.

To help with this, I’ll also look at changing the way the plugins are handled, by using inline arrays where possible for the names and descriptions of the plugins, rather than pointers, allowing the structures to be shared in memory, where this does not waste too many bytes.

Anyway, I still need to relax a bit more because I can’t really rest lately, and I do need some rest if I am to carry on.

Huge structures

Today I was finally able to talk with Vir (Matthias Kretz, the Phonon guy), and I got partially good news, but also pretty bad news.

The good thing is that there will be a notification daemon in KDE4 that will use Phonon, so that not every application that requires notifications will have to load libxine (and all its damned plugins); it might still be loaded on the fly by a file dialog to preview a multimedia file, though, which means it’s still going to be used by more than just a couple of applications at a time.

So the result is that we need to branch xine-lib to make more consistent changes to the codebase for a 1.2 series, like a shared cache for the plugins, created (if at all possible) during the make install phase, as well as caching of configuration options. A massive amount of work has to go in that direction, so we can make more data in the plugins constant, reducing the chance of dirtying resident pages, and so reducing xine-lib’s overall memory occupation.

Other changes include the ability to choose the demuxer through a shared MIME type, rather than by extension, which would save us from loading all the demuxers one by one when passing through them, and a way for a plugin to tell the library whether it supports ways to check what it can demux other than by content (some plugins fall back to by-content detection when they don’t recognise an extension). Plus there is the always-improvable support for seeking. Right now there is only one choice: an input plugin either can seek or it cannot, simple as that, while there can be different grades of seeking: no seeking at all (sequential), slow seeking (you can seek, but you shouldn’t just jump around to find what the format is; an example of this is HTTP streaming), usual seeking (which is what most plugins already implement), and time seeking (MMS and RTSP support this).

RTSP support is also quite lacking, and it could certainly be improved by using libnemesi (Luca knows I tried already, but as it is, the structure of xine is not good enough). Another external library we might as well use is libavformat from FFmpeg: right now we only support libavcodec and, to some extent, libpostproc from the FFmpeg project, while we could easily use libavformat as an extra demuxer plugin, as it can certainly support a lot of formats without us needing to take care of them one by one.

Unfortunately, I don’t think we’ll be able to provide much of this in a 1.2 series right away, especially since we don’t have many developers around at the moment, but one can hope.

Now, to return to the title of this post: I’ve decided to play a bit with pahole, a nice piece of software from Arnaldo Carvalho de Melo, a kernel developer. It analyses the DWARF data of binaries (be they final executables, libraries or kernels) to identify the structures used internally and their content, finding their size and the eventual padding holes whose removal would allow reducing that size.

Well, a simple run of it showed that there is at least one structure, video_overlay_object_t (which is also used in video_overlay_event_t), that takes more than 80KB of space. I’m not sure how many of these structures are loaded, but even a single one is a lot of space, and I’m quite sure most of it is wasted. As far as I can see, it is used for the overlaid buttons in DVD menus (and other kinds of menus, I suppose, although I don’t think we currently support Matroska’s menus; I’m not sure of that). Even if that is the case, I don’t often see DVD menus with 32 buttons, and as every button takes more than 2KB, it should probably be improved by using a button list rather than a static array, allocating the buttons only when effectively needed.

As I said, there is a lot of work to do, especially when it comes to memory, but hopefully the work will become easier as soon as we have a new DVCS. Of course, switching to a distributed model will require some adjustment time, as we all need to learn Mercurial (well, besides Darren, that is), and there will be a lot to do to get others involved; but then, for instance, the VDR patches from Reinhard Nissl would just become a different branch of xine-lib, which could easily be merged from time to time (today I merged two more patches from him, and I’m sure more will come in the future).

I suppose right now the main question is: where will we host the sources? I suppose the best thing would be a hosting provider on this side of the Atlantic, to avoid USA patents, but we haven’t taken a final decision yet; one idea was to ask on Alioth, but it’s all up in the air.

If the switch goes well, I could also see us moving away from SF.net’s tracker system, which is pretty much unusable, but there isn’t much choice about it: I don’t intend to use Mantis again in my life (I used it for NoX-Wizard, do you remember, Fabrizio?); Scarab looks interesting, but I’m not sure where we could find a hosting provider with JSP support for free (or cheap).

Oh well, time to go to sleep soon. If you’re interested in pahole, you can find an ebuild for it in my overlay (I know, gitarella is having trouble; I’m afraid the Ruby update broke both gitarella and ServoFlame, so my services are down until I find the time (and will) to fix them).