Cleaning up after yourself

I have noted in my post about debug information that in feng I’m using a debug codepath to help me reduce false positives in valgrind. When I wrote that I looked up an older post, that promised to explain but never explained. An year afterwards, I guess it’s time for me to explain, and later possibly document this on the lscube documentation that I’ve been trying to maintain to document the whole architecture.

The problem: valgrind is an important piece of equipment in the toolbox of a software developer; but as any other tool, it’s also a tool; in the sense that it’ll blindly report the facts, without caring about the intentions the person writing the code had at the time. Forgetting this leads to situation like Debian SA 1571 (the OpenSSL debacle), where an “unitialised value” warning was intended like the wrong thing, where it was actually pretty much intended. At any rate, the problem here is slightly different of course.

One of the most important reasons to be using valgrind is to find memory leak: memory areas that are allocated but never freed properly. This kind of errors can make software either unreliable or unusable in production, thus testing for memory leak for most seasoned developers is among the primary concerns. Unfortunately, as I said, valgrind doesn’t understand the intentions of the developers, and in this context, it cannot discern between memory that leaks (or rather, that is “still reachable” when the program terminates) and memory that is being used until the program stops. Indeed, since the kernel will free all the memory allocated to the process when it ends, it’s a common doing to simply leave it to the kernel to deallocate those structures that are important until the end of the program, such as configuration structures.

But the problem with leaving these structures around is that you either have to ignore the “still reachable” values (which might actually show some real leaks), or you receive a number of false positive introduced by this practice. To remove the false positive, is not too uncommon to free the remaining memory areas before exiting, something like this:

extern myconf *conf;

int main() {
  conf = loadmyconf();
  process();
  freemyconf(conf);
}

The problem with having code written this way is that even just the calls to free up the resources will cause some overhead, and especially for small fire-and-forget programs, those simple calls can become a nuisance. Depending on the kind of data structures to free, they can actually take quite a bit of time to orderly unwind it. A common alternative solution is to guard the resource-free calls with a debug conditional, of the kind I have written in the other post. Such a solution usually ends up being #ifndef NDEBUG, so that the same macro can get rid of the assertions and the resource-free calls.

This works out quite decently when you have a simple, top-down straight software, but it doesn’t work so well when you have a complex (or chaotic as you prefer) architecture like feng does. In feng, we have a number of structures that are only used by a range of functions, which are themselves constrained within a translation unit. They are, naturally, variables that you’d consider static to the unit (or even static to the function, from time to time, but that’s just a matter of visibility to the compiler, function or unit static does not change a thing). Unfortunately, to make them static to the unit you need an externally-visible function to properly free them up. While that is not excessively bad, it’s still going to require quite a bit of work to jump between the units, just to get some cleaner debug information.

My solution in feng is something I find much cleaner, even though I know some people might well disagree with me. To perform the orderly cleanup of the remaining data structures, rather than having uninit or finalize functions called at the end of main() (which will then require me to properly handle errors in sub-procedures so that they would end up calling the finalisation from main()!), I rely on the presence of the destructor attribute in the compiler. Actually, I check if the compiler supports this not-too-uncommon feature with autoconf, and if it does, and the user required a debug build, I enable the “cleanup destructors”.

Cleanup destructors are simple unit-static functions that are declared with the destructor attribute; the compiler will set them up to be called as part of the _fini code, when the process is cleared up, and that includes both orderly return from main() and exit() or abort(), which is just what I was looking for. Since the function is already within the translation unit, the variables don’t even need to be exported (and that helps the compiler, especially for the case when they are only used within a single function, or at least I sure hope so).

In one case the #ifdef conditional actually switches a variable from being stack-based to be static on the unit (which changes quite a bit the final source code of the project), since the reference to the head of the array for the listening socket is only needed when iterating through them to set them up, or when freeing them; if we don’t free them (non-debug build) we don’t even need to save it.

Anyway, where is the code? Here it is:

dnl for configure.ac

CC_ATTRIBUTE_DESTRUCTOR

AH_BOTTOM([#if !defined(NDEBUG) && defined(SUPPORT_ATTRIBUTE_DESTRUCTOR)
           # define CLEANUP_DESTRUCTOR __attribute__((__destructor__))
           #endif
          ])

(the CC_ATTRIBUTE_DESTRUCTOR macro is part of my personal series of additional macros to check compiler features, including attributes and flags).

And one example of code:

#ifdef CLEANUP_DESTRUCTOR
static void CLEANUP_DESTRUCTOR accesslog_uninit()
{
    size_t i;

    if ( feng_srv.config_storage )
        for(i = 0; i < feng_srv.config_context->used; i++)
            if ( !feng_srv.config_storage[i].access_log_syslog &&
                 feng_srv.config_storage[i].access_log_fp != NULL )
                fclose(feng_srv.config_storage[i].access_log_fp);
}
#endif

You can find the rest of the code over to the LScube GitHub repository — have fun!

(Val)grinding your code

This article was originally published on the Axant Technical Blog.

One of of the most exoteric tool when working on software projects that developers should be using more often than they do, is certainly Valgrind. It’s a multi-faced tool: timing profiler, memory profiler, race conditions detector, … it can do all this because it’s mainly a simulated CPU, which executes the code, keeping track of what happens, and replacing some basic functions like the allocation routines.

So this works pretty well, to a point; unfortunately it does not always work out that well because there are a few minor issues: it needs to know about all the instructions used in the programs (otherwise it raises SIGILL crashes), and right now it does not know about SSE4 on x86 and x86-64 processors. And it has to know about the places in the C library where it cannot correctly keep scores via a series of suppression files. Turns out it currently fails at properly knowing about the changes in glibc 2.10, which makes it pretty difficult to find the real issues right in this moment with the most bleeding edge tools available.

But beside these problems with the modern tools and hardware, there are other interesting edges; one of these, the one I’m going to write about today, relate to the way the memory checker tool identifies leaks in the software: all the memory areas that cannot be reached in some way at the end of the execution, are considered leaked. Unfortunately this often enough counts in the memory areas that are allocated at one point and would just be freed at the end of the process, which can be said not to be a leak at all.

All the memory areas that are only to be freed at the end of a process don’t really need to be freed at all: the operating system will take care of that; leaks are those where the memory is used up and never freed nor used during execution; especially bad are those leaks that happen inside either iterative cycles or upon similar inputs, because they build up.

Actually freeing up the resources that does not need to be freed up can increase the size of the executable section of a binary (executable or library) for no good reason during production usage of the software, it is thus often considered a bad idea. At the same time, freeing them up makes it easier to find the actual bad leaks in the software. My choice for this is to write the actual source code to do the freeing up, but only compile it when debug code is enabled, just like assertions and debug output.

This way, the production builds won’t have unneeded code, while the debug builds will give proper results out on tools like valgrind. But it doesn’t stop at freeing the resources at the end of the main() function, because sometimes you don’t preserve reference to the actual memory areas allocated during initialization (since they are not strictly needed), or you just keep them as static local variables, which are both reported as unreachable by the end of the process, and are not easy to free up. To work this out, in feng, I’ve been using a slightly more sophisticated solution, making use of a feature of recent GCC and ICC versions: destructor functions.

Destructor functions are called within the fini execution path, after the main() and before the process is unloaded and the resource freed automatically. Freeing the memory at that point is the ideal situation to make sure that all the resources are freed up properly. Of course, this reduces the cases when the code is enabled to having both debug system enabled, and a compiler that supports the destructor functions. But this is a broad enough definition to allow proper development. A simple check with autoconf and the thing is done.

By switching the local static variables with unit-static variables (which, once compiled, are basically the same thing), and adding a cleanup function, almost all memory can be freed without having an explicit deallocation function. There is still the matter of initializing these memory areas; in this case there are two options: either you just initialize them explicitly, or you initialize them the first time they are needed. Both these options allow for the size of the structure to be chosen at runtime (from a configuration file), for example for the number of configuration contexts to initialize. The third way, that is to initialize them with constructor functions, also works out well for fixed-size memory areas, but since I don’t really have a good reason for which they could be better than the alternative the to just call a function during initialization, I don’t have any reason to take them into consideration.

For dynamic initialization, the interface usually used is the “once” interface, which also allows for multithreaded applications to have run-once blocks of code without having to fiddle with mutexes and locks; this is implemented by the pthread_once_t handling functions from the POSIX thread interface, or GOnce as provided by glib. This allows delayed allocation and initialization of structures which is especially useful for parts of the code that are not used constantly but just in some specific cases.

All the code can be found in your local feng repository or the nearest git server!

Profiling Proficiency

This article was originally published on the Axant Technical Blog.

Trying to assess the impact of the Ragel state machine generator on software, I’ve been trying to come down with a quick way to seriously benchmark some simple testcases, making sure that the results are as objective as possible. Unfortunately, I’m not a profiling expert; to keep with the alliteration in the post’s title, I’m not proficient with profilers.

Profiling can be done either internally to the software (manually or with GCC’s help), externally with a profiling software, or in the kernel with a sampling profiler. Respectively, you’d be found using the rdtsc instruction (on x86/amd64), the gprof command, valgrind and one of either sysprof or oprofile. Turns out that almost all these options don’t give you anything useful in truth, and you have to decide whether to get theoretical data, or practical timing, and in both cases, you don’t have a very useful result.

Of that list of profiling software, the one that looked more promising was actually oProfile; especially more interesting since the AMD Family 10h (Barcelona) CPUs supposedly have a way to have accurate reporting of instruction executed, which should combine the precise timing reported by oProfile with the invariant execution profile that valgrind provides. Unfortunately oprofile’s documentation is quite lacking, and I could find nothing to get that “IBS” feature working.

Since oprofile is a sampling profiler, it has a few important points to be noted: it requires kernel support (which in my case required a kernel rebuild because I had profiling support disabled), it requires root to set up and start profiling, through a daemon process, by default it profiles everything running in the system, which on a busy system running a tinderbox might actually be too much. Support for oprofile in Gentoo also doesn’t seem to be perfect, for instance there is no standardised way to start/stop/restart the daemon, which might not sound that bad for most people, but is actually useful because sometimes you forget to have the daemon running and you try to start it time and time again and it doesn’t seem to work as you expect. Also, for some reason it stores the data in /var/lib instead of /var/cache; this wouldn’t be a problem if it wasn’t that if you don’t pay enough attention you can easily see your /var/lib filesystem filling up (remember: it runs at root so it bypasses the root-reserved space).

More worrisome, you’ll never get proper profiling on Gentoo yet, at least for AMD64 systems. The problem is that all sampling profilers (so the same holds true for sysprof too) require frame pointer information; the well-known -fomit-frame-pointer flag, that allows to save precious registers on x86, and used to break debug support, can become a problem, as Mart pointed out to me. The tricky issue is that since a few GCC and GDB versions, the frame pointer is no longer needed to be able to get complete backtraces of processes being debugged; this meant that, for instance on AMD64, the flag is now automatically enabled by -O2 and higher. On some architecture, this is still not a problem to sample-based profiling, but on AMD64 it is. Now, the testcases I had to profile are quite simple and minimal and only call into the C library (and that barely calls the kernel to read the input files), so I only needed to have the C library built with frame pointers to break down the functions; unfortunately this wasn’t as easy as I hoped.

The first problem is that the Gentoo ebuild for glibc does have a glibc-omitfp USE flag; but does not seem to explicitly disable frame pointer omitting, just explicitly enable it. Changing that, thus, didn’t help; adding -fno-omit-frame-pointer also does not work properly because flags are (somewhat obviously) filtered down; adding that flag to the whitelist in flag-o-matic does not help either. I have yet to see it working as it’s supposed to, so for now oprofile is just accumulating dust.

My second choice was to use gprof together with GCC’s profiling support, but that only seems to provide a breakdown in percentage of execution time, which is also not what I was aiming for. The final option was to fall back to valgrind, in particular the callgrind tool. Unfortunately the output of that tool is not human-readable, and you usually end up using software like KCacheGrind to be able to parse it; but since this system is KDE-free I couldn’t rely on that; and also KCacheGrind does not provide an easy way to extract data out of it to compile benchmark tables and tests.

On the other hand, the callgrind_annotate script does provide the needed information, although still in a not exactly software-parsable fashion; indeed to find the actual information I had to look at the main() calltrace and then select the function I’m interested in profiling, that provided me with the instruction count of execution. Unfortunately this does not really help me as much as I was hoping, since it tells me nothing on how much time will it take for the CPU to execute the instruction (which is related to the amount of cache and other stuff like that), but it’s at least something to start with.

I’ll be providing the complete script, benchmark and hopefully results once I can actually have a way to correlate them to real-life situations.

Cleaning up

Following the debug for yesterday’s problem I decided to spend some time to analyse the memory behaviour of feng with valgrind, and I noticed a few interesting things. The main one is that there are quite a few places where memory is allocated but is never ever freed, because it’s used from a given point till the end of the execution.

Since memory gets deallocated when the process exits; adding explicit code to free that type of resources is not necessary; on the other hand, it’s often suggested as one of the best practises in development since leaks are never a good thing, code may be reused in different situations where freeing the resource is actually needed, and most importantly: not freeing resources will sidetrack software like valgrind that is supposed to tell you where the leaks are, which either forces you to reduce the level of detail (like hiding all the still-reachable memory) or to cope with false positives.

With valgrind, the straightforward solution is to use a suppression file to hide what is known to be false positives; on the other hand, this solution is not extremely practical: it is valgrind specific, yet valgrind is not the only software doing this; it does not tell you if you were, for any reason, fiddling with pointers during the course of the program, and it separates the knowledge about the program from the program itself, which is almost never a good thing.

Now of course one could just always free all the resources; the overhead of doing that is likely minimal when it comes to complex software, because a few calls to the free() functions and friends are probably just a small percentage of the whole code of the software; but still if they are unnecessary there is no real reason to have them compiled in the final code. My approach is perhaps a bit more hacky, but I think it’s a nice enough approach: final memory cleanup is done under preprocessor conditional, checking for the NDEBUG preprocessor macro, the same macro that is used to disable assertions.

In a few places, though, I’ve been using the GOnce constructor (more or less equivalent to pthread_once) that allows me to execute initialisation when actually needed instead of having initialisation and cleanup functions to call from the main function. While this works pretty well, it makes it tricky to know when to actually free resources. Luckily most modern compilers support the destructor attribute, which basically is a way to say that a given function, that can very well be static, should be called before closing down. Relying on these destructors for actual program logic is not a very good idea because they are not really well supported, but for leak testing and debugging they can be quite useful instrumentation.

I’ll try to provide more detailed data once I make feng not report leaks any longer.

The frustration of debugging

I’m currently working on a replacement for bufferpool in lscube; the bufferpool library has provided up to now two different (quite different actually) functionalities to the lscube suite of software: it provided an IPC method by using shared memory buffers, and also a producer/consumer system for live or multicast streaming. Unfortunately, by the way it was developed in the first place, it is really fragile, and is probably one of the least understood parts of the code.

After trying to look to improve it I decided that it’s easier to discard it and replace it altogether. Like many other parts, it has been written by many people at different times, and this could be seen quite well. I think the one part that shows very well that it has been written with too little knowledge of programming details is that the actual payload area of the buffers is not only fixed-sized, but of a size (2000 bytes) that does not falls into the usual aligned data copy. Additionally, while the name sounded like it was quite generic, the implementation certainly wasn’t, since it kept some RTP-related details directly into the transparent object structures. Not good at all.

At any rate, I decided it was time to replace the thing and started looking into it, and designing a new interface. My idea was to build something generic enough to be reusable in other places, even in software completely different, but at the same time I didn’t feel like going all the way to follow the GObject module since that was way too much for what we needed. I started thinking about a design with one, then many, asynchronous queues, and decided to try that road. But since I like thinking of me as a decent developer, I started writing the documentation before the code. Writing down the documentation has actually shown me that my approach would not have worked well at all; after a few iterations over just the documentation of how the thing was supposed to work, I was able to get one setup that looked promising, and started implementing it.

Unfortunately, right after implementing it and replacing the old code with the new one, the thing wasn’t working; I’m still not sure now why it’s not working but I’ll go back to that in a moment. One other thing I would like to say though is that after writing the code, and deciding it might have been something I did overlook in the term of implementation, I simply had to look again at the documentation I wrote, as well as looking for “todo” markup inside the source code (thanks Doxygen!), to implement what I didn’t implement the first time around (but I decided was needed beforehand). So as a suggestion to everybody: keep documenting as you write code, is a good practice.

But, right now, I’m still unsure of what the problem is; it would be quite easy to just find it if I could watch at the code as it was executed, but it seems like the GNU debugger (gdb) is not willing to collaborate today. I start feng inside it, set a breakpoint on the consumer hook-up, and launch it, but when it actually stops, info threads shows me nothing, although at that point there are, for sure, at least three threads: the producer (the file parser), the consumer (the rtsp session) and the main loop. The funny thing is that the problem is certainly in my system, because the same exact source code does work fine for Luca. I’ll have to use his system to debug, or set up another system for pure debugging purposes.

This is the second big problem with gdb today, the first happened when I wanted to try gwibber (as provided by Jürgen); somehow it’s making the gnome-keyring-daemon crash, but if I try to hook gdb to that, and break it on the abort() call (the problem is likely an assertion that fails), it’s gdb itself that crashes, disallowing me from reading a backtrace.

I have to say, I’m not really happy with the debugging facilities on my system today, not at all. I could try valgrind, but last time I used it, it failed because my version of glib is using some SSE4 instruction it didn’t know about (for that reason I use a 9999 version of valgrind, and yet it doesn’t usually work either). I’m afraid at the end I’ll have to rely on adding debug output directly to the bufferqueue code and hope to catch what the problem is.

Current mood: frustrated.

Even little differences count

If you ever find yourself relying for any reason on the behaviour of Microsoft Excel, make sure you never ever change its version. Seems like I had some problems with my job because the version of Microsoft Excel I bought is not the same as the one they are using at the office.

Also, if you want to shut up Valgrind’s warnings to make life easier for people developing against a given library, it’s time you hear about suppression files, rather than trying to change code you don’t know enough about.

Seems like a little difference in the patches applied by Debian (and Ubuntu) made OpenSSL vulnerable like it never was. Even my keys has to be considered compromised, so I had to change them on every service I use. This means Alioth, Gentoo’s Infra, GentooExperimental, SourceForge, Berlios, Savannah, Rubyforge, Launchpad (don’t ask), KDE, and of course my own server.

I think yesterday will be remembered for the years to come as a critical day. It’s like a city of the size of Milan starting to change the door locks for all the apartments at once. I’m surprised there hasn’t been more confusion.

Myself, I’m starting to consider some other kind of authentication, as the passphrases I use for the keys are quite long, and I’m tired of having to key it in every time I start the system. SmartCard and authentication tokens start to look quite nice, and it would be a nice test for Gentoo’s PAM infrastructure.

It’s unfortunate that Alon, who as far as I remembered managed these things, left Gentoo yesterday :/ Although I suppose that if I am to start looking into these things I might be able to lend a hand.

At the moment, I’m oriented toward an authentication token, as that would make it possible for me to have it around on the laptop too at any time. Strangely enough, I found an Italian producer, Eutronsec, and their CryptoIdentity token seems to be supported by pcsclite.

SmartCards are more likely useful in an organisation where you’d have one reader on a given system and many many cards around. Here I’d just have one reader and one smartcard if I did go that way. If the Italian ID card was a SmartCard I could have chosen that, but at the moment it doesn’t look like a convenient idea.

Unfortunately it seems like Eutronsec only sells to other companies, I’ve mailed them asking if they could sell to single entities, but I haven’t received an answer yet. I’m afraid I’ll have to start looking for alternatives if I want to proceed on this road.

On a totally different topic, but still related to “little differences”, I’m having difficulties finding cups that I can use with my espresso machine. Usual teacups that my family uses for coffee (yeah they are quite larger for coffee, we’re used to those though, a six servings Moka is good for three here) tend to be too thin at the bottom and large at the top, so the coffee spills a lot, especially when doing a hot boiling espresso. Mugs are quite better for the task, but all the ones I have here are little less than 9.5 centimetres (little more than three and a half inches), and require to be skewed to enter the espresso machine.

The idea would be an 8.5cm cup, but I can’t seem to find them in the shops around here, they are either 6 centimetres, which is way too short, or they are the “usual” mug size. In a Chinese shop around here I found some nice cups that are about 12cm actually, I was tempted to take one to hold the milk for cappuccino, actually, but they are quite too high for the espresso machine.

Now I know why there aren’t many valgrind frontends

valgrind has a very nice way to output information in parseable form for the memcheck (the primary) tool: XML.

You don’t really need a custom parser, you just need to read the correct elements and attributes, and you’re done. The format seems actually to provide enough extensibility so that it can be used for every tool, and so I expected.

Starting from this I decided to start working on a new valgrind frontend, in Ruby. To begin with, I started writing the abstraction, and a test suite so that I could make sure what I wrote was what I was expecting to get.

Unfortunately, when I decided to start adding the support for a second tool, to see that my code was doing the right thing with different tools, I discovered that helgrind does not output XML. Nor does massif.

So it seems that to write a frontend to something more than just memcheck (which is what Valkyrie supported), you’d have to parse the plaintext output, which is quite more difficult.

I suppose this is why Valkyrie stopped at supporting memcheck..

At this point, I suppose that to write an Helgrind frontend I’d have to hack at valgrind itself to add XML output to the tool. I think I’ll try that soon enough.

Frontends and scripting languages

So today I updated my system and seen that the new nmap version features a totally new frontend, written in Python with pygtk. While this bothres me a bit because I had to install the pygtk package with its siblings, I actually find this an interesting and useful change.

I say this because, well, frontends don’t need to be fast or to have a tight memory usage pattern; most of the times, frontends only have to take care of user interaction, and the user interaction is neither cpu, nor memory, nor I/O bound, it’s user bound.

Scripting languages don’t usually suffer from architecture problems, like endianness of variables, or missing definitions or include files moved around; well, the base interpreter does suffer from these problems, but it’s one package instead of every package.

Also, scripting languages usually makes it easier for users to change the frontends if they need, which is certainly an advantage for the part of the program – the frontend, as said – that faces the user.

This made me think about writing a new Valgrind frontend; my language of choice would certainly be Ruby, the problem is: which GUI libraries would I be using?

An year ago I was dead set on Korundum, KDE’s Ruby bindings, but I don’t think this is feasible now. The reason is of course tightly related to the KDE4 development: writing on Korundum makes little sense, considering that the focus is now KDE4, but KDE4 itself is too young to write the frontend using that. Plus there is one big problem with QtRuby: the Qt3 and Qt4 Ruby bindings install the same shared object, with different APIs, making a Qt3 Ruby program incompatible with a system where Qt4 Ruby is installed. And writing the frontend in pure Qt4… I’m not sure about that.

The alternative is to use ruby-gtk2. Even if I don’t like the GTK C interface myself, the Ruby version seems nicer to me. And with Nimbus I can finally stand GTK2 interfaces. I should probably try this out, it might be good enough to go.

Actually, with little work it’s likely that I could write a pure-Ruby abstraction for a valgrind frontend, and then two interfaces, one in GTK2 and one in Qt4. This is another nice part of scripting languages: you usually have the basic set of objects not being part of the libraries used (QString, GString, …) and you can more easily share code between Qt, GTK, NCurses, whatever you want.

So, anyway.. I need something to keep my mind busy tonight, as I’m having some real-life trouble (no, not health trouble, just… love trouble :/), so I’ll probably start tonight to work on something.

Valgrind 3.3.0: good and bad sides

So, as Albert already blogged Valgrind 3.3.0 was released, with a new Helgrind (mutex analyser), a new massif and some new tools too.

So it’s great, having a new version of such a powerful debugging and analysis tool, is quite nice for developers; it’s also one of my main tools for xine-lib’s testing.

There are, though, some bad news that go with the current release though.

The first problem is that the newly rewritten massif now only outputs a way different information output through a ms_print script; there isn’t a graph image anymore, the only graph is in ASCII-art, and I really fail to interpret it. I hope that in the next weeks someone (maybe me) will write a script that produces the nice graph images for the new massif too.

The second problem is that helgrind doesn’t seem to work properly with xine-lib at least, as it’s stuck after not much time from the start; I’ll have to look deeply into it.

The third, and hopefully last problem, is that valkyrie is not updated since August 2006, so it’s practically impossible that it supports the new options provided by the new memcheck too, while malloc-fill is something I’d very much like to use to check xine-lib.

I’d say we need better valgrind frontends! :)

The other big memory waste

Tonight I finally started the true 1.2 series branch on xine-lib’s repository; this branch is the merge of the nopadding and the ffmpeg_integration branches, which means it starts with two big improvements: 10MB memory saving and a simpler to upgrade FFmpeg snapshot, with handling for disabling uncommon decoders, just like Miguel wanted.

Now, if you looked well enough at the graphs you posted, you already know what I’m talking about, if you didn’t, well here there’s a new graph, created using the -pq option at xine so that the logo is not displayed once the mp3 finished playing, removing the spurious spike at the end of the graph.

29-11-06_1750

The yellow are you see it’s the memory created by _x_fifo_buffer_new called by _x_video_decoder_init function. That’s right, there’s a huge amount of memory allocated by the video decoder initialisation function when playing an audio only mp3.

Okay, so what the problem is here? First, as the code cannot tell if what is playing is audio only or contains video too beforehand, it has to initialise the video driver every time too; this is quite normal though, nothing to be a problem.

The problem is that the video initialisation initialise automatically also a number of buffers, by default 500, of size 8KB.. even if they never gets used. It also allocates a single memory area for the buffers, 2KB-aligned, which is not exactly the cheapest way to allocate memory.

Now, this in general should be okay, shouldn’t it? Most of the times you’re playing video with xine, and so it should be fine, but there are cases when using this method will just waste a lot of memory, especially if the user tried to be smart and increased the number of buffers to something that isn’t exactly sane…

I’ll now be working on checking if it’s possible to just allocate the buffers when they are needed; in that case the graph will certainly appear different, as the memory will increase less steeply, and would just run down together at the end.

Let’s return to work (although I should be finding a way to clean up the mess that is my home office here)…