Finding the cause of xine slow-load

So, today I was talking with a friend of mine, Fabrizio Montesi, about how slowly picoxine loads, even though it’s a minimal implementation. The cause was not in the picoxine code (this time), but rather in xine-lib itself.

Tonight I decided to give KCacheGrind a try to find the cause of this slow startup. Unfortunately, I also hit KDE BUG #124406, so I cannot actually see the interesting call graph 🙁 But I can still use the fsview to get an idea of the proportions in which the time is spent.

Here was a graph generated by KCacheGrind (sorry for the dimensions, but it’s difficult to read otherwise). The first big rectangular area, which is easily divided into two parts, is the xine_init() call proper. The middle area is the demuxer detection, while the area at the end is the actual decoding job done by FFmpeg (it’s an mp3, but not decoded with mad).

It’s easy to see that most of the time is spent in do_lookup_x, a function called from within dlopen(). This means that xine really spends a lot of time loading the plugins, specifically in the symbol binding stage. It also means that moving to hidden visibility really did improve things, because the number of exported symbols decreased a lot, and that having a properly hidden FFmpeg will improve performance a lot again.

But there are more interesting and subtle facts to go along with this. Although xine_init() actually loads all the plugins, they seem to be loaded again when they are subsequently tried for the demuxer, which is quite a waste of time. And that’s not all.

In the first big rectangular area you don’t just see the plugins being loaded, but also the use of sscanf, string comparisons and other stuff. Looking at what the code is doing there, you can see that it is the loading and saving of the file ~/.xine/catalog.cache, which contains the data loaded from the plugins. It was probably created, in theory, to avoid loading all the plugins every time and to load them only when needed. Demuxers would still always be loaded to be tried, but that is fast because they usually don’t depend on many external libraries, while the important plugins, the decoders, would be loaded only when actually needed. But from what I can see, the plugins are always reloaded entirely: the cache file is loaded and then saved again, discarding its actual content or at least not caring too much about it.

So, what can I deduce from this graph? Well, first of all, xine could really make use of the improvements in binding performance, meaning the new hash tables and similar things that are going on in binutils and glibc. But there are also improvements that can be applied inside xine-lib itself, although they do require a bit of work (and I might not be able to do them myself, especially because I’ll probably be signing for a job soon, and the time I have left outside of that job will be spent taking care of my packages, following Gentoo so that I can fulfill my role as a councillor properly, and relaxing). One example could be changing the format of the plugins cache from plain text (which takes time to read and parse) to some kind of binary cache that can be easily parsed without using string comparisons (which are not cheap). Once this is fixed, xine should be changed so that it reloads only the plugins whose mtime changed since the last time the cache was generated; this way all the extra load time is skipped (unless you update, rebuild, or develop xine-lib, you’ll almost always hit the plugins cache rather than miss it).
Finally, the demuxer choice should happen without reloading all of the plugins, only the demuxer ones.
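
To make the cache idea a bit more concrete, here is a minimal sketch of a binary cache with an mtime check. It is written in Python rather than xine-lib’s C just to keep it short, and everything in it is an assumption for illustration: the record layout, the function names and the decision to also compare the file size are mine, not the actual catalog.cache format or any xine-lib API.

```python
import os
import struct

# Hypothetical fixed-size record header (NOT the real catalog.cache layout):
# mtime in nanoseconds (int64), file size (uint64), length of the path (uint16).
_HEADER = struct.Struct("<qQH")


def write_cache(cache_path, entries):
    """Write (plugin_path, mtime_ns, size) tuples as binary records."""
    with open(cache_path, "wb") as f:
        for plugin_path, mtime_ns, size in entries:
            raw = os.fsencode(plugin_path)
            f.write(_HEADER.pack(mtime_ns, size, len(raw)))
            f.write(raw)


def read_cache(cache_path):
    """Read the records back into {plugin_path: (mtime_ns, size)}.

    Parsing is just fixed-offset unpacking: no sscanf-like scanning
    and no string comparisons are needed.
    """
    with open(cache_path, "rb") as f:
        data = f.read()
    entries = {}
    offset = 0
    while offset < len(data):
        mtime_ns, size, name_len = _HEADER.unpack_from(data, offset)
        offset += _HEADER.size
        name = os.fsdecode(data[offset:offset + name_len])
        offset += name_len
        entries[name] = (mtime_ns, size)
    return entries


def stale_plugins(plugin_paths, cached):
    """Return only the plugins whose mtime or size differs from the cache."""
    stale = []
    for path in plugin_paths:
        st = os.stat(path)
        if cached.get(path) != (st.st_mtime_ns, st.st_size):
            stale.append(path)
    return stale
```

With something like this, a normal startup would stat() each plugin file and actually load only the ones returned by stale_plugins(), which should be an empty list unless xine-lib was updated or rebuilt.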

Please note that fixing the plugins cache hit/miss problem will not only improve startup times, but also the memory usage of xine itself, as you wouldn’t need, for instance, the postprocess plugins loaded when you don’t have video output enabled (if the cache is done in such a way that it does not try to load audio/video/post plugins until they are actually requested).
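
The "load only when requested" part could look roughly like the sketch below. Again this is Python and purely illustrative: the class, the plugin type names and the plugin names are made up, and the loader callable is just a stand-in for the expensive dlopen() step.

```python
class LazyPluginRegistry:
    """Keeps the catalog in memory, but loads plugins only on first use."""

    def __init__(self, catalog, loader):
        # catalog: {plugin_type: [plugin_name, ...]}, as read from the cache.
        # loader: callable doing the expensive load (stand-in for dlopen()).
        self._catalog = catalog
        self._loader = loader
        self._loaded = {}

    def get(self, plugin_type):
        """Return the handles for one plugin type, loading them if needed."""
        handles = []
        for name in self._catalog.get(plugin_type, []):
            if name not in self._loaded:
                self._loaded[name] = self._loader(name)
            handles.append(self._loaded[name])
        return handles

    def loaded_names(self):
        """Names of the plugins that have actually been loaded so far."""
        return sorted(self._loaded)
```

Asking only for the demuxers would then leave the post and decoder plugins untouched until something actually requests them, and asking twice would not load anything a second time.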

The remaining problem now is implementing these changes. As I said, it’s unlikely that I’ll be doing them, because to do this I’d have to sacrifice my remaining free time, and I’d rather avoid that. If somebody else is able to work on this, that would be great; otherwise, you can always try to lure me into doing it 😉

P.S.: Brian, I hope this time the image is clear; I know you love profiling 🙂