A misc weekend

So today is the first week without Gentoo work for me. I admit I feel a bit disappointed, I used to do a lot of stuff and now that I can’t do them anymore (for my choice too), I feel like I’m useless. But I’m resisting the temptation to feel my overlay with a lot of ebuilds for the bumps that I would have done myself previously.

First of all, I wish to thank the anonymous “bla” who commented on my previous post about the lack of analysers for the usbmon output, suggesting me Wireshark. I really didn’t know they were working on supporting USB sniffing, and yes that is nice and stops me from deciding to write an analyser myself. Unfortunately, the current weekly snapshot of libpcap and the current subversion trunk of Wireshark does not get along well together: they simply don’t allow me to sniff at all, I get a series of errors instead.

Talking about the Chinese adapter, I found something on their site but I’m not really sure if it’s a specification or simply a description of the driver interface. Nonetheless, I’ve been able to identify a few patterns in the logs I was able to get with usbmon (analysed by hand), and I now know two major type of commands that are sent to the device: a pair of set and get commands used to set and get a 16 bit word out of at least three «registers»… I’m not sure if they are registers at all, they only are three 16-bit words which have another 16-bit word assigned to them, and which is set and get with some vendor-type requests. Unfortunately, I haven’t been able to get a hold of their actual meaning yet.

I’ve also found a sort of “commit” request, or a “get internal status” command, that tells me I haven’t initialised the device correctly yet: when handled by the Windows drivers, this request always return 0x9494, while here it returns values like 0xf4f4 or similar, never the actual 0x9494 word I’m expecting.

I didn’t work on Rust in the past few days but I have been thinking a lot about it, as David is being the first user of it, and submitted me a few reports of features currently missing, which I didn’t think of before, namely support for multiple constructors, and proper support for sub-classes. I haven’t worked on this yet, but I’ll do soon. He also provided me a patch that I merged, but I’ll have to write a series of testcase for that too, so that I can actually be sure that Rust continue behaving as needed.

Today I also decided to look a bit more on xine, as lately I’ve been neglecting it a lot. I’ve tried to cleanup my Wavpack demuxer a bit more, and I just finished a Valgrind (massif) run while playing two wavpack files one after the other. The results are somewhat interesting: xine does not leak memory, as the memory consumption remains constant between the two playbacks, but it also seems to allocate about ten megabytes of memory in _x_video_decode that are not used at all, as I’ve disabled the effect plugins, and so there should be no video at all. About seven more megabytes are wasted in the overlay handling, which also shows that the video output is not correctly disabled while playing audio tracks, with xine-ui at least.

I’ll be running a callgrind now also to test where time is actually lost, it’s good now that I have KCacheGrind displaying properly the callgraph, thanks to Albert Astals Cid (TSDgeos from KDE). One thing that I’m afraid of is that xine does not really have good performances on lossless files because it uses too small buffers, at least by default; I have this doubt because while mp3s play mostly fine – with mad decoder, not FFmpeg’s – it skips a lot for both Flac and WavPack files. This is especially seen when you run it through Valgrind, as the code is obviously slowed down and shows the need for improvements.

If only there was support in ruby-prof for Valgrind-style output to analyse the output with KCacheGrind, I could be improving Rust too. But that will have to wait I’m afraid.

Oh well.

Finding the cause of xine slow-load

So, today I was talking with a friend of mine, Fabrizio Montesi, about the slow load of picoxine, even if it’s a minimal implementation. The cause was not in picoxine code (this time), but rather in xine-lib itself.

Tonight I decided to give a try to KCacheGrind to try finding the cause of this slow startup. Unfortunately, I also hit KDE BUG #124406 so I cannot actually see the interesting call graph :( But I can see the fsview to have a proportion of how the time is spent.

-This is- Here was a graph generated by kcachegrind (sorry for the dimensions, but it’s difficult to read it otherwise). The first big rectangle area, that is easily divided in two areas, is the proper xine_init() call. The middle area is the demuxer detection, while the are at the end is the actual decode job dome by FFmpeg (it’s an mp3 but not using mad).

It’s easy to see that most of the time is spent in do_lookup_x, that is a sub-function of dlopen(), it means that xine really spends a lot of time loading the plugins, in the binding stage (which means it really improved stuff when I’ve moved to use hidden visibility, because the amount of exported symbols decreased a lot, and that having a properly hidden FFmpeg will improve performance a lot again).

But there are more interesting and subtle facts to go along with this. Although xine_init actually loads all the plugins, they seem to be reloaded when they are successively tried for the demuxer, which is quite a waste of time. But not enough yet.

In the first big rectangle area you don’t just see the plugins load, but also the use of sscanf and string comparisons and other stuff. Looking down to what the code is doing there, you can see that is the loading and saving of the file ~/.xine/catalog.cache, that contains the data loaded from the plugins, probably created in theory to avoid loading all the plugins every time, and just loading them when needed (which means that demuxers will be always loaded to be tried, but this is fast because they usually don’t have much external libraries’ dependencies, and for the important plugins, the decoders, they would be loaded only when actually needed). But for what I can see, the plugins are always reloaded entirely, the cache file is loaded and then saved again, discarding its actual content or at least not caring too much about it.

So, what can I deduce from this graph? Well, at first, xine could really make use of the improvements in bindings performances, this means new hash tables and similar things that are going on in binutils and glibc, but there are improvements that can be applied inside xine-lib itself, although they do require a bit of work (and I might not be able to do them myself, especially because I’ll probably be signing for a job soon and the time I get out of that job will be spent taking care of my packages, following Gentoo so that I can fullfill my role as a councilour properly, and relaxing). One example could be changing the format of the plugins cache from plain text (that requires time to read and parse) to a binary cache thing like ld.so.cache, that can easily parsed without using string comparisons (that are not cheap). After this is fixed, xine should be changed so to reload only the plugins that actually changed their mtime from the last time the cache was generated; this way all the extra load time is skipped (unless you update, rebuild, or develop xine-lib you’ll almost always hit the plugins cache, rather than miss).
Finally, the demuxer choice should happen without re-loading all of the plugins, but just the demuxer ones.

Please note that fixing the plugins cache hit/miss problem will not only improve startup times, but also the memory usage of xine itself, as you wouldn’t need, for instance, the postprocess plugins loaded when you don’t have video output enabled (if the cache is done in such a way that does not try to load audio/video/post plugins until they are actually requested).

The remaining problem now is implementing these changes. As I said, it’s unlikely that I’ll be doing them, because to do this I’d have to sacrifice my remaining free time, and I’d rather avoid it. If somebody else is able to work on this it would be pleasant, or in case you can always try to lure me into doing this ;)

P.S.: Brian I hope this time the image is clear, I know you love profiling :)