I’m continuing with my attempt to reduce the dirty RSS (resident memory) in FFmpeg’s libraries. After a few more patches for libpostproc, I’ve moved on to libavutil and libavcodec, trying to reduce the memory they use.
The basic problem with libavcodec is the need for huge tables to speed up decoding. These tables are used to look up particular bytes (or shorts), so that their conversion to another format need not be computed every time.
There are two kinds of tables usually used: prefilled (or precalculated) tables, which are constant and go into the .rodata section of executable files, and runtime-initialised tables. You usually have to choose one or the other, although for smaller tables precalculated is the win-win choice, as the code used to generate the table might be bigger than the table itself (this was the case for two tables in I-forgot-which xine plugin, which were runtime-initialised before I replaced them).
FFmpeg, since a few days ago, has an option to choose between runtime-initialised and hardcoded tables, although that’s just for CRC code. Myself, I choose hardcoded, and you’ll now see why.
Runtime-initialised and hardcoded tables have different pros and cons. So it’s nice, most of the time (that is, for all but the smallest tables), to be able to choose between the two.
The first difference between the two is the section they are mapped to in the resulting ELF file: pre-initialised tables are saved in .rodata, while runtime-initialised ones are either placed in .bss (which means they are mapped to a zeroed page, and COW’d as soon as the initialisation code is run), or allocated dynamically at initialisation.
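To make the distinction concrete, here is a minimal sketch of the same small lookup table written both ways. The macro `USE_HARDCODED_TABLES`, the table and the function names are all made up for illustration and are not FFmpeg’s actual code; where the data ends up (.rodata or .bss) is what a typical ELF toolchain does, not something the C language guarantees.

```c
#include <stdint.h>

/* USE_HARDCODED_TABLES is a hypothetical build switch for this sketch. */
#ifdef USE_HARDCODED_TABLES

/* const with an initialiser: typically emitted into .rodata, so the
 * pages are backed by the file on disk and shared between processes. */
static const uint8_t nibble_popcount[16] = {
    0, 1, 1, 2, 1, 2, 2, 3, 1, 2, 2, 3, 2, 3, 3, 4
};

static void init_nibble_popcount(void)
{
    /* Nothing to do: the table is already filled in the binary. */
}

#else

/* Writable and zero-initialised: typically emitted into .bss, so the
 * page holding it becomes private (dirty) on the first write. */
static uint8_t nibble_popcount[16];

static void init_nibble_popcount(void)
{
    /* Count the set bits of every 4-bit value. */
    for (int i = 0; i < 16; i++) {
        nibble_popcount[i] = (uint8_t)((i & 1) + ((i >> 1) & 1) +
                                       ((i >> 2) & 1) + ((i >> 3) & 1));
    }
}

#endif

/* The lookup code is identical whichever variant was built. */
static int popcount8(uint8_t byte)
{
    return nibble_popcount[byte & 0x0F] + nibble_popcount[byte >> 4];
}
```

Callers run `init_nibble_popcount()` once before the first lookup in either build. A table this small would simply be hardcoded in practice; the switch only pays off for the bigger ones.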
This means that runtime-initialised tables are private to the process: they end up as dirty RSS or as heap memory. Hardcoded tables, on the other hand, are mapped from .rodata straight out of the on-disk ELF file, and shared between processes.
The problem with hardcoded tables is that they need to be actually stored inside the ELF file. So if (this is a real example) you have two 16KiB tables, for which the initialisation code is about 500 bytes, with hardcoded tables you save the 500 bytes but add 32KiB to the size of the executable. As you can see, the executable will be smaller if you don’t use hardcoded tables.
The larger executable does not really change any cache-related parameter, as the tables are the same size either way, and it does not mean that the processes will use more memory. On the contrary, they’ll use less memory because they can share the pages. It might change the time needed to load the file from disk, but that’s a one-time cost versus a per-execution cost of the initialisation.
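The trade-off can be put into numbers with a rough sketch like the one below. The function names are made up for illustration, and the arithmetic ignores page rounding and section alignment, so treat the results as estimates only.

```c
#include <stddef.h>

#define KIB 1024u

/* Extra bytes in the file when the tables are hardcoded: the tables
 * are added, the code that used to generate them goes away. */
static long file_size_delta(size_t n_tables, size_t table_bytes,
                            size_t init_code_bytes)
{
    return (long)(n_tables * table_bytes) - (long)init_code_bytes;
}

/* Table memory that cannot be shared when every process fills in its
 * own copy at runtime. */
static size_t private_table_bytes(size_t n_procs, size_t n_tables,
                                  size_t table_bytes)
{
    return n_procs * n_tables * table_bytes;
}
```

For the two 16KiB tables with about 500 bytes of initialisation code, `file_size_delta(2, 16 * KIB, 500)` gives 32268 bytes of extra file size. With, say, three processes using the library (an arbitrary number picked for the example), `private_table_bytes(3, 2, 16 * KIB)` gives 98304 bytes of private memory for the runtime-initialised variant, against a single shared 32KiB mapping for the hardcoded one.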
On modern systems with big hard drives, it’s more than likely you wouldn’t see any difference in load time, and you most likely will not have trouble with the increased size. You might want to see the memory usage reduced though, especially for processes that might use xine somehow (think KDE4).
I’ve been writing a few patches to hardcode tables in FFmpeg today; hopefully I’ll be able to get them integrated soon. In my overlay I have a live SVN ebuild for FFmpeg, and it has a USE flag for the hardcoded tables. The size of the libraries will probably increase a lot, since I’ve seen quite a few tables in .bss that could be hardcoded.
My target at the moment is to reduce the 40KiB of dirty RSS I see for libavcodec.so in my Amarok process. Hopefully this will make FFmpeg’s shared library support more viable (considering that the runtime relocations due to PIC make FFmpeg quite a bit slower, not counting the fact that it’s missing one register compared to the non-PIC version).
KDE4 users would be happy for that 🙂
As for my personal life, today I have a very bad sore throat, but I received the smoking pipe I ordered, which is actually a nice way to relieve stress while thinking… no, I don’t smoke and I have no intention to; I just bite it and keep doing what I’m doing. With the complications I had during hospitalisation, I’ll never try to smoke at all.