For elves, size matters

This weekend I was supposed to take it slow, relax, and spend it rebuilding my strength, playing and stuff. In the end I wasn’t able to do much of that, and instead I spent it looking at projects I had left dangling for a while, like my very own Ruby-Elf.

One beef I have always had with ELF analysis tools is the size command, which is used to get some quick statistics about the internal size distribution of an ELF file. I was interested in it because it’s tightly related to what cowstats already tells you. Indeed, if you look at the following output, the data column is the total size of the Copy on Write sections of the file.

% size /usr/lib/libxml2.so              
   text	   data	    bss	    dec	    hex	filename
1425219	  35772	   5368	1466359	 165ff7	/usr/lib/libxml2.so

You’d expect me to be happy that there is already a tool that extracts that information out of compiled ELF files, but I really am not, because it does not provide enough granularity for what I usually need.

First of all, you have to learn that the three column names are not the names of the sections involved, even though .text, .data and .bss are three common ELF sections. The actual meanings are: for bss, the sum of the sizes of all the allocated sections that are mapped to the zero page (like .bss and .tbss), that is, the ones that take no space in the file and are zero-filled at load time; for data, the sum of the sizes of all the other allocated writeable sections; and for text, the sum of the sizes of all the remaining allocated sections. This means that .rodata sections are counted in text rather than data, which can easily be confusing.
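The grouping rule described above can be sketched in a few lines of Python. This is only an approximation of what size does (the real tool works on BFD section flags), and the Section tuple and the sample sizes are hypothetical stand-ins for real section headers; only the SHT_* and SHF_* values come from the ELF specification.

```python
from collections import namedtuple

# ELF constants, with the values defined by the ELF gABI (<elf.h>).
SHT_PROGBITS = 1
SHT_NOBITS = 8
SHF_WRITE = 0x1
SHF_ALLOC = 0x2
SHF_EXECINSTR = 0x4

# Hypothetical, simplified stand-in for a section header entry.
Section = namedtuple("Section", "name type flags size")


def berkeley_column(section):
    """Return 'text', 'data' or 'bss', or None for non-allocated sections."""
    if not section.flags & SHF_ALLOC:
        return None  # not loaded into memory, so not counted at all
    if section.type == SHT_NOBITS:
        return "bss"  # no file contents, zero-filled at load time
    if section.flags & SHF_WRITE:
        return "data"  # writeable and backed by file contents
    return "text"  # everything else, including .rodata


def berkeley_totals(sections):
    """Sum section sizes into the three Berkeley-style columns."""
    totals = {"text": 0, "data": 0, "bss": 0}
    for section in sections:
        column = berkeley_column(section)
        if column is not None:
            totals[column] += section.size
    return totals


# Made-up sizes, chosen only to show where each section ends up.
sample = [
    Section(".text", SHT_PROGBITS, SHF_ALLOC | SHF_EXECINSTR, 1000),
    Section(".rodata", SHT_PROGBITS, SHF_ALLOC, 200),
    Section(".data", SHT_PROGBITS, SHF_ALLOC | SHF_WRITE, 50),
    Section(".bss", SHT_NOBITS, SHF_ALLOC | SHF_WRITE, 30),
    Section(".comment", SHT_PROGBITS, 0, 10),
]

print(berkeley_totals(sample))  # {'text': 1200, 'data': 50, 'bss': 30}
```

Note how the 200 bytes of .rodata end up in text, and how the non-allocated .comment is not counted anywhere.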

I also blame this confusion on the size(1) man page that comes with binutils, since it contains this:

Here is an example of the Berkeley (default) format of output from size:

$ size --format=Berkeley ranlib size
text data bss dec hex filename
294880 81920 11592 388392 5ed28 ranlib
294880 81920 11888 388688 5ee50 size

This is the same data, but displayed closer to System V conventions:

$ size --format=SysV ranlib size
ranlib :
section size addr
.text 294880 8192
.data 81920 303104
.bss 11592 385024
Total 388392
size :
section size addr
.text 294880 8192
.data 81920 303104
.bss 11888 385024
Total 388688

This would make you expect that the three columns map directly to the three named sections. Instead, if you actually use the System V format for size on libxml2, you get a much different result:

% size --format=SysV /usr/lib/libxml2.so
/usr/lib/libxml2.so  :
section             size      addr
.gnu.hash          13012       456
.dynsym            43464     13472
.dynstr            35235     56936
.gnu.version        3622     92172
.gnu.version_r       128     95800
.rela.dyn          76872     95928
.rela.plt           3216    172800
.init                 24    176016
.plt                2160    176040
.text            1011448    178208
.fini                 14   1189656
.rodata           132592   1189696
.eh_frame_hdr      19508   1322288
.eh_frame          83924   1341800
.ctors                16   3526016
.dtors                16   3526032
.jcr                   8   3526048
.data.rel.ro       28056   3526080
.dynamic             448   3554136
.got                 720   3554584
.got.plt            1096   3555304
.data               5412   3556416
.bss                5368   3561856
.gnu_debuglink        28         0
Total            1466387

It counts all the sections one by one rather than grouping them. I’m still not sure how it decides whether to print a given section; I’d have to look at the actual binutils code to say. It’s not counting just the allocated sections (the ones that count towards the vmsize once the file is loaded to spawn a process), otherwise .gnu_debuglink wouldn’t be in the list; but it’s also not showing all of the sections, since .shstrtab is not there at all. Interestingly enough, if you use the eu-size program that comes with Elfutils rather than the version from GNU Binutils, .gnu_debuglink is not in the list, which suggests a bug in Binutils.
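As a sanity check, the sizes in the SysV listing can be regrouped by hand to recover the Berkeley numbers. The sketch below groups sections by name, which is an assumption on my part: size looks at section flags, which the listing does not show, so the two name lists are simply my reading of which sections are read-only and which are writeable.

```python
# Sizes copied from the SysV-format listing for libxml2 above.
sizes = {
    ".gnu.hash": 13012, ".dynsym": 43464, ".dynstr": 35235,
    ".gnu.version": 3622, ".gnu.version_r": 128, ".rela.dyn": 76872,
    ".rela.plt": 3216, ".init": 24, ".plt": 2160, ".text": 1011448,
    ".fini": 14, ".rodata": 132592, ".eh_frame_hdr": 19508,
    ".eh_frame": 83924, ".ctors": 16, ".dtors": 16, ".jcr": 8,
    ".data.rel.ro": 28056, ".dynamic": 448, ".got": 720,
    ".got.plt": 1096, ".data": 5412, ".bss": 5368, ".gnu_debuglink": 28,
}

# Assumed grouping: allocated read-only sections versus allocated
# writeable sections that have contents in the file.
text_names = [
    ".gnu.hash", ".dynsym", ".dynstr", ".gnu.version", ".gnu.version_r",
    ".rela.dyn", ".rela.plt", ".init", ".plt", ".text", ".fini",
    ".rodata", ".eh_frame_hdr", ".eh_frame",
]
data_names = [
    ".ctors", ".dtors", ".jcr", ".data.rel.ro", ".dynamic",
    ".got", ".got.plt", ".data",
]

text = sum(sizes[name] for name in text_names)
data = sum(sizes[name] for name in data_names)
bss = sizes[".bss"]
dec = text + data + bss

print(text, data, bss, dec)       # 1425219 35772 5368 1466359
print(sum(sizes.values()) - dec)  # 28
```

The first line matches the Berkeley output exactly, and the 28 bytes of difference between the two totals are the non-allocated .gnu_debuglink section, which only the SysV format counts.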

But it’s not just this inconsistency in naming that bothers me; it’s more that the output is almost entirely useless for making actual guesses about the behaviour of a program, although that’s the most common use of size. The problem is that you cannot distinguish between an increase in the size of the code and an increase in the size of the constant tables it uses. Nor does it let you easily tell whether the amount of data is mitigated by prelink.
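To make the problem concrete, here is a small sketch with two made-up builds of the same library. The section sizes are hypothetical, and split_text is just an illustrative helper, not something taken from size or from Ruby-Elf; it splits the read-only sections on the SHF_EXECINSTR flag.

```python
# ELF section flags, with the values defined by the ELF gABI.
SHF_ALLOC = 0x2
SHF_EXECINSTR = 0x4


def split_text(sections):
    """Split read-only allocated sections into code and constant data.

    sections is an iterable of (name, flags, size) tuples.
    """
    code = sum(size for _, flags, size in sections if flags & SHF_EXECINSTR)
    const = sum(size for _, flags, size in sections if not flags & SHF_EXECINSTR)
    return {"code": code, "const": const, "text": code + const}


# Hypothetical builds: the code grows by 200 bytes while the constant
# tables shrink by the same amount.
old_build = [
    (".text", SHF_ALLOC | SHF_EXECINSTR, 1000),
    (".rodata", SHF_ALLOC, 500),
]
new_build = [
    (".text", SHF_ALLOC | SHF_EXECINSTR, 1200),
    (".rodata", SHF_ALLOC, 300),
]

print(split_text(old_build))  # {'code': 1000, 'const': 500, 'text': 1500}
print(split_text(new_build))  # {'code': 1200, 'const': 300, 'text': 1500}
```

Both builds would show the same text value in the Berkeley output, even though the balance between code and constants changed noticeably.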

The SysV-style output is even more useless, since if you want the full data about the sections you can just use the readelf -S command, which reports, among other things, the size of each section and the address it is loaded at, which is just what the SysV style does. The only difference is that with readelf you cannot change the radix and you don’t get a total, but even then I don’t see the point. By the way, if you think that readelf -S is too difficult to read, try eu-readelf -S; it’s nicer.

So at the end of the day, what am I supposed to do? Well, at this point I’m writing my own script, rbelf-size, which shows a different set of headers, so that I can actually distinguish between the different sections and judge the changes in software and libraries across different versions and with different patches. I hope to have it available in Ruby-Elf’s repository by the end of next week; maybe it can be useful to others too.
