This weekend I was supposed to take it slow, relax, and spend it rebuilding my strength, playing and stuff. In the end I wasn’t able to do much of that; instead I spent it looking at projects I had left dangling for a while, like my very own Ruby-Elf.
One beef I’ve always had with ELF analysis tools is with the size command, which is used to get some quick statistics about the internal size distribution of an ELF file. I was interested in it because it’s tightly related to what cowstats tells you already. Indeed, if you look at the following output, the data column is the total size of the copy-on-write sections for the file.
% size /usr/lib/libxml2.so
text data bss dec hex filename
1425219 35772 5368 1466359 165ff7 /usr/lib/libxml2.so
You’d expect me to be happy that there is already a tool that extracts that information out of compiled ELF files, but I really am not, because it usually does not provide enough granularity for what I’m after.
First of all, you have to learn that the three column names are not the actual names of the sections involved, even though .text, .data and .bss are three common ELF sections. The actual meanings are: for bss, the sum of the sizes of all the allocated sections that are mapped to the zero page (like .bss and .tbss, which take no space in the file and start out zero-filled); for data, the sum of the sizes of all the other allocated writable sections; and for text, the sum of the sizes of all the remaining allocated sections. This means that .rodata sections are counted in text rather than data, which can easily be confusing.
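To make the rule concrete, here is a minimal Python sketch of the grouping as I just described it. The names and sizes are the ones from the System V listing of libxml2 further down in this post; the flags are my assumption of the usual values for these sections, not something read from the file, and berkeley_totals is just an illustrative helper, not code taken from binutils.

```python
# (name, size, flags) for libxml2. Sizes come from the SysV listing in this
# post. Flags are ASSUMED typical values, using a made-up notation:
#   A = allocated, W = writable, Z = zero-filled (no contents in the file)
SECTIONS = [
    (".gnu.hash", 13012, "A"),
    (".dynsym", 43464, "A"),
    (".dynstr", 35235, "A"),
    (".gnu.version", 3622, "A"),
    (".gnu.version_r", 128, "A"),
    (".rela.dyn", 76872, "A"),
    (".rela.plt", 3216, "A"),
    (".init", 24, "A"),
    (".plt", 2160, "A"),
    (".text", 1011448, "A"),
    (".fini", 14, "A"),
    (".rodata", 132592, "A"),
    (".eh_frame_hdr", 19508, "A"),
    (".eh_frame", 83924, "A"),
    (".ctors", 16, "AW"),
    (".dtors", 16, "AW"),
    (".jcr", 8, "AW"),
    (".data.rel.ro", 28056, "AW"),
    (".dynamic", 448, "AW"),
    (".got", 720, "AW"),
    (".got.plt", 1096, "AW"),
    (".data", 5412, "AW"),
    (".bss", 5368, "AWZ"),
    (".gnu_debuglink", 28, ""),  # not allocated, so it is not counted
]


def berkeley_bucket(flags):
    """Return the Berkeley column for a section, or None if it is skipped."""
    if "A" not in flags:
        return None      # unallocated sections are ignored
    if "Z" in flags:
        return "bss"     # allocated, but zero-filled at load time
    if "W" in flags:
        return "data"    # allocated, writable, with contents in the file
    return "text"        # every other allocated section, .rodata included


def berkeley_totals(sections):
    """Sum section sizes into the three Berkeley columns."""
    totals = {"text": 0, "data": 0, "bss": 0}
    for _name, size, flags in sections:
        bucket = berkeley_bucket(flags)
        if bucket is not None:
            totals[bucket] += size
    return totals


totals = berkeley_totals(SECTIONS)
print(totals)  # {'text': 1425219, 'data': 35772, 'bss': 5368}
```

With those assumed flags the three sums match the text, data and bss columns that size printed for libxml2 above, and .rodata visibly ends up inside text.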
I also blame this confusion on the size(1) man page that comes with binutils, since it contains this:
Here is an example of the Berkeley (default) format of output from size:
$ size --format=Berkeley ranlib size
text data bss dec hex filename
294880 81920 11592 388392 5ed28 ranlib
294880 81920 11888 388688 5ee50 size
This is the same data, but displayed closer to System V conventions:
$ size --format=SysV ranlib size
ranlib :
section size addr
.text 294880 8192
.data 81920 303104
.bss 11592 385024
Total 388392
size :
section size addr
.text 294880 8192
.data 81920 303104
.bss 11888 385024
Total 388688
This would make you expect that the three columns map directly to the three named sections. Instead, if you actually use the System V format for size on libxml2, you get a very different result:
% size --format=SysV /usr/lib/libxml2.so
/usr/lib/libxml2.so :
section size addr
.gnu.hash 13012 456
.dynsym 43464 13472
.dynstr 35235 56936
.gnu.version 3622 92172
.gnu.version_r 128 95800
.rela.dyn 76872 95928
.rela.plt 3216 172800
.init 24 176016
.plt 2160 176040
.text 1011448 178208
.fini 14 1189656
.rodata 132592 1189696
.eh_frame_hdr 19508 1322288
.eh_frame 83924 1341800
.ctors 16 3526016
.dtors 16 3526032
.jcr 8 3526048
.data.rel.ro 28056 3526080
.dynamic 448 3554136
.got 720 3554584
.got.plt 1096 3555304
.data 5412 3556416
.bss 5368 3561856
.gnu_debuglink 28 0
Total 1466387
It counts all the sections one by one rather than grouping them. I’m still not sure how it decides whether or not to print a given section; I’d have to look at the actual code from binutils to say. It’s not counting just the allocated sections (the ones that count toward the vmsize once the file is loaded to spawn a process), otherwise .gnu_debuglink wouldn’t be in the list, but it’s also not showing all of the sections, since .shstrtab is not there at all. Interestingly enough, if you use the eu-size program that comes with Elfutils rather than the version coming from GNU Binutils, you’ll see that .gnu_debuglink is not in the list, which suggests a bug in Binutils.
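The effect of that extra section shows up in the totals as well. A quick check in Python, using only the figures from the two libxml2 outputs above (the variable names are mine):

```python
# Figures copied from the two size outputs for libxml2 in this post.
text, data, bss = 1425219, 35772, 5368   # Berkeley columns
sysv_total = 1466387                      # "Total" line of the SysV output
gnu_debuglink = 28                        # the listed section that is not allocated

berkeley_dec = text + data + bss
print(berkeley_dec, format(berkeley_dec, "x"))  # 1466359 165ff7
print(sysv_total - berkeley_dec)                # 28
```

The two totals differ by exactly the 28 bytes of .gnu_debuglink, so the same tool gives two different answers for the same file depending on the format you ask for.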
But it’s not just this inconsistency in naming that bothers me; it’s more that the output is almost entirely pointless for making actual guesses about the behaviour of a program, although that’s the most common use of size. The problem is that you cannot distinguish between an increase in the size of the code and an increase in the size of the constant tables it uses. Nor does it let you easily tell whether the amount of data is mitigated by prelink.
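To give an idea of how much is hidden, here is a rough Python sketch that splits the text column of libxml2 back into code, constants and everything else. The sizes are the ones from the System V listing above; which sections count as code and which as constants is my own assumption for the sake of illustration, and the three category names are not meant to be the headers of any existing tool.

```python
# The sections that the Berkeley format lumps into "text" for libxml2,
# with sizes copied from the System V listing in this post.
TEXT_SECTIONS = {
    ".gnu.hash": 13012,
    ".dynsym": 43464,
    ".dynstr": 35235,
    ".gnu.version": 3622,
    ".gnu.version_r": 128,
    ".rela.dyn": 76872,
    ".rela.plt": 3216,
    ".init": 24,
    ".plt": 2160,
    ".text": 1011448,
    ".fini": 14,
    ".rodata": 132592,
    ".eh_frame_hdr": 19508,
    ".eh_frame": 83924,
}

# Assumption: these are the executable sections, and .rodata holds the
# constant tables. Everything else is symbol, relocation or unwind data.
CODE = {".init", ".plt", ".text", ".fini"}
CONSTANTS = {".rodata"}


def split_text(sections):
    """Break the Berkeley 'text' figure into three hypothetical categories."""
    breakdown = {"code": 0, "constants": 0, "other": 0}
    for name, size in sections.items():
        if name in CODE:
            breakdown["code"] += size
        elif name in CONSTANTS:
            breakdown["constants"] += size
        else:
            breakdown["other"] += size
    return breakdown


breakdown = split_text(TEXT_SECTIONS)
print(breakdown)
# {'code': 1013646, 'constants': 132592, 'other': 278981}
```

Under those assumptions, roughly 400 KiB of the 1425219 bytes reported as text is not code at all, and a change in any of the three parts would look the same in the Berkeley output.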
The SysV-style output is even more useless, since if you want the full data of the sections you can just use the readelf -S command, which reports among other things the size of each section and the address it is loaded at, which is just what the SysV style does. The only differences are that with readelf you cannot change the radix and that it does not give you a total, but even then I don’t see the point. By the way, if you think that readelf -S is too difficult to read, try eu-readelf -S; it’s nicer.
So at the end of the day, what am I supposed to do? Well, at this point I’m writing my own script, rbelf-size, which shows a different set of headers and lets me actually distinguish between the different sections, so I can judge the changes in software and libraries across different versions and with different patches. I hope to have it available in Ruby-Elf’s repository by the end of next week; maybe it can be useful to others too.