When I’ve read some rants about Firefox I thought they were a little bit too much. Now, I start to wonder if they were quite to the point instead. But before I start I have to say I haven’t tried contacting anybody yet, neither from the Gentoo Mozilla team not upstream. And I’m sure the Gentoo Mozilla team are doing their best to make sure that they can provide a working Firefox still following upstream guidelines on trademarks.
This actually sprouted from my previous work inspecting library paths; I went to check which libraries for firefox-bin were loaded from the system library directory, and noticed one curious thing:
/usr/lib/libsqlite3.so was being loaded. What’s the problem? The problem is that I knew that xulrunner (at least built from sources) bundles its own copy of SQLite3, so I wondered if they used the system copy for the binary package. Funnily enough, they really don’t:
yamato link-collisions # ldd /opt/firefox/firefox-bin | grep sqlite3
libsqlite3.so => /opt/firefox/libsqlite3.so (0xf67e7000)
libsqlite3.so.0 => /usr/lib/libsqlite3.so.0 (0xf621e000)
yamato link-collisions # lddtree.sh /opt/firefox/firefox-bin | grep sqlite3 -B1
libxul.so => /opt/firefox/libxul.so
libsqlite3.so => /opt/firefox/libsqlite3.so
libsoftokn3.so => /usr/lib/nss/libsoftokn3.so
libsqlite3.so.0 => /usr/lib/libsqlite3.so.0
lddtree.sh script comes from pax-utils and uses scanelf. I have a similar script in my Ruby-Elf suite implemented as a testcase, it produces the same results, basically.)
So the binary version of the package uses the system copy of NSS and thus loads the system copy of SQLite3. I haven’t gone as far as checking where the symbols were resolved, but one of the two is going to be loaded and unused, wasting memory (clean and dirty, for relocated data sections). Not nice, but one can say it’s the default binary, and has to know to adapt. In truth the problem here is that upstream didn’t use rpath, and thus the firefox-bin program does not load all its libraries from the
/opt/firefox directory (since the
/usr/lib/nss directory comes first). Had they built their binary with rpath set to
$ORIGIN it would have loaded everything from
/opt/firefox without caring about the system libraries, like it was intended to do. Interestingly enough, they do just that for Solaris, but not for Linux where they prefer fiddling with
Next, I checked the
/usr/bin/firefox started, which I already copied on the other post:
exec "/usr/lib64/mozilla-firefox"/firefox "$@"
Let’s ignore the problem with the rewriting of the environment variable, which I don’t care about right now, and check what it does. It adds the
/usr/lib64/mozilla-firefox directory to the list of paths to load libraries from. Since it’s setting LD_LIBRARY_PATH all the library resolutions will have to be done manually rather than using the
ld.so.cache file. So I checked which libraries it loads from there:
flame@yamato ~ % LD_LIBRARY_PATH=/usr/lib64/mozilla-firefox ldd /usr/lib64/mozilla-firefox/firefox | grep mozilla-firefox
flame@yamato ~ % scanelf -E ET_DYN /usr/lib64/mozilla-firefox
(The second commands finds all the libraries in the given path, by checking for ET_DYN, dynamic ELF, files.)
Okay so there is one library, but it’s not in the NEEDED lines of the firefox executable. Indeed that library is a preloadable library with a different
malloc() implementation (remember I’ve written about similar things and commented about FreeBSD solution), which means it has to be passed through
LD_PRELOAD to be useful, and I can’t see that to be used at all. Indeed, if I check the loaded libraries on my firefox process I can’t find it:
flame@yamato x86 % fgrep jemalloc /proc/`pidof firefox`/smaps
flame@yamato x86 %
Let’s go step by step though, for now we can say with enough safety that the loader is overwriting
LD_LIBRARY_PATH with no apparent good reason. Which libraries does the firefox executable load then?
flame@yamato ~ % LD_LIBRARY_PATH=/usr/lib64/mozilla-firefox ldd /usr/lib64/mozilla-firefox/firefox
linux-vdso.so.1 => (0x00007fffcabfd000)
libdl.so.2 => /lib/libdl.so.2 (0x00007fa5c2647000)
libstdc++.so.6 => /usr/lib/gcc/x86_64-pc-linux-gnu/4.3.3/libstdc++.so.6 (0x00007fa5c2338000)
libc.so.6 => /lib/libc.so.6 (0x00007fa5c1fc5000)
libm.so.6 => /lib/libm.so.6 (0x00007fa5c1d40000)
libgcc_s.so.1 => /lib/libgcc_s.so.1 (0x00007fa5c1b28000)
flame@yamato ~ % scanelf -n /usr/lib64/mozilla-firefox/firefox
TYPE NEEDED FILE
ET_EXEC libdl.so.2,libstdc++.so.6,libc.so.6 /usr/lib64/mozilla-firefox/firefox
It can’t be right, can it? We know that Firefox loads GTK+ and a bunch of other libraries, starting with xulrunner itself, but there is no link to those. But if you know your linker you should notice a funny thing:
libdl.so.2. It means the exeutable is calling into the loader at runtime, which usually means
dlopen() is used. Indeed it seems like the firefox executable loads the actual browser at runtime, as you can see by checking the smaps file.
Now there are two things to say here: there is a reason why firefox would be doing that, and the reason is that calling “firefox” with it open already should actually request a new window to be opened, rather than opening a new process. So basically I expect the executable to contain a launcher, that if a copy of firefox is running already just tells that to open a new window, and otherwise loads all the libraries and stuff. It’s a good idea, from one point of view because initialising all the graphical and rendering libraries just to tell another process to open a window would be a waste of resources. On the other hand,
dlopen() is not the best performing approach and also creates problem to prelink.
I have no idea why it happens, but the binary package as released by upstream provides a script that seems to be taking care of the launching, and then a
firefox-bin executable that doesn’t use
dlopen() to load the Gecko engine and all the graphical user interface. I would very much like to know why we don’t do the same for from-source builds, I would sincerely expect that the results would be even better when using prelink and similar.
Now, let’s return a moment to the problem of the SQLite3 loaded twice for the binary release of Firefox, surely the same wouldn’t happen for the from-source version, would it? Check it by yourself:
flame@yamato x86 % fgrep sqlite /proc/`pidof firefox`/smaps
7fea6c8c2000-7fea6c935000 r-xp 00000000 fd:08 701632 /usr/lib64/libsqlite3.so.0.8.6
7fea6c935000-7fea6cb35000 ---p 00073000 fd:08 701632 /usr/lib64/libsqlite3.so.0.8.6
7fea6cb35000-7fea6cb36000 r--p 00073000 fd:08 701632 /usr/lib64/libsqlite3.so.0.8.6
7fea6cb36000-7fea6cb38000 rw-p 00074000 fd:08 701632 /usr/lib64/libsqlite3.so.0.8.6
7fea814dc000-7fea8154f000 r-xp 00000000 fd:08 24920 /usr/lib64/xulrunner-1.9/libsqlite3.so
7fea8154f000-7fea8174f000 ---p 00073000 fd:08 24920 /usr/lib64/xulrunner-1.9/libsqlite3.so
7fea8174f000-7fea81751000 r--p 00073000 fd:08 24920 /usr/lib64/xulrunner-1.9/libsqlite3.so
7fea81751000-7fea81752000 rw-p 00075000 fd:08 24920 /usr/lib64/xulrunner-1.9/libsqlite3.so
Yes, yes it does happen. So I have a process that is loading one library for no good reason at all at runtime, and not a little one at that, when it could probably, at this point, use a single system SQLite library. I say that it could, because now I have enough evidence to support that: if the two libraries had a different ABI, depending on which one the symbols resolve to, either xulrunner or NSS would be crashing down. Since ELF uses a flat namespace, the same symbol name cannot be resolved in two different libraries, and thus one of the two libraries using them would find them in the “ẅrong” copy. And no, before you ask, neither use symbol versioning.
So at this point the question is: can both Firefox upstream and the Gentoo Firefox ebuild start providing something that does more than just working and actually works properly?