Serving WebP images

A few months ago I tried experimenting with WebP to reduce the traffic on my blog without losing quality. In the end the results were negative, and I decided instead to drop the backgrounds on my blog and replace them with CSS gradients, which was not a lossless change, but was definitely easier to load.

After the VideoLAN Dev Days 2013 (of which I still have to write a report soon), I went to speak with Pascal again, and he told me that the new version of Chrome finally fixed its HTTP Accept header, so that it will finally prefer WebP to other image formats when present. I confirmed this, as Chrome 30 reports Accept: image/webp,*/*;q=0.8. The q= parameter is actually not needed for Apache, but it’s a good idea to have it there anyway.

Thanks to this change, and to mod_negotiation’s MultiViews, it’s possible to auto-select the format (JPEG, PNG or WebP) for Chrome users. Indeed, if you’re visiting my blog with Chrome 30 (not sure about 29), you’re going to be served mostly WebP images (the CC license logos are still provided as PNG, because the lossless WebP compression came out worse, and the lossy one was not saving enough bytes to be worth it).
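A quick way to check what the negotiation actually does, if you want to reproduce this, is to request the extension-less resource with different Accept headers and look at the returned Content-Type; the host and path below are obviously just placeholders for a real negotiated resource on your own server:

# assumes "cover" exists on disk as cover.jpeg, cover.png and cover.webp,
# in a directory where MultiViews is enabled
curl -sI -H 'Accept: image/webp,*/*;q=0.8' http://www.example.org/images/cover | grep -i '^Content-Type'
curl -sI -H 'Accept: */*' http://www.example.org/images/cover | grep -i '^Content-Type'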

I started working on enabling this while waiting at Terminal 1 at CDG airport (this is the last time I’m flying Aer Lingus to Paris), and I was able to finish the thing before my flight boarded. What I realized just before that, though, is that Apache would still prefer serving WebP by default; I’d venture a guess that it’s because the files are smaller. This is okay for Opera, Firefox and (obviously) Chrome, but not for Safari or IE.

Of course if the other browsers were to actually report the formats they support, everything would be fine, but that’s not the case. In particular, Firefox actually prefers image/png to anything else (Apache assigns a low implicit q= value to glob requests just to be on the safe side, which is why I said earlier that the q= parameter is not needed for it), so that even if I don’t make any more changes, Firefox will still prefer PNG to WebP (but it expresses no preference between JPEG and WebP, so there it’s entirely up to the server’s own preference).

So how to provide WebP without breaking other browsers? One solution would be to use PageSpeed to compress the images on the fly to WebP when requested by a compatible browser, but that is a bit of overkill, and is hard to package right, and, most importantly, requires browser detection logic on the server, which is not very safe.

In the end I decided to go with a safer option: provide WebP only to Chrome users, and not to users of other browsers, at least until they decide to fix their Accept headers. But how to do that? Well, I had to check Apache’s source code, because the documentation does not explain this clearly and explicitly: to decide which format to serve, Apache multiplies the q= parameter coming from the browser, or its implicit value (image/* and */* get a default of less than 0.1), by the qs= parameter passed when declaring the type:

AddType image/jpeg .jpeg .jpg .jpe
AddType image/png .png
AddType image/webp;qs=0.9 .webp

By using the value 0.9 for WebP, and leaving the default 1 for the other formats, I’m basically telling Apache that, all things being equal (for instance when the browser sends Accept: */*, Internet Explorer style), I prefer to serve PNG or JPEG to the users rather than WebP. It will also prefer to serve JPEG to Firefox (which matches it only through image/*). Chrome 30, on the other hand, explicitly prefers WebP over any other image format, so Apache will calculate the preference as 1.0*0.9 for WebP and 0.8*1.0 for PNG and JPEG. I have not checked what Opera does, but it looks like all the browsers on my cellphone support WebP without preferring it, so they won’t be served it either.

So right now WebP images on my blog are exclusive to Chrome users; the win is decent, halving the size of the Autotools Mythbuster cover on the right and shaving off a few bytes from the top image for the links. There are definitely more interesting ways to save bandwidth by re-compressing the images I used around the blog (many of which, re-compressed, end up taking half the space), but that will have to wait until I fix this bloody Typo, as right now editing posts is failing.

Another thing I will have to work on is a tool to handle the re-compression. Right now I’m doing it by hand, which is both a waste of time and prone to errors. I’ll have to come up with a good way to quickly describe images so that a tool can re-compress them and evaluate whether to keep the WebP version or not, and at the same time I need to find a way to store the originals at the highest quality. But that’s a topic for a different time.
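The core of such a tool would probably be little more than the following sketch, assuming cwebp from libwebp is available; the script name and quality setting are just placeholders of mine, and it keeps the WebP file only if it actually saves space:

#!/bin/sh
# webpify.sh (hypothetical): re-compress one image, keep the WebP only if smaller
src="$1"
out="${src%.*}.webp"

cwebp -quiet -q 85 "$src" -o "$out" || exit 1

if [ "$(stat -c %s "$out")" -lt "$(stat -c %s "$src")" ]; then
    echo "keeping $out"
else
    echo "$out is not smaller, dropping it"
    rm -f "$out"
fi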

The WebP experiment

You might have noticed over the last few days that my blog underwent some surgery, and that even now, on some browsers, the home page does not really look all that good. In particular, I’ve removed all but one of the background images and replaced them with CSS3 linear gradients. Users browsing the site with the latest version of Chrome, or with Firefox, will have no problem and will see a “shinier” and faster website; others will see something “flatter”. I’m debating whether I want to provide them with a better-looking fallback or not; for now, not.

But this was also a plan B — the original plan I had in mind was to leverage HTTP content negotiation to provide WebP variants of the images on the website. This was a win-win situation because, ludicrous as it sounded when WebP was announced, it turns out that with its dual mode, lossy and lossless, it can in one case or the other outperform both PNG and JPEG without a substantial loss of quality. In particular, lossless works like a charm for “art” images, such as the CC logos or my diagrams, while lossy works great for logos, like the Autotools Mythbuster one you see on the sidebar, or the (previous) gradient images you’d see on backgrounds.

So my obvious instinct was to set up content negotiation — I’ve used it before for multiple-language websites, and I expected it to work for multiple media types as well, since that’s what it’s designed for… but after setting it all up, it turns out that most modern web browsers still do not support WebP *at all*… and those that do don’t handle content negotiation as intended. For this to work, either of two things would need to happen.

The first, best option would be for browsers to only Accept the image formats they support, or at least to prefer them — this is what Opera for Android does: Accept: text/html, application/xml;q=0.9, application/xhtml+xml, multipart/mixed, image/png, image/webp, image/jpeg, image/gif, image/x-xbitmap, */*;q=0.1 but it seems to be the only browser doing it properly. In particular, in this listing you’ll see that it supports PNG, WebP, JPEG, GIF and X bitmap — and then accepts whatever else with a lower preference. If WebP were not in the list, even if the server preferred it, it would not be sent to the client. Unfortunately, this is not going to work, as most browsers send Accept: */* without explicitly providing the list of supported image formats. This includes Safari, Chrome, and MSIE.

Point of interest: Firefox does explicitly prefer one image format over the others: PNG.

The other alternative is for the server to default to the “classic” image formats (PNG, JPEG, GIF) and expect the browsers that support WebP to prioritize it over the other image formats. Again, this is not the case: as shown above, Opera lists it but does not prioritize it, and again, Firefox prioritizes PNG over anything else and makes no special exception for WebP.

Issues are open with both Chrome and Mozilla to improve the support, but the changes haven’t reached mainstream releases yet. Google’s own suggested solution is to use mod_pagespeed instead — but this module – which I already named in passing in my post about unfriendly projects – is doing something else: it changes the provided content on the fly, based on the reported User-Agent.

Given that I’ve spent some time on user agents, I’d say I have enough experience to call this a huge Pandora’s box. If I already have trouble with some poorly-developed browsers reporting themselves as Chrome to fake their way into sites that check the user agent field in JavaScript, you can guess how many of those are going to actually support the features that PageSpeed thinks they support.

I’m going to come back to PageSpeed in another post; for now I’ll just say that WebP has the numbers to become the next-generation format out there, but unless browser developers, as well as web app developers, start to get their act together, we’re going to have hacks upon hacks for years to come… Currently, my blog is using a CSS3 feature with the standardized syntax — not all browsers understand it, and those will see a flat website without gradients; I don’t care, and I won’t start adding workarounds for that just because (although I might use SCSS, which would fix it for Safari)… newer browsers will fix the problem, so just upgrade, or use a sane browser.

User-Agent strings and entropy

It was 2008 when I first got the idea to filter User-Agents as an antispam measure. It worked for quite a while on its own, but recently my ruleset had to come up with more sophisticated fingerprinting to discover spammers. It still works better than a captcha, but it did worsen a bit.

One of the reasons why the User-Agent itself is not enough anymore is that my filtering has been hindered by a more important project: EFF’s Panopticlick has shown that the uniqueness of User-Agent strings is actually an easy way to track a specific user across requests. This became important enough that Mozilla standardized their User-Agents starting with Firefox 4, to reduce their size and thus their entropy. Among other things, the “trail” component has been fixed to 20100101 on the desktop, and to the same version as Firefox itself for the mobile version.

_Unfortunately, Mozilla lies on that page. Not only is the trail not fixed for Firefox Aurora (i.e. the alpha version), which meant that my first set of rules was refusing access to all the users of that version, but their own Lightning extension for SeaMonkey also appends itself to the User-Agent, when they said that wasn’t supported anymore._

A number of spambots seem to get this wrong, by the way. My guess is that they have some code that generates the User-Agent by assembling a bunch of fragments, and randomizes it, so you can’t just kick a particular agent. Damn smart, if you ask me; unfortunately, ModSecurity hashes its IP collection by remote address and user-agent, so if they cycle through different user agents, it’s harder for ModSecurity to understand that it’s actually the same IP address.

I do have some reservations about Mozilla’s handling of extension identification. First they say that extensions and plugins should not edit the agent string anymore – but Lightning does! – then they suggest that they can instead send an extra header to identify themselves. But that just means that fingerprinting systems only need to start counting those headers as well as the generic ones that Panopticlick already considers.

On the other hand, other browsers don’t seem to have gotten the memo yet — indeed, both Safari’s and Chrome’s strings are long and include a bunch of almost-independent version numbers (AppleWebKit, Chrome, Safari — and Mobile on the iOS versions). It gets worse on Android, as both the stock browser and Chrome provide a full build identifier, which is not only different from one device to the next, but also from one firmware to the next. Given that each mobile provider has its own builds, I would be very surprised if I could find two of my friends with the same identifier in their browsers. Firefox is a bit better on that front, but it sucks in other ways, so I’m not using it as my main browser there anymore.

Browsers on the Kindle Fire

A few days ago I talked about Puffin Browser, with the intent of discussing in more detail the browser situation on the Kindle Fire tablet I’m currently using.

You might remember that at the end of last year I decided to replace Amazon’s firmware with a CyanogenMod ROM, so as to get something useful out of it. Besides the lack of access to Google Play, one of the problems I had with Amazon’s original firmware was that the browser it comes with is flaky to the point of uselessness.

While Amazon’s AppStore does include many of the apps I needed or wanted – including SwiftKey Tablet, which is my favourite keyboard for Android – they made it impossible to install them on their own firmware. I’ve been tempted to install their AppStore on the flashed Kindle Fire and see whether they would allow me to install the apps then; it would be quite a laugh.

Unfortunately, while the CM10 firmware actually lets me make very good use of the device, much more than I could ever have gotten out of the original firmware, the browsing experience still sucks big time. I currently have a number of browsers installed: Android’s stock browser – with its non-compliant requests – Google Chrome, Firefox, Opera and the aforementioned Puffin. There is no real winner in the lot.

The Android browser has a terrible network implementation and takes way too much time requesting and rendering pages. Google Chrome is terrible on the whole, probably because the Fire is too underpowered to run it properly, which makes it totally useless as an app. I only keep it around for testing purposes, until I get a better Android tablet.

Firefox has the best navigation support but every time I click on a field and SwiftKey has to be brought up, it takes a full minute. Whether this is a bug in SwiftKey or Firefox, I have no idea. If someone has an idea who to complain about it to, I’d love to report it and see it fixed.

The best option you get, besides Firefox, is Opera. While slightly slower than Firefox at rendering, it does not suffer from the SwiftKey bug. I’m honestly not sure at this point whether the version of Opera I’m using right now renders with their own Presto engine or with WebKit, which they announced they are moving to — if it’s the latter, it’s going to be a loss for me I guess, since the two definitely WebKit-based browsers are not behaving nicely for me here.

Now, from what I said about Puffin, you’d expect it to behave properly enough. Unfortunately that is not the case. I don’t know if it’s a problem with my local bandwidth being too limited, but in general its responsiveness is worse than Opera’s, although not as bad as Chrome’s. The end result is that even the server-side rendering does not make it usable.

More reviews of software running on the Fire will follow, I suppose, unless I decide to get a newer tablet in the next weeks.

Network Security Services (NSS) and PKCS#11

Let’s first clear up a big mess. In this post I’m going to talk about dev-libs/nss or, as the title suggests, Network Security Services, which is the framework developed first by Netscape and now by the Mozilla Project to implement a number of security layers, including (especially) SSL. This should not be confused with the many other similar acronyms, especially the Name Service Switch, which is the interface that allows your applications to resolve hosts and users against databases they weren’t designed to use in the first place.

In my previous posts about smartcard-related software components – first and second – I started posting a UML component diagram that was not very detailed but generally readable. With time, and with the need to clarify my own understanding of the whole situation, the diagram is getting more complex and more detailed, but arguably less readable.

In the current iteration of the diagram, a number of software projects are exploded into multiple components, as I originally did with the lone OpenCryptoki project (which I should have written about already, but I haven’t had enough time to finish cleaning that up yet). In particular, I split the NSS component into two sub-components: libnss3 (which provides the actual API for applications to use), and libsoftokn3 (the “softoken”), which provides access to the underlying NSS database. This is important because it shows how the NSS framework actually communicates with itself through the standard PKCS#11 interface.

Anyway, back to NSS proper. To handle multiple PKCS#11 providers – which is what you want to do if you intend to use a hardware token, or a virtual one for testing – you need to register them with NSS itself. If you’re a Firefox user, you can do that from its settings window, but if you’re a Chromium user, you’re mostly out of luck as far as a GUI is concerned: the official way to deal with certificates and the like in Chromium is to use the NSS command-line utilities, available with the utils USE flag on dev-libs/nss.
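For reference, registering an extra PKCS#11 provider with those command-line utilities boils down to a modutil call against the shared database; the module name and library path below are just placeholders for whatever token driver you actually use:

# register a PKCS#11 module in the SQLite-based NSS database
modutil -dbdir sql:$HOME/.pki/nssdb -add "my-token" -libfile /usr/lib/pkcs11/example-pkcs11.so

# list the modules the database knows about
modutil -dbdir sql:$HOME/.pki/nssdb -list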

First of all, by default Mozilla, Evolution and Chromium, and the command-line utilities use three different paths to find their databases: one depending on the Mozilla profile, ~/.pki/nssdb and ~/.netscape respectively. Even more importantly, by default the first and the last will use an “old” version of the database, based on the Berkeley DB interface, while the other two will use the more modern, SQLite-based database. This is troublesome.

Thankfully, the Mozilla Wiki has an article on setting up a shared database for NSS, which you might want to follow to make sure that you use the same set of certificates between Firefox, Chromium, Evolution and the command-line utilities. What it comes down to is just a bunch of symlinks. Read the article yourself for the instructions; I do have to note, though, that you should do this as well:

~ % ln -s .pki/nssdb .netscape

This way the NSS utilities will use the correct database as well. Remember that you have to log out and log back in for the utilities and Firefox to pick up the SQL database.
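The logout is needed because the setup from the wiki article relies (if I recall the article correctly) on an environment variable being exported by your session; assuming you followed it, the relevant bits look more or less like this, with certutil as a quick check that everything points at the SQLite database:

# in ~/.profile or equivalent: make the NSS tools default to the SQLite format
export NSS_DEFAULT_DB_TYPE="sql"

# after logging back in, this should list the same certificates Chromium sees
certutil -d sql:$HOME/.pki/nssdb -L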

Unfortunately I haven’t been able to get a token to work in this environment. On one side, I’m afraid I might have busted the one Eva sent me (sigh! but at least it served the purpose of getting most of this running); on the other, Scute does not allow uploading an arbitrary certificate, only generating a CSR, which I obviously can’t get signed by StartSSL (my current certificate provider). Since I’m getting paranoid about security (even more so since I’ll probably be leaving my servers in an office when I’m not around), I’ll probably be buying an Aladdin token from StartSSL though (which also means I’ll be testing out their middleware). At that point I’ll give you more details about the whole thing.

Don’t try being smarter than me — you aren’t, you’re just a website!

So today the new major version of Firefox was released. I only care up to a point, since I’m actually a Chrome user — and the one killer feature Firefox had over Chrome is now gone (the Delicious extension doesn’t work with Firefox 4 yet). But I have to test Firefox, and today I was explicitly working on my rule set, so I was interested in getting the new version of Firefox on all of my systems. Also, my mother uses both Chrome and Firefox (don’t ask), so I had to download a copy for her as well.

This would usually not be a problem: you go to the website and download the copy you need to run on your system; even better, it gives you a direct link to the package for the system you’re visiting from. But I’m not your average user: I wanted to download the Windows and OS X versions of Firefox from my main workstation, running Linux. I don’t need the Linux version here for two reasons: I don’t use Firefox here at all, and if I did, I would let Portage manage it.

For most download pages, there is a simple way to deal with that as well: a link named “Other operating systems and languages” that gives you access to the other binary packages — I actually like LibreOffice’s approach: it loads all the platforms and languages in selection boxes, selects by default the ones you’re visiting from, but easily lets you choose the others for the same language. If you visit the English/American Firefox homepage, there is indeed such a link.

But since my locale is set to Italian on this computer, whenever I visit either getfirefox.com or mozilla.com I hit a captive redirect that brings me to the Italian version of the website. This is meant to be friendly to users, I’m sure, but there are two issues:

  • the link to other operating systems and languages is gone, replaced instead by the release notes; Jo (directhex) noticed that the same is true for French and German sites as well;
  • there is no way to navigate from the Italian website to the English one or the other way around: this makes it impossible to reach the page above.

And this is not the first time I’ve found something like this, but finding it on a package such as Firefox really stressed me out.

I killed enough trees…

Pathetic as it is, this Saturday evening was spent, by me at least, cleaning up old and new paper. And that’s the kind of trees I’m talking about.

Indeed, since I’m now self-employed I need to keep a paper trail of all the invoices sent and received, all the contracts, all the shipment bills and so on. While I prefer handling things paperless, and thus I scan everything I get (invoices, contracts, shipment bills, …), I have to keep the paper trail for my accountant, at least for now. This also means printing stuff that I otherwise wouldn’t be printing (!), like Apple’s invoices. I was hoping to avoid that, but it turns out that my accountant wants the paper.

Interestingly enough, printing from within Firefox here on Linux is a bit of a problem: it sets itself to use Letter, even though my /etc/papersize is properly set to a4 and LC_PAPER is set to it_IT (which is, obviously, A4). It already baffles me that you need libpaper at all, when the locale settings would already support discerning between different paper sizes; but the fact that Firefox also defaults to Letter (which as far as I know is basically only used in the US) without an option to change it (yes, I already tried through about:config, no change) is definitely stupid.
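For what it’s worth, the settings Firefox is ignoring are trivial to check from a terminal; this is just a sanity check, not a fix:

# what libpaper-aware software reads
cat /etc/papersize

# what the locale says: height/width in millimetres (297x210 for A4)
locale -k LC_PAPER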

Luckily, most of the references I’ve been using lately are available as PDFs, and thanks to the Sony Reader I don’t have to print them out. What I also decided to cut back on lately is CDs: most stuff I can easily get on Apple’s iTunes Store (yes, I know it’s not available on Linux, but the music is not DRM’d, it’s in a good format, and it’s not overly expensive); too bad they don’t have an (even more expensive) ALAC store, or I would be buying my metal music there as well (AAC does metal no good).

Games aren’t as easy: I’m already out of space on the PS3, and I’ve bought just a couple of titles from the PlayStation Network store; nor do I have space on the PSP, and additionally, games downloaded with an Italian account cost twice what I can get them for from Amazon. Sony, if you’re reading, this is the time to fix this issue! Especially with the PSP Go coming, I don’t think it’s going to sell well among game enthusiasts, and I’m quite sure that those who do get it will probably hate the extra-high prices.

Anyway, since I’m avoiding buying CDs and going with the iTunes Store instead, and I cannot accept direct donations any longer, you can now also consider their gift cards; those are certainly accepted…

Stash your cache away

While I’m spending a week away from home (I’m at my sister’s family’s place, while she’s at the beach), I’ll still be working, writing blog posts, and maybe taking care of some smaller issues in Gentoo. I’m just a bit hindered because while I type on the keyboard I often click something away with the trackpad; I didn’t think of bringing a standalone keyboard. I guess if somebody wanted to send an Apple Bluetooth keyboard my way, I wouldn’t say no.

While finally setting up a weekly backup of my /home directory yesterday, I noticed quite a few issues with the way software makes use of it. The first thing, of course, was to find the right software for the job; I opted for a simple rsync run from cron: after all, I don’t care much about having multiple incremental backups à la Time Machine, and a single weekly copy of my basic data is good enough.
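The whole setup is nothing fancy; modulo the actual mount point of the flash drive (the one below is just an example), it’s little more than a crontab entry:

# weekly, Sunday at 4am: mirror the home directory onto the flash drive
0 4 * * 0 rsync -a --delete /home/flame/ /mnt/backup-usb/home/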

The second problem was that, some time ago, a 4GB USB flash drive was enough to hold a copy of my home directory, but when I looked at it yesterday, it was well over 5GB. How did that happen? Some baobab later, I found the culprits. On one side, my medical records (over 500 pages, scanned with a high-grade all-in-one laser printer; no, not by me at home) are too big. They might have been scanned as colour documents (they are photocopies, so that’s not really right) or at a huge resolution; I have to check, since having over half a gig of just printed records is a bit too much for me (I also have another full CD of CT scan data).

On the other side, a lot of software misuses my home directory by writing cache and temporary files into it rather than into the proper locations. Let me explain: if you need to create a temporary file or socket to communicate between different pieces of software on the same host, rather than writing it to my home, you should probably use TMPDIR (as a lot of software, fortunately, does). The same goes for cache data; yes, I’m referring to you, Evolution and Firefox, but also to Adobe Flash, the Sun JDK and IcedTea.

Indeed, the FreeDesktop specifications already provide an XDG_CACHE_HOME variable that can be used to change the place where cache data is saved, defaulting to ~/.cache, and on my system set to /var/cache/users/flame. This way, all the (greedy) cache systems would be able to write as much data as they want, without either wasting space on the backup flash drive or forcing me to write the data to two disks (/var/cache is on a sort-of throwaway disk).
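Setting it is a one-liner in the session startup files; the path is the one I use, adjust to taste:

# in ~/.profile (or your session setup): point well-behaved caches at the throwaway disk
export XDG_CACHE_HOME="/var/cache/users/$USER"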

For now I’ve resolved it by making some symlinks, hoping they stay stable, and by creating a ~/.backup-ignore file, akin to .gitignore, with the paths to the stuff that I don’t want backed up. The only problem I really have is with Evolution, because it has so many subdirectories that I can’t really work out what I should back up and what not.
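The exclude file then simply gets passed to rsync, so the weekly job from above becomes something along these lines (again a sketch with my own paths):

# ~/.backup-ignore holds one rsync exclude pattern per line, e.g. .cache/ or tmp/
rsync -a --delete --exclude-from="$HOME/.backup-ignore" \
    /home/flame/ /mnt/backup-usb/home/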

Oh, and there are a few more problems there: a lot of software over the past two years migrated from the home directory itself to ~/.config, but the old files were kept around (Nautilus is an example), and a few directories contained very, very old and dusty session data that was never cleaned up properly.

Providing too many configuration options to tell software where things are can definitely lead to problems, but using the right environment variables to decide where data should go, and where it should be looked up, can definitely solve lots of them!

Upgrading Typo

It’s that time of the year again, when I get tired and decide to update Typo; although this is probably one of the most invasive changes since I started using this software (it moved from Subversion to Git), it seems to have been the one with the fewest issues to fix. Although some were quite nasty.

The first problem is that the version you can find in git right now is broken and won’t start up… and the one to “blame” is our very own Hans (graaf)! I say “blame” because it’s not really his fault, and I just worked around it by removing the Dutch localisation, which I don’t need anyway. I had to fix up the theme to work with the new Typo code, but I also took the time to fix a few more obnoxious things in the theme, so it now looks nicer too.

There were just two main issues with the update. The easy one is that users don’t get activated after the migration, which means you cannot log in; one psql call later and I was back in. The other problem is that the generated Atom feed was invalid: to replace HTML entities like eacute for é and similar, it decoded all the entities… including the lt/gt used to avoid injecting tags into the posts. Luckily for me I had one such “fake tag” right in my previous post, so I noticed the problem right away; I’ve hacked around it for now, and as soon as I have a little more time I’m going to fix it properly.

I had to update the mod_security access restriction, since comments are now posted through a single /comments URI, which actually makes it much nicer. I’m going to update it and post it ASAP. The live search support (which I already removed from the template a few days ago) is now optional in Typo itself, which is good; I still have to port the Google custom search code to the new sidebar plugin interface, to make it blend in better.

It will also require a couple more packages to be added to Portage, but that can happen later, not today when I’m already a bit swamped. I like the new admin interface, although it increased the font size (though not as much as WordPress) and I hate huge fonts (Firefox does not seem to allow me to use smaller fonts; the rest of my system is set to 75dpi – which is fake, but works fine – while Firefox does not accept that).

Is Firefox really that bad?

When I read some rants about Firefox, I thought they were a little bit too much. Now I’m starting to wonder whether they were quite to the point instead. But before I start, I have to say that I haven’t tried contacting anybody yet, neither the Gentoo Mozilla team nor upstream. And I’m sure the Gentoo Mozilla team is doing its best to provide a working Firefox while still following upstream’s guidelines on trademarks.

This actually sprouted from my previous work inspecting library paths: I went to check which libraries firefox-bin was loading from the system library directory, and noticed one curious thing: /usr/lib/libsqlite3.so was being loaded. What’s the problem? The problem is that I knew that xulrunner (at least when built from sources) bundles its own copy of SQLite3, so I wondered whether they used the system copy for the binary package. Funnily enough, they really don’t:

yamato link-collisions # ldd /opt/firefox/firefox-bin | grep sqlite3
    libsqlite3.so => /opt/firefox/libsqlite3.so (0xf67e7000)
    libsqlite3.so.0 => /usr/lib/libsqlite3.so.0 (0xf621e000)
yamato link-collisions # lddtree.sh /opt/firefox/firefox-bin | grep sqlite3 -B1
    libxul.so => /opt/firefox/libxul.so
        libsqlite3.so => /opt/firefox/libsqlite3.so
--
        libsoftokn3.so => /usr/lib/nss/libsoftokn3.so
            libsqlite3.so.0 => /usr/lib/libsqlite3.so.0

(The lddtree.sh script comes from pax-utils and uses scanelf. I have a similar script in my Ruby-Elf suite, implemented as a testcase; it produces basically the same results.)

So the binary version of the package uses the system copy of NSS and thus loads the system copy of SQLite3. I haven’t gone as far as checking where the symbols are resolved, but one of the two copies is going to be loaded and unused, wasting memory (clean and dirty, for the relocated data sections). Not nice, but one could argue it’s a generic binary and has to adapt to whatever system it lands on. In truth the problem here is that upstream didn’t use an rpath, and thus the firefox-bin program does not load all its libraries from the /opt/firefox directory (since the /usr/lib/nss directory comes first). Had they built their binary with the rpath set to $ORIGIN, it would have loaded everything from /opt/firefox without caring about the system libraries, as it was intended to do. Interestingly enough, they do just that for Solaris, but not for Linux, where they prefer fiddling with LD_LIBRARY_PATH.
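Checking for (and setting) such an rpath is straightforward, by the way; this is just to illustrate the idea, not what upstream’s build system actually looks like:

# check whether the shipped binary carries any rpath (on Linux it doesn't)
readelf -d /opt/firefox/firefox-bin | grep -i rpath

# what a link line with a self-referencing rpath would look like:
#   gcc ... -Wl,-rpath,'$ORIGIN' -o firefox-bin ...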

Next, I checked the /usr/bin/firefox start script, which I already copied in the other post:

#!/bin/sh
export LD_LIBRARY_PATH="/usr/lib64/mozilla-firefox"
exec "/usr/lib64/mozilla-firefox"/firefox "$@"

Let’s ignore the problem with the rewriting of the environment variable, which I don’t care about right now, and check what it does. It adds the /usr/lib64/mozilla-firefox directory to the list of paths to load libraries from. Since it’s setting LD_LIBRARY_PATH all the library resolutions will have to be done manually rather than using the ld.so.cache file. So I checked which libraries it loads from there:

flame@yamato ~ % LD_LIBRARY_PATH=/usr/lib64/mozilla-firefox ldd /usr/lib64/mozilla-firefox/firefox | grep mozilla-firefox
flame@yamato ~ % scanelf -E ET_DYN /usr/lib64/mozilla-firefox 
 TYPE   FILE 
ET_DYN /usr/lib64/mozilla-firefox/libjemalloc.so 

(The second command finds all the libraries in the given path, by looking for ET_DYN, i.e. dynamic ELF, files.)

Okay, so there is one library, but it’s not in the NEEDED lines of the firefox executable. Indeed, that library is a preloadable library with a different malloc() implementation (remember, I’ve written about similar things and commented on the FreeBSD solution), which means it has to be passed through LD_PRELOAD to be useful, and I can’t see that being used at all. Indeed, if I check the libraries loaded by my firefox process, I can’t find it:

flame@yamato x86 % fgrep jemalloc /proc/`pidof firefox`/smaps
flame@yamato x86 % 
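For comparison, this is how a preload-based allocator is normally wired in; I’m not saying the ebuild should do it this way, it’s just to show what would be needed for that library to matter:

# hypothetical: inject the alternative malloc() through the dynamic loader
LD_PRELOAD=/usr/lib64/mozilla-firefox/libjemalloc.so /usr/lib64/mozilla-firefox/firefox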

Let’s go step by step, though; for now we can say with reasonable safety that the launcher script is overwriting LD_LIBRARY_PATH with no apparent good reason. Which libraries does the firefox executable load, then?

flame@yamato ~ % LD_LIBRARY_PATH=/usr/lib64/mozilla-firefox ldd /usr/lib64/mozilla-firefox/firefox
    linux-vdso.so.1 =>  (0x00007fffcabfd000)
    libdl.so.2 => /lib/libdl.so.2 (0x00007fa5c2647000)
    libstdc++.so.6 => /usr/lib/gcc/x86_64-pc-linux-gnu/4.3.3/libstdc++.so.6 (0x00007fa5c2338000)
    libc.so.6 => /lib/libc.so.6 (0x00007fa5c1fc5000)
    /lib64/ld-linux-x86-64.so.2 (0x00007fa5c284b000)
    libm.so.6 => /lib/libm.so.6 (0x00007fa5c1d40000)
    libgcc_s.so.1 => /lib/libgcc_s.so.1 (0x00007fa5c1b28000)
flame@yamato ~ % scanelf -n /usr/lib64/mozilla-firefox/firefox 
 TYPE   NEEDED FILE 
ET_EXEC libdl.so.2,libstdc++.so.6,libc.so.6 /usr/lib64/mozilla-firefox/firefox 

That can’t be right, can it? We know that Firefox loads GTK+ and a bunch of other libraries, starting with xulrunner itself, but there is no link to those. But if you know your linker, you should notice a funny thing: libdl.so.2. It means the executable is calling into the loader at runtime, which usually means dlopen() is used. Indeed, it seems like the firefox executable loads the actual browser at runtime, as you can see by checking the smaps file.
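If you want to see it for yourself, it’s enough to look for a library that is not in the NEEDED list above but is mapped in the running process; for instance (the same kind of check as the others in this post):

# libxul.so never shows up in the ldd/scanelf output for the executable,
# yet it's mapped at runtime: the telltale sign of dlopen()
grep -m1 libxul /proc/`pidof firefox`/smaps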

Now there are two things to say here. There is a reason why firefox would be doing that: calling firefox while a copy is already open should request a new window to be opened, rather than starting a new process. So basically I expect the executable to contain a launcher which, if a copy of firefox is already running, just tells it to open a new window, and otherwise loads all the libraries and the rest. It’s a good idea from one point of view, because initialising all the graphical and rendering libraries just to tell another process to open a window would be a waste of resources. On the other hand, dlopen() is not the best-performing approach, and it also creates problems for prelink.

I have no idea why that is, but the binary package as released by upstream provides a script that seems to take care of the launching, and then a firefox-bin executable that doesn’t use dlopen() to load the Gecko engine and the whole graphical user interface. I would very much like to know why we don’t do the same for from-source builds; I would sincerely expect the results to be even better when using prelink and similar tools.

Now, let’s return for a moment to the problem of SQLite3 being loaded twice in the binary release of Firefox; surely the same wouldn’t happen with the from-source version, would it? Check it for yourself:

flame@yamato x86 % fgrep sqlite /proc/`pidof firefox`/smaps
7fea6c8c2000-7fea6c935000 r-xp 00000000 fd:08 701632                     /usr/lib64/libsqlite3.so.0.8.6
7fea6c935000-7fea6cb35000 ---p 00073000 fd:08 701632                     /usr/lib64/libsqlite3.so.0.8.6
7fea6cb35000-7fea6cb36000 r--p 00073000 fd:08 701632                     /usr/lib64/libsqlite3.so.0.8.6
7fea6cb36000-7fea6cb38000 rw-p 00074000 fd:08 701632                     /usr/lib64/libsqlite3.so.0.8.6
7fea814dc000-7fea8154f000 r-xp 00000000 fd:08 24920                      /usr/lib64/xulrunner-1.9/libsqlite3.so
7fea8154f000-7fea8174f000 ---p 00073000 fd:08 24920                      /usr/lib64/xulrunner-1.9/libsqlite3.so
7fea8174f000-7fea81751000 r--p 00073000 fd:08 24920                      /usr/lib64/xulrunner-1.9/libsqlite3.so
7fea81751000-7fea81752000 rw-p 00075000 fd:08 24920                      /usr/lib64/xulrunner-1.9/libsqlite3.so

Yes, yes it does happen. So I have a process that is loading one library at runtime for no good reason at all, and not a small one at that, when it could probably, at this point, use a single system SQLite library. I say that it could because now I have enough evidence to support it: if the two libraries had a different ABI, then depending on which one the symbols resolve to, either xulrunner or NSS would come crashing down. Since ELF uses a flat namespace, the same symbol name cannot be resolved in two different libraries, so one of the two consumers would be finding the symbols in the “wrong” copy. And no, before you ask, neither uses symbol versioning.
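The lack of symbol versioning is easy to verify too: none of the exported sqlite3_* symbols carry a version suffix in either copy (a sketch of the check, using the same paths as above):

# a versioned symbol would show up as something like sqlite3_open@SOMETAG
readelf --dyn-syms /usr/lib64/libsqlite3.so.0 | grep ' sqlite3_open'
readelf --dyn-syms /usr/lib64/xulrunner-1.9/libsqlite3.so | grep ' sqlite3_open'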

So at this point the question is: can both Firefox upstream and the Gentoo Firefox ebuild start providing something that does more than just work, and actually works properly?