Not all failures are caused equal

Ryan seems to think of me as inconsistent for promoting Portage failures when _FORTIFY_SOURCE points out a sure fortification error. The related bug is getting more useful now than it was when it started, but it also shows some misunderstanding of my position regarding breakages and QA in general.

So first of all, let me repeat (because I have said it before) that I prefer build-time failures to run-time failures. Why? It’s not difficult to understand when you think about it: it covers at least my three different use cases:

  • I have two remote vservers running Gentoo; I cannot build on them (and neither should you!), but I keep a local chroot environment where I pre-build binary packages for the software I need; I update the environment once daily, and whenever a special security concern comes up; when I do that, I build the packages and upload them to the servers, which can take quite a bit of time as my upload is around 40 KiB/s;
  • also, I connect my whole house (my computers, my mother’s, and a bunch of various devices and appliances; plus the guests’ devices) via not a standard SoHo router, but a much more flexible Gentoo box, running basically “embedded” on a Celeron board with only SSH and serial console access; I update the “firmware” of the system once in a blue moon, mostly for security concerns, by flashing a CF card that is connected to the EIDE bus; also here I use a chroot environment to build the packages;
  • finally, I have a laptop, also running Gentoo; this one is updated more irregularly, because there are days I don’t even touch it; I also try not to build stuff when I’m on the run because I might not have enough bandwidth to download the sources (and while connected to my local network I access them directly).

In all three of these cases, I’m bothered by build failures, but they aren’t much of a problem; if I’m building something in a pinch, it is a nuisance; in other cases, an upgrade or a new build, I usually have time to look up what is broken before it becomes a real problem. If any package is failing at runtime, though, it’s much worse than a nuisance. If Apache dies, then I have to downgrade and go rebuild another one quickly; if the router’s SSH daemon crashes, I have to go downstairs with the laptop and access the serial console; and if at that moment the picocom tool I use to access the serial console doesn’t start, then I’m simply going to cry.

So with this situation at hand, can you blame me for preferring stricter compilers and build managers (don’t forget that Portage and its cousins are not just package managers à-la APT, but also build managers à-la FreeBSD Ports) that actually disallow the installation of broken code? This is the same reason why we added --no-undefined to Ruby: rather than hiding the problems under the hood and leaving the car to melt down on failure, we notice the leak beforehand, and disallow it from ever leaving the garage!

As Mike pointed out, and as I exemplified on the bug linked above, code can look perfectly fine from a bird’s-eye view because it relies on a number of assumptions about how the compiler and the C library work; but if the compiler is reporting a sure failure, it is going to fail. This is the same reason why, with GCC 4.5, we had a runtime breakage of GNU tar, as it tried to fill two adjacent strings in a structure with a single copy, rather than two. It wasn’t going to create an actual overflow problem, but it was stepping over the final character; not only did the compiler warn about it but… the C library aborted at runtime!
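To make the kind of pattern I mean a bit more concrete, here is a minimal sketch in C. The `struct header` layout and the `fill_header` function are hypothetical names I made up for illustration; this is not GNU tar’s actual code, only the general shape of “two adjacent string fields, filled with one copy”. The single-copy trick is shown in a comment, since, assuming fortification is enabled at a level that checks the bounds of the individual field rather than of the whole structure, it is exactly what gets flagged at build time and can abort at runtime. The function itself shows the boring alternative: one bounded copy per field.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical layout: two fixed-size string fields sitting next to each
 * other. Only an illustration, not any real program's structure. */
struct header {
    char magic[6];
    char version[2];
};

/* The "clever" single copy would look like this:
 *
 *     strcpy(h->magic, "ustar  ");
 *
 * That writes 8 bytes (7 characters plus the terminating NUL) through a
 * pointer to a 6-byte field, spilling into `version`. It stays inside the
 * structure, but a fortified build that checks the field's own size sees a
 * certain overflow of `magic`.
 */

/* One copy per field, each bounded by the size of its own destination. */
void fill_header(struct header *h)
{
    /* The two fields together span 8 bytes, which is the assumption the
     * single-copy trick silently relied on. */
    assert(sizeof h->magic + sizeof h->version == 8);

    memcpy(h->magic, "ustar", sizeof h->magic);   /* 5 chars + NUL = 6 bytes */
    memcpy(h->version, "00", sizeof h->version);  /* 2 chars, no NUL */
}
```

The second version costs one extra call and a few bytes of code, and in exchange neither the compiler nor the C library has anything to object to.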

Now you could argue that it means that they are not discerning between hacky-but-good solutions to improve performance, and security-vulnerable code. But in truth, you have to take a stand at some point, and it makes total sense to be as safe as possible, especially in today’s world, where security should be that much of a concern and performance can easily be improved with faster hardware.

This should cover my reasons to be in favour of dying when the code provides a path where the C library is going to abort at runtime (and it’s not properly controlled). What about my actual criticism of the unmasking of glibc and other software before, when they caused failures? Well, it is twofold. On one side, glibc 2.12 does not only cause build-time failures; as I explained, there are enough indications that software could abort at runtime when a header is missing in the sources. But whatever the problem is, let’s look at it from a need perspective: what is the reason for this failure to be introduced? Simply to clean up the code and make it faster to build (fewer headers means fewer files to load and parse), so you force users to include what they need, rather than including a long chain of dependent headers. It’s a laudable target, but mostly one of optimisation and enhancement.

On the other hand, fortification features are used to increase security by mitigating possible vulnerabilities in the software design. This wide difference in the root reason for the breakage is, alone, what makes them different in my personal assessment of the problem. And because of this, I think there is a case for Portage aborting right away in ~arch for packages with known overflows, even though I maintain that glibc 2.12 and similar problems shouldn’t have been let loose on the users. The former does not make it more broken, it makes it safer!

And tomorrow, “Is fixing implicit declarations for me?”

2 thoughts on “Not all failures are caused equal”

  1. What aggravated me was that there was no warning that this error was going to be introduced (not even in the Changelog). If there had been some announcement made that would have given us time to fix the major offenders then I would have gotten used to the idea. Now that I have, I think the change itself is a good idea. It took me a bit of bitching (as usual) but I’m coming around.


  2. “Please do not file a Gentoo bug and instead report the above QA issues directly to the upstream developers of this software. Homepage: ” This simple line is used by developers to close bugs WONTFIX, as in #337943. It should not appear. QA should track issues with all installed software; this could lead to earlier removal from the tree in issues of security, perhaps. Sometimes upstream is only an email link or a webmaster mail address. The info should be stored or, if a link to an upstream bug report is found, linked to. Upstream should be notified of the bug as filed; build log and emerge --info should be posted. WONTFIX should be reserved for deprecations, trivialities, can’t implement and such. Just my $.02

