The PIE is not exactly a lie…

Update (2017-09-10): The bottom line of this article changed since the 8 years it was posted, quite unsurprisingly. Nowadays, vanilla kernel has a decent ASLR and so everyone does actually have advantages in building everything as PIE. Indeed, Arch Linux and probably most other binary distributions do exactly that. The rest of the technical description of why this is important and how is still perfectly valid.

One very interesting misconception related to Gentoo, and especially the hardened sub-profile, is related to the PIE (Position-Independent Executable) support. This is probably due to the fact that up to now the hardened profile always contained PIE support, and since it relates directly to PIC (Position-Independent Code) and PIC as well is tied back to hardened support, people tend to confuse what technique is used for what scope.

Let’s start with remembering that PIC is a compilation option that produces the so-called relocatable code; that is, code that is valid no matter what base address it is loaded at. This is a particularly important feature for shared objects: to be able to be loaded by any executable and still share the code pages in memory, the code needs to be relocatable; if it’s not, a text relocation has to happen.

Relocating the “text” means changing the executable code segment so that the absolute addresses (of both functions and data — variables and constants) is correct for the base address the segment was loaded at. Doing this, causes a Copy-on-Write for the executable area, which among other things, wastes memory (each process running will have to have its private copy of the executable memory area, as well as the variable data memory area). This is the reason why shared objects in almost any modern distribution are built relocatable: faster load time, and reduced memory consumption, at the cost of sacrificing a register.

An important note here: sacrificing a register, which is something needed for PIC to keep the base address of the loaded segment, is a minuscule loss for most architectures, with the notable exception of x86, where there are very few general registers to use. This means that while PIC code is slightly (but not notably) slower for any other architecture, it is a particularly heavy hit on x86, especially for register-hungry code like multimedia libraries. For this reason, shared objects on x86 might still be built without PIC enabled, at the cost of load time and memory, while for most other architectures, the linker will refuse to produce a shared object if the object files are not built with PIC.

Up to now, I said nothing about hardened at all, so let me introduce the first relation between hardened and PIC: it’s called PaX in Hardened Linux, but the same concept is called W^X (Write xor eXecute) in OpenBSD – which is probably a very descriptive name for a programmer – NX (No eXecution) in CPUs, and DEP (Data Execution Prevention) in Windows. To put it in layman terms, what all these technologies do is more or less the same: they make sure that once a memory page is loaded with executable code, it cannot be modified, and vice-versa that a page that can be modified cannot be executed. This is, like most of the features of Gentoo Hardened, a mitigation strategy, that limits the effects of buffer overflows in software.

For NX to be useful, you need to make sure that all the executable memory pages are loaded and set in stone right away; this makes text relocation impossible (since they consists of editing the executable pages to change the absolute addresses), and also hinders some other techniques, such as Just-In-Time (JIT) optimisation, where executable code is created at runtime from an higher, more abstract language (both Java and Mono use this technique), and C nested functions (or at least the current GCC implementation, that makes use of trampolines, and thus require executable stack).

Does any of this mean that you need PIC-compiled executables (which is what PIE is) to make use of PaX/NX? Not at all. In Linux, by default, all executables are loaded at the same base address, so once the code is built, it doesn’t have to be relocated at all. This also helps optimising the code for the base case of no shared object used, as that’s not going to have to deal with PIC-related problems at all (see this old post for more detailed information about the issue).

But in the previous paragraph I did write some clue as to what the PIE technique is all about; as I said, the reason why PIE is not necessary is that by default all executables are loaded at the same address; but if they weren’t, then they’d be needing either text relocations or PIC (PIE), wouldn’t they? That’s the reason why PIE exists indeed. Now, the next question would be, how does PIE relate to hardened? Why does the hardened toolchain use PIE? Does using it make it magically possible to have a hardened system?

Once again, no, it’s not that easy. PIE is not, by itself, neither a security measure nor a mitigation strategy. It is, instead, a requirement for the combined use of two mitigation strategy, the first is the above-described NX idea (which rules out the idea of using text relocations entirely), while the second is is ASLR (Address Space Layout Randomization). To put this technique also in layman terms, you should consider that a lot of exploit require that you change the address a variable points to, so you need to know both the address of that variable, and the address to point it to; to find this stuff out, you can usually try and try again until you find the magic values, but if you randomize the addresses where code and data are loaded each time, you make it much harder for the attacker to guess them.

I’m pretty sure somebody here is already ready to comment that ASLR is not a 100% safe security measure, and that’s absolutely right. Indeed here we have to make some notes as to which situation this really works out decently: local command exploits. When attacking a server, you’re already left to guess the addresses (since you don’t know which of many possible variants of the same executable the server is using; two Gentoo servers rarely have the same executable either, since they are rebuilt on a case by case basis — and sometimes even with the same exact settings, the different build time might cause different addresses to be used); and at the same time, ASLR only changes the addresses between two executions of the same program: unless the server uses spawned (not cloned!) processes, like inetd does (or rather did), then the address space between two requests on the same server will be just the same (as long as the server doesn’t get restarted).

At any rate, when using ASLR, the executables are no longer loaded all at the same address, so you either have to relocate the text (which is denied by NX) or you’ve got to use PIE, to make sure that the addresses are all relative to the specified base address. Of course, this also means that, at that point, all the code is going to be PIC, losing a register, and thus slowed down (a very good reason to use x86-64 instead of x86, even on systems with less than 4GiB of RAM).

Bottomline of the explanation: using the PIE component of the hardened toolchain is only useful when you have ASLR enabled, as that’s the reason why the whole hardened profile uses PIE. Without ASLR, you will have no benefit in using PIE, but you’ll have quite a few drawbacks (especially on the old x86 architecture) due to building everything PIC. And this is also the same reason why software that enables PIE by itself (even conditionally), like KDE 3, is doing silly stuff for most user systems.

And to make it even more clear: if you’re not using hardened-sources as your kernel, PIE will not be useful. This goes for vanilla, gentoo, xen, vserver sources all the same. (I’m sincerely not sure how this behave when using Linux containers and hardened sources).

If you liked this explanation that costed me some three days worth of time to write, I’m happy to receive appreciation tokens page likes — yes this is a shameless plug, but it’s also to remind you that stuff like this is the reason why I don’t write structured documentation and stick to simple, short and to the point blogs.