Update (2017-09-10): The bottom line of this article has changed in the 8 years since it was posted, quite unsurprisingly. Nowadays the vanilla kernel has a decent ASLR, so everyone actually does gain from building everything as PIE. Indeed, Arch Linux and probably most other binary distributions do exactly that. The rest of the technical description of why this is important, and how, is still perfectly valid.
One very interesting misconception related to Gentoo, and especially the hardened sub-profile, concerns PIE (Position-Independent Executable) support. This is probably because the hardened profile has always contained PIE support up to now; since PIE relates directly to PIC (Position-Independent Code), and PIC is in turn tied back to hardened support, people tend to confuse which technique is used for which purpose.
Let’s start with remembering that PIC is a compilation option that produces the so-called relocatable code; that is, code that is valid no matter what base address it is loaded at. This is a particularly important feature for shared objects: to be able to be loaded by any executable and still share the code pages in memory, the code needs to be relocatable; if it’s not, a text relocation has to happen.
Relocating the “text” means changing the executable code segment so that the absolute addresses (of both functions and data, that is, variables and constants) are correct for the base address the segment was loaded at. Doing this causes a Copy-on-Write of the executable area which, among other things, wastes memory (each running process has to have its own private copy of the executable memory area, as well as of the variable data memory area). This is the reason why shared objects in almost any modern distribution are built relocatable: faster load time and reduced memory consumption, at the cost of sacrificing a register.
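To make the Copy-on-Write cost concrete, here is a toy model in Python; it is only a sketch, not how a real dynamic loader works. The “text” is a plain byte string, the relocation list and the addresses are made up for illustration, and the helper names (apply_text_relocations, dirtied_pages) are hypothetical. It shows that patching absolute addresses for a new base modifies every page that holds one, which is exactly what forces a private copy of that page.

```python
import struct

PAGE_SIZE = 4096  # assumed page size for the toy model


def apply_text_relocations(text: bytes, reloc_offsets, base: int) -> bytes:
    """Add the load base to each 32-bit absolute address stored in `text`."""
    patched = bytearray(text)
    for off in reloc_offsets:
        (addr,) = struct.unpack_from("<I", patched, off)
        struct.pack_into("<I", patched, off, (addr + base) & 0xFFFFFFFF)
    return bytes(patched)


def dirtied_pages(original: bytes, patched: bytes) -> set:
    """Page indexes whose bytes changed, i.e. pages that would be copied on write."""
    return {i // PAGE_SIZE
            for i, (a, b) in enumerate(zip(original, patched))
            if a != b}


# Two pages of fake "code", each holding one absolute address (made-up values).
_image = bytearray(2 * PAGE_SIZE)
struct.pack_into("<I", _image, 16, 0x1000)
struct.pack_into("<I", _image, PAGE_SIZE + 32, 0x2000)
text = bytes(_image)
relocs = [16, PAGE_SIZE + 32]

# Loaded at the address it was linked for: nothing to patch, pages stay shared.
at_link_base = apply_text_relocations(text, relocs, 0)
# Loaded anywhere else: every page containing an absolute address gets written.
at_other_base = apply_text_relocations(text, relocs, 0x40000000)

print(sorted(dirtied_pages(text, at_link_base)))
print(sorted(dirtied_pages(text, at_other_base)))
```

PIC code avoids the patching step altogether because it addresses everything relative to a base kept in a register, so the pages are never written and can stay shared between processes.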
An important note here: sacrificing a register, which is something needed for PIC to keep the base address of the loaded segment, is a minuscule loss for most architectures, with the notable exception of x86, where there are very few general registers to use. This means that while PIC code is slightly (but not notably) slower for any other architecture, it is a particularly heavy hit on x86, especially for register-hungry code like multimedia libraries. For this reason, shared objects on x86 might still be built without PIC enabled, at the cost of load time and memory, while for most other architectures, the linker will refuse to produce a shared object if the object files are not built with PIC.
Up to now I have said nothing about hardened at all, so let me introduce the first relation between hardened and PIC: it’s called PaX in Hardened Linux, but the same concept is called W^X (Write xor eXecute) in OpenBSD (which is probably a very descriptive name for a programmer), NX (No eXecution) in CPUs, and DEP (Data Execution Prevention) in Windows. To put it in layman’s terms, what all these technologies do is more or less the same: they make sure that once a memory page is loaded with executable code it cannot be modified and, vice versa, that a page that can be modified cannot be executed. This is, like most of the features of Gentoo Hardened, a mitigation strategy that limits the effects of buffer overflows in software.
For NX to be useful, you need to make sure that all the executable memory pages are loaded and set in stone right away; this makes text relocations impossible (since they consist of editing the executable pages to change the absolute addresses), and also hinders some other techniques, such as Just-In-Time (JIT) optimisation, where executable code is created at runtime from a higher-level, more abstract language (both Java and Mono use this technique), and C nested functions (or at least the current GCC implementation, which makes use of trampolines and thus requires an executable stack).
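The W^X rule itself is simple enough to express as a check on permission flags. The sketch below, in Python, applies it to text in the format of Linux’s /proc/<pid>/maps (address range, then permission flags, then the rest); the sample content and the function names are invented for illustration, and this is not how PaX enforces the policy in the kernel.

```python
def violates_wx(perms: str) -> bool:
    """True when a mapping is both writable and executable."""
    return "w" in perms and "x" in perms


def wx_violations(maps_text: str) -> list:
    """Return the address ranges of mappings that break the W^X rule."""
    bad = []
    for line in maps_text.splitlines():
        fields = line.split()
        # fields[0] is the address range, fields[1] the permission flags
        if len(fields) >= 2 and violates_wx(fields[1]):
            bad.append(fields[0])
    return bad


# Hypothetical sample, not taken from a real process.
SAMPLE_MAPS = """\
00400000-0040b000 r-xp 00000000 08:01 1234 /bin/example
0060a000-0060b000 rw-p 0000a000 08:01 1234 /bin/example
7ffd1000-7ffd3000 rwxp 00000000 00:00 0 [stack]
"""

print(wx_violations(SAMPLE_MAPS))
```

On a Linux system the same function could be fed the contents of /proc/self/maps; a process that needs text relocations, a JIT, or GCC trampolines is the kind that would show up in the result.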
Does any of this mean that you need PIC-compiled executables (which is what PIE is) to make use of PaX/NX? Not at all. In Linux, by default, all executables are loaded at the same base address, so once the code is built, it doesn’t have to be relocated at all. This also helps optimising the code for the base case of no shared object used, as that’s not going to have to deal with PIC-related problems at all (see this old post for more detailed information about the issue).
But in the previous paragraph I did drop a clue as to what the PIE technique is all about: as I said, the reason why PIE is not necessary is that by default all executables are loaded at the same address; but if they weren’t, then they’d need either text relocations or PIC (PIE), wouldn’t they? That is indeed the reason why PIE exists. Now, the next question would be, how does PIE relate to hardened? Why does the hardened toolchain use PIE? Does using it make it magically possible to have a hardened system?
Once again, no, it’s not that easy. PIE is, by itself, neither a security measure nor a mitigation strategy. It is, instead, a requirement for the combined use of two mitigation strategies: the first is the NX idea described above (which rules out text relocations entirely), while the second is ASLR (Address Space Layout Randomization). To put this technique in layman’s terms as well, consider that a lot of exploits require changing the address a variable points to, so the attacker needs to know both the address of that variable and the address to point it to; to find these out, one can usually try and try again until the magic values turn up, but if you randomize the addresses where code and data are loaded each time, you make it much harder for the attacker to guess them.
I’m pretty sure somebody here is already ready to comment that ASLR is not a 100% safe security measure, and that’s absolutely right. Indeed, here we have to make some notes as to the situation in which this really works out decently: local command exploits. When attacking a server, you’re already left to guess the addresses (since you don’t know which of many possible variants of the same executable the server is using; two Gentoo servers rarely have the same executable either, since they are rebuilt on a case-by-case basis, and sometimes, even with the exact same settings, the different build time might cause different addresses to be used); and at the same time, ASLR only changes the addresses between two executions of the same program: unless the server uses spawned (not cloned!) processes, like inetd does (or rather did), the address space between two requests on the same server will be just the same (as long as the server doesn’t get restarted).
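The “try and try again” argument, and the difference between a program that is re-executed for each attempt and a long-lived server whose layout stays put, can be put into rough numbers. This is a back-of-the-envelope sketch in Python: it assumes the attacker has to guess a single value drawn uniformly from 2**entropy_bits possibilities, and the bit counts used are illustrative, not those of any particular kernel.

```python
def success_probability(entropy_bits: int, attempts: int) -> float:
    """Chance of at least one hit when the layout is re-randomised before every attempt."""
    slots = 2 ** entropy_bits
    return 1.0 - (1.0 - 1.0 / slots) ** attempts


def expected_attempts_rerandomised(entropy_bits: int) -> float:
    """Average attempts when each try faces a fresh layout (geometric distribution)."""
    return float(2 ** entropy_bits)


def expected_attempts_fixed_layout(entropy_bits: int) -> float:
    """Average attempts when the layout never changes, so wrong guesses can be ruled out."""
    slots = 2 ** entropy_bits
    return (slots + 1) / 2


for bits in (8, 16, 28):  # illustrative values only
    print(bits,
          expected_attempts_rerandomised(bits),
          expected_attempts_fixed_layout(bits))
```

Under these assumptions a server that keeps the same address space across requests roughly halves the attacker’s expected work compared to one that is re-executed every time, and in either case the effort only grows with the number of randomised bits; the sketch ignores crashes, detection, and information leaks entirely.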
At any rate, when using ASLR, the executables are no longer loaded all at the same address, so you either have to relocate the text (which is denied by NX) or you’ve got to use PIE, to make sure that the addresses are all relative to the specified base address. Of course, this also means that, at that point, all the code is going to be PIC, losing a register, and thus slowed down (a very good reason to use x86-64 instead of x86, even on systems with less than 4GiB of RAM).
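Whether a given binary was built to be loaded at a fixed address or at any base is recorded in its ELF header, in the e_type field (ET_EXEC is 2, ET_DYN is 3). The following Python sketch reads that field from the first 18 bytes of a file; the function names are mine, the header built by fake_header is synthetic, and note that shared objects are ET_DYN too, so this alone does not distinguish a PIE from a library.

```python
import struct

ET_EXEC = 2  # executable linked for a fixed load address
ET_DYN = 3   # shared object or position-independent executable


def elf_type(header: bytes) -> str:
    """Classify an ELF file from its first 18 bytes."""
    if len(header) < 18 or header[:4] != b"\x7fELF":
        raise ValueError("not an ELF header")
    # e_ident[5] (EI_DATA) gives the byte order: 1 = little endian, 2 = big endian
    byte_order = "<" if header[5] == 1 else ">"
    (e_type,) = struct.unpack_from(byte_order + "H", header, 16)
    if e_type == ET_EXEC:
        return "fixed-address executable"
    if e_type == ET_DYN:
        return "position-independent (PIE or shared object)"
    return "other"


def fake_header(e_type: int, little_endian: bool = True) -> bytes:
    """Build a synthetic 18-byte header, for demonstration only."""
    ident = b"\x7fELF" + bytes([2, 1 if little_endian else 2, 1]) + bytes(9)
    return ident + struct.pack(("<" if little_endian else ">") + "H", e_type)


print(elf_type(fake_header(ET_EXEC)))
print(elf_type(fake_header(ET_DYN)))
```

On a real system, the same function can be given the first bytes of a binary, for instance elf_type(open(path, "rb").read(18)) with path being whatever executable you want to inspect.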
Bottom line of the explanation: using the PIE component of the hardened toolchain is only useful when you have ASLR enabled, as that’s the reason why the whole hardened profile uses PIE. Without ASLR, you will have no benefit from using PIE, but you’ll have quite a few drawbacks (especially on the old x86 architecture) due to building everything PIC. And this is also the reason why software that enables PIE by itself (even conditionally), like KDE 3, is doing something silly for most users’ systems.
And to make it even clearer: if you’re not using hardened-sources as your kernel, PIE will not be useful. This goes for vanilla, gentoo, xen and vserver sources all the same. (I’m sincerely not sure how this behaves when using Linux containers and hardened sources.)
If you liked this explanation, which cost me some three days’ worth of time to write, I’m happy to receive appreciation tokens page likes. Yes, this is a shameless plug, but it’s also to remind you that stuff like this is the reason why I don’t write structured documentation and stick to simple, short and to-the-point blogs.
Thank you for your time explaining these things. How do the JVM (icedtea) and Mono deal with running on a hardened setup? And other JITs, for that matter. And how about running a binary blob (nvidia’s drivers or google earth, for example)? I know the v8 people (google’s javascript engine) saw a significant slowdown in benchmarks when compiling v8 as PIC into chrome, so they made sure to compile it non-PIC (this was when there was no 64-bit support). I wonder what chromium does in gentoo. Also, some hand-coded assembly code in multimedia libraries can’t cope with being compiled into a PIC shared object (probably because it clobbers the reserved register or something). Is that still a problem and do you know how to fix those, in general?
Zeev: Mono will not run on a hardened setup without disabling secure memory protection for its executable. With the paxctl utility it is possible to not enforce secure memory protection for an executable, by issuing this command:
$ paxctl -m filepath
For info on running Mono on a hardened setup: http://en.gentoo-wiki.com/w…
So you say: “At any rate, when using ASLR, the executables are no longer loaded all at the same address, so you either have to relocate the text (which is denied by NX) or you’ve got to use PIE,” and then you say: “if you’re not using hardened-sources as your kernel, PIE will not be useful.” But how about the ASLR in vanilla-kernel?
@Xake PIE is only useful with the NX bit enabled, so it’s still not tremendously useful with just the ASLR. Besides, if I’m not mistaken, the ASLR in the Linux kernel as it stands is mild enough that it only randomizes libraries and heap/stack space rather than executables.
@Lars that post on the wiki is one of the reasons why I definitely *hate* the gentoo-wiki site. What a stupid method…
Nah, since 2.6.25 the ASLR in the kernel also randomizes executables compiled/linked with “-fPIE -pie”. And yeah, it is still pretty soft compared to, for example, PaX, but there is still quite a distance between “mild ASLR” and “no ASLR”… And when it comes to the NX bit, at least all 64-bit processors, and also PAE kernels (on PAE-capable hardware), enable the usage of the NX bit. So I would say that -fPIE -pie on a 64-bit machine is a good choice security-wise no matter what kernel you are running, as long as it is new enough; on 32-bit without PAE/NX the same applies if you can live with the speed hit and with the ban on text relocations not being enforced.
Ah thanks Xake I didn’t know they upgraded the ASLR in the kernel, I’ll take that into consideration then. So the point would be “PIE is only useful if you use hardened-sources, or if you use kernel 2.6.25 onward”.
Just dropped in to say that this was a good write-up. While here, I’ll make a short note about the BSD systems and PIE. NetBSD can be compiled with PIE enabled, although the support is currently somewhat shaky, as it presumably also is in Gentoo (or at least was the last time I used hardened Gentoo). Haven’t heard of PIE in FreeBSD, but haven’t looked that closely. OpenBSD is working on making the whole system PIE by default, which is a natural choice for them. Kurt Miller presented their implementation at DCBSDCon 2009. Slides here: http://www.dcbsdcon.org/spe…
Last I checked, on FreeBSD 7, PIE was not only unsupported, but produced files that couldn’t be executed by the FreeBSD kernel; I have no idea if this was fixed in FreeBSD 8 (I know people were working on Hardened FreeBSD). To be honest, though, I’m still quite skeptical about ASLR, especially alone, as a mitigation strategy, but that’s something I can write about another day.
So, the whole attempt to harden Gentoo is a snap on CPUs with NX and x86-64 support, and a pain otherwise? I have a Pentium III that I want to use for a hardened Gentoo, should I rather go vanilla?
Not a pain, but it might be a non-trivial compromise for x86 systems; it really depends what you want to do: if it’s a webserver or a database server or something along those lines it’s probably going to be just fine to use Gentoo Hardened; if it’s a local server that is not supposed to be connected to the outside, it might be a bit too much, and the same goes if it’s a desktop that’s supposed to handle multimedia contents.
Hi Flameeyes, another nice write-up, thank you. I always enjoy your blog and learn something reading it. I was going to make a correction regarding ASLR in recent mainline Linux kernels, but I see Xake already provided it, and it is true. :) PIE by default thus makes sense from the security-conscious standpoint, and consideration should be given to some means of enabling such an option for mainline Gentoo users, IMO. Even on x86 it typically doesn’t incur that high a performance hit (as you stated, libs are typically PIC already). Where the big loss on x86-32 comes from is -fomit-frame-pointer, which becomes pointless. I must disagree with one other point. ASLR is not meant as a defense against local attacks; it is a poor defense against local vectors even on 64-bit. Firstly, locally we must also protect against more, and easier, leakage of info (like /proc/self/maps). Secondly, it is much easier to bruteforce ASLR locally than remotely. Thirdly, methods exist to analyze and weaken ASLR implementations, and those I’m aware of really require local access. Lastly, once local access is gained it is typical to target a kernel bug (much easier and more portable), especially considering the currently poor state of Linux kernel security and the fact that very few protections against kernel bug exploitation exist in the mainline kernel (this is slowly improving, however). When local, there are just so many more options for privilege escalation available that ASLR becomes fairly pointless.
BTW, I do agree ASLR is not a total solution, but the protection it offers can be greatly enhanced when used in conjunction with other measures. ASLR can be made more effective against local command exploits when combined with brute-force detection tools (e.g. a suid app being executed repeatedly) and hiding or denying access to info such as some of what is in /proc. Also, I do concede the point about long-lived daemons (which often happen to be net-facing, as you point out). Combination with an IDS or brute-force detection mechanism(s) is also relevant (but sometimes more difficult) in this case. Net-facing services which do not have an HA requirement (or which, if they do, should have a transparent failover or be suspended from a pool automatically anyway) can be restarted periodically; much software exists to manage and automate this type of task already.
Thank you for this great post. I learned a lot by reading it. I run my x86 server hardened but I am wondering if it is worth it to switch my x86-64 desktop over to a hardened profile?
Most people reading this seem to be asking questions along the lines of “should I try hardened?” Personally I have found it to be no issue at all on a variety of common server installs (32/64-bit intel), so I would vote a definite yes for your next server installation. I have no experience on the desktop, but it sounds like you still get more issues from trying to run 64-bit than you do from the hardened patches, so again for general intel 64-bit desktops I would suggest yes, give it a go. Note you can even get hardened gcc 4.4 working very nicely if you pick up the hardened overlay version, and this even works with recent uclibc, so really hardened is sufficiently mature that you should have few problems even in more obscure setups such as embedded builds! All in all a very impressive bunch of work from the hardened folks, very grateful to you all.
In all this I find no comment on whether a desktop might work out better with ‘pic’, though I did notice a comment by Zeev that might matter: “Multimedia libraries hand coded in assembly cannot ‘cope’ ”. Which libraries, I wonder, and should it matter? Very informative article; it should be included in the Guide as a reference for further reading at the least, if not incorporated under a more general Guide title 😉