TEXTRELs (Text Relocations) and their impact on hardening techniques

You might have seen the word TEXTREL thrown around security or hardening circles, or used in Gentoo Linux installation warnings, but one thing that is clear out there is that the documentation around this term is not very useful to understand why they are a problem. so I’ve been asked to write something about it.

Let’s start with taking apart the terminology. TEXTREL is jargon for “text relocation”, which is once again more jargon, as “text” in this case means “code portion of an executable file.” Indeed, in ELF files, the .text section is the one that contains all the actual machine code.

As for “relocation”, the term is related to dynamic loaders. It is the process of modifying the data loaded from the loaded file to suit its placement within memory. This might also require some explanation.

When you build code into executables, any named reference is translated into an address instead. This includes, among others, variables, functions, constants and labels — and also some unnamed references such as branch destinations on statements such as if and for.

These references fall into two main typologies: relative and absolute references. This is the easiest part to explain: a relative reference takes some address as “base” and then adds or subtracts from it. Indeed, many architectures have a “base register” which is used for relative references. In case of executable code, particularly with the reference to labels and branch destinations, relative references translate into relative jumps, which are relative to the current instruction pointer. An absolute reference is instead a fully qualified pointer to memory, well at least to the address space of the running process.

While absolute addresses are kinda obvious as a concept, they are not very practical for a compiler to emit in many cases. For instance, when building shared objects, there is no way for the compiler to know which addresses to use, particularly because a single process can load multiple objects, and they need to all be loaded at different addresses. So instead of writing to the file the actual final (unknown) address, what gets written by the compiler first – and by the link editor afterwards – is a placeholder. It might sound ironic, but an absolute reference is then emitted as a relative reference based upon the loading address of the object itself.

When the loader takes an object and loads it to memory, it’ll be mapped at a given “start” address. After that, the absolute references are inspected, and the relative placeholder resolved to the final absolute address. This is the process of relocation. Different types of relocation (or displacements) exists, but they are not the topic of this post.

Relocations as described up until now can apply to both data and code, but we single out code relocations as TEXTRELs. The reason for this is to be found in mitigation (or hardening) techniques. In particular, what is called W^X, NX or PaX. The basic idea of this technique is to disallow modification to executable areas of memory, by forcing the mapped pages to either be writable or executable, but not both (W^X reads “writable xor executable”.) This has a number of drawbacks, which are most clearly visible with JIT (Just-in-Time) compilation processes, including most JavaScript engines.

But beside JIT problem, there is the problem with relocations happening in code section of an executable, too. Since the relocations need to be written to, it is not feasible (or at least not easy) to provide an exclusive writeable or executable access to those. Well, there are theoretical ways to produce that result, but it complicates memory management significantly, so the short version is that generally speaking, TEXTRELs and W^X techniques don’t go well together.

This is further complicated by another mitigation strategy: ASLR, Address Space Layout Randomization. In particular, ASLR fully defeats prelinking as a strategy for dealing with TEXTRELs — theoretically on a system that allows TEXTREL but has the address space to map every single shared object at a fixed address, it would not be necessary to relocate at runtime. For stronger ASLR you also want to make sure that the executables themselves are mapped at different addresses, so you use PIE, Position Independent Executable, to make sure they don’t depend on a single stable loading address.

Usage of PIE was for a long while limited to a few select hardened distributions, such as Gentoo Hardened, but it’s getting more common, as ASLR is a fairly effective mitigation strategy even for binary distributions where otherwise function offsets would be known to an attacker.

At the same time, SELinux also implements protection against text relocation, so you no longer need to have a patched hardened kernel to provide this protection.

Similarly, Android 6 is now disallowing the generation of shared objects with text relocations, although I have no idea if binaries built to target this new SDK version gain any more protection at runtime, since it’s not really my area of expertise.

If my router was a Caterpie, it’s now a Metapod

And this shows just how geek I can be, by using as the title for the post one of the Pokémons whose only natural attack is … Harden!

Somebody made me notice that I’m getting more scrupulous about security lately, writing more often about it and tightening my own systems. I guess this is a good thing, as becoming responsible for this kind of stuff is important for each of us: if Richard Clarke scared us with it we’re now in the midst of an interesting situation with the stuxnet virus, which gained enough attention that even BBC World News talked about it today.

So what am I actually doing for this? Well, beside insisting on fixing packages when there are even possible security issues, which is a general environment solution, I’ve decided to start hardening my systems starting from the main home router.

You might remember my router as I wrote about it before, but to refresh your mind, and explain it to those who didn’t read about it before, my router is not an off-the-shelf blackbox, and neither it is a reflashed off-the-shelf box that runs OpenWRT or similar firmwares. It is, for the most part, a “standard” system. It’s a classic IBM PC-compatible system, with a Celeron D as CPU, 512MB of RAM and, instead of standard HDDs or SDDs, it runs off a good old fashioned CompactFlash card, with a passive adapter to EIDE.

As “firmware” (or in this case we should call it operating system I guess) it always used a pre-built Gentoo; I’m not using binpkgs, I’m rather building the root out of a chroot. Originally, it used a 32-bit system without fortified sources — as of tonight it runs Gentoo Hardened, 64-bit, with PaX and ASLR; full PIE and full SSP enabled. I guess a few explanations for the changes are worth it.

First of all, why 64-bit? As I described it, there is half a gigabyte of RAM, which fits 32-bit just nicely, no need to get over the 4GiB mark; and definitely a network router is not the kind of appliance you expect powerful CPUs to be needed. So why 64-bit? Common sense wants that 64-bit code requires more memory (bigger pointers) and has an increased code size which both increase disk usage and causes cache to be used up earlier. Indeed, at first lance it seems like this does not fall into two of the most common categories for which 64-bit is suggested: databases (for the memory space) and audio/video encoding (for the extra registers and instructions). Well, I’ll add a third category: a security-oriented hardened system of any kind, as long as ASLR is in the mix.

I have written my doubts about ASLR usefulness — well, time passes and one year later I start to see why ASLR can be useful, mostly when you’re dealing with local exploits. For network services, I still maintain that most likely you cannot solve much with ASLR without occasionally restarting them, since less and less of them actually fork one process from another, while most will nowadays prefer using threads to processes for multiprocessing (especially considering the power of modern multicore, multithread systems). But for ASLR to actually be useful you need two things: relocatable code and enough address space to actually randomize the load addresses; the latter is obviously provided by a 64-bit address space (or is it 48-bit?) versus the 32-bit address space x86 provides. Let’s consider a bit the former.

In the post I linked before, you can see that to have ASLR you end up with either having text relocations on all the executables (which are much more memory hungry than standard executables — and collide with another hardened technique) or with Position-Independent Executables (PIE) that are slightly more memory hungry than normal (because of relocations) but also slower because of sacrificing at least one extra register to build PIC. Well, when using x86-64, you’re saved by this problem: PIC is part of the architecture to the point that there isn’t really much to sacrifice when building PIC. So the bottomline is that to use ASLR, 64-bit is a win.

But please, repeat after me: the security enhancement is ASLR, not PIE.

Okay so that covers half of it; what about SSP? Well, Stack Smashing Protection is a great way to … have lots to debug, I’m afraid. While nowadays there should be much fewer bugs, and the wide use of fortified sources caused already a number of issues to be detected even by those not running a hardened compiler, I’m pretty sure sooner or later I’ll hit some bug that nobody hit before, mostly out of my bad karma, or maybe just because I like using things experimental, who knows. At any rate, it also seems to me like the most important protection here; if anything tries to break the stack boundary, kill it before it can become something more serious; if it’s a DoS, well, it’s annoying, but you don’t risk your system to be used as a spambot (and we definitely have enough of those!) — at least for what concerns C code, it does not do any good for bad codebases unfortunately.

Now the two techniques combined require a huge amount of random data, and that data is fetched from the system entropy pool; given that the router is not running with an HDD (which has non-predictable seek times and thus is a source of entropy), has no audio or video devices to use, and has no keyboard/mouse to gather entropy from, it wouldn’t be extremely unlikely to think of a possible entropy depletion attack. Thankfully, I’m using an EntropyKey to solve that problem.

Finally, to be on the safe side, I enabled PaX (which I keep repeating, has a much more meaningful name on the OpenBSD implementation; W^X), which allows for pages of executable code to be marked as read-only, non-writeable, and vice-versa writeable pages are non-executable. This is probably the most important mitigation strategy I can think of. Unfortunately, the Celeron D has no nx bit support (heck, it came way after my first Athlon64 and it lacks such a feature? Shame!) but PaX does not have that much of a hit on a similar system that mostly idles at 2% of CPU usage (even though I can’t seem to get the scaler to work at all).

One thing I had to be wary of is that enabling UDEREF actually caused my router not to start, reporting memory corruption when init started.. so if you see a problem like that, give it a try to disable it.

Unfortunately, this only protects me on the LAN side, since the WAN is still handled through a PCI card that is in truth only a glorified Chinese router using a very old 2.4 kernel.. which makes me shiver to think about. Luckily there is no “trusted” path from there to the rest of the LAN. On the other hand if somebody happens to have an ADSL2+ card that can be used with a standard Linux system, with the standard kernel and no extra modules especially, then I’d be plenty grateful.

More details on how I proceeded to configure the router will come in futher posts, this one is long enough on its own!

The PIE is not exactly a lie…

Update (2017-09-10): The bottom line of this article changed since the 8 years it was posted, quite unsurprisingly. Nowadays, vanilla kernel has a decent ASLR and so everyone does actually have advantages in building everything as PIE. Indeed, that’s what Arch Linux and probably most other binary distributions do exactly that. The rest of the technical description of why this is important and how is still perfectly valid.

One very interesting misconception related to Gentoo, and especially the hardened sub-profile, is related to the PIE (Position-Independent Executable) support. This is probably due to the fact that up to now the hardened profile always contained PIE support, and since it relates directly to PIC (Position-Independent Code) and PIC as well is tied back to hardened support, people tend to confuse what technique is used for what scope.

Let’s start with remembering that PIC is a compilation option that produces the so-called relocatable code; that is, code that is valid no matter what base address it is loaded at. This is a particularly important feature for shared objects: to be able to be loaded by any executable and still share the code pages in memory, the code needs to be relocatable; if it’s not, a text relocation has to happen.

Relocating the “text” means changing the executable code segment so that the absolute addresses (of both functions and data — variables and constants) is correct for the base address the segment was loaded at. Doing this, causes a Copy-on-Write for the executable area, which among other things, wastes memory (each process running will have to have its private copy of the executable memory area, as well as the variable data memory area). This is the reason why shared objects in almost any modern distribution are built relocatable: faster load time, and reduced memory consumption, at the cost of sacrificing a register.

An important note here: sacrificing a register, which is something needed for PIC to keep the base address of the loaded segment, is a minuscule loss for most architectures, with the notable exception of x86, where there are very few general registers to use. This means that while PIC code is slightly (but not notably) slower for any other architecture, it is a particularly heavy hit on x86, especially for register-hungry code like multimedia libraries. For this reason, shared objects on x86 might still be built without PIC enabled, at the cost of load time and memory, while for most other architectures, the linker will refuse to produce a shared object if the object files are not built with PIC.

Up to now, I said nothing about hardened at all, so let me introduce the first relation between hardened and PIC: it’s called PaX in Hardened Linux, but the same concept is called W^X (Write xor eXecute) in OpenBSD – which is probably a very descriptive name for a programmer – NX (No eXecution) in CPUs, and DEP (Data Execution Prevention) in Windows. To put it in layman terms, what all these technologies do is more or less the same: they make sure that once a memory page is loaded with executable code, it cannot be modified, and vice-versa that a page that can be modified cannot be executed. This is, like most of the features of Gentoo Hardened, a mitigation strategy, that limits the effects of buffer overflows in software.

For NX to be useful, you need to make sure that all the executable memory pages are loaded and set in stone right away; this makes text relocation impossible (since they consists of editing the executable pages to change the absolute addresses), and also hinders some other techniques, such as Just-In-Time (JIT) optimisation, where executable code is created at runtime from an higher, more abstract language (both Java and Mono use this technique), and C nested functions (or at least the current GCC implementation, that makes use of trampolines, and thus require executable stack).

Does any of this mean that you need PIC-compiled executables (which is what PIE is) to make use of PaX/NX? Not at all. In Linux, by default, all executables are loaded at the same base address, so once the code is built, it doesn’t have to be relocated at all. This also helps optimising the code for the base case of no shared object used, as that’s not going to have to deal with PIC-related problems at all (see this old post for more detailed information about the issue).

But in the previous paragraph I did write some clue as to what the PIE technique is all about; as I said, the reason why PIE is not necessary is that by default all executables are loaded at the same address; but if they weren’t, then they’d be needing either text relocations or PIC (PIE), wouldn’t they? That’s the reason why PIE exists indeed. Now, the next question would be, how does PIE relate to hardened? Why does the hardened toolchain use PIE? Does using it make it magically possible to have a hardened system?

Once again, no, it’s not that easy. PIE is not, by itself, neither a security measure nor a mitigation strategy. It is, instead, a requirement for the combined use of two mitigation strategy, the first is the above-described NX idea (which rules out the idea of using text relocations entirely), while the second is is ASLR (Address Space Layout Randomization). To put this technique also in layman terms, you should consider that a lot of exploit require that you change the address a variable points to, so you need to know both the address of that variable, and the address to point it to; to find this stuff out, you can usually try and try again until you find the magic values, but if you randomize the addresses where code and data are loaded each time, you make it much harder for the attacker to guess them.

I’m pretty sure somebody here is already ready to comment that ASLR is not a 100% safe security measure, and that’s absolutely right. Indeed here we have to make some notes as to which situation this really works out decently: local command exploits. When attacking a server, you’re already left to guess the addresses (since you don’t know which of many possible variants of the same executable the server is using; two Gentoo servers rarely have the same executable either, since they are rebuilt on a case by case basis — and sometimes even with the same exact settings, the different build time might cause different addresses to be used); and at the same time, ASLR only changes the addresses between two executions of the same program: unless the server uses spawned (not cloned!) processes, like inetd does (or rather did), then the address space between two requests on the same server will be just the same (as long as the server doesn’t get restarted).

At any rate, when using ASLR, the executables are no longer loaded all at the same address, so you either have to relocate the text (which is denied by NX) or you’ve got to use PIE, to make sure that the addresses are all relative to the specified base address. Of course, this also means that, at that point, all the code is going to be PIC, losing a register, and thus slowed down (a very good reason to use x86-64 instead of x86, even on systems with less than 4GiB of RAM).

Bottomline of the explanation: using the PIE component of the hardened toolchain is only useful when you have ASLR enabled, as that’s the reason why the whole hardened profile uses PIE. Without ASLR, you will have no benefit in using PIE, but you’ll have quite a few drawbacks (especially on the old x86 architecture) due to building everything PIC. And this is also the same reason why software that enables PIE by itself (even conditionally), like KDE 3, is doing silly stuff for most user systems.

And to make it even more clear: if you’re not using hardened-sources as your kernel, PIE will not be useful. This goes for vanilla, gentoo, xen, vserver sources all the same. (I’m sincerely not sure how this behave when using Linux containers and hardened sources).

If you liked this explanation that costed me some three days worth of time to write, I’m happy to receive appreciation tokens — yes this is a shameless plug, but it’s also to remind you that stuff like this is the reason why I don’t write structured documentation and stick to simple, short and to the point blogs.