You might have seen the word TEXTREL thrown around in security or hardening circles, or in Gentoo Linux installation warnings, but the documentation around the term does little to explain why text relocations are a problem, so I’ve been asked to write something about it.
Let’s start with taking apart the terminology. TEXTREL is jargon for “text relocation”, which is once again more jargon, as “text” in this case means “code portion of an executable file.” Indeed, in ELF files, the .text section is the one that contains all the actual machine code.
As for “relocation”, the term is related to dynamic loaders. It is the process of modifying the data loaded from the loaded file to suit its placement within memory. This might also require some explanation.
When you build code into executables, any named reference is translated into an address instead. This includes, among others, variables, functions, constants and labels, as well as some unnamed references such as the branch destinations of statements like if and for.
These references fall into two main categories: relative and absolute. This is the easiest part to explain: a relative reference takes some address as “base” and then adds to or subtracts from it. Indeed, many architectures have a “base register” which is used for relative references. In the case of executable code, particularly with references to labels and branch destinations, relative references translate into relative jumps, which are relative to the current instruction pointer. An absolute reference is instead a fully qualified pointer to memory, or at least to the address space of the running process.
While absolute addresses are kinda obvious as a concept, they are not very practical for a compiler to emit in many cases. For instance, when building shared objects, there is no way for the compiler to know which addresses to use, particularly because a single process can load multiple objects, and they need to all be loaded at different addresses. So instead of writing to the file the actual final (unknown) address, what gets written by the compiler first – and by the link editor afterwards – is a placeholder. It might sound ironic, but an absolute reference is then emitted as a relative reference based upon the loading address of the object itself.
When the loader takes an object and loads it into memory, it’ll be mapped at a given “start” address. After that, the absolute references are inspected, and each relative placeholder is resolved to its final absolute address. This is the process of relocation. Different types of relocation (or displacement) exist, but they are not the topic of this post.
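To make the idea concrete, here is a minimal sketch of that resolution step. It assumes a made-up layout in which every relocation is simply the offset of a 64-bit little-endian slot holding an address relative to the start of the object; the relocate function and the sample image are hypothetical, and real ELF relocation records are considerably richer than this.

```python
import struct


def relocate(image: bytes, reloc_offsets: list[int], load_base: int) -> bytes:
    """Apply base-relative relocations to a copy of a loaded image.

    Each entry in reloc_offsets is the offset of a 64-bit little-endian
    slot whose content is a placeholder: an address relative to the start
    of the object (a simplified, hypothetical layout).
    """
    patched = bytearray(image)
    for off in reloc_offsets:
        (placeholder,) = struct.unpack_from("<Q", patched, off)
        # The final absolute address is the load address plus the placeholder.
        struct.pack_into("<Q", patched, off, load_base + placeholder)
    return bytes(patched)


# Slot 0 is a placeholder to be relocated, slot 1 is plain data left alone.
image = struct.pack("<QQ", 0x10, 0x1234)
loaded = relocate(image, [0], 0x7F0000000000)
```

Note that relocate has to write into the image it patches, which is exactly what becomes awkward once the patched bytes are code.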
Relocations as described up until now can apply to both data and code, but we single out code relocations as TEXTRELs. The reason for this is to be found in mitigation (or hardening) techniques, in particular the one variously called W^X, NX or PaX. The basic idea of this technique is to disallow modification of executable areas of memory, by forcing the mapped pages to be either writable or executable, but not both (W^X reads “writable xor executable”). This has a number of drawbacks, which are most clearly visible with JIT (Just-in-Time) compilation processes, including most JavaScript engines.
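As a rough illustration of what the policy means in practice, the sketch below scans mappings written in the format Linux uses for /proc/<pid>/maps and reports any that are both writable and executable. The wx_violations helper and the sample lines are made up for the example; this is not how any kernel enforces the policy, only a way to look at the outcome.

```python
def wx_violations(maps_lines):
    """Return the mapping lines that are both writable and executable."""
    violations = []
    for line in maps_lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        perms = fields[1]  # permission field, for example "r-xp" or "rw-p"
        if "w" in perms and "x" in perms:
            violations.append(line)
    return violations


# Made-up sample mappings: code, data, and one page that breaks W^X.
sample = [
    "00400000-00452000 r-xp 00000000 08:02 173521 /usr/bin/example",
    "00651000-00652000 rw-p 00051000 08:02 173521 /usr/bin/example",
    "7f0000000000-7f0000001000 rwxp 00000000 00:00 0",
]
```

On a Linux system the same function could be fed the lines of /proc/self/maps to inspect the running interpreter.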
Besides the JIT problem, there is also the problem of relocations happening in the code section of an executable. Since the relocated addresses need to be written into the code, it is not feasible (or at least not easy) to keep those pages exclusively writable or exclusively executable. Well, there are theoretical ways to produce that result, but they complicate memory management significantly, so the short version is that, generally speaking, TEXTRELs and W^X techniques don’t go well together.
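If you want to check whether an object carries text relocations, ELF flags them in the dynamic section, either with a DT_TEXTREL entry (tag 22) or with the DF_TEXTREL bit (0x4) set in DT_FLAGS (tag 30). The sketch below assumes you have already extracted the raw contents of a 64-bit little-endian dynamic section; the has_textrel name is mine, and locating the section within the file is left out.

```python
import struct

DT_NULL = 0      # terminates the dynamic section
DT_TEXTREL = 22  # relocations may modify a non-writable segment
DT_FLAGS = 30    # holds the DF_* flag bits
DF_TEXTREL = 0x4


def has_textrel(dynamic: bytes) -> bool:
    """Scan raw ELF64 little-endian dynamic entries for a TEXTREL marker."""
    entry = struct.Struct("<qQ")  # Elf64_Dyn: signed d_tag, unsigned d_val
    for offset in range(0, len(dynamic) - entry.size + 1, entry.size):
        tag, value = entry.unpack_from(dynamic, offset)
        if tag == DT_NULL:
            break
        if tag == DT_TEXTREL:
            return True
        if tag == DT_FLAGS and value & DF_TEXTREL:
            return True
    return False
```

For real files, tools such as readelf -d print the same entries in readable form.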
This is further complicated by another mitigation strategy: ASLR, Address Space Layout Randomization. In particular, ASLR fully defeats prelinking as a strategy for dealing with TEXTRELs — theoretically on a system that allows TEXTREL but has the address space to map every single shared object at a fixed address, it would not be necessary to relocate at runtime. For stronger ASLR you also want to make sure that the executables themselves are mapped at different addresses, so you use PIE, Position Independent Executable, to make sure they don’t depend on a single stable loading address.
Usage of PIE was for a long while limited to a few select hardened distributions, such as Gentoo Hardened, but it’s getting more common, as ASLR is a fairly effective mitigation strategy even for binary distributions where otherwise function offsets would be known to an attacker.
At the same time, SELinux also implements protection against text relocation, so you no longer need to have a patched hardened kernel to provide this protection.
Similarly, Android 6 is now disallowing the generation of shared objects with text relocations, although I have no idea if binaries built to target this new SDK version gain any more protection at runtime, since it’s not really my area of expertise.