Position Independent Code is a technique used to create executable code that, as the name implies, is independent from the starting address where it is loaded (the position). This means that the pointers to data and functions in the code, as well as in the default value of pointers cannot be assumed to be always the same as the ones set after the build process in the executable file (or the library).
What this means in practical terms is that, as you can’t be sure how many and which libraries a program might load at runtime, libraries are usually loaded at dynamically-assigned addresses, thus the code need not to statically use one value as base address. When a shared library is loaded with a static base address (thus not using PIC), it has to be relocated by the runtime loader, and that causes changes to the .text
section, which breaks the assumption that sections should be either writable or executable, but not both at the same time.
When using PIC, instead, the access to symbols (data and functions) is maintained by a global offset table (GOT), so the code does not need to be relocated, only the GOT and the pointers stored in the data sections. As you can guess, this kind of indirect access takes more time than the direct access that non-PIC code uses, and this is why a lot of people hate the use of PIC in x86 systems (on the other hand, shared libraries not using PIC not only breaks the security assumption noted above, making it impossible to use mitigation technologies like NX – PaX in Linux, W^X in OpenBSD – it also increase the memory usage of software as all the .text
sections containing code will need to be relocated and thus duplicated by the copy-on-write).
Using hidden visibility is possible to reduce the performance hit caused by GOT access, by using PC-relative addressing (relative to the position on the file), if the architecture supports them of course. It does not save much for what concerns pointers in the data sections, as they will still need relocations. This is what causes arrays of strings to be written in .data.rel.ro
sections rather than .rodata
sections: the former gets relocated, the latter doesn’t, so is always shared.
So this covers shared libraries, right? Copy-on-write on shared libraries are bad, shared-libraries use PIC, pointers in data section on PIC code cause copy-on-write. But does it stop with shared libraries?
One often useful security mitigation factor is random base address choice for executables: instead of loading the code always at the same address, it is randomised between different executions. This is useful because an attacker can’t just start guessing at which address the program will be loaded. But applying this technique to non-PIC code will cause relocations in the .text
section, which in turn will break another security mitigation technique, so is not really a good idea.
Introducing PIE (Position Independent Executable).
PIE is not really anything new: it only means that even executables are built with PIC enabled. This means that while arrays of pointer to characters are often considered fine for executables, and are written to .rodata
(if properly declared) for non-PIC code, the problem with them reappears when using PIE.
It’s not much of a concern usually because the people using PIE are usually concerned with security more than performance (after all it is slower than using non-PIC code), but I think it’s important for software correctness to actually start considering that an issue too.
Under this light, not only it is important to replace pointers to characters with characters array (and similar), but hiding the symbols for executables become even more important to reduce the hit caused by PIC.
I’m actually tempted to waste some performance in my next box and start using PIE all over just to find this kind of problems more easily… yeah I’m masochist.