Some words about global variables

While almost every practical software engineering course will tell you to avoid global variables entirely, sometimes there are reasons to use them. This usually applies only to programs, where you can easily assume that you don’t risk re-entrancy problems; libraries, on the other hand, should really try to avoid global variables, especially global state variables, by their very nature.

I’ll let you guess which software makes happy use of global variables... yeah, you guessed right: xine-ui. Now, it’s actually understandable, as most of that code has no re-entrancy requirement, and just passing around a context structure would make the code messier. There are, though, quite a few things worth noting on the topic, and this is why I decided to write something about it.

There are at least three ways to store the global variables:

  • declaring and defining them one by one, so that they are all a bunch of different symbols at linker level;
  • declaring them inside a structure (optionally anonymous) and then defining a single global pointer to that structure to be shared;
  • again declaring them inside a structure (again optionally anonymous) and then defining a single instance of that structure to be shared.
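
As a rough illustration, the three layouts could look like this in C. All the names here (volume, playing, gui, state) are made up for the example and are not taken from xine-ui.

```c
/* 1. One symbol per variable. */
int volume = 50;
int playing;

/* 2. A structure reached through a single global pointer.
 *    Only the pointer is a global; the storage is allocated at run time. */
struct gui_state {
  int volume;
  int playing;
};
struct gui_state *gui; /* null until somebody allocates the structure */

/* 3. A single global instance of an anonymous structure. */
struct {
  int volume;
  int playing;
} state = { .volume = 50 };
```

In the first case the linker sees two symbols, in the other two just one (gui or state).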

Up to Wednesday, xine-ui only used a mix of the last two cases, and I was the one who introduced the last case, for your information.

There is no “one size fits all” solution, as is almost always the case with software design problems, so a properly designed piece of software will most likely use a mix of these three cases. In xine-ui I introduced the first case early Wednesday morning, but let me first describe the various pros and cons.

Starting from the first, the problem with it is easy to guess just from reading the point: they are a bunch of different symbols. If you don’t properly hide your symbols, each one is exported and thus can be interposed, so position-independent code (PIC, or PIE in some cases) has to reach it indirectly through the GOT (Global Offset Table), the data counterpart of the PLT (Procedure Linkage Table) used for function calls, and this is an expensive operation. Also, the variables can easily get scattered between .bss, .data and (for pointers) .data.rel, which makes it more likely that they span multiple in-memory pages.

The difference between the last two, which both provide a single symbol and are thus less expensive to access even without hiding the symbols, lies in how and where the memory is allocated. Using a global pointer to the state structure allows you to allocate and deallocate it as needed: for instance, if it’s the state of a dialog window that the user has to explicitly request and later closes, it can be allocated upon request and freed after the dialog is closed. The bulk of the memory is thus allocated on the heap, but in the worst case, when nothing else ends up in .bss, a whole 4KiB page gets allocated just to hold the 4- or 8-byte pointer (so in most cases, if the structure is smaller than 4KiB, it’s still better to use a global instance).
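
A minimal sketch of the allocate-on-request pattern, assuming a hypothetical settings dialog; struct settings_dialog and the three functions are invented for the example.

```c
#include <stdlib.h>

/* Hypothetical state of a dialog the user opens only occasionally. */
struct settings_dialog {
  int  selected_tab;
  char search_text[256];
};

/* The only statically allocated part: one pointer, NULL while closed. */
static struct settings_dialog *settings;

/* Called when the user requests the dialog; returns 0 on success. */
int settings_open(void) {
  if (settings != NULL)
    return 0; /* already open, nothing to allocate */
  settings = calloc(1, sizeof(*settings));
  return settings != NULL ? 0 : -1;
}

/* Called when the dialog is closed; the heap memory goes away with it. */
void settings_close(void) {
  free(settings);
  settings = NULL;
}

/* Tells callers whether the state currently exists. */
int settings_is_open(void) {
  return settings != NULL;
}
```

While the dialog is closed, the program only pays for the pointer itself.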

On the other hand, a global instance of the state structure will be placed in the .data, .data.rel or .bss section, depending on whether it contains pointers and whether it is initialised as empty. It will thus most likely make better use of memory, as it just uses the pages of that section rather than requiring a page for a single pointer.

Now, of course, one might suppose that the first case is never useful, as the other two seem to have less invasive disadvantages. Still, that’s not so.

Let’s focus on comparing the first and the last cases, as they both use statically allocated memory (in sections) rather than dynamically allocated memory (heap). When you have a single huge structure instance that contains pointers and parameters with default values, and you’re building with PIC, the instance will fall into .data.rel, which, without prelink, will trigger a copy-on-write (COW) right at the start of the program, as the dynamic linker has to relocate it. This creates multiple problems. For instance, a single long array might end up partly on the original (disk-backed) page and partly on the new private page allocated for the process, resulting in a cacheline miss. Or, depending on the implementation, it might cause the copy-on-write of a huge .data.rel section containing data that does not need to be relocated and that might even still have its default value; this is not Linux’s case as far as I can see, but I can certainly see it being used to mitigate the problem I just described. These problems are mitigated when you use multiple variables, because each of them ends up in the section it actually needs.
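
To make the section placement more concrete, here is how a few variables would typically be distributed by GCC on an ELF target when building with -fPIC. The names are hypothetical, and the exact section names (for instance .data.rel.local rather than .data.rel) depend on the toolchain, so take the comments as the usual case rather than a guarantee.

```c
/* Zero-initialised: usually lands in .bss, which takes no space on disk. */
int window_count;

/* Non-zero default, no pointers: usually .data, no load-time relocation. */
int default_volume = 50;

/* Read-only string data used as a default below. */
static const char default_skin[] = "default";

/* A pointer with a default value: its initialiser is an address, so a PIC
 * build needs a load-time relocation and usually uses a .data.rel* section. */
const char *skin_name = default_skin;

/* The same three fields as one instance: because of the pointer, the whole
 * object usually follows it into .data.rel*, zeroes and plain defaults
 * included. */
struct {
  int window_count;
  int default_volume;
  const char *skin_name;
} config = { 0, 50, default_skin };
```

Running objdump -t on the resulting object file shows which section each symbol actually ended up in.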

But the other main difference between the three cases is in the way the code is built to access the data:

  • in the case of a global pointer, the compiler will take the address of the variable containing the pointer, dereference it to get the address of the memory area where the structure resides, add to that the offset of the member to access, and then dereference the address just obtained to access the data;
  • in the case of a global instance, the compiler will take the address of the instance directly, add to that the offset of the member, and dereference the address just obtained to access the data;
  • in the case of single variables, the compiler will just take the address of the variable and use that to access the data.
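
The three access paths above can be sketched as three tiny accessors. The names are hypothetical, the comments describe the simplest case (ignoring any GOT indirection), and a static object stands in for heap memory to keep the example short.

```c
struct player_state {
  int volume;
  int muted;
};

/* Stands in for memory that would normally come from the heap. */
static struct player_state storage = { 50, 0 };

struct player_state *state_ptr = &storage;  /* global pointer  */
struct player_state  state_obj = { 50, 0 }; /* global instance */
int                  muted;                 /* single variable */

/* Global pointer: load state_ptr, add the offset of muted, load the int. */
int muted_via_pointer(void) {
  return state_ptr->muted;
}

/* Global instance: address of state_obj plus a constant offset, then load. */
int muted_via_instance(void) {
  return state_obj.muted;
}

/* Single variable: the symbol's address is the data's address, then load. */
int muted_via_variable(void) {
  return muted;
}
```

Comparing the assembly generated for the three functions (for instance with the compiler's -S option) is a quick way to see how much of the difference survives optimisation.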

While most compilers will optimise the second case so that the difference between the last two is minimal, if any, I find it better not to leave the compiler guessing too much.

But the differences don’t end there; again we can compare the global instance method with the single variables method, this time as far as ordering is concerned. When you declare a structure, the order of the elements is exactly the one you’ve written; if you don’t pack the structure explicitly, padding will be added so that the alignment of the members is correct for the architecture. This means that this structure:

struct {
  char d;
  void *p;
} a;

will require 16 bytes on a 64-bit architecture (and 8 on a 32-bit architecture), wasting 7 or 3 bytes respectively on padding (this is why dwarves was created). While x86 and amd64 can access unaligned data just as easily, most RISC architectures can’t, and even on x86/amd64 advanced features like SSE require aligned variables.
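
The numbers are easy to check with sizeof and offsetof. This sketch gives the structure a tag (padded, a name chosen only for the example) so that offsetof can refer to it, and it assumes the usual ABIs where a pointer is aligned to its own size.

```c
#include <stddef.h>

/* Same members as the structure above, with a tag added for offsetof. */
struct padded {
  char d;
  void *p;
};

/* Bytes occupied by the whole structure, padding included. */
size_t padded_size(void) {
  return sizeof(struct padded);
}

/* Padding bytes the compiler inserted between d and p. */
size_t padded_hole(void) {
  return offsetof(struct padded, p) - sizeof(char);
}
```

The pahole tool from the dwarves package reports the same kind of holes from a binary's debug information.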

So what is the relation between this and the two methods I described? Well, as I said, the order you use for the structure will remain unchanged. While this lets you order the members so that those accessed together stay together and fall in the same cacheline, the padding might waste quite a bit of space. The declaration order of separate variables, instead, isn’t binding, and the linker can easily reorder them to fill the holes on its own. It can also make use of more advanced optimisations; for instance, you can use my method for reducing the linking of unused symbols.
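
The cost of a fixed member order shows up with a hypothetical three-member structure: written in “logical” order it carries two holes, while the same members reordered by hand need only one. Both structures are invented for the example, and the sizes again assume pointers aligned to their own size.

```c
#include <stddef.h>

/* Members in the order they came to mind: a hole follows each char. */
struct as_written {
  char a;
  void *p;
  char b;
};

/* Same members, reordered by hand so that the two chars sit together. */
struct reordered {
  void *p;
  char a;
  char b;
};

/* Bytes saved purely by changing the order of the members. */
size_t bytes_saved(void) {
  return sizeof(struct as_written) - sizeof(struct reordered);
}
```

Inside a structure this rearrangement has to be done by hand; separate variables leave that freedom to the toolchain.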

If you really, really, really know that some variables are always accessed together, and thus should stay on the same cacheline and not be reordered, then add them to a small structure. Not a huge one, just a small one with as few variables as possible. It will be treated like a single element and will lose some of the advantages of having the variables split (reordering, direct access to data), but as this is usually an exception, it shouldn’t be much of a problem.
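
A small sketch of that exception, with hypothetical names. The explicit _Alignas(64) is an addition made for this example, not something the advice above requires: it assumes a 64-byte cacheline and a C11 compiler, and it is what actually guarantees that the two members can’t straddle a line boundary.

```c
/* Hypothetical pair that is always read and written together. */
struct playback_pos {
  long position_ms;
  long length_ms;
};

/* Kept as one small unit, aligned to an assumed 64-byte cacheline. */
_Alignas(64) struct playback_pos playback;

/* Everything else stays a separate variable, free to be placed anywhere. */
int osd_timeout = 3;
int fullscreen;

/* Both members are updated in one go, touching a single cacheline. */
void playback_seek(long position_ms, long length_ms) {
  playback.position_ms = position_ms;
  playback.length_ms = length_ms;
}
```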
