This Time Self-Hosted

Some notes about multi-threading

Ah, multithreading, such a wonderful concept, as it allows you to create programs in such a way that you don’t have to implement an event loop (for the most part). Anybody who ever programmed in the days of DOS has at least once implemented an event loop to write an interactive program. During my high school days I wrote a quite complex one for a game I designed for a class. Too bad I lost it 🙁

Multithreading is also an interesting concept nowadays, now that all the major CPUs are multi-core; servers have been for some time already, but we know that the mainstream is always behind on these things, right?

So, now that multithreading is even more interesting, it’s important to design programs, and even more importantly libraries, to be properly multithreaded.

I admit I’m no big expert on multithreading problems, but I do know a few things that come in useful when developing. One of these is that static variables are evil.

Static variables are evil because they are implicitly shared between different threads. For constants this is good, as they use less memory, but for variables this is bad, because you might overwrite the data another thread is using.
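
To make the problem concrete, here is a minimal sketch with made-up names (format_id is a hypothetical helper, not taken from any real library). It shows the overwrite even within a single thread; with several threads the same thing happens at unpredictable moments.

```c
#include <stdio.h>

/* Hypothetical helper: formats an id into a function-static buffer. */
const char *format_id(int id) {
  static char buf[32]; /* a single buffer shared by all callers and threads */
  snprintf(buf, sizeof(buf), "id-%d", id);
  return buf;
}

/* Returns 1 if the second call overwrote the result of the first one. */
int first_result_clobbered(void) {
  const char *first = format_id(1);
  const char *second = format_id(2);
  /* Both pointers refer to the same buffer, which now holds "id-2". */
  return first == second && first[3] == '2';
}
```

Whoever kept the pointer returned by the first call is now looking at the data written by the second one.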

For this reason, one of the easiest things to check in a library to tell whether it’s multithread-safe is whether it relies on static variables. If it does, it’s likely not thread safe, and almost surely not thread optimised.

You could actually be quite thread safe even when using static variables. The easy way to do that is to have a mutex protecting each and every access to the variable; this way only one thread at a time can access it, and no one can overwrite someone else’s data.
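
As a rough sketch of that approach, assuming POSIX threads (the names next_ticket and run_workers are invented for the example, and error checking is left out for brevity):

```c
#include <pthread.h>

/* Hypothetical function keeping its state in a static variable.
 * Every access to the variable happens with the mutex held. */
long next_ticket(void) {
  static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
  static long counter = 0;
  long value;

  pthread_mutex_lock(&lock);
  value = ++counter;
  pthread_mutex_unlock(&lock);
  return value;
}

static void *worker(void *arg) {
  (void)arg;
  for (int i = 0; i < 10000; i++)
    next_ticket();
  return NULL;
}

/* Runs four threads and returns how many tickets they took in total. */
long run_workers(void) {
  pthread_t threads[4];

  for (int i = 0; i < 4; i++)
    pthread_create(&threads[i], NULL, worker, NULL);
  for (int i = 0; i < 4; i++)
    pthread_join(threads[i], NULL);
  return next_ticket() - 1;
}
```

Without the mutex, some of the increments would likely get lost; with it, the count is always exact, at the price of the threads queueing up on the lock.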

That causes a problem though, as it serialises the execution of a given function. Take for instance a function that requests data over the network with an arbitrary protocol (we don’t care which one), saves it in a static buffer, and then parses it, filling a structure with the data received. If such a function is used in a multithreaded program, it has to be protected by a mutex, as it uses a static buffer. If four threads call that function almost simultaneously (and that might happen, especially on multi-core systems!), the first one to arrive gets the mutex, and the other three wait until the first one has completed processing. Although on a multicore system you’d generally have other processes scheduled to run at that point, you’re still going to waste time waiting for one thread to complete its operation before the next one can resume.

This is extremely annoying, especially now that the future of CPUs seems to be an increase in the number of cores rather than in their speed (as far as I can see, we’re hovering around a physical limit of 3GHz). The correct way to handle such a situation is not to use a static buffer, but rather a heap-allocated one, even if that is slightly slower for a single thread (as you have to allocate the memory and free it afterward); this way the four threads are independent and can run simultaneously. For this reason, libraries should try never to use static buffers, as they cannot know whether the software using them is multi-threaded or not.
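
A sketch of the heap-allocated variant could look like the following. The network part is faked: fetch_reply is a hypothetical stand-in that just copies a fixed string, and the "status=" format is made up for the example.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the network read: copies a fixed reply
 * into buf and returns the length of the resulting string. */
static size_t fetch_reply(char *buf, size_t len) {
  const char reply[] = "status=42";
  size_t n = sizeof(reply) < len ? sizeof(reply) : len;

  if (n == 0)
    return 0;
  memcpy(buf, reply, n);
  buf[n - 1] = '\0';
  return n - 1;
}

/* Each call owns its buffer, so no mutex is needed around it.
 * Returns the parsed status, or -1 on error. */
int fetch_status(void) {
  const size_t len = 256;
  char *buf = malloc(len);
  int status = -1;

  if (buf == NULL)
    return -1;
  if (fetch_reply(buf, len) > 0 && strncmp(buf, "status=", 7) == 0)
    status = atoi(buf + 7);
  free(buf);
  return status;
}
```

Four threads calling fetch_status() at the same time each work on their own allocation and never wait for each other, apart from whatever the allocator does internally.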

When a library is blatantly not thread-safe, there is an even bigger problem, which can be solved in two ways. The first is to limit access to that library to a single thread. This way there are no problems with threading, but then all the requests for that library have to be passed to that thread, and the thread has to answer them; while cheaper than inter-process communication (IPC), inter-thread communication (ITC) is still more expensive than using a properly thread-safe library.
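
One possible shape for the single-thread approach, assuming POSIX threads, is a one-slot mailbox guarded by a mutex and a condition variable. Everything here is invented for illustration: unsafe_lib_square stands in for the non-thread-safe library call, and a real program would probably want a proper queue rather than a single slot.

```c
#include <pthread.h>

/* Hypothetical stand-in for a call into a library that is not thread-safe. */
static int unsafe_lib_square(int x) {
  static int scratch; /* static state: unsafe to call from two threads */
  scratch = x * x;
  return scratch;
}

/* Single-slot mailbox shared between the callers and the owner thread. */
enum box_state { BOX_EMPTY, BOX_REQUEST, BOX_REPLY, BOX_QUIT };
static pthread_mutex_t box_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t box_cond = PTHREAD_COND_INITIALIZER;
static enum box_state box_status = BOX_EMPTY;
static int box_value;

/* The only thread that ever calls into the library. */
static void *owner_thread(void *arg) {
  (void)arg;
  pthread_mutex_lock(&box_lock);
  for (;;) {
    while (box_status != BOX_REQUEST && box_status != BOX_QUIT)
      pthread_cond_wait(&box_cond, &box_lock);
    if (box_status == BOX_QUIT)
      break;
    int request = box_value;
    pthread_mutex_unlock(&box_lock); /* do not hold the lock during the call */
    int reply = unsafe_lib_square(request);
    pthread_mutex_lock(&box_lock);
    box_value = reply;
    box_status = BOX_REPLY;
    pthread_cond_broadcast(&box_cond);
  }
  pthread_mutex_unlock(&box_lock);
  return NULL;
}

/* Moves the mailbox from EMPTY to new_status, waiting for it to be free. */
static void post(enum box_state new_status, int value) {
  while (box_status != BOX_EMPTY)
    pthread_cond_wait(&box_cond, &box_lock);
  box_value = value;
  box_status = new_status;
  pthread_cond_broadcast(&box_cond);
}

/* Callable from any thread: posts a request and waits for the answer. */
int square_via_owner(int x) {
  int result;

  pthread_mutex_lock(&box_lock);
  post(BOX_REQUEST, x);
  while (box_status != BOX_REPLY)
    pthread_cond_wait(&box_cond, &box_lock);
  result = box_value;
  box_status = BOX_EMPTY; /* free the slot for the next caller */
  pthread_cond_broadcast(&box_cond);
  pthread_mutex_unlock(&box_lock);
  return result;
}

/* Starts the owner, sends one request, then shuts the owner down. */
int square_demo(void) {
  pthread_t owner;
  int result;

  pthread_create(&owner, NULL, owner_thread, NULL);
  result = square_via_owner(7);
  pthread_mutex_lock(&box_lock);
  post(BOX_QUIT, 0);
  pthread_mutex_unlock(&box_lock);
  pthread_join(owner, NULL);
  return result;
}
```

Every request costs a couple of lock acquisitions and wake-ups on top of the library call itself, which is the overhead I was referring to.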

The other option is to protect every use of the library with a mutex. This makes a library thread-safe if it’s at least re-entrant (that is, no function depends on the status of global variables set by other functions), but it acts the same way the “big kernel lock” does: it does not allow you to run the same function from two threads at once or, if the functions use shared global variables, even any two functions of that library from two threads at once.

How should libraries behave, then, when they need to keep track of state? Well, the easiest way is obviously to have a “context” parameter, a pointer to a structure that keeps all the needed state data, allowing two threads to use different contexts and call the library simultaneously.
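
A minimal sketch of such an API, with a hypothetical mylib_ prefix and made-up fields, might be:

```c
#include <stdlib.h>

/* Hypothetical library context: all the state lives here, not in globals. */
typedef struct {
  int last_error;         /* error code of the last call on this context */
  unsigned long requests; /* number of calls made on this context */
} mylib_ctx;

mylib_ctx *mylib_new(void) {
  return calloc(1, sizeof(mylib_ctx));
}

void mylib_free(mylib_ctx *ctx) {
  free(ctx);
}

/* Returns 0 on success, -1 on failure; the error is stored in the context. */
int mylib_request(mylib_ctx *ctx, int value) {
  ctx->requests++;
  if (value < 0) {
    ctx->last_error = 1;
    return -1;
  }
  ctx->last_error = 0;
  return 0;
}
```

Two threads that each create their own mylib_ctx never touch the same memory, so they can call mylib_request() at the same time without any locking.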

Sometimes, though, you just need to keep something similar to an errno variable, which is global and set by all your functions. There’s no way to handle that case gracefully through mutexes, but there’s an easy way to do it through Thread-Local Storage (TLS). If you mark the variable as thread-local, then every thread sees its own copy of that variable, and you don’t need an explicit mutex to handle it (the implementation might use a mutex, I don’t really know the underlying details).
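
Here is a small sketch of an errno-like variable, assuming POSIX threads and the __thread keyword supported by GCC and Clang (C11 spells it _Thread_local). The mylib_ names and the error value 22 are arbitrary choices for the example.

```c
#include <pthread.h>

/* Hypothetical errno-like variable: each thread gets its own copy. */
static __thread int mylib_errno;

/* Returns 0 on success, -1 on failure, recording the error per thread. */
int mylib_do(int value) {
  if (value < 0) {
    mylib_errno = 22; /* arbitrary error code for the example */
    return -1;
  }
  mylib_errno = 0;
  return 0;
}

int mylib_last_error(void) {
  return mylib_errno;
}

/* Fails on purpose and reports what this thread sees as the last error. */
static void *failing_thread(void *out) {
  mylib_do(-1);
  *(int *)out = mylib_last_error();
  return NULL;
}

/* Returns 1 if the other thread's failure left this thread's value alone. */
int errors_are_per_thread(void) {
  pthread_t thread;
  int other = 0;

  mylib_do(1); /* succeeds: this thread's copy is now 0 */
  if (pthread_create(&thread, NULL, failing_thread, &other) != 0)
    return 0;
  pthread_join(thread, NULL);
  return other == 22 && mylib_last_error() == 0;
}
```

The second thread sees its own failure, while the calling thread still sees the result of its own last call.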

This is also quite useful for multi-threaded programs that would like to use global variables rather than having to pass a thread structure to all the functions. Take this code for instance:

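/* Instantiated a few times simultaneously */
void *mythread(void *address) {
  /* Per-thread state, allocated on the heap and passed to every call */
  mythread_state_t *state = malloc(sizeof(mythread_state_t));
  if (state == NULL)
    return NULL;

  set_address(state, address);
  do_connection(state);
  check_data(state);

  do_more(state);

  free(state);
  return NULL;
}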

While for library API calls having a context parameter is absolutely a good idea, if the code has no reason to be reentrant, passing it as a parameter might be a performance hit. At the same time, while using global variables in libraries is a very bad idea, for programs it’s not always that bad, and it can actually be useful to avoid passing parameters around or using up more memory. You could then have the same code done this way:

__thread mythread_state_t thread_state;

/* Instantiated a few times simultaneously */
void *mythread(void *address) {
  /* Each function works on this thread's own thread_state directly */
  set_address(address);
  do_connection();
  check_data();

  do_more();

  return NULL;
}

The thread_state variable would be one per thread, needing neither a mutex to protect it, nor to be passed to every function.

There are a few notes about libraries and thread safety which I’d like to discuss, but I’ll leave those for another time. Two tech posts a day is quite a lot already, and I need to resume my paid job now.

Comments 5
  1. What happens with static const variables? For example: int function() { static int const i = 10; } Are they initialized at runtime or during program loading? In C++ most initializations happen at runtime and also forbid using static const variables in multithreaded programs. But what about variables where the value is known at compile time?

  2. YES! Exactly what I was looking at atm. Static variables, threads and mutexes. Cleared it up a bit for me. Thankyouthankyou! =)

  3. Static constants in C cannot have non-constant initializers: their content is moved directly into .rodata or .data.rel.ro (depending on whether you have pointers in them or not), and they are initialised either at build time (.rodata) or at runtime linking (before execution starts), so they are perfectly fine in multithreaded programs.

  4. > The reason why static variables are evil is because they are implicitly shared between different threads. I think this is a huge oversimplification: having shared data is essentially the only reason to use multithreading (as opposed to using multiple processes). As for libraries with global state, I think that the “context” trick is to be preferred to TLS in terms of simplicity, maintainability and ease of testing. In C++ this can easily be realized by implementing your API as nonstatic member functions of a class, so that the implicit this acts as the additional context parameter. The performance hit is negligible for most situations.

  5. I was referring to variables static to functions, not to _global_ variables, which are another issue entirely. Note that anyway, data sharing is usually good when there is one producer and one or more consumers. In my examples above, there are no producers or consumers, there were just functions using a static buffer. But for libraries, anyway, using global variables is also a bad idea, and the context thing is way preferred over that. As far as TLS is concerned, it might be simpler to take care of really simple errno-like global state through a TLS variable rather than a context, if there is no other reason to use a context but that. But as I think I made clear (and if I didn’t, I’ll see to make it clear in a future post), TLS is most useful for the final programs, rather than the libraries. Good libraries should always avoid global state anyway, to be both reentrant and thread-safe. As for the overhead of passing a parameter… in C++ you always do that implicitly so you don’t really have any problem with that, as you said; in C the thing is different, and often there is high-performance code that might as well just prefer not to have one register cluttered by passing a context parameter.
