More threads is not always a good solution

You might remember that some time ago I took over unpaper as I used (and sort of use still) to pre-filter the documents I scan to archive in electronic form. While I’ve been very busy in the past few months, I’m now trying to clear out my backlog, and it turns out that a good deal of it involves unpaper. There are bugs to fix and features to implement, but also patches to evaluate.

One of the most recent patches I received is designed to help with the performance of the tool, which are definitely not what Jens, the original author, had in mind when he came up with the code. Better performance is something that about everybody would love to have at this point. Unfortunately, it turns out that the patch is not working out as intended.

The patch is two-fold: from one side, it makes (optional) use of OpenMP to parallelize the code in hope to speed it up. Given that most of the code is a bunch of loops, it seemed obvious that using multithreading would speed things out, right? Well, first try after applying it showed very easily that it slows down, at least on Excelsior, which is a 32-cores system. While it would take less than 10 seconds for the first test to run without OpenMP, it would take over a minute with it, spinning up all the cores for a 3000% CPU usage.

A quick test shows that forcing the number of threads to 8, rather than leaving it unbound, makes it actually faster than the non-OpenMP variant. This means that there are a few different variables in play that needs to be tuned for performance to improve. Without going into profiling the code, I can figure out a few things that can go wrong with unchecked multithreading:

  • extensive locking when each worker thread is running, either because they are all accessing the same memory page, or because the loop needs a “reduced” result (e.g. a value has to be calculated as the sum of values calculated within a parallelized loop); in the case of unpaper I’m sure both situations happen fairly often in the looping codepath.
  • cache trashing: as the worker threads jump around the memory area to process, it no longer is fetching a linear amount of memory;
  • something entirely more complicated.

Beside being obvious that doing the “stupid” thing and just making all the major loops parallel, this situation is bringing me to the point where I finally will make use of the Using OpenMP book that I got a few years back and I only started reading after figuring out that OpenMP was not ready yet for prime time. Nowadays OpenMP support in Linux improved to the point it’s probably worth taking another look at it, and I guess unpaper is going to be the test case for it. You can expect a series of blog posts on the topic at this point.

The first thing I noticed while reading the way OpenMP handles shared and private variables, is that the const indication is much stronger when using OpenMP. The reason is that if you tell the code that a given datum is not going to change (it’s a constant, not a variable), it can easily assume direct access from all the threads will work properly; the variable is shared by default among all of them. This is something that, for non-OpenMP programs, is usually inferred from the SSA form — I assume that for whatever reason, OpenMP makes SSA weaker.

Unfortunately, this also means that there is one nasty change tha tmight be required to make code friendlier to OpenMP, and that is a change in prototypes of functions. The parameters to a function are, for what OpenMP is concerned, variables, and that means that unless you declare them const, it won’t be sharing them by default. Within a self-contained program like unpaper, changing the signatures of the functions so that parameters are declared, for instance, const int is painless — but for a library it would be API breakage.

Anyway, just remember: adding multithreading is not the silver bullet you might think it is!

P.S.: I’m currently trying to gauge interest on a volume that would collect, re-edit, and organize all I written up to now on ELF. I’ve created a page on leanpub where you can note down your interest, and how much you think such a volume would be worth. If you would like to read such a book, just add your interest through the form; as soon as at least a dozen people are interested, I’ll spend all my free time working on the editing.

4 thoughts on “More threads is not always a good solution

  1. does something like this help, and save you from having to change the API?int foo(int x){ const int x2 = x; …}

    Like

  2. It should be noted that you can change a parameter to const in a function implementation without changing it in its prototype. This is valid C/C++ code:int foo(int bar);int foo(const int bar) { return bar; }What you cannot do is change a pointer into a pointer to const (int* -> const int*) or a reference into a const reference. You can only make the pointer itself const (int* const).

    Like

  3. Sam, that would work but it’s nasty to write and maintain.Florian, are you sure it works with _all_ compilers? I know it should work with GCC but I remember something similar not working with Sun’s.

    Like

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

w

Connecting to %s