For A Parallel World: Programming Pointers n.1: OpenMP

A new sub-series of For A Parallel World, to give some development pointers on how to make software make better use of parallel computing resources available with modern multicore systems. Hope some people can find it interesting!

While I was looking for a (yet unfound) scripting language that could allow me to easily write conversion scripts with FFmpeg that execute in parallel, I’ve decided to finally take a look to OpenMP, which is something I wanted to do for quite a while. Thanks to some Intel documentation I tried it out on the old benchmark I used to compare glibc, glib and libavcodec/FFmpeg for what concerned byte swapping.

The code of the byte swapping routine was adapted to this (note the fact that the index is ssize_t, which means there is opening for breakage when len is bigger than SSIZE_T_MAX):

void test_bswap_file(const uint16_t *input, uint16_t *output, size_t len) {
  ssize_t i;

#ifdef _OPENMP
#pragma omp parallel for
  for (i = 0; i < len/2; i++) {
    output[i] = GUINT16_FROM_BE(input[i]);

and I compared the execution time, through the @rdtsc@ instruction, between the standard code and the OpenMP-powered one, with GCC 4.3. The result actually surprised me. I expected I would have needed to make this much more complex, like breaking it into multiple chunks, to be able to make it fast. Instead it actually worked nice running one cycle on each:

A bar graph showing the difference in speed between standard serial bswap and a simple OpenMP-powered variant

The results are the average of 100 runs for each size, the file is on disk accessed through VFS, it’s not urandom; the values are not important since they are the output of rdtsc() so they only work for comparison.

Now of course this is a very lame way to use OpenMP, there are much better more complex ways to make use of it but all in all, I see quite a few interesting patterns arising. One very interesting note is that I can make use of this in quite a few of the audio output conversion functions in xine-lib, if I was able to resume dedicating time to work on that. Unfortunately lately I’ve been quite limited in time, especially since I don’t have a stable job.

But anyway, it’s interesting to know; if you’re a developer interested in making your code faster on modern multicore systems, consider taking some time to play with OpenMP. With some luck I’m going to write more about it in the future on this blog.