Sub-optimal optimisations?

While writing “Implications of pure and constant functions” I’ve been testing some code that I expected GCC to optimise. I was surprised to find that a lot of my test cases were not optimised at all.

I’m honestly not sure whether this is due to bugs in GCC, to me expecting the compiler to be smarter than it can feasibly be right now, or to the “optimised” code actually being more expensive than the code that is being generated.

Take for instance this code:

int somepurefunction(char *str, int n)
  __attribute__((pure));

#define NUMTYPE1 12
#define NUMTYPE2 15
#define NUMTYPE3 12

int testfunction(char *param, int type) {
  switch(type) {
  case 1:
    return somepurefunction(param, NUMTYPE1);
  case 2:
    return somepurefunction(param, NUMTYPE2);
  case 3:
    return somepurefunction(param, NUMTYPE3);
  }

  return -1;
}

I was expecting the compiler to notice that cases 1 and 3 are identical (by coincidence, since NUMTYPE1 and NUMTYPE3 have the same value) and to merge them into a single branch. That would actually make debugging quite hard, as you could no longer tell the two cases apart, but it’s a nice reduction in code size, I think. Neither GCC 4.2 nor 4.3 merges the two cases, on either x86_64 or Blackfin; the duplicated code is left in place.
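For reference, this is roughly what the merged form looks like when written by hand. It is only a sketch: the body of somepurefunction() is a hypothetical stand-in so that the example links (the test case only declares it), and testfunction_merged() is a name made up for this illustration.

```c
#include <string.h>

int somepurefunction(char *str, int n)
  __attribute__((pure));

/* Hypothetical stand-in body; the original test case only declares
   the function. Its result depends only on its arguments and the
   memory they point to, so "pure" is an honest attribute here. */
int somepurefunction(char *str, int n) {
  return (int)strlen(str) + n;
}

#define NUMTYPE1 12
#define NUMTYPE2 15
#define NUMTYPE3 12

/* The hand-merge below is only valid while the two values coincide. */
#if NUMTYPE1 != NUMTYPE3
#error "cases 1 and 3 are no longer identical"
#endif

/* Cases 1 and 3 fall through to a single call site. */
int testfunction_merged(char *param, int type) {
  switch(type) {
  case 1:
  case 3:
    return somepurefunction(param, NUMTYPE1);
  case 2:
    return somepurefunction(param, NUMTYPE2);
  }

  return -1;
}
```

Comparing the assembly for this against the original testfunction() (for instance with gcc -O2 -S) is a simple way to see whether a given compiler performs the merge on its own.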

Another piece of code that wasn’t optimised as I was expecting it to be is this:

unsigned long my_strlen(const char *str)
  __attribute__((pure));
char *strlcpy(char *dst, const char *str, unsigned long len);

char title[20];
#define TITLE_CODE 1
char artist[20];
#define ARTIST_CODE 2

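/* Arguments parenthesised so the macro stays correct for any expression;
   each argument is still evaluated twice. */
#define MIN(a, b) ( (a) < (b) ? (a) : (b) )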

static void set_title(const char *str) {
  strlcpy(title, str, MIN(sizeof(title), my_strlen(str)));
}

static void set_artist(const char *str) {
  strlcpy(artist, str, MIN(sizeof(artist), my_strlen(str)));
}

int set_metadata(const char *str, int code) {
  switch(code) {
  case TITLE_CODE:
    set_title(str);
    break;
  case ARTIST_CODE:
    set_artist(str);
    break;
  default:
    return -1;
  }

  return 0;
}

I was expecting a single call to my_strlen() here, as it’s a pure function and in both branches it’s the first call. I know the code is probably complex once everything is expanded, but still, GCC at least did better at this than Intel’s and Sun’s compilers!

Both Intel’s and Sun’s compilers, even at -O3, emit four calls to my_strlen() (two per branch, since the MIN() macro names its argument twice), as they can’t even optimise the ternary operation! Actually, Sun’s compiler comes last for optimisation, as it doesn’t even inline set_title() and set_artist().
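To make the expectation concrete, here is a hand-written sketch of the single-call form. The bodies of my_strlen() and copy_bounded() are hypothetical stand-ins (the test case only declares my_strlen() and strlcpy()), and set_metadata_hoisted() is a name invented for this illustration; the bound passed to the copy is kept exactly as in the test case.

```c
unsigned long my_strlen(const char *str)
  __attribute__((pure));

/* Hypothetical stand-in body for the declared-only my_strlen(). */
unsigned long my_strlen(const char *str) {
  unsigned long n = 0;
  while (str[n] != '\0')
    n++;
  return n;
}

/* Hypothetical stand-in for strlcpy(): copies at most size - 1 bytes
   and always NUL-terminates when size is non-zero. */
static void copy_bounded(char *dst, const char *src, unsigned long size) {
  unsigned long i = 0;
  if (size == 0)
    return;
  while (i + 1 < size && src[i] != '\0') {
    dst[i] = src[i];
    i++;
  }
  dst[i] = '\0';
}

char title[20];
#define TITLE_CODE 1
char artist[20];
#define ARTIST_CODE 2

#define MIN(a, b) ( (a) < (b) ? (a) : (b) )

int set_metadata_hoisted(const char *str, int code) {
  /* The one and only call; both branches reuse the result. */
  const unsigned long len = my_strlen(str);

  switch(code) {
  case TITLE_CODE:
    copy_bounded(title, str, MIN(sizeof(title), len));
    break;
  case ARTIST_CODE:
    copy_bounded(artist, str, MIN(sizeof(artist), len));
    break;
  default:
    return -1;
  }

  return 0;
}
```

Note that hoisting the call above the switch also makes the default path pay for it, which the original code does not; that may be one reason a compiler prefers to keep one call per branch instead.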

Now, I haven’t tried IBM’s PowerPC compiler, as I don’t have a PowerPC box to develop on anymore, so I can’t say anything about that. (I would think a bit about the YDL PowerStation, given enough job income in the next months, and given Gentoo being able to run on it.) For these smaller cases, though, I think GCC is beating the proprietary compilers available under Linux.

I could check Microsoft’s and Borland’s (CodeGear’s) compilers, but that was a bit outside my scope at the moment.

If I had given some thought before to supporting non-GNU compilers for stuff like xine and unieject, I’m now starting to think it’s not really worth the time at all, if this is the result of their compilations…

One thought on “Sub-optimal optimisations?”

  1. Did you report them to gcc bugzilla as missed optimizations? I guess that’s a good way to either get them fixed or get pointed to a reason why this can’t be done.
