Anybody who ever programmed in C with a half-decent compiler knows that warnings are very important and you should definitely not leaving them be. Of course, there are more and less important warnings, and the more the compiler’s understanding of the code increases, the more warnings it can give you (which is why using -Werror
in released code is a bad idea and why it causes so many headaches to me and the other developers when a new compiler is released).
But there are some times where the warnings, while not highlighting broken code, are indication of more trivial issues, such as suboptimal or wasteful code. One of these warnings was introduced in GCC 4.6.0, and relates to variables that are declared, set, but never read, and I dreamed of it back in 2008.
Now, the warning as it is, it’s pretty useful. Even though a number of times it’s going to be used to mask unused results warnings it can show code where a semi-pure function, i.e. one without visible side effects, but not marked (or markable) as such because of caching and other extraneous accesses, more about it in my old article if you wish, is invoked just to set a variable that is not used — especially with very complex functions, it is possible that enough time is spent processing for nothing.
Let me clarify this situation. If you have a function that silently reads data from a configuration file or a cache to give you a result (based on its parameters), you have a function that, strictly-speaking, is non-pure. But if the end result depends most importantly on the parameters, and not from the external data, you could say that the function’s interface is pure, from the caller perspective.
Take localtime()
as an example: it is not a strictly-pure function because it calls tzset()
, which – as the name leaves to intend – is going to set some global variables, responsible to identify the current timezone. While these are most definitely side effects, they are not the kind of side effects that you’ll be caring for: if the initialization doesn’t happen there, it will happen the next time the function is called.
This is not the most interesting case though: tzset()
is not a very expensive funciton, and quite likely it’ll be called (or it would have been called) at some other point in the process. But there are a number of other functions, usually related to encoding or cryptography, which rely on pre-calculated tables, which might be actually calculated at the time of use (why that matters is another story).
Now, even considering this, a variable set but not used is usually not going to be troublesome in by itself: if it’s used to mask a warning for instance, you still want the side effect to apply, and you won’t be paying the price of the extra store since the compiler will not emit the variable at all.. as long as said variable is an automatic one, which is allocated on the stack for the function. Automatic variables undergo the SSA transformation, which among other things allows for unused stores to be omitted from the code.
Unfortunately, SSA cannot be applied to static variables, which means that assigning a static variable, even though said static variable is never used, will cause the compiler to include a store of that value in the final code. Which is indeed what happens for instance with the following code (tested with both GCC 4.5 – which does not warn – and 4.6):
int main() {
static unsigned char done = 0;
done = 1;
return 1;
}
The addition of the -Wunused-but-set-variable
warning in GCC 4.6 is thus a godsend to identify these, and can actually lead to improvements on the performance of the code itself — although I wonder why is GCC still emitting the static variable in this case, since, at least 4.6, knows enough to warn you about it. I guess this is a matter of missing an optimization, nothing world-shattering. What I was much more surprised by is that GCC fails to warn you about one similar situation:
static unsigned char done = 0;
int main() {
done = 1;
return 1;
}
In the second snippet above, the variable has been moved from function-scope to unit-scope, and this is enough to confuse GCC into not warning you about it. Obviously, to be able to catch this situation, the compiler will have to perform more work than the previous one, since the variable could be accessed by multiple functions, but at least with -funit-at-a-time
it is already able to apply similar analysis, since it reports unused static functions and constant/variables. I reported this as bug #48779 upstream.
Why am I bothering writing a whole blog post about a simple missed warning and optimization? Well, while it is true that zeroed static variables don’t cause much trouble, since they are mapped to the zero-page and shared by default, you could cause huge waste of memory if you have a written-only variable that is also relocated, like in the following code:
#include
static char *messages[] = {
NULL, /* set at runtime */
"Foo",
"Bar",
"Baz",
"You're not reading this, are you?"
};
int main(int argc, char *argv[]) {
messages[0] = argv[0];
return 1;
}
Note: I made this code for an executable just because it was easier to write down, and you should think of it as a PIE so that you can feel the issue with relocation.
In this case, the messages
variable is going to be emitted even though it is never used — by the way it is not emitted if you don’t use it at all: when the static variable is reported as unused, the compiler also drops it, not so for the static ones, as I said above. Luckily I can usually identify problems like these while running cowstats
, part of my Ruby-Elf utilities if you wish to try it, so I can look at the code that uses it, but you can guess it would have been nicer to have already in the compiler.
I guess we’ll have to wait for 4.7 to have that. Sigh!
have you considered trying out clang in your tinderbox? for your last example, clang (2.9 and also trunk r130114) is able to eliminate the messages array for good even at -O1 and its static analyzer would also warn about this kind of problem (just try it out on linux/fs/btrfs ;).
Uhm, no I haven’t considered the idea of a clang-based tinderbox yet, I guess it might be an option at some point.But I’ll try 2.9 out on a few projects of mine at this point, might help! 🙂
Thank you for this blog entry. Very interesting.Typo: s/is not a very expensive funciton/is not a very expensive function/