That innocent warning… or maybe not?

Anybody who has ever programmed in C with a half-decent compiler knows that warnings are very important and you should definitely not leave them be. Of course, there are more and less important warnings, and as the compiler’s understanding of the code increases, the more warnings it can give you (which is why using -Werror in released code is a bad idea, and why a new compiler release causes so many headaches to me and the other developers).

But there are times when the warnings, while not highlighting broken code, are an indication of more trivial issues, such as suboptimal or wasteful code. One of these warnings was introduced in GCC 4.6.0, relates to variables that are declared, set, but never read, and is something I dreamed of back in 2008.

Now, the warning as it stands is pretty useful. Even though a number of times it’s going to be used to mask unused-result warnings, it can point out code where a semi-pure function (one without visible side effects, but not marked, or markable, as pure because of caching and other extraneous accesses; more about that in my old article if you wish) is invoked just to set a variable that is never used. Especially with very complex functions, enough time may be spent processing for nothing.

Let me clarify this situation. If you have a function that silently reads data from a configuration file or a cache to give you a result (based on its parameters), you have a function that, strictly speaking, is non-pure. But if the end result depends mostly on the parameters, and not on the external data, you could say that the function’s interface is pure, from the caller’s perspective.
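As a small sketch of what such an interface-pure function can look like (the function and table here are made up for illustration), consider a function that lazily fills a lookup table on first use:

```c
#include <stdbool.h>

/* Hypothetical example: square() is interface-pure from the caller's
 * perspective (the result depends only on the parameter), but strictly
 * speaking it is not pure, because the first call populates a cached
 * table, a hidden side effect much like tzset() below. */
static int table[256];
static bool table_ready = false;

int square(unsigned char n) {
    if (!table_ready) {          /* lazy initialization on first use */
        for (int i = 0; i < 256; i++)
            table[i] = i * i;
        table_ready = true;
    }
    return table[n];
}
```

Calling such a function just to assign a variable that is never read still pays for the table initialization, which is exactly the kind of waste the warning can expose.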

Take localtime() as an example: it is not a strictly-pure function because it calls tzset(), which, as the name suggests, is going to set some global variables responsible for identifying the current timezone. While these are most definitely side effects, they are not the kind of side effects you’ll be caring about: if the initialization doesn’t happen there, it will happen the next time the function is called.

This is not the most interesting case though: tzset() is not a very expensive function, and quite likely it’ll be called (or it would have been called) at some other point in the process. But there are a number of other functions, usually related to encoding or cryptography, which rely on pre-calculated tables that might actually be calculated at the time of use (why that matters is another story).

Now, even considering this, a variable set but not used is usually not going to be troublesome by itself: if it’s used to mask a warning, for instance, you still want the side effect to apply, and you won’t be paying the price of the extra store since the compiler will not emit the variable at all… as long as said variable is an automatic one, allocated on the stack of the function. Automatic variables undergo the SSA transformation, which among other things allows unused stores to be omitted from the code.

Unfortunately, SSA cannot be applied to static variables, which means that assigning a static variable, even though said static variable is never used, will cause the compiler to include a store of that value in the final code. Which is indeed what happens for instance with the following code (tested with both GCC 4.5 – which does not warn – and 4.6):

int main() {
  static unsigned char done = 0;

  done = 1;
  return 1;
}

The addition of the -Wunused-but-set-variable warning in GCC 4.6 is thus a godsend to identify these, and can actually lead to improvements in the performance of the code itself, although I wonder why GCC still emits the static variable in this case, since 4.6, at least, knows enough to warn you about it. I guess this is a matter of a missing optimization, nothing world-shattering. What I was much more surprised by is that GCC fails to warn you about one similar situation:

static unsigned char done = 0;

int main() {

  done = 1;
  return 1;
}

In the second snippet above, the variable has been moved from function scope to unit scope, and this is enough to confuse GCC into not warning you about it. Obviously, to be able to catch this situation, the compiler has to perform more work than in the previous case, since the variable could be accessed by multiple functions, but at least with -funit-at-a-time it is already able to apply similar analysis, since it reports unused static functions and constants/variables. I reported this as bug #48779 upstream.

Why am I bothering to write a whole blog post about a simple missed warning and optimization? Well, while it is true that zeroed static variables don’t cause much trouble, since they are mapped to the zero page and shared by default, you can cause a huge waste of memory if you have a write-only variable that is also relocated, like in the following code:

#include <stddef.h>

static char *messages[] = {
  NULL, /* set at runtime */
  "Foo",
  "Bar",
  "Baz",
  "You're not reading this, are you?"
};

int main(int argc, char *argv[]) {
  messages[0] = argv[0];

  return 1;
}

Note: I wrote this code as an executable just because it was easier to put down; you should think of it as a PIE, so that the issue with relocation applies.

In this case, the messages variable is going to be emitted even though it is never used. (It is not emitted if you don’t touch it at all: when a static variable is reported as unused, the compiler also drops it; not so for the set-but-never-read ones, as I said above.) Luckily I can usually identify problems like these by running cowstats, part of my Ruby-Elf utilities, if you wish to try it, and then look at the code that uses the variable; but you can guess it would have been nicer to have this in the compiler already.

I guess we’ll have to wait for 4.7 to have that. Sigh!

Important names, pointless names

So I recently said that I wanted to tackle the problem of QA warnings about missing sonames (Shared Object Names), and to do so I needed to give you some details about shared objects… I’ll add that you probably want to read my explanation about sonames that I wrote last October, since it covers some basics I’d rather avoid repeating here.

The sonames are, as I said before, the names that the dynamic linker will search for when loading a program, or another library, linked to the former. This is because the value of the DT_SONAME tag is copied into the DT_NEEDED entries of the linked programs; this is why the soname is particularly important, especially if it’s properly used with ABI versioning, so that libfoo.so.0 and libfoo.so.1 are used for two different, incompatible ABIs that cannot be used interchangeably.

At this point it is clear that the soname’s main importance is in generating proper NEEDED entries; so what happens when it’s missing? In that case, the basename of the library file (which usually is just libfoo.so) is used as the NEEDED entry; since -lfoo translates to searching for libfoo.so without further extension, even when the library provides (manually) the version numbers, they won’t be encoded into the NEEDED entry at all, and that is, as you might guess, bad.

But is it, always?

While for a general-purpose library (by which I don’t mean just a library like glib, but rather a library in the broader sense, counting in also specific libraries like libpng or libcurl) the lack of a soname is always a bad thing, there are a few situations where it’s not too important and can easily be ignored, to avoid spending time where it’s not useful: whenever ABI versioning is not important. This assertion then splits into further cases, as I’ll try to explain now.

The first case is when you won’t be listing the library as NEEDED ever; this is the case for almost all cases of plugins I know of. They get loaded through dlopen() rather than by the implicit handling of the dynamic linker. This makes sonames and needed entries totally pointless. In these cases, it shouldn’t be a problem even if the library lacks a soname.

The second case relates to internal shared libraries, which might or might not be worth it, as I wrote before. Internal libraries are shipped with the binaries that link to them; they are rebuilt whenever the package is rebuilt, and they are only there to reduce code duplication between binaries. In these cases, sonames and shared object versioning are of no use: while it won’t cause trouble to have them, they aren’t necessary.

So, how good is the Gentoo QA test? Quite good. It won’t cause you to waste time with most of the plugins and most of the internal libraries; on the other hand, there will be false negatives: currently, it only checks the files matching the glob {,usr/}lib*/lib*.so*. This is correct most of the time, but not always: the package might add configuration scripts to link to libraries in other paths, or it might install new paths into ld.so.conf, adding more paths to the default search, where NEEDED entries would be required. These are unfortunately cases that are not trivial to look out for.

So what about the private libraries that are installed into the /usr/lib path (or similar), for which no soname is required but the warning is printed out? Well, you have different ways to proceed: you might just add the soname, which might be more work than it’s worth; you might move the library into a sub-directory and then use rpath to add it to the search path of just the package’s binaries; or you might just use the QA_SONAME_${ARCH} variable to ignore the warning (although I would argue that this should then be reported as a workaround warning by repoman).

My favourite solution here would be to actually use rpath to move the library out of the generic linker search path. While this adds one further search path for all the libraries those binaries link to, it reduces the size of the main search path’s content… and you should know that the number of files in a directory relates to how slow accessing the directory itself is. While I don’t have any benchmark at hand, I find that much, much easier to deal with (and it’s actually one of the reasons why I’d like the .la files to disappear: they artificially increase the size of the library paths without good reason, in my opinion).

But if you want to bring upstream the issue, it might be a good time to ask whether those libraries are worth it or not, following the indications I wrote in the article linked at the top of this post.

Security considerations: Robe of Glib Security

A stupid pun on the Robe of Glib Tongues, an item found in the game Oblivion. You can read it either literally or as related to the actual glib library.

So yesterday the advisory was published for a (very far-fetched) glib security issue I found some time ago. I’m no security expert, although I have written about the topic from time to time. I sincerely wasn’t really in the know about security tasks before joining Gentoo; it has been the contact with our security team, dealing with packages with security issues, and then working with xine, that let me understand a bit better what the problems are with security and software. The way I found this issue has been quite… homebrew.

At the time I was working on benchmarking glib to evaluate whether to use its routines or FFmpeg’s in lscube projects (in the end I went for glib’s, mostly because we ended up using a lot of glib code already). During that evaluation, I noticed a crash in glib when using a huge text file, and after looking it up, it turned out that it really was an overflow.

I’ll leave you to the advisory for the details of what the problem is exactly, but I’m going to tell you that, from a quick check around the glib sources, similar situations are all too common: mixing the 32-bit guint type with the platform-specific gsize type, which can cause similar overflows and similar security issues. Does this mean that glib is unsafe? To an extent, yes.
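The class of bug is easy to sketch without any glib code at all: on a 64-bit platform, storing a 64-bit size into a 32-bit counter silently wraps, and any bounds check done on the truncated value is then meaningless. The type names below are just stand-ins for the real glib ones:

```c
#include <stdint.h>

typedef uint32_t guint_like;  /* stand-in for glib's 32-bit guint */
typedef uint64_t gsize_like;  /* stand-in for the 64-bit gsize    */

/* Returns the value a 32-bit counter ends up holding for a given
 * 64-bit size: anything above 4 GiB silently wraps around, the kind
 * of truncation behind the overflow described above. */
guint_like store_size(gsize_like real_len) {
    guint_like stored = (guint_like)real_len; /* implicit in real code */
    return stored;
}
```

For instance, store_size((gsize_like)UINT32_MAX + 10) comes out as 9 rather than 4294967305; a length check performed on the truncated value then passes for a buffer that is far too small.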

The problem here is that we have a quite commonly used library, one that is indeed used in a vast range of applications and is probably quite widely available on embedded devices too, including Nokia’s. This library is in dire need of a security audit, and I hope this particular security issue, almost insignificant for the real users, will work as a warning light for it.

I’m sure somebody will scream that glib is just overcomplex and that developers should reinvent the wheel every time they need something, just to make sure they don’t suffer from others’ security bugs; on the other hand I’d like to say that the same problem I found in glib by chance can easily be present in other software, and it often is. Even FFmpeg suffers from similar problems, and yet reimplementing its features does not make things much better; for instance, xine has had problems with code that implements the same demuxers as FFmpeg, even though with partially rewritten code. Similarly, VLC took the RealPlayer/RTSP library from xine, and they both suffer from the same kind of problems, even though both have evolved independently.

My suggestion, if anybody is concerned about similar overflows in their software, is to enable the -Wconversion flag, and make sure that there are no conversion problems; this should also find cases where strlen() return values are stored in non-size_t variables (which would then be smaller than the returned value). The day when all software builds with -Werror=conversion will be the day when this kind of overflow is much less likely to appear. And while this can often be a very far-fetched security issue, it does show that there is not enough security knowledge in our ecosystem.

Future proof your code: don’t use -Werror

When checking today’s failure logs from the tinderbox, to see if there are packages failing with readline 6.0 besides GDB, I noticed a few more failures since last time, mostly related to the new gcc 4.3.3 ebuild and the fact that -D_FORTIFY_SOURCE=2 is now enabled by default. Which, by the way, is what makes the whole thing too noisy for my taste.

While there are a few cases where the code is explicitly rejected by the compiler for being wrong, most of the failures seem to be extra warnings that morph into failures because of the use of -Werror in released code. I think I have talked about this kind of problem in passing in the past, but never wrote a full entry about it. I think the time has come for that.

The -Werror flag for GCC (used also by ICC, and equivalent to Sun’s -errwarn=%all) is often considered useful to make sure one’s code is solid enough to build without warnings. This is good, since warnings often enough result in errors, and as I noted yesterday, having too many warnings can cause new ones to be ignored and thus create a domino effect to the point where the whole software is screwed. But it’s not a good idea to unconditionally set it up in released code.

Why do I say that? Because things change, and especially, warnings are added. The fact that a new compiler is stricter and considers a particular piece of code something to warn about is not going to change the quality of the software per se; and while it’s true that fixing the warnings early can save you from failing further down the road, users often enough just need the thing to build and work when they need it, and would prefer not to have to fix a few warnings first.

So we have two opposite considerations: enabling -Werror allows the developers, and the users interested in the total correctness of the program, to identify new warnings earlier, while at the same time the remaining set of users, who don’t care about correctness but just want the thing to work, would like -Werror disabled. What’s the solution? First of all, learn to use particular -Werror= flags (see this old post of mine for some information about it), and then make the thing optional.

See, this is what makes free software quite powerful sometimes: optionality. Just add a switch, a knob, a line to comment in and out, so that -Werror is used by default in developers’ builds but not in the normal user releases. For most non-autoconf-based build systems, -Werror is just passed along with the rest of the CFLAGS, so it’s easy to deal with; for autoconf-based systems, it’s not rare that it’s added at the end of the script, unconditionally. Why does that happen? Because passing it to the ./configure call, like any other compiler flag, will almost certainly cause some autoconf checks to fail. No more, no less.
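As a sketch of what such a knob can look like in an autoconf-based project (the option name is arbitrary, and this is untested boilerplate rather than a drop-in snippet), appended near the end of configure.ac, after the compiler checks have already run:

```
dnl Add an --enable-werror switch, off by default, so that released
dnl tarballs build without -Werror but developers can opt in.
AC_ARG_ENABLE([werror],
  [AS_HELP_STRING([--enable-werror], [turn compiler warnings into errors])],
  [], [enable_werror=no])

AS_IF([test "x$enable_werror" = "xyes"],
  [CFLAGS="$CFLAGS -Werror"])
```

Because the flag is appended after configure’s own compile tests, it does not interfere with them, which is exactly the failure mode described above.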

True, sometimes what is a warning in one version of GCC becomes an error in the one after, so it’s not really a solution if the warnings are not taken care of. But that’s the very reason why GCC usually introduces them as warnings! It gives the developers time to act on them before they are rejected out of the blue. Of course it would be nicer if GCC also added an extra specification like “this will become an error with release X.Y.Z”, but even that is often ignored, so it does not really matter.

This becomes even more important for ebuild developers, since having -Werror enabled does not really work well with Gentoo: we might add new, stricter GCC versions, or even one more entry in CFLAGS to enable further warnings (for instance I have -Wformat=2 -Wstrict-aliasing=2 -Wno-format-zero-length in mine), which would then cause packages to fail out of the blue. Unfortunately it seems that quite a few packages still use -Werror in their default build, and not all Gentoo maintainers took care of removing it beforehand.

So please, don’t use -Werror in released code: make it optional, use it during development, but keep it out of releases. And out of ebuilds, too.

Warning: noise!

Compiler warnings are one of the most important features of modern optimising compilers. They replaced, for the good part, the use of the lint tool to identify possible errors in source code. Ignoring too many warnings usually means that the software hasn’t been looked at and properly polished before release. More importantly, warnings can be promoted to errors with time, since they often show obsolete code structure or unstable code. An example of this is the warning about the size mismatch between integers and pointers, which became an error with the GCC 3.4 release, to make sure that code would run smoothly on 64-bit machines (which were starting to become much more common).

Usually, a new batch of warnings is added with each minor or major GCC release, but for some reason the latest micro release (GCC 4.3.3) enabled a few more of them by default. This is usually a good sign, since a stricter compiler usually means that code improves with it. Unfortunately I’m afraid the GCC developers have been a little too picky this time around.

You might remember how the most important warnings cannot become errors, even though support for turning particular warnings into errors is present in recent GCC releases. But before that I also had some other problem with GCC warnings, in particular warnings against system headers.

That latter problem was quite nasty, since by default GCC would hide from me cases where my code defined some preprocessor macro that the system headers were also going to define (maybe with different values). For that reason, I started building xine-lib with -Wsystem-headers to make sure that these cases could be handled. It wasn’t much of a problem, since the system headers were always quite clean, without throwing around warnings and the like. But this is no longer the case: with GCC 4.3.3 I started to get a much lower signal-to-noise ratio, since a lot of internal system headers started throwing warnings like “no previous prototype”. This is not good and really shows how glibc and gcc aren’t really well synchronised (similarly, readline and gdb, but that’s a story for another day).

So okay, I disabled -Wsystem-headers since it produces too much noise, and went looking for the remaining issues. The most common problem the new gcc shows in xine-lib is that all the calls to asprintf() ignore the return value. This is technically wrong, but it should lead to few problems there, since a negative value from asprintf means that there is not enough memory to allocate the string, and thus most likely the program would want to abort anyway. As it is, I’m not going to give them too much weight, but rather remind myself that I should branch xine-lib-1.2 and port it to glib.

Unfortunately, while the “warn about unused return value” idea is quite good for many functions, it is not for all of them. And somehow the new compiler also ignores the old workaround to shut the warning up (a cast to void of the value returned by the function). While, again, it is technically good to have those warnings, sometimes you just don’t care whether a call succeeded or not, because either way you’re going to proceed in the same way, and thus you don’t spend time checking the value (usually because you check it somehow later on). What the “warn about unused return value” warning should really point out is whether there are leaks due to allocation functions whose return value (the address of the newly-allocated memory area) is ignored, since there is no way you can ignore that without having an error.

One quite stupid place where I have seen the new compiler throw a totally useless warning is related to the nice() system call; this is a piece of code from xine-lib:

#ifndef WIN32
  /* nice(-value) will fail silently for normal users.
   * however when running as root this may provide smoother
   * playback. follow the link for more information:
   * [snip]
   */
  nice(-1);
#endif /* WIN32 */

(don’t get me started with the problems related to this function call, it’s not what I’m concerned with right now).

As you can see there is a comment about failing silently, which is exactly what the code wants: use it if it works, don’t if it doesn’t. Automagic, maybe, but I don’t see a problem with that. But with the new compiler this throws a warning because the return value is not checked. So it’s just a matter of checking it and eventually logging a warning to make the compiler happy, no? It would be, if it weren’t for the special case of the nice() return value.

On success, the new nice value is returned (but see NOTES below). On error, -1 is returned, and errno is set appropriately.

[snip]

NOTES

[snip]

Since glibc 2.2.4, nice() is implemented as a library function that calls getpriority(2) to obtain the new nice value to be returned to the caller. With this implementation, a successful call can legitimately return -1. To reliably detect an error, set errno to 0 before the call, and check its value when nice() returns -1.

So basically the code would have to morph in something like the following:

#ifndef WIN32
  /* nice(-value) will fail silently for normal users.
   * however when running as root this may provide smoother
   * playback. follow the link for more information:
   * [snip]
   */
  int res;

  errno = 0;
  res = nice(-1);
  if ( res == -1 && errno != 0 )
    lprintf("nice failed");
#endif /* WIN32 */

And just for the sake of argument, the only error that may come out of the nice() function is a permission error. While it certainly isn’t an enormous amount of code needed for the check, it really is superfluous for software that is just interested in making use of the capability if it has it. And it becomes even more pointless to bother with this when you consider that xine is a multithreaded program, and that the errno interface is… well, let’s just say it’s not the nicest way to deal with errors in multithreaded software.

What’s the bottom line of this post? Well, I think that the new warnings added with GCC 4.3.3 aren’t bad per se; they actually are quite useful. But just dumping a whole load of new warnings on the developers is not going to help, especially if there are so many situations where you would just be adding error messages to the application that might never be read by the user. The number of warnings raised by a compiler should never be so high that the output is filled with them, otherwise the good warnings will just get ignored, causing software not to improve.

I think for xine-lib I’ll check out some of the warnings (I at least found some scary code in one place) and then I’ll locally disable the warn-unused-return warning, or sed out the code ignoring the asprintf() and nice() return values. In either case I’m going to have to remove some true positives to avoid having to deal with a load of false positives I shouldn’t be caring about. Not nice.

Update (2017-04-28): I feel very sad to have found out over a year and a half later that Michael died. The links in this and other posts to his blog are now linked to the archive kindly provided and set up by Jan Kučera. Thank you, Jan. And thank you, Michael.

Update (2021-02-17): I’ve snipped away the broken URL to Miguel Freitas’s page that is now not found. You can find an archived copy — courtesy of the Wayback Machine.

GCC features and shortcomings

As any other Free Software developer, I think I have a love/hate relationship with the GNU Compiler Collection or, as it’s most commonly called, GCC. While GCC is a very good modern compiler, with tons of features, warning heuristics and very good optimisations, it’s very slow, and it’s not exactly foolproof when it comes to warnings.

In particular, I already wrote a couple of times that I very much dislike the way GCC cannot identify values that are set but never used, even though it should be trivial to do during SSA (and indeed the variables are not emitted in the final code); but it’s not just that.

Starting from version 4.3, GCC added new support for warnings: the -Werror= option. With earlier versions you could turn every warning into an error with -Werror, or you had a couple of special cases such as -Werror-implicit-function-declaration. With -Werror= you can make a specific class of warnings into errors, without turning the rest into errors. This is very good, since some warnings, like -Wreturn-type, are truly errors rather than just warnings, and it’s indeed a good idea to stop the build if they are raised.

So -Werror= is good, and I started using it in more than a couple of my projects, to make sure that no code is injected that could break something. On the other hand, -Werror= requires that you know the name of the -W flag that enables a given warning. That’s easy to find out by using -fdiagnostics-show-option:

% gcc -x c -Wall -Wextra -fdiagnostics-show-option -c -o /dev/null -
<stdin>: In function ‘foo’:
<stdin>:2: warning: control reaches end of non-void function [-Wreturn-type]

Cool, now we know that to turn that warning into an error we have to pass -Werror=return-type. Now this should be enough to turn any warning into an error, you’d think, but it’s unfortunately not the case. Take for instance the following:

% gcc -x c -fdiagnostics-show-option -o /dev/null -c -
<stdin>: In function ‘foo’:
<stdin>:2: warning: return makes integer from pointer without a cast

You can notice two things here: the first, obvious one is that gcc is not reporting a warning option next to the warning itself, which prevents us from using -Werror=; the other is that I didn’t pass any -W flag to enable the warning. This is one of the warnings that are considered so important to gcc that they are always enabled, even when developers and users don’t ask for them; so important that the only way to disable them is to pass the -w flag, which disables all warnings, as there is no -Wno- flag for them. But for these very same reasons, it is not possible to turn them into errors!

As you might guess, it’s a paradoxical situation: the most important, most useful warnings (the ones that most likely mean trouble) cannot be turned into errors, because there is no way to disable them. And yet, there seems to be no development on this side, at least judging from the bug I reported.

Sometimes I find GCC is funny…

Variables assigned and never used

I have written in the past that I sometimes miss the Borland C compiler when it comes down to warnings, because it had at least one that really was interesting and that GCC lacks: warning about variables whose value is set and never used.

The problem shows, for instance, in this example C code:

int foo(int n) {
  int t = 123;

  t = bar(n);
  return t+n;
}

As you can guess, the t = 123 part is totally useless, since t is replaced right afterwards. Obviously, without optimisation, GCC will emit the assignment anyway:

foo:
.LFB2:
        pushq   %rbp
.LCFI0:
        movq    %rsp, %rbp
.LCFI1:
        subq    $32, %rsp
.LCFI2:
        movl    %edi, -20(%rbp)
        movl    $123, -4(%rbp)
        movl    -20(%rbp), %edi
        movl    $0, %eax
        call    bar
        movl    %eax, -4(%rbp)
        movl    -20(%rbp), %edx
        movl    -4(%rbp), %eax
        addl    %edx, %eax
        leave
        ret

But the assignment will rightfully disappear once optimisations are turned on, even at just the first level:

foo:
.LFB2:
        pushq   %rbx
.LCFI0:
        movl    %edi, %ebx
        movl    $0, %eax
        call    bar
        leal    (%rbx,%rax), %eax
        popq    %rbx
        ret

This is all nice and fine, but the problem is that GCC does not warn even though it does remove the variable. And it’s not the only one: even Sun Studio Express and the Intel C Compiler don’t warn about such a case. Interestingly enough, Sun Studio’s lint tool in “enhanced” mode does report the issue:

assigned value never used
    t defined at test-ssa.c(2)  :: set at test-ssa.c(2) :: reset at test-ssa.c(4)

Too bad that autotools don’t integrate lint-like tools too easily (but maybe I can tie it into the FFmpeg buildsystem at least…).

Why am I so upset about this particular missing warning, you ask? Because it should be quite easy to implement, considering that each compiler is most likely using SSA form for its optimisation scans. And in SSA form, it’s trivial to see the code from before this way:

int foo(int n) {
  int t1 = 123;

  int t2 = bar(n);
  return t2+n;
}

and notice that t1 is unused in the whole function. Indeed, this is what the compiler already does to optimise out the assignment; the problem is, it just does not warn you that’s what it’s doing. Sincerely, how difficult could it be for a GCC hacker to add that warning? I’m sure I wouldn’t be able to do it myself, since I don’t know GCC well enough, and it’s likely a mess, but it doesn’t sound like a difficult warning to implement, to me.

Warnings, to keep from doing the wrong thing

In the IT world we’re obviously full of practices that, albeit working, are very much advised against, because they are risky, broken on different setups, or just stupid. Many of these practices are frequent enough either because they can be applied easily without knowing better, or because they were documented somewhere and people read and spread them.

These practices, in most compiled programming languages, when using optimising compilers, are guarded by warnings: almost-errors that are printed on the error stream of the compiler itself when it identifies a suspect construct. If you use Gentoo, you know them very well, as you certainly see lots of them (unless you have -w in your CFLAGS).

Lots of people ignore warnings, either because fixing them is too much work, as it would require changing a huge part of the code, or because they don’t stop the code from compiling. Much more rarely it happens that the code actually works fine and the warning is bogus; it’s not unheard of, though. Also, the more advanced the warning, the higher the probability that it is implemented wrong.

On the other hand, the vast majority of warnings are put there for a good reason, and should actually be properly taken care of. These warnings could have been used to make a program 64-bit safe years before 64-bit systems started to be widespread, or might have made sure that the code of a project written years before GCC 4.3 would build correctly with the latest version of the compiler. Of course they are not the one and absolute solution, as many changes might not have had warnings before (like the std:: namespace change), but they could have helped.

But I don't want to talk about compiler warnings today, rather about Portage warnings. Since a few versions, thanks also to the availability of Zack and Marius, Portage has started printing warnings after a successful merge, giving you insight into possible problems with an ebuild or with the software the ebuild builds. These are pretty useful, as they can catch for you whether a ./configure switch was renamed or removed after a version bump, and they might tell you if the software is doing something risky that you should warn upstream about. (Why should you warn upstream? Well, packagers often see a lot more code than the average programmer, and it's not uncommon that a programmer doesn't know about an issue that a packager does know, because of the distribution's policy about the problem.)

In addition to these, which might be setup-dependent, repoman has also started warning about suspect (R)DEPEND entries and other issues with the ebuilds. Even if repoman will probably become slower as checks pile up in it, it will help make sure developers know what they are committing.

This is particularly important because there are quite a few sub-optimal ebuilds in the tree already, and while it’s difficult to find and fix all of them, it’d be quite nice if we could avoid introducing new ones.

Unfortunately, I'm starting to worry that it might not be as feasible as I hoped, because there is a huge fault in my idea that adding warnings will keep people away from the mistakes: there is a lack of documentation on these problems. As much as I wish I could count my blog as a source of documentation, I know this is far from the truth; but I haven't been able to start writing docs again yet, because I was following this world rebuild closely, at least to understand how to set my priorities. I know I'll be working on quite a few things in the future, especially once the hospital is just a memory, and hopefully I'll be able to write enough documentation that the warnings become clear enough for the whole tree to be safe for everybody to use, under whichever circumstances.

System headers and compiler warnings

I wish to thank Emanuele (exg) for discussing this problem with me last night; I thought a bit about it, checked xine-lib in this regard, and then decided to write something.

Not everybody might know this, but GCC, while reporting tons of useful warnings, especially in newer versions (together with a few that get easily annoying and are rarely useful, like the pointer sign ones), ignores system headers when doing its magic.

What does this mean? Well, when a system library installs a header that would trigger warnings, those warnings are by default ignored. This is useful because, while you're working on the application foo, you won't care what the glibc developers did. On the other hand, sometimes these warnings are useful.

What Emanuele found, hidden by this suppression, was a redefinition of a termios value in Emacs for Darwin (OS X). I checked xine-lib for similar issues and found a few that I'll have to fix soon.
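The pattern is easy to reproduce; here is a contrived sketch of mine using BUFSIZ from stdio.h rather than an actual termios value:

```c
#include <stdio.h>

/* <stdio.h> already defines BUFSIZ; redefining it here makes GCC
   print "warning: 'BUFSIZ' redefined", with a note pointing at the
   previous definition inside the system header. The warning
   originates in our code, yet it can end up suppressed along with
   the system-header ones. */
#define BUFSIZ 1024

int buffer_size(void) {
  return BUFSIZ;  /* our redefinition wins from this point on */
}
```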

I'm not sure how GCC handles the warning suppression, but I think it's worth opening a bug asking them to change the behaviour here: the redefinition is a warning caused by the program's code, not by the system headers.

Now of course I can hear KDE users thinking "but I do get warnings from the system headers", in reference to KDE's headers. Well, yes:

In file included from /usr/kde/3.5/include/kaboutdata.h:24,
                 from main.cpp:17:
/usr/qt/3/include/qimage.h: In member function ‘bool QImageTextKeyLang::operator<(const QImageTextKeyLang&) const':
/usr/qt/3/include/qimage.h:58: warning: suggest parentheses around && within ||

I took this warning from a yakuake build, but it's more or less the same for every KDE package you merge. It's a warning caused by an include (a library include), and in general the same rules apply to those.

Why is it not suppressed? The problem is in how the include path is handled, which is probably my main beef against system header warning suppression: it's inconsistent.

By default, headers in /usr/include (and thus found without adding any -I switch) get their warnings suppressed. If a library (say, libmad) installs its headers there, it will get its warnings suppressed.

On the other hand, if a library installs its headers in an alternative path, like /usr/qt/3/include in the example above, or a more common /usr/include/foobar, then it depends on how that directory is added to the list of searched directories. If it's added through -I (almost every case) its warnings will be kept; they would be suppressed only if you used -isystem instead. Almost no software uses that option, which, as far as I know, is GCC-specific.

So whether a library will have the warnings caused by its headers suppressed or not, depends on the path. Nice uh? I don’t think so.

More work!

Good reasons to mix declarations and code: unused variables

As I expect a lot of people won't be convinced by what I wrote yesterday about mixing declarations and code, here is a simple example of where my style works better than the "classic" style of not mixing them:

#include <string.h>
#include <stdlib.h>

char* oldstyle(const char *foo) {
  size_t foolen;

  if ( foo == NULL )
    return NULL;

  foolen = strlen(foo);

  return strdup(foo);
}

char* newstyle(const char *foo) {
  if ( foo == NULL )
    return NULL;

  const size_t foolen = strlen(foo);

  return strdup(foo);
}

(note: I should really try to find a better way to post code, possibly with syntax highlighting, if anybody knows something, please let me know).

In both these functions the foolen variable is unused, but if you compile this with gcc -Wextra -Wunused, it will only warn you about the one on line 19, that is, the one in newstyle() that is mixed in with the code.

This is because foolen is used at least once in the first function, if only to be assigned a value.

Now of course GCC is smart enough to know this, and indeed with -O1 the call to strlen() is not emitted for either function; but you'll still have it in the source. I'd consider it a bug that GCC doesn't warn you in the first case, but I should ask Mike or Mark about that.

Borland's (sorry, CodeGear's) compiler does actually warn you about this style with something like "variable assigned a value that is never used" (I don't remember the exact wording). Which is quite nice, because it also catches the cases where a variable is declared with an initialisation, but is then set in both branches of an if statement. Not much of a difference for basic types, but it can find useless calls that way. I think the Visual C# compiler does that too, and for attributes as well, which is quite a bit more interesting, but there are more considerations about that, I suppose.
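A sketch of that second case (names and the rand() call are mine, purely for illustration): the initialiser is dead, because both branches of the if overwrite it before it is ever read, and a warning like CodeGear's would catch the wasted call:

```c
#include <stdlib.h>

/* v's initialiser calls rand() for nothing: both branches of the if
   assign v before it is read, so the initial call is pure waste that
   "variable assigned a value never used" would point out */
int pick(int flag) {
  int v = rand();  /* useless call: overwritten either way */
  if (flag)
    v = 1;
  else
    v = 2;
  return v;
}
```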

I wonder how many of these problems are in the source code of xine-lib and xine-ui; I'm afraid a lot more than GCC currently reports.