Debug

Cleaning up after yourself

I have noted in my post about debug information that in feng I’m using a debug codepath to help me reduce false positives in valgrind. When I wrote that, I looked up an older post that promised an explanation but never delivered one. A year later, I guess it’s time for me to explain, and possibly later document this in the lscube documentation that I’ve been trying to maintain to describe the whole architecture.

The problem: valgrind is an important piece of equipment in the toolbox of a software developer; but like any other tool, it is just that: a tool, in the sense that it’ll blindly report the facts, without caring about the intentions the person writing the code had at the time. Forgetting this leads to situations like Debian SA 1571 (the OpenSSL debacle), where an “uninitialised value” warning was read as a sign of something wrong, when the use was actually pretty much intended. At any rate, the problem here is slightly different, of course.

One of the most important reasons to be using valgrind is to find memory leaks: memory areas that are allocated but never freed properly. This kind of error can make software either unreliable or unusable in production, so testing for memory leaks is among the primary concerns of most seasoned developers. Unfortunately, as I said, valgrind doesn’t understand the intentions of the developers, and in this context it cannot discern between memory that leaks (or rather, that is “still reachable” when the program terminates) and memory that is being used until the program stops. Indeed, since the kernel will free all the memory allocated to the process when it ends, it’s common practice to simply leave it to the kernel to deallocate those structures that are important until the end of the program, such as configuration structures.

But the problem with leaving these structures around is that you either have to ignore the “still reachable” values (which might actually include some real leaks), or you get a number of false positives introduced by this practice. To remove the false positives, it’s not too uncommon to free the remaining memory areas before exiting, something like this:

extern myconf *conf;

int main(void) {
  conf = loadmyconf();
  process();
  freemyconf(conf);
  return 0;
}

The problem with having code written this way is that even just the calls to free up the resources cause some overhead, and especially for small fire-and-forget programs, those simple calls can become a nuisance. Depending on the kind of data structures to free, it can actually take quite a bit of time to unwind them in an orderly fashion. A common alternative solution is to guard the resource-free calls with a debug conditional, of the kind I wrote about in the other post. Such a solution usually ends up being #ifndef NDEBUG, so that the same macro can get rid of both the assertions and the resource-free calls.
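Applied to the snippet above, the guarded version looks something like this (same made-up names as before):

extern myconf *conf;

int main(void) {
  conf = loadmyconf();
  process();

#ifndef NDEBUG
  /* debug builds only: unwind the configuration so valgrind
     does not report it as still reachable */
  freemyconf(conf);
#endif

  return 0;
}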

This works out quite decently when you have a simple, top-down, straightforward piece of software, but it doesn’t work so well when you have a complex (or chaotic, as you prefer) architecture like feng’s. In feng, we have a number of structures that are only used by a range of functions, which are themselves constrained within a translation unit. They are, naturally, variables that you’d consider static to the unit (or even static to the function, from time to time, but that’s just a matter of visibility to the compiler; function or unit static does not change a thing). Unfortunately, once they are static to the unit, you need an externally-visible function to properly free them up. While that is not excessively bad, it still requires quite a bit of jumping between the units, just to get some cleaner debug information.

My solution in feng is something I find much cleaner, even though I know some people might well disagree with me. To perform the orderly cleanup of the remaining data structures, rather than having uninit or finalize functions called at the end of main() (which would then require me to properly handle errors in sub-procedures so that they end up calling the finalisation from main()!), I rely on the compiler supporting the destructor attribute. Actually, I check with autoconf whether the compiler supports this not-too-uncommon feature, and if it does, and the user requested a debug build, I enable the “cleanup destructors”.

Cleanup destructors are simple unit-static functions declared with the destructor attribute; the compiler will set them up to be called as part of the _fini code, when the process is torn down, and that covers both an orderly return from main() and a call to exit(), which is just what I was looking for. Since the function is already within the translation unit, the variables don’t even need to be exported (and that helps the compiler, especially when they are only used within a single function, or at least I sure hope so).

In one case the #ifdef conditional actually switches a variable from being stack-based to being static to the unit (which changes the final code of the project quite a bit), since the reference to the head of the array of listening sockets is only needed when iterating through them to set them up, or when freeing them; if we don’t free them (non-debug build) we don’t even need to save it.
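To give an idea, the pattern looks roughly like the following sketch, with invented names rather than feng’s actual identifiers, using the CLEANUP_DESTRUCTOR macro that the configure snippet below defines:

#include <stdlib.h>

#ifdef CLEANUP_DESTRUCTOR
/* debug build: keep the head of the array around so the destructor can free it */
static int *listen_fds;
#endif

void setup_listeners(size_t count)
{
#ifndef CLEANUP_DESTRUCTOR
    int *listen_fds; /* non-debug build: only needed while setting the sockets up */
#endif

    listen_fds = calloc(count, sizeof(*listen_fds));
    /* ... iterate over listen_fds, bind and listen ... */
}

#ifdef CLEANUP_DESTRUCTOR
static void CLEANUP_DESTRUCTOR listeners_uninit(void)
{
    free(listen_fds);
}
#endif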

Anyway, where is the code? Here it is:

dnl for configure.ac

CC_ATTRIBUTE_DESTRUCTOR

AH_BOTTOM([#if !defined(NDEBUG) && defined(SUPPORT_ATTRIBUTE_DESTRUCTOR)
           # define CLEANUP_DESTRUCTOR __attribute__((__destructor__))
           #endif
          ])

(the CC_ATTRIBUTE_DESTRUCTOR macro is part of my personal series of additional macros to check compiler features, including attributes and flags).

And one example of code:

#ifdef CLEANUP_DESTRUCTOR
static void CLEANUP_DESTRUCTOR accesslog_uninit()
{
    size_t i;

    if ( feng_srv.config_storage )
        for(i = 0; i < feng_srv.config_context->used; i++)
            if ( !feng_srv.config_storage[i].access_log_syslog &&
                 feng_srv.config_storage[i].access_log_fp != NULL )
                fclose(feng_srv.config_storage[i].access_log_fp);
}
#endif

You can find the rest of the code over at the LScube GitHub repository — have fun!

Mailbox: does GCC miscompile at -O0?

Jeremy asks via email:

I’m curious about the mention of gcc -O0 producing broken code in this article: do you have a reference to a particular bug for that? I think the article would benefit from being more specific about which versions of gcc produce broken code for -O0, otherwise readers will be scared off from using it for ever.

What he refers to is the Gentoo backtracing guide I wrote that gives some indications about using Portage features and ebuild support to get proper backtraces (stack traces), and how to avoid debug code instead.

If you really need a precise backtrace, you should use -O1 (using the optimisations of level 0 is not recommended as there are known issues with gcc building invalid code).

I’ll update the guide today if I can find the time, so you’ll probably find a more correct phrase up there, soonish.

The issue here is actually very complex, which is why I think a reply on the blog is worth the hassle. The first thing to note is that GCC itself does not really miscompile at -O0. The problems lie in the code of the software itself, which makes them much more difficult to deal with — when the problem is in the compiler you can isolate it and wait until it’s fixed to make sure you can use the feature; if the problem is in the code that gets compiled, then we cannot really be sure it all works, not unless you build and test everything, and even then it might not be easy.

Anyway, -O0 does a little more than just disabling some optimisations; it changes a lot in the way code is handled, first of all by changing the preprocessor definitions:

flame@yamato ~ % diff -u <(echo | gcc -O0 -dM -E -) <(echo | gcc -O1 -dM -E -)
--- /proc/self/fd/11    2010-06-14 02:59:32.973351250 +0200
+++ /proc/self/fd/12    2010-06-14 02:59:32.974351316 +0200
@@ -31,6 +31,7 @@
 #define __UINTMAX_TYPE__ long unsigned int
 #define __linux 1
 #define __DEC32_EPSILON__ 1E-6DF
+#define __OPTIMIZE__ 1
 #define __unix 1
 #define __UINT32_MAX__ 4294967295U
 #define __LDBL_MAX_EXP__ 16384
@@ -89,7 +90,6 @@
 #define __UINT16_MAX__ 65535
 #define __DBL_HAS_DENORM__ 1
 #define __UINT8_TYPE__ unsigned char
-#define __NO_INLINE__ 1
 #define __FLT_MANT_DIG__ 24
 #define __VERSION__ "4.5.0"
 #define __UINT64_C(c) c ## UL

Besides showing the second big problem that I’ll talk about (disabling inline functions), it should remind you of my corner case testing from two years ago: disabling optimisation explicitly causes headers to change as they hide or show different interfaces, “proper” or “fast”. As you might guess already, this makes it slightly harder to actually track down a crash, or get an exact stack trace, because the (already pre-processed) code itself changes; the crash might very well only happen in the codepath taken when the optimised functions are used.

This gets combined with no inlining of functions: some of the system headers will replace definitions of some functions with inline wrappers when it is possible. These inlines are often faster, but they also take care of adding the warnings from _FORTIFY_SOURCE (the feature of recent GCC and GLIBC that adds build-time warnings for probably-wrong behaviour of source code: ignoring return values, providing wrong parameters, or causing buffer overflows).
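To make this concrete, here is a contrived stand-in for the kind of conditional you can find in headers; the names are made up, it is not copied from any glibc header:

#include <stdio.h>

#if defined(__OPTIMIZE__) && !defined(__NO_INLINE__)
/* "fast" inline path, only visible when optimising and inlining */
static inline int myfunc(int x) { return x << 1; }
#else
/* "proper" out-of-line path, the one you get at -O0 */
static int myfunc(int x) { return x * 2; }
#endif

int main(void)
{
    /* which definition is compiled in depends on how this file was built */
    printf("%d\n", myfunc(21));
    return 0;
}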

The two combined make using the system library (GLIBC) quite different between optimised and non-optimised builds; it’s almost like having two different basic C libraries. As you might guess, to know where something is crashing it’s important to use the same C library when getting the trace. In quite a few cases, rebuilding with -O0 can stop something from crashing, but it’s almost never the compiler miscompiling something; it’s usually the code only working with the “proper” interfaces, and failing when using the “less proper” interfaces that are faster but rely on the rules being followed to the letter.

One example of this are the functions ntohs and ntohl: they are present as functions in the C library, but they are also present as macros in the GLIBC includes, so that they can be expanded at build time (especially when used on constants). This causes quite a change in the handling of temporary variables if you play with pointers and type-punning (very long story now), so the behaviour of the ntohs or ntohl calls differs, and upstream is likely to have tested only one behaviour, rather than both.
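A quick way to see the two paths side by side, assuming glibc, is to parenthesise the function name, which suppresses any function-like macro and always calls the C library:

#include <arpa/inet.h>
#include <inttypes.h>
#include <stdio.h>

int main(void)
{
    uint32_t wire = htonl(0x11223344u);

    /* with optimisation, ntohl() may be expanded as a macro or builtin
       and folded at compile time; at -O0 it degrades to a plain call */
    printf("macro or builtin: %#" PRIx32 "\n", ntohl(wire));

    /* the parentheses bypass the macro and always call the library function */
    printf("library function: %#" PRIx32 "\n", (ntohl)(wire));

    return 0;
}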

Finally, there are possible build failures tied to the use of -O0 because, among other things, the Dead Code Elimination pass is hindered. Dead Code Elimination is a safe optimisation, so it’s executed even at -O0; its objective is to remove code that is never going to be executed, like an if (0) block. Using if (0) rather than sprinkling #ifdef all over the code is a decent way to make sure that the syntax of the code is correct, even for those parts that are not enabled (as long as you have declarations of the functions and the like). Unfortunately, without optimisations enabled, DCE will only work on explicit if (0) blocks, and not indirect ones. So for instance if you have code like this:

int bar(void) {
#if ENABLE_FOO
  int a = is_foo_enabled_runtime();
#else
  int a = 0;
#endif

  if ( a )
    process_foo_parameters();

  process_parameters();

  if ( a )
    foo();
  else
    not_foo();

  return 0;
}

DCE will only drop the if (a) branches when constant propagation is also applied, so at -O0 the build produces code with undefined references to process_foo_parameters() and foo(). This trick is often used in the wild to write code that is readable and syntax-checked even when features are not enabled, without using a huge quantity of #ifdef statements. And as I’ve just shown, it can easily break when built with -O0. Among others, FFmpeg uses this kind of code, so it cannot really be compiled at an absolute -O0.
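One possible workaround (just a sketch, not necessarily what FFmpeg does) is to provide static inline stubs for the disabled functions, so the leftover references resolve even when the dead branches are not eliminated:

#if !ENABLE_FOO
/* stubs for the disabled feature: they satisfy the references that DCE
   would otherwise have removed, and optimised builds drop them anyway */
static inline void process_foo_parameters(void) { }
static inline void foo(void) { }
#endif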

Note: this mostly applies to GCC, as other compilers might do constant propagation even at -O0 given that it is a fairly stable and safe optimisation. Clang, though, behaves the same way.

While it can be argued that software shouldn’t rely on specific features of the C compilers but just on the C language specification, it is quite impossible for us to vouch that all the software out there will build and work properly when built with no optimisation at all, for the reasons I just explained. Gentoo is of course ready to accept bugs for those packages, but don’t expect most of us to go out of our way to fix them in the tree if you don’t provide a patch; alternatively, talk with upstream and get them to fix their code, or work around it somehow. If you find a package that fails to build or (worse) fails to work properly when built with -O0, please open a bug against the tracker after tracking down where the issue lies. If you give us a patch we’ll likely merge it soon; otherwise it will still be useful to have the packages with problems tracked down.

The end of the mono-debugger saga

So after starting to inspect the problem last night and finding it, I finally have a tentative patch that makes mdb work fine.

Indeed, I simply implemented some extended debuglink support into the Bfd managed wrapper, which now looks up sections and symbols in the debuglinked file whenever they are not found in the original file. This solves my problem, although it might not be complete yet, since I wrote it in 20 minutes. I’ve attached the version for trunk to my bug report and I’ll add my backport to 2.4.2 to my overlay today. After a bit of testing, I hope to get it into the main tree too.
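For reference, the libbfd side of following a debuglink boils down to something like this; it’s a simplified, standalone C illustration, not the actual patch (which lives in the managed Bfd wrapper and its glue code):

/* some versions of bfd.h refuse to be included unless config.h, or at
   least these two macros, are defined first */
#define PACKAGE "debuglink-example"
#define PACKAGE_VERSION "0"

#include <bfd.h>
#include <stdlib.h>

/* try to open the separate debug object referenced by .gnu_debuglink;
   assumes bfd_init() has already been called and that the debug files
   live under the usual /usr/lib/debug root */
static bfd *open_debuglink(bfd *abfd)
{
    char *debugfile = bfd_follow_gnu_debuglink(abfd, "/usr/lib/debug");
    bfd *dbfd = NULL;

    if (debugfile == NULL)
        return NULL;

    dbfd = bfd_openr(debugfile, NULL);
    if (dbfd != NULL && !bfd_check_format(dbfd, bfd_object)) {
        bfd_close(dbfd);
        dbfd = NULL;
    }

    free(debugfile);
    return dbfd;
}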

Speaking of testing, the mono-debugger ebuild had a test restriction, with no bug referenced; I’m quite sure that the tests that do fail are the ones that should have told us that mono-debugger wouldn’t have worked on a default Gentoo install at all. I’ll probably have to add some logic to warn the user about split-debug setups (please note that our default of stripping files of debug information does not strip the symbol table of libpthread.so, otherwise gdb wouldn’t work at all either, and that also lets mdb work fine; so it’s only a problem with split-debug).

After the debugger finally started to work, I also found another problem: mono itself does not seem to load libraries requested by DllImport through the standard dlopen() interface, but looks for them in particular directories, which don’t cover all the possible locations at all. This became a problem because the current default version of libedit in Gentoo does not have a soname, which caused mono to find a libedit.so that was not a library at all (but rather an ldscript). But that’s a problem for another day, and my solution is just to use a newer libedit version that works fine.

Now I’ll go back to my tinderbox, and in the next few days you’ll probably see a few more posts about different topics than Mono… even though I have a few patches to post there as well.

The debugged debugger — part 2

So after last night’s post I finally found the problem.

Actually, mixing in the new system libbfd sidetracked me for about an hour, because the same symptoms were caused by an API change that I hadn’t handled correctly; after that I was able to use both the system and the internal libbfd with the same exact results.

I started adding printing checkpoints within both the C# Bfd wrapper and the C glue code that calls into libbfd; it’s not really an easy thing because, well, libbfd is probably one of the most over-engineered libraries I have ever seen. It really provides a lot of information for a lot of different executable and binary formats, but to do that it increases the complexity tremendously; indeed that’s one of the reasons why gold is much faster than standard ld, and why I preferred to write my own Ruby-Elf rather than binding the Bfd interface and building up from that (which could have been more complete under a few circumstances).

At any rate, I was lucky to have enough knowledge about ELF files to identify the issue in the end; most people who had never looked inside ELF would have given up along the way. In the end I cut to the chase by noticing that it was trying to load the symbol table (.symtab, which includes internal local symbols — symbols marked static and thus not exported), and found none. Since it couldn’t find any symbols, it’s hardly surprising that it couldn’t match the nptl_version variable I talked about yesterday.
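For reference, reading the full symbol table (local symbols included) through libbfd boils down to something like the following standalone simplification; it is not mdb’s actual wrapper code:

/* some versions of bfd.h want config.h, or at least these, defined first */
#define PACKAGE "symtab-dump"
#define PACKAGE_VERSION "0"

#include <bfd.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[])
{
    bfd *abfd;
    asymbol **symbols = NULL;
    long storage, count = 0, i;

    if (argc != 2) {
        fprintf(stderr, "usage: %s <object>\n", argv[0]);
        return 1;
    }

    bfd_init();
    abfd = bfd_openr(argv[1], NULL);
    if (abfd == NULL || !bfd_check_format(abfd, bfd_object))
        return 1;

    /* .symtab, including local (static) symbols; a stripped or
       split-debug object yields nothing here */
    storage = bfd_get_symtab_upper_bound(abfd);
    if (storage > 0) {
        symbols = malloc(storage);
        count = bfd_canonicalize_symtab(abfd, symbols);
    }

    for (i = 0; i < count; i++)
        printf("%s\n", bfd_asymbol_name(symbols[i]));

    return count > 0 ? 0 : 1;
}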

Going down that line, it turned out that, although Mono splits debug symbols into a separate file (.mdb), mdb does not support the feature that does the same for ELF files: our splitdebug. I was actually wondering if that was the problem from the start, but then I ruled it out because Fedora also uses the same feature, and there mono-debugger starts fine. Note that I have now replaced “works fine” with “starts fine”, as you’ll see in a moment.

So if mdb does not support split debug files, how on earth can it work on Fedora? Well, the symbol it’s trying (and failing) to identify here is nptl_version from libpthread.so… a quick check on the laptop told me that Fedora does not strip .symtab from libpthread.so! I was actually afraid that Fedora wasn’t stripping .symtab at all, but then I used the /usr/bin/mono object as a reference, and there you cannot find the .symtab section at all: Fedora has a special case for libpthread.

Now, the quick solution would of course be to just not strip libpthread.so of its .symtab either, so that mdb could start properly; the problem with that solution is that you wouldn’t be able to get backtraces or anything else out of the unmanaged code, because mdb wouldn’t be loading that at all. On distributions that use split debug (Gentoo if requested, Fedora, and I have no idea what else), mono-debugger will start, if libpthread.so has a .symtab, but it won’t work with any object whose .symtab lives in the debug file; which is our case. So I’ll try to find the time to actually fix it in mono-debugger, because it is a bug in mono-debugger, or maybe a missing feature, not a problem with “roll your own optimization flags” as Miguel wanted it to be.

Maybe this will convince them that they should try to give credit to other distributions as well? Who knows; I hope so, because I can see that, at least as far as building and packaging are concerned, mono-debugger has a lot of room for improvement, and I’d like to help out with that, if they let me.

Post scriptum: I was also able to make mono-debugger use the system libedit; the result is less spectacular than using the system libbfd, but it’s still nice:

flame@yamato mono-debugger-2.4.2 % qsize mono-debugger
dev-util/mono-debugger-2.4.2: 25 files, 21 non-files, 2021.133 KB
flame@yamato mono-debugger-2.4.2 % qsize mono-debugger
dev-util/mono-debugger-2.4.2: 25 files, 21 non-files, 1561.300 KB

Now if only I could get it to work …

The neverending fun of debugging a debugger

In my previous post I noted that I found some issues with monosim, a piece of software implemented in Mono. Luckily upstream understood the problem and is working on it. In the meantime I’ve had my share of fun because mono-debugger (mdb) does not seem to work properly for me. Since I also need Mono for a job task I’m working on, I’ve decided to work on fixing the issue.

So, considering that my knowledge of Mono is above that of the average user, but still not that high, I decided to ask on #mono (on gimpnet). With all due respect, the developers could really try to be friendlier, especially with a fellow Free Software enthusiast who is just looking for help to fix the issue himself:

 thread_db is a libc feature I think to do debugging
 Chances are, you are no an "interesting" Linux distro
 One of those with "Roll your own optimization flags" that tend to break libc
 miguel_ miguel
 miguel, yes using gentoo but libc and debugging with gdb are fine...
 I knew it ;-)
 Yup, most stuff will appear to work
 But it breaks things in subtle ways
 and I can debug the problem libc side if needed, I just need to understand what's happening mono-side
 You need to complain to the GDB maintainers on your distro
 All the source code is available, grep for the error message
 Perhaps libthread_db is not availabel on your system
 it is available, already ruled the simple part out :)
 and yes, I have been looking at the code, but I'm not really that an expert on the mono side so I'm having an hard time to follow exactly what is trying to do

As you can see, even though Miguel started right away with the snarky comments, I tried keeping it pretty lightweight; after all, Lennart takes his cheap shots at Gentoo too, but I find him a pretty decent guy all the same…

Somebody else, instead, was able to piss me off in a single phrase:

 i thought the point with gentoo was that if you watch make output scrolling, you can call yourself a dev ;)

Now, maybe if Mr Shields actually refrained from pissing other developers off without reason, he wouldn’t be badmouthed so much for his blogs. And I’m not one of those badmouthing him, the Mono project, or anything else related to it, up to now. I have actually already stated that I like the language and find the idea pretty useful, if with a few technical limitations.

Now, let’s get back to what the problem is: the not-very-descriptive error message I get from the Mono debugger (that thread_db, the debug library provided by glibc, couldn’t be initialised) is due to the fact that glibc first tries to check whether the NPTL thread library is loaded, and to do that it tries to reach the (static!) variable nptl_version. Since it’s a static variable, nm(1) won’t be able to see it, although I can’t seem to find it with pfunct either; to be precise, the check also verifies that the version corresponds, but the problem is that the symbol isn’t found in the first place.

Debugging this is pretty difficult: the mono-debugger code does not throw an exception giving the particular reason why thread_db couldn’t be initialised, but simply states the obvious. From there, you have to backtrace manually through the code (manually at first because mono-debugger ignored all the user-provided CFLAGS, including my -ggdb to get debug information!), and the call sequence is C# → C (mono-debugger) → C (thread_db) → C (mono-debugger) → C# → C (internal libbfd). Indeed it jumps around between similarly-named functions and other fun stuff that really drove me crazy at first.

Right now I’ve cut to the chase of knowing that libbfd was unable to find the libpthread.so library. The reason for that is still unknown to me, but to reduce the amount of code actually being used, I’ve decided to drop the internal libbfd copy in favour of the system one; while the ABI is not stable (and thus you would end up rebuilding anything using libbfd at any binutils bump), the API doesn’t usually change tremendously, and there usually is enough time to fix it up if needed; indeed, going from the internal copy to the system copy, the only API breakage is one struct member’s name, which I fixed with a bit of autotools mojo. The patches are not yet available but I’ll be submitting them soon; the difference with and without the bundled libbfd is quite nice:

flame@yamato mono-debugger-2.4.2 % qsize mono-debugger
dev-util/mono-debugger-2.4.2: 25 files, 21 non-files, 4944.144 KB
flame@yamato mono-debugger-2.4.2 % qsize mono-debugger
dev-util/mono-debugger-2.4.2: 25 files, 21 non-files, 2020.972 KB

The package also bundles an internal copy of libedit; I guess because it’s not often found in distributions, but we have it, and on Gentoo/FreeBSD it’s also part of the base system, so…

Now, there’s no doubt that this hasn’t yet brought me to what the problem is, and it’s quite likely that the problem is Gentoo-specific, since it seems to work fine both on my Fedora box and on other systems. But is it the right move for the Mono team to diss a (major, I have to say) developer of a distribution that isn’t considering removing Mono from its repository?

Cleaning up

Following the debugging of yesterday’s problem, I decided to spend some time analysing the memory behaviour of feng with valgrind, and I noticed a few interesting things. The main one is that there are quite a few places where memory is allocated but never freed, because it’s used from a given point until the end of the execution.

Since memory gets deallocated when the process exits, adding explicit code to free that type of resource is not strictly necessary; on the other hand, it’s often suggested as one of the best practices in development, since leaks are never a good thing, code may be reused in different situations where freeing the resource actually is needed, and, most importantly, not freeing resources will sidetrack software like valgrind that is supposed to tell you where the leaks are, which either forces you to reduce the level of detail (like hiding all the still-reachable memory) or to cope with false positives.

With valgrind, the straightforward solution is to use a suppression file to hide what is known to be a false positive; on the other hand, this solution is not extremely practical: it is valgrind-specific, yet valgrind is not the only software doing this kind of analysis; it does not tell you if you were, for any reason, fiddling with the pointers during the course of the program; and it separates the knowledge about the program from the program itself, which is almost never a good thing.

Now of course one could just always free all the resources; the overhead of doing that is likely minimal in complex software, because a few calls to free() and friends are probably just a small percentage of the whole code; but still, if they are unnecessary there is no real reason to have them compiled into the final code. My approach is perhaps a bit more hacky, but I think it’s nice enough: final memory cleanup is done under a preprocessor conditional, checking for the NDEBUG preprocessor macro, the same macro that is used to disable assertions.

In a few places, though, I’ve been using the GOnce construct (more or less equivalent to pthread_once), which allows me to execute initialisation when it’s actually needed, instead of having initialisation and cleanup functions to call from the main function. While this works pretty well, it makes it tricky to know when to actually free the resources. Luckily most modern compilers support the destructor attribute, which basically is a way to say that a given function, which can very well be static, should be called before shutting down. Relying on these destructors for actual program logic is not a very good idea because they are not that widely supported, but for leak testing and debugging they can be quite useful instrumentation.
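Put together, the pattern ends up looking more or less like this; it’s a sketch with made-up names, not actual feng code, and in practice the destructor attribute would be guarded by a configure-time check:

#include <glib.h>

static GHashTable *lookup_table;

static gpointer lookup_table_init(gpointer data)
{
    lookup_table = g_hash_table_new_full(g_str_hash, g_str_equal,
                                         g_free, g_free);
    return NULL;
}

const char *lookup(const char *key)
{
    static GOnce once = G_ONCE_INIT;

    /* lazy initialisation: lookup_table_init() runs exactly once */
    g_once(&once, lookup_table_init, NULL);
    return g_hash_table_lookup(lookup_table, key);
}

#ifndef NDEBUG
/* debug builds only: free the table at exit so valgrind does not list
   it among the still-reachable allocations */
static void __attribute__((__destructor__)) lookup_table_uninit(void)
{
    if (lookup_table != NULL)
        g_hash_table_destroy(lookup_table);
}
#endif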

I’ll try to provide more detailed data once I make feng not report leaks any longer.

Debug code and debug information

There are many different but common misconceptions about debugging spread among those users who, having never learnt a programming language, cannot properly understand the difference between debug code and debug information. Some of these misconceptions cause misunderstandings about Gentoo’s way of handling the two things as separate and distinct features of a package.

First of all, let’s start with what the -g, -ggdb and -g3 options are supposed to do. These three flags add debug information, in the form of either stabs or DWARF data depending on the architecture, to the compiled files, be they object files, shared objects or final executables. This data is used by debuggers like gdb to provide a meaningful backtrace, and it is added to some special sections of the file. The difference between a file built with these options and a file built without them can be removed by using the strip command, since the options don’t touch the actual executable code or data entries. The only software that is susceptible to breaking with -g3 is the dynamic loader, and even there I’m not sure why.

The various levels of debug information provide various levels of backtracing detail, starting from the names of the functions called and going up to line numbers, source lines, and macro expansion (which is especially useful when debugging stuff like scanelf, which is composed of a huge amount of macro-based meta-functions). Even when full debug information is enabled in the files, it does not hinder performance, except perhaps during the first scan and read of the ELF files, since the loader does not load the debug information at all: it lives in sections that are not allocated in memory at runtime. (This is something you can easily understand once you know the difference between allocated and non-allocated ELF sections.)

Debug code, instead, means adding special instructions to the executable code for debugging purposes; the simplest example is the assert() macro, used to make sure that unexpected code paths are not taken; although it is often misused as a way to enforce limitations in functions, the original idea behind assertions was to make the program die, in a way that is easy to debug, when a condition that is supposed to always hold turns out to be false. These checks shouldn’t be needed during standard usage (or the failure should be handled gracefully if it can indeed happen), so at that point the assertions can simply be taken out of the built code, which is exactly what -DNDEBUG does. (On an autotools-related note, the AC_HEADER_ASSERT autoconf macro not only checks for the correct header, but also provides an easy-to-use --disable-assert option for the configure script to disable assertions altogether.) Unfortunately, nowadays assertions are often used to check the behaviour of code at runtime, even though a failure then causes the software to abort, which makes it more difficult to just disable them altogether for final users.
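A trivial example of the kind of check this is meant for; compiling with -DNDEBUG removes the assertion from the built code entirely:

#include <assert.h>
#include <stdio.h>

static size_t ring_next(size_t index, size_t size)
{
    /* debug code: this invariant check disappears with -DNDEBUG */
    assert(size > 0);
    return (index + 1) % size;
}

int main(void)
{
    printf("%zu\n", ring_next(7, 8));
    return 0;
}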

But debug code can be much more complex, and might slow down operation a lot; it might log data extensively, fill the terminal with pointless information, or check each and every step during processing. This type of code must certainly not be enabled for users’ runtime, or their work would be greatly hindered.

Now that the distinction is made, you can see why the splitdebug/strip FEATURES are distinct from the debug USE flag. If you just want a backtrace for a crash you got during execution, you need debug information, not debug code; if anything, debug code might actually stop the software from crashing, as could reducing the optimisation flags. For users, it’s more than likely that the debug USE flag wouldn’t be useful at all; for developers who know what to do, this fine-grained control is most likely the best option they have.

So please, next time you think about lumping the debug USE flag and the splitdebug/strip FEATURES into the same idea, try to think of what exactly you want to achieve. And no, disabling -O2 is not always a good idea to get a meaningful backtrace, especially since, as I said, -O0 might make stuff not build, so you shouldn’t be ready to just enable it unconditionally to get a backtrace for a bug report.