Compounded issues in GLIBC 2.12

I’ve recently ranted about the fact that GLIBC 2.12 was added to portage in a timeframe that, in both my personal and professional opinion, was too short. I haven’t dug too much, though, into what the problems with that particular version are, and why I think it needed to stay in package.mask for a little longer; let me do that now.

The main reason we’re seeing failures is that the glibc developers have been, for a few versions already, cleaning up the headers so that including one of them no longer drags in a number of other, nearly unrelated headers. This is the same problem I described with -O0 and libintl.h almost two years ago.

Now, on FreeBSD, and by reflection Mac OS X, a number of these cleanups were done a long time ago, or the problem was never introduced in the first place: both try to stick to the minimal subset of interfaces you need to bring in. This is why, half the time, porting something to FreeBSD just means adding a bunch of header inclusions. That should make the whole situation easy to handle, and already fixed in most cases, but that’s far from true. Not only do a number of projects, for no good reason at all, guard the extra header inclusions behind #ifdef checks for FreeBSD and Mac OS X, but one of the headers that got cleaned up causes a much worse problem than the usual “foo not defined” or missing-constant errors.

The problem is that a number of headers have dropped their inclusion of sys/stat.h, which means it has to be added back to the code itself. Unfortunately, since C allows implicit declarations (Portage does warn about them!), a call to S_ISDIR(mymode) and the like will look to the compiler like an implicitly-declared function, and it will emit a reference to the (undefined) symbol S_ISDIR… which of course is not a function but a macro defined in the header file. Again, this is not as troublesome as it looks: in the best of cases, the linker will catch the symbol as undefined and halt the build. And just to make sure, Portage logs these problems as well, stating that they can create problems at runtime; of course that would be more useful if developers actually paid attention and fixed them, but let’s not go there for now.
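A minimal sketch of the failure mode (a hypothetical file, not taken from any particular package):

#include <sys/types.h>
/* #include <sys/stat.h> -- the inclusion that used to arrive
   indirectly, and now has to be explicit */

int is_directory(mode_t mymode)
{
    /* warning: implicit declaration of function 'S_ISDIR';
       the object file now carries an undefined symbol S_ISDIR */
    return S_ISDIR(mymode);
}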

The real problem comes when this kind of mistake is present in shared objects; by default ld allows shared objects to have undefined references, especially if they are plugins (since the symbols may then be resolved by the host program, if not by a library they can be linked against). Of course, most sane libraries use --no-undefined for the final link… but not all of them do. A common problem situation is Ruby and its extensions (at least outside of Gentoo, given that all our current Ruby ebuilds force --no-undefined at extension link time). The only warning you get there is the implicit declaration of the fake function, and that’s far too easy to overlook.
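A sketch of how the two links behave (hypothetical file names; the exact message wording varies with the binutils version):

$ gcc -shared -o broken.so broken.o
# succeeds: an undefined S_ISDIR is allowed in a shared object
$ gcc -shared -Wl,--no-undefined -o broken.so broken.o
# fails at link time:
#   broken.o: undefined reference to `S_ISDIR'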

And before you suggest turning that warning into an error: no, -Werror=implicit-function-declaration is not going to be a good idea; most of the autoconf tests fail if it is passed, which results in a non-buildable system. That’s why it’s one of the few flags I play with on all my autoconf-based projects!

Also, as Ruby itself proved – ask me again why I always rant when I write about Ruby – there are ways around the warning that don’t constitute a fix even for the most optimistic of the developers.

But then you have the million-euro question: how severe is this bug? The only way to judge is to understand what symptoms it causes. Given that this is not security-related, you’re not going to have security bugs here, but you do have:

  • build failures: obnoxious but manageable; things just failing to build are not going to worry me most of the time; when the fix is as easy as adding a missing include line, I have no reservations about going ~arch;
  • runtime load/startup failures: already a bit less acceptable; failing to load a plugin/extension with an “undefined symbol” error is rarely something users will like to see; it’s bad but not too bad;
  • runtime abortions: now this is what I’m upset about; having missing symbols at runtime means you can trust neither the software you built nor the software that starts up cleanly; aborting at runtime means your software might be in the middle of some transaction when it hits the error situation, and could even corrupt your data; this is made more likely by the fact that it’s part of the stat(2) handling code!

There is a good thing though: -Wl,-z,now, which is part of the default settings for hardened profiles and can also be requested at runtime by setting LD_BIND_NOW=1 in the environment, ensures that all symbols are bound when the process starts up, rather than lazily while the code executes; it reduces the risk of hitting the missing symbol in the middle of a transaction. Unfortunately it does not work the same way for extensions of languages like Ruby and Python, but it at least alleviates the problem a bit.
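A sketch of the difference, assuming a program linked against a library carrying the S_ISDIR mistake (hypothetical names; the message format is glibc’s usual symbol-lookup error):

$ ./myprog
# runs until the broken code path executes, then aborts:
#   ./myprog: symbol lookup error: ./libbroken.so.0: undefined symbol: S_ISDIR
$ LD_BIND_NOW=1 ./myprog
# fails right at startup, before any transaction can begin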

Combine what I just wrote with the fact that even a package in the system set (m4, used by autoconf, not something you can live without) failed to build with this glibc version by the time it was unmasked, which suggests the developer who chose to unmask it never rebuilt their own system set, and you might guess why I used such strong words in my previous post.

autoconf-2.64: the phantom macro menace

While I should write this down in autotools mythbuster in a more generic fashion, since I found it just today I wanted to jot down these notes for other developers. With autoconf-2.64 you can hit a problem with “phantom macros”: macros that are called, but seem not to produce any code.

In particular, I noticed a configure failure in recode today. The reported error output is the following:

checking for flex... flex
checking lex output file root... lex.yy
checking lex library... -lfl
checking whether yytext is a pointer... yes
checking for flex... (cached) flex
./configure: line 10866: syntax error near unexpected token `fi'
./configure: line 10866: `fi'

Looking at the actual configure code, you can easily see what the problem is around line 10866:

if test "$LEX" = missing; then
  LEX="$(top_srcdir)/$ac_aux_dir/missing flex"
  LEX_OUTPUT_ROOT=lex.yy
  else


fi

As you probably know already, in sh an “else” immediately followed by “fi” is invalid syntax; but what is the code that produces this? Well, looking at configure.in is not enough: you also need to check an m4 file shipped with the package:

# in configure.in:
ad_AC_PROG_FLEX

# in m4/flex.m4
## Replacement for AC_PROG_LEX and AC_DECL_YYTEXT
## by Alexandre Oliva 
## Modified by Akim Demaille so that only flex is legal

# serial 2

dnl ad_AC_PROG_FLEX
dnl Look for flex or missing, then run AC_PROG_LEX and AC_DECL_YYTEXT
AC_DEFUN(ad_AC_PROG_FLEX,
[AC_CHECK_PROGS(LEX, flex, missing)
if test "$LEX" = missing; then
  LEX="$(top_srcdir)/$ac_aux_dir/missing flex"
  LEX_OUTPUT_ROOT=lex.yy
  AC_SUBST(LEX_OUTPUT_ROOT)dnl
else
  AC_PROG_LEX
  AC_DECL_YYTEXT
fi])

So there are calls to the AC_PROG_LEX and AC_DECL_YYTEXT macros, and thus there should be code in there. What’s happening? Well, maybe you remember a previous post where I listed some user advantages in autoconf-2.64:

Another interesting change in the 2.64 release, which makes it particularly sweet to autotools fanatics like me, is the change in AC_DEFUN_ONCE semantics that makes it possible to define macros that are executed exactly once. This is useful because people often write bad autoconf code that, instead of using AC_REQUIRE to make sure a particular macro has been expanded (which is usually the case for macros using $host and thus needing AC_CANONICAL_HOST), simply calls it, which means the same check is repeated over and over (with an obvious waste of time and increase in size of the generated configure file).

Thanks to the AC_DEFUN_ONCE macro, not only is it finally possible to define macros that never get executed more than once, but most of the default macros that are supposed to work that way, like AC_CANONICAL_HOST and its siblings, are now defined with it, which means that hopefully even untouched configure files will be slimmed down.

Of course, this also means there are more catches with it, so I’ll have to write about them in the future. Sigh, I wish I could find more time to write on the blog, since there are so many important things I have to write about, but I don’t have enough time to expand them to a proper size, as I’m currently working all day long.
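That once-expansion is exactly the trap here. A minimal sketch of the behaviour (hypothetical macro names, for illustration only): a once-expanded macro called inside a plain sh conditional gets hoisted out of it, much like an AC_REQUIRE’d macro, leaving an empty branch behind.

AC_DEFUN_ONCE([my_ONCE], [once_check=done])
AC_DEFUN([my_WRAPPER],
[if test "x$foo" = xyes; then
  :
else
  my_ONCE
fi])

# in the generated configure, the expansion of my_ONCE is emitted
# before the whole if/fi block, and the else branch stays empty:
#   once_check=done
#   if test "x$foo" = xyes; then
#     :
#   else
#
#   fi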

Indeed, the two macros above are both once-expanded, so autoconf hoists them before the body of ad_AC_PROG_FLEX, leaving behind the empty branch you saw. The solution is to use M4sh properly (autoconf scripts are not pure sh: they are M4sh, a language halfway between sh and m4). Instead of using if/then/else directly, you should use AS_IF; indeed, changing the above macro to this:

AC_DEFUN(ad_AC_PROG_FLEX,
[AC_CHECK_PROGS(LEX, flex, missing)
AS_IF([test "$LEX" = missing], [
  LEX="$(top_srcdir)/$ac_aux_dir/missing flex"
  LEX_OUTPUT_ROOT=lex.yy
  AC_SUBST(LEX_OUTPUT_ROOT)dnl
], [
  AC_PROG_LEX
  AC_DECL_YYTEXT
])])

allows autoconf to understand the flow of the code and produce proper sh code (valid sh, this time) in the final configure file, so configure now runs through:

checking for flex... flex
checking for flex... (cached) flex
checking lex output file root... lex.yy
checking lex library... -lfl
checking whether yytext is a pointer... yes

(see how the two checks for flex are now both at the top of the list of checks?)

Unfortunately there are more problems with recode, but at least this documents the first problem, which I’m afraid is going to be a common one.

Testing the corner cases

One interesting thing about using chroots to check things out is that often enough you stumble across different corner cases while testing one particular aspect of packages.

For instance, when I was testing for linking collisions, I found a lot of included libraries. This time, testing that flags are respected, I found some other corner cases.

It might sound funky, but it has been common knowledge for a while that gcc -O0 sometimes produced bad code, and sometimes failed to build some packages. Unfortunately it’s difficult to track this down to specific problems when you’re “training” somebody in handling the compiler. Today, I found one of these cases.

I was going to merge sys-block/unieject in my flagstesting chroot so I could make sure it worked properly; for this it needed dev-libs/confuse, which I use for configuration file parsing. Right away, I found this failure:

 i686-pc-linux-gnu-gcc -DLOCALEDIR="/usr/share/locale" -DHAVE_CONFIG_H -I. -I. -I.. -Wall -pipe -include /var/tmp/portage/dev-libs/confuse-2.6-r1/temp/flagscheck.h -MT confuse.lo -MD -MP -MF .deps/confuse.Tpo -c confuse.c  -fPIC -DPIC -o .libs/confuse.o
confuse.c: In function 'cfg_init':
confuse.c:1112: warning: implicit declaration of function 'setlocale'
confuse.c:1112: error: 'LC_MESSAGES' undeclared (first use in this function)
confuse.c:1112: error: (Each undeclared identifier is reported only once
confuse.c:1112: error: for each function it appears in.)
confuse.c:1113: error: 'LC_CTYPE' undeclared (first use in this function)
make[2]: *** [confuse.lo] Error 1
make[2]: Leaving directory `/var/tmp/portage/dev-libs/confuse-2.6-r1/work/confuse-2.6/src'
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory `/var/tmp/portage/dev-libs/confuse-2.6-r1/work/confuse-2.6'
make: *** [all] Error 2

This was funny to see, as I merged confuse recently on my main system, Yamato, and I do have nls enabled there too. It didn’t fail there, so it’s not even a glibc cleanup-related failure.

Time to dig into the code: where is setlocale() used in confuse?

#if defined(ENABLE_NLS) && defined(HAVE_GETTEXT)
    setlocale(LC_MESSAGES, "");
    setlocale(LC_CTYPE, "");
    bindtextdomain(PACKAGE, LOCALEDIR);
#endif

Having used this before, I know that the setlocale() function is provided by locale.h, but that header is usually included through gettext’s own libintl.h; so where is that included? A common problem here would be different preprocessor tests between the include and the use, so that one is applied but not the other.

#if defined(ENABLE_NLS) && defined(HAVE_GETTEXT)
# include <libintl.h>
# define _(str) dgettext(PACKAGE, str)
#else
# define _(str) str
#endif
#define N_(str) str

This seems entirely fine; the only problem would be if libintl.h didn’t include locale.h. But why would it then work on the rest of the system?

The focal point here is to check why libintl.h includes locale.h in one case and not in the other. Let’s look at the header itself:

/* Optimized version of the function above.  */
#if defined __OPTIMIZE__ && !defined __cplusplus

[...]

/* We need LC_MESSAGES for `dgettext'.  */
# include <locale.h>

[...]

#endif  /* Optimizing.  */

So not as any kind of guarantee, but just out of a technical need, libintl.h brings in the declaration of setlocale(), and only if you have optimisations enabled. Guess what? My chroot has no optimisations enabled, as I don’t need to execute the code, just build it.
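You can see the gate in action with a two-line test case; a sketch, assuming the glibc header quoted above:

$ printf '#include <libintl.h>\nint main(void) { return LC_MESSAGES; }\n' > test.c
$ gcc -O2 -c test.c    # __OPTIMIZE__ is defined: libintl.h pulls in locale.h
$ gcc -O0 -c test.c    # error: 'LC_MESSAGES' undeclared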

The fix here is very easy: just include locale.h explicitly. I’ll be sending a patch upstream and submitting one to Gentoo, but it casts a worrying shadow over the correctness of Free Software when building with optimisations disabled. I suppose this is one more thing I’ll be testing for in the future, as part of my checklist.
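For reference, the change amounts to a one-liner along these lines (a sketch, not the exact patch I’m sending upstream):

--- a/src/confuse.c
+++ b/src/confuse.c
@@
 #if defined(ENABLE_NLS) && defined(HAVE_GETTEXT)
 # include <libintl.h>
+# include <locale.h>  /* for setlocale() and the LC_* constants */
 # define _(str) dgettext(PACKAGE, str)
 #else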

Amazon UK: you fail.

It so happens that I like(d) Amazon, but they have really failed me now, probably for the last time.

First of all, you have to know that the majority of Italian credit cards aren’t like the ones in the US: rather than letting you pay all at once at the end of the month, they tend to force you to pay in installments. That is quite fine while I’m waiting to be paid, but not good when I have the money at hand, as it adds interest I could otherwise avoid.

Sure, there are pre-paid cards, but they are a hassle to handle: they are quite limited in where you can pay, can’t be confirmed with PayPal, and usually only let you charge them with fixed amounts of money, so you can’t just load 234.34 euro. And each charge costs you money, too.

So when I saw that my bank’s (UniCredit Banca) debit card had switched to a smartcard with the 19-digit Maestro code embedded, and that Amazon UK supposedly accepted Maestro debit cards, I decided to get a new debit card to replace my old one, which didn’t have the Maestro code. It took me two weeks to get the new Bancomat card working with its PIN. Incidentally, at about the same time I had trouble with my prepaid card, so I returned that too, leaving myself with just the MasterCard credit card.

When I finally decided to place an order, my Maestro card was rejected: they wanted either a start date or an issue number, and Maestro cards in Italy have neither. Googling around, it seems Irish cards don’t have them either; they actually appear to be a UK-only feature. So I contacted Amazon customer care. (I also tried placing an order with Play.com, but although they first accepted the number, they rejected it afterwards, which pissed me off in a different way.)

The first mail from Amazon UK customer care suggested I set the payment method to cheque, then call in to get it changed to Maestro, giving the card’s details to the phone operator. So I did: this morning I called the UK number, which is not toll-free; thanks to VoIP I at least paid very little, but I still paid! After 20 minutes, the operator told me that unfortunately they only accept Maestro cards with an issue number or a start date, and suggested asking the bank for the data. So I did.

The bank, of course, doesn’t know anything about issue numbers or start dates; what they provided me should be quite enough. It just isn’t for Amazon UK.

Interestingly, for both email requests and phone calls, Amazon asks whether customer care satisfied you, so I told them I wasn’t satisfied, because their own documentation repeats that Maestro cards are accepted, without stating that only some Maestro cards are accepted.

And then I got a more interesting reply: credit cards are accepted almost worldwide, but debit cards, as well as cheques, need to be pound-based; that’s why they require the two attributes that are not part of my card. So I asked them again to make it clear in the documentation that Maestro cards from every other part of the globe are not accepted.

The last message I received didn’t make sense to me: they repeated where I could find the issue number on my card (I can’t, I told them!) and referred to a combo menu I had supposedly talked about. I never talked about a combo menu.

Now I’ve had to cancel my order, which makes me angry, as I was hoping to get Devil May Cry 4 soon enough: I completed Ratchet and Clank and I need something new to vent with. And Amazon made me even more angry, so I need even more venting.

Amazon, you have most likely lost a customer.

Failures… maybe giving up?

Tonight I feel quite depressed; I’m not sure why myself, I just can’t sleep. I slept a few hours, but in the past weeks every night brought a bad dream, and every time I finally managed to wake up it felt like a new salvation; these few hours were tense, too. I woke up and tried to focus my mind on staying awake as much as possible, even though I know it’s not good not to sleep. But I cannot sleep.

I suppose one of the reasons I cannot sleep is that, well, I feel like a total failure lately. Don’t get me wrong, I always knew I wasn’t a very good developer in many fields, but I’ve now understood that the only thing in my life as a free software developer that wasn’t a failure was Gentoo, where I was able to actually get something done, and somehow done right. But I resigned from there, and that means I turned into a failure even the one thing where I wasn’t failing.

I think I have a special feeling for failing projects. NoX-Wizard started crumbling down when I joined, and within a few months even Fabrizio, the admin of the project, gave up on it. I had been working on a CMS of my own, before the sprouting of CMSes everywhere; it had quite a few edges, for what I could tell, things that were later implemented by more complex CMSes, but I was never able to release anything usable. ATMOSphere lasted just the time needed to get out of high school, and I wasn’t able to make it scale into a true working project. I blamed most of these failures on inexperience, but it doesn’t seem like more years of experience helped me there. The most recent example is xine, a project that is going to fail unless the development flow is totally changed and new developers are brought on board, which is what I and the few developers left are trying to do; but it will take some time, I’m afraid, and it’s also a high risk: if we don’t handle the switch correctly, the project will probably wither and die at the same time.

I’ve been able to provide patches here and there, but they are usually trivial fixes, or boring stuff that just needed to be done, and nobody cared. (Not sure why, but I often ended up doing those little things that everybody knows are needed but nobody wants to do, ever since I joined my first Ultima OnLine shard, Dragons’ Land, where in 24 hours I was able to fix bugs that had been standing for months… nothing esoteric really, just a couple of fixes here and there, and in the end that project died too.) My projects are mostly dead or dying; I’m not even sure myself why, sometimes.

Gitarella was promising, but besides me not having had time to update it in a while, nobody else seemed interested in it. I received no feedback about it, and I don’t know of anybody actually running it, so I don’t know what to do with it. Rust got me one piece of feedback (David’s), and I do hope it won’t die just yet; I just didn’t have time lately, but I will restart working on it tomorrow night.

I even chose a dying software for my blog, as Typo doesn’t seem that much alive.

Ignorance is bliss

Now that I feel this way, I’m tempted to shut down Farragut, give Prakesh back to its owner (well, this I should probably do anyway, now that I don’t have to care about Gentoo/FreeBSD), and sell off Enterprise… then I could buy an iMac and just use that for the rest of my life, without having to care about development anymore, without having to fail once again.

I’ve considered changing distribution, to cut the ties with Gentoo as much as I could, but I can’t really find anything that provides me with what I need, which is first of all good multimedia support without patent-crap stuff (which already rules out most distributions) and a vast selection of packages for development. I can’t really see myself using anything but Gentoo anymore, and this is driving me crazy, because I have to stand here watching stuff go kaput without being able to do anything to correct it.

I’m not sure what is letting me down so much; maybe it’s seeing the months spent on ALSA go down the toilet, or maybe it’s the dreams that are upsetting me all too much. I should really reconsider my life; right now it feels like a total waste of chemical energy without anything good coming out of it.

If there was a “Most Useless Free Software Developer Award”…

… I would have won it many times.

So, yesterday I was so happy to have removed 10MiB and 300KiB of memory waste from picoxine that I went a step further and worked on supporting mmap-ed (memory-mapped) file input in xine. What I hoped was to avoid reading the file by hand, piece by piece, and instead pass the whole mmap to the decoder and be done with it.

I worked on it and, after a while, I was able to get something usable, but… the block reading does not work as I hoped: it works only for MPEG TS, and thus breaks for anything but FLAC (I’m not sure why, to be honest).

In the end, the mmap implementation is in xine-lib, and it has no regressions so far, but it’s not the improvement I hoped for either. Right now, instead of read() calls you have memcpy() calls copying data out of the mmap for the decoder to parse; it’s still a good waste of memory, I’m afraid.
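To make that concrete, here’s a minimal sketch of the input path as it ended up (a hypothetical function, not xine’s actual input-plugin API): the copy is still there, it just moved from the kernel into userspace.

#include <stdint.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* a sketch: even with the file mmap-ed, the decoder still receives
   data through its own buffer, so a memcpy() replaces the pread()
   and the copy remains */
static ssize_t input_read_sketch(int fd, uint8_t *buf, size_t len,
                                 off_t off, const uint8_t *map)
{
    if (map == NULL)
        return pread(fd, buf, len, off); /* copy: kernel -> buf */
    memcpy(buf, map + off, len);         /* copy: map -> buf, no win */
    return (ssize_t)len;
    /* the real win would be handing map + off straight to the
       decoder, with no copy at all */
}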

Another problem is the buffers used for audio decoding, which I’m not sure are reused and freed all that well; unfortunately with picoxine I cannot play more than one file at a time, and I cannot get a useful massif output from Amarok, because QImage::create() seems to produce spikes of memory usage that reduce the scale of the graph. By the way: such impressive memory usage really seems to be one of the causes of Konqueror and friends being slow; I haven’t looked at Qt’s sources, but if the memory usage of QImage::create() could be reduced, KDE would likely be quite a bit faster.

Last night I also tried to implement new RTP/RTSP support through lu_zero’s library, but it ended at a dead point from which I will probably never get anything useful, as with the WavPack decoder… sigh. I should probably give up on adding more stuff and just think about the little tweaks that get you 300KiB out of 200…

I’m depressing myself thinking of how many useless projects I started, continued eagerly in the hope of preparing something useful, and that are now sitting there waiting for something, probably their death. Stuff like rubytag++ is bitrotting in my GIT repository, as the original library itself (TagLib) had enough holes to make my bindings useless for what I hoped to do, and upstream seems not to care at all (I still haven’t received any reply from Wheeler to my many bug reports and considerations); ruby-hunspell is probably a lost cause too, as I’m still unable to find a way to bind the Unicode conversion thing; gitarella is there, but I cannot continue it alone, because I don’t have enough time and there is still lots of stuff to add; KDigest in Ruby is still incomplete, and by now I even forgot why I started it in the first place…

Okay, better not to think of the failures, at least I can be proud of my ebuilds :)