What’s wrong with Gentoo, anyway?

Yesterday I snapped and declared my intent to resign from Gentoo, together with stopping the tinderbox and abandoning the use of Gentoo altogether. Why did that happen? Well, it’s a huge mix of problems, all joined together by one common factor: no matter how much work I pour into getting Gentoo working like it should, more problems are generated by sloppy work from at least one or two developers.

I’m not referring to the misunderstandings about QA rules, which happen and are naturally caused by the fact that we’re humans and not beings of pure logic (luckily! how boring it would be otherwise, to always behave in the most logical way!). Those can upset me, but after all they are still no big deal. What I’m referring to is the situation where one or two developers can screw up the whole tree without anybody being (reasonably) able to do a thing about it. We’ve had two (different) examples of this in the past few months, and while both have undeniably bothered QA, users, and developers alike, no action has been taken in either case.

We thus have developer A, who decided that it’s a good idea to force all users to have Python 3 installed on their systems because upstream released it (even though upstream considers it still experimental, something to toy with), and who kept on ignoring calls from both users and developers to drop that requirement (luckily, the arch teams are not mindless drones, and wouldn’t let this slide to stable as he intended in the first place). The same developer also hasn’t been able to properly address one slight problem with the new wrapper, months after unleashing it on the unstable users (unstable does not mean unusable).

Then we have developer B, who feels like the tree’s saviour, the only person who can make Gentoo bleeding edge again… while most, if not all, of the rest of the developer pool is working on making Gentoo more stable and more maintainable. So, among the things he went on doing, there was a poorly-performed Samba bump (“suboptimal” was the term he used — I ended up having to fix the init scripts myself because they weren’t stopping/restarting properly, as the ebuild and the init scripts went out of sync regarding paths), some strangely incomplete PostgreSQL changes, and a number of minor problems with other packages.

Of the two, I was at first more upset by the former, but in the long run the latter is the one who drove me mad. Let’s not dig too much into his stance about --as-needed (cosmetics — yeah, because being able to return from a jpeg bump with fewer than 100 packages to rebuild, rather than the whole world, is just cosmetics), or the fact that he’s ignored most of the QA issues with the packages he touched. Instead, look at the behaviour with a package of mine (alas, I made the mistake of letting this one slip with just a warning; I should have taken the chance to actually defer it to devrel…): vbindiff.

The package is something I added a while ago because from time to time it comes in useful. I’m in metadata.xml; I’m definitely not an unresponsive maintainer. Yet, while my last bump was in June 2008, the version in the tree was not the latest one up to last September (2009). Why? A quick glance at the homepage shows that the beta4 release was mostly fixing a Win32 bug and introducing a way to enable debug mode. So what happens? Our mighty developer decides to go on and bump the package; without asking me; with nobody asking him; without a mail, a nod or anything. I literally noticed this as emerge tried to upgrade a package I know I maintain. You’d expect the debug support to be present in the ebuild then, and indeed you’d find a debug USE flag if you checked now, but that’s something I added myself afterwards, as the damage of pointlessly bumping something was already done.

Now, why did that happen? Well, he admitted he just went through the dev-* categories, without considering the maintainers declared in metadata, and blindly bumped ebuilds whenever the latest version available on the site was higher than the one in the tree. Case in point: he had to open the vbindiff site, so the release notes regarding Win32 and --enable-debug would have been clearly visible, if he had cared to read even part of them. Whoever has tried doing serious ebuild business should know that most of the time even the upstream-provided release notes are not something to go by alone… Interestingly enough, his bleeding-edge hunger didn’t make him ask for a new stable, and we currently have a very old one.

So there you have your developer B, the super-hero, the last good hope of the bleeding edge, who bumps packages without consulting the guy who maintains them (and who is around almost 24/7) and without even caring to use them at all. Why did I let it slip? Because I was mostly focused on trying to stop developer A at the time is probably the right answer. I did issue a reprimand reminding him not to touch someone else’s packages, and to learn to use package.mask for things like Samba. I was hoping he would listen. Oh boy, was I ever so wrong.

Speaking again about Samba for a second: did I tell you yet that the split into multiple packages was done, straight to ~arch, without any follow-up plan to convert the dependencies? Wonder why the whole thing is now stalemated again. Maybe the arch teams don’t look too kindly on having the same kind of dependency breakage in stable as there was/is on unstable right now.

First-hand information about our developer B has him aligned with a rather zealot point of view regarding the Mono project — you’d then guess that dotnet stuff would be the last thing he’d be touching, but instead he went ahead and bumped it, without any questioning, ignoring the fact that I stated at FOSDEM that I was going to look into it as soon as I had time, the fact that I had stated multiple times before that I was already working on un-splitting the gtk-sharp packages, and the fact that I had made contact with the Mono developers (again at FOSDEM) to try following upstream more closely. Oh, and the one thing that pissed me off about that bump? Beside the fact that tomboy now refuses to work? Remember this patch? It was dropped; without even mailing me to ask whether I had, or could make, a version for the latest release. It was dropped in unstable (or, as it should be called if this kind of stuff is allowed to continue, unusable).

And the cherry on top? As I said, this developer touched Samba, PostgreSQL, and now Mono… there are three aliases for these things (samba, pgsql-bugs and dotnet), which the bugs get assigned to… and he’s on none of them! Before somebody tries to argue otherwise, I’m pretty confident he’s not following the aliases on Bugzilla (plus, given he also argued that the problem was with leaving security-vulnerable stuff in the tree – which, by the way, means having working, complete, safe ebuilds that can be marked stable, and he doesn’t seem able to come up with any of those – the most important security bugs don’t get sent to watchers anyway). How is he supposed to see the bugs coming? Oh, by wrangling the bugs himself! Yeah, because developers never file bugs themselves and assign them straight to the maintainers as procedure requires, do they? (Fun fact: Bugzilla queries report at most 5K bugs, so that list is a rather limited result compared to what I was hoping to get.) Nor do other developers ever wrangle – that would be silly – and there are no Arch Testers to speak of, right?

You can now see most of the picture, and why I’m mostly upset with developer B. What made me snap yesterday were remarks that insisted that I was just “whining” and “not doing enough” as bugs kept piling up. What the heck? I’ve constantly had over 1000 bugs (over 1300 today) for the past year or so, I know very well that bugs keep piling up! And I’ve been doing all I can do outside of my work hours (while I have to thank some people, including Paul, David, Simon, Andrew and Bela for their contributions, I’m not paid to do Gentoo work; and while I do get to use it, and thus contribute back to it, for some of the jobs I take, it’s definitely not the same as working on Gentoo), including the whole RubyNG porting and improvement, trying to make sure we can actually get to a point where unmasking Ruby 1.9 will not break any user whatsoever. Am I really doing too little? “Not enough”?

Okay, so the proper way to handle this, with the current procedures, would be to take this up to Developer Relations so that they could act on it; QA can only ask infra to restrict commit access if we’re expecting a grave and dangerous breakage of the tree, or misuse of commit rights. So why didn’t I bring this up to devrel? Well, the main reason is that devrel nowadays, as far as I can tell, is exactly three people: Petteri, Denis and Jorge, and of the three the only one in favour of preventive suspension of commit rights is Denis (this was proven with the case of developer A above); one out of three does not really sound like much of a chance for this to improve the situation. And if – again, as happened with developer A – DevRel then decided that the right action would be to issue a reprimand, that would amount to scolding the developer and asking him to work more with others… well, it wouldn’t change a thing.

The whole QA system has to change! We’ve got to write down guidelines, rules, and laws, and be conservative in applying them. You shouldn’t go around breaching them and then appealing when QA finds you out of line, you should talk with QA if you feel the rule is misapplied to your case in any way.

So here you go, in a nutshell, why my self-preservation instinct right now is telling me to flee. I’m not sure yet if I’ll outright flee or just give it time for the situation to be addressed and then decide. The reason is: I still like the Gentoo system, and since I rely on it for my work I cannot leave it alone; if I were to move to anything else I would have to spend (waste?) even more time fixing the same issues anyway, and I’d much rather get Gentoo working right. But I cannot do this alone, and I especially cannot do it if I have support from neither developers nor users. So please voice your concern.

If you feel like Gentoo needs better QA, if you feel like we shouldn’t be turning unstable into unusable, then please ask for it. I’m not saying that we should become stale like Debian stable, but if it takes a few months to get something straight, then it should take its time and not be forced through (that’s what the Ruby team has been doing all this time to work with Ruby 1.9 and Ruby EE and other implementations as well!). If you use Twitter, identi.ca, Digg, Reddit, Slashdot, whatever, get this post circulating. Maybe I’m subverting the process, but to quote the BBC’s News Quiz, “Trial by media is the most efficient form of justice” (this was in reference to the British MPs’ expenses scandal last year), and right now my only concern is effectiveness.

You should refuse stable

While Donnie thinks about improving Gentoo management, I already said that I’m going to keep myself to the technical side of the fence, working on improving Gentoo’s technical quality, which I find somewhat lacking, and not because of a lack of management. Maybe it’s just the other way around: there are too many people trying to get the management part working, and they fail to see that there is a dire need for technical work.

Today I started the day (not too willingly, to be honest) with a mail from Alexander E. Patrakov, who CCed me on a Debian bug about GCC 4.4 miscompilation; while Ryan is the one who has been working on GCC 4.4, I guessed I could do what Alexander suggested, since I have the tinderbox set up.

To do this I simply set up one more check in my bashrc’s src_unpack hook, and used the tinderboxing script that Zac provided me with to run the ebuild unpack phase for the latest versions of all packages. Now, besides the fact that this has the nice side effect of downloading the sources even for the stuff that was missing up to now, I found some things that I really wouldn’t have expected to find.
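For reference, here is a minimal sketch of what such a check could look like in a Portage bashrc; the hook name is the standard post-phase hook, but the heuristic (looking for files that ./configure generates) is an assumption on my part, not the exact test my tinderbox runs:

post_src_unpack() {
    # hypothetical sketch: warn if configure appears to have already run
    # by the time the unpack phase finishes
    local leftover
    leftover=$(find "${WORKDIR}" -maxdepth 3 \
        \( -name config.log -o -name config.status \) 2>/dev/null)
    if [[ -n ${leftover} ]]; then
        ewarn "QA notice: ./configure seems to have been run during src_unpack:"
        ewarn "${leftover}"
    fi
}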

Like calling econf (and thus ./configure) during the unpack phase. Which is bad in so many ways, because it prevents fiddling with configure from a hook, and in my case wastes time when all I’m doing is a search. What worries me, though, is not the mistake itself, but rather the fact that one of the two ebuilds I found with it so far went stable a few days ago!

Now, I can understand that arch teams are probably swamped with requests, but it would be nice if such obvious mistakes in ebuilds were spotted before the stuff goes stable. For instance, I like the way Ferris always nitpicks on the ebuilds, including the test suites, since it actually catches things that might have escaped the maintainer, who’s likely used to the thing, or has a setup where it already works. I don’t care if it stops my stable requests for a few days or even months; if there is a problem I missed, I’d like to know about it beforehand.

So please, refuse to mark something stable if the ebuild is not right; even if it’s not a regression, for trivial stuff like that you should refuse without hesitation.

Future proof your code: don’t use -Werror

When checking today’s failure logs from the tinderbox, to see whether any packages besides GDB fail with readline 6.0, I noticed a few more failures since last time, mostly related to the new gcc 4.3.3 ebuild and the fact that -D_FORTIFY_SOURCE=2 is now enabled by default. Which, by the way, is what makes the whole thing too noisy for my taste.

While there are a few cases where the code is explicitly being rejected by the compiler for being wrong, most of the failures seem to be extra warnings that morph into failures because of the use of -Werror in released code. I think I’ve talked about this kind of problem in passing in the past, but never wrote a full entry about it. The time has come for that.

The -Werror flag for GCC (also used by ICC, and equivalent to Sun’s -errwarn=%all) is often considered useful to make sure one’s code is solid enough to build without warnings. This is good, since warnings often enough turn out to be real errors, and as I noted yesterday, having too many warnings can cause new ones to be ignored and thus create a domino effect to the point where the whole software is screwed. But it’s not a good idea to unconditionally set it in released code.

Why do I say that? Because things change, and in particular, warnings get added. The fact that a new compiler is stricter and considers a particular piece of code something to warn about does not change the quality of the software per se; and while it’s true that fixing the warnings early can save you from failures further down the road, users often enough just need the thing to build and work when they need it, and would prefer not to have to fix a few warnings first.

So we have two opposite considerations: enabling -Werror allows the developers, and the users interested in the total correctness of the program, to identify new warnings earlier, while the remaining users, who don’t care about correctness but just want the thing to work, would like -Werror disabled. What’s the solution? First of all, learn to use specific -Werror= flags (see this old post of mine for some information about them), and then make the thing optional.

See, this is what makes free software quite powerful sometimes: optionality. Just add a switch, a knob, a line to comment in and out, so that -Werror is used by default on developers’ builds but not on the normal user releases. For most non-autoconf-based build systems, -Werror is just passed along with the rest of the CFLAGS, so it’s easy to deal with; for autoconf-based systems, it’s not rare that it’s added at the end of the configure script, unconditionally. Why does that happen? Because passing it to the ./configure call, like any other compiler flag, will almost certainly cause some autoconf checks to fail. No more, no less.
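To make this concrete, here is a minimal sketch of what an optional -Werror switch could look like in an autoconf-based project; the option name (--enable-werror) and the exact wording are my own choices, not taken from any particular package:

dnl configure.ac sketch: -Werror is off by default, developers opt in
dnl with --enable-werror
AC_ARG_ENABLE([werror],
    [AS_HELP_STRING([--enable-werror], [turn compiler warnings into errors (for development)])],
    [enable_werror=$enableval], [enable_werror=no])

AS_IF([test "x$enable_werror" = "xyes"], [CFLAGS="$CFLAGS -Werror"])

The append is best placed near the end of configure.ac, after the compiler checks have run, precisely so that the flag does not interfere with autoconf’s own tests.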

True, sometimes what is a warning in one version of GCC becomes an error in the next, so it’s not really a solution if the warnings are not taken care of. But that’s the very reason why GCC usually introduces them as warnings first! It gives developers time to act on them before the code gets rejected out of the blue. Of course it would be nicer if GCC also added an extra note like “this will become an error with release X.Y.Z”, but even that is often ignored, so it does not really matter.

This becomes even more important for ebuild developers, since having -Werror enabled does not really work well with Gentoo: we might add new, stricter GCC versions, or one more entry in CFLAGS to enable further warnings (for instance I have -Wformat=2 -Wstrict-aliasing=2 -Wno-format-zero-length in mine), which would then cause packages to fail out of the blue. Unfortunately it seems that quite a few packages still use -Werror in their default build, and not all Gentoo maintainers took care of removing it beforehand.

So please, don’t use -Werror in released code: make it optional, and use it during development only. And don’t use it in ebuilds either.

Do we need the older versions of automake?

One interesting problem related to autotools is the presence in Portage of many different versions of automake in multiple slots, one per minor version. Historically the reason for this is that different software requires different autotools versions in their bootstrap scripts, and thus we needed to have them around. While this is mostly still the case, autotools have lately improved a lot, and in particular software designed to work with automake 1.8 mostly works out of the box with version 1.10, with some exceptions.

By today’s standards for autotools in Portage (the use of the autotools eclass, to begin with), the only reasons to keep older automake versions around are the rare cases where the build system does not work with modern automake, and to avoid running the whole of eautoreconf when just changing a Makefile.am file. But even the latter does not work that well, because sometimes you just prefer to rebuild everything for the sake of it.

I guess one thing we could be doing is trying to migrate as many packages as possible to automake 1.10 and making sure that versions older than 1.8 are well masked (those are ancient enough). My reason to keep the newer ones is to account for possible stricter checks in 1.10 that might not work for some less-than-recent software, and also because a lot of KDE-related ebuilds using the admin/ directory from the KDE 3 build system (which is an aberration built on top of autotools, rather than plain autotools) fail to work with newer automake versions, and it is not trivial to port them over.

Since the older autotools versions may be useful for other things, like working on very old projects that really cannot be ported, I guess removing them from Portage is not really an option, but they should come with huge “warning” messages, and developers should really think a lot before deciding to restrict the automake version to something that is not “latest”.
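For reference, this is roughly how an ebuild restricts the automake version through the autotools eclass; the version and bug number below are made up for illustration, and the point is that the pin should be the exception and should always carry a justification:

# hypothetical ebuild snippet: pin only when the build system is known
# to break with newer automake, and say why; "latest" is the default
EAPI=2

WANT_AUTOMAKE="1.10"    # breaks with newer automake, see bug #NNNNNN
inherit autotools

src_prepare() {
    eautoreconf
}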

Now, the “latest” option is certainly not entirely safe: when 1.11 is added to Portage, “latest” will pick it up, and that might cause quite a stir, since it could break older software which would then need to be either restricted to 1.10 or fixed up (the latter preferred); but there is a solution for that too. Given that all the software that actually uses autotools depends on them, it’s not so complex to just set up a chroot tinderbox like mine and build all the software that uses automake, to ensure that it builds with the new version before unleashing it on ~arch.

Unfortunately there are quite a few problems with that. The first is that we consider all dependencies on system packages implicit, which means we cannot really just get a list of the software that uses a particular package that is considered system, like autotools, gettext or flex. This is something I really have a beef with, since it makes it non-trivial to test just the software that would be involved in a particular tool change without rebuilding the whole tree over.

The second is that not all developers think it’s worth masking, even for just a day or a week, tools that break backward compatibility, with the result that for about a week the tree is broken (at least for the most common packages; I still find failures related to gcc 4.1 in the tree, and I found more about 4.3 after I unleashed my log data miner). I’m afraid the problem is that the compartmentalised work in Gentoo calls for tighter coordination between teams, if we don’t want to break stuff for too many users at once.

Another issue is that there are lots of packages in the tree that are nearly never used, and they thus suffer from a clinical case of bitrot. While it’s not nice to just kill all of them once they have problems, it’s also impossible to say that a package can enter the tree, be unmasked, or be marked stable only when all the affected packages are fixed. Just look at the GCC 4.3 tracker to see how many packages still don’t build with GCC 4.3; it would be unfeasible to expect all of them to be fixed before it goes stable, considering how much time it’s taking even for bugs that already have patches attached.

I should find time to commit those patches, but it’s either that or working on refining the log mining script so that I can file more bugs that are to the point, while trying to avoid duplicates. The boring bit is that I cannot rely on just checking the open bugs, since some, like the ones found for pre-stripped files, got fixed without a revision bump, and thus the packages didn’t hit my tinderbox again. And just so you know how ccache holds up to its myths: it should work nicely for a job like a tinderbox, since the same package gets built multiple times (when it fails and it’s a dep of something else) and the built code can be reused each time. But even with this setup, which is the perfect setup for ccache, it hasn’t got over a 2:1 ratio in three and a half tinderbox runs. Just to say.

A couple of thoughts about package splitting

In my post regarding remote debugging (which I promised to finish with a second part — I just didn’t have time to test a couple of things), I suggested that I’d like to have some kind of package splitting in Portage, to create multiple binary packages out of a single source package and ebuild, similar to what distributions based on RPM or deb do (let’s call them Red Hat and Debian, for historical reasons).

Now, I want to make sure nobody misunderstands me: I don’t intend to propose this as a way of removing the fine-grained control USE flags give us; I sincerely love that; and I also love not having to worry about installing -dev and -devel packages on my machines to be able to build software, even outside of the package manager’s control. I really find these two are strengths of Gentoo, rather than weaknesses, so I have no intention of fiddling with them. On the other hand, I think there are enough uses that would allow for even finer control at the binpkg level.

I’ve already given a scenario in my post about remote server debugging, but let me try to show something different, something I’ve actually been thinking about myself. Yes, I know this is a very vested interest of mine, but I also think this is what makes Free Software great most of the time: we’re not solutions looking for problems, but usually solutions to a problem somebody had at least at one point in time. Just like my writing support for NFS export of the HFS+ filesystem in Linux.

So let me try to introduce the scenario I’ve been thinking about. As it happens, I tend to a series of boxes in several offices, for friends and friends of friends, in my spare time, on the side. It’s not too bad; it does not pay my bills, but it does pay for some extras, which is good. Now, since these offices usually run Windows, even though I obviously install Firefox as the second step after doing the system updates, it’s not unlikely that every other time I go there I have to clean up the systems. I think there are computers I’ve wiped and reinstalled a few times already. I’ve now been thinking about setting up some firewalls based on Snort or similar. Since I am who I am, these would end up being Gentoo-based (as a side note, I’m tempted to set one up here too, so I can finally stop having trouble with Vista-based laptops that mess up my network). Oh, and please: I know it might sound very stupid considering there are ready-made solutions for this already, but considering how much I’m paid and the amount of money they are ready to spend (read: next to none), I’d find it nicer to be paid to work on some Gentoo-related stuff than to be paid just to look up and learn how to use already-made equipment. Of course, if you have suggestions, they are welcome anyway.

So anyway, in this situation I’d have to set up boxes that would feel very embedded-like: a common base, the minimum maintenance possible, upgrades when needed. Donnie’s idea of using remote package fetching and instant deletion is not that good here, because it still requires a huge pipe to shove the data around; not only do I not have enough upload bandwidth to spend on binpkging a whole system with debug information, it would also be a hit on their bandwidth that most of my users wouldn’t like (whether they want to use BitTorrent or look up p0rn from the office is not my problem).

With this in mind, I’d sincerely find it much nicer to be able to split packages, Portage-side, into multiple binary packages that can be fetched, synced, or whatever else, independently, as needed. As I proposed, a binpkg for the debug information files, but also one for documentation (including man and info pages), one for development data (headers, pkg-config files), and maybe one for the prepared sources, which I want to talk about in a moment. With an environment variable it shouldn’t be much of a problem to choose which of these split binary packages get installed on the system without explicit request, with a default including all of them but the debug information and the sources. This would also replace the INSTALL_MASK approach as well as the noinfo, noman and nodoc FEATURES. It wouldn’t be a logical split of a package into multiple entries in the system, but rather a way to choose which parts to install, complementary to USE flags.
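For comparison, this is roughly what has to go into make.conf today to get a similar effect — a sketch of the current workarounds the proposal would supersede, with the INSTALL_MASK paths chosen purely as examples:

# make.conf sketch: drop documentation and development headers at install
# time, instead of choosing which split binpkgs to install
FEATURES="noman noinfo nodoc"
INSTALL_MASK="/usr/include /usr/share/gtk-doc"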

As for packaging the sources, as I said above, there are two interesting points to be made, or maybe three. The first is that when you have to distribute a system based on Gentoo, you cannot just provide the binaries: since many packages are released under the GNU GPL version 2, even if you didn’t change the sources at all you should be distributing them alongside the binaries; and we modify a lot of sources. For license compliance we should also provide the full set of sources from which the code is derived. This is especially tricky for embedded systems. By packaging up the sources used for the builds, embedded distributors would be able to just ship all the -src subpackages as the full sources for the system.

The second point is that you can use the source packages for debugging too. Since there is, as far as I know, no way to fully embed the source code of software in the debug sections of the files generated from it, the only way for GDB to display source lines during debugging is to have the source files used for the build available during the debugging session. This could easily be done by packaging up the sources and installing them in, say, /usr/src/portage/ when they are needed, from a subpackage.
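As a hypothetical usage example (the paths below just follow the suggestion above; nothing in Portage provides them today), GDB can be pointed at such installed sources with its directory and set substitute-path commands:

# hypothetical session: the source subpackage is installed under
# /usr/src/portage/foo-1.2/, the binary was built in /var/tmp/portage
gdb /usr/bin/foo core
(gdb) set substitute-path /var/tmp/portage/foo-1.2/work/foo-1.2 /usr/src/portage/foo-1.2
(gdb) directory /usr/src/portage/foo-1.2
(gdb) backtrace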

A final point is that by packaging sources in sub-packages and distributing them, we could reduce the overhead for users of unpacking (maybe with uncommon archive formats) and preparing sources (maybe with lots of patches and autotools rebuilding). Let’s say that every 6 hours a server produces md5-keyed source subpackages for all the ebuilds in the tree, or a subset of them. Users would then use those sources primarily, while still having the ebuilds provide all the data and workflow so that the original, untouched sources would be enough to compile the package. Of course, this would then require us to express dependencies on a per-phase basis, since autotools wouldn’t be required at build time at all.

Okay, I guess I’m really dreaming lately, but I think throwing ideas around is still better than not doing so; they can always be picked up and worked on. Sometimes it has worked.

For A Parallel World. Theory lesson n.2: handling broken ebuilds

Up to now in my series I’ve written about fixing upstream projects and I’ve given hints on how to design a properly parallel-safe build system. I haven’t written anything yet about handling the ebuilds.

While my proposal for a replacement of simple makefiles would take care of most minor parallel make issues, it is of limited help with really broken build systems: for totally non-complex software, parallel make is not an issue at all, and most problems happen with complex custom rules. For all the more complex cases, you need to fix the build system properly, patch it, and so on.

But before you can get to that, you have to take care of handling the ebuild correctly. While it’s certainly not a cool thing for owners of multicore systems to serialise a build, it’s also not good for them to have a package failing, even if only for a limited time while the build system gets fixed. But if you add -j1 to an emake call while you review what the problem is, there is a huge chance that the problem will remain there, hidden.

So when you have to deal with such a problem, my suggestions are these:

  • make sure that you check whether the build really fails with parallel make; this involves checking it multiple times, at multiple -j levels, on a true multicore system; the reason is that parallel build issues are race conditions, and they might require specific conditions to show up;
  • if you can identify for sure that there is a parallel make issue, open a bug for it; even if it’s your own package, open a bug, it will help you track it down; having a bug for each failure is very important, since you need to know the bug exists to ensure it gets fixed;
  • add -j1 to the ebuild’s emake call (see the sketch after this list); this is a temporary measure, a hack, something you should never rely on; but having it there will prevent build failures until you can fix the original bug;
  • write a comment referencing the bug you just opened next to where -j1 was added; this will ensure that finding the reason for the non-parallel make only requires a bug lookup rather than searches upon searches;
  • when you commit the ebuild, make sure the ChangeLog also references the bug number; make it as noticeable as possible that there is still a bug and you’re just working around it;
  • and the critical part: keep the bug open! Some developers hate having bugs open and would rather close everything even when they merely work around the bug rather than fixing it, waiting for upstream or someone else to fix it properly; that is a mistake here: you have to leave the bug open.
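A minimal sketch of what the workaround looks like in practice; the bug number and the reason given in the comment are made up for illustration:

src_compile() {
    # forced to -j1: custom doc rules race with the library build;
    # temporary workaround, see bug #NNNNNN -- keep that bug open!
    emake -j1 || die "emake failed"
}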

When you add -j1 to an ebuild, you’re doing so as a contingency measure, to avoid users complaining that the package does not build; but you’re more than likely to get users complaining that the package does not use parallel make either, and they are right about that: it should. By closing the bug, you’re telling them to “go away” since the package builds, which is not what you should be doing. Instead you should acknowledge that there is a bug, and that it has to be fixed.

So if you find a bug from me about a parallel make issue and I changed the ebuild to force -j1, don’t dare close the bug, or I might really get annoyed… if you don’t know how to fix it, just ask me, savvy?

Fixing CFLAGS/LDFLAGS handling with a single boilerplate Makefile (maybe an eclass, too?)

So, in the last few weeks I’ve been filing bugs for packages that don’t respect CFLAGS (or CXXFLAGS), using the beacon trick. Besides causing some possible false positives, the testing is going well.

The problem is that I found more than a couple of packages that either call gcc manually (I admit I’m the author of a couple of ebuilds doing that), or for which the patch to fix the Makefile would be more complex than just using a boilerplate makefile.

So what is the boilerplate makefile I’m talking about? Something like this:

$(TARGET): $(OBJS)
        $(CC) $(LDFLAGS) -o $@ $^ $(LIBS)

Does it work? Yes it does, and it will respect CFLAGS, CXXFLAGS and LDFLAGS just fine; the invocation in an ebuild (taking one I modified earlier today) is as easy as:

src_compile() {
    emake CC="$(tc-getCC)" 
        TARGET="xsimpsons" 
        OBJS="xsimpsons.o toon.o" 
        LIBS="-lX11 -lXext -lXpm" || die "emake failed"
}

Now of course this would suck if you had to do it for each and every ebuild, but what if we were to simplify it into an eclass? Something like having an ebuild just invoke it this way:

ESIMPLE_TARGET="xsimpsons"
ESIMPLE_OBJS="xsimpsons.o toon.o"
ESIMPLE_LIBS="-lX11 -lXext -lXpm"

inherit esimple

For slightly more complicated things you could make it use PKG_CONFIG too…

ESIMPLE_TARGET="xsimpsons"
ESIMPLE_OBJS="xsimpsons.o toon.o"
ESIMPLE_REQUIRED="x11 xext xpm"

inherit esimple

so that it would call pkg-config for those rather than using the libraries directly (this would also allow simplifying, for instance, picoxine’s ebuild, which uses xine-lib).

Even better (or maybe I’m getting over the top here ;)), one could make the eclass accept a static USE flag that would call pkg-config --static instead of standard pkg-config and append -static to the LDFLAGS, so that the resulting binary would be, well, static…
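To make the idea a bit more concrete, here is a rough sketch of what the compile phase of such an esimple eclass might look like; nothing like this exists yet, the variable names are just the ones proposed above, and where the shared boilerplate makefile would live is an open question:

# esimple.eclass -- hypothetical sketch, not an existing eclass
inherit flag-o-matic toolchain-funcs

esimple_src_compile() {
    local libs="${ESIMPLE_LIBS}"
    # resolve pkg-config packages, if any were requested, into flags
    if [[ -n ${ESIMPLE_REQUIRED} ]]; then
        append-cflags $(pkg-config --cflags ${ESIMPLE_REQUIRED})
        libs+=" $(pkg-config --libs ${ESIMPLE_REQUIRED})"
    fi

    # reuse the boilerplate makefile shown earlier, shipped with the eclass
    emake -f "${ESIMPLE_MAKEFILE}" \
        CC="$(tc-getCC)" \
        TARGET="${ESIMPLE_TARGET}" \
        OBJS="${ESIMPLE_OBJS}" \
        LIBS="${libs}" || die "emake failed"
}

EXPORT_FUNCTIONS src_compile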

If anybody has comments about this, to flesh it out before it’s actually proposed as an eclass, now would be a nice time to speak up, so we can start off on the right foot!

Bitrotting

When I went looking for unit testing frameworks, I was answered by more than one person (just Tester publicly, though) that I shouldn’t care even if there are no new releases of check or cunit; that as long as they do what they have to, it’s normal that they don’t get developed. Sincerely, I don’t agree: projects that don’t get maintained actively start to bitrot.

On a related note, I always find it quite intriguing when you can apply organic concepts to software, like bitrot and softdiversity. It means that even human creations of pure logic abide by laws their creators didn’t explicitly take an interest in.

Anyway, the bitrotting process is very natural, and can easily be seen. If you try to use and build the sources of a project written ten years ago or so, it’ll be very difficult to do. The reason is that dependencies evolve, deprecate interfaces, improve other interfaces; tools also change and try to do a better job, reducing the chances for users to shoot themselves in the foot, and so on. Even software written with no dependency other than the C library and the compiler will probably not work as intended, unless it relies purely and squarely on standards and was written so that each part of the code was 100% error-free. As you can guess, it’s an inhuman amount of work to make sure that everything is perfect, and it’s rare to be able to provide such a certainty.

But it’s not really limited to this: software is rarely perfect as it is, and hardware also evolves. Software written with a strict limit on the amount of memory to use is going to be suboptimal on a modern system; software written for purely sequential execution is not going to make proper use of modern multi-core systems. Software features also improve, and you obviously want to make the best use of the software so that your performance doesn’t get hindered by obsolete techniques and features (think of fam versus inotify).

Sometimes bitrotting also strikes very commonly used tools, like strace, and just sometimes they get taken over, like dosfstools. At other times, the bitrotting attacks the build system and the syntax of the code more than the actual logic behind it, as is the case for FooBillard (which, if I ever have spare time, I’ll import into git and fix up — it’s the only game I actually would like to play on Yamato, and I cannot because it doesn’t build here).

But bitrotting does not stop at complete, complex software projects; it also applies to small things like ebuilds: a lot of the ebuilds I filed bugs for are broken because nobody has tended to them in a long time: the compiler got stricter and they didn’t get tested. While Patrick already did a few sweeps with his tinderbox over software in the stable tree, there were and still are lots of packages in ~arch that didn’t get tested, and so I’m getting to file bugs related to glibc 2.8 and gcc 4.3 once again. But it’s not just that: there are mistakes in DEPEND/RDEPEND variables, problems with ebuilds not using die to terminate on failure, and so on.

There are of course problems with other things, like packages that don’t use slot dependencies yet (and let’s not even start on use dependencies, which were introduced so recently — by the way, should this mean I can finally have my cxx, perl and python USE flags turned off by default?), but those are quite limited. Instead, I found that the problem with -O0 building that I noticed quite some time ago is not that common, although I admit I’m not sure whether that’s because more packages actually know to include locale.h, or just because -O0 is not respected.

Hopefully, one day these sweeps will be so common that problems like the glibc 2.8 ones get found and reported within the first week of the package entering Portage, so that the developers are also fresh enough to know how to deal with the errors. On the other hand, once I’m done with this and have enough free time (since I’m also working on lscube, FFmpeg, and so on), I’ll see about fixing the packages in maintainer-needed for which I reported bugs; it’ll help direct people to the right thing to do, I hope.

Warnings, to keep from doing the wrong thing

In the IT world we’re obviously full of practices that, albeit working, are strongly advised against because they are risky, broken on different setups, or just plain stupid. Many of these practices are frequent enough either because they are easy to apply without knowing, or because they were documented somewhere and people read and spread them.

These practices, in most compiled programming languages and when using optimising compilers, are guarded by warnings: almost-errors that are printed on the compiler’s error stream when it identifies a suspect construct. If you use Gentoo, you know them very well, as you certainly see lots of them (unless you have -w in your CFLAGS).

Lots of people ignore warnings, either because fixing them is too much work, as it would require changing a huge part of the code, or because they don’t stop the compilation. Much more rarely it happens that the code actually works fine and the warning is bogus; it’s not unheard of, though. Also, the more advanced the warning, the higher the chance it might be implemented wrong.

On the other hand, the vast majority of warnings are put there for a good reason, and should actually be properly taken care of. These warnings could have been used to make a program 64-bit safe years before 64-bit systems became widespread, or might have made sure that code written years before GCC 4.3 would build correctly with the latest version of the compiler. Of course they are not the one absolute solution, as many changes might not have had warnings before (like the std:: namespace change), but they could have helped.

But I don’t want to talk about compiler warnings today, but rather about Portage warnings. Since a few versions ago, thanks also to the availability of Zack and Marius, Portage started throwing warnings after a successful merge, giving you insight into possible problems with an ebuild or with the software the ebuild installs. These are pretty useful, as they can catch for you whether a ./configure switch was renamed or removed after a version bump, and they might tell you if the software is doing something risky that you should warn upstream about. (Why should you warn upstream? Well, packagers often see lots more code than the average programmer; it’s not uncommon for a programmer not to know about an issue that a packager does know about, because of the distribution’s policy on the problem.)

In addition to these warnings, which might be setup-dependent, repoman also started warning about suspect (R)DEPEND entries and other issues with ebuilds. Even if repoman will probably become slower as checks pile up in it, it will be nice to make sure developers know what they are committing.

This is particularly important because there are quite a few sub-optimal ebuilds in the tree already, and while it’s difficult to find and fix all of them, it’d be quite nice if we could avoid introducing new ones.

Unfortunately, I’m starting to worry that it might not be as feasible as I hoped, because there is a huge flaw in my idea that adding warnings will keep people away from the mistakes: the lack of documentation on these problems. As much as I wish I could count my blog as a source of documentation, I know that is far from the truth, but I haven’t been able to start writing docs again yet, because I’ve been following this world rebuild closely, at least to understand how to set my priorities. I know I’ll be working on quite a few things in the future, especially once the hospital is just a memory, and hopefully I’ll be able to write enough docs that the warnings become clear enough for the whole tree to be safe for everybody to use, under whichever circumstances.

My checklist when fixing packages

As I wrote I’ll be trying to write more documentation about what I do, rather than doing stuff. This is because I’m simply too tired, and I should rest and relax rather than stress myself.

So after playing some Lego Star Wars, I decided to take a look at what I need to document for PAM. There was an easy bug to fix, so I decided to tackle it; while tackling it, I looked around to see if I was missing anything and noticed that sys-libs/pam could use a debug USE flag. Unfortunately, not only does it not build with the debug USE flag enabled, it also fails with it disabled, because the configure script was written by someone who yet again fails at using AC_ARG_ENABLE.

But this was just one of the two things I noticed today and wished I could fix if I didn’t have to rest, so instead I decided to write down the small checklist I follow when I have to check or fix packages:

  • If the package is using autotools, I make sure they can be rebuilt with a simple autoreconf -i. Usually this fails when macros are present in the m4 directory (or something like that), or when the gettext version specification for autopoint is missing.
  • If the package supports out-of-source-tree builds, I create an “enterprise” directory and build from there (usually it involves a ../configure call; see the sketch after this list). A lot of packages fail at this step because they assume that the source directory and the build directory are one and the same.
  • If the package uses assert() I make sure it works with it disabled (-DNDEBUG); this is usually nice to link to the debug USE flag to remove debugging code.
  • I check the resulting object files with cowstats (check Introducing cowstats for more information about this), and see if I can improve the situation with some trivial changes.
  • I check the resulting object files with missingstatic (another script in ruby-elf).
  • If the package uses automake, I make sure the _LDFLAGS variables don’t contain libraries to link to (that would break --as-needed).
  • I check for possible bundled libraries we don’t want to use.
  • I check for possible automagic dependencies that we don’t want in the ebuild.
  • I run a build with all the useful warnings enabled, and see if there is something that needs to be fixed.
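For the first two items, the manual steps boil down to something like the following; the “enterprise” directory name is just the convention mentioned above:

# rebuild the autotools from scratch; failures here usually point at the
# m4 macro setup or a missing gettext/autopoint version specification
autoreconf -i

# try an out-of-source-tree ("enterprise") build
mkdir enterprise && cd enterprise
../configure
make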

Such a checklist, if followed from start to end, may generate a fair amount of patches that have to be sent upstream. It usually also requires checking them against upstream’s development repositories, so that the patches aren’t already obsolete.

As you can guess by now, it’s not exactly the quickest of tasks, and it depends a lot on the power of the development box I’m working on. Unfortunately, using a faster remote box does not always help because, even though Emacs’s tramp is quite neat, it does not make it easy to access the sources for editing. And having the sources locally while mounting them remotely doesn’t solve it either, as the build would then stall on fetching the sources.

My plan was to get either the YDL PowerStation or a dual quad-core Opteron system (yes, I know it’s overkill, but I don’t want to have to upgrade the system every three years). It wouldn’t have been that bad; I just needed to take a couple of jobs during summer and early fall, and I could have afforded them. Right now, though, the situation looks pretty bad. I’m not sure whether I can get a new job done before fall, and even though medical care in Italy is paid for by the government, there are a few expenses I’ve had to cover (like, for instance, going over quota on my Internet connection to download the software to view my CAT scans while I was in the hospital — long story, I’ll write about that another day), and the visit next Tuesday is in private practice (so I’ll have to pay for it).

If you care about a package, try applying these checks to it, and see if upstream can pick up some improvements :) Every little drop helps!