The automake and libtool clash

I was warned today by my friend lavish that there are packages, such as ettercap and tvtime, that are failing because of a clash between libtool and older automake versions. Indeed, that is the case, and I’m now on the case to resolve it.

The problem here is that libtool 2.4 requires at least automake 1.9, so anything using 1.7 (ettercap) or 1.8 (tvtime) will fail with a number of aclocal errors. There are internal changes forcing this; it’s nothing new and nothing extraordinary. Unfortunately, autotools.eclass allows maintainers to require an older automake version. This has been used extensively up to now to dodge compatibility problems with newer automake versions… and it’s now biting our collective asses.
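For reference, that pinning happens through the WANT_AUTOMAKE variable, set before inheriting the eclass; a sketch of the now-problematic pattern (the variable and eclass are the real ones, the version pick is just an example):

WANT_AUTOMAKE=1.7   # anything older than 1.9 now trips over libtool 2.4’s aclocal errors
inherit autotools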

The solution, obviously, is to make sure that ebuilds are tested against a more modern version of automake; that is usually quite quick for ebuilds already using 1.8 or later, while older versions might be more of a problem. It turns out I suggested starting work on this back in June 2009 — magic? ESP? Or simply knowing my tools by now?

I’m now going to work through the tree to take care of these issues, since they are quite important, but also relatively easy to fix. Plus, they require autotools expertise, and I’m positive that at least half the involved maintainers would be asking me to fix them anyway.

Now, the question is: why didn’t my tinderbox catch this beforehand? The answer is easy: *it is not running*… between the problems, the gold f-up, and few to no people interested in helping me, I haven’t run the tinderbox for the past two weeks.

And yet this is the exact kind of breakage it is supposed to find. So please consider how much work I’ve been pouring in to protect us all from this stuff happening, and don’t blame me if I actually come asking for help from time to time to pay for the hardware. There are requests for specific CPU and memory upgrades for the workstation on the page I listed above, if you feel like contributing directly to running it; I should soon add a request for two Caviar Black drives for where most builds happen (the two Samsung SpinPoint drives I have are starting to report relocated sectors; they are at the end of their life here), or two RE4 drives for the operating system and work data (which is now filled to its limits).

Edit: since the system locked up during a build, likely because of the failing HDDs, I had to order the new ones… at a moment when I’m €1550 short to pay taxes, I had to spend another €190 on new disks. Can you tell now why I wish the rest of the developers would at least help me reduce the overhead of the tinderbox, by doing some due diligence on their own packages, rather than complain when I ask for help to pay the tinderbox’s bills and hardware? No, having the Foundation reimburse me for the expenses is not going to help me any further, since I’d have to pay extra taxes on that.

Anyway, for now I think I’ll be limiting the tinderbox to two cores out of the eight I have available, and keep running it in the background; hopefully it will not cause the rest of my system (which is needed for my “daily” job; I say “daily” because I actually spend evenings and nights working on it) to hang or perform too badly.
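(For the curious, one way of doing the limiting; just a sketch, since cpusets or similar mechanisms would work as well, and the chroot path here is only illustrative:

taskset -c 0,1 chroot /media/chroots/tinderbox /bin/bash

taskset pins the whole session, and thus every build spawned from it, to the first two cores.)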

Anyway, 6am, I haven’t slept, and I still have a number of things to do…

Don’t try autoconf 2.66 at home just yet!

I have to thank Arfrever for making me notice this, with the Ruby 1.9 bug he reported.

The GNU project released autoconf 2.66 two days ago. Very few notable changes were announced for it, so I didn’t go out of my way to test it beforehand. My bad! Indeed there is one big nasty change in it, for which I’d tell all of you to put off the update until I say otherwise. Hopefully it won’t get unmasked in Gentoo for a while either.

There are two main problems with this release; the first is due to a stricter implementation of a macro that ensures the parameters given to it do not vary across executions:

**** The macro AS_LITERAL_IF is slightly more conservative; text containing shell quotes are no longer treated as literals. Furthermore, a new macro, AS_LITERAL_WORD_IF, adds an additional level of checking that no whitespace occurs in literals.

well, whatever the idea behind this was, it seems to have broken the AC_CHECK_SIZEOF macro: if you pass it [void*] as a parameter, it’ll report that it is not a literal (while it is), causing the following error:

flame@yamato test % cat configure.ac
AC_INIT([foo], [0])

AC_CHECK_SIZEOF([void*])

AC_OUTPUT

flame@yamato test % autoconf
configure.ac:3: error: AC_CHECK_SIZEOF: requires literal arguments
../../lib/autoconf/types.m4:765: AC_CHECK_SIZEOF is expanded from...
configure.ac:3: the top level
autom4te-2.66: /usr/bin/m4 failed with exit status: 1

This would be bad enough. But the nastier surprise came when running autoreconf over the feng sources, whose build system I wrote myself and which, if I may say so, is very well engineered:

flame@yamato feng % autoreconf -fis
configure:6275: error: possibly undefined macro: AS_MESSAGE_LOG_FDdnl
      If this token and others are legitimate, please use m4_pattern_allow.
      See the Autoconf documentation.
autoreconf-2.66: /usr/bin/autoconf-2.66 failed with exit status: 1

The problem here is almost obvious, and it’s related to the dnl entry at the end of the macro name; the dnl keyword is used as an (advanced) comment delimiter in autoconf scripts, meaning “Discard to Next Line”, and it is often used to spread commands over multiple lines while keeping them logically together, much like line continuations in many languages. A quick check of the generated configure file turns up this:

        as_fn_error $? "Package requirements (glib-2.0 >= 2.16 gthread-2.0) were not met:

$GLIB_PKG_ERRORS

Consider adjusting the PKG_CONFIG_PATH environment variable if you
installed software in a non-standard prefix.

Alternatively, you may set the environment variables GLIB_CFLAGS
and GLIB_LIBS to avoid the need to call pkg-config.
See the pkg-config man page for more details." "$LINENO" AS_MESSAGE_LOG_FDdnl
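As an aside, the token-pasting mechanics are easy to reproduce with plain m4; this is a contrived example, nothing to do with the actual pkg.m4 code:

flame@yamato test % cat example.m4
define(`LOG_FD', `5')dnl everything from dnl to the end of the line is discarded
LOG_FD
LOG_FDdnl
flame@yamato test % m4 example.m4
5
LOG_FDdnl

The last line comes out untouched: LOG_FDdnl is a single token, so neither LOG_FD nor dnl gets recognised; exactly what happens to AS_MESSAGE_LOG_FDdnl above.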

You can easily see that the problem here is with the pkg-config macros (pkg.m4). Funnily enough, there is no change related to error reporting listed in the autoconf NEWS file, so I wasn’t expecting this. The root cause lies further down the path of the pkg-config macros, but it’s not important to fully debug it right now; it’s actually quite easy to fix in pkg-config itself. But here’s the catch.

Since the pkg.m4 macro file is way too often bundled with the upstream packaging, and its presence overrides the copy from the system, even fixing pkg-config will not fix all the software that carries outdated copies of the macro file.

This is almost the same problem as with libtool 1 vs libtool 2 macro files, with the difference that this one is going to be much, much more common. If you’re a package maintainer, you can do something already, before this even hits the users: remove the pkg.m4 file during the src_prepare() phase. You’re already depending on pkg-config in the ebuild for the package to build, and since we don’t split the macro file from the command itself, you can simply rely on its presence on the system.
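Something along these lines would do (a sketch: the path of the bundled copy varies between packages, and this assumes the ebuild already re-runs autotools):

src_prepare() {
    # drop the bundled, possibly outdated pkg.m4 so that aclocal
    # picks up the system copy installed by dev-util/pkgconfig
    rm -f m4/pkg.m4 || die
    eautoreconf
}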

In the meantime, I’m not sure whether I want to start testing with it just yet, or whether we should wait for 2.67…

On the time taken to stable stuff

In my previous post about the possibility of me leaving, and why, a few people commented on the “staleness” of the stable (and, to some extent, unstable) trees in Gentoo. Now, I won’t argue that there are no problems; I actually said so myself a few months ago. But I’d like to clarify a few points related to the process of marking packages as stable.

First of all, we have to distinguish between two different types of staleness: single-package staleness versus systematic staleness; the latter case is what we had regarding Perl, and it’s much more complex than most users think. It wasn’t just a matter of making Perl itself work, it also involved making sure that the packages using Perl worked properly. This is also easier said than done: while perl-cleaner can take care of re-installing the packages that link to or extend Perl, it does not take care of Perl-written scripts; even looking at the reverse dependencies doesn’t suffice, as we’d still be omitting system-set dependencies (and guess what? Perl is in the system set!).
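(For reference, the rebuild step itself is the easy part; a sketch, as the exact invocation has changed between perl-cleaner versions:

perl-cleaner --all

which reinstalls what links to or extends Perl but, as said, leaves Perl-written scripts uncovered.)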

And even if we were able to track down all the packages using Perl, directly or indirectly, in the tree, and I could get all of them passed through the tinderbox (I did), we wouldn’t be too sure about its absolute solidity, because half those scripts don’t have testsuites (see also this post by Ryan on the subject of tests). On the whole, though, we caught a few important failures, and we could assess that the tree was mostly ready, and so, finally, Perl 5.10 entered the tree, yay! I don’t think the road to stable is going to take much more time, by the way; I’d be tempted to try it soonish for xine’s bugzilla (at least mirrored on this server first, of course).

A similar issue happens with Ruby 1.9. We’re taking our time to get it unleashed even in unstable; right now it’s tightly masked. Why? We’ve been struggling with lots of trouble, which then came down to the Ruby-NG eclasses; those are one very important piece of the puzzle for us, as they allow us to properly support multiple Ruby implementations without breaking the dependency tree or the general solidity of the system. They’re not perfect yet, and need polishing, but for instance a couple of days ago I committed changes to the eclasses that ensure that scripts installed for a single implementation won’t break when a different one is selected (this partly covers the problem described here, but not entirely; to cover that properly we’re going to take a bit more time, I’m afraid, as we’re going to revisit the whole idea of a selected Ruby implementation). And if you think that Ruby 1.9 is ready for prime time right now, there’s a reality check waiting for you. Yes, I rant about Ruby, but I think I also take a constructive stance, as I carry patches (and a lot of those are merged upstream and even released).

You might have noticed my use of the first-person-plural pronoun (“we”). I’m most likely going to take a break from active Gentoo work, and try to reduce it for a while to the areas I’m interested in, trying not to feel too pressured about it. For instance, I unsubscribed from the Gemcutter feed that tells me when Ruby packages are released, so I don’t feel the urge to bump them. On the other hand, I don’t think it’s feasible for me to leave Gentoo entirely, at least for what concerns Ruby and the other things I work on. Worst comes to worst, I’ll get a frontend machine and use Fedora or something on that, with Gentoo as the backend server. Obviously, if nothing can be fixed regarding the issue I brought up, I’m not going to stick around for long, but I hope I got enough people to think about the problem that it can be solved — in Utopia at least.

So I have just shown you two systematic staleness problems in Gentoo (one that is partly solved, one that is actually caused by an external lack of stability that we are trying to resolve at the roots). What about single-package staleness? Well, there are many examples of that, and the problems range over very wide areas. People forget to ask for stuff to be marked stable; developers might not think anybody needs that stuff stable anyway; packages might require specific hardware to be marked stable, but no developer with such hardware can do that (think about the EntropyKey software that I maintain(ed) in the tree: you cannot say whether it works or not without having a hardware key yourself; I don’t know of any other Gentoo developer having one, so what would happen if I left?); or they might have complicated testing procedures that are difficult to reproduce.

On these matters, the number of people working on the stabling process is not the limiting factor; throwing more people at the problem is not going to solve it any sooner (by the way, on this last phrase of mine I’ll most likely be posting something in the next few days, again to make some points on why I reached the bad point of snapping). Not unless you throw the right people at the right problem. The problem here is not really the stabling part, which might actually take very little time; the problem is that we have to document things, such as the testing procedures. Sometimes we have thorough testsuites; most of the time we don’t (in the case of Ruby, even when we have them, they can be… tricky). I tried something some time ago, but it didn’t turn out the way I was hoping; in the end I actually stopped working on finishing that one, because my half-easter-egg, half-free-culture community collaboration (alliterations…) crashed down in flames, as the source I wanted to use, Jamendo, couldn’t get its own facts straight.

I don’t want this post to go too deeply into the technical problems of testing, as those are better discussed separately, and most people interested in the topic I’m writing about might not be interested in the technical details. Let’s just say that I have seen a huge improvement in tests in the past few months. And further kudos to two teams who I know are documenting post-build testing procedures to show arch teams what to look at when testing their packages: the Java and Emacs teams.

Now comes what might disappoint a few users: those users who think and assert that the solution to staleness is the reckless commit of half-broken ebuilds, like Samba. I’m going to argue that the opposite is true (and I’m again borrowing a line out of NewsQuiz… I might have been listening too much to that program; my actual post style yesterday was probably deeply influenced by the newly-restarted Real Time with Bill Maher instead, but I digress).

First of all, we have to agree on one point: staying a long way behind upstream sucks. It sucks for users, and it sucks for upstream as well. As Joost, from Sabayon, said to me earlier today (I’m following The Other Diego’s philosophy that today starts when I wake up, and ends when I fall asleep), upstream will be bothered if users won’t test their recent versions at all, and would rather stick to old, known-broken, already-fixed versions. Having been (and still being) on both sides of the fence, upstream and downstream, I can tell you that the best feeling is when distributions actually use your latest, greatest code. This is, though, not always simple, or feasible at all, because of upstream’s own actions, but again this is a topic for a different day.

Back to our reckless commits we go. Let’s take the example of Samba, since that’s what a commenter named, and it is, I think, the case that best shows what the trouble with “Developer B” consists of. One of his justifications is that the current stable Samba is vulnerable; I’m afraid to tell you, guys, that it might well be true. I use a conditional here because I didn’t have the time, nor the will, to track down whether it’s actually true or just speculation — it is, though, true that our Security team, also understaffed, hasn’t had time to deal with all the lower-level security issues for a few weeks; I’m pretty sure they’ll catch up soon. Now, if the problem were security, we should be striving to get the new ebuilds stabled soon, shouldn’t we? And to do that, you should be working actively to reduce the number of bugs in those ebuilds.

Neither seems to be happening; the stable tracking bug actually reports that x86 is waiting, and last I checked with them, they were tempted to still go with 3.3. This is quite understandable, as 3.4 is now fully split, but unpolished and without any plan on how to migrate from monolithic to split. At the time X.Org went through its splitting up, Donnie planned months ahead; granted, that consisted of over a hundred packages, maybe even a few hundred, and this is a much smaller scale, but the very fact that “Developer B”, when asked about a migration plan, replied that it was too boring to do should set the mood straight on the issue. Not that it’s going to matter anyway: 3.5, and maybe even 3.4, is going to be monolithic again. Yes, you’re going to get blockers, removals and so on again on unstable. Oh joy!

And these bugs are assigned to a team that does not include our mysterious “Developer B”, as he didn’t add himself to the alias, as I said before. Is he CCed on any of those? Nope; okay, this might be QA’s (my!) fault, as I should have noticed earlier that he wasn’t on the alias, and either reported that to devrel or added him forcefully. Now, as most of these issues are important, you’d expect him to be working on finishing this task, rather than going off to, oh, bump another subsystem that he doesn’t even use. But he won’t care; why? Because he has admitted many times that he does not even use Samba! Nor Mono! Again, try to wrap your mind around this concept: how can he be improving the situation for their users, without being one himself? Not feeling the pain, nor sharing the gain?

Any kind of reckless bump or non-trivial change in a subsystem will require a long time to deal with, and the more you stray from the upstream-sanctioned behaviour, the more you’re going to suffer when it’s time to follow their lead. When you have to make big changes, you compromise. One of these compromises in Ruby land has been that of trying to get the latest non-ported ebuild stabled, if an old stable version was present, before moving fully toward Ruby-NG. It’s going to have some growing pains, and yes, you have to run fully unstable (for the Ruby ebuilds, not for the whole tree, of course!) for it to work for now, but we are usually quite conservative about making sure that it works as intended.

When you make big changes, and you don’t plan, nor compromise, on how to deal with them in the long run, you’re just going to suffer, or you might just end up with an even more stale stable tree than you started with. On the other hand, it might be much, much worse if the stable tree gets broken badly, because packages that haven’t been planned ahead were moved there to remove the staleness. And by the way, this does not mean that I’m not saddened by the fact that, to use Gentoo properly in things like vserver, xen or lxc guests, you’re basically forced to use some unstable packages, as OpenRC is the only one that works there, and Baselayout 1 is definitely rotting in the tree. Unfortunately I’m also quite sure that there are packages that are not fixed for that yet.

Anyway, to cut this post short so I can also get some sleep and do more useful work tomorrow, I’d like to point out the concept of marginal cost, which I was introduced to by a splendid book by Richard Dawkins on evolution. The marginal cost of stabling something depends on many factors; one of them is the amount of changes since the previous revision (which is why major version bumps, or total changes in a package’s packaging, make it harder to stable), another is regressions from the previous version (dropping patches that no longer apply but are still not fixed upstream increases the marginal cost tenfold). Our perfect setup is to always have a very low marginal cost for stabling, and that means not changing the ebuilds in any drastic way unless strictly needed.

But if we take the example of the recurrent laryngeal nerve that Dawkins uses in his book as a proof of evolution, we can easily see that we’re not in a biological evolution scenario: we can make drastic changes when needed to solve a situation that is blatantly out of place. In such cases, though, we’re going to increase our marginal cost for stabling… and have to accept a longer stable delay. And that brings us to various possible ways to tackle that, which are too technical for most of the people reading this in the first place, and which I’ll discuss in the coming days instead.

LXC’s unpolished code

So I finally added lxc 0.6.5 to the tree for all the Gentoo users who care to try it; on the bright side, the lxc.mount option seems to finally work fine, and I also found the problem I complained about a few days ago. It is this latter problem that made me feel like writing this half-rant post.

Half-rant because I can see they are going to great lengths to improve it from release to release, but still a rant because I think it should have been less chaotic to begin with. On the other hand, I could probably be more constructive if I went on to provide patches… I’ll probably do that in the future if I free up some time, but if you follow my blog you know I’m quite swamped already between different things.

Starting from the 0.6.4 release, they dropped the “initialisation” task, and just let you run lxc-start with the config file. It was definitely a nice way to do it, as the init command wasn’t doing anything heavy that shouldn’t be done on first startup anyway. It was, though, a bit of a problem for scripts that used the old method, as the simple lxc-start -n myguest (the init step wouldn’t be needed after a restart of the host) would mess up your system badly: it would spawn a new container using / as the new root… overriding your own system bootup. Thankfully, Andrian quickly added a check that refuses to proceed if a configuration file is not given. This does not save you from being able to explicitly mess your system up by using / as your new root, but at least it avoids possible mistakes when using the old-style calls.
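For reference, the new-style invocation boils down to this (the guest name and path are just illustrative):

lxc-start -n myguest -f /etc/lxc/myguest.conf

while a bare lxc-start -n myguest, without a configuration file, is exactly what now gets refused.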

So what about the 0.6.5 problem? Well, the problem came to be because 0.6.5 actually implements a nice feature (contributed by a non-core developer, it seems): root pivoting. The idea is to drop access to the old root, so that the guest cannot in any way access the host’s filesystem unless explicitly given access to it. It’s a very good idea, but there are two problems with it: it doesn’t really do it systematically, but rather with a “try and hope” approach, and it failed under certain conditions, saying that the original root was still busy (note that, since this happens within the cgroup’s mount namespace, it doesn’t matter to the rest of the system).

In the end, last night, I was able to identify the problem: I had this line in the fstab file used by lxc itself:

none /tmp tmpfs size=200m 0 0

What’s wrong with it? The mountpoint. The fstab entries (and lxc.mount commands) are used without prior validation or handling, so this is not mounting the /tmp for the guest, but the /tmp for the host, within the guest’s mount namespace. The result is that /tmp gets mounted twice (once inherited from the base mount namespace, once within the guest’s namespace), but it’s only unmounted once (as the unmount list keeps each mount point exactly once). This is quite an obvious error on my part, I should have used /media/chroots/tinderbox/tmp as the mountpoint, but LXC being unable to catch the mistaken mountpoint (or at least warn about it) is a definite problem.
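For the record, the corrected line, with the mountpoint spelled out from the host’s point of view:

none /media/chroots/tinderbox/tmp tmpfs size=200m 0 0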

Another thing that makes me feel like LXC really needs some polishing is that you cannot just run the commands from the source directory: the build system uses autoconf and automake, but the authors explicitly backed away from libtool as it’s “Linux-only” (which really doesn’t say much about the usefulness of libtool in this case). Given that I’m not even sure whether the liblxc library is supposed to be ABI stable or not (they have never bumped the soname, and that is suspicious), it might really be better if they used libtool and learnt how to handle it. Also, it uses recursive Makefiles, badly; it would probably take just a second to build if I remade the build system as a standard non-recursive autotools package, like udev’s.

Oh well, let’s hope future releases keep improving the polish, bit by bit!

Tip of the day: if Samba 3.4 fails to work…

I fought with this today… if you are running Gentoo ~arch, you probably noticed that the current Samba support is “definitely suboptimal” (to use the words of the current maintainer), and indeed it failed to work for me once again (third time: the first was a broken init script; the second was missing USE deps, so I was quite upset). If you find yourself unable to log in to Samba, you need to consider two possible problems.

First problem: the Samba password system seems to have either changed or moved, so you have to re-apply the password to your user (and re-add the user as well!). To do so you have to use the smbpasswd command. Unfortunately, this will fail to work when the system has IPv6 configured. And here comes the next problem.

Samba upstream likely has trouble dealing with IPv6; indeed, it comes down to the smbpasswd command trying to connect to 127.0.0.1 (IPv4), while the smbd daemon only listens on :: (IPv6), so it’ll fail to connect and won’t let you set your password. To fix this, you have to change the /etc/samba/smb.conf file, and make sure that the old IPv4 addresses are explicitly listened on. If you have static IPs this is pretty simple; if you don’t, the situation is a bit more complex, and you’ll be forced to restart Samba each time the network interface changes IP, I’m afraid (I haven’t been able to test that yet).

[global]
interfaces = 127.0.0.1 wlan0 br0
bind interfaces only = yes

As you can see, we’re asking for some explicit interfaces (and the localhost address) to be used for listening; since Samba uses the IPv4 localhost address for the admin utilities, you list it explicitly to make sure it is listened on. For some reason I cannot understand, when doing this explicitly, Samba knows to open separate sockets for both IPv4 and IPv6; otherwise it’ll open one for IPv6 only.
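Once smbd is restarted with that configuration, you can check that both stacks are listened on, and then re-add your user; a sketch, with the username as a placeholder:

netstat -ltn | grep -E ':(139|445)'   # expect both 0.0.0.0 (IPv4) and :: (IPv6) listeners
smbpasswd -a yourusername             # re-add the user and set its password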

I’m not even going to fight with upstream about this; I’m tired, and I’m tracking down a bug in Gtk# (a nasty one that crashes the app when using custom cell renderers), and I already fixed iSCSI Target for kernel 2.6.32 (as well as version-bumped it).

The hardware curse

[Photo: DSCN2374]

Those reading my blog or following me on identi.ca might remember I had some problems with my APC SmartUPS, with a used-up battery that didn’t work properly. After replacement, I also had some troubles (which, to be honest, I’m not yet sure are worked out now); in the end I settled for getting a new one, to replace the old one or work side-by-side with it if it works out properly. This is why in the photo above you can see two UPSes and one box (Yamato), even though there are actually three here (just one other turned on right now, though).

I’m not sure what caused it, but since a little before I started activity as a self-employed developer and consultant, I’ve had huge hardware failures: two laptops in the very same week (my mother’s iBook and my MacBook Pro), the drum of the laser printer, the external Iomega HD box (which was already a replacement for the same model failing last November), and lastly the (software) failure of the PCI soundcard.

Around the same time I also ended up needing replacements for hardware that was now sub-optimal for my use. The Fast Ethernet switch was replaced by a Gigabit one, because Merrimac (my iMac) is now always turned on and makes heavy use of networking (especially with iSCSI). The hard disks, themselves replaced just last March and helped out by one of the two disks in the external box, started being too small (thus why I got an extra one, a FreeAgent Xtreme running on eSATA, for one more terabyte of storage). My cordless phone required another handset so that my mom’s calls wouldn’t get to bother me; my cellphone (just recently) is being phased out for a Nokia E75 so I could get a Business account (it was a Nokia E71 before); I got an HP all-in-one so that I had an ADF scanner to go in pair with the negative film scanner for archival purposes; and some more smaller things to go with all that. I should also update, again, the router: after three years of good service, my 3Com is starting to feel its age, and starting to hit limitations too, including the very artificial limit of 32 devices in the WLAN MAC-address list (and the fact that it doesn’t support IPv6).

Then there have been costs much more tied to work (not that the stuff I mentioned above isn’t part of my job anyway), like proprietary software licenses (Parallels Desktop, Visual Studio 2008, and soon Microsoft Office, Avira and Mac OS X Snow Leopard) and the smartcard reader. And of course the rent for the new vserver (vanguard) and phone bills to pay. Given the amount of work my boxes do during the day, I’ll soon switch the power company contract over to me rather than my family and pay for that too, unless I decide to move to a real office (possibly one I can stay at at any hour), and just keep one terminal at home or something like that (but then, what would I keep?). Oh, and obviously there are a few more things like business cards and similar.

Now, all these are work expenses, so they are important up to a point; I actually get paid well enough to cover them at the moment, even though I now have a quite funky wishlist which includes both leisure-related and work-related items (you’re free to help me with both, j/k). The problem is that I would have been much better off without all this mess. Especially considering that, as I have said before, I really wish I could get out of home soon.

But anyway, this is still work time!

A tinderbox is not enough — Reprise

I have already declared why a tinderbox is not enough, but I think I should reprise this topic and write again why Patrick, Ryan, and I can’t find all the issues, even if we all put our efforts together.

The first problem is the sheer amount of combinations of packages: the different USE flags enabled, the different arches, the different orders of merge, the different packages installed (or the way they are installed); all these differences combine to produce far too high a number of combinations to test in a lifetime. Of course we can probably find the most outstanding bugs quite quickly at a first pass with default USE flags; thanks to EAPI 2 and USE dependencies we can also have a decently clean track record of what needs to be enabled (see the example below). On the other hand, it would be interesting to try disabling all the optional support (except what is strictly needed) and see how the ebuilds behave.
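For those not familiar with the syntax, an EAPI 2 USE dependency looks like this (package and flag picked as a made-up example):

DEPEND="dev-libs/libfoo[ssl]"

which tells Portage that libfoo must be installed with the ssl flag enabled before this package can be merged.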

Then there is the problem of architectures: while in the past the architecture with the most keywords was x86, I’m not really sure this is still true nowadays, with the rise of amd64-based systems; I know I don’t usually keyword for ~x86 the stuff I add to Portage, since I don’t run x86 anywhere. And while such packages probably can be compiled and used on x86, there are some that can’t, and there are issues that don’t apply to x86 at all, but only to other systems.

There are problems with packages that provide kernel modules, because they tend to break badly between one release and the next (I myself only help maintain one such package nowadays, iscsitarget, and I’m usually good enough to get it to work properly a day or two after a new kernel is released; which means this weekend I’ll probably be doing another patch). I also had to blacklist a few packages that are only available for 2.4 kernels (why do we have a kernel 2.4 profile but none for 2.6? and why don’t we mask them at a profile level? no idea).

What about alternative packages? Collisions between packages create a bit of a problem when they are solved by blockers instead of allowing side-by-side installs (and sometimes you don’t have any other choice than having one block the other; see the two PAM providers). And there are still lots of packages that fail to merge because of non-blocked collisions, sometimes simply due to too-generic file names (and in my case they then get re-added to the tinderbox build queue, because they don’t turn out to be installed!).

Then there is the problem of overlays: my tinderbox can only check packages that are in the main tree (and not masked); all the packages in overlays get ignored because they are, obviously, not added to the tinderbox. And the sheer number of overlays makes it nearly impossible to deal with all of them. Let’s not even start thinking about the combinations created by adding different overlays, and the order in which they are added (which is one extra argument against splitting our tree into multiple overlays!).

Are you not convinced yet? Well, I really wish I had more convincing numbers, but I don’t; I only know that the amount of work my tinderbox effort involves (and mine is likely the least sophisticated such effort) is likely just a minuscule part of the effort needed to have real quality control in Gentoo. And even though I can file, test, apply and close bugs, I cannot solve all the issues, because there are way too many variables in play.

Anyway, I’m taking a break now because my head is tremendously tired, and I’ve been filing bugs, working and scanning documents all day long. I could use some play time now instead…

About the new hospital

When this entry is posted, I’ll most likely be hospitalised again; this time, at least, it will be a different hospital in a different city, in a different healthcare unit, with different doctors.

When I was hospitalised last year, I was at what was then the main public hospital in Mestre, the Umberto I (named after one of the Italian kings from our nation’s previous life; a bit of a controversial king, too). During the time I was hospitalised, maintenance work on the Umberto I was quite reduced, and you could easily tell that by looking around. The reason for this was that a new hospital was to open in a matter of months, near Zelarino.

Indeed, the inauguration of the new hospital, which incidentally is just a couple of minutes by car from my house, happened just a couple of weeks before I was released, in September 2007.

During my whole hospitalisation, whenever something seemed substandard, or simply broken, the standard answer was “But you’d have to see the new hospital!”. The ICU is still using CRTs? New hospital. The number of computers, CRT monitors and laser printers turned on 24/7 amounts to a good waste of energy? New hospital. The bells in the rooms don’t work correctly? New hospital. The six-bed room doesn’t have enough space to eat in? New hospital. Air conditioning doesn’t work, either too high or too low? New hospital. And so on.

Word about the features of the new hospital also started to spread: shops for the people coming to visit, an internal garden for patients to relax in even during winter, rooms with a maximum of two beds and in-room services, an LCD TV in each room, and an entirely new Gastroenterology unit, so that it wouldn’t have to be shared with General Surgery.

But at the same time, gossip about the absolute failures of the new hospital arrived: very little space to move beds around in the rooms, CAT scan rooms designed without taking into consideration the movement of the little bed on the machine, the parking lot having to be paid for not only by patients and visitors but also by personnel, who also had, from what I was told at the time, to pay for the lockers in the locker room; and stuff like that. The opening, originally expected in January 2008, was pushed down the line.

In March of this year, I think, I had to go to the ER for my migraine, and I asked about the new hospital, since it hadn’t opened yet. I was told that with the rain that had fallen a few days before, the ER in the new hospital had flooded; which of course is not a good thing, and had to be tended to before opening. The doctors didn’t expect it to open before June.

Indeed, this summer the new hospital opened, and more problems started to show up. People having to go to the various outpatient clinics in the new hospital lost themselves in the randomly-numbered maze of corridors, and as soon as the sun started to shine on the hospital, another huge failure showed itself: the glass panels used to cover the “sail” of the new construction (which is designed to impress) were mounted inside out; instead of keeping the heat out while letting the light in, they created a huge greenhouse.

When I was hospitalised at the end of July, I went to the new hospital, as the old one had closed down entirely, the ER being the last part of it to go. The structure of the new hospital is indeed impressive, and the equipment is all new. I’m not sure why they didn’t just use the newly-bought equipment in the old hospital too, but that’s beside the point. The ER is totally new; it looks like the one from Scrubs or House. Luckily the doctor in the ER (a new one, too) was also good. I was impressed to find a doctor speaking both English and French fluently; that’s news, as the last time I visited the old ER, they tried to speak English so badly that I would have been tempted to stand up and translate for them, had I not been in pain.

But the outstanding impression didn’t last long. While the new “observation” unit is really cool, both in equipment and personnel, the Gastroenterology unit was still merged with General Surgery (with all the problems coming from that), because a lot of staff members are on vacation, and the corridors are too long for the reduced number of nurses to keep up with the three stations that the two units have. This also means that the nurses are always running, they take a long time to answer calls, and they just cannot give patients enough attention, like the few words they could spare before, so that you didn’t feel totally isolated (I didn’t suffer from this problem, as I had my cellphone and an Internet connection, but I can understand the feeling). Also, something probably happened between the nurses, because some who were a bit cold and detached last year were now warm and caring, and one who last year was very positive and funny was now bitchy.

But it’s not just a staff problem. The new hospital has fewer beds than the old one, so patients are released as soon as they are able to stand on their own. Which is nice if you don’t want to stay, but it creates a bit of a problem if they go home and then have to come back to the hospital. It also means that there is a constant need for new beds for people coming from the ER.

And the rooms haven’t really improved that much. Yes, there are no more six-bed rooms, which were quite a mess to stay in when all six beds were filled, and a curtain was added so that the two patients can have a little more privacy, but there are still structural problems. First of all, the TVs are still missing, although the screws and the aerial connections for them are there, on the wall and ceiling, as well as headphone connections on the nurse-bell remote. But it gets worse. The windows were replaced with a huge glass panel; a fixed glass panel: you cannot just open the window to get a bad smell out, which is far from uncommon in a Gastroenterology unit, especially when you’re put in the same room as an old man who just had surgery and cannot walk to the bathroom. Also, instead of good old manual blinds, they wanted to make something better and put in automatic electric blinds, which supposedly should close automatically if there is too much sun; I’ll get back to these later. The air conditioning, which was a problem in the old hospital, was even worse here, as it was mostly off; if you add the increased heat from the glass panels being mounted wrong, you can guess it wasn’t that good. And again, there are “night lights” to move around the rooms with, which are LED-based; but not white or “almost white” LEDs as you might find in the nearest Chinese dollar store; not even the “calming” blue LEDs that I used to use myself; green LEDs, bright green LEDs. And the staff hasn’t learnt to use them yet, so sometimes they are not turned on during the night, and some other times they are left on during the day — yes, of course they are NOT set on a timer.

Speaking of electrical devices: in the old hospital, every Saturday morning, there was a blackout. Either some weekly test, or something that was turned on, blew a fuse; but it was there for as long as I was in the hospital, which was a long enough time; even when I returned to the ER on a Saturday morning, months later, the blackout was there.

In the new hospital this changed. Instead of limiting themselves to Saturday mornings, blackouts happened every other night I was there; and not just once, but a few times a night. One night, with my sister, we counted five short blackouts in about an hour, between 9pm and 10pm. It wouldn’t have been that bad, were it not for two things. First, when the power goes out, the emergency lights turn on; these are fluorescent lights, in front of a tinfoil mirror system to increase the brightness, and they shine basically right into the eyes of the patient on the door side of the room. Guess which side of the room I was on? Second, the automated blinds reset themselves, and in resetting they make a huge amount of noise (as they open and close) and decide to stay open afterwards, even if they started out closed. So the blinds were almost never closed during the night; if it weren’t for the curtain, I would have been woken up by the light from outside.

And this is still not all. The only table in the room, where patients could possibly do something that requires a stable surface, like eating, is along the “window”; but as the beds are placed parallel to the glass for obvious reasons, it falls entirely on the side of the room where just one patient is. The bathroom, on the other hand, is on the side of the door, which means that, day and night, you’ll have to watch the other patient going in and out of the bathroom.

The bathroom itself isn’t bad; it’s quite spacious, and the facilities are good. Unfortunately, having one inside the room also means that you cannot pick the less dirty stall, especially when you have a dirty old man as a roommate.

Outside of the rooms, there have been other problems. The hospital is huge, and this means you have to walk quite a bit to reach the right place. The last morning I spent there, I had to get an ultrasound. They asked me if I wanted to go in a wheelchair or if I could walk, and I gladly decided to walk; I hadn’t done that in a week, and I wanted to. But I didn’t expect to have to walk through almost half the length of the hospital to find the right clinic; we arrived fifteen minutes later than expected. And that was within the same building (there are multiple buildings in the hospital).

What about the shops they opened? Well, there’s a restaurant, which has its usefulness considering the time visitors might spend in the hospital tending to a sick relative; there’s a cellular phone shop, which wouldn’t be bad at all if it weren’t that it sells only Vodafone, and Vodafone does not cover all the rooms of the hospital (I can understand that if you have to leave a sick relative at the hospital and he didn’t have a cellphone, you’d gladly buy one, even overpriced, without leaving the hospital; or you might have a rechargeable contract and need to recharge); and then there’s a para-pharmacy. If you’re not used to the term, it’s a pharmacy lookalike that can only sell non-prescription medications; in this case it was a herbalist’s shop too, even though the doctors don’t want to accept the existence of that at all. Now, I can understand a herbalist’s shop in the hospital, myself; but why a para-pharmacy and not a full pharmacy? I expect more than half the people leave the hospital with prescriptions for some meds, and it would be quite nice to be able to get them before going out.

I enquired about WiFi access, but it was not an option; someone said that it would be a problem for the machines around the hospital. Someone even suggested I shouldn’t be using my cellphone at all in the hospital (which makes the presence of a cellular phone store quite stupid, then). Just to be clear: more than half of the staff, nurses and doctors, go around the hospital with their cellphones turned on, including in the ICU and the “red area” of the ER; and as far as WiFi is concerned, the old DECT-based cordless phones used by the doctors in the old hospital to be reachable have now been replaced with shiny new wireless VoIP phones (I didn’t get to see the manufacturer); the coverage is provided by multiple Cisco access points all over the place.

All in all, there have been quite a few flaws with the new hospital too. And it became even worse because I was assigned a roommate under emergency rather than with a little consideration. A dirty (literally) old man, who spent the nights cursing (aloud), keeping me from sleeping enough; who had relatives visiting who yelled loudly; who was unable to use a handle (every time he went to the bathroom during the night he slammed the door, multiple times, rather than using the handle; being new, the door wouldn’t close easily just by being slammed); and who was totally inconsiderate of the fact of being in a hospital with a roommate. The last full day I spent there was hell: his wife brought him a radio, with no headphones; he turned it on during visiting time, so I took off with my friend to the tables outside the unit (tremendously hot, but at least I wasn’t on the bed), but then he started it up again in mid-afternoon and kept it on till 9pm! And just one nurse came in to ask him to lower it at least a bit, at 6pm — for a moment she thought it was me, but then I showed her I had my earphones on. During the night he decided he had to change his clothes at 3am! And wanted to turn on the light for that! He also had to call the nurses twice to figure out how to turn it on, despite them having explained it the night before, and it being far from difficult anyway; at least the first nurse plainly said that it was not something to do at that hour; the second was quite worse.

Hospitals are never fun, but they can be even worse when there are such issues…

I’m running Gnome

As it turns out, I’m starting to dislike the way the KDE project is proceeding, and I don’t refer to the Gentoo KDE project, but to the KDE project as a whole.

I dislike the way KDE 4 is being developed, with a focus on eye candy rather than on features. This is easily shown by the Oxygen style; not only does it take up an amount of screen real estate for widgets that reminds me of Keramik (and if you remember, one thing that made a huge number of users happy was the switch from Keramik to Plastik as the default style in KDE 3.3), but it’s also tremendously slow. And I’m sure of this, it’s not just an impression: as soon as I switch Qt to use Oxygen, it takes five seconds for Quassel to draw the list of buffers; once I use QtCurve, it takes just one second. I don’t know if this is because Enterprise is using XAA and not EXA, but it certainly doesn’t look like something that the default theme should do.

And no, I shouldn’t be expected to use a computer less than a year old, with a hyper-powered gaming video card, just to be able to use KDE.

But this is just one of the issues I have with KDE recently. There are some policies I really, really dislike in KDE. The first is one I have already mentioned quite often: the move to CMake. The only “good” reason to move to CMake is to be able to build under Windows using Microsoft’s Visual C++ compiler; yet instead of just saying “we needed CMake because it’s necessary to build for Windows”, I see so many devs saying “CMake is just better than everything else out there”. Bullshit.

The other policy that I dislike regards the way KDE is developed and released as a single, huge, monolithic thing. One of the things that made KDE difficult to package in Gentoo (and other source-based distributions) was the fact that by default the source has to be built as those huge amorphous packages. And if the autotools-based build system of KDE sucked so much, it was also because of that.

But even if we leave alone the way the releases are made, it’s just not possible for everything to fit into a single release cycle. There are projects that are more mature and projects that are less so. Forcing all of them into a single release cycle makes it difficult to provide timely bugfixes for the mature projects, and makes it impossible for the not-so-mature projects to be tested incrementally. The last straw, for me, with this stupid way of releasing, was learning that Konversation in KDE 4 will probably lose IRC-over-SSL support because KSSL was removed from the base libraries.

And now KDE 4.1 is on the verge of release, and Kopete still segfaults as soon as you connect to Jabber. Yet when I tried (multiple times) to gather information about the possible cause in #kopete (so I could at least try to debug it myself), I got no feedback at all; maybe it’s because I run Gentoo, although the same happens on (K)Ubuntu. Yeah, not the kind of people I like to deal with.

I’m not saying that I think Gnome is perfect in its policies and other things. I dislike the fact that it’s always more Linux-/Solaris-centric than cross-platform-centric; but I think KDE 4 was a setback on that front too, from what I read. And their release method does look a lot saner.

I started using Linux with KDE 2. I moved to Gnome while KDE 3 was being worked on. I came back to KDE just a bit before the 3.3 release. Now I’m going to try Gnome for a while, and if I like it, I’ll think more than twice before going back to KDE. Yeah, sure, I liked KDE 3 better than I liked Gnome before it, but it’s just not feasible for me to have to switch DE every time they want to make a new release.

Besides, since I last used it, Gnome seems much more mature and nicer to deal with.

Distributions and interactions

In the articles I wrote about distribution-friendly projects for LWN (part 1 and part 2; part 3 is still to be published at the time I write this), I tried to list some of the problems that distributions face when working with upstream projects.

One interesting thing that I overlooked, and which I think is worth an article of its own, or even a book, given the time, is how to handle interaction between projects. All power users expect developers not to try to reinvent the wheel every time, or, if they do, to do it for a bloody good reason. Unfortunately this is far from true; but beside this fact, when code is shared between projects, it’s quite common for mistakes to end up propagating over quite a huge area.

A very good example of this is the current state of Gentoo’s ~arch tree. At the moment there are quite a few things that might lead to build failures because different projects are intertwined:

  • GCC 4.3 changes will cause a lot of build failures in C++ software because the header dependencies were cleaned up; similar changes happen with every GCC minor release;
  • autoconf 2.62 started catching bad errors in configure scripts, especially related to variable names; similar changes happen with every autoconf release;
  • libtool 2.2 stopped running the C++ and Fortran compiler checks by default, so packages have to take care of declaring what they actually use;
  • curl 7.18 is now stricter in the way options are set through its easy interface, requiring boolean options to be explicitly provided;
  • Qt 4.4 is being split into multiple packages, and dependencies thus have to be fixed.

There are probably more problems than these, but these are the main ones. Unfortunately, the solution of a few projects to similar problems is not to start a better symbiotic relationship between the various projects, but to require the user to use a given version of their dependencies… which might be different from the version the user is told to use for another project… or, even worse, they import a copy of the library they use and build against that.

Interestingly enough, it seems expat is a really good example of a library imported all over, and a less understandable one than zlib (which is quite small on its own, although it has its share of security issues built in). I’ve found a few copies before, and some of them are now fixed in the tree, but in the last two days I found at least four more. Two are in Python itself, returned from the dead (yeah, two of them: one in cElementTree and one in pyexpat); one is (probably, though I’m not sure) in Firefox, and thus in other Mozilla products, I suppose; and the last one is in xmlrpc-c, which has one of the worst build systems I’ve ever seen, which makes it quite hard to fix the issue entirely.

Maybe one day we’ll have just one copy of expat on any system, and it will be shared by everybody… maybe.