Best of each world

One of the things I love about Gentoo is that we have a very loosely defined identity (which is reflected in being called a meta-distribution). While that can hurt when it comes to fixing issues that only one developer cares about, it is generally good for one thing: it makes it possible to integrate techniques and ideas that come from all the other distributions.

A very small example in this regard is what I’ve been doing myself for a while now: packaging the Fedora background packages for use with Gentoo. While we have our share of artists, and Ben’s Gentoo 10.0 backgrounds are gorgeous, we never shone for eye candy in our main tree. The Fedora background packages, a new one for each release, are quite lovely, and I found myself missing them when I moved off the one Fedora installation I had on the laptop (I like the distribution, but it just doesn’t suit me).

After preparing ebuilds for both the latest Fedora 15 Alpha (Lovelock) and the old Fedora 10 (Solar), which was missing from my list, I also noticed that we don’t currently have an easy way to install the Gentoo 10.0 backgrounds at all! It was easy as pie, thanks to having looked at the Fedora ones for so long: I now have an ebuild in my overlay, so emerge gentoo10-backgrounds will do the right thing and let you choose the background right away, both in GNOME (which I use) and KDE (which Alex tested).

Gentoo 10.0 background selection in GNOME

I’m probably going to add all the background ebuilds from my overlay to the main tree soonish; I’m also wondering if we should increase the number of themes available directly from our main tree. If you know of other distributions with cool background packages, let me know… I’m of the opinion that the coolness of an operating system is too often tied to its appearance, unfortunately, so having Gentoo lag behind other distributions is not such a good thing.

CGROUPS woes

The cgroup functionality that the Linux kernel introduced a few versions ago, while originally almost invisible, is proving to have quite a wide range of uses, which in turn has caused quite a few headaches for me and other developers.

I originally looked into cgroups because of LXC, then I noticed them being used by Chromium, then by libvirt (with its own bugs related to USB device support). Right now the cgroup functionality is also used by the userland approach to task scheduling offered as a replacement for the famous 200-LOC kernel patch, and by the newest versions of the OpenVZ hypervisor.

While cgroups are a solid kernel technique, their interface doesn’t seem to be. The basic userland interface is accessible through a special pseudo-filesystem, just like the ones used for /sys and /proc. Unfortunately, the way to use this interface hasn’t really been documented decently, and that results in tons of problems; in my previous post regarding LXC I actually confused the way Ubuntu and Fedora mount cgroups: it is Fedora that uses /sys/fs/cgroup as the base path for accessing cgroups but, as Lennart commented on the post itself, there’s a twist.

In practice there are two distinct interfaces to cgroups. One is a single, all-mixed-in interface, accessed through the cgroup pseudo-filesystem when it is mounted without options; this is the one you can find mounted at /cgroup (also by the lxc init script in Gentoo) or /dev/cgroups. The other interface gives access to (and thus control over) one particular type of cgroup controller (such as memory or cpuset), with each hierarchy mounted at a different path. That second interface is the one Lennart designed to be used by Fedora, and it has been made “official” by the kernel developers in commit 676db4af043014e852f67ba0349dae0071bd11f3 (even though it is not really documented anywhere but in that commit).
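
To make the distinction concrete, here is a small Python sketch (not code from LXC or OpenRC; the function names and sample lines are illustrative) that tells the two layouts apart from the contents of /proc/cgroups and /proc/mounts. When the pseudo-filesystem is mounted without options, a single cgroup mount lists every enabled controller among its options, while in the per-controller layout each mount lists only its own controller(s).

```python
def enabled_controllers(proc_cgroups: str) -> set:
    """Parse /proc/cgroups (subsys_name, hierarchy, num_cgroups, enabled)."""
    names = set()
    for line in proc_cgroups.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip the header comment and blank lines
        fields = line.split()
        if len(fields) >= 4 and fields[3] == "1":
            names.add(fields[0])
    return names


def cgroup_layout(proc_mounts: str, proc_cgroups: str) -> str:
    """Return "single", "split" or "none" for the mounted cgroup hierarchies."""
    known = enabled_controllers(proc_cgroups)
    hierarchies = []
    for line in proc_mounts.splitlines():
        fields = line.split()
        # /proc/mounts columns: device, mount point, fs type, options, dump, pass
        if len(fields) >= 4 and fields[2] == "cgroup":
            # keep only the options that are controller names, not rw/relatime
            hierarchies.append(set(fields[3].split(",")) & known)
    if not hierarchies:
        return "none"
    if any(h == known for h in hierarchies):
        return "single"  # one mount carries every controller (the /cgroup style)
    return "split"       # each mount carries a subset (the /sys/fs/cgroup style)
```

On a live system you would pass it the text of /proc/mounts and /proc/cgroups as read from disk.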

Now, as I said, the lxc init script doesn’t follow that approach but rather takes the opposite direction; this was not intended as a way to ditch the road taken by the kernel developers or by Fedora, but came out of necessity: the commit above was added last summer, the Tinderbox had been running LXC for over a year by that point, and of course all the LXC work I did for Gentoo was originally based on the Tinderbox itself. But since I did have a talk with Lennart, and the new method is the future, last month I added to my TODO list looking into making cgroups a supported piece of configuration in Gentoo.

And it came crashing down.

Between yesterday and this morning I finally found the time to write an init script that mounts the proper cgroup hierarchies, Fedora style. Interestingly enough, if you unmount a hierarchy after mucking with it, you won’t be able to mount it again, so there won’t be any “stop” for the script anyway. But that’s the least of my problems now: once you do mount cgroups the way you’re supposed to, following the Fedora approach, LXC stops working.
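
As a rough idea of what such a script has to do, here is a minimal Python sketch (the real init script is shell, and this is not its code) assuming the per-controller layout under /sys/fs/cgroup: a tmpfs at the base path, then one directory and one cgroup mount per controller. It only builds the command lines as a dry run; the tmpfs source name `cgroup_root` and the `mode=755` option are assumptions of this sketch, and the controller list would normally come from /proc/cgroups.

```python
import shlex


def split_cgroup_mount_commands(controllers, base="/sys/fs/cgroup"):
    """Build (but do not run) the commands for a per-controller cgroup layout."""
    # A tmpfs gives a writable place to create one directory per hierarchy.
    commands = [["mount", "-t", "tmpfs", "-o", "mode=755", "cgroup_root", base]]
    for name in controllers:
        path = f"{base}/{name}"
        commands.append(["mkdir", "-p", path])
        # Attaching a single controller per mount keeps the hierarchies separate.
        commands.append(["mount", "-t", "cgroup", "-o", name, "cgroup", path])
    return commands


if __name__ == "__main__":
    for cmd in split_cgroup_mount_commands(["cpuset", "cpu", "memory"]):
        print(shlex.join(cmd))
```

Note there is deliberately no matching "stop" counterpart, for the reason given above.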

I haven’t started looking into what the problem could be; but it seems obvious that LXC doesn’t take it very nicely when its single-access interface for cgroups is instead split into a number of different directories, each with its own little interface. And I can’t blame it much.

Unfortunately this is not the only obstacle LXC has to face now. Besides the problem with actually shutting down a container (which only works partially, and mostly out of sheer luck, with my init system), the next version of OpenRC is going to drop support for auto-detecting LXC, both because identifying the cpuset in /proc is not going to work for much longer (it’s optional in the kernel and considered deprecated) and because it wrongly identifies the newest OpenVZ guests as LXC (since they also started using the same cgroups basics as LXC). These two problems mean that soon you’ll have to use some sort of lxc-gentoo script to set up an LXC guest, which will both configure a switch to shut the whole guest down and configure OpenRC to accept it as an LXC guest manually.

Where does this leave us? Well, first of all, I’ll have to test whether the current GIT master of LXC can cope with this kind of interface. If it doesn’t, I’ll have to talk with upstream to get it supported, so that LXC can be used with a Gentoo host, as well as a Fedora one, with the new cgroups interface (so that it can be made available to users for use with Chromium and other software that might make good use of it). Then it would be time to focus on Gentoo guests, so I’ll have to evaluate the contributed lxc-gentoo scripts that I know are on the Gentoo Wiki, for a start.

But let me write this again: don’t expect LXC to work nicely for production use, now or any time soon!

Enabling --as-needed, whose task is it?

A “fellow” Gentoo developer told me today that we shouldn’t try to get --as-needed working because it’s a task for upstream (he actually used the word “distributors” to mean that neither Gentoo nor any other vendor should do it, silly him)… this developer will go unnamed, because I’ll also complain right away that he suggested I give up on Gentoo when I gave one particular reason (that I’ll repeat in a moment) for us to do it instead. Just so you know, if it were up to me, that particular developer would have his access revoked right now. Luckily for him, I have no such powers.

Anyway, let me try to put in proper writing why it should fall to Gentoo to enable that by default to protect our users.

The reason I gave above (it was a Twitter exchange, so I couldn’t articulate it completely) is that “Fedora does it already”. To be clear, both Fedora and Suse/Novell do it already. I’m quite sure Debian doesn’t, and I guess Ubuntu doesn’t either, to keep in line with Debian. And the original hints we took to start with --as-needed came from AltLinux. This alone means there is quite a bit of consensus out there that it should be a task for distributors. And it should say a lot that the problems solved by --as-needed are marginal for binary distributions like the ones I named here; they all do it simply to reduce the time needed to rebuild their binary packages, rather than to avoid breaking users’ systems!

But I’m the first person to say that the phrase “$OtherDistribution does it, why don’t you?” is bogus and can actually cause more headaches than it solves, although most of the time it’s used when $OtherDistribution is Ubuntu or one of its relatives. I seriously think that we should take a few more hints from Fedora; not clone them, but they have very strong system-level developers working on their distribution. But that’s a different point altogether by now.

So what other reasons are there for us to provide --as-needed rather than upstream? Well, actually this is the wrong question; the one you should ask is “Why is it not upstream’s task to use --as-needed?”. While I have pushed --as-needed support in a few of my upstream packages before, I now think that it’s not the correct solution. It all boils down to who knows better whether it’s safe to enable --as-needed or not. There are a few things you should assess before enabling --as-needed:

  • does the linker support --as-needed? It’s easier said than done: the linker might understand the flag, but supporting it is another story; there are older versions of ld still in active use that crash when using it, others with an overly greedy --as-needed that drops libraries that are needed, and only recently was the softer implementation added; while upstream could check for a particular ld version, what about backports?
  • do the libraries you link to link to all their needed dependencies? One of the original problems with --as-needed when it was introduced to the tree was that you’d have to rebuild one of the dependencies because it relied on transitive linking, which --as-needed disallowed (especially in its original, un-softened form); how can a given package make sure that its dependencies are all fine before enabling --as-needed?
  • does the operating system support --as-needed at all? While Gentoo/FreeBSD uses modern binutils, and (most of) the libraries are built so that all the dependencies are linked in for --as-needed support, using it is simply not possible (or at least wasn’t), because gcc would not link the threading libraries in, for compatibility with pre-libmap.conf FreeBSD versions; this has changed recently for Gentoo, since we only support more recent versions of the loader that don’t have that limitation; even more so, how can upstream know whether the compiler has the fix already or not?
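
The transitive-linking problem in the second point can be modelled in a few lines of Python. This is a deliberately simplified model, not how ld or ld.so are implemented: the library and symbol names are made up, symbol versioning, weak symbols and link order are ignored, and `strict_as_needed()` mimics the original, strict behaviour. Current GNU ld is softer: its documentation says it also keeps a library that satisfies a reference from another needed shared library that does not list it in DT_NEEDED, which would rescue this particular case.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SharedLib:
    defines: frozenset              # symbols this library exports
    uses: frozenset = frozenset()   # undefined symbols it references
    needed: tuple = ()              # its own DT_NEEDED entries


def strict_as_needed(program_uses, cmdline_libs, registry):
    """Keep a -l library only if the program itself references one of its symbols."""
    return [name for name in cmdline_libs if registry[name].defines & program_uses]


def unresolved_at_runtime(program_uses, direct_needed, registry):
    """Symbols nobody defines once the loader follows the DT_NEEDED chain."""
    loaded, queue = [], list(direct_needed)
    while queue:
        name = queue.pop(0)
        if name not in loaded:
            loaded.append(name)
            queue.extend(registry[name].needed)
    defined = set().union(*(registry[n].defines for n in loaded))
    used = set(program_uses).union(*(registry[n].uses for n in loaded))
    return used - defined


# Hypothetical libfoo calls cos() but was linked without -lm: it relies on the
# program linking libm for it (transitive linking).
registry = {
    "libfoo.so": SharedLib(defines=frozenset({"foo_init"}), uses=frozenset({"cos"})),
    "libm.so.6": SharedLib(defines=frozenset({"cos", "sin"})),
}
program_uses = {"foo_init"}
cmdline = ["libfoo.so", "libm.so.6"]
```

Without --as-needed both libraries are loaded and everything resolves; with the strict filter only libfoo.so survives and cos is left unresolved. Adding libm.so.6 to libfoo’s own `needed` fixes it, which is exactly the rebuild of the dependency described above.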

Of course you can “solve” most of these doubts by running runtime tests; but is that what upstreams should do? Running multiple tests from multiple packages requires sharing the knowledge, and risks the tests getting out of sync with one another; you have redundant work, when instead the distributor can simply decide whether using --as-needed is part of their priorities or not. It definitely is for Fedora, Suse, AltLinux… it should be for Gentoo as well, especially as a source-based distribution!
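
For illustration, this is roughly what such a per-package runtime probe amounts to, sketched in Python rather than as the autoconf test an upstream would actually write; the function name and the default `cc` command are assumptions.

```python
import os
import subprocess
import tempfile


def cc_accepts_as_needed(cc="cc"):
    """Try to link a trivial program with -Wl,--as-needed; True if it links."""
    with tempfile.TemporaryDirectory() as tmp:
        source = os.path.join(tmp, "conftest.c")
        with open(source, "w") as f:
            f.write("int main(void) { return 0; }\n")
        try:
            result = subprocess.run(
                [cc, "-Wl,--as-needed", source, "-o", os.path.join(tmp, "conftest")],
                capture_output=True,
            )
        except OSError:  # compiler not installed or not executable
            return False
        return result.returncode == 0
```

Note what it cannot tell you: a linker that accepts the flag but crashes on real code, an overly greedy implementation, or underlinked dependencies all pass this test, so every package would need its own, ever-growing set of such checks.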

Of course, you can find some cases where --as-needed will not work properly; PulseAudio is unfortunately one of those, and I haven’t had the time to debug binutils to see why the softer rules don’t work well in that case. But after years of working on this, I’m very sure that only a very low percentage of packages fail to work properly with it, and we should not be held hostage by a handful of packages out of over ten thousand!

But, when you don’t care about the users’ experience, when you’re just lazy or you think your packages are “special” and deserve to break any possible rule, you can afford yourself to ignore --as-needed. Is that the kind of developers Gentoo should have, though? I don’t think so!

Bye Fedora

I’m going to say goodbye to my current Fedora 12 laptop; yes, the one for which I wrote that post about Fedora 10 at the time, which I then updated for Fedora 11. This is not because the laptop broke down, but rather because I ended up getting my MacBook Pro fixed, and that is again my main laptop. While I did want to have a laptop running Linux alongside the MBP running Mac OS X, I finally decided it’s pretty pointless for me.

There are multiple reasons for that; some have nothing to do with Fedora, but a few do. Marginally maybe, but they do. The first problem is, once again, the video card. While it hasn’t exactly been easy with Yamato’s new one, I have to say that two and a half months later I’m definitely glad I got it: KMS with 2.6.32 (and GIT userland; I need to check whether that’s still needed, but I guess so for a while still) works like a charm, I’m able to use compiz without a glitch, and it’s perfectly stable. With the nVidia on-board card of that laptop, it’s a totally different story. The nvidia binary driver for that card is not (yet?) available for Fedora 12, and the nouveau driver is… useless. It’s not just a matter of lacking 3D acceleration: it’s also totally broken for suspend, which at least worked fine with the proprietary driver.

But it goes beyond hardware support; you have probably all heard about the thunderstorm around Fedora’s original decision to allow any user with console access to install new packages without the root password. I actually think that for Fedora’s target, that’s a pretty good move: it is limited to installing and upgrading signed packages, which thus has limited security implications, and it’s just a default. For most users, having console access is as good as having root’s password, so it shouldn’t really matter; for desktop usage, that’s pretty much true already. Smarter, more security-paranoid users can easily change that setting. At any rate, the thunderstorm (or crapfest if you prefer) got to them so much that they changed the default again; too bad. Unfortunately, I seem to have a different problem: my PackageKit interface is totally broken and I cannot use it at all; I have to use yum to upgrade my box, which is definitely not so nice.

At first I thought it had to be related either to the fact that I upgraded from F11 or to my use of RPM Fusion, but it turns out that the PackageKit interface is just as broken on a box that a customer of mine set up for me last week to install a toolchain chroot for them. I ended up using yum there as well; no clue what the problem is.

And since I upgraded to F12 I found another problem as well: I already ranted about the fact that I couldn’t get bluetooth dial-up to work with my Nokia phone, and I had to use the cable to work it out; following Adam’s suggestion I also got the JoikuSpot application, which turns the phone into an (ad-hoc) hotspot so it can be used via WLAN without configuring anything. The latter approach is, unfortunately, valid only if you’ve got the phone’s power adapter at hand, since the battery lasts about an hour on my E75; and the other day (at my customer’s office) I didn’t have it available. I did have the cable, left in the bag since the last time I used it; unfortunately, when I tried to connect with that, exactly like I did in F11, NetworkManager decided to fail. And of course neither DUN nor PAN seems to be available via bluetooth in F12, just as in F11.

So I’m considering whether I need that laptop or not: the MBP starts up in less than two seconds, thanks to the fact that I always leave it in Suspend-to-RAM (and that’s faster than Google’s Chrome OS… I wonder why people seem to focus on start-up time rather than fixing suspend support, bah); the MBP lasts more than four hours on its battery; the MBP has a much sleeker design, which makes it handier, and I don’t have to carry around the clunky power supply (not only because the MBP’s is smaller, but also because I have my mom’s supply downstairs if I’m running low on battery); the MBP (with OS X at least) can connect properly, via bluetooth, to the phone and thus the Internet (most of the time at least). So in the end, I’m not going to use the Compaq for much.

I’ll create a Fedora 12 virtual machine on Yamato for testing my projects there, where most of the previous notes about stuff not working properly will be moot points.

*Post scriptum: I wrote the draft for this article a couple of days ago, and in the meantime I set up the Fedora 12 virtual machine I mentioned in the last paragraph; it was that way, by trying out virtio, that I found the n-th qemu/kvm quirk that made me drop the “proper” qemu. Unfortunately, with that new install (from scratch, not an upgrade), I found another share of problems.*

*The remote desktop support in GNOME is totally broken: I can see with tcpdump the request arriving, but no reply is given at all. If you set a three-part hostname (say, fedora12.qemu.local), Avahi will advertise fedora12.local instead. system-config-services is not installed by default, and the first time I installed it I had to reboot, otherwise I would only get crashes. One default cron job causes SELinux to report invalid accesses to /var/lib … all in all, it seems to me like Fedora 11 was way more polished!*

RTSP clients’ special hell

This week, in Orvieto, Italy, there was OOoCon 2009, and the lscube team (also known as “the rest of the feng developers besides me”) was there to handle the live audio/video streaming.

During the preparations, Luca called me one morning, complaining that the new RTSP parser in feng (which I wrote almost single-handedly) refused to play nice with the VLC version shipped with Ubuntu 9.04. The problem was tracked down to the parser for the Range header, in particular the normal play time value parsing: the RFC specifies a decimal value with a dot (.) as the separator, but VLC was sending a comma (,), which my parser refused.

Given that Luca actually woke me up while I was in bed, it was a strange presence of mind that let me ask him which language (locale) the system was set to: Italian. Telling him to try the C locale was enough to get VLC to comply with the protocol. The problem here is that the separators for decimal places and thousands are locale-dependent characters, while most programming languages obviously limit themselves to supporting the dot, and a lot of software likewise uses the dot no matter what the locale is (for instance, right now I have Transmission open and the download/upload stats use the dot, even though my system is configured in Italian). Funny that this problem came up during an OpenOffice event, given that it’s definitely one of the best-known pieces of software that actually relies on (and sometimes messes up) that difference.

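To be precise, though, the problem here is not with VLC itself: it is with the live555 library (badly named media-plugins/live in Gentoo), which provides the generic RTSP code for VLC (and MPlayer). If you ever wrote software that dealt with float-to-string conversion, you probably know how easily the decimal separator ends up following the locale: the printf() family honours the LC_NUMERIC category once the application has called setlocale(), as a localised program like VLC does, so a library formatting the header that way inherits the application’s Italian comma. live555 is a C++ library, so it might also be going through string streams, which follow whatever locale they are imbued with.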
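
On the sending side the fix is simply to format the number in a locale-independent way. In Python, for instance, the "f" format specifier never consults the locale (unlike `locale.format_string()`, which follows LC_NUMERIC), so a client written like this minimal sketch always emits the dot the RFC asks for; the function name and the choice of three decimal places are assumptions.

```python
def format_npt_range(start, end=None):
    """Render an RTSP Range value in normal play time, e.g. "npt=12.500-".

    The "f" presentation type of f-strings ignores the locale, so the
    separator is always "." even under an Italian LC_NUMERIC.
    """
    value = f"npt={start:.3f}-"
    if end is not None:
        value += f"{end:.3f}"
    return value
```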

At any rate, the bug was known and already fixed in live555, in the version Gentoo already has and in the contributed bundled libraries of VLC (for the Windows and OS X builds), so those three VLC instances are just fine; but the problem is still present in both the Debian and Ubuntu versions of the package, which are quite outdated (as xtophe confirmed). Since the RFC does not have any conflicting use of the comma in that particular place, and given how widespread the broken package is (Ubuntu 9.10 also has the same problem), we decided to work around it inside the feng parser, accepting the comma-separated decimal value as well.
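
This is not feng’s actual parser; it is a Python sketch of the same workaround, with hypothetical function names. It follows the npt-range grammar of RFC 2326 (seconds or hh:mm:ss with an optional fraction, or "now", with either end of the range optional but not both) and departs from it only by accepting a comma as the fraction separator.

```python
import re

# npt-time per RFC 2326: "now", seconds, or hh:mm:ss, each with an optional
# fraction; the fraction separator is widened to accept "," as well as ".".
_NPT_TIME = r"(?:now|\d+:\d{1,2}:\d{1,2}(?:[.,]\d*)?|\d+(?:[.,]\d*)?)"
_NPT_RANGE = re.compile(rf"npt\s*=\s*(?P<start>{_NPT_TIME})?-(?P<end>{_NPT_TIME})?")


def _npt_seconds(token):
    if token == "now":
        return "now"
    token = token.replace(",", ".")  # workaround for locale-broken clients
    if ":" in token:
        hours, minutes, seconds = token.split(":")
        return int(hours) * 3600 + int(minutes) * 60 + float(seconds)
    return float(token)


def parse_npt_range(value):
    """Parse e.g. "npt=12,5-" into (start, end); missing ends become None."""
    spec = value.split(";", 1)[0].strip()  # drop an optional ";time=" suffix
    match = _NPT_RANGE.fullmatch(spec)
    if match is None or not (match["start"] or match["end"]):
        raise ValueError(f"invalid npt range: {value!r}")
    start = _npt_seconds(match["start"]) if match["start"] else None
    end = _npt_seconds(match["end"]) if match["end"] else None
    return start, end
```

Normalising the comma to a dot before converting keeps the rest of the parser strictly RFC-shaped, so the tolerance stays confined to one line.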

From this situation, I also ended up comparing the various RTSP clients that we are trying to work with, and the results are quite mixed, which is somewhat worrisome to me:

  • latest VLC builds for proprietary operating systems work fine (Windows and OS X);
  • VLC as compiled in Gentoo also works fine, thanks Alexis!
  • VLC as packaged for Debian (and Ubuntu) uses a very old live555 library; the problem described here is now worked around, but I’m pretty sure it’s not the only one that we’re going to hit in the future, so it’s not a good thing that the Debian live555 packaging is so old;
  • VLC as packaged in Fedora fails in many different ways: it goes in a loop for about 15 minutes saying that it cannot identify the host’s IP address, then it finally seems to get a clue, so it’s able to request the connection but… it starts dropping frames, saying that it cannot decode them and stuff like that (I’m connected over gigabit LAN);
  • Apple’s QuickTime X is somewhat strange: on Merrimac, since I used it to test the HTTP tunnel implementation, it now only tries connecting to feng via HTTP rather than using RTSP; this works fine with the branch that implements it but obviously fails badly in master (and it doesn’t look like QuickTime takes the hint to switch to the RTSP protocol); on the other hand it works fine on the laptop (which has never used the tunnel in the first place), where it uses RTSP properly;
  • again Apple’s QuickTime, this time on Windows, seems to be working fine.

I’m probably going to have to check the VLC/live packaging of other distributions to see how many workarounds for broken stuff we might have to look out for. Which means more and more virtual machines; at this pace I’ll probably have to get one more hard drive (or I could replace one 320G drive with a 500G drive that I still have at home…). And I should try totem as well.

Definitely, RTSP clients are a hell of a thing to test.

Fedora, good and bad

In the past few days, since I’ve been spending time at my sister’s house, I’ve used as my only system the laptop I bought a few months ago, which runs Fedora 11. This has been the first time, since I started working on Gentoo, that I had to work with just a laptop (if you exclude the hospitalisations), and especially the first time since I started using Gentoo that I had to work with just another Linux distribution.

Indeed, with the already noted exceptions, the last time I had to work with just a laptop was when Defiant (the box I had before Enterprise) died and I had to replace it (with Enterprise); at the time I was limited to working with the iBook G4 and, I think, Tiger (or Panther, I don’t remember to be honest). Luckily the work that I had to do at the time (translating Ian Sommerville’s Software Engineering 7th Edition into Italian) didn’t require much more, and it worked out quite fine with just that laptop.

But still, up to a few months ago all my laptops had been Apple ones, mostly running Mac OS X (even though I had Gentoo installed on both for a time). Now instead I have a laptop running Fedora; I also have to say that since I started using Gentoo, any other distribution has just been something to try out, never something used on a daily basis, up to now at least.

Now I have to say, I’m not really feeling extremely out of place in Fedora either. The system mostly works well, although there are a few things that, I think, Gentoo does better. The most obvious one is the gstreamer plugins: they are not split at all, there is a single package for each source tarball; this means that if you need, for instance, the plugin to play aac files, you also get the one that plays sid files, and that in turn requires you to install the libsidplay library. I guess the USE flag concept works much better here.

Almost all the software I need is in one of the repositories, either the official ones or RPM Fusion, with the exception of the libdvdcss library, which has to be found on ATrpms. Even Emacs 23 is now available in the updates, and that makes it much, much nicer for me to use Fedora as a development box: I cannot stand the graphical interface in Emacs 22.

Interestingly enough, Random mode works here with Rhythmbox; I have to check whether it was fixed upstream, and thus in Gentoo as well. It still does not seem to check the “skip when playing random” flag that iTunes adds to the files, but I guess either I or someone else can fix that up one day (so that I wouldn’t get BBC Radio shows playing when I’m expecting music!). I also had the pleasure of seeing that when I connect my iPod to the laptop, Rhythmbox is able to play the music from it as if it were an external hard drive (using the tags without having to copy and rename the files), which has come in very handy to play my music without having to use the earphones.

Connectivity hasn’t been an enormous issue, although it wasn’t a cakewalk either: at least in Fedora 11, NetworkManager does not support Bluetooth DUN (Dial-Up Networking), which means that I cannot use my phone over bluetooth (which would have allowed me to leave the phone upstairs, where the H3G network is reachable, and move the laptop downstairs); I have to use the provided cable instead. This was of course after I updated enough packages that they didn’t segfault on me while trying to configure the connection. By the way, I have to find out who “owns” the list of providers’ data: the Italian H3G options are only valid for the consumer side, not the business side that I use.

The one thing that actually upset me quite a bit, though, was the handling of the Mono development tools in Fedora: while the mono package comes with the mcs compiler, it doesn’t bring in all the development tools. And, at the same time, MonoDevelop does not depend on the mono-devel package with the remaining tools. I installed most of that stuff before coming here (because I didn’t want to use too much traffic from my almost-flat-rate plan), but when I imported an external project into my main one (the vCard library I might have to hack on) it failed to rebuild the project because it was lacking the resource compiler. This really sounds strange to me!

Also, Pidgin here seems to crash much more than on Gentoo (and there goes the theory that Gentoo’s CFLAGS handling makes software crash). And I’m not even using OTR! And the touchpad toggle button didn’t work by default; I had to use xbindkeys and a custom script calling synclient (upon Eva’s suggestion) to make it work, and I needed it badly, because writing a long text while minding the touchpad is quite hard; if anybody wishes to send me something useful, order me an Apple bluetooth keyboard with US layout, and you’ll make me quite happy, and more productive as well!
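
The actual script isn’t reproduced in this post; as a rough idea of what such a toggle does, this Python sketch (hypothetical function names, assuming the synaptics driver’s synclient tool is installed) reads the current TouchpadOff value from the output of `synclient -l` and flips it. xbindkeys would then only need to run `toggle_touchpad()` on the key press.

```python
import re
import subprocess


def touchpad_off_value(listing):
    """Extract TouchpadOff from the output of `synclient -l`."""
    match = re.search(r"^\s*TouchpadOff\s*=\s*(\d+)", listing, re.MULTILINE)
    if match is None:
        raise ValueError("TouchpadOff not found in synclient output")
    return int(match.group(1))


def toggle_command(listing):
    # 0 means the touchpad is enabled; any non-zero value disables it
    # (fully, or just tapping and scrolling), so toggle between 0 and 1.
    new_value = 1 if touchpad_off_value(listing) == 0 else 0
    return ["synclient", f"TouchpadOff={new_value}"]


def toggle_touchpad():
    """Meant to be bound to the toggle key through xbindkeys."""
    listing = subprocess.run(["synclient", "-l"], capture_output=True,
                             text=True, check=True).stdout
    subprocess.run(toggle_command(listing), check=True)
```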

All in all, it doesn’t look too bad, although it could use some extra polish, I guess; I’ll see how it goes with Fedora 12 once it’s released (given it’s now in Alpha, it shouldn’t be too long). Unfortunately, the one thing that I was hoping for in 11 (the nouveau driver for nVidia cards) didn’t really work here…

The status of some deep roots

While there are quite a few packages that are known to be rotting in the tree, and are thus being pruned away step by step, there are some more interesting facets to the status of Gentoo as a distribution nowadays.

While the more interesting and “experimental” areas seem to have enough people working on them (Ruby to a point, Python more or less, KDE 4, …), there are quite a few deeper areas that are just left to rot as well, but cannot really be pruned away. These include, for instance, Perl (for which we’re lagging behind a lot, mostly because tove is left alone maintaining that huge piece of software), and SGML, which in turn includes all the DocBook support.

I’d like to focus for a second on that latter part, because I am partly involved in it: I like using DocBook, and I actually use the stylesheets available in Portage to produce the online version of Autotools Mythbuster. Now, when I wanted to make use of DocBook 5, the stylesheets for the namespaced version (very useful to write with emacs and nxml) weren’t available, so I added them, adding support for them to the build-docbook-catalog script. With time, I ended up maintaining the ebuilds for both versions of the stylesheets, and that hasn’t always been the cleanest thing, given that upstream dropped the tests entirely in the newer versions (well, technically they are still there, but they don’t work; it seems they lack some extra pieces that are not documented anywhere).

Now, I was quite happy with things as they were; I had just requested stabilisation of the new ebuilds of the stylesheets (both variants) and I could have kept doing just that, but… yesterday I noticed that the list of examples in my guide had broken links, and after mistakenly opening a bug on the upstream tracker, I noticed that the bug is already fixed in the latest version. Which made me suspicious: why had nobody complained that the old stylesheets were broken? Looking at the list of bugs for the SGML team, you can see that lots of stuff was ignored for far too long. I tried cleaning up some stuff, duping bugs that were obviously the same, and fixing one in the b-d-c script, but this is one of the internal roots that is rotting, and we need help to save it.

For those interested in helping out, I have taken note of a few things that should probably be done with medium urgency:

  • make sure that all the DTDs are available in the latest release, and that they are still available upstream; I had to seed an old distfile today because upstream dropped it;
  • try to find a way to install the DocBook 5 schemas properly; right now the nxml-docbook5-schemas package installs its own copy of the Relax-NG Compact file; on Fedora 11, there is a package that installs more data about DocBook 5, and we should probably use the same original sources; the nxml-docbook5-schemas package could then either be merged into that package or simply use the already-installed copy;
  • replace b-d-c, making it both more generic and based on a framework that exists already (like eselect) instead of reinventing the wheel; the XML/DTD catalog can easily be used for more than just DocBook, and while I know the Gentoo documentation team does not want the Gentoo DTD to be available as a package to install in the system (which would make it much easier to keep updated for the nxml schemas, but sigh), I would love to be able to make fsws available that way (once I finish building the official schema for it and publish it; again, more on that in the future);
  • find out how one should be testing the DocBook XSL stylesheets, so that we can run tests for them; it would have probably avoided the problem I had with Autotools Mythbuster in the past months;
  • package the stylesheets for Xalan and Saxon, which are different from the standard ones; b-d-c already has support for them to a point (although not having to spell out this kind of thing in the b-d-c replacement is desirable), but I didn’t have a reason to add them.
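
The catalog side of a b-d-c replacement is not DocBook-specific at all. As a rough illustration (not how b-d-c works today), this Python sketch generates the OASIS XML catalog rewrite entries a generic tool would need to register a set of stylesheets; the function name is hypothetical, and the install path in the example is an assumption about where the namespaced stylesheets live.

```python
import xml.etree.ElementTree as ET

CATALOG_NS = "urn:oasis:names:tc:entity:xmlns:xml:catalog"


def build_catalog(rewrites):
    """Build an OASIS XML catalog mapping public URI prefixes to local ones.

    rewrites: iterable of (public_prefix, local_prefix) pairs.
    """
    ET.register_namespace("", CATALOG_NS)  # serialise as the default namespace
    root = ET.Element(f"{{{CATALOG_NS}}}catalog")
    for public, local in rewrites:
        # rewriteSystem covers DOCTYPE system identifiers, rewriteURI covers
        # xsl:import/xsl:include and other URI references.
        ET.SubElement(root, f"{{{CATALOG_NS}}}rewriteSystem",
                      {"systemIdStartString": public, "rewritePrefix": local})
        ET.SubElement(root, f"{{{CATALOG_NS}}}rewriteURI",
                      {"uriStartString": public, "rewritePrefix": local})
    return ET.tostring(root, encoding="unicode")


# Hypothetical entry for the namespaced DocBook XSL stylesheets.
print(build_catalog([
    ("http://docbook.sourceforge.net/release/xsl-ns/current/",
     "file:///usr/share/sgml/docbook/xsl-ns-stylesheets/"),
]))
```

Nothing in it knows about DocBook, which is the point: the same code could register any other stylesheet or schema set.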

I don’t think I’ll have much time to work on them in the future, so user contributions are certainly welcome; if you do open any bug for these issues, please CC me directly, since I don’t intend (yet) to add myself to the sgml alias.

The debugged debugger — part 2

So after last night’s post I finally found the problem.

Actually, mixing in the new system libbfd sidetracked me for about an hour, because the same symptoms were caused by an API change that I hadn’t handled correctly; after that I was able to use both the system and the internal libbfd with the exact same results.

I started adding printing checkpoints both in the C# Bfd wrapper and in the C glue code that called into libbfd; it’s not really an easy thing because, well, libbfd is probably one of the most over-engineered libraries I have ever seen. It really provides a lot of information for a lot of different executable and binary formats, but to do that it increases the complexity tremendously; indeed that’s one of the reasons why gold is much faster than standard ld, and why I preferred to write my own Ruby-Elf rather than binding the Bfd interface and building up from that (which could have been more complete under a few circumstances).

At any rate, I was lucky to know enough about ELF files to identify the issue in the end; most people who had never looked into ELF would have given up along the way. Eventually I narrowed it down: it was trying to load the symbol table (.symtab, which also includes internal local symbols, that is, symbols marked static and thus not exported) and found none. Since it couldn’t find any symbol at all, it could hardly match the nptl_version variable I talked about yesterday.

Following that lead, it turned out that, although Mono itself splits debug symbols into a separate file (.mdb), mdb does not support the equivalent feature for ELF files: our splitdebug. I had actually wondered whether that was the problem from the start, but I ruled it out because Fedora uses the same feature, and mono-debugger starts fine there. Note that I now say “starts fine” rather than “works fine”, as you’ll see in a moment.

So if mdb does not support split debug files, how on earth can it work on Fedora? Well, the symbol it’s trying (and failing) to identify here is nptl_version from libpthread.so, and a quick check on the laptop told me that Fedora does not strip .symtab from libpthread.so! I was actually afraid that Fedora wasn’t stripping .symtab at all, but then I used the /usr/bin/mono object as a reference, and there the .symtab section is nowhere to be found: Fedora has a special case for libpthread.
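The check above is what `readelf -S` shows at a glance, but as a self-contained illustration here is a minimal Python sketch that reads the ELF section headers and reports whether an object still carries a .symtab section. It assumes a well-formed 32- or 64-bit ELF file and ignores extended section numbering; the script name and the paths in the usage comment are only examples.

```python
import struct
import sys


def section_names(data: bytes) -> list:
    """Return the section names of an ELF image, in section-header order."""
    if data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    ei_class, ei_data = data[4], data[5]
    if ei_class not in (1, 2) or ei_data not in (1, 2):
        raise ValueError("unsupported ELF class or byte order")
    endian = "<" if ei_data == 1 else ">"  # ELFDATA2LSB / ELFDATA2MSB
    if ei_class == 2:  # ELFCLASS64
        ehdr_fmt, shdr_fmt = "HHIQQQIHHHHHH", "IIQQQQIIQQ"
    else:  # ELFCLASS32
        ehdr_fmt, shdr_fmt = "HHIIIIIHHHHHH", "IIIIIIIIII"
    # The fixed part of the ELF header starts right after the 16-byte e_ident.
    ehdr = struct.unpack_from(endian + ehdr_fmt, data, 16)
    e_shoff, e_shentsize, e_shnum, e_shstrndx = ehdr[5], ehdr[10], ehdr[11], ehdr[12]
    if e_shnum == 0:
        # No section headers, or extended numbering (not handled in this sketch).
        return []
    shdrs = [
        struct.unpack_from(endian + shdr_fmt, data, e_shoff + i * e_shentsize)
        for i in range(e_shnum)
    ]
    # In both layouts sh_name is field 0, sh_offset field 4, sh_size field 5.
    strtab = shdrs[e_shstrndx]
    names_blob = data[strtab[4]:strtab[4] + strtab[5]]
    names = []
    for shdr in shdrs:
        start = shdr[0]
        end = names_blob.index(b"\x00", start)
        names.append(names_blob[start:end].decode("ascii", "replace"))
    return names


def has_symtab(path: str) -> bool:
    """True if the object at `path` still carries a .symtab section."""
    with open(path, "rb") as f:
        return ".symtab" in section_names(f.read())


if __name__ == "__main__":
    # e.g. python3 symtab_check.py /lib/libpthread.so.0 /usr/bin/mono
    for path in sys.argv[1:]:
        print(path, "has .symtab" if has_symtab(path) else "has no .symtab")
```

Run against a Fedora-like setup, this kind of check should report a .symtab for libpthread.so but not for /usr/bin/mono, which is the asymmetry described above.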

Now, the quick solution would of course be to simply not strip .symtab from libpthread.so either, so that mdb could start properly; the problem with that solution is that you still wouldn’t be able to get a backtrace or anything else out of the unmanaged code, because mdb wouldn’t load those symbols at all. On distributions that use split debug (Gentoo if requested, Fedora, and I have no idea what else), mono-debugger will start if libpthread.so keeps its .symtab, but it won’t work with any object whose .symtab lives in the separate debug file, which is our case. So I’ll try to find time to actually fix this in mono-debugger, because it is a bug in mono-debugger (or maybe a missing feature), not a problem with “roll your own optimization flags” as Miguel wanted it to be.
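To make the missing feature concrete: with split debug, the stripped object keeps a `.gnu_debuglink` section containing the debug file's name (NUL-terminated, padded to a 4-byte boundary) followed by a 4-byte checksum of that file, and a debugger is expected to go and look for it. The sketch below follows the search order GDB documents (next to the object, in a `.debug` subdirectory, then under a global debug directory such as /usr/lib/debug mirroring the object's path). It assumes the checksum is the standard CRC-32 that `zlib.crc32` computes and that the section is little-endian; the function names are hypothetical, and build-id based lookup is not covered.

```python
import os
import struct
import zlib


def parse_debuglink(section: bytes, little_endian: bool = True) -> tuple:
    """Split the raw contents of a .gnu_debuglink section into (name, crc32)."""
    nul = section.index(b"\x00")
    name = section[:nul].decode()
    # The NUL-terminated name is padded to a 4-byte boundary; the CRC follows.
    crc_offset = (nul + 1 + 3) & ~3
    (crc,) = struct.unpack_from("<I" if little_endian else ">I", section, crc_offset)
    return name, crc


def candidate_paths(object_path: str, debug_name: str,
                    debug_dir: str = "/usr/lib/debug") -> list:
    """Places a GDB-style debuglink lookup tries, in order."""
    obj_dir = os.path.dirname(os.path.abspath(object_path))
    return [
        os.path.join(obj_dir, debug_name),
        os.path.join(obj_dir, ".debug", debug_name),
        # The global debug directory mirrors the object's absolute directory.
        os.path.join(debug_dir + obj_dir, debug_name),
    ]


def find_debug_file(object_path: str, debuglink: bytes,
                    debug_dir: str = "/usr/lib/debug"):
    """Return the first candidate whose CRC-32 matches the link, or None."""
    name, expected_crc = parse_debuglink(debuglink)
    for candidate in candidate_paths(object_path, name, debug_dir):
        if os.path.isfile(candidate):
            with open(candidate, "rb") as f:
                if zlib.crc32(f.read()) & 0xFFFFFFFF == expected_crc:
                    return candidate
    return None
```

Once the debug file is found, its .symtab can be read exactly like the one in the stripped object would have been; that second step is what mdb would need in order to resolve nptl_version on a splitdebug system.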

Maybe this will convince them to give some credit to other distributions as well? Who knows; I hope so, because at least as far as building and packaging are concerned, mono-debugger has huge room for improvement, and I’d like to help out with that, if they let me.

Post scriptum: I was also able to make mono-debugger use the system libedit; the result is less spectacular than using the system libbfd, but it’s still nice:

flame@yamato mono-debugger-2.4.2 % qsize mono-debugger
dev-util/mono-debugger-2.4.2: 25 files, 21 non-files, 2021.133 KB
flame@yamato mono-debugger-2.4.2 % qsize mono-debugger
dev-util/mono-debugger-2.4.2: 25 files, 21 non-files, 1561.300 KB

Now if only I could get it to work …

PulseAudio and quirks

It seems that even my previous post about PulseAudio got one of the PA-bashers to consider me a nuisance to their “cause”, whatever that is. For this reason I’d like to try to explain some of the quirks regarding PulseAudio, distributions and so on. Let’s call this a bit of a backstage analysis of what’s going on with Linux and audio, from somebody who has little vested interest in spinning things in PulseAudio’s favour.

The first thing to address is the complaint that KDE people find PulseAudio a problem; I guess this has to be broken down into several separate problems. Lennart is a GTK/GNOME guy, so he obviously wrote the original tools for GTK/GNOME. For a while I was interested in writing the equivalents for KDE (3), but I never had the time; now that I have independently moved to GNOME as well, I sincerely have no intention of writing KDE tools for PA… but one has to wonder why nobody in KDE went out of their way to do this before. It’s not like it had to be part of KDE proper; an unofficial standalone application would have been fine.

There is also another problem: most of the KDE guys who do see problems with PulseAudio are most likely using Phonon with the xine-lib backend, configured to use the PulseAudio output plugin. Since I’m the one who originally wrote most of that plugin, I can say that it sucks big time. Unfortunately I haven’t had time to work on it lately; I hope I will in the future, but the two years I spent in and out of hospitals left me so seriously in debt that I’m working about 18 hours a day on average. For those who do want to use xine-lib with Pulse, I suggest taking the long route: set up the ALSA Pulse plugin, and then let xine simply use ALSA.
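As a starting point rather than a tested recipe: the ALSA Pulse plugin is the `pulse` PCM type from the alsa-plugins package (on Gentoo, media-plugins/alsa-plugins built with the pulseaudio USE flag). A minimal ~/.asoundrc (or /etc/asound.conf for the whole system) that sends the default ALSA device and mixer through PulseAudio looks like this, after which xine only needs its ALSA output selected:

```
# Route the default ALSA PCM and mixer control through PulseAudio
pcm.!default {
    type pulse
}
ctl.!default {
    type pulse
}
```

This way xine-lib only ever talks to ALSA, and its own PulseAudio output plugin stays out of the picture.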

There is of course another problem for KDE: while GNOME historically had no problem forcing in dependencies that are Linux-specific, or that mostly work only on Linux (think of HAL adoption, for instance), relying on the actual vendors to do any porting, KDE strives to work on multiple operating systems, including, as of KDE 4, Mac OS X and Windows. Now, you might like this or not, but it’s their choice; and the problem is that while there is some kind of PulseAudio support for Windows, the OS X support at least is in pretty bad shape (also on my radar).

As for distribution support, it is true that Lennart usually cares only about Fedora; you have to accept this as part of the deal, given that Red Hat is (as far as I know at least; Lennart, feel free to correct me if I’m wrong) the one vendor paying his bills. Of course we’d all love to support all the distributions at the same time, but the only way that’s possible is if multiple maintainers coordinate; I’ve been doing my best to pass all the patches upstream when I’ve added them to Gentoo, and I see Colin Guthrie from Mandriva doing the same. One thing I can “blame” Lennart for (and I told him this before, too!) is not creating a GIT branch with the cherry-picked patches he applies to the Fedora packaging for us to pick up… and the fact that he likes neither making releases nor giving others access to do so.

To be honest, there is little difference between this and what other projects do with distributions like Ubuntu when their developers are paid by Canonical. I think this is obvious: everybody looks after their own little garden first. But this is not something that should concern us, I guess. Gentoo has been quite out of the loop as far as PulseAudio is concerned, and I’m sorry, that was mostly my fault. I’m doing my best to let us update as soon as possible, but it’s not that simple, as I already explained.

Then let me just say something about Lennart’s refusal to support system mode (which has been available and advertised in Gentoo since PulseAudio entered the tree): I can’t blame him for that. First, his design for PulseAudio is about providing something that works for the desktop use case, along the lines of the Windows or OS X audio subsystems, neither of which provides anything akin to system mode. And indeed PulseAudio, by design, can handle the same situations, including multi-user setups with fast user switching. System mode exists at all only because I, for one, needed something like it on my setup and hacked it together for Gentoo; Lennart then made my life easier by implementing some extra bits in PulseAudio proper, but it was certainly not his idea.

What people usually complain about is the need for an X session (not strictly true: PulseAudio will start just fine over SSH, and it would probably even be possible to fix it up so that it tunnels audio just like you can tunnel X!), and the fact that audio stops working when X exits (also not strictly true: if your audio player is running in screen it will keep working just fine; it’s the media player crashing that makes your audio stop). Additionally, people complain about the security problem of requiring all the processes to run as the same user, rather than allowing them to run as different users, as is common with mpd.

Well, some complaints are valid, others are not: it is true that PulseAudio does not work in multi-seat, multi-user environments, at least not with a single audio device; that is unfortunate, and I don’t know if it will ever work in that situation without a system mode. It is also true that running processes as different users for privilege separation does not work without system mode. But both of these use cases stray quite far from the desktop design that PulseAudio implements; sure, they are valid use cases, just like embedded systems (the Palm Pre uses PulseAudio, if you hadn’t noticed before), but they are not what Lennart himself is interested in; at the same time, I don’t think he’d stop anyone from improving system mode support for those, as long as it didn’t require the desktop setup to make compromises.

The point is, as usual in any software design, that you have to make compromises: Lennart wants the best experience for desktop systems, and the compromise he makes is that system mode is not part of his plan and shouldn’t hold him back. At the same time, while he does get upset when people ask for support with it, and he has written about why it’s not supported, he hasn’t removed it (yet; if I were him, at this point I might have removed it out of spite!). So painting him as the master of evil doesn’t seem like the best idea, especially since it makes me picture him as Warren from the Trio in Buffy’s season six.

Oh, and a final note: it shouldn’t be surprising that Lennart and Fedora don’t care about running mpd and other services as different users; there are probably quite a few reasons for this. I cannot speak for Fedora, given I’m not involved in it, but my guesses are, first, that the ALSA dmix plugin is somewhat scary from a security point of view (for me too) because it does the mixing through memory shared between processes of different users, and second, that Fedora does a lot to use SELinux even on standard desktops. That is much tighter than separating privileges with different users, since it forces processes to behave as instructed. Unfortunately, on Gentoo the SELinux support seems, to me at least, to have gone for good.

Giving control

One of the issues I’m trying to tackle with my tinderbox is that the degree of control users get varies widely between ebuilds. I think this is a major problem in Gentoo: while a lot of users come to us for the ability to choose the flags used to build their software, lately we have been letting that slip. Not only do packages start to feature custom-cflags USE flags (or custom-cxxflags for the Qt packages), but we also strip, filter and randomly mangle flags.
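To show what “filtering” and “mangling” mean in practice, here is a toy Python sketch of the kind of transformation an ebuild applies to the user’s CFLAGS. It is loosely modelled on the flag-o-matic eclass’s filter-flags and replace-flags functions, which as far as I know accept shell-style globs; the Python functions below are hypothetical illustrations, not a port of the eclass.

```python
from fnmatch import fnmatchcase


def filter_flags(flags: str, *patterns: str) -> str:
    """Drop every flag that matches any of the shell-style glob patterns."""
    kept = [
        flag for flag in flags.split()
        if not any(fnmatchcase(flag, pattern) for pattern in patterns)
    ]
    return " ".join(kept)


def replace_flags(flags: str, old: str, new: str) -> str:
    """Swap each flag matching the glob `old` for `new`."""
    return " ".join(new if fnmatchcase(flag, old) else flag for flag in flags.split())


if __name__ == "__main__":
    cflags = "-O3 -march=native -pipe -ffast-math -fomit-frame-pointer"
    print(filter_flags(cflags, "-ffast-math", "-fomit-frame-pointer"))
    print(replace_flags(cflags, "-O3", "-O2"))
```

Each such call silently overrides a choice the user made in make.conf, which is exactly the loss of control described above; when a flag genuinely breaks a package, fixing the compiler keeps the user’s choice intact.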

Now, of course there are quite a few compiler flags that we don’t want users to enable, but as Mark has been repeating over and over, if a flag breaks a package in a way it isn’t supposed to, then we should tackle the issue at the compiler level and fix it there. On the other hand, I wouldn’t care if users with silly flags get broken software. As for the idea that upstream won’t support our users… well, they shouldn’t, to begin with; problems should first be filtered through us, if only we had enough people to work on the issues.

But even setting the flags aside, there are other issues: USE flags, debug information, installation paths, slotting, alternative software and so on. As David said in a previous post, there is no way we can test all situations beforehand, even though that would make things much easier for our users. While binary distributions have a limited set of configurations that can be tested fairly easily, Gentoo systems come in an infinite number of variations, which makes it much harder to identify all the issues beforehand (and this is without even factoring in the Gentoo/Alt project, with Gentoo/FreeBSD and the prefix support!).

I can repeat in every post that the key to proper software is testing, but that is not going to work when so many packages fail their tests, with bugs open and nobody looking at them. I am guilty of this too: there are quite a few packages I maintain for which I don’t run all the tests properly, and I have never finished the uif2iso testsuite I started working on almost six months ago! We should really start refusing to stabilize packages that fail their tests, and raise the priority of test failures for packages that are already stable.

Of course, it might well be that upstream doesn’t test enough already, and that something in our environment breaks their software; shit happens: we can track it down, and upstream can add further tests to make sure it doesn’t happen again! I’m sure lots of developers like this idea. And from reading Eric’s interview, I gather that Red Hat and Fedora are working on making more use of automated tests. Why shouldn’t we?

Okay, this is another post I wrote instead of sleeping; at least I’ve been watching Bill Maher… love that talk show!