License auditing your code

I’ve already said that I’m working on a new device, whose firmware is Gentoo-based, and it almost goes without saying that it is partly closed-source software. That’s just the way it is: while most of the software within the device is probably Free Software, the business logic is behind closed doors. People are getting used to this, and I don’t think it’s entirely a bad thing. I mean, we’re giving back to Free Software in this context in many ways: Luca is working on libav and Aravis, I’m working on Linux drivers, and together we’re working on Gentoo, so the environment as a whole is gaining something.

Of course, when you are dealing with this kind of device, you have to take care of auditing the licenses of the software you’re bundling up in the firmware, which is what I’m doing today (well, yesterday for those of you reading this, I guess). It’s not my first time at this game, and as usual my starting point is a UML package diagram.

For those not used to UML, a package diagram is a decent way to identify who makes use of what; thanks to the way UML is specified, you can use two different “stereotypes”, called import and access, which show you the execution/linking boundary quite clearly. By giving each project a package, and each library within that package a subpackage, you can easily see how things are connected.
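To make this concrete, here is a tiny sketch of such a diagram in PlantUML notation; the package names and relationships are invented for illustration, and I’m repurposing the import/access stereotypes the way described above (import for linking, access for an execution boundary), which is a convention, not something the UML spec mandates:

```
@startuml
package "firmware (closed)" as FW

package "PulseAudio" {
  package "libpulse (LGPL-2.1)" as LP
  package "libpulsecore" as LPC
}
package "pulseaudio daemon" as PD

' <<import>> = linked against; <<access>> = merely executed
FW ..> LP  : <<import>>
LP ..> LPC : <<import>>
FW ..> PD  : <<access>>
@enduml
```

Following the import arrows from the closed package immediately tells you which licenses you actually link into your binary.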

So while working through it, with the two objectives of both reducing the amount of software we had to install (I talked about that yesterday) and verifying that we don’t distribute our closed-source code linked to GPL libraries, I started noticing a few bad things. On one side, the license identification in Gentoo is shabby, but that’s nothing new; as I write this, I’m fixing a few ebuilds that report the wrong license information, for instance. On the other side, we have packages like PulseAudio that don’t let you understand their licensing in a very clear way.

In the case of PulseAudio, the LICENSE file tells you this:

All PulseAudio source files are licensed under the GNU Lesser General Public License. (see file LGPL for details)

However, the server side has optional GPL dependencies. These include the libsamplerate and gdbm (core libraries), LIRC (lirc module), FFTW (equalizer module) and bluez (bluetooth proximity helper program) libraries, although others may also be included in the future. If PulseAudio is compiled with these optional components, this effectively downgrades the license of the server part to GPL (see the file GPL for details), exercising section 3 of the LGPL. In such circumstances, you should treat the client library (libpulse) of PulseAudio as being LGPL licensed and the server part (libpulsecore) as being GPL licensed. Since the PulseAudio daemon, tests, various utilities/helpers and the modules link to libpulsecore and/or the afore mentioned optional GPL dependencies they are of course also GPL licensed also in this scenario.

[…]

Is this clear to you? It should be: libpulse is the library you implement a PulseAudio client with, and libpulsecore used to be the convenience library only used by the server… but in PulseAudio’s history, this hasn’t been the case for quite a while, with the result that libpulse requires libpulsecore, and that means that if you link GDBM into PulseAudio’s core library… you end up with a GPL’d libpulse.
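You don’t have to take the README’s word for how things are linked: the dynamic linking information of the shared object tells you directly. A quick sketch (on a real PulseAudio system you would point ldd at libpulse; /bin/sh is used here only as a stand-in binary that is guaranteed to exist, and the libpulse path is an assumption):

```shell
# List the shared objects a binary or library is linked against.
# On a PulseAudio install you would run something like:
#   ldd /usr/lib/libpulse.so.0 | grep libpulsecore
# to see whether the client library drags in the (possibly GPL'd) core.
ldd /bin/sh
```

If libpulsecore (or gdbm) shows up in that list, the LGPL story for the client library no longer holds as-is.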

This is not the case for all the libraries it uses, though: for instance, BlueZ is not loaded into the core library, so you still only have a GPL’d PulseAudio daemon and not the libraries, as intended.

What’s the catch about this? Well, it turns out that Nokia has known about this for a while, since they contributed a “simple” database as an alternative to GDBM (GPL-2) and TDB (GPL-3), which is fine for most embedded usage, if not for desktops — which is exactly what I need here… of course the ebuilds still force GDBM enabled. I’m fixing that as well.
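For the curious, fixing this on the Gentoo side mostly means making the LICENSE variable reflect the USE flags; a hypothetical ebuild fragment (the flag names are made up to match the situation described above, not taken from the real ebuild):

```shell
# The client side stays LGPL, but enabling the GDBM or TDB backends
# pulls GPL code into the build, so declare the license conditionally:
LICENSE="LGPL-2.1 gdbm? ( GPL-2 ) tdb? ( GPL-3 )"
IUSE="gdbm tdb"
```

This way Portage’s license filtering (ACCEPT_LICENSE) sees the effective license of what you actually built, not just the optimistic default.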

I’m leaving the license specification for the other USE flags to fix later; it’s a matter of time constraints for now.

I guess that every time I do this I understand how difficult license auditing is, and why people don’t like having multi-license projects or even multiple licenses doing almost the same thing. Oh well.

How far down can you strip a Gentoo system?

In my previous post I noted I’m working on a project here in Los Angeles, whose details I don’t want to get much into. What I’m going to tell you about it, for now, is that it’s a device and it’s running Gentoo as part of its firmware.

You can also guess, if you know us, that since both me and Luca are involved, there is some kind of multimedia work going on.

I came here to strip down the production firmware as much as possible, so that the image could be as small as possible while still allowing all the features we need on the device. This is not a new task for me: I’ve done my best to strip down my own router so that it would require the least amount of space possible, and I’ve also built some embedded firmware images based on Gentoo before.

The first obstacle you hit if you want to reduce the size of Gentoo is almost certainly the set of init scripts that come with OpenRC; for a number of reasons, the init scripts for things like loadkeys and hwclock are not installed by the packages that install the commands (sys-apps/kbd and sys-apps/util-linux respectively) but rather by OpenRC itself. They are also both enabled by default, which is okay for a general desktop system but fails badly on embedded systems, especially when they don’t even have a clock.

Then you have to deal with the insane amount of packages that form our base system. Without going into the details of having man, tar and so on as part of the base, untouchable system (which luckily is much easier to override with the new Portage 2.2, even if it insists on bothering you about an overridden system set), and focusing on what you’re going to use to boot the system, OpenRC currently requires a mixture of packages including coreutils, which(1) (a single command that lives in its own package, sys-apps/which… for ESR’s sake, why has it not been implemented within coreutils yet?), grep, findutils, gawk and sed (seriously, four packages for these? I mean, I know they are more widely used than coreutils, as they are often used on non-Linux operating systems, but do they really deserve their own package, each of them?).
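With Portage 2.2’s support for overriding the system set, trimming those packages comes down to a profile override; a sketch of what I mean, assuming you have already verified that BusyBox (or nothing) covers everything you drop:

```shell
# /etc/portage/profile/packages
# Entries prefixed with "-*" are subtracted from the inherited @system
# set.  Illustrative only: think hard before removing any of these on a
# regular (non-firmware) system.
-*sys-apps/man
-*app-arch/tar
-*sys-apps/which
-*sys-apps/findutils
```

After this, `emerge --depclean` no longer considers those packages untouchable, and BusyBox applets can take their place.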

The most irritating part for me nowadays is, I guess, the psmisc vs procps battle: with the latter now maintained by Debian, I wonder why they haven’t merged the former yet, given that they implement utilities for the same areas of the system… of course one could wonder why they are not all part of util-linux anyway — yes, I know that Debian also supports GNU/kFreeBSD with their packaging. At any rate, there is another consideration to be made: only the newer procps lets you drop support for the ncurses library (earlier versions depended on it unconditionally), and the same is still true for psmisc.

For what it’s worth, what I decided to do was to replace as much as possible with just BusyBox, including the troublesome psmisc, so that I could drop ncurses from our firmware altogether — interestingly enough, OpenRC depends explicitly on psmisc even though it does not bring in most of the rest of its dependencies.

Public Service Announcement: if you’re writing an init script and you’re tempted to use which, please use type -p (or better, command -v) instead… depending on an extra program when sh already has its built-in is bad, ‘mkay?

Edit: people made me notice that type -p is not in POSIX, so it does not work in Dash. I’m afraid that my only attempts to run OpenRC without bash so far have used BusyBox, which supports it just fine; the option of using command -v is valid though, thanks Tim.
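To show the difference in practice, this snippet sticks to POSIX constructs only, so it behaves the same under bash, dash and BusyBox ash:

```shell
# command -v is a POSIX shell built-in: it resolves a name to a path
# (or reports it as a built-in) without executing any external program
# such as which(1).
if command -v sh >/dev/null 2>&1; then
    printf 'sh resolves to: %s\n' "$(command -v sh)"
else
    printf 'sh not found\n'
fi
```

Unlike `type -p`, this works even when /bin/sh is dash, which is exactly the case an init script has to care about.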

Oh right, of course to replace coreutils usage with BusyBox you have to be able to drop it out of the dependency tree. Sounds easy, doesn’t it? Well, the problem is that even if you’re not dealing with PulseAudio (which we are), which brings in eselect-esd, as of yesterday at least every package that could use Python would bring in eselect-python! Even when you were setting USE=-python.

Fortunately, after I bitched a bit about it to Luca, he made a change which at least solves the issue at hand until the stupid eclass is gone from the tree. And yes, I’m no longer even trying to consider it sane; the code is so hairy and so cryptic that you can’t make heads or tails of it.

There are more issues with a project like this that I can discuss, outside of those parts that are “trade secret” (or rather, business logic), so expect to hear a bit more about it before it’s announced in full. Many of these have to do with how easy (or not) it is to use Gentoo as a basis for devices’ firmware.

Anybody hiring me for PAM?

This post might sound like a nasty plug, but I’m really doing this because it seems like the only solution up to this point.

In the past few days some trouble came up on the PAM side again. Let me try to put this into perspective: while nowadays I can actually find some use in knowing how PAM works, I joined the PAM team four years ago, while working on Gentoo/FreeBSD, because I needed the configuration files migrated to a format that worked there as well as on Linux. Since then, Azarah went missing and the whole of PAM was shoved onto my back. Nowadays, I maintain the Linux-PAM package and a bunch of random PAM modules, and should oversee the general PAM configuration in Gentoo.

Unfortunately, this also requires a lot of coordination skills, and time to do the coordination: maintainers of other PAM modules, and maintainers of packages that use PAM themselves, should talk with me about the default configurations and the like; instead I’m usually reactive on that matter. And that is, as you might guess, not the best of experiences, nor the easiest of tasks.

I have written before about the need for a new pambase, and it is now obviously needed to actually implement proper support for multiple authentication methods like Kerberos, LDAP, PKCS#11, YubiKey, … I have a few ideas on how to solve this, namely replacing the current situation with a few predefined, hidden chains (.gentoo-session-minimal, .gentoo-session-console, .gentoo-session-graphical) wrapped by the system-* series of chains, all generated with M4 rather than the current C preprocessor (which lacks any kind of arithmetic capability).

But even more than fixing pambase, there is the need to review the packages that use PAM. A few days ago, Samuli complained to me that ConsoleKit was not being executed properly on login(1) — it turned out the problem was that /etc/pam.d/login was not calling back into system-local-login as it was expected to. The root cause was that the modified PAM chain file had been replaced with the previous one (which didn’t use that chain) after the major bump of sys-apps/shadow, when it was picked up by Debian. Dated 24 Feb 2008. Over two and a half years ago.
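For reference, the expected wiring is simple; a simplified sketch of what a pambase-style /etc/pam.d/login is supposed to contain (module lists trimmed, this is not the verbatim file):

```
# Every management group defers to the system-local-login chain; if a
# package bump ships a file without these includes, anything hooked into
# system-local-login (such as ConsoleKit's session registration)
# silently stops running.
auth       include    system-local-login
account    include    system-local-login
password   include    system-local-login
session    include    system-local-login
```

A login file that lists modules directly instead of including the chain is exactly the kind of regression described above.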

While I cannot shed the fault of missing the revert, why did I miss it? Simple enough: Portage’s confmem feature never told me that /etc/pam.d/login had changed from the one I had before. It assumed that my local version was a modified one, and thus accepted that one as the good one.

Now this makes the second revision bump and second stable request that I have to take care of to fix PAM-related trouble; the previous one, back in July (for the bump) and last month (for stabling), related to the chpasswd chain, which had been broken for, well, almost the same amount of time as this one.

While fixing another ConsoleKit problem, bug #342345, I found that the GDM and KDM chains are not compatible with pambase, and they both need more fine-grained control over the sessions (console and graphical sessions have different needs; in the latter case we have to skip the motd/mail/lastlog modules).

A quick check around on the tinderbox told me that there are a number of PAM chain files that should be cleaned up, reduced, optimised and so on. And that the number of files there does not correspond with the number of files installed.

Basically, what we need now is an audit of all the PAM-using packages in the tree, beside the improvement of pambase as stated above. We’re talking of a month or two worth of work. Not something I’d do myself in spare time, not something I can do as it is during work time. It’s not simply a matter of writing what I did for the original pambase; it’s a work more in line with the Ruby NG situation, and that one we haven’t completed just yet with three people working on it, including a bunch of work time thrown in by me for a few jobs I took during the year.

So here’s the catch: nobody has helped me with PAM in years, and while Constanze is ascending to developer status, I know she’s also pretty busy with her thesis, so I cannot easily ask her to commit enough time to lift enough work off me. This means that we can either keep the current status quo, just band-aiding through enough troubles to keep it running, or somebody has got to help me, either with work or with funding. As I said, I’m already losing money on the tinderbox and I don’t want to lose time, sleep and (possibly) money working on something as thankless as PAM.

Don’t get me wrong, I’m not asking for donations here; I’m asking to be paid to do a job, and that job is the auditing and review of (a part of) the PAM-using ebuilds. If you’re using Gentoo (and PAM) in production, you might be interested in hiring me to get this out of the way. I’m not even asking for much: €1500/month, one to three months’ time (depending on the depth of the work you’d want to fund); you can provide the agreement details and give me a list of priority programs to work on (those that you use in your organisation). On the same terms, I’m willing to help you package new software that is not currently in Portage in the spare time.

Let me know by mail if you’re interested. Extra points if you use GPG-encrypted email because that stuff doesn’t get sent to spam.

Linux Containers on Gentoo, Redux

I’ve got a few further requests for Linux Containers support in Gentoo, one of which came from Alex, so I’ve decided to try updating the current status of the stack, and give a decent description of what the remaining problems are.

First of all, the actual software: I’m still not fond of the way upstream is dealing with the lxc package itself; the build system is still trying to be smart and managing to be stupid, in particular with regard to the LXC library. There are still very few failsafes, and there isn’t really enough in place to manage LXC with the default tools as they are. While libvirt should support LXC just fine, I haven’t found the time to try it again and see if it works; I’m afraid it might only work if you use the forced setup that Red Hat uses for LXC… but again, I cannot say much until I find the time to try it out and tweak it where needed.

A note: as I stated before, a lot of the documentation and tutorials regarding libvirt only apply to Red Hat or Fedora. I can’t blame them for that (they do the work, after all), but it often means that we have to adapt them, or at least find a way to provide them with the pieces they expect in the right place. It requires a lot of my time to do that.

I’ve finally added my “custom” init script to the ebuild, with version 0.7.2; it might change further, with or without a revision bump, as I fix reported bugs. It should mostly auto-configure: the only requirement it has is that you symlink it to lxc.container to start the container defined in /etc/lxc/container.conf; it auto-detects the root path (so it won’t force a particular filesystem layout on you), and it works with both 32- and 64-bit containers transparently, as long as there is a /sbin/init command (which I might have to change for systemd-based distributions at some point). What I now realise it lacks is support for detecting the network interface it uses and requiring that it be started; I can add that at some point, in the meantime use /etc/conf.d/lxc.container and add rc_need="net.yourif".

For what concerns networking, last I checked, with the lxc 0.7.1 userspace and kernel 2.6.34, the macvlan system still isolated the host from the guests, which might be what you want but is definitely not what I care for. I’m guessing this might actually be by design; at any rate, even though it’s technically slower, I find myself quite comfortable with using a Linux-based bridge as the main interface, bridging together the virtual Ethernet device of the guest with the physical interface(s) of the host. This also works fine with libvirt/KVM, so it’s not a bad decision in my opinion. I just added 0.7.2, but I can’t see how that would make a difference, as macvlan is handled in the kernel.

Thankfully, doing so with Gentoo’s networking system (which Roy wanted to deprecate, tsk!) is a piece of cake: open /etc/conf.d/net, rename config_eth0 to config_br0, then add config_eth0="null" and bridge_br0="eth0", run ln -s net.lo /etc/init.d/net.br0, and use that for bringing the network up. Then on the LXC configuration side you have

lxc.network.type = veth
lxc.network.link = br0

and you’re all set. As I said, a piece of cake.
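Spelled out, the /etc/conf.d/net side of the recipe above looks something like this (the dhcp line is an assumption; use whatever configuration eth0 had before):

```shell
# /etc/conf.d/net
config_eth0="null"     # the physical NIC carries no address of its own
bridge_br0="eth0"      # enslave eth0 to the bridge
config_br0="dhcp"      # the bridge takes over the host's address
```

Plus the ln -s net.lo /etc/init.d/net.br0 symlink already mentioned, so that net.br0 can be added to a runlevel.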

Slightly more difficult is handling the TTY devices properly; some people prefer to make use of the Linux virtual terminals to handle LXC containers. I sincerely don’t want LXC messing with my real virtual terminals; I prefer using the lxc-console command to access a container without networking. Especially since it messes up a lot if you are using KMS with the Radeon driver (which is what I’ve been doing for the past year or so).

For this to work out, I noted two things: the first is that simply using the cgroup access control lists on the devices doesn’t help all that much (I actually haven’t tried to set them up properly just yet); on the other hand, LXC can create “pseudo-TTYs” that can be used with lxc-console. The default number (9) does not work all that well, because the Gentoo init system sets up twelve virtual terminals by default. So my solution is to use my custom static device tarball and the following snippet in the configuration:

lxc.tty = 12
lxc.pts = 128

This ensures that the TTY devices are all properly set up, so that they don’t mess with your virtual terminals, and lxc-console works like a charm in this configuration.
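Putting the networking and TTY pieces together, a minimal /etc/lxc/container.conf might look like this (the container name and rootfs path are assumptions; adjust to taste):

```
# /etc/lxc/container.conf
lxc.utsname = container
lxc.rootfs = /var/lib/lxc/container

lxc.network.type = veth
lxc.network.link = br0

lxc.tty = 12    # match Gentoo's twelve default virtual terminals
lxc.pts = 128
```

With that in place, the init script symlinked to lxc.container picks up the container definition and lxc-console gets its own devices.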

Now, the sad part: OpenRC is not yet stable, and I haven’t yet fixed the NFS bug I found (you stop NFS inside the container and the host’s NFS exports are destroyed… bad script, bad script!). On the other hand, I’m not using LXC daily any longer, for the simplest of reasons: the tinderbox is set up as I wish already, for the most part, so I have little to no incentive to work more on this. The good news is, as I said, I’m up for hire for what concerns Ruby. So if you really want to use LXC in production and want me to improve whatever Gentoo-related area around it, including libvirt, you can just contact me.

Beside that, everything should be in place. Have fun working with LXC!

Really want Ruby 1.9 generally available? Read on.

Gentoo currently does not offer Ruby 1.9 to users directly; there are a number of reasons for that, which can be summed up in what Alex described as “not pulling a Python 3 on our users”. Right now, there are next to no packages that need Ruby 1.9, and a lot that do not even work with it. While a minority nowadays, a few won’t even work if it’s installed alongside 1.8, let alone configured as the primary Ruby provider.

Me, Alex and Hans have been working for a long time to find a solution, and since last year the definitive solution seems to be Ruby NG, which I originally started in May 2009 after having trouble keeping this very blog alive on the previous vserver — which nowadays only hosts the xine bugzilla.

The road has been uphill from there, as the three pages of posts tagged RubyNG on this blog document: trouble with ideas and implementations, compatibility problems, a huge web of dependencies between packages, various fixes; all of it makes the road to Ruby 1.9 quite difficult for us packagers. At the same time, we’ve been doing our best to ensure that what users are given is proper software, of good quality. Maybe it’s because I’m deeply involved with QA, maybe it is because I’m not writing production software daily, but I still think that we shouldn’t be handing out half-assed software easily, just for the sake of it.

That means that most of the time we either don’t add support for Ruby 1.9, or we dig deep into fixing the underlying issues to make sure that the software will work upstream, and not just in Gentoo (otherwise there could be nasty surprises, like some I got, where an application works perfectly fine locally, where software is installed through Portage, and fails on Heroku, which uses plain RubyGems). You can tell how much of a PITA that can be by looking at my GitHub page — it lists mostly Ruby packages that I had to “fork” (branch, actually) to get fixes in; mostly they have been merged upstream, though sometimes they are dead in the water.

All of this makes the situation quite complex; while I sort-of enjoy working with Ruby and these things, I also noted that it takes a very long time to get the whole dependency web tested and fixed… and it’s the sort of time that, in my personal free time, I just don’t have. I have been packaging (and thus testing and fixing) a few packages that I triaged for a few job tasks, and some that I’m still using, on paid work time, but that can’t stretch to cover every package out there. I guess the same goes for Alex, Hans and Gordon.

What’s the bottom line? Well, Hans in particular has done a huge amount of work porting ebuilds from the old gems.eclass to ruby-fakegem.eclass so that they can be installed when Ruby 1.9 is present without messing it up, even though they won’t work with it. This brings the day we can unmask it much nearer. But there are quite a few cases where we can’t drop the old version so easily, mostly relating to non-gem bindings and the use of Ruby as a scripting engine (rather than adding support for a library to Ruby itself). And this is without counting further issues, like Bundler not working altogether too well because it lacks dependency information, or getting RubyGems to refuse to mess with the Portage-installed gems altogether (which is now much more feasible than before, since we no longer use the gem command from within Portage to install the stuff).
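To give an idea of what the porting work looks like, here is a hypothetical skeleton of a ruby-fakegem.eclass ebuild (the package and variable values are invented for illustration, not taken from a real ebuild):

```shell
EAPI=2

# Only Ruby 1.8 is known to work, so the gem is built for that
# implementation alone; the eclass still keeps the install from
# stepping on an installed Ruby 1.9.
USE_RUBY="ruby18"

RUBY_FAKEGEM_TASK_TEST="spec"
RUBY_FAKEGEM_EXTRADOC="README.rdoc"

inherit ruby-fakegem

DESCRIPTION="Hypothetical example gem"
HOMEPAGE="http://example.org/"
LICENSE="MIT"
SLOT="0"
KEYWORDS="~amd64 ~x86"
IUSE=""
```

Once a package has been tested on 1.9, adding it is mostly a matter of extending USE_RUBY, which is the whole point of the porting effort.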

So what can you do to get this sooner? You can help out by making sure packages work with Ruby 1.9; when they have been positively tested not to work on that version, they are usually marked as such in the ebuild itself. For my part, I always note the problems with a Unicode right-pointing arrow, so running an fgrep command on the tree for “ruby19 →” should give you a very good idea of how many problems there are out there (and how many different kinds of problems).
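A self-contained demonstration of the convention (using a throwaway directory rather than the real tree; on a live system you would point the search at /usr/portage instead):

```shell
# Create a fake ebuild carrying the "known broken on 1.9" marker, then
# search for it the same way you would on the Portage tree.
tree=$(mktemp -d)
cat > "$tree/example-1.0.ebuild" <<'EOF'
# ruby19 → tests fail with encoding errors
EOF

# -F: fixed string, -r: recursive; every hit is a package to look into.
grep -rF 'ruby19 →' "$tree"

rm -rf "$tree"
```

Each matching comment documents why the package is held back, which is exactly the starting point a would-be contributor needs.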

You have no idea where to start with this thing? There is another option: hire me. Well, I would have liked to say “hire us”, but it turns out both Alex and Hans are not looking to be hired for this, while a project of mine is being delivered this week, after which I have some extra time for the next few months. I wouldn’t mind being paid to work full-time on getting Ruby 1.9-ready packages into the tree. I’m a registered freelancer in Italy, so I have a European VAT ID and can issue proper invoices, so it’s all going to be clear in the books. If you’re interested, you can contact me to discuss pricing and the amount of work you’re looking for.

Just please, stop harassing the team because we’re not as fast as you’d like us to be… we’re already doing a hell of a job in a hell of a hurry!

Who does the anti-corporatism feeling serve?

I have, as a Free Software developer and enthusiast, a particular dislike for the anti-corporate websites, and the general anti-corporate feeling that seems to transpire from some of the communities that form around so-called “Free Software Advocates”. You probably know that already if you read me frequently.

In the past few days I have again been in open contrast with those trying to spread “hyperboles”, which I’d sincerely call “sensationalistic name-calling”. Similarly to another recent point, this started with one statement by Carlo Piana, who asked people to stop calling “piracy” what is actually unauthorised copying. I do agree with his rationale that it shouldn’t be called that, but I’m a pragmatist: I live in this world, and like it or not, the word “piracy” as a synonym for “unauthorised copying” is an unfortunate reality. Given that, you have two choices:

  • keep trying to get people to use the “right term” ever and ever — the so-called GNU/Linux method;
  • use their own weapons against them and (as I suggested) call piracy the disregard for copyleft licenses like the GNU GPL (note my use of words here: copyleft licenses; disregarding MIT and BSD is definitely much harder and yet they are Free Software licenses).

As I said, I’m a pragmatist, so it’s nothing new that I’d go with the second choice. But too many people either still think they can change the world with negative activism, or at least pretend to, and suggested calling everything proprietary “piracy”… facepalm moment, gals and guys.

I still think that most, if not all, of the people involved in anti-corporatism who pretend to care for Free Software have no idea of what kind of effort is needed to create and maintain Free Software. Sure, they might not want to be paid for what they do, and they might have a different kind of job, so that they can do their work without “dirtying their hands” with proprietary software and proprietary vendors; but most of us write software for a living, and usually the money does not come from writing just pure Free Software — you rather have to compromise.

This does not mean that there is no business case for Free Software; we know that a number of companies out there do mainly Free Software and can make money and pay developers to do their work, but they don’t make enough money to pay all of the people out there without at least partly compromising, leaving part of their business logic out of Free Software. Nokia and Intel, Sun before and Oracle both before and now, Canonical and Red Hat, SuSE and even Apple… they all contribute a lot to Free Software, and yet their main businesses vary widely, with only a couple of cases being mainly Free Software! Google, Yahoo and Facebook also work on Free Software, publish new code, pay for maintenance of what is already there… yet they are not, mostly (or originally), even software houses.

If Free Software required people not to be employed by companies producing any kind of proprietary software, the number of developers would be much, much smaller. Not everyone lives alone; many have a family to maintain, some have further complications, and most don’t live like a hippy the way Richard Stallman seems happy to. So what’s the solution? A few people, including the FSF last I checked, insist that if Free Software won’t pay for your living you can get another job, or settle for a lower wage… but again, that is not always possible!

Do these activists put their money where their mouth is? I sincerely doubt it, as they most likely have no idea of how people sustain themselves in this environment while still working on Free Software. I’ll offer myself as an example, but I’m sure there are situations more complex than mine (and quite a few that are easier, but that’s beside the point).

I don’t have to pay rent, I’m lucky, but I’m still not working for myself alone, as I live with my mother and she’s not working. I have bills to pay each month, unhelped, comprising phone, Internet and power, all three of which are needed for my Free Software work as well as for my “daily job” and my general living. I obviously have to buy food and general home supplies, and at the same time I have hardware to maintain, again for all three cases. I have had a few health troubles, and I still have to both keep myself in check and be ready in case something else happens to me. I could do without entertainment expenses, but that would most likely burn me out, so I count those as an actual need as well.

In all of this, how much of the money I get derives directly from Free Software? I’ll be honest: in the past five years, donations would probably have covered three or four months of basic needs, without any savings. And mostly, that was covered by a handful of regular contributors. And before you tell me I should feel ashamed for having said this, I wish to say that I’m still very thankful to everybody who ever sent me something, be it a five-euro donation, a flattr click, a book, a hardware component, or a more substantial money donation. Thank you all! Those are the things that let me keep doing what I do, as I feel it’s important to somebody.

I have written a few articles for LWN, but even that only covered part of what I needed; the main reason is that, being a non-native speaker, the time I need to write a proper article is disproportionate. Again, this is not to say that LWN does not pay properly – they actually pay nicely – it’s my own problem that I’m not able to make a proper living from it. I actually tried finding a magazine in Italy that would pay me to write, getting rid of the language barrier, but the only one that ever published something (and the first article was an unpaid try) was the Italian edition of Linux Journal, which stopped publishing a couple of months later. Oh, and by the way, this kind of work is also considered “proprietary work”, as articles, and most books, are as far as I know not usually licensed under Creative Commons or otherwise Free licenses.

So if neither my pure Free Software work nor my writing about it pays the bills, what am I to do? I considered for a while getting a job at the nearest Mediaworld (the Italian name of the German chain Mediamarkt), selling consumer electronics. I could do that, but then I probably wouldn’t be willing to contribute to Free Software in my spare time. What I actually do instead is work for companies that either make proprietary software (web software, firmware, or whatever else) or that commercialise Free Software (sort of; that’s the case for LScube for the most part). When I do, though, it often ends up with me working at least on the side for Gentoo, or Free Software in general.

I have already described my method a few months ago; I would like to note that a lot of my work on Ruby ebuilds in Portage has been done on time paid for by some of my jobs, and the presence of gdbserver in the tree is due to a customer of mine having migrated to a Gentoo-based build system (to replace buildroot), with gdbserver to be loaded into their firmware. A lot of the documentation I wrote is also related to that, as is my maintaining of Amazon EC2 software, …

And before this can be mistaken… I have received more than a few job offers to do Free Software work. Most I had to turn down, either because they required me to go too far out of my way, or because of bad timing (I’m even currently in the middle of something). I also turned down Google, repeatedly, because I have no intention of ever moving to the USA, because of my health troubles. The best offer I had was from a very well-known Rails-based hosting company; I was actually very interested in the position and would have accepted even a lower wage than what I was offered, especially since Gentoo was very much part of the responsibilities, but they never followed through; twice.

So anyway, what has all of this to do with the original statement, and with my problem with anti-corporatism? Well, as I said, most of my customers are using Free Software to develop appliances and software whose business logic is still proprietary. It’s better than nothing, in that they are still giving me money to keep doing what I’ve done for the past five years and counting. But at the same time, they are wary about Free Software: if they were to think (as a few already do) that Free Software is either too amateurish, or trying to undermine their very existence entirely, they might decide that their money should not be spent furthering those ideas.

And nothing is more dangerous than that, because if there is something that Free Software in general needs more of, it is competent people being paid to work mainly on Free Software. And the money is often in the hands of those companies that you're scaring away with your "Fight da man" attitude; the same companies that Microsoft did its best to spread FUD to, regarding Linux and the Free Software and Open Source movements. I'd be surprised if there is nobody in Microsoft's offices right now gloating at how the so-called "Advocates" are doing their best to isolate Free Software from the money it needs.

Ah yes, and I was forgetting to say: if you don't think that money is important for Free Software… take a hike and don't even try commenting; I will be deleting such inane and naïve comments.

Adding to the tree for once

You're probably used to me sending last rites for packages on behalf of QA, and thus removing packages (or, in the past, just removing packages without the QA part). It often seems like my final contribution to Gentoo will be negative, in the sense that I remove more than I add.

Right now, though, I’ve been adding quite a few packages to the tree, so I’d like to say that no, I’m not one of those people who just like to remove stuff and who would like you to have a minimal system!

So with some self-advertisement (and a shameless plug while I’m at it…) I’d like to point out some of the things I’ve been working on.

The first package I'd like to highlight is the newly-added app-emacs/nxml-libvirt-schemas: as I ranted before, I wanted to have syntax completion for the libvirt XML configuration files (I still maintain they should have had either a namespace, a doctype or a libvirt-explicit document element name), so now that Daniel and I got the schemas to a point where they can be used to validate the current configuration files, I've added the package. It uses the source tarball of libvirt, with the intent of not depending on it; I'm wondering if it'd be better to use the system-installed Relax-NG files to create the specific Relax-NG Compact files, but that'll have to wait for 0.7.5 anyway (which means, hopefully, next week).

The second set of packages is obviously tied in with the Ruby-NG work and consists of a few new Ruby packages; some were brought in from the testing overlay I built to try out the new packages, others have been brought in as dependencies of packages being ported to the new eclasses, and one (addressable) is a dependency of Typo that I hadn't been installing through Portage lately. I should probably add that I'm testing the new ebuilds "live" on this blog, so if you find problems with them, I'd be happy to receive a line about that.

The third set of packages relates to a job I'm currently doing for a long-time customer of mine, a company developing embedded systems; I won't disclose much about the project, but I'm currently helping them build a new firmware, and I'm doing most of my job through the use of Portage and Gentoo's own ebuilds. For this reason I have already added an ebuild for gdbserver (the small program that allows for remote debugging) that makes it trivial to cross-compile it for a different architecture, and I'm currently working on a gcc-runtime ebuild (which, if I get it right, would also be pretty useful for remote servers, like my own, to avoid having to install the full-blown GCC while still having the needed libraries).

And tied to that same work, you'll probably find a few cross-compilation fixes landing both in and out of Gentoo; I have some GCC patches that I have to send upstream, and some changes for the toolchain eclass (right now you cannot really merge a GCJ cross-compiler, or even build one for non-EABI ARM).

So this is what I'm currently adding to the tree myself. I'm also trying to help the newly cleaned-up virtualization team handle libvirt (and its backports) as well as the GUI programs, and I would be helping Pavel get gearmand into shape if I had more time (I know, I know). And this is in addition to the obvious tinderbox work, which is still going on and on (identi.ca proves it, since the script is denting away status updates continuously), and the maintainer work for things like PAM (which I bumped recently and need to double-check for uclibc).

So now, can you see why I might forget about things from time to time?

Gentoo developer up for hire

Although I have a job interview this afternoon (for a Java programmer position, no less), I'm currently just working as an external contractor, and thus I don't have any kind of job security, as you can guess from my ranting about not having been able to pay for Yamato just yet.

But this is not a request for help; it's rather an offer: if anybody is looking to hire me for some job related to Free Software, be it related to Gentoo, embedded software, ELF files or the like, I've written up something here.

You can guess that if I'm hired for Gentoo-related jobs, or to extend Free Software projects (things I'm experienced with, and good at), it's going to be a contribution to the community too.

Now, this weekend I’m probably going to write a bit more about Gentoo automation and other topics, so please bear with me with this little “advertisement break” while I finish my other writings.

XML misuses

XML, the eXtensible Markup Language, is probably one of the data description formats most hated among developers, especially open source developers. It is also, lately, one of the most used.

A lot of projects nowadays rely on XML: as the backend for configuration systems (think GConf), as the feed format for blogs, as the document format of both OpenOffice and Microsoft Office, and as the transport used by modern web applications and for RPC between web services.

It is hated a lot because it is often misused. Look at the XML configuration files of fontconfig: they tend to be quite unreadable. Indeed, XML is not so much designed to be human-readable as to be easily parsable by very different software. This is one good reason to hate it. Then add stuff like SOAP and WSDL and you can see how XML can really get out of hand.

XML is a very good way to make it simpler for different software to share data, as you can easily add more data to a given format without having to rewrite the parser, and, as long as you follow some design rules, it is also easy to keep backward compatibility with very old versions. It is also good for converting structured data between formats; think DocBook, XHTML and our very own GuideXML. I also use a variant of that for my site, even though I never actually formalised and published it. One day I'll do that, too.
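That backward-compatibility property is easy to demonstrate: a parser that only queries the elements it knows about keeps working when a later revision of the format adds new elements. A minimal sketch, with a hypothetical "book" format of my own invention:

```python
import xml.etree.ElementTree as ET

# Two hypothetical revisions of the same document format: the second
# adds an <isbn> element that the first revision knew nothing about.
v1 = "<book><title>Example</title></book>"
v2 = "<book><title>Example</title><isbn>978-0-00-000000-0</isbn></book>"

def read_title(doc):
    # A parser written against the first revision: it only looks up the
    # elements it knows, so later additions are silently ignored.
    return ET.fromstring(doc).findtext("title")

print(read_title(v1))  # Example
print(read_title(v2))  # Example -- the new element does not break the old parser
```

The same trick is what keeps, say, a ten-year-old feed reader able to consume feeds that have grown new elements over time.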

Binary formats are not easily extensible, although EBML (the format upon which the Matroska multimedia container is based) tries to be, at the cost of a huge amount of complexity. They are much nicer to deal with when you have a lot of data to transmit and little of it has to be understood by humans, which is why I will always find UPnP over-engineered, and its use of XML a poor choice.

Text-based formats like the INI format are, in my opinion, better suited for configuration files. This is especially true since there are quite a lot of libraries that implement an easy way to parse them without reinventing the wheel; it should also be trivial to write a simple command that can be used to parse them in bash, if there isn't one already.
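To show how little ceremony an INI parser needs, here is a sketch using Python's standard-library configparser; the section and key names are made up for the example:

```python
import configparser

# A hypothetical daemon configuration in INI format.
sample = """
[daemon]
port = 8080
logfile = /var/log/exampled.log
"""

config = configparser.ConfigParser()
config.read_string(sample)

# Typed accessors make the values immediately usable.
print(config.getint("daemon", "port"))  # 8080
print(config.get("daemon", "logfile"))  # /var/log/exampled.log
```

Compare that with the boilerplate needed to walk an XML tree just to read two settings, and the appeal for configuration files should be obvious.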

But this post was supposed to be about XML, right? So is its misuse as a configuration file enough to make XML the most blamed format out there? Maybe, but it’s certainly not the only reason I can find.

Another reason, to me more important, is the "almost XML" formats. What is an "almost XML" format? It's simply a format that is based on XML but is not quite XML. In this category I'd put the ASX format. Even if Wikipedia defines it as an XML data format, the truth is that most ASX files use a bastardised XML:

  • closing tags become optional, just like in HTML;
  • tag and attribute names are case-insensitive, just like in HTML.

Parsing an ASX file is quite a bit more of a problem than parsing a true XML file, and it caused quite a few problems with xine-lib (and its frontends), as it stops us from just reusing a parser like the one in libxml2, and it's easy to make mistakes while reinventing the wheel.
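To illustrate the problem (with Python's standard library standing in for libxml2, and a made-up stream URL), a strict XML parser rejects a typical ASX fragment outright, while a lenient HTML-style parser copes with it:

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# A typical ASX fragment: mixed-case tag names and an unclosed <ENTRY>.
asx = '<ASX Version="3.0"><ENTRY><Ref HREF="mms://example.com/stream"/>'

# A strict XML parser refuses the document as not well-formed.
try:
    ET.fromstring(asx)
except ET.ParseError:
    print("not well-formed XML")

# A lenient, HTML-style parser copes: html.parser lower-cases tag and
# attribute names for us, which happens to match ASX's case-insensitivity.
class ASXRefExtractor(HTMLParser):
    def __init__(self):
        super().__init__()
        self.refs = []

    def handle_starttag(self, tag, attrs):
        if tag == "ref":
            self.refs.extend(v for k, v in attrs if k == "href")

parser = ASXRefExtractor()
parser.feed(asx)
print(parser.refs)  # ['mms://example.com/stream']
```

This is roughly the position a player like xine-lib is in: it has to carry a hand-rolled lenient parser around, instead of just handing the file to libxml2.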

Yesterday (actually, today for me; if you couldn't tell before, I started writing blog posts in advance and publishing them the following day, at about noon in my timezone) I ended up working on another "almost XML" format. This time the format is declared as XML, and it even has an .xml extension, but it is not described by a DTD (or an XML schema), it features redundant fields and unused fields, and… it is not parsed as XML.

To be precise, I'm writing software that writes these files, and I can tell you they are written as XML, and also read as XML on my side; I actually use libxml2 to do the work. But the consumer of those files does not treat them as XML. Instead, it expects the file to be formatted in a precise way, with the line count always staying the same, which means that I have to keep the existing comments, I can't add more comments, I can't drop unused elements, and so on.

Now I can tell why there is so much hate for XML around.