Bad branching, or a Quagga call for help

You might remember that for a while I worked on getting Quagga in shape in Gentoo. The reason I was doing that is that I needed Quagga to get the ADSL PCI modem I was using at home to work. Since right now I’m on the other side of the world, and my router decided to die, I’m probably going to stop maintaining Quagga altogether.

There are other reasons as well, which is probably why for a while we had a Quagga ebuild with invalid copyright headers (it was a contribution of somebody working somewhere, but over time it had been rewritten to the point that it didn’t really make sense not to use our standard copyright header). On one side there’s the bad state of the documentation, which makes it very difficult to understand how to set up even the most obvious of situations, but the main issue is the way the Quagga project is branching around.

So let’s take a step back and look at one thing about Quagga: when I picked it up, there were two or three external patches configured by USE flags; these are usually very old, and they are not included in the main Quagga sources. They are not minimal patches either: they introduce major new functionality, and they are very intrusive (which is why they are not simply always included). This is probably due to the fact that Quagga is designed to be the routing daemon for Linux, with a number of possible protocol frontends connecting to the same backend (zebra). Over time, instead of self-contained, easily out-of-date patches implementing new protocols, we started having whole new repositories (or at least branches) with said functionality, thanks to the move to GIT, which makes it all too easy to fork, even if that’s not always a bad thing.

So now you get all these repositories with extra implementations, not all of which are compatible with one another, and most of which are not supported by upstream. Is that enough trouble? Not really. As I said before, Paul Jakma, who’s the main developer of the project, is of the opinion that he doesn’t need a “stable” release, so he only makes releases when he cares to, and maintains that it’s the vendors’ task to maintain backports. In that spirit, some people started the Release Engineering for Quagga, but …

When you think about a “Release Engineering” branch, you think of something akin to Greg’s stable kernel releases: you take the latest version, and then you patch over it to make sure that it works fine, backporting the new features and fixes that hit master. Instead, what happens here is that Quagga-RE forked off version 0.99.17 (we’re now at 0.99.21 on master, although Gentoo is still on .20 since I really can’t be bothered), and they are applying patches over that.

Okay, so that’s still something: getting the backports from master onto a known-good revision is a good idea, isn’t it? Yes, it would be a good idea, if it wasn’t that … they are actually new features applied over the old version! If you check, you’ll see that they have implemented a number of features in the RE branch which are not in master… with the result that master is neither a super-set nor a sub-set of the RE branch.

Add to this that some of the contributors of new code don’t seem to have a clear idea of what a license is, causing discussions on the mailing list over the interpretation of the code’s license, and you can probably see why I don’t care about keeping this running, given I’m not using it in production anywhere.

Up to now I kept caring about this, knowing that Alin was using it and co-maintaining it … but now that Alin has been retired, I’d be the sole maintainer of a piece of software that rarely works correctly, and is schizophrenic in its development, so I really don’t have extra time to spend on this.

So to finish this post with a very clear message: if you use Gentoo and rely on Quagga for production usage, please step up now, or it might just break without notice, as nobody is caring for it! And if a Quagga maintainer reads this: please, please start making sense with your releases, I beg you.

Software is [fF]ree just as long as nobody cares about it

While the title might sound inflammatory, I don’t think there is any way around this: there is not going to be any software that is entirely free of every constraint unless nobody but its author cares about it. The moment enough people care about a particular piece of software, you have to apply constraints to it, usually in the form of a license. Sure, the license could be a Free Software license, or an Open Source license, or both at the same time, but the very fact that you have to apply a license to it is usually enough to piss off enough people into saying that your software isn’t really free.

I’m one of those people who really can’t be told that there will ever be a “year of the Linux desktop”; I think that one of the reasons for that is that each year we witness the same amount of controversy in Free Software; enough controversy that even your average daily political issue wouldn’t snub it.

It was a year ago that I wrote about Sony dropping Linux support, and we still see people – declaring themselves Free Software advocates – defending the cracker and the “right” of people to simply copy games that weren’t designed to be free in any way to begin with. I have quite a personal grudge against these people, for the simple reason that if they really cared about Linux on the PS3 they would have contributed to the work, rather than hiding behind that excuse just to avoid paying the authors of content that was designed to be paid for. By accepting this position, the Free Software community is actually showing itself quite immature: “You should follow my (Free) license, but I won’t follow yours!”.

But if last year we had the “defection” of Sony and Oracle, this year started with a pissing match between Canonical and RedHat followers and the rest of the “community”. And before somebody asks: I have a personal dislike for the way most of the Canonical deals go, and I happen to agree with Jürgen about Jono’s usual task of moving the spotlight away. But while I have a personal preference for RedHat’s methods – and count a number of colleagues in their rank and file whom I don’t hesitate to consider more than simple work acquaintances – that doesn’t stop me from thinking that the kernel patch move is quite obnoxious for the community at large.

And now we’re getting a number of mixed signals about Google’s Android, including the mistaken controversy about Linux syscall interfaces — I can’t blame that on bad intentions though, as I was also confused by unclear license information, and that is indeed a problem that should be solved. While I can now tell I did read about it before, I had totally forgotten about the syscall exception, and that should probably be better known.

There are obviously different shades of “not-as-free-as-you’d-wish” to be found in software that people care about: from license restrictions to the way upstream and distributions collaborate… I encountered an example of the latter just last week, with Quagga, a routing daemon.

I have taken up maintaining Quagga simply because I wanted to use it in my home setup, as it was the “easiest” way to get the ADSL router-in-a-PCI-card to know where to send packets coming from the IPoA bridge — note: if you happen to have a Linux-compatible ADSL/ATM modem picking up dust, I wouldn’t mind replacing that one… and if I did, I would be helping pick up the linux-atm stack that seems to be bitrotting in Gentoo.

When I picked up Quagga I did some polishing, among other things because it was originally copyrighted by a non-Gentoo-related company who contributed the ebuilds and init scripts, and their copyright statements had taken over the Gentoo ones up to that point; that was probably a signal I failed to interpret. While I tried sending my patches upstream, the upstream mailing list has shown a very low level of actual involvement from the one developer, which can probably explain why two features that the ebuild supports are still external patches.

At any rate, the recently-released 0.99.18 version turned out not to be as good as intended, and the homepage now shows a pretty interesting warning that you need three patches to fix a few issues that came up after the release — these were reported to Gentoo, and fixed before the 0.99.18 ebuild hit the stable tree, for what concerns Gentoo users. When releasing a bugfix tarball was proposed on the mailing list, I suggested simply getting the distributors together to manage a shared “blessed” list of patches and backports for Quagga, to be released the same way Greg releases the stable kernels. I’m not really convinced by the reply from the aforementioned one developer:

If someone wants to maintain a stable series, they’d be welcome. TBF
though, this is what some distros already specialise in. Some will
even answer your phone calls and try debug & fix your problems for
you!

(FWIW, I’d strongly encourage anyone using Quagga in a business
setting to get a support contract with their favoured vendor; if
they’re a general Linux/Unix vendor rather than a network/routing
specialist, then be sure to tell them Quagga is an important package
to you. This is what helps pay for people to work on Quagga.).

Maybe I’m reading too much between the lines, but while I do understand the need for people to actually be paid to work on stuff (I definitely don’t subscribe to RMS’s doctrine that developers should be okay with making “a mere living” — we’re the bone structure of this age!), I don’t see how it helps to invite vendors to forget coordination and collaboration and instead try squeezing money out of users.

Are we really at this point? If we are, that’s a sorry state for Free Software to be in, and likely a signal of a crumbling system.

Random Gentoo fixes

In the past week or so, I’ve been working in parallel on two Gentoo-related projects; one is the one I wrote about, for which I need to bypass NATs and allow external IPv6 access, while the other will probably become better known in the next few days, when I deploy in Gentoo the code I’m developing.

Both are works I’m being paid to do, even though for the former I’m paid not nearly as much as I should be; interestingly, both seem to require me to make some seemingly random, but for my aims quite important, changes to Gentoo. Since they are unlikely to show up on anybody’s radar as they are, but some might be of interest to other people doing similar things, I thought I could give you a heads-up on these:

  • first of all, speaking of Quagga, I changed the init scripts to make use of logger if available; this should really depend on the configuration files (you can decide whether to use syslog or not), but since that only works with OpenRC, and I’m not sure how to test for OpenRC from within an init script, I decided to simply add a use dependency to all of them; as it is, it won’t require the logger, but if one is in the runlevel, the daemon will start after it (see the sketch after this list);
  • again on the topic of init scripts, I tweaked the OpenSSH sshd init script so that it doesn’t forcefully regenerate the RSA1 host key, as that’s only used by version 1 of the SSH protocol, which is not enabled by default; if you do enable it, the RSA1 host key is regenerated, but it no longer happens if you don’t request it in sshd_config; I wanted to make it possible to disable the RSA/DSA keys for SSH2 as well, but it’s unclear how that would work; this reduces the time needed to start up a one-off instance of Gentoo, such as a SysRescueCD live system or EC2, and reduces the entropy consumption at boot time;
  • tying into that, I’ve tried running audio-entropyd on the two boxes I have to deploy, hoping that it would make it possible to replenish the entropy without having to buy two more EntropyKeys (which would erode my already narrow profit margin); unfortunately it didn’t work at all, turning up a number of I/O errors, but the good news is that the timer_entropyd software – which Pavel is proxy-maintaining through me – works pretty nicely for this usage, and I’ve now opened a stable request for it;
  • also while trying to debug audio-entropyd, I noticed that strace didn’t really help when calls to ioctl() are made with ALSA-related operations; the problem is that, by default, the upstream package ships with a prebuilt list of ioctl definitions; I’ve found a way to remove this limitation, which is now in tree as 4.5.20-r1, although I did break the build with the Estonian language — because the upstream code is not compatible in the first place; I’ll be fixing the Estonian problem tomorrow as soon as I have time;
  • I have started looking into the pwgen init script that is used by our live CDs and by SysRescueCD; again, there are a few useful things with that approach, but I really want to look more into it because I think there is room for improvement;
  • tomorrow and in the next days, depending on how much time I’m left with, I’ll start trying the PKCS#11 authentication again — last time I ended up with a system that let me in as root with any password; now I have an idea of how to solve it, but it’ll require me to rewrite pambase almost from scratch;
  • not really related to my work project, but I helped our Hardened team a bit to fix a suhosin bug and sent the patch upstream; I’ll be writing about this in deeper detail – again, as I find time – since it actually motivates me to resume my work on Ruby-ELF to solve the problem at the root.
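To give an idea of the logger change mentioned above, this is a minimal sketch of the depend() block the init scripts now carry; OpenRC syntax, with the rest of the script omitted:

    #!/sbin/runscript
    # "use" declares a soft dependency: the daemon does not require
    # the logger service, but if one is in the runlevel, it will be
    # started before us.
    depend() {
        need net
        use logger
    }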

On a different note, I switched my router from pdnsd to unbound; while the documentation leaves a lot to be desired, unbound seems to perform much better, and it listens on both IPv4 and IPv6 sockets, which doesn’t seem to be the case at all for pdnsd. I have also taken down some notes about using the BIND ddns protocol, since the documentation robbat2 (http://robbat2.livejournal.com/) pointed me at is probably out of date now.
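As a taste of the dual-stack listening, this is the kind of minimal unbound setup I mean; a sketch only, with placeholder addresses standing in for my network:

    # sketch; addresses below are placeholders for my own setup
    cat > /etc/unbound/unbound.conf <<'EOF'
    server:
        interface: 0.0.0.0              # listen on all IPv4 addresses
        interface: ::0                  # and on all IPv6 addresses too
        access-control: 192.168.0.0/24 allow
        access-control: 2001:db8::/64 allow
    EOF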

Finally solving the IPv6 problem

So, after discussing NATs and dynamic host solutions, I was finally able to reach a solution, although I would never have gotten there without Maxime’s help, as I hadn’t considered Freenet6 worth looking at before; it turned out to be the right solution for me, as it covers all my requirements:

  • it allows passing through hostile IPv4 NATs by tunnelling IPv6 over UDP;
  • it provides you with stable addresses (as long as you connect to the same server and you authenticate);
  • it provides not just one address, but a routed /64 prefix, which is exactly what I was looking for.

Unfortunately, it isn’t in itself a totally painless solution; the first problem is with the software needed to set up the connection. The original freenet6 client was renamed to net-misc/gateway6; when I first installed that one, it failed to start altogether; besides, it reported overflows, which I didn’t like at all. I was able to at least take care of the overflows and improve the init script a bit with the 6.0-r2 revision (which also respects LDFLAGS). But even that fails badly when you try connecting to the authenticated server.

Bernard found out what the main problem was: the software changed name again, and versioning too; it is now called gogoCLIENT, and the package is named gogoc. With a bit of trial and error, I was able to get net-misc/gogoc in tree, for now only keyworded unstable amd64 (I’ll open a keyword request tomorrow, since it’s the upgrade path for gateway6 users). The new software works fine with authenticated servers, but has some trouble with radvd, as it tries to configure it by itself… pretty much failing to work in a decent way. It’s something I’ll try to fix in the next few days.
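For the record, the configuration I ended up with looks roughly like this; take it as a sketch only, since I’m quoting the key names from memory, and the server address and credentials are placeholders:

    # sketch of /etc/gogoc/gogoc.conf; key names from memory,
    # server and credentials are placeholders
    cat > /etc/gogoc/gogoc.conf <<'EOF'
    server=authenticated.freenet6.net
    userid=myuser
    passwd=mypassword
    auth_method=any
    host_type=router          # ask for a routed prefix, not a single address
    prefixlen=64
    if_prefix=eth1            # LAN interface the /64 gets advertised on
    EOF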

Interestingly enough, this version, rather than solving the overflows, adds a new one. So once again, if you ever experienced crashes with freenet6/gateway6/gogoc (the latter outside of Gentoo) on x86-64 based systems, make sure to fetch my overflow patches, or start using Gentoo.

So now I have my stable IPv6 tunnel, and I can set up the firewall to let me access the network as I require. Again, thanks Maxime!

There are a few more issues that I have to polish off before I can deploy it safely, some of which I didn’t even expect I’d find myself thinking about. The first problem was finding a way to access the boxes on my network while I’m setting it up, without going through the whole route that gogoNet requires me to take (it skips out through my point of presence for 6to4, goes through Frankfurt and Washington, then comes back to the Netherlands, which is quite silly). Solving the problem is, actually, pretty simple: I still have Quagga installed on my main router, and at a quick test I was able to get the two RIP daemons to talk to each other, so that I can access the network I assigned to my customer (I was careful to choose one that is disjoint from, and compatible with, the one I’m running locally; although I should probably just go back to the 192.168.2.0/24 they have been using already, since it’s just as compatible as far as I’m concerned). Tomorrow, I’ll probably set up RIPng as well, and that should solve my problems during testing rather than deployment — I definitely hope this is not the last time I do work like this, since it’s actually partly enjoyable once you know how to deal with it.
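For reference, getting the two RIP daemons to exchange routes takes surprisingly little configuration, something along these lines on each router (a sketch; eth0 stands in for the interface facing the other side, and RIPng follows the same pattern with “router ripng”):

    # sketch of /etc/quagga/ripd.conf; eth0 is a placeholder
    cat > /etc/quagga/ripd.conf <<'EOF'
    router rip
     network eth0
     redistribute connected
    EOF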

But then, I got another interesting notion, although that is definitely more out of professional curiosity than actual necessity. Once I have the two network segments both available over IPv6 (with the firewall allowing me access just between the two), I can connect from my side to the other (but obviously not the other way around) just by using the stable IPv6 addresses assigned to the machines. But what about those machines that don’t support IPv6 at all (mostly printers)? For those, I would need access over IPv4; I can do that with an SSH jump host, but… what about dual-stack routing?

Theoretically speaking, I should be able to connect to ::192.168.2.135 to reach an IPv4 address through an IPv6 connection; let’s say I can set up routing with Quagga so that the gateway on my side knows to forward those requests to the gateway on the other side, but then what? Which software component takes care of the translation between IP versions?

Sigh, I wish more SoHo hardware routers provided IPv6 by default; it would make my work much, much easier.

About the new Quagga ebuild

A foreword: some people might think that I’m writing this just to banter about what I did; my sincere reason for writing, though, is to point out an example of why I dislike 5-minute fixes, as I wrote last December. It’s also an almost complete analysis of my process of ebuild maintenance, so it might be interesting for others to read.

For a series of reasons that I haven’t really written about at all, I need Quagga on my homemade Celeron router running Gentoo — for those who don’t know, Quagga is a fork of an older project called Zebra, and provides a few daemons for route advertisement protocols (such as RIP and BGP). Before yesterday, the last version of Quagga in Portage was 0.99.15 (and the stable is still an old 0.98), but there was recently a security bug that required a bump to 0.99.17.

I was already planning on giving Quagga a bump to fix a couple of personal pet peeves I had with it on the router; since Alin doesn’t have much time, and also doesn’t use Quagga himself, I added myself to the package’s metadata, and started polishing the ebuild and its support files. The alternative would have been for someone to just pick up the 0.99.15 ebuild, update the patch references, and push it out as 0.99.17, which would have qualified as a 5-minute fix and wouldn’t have solved a few more problems the ebuild had.

Now, the ebuild (and especially the init scripts) make a point of the fact that they were contributed by someone working for a company that used Quagga; this is a good start, from one point of view: the code is supposed to work, since it was used; on the other hand, companies don’t usually care for Gentoo practices and policies, and tend to write ebuilds that could be polished a bit further to actually comply with our guidelines. I like them as a starting point, and I’m used to doing the final touches in such cases. So if you have some ebuilds that you use internally and don’t want to spend time maintaining forever, you can also hire me to clean them up and merge them in tree.

So I started with the patches; the ebuild applied patches from a tarball, three unconditionally and two based on USE flags; the latter two had URLs tied to them, pointing out that they were unofficial feature patches (a lot of networking software tends to have similar patches). I set out to check the other patches: one changed the detection of PCRE; one was obviously a fix for --as-needed; one was a fix for an upstream bug. All five of them were in a separate patchset tarball that had to be fetched from the mirrors. I decided to change the situation.

First of all, I checked the PCRE patch; actually, the whole PCRE logic inside configure is long-winded and difficult to grok properly; on the other hand, a few comments and the code itself show that the libpcreposix library is only needed on non-GNU systems, as GLIBC provides the regcomp()/regexec() functions. So instead of applying the patch and having a pcre USE flag, I changed the ebuild to decide whether to link against PCRE based on the implicit elibc_glibc USE flag; one less patch to apply.
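In ebuild terms, the idea reduces to something like the following sketch (not the actual committed code; I’m also assuming the configure switch is spelled --enable-pcreposix):

    # sketch: depend on, and enable, PCRE only on non-GLIBC systems
    DEPEND="!elibc_glibc? ( dev-libs/libpcre )"

    src_configure() {
        local pcre=--enable-pcreposix
        use elibc_glibc && pcre=--disable-pcreposix
        econf ${pcre}
    }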

The second patch I looked at was the --as-needed patch, which changed the order in which libraries are linked so that the linker wouldn’t drop them out; it wasn’t actually as complete as I would have made it. Since libtool handles transitive dependencies fine, if the libcap library is used by the convenience library, it only has to be listed there, not also in the final installed library. Also, I like to take the chance to remove unused definitions in the Makefile while I’m there. So I reworked the patch on top of the current master branch in their GIT, and sent it upstream, hoping to get it merged before the next release.

The third patch is a fix for an upstream bug that hasn’t been merged for a few releases already, so I kept it basically the same. The two feature patches had new versions released, and the Gentoo versions seem to have gone a bit out of sync with the upstream ones; for the sake of reducing Gentoo-specific files and process, I decided to move to the feature patches that the original authors release; since they are only needed when their USE flags are enabled, they are fetched from the original websites conditionally. The remaining patches are too small to be part of a patchset tarball, so at first I simply put them in files/ as they were, mine being a straight export from GIT. Thinking about it a bit more, I decided today to combine them into a single file, and just properly handle them in Gentoo GIT (I started writing a post detailing how I manage GIT-based patches).
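The workflow I have in mind is roughly the following, with hypothetical branch, tag and commit names: keep the fixes as proper commits on a branch off the release tag, and export them as a single combined patch when needed.

    # hypothetical names throughout
    git checkout -b gentoo-patches quagga-0.99.17   # branch off the release tag
    git cherry-pick abc1234 def5678                 # the Gentoo-specific fixes
    git diff quagga-0.99.17 HEAD > quagga-0.99.17-gentoo.patch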

Patches done, the next step was clearing up the configuration of the program itself; the ipv6 USE flag handles the build and installation of a few extra daemons specific to the IPv6 protocol; the rest are more or less direct mappings from the remaining flags. For some reason, the ebuild used --libdir to change the installation directory of the libraries, and then later installed an env.d file to set the linker search path; that is generally a bad idea — I guess the intention was just to follow that advice, and not push non-generic libraries into the base directory, but doing it that way is mostly pointless. Note to self: write about how to properly handle internal libraries. My first choice was to see if libtool set the rpath properly, and in that case leave it to the loader to deal with it. Unfortunately it seems like there is something bad in libtool: while the rpath worked on my workstation, it didn’t work on the cross-build root for the router; I’m afraid it’s related to the lib vs lib64 paths, sigh. So after testing it out on the production router, I ended up revbumping the ebuild already, to undo that hack. If libtool can handle it properly, I’ll get it fixed upstream so that the library is always installed, by default, as a package-internal library; in the mean time it gets installed vanilla, as upstream wrote it. It makes even more sense given that there are headers installed that suggest the library is not an internal library after all.

In general, I found the build system of Quagga really messed up and in need of an update; since I know how many projects are sloppy about build systems, I’d probably take a look. But sincerely, before that I have to finish what I started with util-linux!

While I was at it, I fixed the installation to use the more common emake DESTDIR= rather than the older einstall (which means that it now also installs in parallel), and installed the sample files among the documentation rather than in /etc (reasoning: I don’t want to back up sample files, nor do I want to copy them to the router, and it’s easier to move them away directly). I forgot the first time around to remove the .la files, but I did so afterwards.
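The install phase thus boils down to something like this (a sketch from memory; the sample file names are made up):

    src_install() {
        emake DESTDIR="${D}" install    # parallel-safe, unlike einstall

        # ship sample configurations as documentation instead of /etc
        dodoc "${S}"/*.sample           # hypothetical file names

        # drop the libtool archives we don't want installed
        find "${D}" -name '*.la' -delete || die
    }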

What remains is actually the most important stuff: the init scripts! Following my own suggestions, the scripts had to be mostly rewritten from scratch; this was also needed because the previous scripts had a non-Gentoo copyright owner and I wanted to avoid that. Also, there were something like five almost identical init scripts in the package, where “almost” was down to the name of the service itself; this meant having more than one file without any real reason. My solution is to have a single file for all of them, and symlink the remaining ones to it; the SVCNAME variable defines the name of the binary to start up. The one script that differs from the others, zebra (it has some extra code to flush the routes), I also rewrote to minimise the differences between the two (this is good for compression, if not for deduplication). The new scripts also take care of creating the /var/run directory if it doesn’t exist already, which solves a lot of trouble.
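To give an idea, the shared script reduces to something along these lines; a simplified sketch rather than the exact script that landed in tree (the quagga user and group, and the daemons’ option spellings, are from memory):

    #!/sbin/runscript
    # shared by ripd, ripngd, ospfd, bgpd, …: each of those is just a
    # symlink to this file, and SVCNAME selects the daemon to start

    depend() {
        need net
        use logger
    }

    start() {
        # make sure the runtime directory exists before starting
        mkdir -p /var/run/quagga && chown quagga:quagga /var/run/quagga
        ebegin "Starting ${SVCNAME}"
        start-stop-daemon --start --exec "/usr/sbin/${SVCNAME}" -- \
            --daemon --config_file "/etc/quagga/${SVCNAME}.conf"
        eend $?
    }

    stop() {
        ebegin "Stopping ${SVCNAME}"
        start-stop-daemon --stop --exec "/usr/sbin/${SVCNAME}"
        eend $?
    }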

Now, as I said, I committed the first version after trying it locally, and then revbumped it last night after trying it in production; there I reworked it a bit harder: besides the change in library installation, I decided to add a readline USE flag rather than forcing the readline dependency (there really isn’t much use for readline on my router, since it’s barely supposed to have me connected), and this also showed me that the PAM dependency was strictly related to the optional vtysh component. And while I looked at PAM, (Updated) I actually broke it (and fixed it back in r2): the code calls pam_start() with a capital-case “Quagga” string, but Linux-PAM turns it all lower-case… I didn’t know that, and I was actually quite sure it was case-sensitive. It turns out that OpenPAM is case-sensitive while Linux-PAM is not; that explains why it works with one but not the other. I guess the next step on my list of things to do is to check whether it might be broken with the Turkish locale. (End of update)

Another thing that I noticed there is that, by default, Quagga has been building itself as a Position Independent Executable (PIE); as I have written before, using PIE on a standard kernel, without strong ASLR, has very few advantages, and enough disadvantages that I don’t really like to have it around; so for now it’s simply disabled. Since we do support proper flags passing, if you’re building a PIE-complete system you’re free to do so; and if you’re building an embedded-enough system, you have nothing else to do.

The result is a pretty slick ebuild, at least in my opinion: fewer files installed, smaller, Gentoo-copyrighted (I rewrote the scripts practically entirely). It handles the security issue, but also another bunch of “minor” issues; it is closer to upstream, and it has a maintainer who’s going to make sure that future releases will have an even slicker build system. It’s nothing exceptional, mind you, but it’s what it takes to fix an ebuild properly after a few years spent with bump-renames. See?

Afterword: a few people, seemingly stirred up by a certain other developer, seem to have started complaining that I “write too much”, or pretend that I actually profit from writing here. The main gain I have is not having to repeat myself over and over to different people. Writing posts costs me time, keeping the blog running, reachable and so on takes me time and money, and running the tinderbox costs me money. Am I complaining? Not so much; Flattr is helping, but trust me, it doesn’t even cover the costs of the hosting up to now. I’m just not really keen on the slandering I get because I write out explanations of what I do and why. So from now on: you bother me? Your comments will be deleted. Full stop.