Sophistication can be bad

Everybody has heard of the KISS principle, I guess — the idea that the less complex a moving part is, the better. This is true in software as much as in mechanics. Unix in particular, and all the Unix-like projects including GNU, also tended to follow that principle, as shown by the huge number of small utilities that each do one particular text- or file-editing function — that is, until you get to sed, awk and find.

Now we all know that the main sophistication afoot in the Linux world nowadays is Lennart’s systemd. I have no intention to discuss it now, or at any later time, I’d say. I really don’t care as long as I have a choice not to use it, and judging from a given thread I think we’ll always have an alternative, no matter what some people have said before and keep saying.

No, my problem today is not with udev deciding it’s time to stop using the persistent rules that people fought with for years and that are now no longer usable; instead it’s with util-linux, and in particular with the losetup utility that manages loop devices. See, loop devices have been quite a big deal in the past, mostly because they started out as a fixed number, then the kernel let you decide how many, and finally code was added that lets you change the number of available loop devices dynamically. Great, but that required a newer version of util-linux, and at the time the feature was introduced there wasn’t one that actually worked as intended.
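Just to put that evolution in context, here’s roughly how the two eras look from the command line (a quick sketch; the image name is made up):

    # Old way: the number of loop devices is fixed when the module is loaded
    modprobe loop max_loop=64

    # New way: /dev/loop-control hands out devices on demand, and a recent
    # enough losetup uses it for you
    losetup --find                     # print the first unused loop device
    losetup --find --show image.img    # attach image.img and print the device used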

Anyway, in the past week I’ve been working on building a new firmware image for the device I’m working on, and when it came down to running the script that generates the image to burn onto the SSD, it locked up at 100% CPU usage (luckily the system is multicore, so I could still get in to kill it). The problem was in losetup, so today, with enough time on my hands, I went to check it out. It turns out the failure was a joint issue between my setup, OpenRC updates, and util-linux updates, but let’s proceed in order.

The build happens in a container for which I was not mounting /sys — or at least so I intended, although it is possible that OpenRC mounted it on its own; this has changed recently, but I don’t think those changes have hit stable yet, so I’m not sure that’s the case. I had created static nodes for the loop devices and for /dev/loop-control — but the latter was nowhere to be found at first today. Maybe I deleted it by mistake or something along those lines. But the point is it worked before, and nothing changed besides an emerge -avuDN.
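For reference, the static nodes amount to something like this (a minimal sketch, assuming neither udev nor /sys inside the container; loop-control is character device 10:237 and the loop devices are block devices with major 7):

    mknod -m 660 /dev/loop-control c 10 237
    for i in 0 1 2 3; do
        mknod -m 660 /dev/loop$i b 7 $i
    done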

So, what happens is that the script runs something along the lines of losetup --find --show file, which is intended to find the first available loop device, set it up with the file, and then print the loop device that was used. It’s a bit more complex than this, as I’m explicitly setting up only the partition on the loop device (getting partitioned loop devices to play nice with LXC is a pain), but the point stands. Unfortunately, when both /dev/loop-control and /sys are unreachable, the loop that should find the first available device keeps iterating over the same device over and over again, never trying the next one. This causes the problem noted above: losetup locking up at 100% CPU usage.
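Stripped of the partition handling, the pattern in the script is more or less this (the file name is a placeholder):

    loopdev=$(losetup --find --show firmware.img)    # should print e.g. /dev/loop0
    # ... partition, format and populate the image through $loopdev ...
    losetup --detach "$loopdev"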

And it’s definitely not the only problem! If you just execute losetup --find, which should give you the first available device, it gives you /dev/loop0 even if that device is already in use. Not content with these problems? losetup -a, which should list the configured devices, lists none at all, even when some are set up, and still exits with a zero status as if everything were fine. Which is definitely not the case!
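Until this is fixed upstream, the only defence a script has is to double-check what --find claims, something along these lines (a paranoid sketch, not what I actually ended up doing):

    dev=$(losetup --find) || exit 1
    if losetup -a | grep -q "^${dev}:"; then
        echo "losetup says ${dev} is free, but it is already in use" >&2
        exit 1
    fi

Of course, when losetup -a itself comes back empty because its sources are missing, even this check won’t save you.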

Okay, you could say that losetup is already trying its best by using not one but three different sources of information (the third being /proc/partitions) to find the data to use, and that when the primary two are unusable you shouldn’t expect it to give you proper information, should you? Well, that’s not the point. The big problem is that it should tell me “man, I can’t get you the data you requested because I need more sources, give me the sources!” instead of trying its best, failing, and locking up.

The next question is obviously “why are you ranting, instead of fixing it?” — the answer is that I tried, but the code I was reading made me cry. The problem is that nowadays losetup is just a shallow interface to some shared code in util-linux… and the design of said code makes it very difficult to tell whether a non-zero return value from a function means “we reached the end of the list” or “I couldn’t see anything because I lack my sources”. And it really didn’t feel like a good idea for me to start throwing away that code to replace it with something more KISS-compliant.

So at the end of the day, I fixed my container to mount /sys and everything works, but util-linux is still broken upstream.

Updating init scripts

Hah. I know what you’re thinking: Flameeyes has disappeared! Yeah, I almost wish I were spending a vacation in London or somewhere along those lines, but that’s not the case. Alas, I’m doing double shifts lately, which is why Gentoo is mostly taking second place. But I shouldn’t complain: in this economy, having too much work is a pretty rare problem to have.

Besides still operating the tinderbox, I’ve decided to spend some time updating the init scripts that come with my packages. The new OpenRC init system has a much more extensive runscript shell used to execute the init scripts; this means that new init scripts can be written in a declarative way, which makes them shorter and more foolproof.

Indeed, for some init scripts – such as the new netatalk ones – the whole thing boils down to setting the dependencies and declaring which command to start and with which options. This is very good and nice.
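To give an idea of how little is left in the new style, such a script looks more or less like this (a generic sketch, not the actual netatalk script; the daemon name and paths are made up):

    #!/sbin/runscript
    # Declarative style: no start()/stop() functions needed, the default
    # implementations in runscript use the variables below.

    command="/usr/sbin/somedaemon"
    command_args="${SOMEDAEMON_OPTS}"    # set in /etc/conf.d/somedaemon
    pidfile="/var/run/somedaemon.pid"

    depend() {
        need net
    }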

I have to thank Samuli for the idea, as he went and updated acpid’s init script to the new style, and so pushed me to look at other init scripts — in one case, it was because the package (haveged) was not working on one of my vservers; it seems it was simply a transient problem, as the latest version worked fine… and now it has a new init script as well!

In some cases these updates also come with slight changes in behaviour: ekeyd no longer sets sysctls for you (that’s what you have a sysctl.conf for, after all!), and for quagga I finally ended up collapsing the two init scripts into a single one (before, there was one for zebra and one for the rest; now there’s a single script symlinked once for each service).
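The symlink trick works because runscript exposes the name the script was invoked as, so a single file can serve every daemon. A rough sketch (not the actual quagga script):

    #!/sbin/runscript
    # /etc/init.d/zebra, /etc/init.d/ospfd, ... are all symlinks to this file;
    # ${SVCNAME} holds whichever name was used to start the service.
    command="/usr/sbin/${SVCNAME}"
    pidfile="/var/run/quagga/${SVCNAME}.pid"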

What this reminds me of, though, is the problem with init scripts that I found with LXC before: a number of init scripts can’t be used on the host if you plan on using the same init script within a container. vtun, ulogd, autoconfig, wmacpimon, nagios, amphetadesk, vdr, portsentry and gnunetd use killall; drqsd, drqmd, ttyd, upsd and irda use a non-properly-bound pkill; ncsa, npcd, btpd, nrpe, gift, amuled and amuleweb use a non-properly-bound pgrep (and in the case of ncsa, npcd and nrpe, all of which seem to involve nagios, what they’re trying to do amounts to a simple pkill).
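The reason these are a problem is that matching processes by name from the host reaches into every container as well; stopping through a pidfile keeps the action bound to the host’s own instance. Roughly, taking vtun as an example (the pidfile path is made up):

    # Bad: kills the host's vtund *and* every container's
    killall vtund

    # Better: only the process recorded in the host's pidfile
    start-stop-daemon --stop --pidfile /var/run/vtund.pid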

Luckily it seems there are no more scripts installed in /etc/init.d that don’t use runscript, although I’m sure a few more need work, especially those provided by upstream, as the netatalk case shows us.

Oh well, more work to do then.

Squeezing more performance out of the tinderbox

While I’ve made some contacts that might change a bit how the tinderbox will work in the (not-too-near) future, I’m still doing my best to optimise the whole process so as to squeeze as much power out of it as possible, both here and wherever it will be running in that future.

Interestingly enough, dropping the software RAID1 volume and moving to straight LVM, especially for the log files, made a difference. This shows especially when you look at the time spent doing the final merge to the livefs. After all, RAID1 wasn’t as important to me as a proper backup of the configuration, which I’ve now set up properly with rsnapshot and an external (eSATA) Western Digital RAID-0 drive.

Right now, I’m instead looking for a way to reduce the merge redundancy: while it has gotten much better than in the earlier days, especially since USE-based dependencies can be caught much sooner than with the old hacks, there is still room for improvement.

One of the improvements came to me yesterday in the shower (a pretty good place to think about how to solve programming problems). Right now the queue is just a list, printed out by the tinderbox scripts that Zac wrote. To avoid going over the same packages again and again, each time I stop the process to sync up I resume the list where it stopped, rather than generating a new one; this way I’m quite sure it shrinks over time. The “queue list” is randomised.

In parallel, I still generate the complete list, this time not randomised, used to fetch all the packages that haven’t been fetched already (it also gives me a list of fetch-restricted packages, and helps find packages that, well, fail to fetch for various other reasons). This list is generated from scratch, so it includes all the packages that meet the usual requirement (“not installed in its latest version in the past six weeks”), and is thus usually longer (as it also lists the packages that failed the tinderbox run, and those that passed through more than six weeks ago).

But in that second list you might not find some of the packages from the first: packages installed as a dependency won’t be in there if they were installed less than six weeks ago, and yet, if they were slated for a later merge, they’ll still be in the queue. When this shortlist of packages includes things like Berkeley DB, it makes a lot of sense to skip them: the sys-libs/db merge alone takes a few days to complete. Right now, the difference between the two lists is about 238 packages (to be honest this is a rough estimate, since it also accounts for packages masked by failure), but that’s with almost all of the packages already merged and fewer than 3000 remaining. At the beginning of a run the difference is going to be much bigger.
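Computing that difference is trivial once both lists are plain text files, something like this one-liner (file names are made up):

    # packages still in the queue that the fresh complete list no longer wants
    comm -23 <(sort queue.list) <(sort complete.list) | wc -l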

So what happens now? Well, I’ll probably fix up the shell script I use to restart the tinderbox, so that it generates the new complete list, launches the fetch, then reduces the main queue (re-shuffling it at the same time). This would reduce the work I need to do each time the tinderbox restarts, and also the time needed to make a complete run.
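In shell terms the restart sequence would boil down to something like this (all the file and helper names here are mine, not the actual tinderbox scripts):

    generate-complete-list > complete.list    # hypothetical helper producing the full list
    start-fetch complete.list &               # hypothetical helper kicking off the fetch pass

    # keep only the queued packages that are still wanted, then re-shuffle
    grep -Fxf complete.list queue.list | shuf > queue.tmp
    mv queue.tmp queue.list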

On the other hand, I definitely need some way to handle the “remove head of list” step automatically; right now I handle that manually, and I also have to kill the tinderbox by hand, trying to catch it between merges. So if somebody feels like fetching the git repository, check my notes and replace the current xargs call with a proper script that can be directed in some way (netcat is as good a way as any to command it), and that can pause, rehash and resume, keeping track of what was merged and what wasn’t…