New personal gamestation!

Besides the new devbox that I talked about setting up, now that I no longer pay for the tinderbox I also decided to buy myself a new PC for playing games (so Windows-bound, unfortunately), replacing Yamato, which has been serving me well for years at this point.

Given we’ve talked about this at the office as well, I’ll write down the specs over here, with the links to Amazon (where I bought the components), as I know a fair number of people are always interested in the specs. I will probably write some reviews on Amazon itself as well as on the blog, for the components that can be discussed “standalone”.

  • CPU: Intel i7 5930K, hex-core Haswell-E; it was intended as a good compromise between high performance and price, not only for gaming but also for Adobe Lightroom.
  • Motherboard: Asus X99-S
  • Memory: Crucial Ballistix 32GB (8GBx4). This one I actually ordered from Crucial directly, because the one I originally ordered on Amazon UK was going to ship from Las Vegas, which meant I would have had to pay customs on it. I am still waiting for that order to be fully cancelled; on the other hand, Crucial was able to deliver an order placed on Wednesday at 10pm by Friday, which was pretty good (given that this is a long weekend in Ireland).
  • Case: Fractal Design Define R5, on the suggestion of two colleagues, one of whom had only seen it in reviews, while the other actually owns the previous version. It is eerily quiet and very well organized; it would also fit a huge amount of storage if I ever needed to build a new NAS rather than a desktop PC.
  • CPU cooler: NZXT Kraken X61. I went with water cooling for the CPU because I did not like the size of the copper fins on the air coolers suggested for the chosen CPU. Since this is a completely sealed system it didn’t feel too risky. The only shaky part is that the only proper space for it in the case is on the top-front side, and it does require forcing the last insulation panel in a little bit.

Now you probably noticed some parts missing; the reason is that I have bought a bunch of components to upgrade Yamato over the past year and a half, since being employed also means being able to just scratch your itch for power more easily, especially if you, like me, are single and not planning a future as a stock player. Some of those upgrades are still pretty good, others are a bit below average now (and were barely average when I bought them), but I think it is worth listing them still.

  • SSD: Samsung 850 EVO and Crucial M550, both 1TB. The reason for having two different ones is that the latter (which was the first of the two) was no longer available when I decided to get a second one; and the reason to get a second one was that I realized that while keeping pictures on the SSD helped a lot, the rest of the OS was still too slow…
  • GPU: Asus GeForce GTX660 because I needed something good that didn’t cost too much at the time.
  • PSU: be quiet! Dark Power Pro 1200W which I had to replace when I bought the graphics card, as the one I had before didn’t have the right PCI-E power connectors, or rather it had one too few. Given that Yamato is a Dual-Quad Opteron, with registered ECC memory, I needed something that would at least take 1kW; I’m not sure how much it’s consuming right now to be honest.

We’ll see how it fares once I have it fully installed and started playing games on it, I guess.

The end of an era, the end of the tinderbox

I’m partly sad, but for the most part this is a weight off my shoulders, so I can’t say I’m not at least in part joyful about it, even though the context in which this is happening is not exactly what I expected.

I turned off the Gentoo tinderbox, never to come back. The S3 storage of logs is still running, but I’ve asked Ian to see if he can attach everything at his pace, so I can turn off the account and be done with it.

Why did this happen? Well, it’s a long story. I had already stopped running it a few months ago, because I got tired of Mike behaving like a child (something I already reported back in 2012), closing my bugs because the logs are linked (from S3) rather than attached. I have already made my position clear: it’s a silly distinction, as the logs will not disappear into nowhere (indeed I’ll keep the S3 bucket for them running until they are all attached to Bugzilla), but as he keeps insisting that it’s “trivial” to change the behaviour of the whole pipeline, I decided to give up.

Yes, it’s only one developer, and yes, lots of other developers took my side (thanks guys!), but it’s still aggravating to have somebody who can do whatever he likes without reporting to anybody, ignoring Council resolutions, QA (when I was the lead) and essentially using Gentoo as his personal playground. And the fact that only two people (Michał and Julian) have been pushing for a proper resolution is a bit disappointing.

I know it might feel like I’m taking my toys and going home — well, that’s what I’m doing. The tinderbox has been draining on my time (little) and my money (quite more), but those I was willing to part with — draining my motivation due to assholes in the project was not in the plans.

In the past six years that I’ve been working on this particular project, things evolved:

  • Originally, it was a simple chroot with a looping emerge, inspected with grep and Emacs, running on my desktop and intended to catch --as-needed failures. It went through lots of disks, and got me off XFS for good due to kernel panics.
  • It was moved to LXC, which is why the package entered the Gentoo tree, together with the OpenRC support and the first few crude hacks.
  • When I started spending time in Los Angeles for a customer, Yamato under my desk got replaced with Excelsior, which was crowdfunded and hosted, for two years straight, by my customer at the time.
  • This is where the rewrite happened, from attaching logs (which I could earlier do with more or less ease, thanks to NFS) to storing them away and linking to them instead. This had to do mostly with the ability to remote-manage the tinderbox.
  • This year, since I no longer work for the company in Los Angeles, and instead I work in Dublin for a completely different company, I decided Excelsior was better off on a personal space, and rented a full 42 unit cabinet with Hurricane Electric in Fremont, where the server is still running as I type this.

You can see that it’s not that I’m trying to avoid spending time engineering solutions. It’s just that I feel that what Mike is asking is unreasonable, and the way he’s asking it makes it unbearable. Especially when he feigns to care about my expenses — as I noted in the previously linked post, S3 is dirt cheap, and indeed it now comes down to $1/month paid to Amazon for log storage and access, compared to $600/month to rent the cabinet at Hurricane.

Yes, it’s true that the server is not doing only tinderboxing – it also runs some fate instances, and I have been using it as a development server for my own projects, mostly open-source ones – but the tinderbox is the original reason for it, and if it weren’t for that I wouldn’t be paying so much to rent a cabinet; I’d be renting a single dedicated server off, say, Hetzner.

So here we go, the end of the era of my tinderbox. Patrick and Michael are still continuing their efforts, so it’s not like Gentoo is left without integration testing, but I’m afraid it’ll be harder for at least some of the maintainers who leveraged the tinderbox heavily in the past. My contract with Hurricane expires in April; at that point I’ll get the hardware out of the cabinet and will decide what to do with it — it’s possible I’ll donate the server (minus hard drives) to the Gentoo Foundation or someone else who can use it.

My involvement in Gentoo might also suffer from this; I hopefully will be dropping one of the servers I maintain off the net pretty soon, which will be one less system to build packages for, but I still have a few to take care of. For the moment I’m taking a break: I’ll soon send an email declaring open season on my packages; I have already locked my Bugzilla account to avoid providing harsher responses in the bug linked at the top of this post.

How much the tinderbox is worth

After some of the comments on the previous post explaining the tinderbox, I’ve been thinking more about the idea of moving it to a dedicated server, or, even better, making it an ensemble of dedicated servers, maybe trying to tackle more than just one architecture, as user99 suggested.

This brings up the problem of costs, and the worth of the effort itself. Let’s start from the specifics: right now the tinderbox is running on Yamato, my main workstation: a bi-quad Opteron with a total of 8×2.0GHz cores, 16GB of registered ECC RAM, and over 2TB of storage, connected to the net through my own DSL line, which is neither particularly stable nor fast. As I said, the main bottleneck is the I/O rather than the CPUs, although when packages have proper parallel build systems the multiple CPUs work quite well. Not all of these resources are dedicated to the tinderbox effort either: storage space especially is only partially given over to the tinderbox, as it doesn’t need that much. I use this box for other things besides tinderboxing, some related to Gentoo, others to Free Software in general, and others still related to my work or my leisure.

That said, looking through the offers from OVH (which is what Mauro suggested to me before, and which seems to have friendly staff and good prices), a dedicated server with more or less the same specs as Yamato costs around €1800/year. That’s definitely too much for me to invest in the tinderbox project alone, but it’s not absurdly high (buying the hardware would cost me more, and renting outsources all the hardware maintenance problems). Considering two boxes, so the out-of-tinderbox resources could also be shared (one box to hold the distfiles and tree, the other to deal with the logs), it would be €3600/year, with the ability to deal with both x86 and amd64 problems. Again, too much for me alone, but not absurdly so.

Let me consider how resources are actually used right now, one by one.

The on-disk space used by the tinderbox is relatively difficult to evaluate properly: I use separate but shared partitions for the distfiles, the build directories and the installed system. As far as the distfiles and the tree are concerned, I could get some partial results from the mirrors’ statistics; right now I can tell you that the distfiles directory on Yamato is 127GB. The installed system is 73GB (right now), while the space needed for the build directories never went over 80GB (which is the maximum size of the partition I use), with the nasty exception of qemu, which fills the directory entirely. So in general, it doesn’t need that much hard disk space.

Network traffic is not much of a problem either, I’d say: besides the first round of fetching all the needed distfiles, I don’t usually fetch more than 1GB per sync (unless big stuff like a new KDE4 release comes in). This would also largely become moot if the internal network had its own Gentoo mirror (not sure whether that’s the case for OVH, but I’ll get to that later).

So the only real demands are CPU and I/O usage, which is what a dedicated server is all about, so no problem there; whoever ended up hosting the (let’s assume) two tinderboxes would only have to mind the inter-box traffic, which is usually not a problem either if they are physically on the same network. I looked into OVH because that’s what was suggested to me; I also checked out the prices for Bytemark, which is already a Gentoo sponsor, but at least their public pricing is in another league entirely. Ideally, given that it’s going to be me investing the man-hours to run the tinderbox, I’d like the boxes to be located in Europe rather than in America, where as far as I know most of Gentoo’s current infrastructure is. If you have any other possible option to share, I’d very much like to compare, first of all, the public prices of various providers for a configuration in this range: LXC-compatible kernel, dual quad-core CPU with large cache, 16GB RAM minimum, 500GB RAID1 storage (backup is not necessary).

Now, I said that I cannot even afford to pay for one single dedicated server for the tinderbox, so why am I pondering this? Well, “Why is this not on official Gentoo infra?”, as many have asked before, is a question I’m not sure how to answer; last I knew, infra wasn’t interested in this kind of work. On the other hand, even if it’s not proper infra, I’d very much like to have some numbers, so I can propose to the Gentoo Foundation that it pay for the effort. This would make it possible to extend the reach of the tinderbox without me having to beg for help every other day (I would most likely not stop using Yamato for tinderboxing, but two more instances would probably help).

Also, even if the Foundation doesn’t directly have the kind of money to sustain this for a long period, it might still be better to have them pay for it, sustained by users’ donations. I cannot get that kind of money cleared through my books directly, but the Gentoo Foundation might help with that.

So it is important, in my opinion, to have a clear, objective figure for the kind of money this would cost. It would also help to have some kind of status indicator along the lines of “tinderboxes covered to run for X months, keep them running”.

And before somebody wonders: this is all my crazy idea, I haven’t even tried to talk with the Foundation yet, I’ll do so once I can at least present some data to them.

Tinderbox: explaining some of its workings

Many people have asked me before to explain better how my tinderbox works, so that they can propose changes and help. Well, let me try to explain in more detail how the thing works, so I can actually get some suggestions, as right now I’m a bit stuck.

Right now the tinderbox is a Linux Container that runs on Yamato; since Linux Containers are pretty lightweight, that means it has all the power of Yamato (which is actually my workstation, but it’s an 8-way Opteron system, with 16GB of Registered ECC RAM and a couple of terabytes of disk space).

Processing power, memory and disk space are shared between the two, so you can guess that while I’m working with power-hungry software (like a virtual machine running a world rebuild for my job), the tinderbox is usually stopped or throttled down. On the other hand this makes it less of a problem to run, since Yamato is usually running anyway; if I had to keep another box running just for the tinderbox, the cost in electrical power would probably be too high for me to sustain for long. The distfiles are also shared in a single tree (among the virtual machines, other containers and chroots as well), so it is very lightweight for Yamato to run in the background.

Since it’s an isolated container, I access the tinderbox through SSH; from there I launch screen and then I start the tinderbox work; yes, it’s all manual for now. The main script is tinderbox4a.py, which Zac wrote for me some time ago; it lists all the packages that haven’t been merged within the given time limit (6 weeks), or that have been bumped since the last time they were merged. It also spews on the standard error if there are USE-based dependencies that wouldn’t be satisfied with the current configuration (since the dependencies are brought in automatically, I just need to make sure the package.use file is set properly).

The output of this script is sorted by name and category; unfortunately I noticed that doing so would leave too many of the packages stranded at the bottom of the list, so to make it more useful I shuffle it at random before saving it to a file. That file is then passed as the argument to two xargs calls: the tinderbox itself, and the fetcher. The tinderbox itself uses this command: xargs --arg-file list -n1 emerge -1D --keep-going, which means that each listed package is installed, with its dependencies brought in, and if some new dependency fails to build (but an old version is present) the failure is ignored.

Now that you’ve seen how the tinderbox is executed, you can see why I have a separate fetcher: if I were to download all the sources inline with the tinderbox (which is what I did a long time ago) I would have to wait for each download to complete before the build could start, adding network-bound latency to a job that is already long enough. So the fetcher runs this: xargs --arg-file list emerge -fO --keep-going, which runs a single emerge to fetch all the packages. I didn’t use multiple calls here because the locks on the vdb would make the whole thing definitely slower than it is now; at least, thanks to --keep-going, it doesn’t stop when one package is unfetchable.
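
Putting the pieces together, the manual driving of a run boils down to something like the following sketch; the script name and the list file are the ones mentioned above, while the exact arguments, the shuffling with shuf and the redirection of the USE-dependency warnings are my own assumptions rather than the literal setup.

# generate the work list: the script prints the candidate packages and
# complains on stderr about unsatisfied USE-based dependencies
./tinderbox4a.py 2> use-deps.log | shuf > list

# one screen window pre-fetches all the sources, so builds never wait
# on the network...
xargs --arg-file list emerge -fO --keep-going

# ...while another runs the actual tinderbox, one package at a time
xargs --arg-file list -n1 emerge -1D --keep-going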

Silly note here: I noticed tonight, while looking at the output, that sometimes it took more time to resolve the same host name than to fetch some of the smaller source files (Portage does not yet reuse connections, as that’s definitely non-trivial to implement — if somebody knows of some kind of download manager that sits in the background to reuse connections without using proxies, I’d be interested!). The problem was that I forgot to start nscd inside the tinderbox… I took a huge hit from that; now it’s definitely faster.

This of course only shows the running interface; there are a couple of extra steps involved though. There is another script that Zac wrote for me, unavailable_installed.py, which lists the packages that are installed in the tinderbox but are no longer available, for instance because they were masked or removed. This is important so I can keep the system clean of stuff that has been dropped because it was broken and so on. I run this each time I sync, before starting the actual tinderbox list script.

In the past the whole tinderbox was much hairier; Zac provided me with patches that let me do some stuff the official Portage wouldn’t do, which made my job easier, but now all of them are in the official Portage, and I just need to disable the unmerge-logs feature and enable the split-log one: the per-category split logs are optimal for submitting to Bugzilla, as Firefox does not end up choking while trying to load the list of files.
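
For reference, the log-related configuration is tiny; a minimal make.conf sketch matching the description above would be something like this (the FEATURES values are the real Portage ones, the log directory is just the path the grep below assumes):

# in the tinderbox's make.conf: per-package split logs, no unmerge logs,
# and run tests without aborting the whole run on a test failure
PORT_LOGDIR="/media/chroots/logs/tinderbox"
FEATURES="${FEATURES} split-log -unmerge-logs test test-fail-continue"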

When it’s time to check the logs for failures (of either build/install or tests, since I’m running with FEATURES="test test-fail-continue"), I simply open my lovely emacs and run this grep command: egrep -nH -e "^ .*\*.* ERROR:.*failed" -r /media/chroots/logs/tinderbox/build/*/*:*.log | sort -k 2 -t : -r which gives me the list of logs to look into. Bugs for them are then filed by me, manually with Firefox and my bug templates since I don’t know enough Python to make something useful out of pybugz.

Since I’m actually testing for a few extra things that are currently not checked by Portage, like documentation being installed in the proper path, or mis-installed man pages, I’ve got a few more grep rounds to run over the completed logs to identify and report them, also manually. I should clean up the list of checks, but for now you’ve got my bashrc if you want to take a peek.
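
The grep rounds themselves are nothing fancy; a sketch of the idea, assuming the extra checks leave recognizable warning lines in the same split logs (Portage’s own warnings already use the “QA Notice:” prefix; my own markers need their own patterns):

# pick up the QA warnings from the completed build logs
egrep -nH -e "QA Notice:" -r /media/chroots/logs/tinderbox/build/*/*:*.log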

The whole thing is long, boring, and heavy on maintenance; I still have to polish some rough edges, like a way to handle updates to the base system before starting the full run, or a way to remove the packages broken by ABI changes when they are not vital to the tinderbox’s operation (I need some extra stuff which is thus “blessed” and supposedly never going to be removed, like screen, or ruby so I can use ruby-elf).

There are currently two things I’d like to find a way to tweak in the scripts. The first is a way to identify collision problems: right now those failures only get listed in the elog output, and I have no way to extract the relevant data without a lot of manual fiddling with the log, which is suboptimal. The second problem is somewhat tied to that: I need a scoring system that makes all the packages that failed to merge, build failures and collisions alike, drop down in the list of future merges. This would let me spend more time building untested packages rather than rebuilding those that failed before.
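
I don’t have such a scoring system yet; a crude approximation, assuming a hand-maintained (or grep-generated) failed.list file with one package atom per line, could be as simple as pushing the known-bad entries to the bottom of the shuffled list instead of dropping them:

# hypothetical: keep failed.list up to date with the atoms that failed,
# then re-order the work list so they are only retried at the very end
grep -v -x -F -f failed.list list > list.fresh
grep -x -F -f failed.list list > list.failed
cat list.fresh list.failed > list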

If you want to play with the scripts and send me improvements, that’s definitely going to be helpful; a better reporting system, or a bashrcng plugin for the QA tests (hint, this was for you Mauro!) would be splendid.

If you would still like to contribute to the tinderbox effort without having knowledge of the stuff behind it, there are a couple of things you can get me that would definitely be helpful; in particular, a pretty useful thing would be more RAM for Yamato. It has to be exactly the same as the one I already have inside, but luckily I got it from Crucial, so you can get it with the right code: CT2KIT51272AB667 — yes, the price is definitely high; I paid an even higher price for it, though. If you’d like to contribute this, you should probably check the comments, in the unlikely case I get four pairs of those. I should be able to run with 24GB, but the ideal would be to upgrade from the current 16 to 32 GB; that way I would probably be able to build using tmpfs (and find any problems tied to that as well). Otherwise check the donations page or this post if you’re looking for more “affordable” contributions.

The routed network broadcast problem

Fragment of my topology

You might remember my network diagram, which showed the absurd setup I have at home to connect all the rooms where computers are located. Since then, things have been simplified a bit, and the network section between my bedroom and the office now runs over a plain Ethernet cable (it should be Gigabit, but something doesn’t look right). This should also reduce the power consumption at home, since the old Powerline adaptors were one more powered appliance; the main reason why I replaced them, though, was that their green LEDs definitely bothered me while trying to sleep, and at the same time speed was quite an issue when streaming some files.

The result is that only two media are used here: WiFi and cabled Ethernet. Unfortunately, I still lack a way to connect Yamato and Deep Space 9 (the router) via Ethernet directly, so they are connected via a standard infrastructure WiFi link. This does not work exceptionally well, in the sense that the connection between them is not very stable (I use an ath9k card on Yamato, with a 2.6.32-rc7 kernel), and when I’m downloading stuff with BitTorrent or similar I need to restart the network connection about once every five minutes to keep it going properly, which you can guess is not that fun.

Now unfortunately there is one problem here, which I ignored for quite a while but cannot ignore any longer (because I finally got the table I needed to play Unreal Tournament 3 on my PlayStation 3!): the cabled Ethernet segments fail to get UPnP support.

The whole network sits inside a single Class B IP range (172.28.0.0/16), divided into four main subnetworks (direct and behind Yamato, known and unknown computers; they have different filters on the firewall) by the DHCP server running on Deep Space 9 (for simplicity, Yamato is the only box in the network to have a static IP address, besides the router/DHCP server itself, in an otherwise unused subnetwork range shared with Deep Space 9). Yamato has two interfaces enabled: wlan0, which connects to the AP and then to DS9, and br0, which is the bridge of the remaining interfaces (eth0 and eth1 for the cabled network segments – the latter I only bring up when I need more ports for work devices – and vde0 for the virtual networks). Here starts the problem: while a WiFi network is usually akin to a switched network, and of course my cabled segment is also switched, the two together are not switched but routed, by Yamato, which acts as a second router in the network.
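
For the record, the bridge side of this is nothing exotic; a manual sketch of it would look like the following (on Gentoo it normally lives in /etc/conf.d/net instead, and the address here is just an example inside the 172.28.0.0/16 range):

# build the bridge out of the cabled and virtual interfaces; wlan0
# stays out of it and keeps its own address towards DS9
brctl addbr br0
brctl addif br0 eth0
brctl addif br0 vde0
ifconfig br0 172.28.254.2 netmask 255.255.255.0 up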

Of course I built DS9 to reduce the load on Yamato (even though my original plan involved linking the cabled segment to it through another, very long, cable), so the services currently run mostly on DS9 rather than on Yamato: DHCP server, DNS server, UPnP server and so on. The problem is that almost all the “zeroconf” kind of services, which include not only Apple’s Bonjour protocol but UPnP and DHCP as well, use UDP and the broadcast address to look for their servers. And UDP broadcast only works within switched networks, not routed ones.

The obvious solution in these cases, which is more or less the only solution you’ll ever see proposed when people ask about broadcast repeaters, is to use bridging instead of routing to merge the two networks together; a switch is, after all, just a multi-port bridge, so the result is again a switched network. Unfortunately this brings two issues with it: the first is that you effectively lose the boundary between the two networks (even when that boundary is very transparent, as I’d like it to be, the filtering can still be useful for some things); the second is that bridging WLAN interfaces is complex and pretty much suboptimal.

The problem with bridging WLAN is that putting the network card in promiscuous mode is not enough: the access point by default only sends over the air the PDUs whose destination is an associated MAC address. And telling the access point to send all the PDUs might not be good either; while in my setup the problem is relatively small (the only two devices connected via Ethernet to DS9 are the AP and the Siemens VoIP phone — the Linux bridge software will still figure out to send only the VoIP phone’s traffic to its port and the rest to the AP), it doesn’t look like a very good long-term solution.

To solve part of the problem, at least the most common part of it, both ISC DHCP and Avahi provide support for transparently joining two routed networks that would otherwise be isolated: dhcrelay and Avahi’s reflector. The former is not just a simple repeater of DHCP requests: it also adds a “circuit-id” to the requests, so that requests coming from behind it are tagged and can be treated differently (this is how I treat the clients behind Yamato differently — of course those have to end up in a subnet that is routed through Yamato); the latter just picks up the service broadcasts and copies them to the various interfaces it listens on… but neither is perfect.

With dhcrelay the problem lies deep inside the way it has been implemented: it has to listen both on the interface the requests come from and on the one the responses come from… and it doesn’t discriminate between them. In the case of Yamato this means I have to listen on both br0 and wlan0, but then requests sent by the clients on WiFi will also reach the relay and be forwarded to DS9 through it; for this reason the “circuit-id” contains the interface the request came from, so I check that the id is br0, instead of just checking that it exists, before deciding how to divide up the clients. The alternative is using iptables to filter the requests from the wlan0 interface, but let’s leave that for a moment.
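
The invocation itself is nothing special; a sketch of it, with -a appending the relay agent options (circuit-id included) and an example address standing in for DS9:

# dhcrelay has to be given both interfaces, even though br0 is the only
# one I actually care about; 172.28.0.1 is a stand-in for DS9's address
dhcrelay -a -i br0 -i wlan0 172.28.0.1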

The problem with Avahi seems to be more of a bug, or rather an untested corner case: I have found no way to stop Linux from assigning link-local IPv6 addresses to the interfaces that are “up”; this unfortunately means that eth0, vde0 and br0 all have their own IPv6 address… so the broadcasts coming from wlan0 are reflected on all three of them, and all the clients connected to the cabled (or virtual) segment receive the broadcast twice. This wouldn’t be much of an issue if Apple’s computers didn’t decide to rename themselves to “Whatever (2)” when they feel somebody else is using their hostname on the network. I should speak with Lennart about it but I haven’t had time to deal with that just yet.
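
One workaround I haven’t properly tested would be to keep the kernel from bringing IPv6 up on the bridged ports at all, so that Avahi only ever sees br0 and wlan0; the disable_ipv6 sysctl should do that on recent enough kernels:

# untested sketch: no IPv6 (and thus no link-local address) on the
# interfaces that are only bridge ports anyway
sysctl -w net.ipv6.conf.eth0.disable_ipv6=1
sysctl -w net.ipv6.conf.vde0.disable_ipv6=1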

There remains a third protocol for which I have found no solution yet: UPnP. With UPnP the problem is relatively simple: SSDP uses UDP broadcasts on port 1900 to find the router before talking directly to it, so the only thing I’d need is a repeater on that particular port. The best solution to me would have been using iptables directly, but since that doesn’t seem to be implemented as far as I can see, I guess I’ll end up either writing my own UDP repeater or looking for something that works and is properly written. If somebody has a clue about that, I’d be happy to hear the solutions.
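
For what it’s worth, socat can probably be bent into a rough approximation of such a repeater; the following is only a sketch of the discovery direction (cabled segment towards DS9), with example addresses, and I haven’t verified that the replies find their way back to the PS3:

# very rough, unverified sketch: pick up SSDP datagrams arriving on br0
# and re-send them towards DS9 on the same port; a real repeater would
# also need to handle the return path and the periodic NOTIFY messages
socat UDP4-RECVFROM:1900,ip-add-membership=239.255.255.250:br0,reuseaddr,fork \
      UDP4-SENDTO:172.28.0.1:1900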

Interestingly enough, during my analysis UPnP proved to be the only protocol I’m interested in that could actually just be re-broadcast by a generic repeater; for DHCP, I need to discern the proxied requests so I can assign them to properly routed subnetworks; for Bonjour, the port wouldn’t even be free for a repeater, since Avahi itself would be using it to begin with.

So, bottom line, I have three needs that somebody might want to help me with: get a better dhcrelay (the current implementation sucks in more ways than a few, starting with not being able to specify which is the input and which the output interface, and the lack of a configurable circuit-id string); fix the Avahi IPv6 reflector over bridged networks, although I have no idea how (alternative: find a way to tell Linux/OpenRC not to assign a link-local IPv6 address to the interfaces); and write a generic UDP broadcast repeater so that UPnP can work on a routed network — the last one is what I’ll probably work on tomorrow, so I can get the PS3’s ports forwarded by DS9.

Preparing for the new tinderflame run

So the first run of my tinderbox completed. And now I’m preparing for some further sweeps to take care of what the first run ignored. This includes packages needing kernel sources, packages with USE dependencies that are not expressed in EAPI 2 yet and so on.

The new run will still build on the current system disks, since after thinking about it I cannot really reuse the hardware I already have, and getting to a proper solution would mean buying extra hardware: either an external eSATA enclosure with a port multiplier, or a new PCI-E SATA controller, plus new disks. Since I haven’t finished paying for Yamato yet (for those interested in the bookkeeping, complete with the SDMC board for the IPMI stuff it cost me around €2300; the funds I’ve raised up to today cover around €500), I’d rather not go buy new hardware, so the build will run on the system disks, and I’ll see about setting up smartd so that it tests them at least once a week.
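
I haven’t settled on the smartd configuration yet; as a stopgap, even just scheduling long self-tests from cron and checking the results afterwards would do (the device names below are placeholders for the actual system disks):

# weekly from cron: kick off a long SMART self-test on each system disk,
# then look at the results and the health summary some hours later
smartctl -t long /dev/sda
smartctl -t long /dev/sdb
smartctl -l selftest -H /dev/sda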

I’m not going to test for proper -ggdb use though, since that would require much more disk space than I have right now, so that is still postponed to a different run. I am probably going to check for pre-stripped files though; I’ll have to archive the older logs to check those against what has already been reported on Bugzilla.

So what remains to be said? Well that I’m really not feeling too well lately, and I spend my nights working till exhaustion because I’m afraid of my own sleep. Which is bad for me but very good for Gentoo, I guess…

The disk problem

My first full tree build with dependencies, to check for --as-needed support, has almost finished. I currently have 1344 bugs open in the “My Bugs” search, which contains reports for packages failing to build or breaking with --as-needed, packages failing to build for other reasons, packages with file collisions that lack blockers around them (there are quite a lot, even between totally unrelated packages), and packages bundling internal copies of libraries such as zlib, expat, libjpeg, libpng and so on.

I can tell you, the number of packages in the tree not following policies such as respecting the user’s LDFLAGS, not using bundled libraries, and not installing stuff randomly in /usr is much higher than one might hope.

I haven’t even started filing bugs for pre-stripped packages, since I have to check whether those were already filed, either by me in a previous run, by Patrick with his tinderbox, or by other people as well. I also wanted to check this against a different problem: packages installing useless debug info when using splitdebug, by not passing -g/-ggdb properly to the build system and thus not including debug information at all. Unfortunately for this one I need much more free space than I have right now on Yamato. And here begin my disk problems.

The first problem is space: I allocated 75GB for the chroots partition, which uses XFS, after extending it a lot; with a lot of packages still missing, I’m already reaching into the last 20GB free. I’ll have to extend it further, but to do that I have to get rid of the music and video partitions after moving them to the external drive that Iomega replaced for me (now running RAID1 rather than JBOD, and HFS+ since I want to share it with the laptop in case I need the data while Yamato is off). I will also have to get rid of the Time Machine volume I created in my LVM volume group and start sharing the copy on the external drive; I had set that up so that the laptop was still backed up while I waited for the replacement disk.

The distfiles directory has reached over 61GB of data, and this does not even include most of the fetch-restricted packages. Of course I already share it between Yamato’s system and all the chroots (by the way, I currently have it at /var/portage/distfiles, but I’m considering moving it to /var/cache/portage/distfiles since that seems to make more sense; maybe I should propose this as the actual default in the future, as using /usr for this does not sound kosher to me), just like I share the actual synced tree. Still, it is a huge amount of data.
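
The move itself would be trivial; a sketch of it (make.conf is just shell syntax, so the DISTDIR line goes there, and the paths are the ones discussed above):

# move the data, then point Portage at the new location
mkdir -p /var/cache/portage
mv /var/portage/distfiles /var/cache/portage/distfiles
# and in /etc/make.conf:
# DISTDIR="/var/cache/portage/distfiles"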

Also, I’m not using in-RAM builds, even though I have 16GB of memory in this box. There are multiple reasons for this. The first is that I leave the build running even when I’m doing something else which might require RAM by itself, and I don’t want the two to disrupt each other too easily; also, I often go away to watch movies, play games or do something else while it builds, so I may only look back at a build a day later, and sometimes colleagues ask me to look at a particular build that happened a few days earlier. Having the build trees on disk helps me a lot here, especially for the epatch, eautoreconf and econf logs.

Another reason is that the ELF scanning done by scanelf is based on memory-mapped files, which is very nice when you have to run a series of scanelf calls on the same set of files, since the first run will cache them all in memory and the following ones will just have to traverse the filesystem to find them. So I want to keep as much memory free as I can.

So in the end the disks get used a lot, which is not very nice, especially since they are the disks that host the whole system for now. I’m starting to fear for their health, and I’m looking for a solution, which does not seem to be too obvious.

First of all, I don’t want to go buying more disks; if possible I’d rather not buy any new hardware for now, since I haven’t finished paying for Yamato yet (even though quite a few users contributed, whom I thank once again; I hope they’re happy to know what Yamato’s horsepower is being used for!), so any solution has to be realisable with what I already have in house, or needs to be funded somehow.

Second, speed is not much of an issue, although it cannot be entirely ignored; the build reached sys-power today at around 6pm, and it started last Friday, so I have to assume that a full build, minus KDE4, is going to take around ten days. This is not optimal yet: since kde-base makes the run rebuild the same packages over and over, switching between modular and monolithic, the solution would be to use binpkgs to cache the rebuilds, which would be especially useful to avoid rebuilds after collision-protect failures and after packages get unmerged due to blockers, but that’s going to slow down the build a notch. I haven’t used ccache either; I guess I could have, but I’d have to change the cache directory to avoid clobbering the cache I use for my own projects.
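
Should I go down that road, enabling both in the chroot would be a matter of a few make.conf lines; a sketch, with example directories rather than the ones I’d actually use:

# cache binary packages and compiler output for the rebuild-heavy parts
FEATURES="${FEATURES} buildpkg ccache"
PKGDIR="/var/portage/packages"
CCACHE_DIR="/var/tmp/ccache-tinderbox"
# the cached binaries then get reused by passing --usepkg (-k) to emerge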

So what is my current available hardware?

  • two Samsung SATA (I) disks, 160GB each; they were the original disks I bought for Enterprise, and currently one is in Farragut (which is lacking a PSU and a SATA controller, after I turned it off last year) and one in Klothos (the Sun Ultra 5 running Gentoo/FreeBSD);
  • one Maxtor 80GB EIDE disk;
  • one Samsung 40GB EIDE disk;
  • just one free SATA port on Yamato’s motherboard;
  • a Promise SATA (I) PCI controller;
  • no free PCI slots on Yamato;
  • one free PCI-E x16 slot;

The most logical solution would be to harness the two Samsung SATA disks in a software RAID0 array and use it as /var/tmp, but I don’t have enough SATA ports; I could set up the two EIDE drives instead, but they are not the same size, so the RAID0 would be restricted by the 40GB size of the smallest one, which may still be something, since the as-needed chroot’s /var/tmp is currently 11GB.
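
Just to give an idea of what I mean, the SATA variant would be along these lines (device names are placeholders, and the filesystem choice is not the point here):

# stripe the two disks together and use the result as the build area
mdadm --create /dev/md0 --level=0 --raid-devices=2 /dev/sdc /dev/sdd
mkfs.xfs /dev/md0
mount /dev/md0 /var/tmp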

Does anybody know of a better solution to my problems? Maybe I should be using external drive enclosures or looking for a small network-attached storage system, but those are things I don’t have available, and I’d rather not go buy them until I’ve finished paying for Yamato. By itself, Yamato has enough space and power to handle more disks; I guess I could use a SATA port multiplier too, but I don’t really know about their performance, or brands, or anything, and again that would require buying more hardware.

If I ever have enough money, I’m going to consider running gigabit network cabling to my garage and setting up a SAN there with Enterprise or some other box and a lot of HDDs, serving them through ZFS+iSCSI or something. For now, that’s a mere dream.

Anyway, suggestions, advice and help on how to reorganise the disks are very welcome!

Building the whole portage

Since my blog post about forced --as-needed yesterday, I have started building the whole Portage tree in a chroot to see how many packages break with forced --as-needed at build time. While build-time failures are not the only problem here (for stuff like Python, Perl and Ruby packages, the failure may well be at runtime), build-time failures are probably the most common problem with --as-needed; and if we also used --no-undefined, like Robert Wohlrab is trying (maybe a little too enthusiastically), most of the failures would show up at build time anyway.
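
For clarity, “forced --as-needed” in that chroot simply means the flag is always part of the linker flags; a minimal make.conf sketch of it (the flag could just as well be injected through the profile or a compiler wrapper instead):

# make every link in the chroot go through --as-needed
LDFLAGS="-Wl,--as-needed"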

As usual, testing one case also means adding tests for other side cases; this time I’ve added further checks to tell me whether packages install files in /usr/man, /usr/info, /usr/locale, /usr/doc, /usr/X11R6, … and I have filed quite a few bugs about that already. But even without counting these problems, the run started telling me some interesting things that I might add to the --as-needed fixing guide when I get back to work on it (maybe even this very evening).
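
The checks themselves are nothing fancy; a minimal sketch of the idea, as a phase hook in /etc/portage/bashrc looking for the paths listed above inside the image directory (the checks I actually run are a bit longer than this):

post_src_install() {
    local dir
    for dir in usr/man usr/info usr/locale usr/doc usr/X11R6; do
        if [[ -d "${D}/${dir}" ]]; then
            ewarn "QA: files installed in /${dir}"
        fi
    done
}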

I already knew that most of the failures I’d receive would be related to packages that lack a proper build system and have thus ignored LDFLAGS up to now (including --as-needed), but there are a few notes that get really interesting here: custom ./configure scripts seem to almost always ignore LDFLAGS and yet fail to link packages properly; a few glib-based packages fail to link the main executable against libgthread, failing to find g_thread_init(); and a lot of packages link the wrong OpenSSL library (they link libssl when they should link libcrypto).

This last note, about OpenSSL libraries, is also a very nice and useful example of how --as-needed helps users in two main ways. Let’s go over a scenario where a package links in libssl instead of libcrypto (since libssl requires libcrypto, the symbols are satisfied if the link is done without --as-needed).

First point: ABI changes. If libssl changed its ABI (it happens sometimes, you know…) but libcrypto kept the same one, the program would require a useless rebuild: it’s not affected by libssl’s ABI, only by libcrypto’s.

Second point, maybe even more relevant at runtime: when executing the program, the libraries listed in its NEEDED entries are loaded, recursively. While libssl is not too big, it would still mean loading one further library that the program does not need, since libcrypto is the one actually needed. I sincerely don’t know whether libssl has any constructor functions, but when this extra load happens with libraries that have many more dependencies, or constructor functions, it’s going to be quite a huge hit for no good reason.
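
A quick way to see the difference for yourself, assuming a program whose objects only use libcrypto symbols: link it with and without the flag and compare the NEEDED entries (readelf comes with binutils, scanelf with pax-utils):

# without --as-needed both libraries end up in NEEDED; with it, only the
# one actually providing symbols (libcrypto) does
gcc -o example example.o -lssl -lcrypto
gcc -Wl,--as-needed -o example-clean example.o -lssl -lcrypto

readelf -d example | grep NEEDED
readelf -d example-clean | grep NEEDED
scanelf -n example example-clean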

At any rate, I wish to thank again all the people who contributed to paying for Yamato; as you can see, its horsepower is being put to good use (although I’m still only at app-portage). And just so I don’t have to stress it doing the same work over and over again, I can tell you that some of the checks I add to my build chroots are being added to Portage itself, thanks to Zac, who’s always blazingly fast at adding deeper QA and repoman warnings, so that further mistakes don’t creep into new code… one hopes, at least.

Oh, and before people start expecting cmake to be perfect with --as-needed, since no package using it has been reported as failing with --as-needed… well, the truth is that I can’t build any package using cmake in that chroot, since cmake itself fails to build because of xmlrpc-c. And don’t get me started again on why a build system has to use XML-RPC without giving the user a chance to say “not even in your dreams”.

More notes about the flags testing

Before entering the hospital in Verona, I wrote about a feasible way to check for CFLAGS use; in the past two days I decided to start testing my method over a wide range of packages, running buildpkg over each category (for packages either without build-time dependencies or with build-time dependencies already merged in the chroot). The results have been quite interesting.

Besides the fact that this method also allows me to identify ebuilds installing pre-stripped files, I’ve found quite a few packages failing in general, and a few with broken DEPEND (as in, missing stuff that’s needed at build time; a lot of those were caused by typos or trivial mistakes in the ebuild, which I fixed myself without even bothering to open a bug), but I also noticed one thing about my test itself.

The way I designed my test, it sets up the modified CFLAGS (injecting the marker symbol) during pre_src_compile (and now pre_src_configure for EAPI=2); the problem is that there are more than a couple of packages that do respect CFLAGS, but do so by setting them in stone inside the makefiles during src_unpack (and, nowadays, probably src_prepare).
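
For those who missed the earlier post, the hook side of it is roughly the following; I’m glossing over the actual marker and the later check, and -frecord-gcc-switches here (which needs a recent enough GCC) is purely a stand-in for whatever gets injected:

# sketch of the mechanism, in /etc/portage/bashrc: append the marker to
# the flags right before the build phases run
pre_src_compile() {
    export CFLAGS="${CFLAGS} -frecord-gcc-switches"
    export CXXFLAGS="${CXXFLAGS} -frecord-gcc-switches"
}
pre_src_configure() { pre_src_compile; }
# the resulting objects can then be inspected for the recorded switches,
# e.g. with readelf -p .GCC.command.line on the installed files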

While it’s not officially a mistake, I’d sincerely say this is not what’s intended; I’d expect CFLAGS to be used only during the configure/compile phases, not during the unpack/prepare phases, which should, in my opinion, not be system-dependent. For instance, it would be nice if one day we could run everything up to src_prepare once, and then build the package N times as needed by multilib dependencies.

Anyway, if you’re a Gentoo developer maintaining a package that does set the CFLAGS in stone during src_unpack, you’re most likely going to get a bug from me; I won’t be disappointed even if you close it, but really, don’t you think you can do better?

In general, for software that does not respect CFLAGS by default, you can work around it in many ways without resorting to the set-in-stone approach:

# This...
CFLAGS = -O2 -fomit-frame-pointer -Wall -Wextra -DSOMETHING -DOTHER
# may become
CFLAGS += -Wall -Wextra -DSOMETHING -DOTHER

# This...
gcc -O9 -funroll-all-loops -Ipath
# may become
gcc $(CFLAGS) -Ipath

and so on and so forth.

uClibc testing

As I said in What did Enterprise do?, I had (and have again) a series of chroots I use for testing particular setups; I have, for instance, one running OpenPAM that can tell me whether the software in the tree has the proper dependencies (either sys-libs/pam if it wants Linux-PAM, or virtual/pam if it works with OpenPAM).

Since yesterday, thanks to solar, I have a new addition to my testing rig: a uClibc chroot. I asked solar to get me something I could download and run locally, as I had to fix a bug with PAM, which is now fixed.

I have to say that I don’t know much yet about the setup of uClibc itself, which means I haven’t yet come to understand well how iconv is supported in it. Certainly I now know that once USE-based dependencies are available in the tree, I’ll try once again to see whether libiconv can be used for something other than Gentoo/FreeBSD (but the collision with man-pages should be solved before that, if it isn’t already).

Even though I know solar does not really wish for me to mess with NLS and uClibc, I find it a pretty important part of the Gentoo/FreeBSD work, and I always have; the reason is that it’s easier to fix something the right way when you have more than one alternative case. Otherwise you might end up special-casing something that should be made generic instead.

I also expect Gentoo/FreeBSD 7 to be coming up, and that will probably mean my return to that too, now that I can get a VirtualBox VM running at a decent speed.

But I haven’t even started bringing the uClibc chroot up to speed with what I want to do; in particular I want to set up my cowstats script on it too, and maybe one day add the flags-testing script as well (which is unfortunately disruptive).

All in all, I hope that having a uClibc chroot around will help the packages I maintain work out of the box on uClibc; it’s going to be a pretty interesting task.