I wouldn’t want to be a mirror admin

I noted in my previous post that I recently built a 12TB storage server; half a terabyte has already been reserved for Gentoo’s distfiles, both as a local mirror to update my boxes without having to re-download everything, and because the tinderboxes require a lot of distfiles by definition (since they build the whole tree).

The original way I used to download all the files was to simply pass the whole list of files to emerge -Of, running in parallel with the tinderbox process. Unfortunately this has proven to be of limited reliability. In particular, due to REQUIRED_USE, there are situations where a newly-introduced requirement will cause the fetch not to behave, which slows down tinderboxing of new packages while the new files are fetched. Plus, if the tinderbox masked all the versions of a particular package (which it can do when no version of said package builds in its environment), and I passed it to emerge -f, it wouldn’t fetch anything at all. You can’t really run a single emerge -f command anyway, as the command line arguments limit is hit long before the list ends, so xargs splits it into multiple, serial calls. And as a final straw, whenever the tinderbox has to fall back to an older version of a package, it’ll have to find that distfile as well, which might not be in its cache already.
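To make the xargs behaviour concrete, here is a minimal sketch of how a long argument list ends up split into several serial invocations. The function name batch_args and the max_bytes budget are made up for illustration; real xargs derives its limit from the system and the environment, which this does not try to reproduce.

```python
def batch_args(args, max_bytes):
    """Split args into consecutive batches that each fit within max_bytes.

    max_bytes is a hypothetical budget standing in for the real
    command-line length limit; each argument is counted with one extra
    byte for its terminating NUL.
    """
    batches, current, used = [], [], 0
    for arg in args:
        size = len(arg.encode()) + 1
        # Start a new batch when adding this argument would overflow.
        if current and used + size > max_bytes:
            batches.append(current)
            current, used = [], 0
        current.append(arg)
        used += size
    if current:
        batches.append(current)
    return batches
```

Each batch would then become one emerge -Of call, run one after the other, which is why a problem in one batch only affects the arguments that happened to land in it.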

To solve all these issues and make good use of the new box that stores the data, Zac gave me the set of infra scripts that are used to manage distfiles; in particular the mirror-dist script, written by Brian a long time ago, is the one that takes care of fetching the packages from the upstream sources and adding them to the master mirror. Looking at its output I’m... honestly scared.

Let’s begin with the whole kernel.org issue: you probably already know that their master server was compromised and all the attached services have been disabled, including the network of mirrors for both kernel- and non-kernel-related software (among others, Linux-PAM is also hosted at kernel.org). Well, this means that all the upstream fetch URIs for those packages are unusable. Due to the nature of the mirror-dist script, it was obviously not going to fetch the packages out of Gentoo mirrors, so it was unable to fetch any package released on kernel.org until I asked Fabio to hack around it (I’m no good with Python) and get the packages from Gentoo mirrors first. Lovely.

There is a second condition outside of Gentoo’s control that is causing headaches here, and probably to our mirror admins as well. It hasn’t gotten as much coverage as the whole kernel.org issue, but the FSF found themselves not in compliance with the GPL, with respect to binutils, as some intermediate output was provided in the tarballs without the original sources used to generate it. So what did they decide to do? Revoke all the tarballs and replace them with a new release with new version numbers? No. Reissue them with an appended “a” noting it? No. They decided to simply rewrite all of them. Same filename, same URL, but different content. Congratulations for the headache you’re causing us!
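A tarball rewritten under the same name can at least be detected by comparing its digest against one recorded earlier, which is what the checksums in the Manifest files are for. What follows is a minimal sketch, assuming SHA-256 and two hypothetical helper names (sha256_of, matches_recorded); it is not taken from mirror-dist or Portage.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of the file at path, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_recorded(path, recorded_hex):
    """True if the file still has the digest that was recorded for it."""
    return sha256_of(path) == recorded_hex.lower()
```

A file that fails this check while keeping the same name and URL is exactly the situation described above: the only way to tell the two tarballs apart is by content.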

But kernel.org projects and GNU packages are definitely not the only packages that have trouble with fetching; a number of upstream repositories no longer allow packages to be downloaded, and this causes major headaches if you don’t want to rely on the Gentoo mirrors’ network.

It has been proposed many times before to fix the SRC_URI variable for the packages that point to unfetchable sources. I even opened a wishlist bug for repoman to check (with HTTP’s HEAD method) whether the file is available upstream or not (and the same goes for the homepage). Unfortunately I lack the Python skills to implement this and nobody else seems to be interested in it. I would have suggested this for GSoC, but... let’s not go there, please.
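For reference, the HEAD check itself is small. This is only a sketch using the Python standard library, not repoman code: the function name is_fetchable and the injectable opener parameter are assumptions made for illustration, and only http/https URIs are handled.

```python
import urllib.error
import urllib.request


def is_fetchable(uri, timeout=10, opener=urllib.request.urlopen):
    """Probe uri with an HTTP HEAD request.

    Returns True on a 2xx response, False on an error, and None for
    schemes this sketch does not handle (ftp://, mirror://, ...).
    The opener parameter is hypothetical; it exists so the check can
    be exercised without touching the network.
    """
    if not uri.startswith(("http://", "https://")):
        return None
    request = urllib.request.Request(uri, method="HEAD")
    try:
        with opener(request, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (urllib.error.URLError, OSError):
        # HTTP errors (404, 403, ...), DNS failures, timeouts.
        return False
```

A real check would also have to decide what to do about servers that reject HEAD outright, and about mirror:// URIs, which need to be expanded before they can be probed at all.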

But if you have the skills, and the time, having repoman check for the availability of the files before committing would be a godsend: it would have, for instance, prevented committing one system and two system-related packages in the past months without their respective patchsets. Well, if we were to also ban direct access to mirror://gentoo/ of course.

3 thoughts on “I wouldn’t want to be a mirror admin”

  1. I did a “quick hack”:https://gist.github.com/121…, you can grab the patch and check out example output (tested on x11-terms/, run with -vv). I want to see if this meets your need first before I move on to submit the patch and continue the work (only http/https URIs are checked so far). Please note I didn’t even know repoman before you posted about it, so you might want to point out any flaws. Also, the QA name “SRC_URI.fetcherror” might not be valid if there is a spec for that.


  2. binutils is a non-issue as FSF has forgiven any GPL violations. The real issue that hurts users is when package maintainers don’t check to make sure their package is fetchable (as you pointed out already, but a repoman check won’t help all cases); really, that list is always available at: http://dev.gentoo.org/distf… Occasionally, I whittle that list down to <5 myself, but it always seems to return to ~15.


  3. As of yesterday morning, when I last tried, binutils-2.21.1 was still not fetchable. Couldn’t get the file right no matter if I tried to download via ebuild or directly and add to /usr/portage/distfiles/. If it doesn’t resolve itself when I sync this afternoon I’ll be ‘bugging’ again.

