Pavel wrote an interesting comment on a previous post of mine which I think is worth quoting and considering a bit:
I wonder if Gentoo has ways to automate mundane tasks. A simple script runned monthly somewhere on gentoo infra could use, i.e. curl to test if HOMEPAGE returns 4xx errors, if it redirects, where to and possibly even file a bug automatically that the HOMEPAGE is wrong for the given package.
Same goes for patch upstream pushing … many projects are hosted on sourceforge, some on distributed VCSs. It could save time for developers to be able to send the patch to upstream via a simple shell interface (sourceforge: curl POST with using a generic gentoo sf account, say bugbot @ gentoo.org) ..
BTW, does Gentoo have tools that do automatic bug filing (into gentoo’s bugzilla) – i.e. based on some build tests (thinking now of, say, gcc 4.3 testing). If yes, it might be really cool if the compilation testing workload could be distributed among gentoo users in a sandboxed+no.user.interaction manner. Might for example, BOINC be used to distribute compile test workunits? These would contain some commands to create a clean virtual host and compilation instructions to test a package/compiler/anything. Maybe it’s a naive approach, might put lots of extra load on gentoo infra but overall would help gentoo move much faster. This might be a cool GSoC project.
There isn’t really much automation going on in Gentoo, although I know there are people working on an automated tinderboxing framework. This has been a goal of many people for years, I think, although most of the bugs filed through tinderboxing have come from me and Patrick, who recently rejoined as a developer. In my case, the bugs are not automated at all, since I file them one by one using Firefox and bug templates. The boring part is looking up metadata, I guess.
Now, I sincerely don’t want to focus yet on automating the big tasks of testing and checking; before reaching that point we have to automate smaller tasks, in my opinion. This starts, for instance, with allowing automatic or semi-automatic assignment of bugs, which is something Robin has been working on for a long time. But this has run into quite a few problems, and I sincerely think there are a few details one has to work on to make this process easier; some of these points were already described by Tiziano (dev-zero) some time ago, but I’m unable to find his blog post right now.
Please note that everything I’m going to write here is only my opinion and hasn’t been submitted to the gentoo-dev mailing list yet. I have a backlog of things to submit there, and I hope to be able to deal with them one by one in the next weeks.
The first problem is that currently the metadata information is scattered among different repositories: the metadata.xml files that we have in packages to tell us who maintains them and other information are in the actual tree (gentoo-x86 repository), the DTD for those is with the website data (gentoo-xml) and the herds.xml file that tells us who is maintaining the herd, as well as the herd’s email address (which is not always the same as the herd name, and which might or might not be the same as the bugzilla contact!), is in the “gentoo” repository by itself.
You can see how this is awkward by the fact that repoman actually fetches the metadata.dtd file each week to make sure it has a fresh copy; but when I proposed having a packaged version of the DTDs, Josh (nightmorph) answered that there is no point in packaging the DTDs because they want to be able to change them on a daily basis. Now, this already brings me to another problem, but I’ll try to keep some order first.
My first point here is that it makes sense to try consolidating the metadata lookup by at least moving herds.xml into the same repository as metadata.xml; this would make it possible to actually have a complete set of data in a single repository, since DTDs are not really needed for accessing it. This is something I still have to propose, and that has to be considered.
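To give an idea of what such a consolidated lookup could look like, here is a minimal sketch in Python. It assumes the usual layout of the two files (a <herd> element with <name> and <email> children in herds.xml, and <herd> plus <maintainer>/<email> elements in metadata.xml); the function names, the example-herd entry and the addresses are made up for illustration, and this is not how any existing Gentoo tool does it.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data; in practice both files would be read from
# the same checkout once herds.xml lives next to the tree.
HERDS_XML = """<herds>
  <herd><name>example-herd</name><email>example-herd@example.org</email></herd>
  <herd><name>secure-tunneling</name></herd>
</herds>"""

METADATA_XML = """<pkgmetadata>
  <herd>example-herd</herd>
  <maintainer><email>someone@example.org</email></maintainer>
</pkgmetadata>"""


def herd_emails(herds_xml):
    """Map each herd name to its email address (None when it has none)."""
    emails = {}
    for herd in ET.fromstring(herds_xml).findall("herd"):
        name = (herd.findtext("name") or "").strip()
        email = (herd.findtext("email") or "").strip()
        emails[name] = email or None
    return emails


def bug_contacts(metadata_xml, herds):
    """Collect the addresses a bug for this package could be assigned to."""
    root = ET.fromstring(metadata_xml)
    contacts = []
    # Explicit maintainers come first.
    for maintainer in root.findall("maintainer"):
        email = (maintainer.findtext("email") or "").strip()
        if email:
            contacts.append(email)
    # Then the herds, resolved through herds.xml.
    for herd in root.findall("herd"):
        email = herds.get((herd.text or "").strip())
        if email:
            contacts.append(email)
    return contacts


print(bug_contacts(METADATA_XML, herd_emails(HERDS_XML)))
```

Note that the lookup only needs the two XML files; the DTD never enters the picture.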
Now, to get down to the DTD issue that I’ve already brought up: leaving aside the GuideXML-related DTDs, for herds.xml you don’t expect the DTD to change on a daily basis. You actually want it to stay as stable as possible, which should make it much easier to ship a stable, packaged DTD for it. If we were to have that, repoman wouldn’t need to fetch the DTD every week; it would just use the copy that is already on the system, which would be much simpler. It would be up to the developers to actually have the updated copy, one guesses.
But let’s take a step further and return to something I think Tiziano already addressed: DTD syntax does not let us express the kind of structure that would allow us to access the data more reliably. Using a more advanced schema language like XML Schema or Relax-NG would make it much simpler to express constraints; I admit I’m not sure of this, but I think I remember that they also allow you to point to a document that lists the possible values for a field, which would make sure that any herd named in a <herd> tag in metadata.xml is actually present in herds.xml. It would then just be a matter of making sure that the schema is checked during CVS check-in, so that the data is always correctly formed.
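Whatever the schema language ends up being able to express, the cross-file check itself is simple enough to sketch in a few lines of Python. This is only an illustration under the same assumptions about the file layout as usual (<herd><name> in herds.xml, <herd> in metadata.xml); the function names and the no-such-herd entry are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data for illustration.
HERDS_XML = """<herds>
  <herd><name>secure-tunneling</name></herd>
</herds>"""

METADATA_XML = """<pkgmetadata>
  <herd>secure-tunneling</herd>
  <herd>no-such-herd</herd>
</pkgmetadata>"""


def known_herds(herds_xml):
    """Return the set of herd names declared in herds.xml."""
    root = ET.fromstring(herds_xml)
    return {(herd.findtext("name") or "").strip() for herd in root.findall("herd")}


def unknown_herds(metadata_xml, known):
    """Return the herds a metadata.xml refers to that herds.xml does not declare."""
    root = ET.fromstring(metadata_xml)
    names = [(herd.text or "").strip() for herd in root.findall("herd")]
    return [name for name in names if name not in known]


print(unknown_herds(METADATA_XML, known_herds(HERDS_XML)))
```

A check like this could run at commit time and refuse the check-in whenever the returned list is not empty.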
It’s even more problematic than this: currently there are herds that lack an email address, like secure-tunneling, which means that you cannot really find an address to assign bugs to for net-misc/strongswan, for instance (this is what happened to me last week). So we really have to crack down on our own data and make sure that the files are always valid and consistent.
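Finding the offending herds is again a matter of a few lines. The following Python sketch assumes that a herd without an address simply has no <email> element (or an empty one) in herds.xml; the sample data and the function name are made up, with secure-tunneling standing in for the case described above.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data: one herd with an address, one without.
HERDS_XML = """<herds>
  <herd><name>example-herd</name><email>example-herd@example.org</email></herd>
  <herd><name>secure-tunneling</name></herd>
</herds>"""


def herds_without_email(herds_xml):
    """List the herds that have no usable <email> in herds.xml."""
    missing = []
    for herd in ET.fromstring(herds_xml).findall("herd"):
        email = (herd.findtext("email") or "").strip()
        if not email:
            missing.append((herd.findtext("name") or "").strip())
    return missing


print(herds_without_email(HERDS_XML))
```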
I wanted to talk about other details too, but for now I’ll stop here; more posts on the topic will follow.