Patrick has written about broken package homepages, and called for other OCD developers to look after that. While I hope I’m not suffering from OCD myself, there is a very good point about making sure that the HOMEPAGE variable is correct, and I think this is one of the things we have been overlooking for too long already.
The first problem is obviously that users are pointed to the wrong homepage, as are developers who try to push changes upstream, but there is another consideration that I noticed the other day while looking to push some changes upstream. A homepage had disappeared from the net: the domain expired, maybe the upstream author lost interest in free software or simply changed domain; honestly, I didn’t go around looking for the right homepage. The important part is that the domain was no more, and a domain squatter’s parking site had replaced it.
Can you see the problem yet? HOMEPAGE values are duplicated many times on ViewVC as well as on all our mirrors that make the ebuilds available over HTTP as well as RSync, and even more importantly, they are linked on Packages, which means that domain squatters get inbound links from a supposedly reliable source. I guess that if we don’t already, we should set nofollow on the links, but that’s also not nice, since for the valid homepages indexing should actually happen properly, for the sake of the people who actually look for a piece of software’s homepage and similar.
But this is not the only “nitpicking” that actually has effects on users. There are more things that most developers overlook with time and that instead should be taken care of so that users have the best possible experience. For instance, think of packages that download stuff from sites like kernel.org or Debian, Ubuntu or Fedora download servers; those should use the mirror:// special protocol that looks up a list of possible mirrors from the thirdpartymirrors file and also allows users to override the choice with their own preferred mirror for that particular archive (I for instance set a series of preferred mirrors in Italy for the most used archives). Unfortunately, not all packages actually do this, and instead refer to the main server, probably overloaded already, to download the files by default.
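To make the mirror:// lookup a bit more concrete, here is a minimal Python sketch of the idea, not Portage’s actual code. It assumes the thirdpartymirrors file has one mirror group per line, the group name followed by its base URLs; the function names, the `preferred` dictionary standing in for the user’s override, and the example.org URLs are all made up for illustration.

```python
def parse_thirdpartymirrors(text):
    """Parse lines of the form 'name url1 url2 ...' into {name: [urls]}."""
    mirrors = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        name, *urls = line.split()
        mirrors[name] = urls
    return mirrors


def expand_mirror_uri(uri, mirrors, preferred=None):
    """Turn mirror://name/path into a list of candidate download URLs.

    `preferred` is a hypothetical stand-in for the user's own mirror
    choices; its entries are tried before the default list.
    """
    prefix = "mirror://"
    if not uri.startswith(prefix):
        return [uri]  # plain URL, nothing to expand
    name, _, path = uri[len(prefix):].partition("/")
    candidates = list((preferred or {}).get(name, [])) + mirrors.get(name, [])
    return [base.rstrip("/") + "/" + path for base in candidates]


if __name__ == "__main__":
    # Hypothetical mirror list, for illustration only.
    sample = "kernel https://mirror-a.example.org/pub https://mirror-b.example.org/pub/\n"
    table = parse_thirdpartymirrors(sample)
    mine = {"kernel": ["https://it.example.org/kernel"]}
    for url in expand_mirror_uri("mirror://kernel/linux/foo.tar.bz2", table, mine):
        print(url)
```

The point of the indirection is that the ebuild only names the mirror group, so the actual server can be chosen, or replaced, without touching the ebuild.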
It is nitpicking since most users will most likely get the source files out of Gentoo mirrors rather than the upstream sites, but there are some times when it’s useful to use the mirrors, for instance when you want to bump a package before it actually enters portage, and thus the Gentoo mirrors, or when the bumped package is in an overlay. Or when you need to get an older version that is no longer available on our mirrors.
Want more nitpicking? Cleaning up FILESDIR is what you are looking for. You probably know that the tree does not only host all our ebuilds but also patch files, init scripts and default configuration files; this allows the developers to make changes available in no time, but it has obvious downsides: the more patches are added to the tree, the more data users have to download, even for packages they are not interested in. While we could work on a different strategy to distribute the tree, we can try to avoid overloading users. To do this, repoman already warns you if you’re committing to a package that has files bigger than 20K in FILESDIR; unfortunately there are a number of files in the tree that are bigger, sometimes much bigger; I estimated that yesterday there were between 1.5 and 2MB of big files in the CVS tree which were not supposed to be there.
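For anybody wanting to get a similar estimate on their own checkout, a rough Python sketch follows. It is not what repoman does internally; it simply assumes that FILESDIR is the `files` subdirectory of each package directory and that “20K” means 20 KiB. The function name and the command line handling are my own.

```python
import os
import sys

SIZE_LIMIT = 20 * 1024  # assumption: the 20K threshold read as 20 KiB


def find_big_files(tree_root, limit=SIZE_LIMIT):
    """Return (path, size) pairs for files inside any 'files' directory
    under tree_root that are bigger than limit, largest first."""
    results = []
    for dirpath, _dirnames, filenames in os.walk(tree_root):
        parts = os.path.relpath(dirpath, tree_root).split(os.sep)
        if "files" not in parts:
            continue  # only look inside FILESDIR-like directories
        for name in filenames:
            path = os.path.join(dirpath, name)
            size = os.path.getsize(path)
            if size > limit:
                results.append((path, size))
    return sorted(results, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    root = sys.argv[1] if len(sys.argv) > 1 else "."
    big = find_big_files(root)
    for path, size in big:
        print(f"{size:>10}  {path}")
    print(f"total: {sum(size for _, size in big)} bytes in {len(big)} files")
```

Run against the root of a tree checkout, it lists the offenders and the total they add up to.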
While developers are not supposed to overlook these things when they add new files, there are subtle causes that have to be considered here; one of these is the patching of Makefile.in and configure files together with Makefile.am and configure.ac. While Mart dislikes my idea of not allowing that situation, it would certainly reduce the risks of having huge files in your downloaded tree that you don’t need.
But huge files are not the only problem; there are also packages that have, all in all, a big FILESDIR since they have a number of small files (remember that for most users, each file is going to take at least 4KiB, since not everybody uses filesystems properly optimised for small files). And there are also duplicated files, which are something that I really have a beef with, since most of the time you don’t need duplicated files at all. It’s understandable (although it really screams HACK!) for KDE 3 ebuilds, since every patch is living in both the split and the monolithic ebuild, but it’s not understandable for other packages if there are duplicated identical files between versions (somebody said apcupsd?).
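Both points are easy to check with a few lines of Python. The sketch below is only an illustration under a couple of assumptions: a 4KiB block size, every file taking at least one block, and “duplicated” meaning byte-for-byte identical content (compared through a SHA-256 digest). The function names are mine.

```python
import hashlib
import os
from collections import defaultdict

BLOCK_SIZE = 4096  # assumption: 4KiB filesystem blocks


def on_disk_size(size, block=BLOCK_SIZE):
    """Size rounded up to whole blocks, with a minimum of one block."""
    return max(1, -(-size // block)) * block


def find_duplicates(files_dir):
    """Return groups (sorted lists of paths) of files with identical content."""
    by_digest = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(files_dir):
        for name in filenames:
            path = os.path.join(dirpath, name)
            with open(path, "rb") as fh:
                digest = hashlib.sha256(fh.read()).hexdigest()
            by_digest[digest].append(path)
    return [sorted(paths) for paths in by_digest.values() if len(paths) > 1]


if __name__ == "__main__":
    import sys

    target = sys.argv[1] if len(sys.argv) > 1 else "files"
    for group in find_duplicates(target):
        wasted = sum(on_disk_size(os.path.getsize(p)) for p in group[1:])
        print(f"{len(group)} identical files, about {wasted} bytes wasted:")
        for path in group:
            print("   ", path)
```

Under these assumptions, ten 200-byte patches take 40KiB on disk rather than 2000 bytes, which is why a pile of small files matters as much as one huge one.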
Then there is the need to push more and more patches upstream, trying to get the various upstreams to accept our patches, reworking them if needed, and so on (I sincerely think that stuff like git and other distributed VCSs helps a lot in pushing the patches upstream, since we can have official trees with the patches that just get merged, also allowing us to attribute them to the users submitting them when needed), which is a very long and boring task, but also very important to reduce the overhead that Gentoo as a distribution adds.
Now, I know all the things I wrote about this time are details, but when you multiply these details on a massive scale you get quality improvements that actually matter. Writing this down also gave me more than a couple of ideas that I should probably flesh out to see whether to push them for Gentoo; I’ll write them down on FreeMind so I can get back to those tomorrow or, more likely, next week.
Also, if you want to reach me you might not be able to until next week; today I have a friend needing hardware help in the evening; tomorrow (Friday) is my birthday and I’ll go out in the evening/night, Saturday I’ll probably be having a sort of party with family, and Sunday I’ll be watching a play directed by a friend of mine. I really need time to recharge my batteries since lately I’m feeling blue a bit too easily.
I wonder if Gentoo has ways to automate mundane tasks. A simple script run monthly somewhere on gentoo infra could use, e.g., curl to test if HOMEPAGE returns 4xx errors, if it redirects, where to, and possibly even file a bug automatically that the HOMEPAGE is wrong for the given package. Same goes for pushing patches upstream … many projects are hosted on sourceforge, some on distributed VCSs. It could save time for developers to be able to send the patch upstream via a simple shell interface (sourceforge: curl POST using a generic gentoo sf account, say bugbot @ gentoo.org). BTW, does Gentoo have tools that do automatic bug filing (into gentoo’s bugzilla), e.g. based on some build tests (thinking now of, say, gcc 4.3 testing)? If yes, it might be really cool if the compilation testing workload could be distributed among gentoo users in a sandboxed+no.user.interaction manner. Might, for example, BOINC be used to distribute compile test workunits? These would contain some commands to create a clean virtual host and compilation instructions to test a package/compiler/anything. Maybe it’s a naive approach, and it might put lots of extra load on gentoo infra, but overall it would help gentoo move much faster. This might be a cool GSoC project. BTW, Happy Birthday!
> It’s understandable (although it really screams HACK!) for KDE 3 ebuilds, since every patch is living in both the split and the monolithic ebuild
This isn’t true, if the patch set feature is used. The problem is folks adding patch after patch even for minor issues, instead of accumulating and updating the patch set tarball every now and then. I didn’t add this feature for no reason.
Just wanted to wish you a happy birthday and good health for the next year. (-:
Happy birthday!
from me as well, good health and enjoy your birthday! thank you very much for your gentoo work.
Happy Birthday!
Happy birthday! And try to relax.
Thanks everybody 🙂 I didn’t want to jinx it so I’m only answering today; a few of the last birthdays at home ended up with me in hospital, and I’m happy to have avoided that this time :)
Carlo, I know about the patchset tarballs, but I guess sometimes you just want to fix an issue. Sure thing, for a _big_ patch like the one I found with repoman, it would have been *much* nicer to just rebuild the tarball straight away. For the rest, my way of handling it was piling up a few and then moving them away; easier for most of the users hitting the rebuild right away, and the others would just be fine.
Pavel, as I told you by email, I’ll see to address some of the issues you raised in a different blog post. The homepage issue can also be shuffled around to reduce the size of the problem, but there are other things that should change before we can go with some more automatic submission; it certainly has to be worked on, though.