Nitpicking and improving

Patrick has written about broken package homepages, and called for other OCD developers to look after them. While I hope I'm not suffering from OCD myself, he makes a very good point about making sure that the HOMEPAGE variable is correct, and I think this is one of the things we have been overlooking for too long already.

The first problem is obviously that users are shown the wrong homepage, and so are developers who try to push changes upstream. But there is another consideration, which I noticed the other day while trying to push some changes upstream. A homepage had disappeared from the net: the domain expired, maybe the upstream author lost interest in free software or simply changed domain; honestly, I didn't go looking for the right homepage. The important part is that the domain was gone, and a domain squatter's parking site had replaced it.

Can you see the problem yet? HOMEPAGE values are duplicated many times: on ViewVC, on all our mirrors that make the ebuilds available over HTTP as well as rsync, and, even more importantly, they are linked from Packages, which means that domain squatters get inbound links from a supposedly reliable source. If we don't already, I guess we should set nofollow on those links, but that's not ideal either: valid homepages should be indexed properly, for the sake of the people who are actually looking for a piece of software's homepage and the like.
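A possible middle ground is to add nofollow only to homepages that have not been verified recently. Here is a minimal Python sketch of rendering such a link; the `trusted` flag, and however it would be computed, is a hypothetical input and not something Packages is known to have.

```python
from html import escape


def homepage_link(url: str, trusted: bool) -> str:
    """Render an <a> tag for a package HOMEPAGE.

    `trusted` is hypothetical: True when the homepage was checked recently,
    False when it may have lapsed to a domain squatter.
    """
    rel = "" if trusted else ' rel="nofollow"'  # only unverified links lose ranking
    href = escape(url, quote=True)              # URLs may contain & or quotes
    return f'<a href="{href}"{rel}>{escape(url)}</a>'
```

This way valid homepages keep passing on their ranking, while a lapsed domain gets nothing until someone confirms it.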

But this is not the only "nitpicking" that actually affects users. There are more things that most developers tend to overlook over time, but that should be taken care of so that users have the best possible experience. For instance, think of packages that download files from sites like kernel.org or from the Debian, Ubuntu or Fedora download servers; those should use the special mirror:// protocol, which looks up a list of possible mirrors in the thirdpartymirrors file and also lets users override the choice with their own preferred mirror for that particular archive (I, for instance, set a series of preferred mirrors in Italy for the most used archives). Unfortunately, not all packages actually do this; instead they point at the main server, which is probably already overloaded, to download the files by default.
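To make the mirror:// mechanism concrete, here is a simplified Python model of how such a URI could be expanded into candidate download URLs. It assumes thirdpartymirrors lines made of a mirror name followed by whitespace-separated base URLs, and models the user's preferences as a hypothetical dict of base URLs tried first. Portage's real fetch logic does more than this, and the mirror hosts below are placeholders.

```python
from __future__ import annotations


def parse_mirrors(text: str) -> dict[str, list[str]]:
    """Parse 'name url1 url2 ...' lines, skipping blank lines and # comments."""
    mirrors: dict[str, list[str]] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        name, *urls = line.split()
        mirrors[name] = urls
    return mirrors


def expand(uri: str, mirrors: dict[str, list[str]],
           preferred: dict[str, list[str]] | None = None) -> list[str]:
    """Expand mirror://name/path into candidate URLs, user preferences first."""
    prefix = "mirror://"
    if not uri.startswith(prefix):
        return [uri]  # plain URLs are fetched as-is
    name, _, path = uri[len(prefix):].partition("/")
    user = (preferred or {}).get(name, [])
    # dict.fromkeys keeps the order while dropping bases listed in both places
    bases = list(dict.fromkeys(user + mirrors.get(name, [])))
    return [base.rstrip("/") + "/" + path for base in bases]


SAMPLE = """
# name     base URLs (placeholders)
kernel     http://mirror-a.example/pub http://mirror-b.example/pub/
"""

if __name__ == "__main__":
    table = parse_mirrors(SAMPLE)
    for url in expand("mirror://kernel/linux/foo-1.0.tar.bz2", table,
                      {"kernel": ["http://mirror-it.example/kernel"]}):
        print(url)
```

A package that hardcodes the first server's URL instead gets a single candidate, with no way for the user to steer it elsewhere.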

It is nitpicking because most users will get the source files from Gentoo mirrors rather than from upstream sites, but there are times when the upstream mirrors come into play: for instance, when you want to bump a package before it enters Portage, and thus the Gentoo mirrors; when the bumped package lives in an overlay; or when you need an older version that is no longer available on our mirrors.

Want more nitpicking? Cleaning up FILESDIR is what you are looking for. You probably know that the tree hosts not only our ebuilds but also patch files, init scripts and default configuration files; this allows developers to make changes available in no time, but it has obvious downsides: the more patches are added to the tree, the more data users have to download, even for packages they are not interested in. While we could work on a different strategy for distributing the tree, in the meantime we can at least try not to overload users. To help with this, repoman already warns you if you're committing to a package that has files bigger than 20K in FILESDIR; unfortunately, there are a number of files in the tree that are bigger, sometimes much bigger. Yesterday I estimated that there were between 1.5 and 2MB of big files in the CVS tree that were not supposed to be there.
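Something like the following Python sketch could list the offenders across a whole tree checkout. It assumes the usual category/package/files/ layout, reads repoman's "20K" as 20 KiB (the exact threshold repoman applies is an assumption here), and skips CVS metadata directories.

```python
import os
from pathlib import Path

LIMIT = 20 * 1024  # assumption: repoman's "20K" taken as 20 KiB


def oversized_files(tree: Path, limit: int = LIMIT):
    """Yield (path, size) for files inside any files/ directory above limit."""
    for dirpath, dirnames, filenames in os.walk(tree):
        dirnames[:] = [d for d in dirnames if d != "CVS"]  # skip CVS metadata
        if "files" not in Path(dirpath).relative_to(tree).parts:
            continue  # only FILESDIR contents (and its subdirectories) count
        for name in filenames:
            path = Path(dirpath, name)
            size = path.stat().st_size
            if size > limit:
                yield path, size
```

Summing the sizes, e.g. `sum(size for _, size in oversized_files(Path("/usr/portage")))` with the path adjusted to wherever your tree lives, gives a rough figure comparable to the estimate above.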

While developers are not supposed to overlook these things when they add new files, there are subtle causes to consider; one of them is patching Makefile.in and configure files together with Makefile.am and configure.ac. While Mart dislikes my idea of disallowing that, it would certainly reduce the risk of having huge files you don't need in your downloaded tree.
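A reviewer, or a pre-commit hook, could catch this with a rough check like the Python sketch below, which scans a unified diff for target files named Makefile.in or configure. It is only a heuristic: it looks at the `+++` header lines, and the set of generated file names is limited to the two mentioned above.

```python
from __future__ import annotations

import re

# Generated autotools outputs named in the text; extend as needed.
GENERATED = {"Makefile.in", "configure"}

# "+++ b/path<TAB>timestamp": capture the path up to the first whitespace.
HEADER = re.compile(r"^\+\+\+ (\S+)")


def generated_targets(patch_text: str) -> list[str]:
    """Return target paths in a unified diff that are generated autotools files."""
    hits = []
    for line in patch_text.splitlines():
        m = HEADER.match(line)
        if m and m.group(1).rsplit("/", 1)[-1] in GENERATED:
            hits.append(m.group(1))
    return hits
```

A patch that touches only Makefile.am and configure.ac returns an empty list, and the package can run eautoreconf instead of shipping the regenerated files.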

But huge files are not the only problem: there are also packages that have a big FILESDIR overall because they have many small files (remember that for most users each file takes at least 4KiB, since not everybody uses filesystems optimised for small files). And then there are duplicated files, which I really have a beef with, since most of the time you don't need duplicated files at all. It's understandable (although it really screams HACK!) for KDE 3 ebuilds, since every patch lives in both the split and the monolithic ebuild, but identical files duplicated between versions of other packages are not understandable at all (somebody said apcupsd?).
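Both costs are easy to measure. The Python sketch below rounds each file up to whole 4 KiB blocks, following the rule of thumb above (real usage depends on the filesystem), and groups byte-identical files by their SHA-256 digest.

```python
import hashlib
import math
from collections import defaultdict
from pathlib import Path

BLOCK = 4096  # assumption: 4 KiB blocks, as in the text's rule of thumb


def on_disk(size: int, block: int = BLOCK) -> int:
    """Bytes a file occupies if every file takes whole blocks (at least one)."""
    return max(1, math.ceil(size / block)) * block


def duplicates(filesdir: Path) -> list[list[Path]]:
    """Group files under filesdir that have identical contents (by SHA-256)."""
    groups: dict[str, list[Path]] = defaultdict(list)
    for path in sorted(filesdir.rglob("*")):
        if path.is_file():
            groups[hashlib.sha256(path.read_bytes()).hexdigest()].append(path)
    return [paths for paths in groups.values() if len(paths) > 1]
```

For example, ten 500-byte patches hold 5000 bytes of content but, under this model, `sum(on_disk(500) for _ in range(10))` is 40960 bytes on disk; any group returned by `duplicates()` is a candidate for keeping a single copy.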

Then there is the need to push more and more patches upstream: getting the various upstream projects to accept them, reworking them if needed, and so on (I honestly think that git and other distributed VCSs help a lot here, since we can keep official trees with patches that simply get merged, which also lets us credit the users who submitted them when needed). It is a very long and boring task, but also a very important one to reduce the overhead that Gentoo adds as a distribution.

Now, I know that everything I wrote about this time is details, but when you multiply these details across a massive scale you get quality improvements that actually matter. Writing this down also gave me more than a couple of ideas that I should probably flesh out and push for in Gentoo; I'll note them in FreeMind so I can get back to them tomorrow or, more likely, next week.

Also, if you want to reach me, you might not be able to until next week: this evening a friend needs help with some hardware; tomorrow (Friday) is my birthday and I'll go out in the evening and night; on Saturday I'll probably have a sort of party with family; and on Sunday I'll be watching a play directed by a friend of mine. I really need time to recharge my batteries, since lately I've been feeling blue a bit too easily.
