Let’s actually get some metadata!

This blog entry is intended as a technical proposal, so you might need some knowledge of how Gentoo works to follow it well. I hope to be clear enough that even non-technical users can read this, but I can’t guarantee it.

So you might or might not know (but if you are a Gentoo developer, and you don’t know, please leave your badge and CVS commit access on the table at the right of the door) that together with ebuilds, Manifests, ChangeLogs and the files/ subdirectory, a package’s directory usually contains a metadata.xml file. Right now this file contains very little information, although it’s very important information: which herd the package belongs to, who the maintainer or maintainers of the package are, and an optional long description that couldn’t fit in the DESCRIPTION variable of the ebuild. Optionally, you can provide the descriptions in different languages, although I see very little going on with that.

These are indeed metadata for the package, as they are data that describe the package. Good, then. Are there problems with this? I don’t think so.

So if we don’t have problems to solve we should be all set, right? No, I don’t think so. As Doug pointed out on #gentoo-dev a while ago, it’s very difficult to understand what generic USE flags do in a particular package. Sure, you can all imagine what the jpeg or png USE flags do in general, but what do they do in the particular case of CUPS, for instance? Do they allow jpeg or png files to be printed? Do they allow the HTTP interface to use jpeg or png files?

This is not limited to CUPS and those USE flags. What does the minimal USE flag do for libcdio? And what does it do for vcdimager? And for other packages? While the general sense of “minimal” is clear – it cuts down the stuff that the package installs just to install the core of it, like a library for instance – the specific behaviour of the flag on a given package might require looking at the package.

Doug proposed using use.local.desc more often, but the problem with that is that the file is already way too long, and you can’t really fit much explanation in it without making it very big. It’s also impossible to restrict the description of a USE flag to particular versions of a package.

Here is the problem, then: how can we represent enough information for this? Well, one way is to write extensive documentation for the package and add it to the Gentoo documentation. This works well for complex packages like, say, Apache, but it doesn’t scale well for simple packages that may not have documentation at all because they are not supposed to be directly used by users.

A possible solution is to consider the package-specific description of a USE flag as metadata of that package, and, well, write that information down in metadata.xml. I suppose the main problem here is that a lot of Gentoo developers despise XML in any form. While I actually hate any configuration file written in XML myself, they don’t even consider that it’s quite easy to use XML as an interchange format, to represent information that needs to be accessed through many different means.
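To make the idea a bit more concrete, here is a minimal sketch of what such a metadata.xml could look like and how a tool might read it. The `<use>` and `<flag name="...">` elements are assumptions made up for illustration, they are not part of the current DTD, and the herd name and descriptions are placeholders rather than the real behaviour of any package.

```python
import xml.etree.ElementTree as ET

# Hypothetical metadata.xml content. The <use> and <flag> elements are
# assumptions for illustration only; the herd name and the description
# texts are placeholders.
METADATA_XML = """\
<pkgmetadata>
  <herd>example-herd</herd>
  <use>
    <flag name="jpeg">Placeholder: what jpeg support means for this package</flag>
    <flag name="png">Placeholder: what png support means for this package</flag>
  </use>
</pkgmetadata>
"""


def local_use_descriptions(xml_text):
    """Return a dict mapping USE flag names to package-specific descriptions."""
    root = ET.fromstring(xml_text)
    descriptions = {}
    for flag in root.findall("./use/flag"):
        name = flag.get("name")
        if name:
            # Collapse whitespace so multi-line descriptions read cleanly.
            descriptions[name] = " ".join("".join(flag.itertext()).split())
    return descriptions


descriptions = local_use_descriptions(METADATA_XML)
for name, text in sorted(descriptions.items()):
    print(f"{name}: {text}")
```

The point of the sketch is only that the description lives next to the package it describes, in a format any tool can parse with a stock XML library.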

Anyway, this is just a proposal; feel free to comment if you want to say something about it. Please avoid XML bashing in general, although feel free to comment if you think that its use is inappropriate in this context.

I suppose that for this proposal to be accepted, there are three main obstacles to overcome: the first is modifying the DTD to allow some tags that document the USE flags; the second is deciding whether a flag documented in metadata.xml is enough to skip IUSE.invalid warnings (changing repoman if needed); and the third is implementing support for fetching the description of the USE flag in tools like ufed and similar.
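The second and third obstacles can be sketched roughly as follows. This is not how repoman or ufed work today; the function names, the `<use>`/`<flag>` elements and the sample data are all hypothetical, and the only assumption about behaviour is that a package-specific description should take precedence over the global one.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the global flag descriptions (placeholder text).
GLOBAL_USE_DESC = {"jpeg": "Placeholder: global description of jpeg"}

# Hypothetical metadata.xml using assumed <use>/<flag> elements.
METADATA_XML = """\
<pkgmetadata>
  <use>
    <flag name="minimal">Placeholder: what minimal drops in this package</flag>
  </use>
</pkgmetadata>
"""


def parse_local_flags(xml_text):
    """Read package-specific flag descriptions from the assumed elements."""
    root = ET.fromstring(xml_text)
    return {
        flag.get("name"): " ".join("".join(flag.itertext()).split())
        for flag in root.findall("./use/flag")
        if flag.get("name")
    }


def describe_flag(flag, local_desc, global_desc):
    """Prefer the package-specific description, fall back to the global one."""
    if flag in local_desc:
        return local_desc[flag]
    return global_desc.get(flag)  # None when the flag is documented nowhere


def undocumented_flags(iuse, local_desc, global_desc):
    """Flags a checker could still warn about: described neither locally nor globally."""
    return sorted(f for f in iuse if f not in local_desc and f not in global_desc)


local_desc = parse_local_flags(METADATA_XML)
print(describe_flag("minimal", local_desc, GLOBAL_USE_DESC))
print(undocumented_flags(["jpeg", "minimal", "foo"], local_desc, GLOBAL_USE_DESC))
```

With a lookup like this, a flag documented only in metadata.xml would no longer need to trigger a warning, and a front-end could show the more specific text whenever the package provides one.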

I just hope this doesn’t have to become a GLEP to actually be used.
