Finding IDs to submit

I have written a lot about the hardware IDs, but I haven’t said much about submitting new entries to the upstream databases. Indeed, the package just mirrors the data that is collected by the USB and PCI databases that are managed by Stephen, Martin and Michal.

As an example, I’ll show you how I’ve been submitting the so-called Subsystem IDs for PCI devices from computers that I either own or fix up for customers and friends.

First off, you have to find a system or device whose subsystem IDs have not been submitted yet. Unfortunately, I don’t have any computer at hand that I haven’t already submitted to the database. But fear not: it so happens I had an interesting opening. I recently rented a server from OVH, as I’ve had some trouble with one of my production hosts lately, and I’m entertaining the idea of moving everything to a new server and service altogether. But that is a topic for a completely different time. In any case, let’s see what we can do about these IDs now that I have an interesting system at hand.

Although I don’t have physical access to the server to see what’s in it, OVH does tell me what hardware it runs: in particular, they tell me it’s an Intel D425KT board (yes, I got a Kimsufi Atom; I took the three-month lease for now and I’ll see if it performs decently enough), so that’s a start. Alternatively, I could have asked dmidecode, but I just don’t have it installed on that server right now.

The first step is to look at what lspci -v says:

00:00.0 Host bridge: Intel Corporation Atom Processor D4xx/D5xx/N4xx/N5xx DMI Bridge
        Subsystem: Intel Corporation Device 544b
        Flags: bus master, fast devsel, latency 0
        Capabilities: [e0] Vendor Specific Information: Len=08 <?>

This is of course only the first entry in the list, but it’s still something. You can see on the second line that it says “Subsystem: Intel Corporation Device 544b”: that means it knows the subsystem vendor (ID 8086, which I can tell you by heart; Intel had some fun picking it), but it doesn’t know the subsystem device. So that’s what we’re looking for: an unknown system! Time to compare with the output of lspci -vn, which does not resolve the IDs to names; we need the raw numbers to submit them to the PCI database. If you’re not registered there already, do register, as you need an account to submit anything to begin with.

00:00.0 0600: 8086:a000
        Subsystem: 8086:544b
        Flags: bus master, fast devsel, latency 0
        Capabilities: [e0] Vendor Specific Information: Len=08 <?>
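When a machine has dozens of devices, pulling the raw IDs out of lspci -vn by eye gets tedious. The following is a minimal sketch, not part of any existing tool, that assumes the output layout shown above: a device header line of the form “slot class: vendor:device”, followed by indented lines, one of which may contain “Subsystem: vvvv:dddd”. The function name parse_lspci_vn and the sample text are illustrative only.

```python
import re

# First line of each device block, e.g. "00:00.0 0600: 8086:a000 (rev 01)":
# slot, class code, then vendor:device
HEADER = re.compile(r"^(\S+) ([0-9a-f]{4}): ([0-9a-f]{4}):([0-9a-f]{4})")
# A subsystem line, whether on its own or under "Capabilities: [90] Subsystem:"
SUBSYSTEM = re.compile(r"Subsystem: ([0-9a-f]{4}):([0-9a-f]{4})")


def parse_lspci_vn(text):
    """Map each PCI slot to (vendor, device, subsystem) from `lspci -vn` text.

    The subsystem is a "vvvv:dddd" string, or None when none is reported.
    """
    devices = {}
    slot = None
    for line in text.splitlines():
        header = HEADER.match(line)
        if header:
            slot = header.group(1)
            devices[slot] = (header.group(3), header.group(4), None)
            continue
        subsystem = SUBSYSTEM.search(line)
        if subsystem and slot is not None:
            vendor, device, _ = devices[slot]
            devices[slot] = (vendor, device, ":".join(subsystem.groups()))
    return devices


SAMPLE = """\
00:00.0 0600: 8086:a000
        Subsystem: 8086:544b
        Flags: bus master, fast devsel, latency 0
"""
print(parse_lspci_vn(SAMPLE))  # {'00:00.0': ('8086', 'a000', '8086:544b')}
```

On a live system you could feed it subprocess.run(["lspci", "-vn"], capture_output=True, text=True).stdout instead of the sample string. Searching for “Subsystem:” anywhere in the line, rather than only at its start, is a deliberate choice so that subsystems reported under Capabilities are picked up too.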

Okay, so now we know that our first device is Intel’s (VID 8086) and has a000 as its device ID, which brings us to https://pci-ids.ucw.cz/read/PC/8086/a000; easy, isn’t it? At the end of the page there’s a list of the known subsystem IDs; pending submissions don’t show their name, but they do show up in the table with a darker gray background. All PCI ID entries are moderated by hand by the database’s maintainers. By the time you read this, the entry for my board will already be in, but right now it isn’t. In case it wasn’t obvious, I’m looking for an entry that reads 8086 544b (which is under “Subsystem” above).

The form requires just a few fields: the ID itself (8086 544b, with a space, not a colon) and a name. The Note field is for text that needs to end up in pci.ids itself, so in most cases it should be left empty. The Discussion field is for when you want to comment on how certain you are about your submission; for my laptop, for instance, we had some trouble with “Intel Corporation Device 0153”, which is now officially “3rd Gen Core Processor Thermal Subsystem”.
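Going from the raw lspci numbers to what the site wants is mechanical, so here is a small sketch of it. The URL pattern is the one from the device page linked above; the helper names submission_id and device_page are hypothetical, and nothing here talks to the site: the form itself is still filled in by hand.

```python
def submission_id(subsystem):
    """Turn lspci's "8086:544b" into the "8086 544b" form the web form expects."""
    vendor, device = subsystem.lower().split(":")
    return f"{vendor} {device}"


def device_page(vendor, device):
    """URL of the pci-ids.ucw.cz page listing a device's known subsystems."""
    return f"https://pci-ids.ucw.cz/read/PC/{vendor.lower()}/{device.lower()}"


print(device_page("8086", "a000"))  # https://pci-ids.ucw.cz/read/PC/8086/a000
print(submission_id("8086:544b"))   # 8086 544b
```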

The name I’m going to submit is “Desktop Board D425KT”, as that’s the format the other entry in the database for that device uses (okay, it actually uses DeskTop, but I’d rather not capitalize another T and see a kitten cry).

Now it’s time to go through all the other entries in the system. Yes, there are many of them, and most of the time the IDs are not assigned in the order of the PCI connections, so be careful. More interestingly, not all the subsystems are going to be listed on the same kind of line. Indeed, the third entry that I have is this:

00:1c.0 0604: 8086:27d0 (rev 01) (prog-if 00 [Normal decode])
        Flags: bus master, fast devsel, latency 0
        Bus: primary=00, secondary=01, subordinate=01, sec-latency=0
        I/O behind bridge: 00001000-00001fff
        Memory behind bridge: e0f00000-e12fffff
        Prefetchable memory behind bridge: 00000000e0000000-00000000e00fffff
        Capabilities: [40] Express Root Port (Slot+), MSI 00
        Capabilities: [80] MSI: Enable+ Count=1/1 Maskable- 64bit-
        Capabilities: [90] Subsystem: 8086:544b
        Capabilities: [a0] Power Management version 2
        Capabilities: [100] Virtual Channel
        Capabilities: [180] Root Complex Link
        Kernel driver in use: pcieport

Here the subsystem ID is listed under “Capabilities” instead, but it’s the same one. This is actually critical: if the subsystem does not match, it means that the device comes from a different component. For instance, if you’re building your own computer, the subsystems of the internal CPU devices and those of the motherboard will not match, as they come from different vendors. The same goes for add-on cards (PCI, PCI-E, AGP, …).
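Since a mismatching subsystem points at a different component, it can help to group slots by the subsystem they report before submitting anything. This is a hypothetical helper, sketched under the assumption that you already have a mapping from slot to “vvvv:dddd” subsystem string (for instance from parsing lspci -vn); the 1234:abcd entry is a made-up add-on card, not real data.

```python
from collections import defaultdict


def group_by_subsystem(subsystems):
    """Group PCI slots by the subsystem ID they report.

    `subsystems` maps slot -> "vvvv:dddd" (or None if no subsystem is reported).
    Slots sharing a key most likely come from the same board vendor.
    """
    groups = defaultdict(list)
    for slot, subsystem in sorted(subsystems.items()):
        groups[subsystem].append(slot)
    return dict(groups)


# The 1234:abcd entry is a made-up add-on card, for illustration only
example = {
    "00:00.0": "8086:544b",
    "00:1c.0": "8086:544b",
    "02:00.0": "1234:abcd",
}
for subsystem, slots in group_by_subsystem(example).items():
    print(subsystem, slots)
```

Any group other than the motherboard’s own deserves a second look before it is submitted under the board’s name.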

Sometimes a different subsystem also shows up on internal components that get different names from the motherboard itself. In this case, the Realtek network card on this motherboard reports a completely different ID, and I really don’t know how to submit it:

01:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8101E/RTL8102E PCI Express Fast Ethernet controller (rev 05)
        Subsystem: Intel Corporation Device d626
        Flags: bus master, fast devsel, latency 0, IRQ 44
        I/O ports at 1000 [size=256]
        Memory at e0004000 (64-bit, prefetchable) [size=4K]
        Memory at e0000000 (64-bit, prefetchable) [size=16K]
        Capabilities: [40] Power Management version 3
        Capabilities: [50] MSI: Enable+ Count=1/1 Maskable- 64bit+
        Capabilities: [70] Express Endpoint, MSI 01
        Capabilities: [b0] MSI-X: Enable- Count=4 Masked-
        Capabilities: [d0] Vital Product Data
        Capabilities: [100] Advanced Error Reporting
        Capabilities: [140] Virtual Channel
        Capabilities: [160] Device Serial Number 01-00-00-00-36-4c-e0-00
        Kernel driver in use: r8169

If for whatever reason you make a mistake, you can click on the “Discuss” link on the submitted entry and edit the name that you want to submit. I did make such a mistake myself while submitting the IDs for this board.

So those are the tricks; happy submitting!

Bloody upstream

Please note, this post is likely to be interpreted as a rant, and from one point of view it is. It’s mostly a general rant aimed at those upstreams that are generally impossible to talk into helping us distributions out.

The first one is the IEEE: you might remember that back in April I was troubled by their refusal to apply a permissive license to their OUI database, and that they actually stated they don’t allow redistribution of said database. A few weeks ago I had to bite the bullet and add both the OUI and the IAB databases to the hwids package that we’re using in Gentoo, so that we can use them in different software packages, including bluez and udev.

I’m trying not to bump the package as often as before, simply because the two new files make the package four times bigger, but I am updating the repository more often, so that I can see whether something changes that would be worth bumping sooner. And what I noticed is that the two files are managed very badly by IEEE.

At some point, while adding one entry to the OUI list, the charset of the file was screwed up, replacing the UTF-8 with mojibake; then somebody fixed it; then somebody decided that UTF-8 was too good for them and went back to pure ASCII, doing some near-equivalent replacements (although whoever changed ß to b could stand to learn some German); then somebody fixed it up again… then somebody broke it once more while adding an entry, another guy tried to go back to ASCII, and someone else fixed it up again.
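For those who haven’t met mojibake before: it typically happens when UTF-8 bytes are read back as if they were Latin-1 and then saved again. The sketch below reproduces that, and shows a common heuristic (not anything IEEE or the hwids package actually uses) to flag a damaged string. “Müller” is a made-up sample, not an entry from the file, and the check can misfire on text that legitimately contains sequences like “Ã©”.

```python
def looks_like_mojibake(text):
    """Heuristic check for UTF-8 text that was mis-decoded as Latin-1.

    If the string survives a Latin-1 encode followed by a UTF-8 decode and
    comes out different, it most likely contained double-encoded sequences.
    """
    try:
        repaired = text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return False
    return repaired != text


# How the damage happens: UTF-8 bytes read back as if they were Latin-1
broken = "Müller".encode("utf-8").decode("latin-1")
print(broken)                          # MÃ¼ller
print(looks_like_mojibake(broken))     # True
print(looks_like_mojibake("Müller"))   # False
```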

How much noise does this add to the history of the file? Lots. I really wish they wrote a decent application to manage those databases, so they don’t break them every other time they have to add something to the list.

The other upstream is Blender. You probably remember I was complaining about their multi-level bundling and the fact that license information is missing for at least one of the bundled libraries. Well, we now have another problem: I was working on the bump to 2.65, but now either I go back to bundling Bullet, or I have to patch it because they added new APIs to the library.

So right now we have in tree a package that:

  • we need to patch to be able to build against a modern version of libav;
  • we need to patch to make sure it doesn’t crash;
  • we need to patch to make it use over half a dozen system libraries that it otherwise bundles;
  • we need to patch to avoid it becoming a security nightmare for users by auto-executing scripts in downloaded files;
  • bundles libraries with unclear licensing terms;
  • has two build systems, with different features available, neither of which is really suitable for a distribution.

Honestly, I’ve reached a point where I’m considering p.masking the package for removal and dealing with the consequences, rather than dealing with Blender. I know it has quite a few users, especially in Gentoo, but if upstream is unwilling to work with us to make it fit properly, I’d like users to talk to them so that they get their act together. Debian is also suffering from issues related to the libav updates and the like, without even going into the licensing issues.

So if you have contacts with Blender developers, please ask them to actually start reducing the number of bundled libraries, decide which of the two build systems we should be using, and possibly start clearing up the licensing terms of the package as a whole (including the libraries!). Unfortunately, I’d expect them not to listen, until maybe distributions as a whole decide to drop Blender for these same reasons, making them question the sanity of their development model.

I’m doing it for you

Okay, this is not going to be a very fun post to read, and the title might already make you think that I’m being an arrogant bastard this time around, but I get the feeling that lately people are missing the point: even when I’m grumpy, I’m not grumpy just because. I’m usually grumpy because I’m trying to get things to improve rather than stagnate or get worse.

So let’s take an example right now. Thomáš posted about some of the changes to expect in LibreOffice 4: one of them is that the LDAP client libraries are no longer an optional dependency but have to be present. I wasn’t happy about that.

I actually stumbled across that just the other day when installing the new laptop: while installing KDE components with the default USE flags, OpenLDAP would have been installed. The reason is obviously that the ldap USE flag is enabled by default, which makes sense, as LDAP is (unfortunately) the most common “shared address book” database available. But why should I get an LDAP server if I explicitly selected a desktop profile?

So the first task at hand was to make sure that the minimal USE flag was present on the package (it was), and whether it did what was intended, i.e., not install the LDAP server; that is indeed the case. Good, so we can install only the client libraries. Unfortunately, with said USE flag the dependencies were slightly wrong, as some of them, like libtool (for libltdl), are only really used by the server components. This was easy to fix, together with a couple more fixes.

But when I proposed on the mailing list to change the defaults for the desktop profile to have the minimal USE flag enabled, all hell broke loose. The good point that came out of it is that the minimal USE flag is definitely being over-used, and I’m afraid I’m at fault there as well, since both NRPE and NSCA have a minimal USE flag. I guess it’s time for me to reel that back as well. And now I have a patch to give openldap a server USE flag, enabled by default (except, hopefully, on the desktop profile), to replace the old minimal flag. Incidentally, while looking into it, I also found that said USE flag was clashing with the cxx one, for no good reason as far as I could tell. But Robin doesn’t even like the idea of going with a server USE flag for OpenLDAP!

On a different note, let’s take hwids. I originally created the package to reduce the amount of code our units’ firmware required, but along the way I ended up with a problematic file on my hands: as I wrote, the oui.txt file downloaded from IEEE has been redistributed for a number of years, but when I contacted them to make sure I could redistribute it, they told me that it wasn’t allowed. Unfortunately, the new versions of systemd/udev use that file to generate their hardware database, finally implementing my suggestion from four years ago: better late than never!

Well, I ended up having to take some flak, and some risk, and now the new hwids package fetches that file (as well as the iab.txt file) and also fully implements re-building the hardware database, so that we can keep it up to date from Portage, without having to get people to re-build their udev package over and over.
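To give an idea of what “re-building the hardware database” from oui.txt involves, here is a rough sketch of turning IEEE’s “00-00-00   (hex)   VENDOR” lines into udev hwdb source records. It is not the script hwids or systemd actually use: the ID_OUI_FROM_DATABASE property is what udev exposes, but the exact OUI: match key (including the trailing *) is an assumption, so compare it with the 20-OUI.hwdb shipped by your udev before relying on it.

```python
import re

# One assignment per line in IEEE's oui.txt, e.g.
# "00-00-00   (hex)		XEROX CORPORATION"; the "(base 16)" lines are skipped
OUI_LINE = re.compile(r"^([0-9A-F]{2})-([0-9A-F]{2})-([0-9A-F]{2})\s+\(hex\)\s+(.+?)\s*$")


def oui_to_hwdb(oui_text):
    """Convert oui.txt content into hwdb-style records (match-key format assumed)."""
    entries = []
    for line in oui_text.splitlines():
        match = OUI_LINE.match(line)
        if not match:
            continue
        prefix = "".join(match.group(1, 2, 3))
        vendor = match.group(4)
        entries.append(f"OUI:{prefix}*\n ID_OUI_FROM_DATABASE={vendor}\n")
    # hwdb source files separate records with a blank line
    return "\n".join(entries)


SAMPLE = "00-00-00   (hex)\t\tXEROX CORPORATION\n000000     (base 16)\t\tXEROX CORPORATION\n"
print(oui_to_hwdb(SAMPLE), end="")
```

After writing the result into a .hwdb file, the binary database still has to be compiled, with udevadm hwdb --update on udev of that era or systemd-hwdb update on newer systemd.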

So, excuse me if I’m quite hard to work with sometimes, but the amount of crap I have to take when doing my best to make Gentoo better, for users and developers alike, is so high that sometimes I’d just like to say “screw it” and leave it to someone else to fix the mess. But I’m not doing that. If you don’t see me around much in the next few days, it’s because I’m leaving LA on Wednesday, and I can’t post on the blog while flying to New York (the gogonet IP addresses are in virtually every possible blacklist, now and in the future, so there’s no way I can post to the blog unless I figure out how to set up a VPN and route traffic to my blog through it…).

And believe it or not, I do have other concerns in my life besides Gentoo.

Hardware identification, version bumps, Excelsior

Let’s start with the good news: most of Excelsior has arrived and it’s already set up. The only thing missing is… the CPUs, which are coming in from Philadelphia and should arrive here tomorrow, according to Amazon’s tracking. As I said, the server will be co-located by my current employer, so that’s one issue not to worry about.

Without Yamato, it turns out that my ability to bump the versions of my own packages is vastly reduced, mostly because I don’t want to install packages such as MongoDB on this laptop just to test Ruby Gems, and at the same time I don’t want too many extraneous packages installed. Luckily, starting tomorrow we should be all set to begin the install phase.

One of the things I’ve been keeping busy with is the split hardware IDs package, sys-apps/hwids, which I’m bumping weekly. On one hand, this makes the (now gone) network cron scripts much less important for updating the IDs files; on the other, it lets people who don’t want their systems to access the network directly keep those files up to date anyway. This is the first week I’m skipping the bump, simply because… there is no new content!
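Deciding whether a weekly bump is needed boils down to checking whether any of the downloaded files changed since the last release. A minimal way to do that is to compare content hashes; the sketch below is a hypothetical helper, not the actual hwids tooling, and the digests in the usage lines are placeholders rather than real hashes.

```python
import hashlib
from pathlib import Path


def digest_files(paths):
    """Map each file's name to the SHA-256 hex digest of its contents."""
    return {Path(p).name: hashlib.sha256(Path(p).read_bytes()).hexdigest() for p in paths}


def changed_files(previous, current):
    """Names that are new or whose digest differs from the previous snapshot."""
    return sorted(name for name, digest in current.items() if previous.get(name) != digest)


# Placeholder digests standing in for last week's and this week's snapshots
last_week = {"pci.ids": "aaa", "usb.ids": "bbb"}
this_week = {"pci.ids": "aaa", "usb.ids": "ccc"}
print(changed_files(last_week, this_week))  # ['usb.ids']
```

An empty list means there is nothing new upstream, and so no reason to bump that week.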

I did add a new device to the USB IDs database today, though, so next week we might have an update. And tomorrow I’ll probably add the possibly missing subsystem IDs for the devices on Excelsior to the PCI IDs database, where I already sent my laptop’s and one of the local servers’ subsystems.

Speaking of device identification, I can understand why Kay thinks it might be better to have one general database of everything instead of multiple small ones… for instance, it would be nice if I could just update one database with the IDs of my new external HDD (WD My Passport), and let smartctl know that it has to connect to it with the SAT method, instead of having to write it down somewhere and then remember it myself. Speaking of which, WD is still my favourite HDD vendor.

Anyway, thanks once more to all the people who helped build the new Excelsior; tomorrow I’ll post a few more details about it, hopefully including some photos, as I’ve got my camera with me as well. There has actually been some trouble with the SSDs and the mounting bays, which I think will be a valuable lesson, and not only for me.

Who said that IDs wouldn’t have license issues?

When I posted about the hwids data I was not expecting to find what I just found today, and honestly, I’m wondering if I’m always looking for stuff under rocks.

My reason to split the ID databases out of their respective utilities (pciutils and usbutils) was simple enough: I didn’t need to deal with the two utilities, both of which are GPL-2, when the databases themselves are available under the 3-clause BSD license; it was just a matter of removing code and avoiding audits of projects that we don’t need to rely upon.

The fact that it had long been a pet peeve of mine not to have a separate package taking care of them, rather than bundling them, was just an added bonus.

So after creating a silly placeholder, which is fine for our needs here, I created a new package with Greg’s blessing: sys-apps/hwids (I wanted to call it hwdata, but we have both gentoo-hwdata and redhat-hwdata, which install very different stuff). It has its own repository and a live ebuild that simply fetches the files from their respective websites. I’m going to update the package weekly, if there are changes, so that we always have a more up-to-date version of it, and we won’t need the network cron scripts at all either.

I’ve also updated lshw to support the split package, so that it doesn’t install its own ids files anymore… of course that is only half the work done there, since the lshw package has two more datafiles: oui.txt and manuf.txt. The latter comes out of Wireshark, while the former is downloaded straight from IEEE’s Public OUI registry and that’s where the trouble starts.

The problem is that while you’re free to download the oui.txt file, you won’t find any kind of license on the file itself. I sent a request to IEEE for some clarification on the matter, and their answer was a clear “you cannot redistribute that file” (even though Ulrich, while not a lawyer, pointed out that it’s factual information which might not be copyrightable at all; see Threshold of originality for further details).

So why would I care about that single file given that lshw is a minor package by itself, and shouldn’t matter that much? Well, the answer is actually easy to give: bluez also contains a copy of it. And we’re redistributing that for sure, at least in source form. Sabayon is actually distributing binaries of it.

Interestingly enough, neither Debian’s lshw package nor their Bluez one installs the oui.txt file, and I wouldn’t be surprised if their source archives have been neutered (sorry, made Free) by removing the distributed copy of the file.

What should we do about this? Unfortunately, that’s one question I don’t have an answer for myself yet, but I guess it should be clear enough that you can’t always assume that what upstream declares to be the case… actually is the case, especially where licensing is concerned. And this is the reason why, even though we don’t have any problem releasing the source of all the GPL’d packages we have, we’d like to reduce as much as possible the number of licenses we have to go through.