Boot-to-Kodi, 2019 edition

This weekend I’m oncall for work, so my girlfriend and I decided to take a few chores off our to-do lists. One of mine was to run the now episodic maintenance on the software and firmware of the devices we own at home. I call it episodic because I no longer spend every evening looking after servers, whether at home or remote, but rather look at them when I need to.

In this case, I honestly can’t remember the last time I ran updates on the HTPC I use for Kodi and for the UniFi controller software. And that meant that after the full update I reached the now not uncommon situation of Kodi refusing to start at boot, or even when I SSH’d into the machine and started the service by hand.

The error message, for ease of Googling, is:

[  2092.606] (EE) 
Fatal server error:
[  2092.606] (EE) xf86OpenConsole: Cannot open virtual console 7 (Permission denied)

What happens in this case is that the method I had been using to boot to Kodi was a systemd unit lifted from Arch Linux, which started a new session, X11, and Kodi all at once. This has stopped working, because Xorg can no longer access the TTY: systemd no longer thinks it should have access to the console.

There are supposedly ways to convince systemd to let the user run X11 without so much fluff, but after an hour of trying a number of different combinations I was not getting anywhere. I finally found one way to do it, and that’s what I’m documenting here: use lightdm.

I have found a number of different blog posts out there that try to describe how to do this, but none of them appear to apply directly to Gentoo.

These are the packages that would be merged, in order: 
 
Calculating dependencies... done! 
[ebuild   R    ] x11-misc/lightdm-1.26.0-r1::gentoo  USE="introspection -audit -gnome -gtk -qt5 -vala" 0 KiB

You don’t need Gtk, Qt or GNOME support for lightdm to work. But if you install it this way (which I’m surprised is allowed, even by Gentoo) it will fail to start! To configure what you need, you would have to manually write this to /etc/lightdm/lightdm.conf:

[Seat:*] 
autologin-user=xbmc 
user-session=kodi 
session-wrapper=/etc/lightdm/Xsession

In this case, my user is called xbmc (this HTPC was set up well before the rename), and this effectively turns lightdm into a bridge from systemd to Kodi. The kodi session is installed by the media-tv/kodi package, so there’s no other configuration needed. It just… worked.
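For completeness, lightdm still needs to be started at boot. A hedged sketch follows; the service and init-script names are assumptions based on the usual Gentoo setup and may differ on your system:

```shell
# With systemd (assumed unit name):
systemctl enable lightdm.service

# With OpenRC, display managers are traditionally started via the xdm
# init script; set DISPLAYMANAGER="lightdm" in /etc/conf.d/xdm, then:
rc-update add xdm default
```

After that, a reboot should drop you straight into the Kodi session.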

I know that some people would find the ability to do this kind of customization via “simple” text files empowering. For me it’s just a huge waste of time, and I’m not sure why there isn’t just an obvious way for systemd and Kodi to get along. I would hope somebody builds one in the future, but for now I guess I’ll leave with that.

On professionalism (my first and last post explicitly about systemd)

I have been trying my best not to comment on systemd one way or another for a while, for the most part because I don’t want a trollfest on my blog: moderating one is something I hate, and I’m sure it would be needed. On the other hand, it seems people have started to bring me into the conversation from time to time.

What I would like to point out at this point is that both extreme sides of this debate are, in my opinion, behaving childishly and being totally unprofessional: name-calling of people or software, death threats, insults, satirical websites, labelling three hundred people for the actions of a handful of them, and so on.

I don’t think I have ever been as happy to have a job that allows me not to care about open source as in the past few weeks, as things keep escalating and escalating. You guys are the worst. And again, I refer to both supporters and detractors: devs of systemd, devs of eudev, Debian devs and Gentoo devs, and so on and so forth.

And the reason I say this is that both sides want to bring this to extremes that I think are totally uncalled for. I don’t see the world in black and white, and I think I have said that before. Gray is nuanced and interesting, and takes skill to navigate, so I understand it’s easier to just take a stand and never revise your opinion, but the easy way is not what I care about.

Myself, I decided to migrate my non-server systems to systemd a few months ago. It works fine. I’ve considered migrating my servers, and I decided for the moment to wait. The reason is technical for the most part: I don’t think I trust the stability promises for the moment and I don’t reboot servers that often anyway.

There are good things about the systemd design. And I’m sure that very few people will really miss sysvinit as such. Most people, especially in Gentoo, have not been using sysvinit directly, but rather through OpenRC, which shares more of its spirit with systemd than with sysv, either by coincidence or because both are just the right approach to things (declarativeness, to begin with).

At the same time, I don’t like Lennart’s approach on this to begin with, and I don’t think it’s uncalled for to criticize the product based on the person in this case, as the two are tightly coupled. I don’t like moderating people away from a discussion, because it just makes the discussion even more confrontational on the next forum where you stumble across them; this is why I never blacklisted Ciaran and friends from my blog, even after a group of them started pasting my face on pictures of nazi soldiers from WW2. Yes, I agree that Gentoo has a good chunk of toxic supporters; I wish we had gotten rid of them a long while ago.

At the same time, if somebody were to try to categorize me the same way as the people who decided to fork udev without even thinking through what they were doing, I would point out that I was reproaching them from day one for their absolutely insane (and inane) starting announcement and first few commits. And I have never used their fork, since for the moment upstream seems to have made good on the promise of not making it impossible to run udev without systemd.

I don’t agree with the complete direction right now, and especially with the one-size-fits-all approach (on either side!) that tries to reduce the “software biodiversity”. At the same time, there are a few designs that would be difficult for me to attack, given that they were ideas of mine as well, at some point. Such as the runtime binary approach to hardware IDs (which Greg disagreed with at the time, and which was then implemented by systemd/udev), or the usage of tmpfs ACLs to allow users at the console to access devices — which was essentially my original proposal to get rid of pam_console (which played with owners instead, making it messy with more than one user at the console) when consolekit and its groups-fiddling was introduced (groups can be used for setgid, so not a good idea).

So why am I posting this? Mostly to tell everybody out there that if you plan on using me for either side point to be brought home, you can forget about it. I’ll probably get pissed off enough to try to prove the exact opposite, and then back again.

Neither of you is perfectly right. You both make mistakes. And you are both unprofessional. Try to grow up.

Edit: I mistyped eudev in the original article and it read euscan. Sorry Corentin, was thinking one thing and typing another.

Packaging for the Yubikey NEO, a frustrating adventure

I have already posted a howto on how to set up the YubiKey NEO and YubiKey NEO-n for U2F, and I promised I would write a bit more on the adventure to get the software packaged in Gentoo.

You have to realize, first, that my relationship with Yubico has not always been straightforward. I have at least once decided against working on the Yubico set of libraries in Gentoo because I could not get hold of a device I wanted to use. But luckily I was now able to place an order with them (for some two thousand euro) and I have my devices.

But Yubico’s code is usually quite well written, and designed to be packaged much more easily than most other device-specific middleware, so I cannot complain too much. Indeed, they split and release different libraries separately, with different goals, so that you don’t need to wait for a critical mass of changes to pile up before they make a new release. They also actively maintain their code on GitHub, and then push proper make dist releases to their website. They are in many ways a packager’s dream company.

But let’s get back to the devices themselves. The NEO and NEO-n come with three different interfaces: OTP (old-style YubiKey, just with much longer keys), CCID (smartcard interface) and U2F. By default the devices are configured as OTP only, which I find a bit strange, to be honest. It is also the case that at the moment you cannot enable both U2F and OTP modes, I assume because of a conflict in how the “touch” interaction behaves: there is indeed a touch-based interaction in CCID mode that gets entirely disabled once you enable either U2F or OTP, but the two can’t share it.

What is not obvious from the website is that to enable U2F (or CCID) modes, you need to use yubikey-neo-manager, an open-source app that can reconfigure the basics of the Yubico device. So I had to package the app for Gentoo of course, together with its dependencies, which turned out to be two libraries (okay actually three, but the third one sys-auth/ykpers was already packaged in Gentoo — and actually originally committed by me with Brant proxy-maintaining it, the world is small, sometimes). It was not too bad but there were a few things that might be worth noting down.

First of all, I had to deal with dev-libs/hidapi that allows programmatic access to raw HID USB devices: the ebuild failed for me, both because it was not depending on udev, and because it was unable to find the libusb headers — turned out to be caused by bashisms in the configure.ac file, which became obvious as I moved to dash. I have now fixed the ebuild and sent a pull request upstream.

This was the only real hard part at first, since the rest of the ebuilds, for app-crypt/libykneomgr and app-crypt/yubikey-neo-manager were mostly straightforward — only I had to figure out how to install a Python package as I never did so before. It’s actually fun how distutils will error out with a violation of install paths if easy_install tries to bring in a non-installed package such as nose, way before the Portage sandbox triggers.

The problems started when trying to use the programs, doubly so because I don’t keep a copy of the Gentoo tree on the laptop, so I wrote the ebuilds on the headless server and then tried to run them on the actual hardware. First of all, you need to have access to the devices to be able to set them up; the libu2f-host package will install udev rules to allow the plugdev group access to the hidraw devices — but it also needed a pull request to fix them. I also added an alternative version of the rules for systemd users that does not rely on the group but rather uses the ACL support (I was surprised, I essentially suggested the same approach to replace pam_console years ago!)

Unfortunately that only works once the device is already set in U2F mode, which does not work when you’re setting up the NEO for the first time, so I originally set it up using kdesu. I have since decided that the better way is to use the udev rules I posted in my howto post.

After this, I switched off OTP, and enabled U2F and CCID interfaces on the device — and I couldn’t make it stick, the manager would keep telling me that the CCID interface was disabled, even though the USB descriptor properly called it “Yubikey NEO U2F+CCID”. It took me a while to figure out that the problem was in the app-crypt/ccid driver, and indeed the change log for the latest version points out support for specifically the U2F+CCID device.

I have updated the ebuilds afterwards, not only to depend on the right version of the CCID driver – the README for libykneomgr does tell you to install pcsc-lite but not about the CCID driver you need – but also to check for the HIDRAW kernel driver, as otherwise you won’t be able to either configure or use the U2F device for non-Google domains.

Now there is one more part of the story that needs to be told, but in a different post: getting GnuPG to work with the OpenPGP applet on the NEO-n. It was not as straightforward as it could have been, and it did lead to disappointment. It’ll be a good post for next week.

Setting up Yubikey NEO and U2F on Gentoo (and Linux in general)

When the Google Online Security blog announced earlier this week the general availability of Security Key, everybody at the office was thrilled, as we’ve been waiting for the day for a while. I’ve been using this for a while already, and my hope is for it to be easy enough for my mother and my sister, as well as my friends, to start using it.

While the promise is for a hassle-free second factor authenticator, it turns out it might not be as simple as originally intended, at least on Linux, at least right now.

Let’s start with the hardware, as there are four different options of hardware that you can choose from:

  • Yubico FIDO U2F which is a simple option only supporting the U2F protocol, no configuration needed;
  • Plug-up FIDO U2F which is a cheaper alternative with the same features — I have not checked whether it is as sturdy as the Yubico one, so I can’t vouch for it;
  • Yubikey NEO which provides multiple interfaces, including OTP (not usable together with U2F), OpenPGP and NFC;
  • Yubikey NEO-n the same as above, without NFC, and in a very tiny form factor designed to be left semi-permanently in a computer or laptop.

I got the NEO mostly to be used with LastPass – the NFC support allows you to have 2FA on the phone without having to type it back from a computer – and a NEO-n to leave installed on one of my computers. I already had a NEO from work to use as well. The NEO requires configuration, so I’ll get back to it in a moment.

The U2F devices are accessible via hidraw, a driverless access protocol for USB devices, originally intended for devices such as keyboards and mice but also leveraged by UPSes. What happens, though, is that you need access to the device node, which the Linux kernel by default makes accessible only to root, for good reasons.

To make the device accessible to you, the user actually at the keyboard of the computer, you have to use udev rules, and those are, as always, not straightforward. My personal hacky choice is to make all the Yubico devices accessible — the main reason being that I don’t know all of the compatible USB product IDs, as some of them are not really available to buy but come, for instance, from developer-mode devices that I may or may not end up using.

If you’re using systemd with device ACLs (in Gentoo, that would be sys-apps/systemd with acl USE flag enabled), you can do it with a file as follows:

# /etc/udev/rules.d/90-u2f-securitykey.rules
ATTRS{idVendor}=="1050", TAG+="uaccess"
ATTRS{idVendor}=="2581", ATTRS{idProduct}=="f1d0", TAG+="uaccess"

If you’re not using systemd or ACLs, you can use the plugdev group and instead do it this way:

# /etc/udev/rules.d/90-u2f-securitykey.rules
ATTRS{idVendor}=="1050", GROUP="plugdev", MODE="0660"
ATTRS{idVendor}=="2581", ATTRS{idProduct}=="f1d0", GROUP="plugdev", MODE="0660"

-These rules do not include support for the Plug-up because I have no idea what their VID/PID pairs are; I asked Janne, who got one, so I can amend this later.- Edit: added the rules for the Plug-up device. Cute, their use of f1d0 as the device ID.

Also note that there are probably less hacky solutions to get the ownership of the devices right, but I’ll leave it to the systemd devs to figure out how to include them in the default ruleset.

These rules will not only allow your user to access /dev/hidraw0, but also the /dev/bus/usb/* devices. This is intentional: Chrome (and Chromium, the open-source version, which works as well) uses the U2F devices in two different modes. One is a built-in extension that works with Google assets and accesses the low-level device as /dev/bus/usb/*; the other is a Chrome extension which uses /dev/hidraw* and is meant to be used by all websites. The latter is the actually standardized specification, and how you’re supposed to use it right now. I don’t know if the former workflow is going to be deprecated at some point, but I wouldn’t be surprised.
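Whichever variant you write, udev needs to pick the new rules up; a sketch of the usual incantation, so the permissions apply without a reboot or re-plug:

```shell
# Reload the rule files, then replay the "add" events for USB devices
# already plugged in, so the new GROUP/MODE or ACL tags take effect.
udevadm control --reload-rules
udevadm trigger --subsystem-match=usb
```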

For those like me who bought the NEO devices: you’ll have to enable the U2F mode. While Yubico provides the linked step-by-step guide, it was not entirely correct for me on Gentoo, but it should be less complicated now: I packaged the app-crypt/yubikey-neo-manager app, which already brings in all the necessary software, including the latest version of app-crypt/ccid required to use the CCID interface on U2F-enabled NEOs. And if you already created the udev rules file as I noted above, it’ll work without root privileges. Just remember that if you are interested in the OpenPGP support you’ll need the pcscd service (it should auto-start with both OpenRC and systemd anyway).

I’ll recount the issues with packaging the software separately. In the meantime, make sure you keep your accounts safe, and let’s all hope that more sites start protecting your accounts with U2F. I’ll also write a separate opinion piece on why U2F is important and why it is better than OTP; this is just meant as documentation, a howto for setting up U2F devices on your Linux systems.

A new XBMC box

A couple of months ago I was at LinuxTag in Berlin with the friends from VideoLAN, and we shared a booth with the XBMC project. It was interesting to see the newest version of XBMC running, and I decided it was time for me to get a new XBMC box — the last time I used XBMC was on my AppleTV, and while it was not strictly disappointing, it was not terrific either after a while.

At any rate, we spoke about what options are available nowadays for a good XBMC setup, and while the RaspberryPi is all the rage nowadays, my previous experience with the platform made it a no-go. It also requires you to find a place to store your data (the USB support on the Pi is not good for many things), and you’ll most likely have to re-encode animes to the Right Format™ so that the RPi VideoCore can properly decode them: anything that can’t be hardware-accelerated will not play on such limited hardware.

The alternative has been the Intel NUC (Next Unit of Computing), which Intel sells in pre-configured “barebone” kits, some of which include wifi antennas, 2.5” disk bays, and a CIR (Consumer Infrared Receiver) that allows you to use a remote such as the one for the XBox 360 to control the unit. I decided to look into the options and I settled on the D54250WYKH which has a Core i5 CPU, space for both a wireless card (I got the Intel 7260 802.11ac which is dual-radio and supports the new 11ac protocol, even though my router is not 11ac yet), and a mSATA SSD (I got a Transcend 128GB one), as well the 2.5” bay that allows me to use a good old spinning-rust harddrive to store the bulk of the data.

Be careful and don’t repeat my mistake! I originally ordered a very cool Western Digital Caviar Green 2TB HDD but while it is a 2.5” HDD, it does not fit properly in the provided cradle; the same problem used to happen with the first series of 1TB HDDs on PlayStation 3s. I decided to keep the HDD and bring it with me to Ireland, as I don’t otherwise have a 2TB HDD, instead I opted for a HGST 1.5TB HDD (no link for this one as I bought it at Fry’s the same day I picked up the rest, if nothing else because I had no will to wait, and also because I forgot I needed a keyboard).

While I could have just put OpenELEC on the device, I decided instead to install my trusted Gentoo — a Core i5 with 16GB of RAM and a good SSD is well within its ability to run it. And since I was finally setting up something that needs (for me) to turn on very quickly, I decided to give systemd a go (especially as Robbins is now considered a co-maintainer of OpenRC, which drains all my will to keep using it). The effect has been stunning, but there are a few issues that need to be ironed out; for instance, as far as I can tell, there is no unit for rngd, which means that both my laptop (now converted to systemd) and the device are starved for entropy, even though they both have the rdrand instruction; I’ll try to fix this lack myself.
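Until a proper unit ships, something along these lines should do; a minimal sketch, where the binary path and flags are my assumptions about a typical sys-apps/rng-tools install, not what the package actually provides:

```ini
# /etc/systemd/system/rngd.service (sketch)
[Unit]
Description=Hardware RNG entropy gatherer daemon

[Service]
# -f keeps rngd in the foreground so systemd can supervise it directly
ExecStart=/usr/sbin/rngd -f

[Install]
WantedBy=multi-user.target
```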

Another huge problem for me has been getting the audio to work: while I’ve been told by the XBMC people that the NUCs are perfectly well supported, I couldn’t for the life of me get the audio to work for days. In the end it was Alexander Patrakov who pointed me to intel_iommu=on,igfx_off as a kernel option to get it to work (kernel bug #67321, still unfixed). So if you have no HDMI output on your NUC, that’s what you have to do!
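If you boot through GRUB2, the option can be made permanent along these lines; a hedged sketch, since the exact file and grub.cfg path depend on your setup:

```shell
# /etc/default/grub (fragment): append the IOMMU workaround to the
# kernel command line...
GRUB_CMDLINE_LINUX_DEFAULT="intel_iommu=on,igfx_off"

# ...then regenerate the configuration (path assumed for Gentoo):
# grub-mkconfig -o /boot/grub/grub.cfg
```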

Speaking of XBMC and Gentoo: the latest version as of last week (which was not the latest upstream version, as a new one got released exactly while I was installing the box) seems to force you to install FFmpeg over libav – I honestly felt a bit sorry for the developers of XBMC at LinuxTag while they were trying to tell me how great the multi-threaded h264 decoder from FFmpeg is… Anton, who wrote it, is a libav developer! – but even after you do that, it seems not to link against it, preferring a bundled copy instead. Which also doesn’t seem to build with multithreading support (huh?). This is something I’ll have to look into once I’m back in Dublin.

Other than that, there isn’t much to say; the one remaining big issue is to figure out how to properly have XBMC start up at boot without nasty autologin hacks on systemd. And of course finding a better way than using a transmission user to start the Transmission daemon, or at least find a better way to share the downloads with XBMC itself. Probably separating the XBMC and Transmission users is a good idea.

Expect more posts on what’s going on with my XBMC box in the future, and take this one as a reference about the NUC audio issue.

Predictable persistently (non-)mnemonic names

This is part two of a series of articles looking into the new udev “predictable” names. Part one is here and talks about the path-based names.

As Steve also asked in the comments on the last post, isn’t it possible to just use the MAC address of an interface to point at it? Sure it’s possible! You just need to enable the MAC-based name generator. But what does that mean? It means that your new interface names will be enx0026b9d7bf1f and wlx0023148f1cc8 — do you see yourself typing them?
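The scheme itself is trivial to sketch: a two-letter type prefix ("en" for ethernet, "wl" for wireless), an "x" marking the MAC-based generator, and the MAC address with the colons stripped. Assuming bash:

```shell
#!/bin/bash
# Build the MAC-based "predictable" name for an ethernet interface:
# "en" + "x" + the colon-less MAC address.
mac="00:26:b9:d7:bf:1f"
echo "enx${mac//:/}"   # prints enx0026b9d7bf1f
```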

Myself, I’m not going to type them. My favourite suggestion for solving the issue is to rely on rules similar to the previous persistent naming, but without re-using the eth prefix, to avoid the collisions that will no longer be resolved by future versions of udev. I instead use the names wan0 and lan0 (and so on) when the two interfaces straddle a private and a public network. How do I achieve that? Simple:

SUBSYSTEM=="net", ACTION=="add", ATTR{address}=="00:17:31:c6:4a:ca", NAME="lan0"
SUBSYSTEM=="net", ACTION=="add", ATTR{address}=="00:07:e9:12:07:36", NAME="wan0"

Yes, these simple rules do all the work you need if you just want to make sure not to mix up the two interfaces by mistake. If your server or vserver has only one interface, and you want it to be named wan0 no matter what its MAC address is (easier to clone, for instance), then you can go for

SUBSYSTEM=="net", ACTION=="add", ATTR{address}=="*", NAME="wan0"

As long as you only have a single network interface, that will work just fine. For those who use Puppet, I also published a module that you can use to create the file, and ensure that the other methods to achieve “sticky” names are not present.
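You can also check what name a rule will assign without rebooting; a sketch, assuming the interface currently shows up as eth0:

```shell
# Simulate the udev event for the interface and see which NAME the
# rules would assign; nothing is actually renamed by "udevadm test".
udevadm test /sys/class/net/eth0 2>&1 | grep 'NAME'
```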

My reasoning for using this kind of naming is relatively simple: the rare places where I do need to specify the interface name are usually ACLs, the firewall, and so on. There, the most important thing for me is knowing whether the interface is public or not, so the wan/lan distinction is the most useful. I don’t intend to try to remember whether enp5s24k1f345totheright4nextothebaker is the public or the private interface.

Speaking of which, one of the things that appears obvious, even from Lennart’s comment on the previous post, is that there is no real assurance that the names are set in stone — he says that a udev upgrade won’t change them, but I guess most people would be sceptical, remembering the track record that udev and systemd have had over the past few months alone. In this situation my personal, informed opinion is that all this work on “predictable” names is a huge waste of time for almost everybody.

If you do care about stable interface names, you most definitely expect them to be more meaningful than ten-digit strings of paths or MAC addresses, so you almost certainly want to go with custom naming, so that at least the names themselves carry some meaning.

On the other hand, if you do not care about interface names themselves, for instance because instead of running commands or scripts you just use NetworkManager… well, what the heck are you doing playing around with paths? If it doesn’t bother you that the interface for a USB device changes considerably between one port and another, how can it matter to you whether it’s called wwan0 or wwan123? And if the name of the interface does not matter to you, why are you spending useless time trying to get these “predictable” names working?

All in all, I think this is just a nice but useless trick that will only cause more headaches than it can possibly solve. Bah, humbug!

Predictably non-persistent names

This is going to be fun. The Gentoo “udev team”, in the person of Samuli – who seems to suffer from 0-day bump syndrome – decided to enable by default the new predictable names feature, which is supposed to make things so much nicer in a Linux land where, especially for people coming from FreeBSD, things have been pretty much messed up. This replaces the old “persistent” names, which were often too fragile to work: they did in-place renaming of interfaces, and would cause conflicts at boot time way too often, since swapping two devices’ names is not an atomic operation, for obvious reasons.

So what is this predictable naming all about? Well, it’s mostly a merge of the previous persistent naming system and the BIOS label naming project, which Red Hat had already been developing for a few years so that the names of interfaces on server hardware match the documentation of said server: you can then be sure that if you’re connecting the port marked with “1” on the chassis, out of four on the motherboard, it will bring up eth2.

But why were those two technologies needed? Let’s start by explaining (more or less) how the kernel naming scheme works: unlike the BSD systems, where interfaces are named after the kernel driver (en0, dc0, etc.), the Linux kernel uses generic names, mostly eth, wlan and wwan, plus maybe a couple more for tunnels and so on. This causes the first problem: if you have multiple devices of the same class (ethernet, wlan, wwan) coming from different drivers, the order of the interfaces may very well vary between reboots, either because of changes in the kernel, if the drivers are built in, or simply because of the locking and timing of module loading (which is much more common on binary distributions).

The reason why changes in the kernel can change the order is that the order in which drivers are initialized has changed before and might change again in the future. A driver could also decide to change the order with which its devices are initialized (PCI tree scanning order, PCI ID order, MAC address order, …) and so on, causing it to change the order of interfaces even for the same driver. More about this later.

But here my first doubt arises: how common is it for people to have more than one interface of the same class, from vendors different enough to use different drivers? Well, it depends on the class of device; on a laptop you’d have to search hard for a model with more than one Ethernet or wireless interface, unless you add an ExpressCard or PCMCIA expansion card (and even those are not that common). On desktops, I’ve seen a few very recent motherboards with more than one network port, and I have yet to see one with different chips for the two. Servers, that’s a different story.

Indeed, it’s not that uncommon to have multiple on-board and expansion card ports on a server. For instance you could use the two onboard ports as public and private interfaces for the host… and then add a 4-port card to split between virtual machines. In this situation, having a persistent naming of the interfaces is indeed something you would be glad of. How can you tell which one of eth{0..5} is your onboard port #2, otherwise? This would be problem number two.

Another situation in which having a persistent naming of interfaces is almost a requirement is if you’re setting up a router: you definitely don’t want to switch the LAN and WAN interface names around, especially where the firewall is involved.

This background is why the persistent-net rules were devised quite a few years ago for udev. Unfortunately almost everybody got at least one nasty experience with them. Sometimes the in-place rename would fail, and you’d end up with the temporary names at the end of boot. In a few cases the name was not persistent at all: if the kernel driver for the device would change, or change name at least, the rules wouldn’t match and your eth0 would become eth1 (this was the case when Intel split the e1000 and e1000e drivers, but it’s definitely more common with wireless drivers, especially if they move from staging to main).

So the old persistent-net rules were flawed. What about the new predictable rules? Well, not only do they incorporate the BIOS naming scheme (which is actually awesome, when it works — SuperMicro servers such as Excelsior do not expose the label; my Dell laptop only exposes a label for the Ethernet port, but not for the wireless adapter or the 3G one), but they have two “fallbacks” that are supposed to be used when the labels fail: one based on the MAC address of the interface, and the other based on the “path” — which for most PCI, PCI-E, onboard and ExpressCard ports is basically the PCI address; for USB… we’ll see in a moment.

So let’s see, from my laptop:

# lspci | grep 'Network controller'
03:00.0 Network controller: Intel Corporation Centrino Advanced-N 6200 (rev 35)
# ifconfig | grep wlp3
wlp3s0: flags=4163  mtu 1500

Why “wlp3s0”? It’s the wireless adapter (wl), a PCI (p) card at bus 3, slot 0 (s0): 03:00.0. It matches lspci properly. But let’s see the WWAN interface on the same laptop:

# ifconfig -a | grep ww
wwp0s29u1u6i6: flags=4098  mtu 1500

Much longer name! What’s going on, then? Let’s see: it’s reporting a card at bus 0, slot 29 (0x1d) — lspci uses hexadecimal numbers for the addresses:

# lspci | grep '00:1d'
00:1d.0 USB controller: Intel Corporation 5 Series/3400 Series Chipset USB2 Enhanced Host Controller (rev 05)

Okay, so it’s a USB device, even though the physical form factor is a mini-PCIe card. It’s common. Does it match lsusb?

# lsusb | grep Broadband
Bus 002 Device 004: ID 413c:8184 Dell Computer Corp. F3607gw v2 Mobile Broadband Module

Note that the Bus/Device specification is not used there, which is good: the device number increases every time you plug something into a port, so it’s not persistent across reboots at all. What udev uses instead is the path to the device through the USB ports, which is a tad more complex, but it basically means it matches /sys/bus/usb/devices/2-1.6:1.6/ (I don’t pretend to know exactly how the thing works, but it describes which physical port the device is connected to).
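You don’t have to decode these names by hand, by the way: udev exports the candidate names it computed as device properties. A sketch, with the interface name assumed:

```shell
# Show the path-/MAC-based name candidates udev computed for an interface
# (look for ID_NET_NAME_PATH and ID_NET_NAME_MAC in the output).
udevadm info --query=all --path=/sys/class/net/wlp3s0 | grep 'ID_NET_NAME'
```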

In my laptop’s case, the situation is actually quite nice: I cannot move either the WLAN or WWAN device to a different slot, so the name assigned by the slot is persistent as well as predictable. But what if you’re on a desktop with an add-on WLAN card? What happens if you decide to change your video card for a more powerful one that occupies the space of two slots, one of which happens to be where your WLAN card is? You move it, reboot, and… you just changed the interface name! If you’ve been using NetworkManager, you’ll just have to reconfigure the network, I suppose.

Let’s take a different example. My laptop, with its integrated WWAN card, is a rare example; most people I know use USB “keys”, as the providers give them away for free, at least in Italy. I happen to have one as well, so let me try to plug it in one of the ports of my laptop:

# lsusb | grep modem
Bus 002 Device 014: ID 12d1:1436 Huawei Technologies Co., Ltd. E173 3G Modem (modem-mode)
# ifconfig -a | grep ww
wwp0s29u1u2i1: flags=4098  mtu 1500
wwp0s29u1u6i6: flags=4098  mtu 1500

Okay, great: this is a different USB device, connected to the same USB controller as the onboard one, but at a different port. Neat. Now, what if I had all my usual ports busy, and decided to connect it to the USB3 add-on ExpressCard I got for the laptop?

# lsusb | grep modem
Bus 003 Device 004: ID 12d1:1436 Huawei Technologies Co., Ltd. E173 3G Modem (modem-mode)
# ifconfig -a | grep ww
wwp0s29u1u6i6: flags=4098  mtu 1500
wws1u1i1: flags=4098  mtu 1500

What’s this? Well, the USB3 controller provides slot information, so udev magically uses that to name the interface, avoiding the otherwise longer wwp6s0u1i1 name (the USB3 controller is on PCI bus 6).

Let’s go back to the on-board ports:

# lsusb | grep modem
Bus 002 Device 016: ID 12d1:1436 Huawei Technologies Co., Ltd. E173 3G Modem (modem-mode)
# ifconfig -a | grep ww
wwp0s29u1u3i1: flags=4098  mtu 1500
wwp0s29u1u6i6: flags=4098  mtu 1500

Seems the same, but it’s not: now it’s u3, not u2. Why? I used a different port on the laptop, and the interface name changed. Yes, any port change will produce a different interface name — predictably. But what happens if the kernel decides to change the way the ports are enumerated? What happens if the USB 2 driver is buggy, was supposed to provide slot information, and they fix it? You got it: in those cases too, the interface names change.

I’m not saying that the kernel naming scheme is perfect. But if you’re expected to only ever have an Ethernet port, a WLAN card and a WWAN USB stick, with it you can be sure to get eth0, wlan0 and wwan0 — as long as the drivers are not completely broken (like a WLAN card appearing as eth1), and as long as you don’t muck with the interface names in userspace.

Next up, I’ll talk about MAC-address-based naming and my personal preference when setting up servers and routers. Have fun in the meantime figuring out what your interface names will be.

The unsolved problem of the init scripts

One of probably the biggest problems with maintaining software in Gentoo where a daemon is involved is dealing with init scripts. And it’s not really a problem with just Gentoo, as almost every distribution or operating system has its own way of handling init scripts. I guess this is one of the nice ideas behind systemd: having a single standard for daemons to start, stop and reload is definitely a positive goal.

Even if I’m not sure myself whether I want the whole init system to be collapsed into a single one for every single operating system out there, there at least is a chance that upstream developers will provide a standard command line for daemons, so that init scripts no longer have to carry a hundred lines of pre-start setup commands. Unfortunately I don’t have much faith that this is going to change any time soon.

Anyway, leaving the daemons themselves alone, as that’s a topic for a post of its own and I don’t care about writing it now, what remains is the init script itself. Now, while it seems quite a few people didn’t know about this before, OpenRC has supported, almost since forever, a more declarative approach to init scripts: you set just a few variables, such as command, pidfile and similar, and the script works, as long as the daemon follows the most generic approach. The whole documentation for this kind of script is in the runscript man page, and I won’t bore you with the details here.
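To give an idea of what that looks like, here is a minimal sketch of a declarative script — the daemon name, paths and options are all made up for the example:

```sh
#!/sbin/runscript
# Hypothetical /etc/init.d/mydaemon: no start() or stop() functions
# needed, runscript's defaults use the variables below to manage the
# daemon.

command="/usr/sbin/mydaemon"
command_args="--config /etc/mydaemon.conf"
pidfile="/run/mydaemon.pid"

depend() {
    need net
}
```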

Besides the declaration of what to start, there are a few more issues that are now handled to different degrees depending on the init script, rather than in a comprehensive and seamless fashion. Unfortunately, I’m afraid this is likely to stay that way for a long time, as I’m sure that some of my fellow developers won’t care to implement the trickiest parts that can be implemented, but at least I can try to give a few ideas of what I found out while spending time on said init scripts.

So the number one issue is of course the need to create, beforehand, the directories the daemon will use, if they are stored on temporary filesystems. One of the first changes that came with the whole systemd movement was to create /run, mount it as tmpfs at runtime, and use it to store pidfiles, locks and other runtime state files. This was something I was very interested in to begin with, because I was doing something similar before on my router, which used a CF card (through an EIDE adapter) as its hard disk, to avoid writing to it at runtime. Unfortunately, more than a year later, we still have lots of ebuilds out there that expect /var/run paths to be maintained from merge time to the start of the daemon. At least now there’s enough consensus about it that I can easily open bugs for them instead of just ignoring them.

For daemons that need /var/run it’s relatively easy to deal with the missing path; while a few scripts do use mkdir, chown and chmod to create the missing directories, there is a really neat helper to take care of it, checkpath — which is also documented in the aforementioned runscript man page. But there are many other places where the two directories are used which are not initiated by an init script at all — one of these happens to be my dear Munin’s cron script, used by the master. What to do then?
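For the record, a checkpath call usually sits in start_pre(); this fragment uses a made-up daemon and user name:

```sh
# Hypothetical init script fragment: checkpath re-creates the daemon's
# runtime directory with the right owner and mode on every start, so a
# tmpfs-backed /run is not a problem.
start_pre() {
    checkpath -d -o mydaemon:mydaemon -m 0755 /run/mydaemon
}
```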

This has actually been among the biggest issues in the transition. It was the original reason why screen was changed to save its sockets in the users’ home directories instead of the previous /var/run/screen path — with relatively bad results all over, including me deciding to just move to tmux. In Munin, I decided to solve the issue by installing a script in /etc/local.d so that on start the /var/run/munin directory would be created… but this is far from a decent, standard way to handle things. Luckily, there actually is a way to solve this that has been standardised, to some extent — it’s called tmpfiles.d and was also introduced by systemd. OpenRC implements the same basics, but because of the differences between the two init systems not all of the features are implemented, in particular the automatic cleanup of the files on a running system — on the other hand, that feature is not fundamental for the needs of either Munin or screen.
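To give an idea of the format, a tmpfiles.d fragment is a one-line-per-path affair; the path and ownership below are illustrative, not necessarily Munin’s actual values:

```
# Hypothetical /usr/lib/tmpfiles.d/munin.conf
# type  path        mode  user   group  age
d       /run/munin  0755  munin  munin  -
```

The age field is what drives the automatic cleanup that OpenRC does not implement; it is optional, which is why the omission doesn’t matter here.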

There is an issue with the way these files should be installed, though. For most packages, the correct path to install to would be /usr/lib/tmpfiles.d, but the problem with this is that on a multilib system you’d easily end up with both /usr/lib and /usr/lib64 as real directories, causing Portage’s symlink protection to kick in. I’d like to have a good solution to this, but honestly, right now I don’t.

So we have the tools at our disposal; what remains to be done, then? Well, there’s still one issue: which path should we use? Should we keep /var/run to be compatible, or should we just decide that /run is a good idea and run with it? My gut says the latter at this point, but it means that we have to migrate quite a few things over time. I actually started now on porting my packages to use /run directly, starting from pcsc-lite (since I had to bump it to 1.8.8 yesterday anyway) — Munin will come with support for tmpfiles.d in 2.0.11 (unfortunately, it’s unlikely I’ll be able to add support for it upstream in that release, but in Gentoo it’ll be there). Some more of my daemons will be updated as I bump them, as I already spent quite a lot of time on those init scripts to iron out some more issues that I’ll delineate in a moment.

For some (but not all!) of the daemons it’s actually possible to set the pidfile location on the command line — for those, handling the move to the new path is dead easy: you make sure to pass something equivalent to -p ${pidfile} in the script, change the pidfile variable, and you’re done. Unfortunately that’s not always an option, as the pidfile path can be either hardcoded into the compiled program or read from a configuration file (the latter is the case for Munin). In the first case, no big deal: you change the package’s configuration or, worst case, patch the software to make it use the new path, update the init script, and you’re done… in the latter case, though, we have trouble at hand.

If the location of the pidfile is to be found in a configuration file, even if you change the file that gets installed you can’t count on the user actually updating their copy, which means your init script can easily get out of sync with the configuration file. Of course there’s a way to work around this, and that is to read the pidfile path from the configuration file itself, which is what I do in the munin-node script. To do so, you need to know the syntax of the configuration file. In the case of Munin, the file is just a set of key-value pairs separated by whitespace, which means a simple awk call can give you the data you need. In some other cases, the configuration file syntax is so messed up that getting the data out of it is impossible without writing a full-blown parser (which is not worth it); then you have to rely on the user to actually tell you where the pidfile is stored, which is quite unreliable, but okay.
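A sketch of the idea (not the actual munin-node script, though the pid_file key is the one munin-node.conf uses; the fallback default is my own assumption):

```shell
# Read the pidfile path out of a whitespace-separated key-value
# configuration file, so the init script can never get out of sync
# with it.
get_pidfile() {
    awk '$1 == "pid_file" { print $2 }' "$1" 2>/dev/null
}

conf=/etc/munin/munin-node.conf
pidfile=$(get_pidfile "$conf")
: "${pidfile:=/run/munin-node.pid}"   # fall back if the key is absent
```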

There is of course one thing that needs to be said now: what happens when the pidfile path changes in the configuration between start and stop? If you’re reading it out of a configuration file, it’s possible that the user, or the ebuild, changed it in between, causing quite big headaches when trying to restart the service. Unfortunately, my users experienced exactly this when I changed Munin’s default from /var/run/munin/munin-node.pid to /var/run/munin-node.pid — the change was possible because the node itself runs as root and only drops privileges when running the plugins, so there is no reason to wait for the subdirectory; and since most nodes will not have the master running, /var/run/munin wouldn’t be useful there at all. As I said, though, it would cause the started node to use one pidfile path and the init script another, failing to stop the service before starting it anew.

Luckily, William corrected it, although the fix is still not out — the next OpenRC release will save some of the variables used at start time, allowing this kind of problem to be nipped in the bud without having to add tons of workarounds in the init scripts. It will require some changes in the functions for graceful reloading, but in retrospect that’s a minor detail.

There are a few more niceties that you could add to init scripts in Gentoo to make them more foolproof and more reliable, but I suppose this covers the main points that we’re hitting nowadays. I suppose for me it’s just going to be time to list and review all the init scripts I maintain, which are quite a few.

I’m doing it for you

Okay, this is not going to be a very fun post to read, and the title may already make you think that I’m being an arrogant bastard this time around. But I get the feeling that lately people are missing the point: even when I’m grumpy, I’m not usually grumpy just because; I’m grumpy because I’m trying to get things to improve rather than stagnate or get worse.

So let’s take an example right now. Tomáš posted about some of the changes to be expected in LibreOffice 4 — one of these is that the LDAP client libraries are no longer an optional dependency but have to be present. I wasn’t happy about that.

I actually stumbled across this just the other day when installing the new laptop: while installing KDE components with the default USE flags, OpenLDAP would have been installed. The reason is obviously that the ldap USE flag is enabled by default, which makes sense, as it’s (unfortunately) the most common “shared address book” database available. But why should I get an LDAP server when I explicitly selected a desktop profile?

So the first task at hand was to make sure that the minimal USE flag was present on the package (it was), and that it did what was intended, i.e., not install the LDAP server — and that is indeed the case. Good, so we can install only the client libraries. Unfortunately, the default dependencies were slightly wrong with said USE flag, as some things like libtool (for libltdl) are only really used by the server components. This was easy to fix, together with a couple more issues.

But when I proposed on the mailing list to change the defaults for the desktop profile to have the minimal USE flag enabled, hell broke loose. Now, the good point that came out of it is that the minimal USE flag is definitely over-used — and I’m afraid I’m at fault there as well, since both NRPE and NSCA have a minimal USE flag; I guess it’s time for me to reel back on that too. And I now have a patch to give openldap a server USE flag, enabled by default – except, hopefully, on the desktop profile – to replace the old minimal flag. Incidentally, looking into it I also found that said USE flag was actually clashing with the cxx one, for no good reason as far as I could tell. But Robin doesn’t even like the idea of going with a server USE flag for OpenLDAP!

On a different note, let’s take hwids — I originally created the package to reduce the amount of code our units’ firmware required, but while at it I ended up with a problematic file on my hands: as I wrote before, the oui.txt file downloaded from IEEE has been redistributed for a number of years, but when I contacted them to make sure I could redistribute it, they told me that it wasn’t possible. Unfortunately, the new versions of systemd/udev use that file to generate a hardware database — finally implementing my suggestion from four years ago; better late than never!

Well, I ended up having to take some flak, and some risk, and now the new hwids package fetches that file (as well as the iab.txt file) and also fully implements re-building the hardware database, so that we can keep it up to date from Portage, without having to get people to re-build their udev package over and over.

So, excuse me if I’m quite hard to work with sometimes, but the amount of crap I have to take when doing my best to make Gentoo better, for users and developers alike, is so high that sometimes I’d just like to say “screw it” and leave it to someone else to fix the mess. But I’m not doing that. If you don’t see me around much in the next few days, it’s because I’m leaving LA on Wednesday, and I can’t post on the blog while flying to New York (the gogonet IP addresses are on virtually every possible blacklist, now and in the future — so there’s no way I can post to the blog, unless I figure out a way to set up a VPN and route the traffic to my blog through it…).

And believe it or not, but I do have other concerns in my life beside Gentoo.

Using BusyBox with OpenRC

As I said in my previous post I’ve decided to use BusyBox to strip down a Gentoo system to a bare minimum for the device I’m working on right now. This actually allows for a lot of things to be merged into a single package, but at the same time it has another very interesting side effect.

One of the reasons why systemd has been developed is that the traditional init systems for Unix use shell scripts for starting and stopping services, which is slow due both to the use of bash (which is known to be slow and bloated for the task) and to the fork/exec model for spawning new commands. While some work has been done to get the init scripts trimmed down and working with alternative shells, namely dash, what I’m using now is a slightly different variation: everything is done through BusyBox.

This has more advantages than just faster shell scripting: when using a “fat” build of BusyBox, which includes most of the utilities used on Linux (coreutils, which, and so on and so forth), the execution of commands such as cut, sed, and the like no longer happens through fork/exec, but rather through internal calls, as BusyBox is a multi-call binary. And even if a call is complex enough that it has to fork (which doesn’t happen that often, from what I can tell), the binary is already well in memory, relocated and all.
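The multi-call trick itself is easy to demonstrate without BusyBox at all. This toy sketch (everything in it is made up) installs one script under several names and picks its behaviour from the name it was invoked with, the same way BusyBox dispatches on argv[0]:

```shell
# Toy multi-call "binary": one script, several names, behaviour chosen
# from $0. BusyBox does the same in C, with real applets.
dir=$(mktemp -d)
cat > "$dir/multicall" <<'EOF'
#!/bin/sh
case "${0##*/}" in
    hello)   echo "hello from one binary" ;;
    goodbye) echo "goodbye from the same binary" ;;
    *)       echo "unknown applet" >&2; exit 1 ;;
esac
EOF
chmod +x "$dir/multicall"
ln -s multicall "$dir/hello"
ln -s multicall "$dir/goodbye"

"$dir/hello"     # prints "hello from one binary"
"$dir/goodbye"   # prints "goodbye from the same binary"
```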

Up to now I only found one issue in OpenRC when booting this way, related to the sysctl init script, so I submitted a patch which is already in OpenRC now. I’m not yet done with removing the separate packages, so there might be more issues, and of course I’m only using a very small subset of init scripts, so I wouldn’t be able to tell whether it would work on a full desktop system.

But this would for sure be a worthy experiment for somebody at some point.