IPv6 in 2020 — Nope, still dreamland

It’s that time of the year: lots of my friends and acquaintances went to FOSDEM, which is great, and at least one complained about something not working over IPv6, which prompted me to share once again my rant about the newcomer-unfriendly default network of a conference that is otherwise very friendly to new people. That, in turn, prompted the knee-jerk reaction of people who expect systems to work in isolation, calling me a hater and insulting me. Not everybody, mind you: on Twitter I did have a valid and polite conversation with two people, and while it’s clear we disagree on this point, insults were not thrown. Less polite people got blocked, because I have no time to argue with those who can’t see anyone else’s viewpoint.

So, why am I insisting that IPv6 is still not ready in 2020? Well, let’s see. A couple of years ago, I pointed out how nearly all of the websites that people would use, except for the big social networks, are missing IPv6. As far as I can tell, nothing whatsoever has changed for those websites in the intervening two years. Even the many websites hosted by CDNs like Akamai (which does support IPv6!), or by service providers like Heroku, are not served over IPv6. So once again, if you’re a random home user, you don’t really care about IPv6, except maybe for Netflix.
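To reproduce this kind of check without reading host output by hand, here is a minimal Python sketch that resolves a name and reports which address families come back. It only uses the standard socket.getaddrinfo(); the function names are made up for this example, and the result depends on the local resolver (on a DNS64 network, for instance, you would see synthesized IPv6 addresses even for IPv4-only sites).

```python
import socket

def address_families(infos):
    """Summarise getaddrinfo() results as a set of "IPv4"/"IPv6" labels."""
    families = set()
    for family, _type, _proto, _canonname, _sockaddr in infos:
        if family == socket.AF_INET:
            families.add("IPv4")
        elif family == socket.AF_INET6:
            families.add("IPv6")
    return families

def check_host(hostname, port=443):
    """Resolve hostname and say whether it has IPv4 and/or IPv6 addresses."""
    try:
        infos = socket.getaddrinfo(hostname, port, proto=socket.IPPROTO_TCP)
    except socket.gaierror as exc:
        return f"{hostname}: lookup failed ({exc})"
    families = address_families(infos)
    # sorted() puts "IPv4" before "IPv6"; an empty set means nothing came back.
    label = " + ".join(sorted(families)) or "no addresses"
    return f"{hostname}: {label}"
```

Calling check_host("www.aa.net.uk") from a dual-stack connection should report both families, while most of the sites listed in these posts would only report IPv4.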

Should the Internet providers be worried, what with IPv4 exhaustion getting worse and worse? I’d expect them to be, because as Thomas said on Twitter, the pain is only going to increase. But it clearly has not reached the point where any of the ISPs, except a few “niche” ones like Andrews & Arnold, provide their own website over IPv6. The exception appears to be Free which, if I understood correctly, is one of the biggest providers in France, and does publish AAAA records for its website. They are clearly in the minority right now.

Even mobile phone providers, which everyone and their dog cites as the example of consumer IPv6-only networks, don’t seem to care, at least in Europe. It looks like AT&T and T-Mobile US do serve their websites over IPv6.

But the consumer side is not the only reason why I insist that in 2020, IPv6 is still fantasy. Hosting providers don’t seem to have understood IPv6 either. Let’s put aside for a moment that Automattic does not have an IPv6 network (not even outbound), and let’s look at one of the providers I’ve been using for the past few years: Scaleway. Scaleway (owned by Iliad, the same group as Online.net) charges you extra for IPv4. It does, though, provide you with free IPv6. It does not, as far as I understand, provide you with multiple IPv6 addresses per server, which is annoying but workable.

But here’s a quote from a maintenance email they sent a few weeks ago:

During this maintenance, your server will be powered off, then powered on on another physical server. This operation will cause a downtime of a few minutes to an hour, depending on the size of your local storage. The public IPv4 will not change at migration, but the private IPv4 and the IPv6 will be modified due to technical limitations.

Scaleway email, 2020-01-28. Emphasis theirs.

So not only is IPv4 the only address the servers can keep stable (and, as I said, it is a paid extra), but they cannot even tell you beforehand which IPv6 address your server will get. Indeed, I decided at that point that the right thing to do was to just stop publishing AAAA records for my websites, as clearly I can’t rely on Scaleway to keep those addresses persistent over time. A shame, I would say, but that’s my problem: nobody but a few network geeks is taking IPv6 seriously right now.

But network geeks also appear to like UniFi. And honestly I do, too. It has worked fairly well for me most of the time (except for the woes of updating MongoDB), and it does mostly support IPv6. I have a full IPv6 setup at home with UniFi and Hyperoptic. But at the same time, the dashboard is focused only on IPv4, everywhere. A few weeks ago it looked like my IPv6 network had a sad (I only noticed because I was trying to reach one of my local machines by its AAAA hostname), and I had no way to confirm it: I eventually just rebooted the gateway, and then it worked fine (and since I have a public IPv4, Hyperoptic gives me a stable IPv6 prefix, so I didn’t have to worry about that), but even then I couldn’t figure out from the UI whether the gateway had any IPv6 connectivity at all.

I’m told OpenWRT has gotten better about this: you’re no longer required to reverse engineer the source to figure out how to configure a relay. But at the same time, I’m fairly sure these are, again, niche products. Virgin Media Ireland’s default router supported IPv6, up to a point. But I have yet to see any Italian ISP providing even the most basic DS-Lite by default.

Again, I’m not hating on the protocol, or denying the need to move to the new network in the short term. But I am saying that network folks need to start looking outside of their bubble, and try to find out why nothing appears to be moving, year after year. You can’t blame it on the users not caring: they don’t want to geek out on which version of the Internet Protocol they are using, they want a working connection. And you can’t really expect them to understand the limits of CGNs: 64k connections might sound ludicrously few to a network person, but to your average user it sounds like plenty, since they are only looking at one website at a time! (Try explaining to someone who has no idea how HTTP works that you can get possibly thousands of connections per tab.)
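To put the 64k figure in perspective: it is roughly the number of TCP (or UDP) ports on a single public IPv4 address, and a CGN shares that address between many subscribers. A back-of-the-envelope Python sketch, assuming an even static split and ignoring the first 1024 ports; real CGNs allocate port blocks dynamically, and the subscriber counts here are purely hypothetical:

```python
# Rough illustration only: evenly split one public IPv4 address's ports
# among the subscribers a CGN puts behind it. Assumes ports below 1024
# are never handed out; the subscriber counts are hypothetical.
USABLE_PORTS = 65536 - 1024

def ports_per_subscriber(subscribers_per_ip, usable_ports=USABLE_PORTS):
    """Ports each subscriber gets if the range is split evenly."""
    return usable_ports // subscribers_per_ip

for subs in (1, 16, 64, 256):
    print(f"{subs:>4} subscribers per IPv4 -> {ports_per_subscriber(subs)} ports each")
```

At 256 subscribers per address, each gets 252 ports, which a handful of modern browser tabs can plausibly exhaust.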

Fantasyland: in the world of IPv6 only networks

It seems to be the time of the year when geeks think that IPv6 is perfect, ready to be used, and the best thing after sliced bread (or canned energy drinks). Over on Twitter, someone pointed out to me that FontAwesome (which is used by the Hugo theme I’m using) is not accessible over an IPv6-only network, and as such the design of the site is broken. I’ll leave aside my comments on FontAwesome because they are not relevant to the rant at hand.

You may remember I called IPv6-only networks unrealistic two years ago, and I called IPv6 itself a geeks’ wet dream last year. You should then not be surprised to find me calling this Fantasyland a year later.

First of all, I want to make perfectly clear that I’m not advocating that IPv6 deployment should stop or slow down. I really wish it were actually faster, for purely selfish reasons I’ll get to later. Unfortunately I suffered a setback when I moved to London, as Hyperoptic has not deployed IPv6 yet, at least not in my building. But they provide a great service for a reasonable price, so I have no intention of switching to something like A&A just to get good IPv6 right now.

$ host hyperoptic.com
hyperoptic.com has address
hyperoptic.com has address
hyperoptic.com mail is handled by 0 hyperoptic-com.mail.eo.outlook.com.

$ host www.hyperoptic.com
www.hyperoptic.com has address
www.hyperoptic.com has address

$ host www.virginmedia.com
www.virginmedia.com has address

$ host www.bt.co.uk
www.bt.co.uk is an alias for www.bt.com.
www.bt.com has address
Host www.bt.com not found: 2(SERVFAIL)

$ host www.sky.com
www.sky.com is an alias for www.sky.com.edgekey.net.
www.sky.com.edgekey.net is an alias for e1264.g.akamaiedge.net.
e1264.g.akamaiedge.net has address

$ host www.aaisp.net.uk
www.aaisp.net.uk is an alias for www.aa.net.uk.
www.aa.net.uk has address
www.aa.net.uk has address
www.aa.net.uk has IPv6 address 2001:8b0:0:30::65
www.aa.net.uk has IPv6 address 2001:8b0:0:30::68

I’ll get back to this later.

IPv6 is great for complex backend systems: each host gets its own uniquely-addressable IP, so you don’t have to bother with jumphosts, proxycommands, and so on and so forth. Depending on the complexity of your backend, you can containerize single applications and then give each application its own address. It’s a gorgeous thing. But as you move towards user-facing frontends, things get less interesting. You cannot get rid of IPv4 on the serving side of any service, because most of your visitors are likely reaching you over IPv4, and that’s unlikely to change for quite a while still.
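As a sketch of the per-application addressing idea, here is how one might hand out addresses from a single /64 with Python’s ipaddress module. The prefix comes from the 2001:db8::/32 documentation range, and the sequential allocation is an assumption for illustration, not how any particular container runtime does it.

```python
import ipaddress

# Hypothetical prefix (documentation range); a real host would use the /64
# delegated by its provider.
HOST_PREFIX = ipaddress.IPv6Network("2001:db8:1234:5678::/64")

def assign_addresses(apps, prefix=HOST_PREFIX):
    """Give each application (e.g. one per container) its own global address.

    Addresses are handed out sequentially from ::1, in sorted app order.
    """
    return {app: prefix[index + 1] for index, app in enumerate(sorted(apps))}

# A single /64 holds 2**64 addresses, so running out is not a concern.
print(HOST_PREFIX.num_addresses == 2**64)
for app, addr in assign_addresses(["web", "db", "queue"]).items():
    print(app, addr)
```

Each application then gets an address that is reachable (firewall permitting) without jumphosts or port forwarding.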

Of course, IPv4 address exhaustion is a real problem and it’s hitting ISPs all over the world right now. Mobile providers have already started deploying networks that only provide users with IPv6 addresses, and then use NAT64 to allow them to connect to the rest of the world. This is not particularly different from using an old-school IPv4 carrier-grade NAT (CGN), which is a requirement of DS-Lite, but I’m told it can get better performance and cost less to maintain. It also has the advantage of reducing the number of different network stacks that need to be involved.

And in general, having to deal with CGN and NAT64 adds extra work, latency, and overall worse performance to a network, which is why gamers, for example, tend to prefer a single-stack network, one way or the other.

$ host store.steampowered.com
store.steampowered.com has address

$ host www.gog.com
www.gog.com is an alias for gog.com.edgekey.net.
gog.com.edgekey.net is an alias for e11072.g.akamaiedge.net.
e11072.g.akamaiedge.net has address

$ host my.playstation.com
my.playstation.com is an alias for my.playstation.com.edgekey.net.
my.playstation.com.edgekey.net is an alias for e14413.g.akamaiedge.net.
e14413.g.akamaiedge.net has address

$ host www.xbox.com
www.xbox.com is an alias for www.xbox.com.akadns.net.
www.xbox.com.akadns.net is an alias for wildcard.xbox.com.edgekey.net.
wildcard.xbox.com.edgekey.net is an alias for e1822.dspb.akamaiedge.net.
e1822.dspb.akamaiedge.net has address
e1822.dspb.akamaiedge.net has IPv6 address 2a02:26f0:a1:29e::71e
e1822.dspb.akamaiedge.net has IPv6 address 2a02:26f0:a1:280::71e

$ host www.origin.com
www.origin.com is an alias for ea7.com.edgekey.net.
ea7.com.edgekey.net is an alias for e4894.e12.akamaiedge.net.
e4894.e12.akamaiedge.net has address

But multiple other options have sprung up to tackle the address exhaustion problem, and they are moving faster than IPv6 deployment. As I already noted above, backend systems, where the end-to-end path is under the control of a single entity, are perfect soil for IPv6: there’s no need to allocate public IPv4 addresses to these, even when they have to talk over the proper Internet (with proper encryption and access control, it goes without saying). So we won’t see more allocations of whole /8 blocks for backend systems, like Xerox’s or Ford’s.

$ host www.xerox.com
www.xerox.com is an alias for www.xerox.com.edgekey.net.
www.xerox.com.edgekey.net is an alias for e1142.b.akamaiedge.net.
e1142.b.akamaiedge.net has address

$ host www.ford.com
www.ford.com is an alias for www.ford.com.edgekey.net.
www.ford.com.edgekey.net is an alias for e4213.x.akamaiedge.net.
e4213.x.akamaiedge.net has address

$ host www.xkcd.com
www.xkcd.com is an alias for xkcd.com.
xkcd.com has address
xkcd.com has address
xkcd.com has address
xkcd.com has address
xkcd.com has IPv6 address 2a04:4e42::67
xkcd.com has IPv6 address 2a04:4e42:200::67
xkcd.com has IPv6 address 2a04:4e42:400::67
xkcd.com has IPv6 address 2a04:4e42:600::67
xkcd.com mail is handled by 10 ASPMX.L.GOOGLE.com.
xkcd.com mail is handled by 20 ALT2.ASPMX.L.GOOGLE.com.
xkcd.com mail is handled by 30 ASPMX3.GOOGLEMAIL.com.
xkcd.com mail is handled by 30 ASPMX5.GOOGLEMAIL.com.
xkcd.com mail is handled by 30 ASPMX4.GOOGLEMAIL.com.
xkcd.com mail is handled by 30 ASPMX2.GOOGLEMAIL.com.
xkcd.com mail is handled by 20 ALT1.ASPMX.L.GOOGLE.com.

Another technique that slowed down the exhaustion is SNI (Server Name Indication). This TLS extension allows a single socket to serve multiple certificates: the client sends the hostname it wants during the handshake, and the server picks the matching certificate. Much like HTTP virtual hosts, which just about everyone now uses, SNI allows the same HTTP server instance to deliver secure connections for multiple websites that do not share a certificate. This may sound totally unrelated to IPv6, but before SNI became widely usable (it’s still not supported by very old Android devices and Windows XP, but both of those are widely considered irrelevant in 2018), if you needed to provide different certificates, you needed different sockets, and thus different IP addresses. It was not uncommon for a company to lease a /28 and point it all at the same frontend system just to deliver per-host certificates. One of my old customers did exactly that, until they declared XP too old to support and migrated all their webapps behind a single IP address with SNI.
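As an illustration of how one listening socket can serve several certificates, here is a minimal sketch using the sni_callback hook of Python’s ssl module (Python 3.7 or later). The hostnames are hypothetical, and certificate loading is left as a comment so the sketch runs without key material; a real server would wrap its listening socket with DEFAULT_CONTEXT.

```python
import ssl

def make_context():
    # In a real server you would call ctx.load_cert_chain(certfile, keyfile)
    # with each site's own certificate; omitted so the sketch runs as-is.
    return ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)

# Hypothetical sites sharing one IP address, each with its own context.
CONTEXTS = {
    "shop.example.com": make_context(),
    "blog.example.com": make_context(),
}
DEFAULT_CONTEXT = make_context()

def pick_context(server_name):
    """Return the context for the SNI name, falling back to the default."""
    if server_name is None:
        return DEFAULT_CONTEXT
    return CONTEXTS.get(server_name.lower(), DEFAULT_CONTEXT)

def sni_callback(ssl_object, server_name, initial_context):
    # Invoked during the handshake with the hostname the client sent,
    # or None if the client does not support SNI.
    ssl_object.context = pick_context(server_name)
    return None  # None means: carry on with the handshake

DEFAULT_CONTEXT.sni_callback = sni_callback
```

Clients that send no SNI (the old Android and Windows XP case above) arrive with server_name set to None and simply get the default certificate, which is why a separate IP per certificate used to be necessary.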

Does this mean we should stop caring about the exhaustion? Of course not! But if you are a small(ish) company and you need to focus your efforts on modernizing infrastructure, I would not expect you to focus on IPv6 deployment on the frontends. I would much prefer that you prioritize implementing TLS (HTTPS), since I would rather not have malware (including but not limited to “coin” miners) executed on my computer while I read the news! And that is not simple either.

$ host www.bbc.co.uk
www.bbc.co.uk is an alias for www.bbc.net.uk.
www.bbc.net.uk has address
www.bbc.net.uk has address

$ host www.theguardian.com  
www.theguardian.com is an alias for guardian.map.fastly.net.
guardian.map.fastly.net has address
guardian.map.fastly.net has address
guardian.map.fastly.net has address
guardian.map.fastly.net has address

$ host www.independent.ie
www.independent.ie has address
www.independent.ie has address
www.independent.ie has address
www.independent.ie has address
www.independent.ie has address
www.independent.ie has address
www.independent.ie has address
www.independent.ie has address

Okay, I know these snippets are getting old and I’m probably beating a dead horse. But what I’m trying to bring home here is that there is very little to gain from supporting IPv6 on frontends today, unless you are an enthusiast or a technology company yourself. I work for a company that believes in it and provides tools, data, and its own services over IPv6. But it’s one company. And in full disclosure, I have no involvement in this particular field whatsoever.

In all of the examples above, which are of course neither complete nor statistically meaningful, you can see a few interesting exceptions. In the gaming world, Xbox appears to have IPv6 frontends enabled, which is not surprising when you remember that Microsoft even developed one of the first tunnelling protocols to kickstart IPv6 adoption. And of course XKCD, being run by a technologist and technology enthusiast, couldn’t possibly ignore IPv6, but that’s not what the average user needs from their Internet connection.

Of course, your average user spends a lot of time on platforms created and maintained by technology companies, and Facebook is another big player in the IPv6 landscape, so it has been available over IPv6 for a long while, though that’s not the case for Twitter. But at the same time, users need their connection to access their bank…

$ host www.chase.com
www.chase.com is an alias for wwwbcchase.gslb.bankone.com.
wwwbcchase.gslb.bankone.com has address

$ host www.ulsterbankanytimebanking.ie
www.ulsterbankanytimebanking.ie has address

$ host www.barclays.co.uk
www.barclays.co.uk has address

$ host www.tescobank.com
www.tescobank.com has address

$ host www.metrobank.co.uk
www.metrobank.co.uk has address

$ host www.finecobank.com
www.finecobank.com has address

$ host www.unicredit.it
www.unicredit.it is an alias for www.unicredit.it-new.gtm.unicreditgroup.eu.
www.unicredit.it-new.gtm.unicreditgroup.eu has address

$ host www.aib.ie
www.aib.ie has address

to pay their bills…

$ host www.mybills.ie
www.mybills.ie has address

$ host www.airtricity.ie
www.airtricity.ie has address

$ host www.bordgaisenergy.ie
www.bordgaisenergy.ie has address

$ host www.thameswater.co.uk
www.thameswater.co.uk is an alias for aerotwprd.trafficmanager.net.
aerotwprd.trafficmanager.net is an alias for twsecondary.westeurope.cloudapp.azure.com.
twsecondary.westeurope.cloudapp.azure.com has address

$ host www.edfenergy.com
www.edfenergy.com has address

$ host www.veritasenergia.it
www.veritasenergia.it is an alias for veritasenergia.it.
veritasenergia.it has address
veritasenergia.it mail is handled by 10 mail.ascopiave.it.
veritasenergia.it mail is handled by 30 mail3.ascotlc.it.

$ host www.enel.it
www.enel.it is an alias for bdzkx.x.incapdns.net.
bdzkx.x.incapdns.net has address

to do shopping…

$ host www.paypal.com
www.paypal.com is an alias for geo.paypal.com.akadns.net.
geo.paypal.com.akadns.net is an alias for hotspot-www.paypal.com.akadns.net.
hotspot-www.paypal.com.akadns.net is an alias for wlb.paypal.com.akadns.net.
wlb.paypal.com.akadns.net is an alias for www.paypal.com.edgekey.net.
www.paypal.com.edgekey.net is an alias for e3694.a.akamaiedge.net.
e3694.a.akamaiedge.net has address

$ host www.amazon.com
www.amazon.com is an alias for www.cdn.amazon.com.
www.cdn.amazon.com is an alias for d3ag4hukkh62yn.cloudfront.net.
d3ag4hukkh62yn.cloudfront.net has address

$ host www.ebay.com 
www.ebay.com is an alias for slot9428.ebay.com.edgekey.net.
slot9428.ebay.com.edgekey.net is an alias for e9428.b.akamaiedge.net.
e9428.b.akamaiedge.net has address

$ host www.marksandspencer.com
www.marksandspencer.com is an alias for prod.mands.com.edgekey.net.
prod.mands.com.edgekey.net is an alias for e2341.x.akamaiedge.net.
e2341.x.akamaiedge.net has address

$ host www.tesco.com
www.tesco.com is an alias for www.tesco.com.edgekey.net.
www.tesco.com.edgekey.net is an alias for e2008.x.akamaiedge.net.
e2008.x.akamaiedge.net has address

to organize fun with friends…

$ host www.opentable.com
www.opentable.com is an alias for ev-www.opentable.com.edgekey.net.
ev-www.opentable.com.edgekey.net is an alias for e9171.x.akamaiedge.net.
e9171.x.akamaiedge.net has address

$ host www.just-eat.co.uk
www.just-eat.co.uk is an alias for 72urm.x.incapdns.net.
72urm.x.incapdns.net has address

$ host www.airbnb.com
www.airbnb.com is an alias for cdx.muscache.com.
cdx.muscache.com is an alias for 2-01-57ab-0001.cdx.cedexis.net.
2-01-57ab-0001.cdx.cedexis.net is an alias for evsan.airbnb.com.edgekey.net.
evsan.airbnb.com.edgekey.net is an alias for e864.b.akamaiedge.net.
e864.b.akamaiedge.net has address

$ host www.odeon.co.uk
www.odeon.co.uk has address

and so on so forth.

This means that for an average user, an IPv6-only network is not feasible at all, and I think the idea that it’s a concept worth validating is dangerous.

What it does not mean is that we should just ignore IPv6 altogether. Instead, we should make sure to prioritize it accordingly. We’re in a 2018 in which IoT devices are largely insecure, so the idea of having a publicly-addressable IP for each of the devices in your home is not just uninteresting, but actively frightening to me. And for the companies that need the adoption, I would hope that the priority right now is proper security, rather than adding an extra layer that would create more unknowns in their stack (because, and again it’s worth noting, as I had a discussion about this too, it’s not just the network that needs to support IPv6, it’s the full application!). And if that means that non-performance-critical backends are not going to be available over IPv6 this century, so be it.

One remark that I’m sure is going to come from at least some of the readers of this is that a significant part of the examples I’m giving here appear to be hosted on Akamai’s content delivery network which, as we can tell from Xbox’s website, supports IPv6 frontends. “It’s just a button to press, and you get IPv6, it’s not difficult, they are slackers!” is the follow-up I expect. For anyone who has worked in the field long enough, this deserves a facepalm.

The fact that your frontend can receive IPv6 connections does not mean that your backends can cope with it. Whether it is for session validation, for fraud detection, or just market analysis, lots of systems need to be able to tell what IP address a connection was coming from. If your backend can’t cope with IPv6 addresses being used, your experience may vary between being unable to buy services and receiving useless security alerts. It’s a full stack world.
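To make the backend point concrete, here is a hypothetical helper of the kind used for rate limiting or fraud heuristics. The common IPv4-only shortcut of chopping the last octet off the address string breaks on IPv6; this sketch uses Python’s ipaddress module instead, grouping IPv4 clients by /24 and IPv6 clients by /64. Both prefix lengths are assumptions chosen for illustration, not a standard.

```python
import ipaddress

def client_bucket(addr):
    """Group a client address into a network for per-client heuristics.

    IPv4 is grouped by /24 and IPv6 by /64 (a common per-subscriber
    allocation); both lengths are illustrative assumptions.
    """
    ip = ipaddress.ip_address(addr)
    # Dual-stack sockets report IPv4 clients as ::ffff:a.b.c.d; unwrap them
    # so the same client is not bucketed differently depending on the socket.
    if ip.version == 6 and ip.ipv4_mapped is not None:
        ip = ip.ipv4_mapped
    prefix = 24 if ip.version == 4 else 64
    return str(ipaddress.ip_network(f"{ip}/{prefix}", strict=False))
```

A naive addr.rsplit(".", 1)[0] would return the whole IPv6 address unchanged, silently treating every IPv6 client as its own unrelated "network".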

FOSDEM and the unrealistic IPv6-only network

Most of you already know FOSDEM; for those who don’t, it’s the largest Free and Open Source Software focused conference in Europe (if not the world). If you haven’t been to it I definitely suggest it, particularly because it’s a free-admission conference and it always has something interesting to discuss.

Even though there is no ticket and no badge, the conference does have free WiFi Internet access, which is how the number of attendees is usually estimated. In the past few years, their network has also been pushing the envelope on IPv6 support, first providing a dual-stack network when IPv6 was fairly rare, and in the last (three?) years providing an IPv6-only network as the default.

I can see the reason to do this, in the sense that a lot of Free Software developers are physically at the conference, which means they can see their tools suffer in an IPv6 environment and fix them. But at the same time, this has generated lots of complaints about Android not working in this setup. While part of that noise was useful, I got the impression this year that the complaints are repeated only for the sake of complaining.

Full disclosure, of course: I do happen to work for the company behind Android. On the other hand, I don’t work on anything related at all. So this post is as usual my own personal opinion.

The complaints about Android started off quite healthy: devices couldn’t actually connect to an IPv6 dual-stack network, and then they couldn’t connect to an IPv6-only network. Both are valid complaints to begin with, though there is a bit more to it. This year in particular the complaints were not so healthy, because current versions of Android (6.0) actually do support IPv6-only networks, though most of the Android devices out there are not running this version, either because their hardware is too old or because the manufacturer has not released a new build yet.

What does tick me off, though, has really nothing to do with Android, but rather with the idea people have that the current IPv6-only setup used by FOSDEM is a realistic approach to IPv6 networking: it really is not. It is a nice setup to test things out and stress the need for proper IPv6 support in tools, but it’s very unlikely to be used in production by anybody as is.

The technique used (at least this year) by FOSDEM is NAT64, together with its DNS counterpart, DNS64. To oversimplify how this works, the resolver modifies the DNS replies when resolving hostnames so that they always include an IPv6 address, even for names that only have A records (IPv4 addresses). The IPv6 addresses used then map back to IPv4, and the edge router “translates” between the two connections.
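The address mapping itself is simple enough to sketch. This example uses Python’s ipaddress module and the well-known prefix 64:ff9b::/96 from RFC 6052; I don’t know which prefix FOSDEM’s network actually uses (networks may pick their own), and the sketch only handles /96 prefixes.

```python
import ipaddress

# RFC 6052 well-known prefix; a network may use its own prefix instead.
WELL_KNOWN_PREFIX = ipaddress.IPv6Network("64:ff9b::/96")

def synthesize_aaaa(ipv4, prefix=WELL_KNOWN_PREFIX):
    """What DNS64 does: embed the IPv4 address in the low 32 bits."""
    if prefix.prefixlen != 96:
        raise ValueError("this sketch only handles /96 prefixes")
    v4 = ipaddress.IPv4Address(ipv4)
    return ipaddress.IPv6Address(int(prefix.network_address) | int(v4))

def extract_ipv4(ipv6, prefix=WELL_KNOWN_PREFIX):
    """What the NAT64 gateway does: recover the real IPv4 destination."""
    v6 = ipaddress.IPv6Address(ipv6)
    if v6 not in prefix:
        raise ValueError(f"{v6} is not inside {prefix}")
    return ipaddress.IPv4Address(int(v6) & 0xFFFFFFFF)
```

For example, an A record for 192.0.2.33 becomes the AAAA record 64:ff9b::c000:221, and the gateway maps traffic to that address back to 192.0.2.33.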

Unlike classic NAT, this technique requires user-space components, as the kernel uses separate stacks for IPv4 and IPv6 which do not allow direct message passing between the two. This makes it complicated and significantly slower (you have to copy the data from kernel to userspace and back all the time), unless you use one of the hardware routers designed to deal with this (I know both Juniper and Cisco have those).

NAT64 is a very useful testbed, if your target is figuring out what in your stack is not ready for IPv6. It is not, though, a realistic approach for consumer networks. If your client application does not have IPv6 support, it’ll just fail to connect. If for whatever reason you rely on IPv4 literals, they won’t work. Even worse, if the code allows a connection to be established over IPv6, but relies on IPv4 semantics for things like logging, or (worse) access control, then you now have bugs, crashes or worse, vulnerabilities.
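For the IPv4-literal case, one mitigation is for the client to discover the NAT64 prefix and build the IPv6 address itself. RFC 7050 does this by looking up AAAA records for ipv4only.arpa, a name that only has the A records 192.0.0.170 and 192.0.0.171, so any AAAA answer must have been synthesized. A minimal Python sketch, assuming a /96 prefix (RFC 6052 also allows shorter ones, which this ignores):

```python
import ipaddress
import socket

# The only A records of ipv4only.arpa (RFC 7050).
WELL_KNOWN_IPV4 = {
    ipaddress.IPv4Address("192.0.0.170"),
    ipaddress.IPv4Address("192.0.0.171"),
}

def prefix_from_synthesized(address):
    """Return the /96 NAT64 prefix if address embeds a well-known IPv4 address."""
    value = int(ipaddress.IPv6Address(address))
    embedded = ipaddress.IPv4Address(value & 0xFFFFFFFF)
    if embedded not in WELL_KNOWN_IPV4:
        return None
    network = ipaddress.IPv6Address(value & ~0xFFFFFFFF)  # clear low 32 bits
    return ipaddress.IPv6Network(f"{network}/96")

def discover_nat64_prefix():
    """Ask the local resolver for ipv4only.arpa AAAA records; None if no DNS64."""
    try:
        infos = socket.getaddrinfo("ipv4only.arpa", None, socket.AF_INET6)
    except socket.gaierror:
        return None
    for *_, sockaddr in infos:
        prefix = prefix_from_synthesized(sockaddr[0])
        if prefix is not None:
            return prefix
    return None
```

On a dual-stack or IPv4-only network the lookup finds no AAAA records and discover_nat64_prefix() returns None, so the caller can fall back to using the IPv4 literal directly.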

And while fuzzing and stress-testing are great for development environments, they are not good for end users. In the same way, -Werror is a great tool for fixing your code, but it needlessly disrupts your users.

In a similar fashion, while IPv6-only datacenters are not that uncommon (Facebook, the company, talked about them two years ago already), they serve a distinctly different purpose from a customer network. You don’t want, after all, your database cluster to connect to random external services that you don’t control; and if you do control the services, you just need to make sure they are all available over IPv6. In such a system, having a single stack to worry about simplifies, rather than complicates, things. I do something similar on the server I divide into containers: some of them, which are only backends, get no IPv4 at all, not even behind NAT. If they ever have to fetch something from the Internet at large to build, they go through a proxy instead.

I’m not saying that FOSDEM setting up such a network is not useful. It actually hugely is, as it clearly highlights the problems of applications not supporting IPv6 properly. And for many Free Software developers, setting up a network like this themselves might be too expensive in time or money, so it is a chance to try things out and iron out bugs. But at the same time, it does not reflect a realistic environment. Which is why adding more and more rants to the Android tracking bug (which I’m not even going to link here) is not going to be useful: the limitation was known for a while and has been addressed in newer versions, but it would be useless to try backporting the fix.

For what it’s worth, what is more likely to happen as IPv6 adoption becomes unavoidable is that providers will move towards solutions like DS-Lite (nothing to do with Nintendo), which couples native IPv6 with carrier-grade NAT for IPv4. While this has limitations, depending on the size of the ISP pools, it is still easier to set up than NAT64, and is essentially transparent for customers whose systems don’t support IPv6 at all. My ISP here in Ireland (Virgin Media) already has such a setup.

Predictable persistently (non-)mnemonic names

This is part two of a series of articles looking into the new udev “predictable” names. Part one is here and talks about the path-based names.

As Steve also asked in the comments on the last post, isn’t it possible to just use the MAC address of an interface to point at it? Sure it’s possible! You just need to enable the MAC-based name generator. But what does that mean? It means that your new interface names will be enx0026b9d7bf1f and wlx0023148f1cc8. Do you see yourself typing them?
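For reference, the MAC-based scheme is mechanical: a type prefix (en for Ethernet, wl for wireless), an x, and the MAC address as lowercase hex digits with the separators dropped. A small Python sketch that reproduces the two names above (a simplification: it does not model when udev actually chooses this scheme):

```python
def mac_based_name(prefix, mac):
    """Build a udev/systemd-style MAC-based interface name.

    prefix is the device type ("en" for Ethernet, "wl" for WLAN); the MAC
    address is appended after an "x" as lowercase hex without separators.
    """
    return prefix + "x" + mac.replace(":", "").lower()

print(mac_based_name("en", "00:26:B9:D7:BF:1F"))
print(mac_based_name("wl", "00:23:14:8f:1c:c8"))
```

Note that enx0026b9d7bf1f is 15 characters long, which is already the maximum length Linux allows for an interface name.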

Myself, I’m not going to type them. My favourite suggestion to solve the issue is to rely on rules similar to the previous persistent naming, but without re-using the eth prefix, to avoid collisions (which will no longer be resolved by future versions of udev). Instead, I use the names wan0 and lan0 (and so on) when the host sits straddling a private and a public network. How do I achieve that? Simple:

SUBSYSTEM=="net", ACTION=="add", ATTR{address}=="00:17:31:c6:4a:ca", NAME="lan0"
SUBSYSTEM=="net", ACTION=="add", ATTR{address}=="00:07:e9:12:07:36", NAME="wan0"

Yes, these simple rules do all the work you need if you just want to make sure not to mix up the two interfaces by mistake. If your server or vserver only has one interface, and you want it to be wan0 no matter what its MAC address is (easier to clone, for instance), then you can go for

SUBSYSTEM=="net", ACTION=="add", ATTR{address}=="*", NAME="wan0"

As long as you only have a single network interface, that will work just fine. For those who use Puppet, I also published a module that you can use to create the file, and ensure that the other methods to achieve “sticky” names are not present.

My reasoning for using this kind of name is relatively simple: the rare places where I do need to specify the interface name are usually ACLs, the firewall, and so on. In these, the most important part to me is knowing whether the interface is public or not, so the wan/lan distinction is the most useful. I don’t intend to try remembering whether enp5s24k1f345totheright4nextothebaker is the public or private interface.

Speaking of which, one of the things that appears obvious even from Lennart’s comment to the previous post is that there is no real assurance that the names are set in stone: he says that a udev upgrade won’t change them, but I guess most people would be sceptical, remembering the track record that udev and systemd have had over the past few months alone. In this situation my personal, informed opinion is that all this work on “predictable” names is a huge waste of time for almost everybody.

If you do care about stable interface names, you most definitely expect them to be more meaningful than 10-digit strings of paths or MAC addresses, so you almost certainly want to go with custom naming, so that at least you attach some meaning to the names themselves.

On the other hand, if you do not care about interface names themselves, for instance because instead of running commands or scripts you just use NetworkManager… well, what the heck are you doing playing around with paths? If it doesn’t bother you that the interface for a USB device changes considerably between one port and another, how can it matter to you whether it’s called wwan0 or wwan123? And if the name of the interface does not matter to you, why are you wasting time trying to get these “predictable” names working?

All in all, I think this is just a useless, if nice, trick that will cause more headaches than it can possibly solve. Bah, humbug!

Predictably non-persistent names

This is going to be fun. The Gentoo “udev team”, in the person of Samuli (who seems to suffer from 0-day bump syndrome), decided to now enable by default the new predictable names feature, which is supposed to make things so much nicer in Linux land where, especially for people coming from FreeBSD, things have been pretty much messed up. This replaces the old “persistent” names, which were often too fragile to work: they did in-place renaming of interfaces, and would cause conflicts at boot time way too often, since swapping two devices’ names is not an atomic operation for obvious reasons.

So what’s this predictable naming all about? Well, it’s mostly a merge of the previous persistent naming system and the BIOS label naming project, which Red Hat has been developing for a few years already so that the names of interfaces on server hardware match the documentation of said server, so that you can be sure that if you’re connecting the port marked with “1” on the chassis, out of four on the motherboard, it will bring up eth2.

But why were those two technologies needed? Let’s start by explaining how (more or less) the kernel naming scheme works: unlike the BSD systems, where interfaces are named after the kernel driver (en0, dc0, etc.), the Linux kernel uses generic names, mostly eth, wlan and wwan, plus a couple more for tunnels and so on. This causes the first problem: if you have multiple devices of the same class (ethernet, wlan, wwan) coming from different drivers, the order of the interfaces may very well vary between reboots, either because of changes in the kernel, if the drivers are built-in, or simply because of locking and the order in which modules are loaded (which is much more common for binary distributions).

The reason why changes in the kernel can change the order is that the order in which drivers are initialized has changed before and might change again in the future. A driver could also decide to change the order with which its devices are initialized (PCI tree scanning order, PCI ID order, MAC address order, …) and so on, causing it to change the order of interfaces even for the same driver. More about this later.

But here’s where my first doubt arises: how common is it for people to have more than one interface of the same class from vendors different enough to use different drivers? Well, it depends on the class of device. On a laptop you’d have to search hard for a model with more than one Ethernet or wireless interface, unless you add an ExpressCard or PCMCIA expansion card (and even those are not that common). On a desktop, I’ve seen a few very recent motherboards with more than one network port, and I have yet to see one with different chips for the two. Servers are a different story.

Indeed, it’s not that uncommon to have multiple on-board and expansion card ports on a server. For instance you could use the two onboard ports as public and private interfaces for the host… and then add a 4-port card to split between virtual machines. In this situation, having a persistent naming of the interfaces is indeed something you would be glad of. How can you tell which one of eth{0..5} is your onboard port #2, otherwise? This would be problem number two.

Another situation in which having a persistent naming of interfaces is almost a requirement is if you’re setting up a router: you definitely don’t want to switch the LAN and WAN interface names around, especially where the firewall is involved.

This background is why the persistent-net rules were devised for udev quite a few years ago. Unfortunately, almost everybody has had at least one nasty experience with them. Sometimes the in-place rename would fail, and you’d end up with the temporary names at the end of boot. In a few cases the name was not persistent at all: if the kernel driver for the device changed, or even just changed its name, the rules wouldn’t match and your eth0 would become eth1 (this was the case when Intel split the e1000 and e1000e drivers, but it’s definitely more common with wireless drivers, especially when they move from staging to the main tree).

So the old persistent net rules were flawed. What about the new predictable ones? Well, not only do they incorporate the BIOS naming scheme (which is actually awesome when it works; SuperMicro servers such as Excelsior do not expose the label, and my Dell laptop only exposes one for the Ethernet port, not for the wireless adapter or the 3G one), but they also have two “fallbacks” meant to be used when the labels are missing: one based on the MAC address of the interface, and the other based on the “path”, which for most PCI, PCI-E, onboard and ExpressCard ports is basically the PCI address; for USB… we’ll see in a moment.
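The MAC-based fallback is the easiest of the two to reproduce. A minimal sketch, assuming the format udev uses for these names: the two-letter type prefix (en, wl or ww), an x, and the twelve hexadecimal digits of the address. The function name is hypothetical, and it only builds the string; it does not ask udev anything.

```python
import re

# Six two-digit hex octets separated by colons or dashes.
_MAC_RE = re.compile(r"[0-9A-Fa-f]{2}([:-][0-9A-Fa-f]{2}){5}")


def mac_based_name(prefix, mac):
    """Build a MAC-based interface name such as "enx78e7d1ea46da".

    prefix is the two-letter type ("en", "wl" or "ww"); mac may use colons
    or dashes as separators and any letter case.
    """
    if not _MAC_RE.fullmatch(mac):
        raise ValueError(f"not a 48-bit MAC address: {mac!r}")
    digits = re.sub(r"[:-]", "", mac).lower()
    return f"{prefix}x{digits}"


print(mac_based_name("en", "78:E7:D1:EA:46:DA"))  # enx78e7d1ea46da
```

The trade-off is the mirror image of the path-based names: this name follows the card into whatever slot or port it is moved to, but replacing the card itself changes it.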

So let’s see, from my laptop:

# lspci | grep 'Network controller'
03:00.0 Network controller: Intel Corporation Centrino Advanced-N 6200 (rev 35)
# ifconfig | grep wlp3
wlp3s0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500

Why “wlp3s0”? It’s the wireless (wl) PCI (p) card at bus 3, slot 0 (s0): 03:00.0, which matches lspci.
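The path-based part can be derived from the lspci address by hand. A minimal sketch, assuming only bus and slot matter for the examples in this post: it ignores the PCI domain, and it appends an f suffix only for a non-zero function, which is a simplification of what udev actually does. The key step is converting lspci’s hexadecimal numbers into the decimal ones used in the name; the function name is hypothetical.

```python
def pci_path_name(prefix, lspci_address):
    """Turn an lspci address like "03:00.0" into a path-based name like "wlp3s0".

    lspci prints bus, slot (device) and function in hexadecimal, while the
    interface name uses decimal. An optional leading "0000:" domain is
    accepted and ignored (simplification).
    """
    fields = lspci_address.split(":")
    if len(fields) == 3:  # domain:bus:slot.function
        fields = fields[1:]
    bus_hex, slot_func = fields
    slot_hex, func_hex = slot_func.split(".")
    bus, slot, func = int(bus_hex, 16), int(slot_hex, 16), int(func_hex, 16)
    name = f"{prefix}p{bus}s{slot}"
    if func:
        name += f"f{func}"
    return name


print(pci_path_name("wl", "03:00.0"))  # wlp3s0
print(pci_path_name("ww", "00:1d.0"))  # wwp0s29
```

The second call reproduces the wwp0s29 prefix of the WWAN name that comes up next, once 0x1d is read as decimal 29.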

# ifconfig -a | grep ww
wwp0s29u1u6i6: flags=4098<BROADCAST,MULTICAST>  mtu 1500

Much longer name! What’s going on then? Let’s see: it’s reporting its card at bus 0, slot 29, which is 0x1d in the hexadecimal notation lspci uses for addresses:

# lspci | grep '00:1d'
00:1d.0 USB controller: Intel Corporation 5 Series/3400 Series Chipset USB2 Enhanced Host Controller (rev 05)

Okay, so it’s a USB device, even though the physical form factor is a mini-PCIe card; that’s common. Does it match lsusb?

# lsusb | grep Broadband
Bus 002 Device 004: ID 413c:8184 Dell Computer Corp. F3607gw v2 Mobile Broadband Module

Note that the name does not use the Bus/Device numbers shown there, which is good: the device number increases every time you plug something in or out of the port, so it’s not persistent across reboots at all. What it uses instead is the path to the device through the USB ports, which is a tad more complex, but basically means it matches /sys/bus/usb/devices/2-1.6:1.6/ (I don’t pretend to know exactly how it works, but it describes which physical port the device is connected to).
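The USB part of the name can be read off that sysfs entry. A sketch, assuming the entry name has the layout <bus>-<port>[.<port>…]:<config>.<interface> and that udev leaves out configuration 1 and interface 0, as it does for common devices. The bus number is dropped because the controller is already identified by the PCI part of the name. The function name is hypothetical.

```python
def usb_name_suffix(sysfs_name):
    """Turn a USB interface sysfs name like "2-1.6:1.6" into "u1u6i6"."""
    ports_part, conf_part = sysfs_name.split(":")
    _bus, port_chain = ports_part.split("-", 1)  # bus is covered by the PCI part
    config, interface = conf_part.split(".")
    # Every port in the chain (hub port, then port on the hub, ...) gets a "u".
    suffix = "".join(f"u{port}" for port in port_chain.split("."))
    if config != "1":  # the common configuration 1 is left out
        suffix += f"c{config}"
    if interface != "0":  # interface 0 is left out
        suffix += f"i{interface}"
    return suffix


print(usb_name_suffix("2-1.6:1.6"))  # u1u6i6
print(usb_name_suffix("2-1.2:1.1"))  # u1u2i1
print(usb_name_suffix("2-1.3:1.1"))  # u1u3i1
```

The last two calls use sysfs names inferred from the interface names below, not read from the machine: the same stick on port 2 or port 3 of the same hub yields u1u2i1 or u1u3i1, which is why moving it renames the interface.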

In my laptop’s case, the situation is actually quite nice: I cannot move either the WLAN or the WWAN device to a different slot, so the name assigned by slot is persistent as well as predictable. But what if you’re on a desktop with an add-on WLAN card? What happens if you decide to replace your video card with a more powerful one that occupies two slots, one of which happens to be where your WLAN card is? You move it, reboot and… you’ve just changed the interface name! If you’ve been using NetworkManager, I suppose you’ll just have to reconfigure the network.

Let’s take a different example. My laptop, with its integrated WWAN card, is a rare case; most people I know use USB “keys”, as the providers give them away for free, at least in Italy. I happen to have one as well, so let me plug it into one of the ports of my laptop:

# lsusb | grep modem
Bus 002 Device 014: ID 12d1:1436 Huawei Technologies Co., Ltd. E173 3G Modem (modem-mode)
# ifconfig -a | grep ww
wwp0s29u1u2i1: flags=4098<BROADCAST,MULTICAST>  mtu 1500
wwp0s29u1u6i6: flags=4098<BROADCAST,MULTICAST>  mtu 1500

Okay, great: this is a different USB device, connected to the same USB controller as the onboard one but on a different port. Neat. Now, what if all my usual ports were busy, and I decided to connect it to the USB3 add-on ExpressCard I have on the laptop?

# lsusb | grep modem
Bus 003 Device 004: ID 12d1:1436 Huawei Technologies Co., Ltd. E173 3G Modem (modem-mode)
# ifconfig -a | grep ww
wwp0s29u1u6i6: flags=4098<BROADCAST,MULTICAST>  mtu 1500
wws1u1i1: flags=4098<BROADCAST,MULTICAST>  mtu 1500

What’s this? Well, the USB3 controller provides slot information, so udev magically uses that to name the interface, avoiding the otherwise longer wwp6s0u1i1 name (the USB3 controller is on PCI bus 6).
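Going the other way, the names seen so far can be split back into their components. A sketch that handles only the forms appearing in this post (PCI path, hotplug slot, USB port chain, configuration, interface); onboard (o…) and MAC-based (x…) names are out of scope, and the dictionary keys are hypothetical labels chosen for readability.

```python
import re

KINDS = {"en": "Ethernet", "wl": "WLAN", "ww": "WWAN"}

NAME_RE = re.compile(
    r"^(?P<type>en|wl|ww)"
    r"(?:p(?P<bus>\d+)s(?P<slot>\d+)|s(?P<hotplug>\d+))"  # PCI path or hotplug slot
    r"(?:f(?P<func>\d+))?"                                # PCI function
    r"(?P<usb>(?:u\d+)*)"                                 # USB port chain
    r"(?:c(?P<config>\d+))?"                              # USB configuration
    r"(?:i(?P<iface>\d+))?$"                              # USB interface
)


def _int_or_none(value):
    return int(value) if value is not None else None


def decode_name(name):
    """Split a path- or slot-based interface name into its parts."""
    match = NAME_RE.match(name)
    if match is None:
        raise ValueError(f"not a path/slot-based name: {name!r}")
    g = match.groupdict()
    return {
        "type": g["type"],
        "kind": KINDS[g["type"]],
        "pci_bus": _int_or_none(g["bus"]),
        "pci_slot": _int_or_none(g["slot"]),
        "hotplug_slot": _int_or_none(g["hotplug"]),
        "function": _int_or_none(g["func"]),
        "usb_ports": [int(p) for p in re.findall(r"u(\d+)", g["usb"])],
        "usb_config": _int_or_none(g["config"]),
        "usb_interface": _int_or_none(g["iface"]),
    }


for example in ("wlp3s0", "wwp0s29u1u6i6", "wws1u1i1"):
    print(example, decode_name(example))
```

Decoding wws1u1i1 shows the difference from the on-board ports: there is no PCI bus or slot at all, only the hotplug slot index the USB3 controller reports.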

Let’s go back to the on-board ports:

# lsusb | grep modem
Bus 002 Device 016: ID 12d1:1436 Huawei Technologies Co., Ltd. E173 3G Modem (modem-mode)
# ifconfig -a | grep ww
wwp0s29u1u3i1: flags=4098<BROADCAST,MULTICAST>  mtu 1500
wwp0s29u1u6i6: flags=4098<BROADCAST,MULTICAST>  mtu 1500

It looks the same, but it’s not: now it’s u3, not u2. Why? I used a different port on the laptop, and the interface name changed. Yes, any port change will produce a different interface name, predictably. But what happens if the kernel decides to change the way the ports are enumerated? What happens if the USB 2 driver is buggy, is supposed to provide slot information, and they fix it? You got it: even in these cases, the interface names change.

I’m not saying that the kernel naming scheme is perfect. But if you only ever expect to have an Ethernet port, a WLAN card and a WWAN USB stick, with it you’ll be sure to get eth0, wlan0 and wwan0, as long as the drivers are not completely broken (as some are now, with the WLAN appearing as eth1), and as long as you don’t muck with the interface names in userspace.

Next up, I’ll talk about MAC address-based naming and my personal preference when setting up servers and routers. Have fun in the meantime figuring out what your interface names will be.

Gentoo Linux-based network routing, again

It seems like I’m specializing in setting up Gentoo-based routers. For my work here in California (for the short time I’ll be here, as it looks like my next destination is London by the end of the year), we needed to move the network setup off the previous router (a Juniper ScreenOS device) to something better suited to FiOS as the uplink: in particular, we just got our 150Mbit down, 65Mbit up link, and the Juniper router we had is only rated up to a very optimistic 40Mbps in either direction.

After trying, and failing, to get the FiOS router/access point and the VPN provided by the Juniper router to play nice together, I picked up one of the (extremely old) HPs we had around (a desktop, not a server), ordered a couple of PCI gigabit network cards, and simply set up Gentoo on it. Since the cards took a couple of days to arrive, I actually set everything up “dry” first and then installed the network cards. The bright side is that the cards arrived at 11am, and by 4pm the whole thing was running better than before; by the end of the day I also had an IPv6 tunnel, and we finally have IPv6 support here in the office, which is important for me because of how my Excelsior is set up (I’ll write more about that later on).

Getting Linux to play nice with the Juniper router and its VPN has been the most bothersome part of the whole job. Luckily this wasn’t Juniper’s “SSL VPN”, which requires their Java-based tool to run as root to work as a client on Linux; instead the VPN, completely unmarked, uses IPsec. It’s a bit of a burden to figure out what to tweak between the kernel and userland, but now everything is up. Unfortunately, the racoon init script seems to be a bit of a pain in the butt: it failed to work properly for me, while my improvements fail to work for others. If you’re using it and feel like testing it, I’m pretty sure Anthony would be happy to have more hands on deck.

To be honest, I have yet to set up OpenVPN, and there is another problem with VPN Tracker behind this router: there is no IPsec connection tracking helper, so the UDP packets required for negotiation don’t get through (the client does not support UPnP/IGD for port forwarding, which is a definite pain). In general, though, it’s much easier for me to deal with a Gentoo Linux-based router than with the stupid Juniper ScreenOS.

I’ve been doing some reading on which parameters to tweak, but I haven’t had much time to experiment yet, and since the office is basically running with three people in at any time, there’s very little that doesn’t work out of the box. One thing I noticed, though, is that IPv6 (over the tunnel) somehow feels “snappier” than IPv4. Maybe it’s the NAT that has to be done, or the fact that the iptables rules are more complex for v4 than v6 (as they include DNAT as well). The ping times are also quite good, and halved for IPv6: 3ms vs 6ms over v4 to Google’s homepage; similar (but much higher) results happen for Yahoo!, but they are reversed for Facebook.

A good reason not to use network bridges

So one of the things I’m working on for my job is setting up Linux Containers to separate some applications. Yes, I know I’m the one who said that they are not ready for prime time, but please note that what I was saying is that I wouldn’t give root inside a container to anybody I wouldn’t trust, which is not the same as saying that they are not extremely useful to limit the resource consumption of various applications.

Anyway, there is one thing that has to be considered, which I already wrote about briefly: networking. The simplest way to set up an LXC host, if your network is a private one with a DHCP server or something along those lines, is to create one single bridge between your public network interface and the host side of the virtual Ethernet pairs. This has one unfortunate side effect: to make it work, the network interface is put in promiscuous mode, which means that it receives all the packets directed to any other interface, which slows it down quite a bit.

So how do you solve the issue? Well, I’m honestly not sure whether macvlan improves the situation; I’m afraid not. What I decided for Excelsior, since it is not on a private network, was to set up an internal bridge and give the containers static internal IP addresses. When I need to jump into one of the containers, I simply use the main public IP as an SSH jumphost and then connect to the correct address. I described the setup before, although I have since made a further change so that I don’t have to bother with the private IP addresses in the configuration file: I use the public IPv6 AAAA records for the containers, which simply resolve as usual once inside my jumphost.

Of course, with the exception of the jumphost, that kind of setup, which involves NAT with iptables, has no way to receive connections from the outside.

So what other options are there? One thing I’ve been thinking about was to use a layer-3 managed switch and set it to route a subnet to the LXC host, but that wouldn’t really fly. So in the end the question would be “what do I need to access on the containers from the outside?” and the answer is simply “the websites”. The containers provide a number of services, but only the websites are exposed to the outside. So, do I need IPs that are even partially public? Not really.

The solution I’m planning right now is to set up a box with either an Apache reverse proxy or some other reverse proxy (depending on how much we want to handle on the proxy itself), and have that contact the internal containers, just as it would work with a single reverse proxy on the Internet and the servers on the internal network.
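The plan above names Apache as one option; purely to illustrate the idea of one front box mapping public host names to internal containers, here is a minimal GET-only sketch in Python using only the standard library. It is not an Apache configuration. The host names, the 10.0.3.x addresses and the port are hypothetical placeholders, and redirects from the containers are handed back to the client instead of being followed.

```python
import http.server
import urllib.error
import urllib.request

# Hypothetical public host names mapped to hypothetical container addresses.
BACKENDS = {
    "www.example.com": "http://10.0.3.10",
    "blog.example.com": "http://10.0.3.11",
}


class _NoRedirect(urllib.request.HTTPRedirectHandler):
    """Hand redirects back to the client instead of following them."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None  # urllib then raises HTTPError carrying the 3xx answer


_OPENER = urllib.request.build_opener(_NoRedirect)


def pick_backend(host_header):
    """Return the internal base URL for a Host header, or None if unknown."""
    host = (host_header or "").split(":", 1)[0].strip().lower()
    return BACKENDS.get(host)


class ReverseProxyHandler(http.server.BaseHTTPRequestHandler):
    def do_GET(self):
        host = self.headers.get("Host", "")
        backend = pick_backend(host)
        if backend is None:
            self.send_error(404, "Unknown virtual host")
            return
        # Keep the original Host header so the container serves the right site.
        request = urllib.request.Request(backend + self.path, headers={"Host": host})
        try:
            with _OPENER.open(request, timeout=10) as response:
                status, headers, body = response.status, response.headers, response.read()
        except urllib.error.HTTPError as error:
            # 3xx/4xx/5xx answers from the container are passed through.
            status, headers, body = error.code, error.headers, error.read()
        except OSError:  # URLError (refused, DNS failure) and timeouts
            self.send_error(502, "Container unreachable")
            return
        self._relay(status, headers, body)

    def _relay(self, status, headers, body):
        self.send_response(status)
        for name in ("Content-Type", "Location"):
            value = headers.get(name)
            if value:
                self.send_header(name, value)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8080):
    """Run the proxy on all addresses of this box until interrupted."""
    http.server.ThreadingHTTPServer(("", port), ReverseProxyHandler).serve_forever()
```

Calling serve() starts it; a production proxy such as Apache’s mod_proxy also deals with other methods, header rewriting and streaming, all of which this sketch leaves out.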

I guess at some point I should overhaul the LXC wiki page as far as networking is concerned; I already spent some time removing duplicated content and syncing it with what’s going on in the ebuild…

My problem with networking

After my two-parter on networking, IPv6 and wireless, I got a few questions on why I don’t just use a cable connection rather than dealing with wireless bridges. The answer, unfortunately, is that I don’t have a clean way to run a cable from where my ADSL line is to my office, on the floor above.

This is mostly due to bad wiring in the house: too little space to get cables through, and too many cables already in there. One of the projects we have going on in the house now (we’ve been working through a relatively long list of chores, since neither my mother nor I foresee leaving this house soon) is to rewire the burglar alarm system, at which point I should get more space for my cables; modern burglar alarms do not require the equivalent of four Ethernet cables running throughout the house.

Unfortunately, that is not going to be the end of the trouble. While I might be able to run one cable from my office to the basement (where the cable distribution ties up) and from there to the hallway (where the ADSL is), I’m not sure how many metres of cable that would be. When I wired cat5e cable between my office and bedroom (so that the AppleTV could stream cleanly), I already had to sacrifice Gigabit speed. And I’m not even sure the signal will pass cleanly through there, as the cable will run together with the mains wiring: the house is almost thirty years old, and I have no chance of getting separate conduits for the data cable and the power; I’m lucky enough that the satellite cable fits. And I should shorten that.

To be honest, I already knew a way around my house if I wanted to run a cable up here. The problem is that it would require taking the longest route possible: while my office is stacked on top of the hallway (without a direct connection; that would have been too easy), to get from one to the other without the alarm rewiring I would have to go to the opposite side of the house, bring the cable upstairs and then back, using a mixture of passageways designed for telephone, power and aerial wiring, and crawl outside the wall for a few metres as well.

The problem with that solution, besides the huge amount of time I would have to invest in it, is that the total cable length is almost certainly over a hundred metres, which is the official limit for a cat5e Ethernet link. Of course many people would insist on telling me that “it’s okay, there’s a high chance it would still work”… sure, and what if it doesn’t? I mean, I would have to actually make a hole in the wall in one place, then spend more than a day (I’m sure I wouldn’t be able to do this in just one day; I’ve had to deal with my wiring before), with the risk of not getting a clean enough signal for the connection to be established. No thanks.

I also considered the option of going fibre optic. I have no clue about the cabling itself, and I know it requires even more specific tools than the RJ45 plugs to be wired, but I have looked at the prices of the hardware capable of converting the signal between fibre and good old RJ45 cabling… and it’s way out of my range.

Anyway, back to the current plan for getting the cable running. As I said, the current “cable hub” is in the basement, which is mostly used as a storage room for my mother’s stuff. She’s also trying to clean it up, so in a (realistically, remote) future I might actually move most of my hardware down there rather than keeping it in the office: namely Yamato, the router itself (forwarding the ADSL connection rather than the whole network) and Archer, the NAS. Our basement is not prone to floods, and is generally cool in the summer, definitely cooler than my office. Unfortunately, for that to work out I’ll probably need a real-life rack and rackmount chassis, neither of which is really cheap.

Since that is, as I said, in the future, if I were to run the cable from there next month and the signal wasn’t strong enough, my only option would be to add a repeater. Adding a repeater there, though, is troublesome. As I said in the other posts, and before as well, my area is plagued by a very bad power supply situation, to the point that I have four UPS units in the house, for a total of 3750 VA (which is, technically, probably more than the power provided by the supplier). I don’t really like the idea of making room for yet another UPS unit just for a repeater, even less so considering that the cables would end up over my head, on the stairs’ passage (yes, it is a stupid position for a control panel in the first place), and while most repeaters seem to be wall-mountable, UPS units are a different story.

So the only solution I can think of for such a situation would be to add a PoE repeater there, if needed, and power it through a switch, either in my office (unlikely) or in the hallway near the router (most likely), behind the UPS. Once again, the deciding factor is cost.

Honestly, even though I decided not to get an office after seeing costs jumping higher and higher – having an office would increase my deductibles of course, but between renting the office, daily transportation, twice the power bill, and so on, it’s not the taxes that worry me – I wonder whether working at home is really as cheap as I expected it to be.

Sigh. I guess it’s more paid work, less free time next year as well.

The problem with wireless bridging

I want to pick up where I left with my previous post and expand a bit upon the issue with wireless bridging, and why “just use dd-wrt” is not an answer to the problem.

As I said, I learnt a number of these issues the hard way, by trying to get things to work… and failing. In particular, there is a limitation in 802.11 that even the dd-wrt documentation notes:

Client Bridge mode will only recognize one mac address on the bridged setup, due a limitation in the 802.11 protocol, even if there are multiple clients (with multiple mac addresses) connected to the client router. If you want to bridge a full LAN you must use WDS. The problem is that the 802.11 protocol just supports one MAC address, but in a LAN there is the possibility for more than one MAC address. It may cause ARP table problems, if you connect more than one computer on the far end of a Client Bridge mode setup. You will not be able to, for example, block mac addresses of client of the bridged routers or set access restrictions based on mac addresses in the bridged router

This actually makes it sound better than it is. Anything relying on proper MAC address communication will fail. Indeed, if you wish to use a single DHCP server, your only choice is to run a DHCP relay (dhcrelay) on the bridge itself. And that’s not a good idea.

Since 802.11 decides where to send packets based on the MAC address, you only have two choices for this to work: you either go with what OpenRG/Linksys do, and translate addresses at layer 2 (probably with a DHCP relay to make sure that DHCP still works), or you do what D-Link did with the DAP-1160 and create a custom work mode, which I guess encapsulates the packets to preserve their addresses (I could probably have tried AP+Bridge mode and sniffed the traffic to find out, but I didn’t care), probably something along the lines of a generic Ethernet-in-Ethernet encapsulation.
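To see why layer-2 translation breaks anything the bridge does not understand, here is a simplified Python model of such a client bridge. It is not OpenRG’s or Linksys’s actual firmware: it assumes the bridge learns IP-to-MAC mappings only by snooping ARP, and that the access point only ever sends unicast frames to the bridge’s own MAC. Frames are plain records, and all addresses are made up for the example.

```python
from dataclasses import dataclass, replace

BROADCAST = "ff:ff:ff:ff:ff:ff"


@dataclass(frozen=True)
class Frame:
    src_mac: str
    dst_mac: str
    kind: str        # "arp", "ipv4" or "ipv6" in this simplified model
    src_ip: str = ""
    dst_ip: str = ""


class MacNatBridge:
    """Simplified wireless client bridge hiding LAN MACs behind its own."""

    def __init__(self, own_mac):
        self.own_mac = own_mac
        self.ip_to_mac = {}

    def outbound(self, frame):
        """LAN -> wireless: learn from ARP, then substitute the bridge MAC.

        A real implementation also rewrites the hardware address inside the
        ARP payload; that detail is omitted here.
        """
        if frame.kind == "arp" and frame.src_ip:
            self.ip_to_mac[frame.src_ip] = frame.src_mac
        return replace(frame, src_mac=self.own_mac)

    def inbound(self, frame):
        """Wireless -> LAN: restore the LAN MAC, or drop the frame (None)."""
        if frame.dst_mac != self.own_mac:
            return frame  # broadcast and multicast pass through unchanged
        lan_mac = self.ip_to_mac.get(frame.dst_ip)
        if lan_mac is None:
            return None  # nothing learned for this address: dropped
        return replace(frame, dst_mac=lan_mac)


bridge = MacNatBridge("02:00:00:00:00:aa")
bridge.outbound(Frame("00:11:22:33:44:55", BROADCAST, "arp", "192.168.1.20", "192.168.1.1"))
reply_v4 = Frame("00:aa:bb:cc:dd:ee", "02:00:00:00:00:aa", "ipv4", "192.168.1.1", "192.168.1.20")
reply_v6 = Frame("00:aa:bb:cc:dd:ee", "02:00:00:00:00:aa", "ipv6", "2001:db8::1", "2001:db8::20")
print(bridge.inbound(reply_v4).dst_mac)  # 00:11:22:33:44:55
print(bridge.inbound(reply_v6))          # None
```

After the LAN host’s ARP packet, unicast IPv4 finds its way back to the right MAC, while a unicast IPv6 frame has no learned mapping and is silently dropped: exactly the kind of breakage that anything outside the bridge’s parser runs into.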

Interestingly enough, there is an RFC describing Ethernet-in-IP encapsulation (RFC 3378, EtherIP), and there is a patch for Linux 2.6.10 that implements it. It would be quite an interesting approach to have the router listen on an EtherIP device, and have another EtherIP device up here to encapsulate the packets; unfortunately this would still require a very shallow router up here, which is what I’m trying to avoid altogether. And as it happens, it looks like the patch never made it into the kernel, and the author’s website seems to be gone as well (the domain does not have an answering webserver, even though the whois data confirms its registration; I should try to see whether the email address is still valid: there is a valid MX record and an answering mail server at least).
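For reference, the EtherIP framing itself is tiny. A sketch of encapsulation and decapsulation, assuming the RFC 3378 header layout: a 16-bit field whose top four bits carry version 3 and whose remaining twelve bits are reserved and sent as zero. The resulting payload would be carried in IP datagrams with protocol number 97; the IP layer itself, and the function names, are illustrative only.

```python
import struct

ETHERIP_PROTOCOL = 97  # IP protocol number carrying EtherIP payloads
ETHERIP_VERSION = 3    # version value defined by RFC 3378


def etherip_encapsulate(frame: bytes) -> bytes:
    """Prefix a whole Ethernet frame with the 2-byte EtherIP header."""
    header = struct.pack("!H", ETHERIP_VERSION << 12)  # reserved bits stay zero
    return header + frame


def etherip_decapsulate(payload: bytes) -> bytes:
    """Strip the EtherIP header, rejecting payloads with another version."""
    if len(payload) < 2:
        raise ValueError("payload too short for an EtherIP header")
    (word,) = struct.unpack("!H", payload[:2])
    if word >> 12 != ETHERIP_VERSION:
        raise ValueError(f"unsupported EtherIP version {word >> 12}")
    return payload[2:]


frame = bytes(14) + b"payload"  # dummy Ethernet header plus some data
wrapped = etherip_encapsulate(frame)
print(wrapped[:2].hex())  # 3000
```

Because the original frame travels untouched inside the payload, every MAC address on the far segment is preserved, which is exactly what the layer-2 translation approach gives up.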

I guess I can add this to the long list of projects I’ll work with once I made enough money not to have to work twelve hours a day to pay the bills…

IPv6 and networking pain

I’m honestly reconsidering my scepticism towards curses, mostly because the past two months don’t make much sense without taking them into consideration. I’ve had a long list of hardware, network and power issues, and jobs ended up bottled up because of that.

Not the latest, and not the worst (but on the upper side of the list) of those issues happened with the DAP-1160 bridges/access points I used to connect the network segment in my office to the router downstairs. The problem there is that, for a long series of reasons, I can’t reach it with either an Ethernet cable or a powerline adapter, so I decided to use gigabit within the office and jump to the router over wireless.

I’ve had those two bridges for about two years now, and they worked mostly well. Mostly, not perfectly. In the past month, though, they started acting up, requiring a reboot too often… the problem is likely tied to them running continuously for a few months and then being turned on and off repeatedly due to the power company blacking me out (14 hours in 14 days: two lumps of 5 hours, plus a number of on-and-off spikes).

My original implementation of this setup involved an OpenWRT-powered router and subnetting the office… but the subnetting quickly became a bother, as it added one more router for me to manage, and I didn’t intend to proceed that way. I then replaced said router with Enterprise/Yamato with a WLAN card, but that had its share of troubles as well. In the end I went with the two D-Link devices, which created a seamless Ethernet bridge between the two segments, yay!

And now they started failing, so I had to replace them. And since I had to replace them anyway, I wanted 11n hardware running on the 5GHz band rather than 2.4GHz, to avoid most of the interference otherwise present. So after a bit of googling around I ended up buying two Cisco Linksys devices, a WAP610N access point and a WET610N bridge. They are designed to work together, and thus they should have been perfect. Should being the keyword.

What happens with these? Well, the throughput is nice indeed; it’s much faster to connect to the router now. But at the same time… I lost all IPv6 capabilities.

Now, I learnt the hard way at the time that the 802.11 specifications do not include provisions for wireless-to-Ethernet transparent bridges, and all such bridges are custom implementations by the manufacturers. I thought Linksys had solved it at the same level as well… but it turns out it didn’t. It actually did something a tad smarter, for the kind of usage they foresaw for their hardware: they parse layer-3 packets, in particular it seems they parse ARP packets, to tell the access point which addresses to send their way… a sort of Network Address Translation at layer 2.

Unfortunately, they do not do the same for IPv6 NDP, so IPv6 is simply broken here. To be precise, IPv6 works fine within the network segment, because the router advertisement is sent as multicast (IPv6 has no broadcast) and is thus received properly, but all the unicast IPv6 traffic from the router to the bridge (not the other way around, by the way) is dropped.
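A plausible reason the router advertisements still get through lies in how IPv6 multicast maps onto Ethernet: per RFC 2464, the destination MAC is 33:33 followed by the low 32 bits of the group address, so frames for ff02::1 (all nodes) are multicast frames that a bridge like this can forward without any per-host lookup, unlike unicast frames. A small sketch using Python’s ipaddress module; the solicited-node helper (the group NDP neighbour solicitations are sent to) is included for comparison, and the example addresses come from the 2001:db8::/32 documentation range.

```python
import ipaddress

SOLICITED_NODE_BASE = int(ipaddress.IPv6Address("ff02::1:ff00:0"))


def ipv6_multicast_mac(group):
    """Ethernet destination MAC for an IPv6 multicast group (RFC 2464)."""
    addr = ipaddress.IPv6Address(group)
    if not addr.is_multicast:
        raise ValueError(f"{group} is not a multicast address")
    octets = bytes([0x33, 0x33]) + addr.packed[-4:]  # 33:33 + low 32 bits
    return ":".join(f"{octet:02x}" for octet in octets)


def solicited_node_group(unicast):
    """Solicited-node group: ff02::1:ff00:0/104 plus the low 24 address bits."""
    addr = ipaddress.IPv6Address(unicast)
    return ipaddress.IPv6Address(SOLICITED_NODE_BASE | (int(addr) & 0xFFFFFF))


print(ipv6_multicast_mac("ff02::1"))                             # 33:33:00:00:00:01
print(solicited_node_group("2001:db8::20"))                      # ff02::1:ff00:20
print(ipv6_multicast_mac(solicited_node_group("2001:db8::20")))  # 33:33:ff:00:00:20
```

Neither of these destination MACs is the bridge’s own address, so an ARP-only translation table never needs to be consulted for them; ordinary unicast traffic to a host behind the bridge does need that table, and that is where an NDP-unaware bridge has nothing to look up.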

I’m not sure whether I should just live with it or find a more proper replacement for the 1160 devices. If somebody knows of hardware capable of such a transparent bridge between wireless and Ethernet on the 5GHz band, it would definitely be welcome; in that case, the Linksys bridge would be relegated to my bedroom (where it would connect just the consoles and TV, none of which is IPv6-capable anyway), and the access point would replace the current 11g public network I use for the devices outside my office.

In the meantime I have more issues to solve. Sigh.