May I have a network connection, please?

If you’re running ~arch, you’ve probably noticed by now that the latest OpenRC release no longer allows services to “need net” in their init scripts. This change has caused quite a bit of grief, because some services, Apache included, no longer start after a reboot or a restart. Edit: this only happens with corner-case configurations such as an LXC guest. As William points out, the real change is simply that net.lo no longer provides the net virtual, while the other network interfaces still do.

While there’s no denying that this is annoying as hell, it could be much worse, not least because it’s trivial to work around until the init scripts themselves are properly fixed. How? Just append the line rc_need="!net" to /etc/conf.d/$SERVICENAME; if the configuration file does not exist, simply create it.
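If you have more than one service to patch, a small helper can do the appending for you. This is a minimal sketch, not something shipped with OpenRC: the function name add_rc_need and the CONFD_DIR override (there only so you can try it outside /etc) are my own assumptions. Since conf.d files are sourced as shell, a second rc_need= line would replace the first rather than add to it, so the helper refuses to touch a file that already sets rc_need and leaves the merge to you.

```shell
#!/bin/sh
# Hypothetical helper (not part of OpenRC): add an rc_need value to the
# conf.d file of a service, creating the file if it does not exist.
# CONFD_DIR is an assumed override; it defaults to /etc/conf.d.
add_rc_need() {
    svc="$1"
    dep="$2"
    conf="${CONFD_DIR:-/etc/conf.d}/$svc"
    line="rc_need=\"$dep\""

    # Nothing to do if the exact line is already present.
    if [ -f "$conf" ] && grep -qxF "$line" "$conf"; then
        return 0
    fi

    # conf.d files are sourced as shell, so a second rc_need= assignment
    # would silently override the first: leave existing ones to merge by hand.
    if [ -f "$conf" ] && grep -q '^[[:space:]]*rc_need=' "$conf"; then
        echo "$conf already sets rc_need; merge \"$dep\" into it by hand" >&2
        return 1
    fi

    printf '%s\n' "$line" >> "$conf"
}

# Example (as root): add_rc_need apache2 '!net'
```

Note the single quotes around '!net' at the call site, which keep an interactive bash from attempting history expansion on the “!”.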

Interestingly enough, knowing this workaround also lets you do something even more useful: making sure that services requiring a given interface to be up actually depend on that interface. Okay, that’s a bit convoluted; let me backtrack a little.

Most server daemons out there don’t really care how many interfaces you have, which ones, or what they are named. They bind either to the “catch-all” address (0.0.0.0 or ::, depending on the version of the IP protocol; the latter can also be used to catch both IPv4 and IPv6, but that’s a different story altogether), to a particular IP address, or, quite rarely, to a particular interface. The last case usually only applies to daemons that work at the link level, such as RADVD or DHCP.

Now, to bind to a particular IP address, you need that address to be assigned to the local machine, or the bind will fail (on Linux, with EADDRNOTAVAIL, “Cannot assign requested address”). So in these cases you have to delay the service start until the network interface carrying that address is up. Unfortunately, that’s extremely hard to do automatically: you’d have to parse the service’s configuration file (sometimes easy, most of the time not), and then figure out which interface will come up with that address … which is not really possible on networks where addresses are assigned automatically.

So how do you solve this conundrum? There are two ways, and both involve manual configuration; then again, so does making a daemon listen on a specific address in the first place.

The first option is to keep the daemon listening on the catch-all addresses, then use iptables to filter per interface or per address. This is quite easy to deal with, and quite safe as well. It also has the nice side effect that you have a single place to handle all the IP address specifications. If you ever had to restructure a network because the sysadmin before you used the wrong subnet mask, you know how big a difference that makes. I’ve found that some people think iptables also needs the interfaces to be up to work. Fortunately, this is not the case: it’ll accept any interface name as long as it could possibly be valid, and will only match it once the interface actually comes up (which is why it’s usually a better idea to whitelist rather than blacklist there).
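As an illustration of this first option, here is a sketch that keeps a daemon on the catch-all address and lets iptables decide who may reach it. The service (mysqld on TCP port 3306), the interface name lan0 and the function name service_whitelist_rules are assumptions for the example. The function only prints an iptables-restore ruleset so you can review it before loading; be aware that iptables-restore replaces the whole filter table unless you pass --noflush.

```shell
#!/bin/sh
# Sketch (hypothetical helper): print an iptables-restore ruleset that lets
# only loopback and one LAN interface reach a service bound to 0.0.0.0.
# lan0 does not need to exist when the rules are loaded; iptables matches
# the name once the interface comes up.
service_whitelist_rules() {
    port="${1:-3306}"    # assumed: mysqld's default TCP port
    iface="${2:-lan0}"   # assumed: the LAN-facing interface
    cat <<EOF
*filter
:INPUT ACCEPT [0:0]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [0:0]
-A INPUT -i lo -p tcp --dport $port -j ACCEPT
-A INPUT -i $iface -p tcp --dport $port -j ACCEPT
-A INPUT -p tcp --dport $port -j DROP
COMMIT
EOF
}

# Review first, then load as root:
#   service_whitelist_rules 3306 lan0 | iptables-restore
```

The final DROP is what makes this a whitelist: traffic for that port arriving on an interface you forgot, or on one that only shows up later, is refused rather than accepted.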

The other option requires changing the configuration on the OpenRC side. As I showed above, you can easily manipulate the dependencies of the init scripts without having to change the scripts at all. So if you’re running a DHCP server on the LAN served by the interface named lan0 (named this way because a certain udev no longer allows you to swap interface names with the persistent rules it first introduced), and you want to make sure that interface is up before dhcpd starts, you can simply add rc_need="net.lan0" to your /etc/conf.d/dhcpd. This way you can make sure that the services’ dependencies match what you expect; I use this, for instance, to make sure that if I restart things like mysql, php-fpm is restarted as well.
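Spelled out as files, the two dependencies just described would look something like this. The php-fpm line is my assumption of how the mysql/php-fpm link can be expressed, not necessarily the exact setup used here. If a conf.d file already sets rc_need, merge the values into one space-separated assignment instead of adding a second line.

```shell
# /etc/conf.d/dhcpd
# Start dhcpd only after net.lan0 is up (lan0 being the LAN interface).
rc_need="net.lan0"

# /etc/conf.d/php-fpm
# Assumed setup: php-fpm needs mysql, so restarting mysql through OpenRC
# also stops and restarts php-fpm.
rc_need="mysql"
```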

So I’ve given you two ways to work around the current not-really-working-well status, but why am I not complaining about the situation itself? Well, the reason so many init scripts have that “need net” line is simply cargo-culting. And the big problem is that there is no really good definition of what “net” is supposed to mean. I’ve seen it used (and used it myself!) for at least the following notions:

  • there are enough modules loaded that you can open sockets; this is not a situation I’d ever like to have to work around: while it’s possible to build IPv6 (though not IPv4) as a module, I doubt most things would work at all that way;
  • there is at least one network interface present on the system; this is usually better achieved by making sure that net.lo is started instead, especially since in most such cases what you’re really looking for is whether 127.0.0.1 is usable;
  • there is an external interface connected; okay, sure, but what are you doing with that interface? I can assure you that you’ll find eth0 up … with no cable connected. What now?
  • there is Internet connectivity available; this would make sense if it weren’t for the not-insignificant detail that you can’t really know that from the init system; it would be like having a “need userpresence” that makes sure the init script is started only after the webcam is turned on and the user’s face is identified.
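For the second notion, an init script can state what it actually needs instead of going through the virtual. A sketch of the depend() function of a hypothetical OpenRC script for a daemon that only listens on 127.0.0.1 (use logger is an optional extra, picking up whichever syslog daemon provides the logger virtual):

```shell
# depend() of a hypothetical OpenRC init script for a daemon that only
# talks over 127.0.0.1: depend on the loopback interface itself,
# not on the "net" virtual.
depend() {
    need net.lo
    use logger
}
```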

While some of these notions have legitimate use cases, the lack of a clear definition of what “need net” is supposed to mean makes it extremely unreliable. At this point, especially considering all the various options (oldnet, newnet, NetworkManager, connman, flimflam, LXC, vserver, …), it’s definitely a better idea to get rid of it and not consider it anymore. Unfortunately, this is leading us into a relative world of pain, but sometimes you have to get through it.
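To see how much of your own system still leans on the virtual, a rough audit is to grep the init scripts for a need line naming the bare net word (and not net.lo, net.eth0 or netmount). This is a heuristic sketch with a hypothetical function name: it only catches literal need lines, so dependencies added through rc_need in conf.d, or computed at runtime, slip through.

```shell
#!/bin/sh
# Hypothetical helper: list init scripts whose depend() contains a literal
# "need ... net" line. Matches "net" as a whole word only, so net.lo,
# net.eth0 and netmount are not reported.
scripts_needing_net() {
    dir="${1:-/etc/init.d}"
    grep -lE '^[[:space:]]*need[[:space:]](.*[[:space:]])?net([[:space:]]|$)' \
        "$dir"/* 2>/dev/null
}

# Example: scripts_needing_net | xargs -n1 basename
```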

7 thoughts on “May I have a network connection, please?”

  1. Hmm. Yanking out the “need net” bug is pretty precipitous. I was going to look for a discussion on it in the forums, but I find that, like a lot of Gentoo sites in the last two or three days, the forums are having problems. Maybe they’re running keyworded OpenRC :)

     I seem to recall that the systemd people ran into this same kind of issue a while back: is it the loading of modules, the bringing up of loopback, the bringing up of an external interface, the existence of a default route, connectivity to a specific host, or, as you say, the bringing up of a particular interface? They may have solved the issue, likely with some clumsy, heavy-handed solution, and it looks like we’re up against the same issue.

     I’m guessing that the solution is to split the ‘net’ of “need net” into a set of finer dependencies like net.modules, net.lo, net.eth0, net.routable, and so on. That also means that if you need something finer, like a connection to a particular host, you could write your own service script for the purpose.

     Of course, if I could read the forums, I might be able to see the discussion about this.

  2. For everything that just wants to bind, there’s a nice IP_FREEBIND (or even ip_nonlocal_bind in proc) to just let the application bind and have the kernel later match it up with an interface. See http://man7.org/linux/man-p… And for anything that needs to connect out… well, things get a lot more complicated anyway.

  3. Ah, thanks Martin, didn’t know about FREEBIND. Mike, as you point out, the only option one has is to split the dependencies … for the most part, this is not really needed: even if you decide to make it “need net.lo”, lo is available starting from boot time, so there’s nothing stopping you there. Things like “routable” are really difficult to deal with in generic terms, though.

  4. I just sent Diego a private message about this. We didn’t actually remove “provide net” from the OpenRC scripts; we just changed the definition so that the loopback interface doesn’t provide net. In other words, the “net” virtual now refers to non-local connections. This was discussed some time back on the gentoo-dev mailing list.

     However, everything else in the post is correct. For the future, I highly recommend not referring to the net virtual at all, for the same reasons Diego mentions in the post.

  5. Hi Diego, maybe it would be a good idea to add an optional provide parameter for network connections. The admin could then define the provided service as, for example, net_vlan or net_local, and decide himself what should depend on such net services. If nothing is defined, such services would provide the standard net.

  6. If I understood correctly, one can revert to the previous behavior by adding rc_provide="net" to /etc/conf.d/net.lo.

     I personally prefer this behavior, as most of my network services are completely internal and don’t need any external access… Well, I do currently have rc_depend_strict="YES" (the default) in my /etc/rc.conf, so if a NIC fails to initialize, no network services are started anyway… I’m not sure if anything besides network interfaces is affected by multiple providers in my current configuration… Well, considering my NICs use a static configuration, there is seldom any problem bringing them up, so I guess I should just not care :3

  7. As far as I understand the situation now (late January 2013, OpenRC 0.11.8), LXC network setup can be viewed as having two main classes of use case:

     1. Interfaces preconfigured on the host before the container starts can provide immediate connectivity.
     2. Processes inside the container need to run to establish connectivity (e.g. VPN setup, DHCP, etc.).

     For class 1, the best solution I’ve found seems to be adding rc_provide="net" to /etc/rc.conf, which mutes loads of errors due to the unmet dependency (pydoc, netmount, etc.). This usually also requires an lxc.conf directive for a net.x.up script to configure the host-local end of the veth connection.

     For class 2, adding the appropriate net.x init script to the default runlevel as per normal seems to work (though I am not really testing this just yet; I will get around to it in the next couple of days. I am happy to note the ~recentish(?) use of busybox udhcpd by Gentoo, which saves setup hassle).

     Note that for virtual infrastructure maintenance/portability purposes (i.e. easily shifting a container to a totally new host on a new network topology), I assume that class 2 will eventually become dominant for servers.

     The above is reflected in the latest lxc-gentoo source, which boots a Gentoo guest in a couple of hundred milliseconds on my test platform (itself relatively ‘slow’ due to being paravirtualized in VMware).

     LXC and control groups are great! :) Go team.
