This Time Self-Hosted

A good reason not to use network bridges

One of the things I’m working on for my job is setting up Linux Containers to separate some applications. Yes, I know I’m the one who said that they are not ready for prime time, but please note that what I was saying is that I wouldn’t give root inside a container to anybody I wouldn’t trust, which is not the same as saying that they are not extremely useful for limiting the resource consumption of various applications.

Anyway, there is one thing that has to be considered, which I already wrote about briefly: networking. The simplest way to set up an LXC host, if your network is a private one with a DHCP server or something along those lines, is to create a single bridge between your public network interface and the host side of the virtual Ethernet pairs. This has one unfortunate side effect: to make it work, the bridge puts the network interface in promiscuous mode, which means that it accepts every packet it sees, including those directed to other interfaces, which slows it down quite a bit.
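To make the promiscuous-mode point concrete, here is a deliberately simplified Python model of the destination filter on a network card. It is only a sketch: the MAC addresses and the `nic_accepts` helper are made up for illustration, and real hardware also handles multicast filters, which are ignored here.

```python
BROADCAST = "ff:ff:ff:ff:ff:ff"


def nic_accepts(dst_mac: str, nic_mac: str, promiscuous: bool) -> bool:
    """Simplified model of a NIC's destination filter (multicast ignored)."""
    if promiscuous:
        # In promiscuous mode every frame is passed up to the kernel.
        return True
    dst = dst_mac.lower()
    return dst == nic_mac.lower() or dst == BROADCAST


# Hypothetical addresses, purely for illustration.
host_mac = "00:11:22:33:44:55"       # the physical interface
container_mac = "02:00:00:00:00:01"  # a container's veth behind the bridge

# Without promiscuous mode, frames for the container never reach the bridge.
print(nic_accepts(container_mac, host_mac, promiscuous=False))  # False
# With it, they do, but so does everything else the card sees.
print(nic_accepts(container_mac, host_mac, promiscuous=True))   # True
```

The second call is what the bridge needs, and it is also where the extra load comes from: the filter no longer discards anything.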

So how do you solve the issue? Well, I’m honestly not sure whether macvlan improves the situation; I’m afraid not. What I decided for Excelsior, since it is not on a private network, was to set up an internal bridge and give the containers static internal IP addresses. When I need to jump into one of the containers, I simply use the main public IP as an SSH jumphost and then connect to the correct address. I described the setup before, although I have since made a further change so that I don’t have to bother with the private IP addresses in the configuration file: I use the public IPv6 AAAA records for the containers, which simply resolve as usual once inside my jumphosts.

Of course, with the exception of jumphosts, that kind of setup, which involves using NAT with iptables, has no way to receive connections from the outside.

So what other options are there? One thing I’ve been thinking about was to use a layer-3 managed switch and set it to route a subnet to the LXC host, but that wouldn’t fly. So in the end the question is “what is it that I need to access on the containers from the outside?” and the answer is simply “the websites”. The containers provide a number of services, but only the websites are mapped to the outside. So, do I need IPs that are even partially public? Not really.

The solution I’m planning right now is to set up a box with either an Apache reverse proxy or some other reverse proxy (depending on how much we want to handle on the proxy itself), and have that contact the internal containers, the same way it would work if you had one reverse proxy on the Internet and the servers on the internal network.
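The core of that reverse-proxy idea is just a lookup from the requested site name to an internal container address. The Python sketch below shows only that lookup, not an actual proxy; the hostnames, the 10.0.0.x addresses and the `pick_backend` helper are all hypothetical, and a real setup would leave this job to Apache or whichever proxy ends up being used.

```python
from typing import Dict, Optional, Tuple

# Hypothetical mapping from public site names to internal container addresses.
BACKENDS: Dict[str, Tuple[str, int]] = {
    "www.example.com": ("10.0.0.2", 80),
    "blog.example.com": ("10.0.0.3", 80),
}


def pick_backend(host_header: str) -> Optional[Tuple[str, int]]:
    """Return the internal (address, port) for a Host header, or None.

    Only plain hostnames are handled; IPv6 literals are out of scope here.
    """
    # Drop an optional ":port" suffix and normalise the case.
    hostname = host_header.split(":", 1)[0].strip().lower()
    return BACKENDS.get(hostname)


print(pick_backend("Blog.Example.com:8080"))  # ('10.0.0.3', 80)
print(pick_backend("unknown.example"))        # None
```

Only the proxy box needs an address reachable from the outside; the containers keep their internal ones.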

I guess at some point I should overhaul the networking part of the LXC wiki page; I already spent some time removing some duplicated content and actually syncing it with what’s going on in the ebuild…

Comments 14
  1. Have you looked at the virtual distributed ethernet project, and if so, what do you think?

  2. As far as I can tell vde doesn’t do anything that the bridge doesn’t already do. The problem is that Ethernet cards only listen to one MAC address (maybe two if they are shared with an IPMI management interface).

  3. Isn’t this a bit of an unnecessary micro-optimization? I run quite a few beefy machines with a dozen or so VMs on top in bridging mode and have never felt unnecessary load from the network.

  4. Ethernet bridging is, as you say, not the most performant mode. There is a project that strives to fix that, and the tests I’ve run look promising, though I’m more interested in throughput. The project is run by Luigi Rizzo:

  5. Kevin, I’ve seen systems being brought down by interrupt storms when connected to a high-traffic network, recently as well. When you run your servers way too near to the operation limits, it’s not a good idea to abuse network bridges either.

     Andreas, it seems to me like that’s an improvement over vde, but with two main twists: it’s implemented within QEmu instead of deferring to the kernel, and doesn’t (seem to) require a daemon to run. This is all fine and dandy but I don’t think it solves much for LXC, where it would require to have userspace-kernel-userspace connections instead.

     I think MACVLAN would still solve the issue for LXC, in regard to performance; its VEPA mode should make it very easy for the containers to talk with one another, the only problem is that you wouldn’t be able to connect from the host to the guests.

  6. Why have you avoided Linux-vservers? Battle hardened, can be integrated with pax/grsec with some work and reverse nat mapping is very simple for networking? I love it…

  7. Why is it necessary to run interfaces in promiscuous mode? I imagine it would be necessary to imitate a full-function L2 bridge on Linux, but you don’t need that. Find a stack that supports Proxy ARP and away you go.

  8. Have any suggestions? The main issue is that by handling LXC manually, it seems like there is very little you can do about ProxyARP unless you actually set it up manually. Which I guess is also an option — _but_ it doesn’t work for IPv6 if I’m not mistaken.

  9. I might be missing something here but can’t you just use regular routing towards your subnet using your LXC host as the gateway (without the managed switch in between)?

  10. kitanatahu, that’s one easy way out _if_ you can set up static routes to know how to reach a given subnet. Which might actually be what I end up doing, but it’s not immediate anyway.

      Nico, openvswitch is a solution to a much more complex problem, and I guess it might be more complex than it’s worth in my use cases.

  11. (Haven’t read the post yet, sorry) It appears your fancy bot filtering system blocks the Opera Turbo service. You might want to look into that.

  12. Can you describe the problem a bit more? As we all use switched networking, enabling promiscuous mode on a NIC should have very little effect: the link to that NIC is not going to receive much other traffic that is not destined for itself. Except if there are broadcast storms on the network, in which case you usually have more severe network problems.
