Wrong abstractions; bad abstractions

In the past weeks I’ve been working again on LScube and in particular working toward improving the networking layer of feng. Up until now, we relied heavily on our own library (netembryo) to do almost all of the networking work. While this design looked nice at the start, we reached the point where:

  • the original design had feng with multiple threads to handle “background tasks”, and a manually-tailored event loop, which required us to reinvent the wheel so much that it was definitely not funny; to solve this we’ve switched to libev, but that requires us to access the file descriptor directly to set the watcher;
  • we need to make sure that the UDP socket is ready to accept data before sending RTP and RTCP data, and to do so we have to go down to the file descriptor again;
  • while the three protocols currently supported by feng for RTP transports (UDP, TCP and SCTP) were all abstracted by netembryo, we had to branch on the protocol in use deep inside feng, as the three of them work very differently (there are similarities between UDP and SCTP, and between SCTP and interleaved TCP, but never enough to share the code); this resulted in double-branching, which the compiler – even with LTO – will have a hard time understanding (since it depends on an attribute of the socket that we already knew);
  • the objects in memory were considerably bigger than they should have been, because they had to carry all the information needed by every protocol, and at the same time we had to allocate on the stack a huge number of identical objects (like the SCTP stream information to provide the right channel to send the RTP and RTCP packets to);
  • we’ve been resolving the client’s IP address again every time we wanted to connect the UDP sockets for RTP and RTCP (as well as resolving our own IP address);
  • we couldn’t get proper details about errors in network operations, nor could we fine-tune those operations, because we were abstracting all of it away.
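To give an idea of what “going down to the file descriptor” means for the UDP readiness check, here is a minimal sketch that assumes plain poll() instead of a libev watcher. The helper names (udp_writable, demo_udp_writable) are hypothetical and are not taken from feng’s sources.

```c
#include <assert.h>
#include <poll.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper, not feng's actual code.
 * Returns 1 if a datagram can be queued on fd without blocking,
 * 0 if the socket did not become writable within timeout_ms,
 * -1 on error (including an interrupted poll()). */
int udp_writable(int fd, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLOUT };
    int ret;

    assert(fd >= 0);

    ret = poll(&pfd, 1, timeout_ms);
    if (ret < 0)
        return -1;              /* poll() itself failed */
    if (ret == 0)
        return 0;               /* timed out, not writable yet */
    if (pfd.revents & (POLLERR | POLLHUP | POLLNVAL))
        return -1;              /* the descriptor is in an error state */
    return (pfd.revents & POLLOUT) ? 1 : 0;
}

/* Usage: open an unconnected UDP socket and probe it once. */
int demo_udp_writable(void)
{
    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    int ret;

    if (fd < 0)
        return -1;
    ret = udp_writable(fd, 100);
    close(fd);
    return ret;
}
```

With an event loop, the same descriptor would be handed to a write watcher rather than polled by hand; either way the code needs the raw file descriptor, which is exactly what the abstraction was hiding.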

While I tried fixing this by giving netembryo a better interface, I ended up finding that it was much, much simpler to deal with the networking code within feng without the abstraction; indeed, in no case did we need to go the full path down to what netembryo used to do beforehand; we always stopped short of that, skipping more than a couple of steps. For instance, the only place where we actually go through with the address resolution is the main socket binding code, where we open the port we’ll be listening on. And even there, we don’t need the “flexibility” (faux-flexibility actually) that netembryo gave us before: we don’t need to re-bind an already open socket; we also don’t want to accept one out of a series of possible addresses, we want all of them or none (this is what helps us support IPv6, by the way).
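The “all of them or none” binding policy can be sketched roughly as follows. This is an illustration under assumptions, not feng’s actual code: the function name bind_all_or_none and its signature are hypothetical, and it assumes TCP listening sockets obtained through getaddrinfo().

```c
#include <assert.h>
#include <netdb.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper, not feng's actual code.
 * Opens one listening TCP socket for every address that getaddrinfo()
 * returns for host:port. Returns the number of sockets stored in fds,
 * or -1 if resolution fails or any single address cannot be bound; in
 * that case every socket opened so far is closed again. */
int bind_all_or_none(const char *host, const char *port,
                     int *fds, size_t max_fds)
{
    struct addrinfo hints, *res, *it;
    size_t count = 0;

    assert(fds != NULL && max_fds > 0);

    memset(&hints, 0, sizeof(hints));
    hints.ai_family = AF_UNSPEC;     /* both IPv4 and IPv6 */
    hints.ai_socktype = SOCK_STREAM;
    hints.ai_flags = AI_PASSIVE;     /* wildcard address if host is NULL */

    if (getaddrinfo(host, port, &hints, &res) != 0)
        return -1;

    for (it = res; it != NULL; it = it->ai_next) {
        int on = 1;
        int fd;

        if (count == max_fds)
            goto fail;               /* more addresses than we can hold */

        fd = socket(it->ai_family, it->ai_socktype, it->ai_protocol);
        if (fd < 0)
            goto fail;
        fds[count++] = fd;           /* stored first so "fail" closes it */

        setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &on, sizeof(on));
#ifdef IPV6_V6ONLY
        /* Keep the IPv6 socket from also claiming the IPv4 port. */
        if (it->ai_family == AF_INET6)
            setsockopt(fd, IPPROTO_IPV6, IPV6_V6ONLY, &on, sizeof(on));
#endif
        if (bind(fd, it->ai_addr, it->ai_addrlen) < 0 ||
            listen(fd, SOMAXCONN) < 0)
            goto fail;
    }

    freeaddrinfo(res);
    return (int)count;

fail:
    while (count > 0)
        close(fds[--count]);
    freeaddrinfo(res);
    return -1;
}
```

Called with a NULL host, a function like this would normally end up with one socket per address family; refusing to start unless every one of them binds is what keeps IPv4 and IPv6 on an equal footing.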

The end result is not only a much, much smaller memory footprint (the Sock structure we used before was at least 440 bytes, while we can now stay well under 100 bytes per socket), but also fewer dependencies (we integrate all the code within the feng project itself), less source code (which is always good, because it means fewer bugs), tighter handling of SCTP and interleaved transmission, and more optimised code after compilation. Talk about win-win situations.

Unfortunately, not everything we saw up to now is positive: the fact that we have no independent implementation (yet) of SCTP makes it quite difficult to make sure that our code is not buggy in some funny way; and even during this (almost entire) rewrite of the network code, I was able to find a number of bugs, and a few strange situations that I’ll have to deal with right now, spending more than a couple of hours to make sure that it’s not going to break later on. It also shows we need to integrate valgrind within our current white-box testing approach to make sure that our functions don’t leak memory (I found a few leaks by running valgrind manually).

To be honest, most of the things I’ve been doing now are nothing especially difficult for people used to working with Unix networking APIs, but I most definitely am not an expert in those. I’d actually be quite interested in the book, if it weren’t that I cannot easily get it for the Reader. So if somebody feels like looking at the code and telling me what I can improve further, I’d be very happy.

One thing we most likely will have to pick up, though, is the SCTP userland code, which right now has at least a few bugs in its build system, a couple of which we’re working around in the ebuild. So I won’t have time to be bored for a while still…

Debian, Gentoo, FreeBSD, GNU/kFreeBSD

To shed some light on the matter and clear up the confusion that seems to have taken hold of quite a few people, who came to ask me what I think about Debian adding GNU/kFreeBSD to the main archive, I’d like to point out, once again, that Gentoo/FreeBSD has never been the same class of project as Debian’s GNU/kFreeBSD port. Interestingly enough, I already said this more than three years ago.

Debian’s GNU/kFreeBSD uses the FreeBSD kernel but keeps the GNU userland, which means the GNU C Library (glibc), the GNU utilities (coreutils) and so forth; on the other hand, Gentoo/FreeBSD uses both the vanilla FreeBSD kernel and a mostly vanilla userland. By “mostly” I mean that some parts of the standard FreeBSD userland are replaced with either compatible, selectable or updated packages. For instance, instead of shipping sendmail or the ISC dhcp packages as part of the base system, Gentoo/FreeBSD leaves them to be installed as extra packages, just like you’d do with Gentoo. And you can choose whichever cron software you’d like instead of using the single default provided by the system.

But if a piece of software is designed to build on FreeBSD, it usually builds just as well on Gentoo/FreeBSD; there is rarely any trouble, and most of the time the trouble comes from different GCC versions. On the other hand, GNU/kFreeBSD requires most of the system-dependent code to be ported; xine has already undergone this at least a couple of times, for instance.

I sincerely am glad to see that Debian finally came to the point of accepting GNU/kFreeBSD into main; on the other hand, I have no big interest in it besides as a proof of concept. There are things that are not currently supported by glibc even on Linux, like SCTP, which on FreeBSD are provided by the standard C library; I’m not sure if they are going to port the Linux SCTP library to kFreeBSD or if they decided to implement the interface inside glibc. If the latter is the case, though, I’d be glad, because it would finally mean that the code wouldn’t be left to go stale.

So please, don’t mix in Gentoo/FreeBSD with Debian’s GNU/kFreeBSD. And don’t even try to call it Gentoo GNU/FreeBSD like the Wikipedia people tried to do.

More virtually real troubles

So after fighting with QEMU and surrendering to KVM, I finally got a vanilla FreeBSD 7.1 instance and an OpenSolaris instance running; I made sure that feng builds on both, and since I was there I also fixed up the SCTP autoconf check on both, so that feng can ideally speak SCTP with both of them.

A note here for those interested: SCTP (Stream Control Transmission Protocol) is a protocol, alternative to TCP and UDP, that is designed to work well for streaming applications. The fact that feng supports it is more a proof of concept than an actually useful feature, and I’m sincerely not sure how well it works nowadays; but since I already had to fight to get it to build correctly on Linux, I just wanted to fix it up for the FreeBSD and Solaris implementations as well. I assumed that Apple had its own implementation too, but even though there are APPLE defines in the FreeBSD implementations, at least OS X 10.5 lacks any SCTP support that I can see.
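For the curious, whether a given system can actually speak SCTP can also be probed at runtime by simply asking the kernel for an SCTP socket. The following is a minimal sketch and not the autoconf check that feng uses; the function name sctp_available is hypothetical.

```c
#include <assert.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical runtime probe, not feng's autoconf check.
 * style is SOCK_STREAM (one-to-one sockets, TCP-like) or SOCK_SEQPACKET
 * (one-to-many sockets). Returns 1 if the kernel hands out an SCTP
 * socket of that style, 0 otherwise. */
int sctp_available(int style)
{
    assert(style == SOCK_STREAM || style == SOCK_SEQPACKET);

#ifdef IPPROTO_SCTP
    {
        int fd = socket(AF_INET, style, IPPROTO_SCTP);

        if (fd < 0)
            return 0;   /* protocol not supported, or module not loaded */
        close(fd);
        return 1;
    }
#else
    return 0;           /* the headers do not even know about SCTP */
#endif
}
```

A probe like this only says that the kernel side is there; the extra library calls (the sctp_* functions) still live in different places on each system, which is why the build-time check needs per-platform care.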

I have already reserved a logical volume for Gentoo/FreeBSD 7.1, which I’m hopefully going to test today, but in the meantime I wanted to fix up NetBSD too, since I have seen that it also has an SCTP stack, and since none of the three we support now is identical to the others it seemed worth looking into it; unfortunately NetBSD is proving to have no network to offer me. While I set up the KVM instance just like any other, no matter which network device model I use I can see no device in the ifconfig -a output of NetBSD; I have chosen the full installation, but it still doesn’t seem to have much. The documentation doesn’t seem to help either.

I guess NetBSD will keep waiting in line for now, unless somebody has a suggestion on how to deal with it.