Protecting yourself from R-U-Dead-Yet attacks on Apache

Do you remember the infamous “slowloris” attack against HTTP web servers? Well, it turns out there is a new variant of the same technique that, rather than making the server wait for headers to arrive, makes the server wait for POST data before processing; it’s difficult to explain exactly how that works, so I’ll leave it to the expert explanation from ModSecurity.

Thankfully, since a lot of work was done to mitigate the slowloris attack, there are easy protections to put in place, the first of which would be the use of mod_reqtimeout… unfortunately, it isn’t currently enabled by the Gentoo configuration of Apache – see bug #347227 – so the first step is to work around this limitation. Until the Gentoo Apache team appears again, you can do so simply by making use of the per-package environment hack, along the lines of what I described in my nasty tricks post a few months ago.

# to be created as /etc/portage/env/www-servers/apache

export EXTRA_ECONF="${EXTRA_ECONF} --enable-reqtimeout=static"

*Do note that here I’m building it statically; this is because I’d suggest everybody build all the modules as static: the overhead of loading them as plugins is usually higher than the overhead of a statically-built module you don’t care about.*

Now that you have this set up, you should make sure to set a timeout for the requests; the mod_reqtimeout documentation is quite brief, but shows a number of possible configurations. I’d say that in most cases, what you want is simply the one shown in the ModSecurity examples. Please note that they made a mistake there: it’s not RequestReadyTimeout but RequestReadTimeout.

Additionally, when using ModSecurity you can stop the attack in its tracks after a few requests have timed out, by blacklisting the IP and dropping its connections, allowing slots to be freed for other requests to arrive; this can be easily configured through this snippet, taken directly from the above-linked post:

RequestReadTimeout body=30

SecRule RESPONSE_STATUS "@streq 408" "phase:5,t:none,nolog,pass, setvar:ip.slow_dos_counter=+1,expirevar:ip.slow_dos_counter=60"
SecRule IP:SLOW_DOS_COUNTER "@gt 5" "phase:1,t:none,log,drop, msg:'Client Connection Dropped due to high # of slow DoS alerts'"

This should cover you quite nicely, at least if you’re using hardened, with grsecurity enforcing per-user limits. But if you’re on hosting where you have no say over the kernel – as I do – there is one further problem: the init script for Apache does not respect the system limits at all — see bug #347301.

The problem here is that when Apache is started during the standard system init, there are no limits set for the session it is running from, and since the script doesn’t use start-stop-daemon to launch the Apache process itself, no limits are applied at all. This makes it quite easy to DoS the whole host, as Apache can exhaust the system’s memory.

As I posted on the bug, there is a quick and dirty way to fix the situation by editing the init script itself and changing the way Apache is started:

# Replace the following:
        ${APACHE2} ${APACHE2_OPTS} -k start

# With this

        start-stop-daemon --start --pidfile "${PIDFILE}" ${APACHE2} -- ${APACHE2_OPTS} -k start

This way, at least the generic system limits are applied properly. Note, though, that start-stop-daemon’s limitations will not allow you to set per-user limits this way.

On a different note, I’d like to spend a few words explaining why this particular vulnerability is interesting to me: the attack relies on long-winded POST requests that might use very little bandwidth, because just a few bytes are sent before the timeout is hit… which is not unlike the RTSP-in-HTTP tunnelling that I have implemented and documented in feng over the past years.

This also means that application-level firewalls will sooner or later start filtering these long-winded requests, and that will likely put the final nail in the coffin of RTSP-in-HTTP tunnelling. I guess it’s definitely time for feng to move on and implement real HTTP-based pseudo-streaming instead.

Implementing the future! (or, how feng is designing secure RTSP)

While LScube is no longer – for me – a paid job project (since I haven’t been officially paid to work on it since May or so), it’s a project that intrigues me enough to work on it in my free time, rather than, say, watch a movie. The reason is probably to be found in situations like those of the past few days, when Luca brings up a feature to implement that either no one else has implemented before, or that simply needs to be re-implemented from scratch.

The last time I had to come up with a solution to implement something this intricate was when we decided to implement Apple’s own HTTP tunnelling feature, whose only other documentation I know of, beside the original content from Apple, is the one I wrote for LScube itself.

This time, the challenge Luca showed me is quite a bit more stimulating, because it means breaking new ground that up to now seems to have been almost entirely missing: RTSP recording and secure RTSP.

The recording feature for RTSP has been somewhat specified by the original RFC, but it was left quite up in the air as to what it should be used for. Darwin Streaming Server has an implementation, but beside the fact that we try not to look at it too closely (to avoid pollination), as far as we can tell we don’t agree with the minimal amount of security it provides. But let’s start from the top.

The way we need RECORD to work is to pass feng (the RTSP server) a live stream, which feng will then multiplex to the connected users. Up to now, this task was taken care of by a special protocol between flux (the adaptor) and feng, which also went through a few changes over time: originally it used UDP over localhost, but we had problems with packet loss (no, I can’t even begin to wonder why) and with performance; nowadays it uses POSIX message queues, which are supposedly portable and quite fast.

This approach has, objectively, a number of drawbacks. First of all, the MQ stuff is definitely not a good idea to use; it’s barely implemented in Linux and FreeBSD, it’s not implemented on Darwin (and for a project whose aim is to replace Darwin Streaming Server this is quite a bad thing), and even though it does work on Linux, its performance isn’t exactly exciting. Also, using POSIX MQs is quite tricky: you cannot simply write an eventloop-based interface with libev because, even though they are descriptors in their implementation, an MQ descriptor is not guaranteed to support the standard poll() interface (nor any of the more advanced ones), so you end up with a thread blocking in reading… and timing out to make sure that flux wasn’t restarted in the meantime, since in that case a new pair of MQs would have been created.
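
To give an idea of what that looks like, here is a minimal sketch of such a reader thread (the queue name and buffer size are invented for the example, and error handling is kept to a minimum):

/* Sketch only: illustrates why POSIX MQs are awkward for an event-loop
 * based server. You cannot portably hand the descriptor to poll()/libev,
 * so you end up with a dedicated thread blocking in mq_timedreceive(),
 * using the timeout to notice that the producer (flux) was restarted
 * and its queues recreated. */
#include <errno.h>
#include <fcntl.h>
#include <mqueue.h>
#include <stdio.h>
#include <sys/types.h>
#include <time.h>

#define QUEUE_NAME "/flux-to-feng"  /* hypothetical queue name */

static void reader_loop(mqd_t mq)
{
    /* the buffer must be at least as large as the queue's mq_msgsize */
    char buffer[8192];

    for (;;) {
        struct timespec deadline;
        clock_gettime(CLOCK_REALTIME, &deadline);
        deadline.tv_sec += 1;              /* wake up at least once a second */

        ssize_t rd = mq_timedreceive(mq, buffer, sizeof(buffer), NULL, &deadline);
        if (rd >= 0) {
            /* hand the packet over to the streaming core here */
        } else if (errno == ETIMEDOUT) {
            /* no data: check whether flux went away and the queue
             * has to be re-opened, then loop again */
        } else {
            perror("mq_timedreceive");
            return;
        }
    }
}

int main(void)
{
    /* normally this would run in its own worker thread inside the server */
    mqd_t mq = mq_open(QUEUE_NAME, O_RDONLY);
    if (mq == (mqd_t)-1) {
        perror("mq_open");
        return 1;
    }
    reader_loop(mq);
    mq_close(mq);
    return 0;
}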

The situation was not made any friendlier by the fact that, even though the whole team is composed of fewer than five developers, flux and feng are written in two different languages, so the two sides of the protocol implemented over MQs are also entirely separate. And months after I dropped netembryo from feng, it is still used within flux, which is the one piece of the whole project that needed it the least, as it uses none of the SCTP, TCP or SSL abstractions, and the interfaces of the two couldn’t be further apart (flux is written in C++ if you didn’t guess, while netembryo, like feng, is written in C).

So it’s definitely time to get rid of the middleware and let the producer/adapter (ffmpeg) talk directly with the server/multiplexer (feng). This is where RECORD is necessary: ffmpeg will open a connection to feng via the same RTSP protocol used by the clients, but rather than ask to DESCRIBE a resource, it will ANNOUNCE one; then it will SETUP the server so that it would open RTP/RTCP ports, and start feeding data to the server. Since this means creating new resources and making it available to third parties (the clients) – and eventually creating files on the server’s disk, if we proceed to the full extent of the plan – it is not something that you want just any client to be able to do: we want to make sure that it is authorised to do so — we definitely don’t want to pull a Diaspora anytime soon.
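
To give an idea of the flow, this is roughly what such an exchange could look like on the wire (a hand-written sketch of the idea, not a capture from feng; the URL, session identifier and transport parameters are made up, and the server’s responses are omitted for brevity):

ANNOUNCE rtsp://example.com/live/talk RTSP/1.0
CSeq: 1
Content-Type: application/sdp
Content-Length: 460

(SDP description of the stream that ffmpeg is about to push)

SETUP rtsp://example.com/live/talk/trackID=1 RTSP/1.0
CSeq: 2
Transport: RTP/AVP;unicast;client_port=5000-5001;mode=record

RECORD rtsp://example.com/live/talk RTSP/1.0
CSeq: 3
Session: 12345678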

Sequence Diagram of feng auth/record support

And in a situation like this, paying for a decent UML modeler proves to be a pretty good choice. A number of issues only became apparent to me when I actually thought through the interaction in order to draw the pretty diagrams that we should show to our funders.

Right now, though, we don’t even support authentication, so I went to look for it; RTSP was designed from the start to be an HTTP-compatible protocol, so authentication is not defined in RFC 2326, but rather left to be defined by HTTP itself; this means that you’ll find it in RFC 2616 instead — yes, it is counterintuitive, as the RFC number is higher; the reason is simple though: RFC 2616 obsoletes the previous HTTP description in RFC 2068, which is the one RTSP refers to; we have no reason to refer to the old one knowing that a new one is out, so…

You might know already that HTTP defines at least two common authentication schemes: Basic (totally insecure) and Digest (probably just as insecure, but obfuscated enough). The former simply provides username and password encoded in base64 — do note: encoded, not encrypted; it is still passed in what can be called clear text, and is thus a totally bogus choice. On the other hand, Digest provides enough features and variables to require a very complex analysis of the documentation before I can actually come up with a decent implementation. Since we wanted something more akin to a proof of concept, but also something safe enough to use in production, we had to find a middle ground.
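
Just to make it obvious how little Basic buys you, here is a quick illustration (using GLib, which the project already depends on; the snippet itself and the credential in it are invented for the example):

/* Illustrative only: recovering the clear-text credentials from an
 * HTTP/RTSP Basic credential with GLib. */
#include <glib.h>
#include <stdio.h>

int main(void)
{
    /* what a client would send: Authorization: Basic dXNlcjpzM2NyZXQ= */
    const char *credential = "dXNlcjpzM2NyZXQ=";

    gsize len = 0;
    guchar *decoded = g_base64_decode(credential, &len);

    /* prints "user:s3cret" — encoded, not encrypted */
    printf("%.*s\n", (int)len, decoded);

    g_free(decoded);
    return 0;
}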

At this point I had two possible roads in front of me. On one side, implementing SASL looked like a decent solution to have a safe authentication system while keeping the feng-proper complexity at bay; unfortunately, I still haven’t found any complete documentation on how SASL is supposed to be implemented as part of HTTP; the Wikipedia page I’m linking to also doesn’t list HTTP among the SASL-supported protocols — but I know it is supposed to work, both because Apache has a mod_authn_sasl and because, if I recall correctly, Microsoft provides ActiveDirectory-authenticated proxies, and again if my memory serves me correctly, ActiveDirectory uses SASL. The other road looked more complex at first but promised much with (relatively) little effort: implementing encrypted, secure RTSP, and requiring it in order to use the RECORD functionality.

While it might sound like overkill to require an encrypted, secure connection to send data that has to be further streamed down to users, keep in mind that we’re talking only of encrypting the control channel; in most cases you’d be using RTP/RTCP over UDP to send the actual data, and the control stream can even be closed after the RECORD request is sent. We’re not yet talking about implementing secure RTP (which I think was already drafted or even implemented), but of course if you were to use a secure RTSP session and interleaved binary data, you’d be getting encrypted content as well.

At any rate, the problem with this became how to implement the secure RTSP connection; one option would have been using the oldest trick in the book and using two different ports, one for clear-text RTSP and another for the secure version, as is done for HTTP and HTTPS. But I also remembered that this approach has been frowned upon and deprecated for quite a while. Protocols such as SMTP and IMAP now use TLS through a STARTTLS command that replaces a temporarily clear-text connection with an encrypted TLS one; I remember reading about a similar approach for HTTP and indeed, RFC 2817 describes just that. Since RTSP and HTTP are so similar, there should be no difficulty in implementing the same idea over RTSP.
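
Translated to RTSP, the upgrade would look more or less like this (a sketch of the idea only: RFC 2817 defines the mechanism for HTTP, and the exact status code and headers to use for RTSP are precisely what still needs to be nailed down in the design):

C: OPTIONS rtsp://example.com/live/talk RTSP/1.0
C: CSeq: 1
C: Upgrade: TLS/1.0
C: Connection: Upgrade

S: RTSP/1.0 101 Switching Protocols
S: CSeq: 1
S: Upgrade: TLS/1.0
S: Connection: Upgrade

(TLS handshake happens here; from this point on the control channel, including the ANNOUNCE/SETUP/RECORD sequence and the authentication exchange, is encrypted)

This is also where the “two responses” pitfall mentioned further down comes from: after switching to TLS, the server still owes the client a proper response to the original request.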

Sequence Diagram of feng TLS/auth/RECORD support

Now in this situation, implementing secure ANNOUNCE support is definitely nicer; you just refuse to accept methods (in write mode) for the resource unless you’re over a secure TLS connection; then you also ensure that the user is authenticated before allowing access. Interestingly, here Basic authentication, while still bad, is acceptable for a usable proof of concept, and in the future it can easily be extended to authenticate based on client-side certificates without requiring 401 responses.

Once again, I have to say that having translated Software Engineering wasn’t so bad a thing; even though I usually consider UML just a decent way to produce pretty diagrams to show the people with the gold purse, doing some modelling of the interaction between the producer and feng actually cleared up my mind enough to avoid a number of pitfalls, like performing authentication at the same time as the upgrade to TLS, or forgetting about the need to provide two responses when doing the upgrade.

At any rate, for now I’ll leave you with the other two diagrams that I have drawn but that wouldn’t fit the post by themselves; they’ll end up in the official documentation — and if you wonder what “nq8” stands for, it’s the default extension generated by pngnq: Visual Paradigm produces 8-bit RGB files when exporting to PNG (and the SVG export produces files broken with RSVG, I have to report that to the developers), but I can cut the size of the files in half by quantising them to a palette.

I’m considering the idea of reverse-engineering part of the design of feng into UML models, to see if that gives me more insight on how to optimise it further; if that works I should probably consider spending more time on UML in the future… too bad that reading O’Reilly PDF books on the reader is so bad, otherwise I would be reading through all of “Learning UML 2.0” rather than skimming through it when I need it.

Why do FLOSS advocates like Adobe so much?

I’m not sure how this happens, but more and more often I see FLOSS advocates supporting Adobe, and in particular Flash, in almost any context out there, mostly because Adobe now looks a lot like an underdog, with Microsoft and Apple picking on it. Rather than liking the idea of Flash, a proprietary software product, being cornered out of the market, they seem to acclaim any time Adobe gets a little more advantage over the competition, and cry foul when someone else tries to ditch them:

  • Microsoft released Silverlight, which is evil – probably because it’s produced by Microsoft, or alternatively because it uses .NET, which is produced by Microsoft – and we have a Free as in Speech implementation of it in Novell’s Moonlight; but FLOSS advocates dump on that too: it’s still evil, because there are patents in .NET and C#; please note that the only FLOSS implementation of Flash I know of is Gnash, which is not exactly up to speed with the kind of Flash applets you find in the wild;
  • Apple’s iPhone and iPad (or rather, all the Apple devices based on iPhone OS, now iOS) don’t support Flash, and Apple pushes content publishers to move to “modern alternatives” starting from the <video> tag; rather than, for once, agreeing with Apple and supporting that idea, FLOSS advocates decide to start name-calling them because they lack support for a ubiquitous technology such as Flash — the fact that Apple’s <video> tag suggestions were tied to the use of H.264 shouldn’t have made any difference at all, since Flash does not support Theora, so with the exception of the recently released WebM in the latest 10.1 version of the Flash Player, there wouldn’t be any support for “free formats” either way;
  • Adobe stirs up a lot of news declaring support for Android; Google announces Android 2.2 Froyo, supporting Flash; rather than declaring Google an enemy of Free Software for helping Adobe spread their invasive and proprietary technology, FLOSS advocates start issuing “take that” comments toward iPhone users, since “their phone can see Flash content”;
  • Mozilla refuses to provide any way at all to view H.264 files directly in their browser, leaving users unable to watch YouTube without Flash unless they resort to a ton of hacky tricks to convert the content into Ogg/Theora files; FLOSS advocates keep on supporting them because they haven’t compromised.

What is up here? Why should people consider Adobe a good friend of Free Software at all? Maybe because they control formats that are usually considered “free enough”: PostScript, TIFF (yes, they do), PDF… or because some of the basic free fonts that TeX implementations and the original X11 used come from them. But none of this really sounds relevant to me: they don’t provide a Free Software PDF implementation, rather they have their own PDF reader, while the Free implementations often have to play catch-up, with mixed results, to keep opening new PDF files. As much as Mike explains the complexity of it all, the Linux Flash player is far from being a nice piece of software, and their recent abandonment of the x86-64 version of the player makes it even more sour.

I’m afraid that the only explanation I can give to this phenomenon is that most “FLOSS advocates” align themselves straight with, and only with, the Free Software Foundation. And the FSF seems to have a very personal war against Microsoft and Apple, probably because the two of them actually show that in many areas Free Software is still lagging behind (and if you don’t agree with this statement, please have a reality check and come back again — and this is not to say that Free Software is not good in many areas, or that it cannot improve to become the best), which goes against their “faith”. Adobe, on the other hand, while not really helping Free Software out (sorry, but Flash Player and Adobe Reader are not enough to say that they “support” Linux; and don’t try to sell me that they are not porting Creative Suite to Linux just so people would use better Free alternatives), never gets that kind of hostility.

Why do I feel like taking a shot at the FSF here? Well, I have already repeated multiple times that I love the PDFreaders.org site from the FSFE; as far as I can see, the FSF only seems to link to it on one lost and forgotten page, just below a note about CoreBoot… which doesn’t make it prominent at all. Also, I couldn’t find any open letter of theirs blaming PDF for being a patent-risky format, a warning that is instead present on the PDFreaders site:

While Adobe Systems grants a royalty-free use of any patents to the PDF format, in any application that adheres to the PDF specifications, other companies do hold patents that may limit the openness of the standard if enforced.

As you can see, the first part of the sentence admits that there are patents over the PDF format, but royalty-free use is granted… by Adobe at least; nothing is said about the other parties that might hold them.

At any rate, I feel like there is a huge double-standard issue here: anything that comes out of Microsoft or Apple, even with Free Software licenses or patent pledges, is evil; but proprietary software and technologies from Adobe are fine. It’s silly, don’t you think?

And for those who still would like to complain about websites requiring Silverlight to watch content, I’d like to propose a different solution to ask for: don’t ask them to provide the content through Flash, but rather through a standard protocol, one for which we have a number of Free Software implementations and which is supported on the mainstream operating systems for both desktops and mobile phones: RTSP is such a protocol.

Wrong abstractions; bad abstractions

In the past weeks I’ve been working again on LScube and in particular working toward improving the networking layer of feng. Up until now, we relied heavily on our own library (netembryo) to do almost all of the networking work. While this design looked nice at the start, we reached the point where:

  • the original design had feng with multiple threads to handle “background tasks”, and a manually-tailored event loop, which required us to reinvent the wheel so much that it was definitely not funny; to solve this we’ve moved to libev, but that requires us to access the file descriptor directly to set the watcher;
  • we need to make sure that the UDP socket is ready to send data to before sending RTP and RTCP data, and to do so we have to go down to the file descriptor again;
  • while the three protocols currently supported by feng for RTP transports (UDP, TCP and SCTP) were all abstracted by netembryo, we had to branch out depending on the protocol in use deep inside feng, as the way the three of them work is very different (there were similarities between UDP and SCTP, and between SCTP and interleaved TCP, but they were not similar enough in any way!); this resulted in double-branching, which the compiler – even with LTO – will have a hard time understanding (since it depends on an attribute of the socket that we knew already);
  • the size of objects in memory was quite a bit bigger than it should have been, because we had to keep around all the information needed for all the protocols, and at the same time we had to allocate on the stack a huge number of identical objects (like the SCTP stream information to provide the right channel to send the RTP and RTCP packets to);
  • we’ve been resolving the client’s IP addresses repeatedly, every time we wanted to connect the UDP sockets for RTP and RTCP (as well as resolving our own IP address);
  • we couldn’t get proper details about the errors in network operations, nor could we fine-tune those operations, because we were abstracting all of it away.

While I tried fixing this by giving netembryo a better interface, I ended up finding that it was much, much simpler to deal with the networking code within feng without the abstraction; indeed, in no case did we need to go the full path down to what netembryo used to do beforehand; we always stopped short of that, skipping more than a couple of steps. For instance, the only place where we actually go through with address resolution is the main socket-binding code, where we open the port we’ll be listening on. And even there, we don’t need the “flexibility” (faux-flexibility actually) that netembryo gave us before: we don’t need to re-bind an already open socket; we also don’t want to accept one out of a series of possible addresses, we want all of them or none (this is what helps us support IPv6, by the way).
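
For instance, the listening-socket setup boils down to something like the following sketch (simplified, and not feng’s actual code): resolve once with getaddrinfo(), then bind every address that comes back, or fail altogether.

/* Sketch of "bind all resolved addresses or none": resolve the listening
 * address once, create and bind one socket per result (IPv4 and IPv6),
 * and give up if any of them fails. Simplified: no cleanup of the
 * already-opened sockets on the error path. */
#include <netdb.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

#define MAX_LISTENERS 8

/* e.g. bind_all(NULL, "554", fds) to listen on every local address */
static int bind_all(const char *host, const char *port, int fds[MAX_LISTENERS])
{
    struct addrinfo hints, *res, *it;
    int count = 0;

    memset(&hints, 0, sizeof(hints));
    hints.ai_family   = AF_UNSPEC;      /* both IPv4 and IPv6 */
    hints.ai_socktype = SOCK_STREAM;
    hints.ai_flags    = AI_PASSIVE;     /* we are going to bind */

    if (getaddrinfo(host, port, &hints, &res) != 0)
        return -1;

    for (it = res; it != NULL && count < MAX_LISTENERS; it = it->ai_next) {
        int fd = socket(it->ai_family, it->ai_socktype, it->ai_protocol);
        int on = 1;

        if (fd >= 0 && it->ai_family == AF_INET6)
            /* keep the v6 and v4 wildcard sockets from clashing */
            setsockopt(fd, IPPROTO_IPV6, IPV6_V6ONLY, &on, sizeof(on));

        if (fd < 0 ||
            bind(fd, it->ai_addr, it->ai_addrlen) < 0 ||
            listen(fd, SOMAXCONN) < 0) {
            /* all or none: a single failure aborts the whole setup */
            freeaddrinfo(res);
            return -1;
        }
        fds[count++] = fd;
    }

    freeaddrinfo(res);
    return count;
}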

The end result is not only a much, much smaller memory footprint (the Sock structure we used before was at least 440 bytes, while we can stay well below 100 bytes per socket right now), but also fewer dependencies (we integrate all the code within the feng project itself), less source code (which is always good, because it means fewer bugs), tighter handling of SCTP and interleaved transmission, and more optimised code after compilation. Talk about win-win situations.

Unfortunately not everything we’ve seen so far is positive; the fact that we have no independent implementation (yet) of SCTP makes it quite difficult to make sure that our code is not bugged in some funny way; and even during this (almost entire) rewrite of the network code, I was able to find a number of bugs, and a few strange situations that I’ll have to deal with right now, spending more than a couple of hours to make sure that it’s not going to break further on. It also shows we need to integrate valgrind into our current white-box testing approach to make sure that our functions don’t leak memory (I found a few leaks by running valgrind manually).

To be honest, most of the things I’ve been doing now are nothing especially difficult for people used to working with Unix networking APIs, but I most definitely am not an expert in those. I’d actually be quite interested in the book if it weren’t that I cannot get it for the Reader easily. So if somebody feels like looking at the code and telling me what I can improve further, I’d be very happy.

One thing we most likely will have to pick up, though, is the SCTP userland code which right now has at least a few bugs regarding the build system, a couple of which we’re working around in the ebuild. So I won’t have time to be bored for a while still…

RTSP clients’ special hell

This week, in Orvieto, Italy, there was OOoCon 2009, and the lscube team (also known as “the rest of the feng developers beside me”) was there to handle the live audio/video streaming.

During the preparations, Luca called me one morning, complaining that the new RTSP parser in feng (which I wrote almost single-handedly) refused to play nice with the VLC version shipped with Ubuntu 9.04: the problem was tracked down to the parser for the Range header, in particular to the parsing of the normal play time value: the RFC specifies a decimal value with a dot (.) as the separator, but VLC was sending a comma (,), which my parser refused.

Given that Luca actually woke me up while I was in bed, it was a strange presence of mind that let me ask him which language (locale) the system was set to: Italian. Telling him to try the C locale was enough to get VLC to comply with the protocol. The problem here is that the separators for decimal places and thousands are locale-dependent characters; most programming languages obviously limit themselves to supporting the dot in their syntax, and a lot of software likewise uses the dot no matter what the locale is (for instance, right now I have Transmission open and the download/upload stats use the dot, even though my system is configured in Italian). Funny that this problem came up during an OpenOffice event, given that’s one of the best-known pieces of software that actually deals with (and sometimes messes up) that difference.

To be precise, though, the problem here is not with VLC itself: the problem is with the live555 library (badly named media-plugins/live in Gentoo), which provides the generic RTSP code for VLC (and MPlayer). If you ever wrote software that deals with float-to-string conversion, you probably know that the output depends on the LC_NUMERIC locale category: once a program switches to an Italian locale, both printf()-style formatting and C++ string streams will happily emit a comma as the decimal separator unless told otherwise — and live555, being a C++ library, probably goes through one of those locale-sensitive paths.
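
A tiny, self-contained illustration of the trap (not live555’s actual code; the it_IT.UTF-8 locale needs to be generated on the system for the second line to change):

/* Demonstrates how LC_NUMERIC changes float formatting; this is the
 * kind of string that ends up inside the Range header. */
#include <locale.h>
#include <stdio.h>

int main(void)
{
    char range[64];

    /* programs start in the "C" locale: dot as decimal separator */
    snprintf(range, sizeof(range), "Range: npt=%.3f-", 7.5);
    printf("C locale:       %s\n", range);       /* Range: npt=7.500- */

    if (setlocale(LC_NUMERIC, "it_IT.UTF-8")) {
        snprintf(range, sizeof(range), "Range: npt=%.3f-", 7.5);
        printf("Italian locale: %s\n", range);   /* Range: npt=7,500- */
    }

    return 0;
}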

At any rate, the bug was already known and fixed in live555, which is the version Gentoo already has, and which the contributed bundled libraries of VLC have (for the Windows and OS X builds), so those three VLC instances are just fine; but the problem is still present in both the Debian and Ubuntu versions of the package, which are quite outdated (as xtophe confirmed). Since the RFC does not have any conflicting use of the comma in that particular place, and given how widespread the broken package is (Ubuntu 9.10 also has the same problem), we decided to work around it inside the feng parser, accepting the comma-separated decimal value as well.

From this situation, I also ended up comparing the various RTSP clients that we are trying to work with, and the results are quite mixed, which is somewhat worrisome to me:

  • latest VLC builds for proprietary operating systems work fine (Windows and OS X);
  • VLC as compiled in Gentoo also works fine, thanks Alexis!
  • VLC as packaged for Debian (and Ubuntu) uses a very old live555 library; the problem described here is now worked around, but I’m pretty sure it’s not the only one that we’re going to hit in the future, so it’s not a good thing that the Debian live555 packaging is so old;
  • VLC as packaged in Fedora fails in many different ways: it loops for about 15 minutes saying that it cannot identify the host’s IP address, then it finally seems to get a clue, so it’s able to request the connection but… it starts dropping frames, saying that it cannot decode and stuff like that (and I’m connected over gigabit LAN);
  • Apple’s QuickTime X is somewhat strange; on Merrimac, since I used it to test the HTTP tunnel implementation, it now only tries connecting to feng via HTTP rather than using RTSP; this works fine with the branch that implements the tunnel but obviously fails badly in master (and it doesn’t look like QuickTime gets the hint to switch to the RTSP protocol); on the other hand it works fine on the laptop (which has never used the tunnel in the first place), where it uses RTSP properly;
  • again Apple’s QuickTime, this time on Windows, seems to be working fine.

I’m probably going to have to check the VLC/live packaging of other distributions to see how many workarounds for broken stuff we might have to look out for. Which means more and more virtual machines; I’ll probably have to get one more hard drive at this pace (or I could replace one 320G drive with a 500G drive that I still have at home…). And I should try totem as well.

Definitely, RTSP clients are a hell of a thing to test.

Apple’s HTTP tunnel, and new HTTP streaming

Finally, last night, I was able to finish, at least in a side branch, the support for Apple’s RTSP-in-HTTP tunnelling, as dictated by their specification. Now that the implementation is complete (and it really didn’t take that much work once the parser worked as needed), I can say a few things about that specification and about Apple phasing it out in favour of a different, HTTP-only streaming system.

First of all, the idea of supporting both the RTSP and the RTSP-in-HTTP protocols, while working with the same exact streaming logic behind the scenes, requires a much more flexible parser, which isn’t easy because of the HTTP design issue I already discussed. While of course, once the work is done, it’s done, the complexity of such a parser isn’t negligible.

But, since the work was done in quite a short time for me, it’s really not that bad, if the technique worked as well as it’s supposed to. Unfortunately, that’s not the case. For instance, the default configuration of net-proxy/polipo (a French HTTP proxy) does not allow the technique to work, because of the way the tunnel is designed: pipelining and re-use of the connection, which are very common things for proxies to do to try to improve performance, usually mean waiting for the server to complete a request before the response is returned to the client; unfortunately, the GET request made by the client is one that will never complete, as it is where the actual streaming happens.

In the end, I found it definitely easier to use good old squid for testing purposes, even though the documentation at one (very well hidden) point explains which parameters to set to make it work with QuickTime. But it definitely means that not all HTTP proxies will let this technique work correctly.

And it’s definitely not the only reason. Since the HTTP and RTSP protocols are pretty similar, even the documentation says that if the client POSTed the RTSP requests directly, they would be seen as bad HTTP requests by the proxy; to avoid that, the requests are sent base64-encoded (which also means bigger than the original). But while the data coming from the client is usually the more heavily scrutinised, proxies nowadays probably scrutinise the responses as well as the requests, to make sure that they are not dealing with a malicious server (phishing or stuff like that); and if they do, they are very likely to find the response coming from the GET request quite suspicious, likely considering it an attempt at HTTP response splitting (which is a common webapp vulnerability).

Now, of course it would have been possible for Apple to simply upgrade the trick by encoding the response as well as the request, but that has one huge drawback: it would both increase the latency of the stream (because the base64 content would have to be decoded before it’s used) and at the same time increase the size of the response by ⅓, one third, due to that kind of encoding. Another alternative would have been to base64-encode only the pure RTSP responses, and keep the RTP streams (which are carried over interleaved RTSP) unencoded. Unfortunately this would have required more work, since at that point the GET body wouldn’t simply be stream-compatible with a pure RTSP stream, and thus wouldn’t be very transparent for either the client or the server.

On the other hand, the idea of implementing that as an extension hasn’t entirely disappeared from my mind; since channels one and up are used by the RTP streams, channel zero is still unused, which would make it possible to simply use that to send the RTSP response encoded in base64. At least in feng this wouldn’t require huge changes to the code, since we already treat channel zero specially for the SCTP connection.
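
The framing itself is trivial; here is a sketch of what sending the base64-encoded response over channel zero could look like (illustrative code, not feng’s implementation; the interleaved frame layout is the RFC 2326 §10.12 one: ‘$’, channel byte, 16-bit big-endian length, payload — responses larger than 65535 bytes would need splitting):

/* Sketch: wrap a base64-encoded RTSP response into an interleaved
 * frame on channel 0. */
#include <glib.h>
#include <stdint.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* loop over write() until the whole buffer is out */
static int write_all(int fd, const void *buf, size_t len)
{
    const uint8_t *p = buf;
    while (len > 0) {
        ssize_t w = write(fd, p, len);
        if (w < 0)
            return -1;
        p += w;
        len -= (size_t)w;
    }
    return 0;
}

static int send_rtsp_on_channel_zero(int fd, const char *response)
{
    gchar *encoded = g_base64_encode((const guchar *)response, strlen(response));
    size_t payload_len = strlen(encoded);

    uint8_t header[4] = {
        '$',                              /* interleaved frame marker   */
        0,                                /* channel 0: RTSP responses  */
        (uint8_t)(payload_len >> 8),      /* length, network byte order */
        (uint8_t)(payload_len & 0xff),
    };

    int ret = 0;
    if (write_all(fd, header, sizeof(header)) < 0 ||
        write_all(fd, encoded, payload_len) < 0)
        ret = -1;

    g_free(encoded);
    return ret;
}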

With all these details considered, I can understand why Apple was looking into alternatives. What I still cannot understand is what they decided to use as the alternative, since the new HTTP Live Streaming protocol looks tremendously hacky to me. Hopefully, our next step is rather going to be Adobe’s take on a streaming protocol.

HTTP-like protocols have one huge defect

So you might or might not remember that my main paid job in the past months (and right now as well) has been working on feng, the RTSP server component of the lscube stack.

The RTSP protocol is based on HTTP, and indeed uses the same message format defined by the RFC 822 text (the same used for email messages), and a request line “compatible” with HTTP.

Now, it’s interesting to know that this similarity between the two has been used, among other things, by Apple to implement the so-called HTTP tunnelling (see the QuickTime Streaming Server manual, Chapter 1 Concepts, section Tunneling RTSP and RTP Over HTTP, for the full description of that procedure). This feature allows clients behind standard HTTP proxies to access the stream, creating a virtual full-duplex communication between the two. Pretty neat stuff, even though Apple recently superseded it with the pure HTTP streaming that is implemented in QuickTime X.

For LScube we want to implement at the very least this feature, both server and client side, so that we can get on par with the QuickTime features (implementing the new HTTP-based streaming is part of the long-haul TODO, but that’s beside the point now). To do that, our parser has to be able to accept HTTP requests and deal with them appropriately. For this reason, I’ve been working to replace the RTSP-specific parser with a more generic parser that accepts both HTTP and RTSP. Unfortunately, this turned out not to be a very easy task.

The main problem is that what we wanted was to make as few passes as possible over the request line to get the data out; when we only supported RTSP/1.0 this was trivial: we knew exactly which methods were supported, which ones appeared valid but weren’t supported (like RECORD) and which ones were simply invalid to begin with, so we set the method value as it went by and then moved on to check the protocol. If the protocol was not valid, we didn’t care about the method anyway; at worst we had to pass through a series of states for no good reason, but that wasn’t especially bad.

With the introduction of a simultaneous HTTP parser, the situation became much more complex: the methods are parsed right away, but the two protocols have different methods: the GET method that is supported for HTTP is a valid but unsupported method for RTSP, and vice versa when it comes to the PLAY method. The actions handling the result of parsing the method for the two protocols would end up executing simultaneously if we used a simple union of the state machines, and that, quite obviously, couldn’t be the right thing to do.

Now, it’s really simple to understand that what we needed was a way to discern which protocol we’re trying to parse first, and then proceed to parse the rest of the line as needed. But this is exactly what I think is the main issue with the HTTP protocol and all the protocols, like RTSP or WebDAV, that derive from or extend it: the protocol specification is at the end of the request line. Since you usually parse a line in the Latin order of characters (from left to right), you read the method before you know which protocol the client is speaking. This is easily solved by backtracking parsers (I guess LALR parsers is the correct definition, but parsers aren’t usually my field of work, so I might be mistaken), since they first pass through the text to identify which syntax to apply, and then they apply it; Ragel is not such a parser, while kelbt (by the same author) is.
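
The workaround I ended up with is quite a bit more convoluted than this, but the underlying idea can be sketched as follows (illustrative C, not the actual Ragel machine): peek at the protocol token at the end of the request line first, and only then commit to a grammar for the method.

/* Sketch of the "peek at the protocol first" idea: the request line is
 * METHOD URI PROTOCOL/VERSION, so the last space-separated token tells
 * us which grammar to apply before we commit to parsing the method. */
#include <stdio.h>
#include <string.h>

enum line_protocol { PROTO_UNKNOWN, PROTO_RTSP, PROTO_HTTP };

static enum line_protocol peek_protocol(const char *request_line)
{
    const char *last_space = strrchr(request_line, ' ');
    if (last_space == NULL)
        return PROTO_UNKNOWN;

    if (strncmp(last_space + 1, "RTSP/", 5) == 0)
        return PROTO_RTSP;
    if (strncmp(last_space + 1, "HTTP/", 5) == 0)
        return PROTO_HTTP;
    return PROTO_UNKNOWN;
}

int main(void)
{
    /* example request lines, trimmed of the trailing CRLF */
    const char *rtsp = "PLAY rtsp://example.com/stream RTSP/1.0";
    const char *http = "GET /tunnel HTTP/1.1";

    printf("%d %d\n", peek_protocol(rtsp), peek_protocol(http)); /* 1 2 */
    return 0;
}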

Time constraints and the fact that kelbt is even more sparingly documented than Ragel mean that I won’t be trying to use kelbt just yet, and for now I settled on finding an overcomplex and nearly unmaintainable workaround to have something working (since the parsing is going to be a black-box function, the implementation can easily change in the future when I learn some decent way to do it).

This whole thing would definitely have been simpler if the protocol specification were at the start of the line! Then we could just have decided how to parse the rest of the line depending on the protocol.

At this point I’m definitely not surprised that Adobe didn’t use RTSP and instead invented their own Real-Time Messaging Protocol, which is not based on HTTP but is rather a binary protocol (which should also make it much easier to parse, to an extent).

Why RTSP?

Lately I’ve been writing more often about my work on feng and the lscube project; the idea behind lscube is to get a well-working and well-scaling, entirely free streaming software stack, both server-side (feng) and client-side (libnemesi), with the ability to stream live content (with flux). The protocol used by this stack is the Real Time Streaming Protocol, currently version 1, as designed by RFC 2326. RTSP, which originated at RealNetworks, is just the control protocol, and uses out-of-band connections for sending the data, in our case via the RTP protocol (or it can use a multiplexed connection, like interleaved RTSP or SCTP). The whole protocol description is quite tedious and is not what I’m interested in writing about right now.

What I think might be worth explaining is why we still care about RTSP, given that the reality of audio/video streaming lately seems to focus on the much more generic HTTP protocol (calling it the Hyper Text Transfer Protocol was probably underestimating its actual use, I guess). Indeed, even cherokee implemented support for a/v streaming, and while Alvaro shows how this can be used to implement the <video> tag, we also know that the video tag is not going to be the future, at least not for the “big” streaming sites. Indeed, most streaming sites will try their best to keep external players from accessing their content. But RTSP is, after all, implemented by a very wide range of companies, including two open-source server software projects (Helix DNA Server by RealNetworks and Darwin Streaming Server by Apple), and both Apple and Microsoft in their own multimedia stacks.

The idea is not to use RTSP for small video streaming, which can, after all, very well be cached, but rather to have longer-playing content streamed, with a few different advantages over the HTTP method. First of all, HTTP isn’t really practical for live streaming (whether unicast or multicast doesn’t really matter here); that is much easier to do with RTSP. Also, RTSP allows precise seeking and pausing, which HTTP does not, at least not without lots of tricks and hacks. And then there is multicast, the magic keyword that I’ve heard spoken many times since I had my first internet connection. Indeed, my ISP used to have some experimental multicast-based streams for 56k dial-up and 256kbit ADSL; nowadays they no longer provide that feature (I know nobody who was ever able to get it to work anyway), but I guess they did use the data they gathered at the time to implement their IPTV system on ADSL2.

I have one situation clearly in mind where multicast streaming, together with precise seeking, would be very helpful. At my high school we had an “English multimedia laboratory”, which basically was a classroom with fourteen crusty PCs wired up together; half the time we would use them as normal computers to browse the net for whatever reason (usually because neither the teacher nor the lab assistant wanted to do anything that day), the other half they would be switched to just repeat the same video signal on all the monitors (which were, obviously for the time, CRTs; on the other hand, the way the stuff was wired up, in either mode the monitors further back had a very bad signal). Doing the same thing, all digital, with a simple Gigabit Ethernet connection would probably have given quite better results (on the other hand, one could argue that having a single big TV would have saved the hassle).

Now, while all these things can probably be forced down HTTP’s throat with webapps, services, protocol extensions and whatever, having a dedicated protocol, like RTSP, that handles them is probably quite an improvement; I’m certainly looking forward to the day when the set-top box in my bedroom (the AppleTV right now) will be able to stream my anime down the Ethernet connection with RTSP, so that I can seek without having to wait for the buffer to catch up, and easily skip to the middle of an episode without having to wait for all of it to be downloaded.

So basically, yeah RTSP is a bit more niche than HTTP right now but I don’t see it as dead yet at all; it’s actually technologically pretty cool, just underutilized.

Server Software Design

When I started to help Luca with feng I hadn’t worked on server software for many years; while server software was what I started out on when I began seriously working on collaborative free software projects, it really was much lower profile, yet I see that some of the things I did there seem to carry over here.

One of these is my (bad) habit of rewriting lots of parts together when I see that something does not really comply with the specifications; in this case, while trying to make the parser more robust, I found that the whole thing works only by luck, and started rewriting it again. The nice part is that in two days I wrote (again) the whole RTSP parser (before, I had only rewritten the actual parser but not the logic that reads the data), with far fewer lines of code, with per-client worker threads, with SCTP support and so on.

The problem now, though, is that we haven’t been able to put out a stable 2.0 release yet, and since 2.0 was supposedly still using bufferpool, we’re probably just going to skip it and go with 2.1; on the other hand, the worker threads are still experimental and should not be in 2.1 either, but rather in 2.2 (I need the new parser to implement the proxy passthrough used by QuickTime and QTSS, which encapsulates RTSP and RTP over HTTP). So the worker threads will be postponed and looked at in the future; on the other hand, the new parser was also designed to solve a series of security concerns I had with the previous parser code.

Even more complex: the message parser I wrote as my first rewrite of the request parsing and handling was split off into a separate library (related to an earlier idea I had), but the new parser is much more tied to feng’s code and cannot easily be shared directly, so the library has to go away. But does it make any sense to have a 2.1 release using a new library that the 2.2 release will blow away? Nope, so there I go merging it back.

And since I’m at that point, and merging is hard… I’m basically working on making sure that all the non-feature, non-rewrite changes that we got are merged in for 2.1, so that the walk toward 2.2 is a cakewalk. Not an easy thing, but I should be able to pull it off without an overly complex amount of work.

The other problem is implementing IPC between two components of the lscube stack, flux and feng, to provide live streaming support. Now this is trickier because we’re actually mixing different programming languages: flux is written in C++ while feng is written in good old C. The idea for now is to use POSIX message queues to send data from one to the other, but here comes the problem: earlier the IPC mechanism was provided by the (pretty broken) bufferpool library, and thus lived outside of both feng and flux; now that library is no longer present, which means we’ll have to find another way to deal with it. I’ll probably look into installing some development headers and libraries for feng.

I also have to design a stress-test suite to make sure that there are no buffer overflows in the old parser, which is not an easy thing to do actually; thinking outside the box, which is what security testing requires, is not exactly what I do to get a standards-compliant server.

Oh yeah, and there is the problem of standards. While rewriting the seeking support for feng with Luca, we discovered that either live555 or VLC is pretty stupid when it comes to seeking, and sends multiple PLAY requests instead of a series of PAUSE/PLAY pairs, which in RTSP actually has a very different meaning, as queued PLAY requests create an edit list.

And all of what I described up to now relates only to RTSP/1.0, not to extensions (beside the QT/QTSS one I noted before), nor to RTSP/2.0. Seems like we still have a long road ahead before even completing the basic features, or being standards compliant (did I mention that we also don’t send a Connection: Close header when we’re going to close the RTSP connection?). Alas.

On the other hand, I have to say that the experience is proving very interesting, and I guess that once the server part is decently ready and we can start focusing on the client part, I might be able to get xine into a state where the RTSP code does not end up in a security mess every other year, and where it actually works with more than just Real’s own servers.

Code reuse and RFC 822 message parsing

If you’re just a user with no knowledge of network protocols, you might not think there is any difference between an email, a file downloaded through the web, or a video streamed from a central site. If you have some basic knowledge, you might instead expect the three to have little in common, since they come over three different protocols: IMAP (for most modern email systems, that is), HTTP and (for the sake of what I’m going to say) RTSP. In truth, the three of them have quite a bit in common, represented by RFC 822. A single point of contact between these, and many other, technologies.

The RTSP protocol (commonly used by both RealNetworks and Apple, beside being a quite nice open protocol) uses a request/response system based on the HTTP protocol, so the similarity between the two is obvious. And both requests and responses of HTTP and RTSP are almost completely valid messages per the RFC 822 specification, the same one used for email messages.

This is something that is indeed very nice, because it means that the same code used to parse email messages can be used to parse requests and responses for those two protocols. Unfortunately, it’s easier said than done. Since I’ve been working on feng, I’ve been trying to reduce the amount of specific code that we ship, trying to re-use as much generic code as possible, which is what brought us to use Ragel for parsing, and glib for most of the utility functions.

For this reason, I also considered using the gmime library to handle the in-memory representation of the messages, as well as possibly the whole parsing further on. Unfortunately, when trying to implement it I noticed that in quite a few places I would end up doing more work than needed, duplicating parts of the strings and freeing them right away, with the gmime library doing the final duplication to save them in the hash table (because both my original parser and gmime end up with a GHashTable object).
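
For reference, the in-memory representation both parsers converge on is nothing more exotic than a GHashTable; here is a stripped-down sketch (neither gmime’s nor feng’s actual code) of storing headers with case-insensitive keys, since RFC 822 header names are case-insensitive:

/* Sketch: parsed headers in a GHashTable keyed case-insensitively. */
#include <glib.h>
#include <stdio.h>

static guint header_hash(gconstpointer key)
{
    gchar *folded = g_ascii_strdown(key, -1);   /* fold case before hashing */
    guint h = g_str_hash(folded);
    g_free(folded);
    return h;
}

static gboolean header_equal(gconstpointer a, gconstpointer b)
{
    return g_ascii_strcasecmp(a, b) == 0;
}

int main(void)
{
    GHashTable *headers = g_hash_table_new_full(header_hash, header_equal,
                                                g_free, g_free);

    /* what a parser would do for each "Name: value" line it recognises */
    g_hash_table_insert(headers, g_strdup("CSeq"), g_strdup("1"));
    g_hash_table_insert(headers, g_strdup("Content-Type"),
                        g_strdup("application/sdp"));

    /* lookups work regardless of the case used on the wire */
    printf("%s\n", (char *)g_hash_table_lookup(headers, "cseq"));

    g_hash_table_destroy(headers);
    return 0;
}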

For desktop applications this overhead is not really important, but it really is for a server project like feng: not only does it add overhead that can be considerable at the hundreds of requests a second the project aims for, it also adds one more failure point where the code can abort on out-of-memory. Unfortunately, Jeffrey Stedfast, the gmime maintainer, is more concerned with the cleanness of the API, and its use on the desktop, than with its micro-optimisation; I understand his point, and I thus think it might be a better choice for me to write my own parser to do what I need.

Since the parser can be a reusable component in its own right, I’m also going to make sure that it can sustain a high load of messages to parse. Unfortunately, I have no idea how to properly benchmark the code; I’d sincerely like to compare, after at least a draft is done, the performance of gmime’s parser against mine, both in terms of memory usage and speed. For the former I would have used the old massif tool from valgrind, but I can’t get myself to work with the new one. And I have no idea how to benchmark the speed of the code. If somebody does know how I could do that, I’d be glad to hear it.

Basically, my idea is to make sure that the parser works in two modes: a debug/catch-all mode where the full headers are parsed and copied over, and another one where the headers are parsed but saved only when they are accepted by a provided function. I haven’t yet put my idea to the test, but I guess that the hard work will be done more by the storage than by the actual parser, especially considering that the parser is implemented with the Ragel state machine generator, which is quite fast by itself. And even if it doesn’t help the speed of the parser itself, it would certainly reduce the amount of memory used, especially when parsing maliciously crafted messages.
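
The interface I have in mind for the second mode is roughly the following (just a sketch of the idea; the names and the list of interesting headers are made up for the example):

/* Sketch of the filtered-storage mode: the parser calls back for each
 * header name it delimits and only copies the ones the caller cares
 * about, so uninteresting (or maliciously bulky) headers cost nothing. */
#include <glib.h>
#include <stdbool.h>
#include <string.h>

typedef bool (*header_accept_fn)(const char *name, size_t len);

/* called by the parser once a header name and its value are delimited */
static void store_header(GHashTable *headers, header_accept_fn accept,
                         const char *name, size_t name_len,
                         const char *value, size_t value_len)
{
    /* nothing is duplicated unless the caller actually wants the header */
    if (accept != NULL && !accept(name, name_len))
        return;

    g_hash_table_insert(headers,
                        g_strndup(name, name_len),
                        g_strndup(value, value_len));
}

/* example policy: an RTSP server only cares about a handful of headers */
static bool rtsp_accept(const char *name, size_t len)
{
    static const char *const wanted[] = { "CSeq", "Session", "Transport",
                                          "Range", "Require" };
    for (size_t i = 0; i < G_N_ELEMENTS(wanted); i++)
        if (strlen(wanted[i]) == len &&
            g_ascii_strncasecmp(name, wanted[i], len) == 0)
            return true;
    return false;
}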

Hopefully, given enough time and effort, it might produce a library that can be used as a basis for parsing and generating requests and responses for both RTSP and HTTP, as well as parsing e-mail messages, and other RFC 822 applications (I think, but I’m not sure, that the MSN messenger protocol uses something like that too; I do know that git uses it too though).

Who knows, maybe I’ll resume gitarella next, and write it using ruby-liberis, if that’s going to prove faster than the current alternatives. I sincerely hope so.