HTTP-like protocols have one huge defect

So you might or might not remember that my main paid job in the past months (and right now as well) has been working on feng, the RTSP server component of the LScube stack.

The RTSP protocol is based on HTTP, and indeed uses the same message format as defined by RFC 822 (the same used for email messages), and a request line “compatible” with HTTP.

Now, it’s interesting to know that this similarity between the two has been used, among other things, by Apple to implement the so-called HTTP tunnelling (see the QuickTime Streaming Server manual, Chapter 1 Concepts, section Tunneling RTSP and RTP Over HTTP, for the full description of that procedure). This feature allows clients behind standard HTTP proxies to access the stream, creating a virtual full-duplex communication between the two. Pretty neat stuff, even though Apple recently superseded it with the pure HTTP streaming that is implemented in QuickTime X.

For LScube we want to implement at the very least this feature, both server and client side, so that we can get on par with the QuickTime features (implementing the new HTTP-based streaming is part of the long-haul TODO, but that’s beside the point now). To do that, our parser has to be able to accept HTTP requests and deal with them appropriately. For this reason, I’ve been working to replace the RTSP-specific parser with a more generic parser that accepts both HTTP and RTSP. Unfortunately, this turned out not to be a very easy task.

The main problem is that we wanted to make as few passes as possible over the request line to get the data out. When we only supported RTSP/1.0 this was trivial: we knew exactly which methods were supported, which ones appeared valid but weren’t supported (like RECORD) and which ones were simply invalid to begin with, so we set the value for the method in passing and then moved on to check the protocol. If the protocol was not valid, we didn’t care about the method anyway; at worst we had to pass through a series of states for no good reason, but that wasn’t especially bad.

With the introduction of a simultaneous HTTP parser, the situation became much more complex: the methods are parsed right away, but the two protocols have different methods. The GET method that is supported for HTTP is a valid but not supported method for RTSP, and vice versa when it comes to the PLAY method. With a simple union of the two state machines, the actions that handle the parsed method for each protocol ended up executing simultaneously, and that, quite obviously, couldn’t have been the right thing to do.
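To make the clash concrete, here is a minimal sketch of the three-way classification once it has to depend on the protocol. The method tables are an assumption made up for illustration (they are not feng’s actual lists), and `MethodStatus` and `classify` are hypothetical names.

```python
from enum import Enum


class MethodStatus(Enum):
    SUPPORTED = "supported"
    UNSUPPORTED = "valid but not supported"
    INVALID = "invalid"


# Illustrative tables only: assumed for this sketch, not taken from feng.
SUPPORTED_METHODS = {
    "HTTP": {"GET", "POST"},
    "RTSP": {"OPTIONS", "DESCRIBE", "SETUP", "PLAY", "PAUSE", "TEARDOWN"},
}

# Methods that are syntactically valid even where nothing implements them.
KNOWN_METHODS = {"RECORD"}.union(*SUPPORTED_METHODS.values())


def classify(protocol, method):
    """Classify a method, which only makes sense once the protocol is known."""
    if method in SUPPORTED_METHODS[protocol]:
        return MethodStatus.SUPPORTED
    if method in KNOWN_METHODS:
        return MethodStatus.UNSUPPORTED
    return MethodStatus.INVALID
```

The same token gets a different answer depending on the protocol: `classify("HTTP", "GET")` is supported while `classify("RTSP", "GET")` is only valid, and the opposite holds for `PLAY`. That is exactly why the two sets of method actions cannot both fire on the same input.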

Now, it’s really simple to understand that what we needed was a way to discern which protocol we’re trying to parse first, and then proceed to parse the rest of the line as needed. But this is exactly what I think is the main issue with the HTTP protocol and all the protocols, like RTSP or WebDAV, that derive from or extend it: the protocol specification is at the end of the request line. Since you usually parse a line in the Latin order of characters (from left to right), you read the method before you know which protocol the client is speaking. This is easily solved by backtracking parsers (I guess LALR parsers is the correct term, but parsers aren’t usually my field of work, so I might be mistaken), since they can try one syntax and go back over the text to try another when the first one fails. Ragel is not such a parser, while kelbt (by the same author) is.
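Outside of a state machine generator, the obvious way around this is to look at the last token first and only then judge the method. The sketch below does that with plain string splitting; it is neither a Ragel machine nor the workaround used in feng, and `parse_request_line` and the method table are hypothetical names and values chosen for illustration.

```python
# Illustrative table only: assumed for this sketch, not taken from feng.
SUPPORTED_METHODS = {
    "HTTP": {"GET", "POST"},
    "RTSP": {"OPTIONS", "DESCRIBE", "SETUP", "PLAY", "PAUSE", "TEARDOWN"},
}


def parse_request_line(line):
    """Return (protocol, method, target, supported) for one request line."""
    # First look at the end of the line: the protocol is the last token.
    head, sep, version = line.rstrip("\r\n").rpartition(" ")
    protocol, slash, _number = version.partition("/")
    if not sep or not slash or protocol not in SUPPORTED_METHODS:
        raise ValueError("unknown protocol in request line: %r" % line)

    # Only now go back to the start and interpret the method.
    method, sep, target = head.partition(" ")
    if not sep or not method or not target:
        raise ValueError("malformed request line: %r" % line)

    return protocol, method, target, method in SUPPORTED_METHODS[protocol]
```

For example, `parse_request_line("GET /stream HTTP/1.0")` reports the method as supported, while the same `GET` on a line ending in `RTSP/1.0` does not. The price is touching the line twice, which is precisely what we were trying to avoid.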

Time constraints, and the fact that kelbt is even more sparingly documented than Ragel, mean that I won’t be trying to use kelbt just yet, and for now I settled on an overcomplex and nearly unmaintainable workaround to have something working (since the parsing is going to be a black-box function, the implementation can easily change in the future when I learn some decent way to do that).

This whole thing would have been definitely simpler if the protocol specification were at the start of the line! At that point we could just have decided how to parse the rest of the line depending on the protocol.
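As a thought experiment, here is what a single left-to-right pass could look like with an imaginary protocol-first request line such as `RTSP/1.0 PLAY rtsp://example.com/stream`. No real protocol uses this layout; the format, the function name `parse_protocol_first` and the method table are all assumptions for the sake of the sketch.

```python
# Illustrative table only: assumed for this sketch, not taken from feng.
SUPPORTED_METHODS = {
    "HTTP": {"GET", "POST"},
    "RTSP": {"OPTIONS", "DESCRIBE", "SETUP", "PLAY", "PAUSE", "TEARDOWN"},
}


def parse_protocol_first(line):
    """Parse a hypothetical '<PROTO>/<version> <METHOD> <target>' line."""
    tokens = iter(line.rstrip("\r\n").split(" "))

    # The first token read already tells us which method table applies.
    protocol = next(tokens, "").partition("/")[0]
    if protocol not in SUPPORTED_METHODS:
        raise ValueError("unknown protocol: %r" % line)
    methods = SUPPORTED_METHODS[protocol]

    # Each later token is consumed once, in order, with no going back.
    method = next(tokens, "")
    target = next(tokens, "")
    if not method or not target or next(tokens, None) is not None:
        raise ValueError("malformed request line: %r" % line)

    return protocol, method, target, method in methods
```

Every token is judged the moment it is read, so there is no need to remember an undecided method or to look ahead to the end of the line.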

At this point I’m definitely not surprised that Adobe didn’t use RTSP and instead invented their own Real-Time Messaging Protocol, which is not based on HTTP but is rather a binary protocol (which should also make it much easier to parse, to an extent).
