Finally, last night, I managed to finish, at least in a side branch, support for Apple’s RTSP-in-HTTP tunnelling, as dictated by their specification. Now that the implementation is complete (and it really didn’t take that much work once the parser worked as needed), I can say a few things about that specification and about Apple phasing it out in favour of a different, HTTP-only streaming system.
First of all, supporting both RTSP and RTSP-in-HTTP, while keeping the exact same streaming logic behind the scenes, requires a much more flexible parser, which isn’t easy to write because of the HTTP design which I already discussed. Of course, once the work is done, it’s done, but the complexity of such a parser isn’t negligible.
But since the work took me quite a short time, that would really not be that bad if the technique worked as well as it’s supposed to. Unfortunately, that’s not the case. For instance, the default configuration of net-proxy/polipo (a French HTTP proxy) does not allow the technique to work, because of the way the proxy is designed: with pipelining and connection re-use, both very common ways for proxies to try to improve performance, the proxy usually waits for the server to complete a response before returning it to the client. Unfortunately, the GET request made by the client is one that will never complete, since its response is where the actual streaming happens.
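To make the two-connection setup a bit more concrete, here is a rough sketch of the pair of requests the client opens. The header names and values are written from my memory of Apple’s document and should be taken as assumptions, and `tunnel_requests` and the example path are made up for illustration. The point is only that both requests share one session cookie, and that the GET is the one whose response never ends.

```python
import uuid


def tunnel_requests(path, cookie=None):
    """Build the GET/POST request pair for an RTSP-in-HTTP tunnel.

    Illustrative only: header names and values are assumptions based on
    a recollection of Apple's specification, not a verified copy of it.
    """
    # Both connections carry the same cookie so the server can pair them.
    cookie = cookie or uuid.uuid4().hex

    # The GET response body is the endless server-to-client stream.
    get = (
        f"GET {path} HTTP/1.0\r\n"
        f"x-sessioncookie: {cookie}\r\n"
        "Accept: application/x-rtsp-tunnelled\r\n"
        "Pragma: no-cache\r\n"
        "Cache-Control: no-cache\r\n"
        "\r\n"
    )

    # The POST body carries the base64-encoded RTSP requests; the large
    # Content-Length is a placeholder, since the real length is unknown.
    post = (
        f"POST {path} HTTP/1.0\r\n"
        f"x-sessioncookie: {cookie}\r\n"
        "Content-Type: application/x-rtsp-tunnelled\r\n"
        "Pragma: no-cache\r\n"
        "Cache-Control: no-cache\r\n"
        "Content-Length: 32767\r\n"
        "\r\n"
    )
    return get, post
```

A proxy that buffers the whole GET response before forwarding it will simply sit there forever, which is exactly what happens with polipo’s defaults.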
In the end, I found it definitely easier to use good old squid for testing, even though its documentation only explains at one (very hidden) point which parameters to set to make it work with QuickTime. But it definitely means that not every HTTP proxy will let this technique work correctly.
And that’s definitely not the only problem. Since the HTTP and RTSP protocols are pretty similar, even the documentation says that if the client POSTed the RTSP requests directly, the proxy would see them as bad HTTP requests; to avoid that, the requests are sent base64-encoded (which means bigger than the original). But while the data coming from the client is usually scrutinised more, proxies nowadays probably scrutinise the responses as well as the requests, to make sure that they are not dealing with a malicious server (phishing or stuff like that); and if they do, they are very likely to find the response to the GET request quite suspicious, probably considering it an attempt at HTTP response splitting (which is a common webapp vulnerability).
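As a small illustration of the encoding step, this is roughly what happens to a request before it goes into the POST body. The function names and the example URL are made up for the sketch; only the use of base64 comes from the specification.

```python
import base64


def encode_rtsp_request(request: bytes) -> bytes:
    """Base64-encode a raw RTSP request for sending in the POST body."""
    return base64.b64encode(request)


def encoded_size(raw_size: int) -> int:
    """Size of padded base64 output: 4 bytes for every 3 bytes of input."""
    return 4 * ((raw_size + 2) // 3)


# Hypothetical request, just to have something to encode.
request = (
    b"OPTIONS rtsp://example.com/stream RTSP/1.0\r\n"
    b"CSeq: 1\r\n"
    b"\r\n"
)
encoded = encode_rtsp_request(request)

# The encoded form has no spaces or line breaks, so a proxy cannot
# mistake it for a (malformed) HTTP request line or header.
print(len(request), len(encoded))
```

The base64 alphabet only contains letters, digits, `+`, `/` and `=`, which is what keeps the proxy from trying to parse the body as HTTP.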
Now, of course it would have been possible for Apple to simply upgrade the trick by encoding the response as well as the request, but that has one huge drawback: it would both increase the latency of the stream (because the base64 content would have to be decoded before it’s used) and increase the size of the response by one third (⅓), due to that kind of encoding. Another alternative would have been to encode only the pure RTSP responses with base64, and leave the RTP streams (which are carried over interleaved RTSP) unencoded. Unfortunately this would have required more work, since at that point the GET body would no longer be stream-compatible with a pure RTSP stream, and thus wouldn’t be very transparent for either the client or the server.
On the other hand, the idea of implementing that as an extension hasn’t entirely left my mind; since channels one and up are used by the RTP streams, channel zero is still unused, and could be used to send the RTSP responses encoded in base64. At least in feng this wouldn’t require huge changes to the code, since we already treat channel zero as special for the SCTP connection.
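A minimal sketch of what that extension could look like on the wire, assuming the usual RTSP interleaved framing (a `$` byte, a one-byte channel and a 16-bit big-endian length). This is not feng code and not part of any specification: the function names are made up, and reserving channel zero for base64-encoded RTSP responses is just the idea described above.

```python
import base64
import struct

# Assumption from this post: channel zero is not used by any RTP stream.
RTSP_CHANNEL = 0


def frame(channel: int, payload: bytes) -> bytes:
    """Wrap a payload in an interleaved frame: '$', channel, length, data."""
    if not 0 <= channel <= 0xFF:
        raise ValueError("channel must fit in one byte")
    if len(payload) > 0xFFFF:
        raise ValueError("payload too large for a 16-bit length")
    return struct.pack("!cBH", b"$", channel, len(payload)) + payload


def frame_rtsp_response(response: bytes) -> bytes:
    """Send an RTSP response base64-encoded on the reserved channel."""
    return frame(RTSP_CHANNEL, base64.b64encode(response))


def frame_rtp_packet(channel: int, packet: bytes) -> bytes:
    """Send an RTP packet unencoded on one of the regular channels."""
    if channel == RTSP_CHANNEL:
        raise ValueError("channel zero is reserved for RTSP responses")
    return frame(channel, packet)
```

With this layout the whole GET body is made of `$`-prefixed frames, so nothing that looks like a bare response line ever crosses the proxy, and the RTP data pays no base64 overhead.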
With all these details considered, I can understand why Apple was looking into alternatives. What I still cannot understand is what they decided to use as an alternative, since the new HTTP Live Streaming protocol still looks tremendously hacky to me. Hopefully, our next step is rather going to be Adobe’s take on a streaming protocol.