While LScube is no longer – for me – a paid job project (since I haven’t been officially paid to work on it since May or so), it’s a project that intrigues me well enough to work on it on free time, rather than, say, watch a movie. The reason is probably to find in situations like those in the past few days, when Luca brings up a feature to implement that either no one else has implemented before, or simply needs to be re-implemented from scratch.
Last time I had to come up with a solution to implement something this intricate was when we decide to implement Apple’s own HTTP tunnelling feature whose only other documentation I know of, beside the original content from Apple is the one I wrote for LScube itself.
This time, the challenge Luca shown me is something quite more stimulating because it means breaking new ground that up to now seems to have been almost entirely missing: RTSP recording and secure RTSP.
The recording feature for RTSP has been somewhat specified by the original RFC but it was left quite up in the air as to what it should have been used for. Darwin Streaming Sever has an implementation, but beside the fact that we try not to look at it too closely (to avoid pollination), we don’t agree on the minimum amount of security that it provides as far as we can tell. But let’s start from the top.
The way we need RECORD to work, is to pass feng (the RTSP server) a live stream, which feng will then multiplex to the connected users. Up to now, this task was care of a special protocol between flux (the adaptor) and feng, which also went to a few changes over time: originally it used UDP over localhost, but we had problems with packet loss (no I can’t even begin to wonder why) and in performance; nowadays it uses POSIX message queues, which are supposedly portable and quite fast.
This approach has, objectively, a number of drawbacks. First of all the MQ stuff is definitely not a good idea to use; it’s barely implemented in Linux and FreeBSD, it’s not implemented on Darwin (and for a project whose aim is to replace Darwin Streaming Server this is quite a bad thing) and even though it does work on Linux, its performances aren’t exactly exciting. Also, using POSIX MQs is quite tricky: you cannot simply write an eventloop-based interface with libev because, even though they are descriptors in their implementation, an MQ descriptor does not support the standard poll()
interface (nor any of the more advanced ones) so you end up with a thread blocking in reading… and timing out to make sure that flux wasn’t restarted in the mean time, since in that case, a new pair of MQs would have been created.
This situation was also not becoming any friendlier by the fact that, even though the whole team is composed of fewer than five developers, flux and feng are written in two different languages, so the two sides of the protocol implemented over MQs are also entirely separated. And months after I dropped netembryo from feng it is still used within flux, which is the one piece of the whole project that needed it the least, as it uses none of SCTP, TCP and SSL abstractions, and the interfaces of the two couldn’t be more far apart (flux is written in C++ if you didn’t guess, while netembryo is also written in C).
So it’s definitely time to get rid of the middleware and let the producer/adapter (ffmpeg) talk directly with the server/multiplexer (feng). This is where RECORD is necessary: ffmpeg will open a connection to feng via the same RTSP protocol used by the clients, but rather than ask to DESCRIBE a resource, it will ANNOUNCE one; then it will SETUP the server so that it would open RTP/RTCP ports, and start feeding data to the server. Since this means creating new resources and making it available to third parties (the clients) – and eventually creating files on the server’s disk, if we proceed to the full extent of the plan – it is not something that you want just any client to be able to do: we want to make sure that it is authorised to do so — we definitely don’t want to pull a Diaspora anytime soon.
And in a situation like this, paying for a decent UML modeler, shows itself as a pretty good choice. A number of situations became apparent to me only when I went to actually think of the interaction to draw the pretty diagrams that we should show to our funders.
Right now, though, we don’t even support authentication, so I went to look for it; RTSP was designed first to be an HTTP-compatible protocol, so the authentication is not defined in RFC2326, but rather to be defined by HTTP itself; this means that you’ll find it in RFC 2616 instead — yes it is counterintuitive, as the RFC number is higher; the reason is simple though: RFC2616 obsoletes the previous HTTP description in 2068 which is the one that RTSP refers to; we have no reason to refer to the old one knowing that a new one is out, so…
You might know already that HTTP defines at least two common authentication schemes: Basic (totally insecure) and Digest (probably just as insecure, but obfuscated enough). The former simply provides username and password encoded in base64 — do note: encoded, not encrypted; it is still passed in what can be called clear text, and is thus a totally bogus choice to take. On the other hand, Digest support provides enough features and variables to require a very complex analysis of the documentation before I can actually come up with a decent implementation. Since we wanted something more akin to a Proof-of-Concept, but also something safe enough to use in production, we had to find a middle ground.
At this point I had two possible roads in front of me. On one side, implementing SASL looked like a decent solution to have a safe authentication system, while keeping the feng-proper complexity at bay; unfortunately, I still haven’t found any complete documentation on how SASL is supposed to be implemented as part of HTTP; the Wikipedia page I’m linking to also doesn’t list HTTP among the SASL-supported protocols — but I know it is supposed to work, both because Apache has a mod_authn_sasl and because if I recall correctly, Microsoft provides ActiveDirectory-authenticated proxies, and again if my memory serves me correctly, ActiveDirectory uses SASL. The other road looked more complex at first but it promised much with (relatively) little effort: implementing encrypted, secure RTSP, and require using it to use the RECORD functionality.
While it might sound a bit of overkill to require an encrypted secure connection to send data that has to be further streamed down to users, keep in mind that we’re talking only of encrypting the control channel; in most cases you’d be using RTP/RTCP over UDP to send the actual data, and the control stream can even be closed after the RECORD request is sent. We’re not yet talking about implementing secure RTP (which I think was already drafted up or even implement), but of course if you were to use a secure RTSP session and interleaved binary data, you’d be getting encrypted content as well.
At any rate, the problem with this became how to implement the secure RTSP connection; one option would have been using the oldest trick in the book and use two different ports, one for clear-text RTSP and another for the secure version, as is done for HTTP and HTTPS. But I also remembered that this approach has been frown upon and deprecated for quite a bit. Protocols such as SMTP and IMAP now use TLS through a STARTTLS command that replaces a temporarily clear-text connection with an encrypted TLS on; I remember reading about a similar approach for HTTP and indeed, RFC 2817 describes just that. Since RTSP and HTTP are so similar, there should be no difficulty into implementing the same idea over RTSP.
Now in this situation, implementing secure ANNOUNCE support is definitely nicer; you just refuse to accept methods (in write-mode) for the resource unless you’re over a secure TLS connection; then you also ensure that the user authenticated before allowing access. Interestingly, here the Basic authentication, while still bad, is acceptable for an usable proof of concept, and in the future it can easily be extended to authenticate based on client-side certificates without requiring 401 responses.
Once again, I have to say that having translated Software Engineering wasn’t so bad a thing; even though I usually consider UML being just a decent way to produce pretty diagrams to show the people with the gold purse, doing some modelling of the interaction between the producer and feng actually cleared up my mind enough to avoid a number of pitfalls, like doing authentication contextually to the upgrade to TLS, or forgetting about the need to provide two responses when doing the upgrade.
At any rate, for now I’ll leave you with the other two diagrams I have drawn but wouldn’t fit the post by themselves, they’ll end up on the official documentation — and if you wonder what “nq8” stands for, it’s the default extension generated by pngnq: Visual Paradigm produces 8-bit RGB files when exporting to PNG (and the SVG export produces files broken with RSVG, I have to report that to the developers), but I can cut in half the size of the files by quantising them to a palette.
I’m considering the idea of RE part of the design of feng into UML models, and see if that gives me more insight on how to optimise it further, if that works I should probably consider spending more time over UML in the future… too bad that reading O’Reilly PDF books on the reader is so bad, otherwise I would be reading through all of “Learning UML 2.0” rather than skimming through it when I need.
What are your objections to digest auth, btw? It should be reasonable secure (not able to recover the password except for via brute force, replay attack not possible), although I’m not sure how hard it is to implement properly on the server side (I’ve just implemented it on the client side for ffmpeg). That’s also what I’m normally using between ffmpeg and DSS.Getting support for TLS/RTSP would absolutely be nice, but support for that within ffmpeg isn’t really present at the moment so it would probably take some time until everything around that would be agreed upon.
Client-side digest is quite clean to implement, but on the server side it is quite a mess: avoiding the replay attack means we have to keep a state of the client’s authentication and is indeed something that I’d rather not spend so much time implementing, if we can implement something stronger altogether, like TLS would be.I guess here it’s a matter of deciding whether we’re doing that to be as low work as possible or also try to become something new: secure RTSP is not something we have already as far as I can see and it stimulates me to think of implementing it first.
The comment “probably just as insecure, but obfuscated enough” on digest in the blog post – any elaborations on that? I’d be hard pressed to say digest is “probably just as insecure”.(Although, with digest you get the issue that if someone accesses the stored credentials on the server, they can use that to authenticate at any time, without knowing the original password.)Regarding RTSP/TLS, getting an implementation of that would absolutely be nice, if you’d get it standardized in some RFC or similar. And I can totally understand that it would be more rewarding to work on that part. :-)For getting support for ANNOUNCE/RECORD in feng though, I’d rather see it implemented with digest first, before venturing on to specifying new protocol extensions. That would also result in compatibility with all existing applications supporting ANNOUNCE/RECORD (Wirecast, QuickTime Broadcaster, ffmpeg in its current form, etc). But as long as I’m not the one implementing it, this is just my 2 cents 🙂
We are also considering supporting non-upgrade TLS so people using different producers could use third party tunnels as well.Regarding digest, well it will come later once we figure out the other bits correctly. ^^
Ah sorry I should have cleared it up, it was more meant in a “just as insecure in its simplest forms”. As far as I can tell, digest auth among its many variations also allows to use methods that are not replay-safe… and that would have been the original choice for a proof of concept given that the stateful versions (which are replay-safe) don’t seem as easy to implement…But maybe I’ll just prod you if I have trouble with Digest, and will study the RFC more thoroughly as soon as I can.
Ah, I see. Yes, in the simplest implementation, it’s prone to replay attacks even though you can’t easily get the plaintext password.It would be interesting to see how much work a minimal but proper, replay-safe server-side implementation is. The spec is quite large with provisions for several different use cases, but as the server, you choose which parts you want to use.There’s for example also support for the server proving to the client that it really know the same secret, so that the client knows it can trust the server, by hashing the secret with some client-provided nonce. That part is optional and probably not relevant for this case. Similarly, there’s a lot of other things that you can skip right away.