Mail, SSL and Postfix

In my previous post, I delineated a few reasons why I care about SSL for my blog and the xine bugzilla. What I did not talk about was about the email infrastructure for both. The reason is that, according to the very same threat model that I delineated in that post, it’s not as important for me to secure that part of the service.

That does not mean, though, that I never considered it, just I considered it not important enough yet. But after that post I realized it’s time to fix that hole and I’ve now started working on securing the email coming out of my servers as well as that coming through the xine server. But before going into details about that, let’s see why I was not as eager to secure the mail servers compared to the low-hanging fruit of my blog.

As I said in the previous post, what you have to identify two details: what information is important to defend, and who the attackers would be. In the case of the blog, as I said, the information was the email addresses of the commenters, and the attackers the other users of open, unencrypted wifi networks in use. In the case of email, the attackers in particular change drastically; the only people in a position to get access to the connections’ streams are the people in the hosting and datacenter companies, and if they made mistakes, your neighbours in the same datacenter. So for sure it’s not the very easy prey of the couple sitting at the Starbucks next to you.

The content, well, is a bit more interesting information. We all know that there is no real way to make email completely opaque to service providers unless we use end to end encryption such as GnuPG, so if you really don’t want your server admin to ever be able to tell what’s in your email, that’s what you should do. But even then, there is something that (minus protocol-level encryption) is transmitted in cleartext: the headers, the so-called metadata, that stirred the press so much last year. So once again it’s the address of the people you contact that could be easily leaked, even with everything else being encrypted. In the case of xine, the mail server handles mostly bugzilla messaging, and it is well possible that it sends over, without encryption, the comments on security bugs, so reducing the risk of that information leaking is still a good idea.

Caveat emptor in all of this post, though! In the case of the xine mail server, the server handles both inbound and outbound messages, but at the same time it does not ever let users access their mailbox; the server itself is a mail router, rather than a full mail service. This is important, becuase otherwise I wouldn’t be able to justify my sloppiness on covering SSL support for the mail! If your server hosts mailboxes or allows direct mail submission (relay), you most definitely need to support SSL as then it’s a client-server connection which is attackable by the Starbucks example above.

So what needs to be done to implement this? Well, first you need to remember that a mail router like the one I described above requires SSL in two directions: when it receive a message it should be able to offer SSL to the connecting client, and when it sends a message it has to request SSL to the remote server too. In a perfect set up, the client also offers a certificate to prove who it is. This means that you need a certificate that works both as a server and as a client certificate; thankfully, StartSSL supports that for Class 2 certificates, even if they are named for web servers, they work just fine for mail servers too.

Unfortunately, the same caveat that apply to HTTP certificates, apply to mail servers: ciphers and protocol versions combinations. Unfortunately, while Qualys has SSL Labs to qualify the SSL support in your website, I know of no similar service for mail routers, and coming up with one is not trivial, as you would want to make sure not to become a relay-spammer by mistake, and the only way to judge the message pushing of the server is to trick it into sending a message to your own service back, which should not be possible on a properly non-open relay of a server.

So the good news is that of all of the xine developers with an alias on the domain have a secure server when routing mail to them, so the work I’ve been done is not for nothing. The other note is that a good chunk of the other users in Bugzilla uses GMail or similar big hosting providers. And unlike others I actually find this a good thing, as it’s much more likely that the lonely admin of a personal mail server (like me for xine) would screw up encrypion, compared to my colleagues over at GMail. But I digress.

The bad news is that not only there is no way to assess the quality of the configuration of a mail server, but at least for the case of Postfix you have only a ternary setting about TLS: yes always, yes if client requests it (or if the server provides the option, when submitting mail), or not at all. There is no way to set up policy so that e.g. gmail servers don’t get spoofed and tricked into sending the messages over a clear text connection. A second bad news is that I have not been able to get Postfix to validate the certificates either as server or client, likely caused by the use of opportunistic TLS rather than forcing TLS support. And the last problem is that servers connecting to submit mail will not fallback to cleartext if the TLS can’t be negotiated (either because of cipher or protocols), and will instead keep trying to negotiate TLS the same way.

Anyway, my current configuration for this is:

smtpd_tls_cert_file = /etc/ssl/postfix/server.crt
smtpd_tls_key_file = /etc/ssl/postfix/server.key
smtpd_tls_received_header = yes
smtpd_tls_loglevel = 1
smtpd_tls_security_level = may
smtpd_tls_ask_ccert = yes
smtpd_tls_mandatory_protocols = !SSLv2, !SSLv3

smtp_tls_cert_file = /etc/ssl/postfix/server.crt
smtp_tls_key_file = /etc/ssl/postfix/server.key
smtp_tls_security_level = may
smtp_tls_loglevel = 1
smtp_tls_protocols = !SSLv2, !SSLv3

If you have any suggestions on how to make this more reliable and secure, please do!

Why HTTPS anyway?

You probably noticed that in the past six months I had at least two bothersome incidents related to my use of HTTPS in the blog. An obvious question at this point would be why on earth would I care about making my blog (and my website) HTTPS only.

Well, first of all, the work I do for the blog usually matches fairly closely the work I do for xine’s Bugzilla so it’s not a real doubling of the effort, and actually allows me to test things out more safely than with some website that actually holds information that has some value. In the case of the Bugzilla, there are email addresses and password hashes (hopefully properly salted, I trust Bugzilla for that, although I would have preferred OAuth 2 to avoid storing those credentials), and possibly security bugs reported with exploit information that should not be sent out in the clear.

My blog has much less than that; the only user is me, and while I do want to keep my password private, there is nothing that stops me from using a self-signed certificate only for the admin interface. And indeed I had that setup for a long while. But then I got the proper certificate and made it optionally available on my blog. Unfortunately that made it terrible to deal with internal and external links to the blog, and the loading of resources; sure there were ways around it but it was still quite a pain.

The other reason for that is simply to cover for people who leave comments. Most people connecting through open networks, such as from Starbucks, will have their traffic easily sniffable as no WPA is in use (and I’ve actually seen “secure” networks using WEP, alas), and I could see how people preferred not posting their email in comments. And back last year I was pushing hard for Flattr (I don’t any more) and I was trying to remove reasons for not using your email when commenting, so HTTPS protection was an interesting point to make.

Nowadays I stopped pushing for Flattr, but I still include gravatar integration and I like having a way to contact the people who comment on my blog especially as they make points that I want to explore more properly, so I feel it’s in my duty to protect their comments as they flow by using HTTPS at the very least.

Another personal postmortem: Heartbleed

So I promised I would explain why it took me that long to get the Heartbleed problem sorted out. So here’s what was going on.

Early last month, the company I worked for in Los Angeles have been vacating their cabinet at the Los Angeles hosting facility where Excelsior was also hosted. While they have been offering to continue hosting the server, the fact that I could not remotely log into it with the KVM was making it quite difficult for me to update the kernel and to keep updating the LXC ebuild.

I decided to bite the bullet, and enquired for hosting at the same (new) facility of them, Hurricane Electric. While they have very cheap half-rack hosting, I needed to get a full cabinet to host Excelsior on. At $600/mo is not cheap, but I can (barely) afford it right now, the positive side of being an absolute inept socially, and possibly I’ll be able to share it with someone else pretty soon.

The bright side with using Hurricane is that I can rely on nigh-infinite public addresses (IPv6), which is handy for a server like Excelsior that runs basically a farm of virtual machines. The problem is that you don’t only need a server, you also need a switch and a VPN endpoint if you were to put a KVM in there. That’s what I did, I bought a Netgear switch, a Netgear VPN router, and installed the whole thing while I was in the Bay Area for a conference (Percona Live Conference if you’re curious).

Unfortunately, Heartbleed was announced in-between the server being shut down at the previous DC and it being fully operational in the new one — in particular it was still in Los Angeles, but turned down. How does that matter? Well, the vservers where this blog, the xine Bugzilla and other websites run are not powerful enough to build packages for Gentoo, so what I’ve been doing instead is building packages from Excelsior and uploading them to the vservers. This meant that to update OpenSSL, I needed Excelsior running.

Now to get Excelsior running, I spent a full weekend, having to go to the datacenter twice: I couldn’t get the router to work, and after some time my colleague who was kindly driving me there figured out that somehow the switch did not like for port 0 (or 1, depending how you count) on that switch to be set on a VLAN that is not the default, so connecting the router to any other port made it work as expected. I’m still not sure why that is the case.

After that, I was able to update OpenSSL — but the problem was getting a new set of SSL certificates for all the servers. You probably don’t remember my other postmortem, but when my blog’s certificate expired, I was also in the USA, and I had no access to my StartSSL credentials, as they were only on the other laptop. The good news was that I had the same laptop I used that time with me, and I was able to log in and generate new certificates. While at it, I replaced the per-host SNI ones with a wildcard one.

The problem was with the xine certificate: the Class 2 certificate was already issued with the previous user, which I had no access to still (because I never thought I would have needed it), which meant I could not request a revocation of the certificate. Not only StartSSL were able to revoke it for me anyway, but they also did so free of charge (again, kudos to them!).

What is the takeaway from all of this? Well, for sure I need a backup build host for urgent package rebuilds; I think I may rent a more powerful vserver for that somewhere. Also I need a better way to handle my StartSSL credentials: I had my Zenbook with me only because I planned to do the datacenter work. I think I’ll order one of their USB smartcard tokens and use that instead.

I also ordered another a USB SIM-sized card reader, to use with a new OpenPGP card, so expect me advertising a second GPG key (and if I remember this time, I’ll print some business card with the fingerprints). This should make it easier for me to access my servers even if I don’t have a trusted computer with me.

Finally, I need to set up PFS but to do that I need to update Apache to version 2.4 and last time that was a problem with mod_perl. With a bit of luck I can make sure it works now and I can update. There is also the Bugzilla on xine-project that needs to be updated to version 4, hopefully I can do that tonight or tomorrow.

Heartbleed and xine-project

This blog comes in way too late, I know, but there has been extenuating circumstances around my delay on clearing this through. First of all, yes, this blog and every other websites I maintain were vulnerable to Heartbleed. Yes they are now completely fixed: new OpenSSL first, new certificates after. For most of the certificates, though, there is no revocation issued, as they are issued through StartSSL which means that they are free to issue, and expensive to revoke. The exception to this has been the certificate used by xine’s bugzilla which was revoked, free of charge, by StartSSL (huge thanks to the StartSSL people!)

If you have an account on xine’s Bugzilla, please change your passwords NOW. If somebody knows a way to automatically reset all passwords that were not changed before a given date in Bugzilla, please let me know. Also, if somebody knows whether Bugzilla has decent support for (optional) 2FA, I’d also be interested.

More posts on the topic will follow, this is just an announcement.

SSL Postmortem redux

It is a funny coincidence that the week I’m at LISA ‘13 I’m doing so much work on my own servers. It might be because I’m not spending my time in front of a computer at work like I usually do. It might be because I got unlucky and my SSL certificates failed at the wrong time.

Again, Johann pointed me on Twitter to the SSL Labs page for my blog, that noted how only a bunch of OS/Software combination fails for no SNI — but that made me notice that the website said that TLSv1.1 and TLSv1.2 were not enabled, although I was ready to swear I configured it to enable all TLS. And a quick check in my Puppet master shows that my idea was right:

SSLProtocol TLSv1.2 TLSv1.1 TLSv1

So what is going on? Well, the Apache logs don’t tell you anything about what’s going on, so I decided to try empirically and move the order:

SSLProtocol TLSv1 TLSv1.1 TLSv1.2

This worked with TLS 1.2 but not with 1.0 — which is pretty bad as most browsers do not support 1.2, only the newest ones do. Okay so what’s going on? Well, Turns out that this, taken from the Apache documentation, works:

SSLProtocol All -SSLv2 -SSLv3

And that’s what I have in my configuration right now; this also means it works very very well if a new version of TLS becomes supported, it will added. So, listen to my advice and do that!

A side note: turns out that IE6 on XP not only does not support SNI, but also it does not support any TLS protocol version (just SSLv3) which means it hasn’t been able to reach my blog for about an year already.

So I decided to look at the Apache source code, and it turns out that their documentation does not make it clear: unless you add a + in front of the protocol version, the last entry is the catch-all. And there is no warning when that happens. There is no example for not using All in the docs for Apache 2.2 — it’s actually even worse with the documentation for Apache 2.4 as the example now only enables TLSv1 and that’s it.

I’ll try to send a patch for either their documentation or the code to issue a warning when that setting is misused. In the mean time, please keep this in mind.

P.S.: seems like Readability has a problem with SNI and is now failing to fetch my articles. I’ve already contacted them about this and hopefully they’ll figure out how to fix it soon.

SNI Quest: how’s the support?

After yesterday’s incident my blog and all the other apps I’ve been hosting have moved to use SNI certificates (a downgrade to Class 1 from Class 2, but that’s okay).

SNI is still considered a partially experimental feature nowadays because Windows XP is unfortunately still a thing. Luckily for me, it doesn’t seem like I have many Windows XP users — and the few that are there are probably okay with using either Chrome, Firefox or Opera, all of which use their own implementation of SSL (two using NSS), that supports SNI just fine.

Internet Explorer uses the operating system level libraries, which are not capable of using SNI at all, even if you updated to IE8. With a bit of luck, this will also mean fewer spammers using real WinXP-based browsers will be able to post. I don’t hold my breath, but it’s still possible. A few spammers were kicked off by the HTTPS move after all, so who knows.

What turned out to be interesting is the support for dropping SNI-backed links into various web apps out there — the kind of test I’ve done many times before while testing my ModSecurity Rules. The results have been interesting. All the major websites, and RSS readers, seem to handle this pretty well, with two main exceptions.

LinkedIn has probably the worst HTTP client implementation I’ve seen on a serious web app. I already opened a ticket with them before because their fetcher does not use compressed answers. This is pretty bad, considering that non-compressed answers mean a multiple times increase, and since this is traffic upstream from your server, it means that you are paying for LinkedIn’s laziness.

Due to this, LinkedIn links to my blog were already showing a (wrong) 403 message (the actual error they would get is 406, but then they process is wrongly, and I don’t care much about that). With the new SNI certificate, the LinkedIn fetcher now can only report the hostname of my blog, and no log in Apache can be found about it, which makes me guess that they try to validate the connection’s certificate, and fail.

NewsBlur is interesting as well. At first it seemed to me like it was not supporting SNI, as the settings page for my blog’s feed showed “401 Bad URL” error messages — without any matching log in Apache, which meant that the SSL connection was not completed either. On the other hand, the feed is fetched. While Samuel at first said that he did not care enough to implement SNI support for just one customer, and that made me look for alternatives for half an hour, he’s been very helpful with debugging a bit around it. Turns out that the problem is only for real-page fetching, and I haven’t spent much more time than this working on it. If somebody wants to look at it I’m happy to point you to what’s going on.

Luckily, Python’s httplib does not verify the certificates, which means Planet Gentoo still works. I’ve not checked Planet Multimedia yet — but at least that one if it fails I can fix.

What happened to my SSL certificates? A personal postmortem

I know that for most people this is not going to be very interesting, but my current job is teaching me that it’s always a good idea to help people learn from your own mistakes; especially so if you let others comment on said mistakes to see what you could have done better. So here it goes.

Let’s start to say that I’m an idiot. Last month I was clever enough to update the certificate for xine-project which was almost to expire. Unfortunately, I wasn’t so clever as to notice that the rest of my certificates were going to expire give or take at the same time. Nor I went remembering that my StartSSL verification was expiring, as last year I was in the US when that happened, and I had some trouble as my usual Italian phone number was unavailable. I actually got a notification that my certificate was expiring already when I was in London, last week. I promised myself to act on it as soon as I would get home to Dublin, but of course I ended up forgetting about it.

And then this morning came, when I got notified via Twitter that my blog’s certificate expired. And then the panic. I’m not in Dublin; I’m not in Ireland, I’m not in Europe even. I’m in Washington, DC at LISA ‘13, without either my Italian or US phone number, without my client certificate, which was restricted to my Dell laptop which is sitting in my living room in Dublin, and of course, no longer living in Italy!

Thankfully, the StartSSL support are great guys, and while they couldn’t verify me for a Class 2 as I was before right away, I got at least further enough to be able to get new Class 1 certificates, and start the process for Class 2 re-verification. Unfortunately, Class 1 means that I can’t have multiple hostnames for the cert, or even wildcard certificates. So I decided to bit the bullet and go with SNI certificates, which basically means that each vhost now has its own certificate. Which is fine, just a bit more convoluted to set up, as I had to create a number of Certificate Signature Request (CSR) as letting StartSSL generate the keys as 4096 bit SHA-256 RSA takes a very long time.

Unfortunately, SNI means that there are a few people who won’t be able to access my blog any more, although most of them were already disallowed from commenting thanks to my ModSecurity Ruleset as they would be Windows XP with Internet Explorer (any version, my ruleset would only stop IE6 from commenting). There probably are some issues for people stuck with Android 2 and the default browser. I’m sorry for you guys, I think Opera Mobile would work fine for it, but feel free to scream at me that being the case.

Unfortunately, there seems to be trouble with Firefox and with Safari at this point: both these browsers enabled OCSP by default quite a while ago, but newly minted certificates from StartSSL will fail the OCSP check for a few hours. Also there seems to be an issue with Firefox on Android, where SNI is not supported, or maybe it’s just the same OCSP problem which leads to a different error message, I’m not sure. Chrome, Safari on iOS and Opera all work fine.

What still needs to be found out is whether Planet Gentoo and NewsBlur will handle this properly. I’m not sure yet but I’m sure I’ll find out pretty soon. Some offline RSS readers could also not support SNI — that being the case, rather than just complaining to me, let upstream know that they are broken, I’m sure somebody is going to have a good fun with that.

Before somebody points out I should have alerts about certificate expiration, yes I know. I used to have these set up on the Icinga instance that was used by my previous employer, but ever since I haven’t set up anything new for that. I’m starting to do so as we speak, by building Icinga for my Puppetmaster host. I’m also going to write on my calendar to make sure to update the certificates before they expires, as for the OCSP problem noted above.

Questions and comments are definitely welcome, suggestions on how to make things better are too, and if you use Flattr remember to use your email address, as good suggestions will be rewarded!

Multiple SSL implementations

If you run ~arch you probably noticed the a few days ago that all Telepathy-based applications failed to connect to Google Talk. The reason for this was a change in GnuTLS 3.1.7 that made it stricter while checking security parameters’ compliance, and in particular required 816-bit primes on default connections, whereas the GTalk servers provided only 768. The change has since been reverted, and version 3.1.8 connects to GTalk by default once again.

Since I hit this the first time I tried to set up KTP on KDE 4.10, I was quite appalled by the incompatibility, and was tempted to stay with Pidgin… but at the end, I found a way around this. The telepathy-gabble package was not doing anything to choose the TLS/SSL backend but deep inside it (both telepathy-gabble and telepathy-salut bundle the wocky XMPP library — I haven’t checked whether it’s the same code, and thus if it has to be unbundled), it’s possible to select between GnuTLS (the default) or OpenSSL. I’ve changed it to OpenSSL and everything went fine. Now this is exposed as an USE flag.

But this made me wonder: does it matter on runtime which backend one’s using? To be obviously honest, one of the reasons why people use GnuTLS over OpenSSL is the licensing concerns, as OpenSSL’s license is, by itself, incompatible with GPL. But the moment when you can ignore the licensing issues, does it make any difference to choose one or the other? It’s a hard question to answer, especially the moment you consider that we’re talking about crypto code, which tends to be written in such a way to be optimized for execution as well as memory. Without going into details of which one is faster in execution, I would assume that OpenSSL’s code is faster simply due to its age and spread (GnuTLS is not too bad, I have some more reserves for libgcrypt but nevermind that now, since it’s no longer used). Furthermore, Nikos himself noted that sometimes better algorithm has been discarded, before, because of the FSF copyright assignment shenanigan which I commented on a couple of months ago.

But more importantly than this, my current line of thought is wondering whether it’s better to have everything GnuTLS or everything OpenSSL — and if it has any difference to have mixed libraries. The size of the libraries themselves is not too different:

        exec         data       rodata        relro          bss     overhead    allocated   filename
      964847        47608       554299       105360        15256       208805      1896175   /usr/lib/
      241354        25304        97696        12456          240        48248       425298   /usr/lib/
       93066         1368        57934         2680           24        14640       169712   /usr/lib/
      753626         8396       232037        30880         2088        64893      1091920   /usr/lib/

OpenSSL’s two libraries are around 2.21MB of allocated memory, whereas GnuTLS is 1.21 (more like 1.63 when adding GMP, which OpenSSL can optionally use). So in general, GnuTLS uses less memory, and it also has much higher shared-to-private ratios than OpenSSL, which is a good thing, as it means it creates smaller private memory areas. But what about loading both of them together?

On my laptop, after changing the two telepathy backends to use OpenSSL, there is no running process using GnuTLS at all. There are still libraries and programs making use of GnuTLS (and most of them without an OpenSSL backend, at least as far as the ebuild go) — even a dependency of the two backends (libsoup), which still does not load GnuTLS at all here.

While this does mean that I can get rid of GnuTLS on my system (NetworkManager, GStreamer, VLC, and even the gabble backend use it), it means that I don’t have to keep loaded into memory that library at all time. While 1.2MB is not that big a save, it’s still a drop in the ocean of the memory usage.

So, while sometimes what I call “software biodiversity” is a good thing, other times it only means we end up with a bunch of libraries all doing the same thing, all loaded at the same time for different software… oh well, it’s not like it’s the first time this happens…

The pain of installing RT in Gentoo

Since some of my customers tend to forget what they asked me to do and then they complain that I’m going overbudget or overtime, I’ve finally decided to bite the bullet and set up some kind of tracker. Of course given that almost all of said customers are not technical at all, using Bugzilla was completely out of question. The choice fell back to RT that has a nice email-based interface for them to use.

Setting this up seemed a simple process after all: you just have to emerge the package, deal with the obnoxious webapp-config (you can tell I don’t like it at all!), and set up database and mod_perl for Apache. It turned out not to be so easy. The first problem is the one I already wrote about at least in passing: Apache went segfaulting on me when loading mod_perl on this server, and I didn’t care enough to actually go around and debug why that is.

But fear not, since I’ve already rented a second server, as I said, I decided to try deploying RT there, it shouldn’t be trouble, no? Hah, I wish.

First problem is that Apache refused to start because the script couldn’t be launched. Which was at the very least bothersome, since it also refused to actually show me some error message beside repeating to me that it couldn’t load it. I decided against trying mod_perl once again and moved to a more “common” configuration of lighttpd and reverse proxying, using FastCGI.

And here trouble starts getting even nastier. To begin with, FastCGI requires you to start rt with its own init script; the one provided by the current 3.8.10 ebuild is pretty much outdated and won’t work, possibly at all. I rewrote it (and the rewrite I’ll see to push into portage soon), and got it to at least try starting up. But even then it won’t start up. Why is that?

It has to do with the way I decided to fix up the database: since the new server will run at some point a series of WordPress instances (don’t ask!), it’ll have to run MySQL, but there will be other Web Apps that should use PostgreSQL, and as long as performance is not that big an issue, I wanted to keep one database software per server; this meant connecting to PostgreSQL running on Earhart (which is on the same network anyway), and to do so, beside limiting access through iptables, I set it to use SSL. That was a mistake.

Even though you may set authentication as trust in the pg_hba.conf configuration file, the client-side PostgreSQL library tries to see if there are authentication tokens to use, which in case of SSL can be of two kinds: passwords and certificates. The former is the usual clear-text password, the latter as the name implies is a SSL user certificate that can be used to validate the secure connection from one point to the other. I had no interest to use user certificates at that point, so I didn’t care much about procuring or producing any.

So when I start the rt service (without using --background, that is… I’ll solve that before committing the new init script), I get this:

 * Starting RT ...
DBI connect('dbname=rt3;;requiressl=1','rt',...) failed: could not open certificate file "/dev/null/.postgresql/postgresql.crt": Not a directory at /usr/lib64/perl5/vendor_perl/5.12.4/DBIx/SearchBuilder/ line 106
Connect Failed could not open certificate file "/dev/null/.postgresql/postgresql.crt": Not a directory
 at //var/www/ line 206
Compilation failed in require at /var/www/ line 54.
 * start-stop-daemon: failed to start `/var/www/'                                                                       [ !! ]
 * ERROR: rt failed to start

Obviously /dev/null is the home of the rt user, which is what I’m trying to run this as, and of course it is not a directory by itself, trying to handle it as a directory will make the calls fail exactly as expected. And if you see this, your first thought is likely to be “PostgreSQL does not support connecting via SSL without an user certificate, what a trouble!”.. and you’d be wrong.

Indeed, if you look at a strace of psql run as root (again, don’t ask), you’ll see this:

stat("/root/.pgpass", 0x74cde2a44210)   = -1 ENOENT (No such file or directory)
stat("/root/.postgresql/postgresql.crt", 0x74cde2a41bb0) = -1 ENOENT (No such file or directory)
stat("/root/.postgresql/root.crt", 0x74cde2a41bb0) = -1 ENOENT (No such file or directory)

So it tries to find the certificate, doesn’t find it, and proceed to find a different one, if even that doesn’t exist, it gives up. But that’s not the case for the above case. The reason is probably a silly one: the library looks up errno to be ENOENT before ignoring the error, while the rest of them is likely considered fatal.

So how do you deal with a similar issue? The obvious answer would be to make the home directory to point to the RT installation directory, so that it’s also writeable by the user; in most cases this only requires you to set the $HOME variable, but that’s not the case for PostgreSQL, that instead decides to be smarter than that, and looks up the home directory of the user from the passwd file…

So why not changing the user’s home directory to the given directory then? One reason is that you could have multiple RT instances in the same system, mostly thanks to webapp-config, and another is that even with a single RT instance, the path to the installed code has the package’s version in it, so you would have to change the user’s home each time, which is not something you should be looking forward to.

How to solve this whole? Well, there is one “solution” that is what I’m going to do: set up RT on the same system as PostgreSQL, either with lighttpd or by using FastCGI directly within Apache, I have yet to decide; then there is the actual solution to solve this: get the PostgreSQL client library to respect $HOME and at the same time make it not throw a fit if the home directory is not really a directory. I just don’t think I have time to dedicate to the real fix for now.

Implementing the future! (or, how feng is designing secure RTSP)

While LScube is no longer – for me – a paid job project (since I haven’t been officially paid to work on it since May or so), it’s a project that intrigues me well enough to work on it on free time, rather than, say, watch a movie. The reason is probably to find in situations like those in the past few days, when Luca brings up a feature to implement that either no one else has implemented before, or simply needs to be re-implemented from scratch.

Last time I had to come up with a solution to implement something this intricate was when we decide to implement Apple’s own HTTP tunnelling feature whose only other documentation I know of, beside the original content from Apple is the one I wrote for LScube itself.

This time, the challenge Luca shown me is something quite more stimulating because it means breaking new ground that up to now seems to have been almost entirely missing: RTSP recording and secure RTSP.

The recording feature for RTSP has been somewhat specified by the original RFC but it was left quite up in the air as to what it should have been used for. Darwin Streaming Sever has an implementation, but beside the fact that we try not to look at it too closely (to avoid pollination), we don’t agree on the minimum amount of security that it provides as far as we can tell. But let’s start from the top.

The way we need RECORD to work, is to pass feng (the RTSP server) a live stream, which feng will then multiplex to the connected users. Up to now, this task was care of a special protocol between flux (the adaptor) and feng, which also went to a few changes over time: originally it used UDP over localhost, but we had problems with packet loss (no I can’t even begin to wonder why) and in performance; nowadays it uses POSIX message queues, which are supposedly portable and quite fast.

This approach has, objectively, a number of drawbacks. First of all the MQ stuff is definitely not a good idea to use; it’s barely implemented in Linux and FreeBSD, it’s not implemented on Darwin (and for a project whose aim is to replace Darwin Streaming Server this is quite a bad thing) and even though it does work on Linux, its performances aren’t exactly exciting. Also, using POSIX MQs is quite tricky: you cannot simply write an eventloop-based interface with libev because, even though they are descriptors in their implementation, an MQ descriptor does not support the standard poll() interface (nor any of the more advanced ones) so you end up with a thread blocking in reading… and timing out to make sure that flux wasn’t restarted in the mean time, since in that case, a new pair of MQs would have been created.

This situation was also not becoming any friendlier by the fact that, even though the whole team is composed of fewer than five developers, flux and feng are written in two different languages, so the two sides of the protocol implemented over MQs are also entirely separated. And months after I dropped netembryo from feng it is still used within flux, which is the one piece of the whole project that needed it the least, as it uses none of SCTP, TCP and SSL abstractions, and the interfaces of the two couldn’t be more far apart (flux is written in C++ if you didn’t guess, while netembryo is also written in C).

So it’s definitely time to get rid of the middleware and let the producer/adapter (ffmpeg) talk directly with the server/multiplexer (feng). This is where RECORD is necessary: ffmpeg will open a connection to feng via the same RTSP protocol used by the clients, but rather than ask to DESCRIBE a resource, it will ANNOUNCE one; then it will SETUP the server so that it would open RTP/RTCP ports, and start feeding data to the server. Since this means creating new resources and making it available to third parties (the clients) – and eventually creating files on the server’s disk, if we proceed to the full extent of the plan – it is not something that you want just any client to be able to do: we want to make sure that it is authorised to do so — we definitely don’t want to pull a Diaspora anytime soon.

Sequence Diagram of feng auth/record support

And in a situation like this, paying for a decent UML modeler, shows itself as a pretty good choice. A number of situations became apparent to me only when I went to actually think of the interaction to draw the pretty diagrams that we should show to our funders.

Right now, though, we don’t even support authentication, so I went to look for it; RTSP was designed first to be an HTTP-compatible protocol, so the authentication is not defined in RFC2326, but rather to be defined by HTTP itself; this means that you’ll find it in RFC 2616 instead — yes it is counterintuitive, as the RFC number is higher; the reason is simple though: RFC2616 obsoletes the previous HTTP description in 2068 which is the one that RTSP refers to; we have no reason to refer to the old one knowing that a new one is out, so…

You might know already that HTTP defines at least two common authentication schemes: Basic (totally insecure) and Digest (probably just as insecure, but obfuscated enough). The former simply provides username and password encoded in base64 — do note: encoded, not encrypted; it is still passed in what can be called clear text, and is thus a totally bogus choice to take. On the other hand, Digest support provides enough features and variables to require a very complex analysis of the documentation before I can actually come up with a decent implementation. Since we wanted something more akin to a Proof-of-Concept, but also something safe enough to use in production, we had to find a middle ground.

At this point I had two possible roads in front of me. On one side, implementing SASL looked like a decent solution to have a safe authentication system, while keeping the feng-proper complexity at bay; unfortunately, I still haven’t found any complete documentation on how SASL is supposed to be implemented as part of HTTP; the Wikipedia page I’m linking to also doesn’t list HTTP among the SASL-supported protocols — but I know it is supposed to work, both because Apache has a mod_authn_sasl and because if I recall correctly, Microsoft provides ActiveDirectory-authenticated proxies, and again if my memory serves me correctly, ActiveDirectory uses SASL. The other road looked more complex at first but it promised much with (relatively) little effort: implementing encrypted, secure RTSP, and require using it to use the RECORD functionality.

While it might sound a bit of overkill to require an encrypted secure connection to send data that has to be further streamed down to users, keep in mind that we’re talking only of encrypting the control channel; in most cases you’d be using RTP/RTCP over UDP to send the actual data, and the control stream can even be closed after the RECORD request is sent. We’re not yet talking about implementing secure RTP (which I think was already drafted up or even implement), but of course if you were to use a secure RTSP session and interleaved binary data, you’d be getting encrypted content as well.

At any rate, the problem with this became how to implement the secure RTSP connection; one option would have been using the oldest trick in the book and use two different ports, one for clear-text RTSP and another for the secure version, as is done for HTTP and HTTPS. But I also remembered that this approach has been frown upon and deprecated for quite a bit. Protocols such as SMTP and IMAP now use TLS through a STARTTLS command that replaces a temporarily clear-text connection with an encrypted TLS on; I remember reading about a similar approach for HTTP and indeed, RFC 2817 describes just that. Since RTSP and HTTP are so similar, there should be no difficulty into implementing the same idea over RTSP.

Sequence Diagram of feng TLS/auth/RECORD support

Now in this situation, implementing secure ANNOUNCE support is definitely nicer; you just refuse to accept methods (in write-mode) for the resource unless you’re over a secure TLS connection; then you also ensure that the user authenticated before allowing access. Interestingly, here the Basic authentication, while still bad, is acceptable for an usable proof of concept, and in the future it can easily be extended to authenticate based on client-side certificates without requiring 401 responses.

Once again, I have to say that having translated Software Engineering wasn’t so bad a thing; even though I usually consider UML being just a decent way to produce pretty diagrams to show the people with the gold purse, doing some modelling of the interaction between the producer and feng actually cleared up my mind enough to avoid a number of pitfalls, like doing authentication contextually to the upgrade to TLS, or forgetting about the need to provide two responses when doing the upgrade.

At any rate, for now I’ll leave you with the other two diagrams I have drawn but wouldn’t fit the post by themselves, they’ll end up on the official documentation — and if you wonder what “nq8” stands for, it’s the default extension generated by pngnq: Visual Paradigm produces 8-bit RGB files when exporting to PNG (and the SVG export produces files broken with RSVG, I have to report that to the developers), but I can cut in half the size of the files by quantising them to a palette.

I’m considering the idea of RE part of the design of feng into UML models, and see if that gives me more insight on how to optimise it further, if that works I should probably consider spending more time over UML in the future… too bad that reading O’Reilly PDF books on the reader is so bad, otherwise I would be reading through all of “Learning UML 2.0” rather than skimming through it when I need.