SSL Postmortem redux

It is a funny coincidence that the week I’m at LISA ‘13 I’m doing so much work on my own servers. It might be because I’m not spending my time in front of a computer at work like I usually do. It might be because I got unlucky and my SSL certificates failed at the wrong time.

Again, Johann pointed me on Twitter to the SSL Labs page for my blog, that noted how only a bunch of OS/Software combination fails for no SNI — but that made me notice that the website said that TLSv1.1 and TLSv1.2 were not enabled, although I was ready to swear I configured it to enable all TLS. And a quick check in my Puppet master shows that my idea was right:

SSLProtocol TLSv1.2 TLSv1.1 TLSv1

So what is going on? Well, the Apache logs don’t tell you anything about what’s going on, so I decided to try empirically and move the order:

SSLProtocol TLSv1 TLSv1.1 TLSv1.2

This worked with TLS 1.2 but not with 1.0 — which is pretty bad as most browsers do not support 1.2, only the newest ones do. Okay so what’s going on? Well, Turns out that this, taken from the Apache documentation, works:

SSLProtocol All -SSLv2 -SSLv3

And that’s what I have in my configuration right now; this also means it works very very well if a new version of TLS becomes supported, it will added. So, listen to my advice and do that!

A side note: turns out that IE6 on XP not only does not support SNI, but also it does not support any TLS protocol version (just SSLv3) which means it hasn’t been able to reach my blog for about an year already.

So I decided to look at the Apache source code, and it turns out that their documentation does not make it clear: unless you add a + in front of the protocol version, the last entry is the catch-all. And there is no warning when that happens. There is no example for not using All in the docs for Apache 2.2 — it’s actually even worse with the documentation for Apache 2.4 as the example now only enables TLSv1 and that’s it.

I’ll try to send a patch for either their documentation or the code to issue a warning when that setting is misused. In the mean time, please keep this in mind.

P.S.: seems like Readability has a problem with SNI and is now failing to fetch my articles. I’ve already contacted them about this and hopefully they’ll figure out how to fix it soon.

Multiple SSL implementations

If you run ~arch you probably noticed the a few days ago that all Telepathy-based applications failed to connect to Google Talk. The reason for this was a change in GnuTLS 3.1.7 that made it stricter while checking security parameters’ compliance, and in particular required 816-bit primes on default connections, whereas the GTalk servers provided only 768. The change has since been reverted, and version 3.1.8 connects to GTalk by default once again.

Since I hit this the first time I tried to set up KTP on KDE 4.10, I was quite appalled by the incompatibility, and was tempted to stay with Pidgin… but at the end, I found a way around this. The telepathy-gabble package was not doing anything to choose the TLS/SSL backend but deep inside it (both telepathy-gabble and telepathy-salut bundle the wocky XMPP library — I haven’t checked whether it’s the same code, and thus if it has to be unbundled), it’s possible to select between GnuTLS (the default) or OpenSSL. I’ve changed it to OpenSSL and everything went fine. Now this is exposed as an USE flag.

But this made me wonder: does it matter on runtime which backend one’s using? To be obviously honest, one of the reasons why people use GnuTLS over OpenSSL is the licensing concerns, as OpenSSL’s license is, by itself, incompatible with GPL. But the moment when you can ignore the licensing issues, does it make any difference to choose one or the other? It’s a hard question to answer, especially the moment you consider that we’re talking about crypto code, which tends to be written in such a way to be optimized for execution as well as memory. Without going into details of which one is faster in execution, I would assume that OpenSSL’s code is faster simply due to its age and spread (GnuTLS is not too bad, I have some more reserves for libgcrypt but nevermind that now, since it’s no longer used). Furthermore, Nikos himself noted that sometimes better algorithm has been discarded, before, because of the FSF copyright assignment shenanigan which I commented on a couple of months ago.

But more importantly than this, my current line of thought is wondering whether it’s better to have everything GnuTLS or everything OpenSSL — and if it has any difference to have mixed libraries. The size of the libraries themselves is not too different:

        exec         data       rodata        relro          bss     overhead    allocated   filename
      964847        47608       554299       105360        15256       208805      1896175   /usr/lib/libcrypto.so
      241354        25304        97696        12456          240        48248       425298   /usr/lib/libssl.so
       93066         1368        57934         2680           24        14640       169712   /usr/lib/libnettle.so
      753626         8396       232037        30880         2088        64893      1091920   /usr/lib/libgnutls.so

OpenSSL’s two libraries are around 2.21MB of allocated memory, whereas GnuTLS is 1.21 (more like 1.63 when adding GMP, which OpenSSL can optionally use). So in general, GnuTLS uses less memory, and it also has much higher shared-to-private ratios than OpenSSL, which is a good thing, as it means it creates smaller private memory areas. But what about loading both of them together?

On my laptop, after changing the two telepathy backends to use OpenSSL, there is no running process using GnuTLS at all. There are still libraries and programs making use of GnuTLS (and most of them without an OpenSSL backend, at least as far as the ebuild go) — even a dependency of the two backends (libsoup), which still does not load GnuTLS at all here.

While this does mean that I can get rid of GnuTLS on my system (NetworkManager, GStreamer, VLC, and even the gabble backend use it), it means that I don’t have to keep loaded into memory that library at all time. While 1.2MB is not that big a save, it’s still a drop in the ocean of the memory usage.

So, while sometimes what I call “software biodiversity” is a good thing, other times it only means we end up with a bunch of libraries all doing the same thing, all loaded at the same time for different software… oh well, it’s not like it’s the first time this happens…

Implementing the future! (or, how feng is designing secure RTSP)

While LScube is no longer – for me – a paid job project (since I haven’t been officially paid to work on it since May or so), it’s a project that intrigues me well enough to work on it on free time, rather than, say, watch a movie. The reason is probably to find in situations like those in the past few days, when Luca brings up a feature to implement that either no one else has implemented before, or simply needs to be re-implemented from scratch.

Last time I had to come up with a solution to implement something this intricate was when we decide to implement Apple’s own HTTP tunnelling feature whose only other documentation I know of, beside the original content from Apple is the one I wrote for LScube itself.

This time, the challenge Luca shown me is something quite more stimulating because it means breaking new ground that up to now seems to have been almost entirely missing: RTSP recording and secure RTSP.

The recording feature for RTSP has been somewhat specified by the original RFC but it was left quite up in the air as to what it should have been used for. Darwin Streaming Sever has an implementation, but beside the fact that we try not to look at it too closely (to avoid pollination), we don’t agree on the minimum amount of security that it provides as far as we can tell. But let’s start from the top.

The way we need RECORD to work, is to pass feng (the RTSP server) a live stream, which feng will then multiplex to the connected users. Up to now, this task was care of a special protocol between flux (the adaptor) and feng, which also went to a few changes over time: originally it used UDP over localhost, but we had problems with packet loss (no I can’t even begin to wonder why) and in performance; nowadays it uses POSIX message queues, which are supposedly portable and quite fast.

This approach has, objectively, a number of drawbacks. First of all the MQ stuff is definitely not a good idea to use; it’s barely implemented in Linux and FreeBSD, it’s not implemented on Darwin (and for a project whose aim is to replace Darwin Streaming Server this is quite a bad thing) and even though it does work on Linux, its performances aren’t exactly exciting. Also, using POSIX MQs is quite tricky: you cannot simply write an eventloop-based interface with libev because, even though they are descriptors in their implementation, an MQ descriptor does not support the standard poll() interface (nor any of the more advanced ones) so you end up with a thread blocking in reading… and timing out to make sure that flux wasn’t restarted in the mean time, since in that case, a new pair of MQs would have been created.

This situation was also not becoming any friendlier by the fact that, even though the whole team is composed of fewer than five developers, flux and feng are written in two different languages, so the two sides of the protocol implemented over MQs are also entirely separated. And months after I dropped netembryo from feng it is still used within flux, which is the one piece of the whole project that needed it the least, as it uses none of SCTP, TCP and SSL abstractions, and the interfaces of the two couldn’t be more far apart (flux is written in C++ if you didn’t guess, while netembryo is also written in C).

So it’s definitely time to get rid of the middleware and let the producer/adapter (ffmpeg) talk directly with the server/multiplexer (feng). This is where RECORD is necessary: ffmpeg will open a connection to feng via the same RTSP protocol used by the clients, but rather than ask to DESCRIBE a resource, it will ANNOUNCE one; then it will SETUP the server so that it would open RTP/RTCP ports, and start feeding data to the server. Since this means creating new resources and making it available to third parties (the clients) – and eventually creating files on the server’s disk, if we proceed to the full extent of the plan – it is not something that you want just any client to be able to do: we want to make sure that it is authorised to do so — we definitely don’t want to pull a Diaspora anytime soon.

Sequence Diagram of feng auth/record support

And in a situation like this, paying for a decent UML modeler, shows itself as a pretty good choice. A number of situations became apparent to me only when I went to actually think of the interaction to draw the pretty diagrams that we should show to our funders.

Right now, though, we don’t even support authentication, so I went to look for it; RTSP was designed first to be an HTTP-compatible protocol, so the authentication is not defined in RFC2326, but rather to be defined by HTTP itself; this means that you’ll find it in RFC 2616 instead — yes it is counterintuitive, as the RFC number is higher; the reason is simple though: RFC2616 obsoletes the previous HTTP description in 2068 which is the one that RTSP refers to; we have no reason to refer to the old one knowing that a new one is out, so…

You might know already that HTTP defines at least two common authentication schemes: Basic (totally insecure) and Digest (probably just as insecure, but obfuscated enough). The former simply provides username and password encoded in base64 — do note: encoded, not encrypted; it is still passed in what can be called clear text, and is thus a totally bogus choice to take. On the other hand, Digest support provides enough features and variables to require a very complex analysis of the documentation before I can actually come up with a decent implementation. Since we wanted something more akin to a Proof-of-Concept, but also something safe enough to use in production, we had to find a middle ground.

At this point I had two possible roads in front of me. On one side, implementing SASL looked like a decent solution to have a safe authentication system, while keeping the feng-proper complexity at bay; unfortunately, I still haven’t found any complete documentation on how SASL is supposed to be implemented as part of HTTP; the Wikipedia page I’m linking to also doesn’t list HTTP among the SASL-supported protocols — but I know it is supposed to work, both because Apache has a mod_authn_sasl and because if I recall correctly, Microsoft provides ActiveDirectory-authenticated proxies, and again if my memory serves me correctly, ActiveDirectory uses SASL. The other road looked more complex at first but it promised much with (relatively) little effort: implementing encrypted, secure RTSP, and require using it to use the RECORD functionality.

While it might sound a bit of overkill to require an encrypted secure connection to send data that has to be further streamed down to users, keep in mind that we’re talking only of encrypting the control channel; in most cases you’d be using RTP/RTCP over UDP to send the actual data, and the control stream can even be closed after the RECORD request is sent. We’re not yet talking about implementing secure RTP (which I think was already drafted up or even implement), but of course if you were to use a secure RTSP session and interleaved binary data, you’d be getting encrypted content as well.

At any rate, the problem with this became how to implement the secure RTSP connection; one option would have been using the oldest trick in the book and use two different ports, one for clear-text RTSP and another for the secure version, as is done for HTTP and HTTPS. But I also remembered that this approach has been frown upon and deprecated for quite a bit. Protocols such as SMTP and IMAP now use TLS through a STARTTLS command that replaces a temporarily clear-text connection with an encrypted TLS on; I remember reading about a similar approach for HTTP and indeed, RFC 2817 describes just that. Since RTSP and HTTP are so similar, there should be no difficulty into implementing the same idea over RTSP.

Sequence Diagram of feng TLS/auth/RECORD support

Now in this situation, implementing secure ANNOUNCE support is definitely nicer; you just refuse to accept methods (in write-mode) for the resource unless you’re over a secure TLS connection; then you also ensure that the user authenticated before allowing access. Interestingly, here the Basic authentication, while still bad, is acceptable for an usable proof of concept, and in the future it can easily be extended to authenticate based on client-side certificates without requiring 401 responses.

Once again, I have to say that having translated Software Engineering wasn’t so bad a thing; even though I usually consider UML being just a decent way to produce pretty diagrams to show the people with the gold purse, doing some modelling of the interaction between the producer and feng actually cleared up my mind enough to avoid a number of pitfalls, like doing authentication contextually to the upgrade to TLS, or forgetting about the need to provide two responses when doing the upgrade.

At any rate, for now I’ll leave you with the other two diagrams I have drawn but wouldn’t fit the post by themselves, they’ll end up on the official documentation — and if you wonder what “nq8” stands for, it’s the default extension generated by pngnq: Visual Paradigm produces 8-bit RGB files when exporting to PNG (and the SVG export produces files broken with RSVG, I have to report that to the developers), but I can cut in half the size of the files by quantising them to a palette.

I’m considering the idea of RE part of the design of feng into UML models, and see if that gives me more insight on how to optimise it further, if that works I should probably consider spending more time over UML in the future… too bad that reading O’Reilly PDF books on the reader is so bad, otherwise I would be reading through all of “Learning UML 2.0” rather than skimming through it when I need.

Ranting on: libvirt remote

*This is a rant. This is not meant to be constructive, so it is not. If Rich is reading this, he might find some points he can work on; if I had the time, I would be working on them myself; if you feel like you agree with my rant and would like to get the stuff implemented, you can hire me to work on this. But for the rest, this is a rant, so if you’re not in the mood to read my rants, you might want to skip over this.*

The laptop from hell is finally shaping up; the smartcard reader works, after editing the ccid files (yes I have to publish the patches up there). Thanks to this I finally wanted to take one further step to make use of it to augment my productivity. One of these things, which I couldn’t feasibly do with OS X, is handling my virtual machines park via virt-manager.

Now, libvirt is designed to be a client-server system, and the server might not be local. There are three transport options to use remote servers: clear-text, TLS, and SSH tunnelling. Now, the clear-text is an obvious bad choice; SSH would be a good choice, if it wasn’t that.. it only works with the root user. There is no way to configure which user to use for the ssh tunnel, and I really don’t want to enable SSH root logins.

TLS is an interesting choice. With this transport, the authentication on the server side is done similarly to what OpenVPN does: you have a personal certification authority (CA), one key/certificate pair for the server, and then one pair per client. All the certificates are signed by the CA itself, and that validates the client to connect to the server. It’s a very nice approach for hosting providers I guess, since you can have a number of workstations that have the certificates set up properly to connect to the farm of virtualisation servers. Unfortunately it has design flawswhen you want to use it with something like a laptop.

First of all, while the server’s certificates and key files’ paths are configurable in the libvirtd.conf file, the client’s files are not configurable, they are hardcoded at build-time based on the system configuration directory (for Gentoo, that’s /etc). They are also only used host-global, as it does not even check for an override in the user’s directory. And this is a double-problem because the certificate has to be passwordless! Or, to put it in a different way, it has to be insecure, lacking a password protection. And since I just said that it does not allow for per-user overrides, you cannot even rely purely on encrypting your home directory. Alternative option is to symlink them from the /etc paths to your home directory but that’s not elegant at all.

It gets even a little worse: to access the display of the virtual machines libvirt tries to do a smart thing, by using the VNC protocol rather than reinventing the wheel. Now, a lot of comments can be written regarding the choice of using the VNC protocol itself, but the fact that it doesn’t reinvent a new one is positive. Unfortunately, when accessing a remote server via TLS, instead of muxing the VNC protocol over the same connection, it simply tries to connect to the VNC display on the remote host. And of course it’s a separate configuration to tell qemu to open the VNC on the correct host. D’oh!

Okay so I get to configure qemu to open the VNC on all interfaces, all IPs… but here’s the catch: for TLS to work correctly, you need to provide correct hostnames, stable hostnames; to do so I decided to use IPv6, since my boxes’ IPv6 are already forward-confirmed, thanks to the latest Hurricane Electric service (providing name server hosting without asking me to maintain my own). Unfortunately, it seems just like qemu does not support IPv6 at all, which means that .. the connection will not work because the hostname will find no hit.

It really doesn’t look too difficult to implement at least part of these features, like choosing the certificate files path, or connecting as a different SSH user. Sure, if you were to shoot high, you could probably consider using a single SCTP socket with multiple channels to multiplex the libvirt protocol together with the VNC connections, but that’s not needed at all. It really just need a few touches here and there to make it much more usable.

Ruby-elf and documentation

After my checklist post I got asked for some documentation about ruby-elf tools like cowstats and missingstatic.

As it turns out I wrote little to no documentation at all, and I relied exclusively on the scripts being self-documenting, for the most part. Probably not a good idea if I want to have a broader audience.

For this reason, I think I’ll start by writing some man pages for the tools, hopefully today or tomorrow, before I get to the hospital again. I’ll see also to actually release a version of this so I can add it to portage too, so that it’s actually available for developers who are interested (for now you can get it from my overlay as dev-ruby/ruby-elf-9999.

I also started working on improving the way cowstats decides what whether a symbol is in a copy on write section or not. Before I only used the name of the section and, as it turns out, I used to ignore the TLS sections (no, not SSL successor but Thread-local storage).

The TLS problem is solved now but I decided using the name of the section to decide whether it’s CoW or not is not very feasible. I added code that checks the type and the flags of the sections, to an extent, so that it ignores automatically all the sections containing executable code, and all the read-only sections. It also considers .bss and equivalent sections just by type rather than by name (if I did this in the first place I would have supported .tbss in the first place too).

On a different note, I forgot to write that while I was hospitalised, my Nokia decided to go crazy and corrupted the fring app I was using to chat from the E61 itself. I think (and from one side hope) that the MiniSD I was using was broken, because then the rest of the phone would be fine. The problem is that the internal memory is very tiny and the MiniSD that Nokia gave me with the phone, which I just put back in it, is half full of Nokia’s own software, like the MailForExchange launcher (which I don’t care of, or TravelMate). I think I’ll have to pick up a new MiniSD hoping that will work. Last time I bought a Corsair 1GB, this time I think I’ll stop with a Trascend one as they never failed me up to now. Interestingly enough, at my supplier, the MiniSD card would be pretty cheap (€5) while the shipping costs would be over that price. I should check if they have cheap SD cards too, in the stores around here they are tremendously expensive still (€10 for a 2GB card!).

Some notes about multi-threading

Ah, multithreading, such a wonderful concept, as it allows you to create programs in such a way that you don’t have to implement an event loop (for the most part). Anybody who ever programmed in the days of DOS has at least once implemented an event loop to write an interactive program. During my high school days I wrote a quite complex one for a game I designed for a class. Too bad I lost it :(

Multithreading is also an interesting concept nowadays that all the major CPUs are multi-core; for servers they were already for some time, but we know that mainstream is always behind on these things, right?

So, now that multithreading is even more intersting, it’s important to design programs, but even more importantly, libraries to be properly multithreaded.

I admit I’m not such a big expert of multithreading problems, I admit that, but I do know a few things that come useful when developing. One of these is that static variables are evil.

The reason why static variables are evil is because they are implicitly shared between different threads. For constants this is good, as they use less memory, but for variables this is bad, because you might overwrite the data another thread is using.

For this reason, one of the easiest thing to spot in a library to tell if it’s multithread-safe or not is to check if it relies on static variables. If it does, it’s likely not thread safe, and almost surely not thread optimised.

You could actually be quite thread safe even when using static variables, the easy way to do that is to have a mutex protecting every and all accesses to the variable, this way only one thread at a time can access it, and noone can overwrite someone else’s data.

That cause a problem though, as this serialises the execution of a given function. Let’s take for instance a function that requests data through the net with an arbitrary protocol (we don’t care which protocol it is), saves it on a static buffer, and then parse it filling a structure with the data received and parsed. If such a function is used in a multithreaded program, it has to be protected by a mutex, as it uses a static buffer. If four threads require access to that function almost simultaneously (and that might happen, especially on multi-core systems!), then the first one arriving will get the mutex, the other three will wait till the first one completed processing. Although in general, on a multicore system you’d then have other processes scheduled to be executed at that point, you’re going to waste time by waiting for a thread to complete its operation, before the next one can be resumed.

This is extremely annoying, especially now that the future of CPUs seems to be an increase in number of cores, rather than in their speed (as we’re walking around a physical limit of 3GHz as far as I can see). The correct way to handle such a situation is not to use a static buffer, but rather use a heap-allocated buffer, even if that is slightly slower for a single thread (as you have to allocate the memory and free it afterward); this way the four threads are independent and can be executed simultaneously. For this reason, libraries should try to never use static buffers, as they might not know if the software using them is multi-threaded or not.

When a library is blatantly not thread-safe, there is even a bigger problem, which can be solved in two ways: the first is to limit access to that library to a single thread. This way there are no problems with threading, but then all the requests that need to be sent to that library has to be passed to the thread, and the thread has to answer to them; while cheaper than IPC, ITC is still more expensive than using a properly thread-safe library.

The other option is to protect every use of the library with a mutex. This makes a library thread-safe if it’s at least re-entrant (that is, no function depends on the status of global variables set by other functions), but acts in the same way as the “big kernel lock” does: it does not allow you to run the same function from two threads at once, or even any function of that library from two threads at once – if the functions use shared global variables.

How should libraries behave, then, when they need to keep track of the state? Well there easiest way is obviously to have a “context” parameter, pointer to a structure that keeps all the needed state data, allowing two threads to use different contexts, and call the library simultaneously.

Sometimes, though, you just need to keep something similar to an errno variable, that is global and set by all your functions. There’s no way to handle that case gracefully through mutexes, but there’s an easy way to do that through Thread-Local Storage. If you mark the variable as thread-local, then every thread will see just one copy of that variable, and doesn’t need an explicit mutex to handle that (the implementation might use a mutex, I don’t really know the underlying details).

This is also quite useful for multi-threaded programs that would like to use global variables rather than having to pass a thread structure to all the functions. Take this code for instance:

/* Instantiated a few times simultaneously */
void *mythread(void *address) {
  mythread_state_t *state = malloc(sizeof(mythread_state_t));

  set_address(state, address);
  do_connection(state);
  check_data(state);

  do_more(state);
}

While for library API calls having a context parameter is an absolutely good idea, if the code has no reason to be reentrant, passing it as parameter might be a performance hit. At the same time, while using global variables in libraries is a very bad idea, for programs it’s not always that bad, and it can actually be useful to avoid passing parameters around or using up more memory. You could then have the same code done this way:

__thread mythread_state_t thread_state;

/* Instantiated a few times simultaneously */
void *mythread(void *address) {
  set_address(address);
  do_connection();
  check_data();

  do_more();
}

The thread_state variable would be one per thread, needing neither a mutex to protect it, nor to e passed once to every function.

There are a few notes about libraries and thread safety which I’d like to discuss, but I’ll leave those for another time. Two tech posts a day is quite a lot already, and I need to resume my paid job now.