LibreSSL: drop-in and ABI leakage

There has been some confusion over my previous post with Bob Beck of LibreSSL, on whether I would advocate for using a LibreSSL shared object as a drop-in replacement for an OpenSSL shared object. Let me state this here, boldly: you should never, ever, for any reason, use shared objects from different major/minor OpenSSL versions or implementations (such as LibreSSL) as drop-in replacements for one another.

The reason is, obviously, that the ABI of these libraries differs, sometimes subtly enough that they may actually load and run, but then perform abysmally insecure operations, as their data structures will have changed, and now instead of reading your randomly-generated key, you may be reading the master private key. And in general, for other libraries you may even end up calling the wrong set of functions, especially for those written in C++, where the vtable content may be rearranged across versions.

What I was discussing in the previous post was the fact that lots of proprietary software packages, by bundling a version of Curl that depends on the RAND_egd() function, will require either unbundling it, or keeping a copy of OpenSSL around for runtime linking. And I think that is a problem people need to consider now rather than later, for a very simple reason.

Even if LibreSSL (or any other reimplementation, for that matter) takes hold as the default implementation for all Linux (and non-Linux) distributions, you’ll never be able to fully forget about OpenSSL: not only if you have proprietary software that you maintain, but also because a huge amount of software (and especially hardware) out there will not be able to update easily. And the fact that LibreSSL is throwing away so much of the OpenSSL clutter also means that it’ll be more difficult to backport fixes — while at the same time I think that a good chunk of the black hattery will focus on OpenSSL, especially if it feels “abandoned”, while most of the users will still be using it somehow.

But putting aside the problem of direct drop-in incompatibilities, there is one more problem that people need to understand, especially Gentoo users and users of most other systems that do not completely rebuild their package set when replacing a library like this. The problem is what I would call “ABI leakage”.

Let’s say you have a general-purpose libfoo that uses libssl; it uses a subset of the API that works with both OpenSSL and LibreSSL. Now you have a bar program that uses libfoo. If the library is written properly, it’ll treat all the data structures coming from libssl as opaque, providing no way for bar to call into libssl without depending on the SSL API du jour (and thus putting a direct dependency on libssl on the executable). But it’s entirely possible that libfoo is not well-written and actually treats the libssl API as transparent. For instance, a common mistake is to use one of the SSL data structures inline (rather than as a pointer) in one of its own public structures.

This situation would be barely fine, as long as the data types for libfoo are also completely opaque: then it’s only the code of libfoo that relies on the structures, and since you’re rebuilding it anyway (as libssl is not ABI-compatible), you solve your problem. But if we keep assuming a worst-case scenario, then you have bar actually dealing with the data structures, for instance by allocating a sized buffer itself, rather than calling into a proper allocation function from libfoo. And there you have a problem, as the sketch below shows.
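
To make the leakage concrete, here is a minimal sketch of the worst case just described. The names are hypothetical, and it assumes pre-1.1 OpenSSL headers, where SSL_CTX is still a complete (transparent) type:

    /* foo.h — a hypothetical libfoo header that leaks the libssl ABI */
    #include <openssl/ssl.h>

    struct foo_connection {
        int     fd;
        SSL_CTX ctx;  /* embedded inline: the size and layout of
                         foo_connection now track libssl's internals */
    };

    /* bar.c — the worst case: the consumer allocates the buffer itself */
    #include <stdlib.h>
    #include "foo.h"

    struct foo_connection *bar_new_connection(void)
    {
        /* sizeof(struct foo_connection) was computed when bar was built;
           swap the libssl under libfoo and this buffer is the wrong size */
        return malloc(sizeof(struct foo_connection));
    }

Rebuilding libfoo against a different libssl changes sizeof(struct foo_connection), but bar, which was not rebuilt, keeps allocating the old size.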

Because now the ABI of libfoo is not defined by its own code alone, but also by whichever ABI libssl has! It’s a problem similar to the symbol table being used as an ABI proxy: while your software will load and run (for a while), you’re really using a different ABI, as libfoo almost certainly does not change its soname when it’s rebuilt against a newer version of libssl. And that can easily cause crashes and worse (see the note above about dropping in LibreSSL as a replacement for OpenSSL).

Now, honestly, none of this is specific to LibreSSL. The same is true if you were to try using OpenSSL 1.0 shared objects for software built against OpenSSL 0.9 — which is why I cringed any time I heard people suggesting symlinks at the time, and it seems like people are giving the same suicidal suggestion now with LibreSSL, according to Bob.

So once again, don’t expect binary compatibility across different versions of OpenSSL, LibreSSL, or any other implementation of the same API, unless they explicitly aim for that (and LibreSSL definitely doesn’t!).

Bundling libraries for trouble

You might remember that I’ve been very opinionated against bundling libraries and, to a point, against static linking of libraries for Gentoo. My reasons have mostly been geared toward security, but there have been a few more instances I wrote about of problems with bundled libraries and stability, for instance the moment when you get symbol collisions between a bundled library and a different version of said library used by one of the dependencies, like that one time in xine.

But there are other reasons why bundling is bad in most cases, especially for distributions, and it’s much worse than just statically linking everything. Unfortunately, while all the major distributions have, as far as I know, a policy against bundled (or even statically linked) libraries, there are very few people speaking against them outside of distribution circles.

One such rare gem came out of Steve McIntyre a few weeks ago, and it actually makes two different topics I wrote about meet in a quite interesting way. Steve worked on finding which software packages make use of CPU-specific assembly for performance-critical code, which would have to be ported for the new 64-bit ARM architecture (AArch64). And this mostly reminded me of x32.

In many ways, there are a lot of problems in common between AArch64 and x32, and they mostly come down to the fact that in both cases you have an architecture (or ABI) that is very similar to a known, well-understood architecture, but is not identical. The biggest difference, apart from the implementations themselves, is in the way the two have been conceived: as I said before, Intel’s public documentation for the ABI’s inception noted explicitly that it was designed for closed systems, rather than open ones (the definition of open or closed system has nothing to do with open- or closed-source software, and has more to do with the expectations of what users will be able to add to the system). The recent stretching of x32 onto open system environments is, in my opinion, not really a positive thing, but if that’s what people want…

I think Steve’s report is worth a read, both for those who are interested in seeing what it takes to introduce a new architecture (or ABI), and for those who maintained that my complaining about x32 breaking assembly code all over the place was a moot point — people with a clue about how GCC works know that sometimes you cannot get away with its optimizations, and you actually need to handwrite code; at the same time, as Steve noted, sometimes the handwritten code is so bad that you should drop it and move back to plain compiled C.

There is also a visible amount of software where the handwritten assembly gets imported due to bundling and direct inclusion… this tends to be relatively common because handwritten assembly is usually tied to performance-critical code… which for many is the same code you bundle because a dynamic link is “not fast enough” — I disagree.

So anyway, give a read to Steve’s report, and then compare with some of the points made in my series of x32-related articles and tell me if I was completely wrong.

What could have been. A time travel story of x32 and FatELF

I had been toying around with the idea of writing about this for a week or two now, and I discussed it with Luca as well, but my main issue has been coming up with a title this time… the original title I had in mind was a Marvel-style What if… “… x32 and FatELF arrived nine years ago”. But then Randall beat me to it with his hypothetical question answering site.

So let’s put some background in context: I criticised in the past Ryan’s FatELF idea and recently Intel’s x32 — but would I have criticised them the same way if the two of them had come up almost together, nine years ago? I don’t think so; and this is the main issue with these kinds of ideas: you need the perfect timing, or they are not worth much. Both of them came out at the wrong time, in my opinion.

So here’s my scenario that didn’t happen and can’t happen now.

It’s 2003, AMD just launched their Opteron CPU, the first sporting the new x86-64 ISA. Intel, after admitting to the failure of their Itanic Itanium project, releases IA-32e following suit. At the same time, though, they decide that AMD’s route of a pure 64-bit architecture for desktop and servers is going to take too long to be production-ready, especially on Linux, as even just discussing multilib support is making the LSB show its frailty.

They thus decide to introduce a new ABI, called x32 (in contrast with x64, used by Sun and Microsoft to refer to AMD’s original ABI). At the same time they decide to hire Ryan Gordon, to push forward a sketchy idea he proposed about supporting Apple-style fat binaries in Linux — the idea was originally ignored because nobody expected any use out of a technique used in 1994 to move from M68k to PowerPC and then left to die with no further use.

The combined energy of Intel and Ryan came up with a plan on how to introduce the new ABI, and thus the new architecture, in an almost painless way. A new C library version is to be introduced in Linux, partially breaking compatibility with the current libc.so.6, but this is for the greatest good.

The new version of the C library, glibc 3.0, will bring a libc.so.7 for the two architectures, introducing a few incompatibilities in the declarations. First of all, the optional largefile support is being dropped: both ABIs will use only a 64-bit off_t; and to avoid the same kind of apocalyptic feeling as Y2K, they also decide to use a 64-bit time_t.

These changes make a very slight dent in the usual x86 performance, but this is not really visible in the new x32 ABI. Most importantly, Intel wanted to avoid the huge work required to port to either IA-64 or AMD64, by creating an ILP32 ABI — where int, long and void * are all 32-bit. And here’s where Ryan’s idea comes to fruition.

Source code written in C will compile identically between x86 and x32, and thanks to the changes aligning the sizes of some primitive standard types, even the more complex data types will be identical between the two. The new FatELF extended format introduced by glibc 3.0 leverages this: original x86 code will be emitted in the .text section of the ELF, while the new code will live in .text32, and all the data, string and symbol tables are kept in a single copy only. The dynamic loader can then map one section or the other, depending on whether the CPU supports 64-bit instructions and whether all the dependencies are available on the new ABI.

Intel seems to have turned the tables under AMD’s nose with this idea, thanks to the vastly negative experience with the Itanium: the required changes to the compiler and loader are really minimal, and most of the software will just build on the new ABI without any extra change, thanks to maintaining most of the data sizes of the currently most widespread architectures (the only changes being off_t behaving as if largefile were enabled by default, and time_t being extended). Of course this still requires vast porting of assembly-ridden software such as libav and most interpreter-based software, but all of this can easily happen over time thanks to Ryan’s FatELF design.

Dissolve effect running

Yes, too bad that Intel took their sweet time to enter the x86-64 market, and even longer to come up with x32, to the point where now most of the software is already ported, and supporting x32 means doing most of the work again. Plus, since they don’t plan on making a new version of the C library available on x86 with the same data sizes as x32, the idea of actually sharing the ELF data and overhead is out of the question (the symbol table as well, since x86 still has the open/open64 split, which in my fantasy is actually gone!) — and Ryan’s own implementation of FatELF was a bit of an over-achiever, as it doesn’t actually share anything between one architecture and the other.

So unfortunately this is not something viable to implement now (it’s way too late), and it’s not something that was implemented then — and the result is a very messed up situation.

There’s ABI and ABI

With all this talk about x32 there are people who might not know what we’re referring to when we talk about ABI. Indeed this term, much like its sibling API, is so overloaded with multiple meanings that the only way to know what one is referring to is understanding in which of the many contexts it’s being used.

It’s not that I haven’t talked about ABI before but I think it’s the first time I talk about it in this context.

Let’s start from the meaning of the two acronyms:

  • API stands for Application Programming Interface;
  • ABI stands for Application Binary Interface.

The whole idea is that the API is what humans are concerned with, and the ABI is what computers are concerned with. But I have to repeat that what these two mean depends vastly on the context in which you refer to them.

For instance, what I usually talk about is the ABI of a shared object, which is a very limited subset of what we talk about in the context of x32. In that context, the term ABI refers to the “compiled API”, which is often mistaken for the object’s symbol table, although it includes more details, such as the ordered content of the transparent structures, the order and size of the parameters in functions’ signatures, and the meaning of said parameters and the return value (that’s why recently we had trouble due to libnetlink changing its return values, which caused NetworkManager to fail).

When we call x32 and amd64 ABIs of the x86-64 architecture, instead, we refer to the interface between a few more components… while I don’t know of a sure, all-covering phrase, the interfaces involved in this kind of ABI are those between kernel and userspace (the syscalls), the actual ELF variant used (in this case a 32-bit class, x86-64 arch ELF file), the size of the primitive types as declared by the compiler (long, void*, int, …), the size of the typedefs from the standard library, the ordered content of the standard transparent structures, and, probably most importantly, the calling convention for functions. Okay, there are a few more things in the mix, such as symbol resolution and details like those, but the main points are here, I think.

Now, among all the things I noted above, there is one that can be extracted without having to change the whole architecture ABI — the C library ABI: the symbol table, the typedefs, the ordered content of transparent structures, and so on. That is the limited concept of shared object ABI applied to the C library object itself. This kind of change still requires a lot of work, among other reasons because of the way glibc works, and would likely require replacing a number of libraries, modules, and the loader as well.

Why do I single out this kind of change? Well, while this would also have caused trouble with binaries, the same way the introduction of a new architecture did, there is an interesting “What if” scenario: what if Ryan’s FatELF and Intel’s x32 ABI had happened nine years ago, and people had been keen on breaking the C library ABI for the good old x86 at the time?

In such a condition, with the two ABIs both being ILP32 style (which means that int, long and void* are all 32-bit), if the rest of the C library ABI were the same between the two, a modified version of Ryan’s FatELF approach – one where the data sections are shared – could have been quite successful!

But let it be clear, this is not going to happen as things stand now. Changing the C library ABI for x86 at this point is a script worthy of Monty Python, and the new x32 ABI corrects some of the obvious problems present in x86 itself — namely the use of a 32-bit off_t (which restricts the size of files) and time_t (which would cause the Y2K38 bug) — leaving the two with widely incompatible data structures.

Last few notes about x32

So my previous posts were picked up by none other than LWN.net — it was quite impressive to see their tweet picking up my blog post; it’s the first time, although I did author articles for them before.

Now, in the comments of the articles and LWN’s own signalling of them, you can find a lot of discussion about the merits of x32, and a little of it tries to paint me as uninformed. I would like to say a few words about that right now, so that I don’t have to go through this later on. I’ve been toying around with ELF, x86-64, PIC and structure optimisation for a very long time. I’ll come back in a moment to why I didn’t do a more thorough analysis and my own benchmarks of the architecture, but if you really think I’m just an amateur because I work on Gentoo Linux and not Fedora or Ubuntu, please think again. I might not be one of the “greats”, but I don’t think I’d be boasting if I said that I know what I’m doing — most of the time, at least.

So why did I not go on to do my own benchmarks to show the numbers of (non-)improvement on x32? Because for me it would be time wasted. I’m not Phoronix, I don’t benchmark stuff for a living, and I’m neither proposing the ABI nor going to work on it myself. I looked into the new ABI because, from one side, it’s always cool to learn about new techniques and technology, even when they sound a little over the top (I did look a lot into FatELF as well, and I was very negative about it — I hope Ryan doesn’t hold a grudge against me; I was quite unlikeable from his point of view, I’m sure), and from the other because my colleague Luca suggested it could be useful to get some more performance out of a device we’re working on.

Now, said device is embedded, runs Gentoo Linux, and needs libav and x264 – I’m not going to give you any more specifics about it – which is why my first test of the new ABI has been building libav (and finding that it requires way too much work to make sense for us). Looking into it has also shown me that some of the assumptions I made about how the new ABI would be designed were wrong: for instance, the fact that long is still 32-bit surprised me.

I’ve been told my arguments are “strawmen” because I singled out some specific topics instead of doing a top-down analysis — as the title of my post, and the reference to my old ccache article, should have suggested, I was looking into some of the things I’ve been discussing, or have been told. The only exception to that has been my answer to “x32 is going to be compatible with x86, if not now then in the future.” — I have talked with nobody about this, but I’ve seen this kind of misconception floating around, especially at the time of the FatELF proposal, about a 64-bit ABI which would be binary compatible with good old 32-bit x86.

The purported reason for having such an ABI would be the ability to load 32-bit closed-source libraries into the address space of 64-bit programs, or vice versa. The idea is that this way the copy of Skype I’m running wouldn’t be loading into my memory a copy of the 32-bit libc.so.6 library that is used by no other process.

If it feels like my posts have been aimed squarely at the Gentoo folks, that might very well be right, although it was not the intention. Most people who look into new ABIs as they come out are probably on the same page as most Gentoo users, with their bleeding-edge feeling — if you only have production Fedora installs, you really won’t care much about an ABI Fedora is not released for yet! And given Mike made us the first distribution releasing something for the ABI, it feels right to discuss Gentoo issues first.

I have also been told that I didn’t talk enough about the reduction in size of data structures, which improves the use of the data cache (not the instruction cache, as Francesco said in the comments of the first article), and from that people got the impression that I don’t know how much of a difference it makes… that would be wrong, given that I’ve actually discussed methods to minimize data usage and have spent time writing a tool to reduce copy-on-write, even when that means making changes for ludicrously small improvements.

I have also been working closely with codiff and pahole from Arnaldo’s dwarves package to make sure that the software I manage has properly-designed structures, not only reducing the size of the single object, but also making sure that attributes that are used together are grouped nearby — this is pretty important for data cache handling, and may go against what most people are told in school (here, at least): that attributes in classes have to be ordered semantically, not by use.
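
To illustrate with a hedged example (a hypothetical structure; sizes assume an LP64 system, and the comments mimic the kind of report pahole gives you):

    /* careless ordering: the compiler pads each small field */
    struct sample_bad {
        char  flag;   /* 1 byte + 7 bytes of padding */
        void *data;   /* 8 bytes */
        char  state;  /* 1 byte + 7 bytes of padding */
        long  count;  /* 8 bytes */
    };                /* 32 bytes in total */

    /* grouped by size (and, ideally, by access pattern) */
    struct sample_good {
        void *data;   /* 8 bytes */
        long  count;  /* 8 bytes */
        char  flag;   /* 1 byte */
        char  state;  /* 1 byte + 6 bytes of tail padding */
    };                /* 24 bytes in total */

The second version is not only smaller, but keeps data and count, which are presumably accessed together, within the same cacheline.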

On a different note, it would be nice if it were possible to tell the compiler that a given structure never leaves the object, so that it could reorder it as needed to get the best performance — but that would also require each translation unit to reorder it identically. Never mind.

There are some other interesting things to be considered as well — if you need fast access to objects in an array, you might be interested in using a little more memory to make sure the object’s size is a power of two, so that instead of using expensive multiplications you can use left shifts to calculate the offset from the base pointer for a given index.
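
As a quick sketch of that (assuming an LP64 system, where this structure is exactly 32 bytes):

    struct item {
        long a, b, c, d;  /* 4 × 8 bytes = 32 bytes, a power of two */
    };

    struct item *nth(struct item *base, unsigned long i)
    {
        /* the compiler can emit base + (i << 5) here:
           a single shift instead of a multiplication */
        return base + i;
    }

With, say, a 24-byte element, the offset computation would need an actual multiplication or a shift-and-add sequence instead.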

I know that reducing the size of pointers and of long will reduce the pressure on the data cache, which in turn means you can have faster pointer chasing and better access to things like linked lists and so on — on the other hand, I don’t think this improvement is worth all the compatibility and porting headaches that a new ABI involves, especially considering that, as we move along, more and more software will make better use of the 64-bit address space, as developers start to understand they have to drop the old designs and paradigms of scores of years ago and replace them with modern design; Poul-Henning Kamp of FreeBSD and Varnish fame said it very well in the linked ACM article.

So to sum it up: I still don’t think x32 is worth my time, whether it is for porting, bug-filing or benchmarking. Of course if somebody gets libav to work on x32 I’ll be the first person to set up a FATE instance for it, and if Gentoo decides to make it a first-class citizen I’ll set up a tinderbox instance for it, but … I sure hope I won’t have to spend more time on it.

What I think I’ll spend some time on in the next few days, which I started thinking about after all the comments, is some posts describing things such as what an ABI actually is in this context, and how to see whether your structures are simply inadequate for what you’re trying to do. It might get interesting.

And to finish this off, I know I use “Now,” to start paragraphs way too often — I guess this is the reason why O’Reilly wouldn’t consider me as an author.

Debunking x32 myths

There have been many comments on my previous post about the new x32 ABI; some are interesting, others are more “out there” — the feeling I get is that there is quite a bit of cargo culting, with people thinking “there has to be a reason why it is developed, so it’ll be good for me!” without actually having the technical background to judge the usefulness of it all.

So, in the same spirit with which I commented on ccache almost exactly four years ago (wow, I have been keeping a blog for a very long time, haven’t I?), I’ll try to debunk a few of the myths and misconceptions around this new ABI.

The new x32 ABI has proven to be faster. Not really; what we have right now are a few benchmarks, published by those who actually created the ABI. Of course you’d expect that those who spent time to set it up found it interesting and actually faster, but I honestly have doubts about the results, for reasons that will be clearer by reading the next few entries.

It’s also interesting to note that while the overall benchmarks seem to be positive, the numbers are quite close in general… and even Intel’s presentation gives you actual “big” numbers only when comparing with the original x86 ABI — which nobody is saying is better than x86-64!

The data is also coming from synthetic tests, not from actual overall system usage, and if you have any clue about benchmarks you know that such numbers can easily lie through their teeth!

The new ABI generates smaller code, which means more instructions will fit in cache, and you’ll have smaller files as well. This is absolutely false. The code generated is generally the same as x86-64: you’re not changing the instruction set at all, you’re just changing the so-called “data model”, which means you change the size of long (and related types) and of pointers (and thus of the address space).

From one side it is theoretically correct that you’re going to have smaller data structures, which means you can make better use of the data cache (not of the instruction cache, be sure!) — but is this the correct approach? In my informed opinion, it would be a better idea to look into actually writing code that considers the cachelines, if your code is cache-hungry! You can use dev-util/dwarves, which is a set of utilities by Arnaldo (acme) — pahole will tell you how your data structures will be split in memory.

Also remember that, for compatibility, the syscalls are kept the same as x86-64, which means that all the kernel code executed, and all the data structures shared with the kernel, are the same as x86-64’s (which in turn means that a number of data structures won’t even change their size with the new ABI).

Actually, referring again to the same slides, you can see on slide 24 that the x32 code can be longer than x86’s original code — it would have been nice if they had included the same code for x86-64, especially since I don’t speak VCISC, but I think it’s just the same code.

It might be of interest to compare the size of the libc.so.6 file itself; this is the output of rbelf-size from my Ruby Elf suite:

        exec         data       rodata        relro          bss     overhead    allocated   filename
     1239436         7456       341974        13056        17784        94924      1714630   /lib/libc.so.6
     1259721         4560       316187         6896        12884        87782      1688030   x32/libc.so.6

The executable code is actually bigger in the x32 variant — the big change is of course in the data sections (data, rodata, relro and bss), as the pointers have been halved — I honestly wonder how it’s possible for the C library to have so many pointers in its own structures, but that’s a question beside the point. Even though these numbers are halved, the difference is not that big: in total you have something along the lines of 30KB less data allocated, which is unlikely to even change the memory map.

The data size reduction is useful. Okay, this seems to be a common argument. Sure, the data structures are smaller with x32; that’s its design, after all. The main question would probably be “is this significant?” — I don’t think it is. Even in the example above with the C library, the difference, while still “big enough”, is just under 20% of the allocated space… of the C library! A library that is supposed to implement the very minimal interface.

Now, if you add up all the possible libraries, you can probably shave off a few megabytes of data, of course, but… you’ll have to add in all the porting issues that I’m going to discuss soon. Yes, it is true that C++ and most VM languages will have less pressure, especially when copying objects, thanks to the reduced pointer size, but this is still quite a stretch. Especially since, for the most part, you’ll have to keep data buffers aligned to at least 8 bytes (64-bit) to make use of the new instructions — and you already have to align them to 16 bytes (128-bit) to make use of some SIMD instruction sets.

And for those who think that x32 is reducing the size of files on disk — remember that as it is you can’t run a pure-x32 install; what you get is usually going to be a mix of three ABIs: x86, amd64 and x32!

But there is no reason for $application to deal with more than 4GiB of memory. Yes, of course, that might be true, but really, do you care about the pointer size? If you really want to make sure that an application doesn’t use more than a given amount of memory, use system limits! They are definitely less intrusive than building a new ABI altogether; a minimal sketch follows.
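
A minimal sketch of that approach, using the POSIX setrlimit(2) interface to cap the address space (the same thing the ulimit -v shell builtin does):

    #include <sys/resource.h>

    /* cap this process (and its children) to 2 GiB of address space */
    int cap_address_space(void)
    {
        struct rlimit rl = {
            .rlim_cur = 2UL << 30,  /* soft limit: 2 GiB */
            .rlim_max = 2UL << 30,  /* hard limit: 2 GiB */
        };
        return setrlimit(RLIMIT_AS, &rl);
    }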

Interestingly, there are two very different, contrasting applications of a full 64-bit address space on systems with less than 4GiB of RAM: ASLR (Address Space Layout Randomization — which can load the various objects an application requires at widely different addresses), and prelink (which can instead make sure that every unique object on the system is always loaded at the same address; yes, that’s really the opposite of what ASLR does!).

Applications use long but they don’t need the full 64-bit space. And of course the solution is to create a new ABI for it, according to some people.

I’m not going to deny that there are many applications that still use long without a clue of why they do so; they probably have some very small range of values they want to use, and yet they use “big” types such as long, as they probably learnt programming on systems that use it as a synonym for int — or, even better, on systems where long is 32-bit but int is 16-bit (hello, MS-DOS!).

The solution to this is simply to use the standard integer types provided by stdint.h, such as uint32_t and int16_t, so that you always use the data size you’re expecting and needing! This also has the side effect of working on many more systems than you’d expect, and works well with FFI and other techniques.
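
A trivial sketch of the difference:

    #include <stdint.h>

    long     timeout1;  /* 4 bytes on x86 and x32, 8 on x86-64: who knows? */
    int32_t  timeout2;  /* exactly 32 bits, on every ABI */
    uint16_t port;      /* exactly 16 bits, on every ABI */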

Hand-coded assembly is rare. This is one thing a few people told me after my previous post, as I complained about the fact that with the new ABI as it is, we’re losing most of the hand-coded assembly. This might strictly be true, but it might be less rare than you think. Even excluding all the multimedia software, crypto software usually makes good use of SIMD as well, and that’s done through hand-coded assembly, not through the compiler’s intrinsics.

There is also another issue with hand-coded assembly in software such as Ruby — while Ruby 1.9 fails to build on x32, it gets much more interesting with Ruby 1.8, because while it builds just fine, it segfaults at runtime. Reminds you of something?

Furthermore, it’s the C library itself that comes with most of the hand-coded assembly — the only reason why you don’t feel the porting pressure is simply that H.J. Lu, who takes care of most of it, is one of the authors of the new ABI, which means the code there is already ported.

x32 is going to be compatible with x86, if not now then in the future. Okay, this one I didn’t have a comment about before, but it’s one misconception I’ve noticed being thrown around. Luckily, the presentation comes to help: slide 22 makes it very clear that the ABIs are not compatible. Among other things, you have to consider that the x32 ABI at least corrects some of the actual mistakes in x86, including the use of 32-bit data types for off_t and similar. Again, something I talked about two years ago.

This is the future of 64-bit processors. No; again, refer to the slides, in particular slide 10. This has been explicitly designed for closed systems, rather than as a replacement for x86-64! How does that feel now?

The porting effort is going to be trivial: you just have to change a few lines of assembly and change the size of pointer arithmetic. This is not the case. The porting requires a number of other issues to be tackled, and hand-crafted assembly is just the tip of the iceberg. Breaking the assumption that x86-64 has 64-bit pointers is, by itself, quite a big deal, though not as big as one might assume at first (it’s the same way on Windows); what I think will be a big issue is the implementation of FFI-style C bindings — remember I said it wasn’t an easy answer?

CPUs perform better on 32-bit operands than 64-bit. Interestingly, the only CPU that Intel admits performs better on 32-bit operands, in the presentation I already linked a few times, is the Atom — the quote is actually “64bit imul latency is twice of 32bit imul on Atom”.

Now, what the heck is imul? That’s a signed multiply operation. Do you multiply pointers? It doesn’t make sense; besides, pointers are not signed. Are you telling me that your main concern is a platform (Atom) that has extra latency on one operation when people use 64-bit data types where they should instead use 32-bit ones? And your solution for that concern is to create a new ABI where it’s harder to use 64-bit data types, instead of going to fix whatever program is causing the problem?

I guess I should end it here, because this last note about the Atom and imul is probably going to make the day of most people who have half a clue.

Why Foreign Function Interfaces are not an easy answer

The term FFI usually refers to techniques related to GCC’s libffi and its various bindings, such as Python’s ctypes. The name should instead encompass a number of different approaches that work in very different ways.

What I’m going to talk about is the subset of FFI techniques that work the way libffi does, which means they also cover .NET’s P/Invoke — which I briefly talked about in an old post.

The idea is that the code for the language you’re writing in declares the arguments that the foreign language interfaces are expecting. While this works in theory, it has quite a few practical problems, which are not really easy to see, especially for developers whose main field of expertise is interpreted languages such as Ruby, or intermediate ones like C#. That’s because the problems are related to the ABI: the Application Binary Interface.

While the ABIs for C and C++ are quite different, I’ll start with the worst-case scenario, and that is using FFI techniques for C interfaces. A C interface (a function) is exposed only through its name, and no other data; the name encodes neither the number nor the types of its parameters, which means that you can’t reflectively load the ABI based off the symbols in a shared object.

What you end up doing, in these cases, is declaring in the Ruby code (or whatever else; I’ll stick with Ruby because that’s where I usually have experience) the list of parameters to be used with that function. And here it gets tricky: which types are you going to use for the parameters? Unless you’re sticking with C99’s standard integers, and mostly pure functions, you’re going to have trouble sooner or later; the short program after the list below demonstrates a few of these pitfalls:

  • the int, long and short types do not have fixed sizes, and depending on the architecture and the operating system they are going to be of different sizes; Win64 and eglibc’s x32 make this even more interesting;
  • the size of pointers (void*) depends, once again, on the operating system and architecture;
  • some types, such as off_t and size_t, depend not just on the architecture and operating system but also on the configuration of said system: on glibc/x86 they are 32-bit by default, but if you enable the so-called largefile support they are 64-bit (the same goes for st_ino, as that post suggests);
  • on some architectures the char type is unsigned, on others it is signed, which is one of the things that made PPC porting quite interesting if you weren’t using C99’s types;
  • if structures are involved, especially with bitfields, you’re just going to suffer, since the layout of a structure, if not packed, depends on both the size of the fields and the endianness of the architecture — plus you have to factor in the usual chance for differences due to architecture and operating system.
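
Here is that demonstration: a small C program to build and run on x86, x86-64 and x32 (and with and without -D_FILE_OFFSET_BITS=64), comparing the results:

    #include <stdio.h>
    #include <sys/types.h>

    int main(void)
    {
        printf("long:  %zu bytes\n", sizeof(long));    /* 4 on x86/x32, 8 on x86-64 */
        printf("void*: %zu bytes\n", sizeof(void *));  /* 4 on x86/x32, 8 on x86-64 */
        printf("off_t: %zu bytes\n", sizeof(off_t));   /* depends on largefile mode */
        printf("char is %s\n",
               (char)-1 < 0 ? "signed" : "unsigned");  /* architecture-dependent */
        return 0;
    }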

Up until now, the situation doesn’t seem to be unsolvable; indeed it should be quite easy, structures aside, if you create type mappings for each and every standard type that could change, and make sure developers use them… of course, things don’t stop there.

Rule number one of the libraries’ consumer: ABI changes.

If you’re a Gentoo user, you’re very likely to be on not-too-friendly terms with revdep-rebuild or the new preserved libraries feature. And you have probably heard or read that the issue requiring you to rebuild other packages is that one of the dependencies changed its ABI. To be precise, what changes in those situations is the soname, which is a way of declaring that the library changed ABI — which is nice of them.

But most changes in ABI are not declared, either by mistake or for proper reasons. In the former case, what you have is a project that didn’t care enough about its own consumers, didn’t make sure that its ABI stays compatible from one release to the next, and didn’t follow the soname bumping rules, which is actually all too common. In the latter scenario, instead, you have a quite more interesting situation, especially where FFI is concerned.

There are some cases where you can change ABI, and yet keep binary compatibility. This is usually achieved by two types of ABI changes: new interfaces and versioned interfaces.

The first one is self-explanatory: if you add a new exported function to a library, it’s not going to conflict with the other exposed interfaces (remember, I’m talking about C here; this is not strictly true for C++ methods!). Yet that means that the new versions of the library have functions that are not present in the older ones — this, by the way, is the reason why downgrading libraries is never well supported, especially in Gentoo (if you rebuilt the library’s consumers, it is possible that they used the newly-available functions — those wouldn’t be there after the downgrade, and yet the soname didn’t change, so revdep-rebuild wouldn’t flag them as bad).

The second option is trickier; I have written something about versioning before, but I never went out of my way to describe the whole handling of it. Suffice it to say that by using symbol versioning, you can make an ABI-compatible change out of an API-compatible change that would otherwise break the ABI.

A classical example is moving from uint32_t to uint64_t for the parameters of a function: changing the function declaration is not going to break the API, because you’re increasing the integer size (and I explicitly referred to unsigned integers so you don’t have to worry about sign extension), so a simple rebuild of the consumer would be enough for the change to be integrated. At the same time, such a change would make the C ABI incompatible, as the size of the parameters on the stack doubles, so calls using the previous API would crash on the new one.

This can be solved – if you used versioning to begin with (due to the bug in glibc I discussed in the article linked earlier) – by keeping a wrapper with the old parameter list that calls into the new API, and giving each of the two symbols its own version. At that point, programs built against the old API will keep using the symbol with the original version (the wrapper), while new ones will link straight to the new API. There you are: a compatible API change leads to a compatible ABI change. A minimal sketch of the trick follows.
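
Here is that sketch, using the GNU toolchain; the function name is hypothetical, and it assumes a version script defining the FOO_1.0 and FOO_2.0 nodes is passed to the linker via -Wl,--version-script:

    #include <stdint.h>

    /* the new API, with parameters widened to 64 bits */
    uint64_t foo_v2(uint64_t a, uint64_t b)
    {
        return a + b;
    }

    /* the compatibility wrapper, keeping the old 32-bit calling sequence */
    uint32_t foo_v1(uint32_t a, uint32_t b)
    {
        return (uint32_t)foo_v2(a, b);
    }

    /* old binaries keep resolving foo@FOO_1.0; new links bind to the
       default version, foo@@FOO_2.0 */
    __asm__(".symver foo_v1,foo@FOO_1.0");
    __asm__(".symver foo_v2,foo@@FOO_2.0");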

Yes I know what you’re thinking: you can just add a suffix to the function and use macros to switch consumers to the new interface, without using versioning at all; that’s absolutely true, but I’m not trying to discuss the merits of symbol versioning here, just explaining how it connects to FFI trouble.

Okay, so why is all of this relevant? Well, what the FFI techniques use to load the libraries they wrap is the dlopen() and dlsym() interfaces; the latter, in particular, is going to follow the steps of the link editor when a symbol with multiple versions is encountered: it will use the one that is declared to be the “default symbol”, that is, the latest added (usually).

Now return to the example above: you have wrapped, through FFI, the function to require two uint32_t parameters, but now dlsym() is loading in its place a function that expects two uint64_t parameters… there you are, your code has just crashed.

Of course it is possible to override this through the use of dlvsym(), but that’s not optimal because, as far as I can tell, it’s a GNU extension, and most libraries wouldn’t be caring about that at all. At the same time, symbol versioning, or at least this complex and messed-up version of it, is mostly exclusive to GNU operating systems, and its use is discouraged for libraries that are supposed to be portable… that blade is two-sided.
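
For completeness, this is roughly what pinning the old version explicitly looks like (hypothetical library, symbol and version names; it needs _GNU_SOURCE and linking with -ldl):

    #define _GNU_SOURCE
    #include <dlfcn.h>

    typedef unsigned int (*foo_v1_fn)(unsigned int, unsigned int);

    foo_v1_fn load_old_foo(void)
    {
        void *handle = dlopen("libfoo.so.1", RTLD_NOW);
        if (handle == NULL)
            return NULL;
        /* dlsym() would resolve to the default (newest) version;
           dlvsym() lets us request the one our declaration matches */
        return (foo_v1_fn)dlvsym(handle, "foo", "FOO_1.0");
    }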

Since these methods are usually employed only by Linux-specific libraries, there aren’t that many susceptible to this kind of crash; on the other hand, since most non-Linux systems don’t offer this choice, most Ruby developers (who seem to use OS X or Windows, seeing how often we encounter case-sensitivity issues compared to any other class of projects) would be unaware of its very existence…

Oh, and by the way: if your FFI-based extension is loading libfoo.so without any soversion, you don’t really understand shared objects, and you should learn a bit more about them before wrapping them.

What’s the moral? Be sure about what you want to do: wrapping C-based libraries is often a good choice to avoid reimplementing everything, but consider whether it might not be a better idea to write the whole thing in Ruby; it might not be as time-critical as you think it is.

Writing a C-based extension moves the compatibility issues to build time, which is a bit safer: even if you write tests for each and every function you wrap (which you should be doing), the ABI can change whenever you update packages, making install-time tests not very reliable for this kind of usage.

Your symbol table is not your ABI

You’d think that, for how much I have written on topics related to shared libraries, ABIs, visibility and so on, I would be able to write a whole book about the subject. Unfortunately, between me not being a native speaker – thus requiring more editing – and the topic not being about any shiny web technology, there seems to be no publisher interested, again. At least this time it’s not that somebody else wrote it already. Heh.

When a wise man points at the moon, the fool looks at the finger.

Yesterday (well, today, but I’m going to post this in the morning or afternoon) I hit an interesting bug in the tinderbox: a game failing to build, apparently because of a missing link to libesd (the client library of the now-obsolete ESounD). This seemed trivial to me, as it appeared that the new libgnome didn’t link to libesd any longer. I had to look a bit more to see how bad the situation was. Indeed, libgnome used to bring in libesd via pkg-config; then it started using Requires.private, so it was only brought in indirectly (transitively) through the NEEDED entries. Finally, with this release, it is not even in the NEEDED entries, as long as you’re using --as-needed.

What happened? Well, with the latest release, the old, long-deprecated esd interfaces for sound support are finally gone, and all the gnome-sound API is based off Lennart’s libcanberra; this is the good part of the story. The other part of the story is that two functions that were tightly tied to the old ESounD interface (to load a sample into the ESounD session, and to get a handle to the currently running esd session, respectively) are no longer functional. Since GNOME is not supposed to break ABI (Application Binary Interface) between releases within version 2, the developers decided to keep the symbols around, and, quoting them, to keep linking to libesd to maintain binary compatibility.

Well, first off, the “binary compatibility” that they wish to keep by linking to libesd is really compatibility with underlinked software, which, in and of itself, is a bad thing we shouldn’t be condoning at all. Software that uses esd should link against it, not rely on the fact that libgnome brings it in.

On the other hand, they seem to assume that the ABI is little more than what the symbol table provides. Of course the exported symbols are part of the ABI, together with their types; for functions, also the types and order of their parameters. But you also have to add the structures, the types of their contents, their sizes and their ordering. All of this is part of the ABI, yet the only thing that the linker will enforce is the symbol table, which is why in my title I wanted to make it clear that the symbol table is not the whole of the ABI; the ABI is not only what the linker can enforce — nor is the API only what the compiler can enforce!

What is it that I’m referring to? Well, as I said, there are two functions that now make no sense at all; they weren’t removed, though, as that would have broken the part of the ABI visible to the linker. Instead, the functions now return the value -1 (i.e., there has been an error) without doing anything (almost; the sample loading is actually done through libcanberra, but it really doesn’t matter, given that the sample handle you’re trying to get is not returned anyway). Even though this won’t cause a startup error, or a build-time error, it’s still breaking the ABI: software that was built relying on the two functions working will no longer work starting with this version. The sketch below shows the shape of the problem.
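
A hedged reconstruction of what such a stub looks like (the real libgnome signature differs; this is just to show the pattern):

    /* the symbol survives, so the link editor is satisfied... */
    int gnome_sound_sample_load(const char *name, const char *filename)
    {
        (void)name;
        (void)filename;
        /* ...but any caller expecting a usable ESounD sample handle,
           as the old contract promised, now breaks at run-time */
        return -1;
    }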

You’re not maintaining binary compatibility, you’re hiding your own ABI breakage under the rug!

This is not dissimilar from the infamous undefined symbol problem, as it simply shuffles to run-time a mistake that could have been easily found at build-time: instead of making the build fail when the symbol is used, it forces an error condition at run-time, where the code is unlikely to expect one.

In this particular case, only two packages directly express the need for esd to be enabled in libgnome2; they should be fixed or simply dropped from Portage: ESounD has been deprecated for many years now, so it makes no sense to find a way to fix these packages. Their upstreams are more than likely dead; if they didn’t write a fallback already, they have probably stopped developing them.

Mono and Gentoo: integration needs more work

I’ve already written quite a bit about the fact that I’m mostly a Mono enthusiast, and that I think there is work to be done to integrate Mono-based builds into autotools, but I haven’t spent enough words on the integration of Mono into Gentoo as a distribution.

Indeed, the Mono team right now seems to consist solely of Peter Alfredsen (loki_val), which is of course a sub-optimal situation, since nothing should really end up being done by a single person, in theory; in practice that’s more than common in Gentoo, and it is one of our worst problems. And it’s not just a matter of not having the time to deal with everything, but also that you cannot brainstorm to separate the bad ideas from the good ones, and to polish them so that they can be used by more than a few people.

In particular, there are quite a few things in the way Gentoo handles Mono that I’d love to see improved, but that I doubt would be considered unless I and someone else joined the team to discuss them. Now, some things are definitely subjective – for instance, I don’t like having upstream packages split into multiple ebuilds, especially now that we have USE-based dependencies, while it seems to be something that Peter loves to do – but others are definitely areas that need some work.

The first problem relates to where we install Mono files: for some reason, in Gentoo we’re currently installing the Mono libraries under /usr/lib64 on AMD64 multilib systems; this is probably due to the fact that the packages usually also install 64-bit libraries, and thus their libdir is supposed to be suffixed with 64. Unfortunately this goes against what upstream uses, since Novell uses /usr/lib for it all — indeed, all the .NET libraries, the .dll and .exe files, are arch-independent, or actually platform-independent, for the most part (see this post for more details about how it is possible for .NET libraries to be arch-dependent). We’re stuck patching lots of libraries, just like Fedora, because of that path change.

Another problem appears when you factor in ABI and .NET libraries: while Ruby, Perl and Python don’t really have an ABI (at least between programs and native libraries), and Java has an ABI but no ABI definition (thus requiring a lot of manual work by the Java team), .NET policies come very near to ELF files and versioning, for which we already have revdep-rebuild. Unfortunately we have no similar tool for .NET (and it wouldn’t always work fine anyway, given that undefined symbols in .NET are not fatal and can easily be handled — for instance, the software I’m developing only uses Outlook if it can load all of its libraries).

Also, as far as I know, there is no script to verify the correctness of the runtime dependencies of Mono software, which is quite a bit of a problem when you end up packaging it yourself. I’m pretty sure somewhere out there there is a tool to check dependencies akin to the Dependency Walker, but I really don’t know of it (if somebody has a name, it would probably be appreciated!).

All in all, there aren’t really big problems with Mono in Gentoo; they look like no problems at all when you consider what we have yet to fix with Ruby, but they still can be a bit of a bother. And they need more people for them to be fixed.

Optional interfaces and ABI compatibility

A few months ago I started writing about ABI (Application Binary Interface) and the problems related to maintaining it. In particular, I’ve written about shared object versioning and the need to update it when you change the interface of functions.

In all of this, though, I have ignored one big issue with ABI and versioning: optional interfaces. While a lot of software, including libraries, has optional features, not all of it has optional interfaces. The difference here is the key, but let me proceed in order.

When a library has optional features but not optional interfaces, it means that both the programming and binary interfaces are kept constant, but the way the library works depends on the features enabled at build time. For instance, a library like FFmpeg’s libavcodec will be able to transcode a given codec type only if its support has not been disabled at build time, but the function the program calls to request the encoding is the same nonetheless; if the library cannot handle the encoding, it tells that back to the program, and it’s up to the program to handle the situation properly.

Instead, when a library has optional interfaces, the functions called from program to library are dropped entirely at build time if they are not compiled in; this is what libxml does, as well as what the ALSA library does with MIDI (which I guess is what created the FUD about MIDI support in ALSA that drove our terrific ALSA team to drop all the work I had done to make that optional in the first place). The sketch below contrasts the two approaches.
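
A minimal sketch of the difference, with hypothetical names and configuration macros:

    /* optional *feature*: the entry point always exists; only the
       behaviour depends on the build configuration */
    int foo_encode(int codec)
    {
    #ifdef HAVE_CODEC_FOO
        return codec;  /* stand-in for the real encoding work */
    #else
        (void)codec;
        return -1;     /* the caller is told the feature is unavailable */
    #endif
    }

    /* optional *interface*: the symbol itself disappears from the
       library, and with it a piece of the ABI */
    #ifdef HAVE_MIDI
    int foo_midi_open(const char *device)
    {
        return device != 0 ? 0 : -1;  /* stand-in */
    }
    #endif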

While using optional interfaces can reduce the size of a library, there are advantages and disadvantages with this approach:

  • you get an error at build time for the program if it’s trying to use features that are not available — assuming you’re using the “no undefined” linking method;
  • if you disable a previously enabled feature and launch a binary that uses it, it will either not start up, or die as soon as it tries to call a function that’s gone.

The latter is more insidious than you might expect: for instance, if you build the software on a machine where the library has the interfaces and then move it to another, it might not warn you that it has missing dependencies.

Taking this into consideration, I really, really suggest not using optional interfaces in libraries at all; rather, if you do want optional interfaces, break the library into multiple sub-libraries. This is what libxml really should be doing, for instance (Mart has been saying that for a long time — although we disagree on how the compatibility issue should be tackled: he’d like to use a standard ELF library, which would break with --as-needed, while I’m more inclined to use an ldscript that redirects to the actual libraries; a rough sketch of that follows), so that each program can request the actual interface it’s going to need. This slightly increases the overhead, because you no longer have local symbols for the shared functions, but instead use a single “core” private-interface library (think of libpulsecore) that is 1:1 versioned (with the release in the filename, and soversion 0). But this is usually negligible.
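
For reference, the rough shape of such an ldscript (hypothetical file names; glibc already ships its own libc.so as exactly this kind of plain-text file):

    /* /usr/lib/libxml.so — a GNU ld script instead of an ELF file: the
       link editor is redirected to the actual split sub-libraries */
    GROUP ( libxml-core.so.0 libxml-reader.so.0 libxml-writer.so.0 )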

So what’s the next step? Well I guess soon enough I’ll illustrate how to actually do the compatibility trick with the ldscript, but that’s for another day.