Why Foreign Function Interfaces are not an easy answer

The term FFI is usually taken to refer to techniques related to GCC’s libffi and its various bindings, such as Python’s ctypes. The name actually encompasses a number of different approaches that work in very different ways.

What I’m going to talk about is the subset of FFI techniques that work the way libffi does; that also covers .NET’s P/Invoke, which I briefly talked about in an old post.

The idea is that the code in the language you’re writing declares the arguments that the foreign functions expect. While this works in theory, it has quite a few practical problems that are not easy to see, especially for developers whose main field of expertise is interpreted languages such as Ruby, or intermediate ones like C#. That’s because the problems are related to the ABI: the Application Binary Interface.

While the ABIs of C and C++ are quite different, I’ll start with the worst-case scenario, which is using FFI techniques for C interfaces. A C interface (a function) is exposed only through its name and no other data; the name encodes neither the number nor the types of its parameters, which means you can’t reflectively load the ABI based on the symbols found in a shared object.

What you end up doing in these cases is declaring, in the Ruby code (or whatever else; I’ll stick with Ruby because that’s where I have most of my experience), the list of parameters to be used with that function. And here it gets tricky: which types are you going to use for the parameters? Unless you’re sticking with C99’s standard integers, and mostly pure functions, you’re going to get into trouble sooner or later.
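
To make the mechanics concrete, here is a minimal sketch, in C, of what such a binding effectively does under the hood; libfoo and foo_mul are hypothetical names, and the prototype in the cast is pure assertion on the caller’s side, since nothing in the shared object confirms it:

```c
#include <dlfcn.h>
#include <stdint.h>
#include <stdio.h>

/* hypothetical prototype that the binding *claims* the function has */
typedef uint64_t (*foo_mul_fn)(uint32_t, uint32_t);

int main(void) {
    /* build with: gcc demo.c -ldl */
    void *handle = dlopen("libfoo.so.1", RTLD_NOW);
    if (!handle) { fprintf(stderr, "%s\n", dlerror()); return 1; }

    /* dlsym() returns an untyped pointer: the symbol is just a name,
     * so the cast below is the only place the types are declared */
    foo_mul_fn foo_mul = (foo_mul_fn)dlsym(handle, "foo_mul");
    if (!foo_mul) { fprintf(stderr, "%s\n", dlerror()); return 1; }

    printf("%llu\n", (unsigned long long)foo_mul(6, 7));
    dlclose(handle);
    return 0;
}
```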

Up to this point the situation doesn’t seem unsolvable; structures aside, it should even be quite easy: create type mappings for each and every standard type that could change, and make sure developers use them… but of course things don’t stop there.
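
The reason those mappings matter is that many common C types have no fixed size to begin with; a quick check on any POSIX system shows how much they vary between platforms and build flags:

```c
#include <stdio.h>
#include <sys/types.h>
#include <time.h>

int main(void) {
    /* long is 8 bytes on 64-bit Linux but 4 on 32-bit Linux and on
     * 64-bit Windows; off_t depends on _FILE_OFFSET_BITS; time_t is
     * still 32-bit on many 32-bit platforms */
    printf("long:   %zu\n", sizeof(long));
    printf("size_t: %zu\n", sizeof(size_t));
    printf("off_t:  %zu\n", sizeof(off_t));
    printf("time_t: %zu\n", sizeof(time_t));
    return 0;
}
```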

Rule number one for any library consumer: ABIs change.

If you’re a Gentoo user you’re very likely on not-too-friendly terms with revdep-rebuild or the new preserved-libraries feature. And you have probably heard or read that the reason other packages need rebuilding is that one of their dependencies changed ABI. To be precise, what changes in those situations is the soname, which is the library’s way of declaring that its ABI changed; nice of it, really.
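
For reference, the soname is just a string embedded at link time; a sketch with a hypothetical libfoo shows where it comes from:

```c
/* foo.c — trivial body for the illustration */
int foo_answer(void) { return 42; }

/* The soname is embedded when the shared object is linked:
 *
 *   gcc -shared -fPIC -Wl,-soname,libfoo.so.1 -o libfoo.so.1.0.0 foo.c
 *
 * Consumers record "libfoo.so.1" as their dependency; bumping it to
 * libfoo.so.2 on an incompatible change is exactly the declaration
 * that tells revdep-rebuild which consumers need a rebuild. */
```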

But most changes in ABI are not declared, either by mistake or for proper reasons. In the former case, what you have is a project that didn’t care enough about its own consumers: it didn’t make sure that its ABI stays compatible from one release to the next, and it didn’t follow the soname-bumping rules, which is all too common. In the latter scenario, instead, you have a much more interesting situation, especially where FFI is concerned.

There are some cases where you can change the ABI and yet keep binary compatibility. This is usually achieved by two types of ABI changes: new interfaces and versioned interfaces.

The first one is self-explanatory: if you add a new exported function to a library, it’s not going to conflict with the other exposed interfaces (remember I’m talking about C here; this is not strictly true for C++ methods!). Yet it means that new versions of the library have functions that are not present in the older ones. This, by the way, is the reason why downgrading libraries is never well supported, especially in Gentoo: if you rebuilt the library’s consumers, they may have picked up the newly available functions, which wouldn’t be there after the downgrade; and since the soname didn’t change, revdep-rebuild wouldn’t flag them as bad.
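
This is also something a careful consumer can probe for at runtime, since dlsym() simply returns NULL for a symbol the installed version doesn’t export; a minimal sketch, again with hypothetical names:

```c
#include <dlfcn.h>
#include <stdio.h>

int main(void) {
    void *handle = dlopen("libfoo.so.1", RTLD_NOW);
    if (!handle) return 1;

    /* foo_frobnicate is hypothetically only exported since libfoo 1.2;
     * on an older install the lookup fails instead of crashing later */
    if (dlsym(handle, "foo_frobnicate"))
        puts("new enough: foo_frobnicate available");
    else
        puts("older libfoo: foo_frobnicate missing");

    dlclose(handle);
    return 0;
}
```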

The second option is trickier; I have written about versioning before, but I never went out of my way to describe its whole handling. Suffice it to say that by using symbol versioning you can get an ABI-compatible change out of an API-compatible change that would otherwise break the ABI.

A classic example is moving the parameters of a function from uint32_t to uint64_t: changing the function declaration is not going to break the API, because you’re only increasing the integer size (and I explicitly chose unsigned integers so you don’t have to worry about sign extension), so a simple rebuild of the consumer would be enough for the change to be picked up. At the same time, such a change is incompatible at the C ABI level: the size of the parameters on the stack doubles, so calls made against the previous ABI would crash on the new one.
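
Spelled out as two releases of a hypothetical libfoo header, the change looks like this; either branch compiles, and any C caller compiles unchanged against both:

```c
#include <stdint.h>

#ifdef LIBFOO_OLD_RELEASE
/* release 1.x: 32-bit parameters */
uint64_t foo_mul(uint32_t a, uint32_t b);
#else
/* release 2.x: API-compatible for source code, since unsigned
 * arguments widen implicitly, but the parameters' size doubles,
 * which breaks every binary built against 1.x */
uint64_t foo_mul(uint64_t a, uint64_t b);
#endif
```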

This can be solved, if you used versioning to begin with (due to the bug in glibc I discussed in the article linked earlier), by keeping a wrapper that still exposes the old prototype and calls into the new API, and giving each of the two symbols its own version. At that point, programs built against the old API keep using the symbol with the original version (the wrapper), while new ones link straight to the new API. There you are: a compatible API change leads to a compatible ABI change.
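
With the GNU toolchain this is done with a version script and .symver directives; here is a minimal sketch under those assumptions, with hypothetical libfoo names and version nodes:

```c
#include <stdint.h>

/* Version script (libfoo.map), passed to the link editor with
 * -Wl,--version-script=libfoo.map:
 *
 *   LIBFOO_1 { global: foo_mul; local: *; };
 *   LIBFOO_2 { global: foo_mul; } LIBFOO_1;
 */

/* the new implementation, taking 64-bit parameters */
uint64_t foo_mul_v2(uint64_t a, uint64_t b) {
    return a * b;
}

/* compatibility wrapper keeping the old 32-bit prototype alive */
uint64_t foo_mul_v1(uint32_t a, uint32_t b) {
    return foo_mul_v2(a, b);
}

/* '@@' marks the default version, the one the link editor (and
 * dlsym()) resolves; a single '@' marks a compatibility-only one */
__asm__(".symver foo_mul_v1, foo_mul@LIBFOO_1");
__asm__(".symver foo_mul_v2, foo_mul@@LIBFOO_2");
```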

Yes, I know what you’re thinking: you could just add a suffix to the function name and use macros to switch consumers over to the new interface, without using versioning at all; that’s absolutely true, but I’m not trying to discuss the merits of symbol versioning here, just explaining how it connects to FFI trouble.

Okay, so why is all of this relevant? Well, what FFI techniques use to load the libraries they wrap is the dlopen() and dlsym() interfaces; the latter in particular follows in the steps of the link editor when it encounters a symbol with multiple versions: it will use the one declared to be the “default symbol”, that is, usually the latest one added.

Now go back to the example above: you have declared through FFI that the function takes two uint32_t parameters, but dlsym() is now loading in its place a function that expects two uint64_t parameters… there you are, your code has just crashed.
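
In C terms, this is the mismatch the binding ends up executing; the names are the same hypothetical ones as above:

```c
#include <dlfcn.h>
#include <stdint.h>

/* the prototype the binding still believes in: the 1.x one */
typedef uint64_t (*old_foo_mul_fn)(uint32_t, uint32_t);

int main(void) {
    void *handle = dlopen("libfoo.so.1", RTLD_NOW);
    if (!handle) return 1;

    /* dlsym() resolves foo_mul@@LIBFOO_2, the 64-bit default,
     * yet we call it through the old 32-bit prototype */
    old_foo_mul_fn foo_mul = (old_foo_mul_fn)dlsym(handle, "foo_mul");

    /* on ABIs that pass these parameters on the stack (32-bit x86,
     * say) the callee reads twice as many bytes as were pushed:
     * garbage arguments at best, a crash at worst */
    return (int)foo_mul(6, 7);
}
```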

Of course it is possible to override this through the use of dlvsym(), but that’s not optimal because, as far as I can tell, it’s a GNU extension, and most libraries wouldn’t care about it at all. At the same time, symbol versioning, or at least this complex and messed-up version of it, is mostly exclusive to GNU operating systems, and its use is discouraged for libraries that are supposed to be portable… the blade is two-sided there.
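
For completeness, this is what asking for a specific version looks like with glibc’s extension; dlvsym() works like dlsym() but takes the version node as a third argument (same hypothetical names as before):

```c
#define _GNU_SOURCE /* dlvsym() is a GNU extension */
#include <dlfcn.h>
#include <stdint.h>
#include <stdio.h>

typedef uint64_t (*old_foo_mul_fn)(uint32_t, uint32_t);

int main(void) {
    void *handle = dlopen("libfoo.so.1", RTLD_NOW);
    if (!handle) return 1;

    /* explicitly request the compatibility version, rather than the
     * default one that dlsym() would resolve */
    old_foo_mul_fn foo_mul =
        (old_foo_mul_fn)dlvsym(handle, "foo_mul", "LIBFOO_1");
    if (!foo_mul) { fprintf(stderr, "%s\n", dlerror()); return 1; }

    printf("%llu\n", (unsigned long long)foo_mul(6, 7));
    dlclose(handle);
    return 0;
}
```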

Since these methods are usually employed only by Linux-specific libraries, there aren’t that many susceptible to this kind of crash; on the other hand, since most non-Linux systems don’t offer this choice, most Ruby developers (who seem to use OS X or Windows, judging by how often we encounter case-sensitivity issues compared to any other class of projects) would be unaware of its very existence…

Oh, and by the way: if your FFI-based extension is loading libfoo.so without any soversion, you don’t really understand shared objects, and you should learn a bit more about them before wrapping them.
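
The difference is easy to show; libfoo.so is usually just a development-time symlink, while the soname is what actually pins the ABI you wrote the binding against (hypothetical names again):

```c
#include <dlfcn.h>
#include <stdio.h>

int main(void) {
    /* fragile: the unversioned name is a development symlink, often
     * shipped only with the headers, and it can point at any ABI */
    void *dev = dlopen("libfoo.so", RTLD_NOW);

    /* correct: name the soname, i.e. the ABI the binding declares */
    void *run = dlopen("libfoo.so.1", RTLD_NOW);

    printf("dev symlink: %s, soname: %s\n",
           dev ? "found" : "missing", run ? "found" : "missing");
    return 0;
}
```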

What’s the moral? Be sure about what you want to do: wrapping C-based libraries is often a good way to avoid reimplementing everything, but consider whether it might not be a better idea to write the whole thing in Ruby; it might not be as time-critical as you think it is.

Writing a C-based extension moves the compatibility issues to build time, which is a bit safer: even if you write tests for each and every function you wrap (which you should be doing), the ABI can change under you when you update packages, making install-time tests not particularly reliable for this kind of usage.
