RPC Frameworks and Programming Languages

RPC frameworks are something that I never thought I would be particularly interested in… until I joined the bubble, where nearly everything used the same framework, which made RPC frameworks very appealing. But despite both my previous and current employers releasing two similar RPC frameworks (gRPC and Apache Thrift respectively), they are not really that commonly used in Open Source, from what I can tell. D-Bus technically counts, but it’s also a bus messaging system, rather than a simpler point-to-point RPC system.

On the proprietary software side, RPC/IPC frameworks have existed for decades: CORBA was originally specified in 1991, and Microsoft’s COM was released in 1993. Although these are technically object models rather than just RPC frameworks, they fit into the “general aesthetics” of the discussion.

So, what’s the deal with RPC frameworks? Well, in the general sense, I like to represent them as a set of choices already made for you: they select an IDL (Interface Description Language), they provide some code generation tool leveraging the libraries they select, and they decide how structures are encoded on the wire. They are, by their own nature, restrictive rather than flexible. And that’s the good thing.
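
To make the “choices already made for you” concrete, here is a minimal sketch in Python. The schema tuple stands in for an IDL and the two functions stand in for generated code; `PING_SCHEMA`, `encode` and `decode` are names I made up for illustration, and no real framework encodes its messages exactly like this.

```python
import struct

# Hypothetical "IDL": field names and fixed wire types, in declaration order.
# Real frameworks use much richer schema languages; this is illustration only.
PING_SCHEMA = (("sequence", "I"), ("timestamp_ms", "Q"))


def encode(schema, message):
    """Pack a dict into bytes using the order and types the schema dictates."""
    fmt = "!" + "".join(code for _, code in schema)  # "!" = network byte order
    return struct.pack(fmt, *(message[name] for name, _ in schema))


def decode(schema, payload):
    """Unpack bytes back into a dict; both peers only need the schema."""
    fmt = "!" + "".join(code for _, code in schema)
    values = struct.unpack(fmt, payload)
    return {name: value for (name, _), value in zip(schema, values)}


wire = encode(PING_SCHEMA, {"sequence": 7, "timestamp_ms": 1234567890123})
decoded = decode(PING_SCHEMA, wire)
```

Neither side gets to pick the byte order or the field order, and that lack of choice is exactly what lets two components written in isolation agree on what the bytes mean.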

Because if we considered the most flexible options, we’d be considering IP as an RPC framework, and that’s not good — if all we have is IP, it’s hard for two components developed in isolation to be able to talk together. That’s why we have higher level protocols, and that’s why even just using HTTP as an RPC protocol is not good enough: it doesn’t define anywhere close to the semantics you need to be able to use it as a protocol without knowing both client and server code.

And one of the restrictions that I think RPC frameworks are good for is making you drop the conventions of specific programming languages, or at least of whichever programming languages they didn’t take after. Because clearly, various RPC frameworks take inspiration from different starting languages, and so their conventions feel more or less at ease in each language depending on how far it is from the starting language.

So for instance, if you look at gRPC, errors are returned with a status code and a detailed status structure, while in Thrift you declare specific exception structures that your interfaces can throw. Both options make different compromises, and they require different amounts of boilerplate code to feel at ease in different languages.
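
As a rough sketch of how the two conventions feel from the caller’s side, here they are in plain Python. Nothing below uses the real gRPC or Thrift libraries: `Status`, `NOT_FOUND`, `UserNotFound` and both lookup functions are hypothetical names made up for this example.

```python
from dataclasses import dataclass


# Convention 1: every call returns a status (code plus detail) next to the payload.
@dataclass
class Status:
    code: int  # 0 means OK in this sketch
    detail: str = ""


NOT_FOUND = 5  # value chosen for this sketch


def lookup_with_status(users, name):
    """Status-style call: the caller must inspect the returned status."""
    if name not in users:
        return Status(NOT_FOUND, f"no user named {name!r}"), None
    return Status(0), users[name]


# Convention 2: the interface declares the exceptions it may raise.
class UserNotFound(Exception):
    def __init__(self, name):
        super().__init__(f"no user named {name!r}")
        self.name = name


def lookup_or_raise(users, name):
    """Exception-style call: failures interrupt the normal control flow."""
    if name not in users:
        raise UserNotFound(name)
    return users[name]


users = {"ada": 1}
status, value = lookup_with_status(users, "bob")
try:
    lookup_or_raise(users, "bob")
    raised = None
except UserNotFound as error:
    raised = error.name
```

The first style reads naturally in languages where errors are ordinary return values, the second in languages built around exceptions; whichever one the framework picks, somebody ends up writing the glue.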

There are programming languages, particularly in the functional family (I’m looking at you, Erlang!) that don’t really “do” error checking: if you made a mistake somewhere, you expect that some type of error will be raised/thrown/returned, and everything else will fall behind it. So an RPC convention with a failure state and a (Adam Savage voice) “here’s your problem” long stack trace would fit them perfectly fine.

This would be the equivalent of having HTTP only ever return error codes 400 and maybe 500: client or server error, and that’s about it. You deal with it, after all it’s nearly always a human in front of a computer looking at the error message, no? Well…

Turns out that being specific, up to a point, about what your errors mean can be very useful, particularly when interacting at a distance (either physical distance, or the distance of not knowing the code of whatever you’re talking to), which is how HTTP 401 is used to trigger an authentication request on most browsers. If you wanted to go a step further, you could consider a 451 response as an automated trigger to re-request the same page from a VPN in a different country (particularly useful with GDPR-restricted news sources in the USA, nowadays).
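
A minimal sketch of what “automated trigger” could mean in practice: a client-side policy that maps specific status codes to reactions and falls back to the generic client/server split for everything else. The action names and the `next_action` function are hypothetical; only the numeric codes come from HTTP.

```python
# HTTP status codes mentioned in the text.
UNAUTHORIZED = 401
UNAVAILABLE_FOR_LEGAL_REASONS = 451


def next_action(status_code):
    """Decide what a client could do next, based only on the status code."""
    if status_code == UNAUTHORIZED:
        return "prompt-for-credentials"
    if status_code == UNAVAILABLE_FOR_LEGAL_REASONS:
        return "retry-via-other-region"
    # Everything else degrades to the coarse "whose fault is it" split.
    if 400 <= status_code < 500:
        return "report-client-error"
    if 500 <= status_code < 600:
        return "report-server-error"
    return "proceed"
```

With only 400 and 500 available, the first two branches could not exist, and a human would have to read the error message to decide what to do.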

Personally, I think this is the reason why the dream of thin client libraries, in my experience, stays a dream. While, yes, with a perfectly specified RPC interface definition you could just use the RPC functions as if they were a library themselves… that usually means that the calls don’t “feel” correct for the language, for any language.

Instead, I personally think you need a wrapper library that can expose the RPC interfaces with a language-native approach — think builder paradigms in Java, and context managers in Python. Not doing so leads, in my experience, to either people implementing their own wrapper libraries you have no control over, or pretty bad code overall, because the people knowing the language refuse to touch the unwrapped client.
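
Here is a small sketch of the Python side of that idea, assuming a made-up lock service. `RawLockStub` stands in for whatever a code generator would emit (one method per RPC, in the framework’s naming style), and `remote_lock` is the thin language-native wrapper; both are hypothetical and not taken from any real framework.

```python
from contextlib import contextmanager


class RawLockStub:
    """Stand-in for a generated client: one method per RPC, nothing more."""

    def __init__(self):
        self.held = set()

    def AcquireLock(self, name):
        if name in self.held:
            raise RuntimeError(f"lock {name!r} already held")
        self.held.add(name)
        return name  # pretend this is a server-issued handle

    def ReleaseLock(self, handle):
        self.held.discard(handle)


@contextmanager
def remote_lock(stub, name):
    """Language-native wrapper: the release call can no longer be forgotten."""
    handle = stub.AcquireLock(name)
    try:
        yield handle
    finally:
        stub.ReleaseLock(handle)


stub = RawLockStub()
with remote_lock(stub, "db") as handle:
    held_inside = "db" in stub.held
held_after = "db" in stub.held
```

The wrapper is a dozen lines, but it is the difference between a client that Python developers will happily use and one they will wrap themselves, each in a slightly different way.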

This is also, increasingly, relevant for local tooling, because honestly I’d rather have an RPC-based interface over Unix Domain Sockets (which allow you to pass authentication information) than run command line tools as subprocesses and try to parse their output. And while for simpler services, signal-based communication or very simple “text” protocols would work just as well, there’s value in having a “lingua franca” to speak between different services.
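
To give an idea of how little is needed, here is a sketch of a request/response exchange over a Unix socket pair, with length-prefixed JSON as the framing. The framing, the message fields and the helper names are all assumptions of mine for illustration, not any particular framework’s protocol. The peer credential lookup uses `SO_PEERCRED`, which is Linux-specific, so the helper returns `None` where it is not available.

```python
import json
import socket
import struct


def send_message(sock, obj):
    """Send one JSON document, prefixed by its length as a 4-byte integer."""
    payload = json.dumps(obj).encode("utf-8")
    sock.sendall(struct.pack("!I", len(payload)) + payload)


def recv_exact(sock, count):
    """Read exactly `count` bytes, or fail if the peer goes away."""
    chunks = []
    while count:
        chunk = sock.recv(count)
        if not chunk:
            raise ConnectionError("peer closed the connection")
        chunks.append(chunk)
        count -= len(chunk)
    return b"".join(chunks)


def recv_message(sock):
    (length,) = struct.unpack("!I", recv_exact(sock, 4))
    return json.loads(recv_exact(sock, length))


def peer_uid(sock):
    """Return the peer's user id via SO_PEERCRED (Linux only), else None."""
    if not hasattr(socket, "SO_PEERCRED"):
        return None
    size = struct.calcsize("iII")  # struct ucred: pid, uid, gid
    creds = sock.getsockopt(socket.SOL_SOCKET, socket.SO_PEERCRED, size)
    _pid, uid, _gid = struct.unpack("iII", creds)
    return uid


# A connected pair stands in for a real client and daemon.
client, server = socket.socketpair()
send_message(client, {"method": "status", "params": {}})
request = recv_message(server)
send_message(server, {"result": "ok", "caller_uid": peer_uid(server)})
reply = recv_message(client)
client.close()
server.close()
```

The caller gets structured data back and the service knows which user is asking, with no output parsing and no guessing at exit codes.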

I guess what I’m saying is that, unlike programming languages, I do think we should make, and stick to, choices on RPC systems. The fact that for the longest time most Windows apps could share the same basic IPC/RPC system was a significant advantage (nowadays there’s… some confusion, at least in my eyes, and that probably has something to do with the number of localhost-only HTTP servers that are running on my machines).

In the Open Source world, it feels like we don’t really seem to like the idea of taking options away – which was clearly visible when the whole systemd integration started – and that makes choices, and integrations, much harder. Unfortunately, that also means significantly higher cost to integrate components together — and a big blame game when some of the bigger, not-open players decide to make some of those choices (cough application-specific passwords cough).

Again about glibc 2.14, RPC and modern software

It looks like my previous post on glibc 2.14 made it to reddit – even though it didn’t make much of an impression on flattr – and there is at least one interesting question asked there, about what software is using RPC that I wasn’t expecting.

While it is definitely true that I underestimated the number of systems still using the old-style NIS, going by the commenters on my other post about PAM, there is also a long list of packages that make use of glibc’s RPC subsystem that I didn’t expect. All of this definitely doesn’t make for an interface that is dying without replacement, as one commenter expressed:

Except that no one uses Sun RPC for that. It’s only application in modern unixes is NFS, so it does not really belong to libc. nfs-utils and libtirpc should handle that. Same goes for NIS and other remnants from the dark ages. Removing unused bloat from the fundamental system library is actually a good thing.

And for the record, RPC will not be removed from “the fundamental system library”: code for the RPC implementation is still all there, it’s just hidden and disallowed from being linked to, which means that the packages that use the interface will not build, but those that were built before (or the binary packages that come prebuilt) will not fail to run on the new library. No “bloat” removed.

Okay, so what are those packages? Well, for one, let’s look at something I have worked on myself for a while and that is actively developed to this very moment: libvirt, which, while obviously designed to work well with libtirpc, can’t be installed on glibc 2.14 (as libtirpc is not fixed yet). And its RPC usage has nothing to do with NFS either. On the other hand, it seems like watchdog, lsof, quota, autofs and possibly tcpdump do need it for NFS support.

I don’t know much about them, but the list of packages requiring RPC includes oc, torque, libcult, libassa, hamlib, lives, xinetd, db (yes Berkeley DB), libdap, tcb, netkit-rusers, netkit-bootparamd, ogdi, charm, netkit-rwall, gs-assembler, ctdb, perdition, amanda, scilab and R….

I haven’t started fixing any of these myself: I have way too many things on my plate already and this is not a high enough priority for me to tackle in my free time, but at least I can report and keep tabs on them. It’s enough for now, I guess.

About GLIBC 2.14, EGLIBC, and Gentoo

I was originally planning to write about one of my current job tasks tonight, since that was honestly interesting for the Free Software part as well, but since I’ve received a number of comments in those regards, and even a couple of direct email messages, I think it might be a better use of my time to reply on this situation instead.

I have blogged repeatedly about the trouble caused by the new version of GLIBC (2.14) and its developers’ choice to stop allowing access to the RPC implementation that it comes with, in favour of the new, also-broken-by-the-same-update libtirpc library.

Turns out that this situation is becoming so absurd that at least Arch Linux decided to revert the removal of the RPC interface. And the same decision seems to have been taken by the EGLIBC developers (which, as far as I can tell, means that Debian and Ubuntu will keep the RPC interface as well). The obvious question people ask me then is “Why isn’t Gentoo doing the same?”

I’m afraid I don’t have a real answer to this: I’m not the GLIBC maintainer, that’s Mike. I’m not in his head and I honestly haven’t asked him to comment on the issue yet; the reason why I’m not pushing him for comments or actions is simple: I see no particular urge to move to the new GLIBC version. The news entries for the new release are a bit short to be of immediate interest to me, and the presence of a bug making Ruby not installable (thanks Sergei for tracking down the root cause!) makes it very low-priority to me, as in, no-priority really.

In particular, the last I knew about the EGLIBC situation was that Mike preferred validating the applied patches on their own merit, following the upstream GLIBC developers as closely as possible unless required for particular architectures and situations, which is a choice I respect deeply. The issue there seems to be that Drepper is getting more and more detached from the needs of the ecosystem, and is still a sort-of dictator as far as the C library is concerned. I was also pointed at some suspicions that he’s no longer directly employed by Red Hat, but given that I don’t really care about that I didn’t confirm or reject that; make what you want of it.

As for reverting the removal of the RPC interface… I don’t like that choice. I mean, the problem here is not that we lack a replacement for the RPC interface in GLIBC, but rather that the replacement is non-working. Rather than spending effort working against the GLIBC developers, that effort would be better spent fixing libtirpc so that it works with GLIBC 2.14, thus leaving us with a properly-working RPC implementation.

In particular, I think it might be a good idea now to implement the proper virtual for RPC implementations on GLIBC and other systems:

elibc_glibc? ( || ( net-libs/libtirpc <sys-libs/glibc-2.14 ) )

Using such a virtual would make it easier for me to ignore the packages that are known not to work with glibc-2.14, as the dependencies wouldn’t be satisfied, and the tinderbox would then skip over the package altogether. I guess I should send an email about this so that it can be discussed and implemented.

There is another reason why I’m not so keen on restoring the interfaces that were removed from this version of the C library; while in my previous post’s comments a number of people have commented, correcting me on my first assessment that NIS was dead, it is still something that most desktops wouldn’t need, and uClibc does not implement, and finding the packages relying on said interface is still an interesting task to tackle.

In general, I’m afraid to tell you that I’m not going to “solve” the problem, by restoring the symbols, myself. If Mike decides to take that approach, the fallout is just going to be delayed, not avoided. And no, even though I probably would prefer moving away from GLIBC to EGLIBC – not just for this problem but also for things like the base versioning issue that is making gold less useful than it could be – I don’t have the time nor the motivation to step up and become the new C library maintainer in Gentoo. I barely have the time to keep on track with what I’m already supposed to do.

Are you kidding me? Or, why we’ll wait on glibc 2.14 for a while

A couple of days ago I noted the move of my tinderbox to glibc 2.14, with the hope of quickly finding and fixing the packages that depend on the now-removed RPC interface. I didn’t expect this kind of apocalypse, and I almost want to cry, thinking about the mess this version seems to create.

First of all, it doesn’t seem like it’s just Ruby being hit by memory corruption issues, which makes it likely that the new memcpy() interface noted in the ChangeLog is to blame. I haven’t had time to debug this yet though.

A new scary situation arose as well: wget exits with a segmentation fault when trying to resolve any hostname that is not in /etc/hosts, which in the case of the tinderbox means anything that is not localhost or Yamato (as that’s where the Squid proxy that caches requests for the fetched Gentoo data runs). I’m not sure of the cause yet, as the fault happens not within the executable’s code but directly inside libresolv, which would point at a bug in glibc itself.

As for RPC, I’m surprised that there are so many packages depending on it, and of the widest variety: multimedia, scientific, network analysis tools, and so on. Now, I was optimistic in my previous post, expecting that most, if not all, of the packages using RPC would be fixed by relying on libtirpc. Ooh boy, how wrong I was.

See, the issue is this: libtirpc itself does not build on glibc-2.14, as it relies on one of the NIS/YP headers that has also been removed. Even worse, the latest version (0.2.2) of libtirpc, which I hoped would solve the issue, does not work on any system at all, since a change by our very own Mike (vapier), which was merged upstream just before the 0.2.2 release, causes the build to produce a library that lacks a couple of symbols: the source file where they are defined was not added, and even when you add it, a couple more symbols turn out to be missing. And this release has been out for over a month without any sign of a 0.2.3 coming (the upstream repository is still broken, at the time of writing).

Are you freaking kidding me?

Oh, and for those who wonder, the issue with base versioning that, as I’ve said, is holding up implementing base version support in gold, is still not fixed. This means that packages, fuse included, that wanted to keep binary compatibility with their original unversioned symbols are still not getting any compatibility, even with this version. In my personal opinion it would be a good time to drop the code for that in fuse, but upstream prefers waiting for the new 3.0 version, which is going to get tricky.

With all this considered, it really looks like a very badly broken release, and one that makes me wonder if it wasn’t too inconsiderate to reject the idea of moving to the eglibc patchset/fork like Debian and Ubuntu seem to have done.