Gentoo Public Service Announcement: crossdev, glib and binary compatibility

Quick summary, please read this, as it is important! If you’re running a 64-bit arch such as amd64, then you’re safe and you can read this simply as future reference. The same goes if you have never used sys-devel/crossdev (even better if you have no idea what that is). But if you’re cross-compiling to a 32-bit architecture such as ARM, you probably want to read this, as you might have to restart from scratch; and if you’re using a 32-bit architecture and have sys-devel/crossdev installed, you really want to read this post.

Today started as a fun day: not only did I start filing bugs from a tinderbox that is now running with the gold link editor, but PaX Team also pointed me at an interesting, and unfortunate, issue with crossdev and glib.

The issue shows itself as VMware crashing. VMware is likely not the only program affected: any pre-built binary using the system glib could crash the same way. The problem? A simple out-of-bounds access related to GMutex (in particular GStaticMutex). The root cause? A bit more complex to understand.

By the time it was relayed to me, the issue had already been tracked down to a configure check misreporting the size of pthread_mutex_t (the underlying type used by GMutex). On a 32-bit system the size is 24 bytes, on a 64-bit system it is 40… but the Portage build on i386 reported the value as 40, taken from the cache (“40 (cached)”), which was simply wrong.

This matters: the GMutex and GStaticMutex types are not entirely opaque to their users. Being semi-transparent means that their size has to be considered part of the ABI, so binary compatibility is broken when it changes. For a properly-built glib this is not a problem, since on a given arch the structure always has the same size; but when you have one built with a misreported value… you’ve got a problem.

So far I haven’t talked about the root cause: why would a Portage-built glib get the wrong value for that size, when it should be an easy test to run? Well, the reason is that the underlying GLIB_SIZEOF macro used to check the size of the structure relies on an execution test, which cannot be run when cross-compiling. And how does that matter, if you’re not cross-compiling?

When you install sys-devel/crossdev, the ebuild installs some “site files” that tell ./configure the known results of execution tests, since those can’t be run when cross-compiling. One of these site files sets the size of pthread_mutex_t (or, more precisely, the size of GMutex’s underlying object) to 40. But instead of doing so only for 64-bit arches, it does so for all Linux systems, which is bad.

This was then reported as bug #367351, and I just committed a fix that should resolve the issue.

Unfortunately, once the fix is actually deployed and glib is rebuilt, the ABI of libgthread will change, and the same issue noted with VMware above will happen for all the binaries linked against the previous glib. As a safety measure, even before the new crossdev is actually deployed, run the following set of commands:

# sed -i -e '/^glib_cv_sizeof_gmutex/d' /usr/share/crossdev/include/site/linux
# emerge -1 glib
# revdep-rebuild --library libgthread-2.0.so.0

This will make sure that the correct ABI is present once again in libgthread, and you should be safe from there on. Please note that I honestly have no idea (nor do I have a quick way to check) whether the 32-bit libgthread in the emul packages is good or not; I need to track down who’s managing those right now.

How can you tell if you’re affected? First of all, check whether your platform has a 24-byte or a 40-byte pthread_mutex_t. A quick way to do so is to compile and run the following test code on the host (on the target of the cross-compilation, if that’s what you’re doing!):

#include <stdio.h>
#include <pthread.h>

int main(void) {
    /* Print the size of pthread_mutex_t on the system this runs on. */
    printf("%zu\n", sizeof(pthread_mutex_t));

    return 0;
}

If it reports 40, you’ve got a 40-byte structure, so the injected cache value is consistent with your system and you don’t have to worry any further. Otherwise, you have to consider whether your glib is broken or not. I don’t really know how to get the compiled value out of a binary glib, so unless you want to look at the build logs, I’d say it is better to consider glib broken if you ever had crossdev installed. Do note: having it installed is enough, you don’t need to have ever used it, which is the worst part of it all.

So if you have ever had crossdev installed on your system, run the three commands I already listed:

# sed -i -e '/^glib_cv_sizeof_gmutex/d' /usr/share/crossdev/include/site/linux
# emerge -1 glib
# revdep-rebuild --library libgthread-2.0.so.0

Please do this as soon as you can: a proper ABI in libgthread is important for the stability of your system.

I wish I could say this is one more good reason to run amd64, but it could just as easily have gone the other way: this was simple human error, and it was pure luck that we caught it, I guess.

12 thoughts on “Gentoo Public Service Announcement: crossdev, glib and binary compatibility”

  1. Hi Diego, I’m a bit confused here – if I’ve understood, this issue affects all 32-bit targets, regardless of whether you use a 64-bit host? (Your “if you’re running a 64-bit arch, you’re safe” suggests otherwise.) Second, my crossdev install includes a /usr/share/crossdev/include/site/linux-gnueabi containing the line: glib_cv_sizeof_gmutex=${glib_cv_sizeof_gmutex=24} Will this override the incorrect setting for my armv5tel-softfloat-linux-gnueabi target?

  2. Uhm, in theory that should take precedence over the wrong (40) one; that is, if armv4tel is 32-bit… which I don’t really know. And yes, it does affect all 32-bit targets.

  3. @Solour the way I read this, it affects everyone on a 64-bit system unless you have no 32-bit libs. How is that a choice thing? Meaning no multimedia capability likely and no ‘multilib’; how common is that with 64-bit systems? 98% maybe?

  4. @user99: As I read the opening paragraph, the problem is: If you have installed crossdev, then all architectures assume sizeof(pthread_mutex_t)=40, whether or not that is true in reality. (The comment from Josh Parsons and Diego’s response suggest there may be architectures that do not make this assumption, but the original post says that all do, and it is safer to assume they do (and take corrective action) than to assume they do not (and risk letting bad code stand)). As it happens, amd64 users really do have sizeof=40, so their native builds happen to build the right code, by sheer luck. For x86 users, sizeof=24, so problems arise if you have crossdev installed. Additionally, Diego states that all cross-compilation targets will assume sizeof=40, so __anyone__ who cross-compiles to such targets, regardless of the host architecture used to run $CBUILD-gcc, may generate bad code. As above, if the target happens to have sizeof=40, then you get lucky. However, Diego states that 32-bit ARM is believed not to have sizeof=40.

  5. Since I cross-compile for my OLPC, you might just have saved me days of work. Many thanks! (flattr-subscribed to this article for 12 months – the least I can do) Thank you!

  6. Thank you for the information. I am revisiting Gentoo. Previously, I was not able to get the install disk working on my then-current hardware (2006, 2008). BTW, could you point me to information on why the documentation uses the new sda, sdb etc. format, but the minimal install finds my hard drive at hdd instead of sdd? rick underscore galbraith at iprance dot ca

  7. @Rick Galbraith… it’s the kernel that changes from hda to sda… it’s that simple most of the time. The kernel on the minimal install is quite old, so when it starts up it comes up with the old hdX style addresses.

  8. I’m running crossdev on an amd64 system to cross-compile to an x86 system. On the amd64 sizeof(pthread_mutex_t)=40 while on the x86 it is 24, so it seems correct. The “cross-compiling to a 32-bit architecture” part in the first paragraph is only an issue when you run on a 32-bit host, right? It’s a little bit confusing :-)

  9. Diego, thanks for the clarification. I have thought more about my situation and it’s a little bit different from the subject of your post. I’m using crossdev to create the compilers needed by distcc (amd64 is a distcc server host while the x86 host is the slow client). In this case the configure phase is done on the x86 system and the pthread size is correctly detected.
