Gentoo Public Service Announcement: crossdev, glib and binary compatibility

Quick summary, please read as it is important! If you’re running a 64-bit arch such as amd64, then you’re safe and you can read this simply as future reference. The same goes if you have never used sys-devel/crossdev (even better if you have no idea what that is. But if you’re cross-compiling to a 32-bit architecture such as ARM, you probably want to read this as you might have to restart from scratch or you’re using a 32-bit architecture and have sys-devel/crossdev installed you really want to read this post.

Today started as a fun day; not only I stated filing bugs from a tinderbox that is now running with the gold link editor but PaX Team pointed me at an interesting, and unfortunate, issue with crossdev and glib.

The issue shows itself as VMware crashing; this is likely not to be the only program that does so, any binary, pre-built program using system glib could crash the same way. The problem? A simple out of boundary access related to GMutex (in particular GStaticMutex). The root cause? A bit more complex to understand.

The issue was already tracked down, when relayed to me, to a configure check misreporting the size of pthread_mutex_t (the underlying type used by GMutex). On a 32-bit system, the size is 24, on a 64-bit system it is 40… but the build from Portage shown the reported value, on i386, as 40 as defined by cache – 40 (cached) – which was simply wrong.

This matters: the GMutex and GStaticMutex types are not entirely opaque to their users; being semi-transparent means that their size is to be considered part of the ABI and thus binary compatibility is broken when this changes. For a properly-built glib, this is not a problem: on a given arch the structure will always have the same size, but when you have one built with a misreported value.. you’ve got a problem.

Of course up to here I didn’t talk about the root cause, of why Portage-built glib would get the wrong value for that size, since it should be an easy test to run. Well, the reason is that the underlying GLIB_SIZEOF macro used to check the size of the structure uses an execution test, which is not cross-compilable. And how does that matter, if you’re not cross-compiling?

When you do install sys-devel/crossdev , the ebuild installs some “site files” which are used to tell ./configure about known results of execution tests, since you can’t have them when cross-compiling. One of these site files sets the size of pthread_mutex_t (or to be more precise further, the size of GMutex underlying object) to 40. But instead of doing so only for 64-bit arches, it does so for all Linux systems, which is bad.

This was then reported as bug #367351 and I just committed a fix that should fix the issue fine.

Unfortunately once the fix is actually deployed, and glib is further rebuilt, the ABI of libgthread will change, and the same issue noted with VMware above will happen for all the binaries linked against the previous glib. As a safety measure, even before the new crossdev is actually deployed, run the following set of commands

# sed -i -e '/^glib_cv_sizeof_gmutex/d' /usr/share/crossdev/include/site/linux
# emerge -1 glib
# revdep-rebuild --library libgthread-2.0.so.0

This will make sure that the correct ABI is present once again in libgthread, and you should be safe from there on. Please note that I have honestly no idea (nor I have a quick way to check) whether the 32-bit libgthread in the emul packages is good or not. I need to track down who’s managing those right now.

How to tell if you’re affected? First of all you should check whether your platform has a 24-bytes or a 40-bytes pthread_mutex_t. Quick way to do so is compile and execute the following test code on the host (the target of the cross-compilation if that’s what you’re doing!)

#include 
#include 

int main() {
    printf("%zun", sizeof(pthread_mutex_t));

    return 0;
}

If it reports 40, you’ve got a 40 bytes structure, and thus the injected cache value is consistent with your system and have not to worry any further. Otherwise you have to consider whether you have a broken glib or not. I don’t really know how to get the compiled value out of a binary glib, so unless you want to look at the logs, I’d say it is better to consider glib broken if you ever had crossdev installed. Do note: having it installed is enough, you don’t have to have used it at all, which is the worst part of it all.

So if you ever have had crossdev installed in your system, run the three commands I already listed:

# sed -i -e '/^glib_cv_sizeof_gmutex/d' /usr/share/crossdev/include/site/linux
# emerge -1 glib
# revdep-rebuild --library libgthread-2.0.so.0

Please do this as soon as you can, it is important to have a proper ABI in libgthread for stability of your system.

I wish I could say this is one more good reason to run amd64, but it could have gone the same way, this was just human error, and it was out of real luck that we caught this, I guess.

Important! Do not update to Python 2.6.5_p20100801

Seems like someone pulled another breakage, almost an year after the past one. Please do not upgrade to this version of Python, and if you have problems like the following:

>>> Emerging (1 of 1) dev-lang/python-2.6.5-r3
Traceback (most recent call last):
  File "/usr/bin/emerge", line 42, in 
    retval = emerge_main()
  File "/usr/lib64/portage/pym/_emerge/main.py", line 1555, in emerge_main
    myopts, myaction, myfiles, spinner)
  File "/usr/lib64/portage/pym/_emerge/actions.py", line 434, in action_build
    retval = mergetask.merge()
  File "/usr/lib64/portage/pym/_emerge/Scheduler.py", line 914, in merge
    rval = self._merge()
  File "/usr/lib64/portage/pym/_emerge/Scheduler.py", line 1222, in _merge
    self._main_loop()
  File "/usr/lib64/portage/pym/_emerge/Scheduler.py", line 1369, in _main_loop
    self._poll_loop()
  File "/usr/lib64/portage/pym/_emerge/PollScheduler.py", line 134, in
_poll_loop
    handler(f, event)
  File "/usr/lib64/portage/pym/_emerge/SpawnProcess.py", line 151, in
_output_handler
    buf.fromfile(files.process, self._bufsize)
IOError: [Errno 11] Resource temporarily unavailable

Then please see bug #330937at this point in time I have no idea how to solve if you updated already, it’s 3:50am and I’m actually here just because I made a mistake rebooting Yamato and found this out.

Update: we have a quick hotfix for you to apply if you reach this point:

wget http://dev.gentoo.org/~ferringb/fix-python-2.6.5_p20100801.patch -O - | patch /usr/lib/portage/pym/_emerge/SpawnProcess.py

it’s a one-liner, just execute it, then you can simply re-merge Python 2.6.5-r3 and Portage to have the pristine system.