Bundling libraries for trouble

You might remember that I’ve been very opinionated against bundling libraries and, to a point, static linking of libraries for Gentoo. My reasons have mostly been geared toward security, but there have been a few more instances I wrote about of bundled libraries causing stability problems, for instance when you get symbol collisions between a bundled library and a different version of the same library used by one of the dependencies, like that one time in xine.

But there are other reasons why bundling is bad in most cases, especially for distributions, and it’s much worse than just statically linking everything. Unfortunately, while all the major distributions have, as far as I know, a policy against bundled (or even statically linked) libraries, very few people speak out against them outside of distribution circles.

One such rare gem came out of Steve McIntyre a few weeks ago, and it actually makes two different topics I wrote about meet in quite an interesting way. Steve worked on finding which software packages make use of CPU-specific assembly for performance-critical code, which would have to be ported for the new 64-bit ARM architecture (AArch64). And this mostly reminded me of x32.

In many ways, AArch64 and x32 have a lot of problems in common, and they mostly come down to the fact that in both cases you have an architecture (or ABI) that is very similar to a known, well-understood architecture but is not identical to it. The biggest difference, apart from the implementations themselves, is in the way the two have been conceived: as I said before, Intel’s public documentation at the ABI’s inception noted explicitly that it was designed for closed systems, rather than open ones (the definition of open or closed system has nothing to do with open- or closed-source software, and has more to do with the expectations of what users will be able to add to the system). The recent stretching of x32 onto open system environments is, in my opinion, not really a positive thing, but if that’s what people want …

I think Steve’s report is worth a read, especially for those who are interested to see what it takes to introduce a new architecture (or ABI). In particular, for those who maintained that my complaint about x32 breaking assembly code all over the place was a moot point: people with a clue about how GCC works know that sometimes you cannot get by with its optimizations alone, and you actually need to handwrite code; at the same time, as Steve noted, sometimes the handwritten code is so bad that you should drop it and move back to plain compiled C.

There is also a visible amount of software where the handwritten assembly gets imported due to bundling and direct inclusion… this tends to be relatively common because handwritten assembly is usually tied to performance-critical code… which for many is the same code you bundle because a dynamic link is “not fast enough” — I disagree.

So anyway, give Steve’s report a read, then compare it with some of the points made in my series of x32-related articles, and tell me if I was completely wrong.

What could have been. A time travel story of x32 and FatELF

I had been toying around with the idea of writing about this for a week or two, and I discussed it with Luca as well, but my main issue has been coming up with a title this time. The original title I had in mind was a Marvel-style What if… “… x32 and FatELF arrived nine years ago”. But then Randall beat me to it with his hypothetical question answering site.

So let’s put some background in context; in the past I criticised Ryan’s FatELF idea, and more recently Intel’s x32 — but would I have criticised them the same way if the two of them had come up almost together, nine years ago? I don’t think so; and this is the main issue with this kind of idea: you need perfect timing, or it’s not worth much. Both of them came out at the wrong time, in my opinion.

So here’s my scenario that didn’t happen and can’t happen now.

It’s 2003, and AMD has just launched their Opteron CPU, the first sporting the new x86-64 ISA. Intel, after admitting to the failure of their Itanic Itanium project, releases IA-32e following suit. At the same time, though, they decide that AMD’s route of a pure 64-bit architecture for desktops and servers is going to take too long to be production-ready, especially on Linux, as even just discussing multilib support is making LSB show its frailty.

They thus decide to introduce a new ABI, called x32 (in contrast with x64, used by Sun and Microsoft to refer to AMD’s original ABI). At the same time they decide to hire Ryan Gordon, to push forward a sketchy idea he had proposed about supporting Apple-style fat binaries on Linux — the idea was originally ignored because nobody expected any use out of a technique used in 1994 to move from M68k to PowerPC, and then left to die with no further use.

The combined energy of Intel and Ryan produces a plan for introducing the new ABI, and thus the new architecture, in an almost painless way. A new C library version is to be introduced in Linux, partially breaking compatibility with the current libc.so.6, but this is for the greater good.

The new version of the C library, glibc 3.0, will bring a libc.so.7 for the two architectures, introducing a few incompatibilities in the declarations. First of all, optional largefile support is dropped: both ABIs will use only a 64-bit off_t, and to avoid the same kind of apocalyptic feeling as Y2K, they also decide to use a 64-bit clock_t type.

These changes make a very slight dent in the usual x86 performance, but this is not really visible in the new x32 ABI. Most importantly, Intel wanted to avoid the huge amount of work required to port to either IA-64 or AMD64, by creating an ILP32 ABI — one where int, long and void * are all 32-bit. And here’s where Ryan’s idea comes to fruition.

Source code written in C will compile identically between x86 and x32, and thanks to the changes aligning the sizes of some primitive standard types, even complex data types will be identical between the two. The new FatELF extended format introduced by glibc 3.0 leverages this — original x86 code will be emitted in the .text section of the ELF, while the new code will live in .text32 — all the data, string and symbol tables are kept in a single copy. The dynamic loader can then map one section or the other, depending on whether the CPU supports 64-bit instructions and all the dependencies are available on the new ABI.

Intel seems to have turned the tables under AMD’s nose with this idea, thanks to the vastly negative experience with the Itanium: the required changes to the compiler and loader are really minimal, and most of the software will just build on the new ABI without any extra change, thanks to maintaining most of the data sizes of the currently most widespread architecture (the only changes being off_t behaving as if largefile were enabled by default, and clock_t being extended). Of course this still requires vast porting of assembly-ridden software such as libav and most interpreter-based software, but all of this can easily happen over time thanks to Ryan’s FatELF design.

Dissolve effect running

Yes, too bad that Intel took their dear time to enter the x86-64 market, and even longer to come up with x32, to the point where now most of the software is already ported, and supporting x32 means doing most of the work again. Plus, since they don’t plan on making a new version of the C library available on x86 with the same data sizes as x32, the idea of actually sharing the ELF data and overhead is out of the question (the symbol table as well, since x86 still has the open/open64 split, which in my fantasy is actually gone!) — and Ryan’s own implementation of FatELF was a bit of an over-achiever, as it doesn’t actually share anything between one architecture and the other.

So unfortunately this is not something viable to implement now (it’s way too late), and it’s not something that was implemented then — and the result is a very messed up situation.

There’s ABI and ABI

With all this talk about x32 there are people who might not know what we’re referring to when we talk about ABI. Indeed this term, much like its sibling API, is so overloaded with multiple meanings that the only way to know what one is referring to is to understand in which of the many contexts it’s being used.

It’s not that I haven’t talked about ABI before but I think it’s the first time I talk about it in this context.

Let’s start from the meaning of the two acronyms:

  • API stands for Application Programming Interface;
  • ABI stands for Application Binary Interface.

The whole idea is that the API is what the humans are concerned with, and the ABI is what the computers are concerned with. But I have to repeat that what these two mean depends vastly on the context in which you refer to them.

For instance, what I usually talk about is the ABI of a shared object, which is a very limited subset of what we talk about in the context of x32. In that context, the term ABI refers to the “compiled API”, which is often mistaken for the object’s symbol table, although it includes more details, such as the ordered content of transparent structures, the order and size of the parameters in functions’ signatures, and the meaning of said parameters and of the return value (that’s why we recently had trouble due to libnetlink changing its return values, which caused NetworkManager to fail).
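
To make the distinction concrete, here is a hypothetical transparent structure that could sit in a shared object’s public header (the names are made up for illustration):

    /* Public header of a hypothetical libfoo; the structure is
       transparent, so its layout is part of the library's ABI. */
    struct foo_event {
        int  type;     /* offset 0 in the compiled ABI */
        int  flags;    /* offset 4 */
        long payload;  /* offset 8 */
    };

    /* Swapping flags and payload, or changing their types, keeps the
       API (the source) perfectly valid, but binaries compiled against
       the old layout will read the wrong offsets at runtime. */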

When we call x32 and amd64 ABIs of the x86-64 architecture, instead, we refer to the interface between a few more components… while I don’t know of a sure, all-covering phrase, the interfaces involved in this kind of ABI are those between kernel and userspace (the syscalls), the actual ELF variant used (in this case a 32-bit class, x86-64 arch ELF file), the size of primitive types as declared by the compiler (long, void *, int, …), the size of typedefs from the standard library, the ordered content of the standard transparent structures, and, probably most importantly, the calling convention for functions. Okay, there are a few more things in the mix, such as symbol resolution and details like those, but the main points are here, I think.
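
A quick way to see where two ABIs of the same architecture diverge is to print the sizes the compiler assigns to the primitive types; a minimal sketch:

    #include <stdio.h>

    int main(void)
    {
        /* On amd64 this prints 4/8/8; on x32 (and plain x86) it
           prints 4/4/4, even though x32 runs on a 64-bit CPU. */
        printf("sizeof(int)    = %zu\n", sizeof(int));
        printf("sizeof(long)   = %zu\n", sizeof(long));
        printf("sizeof(void *) = %zu\n", sizeof(void *));
        return 0;
    }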

Now, of all the things I noted above, there is one that can be extracted without having to change the whole architecture ABI — the C library ABI: the symbol table, the typedefs, the ordered content of transparent structures, and so on. That is the limited concept of shared object ABI applied to the C library object itself. This kind of change still requires a lot of work, among other reasons because of the way glibc works, and will likely require replacing a number of libraries, modules, the loader, and more.

Why do I single out this kind of change? Well, while this would also have caused trouble with binaries, the same way the introduction of a new architecture did, there is an interesting “What if” scenario: what if Ryan’s FatELF and Intel’s x32 ABI had happened nine years ago, and people had been keen on breaking the C library ABI for the good old x86 at the time?

In such a condition, with the two ABIs being both ILP32 style (which means that int, long and void* are 32-bit), if the rest of the C library ABI was the same between the two, a modified version of Ryan’s FatELF approach – one where the data sections are shared – could have been quite successful!

But let it be clear, this is not going to happen as things stand now. Changing the C library ABI for x86 at this point is a script worthy of Monty Python, and the new x32 ABI corrects some of the obvious problems present in x86 itself — namely the use of a 32-bit off_t (which restricts the size of files) and time_t (which would cause the Y2K38 bug), the two of them having widely incompatible data structures.
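
If you want to check where your own x86 system stands, a minimal sketch; on plain x86, without building with -D_FILE_OFFSET_BITS=64, both types are still 32-bit, while x32 specifies the 64-bit versions from the start:

    #include <stdio.h>
    #include <sys/types.h>
    #include <time.h>

    int main(void)
    {
        printf("sizeof(off_t)  = %zu\n", sizeof(off_t));
        printf("sizeof(time_t) = %zu\n", sizeof(time_t));
        return 0;
    }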

Last few notes about x32

So my previous posts were picked up by none other than LWN.net — it was quite impressive to see their tweet picking up my blog post; it’s the first time, although I did author articles for them before.

Now, in the comments on the articles and on LWN’s own signalling of them, you can find a lot of discussion about the merits of x32, and a little of it tries to paint me as uninformed. I would like to say a few words about that right now, so that I don’t have to go through this later on. I’ve been toying around with ELF, x86-64, PIC and structure optimisation for a very long time. I’ll come back in a moment to why I didn’t do a more thorough analysis and my own benchmarks of the architecture, but if you really think I’m just an amateur because I work on Gentoo Linux and not Fedora or Ubuntu, please think again. I might not be one of the “greats”, but I don’t think I’d be boasting if I said that I know what I’m doing — most of the time, at least.

So why did I not do my own benchmarks to show the numbers of (non-)improvement on x32? Because for me it would be time wasted. I’m not Phoronix, I don’t benchmark stuff for a living, and I’m neither proposing the ABI nor going to work on it myself. I looked into the new ABI because, on one side, it’s always cool to learn about new techniques and technology, even when they sound a little over the top (I did look a lot into FatELF as well, and I was very negative about it — I hope Ryan doesn’t hold a grudge against me, I was quite unlikeable from his point of view, I’m sure), and on the other because my colleague Luca suggested it could be useful to get some more performance out of a device we’re working on.

Now, said device is embedded, runs Gentoo Linux, and needs libav and x264 – I’m not going to give you any more specifics about it – which is why my first test of the new ABI has been building libav (and finding it requiring way more work than would make sense for us). Looking into it has also shown me that some of the assumptions I made about how the new ABI would be designed were off: for instance, the fact that long is still 32-bit surprised me.

I’ve been told my arguments are “strawmen” because I singled out some specific topics instead of doing a top-down analysis — as the title of my post, and the reference to my old ccache article, should have suggested, I was looking into some of the things I’ve been discussing, or have been told. The only exception to that has been my answer to “x32 is going to be compatible with x86, if not now then in the future.” — I have talked with nobody about this, but I’ve seen this kind of misconception floating around, especially at the time of the FatELF proposal: the idea of a 64-bit ABI which would be binary compatible with good old 32-bit x86.

The purported reason for having such an ABI would be being able to load 32-bit closed-source libraries into the address space of 64-bit programs or vice-versa. The idea is that this way the copy of Skype I’m running wouldn’t be loading into my memory a copy of the 32-bit libc.so.6 library, which is used by no other process.

If it feels like my posts have been aimed squarely at the Gentoo folks, that might very well be right, although it was not the intention. Most people who look into new ABIs as they come out are probably on the same page as most Gentoo users, with their bleeding-edge feeling — if you only have production Fedora installs, you really won’t care much about an ABI Fedora isn’t released for yet! And given that Mike made us the first distribution releasing something for the ABI, it feels right to discuss Gentoo issues first.

Now, I’ve also been told that I didn’t talk enough about the reduction in size of data structures, which improves the use of the data cache (not the instruction cache, as Francesco said in the comments on the first article), and because of that people got the impression that I don’t know how much of a difference it makes … that would be wrong, given that I’ve actually discussed methods to minimize data usage and have spent time writing a tool to reduce copy-on-write, even when that means making changes for ludicrously small improvements.

I have also been working closely with codiff and pahole from Arnaldo’s dwarves package, to make sure that the software I manage has properly-designed structures, not only reducing the size of the single object, but also making sure that attributes that are used together are grouped nearby — this is pretty important for data cache handling, and it goes against what most people are told in school, here at least: that attributes in classes have to be ordered semantically, not by use.
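
As an illustration of ordering by use rather than semantics, here is a hypothetical structure (the field names are made up):

    #include <stdint.h>

    /* The fields touched on every packet are grouped first, so they fit
       in one 64-byte cacheline; the fields only needed at setup and
       teardown come after. */
    struct connection {
        /* hot path */
        uint64_t bytes_in;
        uint64_t bytes_out;
        uint32_t state;
        uint32_t last_seen;
        /* cold path */
        char     peer_name[64];
        void    *config;
    };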

On a different note, it would be nice if it were possible to tell the compiler that a given structure never leaves the object, so that it could reorder it as needed to get the best performance — but that would also require that each unit reorder it the same way. Never mind.

There are some interesting tricks to be considered as well — if you need fast access to objects in an array, you might be interested in using a little more memory to make sure the object’s size is a power of two, so that instead of expensive multiplications you can use left shifts to calculate the offset of a given index from the base pointer.
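
A sketch of the idea, with a hypothetical structure padded so that its size is exactly 16 bytes:

    #include <stddef.h>
    #include <stdint.h>

    struct item {
        uint32_t key;
        uint32_t flags;
        uint64_t value;
    };  /* sizeof(struct item) == 16, a power of two */

    static inline struct item *nth(struct item *base, size_t i)
    {
        /* the compiler can compute this as base + (i << 4),
           a left shift instead of a multiplication */
        return &base[i];
    }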

I know that reducing the size of pointers and of long will reduce the pressure on the data cache, which in turn means faster pointer chasing and better access to things like linked lists and so on — on the other hand, I don’t think this improvement is worth all the compatibility and porting headaches that a new ABI involves. Especially considering that, as we move along, more and more software will make better use of the 64-bit address space, as developers start to understand that they have to drop the old designs and paradigms of scores of years ago and replace them with modern ones; Poul-Henning Kamp of FreeBSD and Varnish fame said it very well in the linked ACM article.

So to sum it up: I still don’t think x32 is worth my time, whether it is for porting, bug-filing or benchmarking. Of course if somebody gets libav to work on x32 I’ll be the first person to set up a FATE instance for it, and if Gentoo decides to make it a first-class citizen I’ll set up a tinderbox instance for it, but … I sure hope I won’t have to spend more time on it.

What I think I’ll spend some time on in the next few days, which I started thinking about after all the comments, is a few posts describing things such as what an ABI actually is in this context, and how to see whether your structures are simply inadequate for what you’re trying to do. It might get interesting.

And to finish this off, I know I use “Now,” to start paragraphs way too often — I guess this is the reason why O’Reilly wouldn’t consider me as an author.

Debunking x32 myths

There have been many comments on my previous post about the new x32 ABI; some are interesting, others are more “out there” — the feeling I get is that there is quite a bit of cargo culting, with people thinking “there has to be a reason why it is being developed, so it’ll be good for me!” without actually having the technical background to judge its usefulness.

So in the same spirit in which I commented on ccache almost exactly four years ago (wow, I have been keeping a blog for a very long time, haven’t I?), I’ll try to debunk a few of the myths and misconceptions around this new ABI.

The new x32 ABI has proven to be faster. Not really; what we have right now are a few benchmarks, published by those who actually created the ABI. Of course you’d expect that those who spent time setting it up found it interesting and actually faster, but I honestly have doubts about the results, for reasons that will become clearer in the next few entries.

It’s also interesting to note that while the overall benchmarks seem to be positive, the numbers are quite close in general … and even Intel’s presentation gives you actual “big” numbers only when comparing with the original x86 ABI — which nobody is saying is better than x86-64!

The data also comes from synthetic tests, not from actual overall system usage, and if you have any clue about benchmarks you know that such numbers can easily lie through their teeth!

The new ABI generates smaller code, which means more instructions will fit in cache, and you’ll have smaller files as well. This is absolutely false. The generated code is generally the same as x86-64’s: you’re not changing the instruction set at all, you’re just changing the so-called “data model”, which means you change the size of long (and related types) and of pointers (and thus of the address space).

On one side, it is theoretically correct that you’re going to have smaller data structures, which means you can make better use of the data cache (not of the instruction cache, be sure!) — but is this the correct approach? In my informed opinion, it would be a better idea to look into actually writing code that considers the cachelines, if your code is cache-hungry! You can use dev-util/dwarves, a set of utilities by Arnaldo (acme) — pahole will tell you how your data structures will be split in memory.
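
As a small illustration of what pahole reports, consider this hypothetical pair of structures on an LP64 system:

    /* pahole points out a 7-byte hole after 'a' and 7 bytes of tail
       padding: sizeof(struct bad) == 24 on LP64. */
    struct bad {
        char a;
        long b;
        char c;
    };

    /* Same fields, reordered: sizeof(struct good) == 16 on LP64,
       with no ABI change anywhere else in the system. */
    struct good {
        long b;
        char a;
        char c;
    };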

Also remember that, for compatibility, the syscalls are kept the same as on x86-64, which means that all the kernel code executed, and all the data structures shared with the kernel, are the same as on x86-64 (so a number of data structures won’t even change their size with the new ABI).

Actually, referring again to the same slides, you can see on slide 24 that the x32 code can be longer than x86’s original code — it would have been nice if they had included the same code for x86-64 as well, especially since I don’t speak VCISC, but I think it’s just the same code.

It might be of interest to compare the size of the libc.so.6 file itself; this is the output of rbelf-size from my Ruby Elf suite:

        exec         data       rodata        relro          bss     overhead    allocated   filename
     1239436         7456       341974        13056        17784        94924      1714630   /lib/libc.so.6
     1259721         4560       316187         6896        12884        87782      1688030   x32/libc.so.6

The executable code is actually bigger in the x32 variant — the big change is of course in the data sections (data, rodata, relro and bss), as the pointers have been halved — I honestly wonder how it’s possible for the C library to have so many pointers in its own structures, but that’s a question beside the point. Even though these numbers are halved, the difference is not that big: in total you have something along the lines of 30KB less data allocated, which is unlikely to even change the memory map.

The data size reduction is useful. Okay, this seems to be a common argument. Sure, the data structures are smaller with x32; that’s its design, after all. The main question is probably “is this significant?” — I don’t think it is. Even in the example above with the C library, the difference, while still “big enough”, is just under 20% of the allocated space … of the C library! A library that is supposed to implement the very minimal interface.

Now, if you add up all the possible libraries, you can probably shave off a few megabytes of data, of course, but … you’ll have to add in all the porting issues that I’m going to discuss soon. Yes, it is true that C++ and most VM languages will have less pressure, especially when copying objects, thanks to the reduced pointer size, but this is still quite a stretch. Especially since, for the most part, you’ll have to keep data buffers aligned to at least 8 bytes (64-bit) to make use of the new instructions — and you already have to align them to 16 bytes (128-bit) to make use of some SIMD sets.
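
For instance, a buffer meant to be used with SSE loads has to be 16-byte aligned whether pointers are 4 or 8 bytes wide; a minimal sketch:

    /* The 16-byte alignment is dictated by the SIMD instruction set,
       not by the pointer size, so x32 does not change this at all. */
    static float samples[1024] __attribute__((aligned(16)));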

And for those who think that x32 is reducing the size of files on disk — remember that as it is you can’t run a pure-x32 install; what you get is usually going to be a mix of three ABIs: x86, amd64 and x32!

But there is no reason for $application to deal with more than 4GiB of memory. Yes, of course that might be true, but really, do you care about the pointer size? If you really want to make sure that an application doesn’t use more than a given amount of memory, use system limits! They are definitely less intrusive than building a whole new ABI.
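
For instance, a process can cap its own address space through setrlimit(2); a minimal sketch:

    #include <sys/resource.h>

    /* Limit the address space to 4 GiB: pointers stay 64-bit, but the
       process cannot grow beyond what a 32-bit ABI could address. */
    static int limit_memory(void)
    {
        struct rlimit rl = { .rlim_cur = 4ULL << 30,
                             .rlim_max = 4ULL << 30 };
        return setrlimit(RLIMIT_AS, &rl);
    }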

Interestingly, there are two very different, contrasting applications of a full 64-bit address space on systems with less than 4GiB of RAM: ASLR (Address Space Layout Randomization, which can load the various objects an application requires at widely different addresses), and prelink (which can make sure that every unique object on the system is always loaded at the same address; yes, that’s really the opposite of what ASLR does!).

Applications use long but they don’t need the full 64-bit space. And of course the solution is to create a new ABI for it, according to some people.

I’m not going to deny that there are many applications that still use long without a clue of why they do so; they probably have some very small range of values they want to use, and yet they use a “big” type such as long, as they probably learnt programming on systems that use it as a synonym for int — or, even better, they learnt programming on systems where long is 32-bit but int is 16-bit (hello, MS-DOS!).

The solution to this is simply to use the standard integer types provided by stdint.h, such as uint32_t and int16_t — so that you always use the data size you’re expecting and needing! This also has the side effect of working on many more systems than you’d expect, and works well with FFI and other techniques.
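
A minimal sketch of the difference:

    #include <stdint.h>

    /* long is 4 bytes on x86 and x32 but 8 bytes on amd64; the
       fixed-width types below keep the same size on all three ABIs. */
    uint32_t checksum;   /* always 32-bit */
    int16_t  delta;      /* always 16-bit */
    uint64_t file_size;  /* always 64-bit, whatever the data model */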

Hand-coded assembly is rare. This is one thing a few people told me after my previous post, where I complained about the fact that, with the new ABI as it is, we’re losing most of the hand-coded assembly. It might strictly be true, but it’s less rare than you think. Even excluding all the multimedia software, crypto software usually makes good use of SIMD as well, and that’s done through hand-coded assembly, not through the compiler’s intrinsics.
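
For contrast, this is what the intrinsics route looks like; unlike a hand-written .S file, the compiler takes care of register allocation and the calling convention, so a (hypothetical) snippet like this ports to a new ABI for free:

    #include <emmintrin.h>  /* SSE2 intrinsics */

    /* XOR two 16-byte blocks; no porting needed, as no register or
       pointer size is hardcoded anywhere. */
    static __m128i xor_block(__m128i a, __m128i b)
    {
        return _mm_xor_si128(a, b);
    }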

There is also another issue with hand-coded assembly in software such as Ruby — while Ruby 1.9 fails to build on x32, it gets much more interesting with Ruby 1.8, because while it builds just fine, it segfaults at runtime. Reminds you of something?

Furthermore, it’s the C library itself that comes with most of the hand-coded assembly — the only reason why you don’t feel the porting pressure is simply that H.J. Lu, who takes care of most of it, is one of the authors of the new ABI, which means the code is already ported there.

x32 is going to be compatible with x86, if not now then in the future. Okay, this I didn’t have a comment about before, but it’s one misconception I’ve noticed being thrown around. Luckily, the presentation comes to help: slide 22 makes it very clear that the ABIs are not compatible. Among other things, you have to consider that the x32 ABI at least corrects some of the actual mistakes in x86, including the use of 32-bit data types for off_t and the like. Again, something I talked about two years ago.

This is the future of 64-bit processors. No; again, refer to the slides, in particular slide 10. This has been explicitly designed for closed systems, rather than as a replacement for x86-64! How does that feel now?

The porting effort is going to be trivial, you just have to change a few lines of assembly and adjust the pointer arithmetic. This is not the case. The porting requires a number of other issues to be tackled, and handcrafted assembly is just the tip of the iceberg. Breaking the assumption that x86-64 has 64-bit pointers is, by itself, quite a big deal, though not as big as one might assume at first (it’s the same way on Windows); what I think is going to be a big issue is the implementation of FFI-style C bindings — remember when I said it wasn’t an easy answer?

CPUs perform better on 32-bit operands than 64-bit. Interestingly, the only CPU that Intel admits performs better on 32-bit, in the presentation I already linked a few times, is the Atom — the quote is actually “64bit imul latency is twice of 32bit imul on Atom”.

Now, what the heck is imul? That’s a signed multiply operation. Do you multiply pointers? It doesn’t make sense; besides, pointers are not signed. Are you telling me that your main concern is a platform (Atom) that has extra latency on an operation when people use 64-bit data types where they should instead use 32-bit ones? And your solution to that concern is to create a new ABI where it’s harder to use 64-bit data types, instead of fixing whatever program is causing the problem?

I guess I should end it here, because this last note about the Atom and imul is probably going to make the day of most people who have half a clue.

Is x32 for me? Short answer: no.

For the long answer keep reading.

So, thanks to Anthony and the PaX team, yesterday I was able to set up the first x32 testing system — the one I lamented last week I was unable to get hold of. The reason why it wasn’t working with LXC was actually quite simple: the RANDMMAP feature of the hardened kernel, responsible for the stronger ASLR, was moving the load address of x32 binaries outside of the 32-bit range of x32 pointers. Version 3.4.3 finally solves the issue, so I could set it up properly.

The first step was the hard one: trying out libav. This is definitely an interesting and valuable test, simply because libav has so many hand-crafted assembly routines that it really shouldn’t suffer much from the otherwise slower amd64 ABI, and at the same time, Måns was sure that it wouldn’t work out of the box — which is indeed the case. On one side, YASM still doesn’t support the new ABI, which means that everything relying on it, in libav, x264 and other projects alike, won’t work; on the other side, the inline asm (which is handled by GCC) is now a completely different “third way” from the usual 32-bit x86 and the 64-bit amd64.

While I was considering setting up a tbx32 instance to see what would actually work, the answer right now is “way too little”; heck, even Ruby 1.9 doesn’t work right now, because it uses some inline asm that no longer works.

More interesting is that the usual ways to discern which architecture one is on are going to fail, badly:

  • sizeof(long) and sizeof(void*) are both 4 (which means that both types are 32-bit), like on x86;
  • __x86_64__ is defined just like on amd64, and there is no x32-specific define; the best you can do is to check for __x86_64__ and __SIZEOF_LONG__ == 4 (or the more specific __ILP32__) at the same time, as in the sketch below — edit: yes, I knew that there had to be a define more specific than that, I just didn’t bother to look it up before; the point of needing two checks still stands, mmkay?
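
Here is what the compile-time check looks like; these are the macros GCC actually defines:

    /* Telling the three ABIs apart at compile time: */
    #if defined(__x86_64__) && defined(__ILP32__)
        /* x32: the x86-64 instruction set with 32-bit pointers */
    #elif defined(__x86_64__)
        /* amd64: the x86-64 instruction set with 64-bit pointers */
    #elif defined(__i386__)
        /* plain 32-bit x86 */
    #endif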

What does this mean? It simply means that, considering it took us years to get a working amd64 system, which was in many ways a “pure” new architecture, easily discerned from the older x86, we’re going to spend some more years trying to get x32 working … and all for what? To have a smaller address range and thus smaller pointers, to save on memory usage and memory bandwidth … by the time x32 is ready, I’d be ready to bet that neither concern will be that important — heck, I don’t have a single computer with less than 8GB of RAM right now!

It might be more interesting to know why x32 is important enough for people to work on it; to me, it seems the main reason is that it saves a lot of memory in C++ programs, simply because every class and every object has so many pointers (functions, virtual functions and so on and so forth) that the change from 32- to 64-bit was a big enough hit. Given that there is so much software still written in C++ (I’m unconvinced as to why, to be honest), it’s likely that there is enough interest in working on this to improve performance.

But at the end of the day, I’m really concerned that this might not be worth the effort: we’re bringing upon ourselves another big multilib problem (considering we never really solved multilib for the “classic” amd64, and that Debian is still working hard to get their multiarch method out of the door), plus a number of software porting problems that will keep a lot of people busy for years to come. The effort would probably be better directed at improving the current software and moving everything to pure 64-bit.

On a final note, of course libav works if you disable all the hand-written assembly code, as everything is also available in pure C. But many routines can be ten or more times slower in pure C than when using AVX, for instance, which means that even if you see a noticeable improvement going from amd64 to x32, you’re going to lose more by losing the assembly.

My opinion? It’s not worth it.

The virt-manager pain

So I’m still trying to come up with a decent way to parse those logs. Actually I lost a whole batch of them because I forgot to rename them before processing, but that made me realize that the best thing I can do is process the logs outside of the tinderbox itself. But this is a topic for another time.

Another thing I’m trying to set up is a virtual machine to test the new x32 ABI that Mike has made available in Gentoo. This is important for the tinderbox as well as for FATE (which is already being run for a standard Gentoo Hardened setup on that very same hardware). Unfortunately I can’t use LXC for this yet — simply because the kernel currently running does not support x32 executables.

This means that I have to use KVM and go full-virtualisation. Thanks to Tim, I’ve got a modified SysRescueCD ISO that should let me take care of the install. Unfortunately, this is not easy to deal with for a number of reasons, still.

The first is that virt-manager is just slow and painful, like some kind of slow and painful death. The whole idea of using a frontend that connects through SSH is that you shouldn’t “feel” the lag… but virt-manager makes you feel the lag even more than a command-line SSH connection. I’m under the impression that the people who work on this kind of stuff only ever tried it on a local connection, and never from the other side of the world. I mean, I understand you might have concurrency issues, but do you really have to make me wait two minutes to switch from CPU settings to Memory settings to Disk settings when editing the VM?

The second issue is that, even though I was able to set up a testing VM for x32… qemu doesn’t like the additional instruction sets (SIMD) that Bulldozer comes with; something within the C library causes every x32 binary to be killed with SIGILL (Illegal Instruction). The problem is likely in some of the indirect-binding functions (IFUNCs) that are being used — my guess, based on the NEWS file of the 2.15 release, is strcasecmp(), which has been optimised through the use of SSE4.2 and AVX (both of which are available on that server) — I have a ¾-written, half-drawn post about this kind of optimisation in my queue, I’ll see if I can post it over the weekend.
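
For the curious, this is roughly how an indirect-binding (IFUNC) symbol works; a minimal sketch with hypothetical names, not glibc’s actual code:

    #include <strings.h>

    static int my_strcasecmp_generic(const char *a, const char *b)
    {
        return strcasecmp(a, b);  /* portable fallback */
    }

    static int my_strcasecmp_sse42(const char *a, const char *b)
    {
        /* stand-in for a hand-optimised SSE4.2 implementation */
        return strcasecmp(a, b);
    }

    /* The resolver runs in the dynamic loader: if the CPU advertises a
       feature that the emulator does not actually implement, the chosen
       implementation later dies with SIGILL, as described above. */
    static void *resolve_strcasecmp(void)
    {
        __builtin_cpu_init();
        if (__builtin_cpu_supports("sse4.2"))
            return (void *)my_strcasecmp_sse42;
        return (void *)my_strcasecmp_generic;
    }

    int my_strcasecmp(const char *a, const char *b)
        __attribute__((ifunc("resolve_strcasecmp")));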

The end result is that I spent the better part of three hours on virt-manager before accepting that the way to go is to update the host’s kernel and just run the usual container. Just so you know, the final step that “creates” the VM (which is not the LVM allocation step!) took over half an hour. This is silly: what the heck was it doing all that time?

Oh, and yes, two years later virt-manager still keeps defunct ssh processes around because it never reaps them (check the comments).

Right now I’m trying to get this to work with LXC, but I’m not having much luck either; and yes, I did update the init script to correctly handle x32 containers, which didn’t work before… it might have some problems if you’re going to use it on SPARC, because I’m not handling those properly yet, but that is (again) a topic for another time.