What could have been. A time travel story of x32 and FatELF

I had been toying with the idea of writing about this for a week or two now, and I discussed it with Luca as well, but my main issue has been coming up with a title this time. The original title I had in mind was a Marvel-style What If… (“… x32 and FatELF arrived nine years ago”), but then Randall beat me to it with his hypothetical-question-answering site.

So let’s put some background in context: in the past I criticised Ryan’s FatELF idea, and recently Intel’s x32 — but would I have criticised them the same way if the two of them had come up, almost together, nine years ago? I don’t think so; and this is the main issue with this kind of idea: you need perfect timing, or it isn’t worth much. Both of them came out at the wrong time, in my opinion.

So here’s my scenario that didn’t happen and can’t happen now.

It’s 2003, and AMD has just launched their Opteron CPU, the first to sport the new x86-64 ISA. Intel, after admitting to the failure of their Itanic Itanium project, releases IA-32e and follows suit. At the same time, though, they decide that AMD’s route of a pure 64-bit architecture for desktops and servers is going to take too long to be production-ready, especially on Linux, as even just discussing multilib support is making LSB show its frailty.

They thus decide to introduce a new ABI, called x32 (in contrast with x64, used by Sun and Microsoft to refer to AMD’s original ABI). At the same time they decide to hire Ryan Gordon to push forward a sketchy idea he proposed about supporting Apple-style fat binaries in Linux — the idea was originally ignored because nobody expected any use out of a technique used in 1994 to move from M68k to PowerPC and then left to die with no further use.

The combined energy of Intel and Ryan comes up with a plan for introducing the new ABI, and thus the new architecture, in an almost painless way. A new C library version is to be introduced in Linux, partially breaking compatibility with the current libc.so.6, but this is for the greater good.

The new version of the C library, glibc 3.0, will bring a libc.so.7 for the two architectures, introducing a few incompatibilities in the declarations. First of all, optional largefile support is dropped: both ABIs will use only a 64-bit off_t, and to avoid the same kind of apocalyptic feeling as Y2K, they also decide to use a 64-bit time_t.

These changes make a very slight dent in the usual x86 performance, but this is not really visible in the new x32 ABI. Most importantly, Intel wanted to avoid the huge amount of work required to port to either IA-64 or AMD64, by creating an ILP32 ABI — one where int, long and void * are all 32-bit. And here’s where Ryan’s idea comes to fruition.

Source code written in C will compile identically between x86 and x32, and thanks to the changes aligning the size of some primitive standard types, even more complex data types will be identical between the two. The new FatELF extended format introduced by glibc 3.0 leverages this — original x86 code will be emitted in the .text section of the ELF, while the new code will live in .text32 — and all the data, string and symbol tables are kept in a single copy. The dynamic loader can then map one or the other section depending on whether the CPU supports 64-bit instructions and all the dependencies are available on the new ABI.

Intel seems to have turned the tables under AMD’s nose with this idea, thanks to the vastly negative experience with Itanium: the required changes to the compiler and loader are really minimal, and most software will just build on the new ABI without any extra change, since it maintains most of the data sizes of the currently most widespread architecture (the only changes are off_t behaving as if largefile were enabled by default, and time_t being extended). Of course this still requires vast porting of assembly-ridden software such as libav and of most interpreter-based software, but all of this can easily happen over time thanks to Ryan’s FatELF design.

Dissolve effect running

Yes, too bad that Intel took their dear time to enter the x86-64 market, and even longer to come up with x32, to the point where now most of the software is already ported, and supporting x32 means doing most of the work again. Plus, since they don’t plan on making a new version of the C library available on x86 with the same data sizes as x32, the idea of actually sharing the ELF data and overhead is out of the question (the symbol table as well, since x86 still has the open/open64 split, which in my fantasy is actually gone!) — and Ryan’s own implementation of FatELF was a bit of an over-achiever, as it doesn’t actually share anything between one architecture and the other.

So unfortunately this is not something viable to implement now (it’s way too late), and it’s not something that was implemented then — and the result is a very messed up situation.

There’s ABI and ABI

With all this talk about x32 there are people who might not know what we’re referring to when we talk about ABI. Indeed this term, much like its sibling API, is so overloaded with multiple meanings that the only way to know what someone is referring to is understanding in which of the many contexts it’s being used.

It’s not that I haven’t talked about ABI before but I think it’s the first time I talk about it in this context.

Let’s start from the meaning of the two acronyms:

  • API stands for Application Programming Interface;
  • ABI stands for Application Binary Interface.

The whole idea is that the API is what the humans are concerned with, and the ABI is what the computers are concerned with. But I have to repeat that what these two mean depends vastly on the context in which you use them.

For instance, what I usually talk about is the ABI of a shared object, which is a very limited subset of what we talk about in the context of x32. In that context, the term ABI refers to the “compiled API”, which is often mistaken for the object’s symbol table, although it includes more details: the ordered content of the transparent structures, the order and size of the parameters in functions’ signatures, and the meaning of said parameters and of the return value (that’s why we recently had trouble with libnetlink changing its return values, which caused NetworkManager to fail).

When we call x32 and amd64 ABIs of the x86-64 architecture, instead, we refer to the interface between a few more components. While I don’t know of a sure, all-covering definition, the interfaces involved in this kind of ABI are those between kernel and userspace (the syscalls), the actual ELF variant used (in this case a 32-bit class, x86-64 arch ELF file), the size of the primitive types as declared by the compiler (long, void *, int, …), the size of the typedefs from the standard library, the ordered content of the standard transparent structures, and, probably most importantly, the calling convention for functions. Okay, there are a few more things in the mix, such as symbol resolution and details like those, but the main points are here, I think.

Now, among all the things I noted above, there is one that can be extracted without having to change the whole architecture ABI — the C library ABI: the symbol table, the typedefs, the ordered content of transparent structures, and so on. That is the limited concept of shared object ABI applied to the C library object itself. This kind of change still requires a lot of work, among other reasons because of the way glibc works, and will likely require replacing a number of libraries, modules, the loader, and more.

Why do I single out this kind of change? Well, while this would also have caused trouble with binaries, the same way the introduction of a new architecture did, there is an interesting “What if” scenario: what if Ryan’s FatELF and Intel’s x32 ABI had happened nine years ago, and people had been keen on breaking the C library ABI for the good old x86 at the time?

In such a condition, with the two ABIs being both ILP32 style (which means that int, long and void* are 32-bit), if the rest of the C library ABI was the same between the two, a modified version of Ryan’s FatELF approach – one where the data sections are shared – could have been quite successful!

But let it be clear: this is not going to happen as things stand now. Changing the C library ABI for x86 at this point is a script worthy of Monty Python, and the new x32 ABI corrects some of the obvious problems present in x86 itself — namely the use of a 32-bit off_t (which restricts the size of files) and of a 32-bit time_t (which causes the Y2K38 problem) — leaving the two ABIs with widely incompatible data structures.

Last few notes about x32

So my previous posts were picked up by none other than LWN.net — it was quite impressive to see their tweet picking up my blog post; it’s the first time, although I have authored articles for them before.

Now, in the comments on the articles and on LWN’s own signalling of them, you can find a lot of discussion about the merits of x32, and a little of it tries to paint me as uninformed. I would like to say a few words about that right now, so that I don’t have to go through this later on. I’ve been toying around with ELF, x86-64, PIC and structure optimisation for a very long time. I’ll come back in a moment to why I didn’t do a more thorough analysis and my own benchmarks of the architecture, but if you really think I’m just an amateur because I work on Gentoo Linux and not Fedora or Ubuntu, please think again. I might not be one of the “greats”, but I don’t think I’d be boasting if I said that I know what I’m doing — most of the time, at least.

So why did I not do my own benchmarks to show the numbers of (non-)improvement on x32? Because for me it would be time wasted. I’m not Phoronix, I don’t benchmark stuff for a living, and I’m neither proposing the ABI nor going to work on it myself. I looked into the new ABI because, on one side, it’s always cool to learn about new techniques and technology, even when they sound a little over the top (I did look a lot into FatELF as well, and I was very negative about it — I hope Ryan doesn’t hold a grudge against me, as I’m sure I was quite unlikeable from his point of view), and, on the other, because my colleague Luca suggested it could be useful to get some more performance out of a device we’re working on.

Now, said device is embedded, runs Gentoo Linux, and needs libav and x264 – I’m not going to give you any more specifics about it – which is why my first test on the new ABI was with libav (and I found it requiring far more work than would make sense for us). Looking into it also showed me that some of the assumptions I made about how the new ABI would be designed were wrong; for instance, the fact that long is still 32-bit surprised me.

I’ve been told my arguments are “strawmen” because I singled out some specific topics instead of doing a top-down analysis — as the title of my post, and the reference to my old ccache article, should have suggested, I was looking into some of the things I’ve been discussing, or have been told. The only exception to that was my answer to “x32 is going to be compatible with x86, if not now then in the future.” I have talked with nobody about this, but I’ve seen this kind of misconception floating around, especially at the time of the FatELF proposal: the idea of a 64-bit ABI that would be binary compatible with good old 32-bit x86.

The purported reason for having such an ABI would be being able to load 32-bit closed-source libraries into the address space of 64-bit programs or vice-versa. The idea is that this way the copy of Skype I’m running wouldn’t be loading into my memory a copy of the 32-bit libc.so.6 library, which is used by no other process.

If it feels like my posts have been aimed squarely at the Gentoo folks, that might very well be right, although it was not the intention. Most people who look into new ABIs as they come out are probably on the same page as most Gentoo users, with their bleeding-edge feeling — if you have only production Fedora installs, you really won’t care much about an ABI Fedora isn’t released for yet! And given Mike made us the first distribution releasing something for the ABI, it feels right to discuss Gentoo issues first.

Now, I’ve also been told that I didn’t talk enough about the reduction in size of data structures, which improves the use of the data cache (not the instruction cache, as Francesco said in the comments on the first article), and from that people got the impression that I don’t know how much of a difference it makes… that would be wrong, given that I’ve actually discussed methods to minimize data usage and have spent time writing a tool to reduce copy-on-write, even when that means making changes for ludicrously small improvements.

I have also been working closely with codiff and pahole from Arnaldo’s dwarves package to make sure that the software I manage has properly-designed structures, not only reducing the size of the single object, but making sure that attributes that are used together are grouped nearby — this is pretty important for data cache handling, and may go against what most people are told in school, here at least: that attributes in classes have to be ordered semantically, not by use.

On a different note, it would be nice if it were possible to tell the compiler that a given structure never leaves the object, so that it could reorder it as needed to get the best performance — but that would also require each unit to reorder it consistently. Never mind.

There are some other interesting things to consider as well — if you need fast access to objects in an array, you might be interested in using a little more memory to make sure the object’s size is a power of two, so that instead of using expensive multiplications you can use left shifts to calculate the offset of a given index from the base pointer.

I know that reducing the size of pointers and of long will reduce the pressure on the data cache, which in turn means faster pointer chasing and better access to things like linked lists and so on — on the other hand, I don’t think this improvement is worth all the compatibility and porting headaches that a new ABI involves, especially considering that, as we move along, more and more software will make better use of the 64-bit address space, as developers start to understand that they have to drop the old designs and paradigms of scores of years ago and replace them with modern design; Poul-Henning Kamp, of FreeBSD and Varnish fame, said it very well in the linked ACM article.

So to sum it up: I still don’t think x32 is worth my time, whether it is for porting, bug-filing or benchmarking. Of course if somebody gets libav to work on x32 I’ll be the first person to set up a FATE instance for it, and if Gentoo decides to make it a first-class citizen I’ll set up a tinderbox instance for it, but … I sure hope I won’t have to spend more time on it.

What I think I’ll spend some time on in the next few days, which I started thinking about after all the comments, is a few posts describing things such as what an ABI actually is in this context, and how to see whether your structures are simply inadequate for what you’re trying to do. It might get interesting.

And to finish this off, I know I use “Now,” to start paragraphs way too often — I guess this is the reason why O’Reilly wouldn’t consider me as an author.

Is x32 for me? Short answer: no.

For the long answer keep reading.

So, thanks to Anthony and the PaX team, yesterday I was able to set up the first x32 testing system — the one I lamented last week I was unable to get hold of. The reason why it wasn’t working with LXC was actually quite simple: the RANDMMAP feature of the hardened kernel, responsible for the stronger ASLR, was moving the load address of x32 binaries outside the 32-bit range of x32 pointers. Version 3.4.3 finally solves the issue, so I could set it up properly.

The first step was the hard one: trying out libav. This is definitely an interesting and valuable test, simply because libav has so many hand-crafted assembly routines that it really shouldn’t suffer much from the otherwise slower amd64 ABI, and at the same time, Måns was sure that it wouldn’t work out of the box — which is indeed the case. On one side, YASM still doesn’t support the new ABI, which means that everything relying on it, in libav, x264 and other projects, won’t work; on the other side, the inline asm (which is handled by GCC) is now a completely different “third way” from the usual 32-bit x86 and the 64-bit amd64.

While I was considering setting up a tbx32 instance to see what would actually work, the answer right now is “way too little”; heck, even Ruby 1.9 doesn’t work right now, because it uses some inline asm that no longer works.

More interesting is that the usual ways to discern which architecture one is on are going to fail, badly:

  • sizeof(long) and sizeof(void*) are both 4 (which means that both types are 32-bit), like on x86;
  • __x86_64__ is defined just like on amd64, and there is no x32-specific define; the best you can do is to check for __x86_64__ and __SIZEOF_LONG__ == 4 (or __ILP32__) at the same time — edit: yes, I knew that there had to be a define more specific than that, I just didn’t bother to look it up before; the point about needing two checks still stands, mmkay?

What does this mean? It simply means that, considering it took us years to have a working amd64 system (which was in many ways a “pure” new architecture, easily discerned from the older x86), we’re going to spend some more years trying to get x32 working… and all for what? To have a smaller address range and thus smaller pointers, to save on memory usage and memory bandwidth… by the time x32 is ready, I’d be ready to bet that neither concern will be that important — heck, I don’t have a single computer with less than 8GB of RAM right now!

It might be more interesting to ask why x32 is so important to enough people that they work on it; to me, it seems like the main reason is that it saves a lot of memory in C++ programs, simply because every class and every object carries so many pointers (functions, virtual functions, and so on and so forth) that the change from 32- to 64-bit was a big enough hit. Given that there is so much software still written in C++ (I’m unconvinced as to why, to be honest), it’s likely that there is enough interest in working on this to improve performance.

But at the end of the day, I’m really concerned that this might not be worth the effort: we’re bringing upon ourselves another big multilib problem (considering we never really solved multilib for the “classic” amd64, and that Debian is still working hard to get their multiarch method out the door), plus a number of software porting problems that will keep a lot of people busy for years to come. The effort would probably be better directed at improving the current software and moving everything to pure 64-bit.

On a final note: of course libav works if you disable all the hand-written assembly code, as everything is also available in pure C. But many routines can be ten or more times slower in pure C than using AVX, for instance, which means that even if you get a noticeable improvement going from amd64 to x32, you’re going to lose more by losing the assembly.

My opinion? It’s not worth it.

A “new” Tinderbox

While I’m still hoping for somebody to fund the PAM audit and fixup (remember: if you rely on PAM on your Gentoo systems, you really want somebody to do the work!), and even though I have to reduce the tinderbox costs, I have some pretty cool news for many of you out there.

Up to now, the tinderbox has been running with the unstable/testing keywords for x86. The new tinderbox, which I simply called tinderbox64, uses the unstable/testing keywords for amd64 instead.

The new tinderbox is now testing ~amd64 rather than ~x86.

Why did I decide to go this route? Well, while the 64-bit builds require more space and time, I thought a bit about it: even the stuff I introduce does not get keyworded ~x86 right away, so the old tinderbox was skipping tests on my own packages! Besides, with even my router moving to 64-bit to get the best out of hardened, I’m starting to think x86 is not really relevant for anything nowadays.

It’s not all there, of course; there are a number of issues that only appear on 64-bit (well, there are almost as many that only appear on 32-bit, but for now let me focus on the former): integer and buffer overflows, implicit function declarations that truncate pointers to integers, 64-bit unsafety that makes packages fail to build… All these conditions are more relevant because 64-bit is what you should be using on most modern systems, so it should be the one tested, even more than ~x86.

Now, of course it would be better to have both tinderboxes running, and I think I could get the two of them to run in parallel, but then I’d need a new “frontend” system, one I could use both for storage and for virtual machine hosting; something a little beefier than the laptop I’m using, mostly in terms of RAM, would be quite nice (the i7 performs quite nicely, but 4GB of RAM is just too little to play with KVM). But even if I could afford to buy a new frontend now (I cannot), it would still mean a higher monthly cost in power. Right now I can roughly estimate that between power and maintenance costs (harddisks, UPSes, network connection), running the tinderbox costs me between €150 and €200/month, which is not something I can easily afford, especially considering that last year, net of taxes and most expenses, I had an income of €500/month to pay for groceries and food. Whoopsie. And this is obviously without counting the time I spend manually reviewing the results, or fixing them.

Anyway, expect another flood of bugs once the tinderbox gets up to speed again; for now, it might find a few more problems that it previously ignored, since it started building from scratch. And while the 32-bit filesystem is frozen, I’ll probably find some time to run again the collision-detection script that is part of Ruby-ELF, which is supposed to find possible collisions between libraries and the like — something particularly important to take into consideration, as those bugs tend to be the most complex to debug.

Multi-architecture, theory versus practice

You probably remember the whole thing about FatELF and my assertion that FatELF does nothing to solve what the users supporting it want to see solved: multiple-architecture support by vendors. Since I don’t want to be taken for one of those people who throw out an assertion and pretend everybody falls in line with it, I’d like to explain somewhat further what the problem is, in my opinion.

As I said before, even if FatELF could simplify deployment (at the expense of increasing exponentially the complexity of every other part of the operating system that deals with executables and libraries), it does nothing to solve a much more important problem, one that has to be solved before you can even think of achieving multi-architecture support from vendors: development.

Now, in theory it’s pretty easy to write multi-architecture code: you make no use of any machine-dependent feature, no inline assembly, no function calls outside the scope of a standard. But is it possible for sophisticated code to stay that way? It often is not, even for open source software, even when it already supports multiple architectures and multiple software platforms. You can find that even OpenOffice requires a not-so-trivial porting effort to support Linux/HPPA, and that’s a piece of software that, while deriving from a proprietary suite (and having been handled by Sun, which is quite well known for messy build systems), has been heavily hacked on by a large community of developers, and already includes stable support for 64-bit architectures.

Now try to be a bit imaginative, and find yourself working on a piece of proprietary code: you’ve already allocated money to support Linux, which is, from many points of view, a fringe operating system. Sure, it is starting to increase in popularity, but then again a lot of those using it won’t run proprietary applications anyway… or wouldn’t pay for them. (And let’s not even start with the argument that Chrome OS will bring a lot more users to Linux, since that’s already been shown to be a moot point.) Most likely, at this point you are looking at supporting a relatively small subset of Linux users; it’s not just a matter of differences between distributions, it’s a way to cut down testing time: if it works on unsupported distributions, fine, but you won’t go out of your way for them; the common “enterprisey” distributions are fine for that.

Now, at the end of the nineties or the beginning of the current decade, you wouldn’t have had to think much in terms of architectures either: using Linux on anything but x86 mostly required lots of effort (and led to instability). In all cases you had to “eradicate” the official operating system of the platform, which meant Windows for x86, Solaris for SPARC and Mac OS for PPC; but while the former was quite obvious, the others still required more work, since they were developed by the makers of the hardware in the first place.

Nowadays, it is true that things have changed, but how exactly did they change? The major change is definitely the introduction of the AMD64 (or x86-64, if you prefer) architecture: an extension to the “good” old x86 that supports 64-bit addresses. This alone created quite a few problems. On one side, since it allows compatibility with old x86 software, proprietary commercial software didn’t flock to support it very fast: after all, their software could still be used, even though it required further support from the distributions (multilib support, that is). On the other side, multilib was previously something that only a few niche architectures like MIPS looked out for, so support for it wasn’t as ready in most distributions.

And, to put the cherry on top, users started insisting that some software be available natively for x86-64 systems, so that it would be more compatible, or at least shinier in their eyes; Java, Flash Player, and the like had to be ported over. But here we reach the point where theory (or, if you’re definitely cynical – like me – fantasy) clashes with practice: making Flash work on AMD64 systems didn’t just involve calling another compiler, as many people think, partly because the technologies weren’t all available for Adobe to rebuild, and partly because the code made assumptions about the architecture it was working on.

Let’s be honest: it’s hypocritical to say that Free Software developers don’t make such assumptions; it’s more that porters and distributions fixed their code a long time ago. Proprietary software does not get this kind of peer review, and its vendors are, generally, not interested in it: it takes time, it takes effort, and thus it takes money. And that money does not generally come out of architectures like Alpha or MIPS. And I’m not calling out these two without reason: they are the two architectures that actually allowed some porting to be done for AMD64 before its time. The former was probably the most readily available 64-bit system Linux worked decently on (SPARC64 is a long story), and had code requirements very similar to x86-64 in terms of both pointer size and PIC libraries; the latter had the first implementations of multilib around.

But again, handling endianness correctly (and did you know that MIPS, ARM and PowerPC all exist in multiple endian variations?), making sure that pointers are not assumed to be of any particular size, and never using asm-only routines is simply not enough to ensure your software will work on any particular architecture. There are many problems, some of which are solvable by changing engineering procedures, and some of which are simply not solvable without spending extra time debugging on that architecture.

For instance, if you’ve got a hand-optimised x86-only assembly routine, and replacement C code for the other architectures, that C code is unlikely to get tested as much as the x86 code if your development focuses simply on x86. And I’m not kidding when I say that this is not such a rare thing to happen, even in Free Software projects. Bugs in that piece of code will be tricky to identify unless you add the testing and support of that particular architecture to your development process; which, trust me, is not simple.

Similarly, you can think of the strict aliasing problem: GCC 4.4 introduced further optimisations that can make use of strict aliasing assumptions on x86 as well; before, this feature was mostly relevant to other architectures. Interestingly enough, the number of strict-aliasing bugs out there is definitely not trivial, and they will cause spurious failures at runtime. Again, this is something you can only fix by properly testing, and debugging, on different architectures. Even though some failures now happen on x86 too, this does not mean that the same problems happen, no more and no less, everywhere else. And you need to add your compiler’s bugs to the mix, which is also not so simple.

And all of this only covers the problems with the code itself; it comes nowhere near the problems of cross-compilation, the problems and bugs that can be in your dependencies’ code on other architectures, or the availability of stable-interface distributions for those architectures (how many architectures is RHEL available for?).

After all this, do you still think that the only problem keeping vendors from supporting multiple architectures is the lack of a “Universal Binary” feature? Really? If so, I have some fresh air to sell you.

Coming soon, in a sync tree near you

I’ve been meaning to write about the 32-bit emulation libraries for a while, with all the problems they come with, including a lot of security problems. The chance comes now, since last night solar pushed (not yet to the tree, though) the new test versions of the new emul-linux ebuilds.

The first thing to know is that at the moment the whole set of emulation libraries is a collection of security bugs; I cannot even start to count how many issues there are, but for sure there are two for PAM, the two that caused the recent stabilisation of the 1.0.4 version. So if you need a system where security comes first, you should not use the multilib profile for AMD64 and the emul-linux libraries. You can still use it without emul-linux for stuff like grub, but that should be it; anything bringing in emul-linux will bring in security-impaired software.

On the other hand, thanks to Daniel’s work on the 32-bit userland, it’s likely that building these libraries will become less of a problem. Some of the fixes are already in, others will go in in time; I fixed Perl and Qt4 last night, and the next step is to add a sub-profile for building those libraries.

There is more work that needs to be done, though; for instance, PulseAudio needs a way to disable the whole daemon part and just build the library used by client software, which is what you want both for the 32-bit compatibility library list and for daemon-less clients (for instance, when you just want to send audio to a remote host). This will require some changes in PulseAudio itself and will thus have to wait for the next release, and probably the one after that. I could start working on it already (I would probably be able to get it to work before the end of the day), but since I cannot even use it yet, it doesn’t make much sense: the latest test versions fail nastily on my system, and I was unable to ask Lennart about them, since he’s currently at BOSSA.

Hopefully, one day, the emul-linux libraries won’t be needed at all, and instead we’ll be building the packages directly, in either 32-bit or 64-bit mode, as needed. True multilib support is currently one of Gentoo’s most important missing features; on the other hand, it seems that what gets discussed for implementation is stuff that matters relatively little to users, and a lot to deciding “who is right” about implementation details.

Frankly, I really wish we were finally discussing a way to express the difference between same-ABI and any-ABI dependencies between packages. It would be quite useful, especially if the concept of ABI were expanded to also encompass implementations like Ruby 1.8, Ruby 1.9, JRuby and so on, so that we could all decide whether to build dev-ruby/ruby2ruby (with its dependencies) for one, the other, or all three together; or whether to install PulseAudio as 64-bit only or as both 64-bit and 32-bit.
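To illustrate the idea, here is a purely hypothetical sketch of what such a syntax could look like in an ebuild; the `[abi=…]` notation is my invention for this example, not anything Portage actually supports:

```shell
# Hypothetical dependency syntax, for illustration only; NOT valid Portage syntax.
RDEPEND="
	media-sound/pulseaudio[abi=same]  # we link to its library: must match our ABI
	app-arch/bzip2[abi=any]           # we only call the tool: any ABI will do
"
```

The point is simply that the package manager, not the human, would then know which dependencies have to be rebuilt when a second ABI is requested.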

Changes coming for FreeBSD 6.2

So, FreeBSD 6.2 is getting ready day after day; BETA2 was released a couple of days ago, and the ebuilds are now in portage. I’m still updating Defiant, so they are not entirely ready to go just yet.

But let’s see what the changes are for Gentoo in this release. First of all there’s one I already talked about: the use of the standard baselayout package rather than freebsd-baselayout. Roy did an excellent job, and this means less work for me to make sure we stay up to date with respect to baselayout :) Unfortunately, it also means that to update to 6.2 you need to force baselayout to overwrite freebsd-baselayout, and then remove freebsd-baselayout.
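For reference, one plausible sequence for the switch could look like the following; this is only a sketch of my understanding (the use of FEATURES="-collision-protect" to allow the overwrite is an assumption), so check the official migration instructions before running anything:

```shell
# Let baselayout overwrite the files currently owned by freebsd-baselayout
FEATURES="-collision-protect" emerge --oneshot baselayout

# Then remove the old package; verify afterwards that the files
# installed by the new baselayout survived the unmerge
emerge --unmerge freebsd-baselayout
```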

Then there’s Daijo, who is becoming a developer and will be working on supporting the AMD64 platform in Gentoo/FreeBSD, while Roy started working on SPARC64 support (which is likely more stable than AMD64, considering its age :) ). I’ll be really happy to see that we have enough portability to get three different architectures working fine on Gentoo/FreeBSD.

But what I’m working on right now is a behind-the-scenes change, one that will improve the integration between Gentoo/FreeBSD and the rest of Gentoo, and improve our citizenship status: stage building with catalyst. Right now the stages I built were created simply with a ROOT variable change and an emerge script. The result is somewhat usable, but it’s not exactly the cleanest thing out there. It’s also a redundant set of scripts, not to mention they are anything but bug-free, which is why you see the various attempts at getting the stage right.

The problem with the latter change is that we don’t have a direct equivalent of mount --bind in FreeBSD, although the same result can easily be achieved using unionfs. The problem is that the default implementation of unionfs in FreeBSD 6 seems to be pretty broken, at least judging by this page; I cannot mount more than one level of unionfs, with the result that catalyst does not even initialise cleanly. I’ll be checking the patch in freebsd-sources as soon as the beta2 update is done here, and I hope it will work. If it does not, it will be the start of some hacking sessions on Catalyst until I can make it work :D
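For comparison, this is roughly the operation in question; the Linux side is what catalyst relies on, and the FreeBSD side is a sketch of the unionfs approximation (the mount points are just examples):

```shell
# Linux: bind-mount the Portage tree into the stage chroot
mount --bind /usr/portage /mnt/stage/usr/portage

# FreeBSD: no bind mounts, so layer the directory on top of the
# chroot's copy with a union mount instead
mount_unionfs /usr/portage /mnt/stage/usr/portage
```

It is when catalyst starts stacking more than one of these union mounts that the FreeBSD 6 implementation falls over for me.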

Of course, today I’m not limiting myself to Gentoo/FreeBSD: I was called about my new job yesterday, and today I’ll be confirming a meeting for next week; so in the coming days I’ll be trying to clean up my stuff, to leave a clean floor for anyone who might help out once I’m employed during the working week :)
But feel free to sleep soundly; I’m not going anywhere anytime soon (unless someone wants to get rid of me).

Maybe I’ll also try to write the famous entry about books that I promised a month ago now…

Unplanned downtime

So, my blog was mostly down from last night until late this afternoon.

For those who missed it, it was due to the nano-hurricane we had here last night: certainly not at the level Americans are used to, but more of a storm than we usually get here. At my house we didn’t suffer major damage, but there is now a broken awning, and a lot of branches fell from the trees around here.

With all that happened (and it was bad, with that icy rain), the power company left us without power for three hours, which isn’t bad considering the fallen trees and so on… in a not-so-distant town they were without power the whole night, until morning.

But with the cooler night, after two weeks of hot, too hot nights (unfortunately, only in the literal temperature sense ;P), I literally fell asleep before the power was back up, and I slept well until 8am. At that point, although I did bring the server back up, I forgot to run ddclient until the afternoon, because I was busy helping a friend of mine set up her new laptop (how many times have you seen pre-installed software containing a trojan horse?).

Anyway, farragut is now back up, hopefully for a good while, as I’m going away next month as I said, and it would be bad if I had to fix it :/

And due to a few unexpected circumstances, I’m afraid I won’t have a new job anytime soon, meaning I’m probably not going to upgrade the CPU here :(

And on a more Gentoo-related note, today I removed my im.gentoo.org account from Kopete, as lately I was having lots of trouble with it; it was annoying enough to have to enter the password three times before getting a login. Now I’ll be using the Google Talk server as my only Jabber account.

My thoughts on stable markings

It sounds like today I have to write another post about matters currently under discussion, although I usually try to avoid this kind of post, focusing instead on technicalities, or discussions about Freedom, or in general on things being done rather than things being discussed.

So, there’s a lot of fuss about stable markings lately, especially since, after the x86 arch team was created, x86 stable markings also take their time, while before they were done whenever the developers felt like it.

This of course slowed down the x86 stable tree, but at the same time we now have a stable tree that is almost really “stable”. Unfortunately, although it started out more quickly, the rate of stable markings is now slowing down, not only for x86 but also for amd64.

The reasons? Too many people not caring about the stable tree and deciding to use ~arch directly, developers who don’t have stable chroots, and so it becomes a domino effect: the less the stable tree is used, the fewer stable markings can be done.

Myself, I actually had a stable chroot at the time, but then I had to drop it, for the simple reason that I didn’t have enough time to keep it as up to date as my main system.

So, although I don’t pretend to know the answer to this, I have a suggestion: ATs, when you become devs (because most of you will), don’t drop your stable systems, don’t drop your stable chroots, but rather help mark stuff stable. For AMD64 in particular I see a really bad trend :'( stable markings have almost ground to a halt lately… while last year it was the first arch to mark things stable.

Myself, I’ve decided that if I’m able to upgrade the CPU of this box (to an Athlon 64 X2 4600+, which has the not-so-high price of €245, plus €89 for a 1GB RAM stick), or rather, if I’m able to get another job to pay for it (I’m going to upgrade the CPU for sure if I get enough money), then I’ll maintain an amd64 stable chroot once again, so that at least the software I usually maintain I’ll be able to test and mark stable when everyone else can’t.
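For anyone wanting to do the same, setting up such a chroot is not much work; a rough sketch, with placeholder paths and stage file name:

```shell
# Unpack a stage3 into the chroot directory (path and tarball name are examples)
mkdir -p /var/chroot/stable-amd64
tar xjpf stage3-amd64-latest.tar.bz2 -C /var/chroot/stable-amd64

# Share the Portage tree and mount /proc inside it
mount --bind /usr/portage /var/chroot/stable-amd64/usr/portage
mount -t proc proc /var/chroot/stable-amd64/proc

# Enter the chroot; leaving ACCEPT_KEYWORDS unset in make.conf means
# only stable ("amd64") keywords are accepted there
chroot /var/chroot/stable-amd64 /bin/bash
```

From there, keeping it updated is just a matter of running emerge inside it every now and then, which is exactly the part I didn’t find the time for.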

Anyway, I hope this is the last post I write following a discussion on gentoo-dev, as it’s boring for me, as well as for those who don’t read gentoo-dev :P