For the long answer keep reading.
So thanks to Anthony and the PaX team, yesterday I was able to set up the first x32 testing system — the one I lamented last week I couldn’t get a hold of. The reason why it wasn’t working with LXC was actually quite simple: the RANDMMAP feature of the hardened kernel, responsible for the stronger ASLR, was moving the load address of x32 binaries outside the 32-bit range of x32 pointers. Version 3.4.3 finally solves the issue, so I could set it up properly.
The first step was also the hard one: trying out libav. This is definitely an interesting and valuable test, simply because libav has so many hand-crafted assembly routines that it really shouldn’t suffer much from the otherwise slower amd64 ABI; at the same time, Måns was sure that it wouldn’t work out of the box — which is indeed the case. On one side, YASM still doesn’t support the new ABI, which means that everything relying on it in libav, x264 and other projects won’t work; on the other side, the inline asm (which is handled by GCC) is now a completely different “third way” from the usual 32-bit x86 and the 64-bit amd64.
While I was considering setting up a tbx32 instance to see what would actually work, the answer right now is “way too little”; heck, even Ruby 1.9 doesn’t work at the moment, because it uses some inline asm that no longer works.
More interesting is that the usual ways to discern which architecture one is on are going to fail, badly: sizeof(long) and sizeof(void*) are both 4 (which means that both types are 32-bit), like on x86; __x86_64__ is defined just like on amd64 and there is no define specific to x32; the best you can do is to check for __x86_64__ and __SIZEOF_LONG__ == 4 (or __ILP32__) at the same time — edit: yes, I knew that there had to be one define more specific than that, I just didn’t bother to look it up before; the point of needing two checks still stands, mmkay?
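For example, a compile-time check along these lines should do the trick (just a sketch: the TARGET_* names are made up for illustration, only __x86_64__, __ILP32__ and __SIZEOF_LONG__ are the real GCC-provided macros):

    /* Sketch of a compile-time architecture check, relying on GCC's
     * predefined macros: plain amd64 defines __x86_64__ with a 64-bit
     * long, while x32 keeps __x86_64__ but adds __ILP32__ (and
     * __SIZEOF_LONG__ is 4). */
    #if defined(__x86_64__) && (defined(__ILP32__) || __SIZEOF_LONG__ == 4)
    #  define TARGET_X32 1    /* x32: 64-bit instruction set, 32-bit pointers */
    #elif defined(__x86_64__)
    #  define TARGET_AMD64 1  /* classic 64-bit amd64 */
    #elif defined(__i386__)
    #  define TARGET_X86 1    /* plain 32-bit x86 */
    #endif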
What does this mean? It simply means that, considering it took us years to have a working amd64 system, which was in many ways a “pure” new architecture that could be easily discerned from the older x86, we’re going to spend some more years trying to get x32 working … and all for what? To have a smaller address range and thus smaller pointers, to save on memory usage and memory bandwidth … by the time x32 is ready, I’d be ready to bet that neither concern will be that important — heck, I don’t have a single computer with less than 8GB of RAM right now!
It might be more interesting to ask why x32 is important enough for people to work on it; to me, it seems like the main reason is that it saves a lot of memory on C++ programs, simply because every class and every object carries so many pointers (functions, virtual functions and so on and so forth) that the change from 32 to 64 bit was a big enough hit. Given that there is so much software still written in C++ (I’m unconvinced as to why, to be honest), it’s likely that there is enough interest in this to work on improving the performance.
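To put a rough number on that, here is a hypothetical illustration (a generic pointer-heavy node, not taken from any real codebase) of how pointer width alone changes per-object size:

    #include <stdio.h>

    /* A hypothetical pointer-heavy node, similar in shape to what C++
     * objects end up looking like once you count the vtable pointer and
     * a few links to other objects. */
    struct node {
        struct node *left;
        struct node *right;
        struct node *parent;
        int          value;
    };

    int main(void) {
        /* ILP32/x32: 3*4 + 4 = 16 bytes; LP64 (amd64): 3*8 + 4 + padding = 32 bytes */
        printf("sizeof(struct node) = %zu\n", sizeof(struct node));
        return 0;
    }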
But at the end of the day, I’m really concerned that this might not be worth the effort: we’re bringing upon ourselves another big multilib problem (considering we never really solved multilib for the “classic” amd64, and that Debian is still working hard to get their multiarch method out of the door), plus a number of software porting problems that will keep a lot of people busy for years to come. The effort would probably be better directed at improving the current software and moving everything to pure 64-bit.
On a final note, of course libav works if you disable all the hand-written assembly code, as everything is also available in pure C. But many routines can be 10 or more times slower in pure C than when using AVX, for instance, which means that even if you get a noticeable improvement going from amd64 to x32, you’re going to lose more by losing the assembly.
My opinion? It’s not worth it.
As always a fascinating read – I’m also amazed by the push for the x32 ABI – it would maybe have been a good idea when the Athlon64s were first launched, but it makes no sense doing this now, 9 years on.
Oh and are you going to Prague this year?
If 64 bit pointers are a problem for your app, you could always use pointer compression. I know Java VMs are doing that on 64bit systems. Still, given features like transparent huge pages, KSM, cleancache, swapless systems and gigs of RAM, it’s hardly worth the hassle unless you can prove that this solves a real problem.
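For what it’s worth, the base-plus-offset flavour of that idea can be rolled by hand in plain C too; this is only a sketch, with a made-up arena and cptr type, not code from any real library:

    #include <stdint.h>
    #include <stdlib.h>
    #include <assert.h>

    /* Hypothetical "compressed pointer": a 32-bit offset into a single
     * arena instead of a full 64-bit pointer.  It works as long as the
     * arena stays below 4GB, which is the same limit x32 imposes on the
     * whole address space. */
    typedef uint32_t cptr;

    static char *arena_base;

    static cptr compress(void *p) {
        assert((char *)p >= arena_base);
        return (cptr)((char *)p - arena_base);
    }

    static void *decompress(cptr c) {
        return arena_base + c;
    }

    int main(void) {
        arena_base = malloc(1 << 20);     /* 1 MiB arena for the example */
        void *obj = arena_base + 128;     /* pretend this is an allocated object */
        cptr c = compress(obj);           /* stored in 4 bytes instead of 8 */
        return decompress(c) == obj ? 0 : 1;
    }

HotSpot’s “compressed oops” do roughly the same, plus a shift so the 32-bit value can cover heaps larger than 4GB; as a later comment notes, C and C++ compilers won’t do any of this for you.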
Under __x86_64__ you should be able to test for __LP64__ for the real 64-bit mode.
Sounds like it’s a day late and a dollar short. The hype machine is probably VPS providers that are over-provisioning.
What’s not increasing fast is the L{1,2,3} caches, which a lot^W more code could fit into; other than that, most of your points apply.
x32 may improve performance because it decreases cache footprint of data structures.

Pointer compression is not supported by C and C++ compilers. You have to roll your own in your data structure libraries.

Software is written in C++ because C++ can lead to clearer code than C, by using templates instead of macros and by using destructors for RAII. C programmers who can predict well the assembly generated from their source code can write the same in C++ in a more maintainable way, without any overhead. So things like libav are exactly the stuff that can benefit from C++. I think such projects choose C because there are/were more C compilers available (especially for less common platforms) and to avoid Java-type programmers who think they can code in C++ (what Linus Torvalds once wrote is the biggest benefit of not using C++).
Francesco, you’re thinking data, not code:

* the instruction set is going to be the same, so the size of the instructions themselves is the same;
* even if the immediate arguments are smaller, this accounts for very little margin;
* the kernel is still 64-bit, so the final addresses in hardware memory will still be 64-bit, for what concerns jumps and similar.

Sure, data cache might improve slightly — but if you have a decent design you won’t gain as much, and if you lose the ability to use assembler you lose _much more_.

ZT, libav would benefit from C++? Maybe if C++ were designed decently, but with current C++, libav only stands to lose by using it. C can sure be messy, but it’s not _that_ messy, and if you think that templates can bring more order than a well-written C project … you probably have never seen the kind of projects I worked on before.
Flameeyes, I’m curious. Which C programs do you consider exceptionally well written? I’m looking for a bit of inspiration for my own C projects.
There’s actually a very good use case for x32: people who actually use their machines to the max. One example is cloud offerings, where it’s all about how much (requests, users, …) you can cram into a single server, and where x32 will certainly make a difference. Imagine Facebook: why would they need PHP to run as a 64-bit program? What would they lose if they ran it under the x32 ABI? What would they gain? Performance per server.

So while it might not matter for desktop users, it does matter if you’re actually running a server and can’t afford (or, actually, don’t need) a bigger and better server, or if you’re running enough servers that this can save you from buying yet another one.
I remember seeing benchmarks that show that x32 already has substantial improvements, and considering it was created only recently, there are going to be even more…
People will continue to use VPSes with less than 4 GiB of memory for quite some time. I have a VPS and I decided to go with x86 and not x86_64 because of the limited memory (512 MiB). On the other hand, I run an RDBMS there which does a lot of 64-bit arithmetic, but it cannot use the native capability of the host CPU to do 64-bit arithmetic because I’m running an x86 kernel.
The __LP64__ define is gcc-specific and nonstandard. x32 using __x86_64__ is insane and broken because it breaks existing code, and adding the ifdef for __LP64__ will break compilation on anything but gcc.
In fact, sizeof(size_t) is the common way to check the word size on a platform. It works on Windows 32- and 64-bit with cl.exe or gcc (your solutions do not, except for sizeof(void*)), on Solaris with SunCC or gcc, on AIX with XL C/C++ or gcc again, and others, and, of course, on Linux with gcc, clang or even path64. C/C++ compilers not providing a size_t typedef are all long dead, and, more important, using sizeof(size_t) is _semantically_ correct here, because of how the standards (C and C++) define size_t.

And I think x86 is still important… simply because quite a lot of us continue to use very old machines. I was one of the first to have amd64 working (2004), even long before having multi-core. I will be one of the last to continue to use x86, or at least the people around me will. As long as it works and can be useful, I keep it, it’s as simple as that. Because of that I have a small X-Window client, a radio amateur server and a small business database server under my responsibility that are x86 Gentoo. And of course I shall not count virtual machines… My opinion is… well, x86 should not be maintained the way amd64 is now, but totally abandoning it this soon is too soon. You should not feel obliged to maintain it yourself though… If I had the time, I should do it myself! :-)

I have to add that the memory taken up because of pointer size differences is much less than, for example, the difference between CISC and RISC machine code volumes for the same programs. And both are much less than the data volume of any common program. Cases where pointer size is not negligible are not that common. Structure alignment (often 16 bytes these days, totally useless on 32-bit machines) is much heavier, and is often not (specified to be) dependent on the platform.

What is really a problem for many programs is the fact that C and C++ programmers do not know how to write formally and semantically correct code that can by design work on any platform (let’s say: 16- to 256-bit and 1 to 512 NUMA CPUs). C/C++ are just as portable as Java, and probably even more so if you know how to use them (you wrote about it a few years ago, Diego, IIRC). But who knows? :-(

Alas, dropping support for x86 is more masking a problem than solving it. Is it even solvable?
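A minimal sketch of that sizeof(size_t) check, with purely illustrative labels:

    #include <stdio.h>
    #include <stddef.h>

    int main(void) {
        /* size_t follows the platform's object/pointer size, so the same
         * check works with MSVC, gcc, clang and others, unlike tests
         * based on sizeof(long). */
        if (sizeof(size_t) == 4)
            puts("32-bit word size (x86, or x32 on an amd64 kernel)");
        else if (sizeof(size_t) == 8)
            puts("64-bit word size (LP64 Unix or 64-bit Windows)");
        return 0;
    }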
Reading myself again, I see I totally missed my own point… I think x32 is a terrible thing for x86. Its performance advantages are few, its complexity and impact are high, and its lifetime will be very short, especially considering its marginal utility (even 5-6 years looks very optimistic to me). Even VPSes (the only serious use for the ABI) will not have their memory limited to less than 4GB for decades. And if you really need performance, you have a physical machine, not a VPS, so you have RAM, so you have… oh yeah, you have x86_64! Even a 10% performance gain does not justify that for most people.

My point was… badly coded programs will not run naturally on x32, especially desktop ones, and, worse, they may be sabotaged on x86 to run on x32 (ugly hacks are often simpler to code and to justify than Correct Solutions, but often have side effects). And I have to repeat that x86 is still really important and should be the most stable Linux platform, and that badly coded programs are often useful. Supporting “real” if little-used platforms like IA-64, SPARC or even HP-PA brings quality to code, often through semantic coherence. Supporting a bizarre ABI on common hardware may just bring many bizarre bugs from too many semi-professional users, for a minimal gain.

Better not to change what works today for a marginal benefit. Strictly limiting it to server applications will of course not work: whoever has a server app wants graphical server management tools with it, and from there it escalates. Because bad code will not disappear suddenly, “x32 delenda est.”
How about an Atom netbook with 1GB of RAM… For the moment, mine is running amd64, but I’d like to test x32 if it could lead to a longer running time on battery.

Alain
I’m making good progress porting what’s necessary to bring up GNOME on x32 with Gentoo. I’ve just ported spidermonkey (17-rc1), so there’s now JIT-enabled javascript. Despite opinions expressed above, I feel this is a very worthwhile effort; I can *see* the performance difference while emerging! (I’m using a custom multilib-portage profile)