Virtual machine, real problems

Since I bought Yamato I have been trying my best to make use of the AMD-V support in the Opterons. This included continuing the fight with VirtualBox to get it to work properly with Solaris and Fedora, then trying RedHat’s virt-manager, and now, after the failure the other day, QEmu 0.10 (at Luca’s insistence).

The summary of my current opinions is something like the following:

  • VirtualBox is a nice concept but the limitations in the “Open Source Edition” are quite nasty; plus it has huge problems (at least in the OSE version) with networking under Solaris (which is mind-boggling to me, since both products are developed by Sun), making it unusable for almost anything in my book. Replacing the previously used Linux tun/tap code with its own network modules wasn’t very nice because it reminded me of VMware, and it didn’t solve much in my case;
  • RedHat’s virt-manager is a nice idea but it has limitations that are (quite understandably from one point of view) tied to the hardcoding of RedHat-style systems; I say quite understandably because I wouldn’t even dream of asking RedHat to support other operating systems before they feel their code is absolutely ready for prime time; on the other hand it would be nice if there was better support for Gentoo, maybe with an active branch for it;
  • I still don’t like the kqemu approach, so I’m hoping for QEmu’s support for Linux’s own KVM interface with the next kernel release (2.6.29), which should be feasible; on the other hand, setting up QEmu (or kvm) manually is quite a task the first time.

So while I’m using VirtualBox to virtualise a Windows XP install (which, alas, I have to use for some work tasks and to offer Windows support to my “customers”), I decided to try QEmu for a vanilla FreeBSD virtual system; I needed a vanilla FreeBSD to try a couple of things out, so that was a good choice to start with. I was actually impressed by the sheer speed of the FreeBSD install in the virtual system even without kqemu or KVM; it took less time than on my old test systems. I don’t know if the I/O difference between QEmu and VirtualBox was because VirtualBox uses more complex virtual disk images (with recovery data, I expect), or because I set QEmu to work straight on a block device (an LVM logical volume). I did, though, have to curse a bit to get networking working.

An aside on networking: since what I wanted was to be able to interface the virtual machines with the rest of my network transparently, I decided to give net-misc/vde a try; unfortunately getting things working with it has been more troublesome than expected. For one, if you don’t set up the TAP device explicitly with OpenRC, vde will try to do so for you, but on my system it put udev in a state where it continuously brought the interface up and down, which was quite unusable. Secondly, I had some problems with dhcpd: even if I set the DHCPD_IFACE variable in /etc/conf.d/dhcpd, the init script does not produce proper service dependencies, so I have to set RC_NEED explicitly. In both cases the answer would be “dynamic” dependencies in the scripts, calculating the needed network services based on the settings in the conf.d files. I guess I should open bugs for those.
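To make the “dynamic” dependency idea concrete, here is a minimal sketch in Python (not the actual OpenRC shell code) of how the needed services could be derived from a conf.d file. The helper name needed_net_services and the net.<interface> service naming are assumptions made for illustration; only DHCPD_IFACE comes from the setup described above.

```python
import shlex


def needed_net_services(conf_text, variable="DHCPD_IFACE"):
    """Return the net.* services implied by an interface list in conf.d text.

    Hypothetical helper: a real init script would do this in shell inside
    its depend() function.
    """
    for raw_line in conf_text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue
        name, sep, value = line.partition("=")
        if sep and name.strip() == variable:
            # shlex removes the shell quoting and any trailing comment.
            words = shlex.split(value, comments=True)
            interfaces = " ".join(words).split()
            # Assumption: each interface is brought up by a net.<name> service.
            return ["net." + iface for iface in interfaces]
    return []


example = 'DHCPD_IFACE="tap0 eth0"\n'
print(needed_net_services(example))
```

With something like this the list of required services would follow the configuration automatically, instead of being duplicated by hand in RC_NEED.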

Once I finally had the networking support working properly, I set up SSH, connected and started the usual task of basic customisation. The first step for me is always to get zsh as my shell. Not bash, because I don’t like bash as a project; I find zsh more useful too. But once it started building m4, and in particular running the test for strstr() time linearity, the virtual machine froze solid; qemu started taking 100% CPU constantly, and even after half an hour it never moved from there. I aborted the VM and tried again, hoping it was just a glitch, but it’s perfectly reproducible. I don’t know what the problem is with that.

So I decided to try installing Solaris: I created a new logical volume, started up qemu again and... it froze solid during boot from the 2008.11 DVD.

In truth, I’m disappointed because the FreeBSD install really looked promising: fast, nice, not loading more than a single core (I have eight, I can spare one or two for constantly-running VMs). It also worked fine when running as an unprivileged user (my user) after giving it access to the kqemu device and the block device with the virtual disk; it didn’t work as nicely with the tun/tap support in qemu itself in this setup since that required root to access the tap device, but at least with vde the amount of code running with root privileges was reduced.

On the other hand, since the KVM and QEmu configuration is basically identical (besides the fact that they emulate different network cards), I just tried kvm again, using the manual configuration I used for QEmu and vde for networking (networking configuration was what made me hope to use virt-manager last time, but now I know I don’t need it); it does seem faster, and it also passed the strstr() test that froze QEmu before. So I guess the winner this round is KVM, and I’ll wait for the next Linux release to test the QEmu+Linux KVM support.

Post Scriptum: even KVM is unable to start the OpenSolaris LiveDVD though, so I wonder if it’s a problem with Solaris itself; I’d like to try the free as in soda version of Solaris 10, but the “Sun Download Manager” does not seem to work with IcedTea6 and downloading that 4GB image with Firefox is masochistic.

Like a vampire

I have problems with Sun, just a different kind of problem and a different kind of Sun. And I don’t mean I have a problem with the company, Sun Microsystems, but rather with some of their products.

The first problem is that, as an upstream, they aren’t pleasant to work with. For instance, they changed without notice the file containing the tarball of StudioExpress, just to add some checksumming functionality to make sure the file was properly downloaded before starting. They had the decency to add a -v2 note to the filename, but it still doesn’t help that they don’t show any changelog for that change, or announce it. I guess somebody resumed looking at security bugs for this to happen, since it happened at almost the same time as their bug tracker started spamming me many times a day with notices that some of the bugs I reported have changed details.

The second problem is less about today and more a continuation of a long saga that is actually quite boring. I’ve tried again to get OpenSolaris to work on VirtualBox, but with the new networking support (the vboxnetflt module), the network is tremendously slow, and both NFS and SSH over it are as slow as using them on a 56k modem connection. The main problem is that from time to time the ssh stream freezes entirely, making it quite infeasible to run builds over it. Solaris, VirtualBox and networking have never been that good a combination, and things haven’t improved much now that VirtualBox is developed directly by Sun.

So I decided to use the recently resurrected Enterprise to install OpenSolaris on a real box; the idea was to use the retired but working disks from Yamato to install not only OpenSolaris but also FreeBSD, NetBSD, DragonFly and other operating systems, so I could make sure that the software I work on is actually portable. Unfortunately, since I moved to pure SATA (for hard disks at least) a longish time ago, it seems like it’s not that easy: OpenSolaris failed to see any of my disks.

Okay, so today I finally took the time to dig up an EIDE disk and set it up; I started the OpenSolaris live CD and asked for an install. And again it failed to find the hard disks; I would have thought it was a problem with the motherboard, if it weren’t that with SysRescueCD I get everything exactly as it should be. Which is more than I can say of my MacBook Pro, whose logic board seems to be bad, and which can’t find its own hard drive any longer. I’m waiting to learn how much it would cost me to repair it, and then I’ll take it to be repaired (unless it is way too much). This has been unlucky, since I had to buy a new laptop for my mother just last week (the iBook I bought six years ago has a bad hard drive now).

So I still don’t have a working setup on which to try OpenSolaris stuff, which is not nice at all since I really would like to have my stuff portable to OpenSolaris as well as Linux. Oh well.

Journey in absurd: open-source Windows

Today’s interesting reading is certainly Stormy Peters’ post about hypothetically open-sourcing Windows. While I agree with the conclusion that Windows is unlikely to get open-sourced any time soon, I honestly don’t agree on other points.

First of all, I get the impression that she’s suggesting that the only reason Linux exists is to be a Free replacement for Windows, which is certainly not the case; even if Windows were open-source by nature, I’m sure we’d have Linux, and FreeBSD, and NetBSD, and OpenBSD, and so on and so forth. The reason for this is that the whole architecture behind the system is different, and is designed to work for different use-cases. Maybe we wouldn’t have the Linux desktop as we know it by now, but I’m not sure of that either. Maybe the only project that would then not have been created, or that could be then absorbed back into Windows, would be ReactOS.

Then there is another problem: confusing Free Software and Open Source. Even if Microsoft open-sourced Windows, adopting the same code would likely not be possible even for projects like Wine and ReactOS that would be able to use it as it is, because the license might well be incompatible with the rest of them.

And by the way, most of the question could probably be answered by looking at how Apple open-sourced big chunks of its operating system. While there is probably no point in even trying to get GNU/Darwin to work, the fact that Apple releases code for most of its basic operating system does provide useful insights for stuff like filesystem hacking and even SCSI MMC commands hacking, even just by being able to read its sources. It also provides access to the actual software, which for instance gives you access to the fsck command for HFS+ volumes on Linux (I should update it, by the way).

Or, if you prefer, at how Sun created OpenSolaris, although one has to argue that in the latter case there is so much similarity with Linux and the *BSD systems that it says very little about how a similar situation with Windows would turn out. And in both cases, people still pay for Solaris and Mac OS X.

In general, I think that if Microsoft were to open-source even just bits of its kernel and basic drivers, the main advantages would again come out of filesystem support (relatively, since the filesystems of FreeBSD, Sun Solaris, NetBSD and OpenBSD are really not that well supported by Linux already), and probably some ACPI support that might be lacking in Linux for now. It would be nice, though, if stuff like WMI would then be understandable.

But since we know already that open-sourcing Windows is something that is likely to happen in conjunction with the release of Duke Nukem Forever, all this is absolutely absurd and should not be thought about too much.

Multiple mini libraries, --as-needed and wrappers

In a previous post of mine, Mart (leio) advocated the use of misdirected link to enable splitting the non-RSS-hungry libxml2 modules from the ones that create a lot of dirty pages; his concern is very true and I can feel it very well, since libxml2 is indeed a bit of a memory-hungry library. On my firefox instance it reports this:

     vmsize   rss clean   rss dirty   file
      32 kb        0 kb       32 kb   /usr/lib64/
       8 kb        0 kb        8 kb   /usr/lib64/
    1396 kb      336 kb        0 kb   /usr/lib64/

While it is shared, it still has 336KiB of resident memory, which is not too bad but not too good either, after all. But how would one split that library? Well, you have to know libxml2’s interface a bit to understand this fully, so let’s just say that libxml2 has a modular design, and it offers a series of interfaces that are more or less tied together.

For instance, for my daily job I had to write a proprietary utility that uses libxml2 XPath interface as well as the writer module that allows for easy writing of XML files with a very nice interface (the work was done under Windows; building and using libxml2 was much easier than trying to get Codegear’s parser to work, or to interface to Microsoft’s MSXML libraries). I disabled everything that was not needed for this to work, and reduced libxml2 to the minimum amount of needed code.

Software that only needs parsing wouldn’t need the writer module, and not all of it would require DOM, SAX or PUSH, or XPath and XPointer, and so on and so forth. To be able to disable the extra stuff there is a series of ./configure flags, but mapping those to USE flags is not really feasible since you’d be breaking ABI; plus a solution should be found with upstream, in my opinion.

So what Mart suggested was breaking the library in half, with a “mini” version holding the non-memory-hungry interfaces and another holding the rest. My proposal here would be much bigger, breaking the ABI a lot, but also very exhaustive: break up libxml2 into a series of small libraries, each representing an interface. Software needing one of them would link it in and be done with it. Besides breaking ABI, this would also break all the software using libxml2, even after rebuilding it, which is very, very bad. Well, the solution is actually much easier:

OUTPUT_FORMAT ( elf64-x86-64 )
GROUP ( AS_NEEDED ( .... ) )

This is an ldscript, which tells the linker what to do; save it as and linking with -lxml2 will pull in just the required libraries for the interface used by the program. If you look at your /usr/lib, you got already quite a few of these because Gentoo installs those for the libraries that are moved into /lib instead. This works around the inability to use misdirected linking for wrappers.
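As a rough illustration of what such a wrapper could contain once the library is split, the following Python sketch prints an ldscript for a list of libraries. The split library names are purely hypothetical (libxml2 is not split this way), and as_needed_ldscript is just a helper invented for this example.

```python
def as_needed_ldscript(libraries, output_format=None):
    """Build the text of a GNU ld script wrapping libraries in AS_NEEDED."""
    lines = []
    if output_format:
        lines.append("OUTPUT_FORMAT ( %s )" % output_format)
    lines.append("GROUP ( AS_NEEDED ( %s ) )" % " ".join(libraries))
    return "\n".join(lines) + "\n"


# Hypothetical names for the per-interface libraries.
split_libraries = [
    "libxml2-parser.so",
    "libxml2-xpath.so",
    "libxml2-writer.so",
]
print(as_needed_ldscript(split_libraries, "elf64-x86-64"), end="")
```

Because every entry sits inside AS_NEEDED, the linker records a NEEDED entry only for the libraries whose symbols the program actually references.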

Now of course this trick does not work with every linker out there; but it works with GNU ld and with Sun’s linker, and those are the two for which --as-needed makes sense. If libxml2 were to break itself into multiple libraries, it could decide, depending on a configure option, whether to install an ldscript wrapper or a non-asneeded-capable library, so that Linux, FreeBSD and Solaris (and others) would use the ldscript without adding further ELF files, and the others would go with a compatibility method.

Please also note that using pkg-config for library discovery would make this easier still, without needing wrappers at all, as libxml2.pc would just have to list all the interfaces in its Requires: line.

Ruby-Elf and Sun extensions

I’ve written in my post about OpenSolaris that I’m interested in extending Ruby-Elf to parse and access Sun-specific extensions, that is the .SUNW_* sections of ELF files produced under OpenSolaris. Up to now I only knew the format, and not even that properly, of the .SUNW_cap section, that contains hardware and software capabilities for an object file or an executable, but I wasn’t sure how to interpret that.

Thanks to Roman, who sent me the link to the Sun Linker and Libraries Guide (I did know about it but I lost the link to it quite a long time ago and then I forgot it existed), now I know some more things about Sun-specific sections, and I’ve already started implementing support for those in Ruby-Elf (unfortunately I’m still looking for a way to properly test for them, in particular I’m not yet sure how I can check for the various hardware-specific extensions — also I have no idea how to test the Sparc-specific data since my Ultra5 runs FreeBSD, not Solaris). Right at the moment I write this, Ruby-Elf can properly parse the capabilities section with its flags, and report them back. Hopefully, with no mistakes, since only basic support is in the regression test for now.

One thing I really want to implement in Ruby-Elf is versioning support, with the same API I’m currently using for GNU-style symbol versioning. This way it’ll be possible for ruby-elf based tools to access both GNU and Sun versioning information as if it were a single thing. Too bad I haven’t yet looked up how to generate ELF files with Sun-style versioning support. Oh well, it’ll be one more thing I’ll have to learn. Together with a way to set visibility with Sun Studio, to test the extended visibility support they have in their ELF extended format.

In general, I think that my decision of going with Ruby for this is very positive, mostly because it makes it much easier to support new stuff by just writing an extra class and hooking it up, without needing “major surgery” every time. It’s easy and quick to implement new stuff and new functions, even if the tools will require more time and more power to access the data (but with the recent changes I made to properly support OS-specific sections, I think Ruby-Elf is now much faster than it was before, and uses much less memory, as only the sections actually used are loaded). Maybe one day, once I can consider this good enough, I’ll try to port it to some compiled language, using the Ruby version as a flow scheme, but I don’t think it’s worth the hassle.

Anyway, if you’re interested in improving Ruby-Elf and would like to see it improve even further, so that it can report further optimisations and similar things (like for instance something I planned from the start: telling which of the shared objects listed in NEEDED entries are useless, without having to actually load the file and use the LD_* variables), I can ask you one thing and one thing only: a copy of Linkers and Loaders that I can consult. I tried preparing a copy out of the original freely available HTML files for the Reader but it was quite nasty to look at, nastier than O’Reilly’s freely-available eBooks (which are bad already). It’s in my wishlist if you want.
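The check for useless NEEDED entries mentioned above boils down to a set comparison, sketched here in Python rather than with Ruby-Elf’s own API. The function name and the symbol lists are made up for the example, and the sketch deliberately ignores symbol versioning and libraries that are only needed indirectly.

```python
def useless_needed(undefined_symbols, exported_by_library):
    """Return NEEDED entries that satisfy none of the object's undefined symbols.

    undefined_symbols: names the object leaves undefined.
    exported_by_library: mapping of NEEDED entry to the names it exports.
    """
    undefined = set(undefined_symbols)
    return sorted(
        library
        for library, exported in exported_by_library.items()
        if undefined.isdisjoint(exported)
    )


# Illustrative data only; a real tool would read it from the ELF symbol tables.
undefined = ["xmlParseFile", "printf"]
exports = {
    "libxml2.so.2": ["xmlParseFile", "xmlFreeDoc"],
    "libc.so.6": ["printf", "malloc"],
    "libz.so.1": ["inflate", "deflate"],
}
print(useless_needed(undefined, exports))
```

Everything is computed from the symbol tables alone, so nothing needs to be loaded or executed.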

Supporting more than one compiler

As I’ve written before, I’ve been working on FFmpeg to make it build with the Sun Studio Express compiler, under Linux and then under Solaris. Most sincerely, while supporting multiple (free) operating systems, even niche Unixes (like Lennart likes to call them) is one of the things I spend a lot of time on, I have little reason to support multiple compilers. FFmpeg on the other hand tends to support compilers like the Intel C Compiler (probably because it sometimes produces better code than the GNU compiler, especially when coming to MMX/SSE code — on the other hand it lacks some basic optimisation), so I decided to make sure I don’t create regressions when I do my magic.

Right now I have five different compile trees for FFmpeg: three for Linux (GCC 4.3, ICC, Sun Studio Express), two for Solaris (GCC 4.2 and Sun Studio Express). Unfortunately the only two trees to build entirely correctly are GCC and ICC under Linux. GCC under Solaris still needs fixes that are not available upstream yet, while Sun Studio Express has some problem with libdl under Linux (but I think the same applies to Solaris), and explodes entirely under Solaris.

While ICC still gives me some problems, Sun Studio is giving me the worst headache since I started this task.

While Sun seems to strive for GCC compatibility, there are quite a few bugs in their compiler, like -shared not really being the same as -G (although the help output states so). Up to now the funniest bug (or at least the most absurd behaviour) has been the way the compiler handles libdl under Linux. If a program uses the dlopen() function, sunc99 decides it’s better to silently link it to libdl, so that the build succeeds (while both icc and gcc fail since there is an undefined symbol), but if you’re building a shared object (a library) that also uses the function, that is not linked against libdl. It reminded me of FreeBSD’s handling of -pthread (it links the threading library in executables but not in shared objects), and I guess it is done for the same reason (multiple implementations, maybe in the past). Unfortunately, since it’s done this way, configure will detect dlopen() as not requiring any library, but then later on libavformat will fail to build (if vhook or any of the external-library-loading codecs are enabled).

I thus reported those two problems to Sun, although there are a few more that, touching some grey areas (in particular C99 inline functions), I’m not sure to treat as Sun bugs or what. This includes for instance the fact that static (C99) inline functions are emitted in object files even if not used (with their undefined symbols following them, causing quite a bit of a problem for linking).

The only thing for which I find non-GCC compilers useful is taking a look at their warnings. While GCC is getting better at them, there are quite a few that are missing; both Sun Studio and ICC are much stricter about what they accept, and raise lots of warnings for things that GCC simply ignores (at least by default). For instance, ICC throws a lot of warnings about mixing enumerated types (enums) with other types (enumerated or integers), which gets quite interesting in some cases; in theory, I think the compiler should be able to optimise variables if it knows they can only assume a reduced range of values. Also, the Sun Studio, ICC, Borland and Microsoft compilers all warn when there is unreachable code in the sources; recently I discovered that GCC, while supporting that warning, leaves it disabled even with -Wall and -Wextra, to avoid false positives with debug code.

Unfortunately, not even with the three of them combined am I getting the warnings I was used to with Borland’s compiler. It would be very nice if Codegear decided to release a Unix-style compiler for Linux (their command-line bcc for Windows has a syntax that autotools don’t accept; one would have to write a wrapper to get those to work). They already released free as in soda compilers for Windows; it would be a nice addition to have a compiler based upon Borland’s experience under Linux, even if it was proprietary.

On the other hand, I wonder if Sun will ever open the sources of Sun Studio; they have been opening so many things that it wouldn’t be so impossible for them to open their compiler too. Even if they decided to go with CDDL (which would make it incompatible with GCC license), it could be a good way to learn more things about the way they build their code (and it might be especially useful for UltraSPARC). I guess we’ll have to wait and see about that.

It’s also quite sad that there isn’t any alternative open source compiler focusing, for instance, on issuing warnings rather than optimising stuff away (although it’s true that most warnings do come out of optimisation scans).

So, what am I doing with OpenSolaris?

I’ve written more than once in the past weeks about my messing with OpenSolaris, but I haven’t explained very well why I’m doing that, and what exactly it is that I’m doing.

So the first thing I have to say is that since I started getting involved in lscube I focused on getting the buildsystem in shape so that it could be more easily built, especially with out-of-tree builds, which is what I usually do since I might have to try the build with multiple compilers (say hi to the Intel C Compiler and Sun Studio Express). But since then, I only tested it under Linux, which is quite a limitation.

While FreeBSD is tremendously reducing the gap it had against GNU/Linux (written in full here, since I mean Linux and glibc together), OpenSolaris has quite a few differences from it, which makes it an ideal candidate to check for possible GNUisms creeping into the codebase. Having the Sun Studio compiler available too also makes it much simpler to test with non-GCC compilers.

Since the OpenSolaris package manager sucks, I installed Gentoo Prefix, and moved all the tools I needed, including GCC and binutils, into that. This made it much easier to deal with installing the needed libraries and tools for the projects, although some needed some tweaking too. Unfortunately there seems to be a bug with GNU ld from binutils, but I’ll have to check if it’s also present in the default binutils version or if it’s just Gentoo patching something wrong.

While using OpenSolaris I think I launched quite a few nasty Etruscan curses toward some Sun developers for some debatable choices, the first being, as I’ve already extensively written about, the package manager. But there have been quite a few other issues with libraries and include files, and the compiler itself.

Since feng requires FFmpeg to build, I’ve also spent quite a lot of time trying to get FFmpeg to build on OpenSolaris, first with GCC, then with Sun Studio, then again with GCC and a workaround with PIC: the bug I noted above with binutils is that GNU ld doesn’t seem to be able to create a shared object out of objects not compiled with PIC, so it requires -fPIC to be forced on for them to build; otherwise the undefined symbols for some functions, like htonl(), become absolute symbols with value 0, which causes obvious linking errors.

Since I’ve been using OpenSolaris from a VirtualBox virtual machine (which is quite slow even though I’m using it without Gnome, using SSH to log in, and jumping inside the Gentoo Prefix installation right away), I ended up trying to first build FFmpeg with the Sun Studio compiler taken from Donnie’s overlay under Linux, with Yamato building with 16 parallel processes. The problem here is that the Sun Studio compiler is quite a moving target, to the point that a Sun employee, Roman Shaposhnik, suggested on ffmpeg-devel that I try Sun Studio Express (which is, after all, what OpenSolaris has too), which should be more similar to GCC than the old Sun Studio 10 was. This is why dev-lang/sunstudioexpress is in portage, if you didn’t guess it earlier.

Unfortunately even with the latest version of Sun Studio compiler, building FFmpeg has been quite some trouble. I ended up fighting quite a bit with the configure script and not limited to that, but luckily, now most of the patches I have written have been sent to ffmpeg-devel (and some of them accepted, others I’ll have to rewrite or improve depending on what the FFmpeg developers think about them). The amount of work needed just to get one dependency to work is probably draining up the advantage I had in using Gentoo Prefix for those dependencies that work out of the box with OpenSolaris.

(I’ll probably write about the FFmpeg changes more extensively as they deserve a blog entry of their own, and actually the drafts for the blog entries I have to write are starting to pile up just as much as the entries in my TODO list.)

While using OpenSolaris I also started understanding why many people hate Solaris this much: a lot of commands spit out errors that don’t let you know at all what the real problem is. For instance, if I try to mount an NFS filesystem with a simple mount nfs://yamato.local/var/portage /imports/portage, I get an error telling me that the nfs path is invalid; the actual error, on the other hand, is that I need to add -o vers=2 to request NFSv2 (why it doesn’t seem to work with v3 is something I didn’t want to investigate just yet). Also, the OpenSolaris version I’m using, although it’s described as “Developer Edition”, lacks a lot of man pages for the library functions (although I admit that most of those which are present are very clear).

In addition to the porting I’ve written about, I’ve also taken the time to extend the testsuite of my ruby-elf, so that Solaris ELF files are better supported. It is interesting to note that the elf.h file from OpenSolaris contains quite a few more definitions about that; I haven’t yet looked at the man pages to see if Sun provides any description of the Sun-specific sections, for which I’d also like to add further parser classes. It has been interesting since neither the GNU nor the Sun linker sets the ELF ABI property to a value different from SysV (even though both Linux and Sun/Solaris have an ABI value defined in the standard), and GNU and Sun sections have some overlapping values (like the sections used for symbol versioning: glibc and Solaris have different ways to handle those, but the section type ID used for both is the same; the only way to discern between the two is the section name).
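For reference, the ABI property discussed here is the EI_OSABI byte of the ELF identification block. Here is a small Python sketch (not Ruby-Elf code) that reads it; the function name elf_osabi is invented for the example, and the table lists only a few of the values defined by the ELF specification.

```python
EI_OSABI = 7  # index of the OS/ABI byte inside e_ident

# A few of the values assigned in the ELF specification.
OSABI_NAMES = {0: "SysV", 3: "Linux", 6: "Solaris", 9: "FreeBSD"}


def elf_osabi(header):
    """Return a readable name for the OS/ABI byte of an ELF header."""
    if len(header) < 16 or header[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    value = header[EI_OSABI]
    return OSABI_NAMES.get(value, "unknown (%d)" % value)


# Hand-built e_ident: 64-bit, little-endian, version 1, OS/ABI left at 0,
# which is what both linkers are described as emitting.
fake_ident = b"\x7fELF" + bytes([2, 1, 1, 0]) + bytes(8)
print(elf_osabi(fake_ident))
```

On a real file the same function can be fed the first 16 bytes read in binary mode. Since the byte stays at 0 for both Linux and Solaris objects, it cannot be used to tell which flavour of versioning sections a file contains.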

In the end, to resolve the problem, I modified ruby-elf not to load the sections all at once, but only on request, so that by the time most sections are loaded, the string table containing the sections’ names is available. This makes it possible to know the name of the section, and thus to discern the extended sections by name rather than by ABI. Regression tests have been added so that the sections are loaded properly for different ELF types too. Unfortunately I haven’t been able to produce a static executable on Solaris with either the Sun Studio compiler or GCC, so the only tests for the Solaris ELF executables are for dynamic executables. Nonetheless, the testsuite for ruby-elf (which is the only part of it to take up space: out of 3.0MB of space occupied by ruby-elf, 2.8MB are for the tests) has reached 72 different tests and 500 assertions!
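The load-on-request approach can be sketched as follows. This is Python, not the Ruby-Elf implementation: the class names are hypothetical and the “file” is just an in-memory list of section headers plus a name string table. It shows how deferring the load lets the section name, rather than the shared type ID, pick the parser class.

```python
SHT_PROGBITS = 1
SHT_VERSYM = 0x6FFFFFFF  # same numeric type for the GNU and Sun versym sections


class Section:
    def __init__(self, name, sh_type):
        self.name = name
        self.sh_type = sh_type


class GnuVersym(Section):
    pass


class SunVersym(Section):
    pass


class LazyElf:
    """Create section objects on first access instead of all at once."""

    def __init__(self, headers, strtab):
        self._headers = headers  # list of (name_offset, sh_type) pairs
        self._strtab = strtab    # raw bytes of the section name string table
        self._loaded = {}
        self.load_count = 0

    def _name_at(self, offset):
        end = self._strtab.index(b"\0", offset)
        return self._strtab[offset:end].decode("ascii")

    def section(self, index):
        if index not in self._loaded:
            name_offset, sh_type = self._headers[index]
            name = self._name_at(name_offset)
            if sh_type == SHT_VERSYM:
                # The type ID is ambiguous, so the name decides the class.
                cls = SunVersym if name.startswith(".SUNW") else GnuVersym
            else:
                cls = Section
            self._loaded[index] = cls(name, sh_type)
            self.load_count += 1
        return self._loaded[index]


# Demonstration data; a real file would carry only one of the two flavours.
strtab = b"\0.text\0.SUNW_versym\0.gnu.version\0"
headers = [
    (strtab.index(b".text"), SHT_PROGBITS),
    (strtab.index(b".SUNW_versym"), SHT_VERSYM),
    (strtab.index(b".gnu.version"), SHT_VERSYM),
]
elf = LazyElf(headers, strtab)
print(type(elf.section(1)).__name__, elf.section(1).name)
```

Only the sections that are actually requested get instantiated, which is also where the memory savings of this approach come from.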

OpenSolaris Granularity

If you have followed my blog for a long time, you know I have already had to fight a couple of times with Solaris and VirtualBox to get a working Solaris virtual machine to test xine-lib and other software on.

I tried again yesterday to get one working; since Innotek was bought by Sun, VirtualBox support for Solaris has improved notably, to the point that they now emulate by default a different network card, one that works with Solaris (that has been the long-standing problem).

So I was able to install OpenSolaris, and thanks to Project Indiana I was able to check which packages were installed, to remove stuff I don’t need and add what I needed. Unfortunately I think the default granularity is a bit concerning. Compiz on a virtual machine?

The first thing I noticed is that an update of a newly-installed system from the last released media requires downloading almost the size of the whole disk in updates: the disk is a simple 650MB CD image, and the updates were over 500MB. I suppose this is to be expected, but at that point, why not point to some updated media by default, considering updating is far from trivial? Somehow I was unable to perform the update properly with the GUI package manager, and I had to use the command-line tools.

Also, removing superfluous packages is not an easy task, since the dependency tracking is not exactly the best out there: it’s not strange for a set of packages not to be removed because some of them are dependencies… of one of those being removed (this usually seems to be due to plug-ins; even after removing the plug-ins, it would still cache the broken dependency and prevent me from removing the packages).
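The behaviour one would expect instead can be stated in a few lines: a set of packages is safe to remove as long as nothing outside the set depends on something inside it. Here is a hedged Python sketch of that rule, with made-up package names; it says nothing about how the OpenSolaris package manager is actually implemented.

```python
def removable(to_remove, depends_on):
    """Return True if no package outside to_remove depends on one inside it.

    depends_on maps each installed package to the packages it requires.
    """
    doomed = set(to_remove)
    for package, deps in depends_on.items():
        if package not in doomed and doomed.intersection(deps):
            return False
    return True


# Hypothetical installation: an application and its plug-in need each other.
installed = {
    "app": ["app-plugin"],
    "app-plugin": ["app"],
    "editor": [],
}
print(removable(["app", "app-plugin"], installed))
print(removable(["app-plugin"], installed))
```

Dependencies between packages that are all being removed together are simply ignored, which is exactly the case that was being refused.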

That’s not all, of course; for instance, finding the basic development tools in their package manager is a problem of its own: while if you look for “automake” it will find a package named SUNWgnu-automake, if you look for “autoconf” it will find nothing; the package is called SUNWaconf. I still haven’t been able to find pkg-config, although the system installs .pc files just fine.

I guess my best bet would be to remove almost everything out of the system with their own package manager and try prefixed Portage, but I just haven’t had the will to look into that just yet. I hope it would also help with the version of GCC that Sun provides (3.4.3).

I got interested in Solaris again since, after a merge of Firefox 3.0.2, I noticed cowstats throwing up an error on an object file, and following that, I found out a couple of things:

  • cowstats didn’t manage unknown sections very well;
  • Firefox ships with some testcases for the Google-developed crash handler;
  • one of these testcases is an ELF ET_EXEC file (with .o extension) built for Solaris, that reports a standard SysV ABI (rather than a Sun-specific one), but still contains Sun-specific sections;
  • readelf from Binutils is not as solid as its counterpart from Elfutils.
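
To make the third point a bit more concrete, here is a minimal sketch in C of the two header fields involved. The constants are the values from the generic ELF specification; the function name is_sysv_exec and the enum names are made up for illustration, and this has nothing to do with how cowstats or ruby-elf actually do it.

```c
#include <stddef.h>
#include <string.h>

/* Values taken from the generic ELF specification; the names are
 * invented here to avoid clashing with a system <elf.h>. */
enum {
    EI_DATA_IDX   = 5,  /* e_ident index: byte order of the file */
    EI_OSABI_IDX  = 7,  /* e_ident index: declared OS ABI */
    DATA_LSB      = 1,  /* ELFDATA2LSB: little endian */
    DATA_MSB      = 2,  /* ELFDATA2MSB: big endian */
    OSABI_SYSV    = 0,  /* ELFOSABI_SYSV (also ELFOSABI_NONE) */
    OSABI_SOLARIS = 6,  /* ELFOSABI_SOLARIS */
    TYPE_EXEC     = 2,  /* ET_EXEC */
    E_TYPE_OFFSET = 16  /* e_type follows the 16-byte e_ident */
};

/* Hypothetical helper: returns 1 when buf starts with an ELF header
 * for an ET_EXEC file that declares the SysV OS ABI, 0 otherwise. */
int is_sysv_exec(const unsigned char *buf, size_t len)
{
    unsigned int type;

    if (len < E_TYPE_OFFSET + 2 || memcmp(buf, "\177ELF", 4) != 0)
        return 0;

    /* e_type is a 16-bit field stored in the file's own byte order. */
    if (buf[EI_DATA_IDX] == DATA_LSB)
        type = buf[E_TYPE_OFFSET] | (buf[E_TYPE_OFFSET + 1] << 8);
    else if (buf[EI_DATA_IDX] == DATA_MSB)
        type = (buf[E_TYPE_OFFSET] << 8) | buf[E_TYPE_OFFSET + 1];
    else
        return 0;

    return type == TYPE_EXEC && buf[EI_OSABI_IDX] == OSABI_SYSV;
}
```

A file like the Firefox testcase would pass this check, so the header alone gives a tool no hint that Sun-specific sections follow; the section types have to be handled gracefully on their own.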

Now cowstats should handle these corner-cases pretty well, but I want to enrich my testcases with some Solaris objects. Interestingly enough, in ruby-elf probably 80% of the size of an eventual tarball would be taken up by test data rather than actual code. I guess this is a side-effect of TDD, but also exactly why TDD-based code is usually more solid (every time I find an error of mine in ruby-elf, I tend to write a test for it).

Anyway, bottom line: I think Project Indiana would have been better off adapting RPM to their needs rather than inventing their own package manager, since theirs doesn’t seem to have any feature that Fedora’s lacks, while it lacks quite a few other things.

And yet again, I miss Borland’s compiler

Don’t get the title wrong, I like GCC, but there are a few things that don’t trigger a warning in GCC, but do on Borland’s, which are quite useful and important.

The main thing I miss is the warning that Borland gives you when a variable is given a value that is never used. As I wrote more than a week ago, GCC does not warn you about unused variables if they are assigned a value after they are declared, which in xine tends to happen quite often.

This is pretty important because, even if GCC is good enough not to emit the variable if it’s not used, if the assigned value is the return value of a function, the function call is unlikely to be optimised away. Pure and constant functions should be optimised away, but for functions the compiler has no clue about (which is the common status for non-static functions unless you tell it otherwise) the call is still executed, as it might change the global state variables. If the call is expensive, it would be a waste of CPU.
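
A minimal sketch of the pattern, assuming GCC (the pure attribute is a GNU extension). The functions expensive_lookup and pure_lookup and the call counter are hypothetical names made up for the example, not anything from xine.

```c
/* Counts how many times the "expensive" function really runs. */
int expensive_calls = 0;

/* Hypothetical stand-in for an expensive function with side effects.
 * In real code the compiler would only see its prototype and know
 * nothing about what it does. */
int expensive_lookup(int key)
{
    expensive_calls++;
    return key * 2;
}

/* Hypothetical helper marked pure: no side effects, so a call whose
 * result is unused may legitimately be dropped by the compiler. */
__attribute__((pure)) int pure_lookup(int key)
{
    return key * 2;
}

/* The pattern in question: declared, assigned later, never read. */
void assigned_but_never_read(int key)
{
    int result;

    /* The variable can be discarded, but the call must stay because
     * it changes global state. */
    result = expensive_lookup(key);
}

/* Same pattern with the pure function: here the whole call can go. */
void pure_assigned_but_never_read(int key)
{
    int result;

    result = pure_lookup(key);
}
```

Calling assigned_but_never_read() still bumps expensive_calls each time, which is exactly the wasted work a warning would point out. Newer GCC releases (4.6 and later) report this case through -Wunused-but-set-variable, which -Wall enables.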

So I first tried ICC, remembering it used to have nicer and stricter warnings than GCC. Unfortunately even after installing it, getting a license key and opening a new shell with the environment set up, I get this:

/usr/include/stdlib.h(140): error: identifier "size_t" is undefined
  extern size_t __ctype_get_mb_cur_max (void) __THROW __wur;

As you can guess, it’s not very nice that size_t turns out to be undefined, and indeed it can’t even complete the ./configure run.

Then I decided to try Sun’s compiler. I remembered Donnie having an ebuild for sunstudio in his overlay, so I downloaded that and installed sunstudio. I had to fix xine’s build system a bit, because for PThread support Sun’s compiler was only detected under Solaris, while of course you can use Sun’s compiler under Linux too.

After completing the ./configure run properly, I started seeing issues with xine’s code… well, I expected that. Mostly, the short form of the ternary operation (foo ? : bar, which is equivalent to foo ? foo : bar but with a single evaluation of foo) is not supported – I suppose it’s a GNU extension – but that’s not difficult to fix by avoiding that form…
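
The kind of rewrite involved can be sketched like this; find_name, name_or_default and the counter are hypothetical names invented for the example, not code from xine. The temporary keeps the single evaluation that the short form guarantees.

```c
#include <stddef.h>

/* Counts evaluations, to show the operand is only computed once. */
static int lookups = 0;

/* Hypothetical lookup with a side effect; returns NULL for "not found". */
static const char *find_name(const char *candidate)
{
    lookups++;
    return candidate;
}

#ifdef __GNUC__
/* GNU short form: the first operand is evaluated once and reused.
 * Compilers without the extension reject this syntax. */
const char *name_or_default_gnu(const char *candidate)
{
    return find_name(candidate) ?: "unknown";
}
#endif

/* Portable form: store the value in a temporary, then test it. */
const char *name_or_default(const char *candidate)
{
    const char *name = find_name(candidate); /* evaluated once */

    return name ? name : "unknown";
}
```

Writing find_name(candidate) ? find_name(candidate) : "unknown" instead would compile everywhere too, but it calls the function twice, which matters as soon as the call is expensive or has side effects.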

The problems started the moment it compiled the first source file for xine-lib itself (rather than its tools):

c99: Warning: illegal option -fvisibility=hidden
"../../../src/xine-engine/xine.c", line 83: internal compiler error: Wasted space
c99: acomp failed for ../../../src/xine-engine/xine.c

Now with all the good will I have, what should “Wasted space” mean to me‽

The illegal option warning is also a nice thing to see, considering that I test for that flag during the ./configure phase, and Sun’s compiler answers as if it worked (it only warns, and still exits with status 0):

configure:49543: checking if compiler supports -fvisibility=hidden
configure:49560: sunc99 -c -features=extensions -errwarn=%all -fvisibility=hidden  conftest.c >&5
c99: Warning: illegal option -fvisibility=hidden
configure:49567: $? = 0
configure:49584: result: yes

Sincerely, lately I have started to wonder whenever I read about Sun wanting the good of Free Software. I had a few people telling me that xine lacks support for Solaris, the Sun Studio compiler, the UltraSPARC architecture, … well, it’s not like it’s easy to support those. Solaris for x86 is quite slow, and wasn’t working under VirtualBox for a while (it should work now, but I haven’t had time to look at it yet); SunStudio for Linux fails, as I just noted; and the only way for a standalone developer to get a decent Sun system is to look for second-hand offers on eBay and similar, and hope. A basic T2 server costs about $15K, a bit out of my league for optimising xine, and as far as I can see all their workstations are now AMD64-based (or x64 as they call it, but I hate that marketing name as it really means nothing).

Maybe they are just interested in enterprise Free Software, but still… I sincerely think they have the right cards to make some difference, but I can’t see much Free Software development, besides the usual enterprise one, going on with Sun systems in the near future. Which is a bit sad considering I’ve seen my Ultra5 outperforming an Athlon at almost twice its frequency…

And still VirtualBox does not allow to run Solaris with networking

I tried again, just to test, even if I knew it was unlikely, and even if I certainly don’t enjoy working on Solaris (well, not like I tried before, but whatever). The result is just the same: Solaris cannot get the network interface working.

I have to say, though, that there is something new in Solaris Express Developer Edition 09/7: the Developer version has a new installer that seems to use GTK rather than Java; it also seems quite a bit faster, but… it didn’t work for me. If I filled in the user information to create a ‘flame’ user, the result was that the shadow file couldn’t be opened; without that, it stalled after a couple of minutes, reporting a failure in the installation process.

The old installer of the non-Developer edition, in Java, is still slow as hell, especially when installing the packages, but it worked fine. Although I still think that asking for more than 8GB for the system is way too much: a server chroot of Gentoo takes up about half a gigabyte.

Anyway, this does mean that I can’t test xine-lib-1.2-ac on Solaris yet, and thus I cannot merge it into the main branch yet. This is good news for Amarok users, as once the branches are merged Amarok 1.4 will stop working, unless I can force someone from the Amarok team to focus a bit on that xine branch ;)