Virtualisation WTF once again.

To test some more RTSP clients I’ve been working to get more virtual machines available on my system; to do so I first extended the available space by connecting one more half-terabyte hard drive (removing the DVD burner from Yamato), and then went back to working on a proper init script for KVM/Qemu (as Pavel had already asked me to, providing me with an example).

Speaking of which, if somebody were to send a USB or FireWire DVD burner my way I’d probably be quite happy; while I have three other DVD burners around – iMac, MacBook Pro and Compaq laptop – having one on Yamato turned out to be useful from time to time; not necessary, so wasting a SATA port on it was not really a good idea after all, but still useful.

I started writing a simple script before leaving for my vacation and extended it a bit more yesterday. But in line with the usual virtualisation woes the results aren’t excessively positive:

  • FreeBSD 8 pre-releases no longer seem to kernel panic when run in qemu (the last beta I tried did, the latest rc available does not); on the other hand it does seem to have problems with the default network (it works if started after boot but not at boot); it works fine with e1000;
  • NetBSD is still a desperate case: with qemu (and VDE) no network card seems to work; e1000 is not even recognised, while the others end up timing out, silently or not; this is without ACPI enabled: if I do enable ACPI, no network card seems to be detected at all; with KVM, it freezes during boot, with or without ACPI;
  • Pavel already suggested a method using socat and the monitor socket for qemu to shut down the VM cleanly; the shutdown request will cause the qemu or kvm instance to send the ACPI signal (if configured!) and then it would shut down cleanly… the problem is that the method requires socat, which is quite broken (even in the 2-beta branch).

Let me explain what the problem is with socat: its build system tries to identify the size of various POD types that are used by the code; to do so it uses some autoconf trickery, the -Werror switch and relies on pointer comparison to work with two POD types of the same size, even if different. Guess what? That’s no longer the case. A warning sign was already present: the code started failing some time ago when -Wall was added to the flags, so the ebuild strips it. Does that tell you something?

I looked into sanitizing the test; the proper solution would be to use run tests rather than build tests, from what I can see; but even if that’s possible, it’s quite intrusive and it breaks cross-compilation. So I went to look at why the thing really needed to find the equivalent types… and the result is that the code is definitely messy. It’s designed to work on pre-standard systems, and to stay compatible with so many different operating systems that fixing up the build system is going to require quite a bit of code hacking as well.

It would be much easier if netcat supported Unix local sockets, but no implementation I have used seems to. My solution to this problem is to replace socat with something else, based on a scripting language such as Perl, so that it is just as portable and at the same time less prone to problems like those socat is facing now. I asked a few people whether they could write a replacement; hopefully this will bring us a decent one so we can kill socat.
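To give a rough idea of how small such a replacement could be, here is a minimal sketch, in Python rather than Perl purely for illustration. It assumes the qemu monitor has been exposed on a Unix socket and sends it the system_powerdown monitor command; the function name, the socket path and the timeout are my own assumptions, not something taken from Pavel’s script.

```python
import socket


def send_monitor_command(socket_path, command="system_powerdown", timeout=1.0):
    """Send a single command line to a qemu monitor on a Unix socket.

    Returns whatever the monitor wrote back (banner, prompt, echo) until it
    closed the connection or the timeout expired.
    """
    reply = b""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        sock.connect(socket_path)
        sock.sendall(command.encode("ascii") + b"\n")
        try:
            while True:
                chunk = sock.recv(4096)
                if not chunk:
                    # The peer closed the connection.
                    break
                reply += chunk
        except socket.timeout:
            # The monitor normally keeps the connection open; stop waiting.
            pass
    return reply.decode("ascii", errors="replace")
```

An init script could then call something like send_monitor_command("/var/run/qemu/guest.monitor") (a hypothetical path) in its stop function, and wait for the qemu process to exit afterwards.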

So if you’re interested in having a VM init script that works with Gentoo without having to deal with stuff like libvirt and so on, then you should probably find a way to coordinate and get a socat replacement done.

Random bits

Since I haven’t had much time to write in the past few days (lots of stuff going on in my life, both personal and professional), I’m just going to draft up a few quick, random bits that might be of interest:

  • I’ve written some notes about automake 1.11 on the Axant blogs; as I said before, you might want to take a look at that blog too since I’ll post something there from time to time; I’m also working on some extensions to the Autotools Mythbusters guide which I’ll post later;
  • staying on topic with the guide, it is no longer donation-based (for the same reason I cannot accept money donations any longer); instead I’ll work on it on a free-time basis; you can still send me a gift if you wish for me to write about a particular Autotools topic, or you can hire me if you need it for sure;
  • as for my return to Gentoo/FreeBSD, I still haven’t finished porting PulseAudio; but I was able to tackle a few more problems, including unused and missing USE flags in the system ebuilds, so now it should look nicer;
  • I’m currently on hold when it comes to feng but I count on getting back on track quite soon;
  • no luck with NetBSD yet, even with the latest version of QEmu (0.10.4);
  • while I have scanned over 400 sheets of paper, there are more documents, bills, and packaging slips that I haven’t scanned; and I haven’t even found the time to actually scan the house’s bills (which wouldn’t be required for me to move out but would still be nice); scanning the stuff by hand with the flatbed scanner is not too nice though, so if somebody has a suggestion for a cheap (less than €300) scanner with an ADF properly supported by sane, it’d be nice;
  • the Typo install on the blog has been modified a bit to improve the tagging functionality, but I haven’t merged duplicate tags yet, nor added proper descriptions where needed; for this reason I’ve temporarily disabled indexing of tags for all search engines; once the tags are cleaned up I’ll see about turning it back on; I should also hide tags with just one post each, for safety;
  • I need some personal time to handle some things with the change of season, starting with discarding old clothes that I can’t wear any longer (the Summer Of Code 2006 T-Shirt is one of those, unfortunately): with the whole hospital thing, I went from an XXL to a size M.

And for those who actually wonder what the heck I’ve been doing lately, there’s always identi.ca (and Twitter, but I prefer identi.ca).

Virtualization updates

It seems that, one way or another, a common “column” on my blog is reserved for virtualisation issues. I blogged a lot about VirtualBox (before finally dissing it), and then I moved on to KVM and QEmu.

Last time I blogged about it, I was still unable to get NetBSD to detect any network card with KVM, while I had OpenSolaris, FreeBSD and Ubuntu working fine. I also had some problems with Gentoo/FreeBSD and the KVM video emulation. But since then, stuff changed, in particular, QEmu now supports KVM technologies natively (and it’s not yet updated to the latest version). Let’s see if this changed something.

Thanks to aperez I now know how to get NetBSD to identify the network card: disabling ACPI. Unfortunately, disabling ACPI with KVM freezes the boot. I also want to use VDE for networking, since I already have Yamato configured as a router and file server for the whole network, but that seems to fail when using NetBSD with QEmu: while dhcpd receives the requests, the replies never reach NetBSD, and I’m stuck for now. I’m going to try again with the newer QEmu version. Also, out of all the cards I tried in QEmu, the Intel E1000 fails because it cannot find the EEPROM.

The Gentoo/FreeBSD video problem that stopped me from using vim during the configuration phase on the minimal CD does not happen when using QEmu; on the other hand, since the SDL output is tremendously slow, I’m using the VNC support, which would be quite nice if it weren’t that Vinagre does not seem to support VNC over Unix sockets; that would make the whole configuration much nicer, without consuming precious network ports. I have to see if I just missed something, and if I didn’t, I should either request for it to be added, or write the support myself (even better). I guess that the underlying code supports Unix sockets, since I expect virt-manager to use them to communicate with the VM.

Speaking of which, I haven’t looked at virt-manager or anything similar in quite a while; I should see if they still insist on not giving me the choice of just using VDE for networking instead of dnsmasq and similar; for now the whole configuration is done manually with a series of aliases in my ~/.shrc file, with (manually assigned) sequential MAC addresses hardcoded, as well as VNC ports, LVM volumes (used for the virtual disks; they seem to be quite a bit faster than using a file over VFS), and hostnames (in /etc/hosts, except for Ubuntu, which has Avahi working).
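The hardcoded aliases could be generated instead. What follows is only a sketch of the idea, not my actual setup: the binary name, the VDE socket path, the guest names and the volume paths are placeholders, and the only thing it really shows is deriving the MAC address and the VNC display from one sequential index. The 52:54:00 prefix is the one QEmu uses for its own default address.

```python
def guest_command(index, name, volume, vde_socket="/var/run/vde.ctl", model="e1000"):
    """Build a qemu command line for the guest number `index`.

    The MAC address and the VNC display are both derived from `index`, so
    adding a guest only means picking the next free number.
    """
    if not 0 <= index <= 0xFF:
        raise ValueError("index must fit in the last octet of the MAC address")
    mac = "52:54:00:00:00:%02x" % index
    return [
        "qemu",                       # placeholder binary name
        "-name", name,
        "-hda", volume,               # e.g. an LVM logical volume
        "-net", "nic,macaddr=%s,model=%s" % (mac, model),
        "-net", "vde,sock=%s" % vde_socket,
        "-vnc", ":%d" % index,        # display N listens on TCP port 5900+N
    ]


if __name__ == "__main__":
    guests = [("freebsd", "/dev/vg/freebsd"), ("netbsd", "/dev/vg/netbsd")]
    for number, (guest, disk) in enumerate(guests):
        print(" ".join(guest_command(number, guest, disk)))
```

Returning a list rather than a string keeps the result usable with subprocess without any shell quoting concerns.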

I have to admit, though, that I have some doubts about the performance of QEmu/KVM versus the usual KVM; at least, it’s taking quite a long time to unpack the tarball with the stage3 of Gentoo/FreeBSD 7.1. I hope I/O is the bottleneck here.

Speaking of I/O as a bottleneck, I was finally able to get a gigabit switch for the office; the next step is to buy many metres of cable so I can actually wire up my bedroom to the office, passing through a few other rooms of the house, so that I can have a fast enough network for all the computers in their standard setup (and use wireless only when strictly needed). I do have some doubts about this, though, since I really want to move out.

In the meantime, Enterprise is soon going to be re-used as a backup box; I just need to find an easy way to send a WOL packet, wait for the box to come up, back up everything, and shut it down, once a week. I have the last unused 500GB disk in that box so it should be easy. But I’d like to have an mtree of the data that has been backed up, which I’m still unsure how to get.
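The WOL part, at least, is simple: the magic packet is just six 0xFF bytes followed by the target MAC address repeated sixteen times, usually sent as a UDP broadcast. A minimal sketch follows; the function names, the broadcast address and the port are assumptions to adapt, and waiting for the box and running the backup are left out.

```python
import socket


def build_magic_packet(mac):
    """Return the Wake-on-LAN magic packet for a MAC like 00:11:22:33:44:55."""
    hex_digits = mac.replace(":", "").replace("-", "")
    if len(hex_digits) != 12:
        raise ValueError("expected a 48-bit MAC address, got %r" % mac)
    mac_bytes = bytes.fromhex(hex_digits)
    # Six 0xFF bytes, then the MAC address sixteen times in a row.
    return b"\xff" * 6 + mac_bytes * 16


def send_wol(mac, broadcast="255.255.255.255", port=9):
    """Broadcast the magic packet over UDP (port 9 is a common choice)."""
    packet = build_magic_packet(mac)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        sock.sendto(packet, (broadcast, port))
```

A weekly cron job could call send_wol() with Enterprise’s MAC address and then poll the box over the network until it answers before starting the backup.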

Virtualisation woes, again

I know this is starting to get old, with my ranting about virtualisation software, but since I’m trying my best to make the most of Yamato’s power for software testing, I’m still working on getting virtualised systems to work properly for me.

In a long series of blog posts ranting about VirtualBox, QEmu, KVM and so on, there was exactly one system that worked quite fine up to now: Windows XP (SP3) under VirtualBox. With the latest release, though, this was broken too: the network started up then came crashing down, with a striking resemblance to an old Solaris problem.

Since I needed my Windows XP virtual machine working for a job, I tried porting it to Parallels on my iMac, with the Parallels demo (since my license was only valid for the 3.x series). After waiting for the 64GB image file to convert, it turned out that there was no hope of getting it to start: the VirtualBox additions drivers crash with a blue screen of death at boot when they are executed outside of a VirtualBox instance; the Windows Recovery Console does not allow disabling the drivers, and deleting the driver files to prevent them from loading was not an option, since they get installed in the program files directory (which the recovery console cannot access).

In the end, given the absolute unreliability of VirtualBox on every operating system at this point, I simply gave up and paid for the upgrade of my license to Parallels 4, which now provides my only Windows XP instance (which I’m still unfortunately tied to for work), and deleted VirtualBox from my system. Why, you’d ask, since networking not working is far from the biggest problem out there? Well, the biggest problem, and the final straw that broke the camel’s back, was that while I was trying to figure out why Samba was not working, VirtualBox’s network filter module crashed the kernel. So what? Well, VirtualBox decided that rather than using the quite well-tested mixed kernel/userland TUN/TAP networking system, or the userland virtual network provided by VDE (with TAP interfacing it with the rest), they had to provide a kernel module instead. For performance reasons or, most likely, so that they can have the same interface to the network internals across different operating systems. Do I have to make it explicit how this is a problem?

Interestingly, while writing this I noticed that there are problems downloading VirtualBox, and the thing also reminded me of how many times they messed up the ebuilds by changing the tarballs…

But it doesn’t stop here. Remember the NetBSD trouble with networking I reported about a month ago? Well, I wanted to see if something changed with the new NetBSD 5.0 release (I actually wanted to make sure that feng detected the newly-added POSIX Message Queue support properly), but still no luck: I don’t see any network card with whatever model I provide to KVM, including the e1000 that I’d expect NetBSD to support at least.

On the other hand I was at least able to get Ubuntu (9.04) working on KVM, next step is Fedora 11, so I can actually test feng on other distributions as well as Gentoo.

More virtually real troubles

So after fighting with QEmu and surrendering to KVM I finally got a FreeBSD 7.1 vanilla instance, and an OpenSolaris instance running; I made sure that feng builds on both, and since I was there I also fixed up the SCTP autoconf check on both, so that feng can ideally speak SCTP with both of them.

A note here for those interested: SCTP (Stream Control Transmission Protocol) is a protocol, alternative to TCP and UDP, that is designed to work well for streaming applications; the fact that feng supports it is more a proof of concept than an actually useful feature, I’m sincerely not sure how well it works nowadays, but since I had to fight to get it to build correctly on Linux already, I just wanted to fix it up for FreeBSD and Solaris implementations as well; I assumed that Apple had its own implementation as well but even though there are __APPLE__ defines in the FreeBSD implementation, at least OS X 10.5 lacks any SCTP support that I can see.
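For a quick look at whether a given box has a usable SCTP stack at all, a runtime probe is enough. The sketch below is not what feng’s autoconf check does (that one works at build time, looking at headers and libraries); it simply tries to open a one-to-one SCTP socket, and the function name is made up for the example.

```python
import socket


def sctp_available():
    """Return True if the running kernel lets us create an SCTP socket."""
    proto = getattr(socket, "IPPROTO_SCTP", None)
    if proto is None:
        # The platform headers Python was built against do not know SCTP.
        return False
    try:
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM, proto)
    except OSError:
        # Typically "protocol not supported": no SCTP stack in the kernel.
        return False
    sock.close()
    return True


if __name__ == "__main__":
    print("SCTP available:", sctp_available())
```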

I already have reserved a logical volume for Gentoo/FreeBSD 7.1 which I’m hopefully going to test today, but in the mean time I wanted to fix up NetBSD too, since I have seen that it also has an SCTP stack, and since none of the three we support now is identical to the other it seemed worth looking into it; unfortunately NetBSD is proving to have no network to offer me. While I set up the KVM instance just like any other, no matter which model I use I can see no device in ifconfig -a output of NetBSD; I have chosen the full installation, but still it doesn’t seem to have much. The documentation also doesn’t seem to help.

I guess NetBSD will keep waiting in line for now, unless somebody has a suggestion on how to deal with it.

Yet another Solaris screwup

Today I spent almost the whole day sleeping on my bed, or watching something on the TV through the laptop, after resurrecting Farragut from the downtime. Gotta love ENEL, the Italian default power company: there are people who forgot when they got their last blackout; I forgot which was the last month I didn’t have a blackout of more than 45 minutes! I pulled an all-nighter last night for a midsummer event, so I was mostly KO; then I also got a bad headache and was unable to actually do anything useful. Add to that the temperature and you can pretty much understand how I feel now.

Anyway, before my power was cut by ENEL, I was able to finish downloading the tarballs forming the official Solaris Developer Express (why does this remind me of Agatha Christie?) VMware datafiles. I supposed that, being official, it should already be well configured, and should not have the same problems I had after installing VMware tools. Unfortunately, that wasn’t the case: this one too gets a default screen size similar to my real screen size, counting both monitors, which makes it basically impossible for me to use.

This is starting to be ridiculous, especially since Solaris is basically the only thing holding me back from merging the xine audio conversion branch into the 1.2 series (okay, it will still be a bit rough, but it mostly works, and it just needs more people testing it to get good). This is quite interesting for xine users wanting to use JACK output, as the JACK plugin in the 1.2 audio conversion branch is basically rewritten: the previous version was kinda crazy, and with audio conversion now in place I could rewrite it to accept exclusively 32-bit floating point samples, which solves a lot of troubles for JACK output.

Anyway, tomorrow I hope to be able to do something more useful: there is still FusionSound to be ported, even if I have no idea where to start with that, and I suppose I could try to get NetBSD online and see if I can port the SunAudio code to the new interface with that, although I doubt that relying on NetBSD exclusively will ensure I don’t break it for Solaris too. Maybe I should see about fixing xine-lib for NetBSD and OpenBSD for now and schedule Solaris support for the future…

Another thing I’ll be working on is trying to find a new issue tracker for the xine project. The SourceForge.net tracker simply sucks: the interface is one of the worst out there, the search is clumsy, there are tons of duplicate bugs, it’s difficult to understand when an issue was reported and with which version, it’s difficult to do complex queries, the attachments are messy to handle, and in the end it’s not uncommon that I fix a bug because I find it myself, and only then find it reported on SourceForge.

Most likely, we’ll end up using Roundup like FFmpeg; it might become a standard for multimedia software, who knows? It’s nice and flexible; it might need a bit of work to get a good CSS, but that’s true for any other bug tracker out there; and it will be liked by Ubuntu people, who will be able to push stuff upstream as Malone supports it…

Anyway, hopefully tomorrow will be a productive day, I sincerely hope so. I also need to see if I should buy a new sound card: the VIA82xx with DXS is driving me crazy, as it requires me to run JACK at a 48kHz sample rate to have HW mixing together with PulseAudio, and I don’t want to stop PulseAudio just to play the keyboard. Does anybody have an EchoAudio Mia or MiaMidi card and can tell me how it works under Linux? There is an ALSA driver that should be fine – most EchoAudio cards are – but I’m not sure whether the HW mixing support has the same constraint as the VIA82xx’s (only 48kHz streams can be mixed together); if that were the case, changing the card would gain me nothing.

Oops I did it again.

What, you’ll ask? I broke Gentoo/FreeBSD, or at least I’m preparing locally to break it, badly.

With the 6.1 release I thought I was finally safe from libraries being moved out of the base system and changing sonames (switching from FreeBSD’s pimped-up versions to the proper ones from their authors), but it seems like I just hit a crazy one now.

I should have paid more attention to Tiziano the other day: I would have broken Gentoo/FreeBSD a couple of days ago already, and fixed it too, but I had to bang my head against the wall of libeditline before I ended up working on the problem.

A little background: GNU readline is a cool library for line editing, the thing you usually do in bash when you move around the command line you’re writing and change it here and there; it’s released under the GPL license, not the LGPL license, so software that’s released under non-GPL licenses can’t use it (think of BSD or MPL licensed software). To overcome this problem, the NetBSD project developed libeditline (or simply libedit), a BSD-licensed library that is API compatible with GNU readline.

This library is used (as an inline copy) in Heimdal (which Tiziano was working on) and in Firebird (which I tried to work on last night); as this ends up creating conflicts, and in general is against our policies, I told him (and knew myself) that the inline copy should be replaced by a shared copy of the library. For Linux it’s easy: Mike added the dev-libs/libedit package some time ago. But on FreeBSD, where the library should have been provided by sys-freebsd/freebsd-lib, they try to use the add_history() and readline() functions, which are not available.

I then tried to write an ebuild for the version of libedit that’s in ports, but that didn’t work out well, mostly because it would have meant having two almost identical packages in the tree, and because it was also older than Mike’s package. As this didn’t work, I then worked on making dev-libs/libedit build on FreeBSD by splitting the Gentoo patch from the GLIBC patch, and the result is quite neat; I’m just waiting for Mike to check it and say whether he’s okay with this merge.

What’s the problem then, if this is done already? Eh, as I said, the library changed soname, from libread.so.5 to libread.so, and the result is that /bin/sh won’t load after you merge freebsd-lib and before merging freebsd-bin again with libedit installed.

Okay, it’s hardly a showstopper, considering that not much of portage relies on /bin/sh and most simply depends on bash (that links to readline and is thus safe) but it will be a mess for first time installs.

For this reason, I decided that as soon as Mike merges the changes, I’ll be rebuilding the 6.2 stage, and updating the documentation myself so that I can point people to 6.2 directly (that’s proving simpler to manage than 6.1 especially with the new baselayout). The problem is the migration from one to the other being not that trivial, but that can be fixed easily too.

Anyway, the new stages will be shinier and cleaner, with the libedit split out, and after that we can work on getting Firebird and Heimdal on Gentoo/FreeBSD :)