Is xine still alive? Well, let’s say it’s on life support

Personal note first: I’ve been swamped with deliveries these past two weeks, which means that most of my mail is parked in the inbox waiting for a reply; if you read this post and wonder why I haven’t replied to you yet… just know that anything that could be construed as work will have to fit in the 8am-7pm time range… and I started writing this at 10pm.

In August I migrated my blog and website (and a few customers’ websites as well) from a vserver to a KVM guest, thanks to my hoster – ios-solutions – having it available, and with IPv6. The move to IPv6 was something I was particularly interested in, since I’ve deployed it locally to have stable addresses for all systems, and with it on the server I can tell exactly who is connecting to what.

Today, after making sure that the migration to KVM was working quite well, I applied the same migration to the server that hosts xine’s website and Bugzilla installation. This allows me to use a single local guest to build packages for both servers, and to keep both Portage world files empty, using Portage 2.2’s sets feature to distinguish what to install on one server or the other.
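For the sake of illustration, this is roughly what such a split looks like; the set name and its contents here are hypothetical, not the actual sets I use on my servers.

```sh
# /etc/portage/sets/server-xine: a plain list of atoms, one per line
# (hypothetical contents):
#   www-apps/bugzilla
#   www-servers/lighttpd
#   mail-mta/postfix

# Install the whole set at once; the world file itself stays empty, only
# the set reference gets recorded (in world_sets) instead.
emerge --ask @server-xine
```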

The “new” server (which is actually a hybrid, made from a base image of Earhart, this server, and the configurations from Midas, the old xine server) lost its name and naming scheme (given that when I called it Midas it was also “my” server) and is now simply xine-project.org — but at the same time it gained full IPv6 support and, contrary to my own domain, that means that mail can be delivered over a pure IPv6 network as well.

At any rate, I wanted to take this opportunity to remind everyone that xine is not entirely dead, as can be told by the fact that I’m actually spending my personal free time working on its server rather than simply giving up. While I have not followed through with the 1.2 release – mostly because I lost track of xine after my weeks at the hospital three years ago – there should be enough code there to be released, if somebody cared about giving it some deserved love. But just doing that is… unlikely to help alone.

If you’ve been following me for long enough, you know that I have worked hard on xine and learnt a lot by doing so. One of the things I learned is that its design and not-invented-here-isms are not something you really want in a modern project. On the other hand, I still think that the overall, general, high-level design of splitting the frontends from the library, with its more tightly coupled plugins, is a good idea. Of course this is more or less the same overall, general, high-level design followed by VLC.

What are the problems with continuing with xine-lib the way it is? Well, the plugins right now are too abstracted; while they don’t reach GStreamer’s level of abstraction, which makes that framework obnoxious, and while they are still mostly shipped with xine itself, there are limitations – which is why, of all the major users of libav/ffmpeg, xine is the one that does not use libavformat to demux files, d’oh. The plugins also have a long list of minor design issues, starting with the whole structure handling, which is a total waste of space and CPU cycles.

So if you’re interested in xine, please come to #xine @ OFTC; the project still has potential, but it needs new blood.

Why there is no third-party access to the tinderbox

So we all seem to agree that the tinderbox approach (or at least a few tinderbox approaches) is the only feasible way to keep the tree in a sane state without requiring each developer to test it all alone. So why is my tinderbox (and almost any other tinderbox) not accessible to third parties? And, as many asked before, why isn’t a tinderbox such as the one I’ve been running managed by infra themselves, rather than by me alone?

The main problem is how privileges are handled; infra policies are quite strict, especially for what concerns root access. Right now, the tinderbox is implemented with LXC, which cannot be relied upon for proper privilege separation, neither locally nor for what concerns the network (even if you firewall it so that its MAC address only has access to limited services, you can still change the MAC address from within the LXC guest). This approach is not feasible for infra-managed boxes, and can cause quite a bit of trouble on private networks as well (such as mine).
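To make the network point concrete, this is the kind of one-liner that defeats any host-side filtering keyed on the guest’s MAC address (the address itself is made up):

```sh
# Inside the LXC guest: nothing prevents the container from picking a new
# MAC address, so a MAC-based firewall rule on the host is trivially bypassed.
ip link set dev eth0 down
ip link set dev eth0 address 00:16:3e:12:34:56
ip link set dev eth0 up
```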

The obvious alternative to this is to use KVM; while KVM provides much better security encapsulation, it is not a perfect fit either. In the earliest iterations of working on the tinderbox I did consider using KVM, but even with hardware support it was too slow; the reason is that most of the work done by the tinderbox is I/O bound, to the point that even on the bare OS, with a RAID0, it leaves a lot to be desired. While virtio-block improves the performance of virtualised systems a lot, it’s still nowhere near the kind of performance you can get out of LXC, and as I said, even that is still slow.

This brings up two possible solutions: one is to use tmpfs for the builds, which is something I actually do on my normal systems but not on the tinderbox, for a number of reasons I’ll get to in a moment; the other is the recently-implemented plan9-compatible virtualisation of filesystems (in contrast to block devices upon which you build filesystems), which is supposed to be faster than NFS, as well as less tricky. I haven’t had the chance to try that out yet, and while it sounds interesting as an approach, I’m not entirely sure it’s reliable enough for what the tinderbox needs.
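As a rough sketch of the two options (paths, sizes and tags are made up, and the exact flag syntax of the QEMU of the time may differ):

```sh
# Option one: point Portage's build directory at a tmpfs, building in RAM.
mount -t tmpfs -o size=16G tmpfs /var/tmp/portage

# Option two: export a host directory to the guest over virtio-9p (virtfs)
# instead of going through a block device with a filesystem on top.
qemu-kvm -virtfs local,path=/srv/tinderbox,mount_tag=tinderbox,security_model=passthrough # ...other options

# ...and inside the guest:
mount -t 9p -o trans=virtio tinderbox /mnt/tinderbox
```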

There is another problem with the KVM approach here, and it relates once again to infra policies: KVM takes a very big performance hit when you use hardened kernels, especially with the grsecurity features enabled. This performance hit makes it very difficult to rely on KVM on infra boxes; until somebody finds a way to work around that problem, there are very few chances of getting it to work there.

Of course the question would then become “why do you need third parties to access the tinderbox as root?”; you can much more easily deal with restrictions for standard users than for root, even though you still have to deal with privilege escalation bugs. Well, it’s complicated. While a lot of the obvious bugs are easily dealt with by looking at a build log and/or the emerge --info output, a lot require more information, such as the versions of packages installed on the system, the config.log file generated by autoconf, or a number of other files for other build systems. For packages that include computer-generated source or data files, relying on other software altogether, you also might need those sources to ensure that the generated code corresponds to what is expected.

All of this requires you to access the system first-hand, and by default it also requires you to access the system as root, as the Portage build directories are not world-readable. I thought of two ways to avoid the need to access the system, but neither is especially fun to deal with. The first is gathering the required logs and files and producing a one-stop log with all that data; it makes the log tremendously long and complex, and it requires the ebuilds to declare which files to gather out of a build directory to report the bugs. The other way is to simply save a tarball with the complete build directory, and access that as needed.
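As an untested sketch of what the first method could look like, a Portage bashrc hook could squirrel away the interesting files before the build directory goes away; the destination path and the choice of files here are hypothetical.

```sh
# /etc/portage/bashrc
# After the configure phase, copy any config.log somewhere that survives
# the cleanup of the build directory.
post_src_configure() {
    local dest="/var/log/tinderbox/${CATEGORY}/${PF}"
    mkdir -p "${dest}"
    find "${WORKDIR}" -name config.log -exec cp --parents '{}' "${dest}/" \; 2>/dev/null
}
```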

I originally tried to implement this second method; storing the data that way is also helpful because you can then easily clean up the build directory, which is a prerequisite for building with tmpfs. Unfortunately, I soon discovered that the builds are just too big. I’m not referring only to builds such as OpenOffice, but also to a lot of the scientific software, as well as a number of unexpected packages, especially when you run their tests, as often enough tests involve generating output and comparing it to expected results. On the pure performance side, even with tmpfs I found that this method ended up taking more time than simply building on the hard drives. I’m not sure how it would scale within KVM.

With all these limitations, I hope it is clear why the tinderbox has been, up to now, a one-man-gateway project, at least for me. I’d sincerely be happy if infra could pick up the project and manage the running bits of it, but until somebody finds a way to deal with all this that doesn’t require an order of magnitude more work than what I’ve been doing up to now, I’ll keep running my own little tinderbox, and hope that others will run one of theirs as well.

On Virtual Entropy

Last week Jaervosz blogged about using the extras provided by the ekeyd package (which contains the EntropyKey drivers) — the good news is that as soon as I have time to take a breath I’ll be committing this egd ebuild into the tree for general availability. But before doing so, I wanted to explain a few more details about the entropy problem, since I’m pretty sure it’s not that easy to follow for most people.

First problem: what the heck is entropy, and why is it useful to have an EntropyKey? Without going into details that partly escape me as well, proper cryptography needs a good source of random data; and by random data we mean data that cannot be predicted given a seed. To produce such good random data, kernels like Linux make it possible to gather some basic unpredictable data and condition it, transforming it into a source of good random data; that unpredictable data is, basically, entropy.

Now, Linux gathers entropy from a number of sources; these include the variations in seek time on a standard hard disk, the typing rate of the user at the keyboard, mouse movements, and so on. For old-style servers and most desktops, even modern ones, this is quite feasible; on the other hand, for systems like embedded routers, headless servers, and so on, you start to lack many of these sources: CompactFlash cards and Solid State Disks have mostly-predictable seek times; headless systems don’t have a user typing on them; and I don’t remember whether modern Linux uses the network as a source of entropy, but even if it did, it would be questionable as one, since the data is not random but predictable to anyone sniffing it.

Taking this into consideration, you reach a point where entropy-focused attacks become interesting; especially with the modern focus on providing SSL- or TLS-protected protocols, which need good random sources, you can create a denial-of-service situation simply by forcing the software to quickly deplete the kernel’s entropy reserve. When the kernel does not have enough entropy, reading from /dev/random is no longer immediate and becomes blocking (do not confuse this with /dev/urandom, which is the kernel’s pseudo-random number generator, a totally different beast!). If it takes too much time to fetch the random data, requests will start timing out, and you have a DoS served.
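You can actually watch this happen: the kernel exposes the current depth of its entropy pool, and on an idle headless box it tends to sit dangerously low.

```sh
# Current depth of the kernel's entropy pool, in bits.
cat /proc/sys/kernel/random/entropy_avail

# /dev/random blocks once the pool is depleted; /dev/urandom never does.
dd if=/dev/random of=/dev/null bs=64 count=64    # may stall for a long while
dd if=/dev/urandom of=/dev/null bs=64 count=64   # returns immediately
```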

To overcome this problem, we have a number of options: audio_entropyd, video_entropyd, timer_entropyd and the EntropyKey. The first two, as the names say, gather data from audio and video inputs, conditioning the sound and video read from there into a suitable source of entropy; I sincerely admit I was unable to get the first one working, and the second requires a video source that is not commonly found on servers and embedded systems (and drains battery power on laptops). On the other hand, timer_entropyd does not require hardware directly; it rather uses the information on the timers that various pieces of software add to the system, such as timeouts, read callbacks, and so on and so forth. Quite often these are not really predictable, so it’s a decent source of entropy. The EntropyKey is instead designed to be a hardware device whose only aim is to provide the kernel with high-quality entropy to use for random-number generation.

Obviously, this is not the first device designed to do this; high-end servers and dedicated embedded systems have had support for so-called hardware RNGs for a very long time: random number generators that are totally separate from the kernel itself, and provide it with a stream of random data. What might not be known here is that, just like with the EntropyKey, there is a need for a daemon that translates the data coming from the device into the form the kernel is going to serve to the users connecting to /dev/random; as it is, I cannot say there is a unified interface to do so (you can note the fact that the rngd init scripts have to go through a number of different device files to find which one applies to the current system).

At any rate, the EntropyKey or dedicated RNG hardware tend to hide the problem quite nicely; with full-on Kerberos enabled on my system, I could feel the difference between having and not having the EntropyKey running. But how does this pair with the modern virtualisation trend? Not so well as it is, as Sune said. While the EntropyKey can by itself provide a good random pool for the host system, as it is it doesn’t cover KVM guests; for that you have to go down one of two different paths (two in theory; in practice you only have one). But why haven’t I worked on this myself before? Well, since LXC shares almost all of the kernel, including the random device and the entropy state, feeding the host’s entropy means feeding the LXC guests’ entropy as well: they are the same pool.

Thankfully, the EntropyKey was actually designed with these problems in mind; it comes with a modified Entropy Gathering Daemon (ekeyd-egd) that allows the entropy coming from the EntropyKey to be sent to the virtual machines, which use it to produce their own random numbers. What Jaervosz was sad about was the need to set up and run another daemon on each of the KVM guests; indeed there should have been a different way to solve the problem, since recent kernels support a device called virtio-rng that, as the name implies, provides a random number generator through the virtio interface, which is used to reduce the abstraction between KVM virtual devices and the host’s kernel. Unfortunately, it seems like no current version of QEmu, even patched with KVM support, has a way to define a virtio-rng device, so for now it’s unusable. Further, as I said above, you still have to run rngd to fetch the data from the hardware RNG and feed it to the kernel, so we’re back at setting up and running a new service on all the KVM guests.

At any rate, I hope I’ll be able to commit Sune’s ebuild tomorrow; I’ll probably also be committing a few more things. I also have to say that running the rngtest he noted in his post on my laptop, which uses timer_entropyd, definitely takes too much time to be useful, and actually scares me quite a bit because it would mean it cannot keep up with the EntropyKey at all; but using an external EntropyKey on the laptop is… not exactly handy. I wonder if Simtec is planning an EntropyKey with SDIO (to use in the SD card readers of most laptops), ExpressCard or PCMCIA interfaces. I’d love to have the EntropyKey effect hidden within the laptop!

I know you missed them: virtualisation ranting goes on!

While I was writing init scripts for qemu, I was prodded again by Luca to look at libvirt instead of reinventing the wheel. You probably remember me ranting about the whole libvirt and virt-manager suite quite some time ago, as it really wasn’t my cup of tea. But then I gave it another try.

*On a very puny note here: what’s up with the lib- prefix? libvirt, libguestfs, they don’t look even remotely like libraries to me… sure, there is a libvirt library in libvirt, but then shouldn’t the daemon simply be called virtd?*

The first problem I found is that the ebuild still tries to force dnsmasq and iptables on me if I have the network USE flag enabled; it turns out that neither is mandatory, so I have to ask Doug to either drop them or add another USE flag for them, since I’m sure they are a pain in the ass for other people besides me. I know quite a few people ranted about dnsmasq in particular.

Having sidestepped that problem, I first tried, again, to use the virt-manager graphical interface to build a new VM. My target was to try re-installing OpenSUSE, this time using the virtio disk interface.

A word of note about qemu vs. qemu-kvm: at first I was definitely upset by the fact that the two cannot be present on the same system; this is particularly nasty considering that it takes a little longer for the qemu-kvm code to be bumped when a new qemu is released. On the other hand, after finding out that, yes, qemu allows you to use virtio for disk devices but, no, it doesn’t allow you to boot from them, I decided that upstream is simply going crazy. Reimar, maybe you should send your patches directly to qemu-kvm; they would probably be considered, I guess.

The result of the wizard was definitely not good; the main problem was that the selection for the already-present hard disk image silently failed, and I had to input the LVM path myself, which at the time felt like a minor problem (although another strange thing was that it could see just one of the two volume groups I have in the system); but the result was… definitely not what I was looking for.

The first problem was that the selection dialog I thought was not working was working all right… just on the wrong field, so it replaced the path to the ISO image to use for installing with that of the disk again (which, as you might guess, does not work that well). The second problem was that even though I explicitly set that I wanted a Linux version with support for virtio devices, it didn’t configure the machine to use virtio at all.

Okay, time to edit the configuration file by hand; I could certainly keep using virt-manager just to replace vinagre for accessing the VNC connections (over a unix path instead of TCP/IP to localhost), so that would be enough for me. Unfortunately, the configuration file declares itself to be XML; if you know me, you know I’m not one of those guys who go away screaming as soon as XML is involved, and even though I dislike it as a configuration format, it probably makes quite a bit of sense in this case: I found out myself, trying to make the init script above usable, that the configuration for qemu is quite complex. The big bad turn-down for me is that *it’s not XML, it’s aXML (almost XML)!*

With the name aXML I refer to all those uses of XML that barely use the syntax but none of the features. In this particular case, the whole configuration file, while documented for humans, lacks an XML declaration as well as any kind of doctype or namespace that would tell software like, say, nxml what the heck it is dealing with. And more to the point, I could find no RELAX NG or other kind of schema for the configuration file; with one of those, I could make Emacs a powerful configuration file editor: it would know how to validate the syntax and allow completion of elements. Lacking that, it’s quite a task for a human to deal with.

Just to make things harder, the configuration file – which, I understand, has to represent the very complex parameters that the qemu command line accepts – is not really simplified at all. For instance, if you configure a disk, you have to choose between the block and file types (which is normal operation even for things like iSCSI); unfortunately, to configure the path where the device or file is found you don’t simply have a <source>somepath</source> element: you need to provide <source dev="/path" /> or <source file="/path" /> — yes, you have to change the attribute name depending on the type you have chosen! And no, virsh does not help you by telling you that you had an invalid attribute or left one empty; you have to guess by looking at the logs. It doesn’t even tell you that the path to the ISO image you gave is wrong.

But okay, after fixing up the XML file so that the path is correct, and so that the network card and disks use virtio and all that stuff, as soon as you start the machine you can see a nice -no-kvm in the qemu command line. What’s that? Simple: virt-manager didn’t notice that my qemu is really qemu-kvm. Change the configuration to use kvm and, surprise surprise: libvirtd crashes! Okay, to be fair it’s qemu that crashes first and libvirtd follows, but the whole point is that if qemu is the hypervisor, libvirtd should be the supervisor, and should not crash if the hypervisor it launched doesn’t seem to work.

And it gets even funnier: if I launch the same qemu command as root, it starts up properly, without network but properly. Way to go libvirt; way to go. Sigh.

Virtualisation WTF once again.

To test some more RTSP clients I’ve been working on getting more virtual machines available on my system; to do so I first extended the available disk space by connecting one more half-a-terabyte hard drive (removing the DVD burner from Yamato), and then started working again on a proper init script for KVM/QEmu (which Pavel already asked me about before, and provided me with an example of).

Speaking of which, if somebody were to send a USB or FireWire DVD burner my way I’d probably be quite happy; while I have three other DVD burners around – iMac, MacBook Pro and Compaq laptop – having one on Yamato came in useful from time to time; not necessary, so wasting a SATA port for it was not really a good idea after all, but still useful.

I started writing a simple script before leaving for my vacation and extended it a bit more yesterday. But, in line with the usual virtualisation woes, the results aren’t excessively positive:

  • FreeBSD 8 pre-releases no longer seem to kernel panic when run in qemu (the last beta I tried did, the latest RC available does not); on the other hand, it does seem to have problems with the default network card (it works if started after boot, but not at boot); it works fine with e1000;
  • NetBSD is still a desperate case: with qemu (and VDE) no network card seems to work; the e1000 is not even recognised, while the others end up timing out, silently or not; this is without ACPI enabled, and if I do enable ACPI, no network card is detected at all; with KVM it freezes during boot, with or without ACPI;
  • Pavel already suggested a method using socat and the monitor socket of qemu to shut down the VM cleanly (see the sketch after this list); the shutdown request causes the qemu or kvm instance to send the ACPI signal (if configured!) and the guest then shuts down cleanly… the problem is that the method requires socat, which is quite broken (even in the 2-beta branch).
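For reference, this is roughly what Pavel’s method boils down to; the socket path is made up and the rest of the qemu options are omitted.

```sh
# Start the VM with its monitor listening on a unix socket.
kvm -monitor unix:/var/run/kvm/freebsd8.monitor,server,nowait # ...other options

# Later, from the init script's stop() function, ask for a clean ACPI shutdown.
echo system_powerdown | socat - UNIX-CONNECT:/var/run/kvm/freebsd8.monitor
```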

Let me explain what the problem is with socat: its build system tries to identify the size of various POD types used by the code; to do so it uses some autoconf trickery, the -Werror switch, and relies on pointer comparisons compiling cleanly between two POD types of the same size, even if they are different types. Guess what? That’s no longer the case. A warning sign was already present: the code started failing some time ago when -Wall was added to the flags, so the ebuild strips it. Does that tell you something?

I looked into sanitizing the test; the proper solution, for what I can see, would be to use run-time tests rather than build-time tests; but even if that’s possible, it’s quite intrusive and it breaks cross-compilation. So I went to look at why the thing really needed to find those equivalences… and the result is that the code is definitely messy. It’s designed to work on pre-standard systems, and to keep compatibility with so many different operating systems that fixing the build system up is going to require quite a bit of code hacking as well.

It would be much easier if netcat supported unix local sockets, but no implementation I have used seems to. My solution to this problem is to replace socat with something else, based on a scripting language such as Perl, so that it’s just as portable, and at the same time less prone to the kind of problems socat is facing now. I asked a few people to see if they can write up a replacement; hopefully this will bring us something decent so we can kill socat off.

So if you’re interested in having a VM init script that works with Gentoo without having to deal with stuff like libvirt and so on, then you should probably find a way to coordinate together and get a socat replacement done.

Linux Containers and the init scripts problem

Since the tinderbox is now running on Linux containers, I’m also experimenting with making more use of them. Since containers are, as the name implies, self-contained, I can use them in place of chroots for testing stuff that I’d prefer didn’t contaminate my main system; for instance, I can use them instead of a Python virtualenv to get a system where I can use easy_install to pull in, as a temporary measure, the stuff that is not packaged in Portage.

But after some playing around I came to the conclusion that we have essentially two problems with init scripts. Two very different problems actually, and one involves more than just Linux containers, but I’ll state both here.

The first problem is specific to Linux containers and relates to one limitation I think I wrote about before: while the guest (tinderbox) cannot see the processes of the host (yamato), the opposite is not true, and indeed the host cannot really distinguish between its own processes and those of the guest. This isn’t much of a problem by itself, since daemons are usually started and stopped through pidfiles that record the started process id, rather than doing a search and destroy over all the processes.

But the “usually” part here is the problem: there are init scripts that use the killall command (which, as far as I can tell, does not take namespaces into consideration) to identify which processes to send signals to. It’s not just a matter of using it to kill processes; most of the time it seems to be used to send signals to the daemon (like SIGHUP for reloading the configuration, or stuff like that). This was probably done in response to changes to start-stop-daemon that asked for it not to be used for that task. Fortunately, there is a quick way to fix this: instead of using killall we can almost always use kill and take the PID to send the signal to from the pidfile created either by the daemon itself or by s-s-d.
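In practice the fix looks like this (the daemon name and pidfile path are placeholders):

```sh
# Namespace-unaware: this hits the host's and every guest's copy of the daemon.
killall -HUP foodaemon

# Namespace-safe: signal only the PID recorded by the daemon itself or by s-s-d.
kill -HUP "$(cat /var/run/foodaemon.pid)"
```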

Hopefully this won’t require especially huge changes, but it brings up the issue of improving the quality assurance over the init scripts we currently ship. I found quite a few that depended on services that weren’t in the dependencies of the ebuild (either because they are “sample configurations” or because the ebuild lacked some runtime dependencies), a few that had syntax mistakes in them (some due to the new POSIX-correctness introduced by OpenRC, but not all of them), and quite a few that run commands in global scope, which slows down dependency regeneration. I guess this is something else that we have to decide upon.

The other problem with init scripts involves KVM and QEmu as well. While RedHat has developed some tools for abstracting virtual machine management, I have my doubts about them, as much now as I had some time ago, for what concerns both configuration capabilities (they still seem to bring in a lot of stuff that – to me – is unneeded, like dnsmasq), and now code quality as well (the libvirt testsuite is giving me more than a few headaches, to be honest).

Luca already proposed some time ago that we could just write a multiplex-capable init script for KVM and QEmu, so that we could configure the virtual machines like we do for the network interfaces, and then use the standard rc system to start and stop them. While it sounds trivial, this is no simple task: starting is easy, but what about stopping the virtual machine? Do you just shut it down, detaching the virtual power cord? Or do you go and stop the services inside the VM as you should? And how do you do that, with ACPI signals, with SSH commands?
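The rough idea, borrowing the convention the net.* scripts already use, would be something like the following; the service and file names here are hypothetical.

```sh
# One real script, one symlink per virtual machine.
ln -s kvm /etc/init.d/kvm.freebsd8
rc-update add kvm.freebsd8 default

# Inside /etc/init.d/kvm, the VM name is derived from the service name,
# with its settings coming from the matching /etc/conf.d/kvm.freebsd8.
VMNAME=${SVCNAME#kvm.}
```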

The same problem applies to Linux containers, but with a twist: trying to run shutdown -h now inside a Linux container seems to stop the host rather than the guest! And there you cannot rely on ACPI signals either.

If somebody has a suggestion, they are very welcome.

Virtualization updates

It seems like, one way or another, a common “column” on my blog is reserved for virtualisation issues. I blogged a lot about VirtualBox (before finally dissing it), and then I moved on to KVM and QEmu.

Last time I blogged about it, I was still unable to get NetBSD to detect any network card with KVM, while I had OpenSolaris, FreeBSD and Ubuntu working fine. I also had some problems with Gentoo/FreeBSD and the KVM video emulation. But since then, stuff has changed; in particular, QEmu now supports the KVM technologies natively (and it’s not yet updated to the latest version). Let’s see if this changed something.

Thanks to aperez I now know how to get NetBSD to identify the network card: disabling ACPI. Unfortunately, disabling ACPI with KVM freezes the boot. And I want to use VDE for networking, since I already have Yamato configured as a router and file server for the whole network, which seems to fail when using NetBSD with QEmu: while dhcpd receives the requests, the replies never reach NetBSD, and I’m stuck for now. I’m going to try again with the newer QEmu version. Also, of all the cards I tried in QEmu, the Intel E1000 fails because it cannot find the EEPROM.

The Gentoo/FreeBSD video problem that stopped me from using vim during the configuration phase on the minimal CD does not happen when using QEmu; on the other hand, since the SDL output is tremendously slow, I’m using the VNC support, which would be quite nice if it weren’t that Vinagre does not seem to support VNC over Unix sockets; that would make the whole configuration much nicer, without consuming precious network ports. I have to see if I just missed something, and if I didn’t, I should either request for it to be added, or write the support myself (even better). I guess the underlying code supports Unix sockets, since I expect virt-manager to use them to communicate with the VM.

Speaking of which, I haven’t looked at virt-manager or anything related in quite a while; I should see if they still insist on not giving me the choice of just using VDE for networking instead of dnsmasq and similar. For now the whole configuration is done manually with a series of aliases in my ~/.shrc file, with (manually) sequential MAC addresses hardcoded, as well as VNC ports, LVM volumes (used for the virtual disks, which seem to be quite a bit faster than using a file on a filesystem), and hostnames (in /etc/hosts, except for Ubuntu, which has Avahi working).
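To give an idea, the aliases look more or less like this; the MAC address, VNC display, volume and socket paths are all made up, and the -net options use the old pre-netdev syntax.

```sh
# One alias per guest, with hardcoded MAC address, LVM volume and VNC display.
alias kvm-ubuntu='kvm -m 1024 -drive file=/dev/vg0/kvm-ubuntu,if=virtio -net nic,model=virtio,macaddr=52:54:00:00:00:03 -net vde,sock=/var/run/vde.ctl -vnc :3'
```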

I have to admit, though, that I have some doubts about the performance of QEmu/KVM versus the usual KVM; at least, it’s taking quite a long time to unpack the tarball with the Gentoo/FreeBSD 7.1 stage3. I hope I/O is the bottleneck here.

Speaking of I/O as a bottleneck, I was finally able to get a gigabit switch for the office; the next step is to buy the many metres of cable needed to actually wire my bedroom to the office, passing through a few other rooms of the house, so that I can have a fast enough network for all the computers in their standard setup (and use wireless only when strictly needed). Although I do have some doubts about this, since I really want to move out.

In the meantime, Enterprise is soon going to be re-used as a backup box; I just need to find an easy way to send a WOL packet, wait for the box to come up, back up everything, and shut it down again, once a week. I have the last unused 500GB disk in that box, so it should be easy. But I’d like to have an mtree of the data that has been backed up, and I’m still unsure how to get that.
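Something along these lines should do, although it’s an untested sketch and the MAC address, host name and paths are invented; the mtree part is the piece I still haven’t figured out.

```sh
# Weekly backup job for Enterprise: wake it, wait, sync, shut it down again.
wakeonlan 00:1a:2b:3c:4d:5e
until ping -c1 -W2 enterprise >/dev/null 2>&1; do sleep 10; done
rsync -aHx --delete /srv/data/ enterprise:/backup/yamato/
ssh root@enterprise poweroff
```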

Virtualisation woes, again

I know this is starting to get old, with my ranting about virtualisation software, but since I’m trying my best to put Yamato’s power to use for software testing, I’m still working on getting virtualised systems to work properly for me.

In a long series of blog posts ranting about VirtualBox, QEmu, KVM and so on, there was exactly one system that had worked quite fine up to now: Windows XP (SP3) under VirtualBox. With the latest release, though, this broke too: the network started up, then came crashing down, with a striking resemblance to an old Solaris problem.

Since I needed my Windows XP virtual machine working for a job, I tried porting it to Parallels on my iMac, using the Parallels demo (since my license was only valid for the 3.x series). After waiting for the 64GB image file to convert, it turned out that there was no hope of getting it to start: the VirtualBox additions drivers crash with a blue screen of death at boot when they are executed outside of a VirtualBox instance; the Windows Recovery Console does not let you disable the drivers from loading, and deleting them to stop them from loading was not an option either, since they get installed in the program files directory (which the Recovery Console cannot access).

In the end, given the absolute unreliability of VirtualBox on every operating system at this point, I simply gave up and paid for the upgrade of my license to Parallels 4, which is now providing my only Windows XP instance (which I’m still, unfortunately, tied to for work), and deleted VirtualBox from my system. Why, you’d ask, since networking not working is far from the biggest problem out there? Well, the biggest problem, and the final straw that broke the camel’s back, was that while I was trying to figure out why Samba was not working, VirtualBox’s network filter module crashed the kernel. So what? Well, VirtualBox decided that rather than using the quite well-tested mixed kernel/userland TUN/TAP networking system, or the userland virtual network provided by VDE (with a tap to interface it with the rest), they had to provide a kernel module instead; for performance reasons or, quite more likely, so that they could have the same interface to the network internals across different operating systems. Do I have to spell out why this is a problem?

Interestingly, while writing this I noticed that there are problems downloading VirtualBox, which also reminded me of how many times they messed up the ebuilds by changing the tarballs…

But it doesn’t stop here. Remember the NetBSD networking trouble I reported about one month ago? Well, I wanted to see if something had changed with the new NetBSD 5.0 release (I actually wanted to make sure that feng detected the newly-added POSIX Message Queue support properly), but still no luck: I don’t see any network card with whatever model I provide to KVM, including the e1000 that I’d expect NetBSD to support at least.

On the other hand, I was at least able to get Ubuntu (9.04) working on KVM; the next step is Fedora 11, so I can actually test feng on other distributions besides Gentoo.

More virtually real troubles

So after fighting with QEmu and surrendering to KVM, I finally got a vanilla FreeBSD 7.1 instance and an OpenSolaris instance running; I made sure that feng builds on both, and since I was there I also fixed up the SCTP autoconf check on both, so that feng can ideally speak SCTP with both of them.

A note here for those interested: SCTP (Stream Control Transmission Protocol) is a protocol, alternative to TCP and UDP, that is designed to work well for streaming applications; the fact that feng supports it is more a proof of concept than an actually useful feature – I’m sincerely not sure how well it works nowadays – but since I already had to fight to get it to build correctly on Linux, I wanted to fix it up for the FreeBSD and Solaris implementations as well. I assumed that Apple had its own implementation too, but even though there are APPLE defines in the FreeBSD implementation, at least OS X 10.5 lacks any SCTP support that I can see.

I have already reserved a logical volume for Gentoo/FreeBSD 7.1, which I’m hopefully going to test today, but in the meantime I wanted to fix up NetBSD too, since I have seen that it also has an SCTP stack, and since none of the three stacks we support now is identical to the others, it seemed worth looking into. Unfortunately, NetBSD is proving to have no network to offer me. While I set up the KVM instance just like any other, no matter which model I use I can see no device in NetBSD’s ifconfig -a output; I have chosen the full installation, but it still doesn’t seem to have much. The documentation doesn’t seem to help either.

I guess NetBSD will keep waiting in line for now, unless somebody has a suggestion on how to deal with it.

Miracle on the nth try: OpenSolaris on KVM

So after my previous post about virtualisation software, I decided to spend some extra time trying out KVM manually. Having to set the MAC address by hand every time is a bit obnoxious, but thanks to shell aliases I can deal with that at least somewhat fine.

KVM is also tremendously faster compared with QEmu 0.10 using kqemu; I’m curious to see how things will change with the new 2.6.29 kernel, where QEmu will be able to use the KVM device itself. At any rate, the speed of FreeBSD in the KVM virtual system is almost native, and it worked quite nicely. It also doesn’t hog the CPU when idling, which is quite fine too.

As I’ve written, though, OpenSolaris also refused to start; after thinking about it for a bit, I considered the amount of memory and… that was it. With the default 128MB of RAM provided by KVM and QEmu, OpenSolaris cannot even start the text-mode installation. Giving it 1GB of memory actually made it work. Fun.

As Pavel points out in the previous post, though, the default QEmu network card will blatantly fail to work with OpenSolaris; Jürgen is right when he says that OpenSolaris is quite picky with its hardware. At any rate, the default network card for KVM (the RTL8139) seems to work just fine. And networking is not lagged at all like it is on VirtualBox.

I’ve now been working on getting Gentoo Prefix onto it, and then I’ll probably resume my work on getting FFmpeg to build, since I need that for my work on lscube. For now, though, it’s more a matter of having it installed.

Later this week I’ll probably also make use of its availability to work some more on Ruby-Elf, and in particular on the two scripts I want to write to help identify ABI changes and symbol collisions inside a given executable, which I promised in the other previous post.