Ruby-Elf and Sun extensions

I’ve written in my post about OpenSolaris that I’m interested in extending Ruby-Elf to parse and access Sun-specific extensions, that is the .SUNW_* sections of ELF files produced under OpenSolaris. Up to now I only knew the format, and not even that properly, of the .SUNW_cap section, that contains hardware and software capabilities for an object file or an executable, but I wasn’t sure how to interpret that.

Thanks to Roman, who sent me the link to the Sun Linker and Libraries Guide (I did know about it but I lost the link to it quite a long time ago and then I forgot it existed), now I know some more things about Sun-specific sections, and I’ve already started implementing support for those in Ruby-Elf (unfortunately I’m still looking for a way to properly test for them, in particular I’m not yet sure how I can check for the various hardware-specific extensions — also I have no idea how to test the Sparc-specific data since my Ultra5 runs FreeBSD, not Solaris). Right at the moment I write this, Ruby-Elf can properly parse the capabilities section with its flags, and report them back. Hopefully, with no mistakes, since only basic support is in the regression test for now.
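To give an idea of what parsing the capabilities section involves, here is a minimal sketch in Python (not Ruby-Elf's actual code). It assumes the layout described in the Linker and Libraries Guide: an array of (tag, value) pairs, two 32-bit words for 32-bit objects and two 64-bit words for 64-bit ones, terminated by a CA_SUNW_NULL entry. The function name and the return shape are made up for the example, and reading the section out of the file is left out.

```python
import struct

# Tag values as listed in the Sun Linker and Libraries Guide.
CA_SUNW_NULL = 0   # terminates the array
CA_SUNW_HW_1 = 1   # hardware capabilities bitmask
CA_SUNW_SF_1 = 2   # software capabilities bitmask

TAG_NAMES = {CA_SUNW_HW_1: "CA_SUNW_HW_1", CA_SUNW_SF_1: "CA_SUNW_SF_1"}


def parse_sunw_cap(data, is_64bit=False, big_endian=False):
    """Decode raw .SUNW_cap bytes into a list of (tag_name, value) pairs.

    Hypothetical helper: `data` is the section content, already extracted.
    """
    endian = ">" if big_endian else "<"
    entry = struct.Struct(endian + ("QQ" if is_64bit else "II"))
    caps = []
    for offset in range(0, len(data) - entry.size + 1, entry.size):
        tag, value = entry.unpack_from(data, offset)
        if tag == CA_SUNW_NULL:
            break  # end of the capabilities array
        caps.append((TAG_NAMES.get(tag, "unknown(%d)" % tag), value))
    return caps
```

The values come back as raw bitmasks; turning each bit into a flag name depends on the architecture and is not attempted here.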

One thing I really want to implement in Ruby-Elf is versioning support, with the same API I’m currently using for GNU-style symbol versioning. This way it’ll be possible for Ruby-Elf based tools to access both GNU and Sun versioning information as if it were a single thing. Too bad I haven’t yet looked up how to generate ELF files with Sun-style versioning. Oh well, it’ll be one more thing I’ll have to learn, together with a way to set visibility with Sun Studio, to test the extended visibility support they have in their extended ELF format.

In general, I think my decision to go with Ruby for this was a very good one, mostly because it makes it much easier to support new stuff by just writing an extra class and hooking it up, without needing “major surgery” every time. It’s easy and quick to implement new stuff and new functions, even if the tools will require more time and more power to access the data (but with the recent changes I made to properly support OS-specific sections, I think Ruby-Elf is now much faster than it was before, and uses much less memory, as only the sections actually used are loaded). Maybe one day, once I can consider this good enough, I’ll try to port it to some compiled language, using the Ruby version as a reference, but for now I don’t think it’s worth the hassle.

Anyway, if you’re interested in Ruby-Elf and would like to see it improve even further, so that it can report further optimisations and similar things (for instance something I planned from the start: telling which shared objects listed in a NEEDED entry are actually useless, without having to load the file through ld.so and use the LD_* variables), I can ask you one thing and one thing only: a copy of Linkers and Loaders that I can consult. I tried preparing a copy for the Reader out of the original freely available HTML files, but the result was quite nasty to look at, nastier than O’Reilly’s freely available eBooks (which are bad already). It’s in my wishlist if you want.

My fiddling with OpenJDK

It might sound strange coming from someone who wrote a post titled The Java t..crap but I’m liking playing with OpenJDK.

Not that my Java skills (pretty much non-existent, I’ve used it only a few times in my life) have improved, but I find myself comfortable working with most build systems, once you get scons out of the way (note that I didn’t say I like it, but I can do it just fine), and I do have experience with adding support for external libraries, after all, and that is something that OpenJDK might need.

So, let’s start from the start. The first problem was getting OpenJDK to build on Linux. If you try to build it on AMD64 you’ll see it fail; the reason is that the list of files to compile in hotspot is too long for the maximum size of arguments available on Linux, which in turn is caused by the long default path for sources that Portage sets. Petteri was able to build it fine because on x86 the architecture name is i486, while on AMD64 it is, well, amd64: the extra character there is enough to make it go down in flames. Kelly O’Hair from Sun is looking into the issue now, but he says it might take a few weeks to get the results out. Not a problem, I suppose: in the meantime you might want to try changing PORTAGE_TMPDIR to use /tmp for openjdk (if you have space in the /tmp partition), or you might want to fool the ebuild by renaming the openjdk directory inside $WORKDIR to something shorter, like “ojdk”; actually, a single character less will work just fine.
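Why a single character matters is easier to see with some arithmetic: every file on the command line carries the full directory prefix, so each extra character in that prefix costs one byte per file. The sketch below is in Python and uses made-up file names and directories; the 128 KiB limit is an assumption about the kernels of the time, and the real limit also has to fit the environment.

```python
ASSUMED_LIMIT = 128 * 1024  # assumption: argument size limit of the kernels of the time


def argv_bytes(paths):
    """Bytes the paths take in an argument list: each string plus its NUL."""
    return sum(len(p.encode()) + 1 for p in paths)


def fits(workdir, names, limit=ASSUMED_LIMIT):
    """True if workdir/name, for every name, fits within `limit` bytes."""
    return argv_bytes(workdir + "/" + name for name in names) <= limit


# Made-up file list; the real hotspot list is what actually overflows.
names = ["obj%04d.o" % i for i in range(2000)]
short = argv_bytes("/tmp/ojdk/" + n for n in names)
longer = argv_bytes("/tmp/openjdk/" + n for n in names)
print(longer - short)  # 6000: three extra characters, paid once per file
```

With a list that sits right at the edge of the limit, the same multiplication explains why i486 fits and amd64 does not.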

After getting a working build of OpenJDK, and checking that it worked fine in Konqueror (it does!), I had to start messing with it, or I wouldn’t have been myself ;) The first thing was to get it to build with my flags correctly; I was able to get something that worked, but there are a few issues, and I’d like to be able to provide a less invasive patch. O’Hair also put me on the right track, by telling me that the hotspot and j2se build systems are different, which might call for using the same method on both: ignoring the default C(XX)FLAGS values coming from the environment, and using the _OTHER variables instead. This should allow for just four changed lines, if I find where to put the CFLAGS/CXXFLAGS unset.

After this I started thinking of a useful hack to do, to at least understand how the build system works; I could have looked at fixing a few warnings, but I thought it would be more useful to start by looking at the build system itself, as that is what I will need to understand to be able to get OpenJDK building on FreeBSD. I’ve seen zlib-1.1.3 sources in the tree, so I decided to find a way to build against an external zlib. Easy, quick, interesting and useful.

Why useful? Well, first of all, it would use a more stable version of zlib, and would share its code with the rest of the system (almost every process has a zlib copy loaded one way or another); second, it would stop OpenJDK from being exposed to possible attacks on the old bundled zlib code. Unfortunately adding the support for external zlib was not as easy as I thought, but that was mostly because I stupidly forgot FEATURES=keepwork in the environment (I wanted to get the object files out, as I want to see if the executable stack problems that affect Java are due to the architecture, or just a matter of not having the GNU stack markings in the source files).

And while looking at fixing this, and seeing a tempnam() warning at linking stage, I ended up finding a tiny memory leak, as the string coming from tempnam() is never freed. I’ve prepared a patch and sent it, hopefully to the right mailing list.

Other hacks that might be useful would be finding a way to enable or disable ALSA and Motif bindings building, to reduce the size of the final output, and to avoid depending on those to build openjdk, but I wonder if Sun would ever accept such changes.

When I’m comfortable enough with both the build system and the development environment, I’ll start working on building OpenJDK on FreeBSD; from what I can read on the build system mailing list, it should be possible, for now, to build OpenJDK using a JDK 1.5, which is available for Gentoo/FreeBSD, even if the best thing would be using 1.6. I’ll probably have to provide an openjdk-bin package for Gentoo/FreeBSD to use as a seed afterward, provided of course that the license allows this (most of OpenJDK is GPL-2, but there are some binary blobs still present; until they are replaced, it might not be possible to redistribute the binary itself).

Unfortunately a big problem I have to face while hacking at this is that I don’t have any Solaris/OpenSolaris installation, and I suppose that Sun will accept more willingly contributions that don’t break their main target platform :)

Now, of course, the reasons I have to find this project interesting and to reverse my attitude toward Java are not limited to the simple GPL-2 licensing. What I find more attractive, and what makes me hope I’m not mistaken in my judgement, is that it would now be possible to support more hardware platforms, making Java a truly platform-agnostic language (okay, Kaffe helps a bit with this, but the results are not really at the same level), and more operating systems as well. And with more users’ contributions, it will probably also get better optimisations (hey, we’ve got guys like the FFmpeg devs who are able to optimise the hell out of multimedia codecs, someone might find useful things to improve in OpenJDK too!).

As for Gentoo/FreeBSD, the FreeBSD guys surely already have some build definitions to support FreeBSD; the problem is that those are likely not GPL-2 as they stand, since they were designed for previous JDK versions, so I’d rather not even look at them, to avoid legal issues. The assembler code would likely be the same between Linux and FreeBSD on the same architecture; for SPARC, the code might just be taken from the Solaris assembler sources. At least on i486, the assembler code for Linux and Solaris doesn’t differ apart from a few minor places, and the main issue there is the difference in the assembler program used, and thus in the syntax.

Let’s try to make Java a truly Free platform, and allow it to run on as many systems as possible; then it might replace Flash and beat the heck out of Silverlight. Sun seems to be in the right mindset… so waste no time!

OpenJDK and Gentoo/FreeBSD

So, Betelgeuse blogged about the new partly-free JDK released by Sun Microsystems. Kudos to them for actually starting to fix the Java Trap. Hopefully in the next months we’ll finally see Java taking the place it was designed for in the development of platform-agnostic software, and of applets in web pages, with its (hoped-for) availability on more platforms than are available right now.

Of course I want to get OpenJDK working on Gentoo/FreeBSD as soon as possible, as the FreeBSD Foundation’s “Diablo” project seems to be stale and broken (the latest version is 1.5.0-7, and I’m not even sure if it’s vulnerable, as I remember some security advisories for sun-jdk on Linux), and I wanted to move to Tomcat for my blog. So I asked Betelgeuse for a bit of information about openjdk, downloaded the Java Experimental overlay, and decided to merge openjdk on Linux first.

To get it to work on Gentoo/FreeBSD the first thing is to get the Linux emulation support at least partly working, as a JDK 1.7 is needed as a seed to build OpenJDK (this reminds me a lot of GHC unfortunately). So I’ll work on it until I can get OpenJDK building here.

Unfortunately I can’t build openjdk yet, the build fails with an “Argument list too long” error in an execve() call. I’m not sure what causes it, but I hope to investigate it further in the near future (maybe I should ask Betelgeuse tomorrow after his exam).

Another problem I’ve found with OpenJDK is that the build of hotspot mixes the values of CFLAGS and CXXFLAGS, throwing a lot of warnings about -Wno-pointer-sign and -Wno-format-zero-length, which I have in CFLAGS and which are not supported when building with GNU C++. I’ve tried fixing this one but I was unable to find the perfect patch for it, so I reported the bug with the tentative patch to Sun; hopefully they know their build system better than I do :)
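The idea behind a workaround is simply to keep C-only options away from the C++ compiler. A minimal sketch of that filtering, in Python, follows; it is not the patch I sent to Sun, the function name is made up, and the set only contains the two flags named above.

```python
# The two C-only warning flags mentioned above; a real list would be longer.
C_ONLY_FLAGS = {"-Wno-pointer-sign", "-Wno-format-zero-length"}


def cxx_safe(cflags):
    """Return the flags from a CFLAGS string that are safe to pass to C++."""
    return " ".join(flag for flag in cflags.split() if flag not in C_ONLY_FLAGS)


print(cxx_safe("-O2 -pipe -Wno-pointer-sign -Wno-format-zero-length"))  # -O2 -pipe
```

The cleaner fix is of course for the build system to keep CFLAGS and CXXFLAGS separate in the first place.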

When I’m able to get OpenJDK building on Linux, and I understand the process well enough to work on it, I’ll do my best to get OpenJDK working on Gentoo/FreeBSD. I’m sure there are people from the FreeBSD project working on it too, they already did this to get Diablo out, after all, but most of them don’t blog so you can’t really know what they are up to (besides, if they are the same people who would be releasing the new Diablos, they have probably been missing in action for months now).

Who knows, maybe we’ll be able to get OpenJDK working on Gentoo/FreeBSD/SPARC64 too, that would be neat :) After all, FreeBSD and Solaris aren’t that much different (as the DTrace and ZFS ports show) and SPARC64 is supported by Sun for sure ;)

The ZFS tale

Guess what? I cannot sleep tonight either. I think my biological clock is now totally broken, and I cannot get to sleep at all. The nasty thing is that I’m dead tired, maybe too tired to sleep, and that is driving me crazy because I want to sleep, today like I haven’t in the past months. And the bad thing is that I’m supposed to go to the mall later on today with my brother-in-law, as I need to pay the last instalment on the old iBook, and look for a decent bag for the new one. You close one circle to open another sometimes.

Of the long TODO list I had yesterday, I wasn’t really able to get through much; the hope of watching a movie while lying on the couch vanished quickly, as I started taking care of a few things. For one, I wanted to be able to compile and run pan on Gentoo/FreeBSD, for the pure sport of fixing things while I’m at it (actually, gmime was already fixed upstream, so I didn’t have to patch it at all in the end, thanks to Ticho who simply bumped it). I didn’t have time to look at nss-mdns, and I’ve now seen that there’s a new release too (I hope that 0.9 won’t break my patch too badly), but I had another entry on the TODO list that was interesting enough to prioritise over the rest for a day at least.

[Now of course, even tonight, like last night, and the night before, and three nights ago, my cat had to come in my room to see if I was awake, until I brought her down, and gave her an old shirt of mine – that I was already going to give her as it was kinda ruined now – which lowered my hopes to be able to sleep soonish… at least I drank two glasses of white milk, it might make me sleep better]

So what is the interesting thing that fascinated me away from finishing the port of nss-mdns? It had to be something cool, or I would have preferred helping FreeBSD devs for sure ..

Well, Astinus pointed me to a patch for FreeBSD -CURRENT to support ZFS natively; considering that FreeBSD does not have license problems with using CDDL code in the kernel, it was interesting for them to integrate. I looked at the patch and the main issue was breaking it down into a kernel patch and a userland package, so I started working on that, with Astinus who’s gonna be the test monkey for it :)

The results aren’t totally ready yet, as I was able to get the kernel patched, but not the userland to compile; I haven’t tried compiling the kernel yet, either. The main issue is that the userland not only requires the patched kernel, but seems to rely on some compatibility support that was probably added in 7-CURRENT, so I have to work around that to get it working on 6.2, but it seems to be coming out nicely so far.

As for Klothos, while I’m still missing a way to parse and interpret the coredump, I’ve decided to try to make catalyst build a new stage for that too, with the libedit problem fixed. The main problem that made me give up before was that it had to load the nullfs module, which caused the kernel to panic, as I already said; by building nullfs into the kernel, I can make catalyst work for the time being. Unfortunately, when I was about to start the catalyst build there, I got into a fight with my soundcard, which required me to shut Enterprise off (and Enterprise shared the Portage tree I was going to snapshot at the time).

Now there are two things I want to talk about in this blog entry, before the end. The first is the soundcard fight and the second is a problem with the Gentoo/FreeBSD 6.2_rc2 stages that Astinus pointed out to me.

Starting with the second, that is more interesting to users, there seems to be a problem with the Portage version shipped with the 6.2_rc2 stages I released a couple of days ago; that version of portage is unable to create $DISTDIR if it’s missing, so fetching packages will fail until you create the directory yourself and give it to portage user and group; to fix this issue, I now have Prakesh doing a new catalyst run for a stage3 only (stage1 and stage2 are byproducts of the process to create the stage3, I publish them for completeness as releng does so, but I won’t support them anyway), so that it can be refreshed on mirrors and I can tell Jeffrey to remove 6.2_beta3 too to avoid wasting space :)

And if you didn’t already notice, nightmorph updated the Gentoo/FreeBSD Installation Guide, which now covers the 6.2 stages (built with Catalyst and with baselayout 1.13), and the problems with the partition being created as ad0s1d rather than ad0s1a. Thanks go to him for updating it in a matter of hours from my pretty long and boring mail.

For what concerns my fight with the soundcard, I have to say I came to hate not only ALSA but also the via82xx driver; the former because the new kernel release is, surprise, breaking alsa-driver again, so unless upstream this time is nice enough to give me an rc2 before the new kernel is released, I’ll have to resort to a snapshot once again; the latter because I wasn’t able to get it to work on 6 channels in any way.

Let me try to explain: John Myers commented on my previous post giving me a command line that actually works to extract the audio tracks of a DVD with mplayer (<tt>mplayer dvd://[<title>] [-chapter <start>[-<end>]] -vo null -vc null -ao pcm:fast:[file=<filename>] -channels 6</tt>); thanks a lot John, it works like a charm!

The extraction works, WavPack encoding works (besides, thanks Timothy too, for pointing me at the ability of wavpack command to set the metadata on ripping, now I can rip my next CDs without having to tag them later with Amarok!), but I cannot play the files. Audacious doesn’t support multichannel WavPacks, FFmpeg (at least the version in portage) does not demux them correctly, and also mplayer fails to read them (because it seems to use FFmpeg’s demuxer for them); the best result I got with xine, that plays them although very garbled, I suppose I can be proud of this :)

While trying to find what was causing the problem, so that I could see if a newer FFmpeg works, or if I can fix it myself, I tried to check whether the original multichannel WAV file played correctly… but aplay was only able to give it to me in 2 channels. I tried every configuration and every choice, but my via82xx just plainly refuses to play more than two channels unless I use IEC958 passthrough and an AC3 or DTS track. speaker-test does not work (it used to last year, at least I think I remember it did): it does not even do more than one channel when I point it to hw:0,0, while surround51 appears to work but only outputs on the left and right channels, as usual. I’m starting to hate that card: not only does it not work as intended now, but it also requires me to shut both the computer and the external amplifier down to reset it if I crash something (in my last case, the a52 ALSA plugin), as not even iecset can reset it. I’d really like a decent soundcard, but I’m not even sure what I should look for; tsunam and nightmorph suggested an Audigy2 ZS but I can’t even find one on eBay at a first glance (let alone in a shop), and the newer Creative cards are not supported at all. Suggestions welcome.

Dumping down

So, first of all, thanks to 8an who commented on my previous post suggesting that I limit the size of the memory to get a dump small enough to fit in my current swap partition, without the need to buy an extra CF card. This solution hadn’t crossed my mind before, so I used it to start testing :)

Now, as Roy seems to have taken Daniel’s (dsd) place as drunken brit, I’ve looked up how to hook savecore(8) and dumpon(8) admin utilities into localmount, and prepared a bug report that contains the patch and a configuration file. I tested it out and it works nicely, the good part is that you can easily enable dump on a per-boot basis rather than having it always enabled by using dumpon directly, it will still save your core file if found.

So for the baselayout part I’m golden, the problem does not arise at all. Getting kgdb to work is more difficult. As I supposed, there’s no way on earth that Mike will install libgdb for me (and upstream doesn’t seem to like that approach anyway, if I read the documentation correctly), so one option is to build a package that takes the FreeBSD sources and GDB itself, and builds a copy of GDB just to build kgdb(1), which is not practical, not to mention difficult to maintain over a long period (for instance, which versioning scheme should I use? GDB’s? FreeBSD’s?).

The only other thing that I’m left with is to fork kgdb, and try to make it a frontend to gdb itself, not by using the library calls, but by using the commandline interface of gdb, and commanding it from outside.

It might as well work, although I’ll have to talk with someone from FreeBSD as I doubt I would be able to keep it in sync alone. I see that obrien committed to at least two files in the 6.2_rc2 release, and he’s a nice guy, so I might have some hope for that :)

So I have to add “Make kgdb work as GDB master” to my TODO list, although I hope to find the cause of the misalignment before that time.

Debugging debuggable

Now that Prakesh was able to complete the build of the three stages for Gentoo/FreeBSD 6.2_rc2, and they are available on mirrors, I have a few things to take care of in Gentoo/FreeBSD that I have overlooked for too long.

The first is for sure updating the documentation, so that new users can install the 6.2 stages fine, without all the workarounds we used to have for 6.1 (because it wasn’t built with catalyst); once that’s done, I have to deprecate 6.1 in favour of 6.2, as that version is pretty much where we’re focusing right now, with the libedit fixes and the new baselayout 1.13 (that Roy made perfect on FreeBSD!); and then there’s the module loading problem on SPARC64 to fix.

So, let’s start with the first step: I’ve asked jforman to remove the 6.1 stage from the mirrors, so that there won’t be new installations of it. Later on I’ll write a deprecated file for the 6.1 profile, with some short instructions to upgrade to 6.2 somewhat smoothly.

As for SPARC64, Klothos is currently helping me understand the issue. My first task was to get an editor I could actually use onto it, which meant, for me, emacs. Unfortunately, not counting the issue with gcc’s CSU object files being in a different place than on standard FreeBSD (that I already worked around with the ebuild in the transition overlay), there was a nasty SIGILL while building some elisp code, and I never got around to debugging it. In the end it was easier than I expected: the problem was caused by an inline asm() call that issued the instruction ta 3, which after a bit of googling turned out to be a trap call (kinda like software interrupts on x86) that triggers a kernel service to flush registers, which is not implemented on FreeBSD (for instance Emacs.app disables this too for GNUstep on the FreeBSD operating system). An easy patch to make the call conditional solved the issue for me.

So I first wanted to confirm one thing, whether the problem was in building the modules or in building the kernel: if the problem was the kernel, even trying to load a module compiled by vanilla FreeBSD should cause the same panic, while if the problem was in the building of the modules, that module would load without issues. I checked, and the problem happens only with our modules, even when loaded into an official kernel, which means it’s safe to assume that the problem is in building the modules rather than the kernel. Which is both good and bad, because even if it limits my scope and my need to debug the kernel, it’s not like I know enough about ELF loading to find the issue easily. I was tempted to buy Sun’s “Linker and Libraries Guide”, but not only is the book far from cheap ($49 at least), it’s not even available from Amazon (UK).

Anyway, a quick comparison of the zlib.ko module from FreeBSD proper and from Gentoo/FreeBSD showed me that ours is about twice the size of the original (but I think that might be caused by the -ggdb3 build), and that ours has more R_SPARC_RELATIVE relocations, while there are no R_SPARC_32 relocations at all in our copy.
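Counting relocations by type is the kind of comparison that is easy to script. The sketch below, in Python, tallies the entries of a SPARC64 RELA section; it assumes the raw section bytes have already been extracted (not shown), that entries are big-endian Elf64_Rela records, and that the type id sits in the low byte of r_info as it does on SPARC64. Only the two types mentioned above are named (taking the RELATIVE one to be R_SPARC_RELATIVE), and the function name is made up.

```python
import struct
from collections import Counter

# Only the two relocation types discussed here are given names.
R_SPARC_NAMES = {3: "R_SPARC_32", 22: "R_SPARC_RELATIVE"}

# Elf64_Rela: r_offset, r_info, r_addend (big-endian on SPARC64).
RELA64 = struct.Struct(">QQq")


def count_relocations(rela_bytes):
    """Tally relocation types found in raw SPARC64 RELA section bytes."""
    counts = Counter()
    for offset in range(0, len(rela_bytes) - RELA64.size + 1, RELA64.size):
        _r_offset, r_info, _r_addend = RELA64.unpack_from(rela_bytes, offset)
        rtype = r_info & 0xFF  # type id: low byte of r_info on SPARC64
        counts[R_SPARC_NAMES.get(rtype, "type %d" % rtype)] += 1
    return counts
```

Running it over the same section of both modules and comparing the two counters would show at a glance which types appear only on one side.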

I was looking forward to a more thorough debugging session tonight, but I was stopped by two incidents that are going to make my life in the next weeks harder than I expected. The first is that we don’t currently build the kernel debugger (kgdb), and we cannot easily build it (because it requires libgdb, which we currently don’t install… and I doubt I will be able to convince vapier to install it).

The second is that to get a coredump of the crash, we need to use the kernel’s dump facilities. Those require a swap partition at least the size of the RAM in the machine (and I don’t have one on Klothos, as it was originally built with only 128MB of memory, while now it has 1GB), plus some commands run during the boot phase: specifically savecore between the R/W mount of the partitions (to save the dump) and the enabling of swap space (because that would destroy the dump), and dumpon after the swap is enabled. The way baselayout works now, I’d need to change the localmount init script, but as I don’t like that solution, I’ll have to talk about this with Roy; the important thing to me is being able to enable/disable the dump through conf.d files (similarly to what’s done in FreeBSD). I suppose a solution could be to use some addons and install them with one of the freebsd ebuilds, or with baselayout proper, depending on what Roy prefers.

Now, it’s not like the baselayout issue is not easily solvable, once Roy is around (he’s partying for the new year now, I suppose); but the swap size is what is going to stop me from using this feature. My only solution would be to add another compact flash card (the adapter I’m using is capable of connecting two cards already, one master and the other slave, which is kinda good for what I paid for it), but it has to be at least 2GB (the RAM is only 1GiB, of course, but I don’t want to start crying when I get hit by the GiB > GB thing, as I’m not sure if CF cards are sold by the decimal GB or by the binary GB). I once again compared the prices here with the German ones, and it seems I would pay 34+20 euros from there, or 89 here… I don’t think I’ll go buying one just yet; it’s not a big expense, but I want to do some more tries without spending more money on that box, considering that I already loaded it with new (or newish, for the SATA controller and disk) stuff that cost me at least €100, box included, and it was just to debug a kernel problem…
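The GiB > GB thing is quick to put in numbers. This is just a back-of-the-envelope sketch in Python; it assumes the dump needs exactly as much space as the RAM, and ignores dump headers and partitioning overhead.

```python
GIB = 2 ** 30   # binary gigabyte, how RAM is counted
GB = 10 ** 9    # decimal gigabyte, how storage is often sold


def dump_fits(ram_bytes, card_bytes):
    """True if a dump the size of the RAM fits on the card (simplified)."""
    return card_bytes >= ram_bytes


ram = 1 * GIB
print(ram - 1 * GB)             # 73741824 bytes short on a decimal 1GB card
print(dump_fits(ram, 1 * GB))   # False
print(dump_fits(ram, 2 * GB))   # True
```

So a card sold as 1GB in decimal units would come up about 70MiB short, which is why I’d go for 2GB.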

One of the things I found difficult to grasp about SPARC asm, anyway, besides not finding a decent reference manual for it (call me crazy, I usually understand a language better by looking at its reference than at explanations and tutorials), is that load and store instructions seem to be written in “orig, dest” format rather than the usual “dest, orig” that I was used to on x86. But it’s not that difficult to understand after all: most of the instructions are named after logical operations, and the ld/ldx and st/stx instructions also make it easier to understand when the register is the destination or the origin. It would have been nice to learn SPARC assembler at school rather than 8086.

I love the European Union

Sometimes something good happens even here on the other side of the pond… but let’s go in order.

First of all, I want to thank Christian Iuga, who sent me 1GB of RAM for Klothos (I received it yesterday), which now compiles quite a bit faster (which is very good for my debugging sessions, otherwise I had to wait an eternity every time :)). So last night I went back to working on ~sparc-fbsd (also because there’s a new FreeBSD release ready, but I’ll talk about that later), but now the bottleneck, instead of the RAM, seemed to be the network… okay, hme0 is known to be the worst network driver for FreeBSD, and it ended up giving me NFS performance comparable to a 10Mbit network… not that good when you have the Portage tree over the network :/

Unfortunately, the only PCI network cards I have at home are Realtek-based, with the 8139 chipset, which Ciaran told me is likely not supported on SPARC, and indeed I simply get a “Data Access Error” on the serial line when trying to boot with one plugged in. So I had to find some better supported card… e100 was the suggestion, but a quick skim over my usual retailers, both physical shops (through their sites) and Internet ones, told me that none carried E100 cards; the only Intel cards I could find were the Gigabit ones, which cost about €50, which is not exactly cheap… but okay, maybe I can slowly start updating the local network to Gigabit, now that both Enterprise and Intrepid have Gigabit-capable cards, so I can try one of those… but even that card is difficult to find at my retailers… okay, so that’s on hold for now.

On top of that, two nights ago I had some trouble with one of Enterprise’s fans, which started making a really bad noise, and my mother forced me to turn the box off during the night (sigh, a batch compile cancelled), and I spent yesterday afternoon trying to find which fan it was… after a half-working suggestion from Jakub and Javier (to try stopping the power supply fan to see if that was it, but I did it wrong and stopped the CPU fan… which refused to restart till I power-cycled the system), I found the dying fan, the rear case one. Unfortunately, while trying to stop it I also broke it for good, so I simply removed it; luckily there seems to be no risk for my CPU for now, as the temperature goes from 34°C while playing music to 47°C while compiling, although KingTaco suggested I find a new CPU cooler.

And again, finding a decent one from my usual retailers was difficult… the best I could find, the Thermaltake Silent 939, would cost me €29 plus VAT (20%) and €11 for shipping; which is not really acceptable to me…

Enter the European Union and the single market. Some time ago, someone (luckyduck maybe, it was just when I joined Gentoo) gave me the site of a German shop that ships to the rest of Europe too. I decided to give it a shot; although I had used it before to compare prices, I had never tried to order from it.

The network card is at €34.49, the CPU cooler is at €24.19, both are quite a bit cheaper than in Italy. The shipping cost is €20 though, which removes most of the saving, and I have to count it will probably take a week or two to get the stuff here, I suppose.

But then I got a great idea… I have a laser printer at home, a Kyocera-Mita FS-1020D, a pretty cool printer, and the toner kits are quite cheap too: €100 in a shop, €90+€6 of shipping from an online shop… how much would they cost at the German shop? €67.38… which, even if it were the only thing I ordered, is lower than both once the shipping costs are added. I ordered one of those, even if I probably still have half the toner in the current cartridge, because it’s something that won’t go to waste anyway, and that alone makes the deal affordable and a good saving for me.

So at the end, thanks also to zzam who translated a few phrases for me, and pointed me at where to look for the info, I was able to order both the network card and the CPU cooler at a good price, and I’m pretty much happy about it, as I won’t burn down the CPU of Enterprise, and I’ll be finally able not to have to wait for eons to download portage on Klothos ;)

For once, I want to thank the European Union and the existence of the Euro :)

Now, on a more technical level, FreeBSD 6.2_rc2 was released yesterday; thanks again to the AMD64 team, I downloaded and repacked the sources from pitr, and they are already on the mirrors; even the ebuilds should have hit the rsync mirrors by now. This time dev-libs/libedit is being used, which means that while upgrading you need to symlink libedit.so.5 to libedit.so or /bin/sh will fail to run (I know it’s annoying, I’m working on a new stage for this reason). For those following the emerge upgrade order, which will miss libedit.so.5 before libedit.so is merged, you can take my libedit.so and use that in the meantime.

Now, while working on Klothos last night, I also found out how to tell the FreeBSD kernel to boot from a different partition than the default one (ad0a in the case of SPARC64 hardware). You need to edit loader.conf and set this:

vfs.root.mountfrom="ufs:/dev/ad1a"

The result is that I can now boot Klothos unattended, without having to retype the string every damn time I reboot (which happens pretty much all the time when I’m debugging the kernel).
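The value set in loader.conf has the form filesystem-type, colon, device. As a small illustration of that format only, here is a sketch of a helper that splits such a string; split_mountfrom is a name I made up for this example and has nothing to do with the code the kernel or the loader actually use.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not FreeBSD code): split a vfs.root.mountfrom value
 * such as "ufs:/dev/ad1a" into its filesystem type and device path.
 * Returns 0 on success, -1 if the value is malformed or a buffer is too small.
 */
int split_mountfrom(const char *value, char *fstype, size_t fstype_len,
                    char *device, size_t device_len)
{
    const char *colon = strchr(value, ':');
    size_t type_len;

    /* Need "type:device" with both parts present. */
    if (colon == NULL || colon == value || colon[1] == '\0')
        return -1;

    /* Both parts must fit, including the terminating NUL. */
    type_len = (size_t)(colon - value);
    if (type_len >= fstype_len || strlen(colon + 1) >= device_len)
        return -1;

    memcpy(fstype, value, type_len);
    fstype[type_len] = '\0';
    strcpy(device, colon + 1);
    return 0;
}
```

Called on "ufs:/dev/ad1a" it fills in "ufs" and "/dev/ad1a"; a bare "/dev/ad1a" is rejected because the filesystem type is missing.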

Debugging

Tonight I couldn’t sleep. What do I do when I cannot sleep? I debug!

As libvorbis was enough of a headache for me yesterday (it turned out that the parameters are being reset by libvorbis itself because the third header has a parsing error… now, where the error is, that’s a good question), I decided to go with something easier, like Gentoo/FreeBSD/SPARC64 kernel debugging. No, I’m not kidding: debugging the problem in the kernel is turning out to be easier and more fun than debugging a userland library… to decode audio files… that’s parsing a damn header!

Anyway, thanks to Javier (I won’t mistype his nick this time), I got into FreeBSD’s kernel debugging by building a kernel with -g and DDB support. Then, I easily got the trace of the kernel panic:

Tracing pid 1258 tid 100054 td 0xfffff80007aa7c80
panic() at panic+0xcc
trap() at trap+0x38c
-- fast data access mmu miss tar=0xc0b70000 %o7=0xc01ae8ec --
malloc_type_zone_allocated() at malloc_type_zone_allocated+0x14
malloc() at malloc+0x7c
hashinit() at hashinit+0x4c
nullfs_init() at nullfs_init+0x1c
vfs_modevent() at vfs_modevent+0x244
module_register_init() at module_register_init+0x58
linker_load_module() at linker_load_module+0x844
kldload() at kldload+0xf4
syscall() at syscall+0x334
-- syscall (304, FreeBSD ELF64, kldload) %o7=0x100a1c --
userland() at 0x40421288
user trace: trap %o7=0x100a1c
pc 0x40421288, sp 0x7fdffffde31
pc 0x10080c, sp 0x7fdffffdef1
pc 0x4020ab74, sp 0x7fdffffdfb1

This was tracing a kldload of nullfs, but any kldload produces errors, although they seem to differ from module to module.

db> x/ia malloc_type_zone_allocated,16
malloc_type_zone_allocated:     save            %sp, -0xc0, %sp
malloc_type_zone_allocated+0x4: call            critical_enter
malloc_type_zone_allocated+0x8: nop
malloc_type_zone_allocated+0xc: lduw            [%g7 + 0x3c], %g1
malloc_type_zone_allocated+0x10:        sllx            %g1, 6, %g3
------
malloc_type_zone_allocated+0x14:        ldx             [%i0 + 0x40], %g2
------
malloc_type_zone_allocated+0x18:        brz,pt          %i1, malloc_type_zone_allocated+0x38
malloc_type_zone_allocated+0x1c:        add             %g3, %g2, %g4
malloc_type_zone_allocated+0x20:        ldx             [%g3 + %g2], %g1
malloc_type_zone_allocated+0x24:        add             %g1, %i1, %g1
malloc_type_zone_allocated+0x28:        stx             %g1, [%g3 + %g2]
malloc_type_zone_allocated+0x2c:        ldx             [%g4 + 0x10], %g1
malloc_type_zone_allocated+0x30:        add             %g1, 0x1, %g1
malloc_type_zone_allocated+0x34:        stx             %g1, [%g4 + 0x10]
malloc_type_zone_allocated+0x38:        subcc           %i2, -0x1, %g0
malloc_type_zone_allocated+0x3c:        be,pn           malloc_type_zone_allocated+0x58
malloc_type_zone_allocated+0x40:        or              %g0, 0x1, %g2
malloc_type_zone_allocated+0x44:        sll             %g2, %i2, %g2
malloc_type_zone_allocated+0x48:        sra             %g2, 0x0, %g2
malloc_type_zone_allocated+0x4c:        ldx             [%g4 + 0x20], %g1
malloc_type_zone_allocated+0x50:        or              %g1, %g2, %g1
malloc_type_zone_allocated+0x54:        stx             %g1, [%g4 + 0x20]
malloc_type_zone_allocated+0x58:

This is the disassembly of the function that died; I’ve artificially separated the point where the crash happens from the rest of the code.
First of all, the thing that scared me was that even though I know nothing of SPARC assembly, and even my Intel assembly is pretty much limited to 8086 instructions (although I still remember most of them clearly, as I wrote an 8086 emulator when I was in high school), I was able to more or less correlate that code with

        struct malloc_type_internal *mtip;
        struct malloc_type_stats *mtsp;

        critical_enter();
        mtip = mtp->ks_handle;
        mtsp = &mtip->mti_stats[curcpu];
        if (size > 0) {
                mtsp->mts_memalloced += size;
                mtsp->mts_numallocs++;
        }
        if (zindx != -1)
                mtsp->mts_size |= 1 << zindx;
        critical_exit();

which is the C source of the function.
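One way to check that correlation is to write down the layout that the offsets in the disassembly imply. The sketch below does that with stand-in structs: stats_sketch, type_sketch and stats_offset are names made up for this example, not the kernel headers. The named fields come from the C source above, while the filler fields are assumptions that only pad out to the offsets seen in the listing.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Stand-in layouts, NOT the kernel headers. Every slot is a uint64_t so the
 * offsets match a 64-bit kernel no matter where this is compiled.
 */
struct stats_sketch {               /* models struct malloc_type_stats */
    uint64_t mts_memalloced;        /* +0x00: ldx/stx [%g3 + %g2] */
    uint64_t filler_08;             /* +0x08: assumed, not touched in the listing */
    uint64_t mts_numallocs;         /* +0x10: ldx/stx [%g4 + 0x10] */
    uint64_t filler_18;             /* +0x18: assumed, not touched in the listing */
    uint64_t mts_size;              /* +0x20: ldx/stx [%g4 + 0x20] */
    uint64_t filler_28[3];          /* assumed padding up to 64 bytes per CPU */
};

struct type_sketch {                /* models struct malloc_type */
    uint64_t earlier_fields[8];     /* 0x40 bytes of fields not used here */
    uint64_t ks_handle;             /* +0x40: ldx [%i0 + 0x40], %g2 */
};

/* "sllx %g1, 6, %g3" multiplies curcpu by 64, so one entry must be 64 bytes. */
_Static_assert(sizeof(struct stats_sketch) == 64, "one per-CPU entry is 64 bytes");
_Static_assert(offsetof(struct stats_sketch, mts_numallocs) == 0x10, "mts_numallocs");
_Static_assert(offsetof(struct stats_sketch, mts_size) == 0x20, "mts_size");
_Static_assert(offsetof(struct type_sketch, ks_handle) == 0x40, "ks_handle");

/* Byte offset of mti_stats[cpu] from mtip, as the sllx computes it. */
uint64_t stats_offset(uint32_t cpu)
{
    return (uint64_t)cpu << 6;
}
```

If this layout is right, the separated instruction, ldx [%i0 + 0x40], %g2, would be the mtip = mtp->ks_handle line, with %i0 holding mtp.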

Now my problem is to find what causes the “fast mmu miss”, or in general the panic. The registers are funny:

db> show reg
g0          0xffffffffffffffff
g1          0xc04ce800  log_cdevsw+0x48
g2          0xffffffffffffffff
g3             0x870ad
g4          0xfffff8000040fff8
g5              0x1dfd  fpu_fault_size+0x1c49
g6          0xcb591980
g7          0xc054d7b0  pcpu0+0x1a90
i0                0x12  pcpup+0xb
i1          0xc04816e0
i2          0xcb590ab8
i3                 0xa  pcpup+0x3
i4          0xcb590b70
i5                 0x1
i6          0xcb590221
i7          0xc01d7bac  kdb_enter+0x34
tnpc        0xc01d7bb8  kdb_enter+0x40
tpc         0xc01d7bb4  kdb_enter+0x3c
tstate      0x441d001601
kdb_enter+0x3c: ta              %xcc, 1

i0 is set to quite a low value (0x12), and the debugger tells me it refers to the pcpup (“pcpu pointer”) address plus an offset… the problem is that pcpup is, uh, kept in g7:

#define PCPU_REG %g7

register struct pcpu *pcpup __asm__(__XSTRING(PCPU_REG));

I wonder if it’s a miscompile or simply ddb going crazy; still, if I understood the ldx instruction well enough, it loads a 64-bit value from the address i0 + 0x40 (0x52 with the i0 shown here) into g2… that does not feel right at all.

Anyway, I will try to debug this further when I can find someone in the Gentoo/SPARC team who can help me understand SPARC assembly.

Booting Gentoo/FreeBSD/SPARC64

First of all, a service note regarding my previous ALSA post. It seems I was lucky, and the ALSA code in the current repository is good enough to at least build and work on 2.6.18 and .19, so there is now an alsa-driver-1.0.14_pre20061130 in the tree that will work until upstream releases a new version.

Tonfa, sorry if the words about mercurial were harsher than they needed to be; I was pretty much pissed off by it giving up on me when I needed it. I’m still not sure why it keeps crashing: it’s a memory corruption problem, so the backtrace won’t be useful. I’ll try to build it with minimal CFLAGS and see if perhaps that is what creates the problem.

Now, on a more Gentoo/FreeBSD related note, today I received two packages by mail. The first was an Amazon package (only reported as “From an happy user”, whom I thank even if I have no idea who he is :) ) containing Rhapsody’s Live in Canada 2005, which I’m listening to right now, and Gibson’s Count Zero, which I’ll surely read as soon as possible. The second was the IDE-CF adaptor I ordered. The shipping was well handled: a very small package without much waste on advertising and whatnot, I like that. The thing itself is pretty minimal, although it supports two CompactFlash cards, and it was easy to set up; being just a standard IDE device, OpenBoot recognised it correctly.

The problem after this was to actually get the partitioning done, as the bsdlabel command didn’t work… a quick check around showed me that, this being the SPARC64 architecture, I should have used the sunlabel command instead, and the problem was easily solved. After booting and setting up the partition, I tried to find a way to make the loader work with a partition that contains the contents of /boot directly, without a “boot” directory… it didn’t like that.

The solution should be to pass the string “/loader” to boot1 (the first stage loader, the equivalent of boot0), but to do that I need to change the “bootargs” property of the “/chosen” OpenBoot device, which is not possible from the OBP console. I looked around and found ioctls that actually do the job, so I decided to try writing a simple program to write the settings, an “ofwedit”, considering that “ofwdump” exists already and already has code for writing to the OpenFirmware interface. Unfortunately, my easy program that should just set the value of the property I name, well, panics the kernel :|

Another problem is that the rootdev parameter in loader.conf is simply ignored, so I still have to find a way to properly tell the kernel to use /dev/ad1a as the root partition, and it keeps asking me for it at boot… at least with the serial console it is easy to answer.

The problem with Catalyst I talked about previously seems to be the same problem Roy found initially: kldload leads to a kernel panic when called, as it tries to load the nullfs module. This means that I need to fix that bug, but also that catalyst can be used to build the stages; you just need to build nullfs into the kernel rather than as a module.

I also found some other minor problems, like some manpages being installed in arch subfolders like sparc64, but I’ll fix that later when I have more time. Emacs unfortunately does not want to build: first there’s a problem as it tries to use /usr/lib/crtbegin.o and /usr/lib/crtend.o, which for us are in a completely different location (inside gcc’s directory) since I removed it from freebsd-contrib. Then, once I fixed this trouble, it gave me a SIGILL during elisp building, which is anything but promising.

And yes, I know there’s a KOffice bump to do, I just need the time to handle that, as right now I’m working on VLC’s release candidate, with the nsplugin support thanks to the information provided by Tavin Cole in bug #156067. Kaffeine 0.8.3 is in the tree now with a build patch as it didn’t build for me on three boxes already.

Soldering iron

Today I decided it was time to try the nullmodem cable, so I put the Ultra5 where it belongs and connected the cable to the only port I recognised as a serial port, a DE-9M connector. After trying to understand what the hell was happening, I dropped by #gentoo-sparc, and Weeve told me the bad news: the serial port I need is the DB-25 one, and it is a DB-25-F port, so of the wrong gender; even if I were to find a DE-9 to DB-25 adapter (I know I have one somewhere in my house, but I’m not sure where), I wouldn’t be able to use it.

And if I struggled to find a nullmodem cable, what chance would I have had of finding a gender changer? Very little, I’m afraid.

But I’m not the kind of person who refuses a challenge, so I thought a bit about it and found I had a DB-25-M connector, the solderable kind, and I knew I had some of the DE-9-F connectors often used to bring ports out of a motherboard; if you ever touched a Pentium-class or older computer you know them. I looked up their pinouts, with a simple schematic for the DE-9<->DB-25 conversion, and took out my soldering kit.
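For reference, the wiring of such an adapter can be written down as a small lookup table. This is only a sketch using the standard RS-232 signal assignments for the two connector sizes, wired straight through (same signal on both sides); I’m assuming the schematic I followed matched it, and the names straight_adapter and de9_to_db25 are made up for the example.

```c
#include <stddef.h>

/* One RS-232 signal and the pin carrying it on each connector size. */
struct serial_pin {
    const char *signal;
    int de9;
    int db25;
};

/* Standard pin assignments; a straight adapter joins the same signal. */
static const struct serial_pin straight_adapter[] = {
    { "DCD", 1,  8 },   /* data carrier detect */
    { "RXD", 2,  3 },   /* received data */
    { "TXD", 3,  2 },   /* transmitted data */
    { "DTR", 4, 20 },   /* data terminal ready */
    { "GND", 5,  7 },   /* signal ground */
    { "DSR", 6,  6 },   /* data set ready */
    { "RTS", 7,  4 },   /* request to send */
    { "CTS", 8,  5 },   /* clear to send */
    { "RI",  9, 22 },   /* ring indicator */
};

/* Return the DB-25 pin wired to a DE-9 pin, or -1 for an invalid pin. */
int de9_to_db25(int de9_pin)
{
    size_t i;

    for (i = 0; i < sizeof straight_adapter / sizeof straight_adapter[0]; i++)
        if (straight_adapter[i].de9 == de9_pin)
            return straight_adapter[i].db25;
    return -1;
}
```

The crossing of the data and handshake lines is left to the nullmodem cable; the adapter itself only changes the connector size and gender.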

29-11-06_1746

For a while I tried to solder against gravity, but after ten minutes I started to see a pattern and decided to turn the connector around; soldering was way easier that way.

Actually, in the first round of soldering I ended up swapping wires 8 and 9 (of the DE-9), so I got data out of the SPARC, but wasn’t able to send commands to it.

After fixing that, I was finally able to connect the serial console of the SPARC to Enterprise and control the boot using minicom.

29-11-06_1750

I’m pretty happy with how the soldering ended up. I had never worked on such small connections, and now I know I’m able to do it, even if at the start my hands trembled a bit too much; that was probably also because I was soldering lying on the floor, as I don’t have a workbench… I need to buy one.

Danny, this should cover for the nullmodem cable I bought rather than build myself ;)