Why people insist on using /boot

You might or might not know that for a while now I’ve spent most of my time idling and chatting in #gentoo-it, trying to offer support whenever I can (at least when the user asking for support deserves it). One strangely common type of support request involves, to some extent, the standalone /boot partition.

But why do people insist on using a standalone /boot partition?

The /boot partition holds not only the grub configuration (with its stages), but also the kernels (you might, and probably should, have multiple copies), their System.map and optionally their configuration files, the splashscreen for grub if you use one, and some other stuff. It was classically used to allow grub to access the kernel even on systems whose BIOS was unable to access anything past the 1024th cylinder of a hard drive (grub obviously can’t have drivers for all the controllers, so it only uses the BIOS to access the disk). As a partition that crossed that boundary wouldn’t be properly readable by the BIOS, and thus by grub, the common solution was to put a small /boot before that boundary and then let the root partition cross it: once the kernel had booted, the limitation could be safely ignored.
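To put a number on that boundary, here is a minimal sketch of the arithmetic. The geometry values (63 sectors per track, 512-byte sectors, and either 16 or 255 heads) are the classic BIOS addressing limits and are assumptions for illustration; the function names are made up.

```python
# Classic BIOS CHS addressing can name at most 1024 cylinders.
SECTOR_BYTES = 512
MAX_CYLINDERS = 1024
SECTORS_PER_TRACK = 63


def chs_limit_bytes(heads):
    """Number of bytes reachable below the 1024-cylinder boundary."""
    return MAX_CYLINDERS * heads * SECTORS_PER_TRACK * SECTOR_BYTES


def partition_is_bios_readable(start_byte, size_bytes, heads=255):
    """True if the whole partition ends below the boundary."""
    return start_byte + size_bytes <= chs_limit_bytes(heads)


if __name__ == "__main__":
    print(chs_limit_bytes(16))   # old 16-head translation, roughly 504 MiB
    print(chs_limit_bytes(255))  # 255-head translation, roughly 7.8 GiB
    # A 64 MiB /boot at the start of the disk sits safely below the limit.
    print(partition_is_bios_readable(0, 64 * 1024 * 1024))
```

A small /boot at the very start of the disk passes the check whatever the root partition does afterwards, which is the whole point of the classic layout.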

There are of course other cases where a standalone /boot partition can be useful. One is to give grub a way to start and load the kernel, which in turn can boot with a rootfs stored on a device that the BIOS wouldn’t have been able to see (like a software RAID1 or a PCI controller that can’t be detected). This is my reason for using a /boot on a CF memory card for Klothos: OpenBoot doesn’t recognize the Promise SATA controller (and I just have a SATA disk for that box), and thus I need to boot the kernel from EIDE-compatible storage (in this case, the CF through an adapter). Please note that Klothos runs FreeBSD; more on that later.

Another case where having a standalone /boot can help is half-thin clients where the kernel is stored locally and the rootfs is then mounted via NFS: if the network card doesn’t support proper network boot, you can use simple storage, like the CF I use, to keep /boot, and then load the rest from the network.

But for the average user, does /boot provide any advantage? Maybe the only one is to prevent the user from deleting the kernel with rm -rf /, but that’s almost useless: you would have screwed up your system anyway at that point. I find it actually has a big disadvantage: if the user forgets to put it in fstab, he has to mount it every time before running a make install for the kernel, and that’s something easily forgotten.

Also, the use of a different partition for /boot confuses the hell out of some users, who don’t really understand the difference between the partition holding Linux’s root filesystem and grub’s root. When I get a support request about installing grub, and I understand the user is confusing the root= parameter to the kernel with grub’s root (hdX,Y) command, my suggestion is to just get rid of the standalone /boot.
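The two notations name partitions differently, which is a good part of the confusion. As a minimal sketch, the helper below translates a Linux device name into GRUB legacy notation. It assumes the BIOS enumerates the disks in the same order as Linux does (which is not guaranteed), and the function name is made up for illustration.

```python
import re


def linux_to_grub_legacy(device):
    """Map e.g. '/dev/hda3' to '(hd0,2)'.

    Assumes BIOS disk order matches Linux disk order, which real systems
    do not always honour.
    """
    match = re.fullmatch(r"/dev/[hs]d([a-z])(\d+)", device)
    if match is None:
        raise ValueError(f"unsupported device name: {device}")
    disk = ord(match.group(1)) - ord("a")   # hda -> 0, hdb -> 1, ...
    partition = int(match.group(2)) - 1     # Linux counts from 1, GRUB from 0
    if partition < 0:
        raise ValueError(f"partition numbers start at 1: {device}")
    return f"(hd{disk},{partition})"


if __name__ == "__main__":
    # Without a standalone /boot, both settings name the same partition:
    kernel_root = "/dev/hda3"
    print("root", linux_to_grub_legacy(kernel_root))  # grub's root command
    print("root=" + kernel_root)                      # kernel parameter
```

With a standalone /boot the two values point at different partitions, and that is exactly where people trip.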

On top of this, it’s also difficult to decide the size of such a partition: a lot of people pick a size that is too small, or one that is too big and wastes space.

Now about FreeBSD: it also uses a /boot directory, although it contains not only the kernel but also all its modules, which makes it way harder to move it to a standalone partition. The FreeBSD documentation doesn’t really cover that option, and even looking around you’ll see a lot of people telling you it can’t be done, that FreeBSD ain’t Linux and that /boot is not something to move to its own partition. The truth is that sometimes you just need to do it, and you can; it’s just much harder to do than on Linux. I had my own trouble, but then solved it.

So, while I can’t say I like FreeBSD’s idea of hiding the information that shouldn’t be used by the average user, I think that they are cutting out a lot of possible problems this way, and I think that Linux documentation should actively discourage average users of modern systems from using a standalone /boot partition.

So my suggestion is: if you can’t name the reason why you’re using /boot as a standalone partition, then don’t use it.

A possible solution out of Farragut’s trouble

Yesterday I tried to fix a few of the issues with ~sparc-fbsd lagging behind (expected, considering that only Roy and I are handling it); unfortunately the PSU seems to be running quite hot, so when I started it again today to try a patch Ulrich gave me to fix emacs-cvs-23, it started giving off a bad smell.

So a possible way out of this mess for me would be to put Klothos on an indefinite break, taking out its Promise SATA controller and putting it into Farragut, so I can connect the spare 160GB SATA drive I have; I actually need a noise dampener, but those are cheap, and I’ll order some soon anyway.

For Klothos to come back up, I’ll have to wait until I can buy a new PSU for it, with active PFC and possibly less noisy.. not sure how much time that will take. In the meantime Roy (who’s currently on his honeymoon) is the only reference for ~sparc-fbsd keywording.

Anyway for those interested, yesterday it was quite a spree of keywording for ~x86-fbsd and somewhat for ~sparc-fbsd (fixed the most prominent problems with the latter, not everything though :( ).

Today I should probably update Amarok’s maintainer guide, since although nothing changed between when I left and when I returned, I’ve already changed some stuff myself (and that’s why the live version is now 9999-r2), and on that note I should test PyQt on G/FBSD to fix the new .badindev on Amarok.

So much stuff to do! And this is just a fraction of what I used to do, by the way.

Hardware needs

Sometimes I wonder if my hardware needs would be more bearable if I wasn’t working as a Free Software developer. Two days ago, as I wrote, I received the shipment from K&M with the new PSU for Farragut, and new fans for Enterprise.

The PSU, a be quiet Straight Power 400W, works like a charm; the load on the UPS under normal system operation fell from 29% to 21%, which means it’s saving about 80W, as unlikely as that might sound. In my previous blog post I estimated an improvement of 10%, or 30W; it seems the improvement was more than twice what I was expecting. Not a bad thing at all for me.
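The step from eight percentage points of UPS load to about 80W only holds if one percent of load corresponds to roughly 10W, that is, a UPS rated at around 1000W. The rating is not stated in the post, so the 1000W below is an assumption used purely to show the arithmetic.

```python
def watts_saved(load_before_pct, load_after_pct, ups_capacity_watts):
    """Power saved, given UPS load percentages and the UPS capacity in watts."""
    return (load_before_pct - load_after_pct) * ups_capacity_watts / 100


if __name__ == "__main__":
    # Assumed capacity: 1000 W (not stated in the post).
    print(watts_saved(29, 21, 1000))
```

With a smaller UPS the same drop in load would translate into proportionally fewer watts.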

About the fans, I bought a Sharkoon Silent Eagle 2000 120×120 (I didn’t know this brand but it was one of the cheapest); they say that the fan is covered with the same material as golf balls; not sure if it’s that or not, but it certainly does seem to work fine and silently, and it beats the hell out of my previous fan.

For the hard disks I ordered two Revoltec Hard Drive Freezers, another brand I didn’t know before, but fairly cheap, and I wanted to give it a try; they also work quite well.

The nice thing about this is that the three brands are all German, as far as I can see, which explains why they cost so much less than here: the manufacturers probably sell directly to shops like K&M rather than going through a distributor that then ships to the other side of the world (most of the components I can buy here are made in Taiwan, then sold to a distributor in Italy, which then sells to a shop, which then sells to me); this cuts down the shipment costs, even if I pay extra to get the final goods to me, and is probably more environmentally friendly than having stuff moved across the whole world before reaching my house.

But I titled this post about hardware needs, and what I just wrote was about hardware I bought and thus don’t need anymore. There are, though, a few things that I’m still in need of.

The first is the PSU for Klothos that I already blogged about; when I placed my last order with K&M I ordered the 370W Be Quiet BQT-P5 Blackline Titanium 2-Lüfter PSU that should be designed for server usage, so should fit nicely into the Ultra 5 pizzabox. The alternative, suggested to me by nightmorph, would be the Antec Neo He 380 (as you can see it’s quite oversized, as the original PSU should be around 250/300W; but this is the minimum you find, and as it is I could use it for other boxes in case Klothos dies). The problem with this is that I can’t seem to find be quiet anywhere in Italy, and the Antec one I found… for €70 + €18 of shipment (from 50-60 km from where I live, by the way), which would be almost three times what I paid for the original Ultra 5.

Although Klothos got a major contribution from Christian Iuga, who sent me 1GB of RAM for it (which makes building stages way less painful), I’ve invested quite a bit in it already: I put in a new CD burner (or it wouldn’t have read CD-RW media), a Promise SATA controller, a 160GB Samsung SATA hard disk, a Compact Flash to IDE adapter together with a 1GB Compact Flash card, and the most expensive new component I bought, the Intel Gigabit card (which will never reach Gigabit speeds on an Ultra5 anyway, but at least does not work as badly as the integrated card), which cost me €45 (one and a half times what I paid for the original box).

It certainly disappoints me not to be able to use the box right now, but besides the noise made by the PSU, it’s too hot to turn it on; I’d like to buy the PSU, but that would mean I’d have put €300 worth of components in a box that I paid €30 for.. a bit too much for me, especially until I get a new job (and then I’d probably not have time to use Klothos anymore anyway).

Besides this obnoxious matter, I end up having more “mundane” needs lately. For instance, to be able to work from my room rather than from the office (where the heat is usually quite bad), I’d need an external keyboard for the MacBook Pro, possibly Bluetooth to make it more practical; but again I’m stuck, as I need the new job, and then I’d probably not be able to make use of it anymore.

So this post is here just to pose a big question: how is it possible that if I take the job, and then (most likely) stop doing most of my contributions to Free Software, I can afford the hardware I need now, but I won’t be able to put it to use? Is this Murphy’s law? Karma? What else? Sigh!

More progress for XDG support in xine

Returning to yesterday’s blog entry, today I’ll update you on the status of xine-lib’s XDG support.

Thanks to Mark Nevill, I didn’t have to reinvent the wheel by parsing the various XDG_ variables to check for the directories I have to search in; he already wrote a libxdg-basedir library that takes care of most of it, allowing me to take care of the implementation details.

Now, xine-lib-1.2 branch has an internal copy of libxdg-basedir (two source files, so it’s not a big deal, and I’ve added a switch to use the external copy of it if needed), and uses it to decide where to read and write some files.

For instance, the plugins cache is no longer in ~/.xine/catalog.cache but in $XDG_CACHE_HOME/xine-lib/plugins.cache, which both makes more sense and lets the user decide to move it out of the home directory (for instance I change the value of XDG_CACHE_HOME, but that’s a topic for another post). The CDDB cache has also moved to the cache home, while the fonts are now discovered in the paths defined by XDG_DATA_DIRS. Darren also moved the channels.conf load from ~/.xine to XDG_CONFIG_HOME, which means it’s loaded from ~/.config/xine-lib by default.
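As a rough illustration of how those locations resolve, the sketch below applies the fallback defaults from the XDG Base Directory specification when a variable is unset or empty. It is not xine-lib’s or libxdg-basedir’s actual code; the function names and the dictionary layout are made up, and the exact channels.conf file path is an assumption based on the directory named above.

```python
import os
import posixpath


def xdg_value(env, name, default):
    """Return an XDG_* variable, or its spec default if unset or empty."""
    value = env.get(name, "")
    return value if value else default


def xine_lib_paths(env=None):
    """Resolve the xine-lib locations mentioned above (illustrative only)."""
    env = os.environ if env is None else env
    home = env.get("HOME", os.path.expanduser("~"))
    cache_home = xdg_value(env, "XDG_CACHE_HOME", posixpath.join(home, ".cache"))
    config_home = xdg_value(env, "XDG_CONFIG_HOME", posixpath.join(home, ".config"))
    data_dirs = xdg_value(env, "XDG_DATA_DIRS", "/usr/local/share/:/usr/share/")
    return {
        "plugins_cache": posixpath.join(cache_home, "xine-lib", "plugins.cache"),
        # Assumed file name inside the xine-lib configuration directory.
        "channels_conf": posixpath.join(config_home, "xine-lib", "channels.conf"),
        # Directories that would be searched for data such as fonts.
        "data_dirs": [d for d in data_dirs.split(":") if d],
    }


if __name__ == "__main__":
    for key, value in xine_lib_paths({"HOME": "/home/user"}).items():
        print(key, value)
```

Passing an explicit dictionary instead of the real environment makes it easy to see what changes when, say, XDG_CACHE_HOME points outside the home directory.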

Please note that I’ve been using xine-lib all over rather than xine, so that users know what is generated/handled by the frontends and what is actually part of the library, if they care.

There are still a few changes pending, for instance Darren wants to support loading of a system-level channels.conf, so that it can be put in /etc/xdg and used for all users without having to put it in xine’s configuration for every user.

I’ve also decided to pay more attention to the security side of xine-lib. For instance, after talking with Taviso today, I’ve added a xine_xcalloc function to wrap around calloc, which should avoid possible overflows (there was one in input_dvb); I’ve changed some of the code, but of course it’s not totally cleaned up. xine-lib really should be audited piece by piece for improvement; every time I touch something I end up cleaning it up by adding more const keywords (trying to let the compiler optimise a bit more) and adding/removing/cleaning up code.
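The body of xine_xcalloc is not shown here, so the following is only a model of the kind of check such a wrapper needs: refuse the allocation when the element count times the element size would not fit in size_t. It is written in Python for brevity, with a 32-bit size_t assumed for illustration; the function names are made up.

```python
SIZE_MAX_32 = 2**32 - 1  # assumed 32-bit size_t, for illustration only


def checked_alloc_size(nmemb, size, size_max=SIZE_MAX_32):
    """Return nmemb * size, or None if the product would overflow size_t."""
    if nmemb < 0 or size < 0:
        raise ValueError("counts and sizes must be non-negative")
    if size != 0 and nmemb > size_max // size:
        return None  # a C wrapper would return NULL here instead of allocating
    return nmemb * size


def wrapped_size(nmemb, size, size_max=SIZE_MAX_32):
    """What an unchecked C multiplication would yield (modular arithmetic)."""
    return (nmemb * size) % (size_max + 1)


if __name__ == "__main__":
    # 2**30 elements of 8 bytes wrap around to 0 bytes when unchecked.
    print(wrapped_size(2**30, 8))
    print(checked_alloc_size(2**30, 8))
```

The unchecked product wrapping around to a tiny value is what turns a large element count into an undersized buffer, which is the class of bug the wrapper is meant to stop.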

One thing that would certainly help would be to put a better wall between contributed code and the xine proper code that should be fixed/cleaned/improved as we go. For contributed code we should always do the work on upstream’s side and then merge it back into xine-lib, to avoid getting the two out of sync, unless we’re fixing some xine corner case or similar (although there it could certainly help to put a proper wrapper around the two).

Really, while xine actually does its job most of the time, it still suffers from a lot of possible problems just because the code is too old and stratified. I think I should simply undertake an audit, file by file, trying to fix stuff while I also update the documentation to be doxygen-compatible, but that’s going to take so much time that I’m not sure how realistic it is to work on it; besides, I don’t have hardware such as a DVB tuner that would allow me to complete the audit (I can’t try that code and I can’t ensure its working state).

On the other hand, tonight I fired up Klothos again; it has been some time since I’ve done any Gentoo/*BSD work, but I’ve lately asked Roy to put me back in the bsd@gentoo.org alias, to see if there is work that needs to be done that I can do. Yes, I suppose I’m considering coming back, but if I do, it will be with a lower profile; maybe I’ll help Mike, Frysinger, SpanKY and vapier with the ARM architecture, maybe I’ll just decide to take care of BSD alone.

The problem with Klothos is the PSU; the whole box is quite silent, as the UltraSPARC CPU is cooled by a slow and silent fan, but the PSU is annoyingly noisy, and I can’t just put in a be quiet PSU like I did on Enterprise because of the Ultra5 casing (it’s a desktop machine, and the case is only as high as the ATX power supply, which means the big fan used by most silent units would be obstructed by it); I could try a passive PSU, as the box shouldn’t be drawing too much power anyway, but I’m not sure about it and I don’t have money to waste (at the moment I’m unemployed).

If anybody has a suggestion about which PSU I could use for such a box, it would be really appreciated. It’s funny that I paid €35 for a working box, and then paid way more to bring it up to a standard worthy of a dev machine… well, I already had the SATA controller in my possession, as well as the SATA hard disk; the DVD reader was also an old one I used, but I had to buy a new ethernet card to avoid using the obnoxious hme0 driver..

Debugging debuggable

Now that Prakesh was able to complete the build of the three stages for Gentoo/FreeBSD 6.2_rc2, and they are available on mirrors, I have a few things to take care of in Gentoo/FreeBSD that I’ve overlooked for too long.

The first is for sure updating the documentation, so that new users can install the 6.2 stages fine, without all the workarounds we used to have for 6.1 (because it wasn’t built with catalyst); once that’s done, I have to deprecate 6.1 in favour of 6.2, as that version is pretty much where we’re focusing right now, with the libedit fixes and the new baselayout 1.13 (that Roy made perfect on FreeBSD!); and then there’s the module loading problem on SPARC64 to fix.

So, let’s start with the first step: I’ve asked jforman to remove the 6.1 stage from the mirror, so that there won’t be new installations of it. Later on I’ll write a deprecated file for the 6.1 profile, with some short instructions to upgrade to 6.2 somewhat smoothly.

As for SPARC64, Klothos is currently helping me understand the issue. My first task was to get an editor on it that I could actually use, which meant, for me, emacs. Unfortunately, not counting the issue with gcc’s CSU object files being in a different place than on standard FreeBSD (which I already worked around with the ebuild in the transition overlay), there was a nasty SIGILL while building some elisp code, and I never got around to debugging it. In the end it was easier than I expected: the problem was caused by an inline asm() call that executed the instruction ta 3, which after a bit of googling turned out to be a trap call (kinda like software interrupts on x86) that triggers some kernel service to flush registers, which is not implemented on FreeBSD (for instance Emacs.app disables this too for GNUstep on FreeBSD). An easy patch to make the call conditional solved the issue for me.

So I first wanted to confirm one thing: whether the problem was in building the modules or in building the kernel. If the problem was the kernel, even trying to load a module compiled by vanilla FreeBSD should cause the same panic, while if the problem was in the building of the modules, that module would load without issues. I checked, and the problem happens only with our modules, even when loaded in an official kernel, which means it’s safe to assume that the problem is in building the modules rather than the kernel. Which is both good and bad, because even if it limits my scope and my need to debug the kernel, it’s not like I know enough about ELF loading to find the issue easily. I was tempted to buy Sun’s “Linker and Libraries Guide”, but not only is the book far from cheap ($49 at least), it’s not even available on Amazon (UK).

Anyway, a quick comparison of the zlib.ko module from FreeBSD proper and from Gentoo/FreeBSD showed me that ours is about twice the size of the original (but I think that might be caused by the -ggdb3 build), and that there are more R_SPARC_RELATIVE relocations, while there are no R_SPARC_32 at all in our copy.
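A comparison like this can be made repeatable by counting relocation types in the text printed by readelf -r for each module. The sketch below does only the counting; the sample text is made up in the general shape of readelf’s output and is not taken from either zlib.ko, and the function names are illustrative.

```python
from collections import Counter


def count_relocation_types(readelf_output):
    """Count R_* relocation type names in text printed by `readelf -r`."""
    counts = Counter()
    for line in readelf_output.splitlines():
        for token in line.split():
            if token.startswith("R_"):
                counts[token] += 1
                break  # one relocation per line
    return counts


def types_missing_from(ours, theirs):
    """Relocation types present in `theirs` but absent from `ours`."""
    return sorted(set(theirs) - set(ours))


# Made-up sample, NOT real data from either module.
SAMPLE_OURS = """\
Relocation section '.rela.text' contains 2 entries:
000000000010  000000000016 R_SPARC_RELATIVE                     4f0
000000000018  000000000016 R_SPARC_RELATIVE                     520
"""
SAMPLE_THEIRS = """\
Relocation section '.rela.text' contains 2 entries:
000000000010  000000000016 R_SPARC_RELATIVE                     4f0
000000000020  000300000003 R_SPARC_32        0000000000000000 foo + 0
"""

if __name__ == "__main__":
    ours = count_relocation_types(SAMPLE_OURS)
    theirs = count_relocation_types(SAMPLE_THEIRS)
    print(dict(ours), dict(theirs), types_missing_from(ours, theirs))
```

Feeding it the real readelf output of both modules would show at a glance which relocation types appear in only one of the two builds.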

I was looking forward to a more thorough debugging session tonight, but I was stopped by two incidents that are going to make my life in the next weeks harder than I expected. The first is that we don’t currently build the kernel debugger (kgdb), and we cannot easily build it (because it requires libgdb, which we currently don’t install… and I doubt I will be able to convince vapier to install it).

The second is that to get a coredump of the crash, we need to use the kernel’s dump facilities, which require a swap partition of at least the size of the RAM in the machine (and I don’t have one on Klothos, as it was originally built with only 128MB of memory, while now it has 1GB), and some commands to be run during the boot phase: specifically savecore between the R/W mount of partitions (to save the dump) and the enabling of swap space (because that would destroy the dump), and dumpon after the swap is enabled. The way baselayout works now, I need to change the localmount init script, but as I don’t like that solution, I’ll have to talk about this with Roy; the important thing to me is being able to enable/disable the dump through conf.d files (similarly to what’s done in FreeBSD); I suppose a solution could be to use some addons and install them with one of the freebsd ebuilds, or with baselayout proper, depending on what Roy prefers.

Now, it’s not like the baselayout issue is not easily solvable, once Roy is around (he’s partying for the new year now, I suppose); but the swap size is what is going to stop me from using this feature. My only solution would be to add another compact flash card (the adapter I’m using can already connect two cards, one master and the other slave, which is kinda good for what I paid for it), but it has to be at least 2GB (the RAM is only 1GiB, of course, but I don’t want to start crying when I get hit by the GiB > GB thing, as I’m not sure if CF cards are sold by the decimal GB or by the binary GB). I once again compared the prices here with the German ones, and it seems I would pay 34+20 euros from there, or 89 here.. I don’t think I’ll go buying one just yet; it’s not a big deal to buy, but I want to make some more attempts without spending more money on that box, considering that I already loaded it with new (or newish for the SATA controller and disk) stuff that cost me at least €100, box included, and it was just to debug a kernel problem…
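The GiB > GB worry is easy to check with a few lines. The sketch below only encodes the rule stated above (swap at least as large as RAM) and the two unit definitions; it assumes the card really offers its full nominal capacity as swap, which a real card with a label and partitioning overhead would not.

```python
GB = 10**9   # decimal gigabyte, as storage is often sold
GIB = 2**30  # binary gibibyte, as RAM is counted


def dump_fits(ram_bytes, swap_bytes):
    """A kernel dump needs swap at least as large as RAM."""
    return swap_bytes >= ram_bytes


if __name__ == "__main__":
    ram = 1 * GIB
    print(GIB - GB)                 # how far a decimal "1GB" falls short
    print(dump_fits(ram, 1 * GB))   # 1GB card counted in decimal units
    print(dump_fits(ram, 2 * GB))   # 2GB card, whichever unit is used
```

A card sold as 1GB in decimal units is about 70MiB too small for 1GiB of RAM, which is why going straight to 2GB is the safe choice.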

One of the things I found difficult to grasp about SPARC asm, anyway, besides not finding a decent reference manual for it (call me crazy, I usually understand a language better by looking at its reference than at explanations and tutorials), is that load and store instructions seem to be written in “orig, dest” format rather than the usual “dest, orig” that I was used to under x86.. but it’s not that difficult to understand after all: most of the instructions are named after logical operations, and the ld/ldx and st/stx instructions also make it easier to understand when the register is the destination or the origin. It would have been nice to learn SPARC assembler at school rather than 8086.

I love the European Union

Sometimes something good happens even here on the other side of the pond… but let’s go in order.

First of all, I want to thank Christian Iuga, who sent me 1GB of RAM for Klothos (I received it yesterday), which now compiles quite a bit faster (very good for my debugging sessions, where I had to wait an eternity every time :)). So last night I went back to working on ~sparc-fbsd (also because there’s a new FreeBSD release ready, but I’ll talk about that later), but now the bottleneck, instead of the RAM, seemed to be the network… okay, hme0 is known to be the worst network driver for FreeBSD, and it ended up giving me NFS performance comparable to a 10Mbit network.. not that good when you have the portage tree over the network :/

Unfortunately, the only PCI network cards I have at home are Realtek-based, with the 8139 chipset, which Ciaran told me is likely not supported on SPARC, and indeed I simply get a “Data Access Error” on the serial line when trying to boot with one plugged in. So I had to find some better supported card… e100 was the suggestion, but a quick skim over my usual retailers, both physical shops (through their sites) and online ones, told me that none carried E100 cards; the only Intel cards I could find were the Gigabit ones, which cost about €50, which is not exactly cheap.. but okay, maybe I can slowly start upgrading the local network to Gigabit, now that both Enterprise and Intrepid have Gigabit-capable cards, so I can try one of those… but even that card is difficult to find at my retailers…. okay, so hold on for now.

But also, two nights ago I had some trouble with one of the fans of Enterprise, which started making a really bad noise, and my mother forced me to turn the box off during the night (sigh, a batch-compile cancelled), so I spent yesterday afternoon trying to find which fan it was.. after a half-working suggestion from Jakub and Javier (to try stopping the power supply fan to see if that was it, but I did it wrong and stopped the CPU fan.. which refused to restart till I powercycled the system), I found the dying fan, the rear-case one.. unfortunately while trying to stop it, I also broke it for good, so I simply removed it. Luckily there seems to be no risk for my CPU for now, as the temperature goes from 34°C while playing music to 47°C while compiling, although KingTaco suggested I find a new CPU cooler.

And again, finding a decent one from my usual retailers was difficult… the best I could find, the Thermaltake Silent 939, would cost me €29 plus VAT (20%) and €11 for shipping; which is not really acceptable to me…

Enter the European Union and the single market. Some time ago, someone (luckyduck maybe, it was just when I joined Gentoo) gave me the site of a German shop that ships to the rest of Europe too. I decided to give it a shot: although I had used it before to compare prices, I had never tried to order from it.

The network card is at €34.49, the CPU cooler is at €24.19, both are quite a bit cheaper than in Italy. The shipping cost is €20 though, which removes most of the saving, and I have to count it will probably take a week or two to get the stuff here, I suppose.

But then I got a great idea.. I have a laser printer at home, a Kyocera-Mita FS-1020D, a pretty cool printer, and the toner kits are quite cheap too: €100 in a shop, €90+€6 of shipping from an online shop.. how much would they cost at the German shop? €67.38.. which, even if it were the only thing I ordered, with the shipping costs added, is lower than both. I ordered one of those, even if I probably still have half the toner in the current cartridge, because it’s something that won’t go to waste anyway, and that alone makes the deal affordable and a good saving for me.

So at the end, thanks also to zzam who translated a few phrases for me, and pointed me at where to look for the info, I was able to order both the network card and the CPU cooler at a good price, and I’m pretty much happy about it, as I won’t burn down the CPU of Enterprise, and I’ll be finally able not to have to wait for eons to download portage on Klothos ;)

For once, I want to thank the European Union and the existence of the Euro :)

Now, on a more technical level, FreeBSD 6.2_rc2 was released yesterday; thanks again to the AMD64 team, I downloaded and repacked the sources from pitr, and they are already on the mirrors; even the ebuilds should have hit the rsync mirrors by now. This time, dev-libs/libedit is being used, which means that while upgrading you need to symlink libedit.so.5 to libedit.so or /bin/sh will fail to run (I know it’s annoying, I’m working on a new stage for this reason).. for those following the emerge upgrade order, which will lose libedit.so.5 before libedit.so is merged, you can take my libedit.so and use that in the meantime.

Now, while working on Klothos last night, I also found out how to tell the FreeBSD kernel to boot from a different partition than the default one (ad0a in the case of SPARC64 hardware). You need to edit loader.conf and set this:

vfs.root.mountfrom="ufs:/dev/ad1a"

The result is that I can now boot Klothos unattended, and not have to retype the string every damn time I reboot (which happens pretty much all the time if I’m debugging the kernel).
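If you’d rather script the change than edit the file by hand, here is a minimal sketch that sets a variable in loader.conf-style text, replacing an existing assignment or appending a new one. It only handles plain name="value" lines, and the helper name is made up for illustration.

```python
def set_loader_var(conf_text, name, value):
    """Return loader.conf text with name="value" set.

    Replaces any existing assignment of `name`; otherwise appends one.
    Commented-out lines are left untouched.
    """
    new_line = f'{name}="{value}"'
    lines = conf_text.splitlines()
    replaced = False
    for i, line in enumerate(lines):
        if line.split("=", 1)[0].strip() == name:
            lines[i] = new_line
            replaced = True
    if not replaced:
        lines.append(new_line)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    old = 'vfs.root.mountfrom="ufs:/dev/ad0a"\n'
    print(set_loader_var(old, "vfs.root.mountfrom", "ufs:/dev/ad1a"), end="")
```

The function works on text rather than on the file itself, so you can look at the result before writing anything back.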

Debugging

Tonight I couldn’t sleep. What do I do when I cannot sleep? I debug!

As yesterday libvorbis was enough of a headache for me (it turned out that the parameters are being reset by libvorbis itself because the third header has an error in parsing… now to find where the error is, that’s a good question), I decided to go with something easier, like Gentoo/FreeBSD/SPARC64 kernel debugging. No, I’m not kidding: debugging the problem in the kernel is turning out to be easier and more fun than debugging a userland library… to decode audio files… that’s parsing a damn header!

Anyway, thanks to Javier (I won’t mistype his nick this time), I got into FreeBSD’s kernel debugging by building a kernel with -g and DDB support. Then, I easily got the trace of the kernel panic:

Tracing pid 1258 tid 100054 td 0xfffff80007aa7c80
panic() at panic+0xcc
trap() at trap+0x38c
-- fast data access mmu miss tar=0xc0b70000 %o7=0xc01ae8ec --
malloc_type_zone_allocated() at malloc_type_zone_allocated+0x14
malloc() at malloc+0x7c
hashinit() at hashinit+0x4c
nullfs_init() at nullfs_init+0x1c
vfs_modevent() at vfs_modevent+0x244
module_register_init() at module_register_init+0x58
linker_load_module() at linker_load_module+0x844
kldload() at kldload+0xf4
syscall() at syscall+0x334
-- syscall (304, FreeBSD ELF64, kldload) %o7=0x100a1c --
userland() at 0x40421288
user trace: trap %o7=0x100a1c
pc 0x40421288, sp 0x7fdffffde31
pc 0x10080c, sp 0x7fdffffdef1
pc 0x4020ab74, sp 0x7fdffffdfb1

This was tracing a kldload of nullfs, but any kldload produces errors, although they seem to differ from module to module.

db> x/ia malloc_type_zone_allocated,16
malloc_type_zone_allocated:     save            %sp, -0xc0, %sp
malloc_type_zone_allocated+0x4: call            critical_enter
malloc_type_zone_allocated+0x8: nop
malloc_type_zone_allocated+0xc: lduw            [%g7 + 0x3c], %g1
malloc_type_zone_allocated+0x10:        sllx            %g1, 6, %g3
------
malloc_type_zone_allocated+0x14:        ldx             [%i0 + 0x40], %g2
------
malloc_type_zone_allocated+0x18:        brz,pt          %i1, malloc_type_zone_allocated+0x38
malloc_type_zone_allocated+0x1c:        add             %g3, %g2, %g4
malloc_type_zone_allocated+0x20:        ldx             [%g3 + %g2], %g1
malloc_type_zone_allocated+0x24:        add             %g1, %i1, %g1
malloc_type_zone_allocated+0x28:        stx             %g1, [%g3 + %g2]
malloc_type_zone_allocated+0x2c:        ldx             [%g4 + 0x10], %g1
malloc_type_zone_allocated+0x30:        add             %g1, 0x1, %g1
malloc_type_zone_allocated+0x34:        stx             %g1, [%g4 + 0x10]
malloc_type_zone_allocated+0x38:        subcc           %i2, -0x1, %g0
malloc_type_zone_allocated+0x3c:        be,pn           malloc_type_zone_allocated+0x58
malloc_type_zone_allocated+0x40:        or              %g0, 0x1, %g2
malloc_type_zone_allocated+0x44:        sll             %g2, %i2, %g2
malloc_type_zone_allocated+0x48:        sra             %g2, 0x0, %g2
malloc_type_zone_allocated+0x4c:        ldx             [%g4 + 0x20], %g1
malloc_type_zone_allocated+0x50:        or              %g1, %g2, %g1
malloc_type_zone_allocated+0x54:        stx             %g1, [%g4 + 0x20]
malloc_type_zone_allocated+0x58:

This is the disassembly of the function that died; I’ve artificially separated the point where the crash happens from the rest of the code.
First of all, the thing that scared me was that even if I know nothing of SPARC assembler, and even my Intel assembler is pretty much limited to 8086 instructions (although I still remember most of them clearly, as I wrote an 8086 emulator when I was in high school), I was able to correlate more or less that code with

        struct malloc_type_internal *mtip;
        struct malloc_type_stats *mtsp;

        critical_enter();
        mtip = mtp->ks_handle;
        mtsp = &mtip->mti_stats[curcpu];
        if (size > 0) {
                mtsp->mts_memalloced += size;
                mtsp->mts_numallocs++;
        }
        if (zindx != -1)
                mtsp->mts_size |= 1 << zindx;
        critical_exit();

that is the code of the function in C source.
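One way to make that correlation explicit is to model the address arithmetic the listing performs. The sketch below is a reading of the disassembly only: the offsets (0x40 for ks_handle, 64-byte per-CPU entries from the shift by 6, and 0x0/0x10/0x20 for the three counters) are inferred from the instructions shown, not checked against the FreeBSD headers, and the Python names are made up for illustration.

```python
KS_HANDLE_OFFSET = 0x40    # from: ldx [%i0 + 0x40], %g2   (mtp->ks_handle)
STATS_SHIFT = 6            # from: sllx %g1, 6, %g3        (curcpu * 64)
MEMALLOCED_OFFSET = 0x00   # from: ldx [%g3 + %g2], %g1    (mts_memalloced)
NUMALLOCS_OFFSET = 0x10    # from: ldx [%g4 + 0x10], %g1   (mts_numallocs)
SIZE_OFFSET = 0x20         # from: ldx [%g4 + 0x20], %g1   (mts_size)


def ks_handle_address(mtp):
    """Address the faulting ldx reads from: mtp + 0x40."""
    return mtp + KS_HANDLE_OFFSET


def stats_entry_address(mtip, curcpu):
    """Address of mti_stats[curcpu], i.e. %g4 = %g3 + %g2 in the listing."""
    return mtip + (curcpu << STATS_SHIFT)


def field_addresses(mtip, curcpu):
    """Addresses of the three counters the function updates."""
    base = stats_entry_address(mtip, curcpu)
    return {
        "mts_memalloced": base + MEMALLOCED_OFFSET,
        "mts_numallocs": base + NUMALLOCS_OFFSET,
        "mts_size": base + SIZE_OFFSET,
    }


if __name__ == "__main__":
    # Arbitrary example addresses, chosen only to show the arithmetic.
    print(hex(ks_handle_address(0x2000)))
    print({k: hex(v) for k, v in field_addresses(0x1000, 2).items()})
```

Seen this way, the instruction at +0x14 is the very first dereference of mtp, so a bad mtp pointer would fault exactly there.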

Now my problem is to find what causes the “fast mmu miss”, or in general the panic. The registers are funny:

db> show reg
g0          0xffffffffffffffff
g1          0xc04ce800  log_cdevsw+0x48
g2          0xffffffffffffffff
g3             0x870ad
g4          0xfffff8000040fff8
g5              0x1dfd  fpu_fault_size+0x1c49
g6          0xcb591980
g7          0xc054d7b0  pcpu0+0x1a90
i0                0x12  pcpup+0xb
i1          0xc04816e0
i2          0xcb590ab8
i3                 0xa  pcpup+0x3
i4          0xcb590b70
i5                 0x1
i6          0xcb590221
i7          0xc01d7bac  kdb_enter+0x34
tnpc        0xc01d7bb8  kdb_enter+0x40
tpc         0xc01d7bb4  kdb_enter+0x3c
tstate      0x441d001601
kdb_enter+0x3c: ta              %xcc, 1

The i0 register is set to a quite low value (0x12), and the debugger tells me it refers to the pcpup (“pcpu pointer”) address plus a value… the problem is that pcpup is .. uh.. loaded in g7:

#define PCPU_REG %g7

register struct pcpu *pcpup __asm__(__XSTRING(PCPU_REG));

I wonder if it’s a miscompile or simply ddb going crazy; still, if I understood the ldx instruction well enough, it’s trying to load a 64-bit value from the address in i0 plus 0x40 (0x52 with the value above) into g2 … that does not feel quite right.

Anyway, I will try to debug this further when I can find someone in the Gentoo/SPARC team who can help me understand SPARC assembler.

Booting Gentoo/FreeBSD/SPARC64

First of all, a service note regarding my previous ALSA post. It seems I was lucky, and the ALSA code in the current repository is good enough to at least build and work on 2.6.18 and .19, so now there is an alsa-driver-1.0.14_pre20061130 in the tree that will work until upstream releases a new version.

Tonfa, sorry if the words about mercurial were a bit too harsh than they needed to be, I was pretty much pissed off by it giving up on me when I needed it, although I’m still not sure why it continue crashing, it’s a memory corruption problem, so the backtrace won’t be useful, I’ll try to build it with minimal CFLAGS and see if perhaps it’s that creating the problem.

Now, on a more Gentoo/FreeBSD related note, today I received two packages by mail. The first was an Amazon package (only reported as “From an happy user”, whom I thank even if I have no idea who he is :) ) containing Rhapsody’s Live in Canada 2005, which I’m listening to right now, and Gibson’s Count Zero, which I’ll surely read as soon as possible; the second was the IDE-CF adaptor I ordered. The shipping was well handled: a very small package without much waste in advertising and whatnot, and I like that. The thing itself is pretty minimal, although it supports two CompactFlash cards, and it was easy to set up: being just a standard IDE device, OpenBoot recognised it correctly.

The problem after this was to actually get the partitioning done, as the bsdlabel command didn’t work… a quick look around showed me that, this being the SPARC64 architecture, I should have used the sunlabel command instead, and the problem was easily solved. After booting and setting up the partition, I tried to find a way to make the loader work with a partition that holds the contents of /boot without a “boot” directory... it didn’t like that.

The solution should be to pass the string “/loader” to boot1 (the first stage loader, the equivalent of boot0), but to do that I need to change the “bootargs” property of the “/chosen” OpenBoot device, which is not possible from the OBP console. I looked around and found ioctls that actually do the job, so I decided to try writing a simple program to change the settings, an “ofwedit”, considering that “ofwdump” exists already and already has code for writing to the OpenFirmware interface. Unfortunately, my easy program, which should just set the value of the property I name, well, panics the kernel :|

Another problem is that the rootdev parameter in loader.conf is simply ignored, so I still have to find a way to properly tell the kernel to use /dev/ad1a as the root partition, and it continues asking me for it at boot… at least with the serial console it is easy to answer.

The problem with Catalyst I talked about previously seems to be, again, the same problem Roy found initially: kldload leads to a kernel panic when called, as it tries to load the nullfs module. This means that I need to fix that bug, but also that catalyst can be used to build the stages; you just need to build nullfs into the kernel rather than as a module.

I also found some other minor problems, like some manpages being installed in arch subfolders such as sparc64, but I’ll fix that later when I have more time. Emacs unfortunately does not want to build: first there’s a problem because it tries to use /usr/lib/crtbegin.o and /usr/lib/crtend.o, which for us live in a completely different location (inside gcc’s directory) since I removed it from freebsd-contrib. Also, once I fixed this trouble, it gave me a SIGILL while building the elisp files, which is anything but promising.

And yes, I know there’s a KOffice bump to do; I just need the time to handle that, as right now I’m working on VLC’s release candidate, with the nsplugin support thanks to the information provided by Tavin Cole in bug #156067. Kaffeine 0.8.3 is in the tree now with a build patch, as it had already failed to build for me on three boxes.

Soldering iron

Today I decided it was time to try the nullmodem cable, so I put the Ultra5 where it belonged and connected the cable to the only port I recognised as a serial port, a DE-9M connector. After trying to understand what the hell was happening, I dropped by #gentoo-sparc, and Weeve told me the bad news: the serial port I need is the DB-25 one, and it’s a DB-25-F port, of the wrong gender. Even if I were to find a DE-9 to DB-25 adapter (I know I have one somewhere in my house, but I’m not sure where), I wouldn’t be able to use it.

And if I struggled to find a nullmodem cable, what chance would I have had of finding a gender changer? Very little, I’m afraid.

But I’m not the kind of person who refuses a challenge, so I thought a bit about it and found that I had a DB-25-M connector, the solderable kind, and I knew I had some of the DE-9-F connectors often used to bring ports out of a motherboard; if you ever touched a Pentium-class or older computer, you know them. I looked up their pinout, with a simple schematic for the DE-9<->DB-25 conversion, and took out my soldering kit.

29-11-06_1746

For a while I tried to solder against gravity, but after ten minutes I started to see a pattern and decided to turn the connector over; soldering was way easier that way.

Actually, in the first round of soldering I ended up swapping wires 8 and 9 (of the DE-9), so I got data out of the SPARC, but wasn’t able to send commands to it.
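For reference, the usual RS-232 correspondence between the two connectors can be written down as a small lookup table. This is only a sketch in C: the names de9_to_db25, de9_signal, db25_pin_for and print_pinout are made up for illustration, and it shows the standard straight-through adapter assignment, not necessarily the exact schematic I followed. On the DE-9, pins 8 and 9 carry CTS and RI, so swapping them might explain getting output without being able to send anything if hardware flow control is in play, but that part is a guess.

```c
#include <stdio.h>

/* Standard RS-232 signal assignment for a DE-9 to DB-25 adapter.
 * Index 0 is unused so the arrays can be indexed by DE-9 pin number.
 * All names here are illustrative, not taken from any real project. */
static const int de9_to_db25[10] = { 0, 8, 3, 2, 20, 7, 6, 4, 5, 22 };

static const char *const de9_signal[10] = {
    "", "DCD", "RXD", "TXD", "DTR", "GND", "DSR", "RTS", "CTS", "RI"
};

/* Return the DB-25 pin wired to the given DE-9 pin, or -1 if out of range. */
int db25_pin_for(int de9_pin)
{
    if (de9_pin < 1 || de9_pin > 9)
        return -1;
    return de9_to_db25[de9_pin];
}

/* Print the whole table, one wire per line. */
void print_pinout(void)
{
    for (int pin = 1; pin <= 9; pin++)
        printf("DE-9 pin %d (%s) -> DB-25 pin %d\n",
               pin, de9_signal[pin], db25_pin_for(pin));
}
```

Calling print_pinout() from a main() gives a checklist to tick off while soldering, one wire at a time.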

After fixing that, I was finally able to connect the serial console of the SPARC to enterprise and control the boot using minicom.

29-11-06_1750

I’m pretty happy with how the soldering ended up. I had never worked on such small connections, and now I know I’m able to do it, even if at the start my hands trembled a bit too much; that was probably also because I was soldering lying on the floor, as I don’t have a work table... I need to buy one.

Danny, this should make up for the nullmodem cable I bought rather than building it myself ;)

SPARCing around

While listening to the interview with pkgsrc developer Johnny Lam (yes, I do like the bsdtalk podcast), I decided to write a little update on what I’m doing during this “less work on Gentoo” week.

Apart from continuing my job with PHP, MySQL and JavaScript (brr), I’ve also continued using Klothos to keyword some of my preferred tools, for instance metalog, and avahi, which I use to monitor my network to see which boxes are up. Unfortunately the latter does not complete the registration, and gdb seems to be useless to find where it gets stuck, so I’ll try to debug that better later on. I then decided to shift to something more useful, like trying to get catalyst working, but even that ended up in a cul-de-sac, because I got a kernel panic as soon as it tried to bind the directories for the chroot. The problem is probably either a bug in the kernel or a mismatch between the userland (6.2-RC1) and the kernel (6.2-BETA3); I’ll solve that once I’ve got the CF memory working (I also received the memory today; it’s a geek’s dream to get a machine to boot from such a little device).

Tonight I also tested the new Audacious plugins package(s) on FreeBSD to get WavPack and MIDI decoding working. It’s kinda cool to have Audacious playing on a box connected to just a KVM and an ethernet cable, and to listen to it through the main box’s amplifier. Viva PulseAudio! :)

I’m now thinking of branching xine-lib to work on a possible 1.2 version with FFmpeg imported without changes; maybe that would help to sync it more often, but I’m not really sure I have the time to carry it on right now. I just can’t stand here doing nothing and leave xine to die, I simply can’t.

On the good news side, the XCB problems with FreeBSD are now being solved thanks to Jamey and Josh: the solution will be a standalone library that stubs the needed pthread functions, and is a no-op for GLIBC and other systems where the functions are all stubbed in the C library (I think I should send a PR to FreeBSD about the missing pthread_equal stub, but I’m not yet sure).

On the bad news side, cdrkit is a desperate case on FreeBSD: gcvt, ecvt and fcvt are missing, and the ebuild currently hard-depends on libcap. I’ll try to see whether I can fix it, but I’m not optimistic.
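To give an idea of what papering over one of the missing functions could look like, here is a minimal sketch of a gcvt stand-in built on the standard printf "%g" conversion. The name compat_gcvt is hypothetical, I have not checked how cdrkit actually calls these functions, and ecvt and fcvt (which return the digits and the decimal point position separately) would need more work than this.

```c
#include <stdio.h>

/* Hypothetical stand-in for a missing gcvt(): format `value` with at most
 * `ndigit` significant digits into `buf`. As with the real gcvt, no buffer
 * size is passed in, so the caller must provide a large enough buffer
 * (32 bytes is plenty for a double at 17 significant digits). */
char *compat_gcvt(double value, int ndigit, char *buf)
{
    if (ndigit < 1)
        ndigit = 1;
    if (ndigit > 17)
        ndigit = 17;    /* 17 digits are enough to round-trip a double */
    /* "%g" picks fixed or exponent notation and drops trailing zeros. */
    sprintf(buf, "%.*g", ndigit, value);
    return buf;
}
```

A wrapper like this would only be compiled in when the C library lacks the function, for instance behind a configure check.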

And now last but not least, a big thank you to Mike (Arthur) for Fahrenheit 451. I read that one when I was 11, in an old Italian copy from my teacher (the class’s library), and I’m really looking forward to reading Bradbury’s original words. It’s still a topical read, unfortunately (not that it’s a bad read at all, but it would surely be better if there weren’t the risk of ending up that way).