Debugging debuggable

Now that Prakesh was able to complete the build of the three stages for Gentoo/FreeBSD 6.2_rc2, and they are available on mirrors, I have a few things to take care of in Gentoo/FreeBSD that I overlook for too long time.

The first is for sure updating the documentation, so that new suers can install the 6.2 stages fine, without all the workarounds we used to have for 6.1 (because it wasn’t built with catalyst); done that, I have to deprecate 6.1 in favour of 6.2, as that version is pretty much where we’re focusing right now, with the libedit fixes and the new baselayout 1.13 (that Roy made perfect on FreeBSD!); and then there’s to fix the modules loading problem with SPARC64.

So, let’s start with the first step, I’ve asked jforman to remove the 6.1 stage from the mirror, so that there won’t be new installation of it. Later on I’ll see to write a deprecated file for 6.1 profile, with some short instructions to upgrade to 6.2 somewhat smoothly.

Instead, for what concerns SPARC64, Klothos is currently helping me understanding the issue. My first task was to get on it an editor I could actually use, which meant, for me, emacs. Unfortunately, not counting the issue with gcc’s CSU object files being in a different place than standard FreeBSD (that I already worked around with the ebuild in the transition overlay), there was a nasty SIGILL while building some elisp code, and I never got around debugging it. After all it was easier than i expected: the problem was called by an inline asm() call, that called the instruction ta 3, that after a bit of googling turned up being a trap call (kinda like software interrupts in x86) that triggered some Kernel service to flush registers, which is not implemented for FreeBSD (for instance Emacs.app disable this too for GNUstep on FreeBSD operating system). An easy patch to make the call conditional solved the issue for me.

So I first wanted to confirm one thing, whether the problem was while building the modules or while building the kernel: if the problem was the kernel, even trying to load a module compiled by vanilla FreeBSD should cause the same panic, while if the problem was in the building of the modules, the module would have loaded without issues. I checked, and the problem happens only with our modules, even when loaded in an official kernel, which mean it’s safe to assume that the problem is building modules rather than the kernel. Which is both good and bad, because even if it limits my scope and my need to debug the kernel, it’s not like I have so much knowledge of the ELF loading to find the issue easily. I was tempted to buy Sun’s “Linker and Libraries Guide”, but not only the book is far from cheap ($49 at least), it’s not even found in Amazon (UK)’s availability.

Anyway, a quick comparison of the zlib.ko module from FreeBSD proper and Gentoo/FreeBSD shown me that the size of our own is about twice the original one (but I think it might be caused by the -ggdb3 build), and that there are more SPARC64_RELATIVE relocations, while there are no R_SPARC_32 at all in our copy.

I was looking forward for a more throughout debug tonight, but I was stopped by two incidents that are going to make my life in the next weeks harder than I expected. The first is that we don’t currently build the kernel debugger (kgdb), and we cannot easily build it (because it requires libgdb, that we currently don’t install… and I doubt I will be able to convince vapier to install it).

The second is that to get a coredump of the crash, we need to use the kernel’s dump facilities, that requires a swap partition, of at least the size of the RAM in the machine (and I don’t have one on Klothos, as it was originally built with only 128MB of memory, while now it has 1GB), and the run of some commands during boot phase, specifically savedump between the R/W mount of partitions (to save the dump) and the enabling of swap space (because that would destroy the dump), and dumpon after the swap is loaded. For the way baselayout works now, I need to change the localmount init script, but as I don’t like that solution, I’ll have to talk about this with Roy; the important thing to me is being able to enable/disable dump through conf.d files (similarly to what’s done in FreeBSD); I suppose a solution could be to use some addons and install them with one of the freebsd ebuilds, or with baselayout proper, depending on how Roy prefer).

Now, it’s not like the baselayout issue is not easily solvable, once Roy is around (he’s partying for the new year now, I suppose); but the swap size is what is going to stop me from using this feature. My only solution would be to add another compact flash card (the adapter I’m using is capable of connecting two cards already, one master and the other slave, which is kinda good for what I paid it), but it has to be at least 2GB (the ram is only 1GiB, of course, but I don’t want to start crying when I get hit by the GiB > GB thing, as I’m not sure if the CF cards are sold by the decimal GB or by the binary GB). I once again compared the prices here with the Germany’s one, and it seems I would pay 34+20 euros from there, or 89 here.. I don’t think I’ll go buying one just yet, not a big deal to buy, but I want to do some more tries without spending more money on that box, considering that I already loaded it with new (or newish for the SATA controller and disk) stuff that did cost me at least €100, box included, and it was just to debug a kernel problem…

One of the things I found difficult to grasp about SPARC asm, anyway, beside not finding a decent reference manual of it (call me crazy, I usually understand better a language by looking at its reference rather than to explanations and tutorials), is that load and store instructions seems to be written in “orig, dest” format rather than the usual “dest, orig” that I was used to under x86.. but it’s not that difficult to understand after all, most of the instructions are named after logical operations, and the ld/ldx and st/stx instructions make also easier to understand when the register is destination or origin, would have been nice to learn SPARC assembler at school rather than 8086.

4 thoughts on “Debugging debuggable

  1. Yah that’s an overview, not a reference, I was already reading it in these days, but it’s far from an handy reference per-instruction (like Intel has).

    Like

  2. You can limit the memory available to system in loader.conf: hw.physmem=131072 (the number is in kB), it’s equivalent of mem=128M in Linux. So if the amount of memory you need to trigger the bug is lower than the size of your swap, you shouldn’t need to buy the CF card.

    Like

  3. Thanks 8an, I didn’t think of that, it’s probably the most sensible thing to do, as I don’t need too much memory to trigger the issue (I mostly need only when compiling :) ).

    Like

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s