Apple’s HFS+, open-source tools, and LLVM

The title of this post seems a bit messed up, but it’ll make sense at the end. It’s half a recount of my personal hardware troubles and half a recount of my fighting with Apple’s software, and not of the kind my readers hate to read about, I guess.

I recently had to tear apart my Seagate FreeAgent Xtreme external HDD. The reasons? Well, besides dropping the connection while in use (with Yamato) over eSATA, forcing me to fall back to either FireWire or USB (both much slower, and I paid for it to use eSATA!), yesterday it decided it wouldn’t let me access anything over any of the three connections, not even after a number of power cycles (waiting for it to cool down as well). This was probably related to the fact that I had tried to use it again over eSATA, connected to the new laptop, to copy an already set-up partition off the local drive to make space for (sigh) Windows 7.

Luckily, there was no data worth spending time on in that partition, just a few GNOME settings I could recreate in a matter of minutes anyway.

Since the Oxford Electronics-based bridge on the device decided not to help me get my data back, I decided to break it open, with the help of a YouTube video (don’t say that YouTube isn’t helpful!), and took the drive itself out: it is, obviously, a Seagate 7200.11 1TB drive, quite a sturdy one to look at. No, I won’t add it as the 7th disk drive in Yamato, mostly because I fear it wouldn’t be able to start up any more if I did so.

Thankfully, I bought a Nilox-branded “bay” a month or so ago, when I gave away what remained of Enterprise to a friend of mine (the only task Enterprise was still doing was saving data off the SATA disks of laptops or PCs that people brought me after they fried up). My choice for that bay was due to the fact that it allows you to plug in both 3.5” and 2.5” SATA disks without having to screw them in anywhere. It does look a lot like something out of the Dollhouse set, to be honest, but that doesn’t matter now.

I plugged it in and started downloading the data; I can’t be sure it is all fine, so I deleted lots and lots of stuff I won’t feel safe about for a while. Then I shivered, fearing the disk itself was bad and that I had no way to check it out… thankfully, the bay uses Sunplus electronics, and – lo and behold! – smartmontools has a driver for the Sunplus USB bridge! A SMART test later, and the disk turns out to be in better health than any other disk I’ve ever used. Wow. Well, it’s to be expected, as I never compiled on it.

Anyway, what can I do with a 1TB SATA disk that I cannot plug into any computer as it is? Well, there is actually one thing I can do: backup storage. Not the kind of rolling backup I’m currently doing with rsnapshot and the WD MyBook Studio II over eSATA (anything else is just too slow to back up virtual machines), but rather a fixed backup of stuff I don’t expect to be looking at or using anytime soon. But to be on the safe side, I wanted to have it available in a format I can access, on the go, from the Mac as well as from Linux; and vfat is obviously not a good choice.

The choice is, for the Nth time, HFS+. Since Apple has published quite a bit of specs on the matter, the support in Linux is decent, albeit far from perfect (I still haven’t finished my NFS export patch, it does not support ACLs or extended attributes, and so on). It’s way too unreliable for rsnapshot (with hardlinking), but it should work acceptably well for this kind of storage.

The only reason I have not to use it for something I want to rely on, as it is, is that the tools for filesystem creation and check (mkfs and fsck) are quite old. I’m not referring to “hfsutils” or “hfsplusutils”, both of which are written from scratch and have a number of problems, including, but not limited to, shitty 64-bit code. I’m referring to the diskdev_cmds package in Gentoo, which is a straight port of Apple’s own code, released as FLOSS under the APSL2 license.

Yes, I call that FLOSS! You may hate Apple as much as you wish, but even the FSF considers APSL2 a Free Software license, albeit one with problems; on the other hand, they explicitly state this (emphasis mine):

For this reason, we recommend you do not release new software using this license; but it is ok to use and improve software which other people release under this license.

Anyway, I went to Apple’s releases for the 10.6.3 software (interestingly, they haven’t yet published those for 10.6.4, which was released just the other day), downloaded diskdev_cmds and the xnu package that contains their basic kernel interfaces, and started working on an autotools build system to make it possible to easily port the code in the future (thanks to git and branching).

The first obstacle, beside the includes obviously changing, was that Apple decided to make good use of a feature they implemented as part of Snow Leopard’s “Grand Central Dispatch”, their “easy” multi-threading implementation (somewhat similar in concept to OpenMP): “blocks”, anonymous functions for the C language, an extension they worked into LLVM. So plain GCC is unable to build the new diskdev_cmds. I could either go fetch an older diskdev_cmds tarball, from Leopard rather than Snow Leopard, where GCD was not implemented yet, or I could up the ante and try to get it working with some other tools. Guess what?
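To give an idea, here is a minimal sketch of what blocks look like (my own example, not code from diskdev_cmds); on Linux it should build with clang -fblocks, linked against a blocks runtime such as libBlocksRuntime:

    #include <stdio.h>

    int main(void)
    {
            int factor = 3;

            /* a block: an anonymous function that captures "factor"
             * from the enclosing scope */
            int (^times)(int) = ^(int x) { return x * factor; };

            printf("%d\n", times(7)); /* prints 21 */
            return 0;
    }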

In Gentoo we already have LLVM around, and the clang frontend as well. I decided to write an Autoconf check for blocks support, and rely on clang for the build. Unfortunately it also needs Apple’s own libclosure, which provides some interfaces to work with blocks, and is the basis for the GCD interface. It actually resonated a bit when Snow Leopard was presented, because Apple released it for Windows as well, with the sources under the (very liberal) MIT license. Unfortunately you cannot find it on the page I linked above; you have to look at the 10.6.2 page for whatever reason.

I first attempted to merge this straight into the diskdev_cmds sources, but then I decided it makes more sense to try porting it on its own and make it available; maybe somebody will find some good use for it. Unfortunately the task is not as trivial as it looks. The package needs two very simple functions for “atomic compare and swap”, which OS X provides as part of its base library, and so does Windows. On Linux, equivalent functions are provided by HP’s libatomic_ops (you probably have it around because of PulseAudio).

Unfortunately, libatomic_ops does not build, as it is, with clang/LLVM; there is a mistake in the code, or in the way it’s parsed; it’s not something unexpected, given that inline assembler is very compiler-dependent. In this case it’s a size problem: it uses a constraint for integer types (32-bit) but a temporary (and same-sized input) of type unsigned char (8-bit). The second stop is again libatomic_ops’s problem: while it provides an equivalent interface for atomic compare and swap on long types, it doesn’t do so for int types; that means it works fine on x86 (and other 32-bit architectures, where both types are 32-bit) but it won’t do for x86-64 and other 64-bit architectures. Guess what the libclosure code needs?

Now, of course, it would be possible to lift the atomic operations out of the xnu code, or just write them from scratch, as libatomic_ops already provides them all, just not correctly sized for x86-64; but the problem remains that you then have to add a number of functions for the various architectures rather than having a generic interface; xnu provides functions only for x86/x86-64 and PPC (since that’s what Apple uses/used).
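As a stopgap, the compiler’s own intrinsics can stand in for the missing int-sized interface; a minimal sketch, assuming a GCC- or clang-compatible compiler that provides the __sync builtins:

    #include <stdbool.h>

    /* int-sized compare-and-swap: atomically replace *addr with
     * new_val if it still holds old_val; returns true on success */
    static inline bool cas_int(volatile int *addr, int old_val, int new_val)
    {
            return __sync_bool_compare_and_swap(addr, old_val, new_val);
    }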

And where has this left me now? Well, nowhere far, mostly with a sour feeling about libatomic_ops’s inability to provide a common, decent interface (for those who wonder, they do provide char-sized inlines for compare and swap for most architectures, and even the int-sized alternatives that I was longing for… but only for IA-64. You wouldn’t believe that until you remembered that the whole library is maintained by HP).

If I could take the time off without risking trouble, I would most likely try to get better HFS+ support into Linux, if only to make it easier and less troublesome for OS X users to migrate to Linux at one point or another. The specs are almost all out there, and the code as well. Unfortunately I’m no expert in filesystems, and I lack the time to invest in the matter.

Interesting notes about filesystems and UTF-8

You probably know already that I’m a UTF-8 enthusiast; I use UTF-8 extensively in everything I do: mail, writing, IRC, and whatever else; not only because my name can only be spelled right when UTF-8 is used, but also because it really makes it nicer to write text that has proper arrows rather than semigraphical arrows, and proper ellipses as well as dashes.

On Linux, UTF-8 is not always easy to get right; there is quite a bit of software out there that does not play nice with UTF-8 and Unicode, including our GLSA handling software, and that can really be a bother to me. There are also problems when interfacing with filesystems like FAT that don’t support UTF-8 in any way.

Not so on Mac OS X, usually, because the system there was designed entirely to make use of Unicode and UTF-8, including the filesystem, HFS+. There is, though, one big problem with this: there are multiple ways to produce the same character in UTF-8, using either single precomposed codepoints, or combining diacritical marks, which are more complex but easier to compare in a case-insensitive way. Since HFS+ can be case-insensitive (and indeed it is by default, and has to be for the operating system volume), Apple decided to force the use of the latter format for UTF-8 text on HFS+: all file names are normalised (decomposed) before being used. This works fine for them, and the filenames are usually readable from Linux just as fine.

But there is a problem. Since I have lots of music in iTunes to be synced to my iPod, I usually keep my main music archive in OS X, and then rsync it over repeatedly to Linux so I can play it with my main system (or at least try to, since most of the audio players I found are sucky for what I need). In my music archive, I have many tracks from Hikaru Utada (宇多田ヒカル), which are named with their original titles (most of them come from the iTunes Store itself; others are ripped from my CDs); one EP I have is titled SAKURAドロップス. Now, in this title there are two characters that are decomposed into base and marker (ド and プ). While it might not be obvious why that happens, I’ll just rely on Michael Kaplan to explain it.

Now, the synced file maintains the normalised filename, which is fine. The problem is that something does not work right in zsh, gnome-terminal, or both. On Gentoo, with a local gnome-terminal, both when showing me the completion alternatives and when actually completing the filename, instead of ド I get ト<3099>; on Fedora via SSH, the completion alternatives are fine, while I still get the non-recomposed version on the command line after completion.
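To make the difference concrete, here is a small standalone illustration of mine (assuming a UTF-8 terminal): the precomposed and decomposed forms of ド render the same, but they do not compare equal byte-wise, which is exactly what trips up tools that don’t normalise:

    #include <stdio.h>
    #include <string.h>

    int main(void)
    {
            /* U+30C9 KATAKANA LETTER DO, precomposed */
            const char *composed = "\xe3\x83\x89";
            /* U+30C8 KATAKANA LETTER TO followed by U+3099,
             * the combining voiced sound mark */
            const char *decomposed = "\xe3\x83\x88\xe3\x82\x99";

            /* both render as ド, but the byte sequences differ */
            printf("same bytes? %s\n",
                   strcmp(composed, decomposed) == 0 ? "yes" : "no");
            return 0;
    }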

Update (2017-04-28): I feel very sad to have found out over a year and a half later that Michael died. The links in this and other posts to his blog are now linked to the archive kindly provided and set up by Jan Kučera. Thank you, Jan. And thank you, Michael.

Hacking the kernel

So, in my odyssey between filesystems, I had to do some serious kernel hacking to make sure I could get the HFS+ filesystem properly exported to my laptop. I had actually noticed iTunes failing often, to the point of annoyance, to copy the data, resulting in it automatically adapting to a different path for the iTunes collection, but I didn’t understand the problem.

Turns out Christoph Hellwig knows the reason pretty well: my patch was incomplete; the get_parent() method is really needed for NFS to work properly, and he also provided me with a quick test case to try it out. Sure enough, there’s a problem. Unfortunately my first try, factoring out lookup() and using that to implement get_parent() like ext2/3 and other filesystems do, failed.

After looking around the code, and Apple’s specs, I started to understand: common UNIX-like filesystems always have a hardlink named .. for the parent directory, but HFS+ does not have that; instead you have to look up the “thread” catalog entries to find out which is the parent of a file or directory (or, as the specifications call it, folder).
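The shape of the solution looks something like this (a rough sketch, not my actual patch: hfsplus_lookup_thread_record() is a hypothetical helper standing in for the real catalog B-tree search, while d_obtain_alias() and hfsplus_iget() are the existing kernel and hfsplus interfaces one would lean on):

    #include <linux/fs.h>
    #include <linux/exportfs.h>

    static struct dentry *hfsplus_get_parent(struct dentry *child)
    {
            struct inode *inode = child->d_inode;
            u32 parent_cnid;

            /* hypothetical helper: find the "thread" record for this
             * CNID in the catalog file and return the parent folder's
             * CNID */
            parent_cnid = hfsplus_lookup_thread_record(inode);

            /* turn the parent's CNID into an inode, then a dentry */
            return d_obtain_alias(hfsplus_iget(inode->i_sb, parent_cnid));
    }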

Now, after having worked on this, I feel like saying a couple of things: I really like Free Software because it makes it very easy to add support for something that didn’t exist before (like NFS export of HFS+), but I also like the way Apple writes complete specifications for so many things. Maybe too specific, but still nice to have some rather than none.

This has been somewhat refreshing, since I hadn’t seriously hacked at the kernel in years, since the time I dealt with porting LIRC to kernel 2.5, which happened, as you might guess, before 2.6 was released. Actually, during the period when I worked on the LIRC patchset, I also moved from Debian to Gentoo, which was also the main reason why I actually implemented devfs support for LIRC devices (hey, you guys remember devfs, don’t you?).

Now, hopefully if I can find a bit more free time for this, I should be able to submit a couple more changes to the kernel which I’ve been scheduling for a while, and maybe I can try my Ruby-Elf scripts on the kernel itself. It would be great if they could help the kernel as much as they have helped me with xine and other projects.

At any rate, the future looks nicer tonight.

Filesystems — take two

After the problem last week with XFS, today seems like a second take.

I woke up this morning to a reply about my HFS+ export patch, telling me that I have to implement the get_parent interface to make sure that NFS works even when the dentry cache is empty (which is most likely what caused some issues with iTunes while I was doing my conversion). Good enough; I started working on it.

And while I was actually working on it, I found that the tinderbox was not compiling. A dmesg later showed that, once again, XFS had in-memory corruption, and I had to restart the box again. Thankfully, I have my SysRescue USB stick, which allowed me to check the filesystem before restarting.

Now this brings me to a couple of problems I have to solve. The first is that I finally have to move /var/tmp to its own partition, so that /var does not get clobbered if/when the filesystem goes crazy; the second is that I have to consider alternatives to XFS for my filesystems. My home directory is already using ext3, but I don’t need performance there, so it does not matter much; my root partition is using JFS, since that’s what I tried when I reinstalled the system last year, although it didn’t turn out very well: the resize support actually ate my data away.

Since I don’t care if my data gets eaten away on /var/tmp (the worst that might happen is me losing a patch I’m working on, or not being able to fetch the config.log of a failed package – and that is something I’ve been thinking about already), I think I’ll try something more “hardcore” and see how it goes: I’ll use ext4 on /var/tmp, unless it panics my kernel, in which case I’m going to try JFS again.

Oh well, time to resume my tasks I guess!

Filesystems

It seems like my concerns were a little misdirected; instead of the disks dying, the first problem to appear was an XFS failure on /var, after about two and a half runs of tree building. I woke up in the middle of the night with the scariest thought that something was not fine with Yamato, and indeed I found it not working any more. Bad bad bad.

I’m now considering the idea of getting a box to just handle all the storage, running something that has been a bit better tested lately: Sun’s ZFS. While Ted Ts’o’s concerns are pretty scary indeed, it seems like ZFS is the one filesystem I could use to squeeze all the possible performance and quality out of the disks, for network serving. And as far as I remember, Sun’s Solaris operating system comes with iSCSI target software out of the box, which would really work out well for my MacBook’s needs too.

Now the problem is: does Enterprise still work? The motherboard is what I’m not sure about, but I guess I can just try it and then replace it if needed; I certainly need to replace the power supply, since it’s now mounting a 250W unit, and I also need to replace the chassis, since the one I have now mounts a Plexiglass side, which makes it too noisy to stay turned on all the time.

I’m considering setting it up with four 500GB drives, which would cost me around 600 euro, case and power supply included; having eight, using the Promise SATA PCI card I already have, would bring me to 1K euro, and 4TB of space, but I don’t think it’s worth that yet. Both the Promise card and the onboard controller are SATA/150, but that shouldn’t be too much of a problem, with the Gigabit Ethernet more than likely being the bottleneck. Unfortunately this plan will not be enacted until I get enough jobs to finish paying for Yamato, and save the money for it.

Now, while I have to make do with what I have, there is one problem. I have my video and music collection on the external Iomega drive, “hardware” RAID1, 500GB of actual space divided roughly into 200GB for music/video and 300GB for OS X’s Time Machine; the partition table is GUID (EFI) and the partitions are HFS+, so that if Yamato is ever turned off, I can access the data directly from the laptop through FireWire. This is all fine and dandy, were it not that I cannot move my iTunes folder there, because I cannot export the filesystem through NFS.

Linux needs kernel support for exporting filesystems through NFS, and the HFS+ driver in current Linux does not support this feature — yet. Because the nice thing about Linux and Free Software is that you can make them do whatever you wish, as long as you have the skills to do it. And I hope I have enough skill to get this to work. I’m currently setting up a Fedora 10 install on vbox so that I can test my changes without risking a panic in my running kernel.
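For the curious, NFS exporting boils down to implementing the VFS export_operations for the filesystem and hooking them up at mount time; a rough sketch of the wiring (the callback names here are illustrative, not the actual patch):

    #include <linux/fs.h>
    #include <linux/exportfs.h>

    /* the callbacks are what the filesystem has to implement;
     * these names are illustrative */
    static const struct export_operations hfsplus_export_ops = {
            .fh_to_dentry = hfsplus_fh_to_dentry,
            .fh_to_parent = hfsplus_fh_to_parent,
            .get_parent   = hfsplus_get_parent,
    };

    /* hooked up while filling the superblock at mount time: */
    /*   sb->s_export_op = &hfsplus_export_ops;              */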

Once that’s working I’ll focus again on the tinderboxing, even though I cannot deal with the disk problem just yet. I have a few things to write about in that regard, especially about the problem of bundled libraries.

Locales, NLS, kernels and filesystems

One issue that is yet to be solved easily by most distributions (at least those not featuring extensive graphical configuration utilities, like Fedora and Ubuntu do) is, most likely, localisation.

There is an interesting blog post by Wouter Verhelst on Planet Debian that talks about setting the locale variables. It’s a very interesting read, as it clarifies pretty well what the different variables mean.

One related issue seems to be understanding the meaning of the NLS settings that are available in the kernel configuration for Linux. Some users seem to think that you have to enable the codepages in there to be able to use a certain locale as system locale.

This is not the case; the NLS settings there are basically only used by filesystems, and in particular only the VFAT and NTFS filesystems. The reason for this lies in the fact that both filesystems are case-insensitive.

In the usual Unix filesystems, like UFS, EXT2/3/4, XFS, JFS, ReiserFS and so on, file names are case-sensitive, and they end up being just strings of arbitrary characters. On VFAT and NTFS, instead, the filenames are case-insensitive.

For case insensitivity, you need equivalence tables, and those are defined by the different NLS values. For instance, in Western locales, the characters ‘i’ and ‘I’ are equivalent, but in Turkish they are not, as ‘i’ pairs with ‘İ’ and ‘I’ with ‘ı’ (if you wish to get more information about this, I’d refer you to Michael S. Kaplan’s blog on the subject).
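A quick way to see this from userspace, assuming the tr_TR.UTF-8 locale is installed on the system:

    #include <locale.h>
    #include <wchar.h>
    #include <wctype.h>

    int main(void)
    {
            if (setlocale(LC_ALL, "tr_TR.UTF-8") == NULL)
                    return 1;

            /* under a Turkish locale, 'I' lowercases to dotless 'ı'
             * (U+0131), not to 'i' as in Western locales */
            wprintf(L"%lc\n", (wint_t)towlower(L'I'));
            return 0;
    }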

So when you need to support VFAT or NTFS, you need to pick the right NLS table, or your filesystem will end up corrupted (with the Turkish charset, you can have two files called “FAIL” and “fail”, as the two letters are just not the same). This is the reason why you find the NLS settings in the filesystems section.

Of course, one could say that HFS+, used by Mac OS, is also case-insensitive, so NLS settings should apply to it too, no? Well, no. I admit I don’t know much about historical HFS+ filesystems, as I only started using Mac OS at version 10.3, but at least since then, the filenames are saved encoded in UTF-8, which has very well defined equivalence tables. So there is no need for option selection: the equivalence table is defined as part of the filesystem itself.

Knowing this, why does VFAT not work properly with UTF-8, as stated by the kernel when you mount it with iocharset=utf8? The problem is that VFAT works on a per-character equivalence basis, and UTF-8 is a variable-size encoding, which does not suit VFAT well.

Unfortunately, make oldconfig and make xconfig seem to replace, at least on my system, the default NLS charset with UTF-8 every time, maybe because UTF-8 is the system encoding I’m using. I guess I should look to see whether it’s worth reporting a bug about this, or whether I can fix it myself.
