This Time Self-Hosted
dark mode light mode Search

Archiving

One of the requests at the past VDD when I shown some VLC developers my Ruby-Elf toolsuite was for it to access archive files directly. This has been in my wishlist as well for a while, so I decided to start working on it. To be precise, I started writing the parser (and actually wrote almost all of it!) on the Eurostar that was bringing me to London from Paris.

Now you have to understand that while the Wikipedia page is a quite good source of documentation for the format itself, it’s not exactly complete (it doesn’t really have to be). But it’s mostly to the point: there are currently two main variants for ar files: the GNU and the BSD variants. And the only real difference between the two is in the way long filenames are handled. And with long filenames I mean filenames that are long at least 16 characters.

The GNU format handles this with a single index file that provides the names for all the files, and provide you with an offset instead of the proper name in the header data, whereas the BSD format provides you with a length, and prepend the filename to the actual file data. Which of the two options should be the best is well up for debate.

I already knew of the difference, so I did code in support for both variants, but of course while on the train I only had access to the GNU version of ar which is present in Binutils, so I only wrote a test for that. Now that I’m back at the office (temporarily in Los Angeles, as it seems like I’ll be moving to London soon enough), I have a Mac to my side and I decided to prepare the files for testing with its ar(1) which is supposedly BSD.

I say supposedly because something strange happened! The long filename code is hit by two of my testcases: one is using an actual object file which happens to have a name longer than 16 characters, the other is an explicit long (very long) filename — 86 characters! But what happens is that Apple’s version of ar writes the filename as 88 characters, padding it with two final null bytes. At first I thought I got something wrong in the format, but if I use bsdtar on Linux, which provide among other formats support for the bsd ar format, it writes down properly 86 bytes without any kind of null termination.

More interestingly, the other archive, where the filename is just 20 characters long, is written the exact same way by both libarchive and Apple’s ar!

Comments 3
  1. Why do you have to reverse engineer the BSD format without looking at the code? Just look at the code in llvm (or system ar if it uses that) and in FreeBSD.

  2. I’m not reverse engineering it. I’m reimplementing it based on the documentation that is there. Which is exactly what one should do if he doesn’t want to be tied to the license of other projects for something that is generally trivial.

  3. Seems to me like the Apple version just makes the length a multiple of 4, at least that is consistent with the 20 vs. 86 character behaviour difference.

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.