Encoding iPod-compatible audiobooks with Free Software

Since in the last few days I’ve been able to rest, thanks in part to the new earphones, I’ve finally been able to think about multimedia again, as well as Gentoo. But to preserve my sanity, and to make sure I do something I can reuse to rest even better, I decided to look into something new, something I’d like to solve if I can: generating iPod-compatible audiobook files from the BBC Radio CDs I got.

The audiobooks you buy from the iTunes Store are usually downloaded as multiple files, one per CD of the original release, sometimes with chapter markings so you can skip around. Unfortunately they are also DRM’d, so analysing them is quite a mess, and I didn’t go to great lengths to identify how that is achieved. The reason I’d like to find, or document, the audiobook format is a two-fold interoperability idea. The first part is being able to play iPod-compatible audiobooks with Free Software, with the same chapter marking system working; the other (more pressing for me, to be honest) is being able to rip a CD and create a file with chapter markings that works properly on the iPod. As it is, my Audiobooks section on the iPod is messed up because, for instance, each piece of The Hitchhiker’s Guide To The Galaxy, which is a separate track on CD, ends up as a separate file, and thus a separate entry in the Audiobooks list. To deal with that I had to create playlists for the various phases, and play them from there. Slightly suboptimal, although it works.

Now, the idea would be to rip a CD (or part of a CD) into a single M4B file, audiobook-style, and add chapter markings with the tracks’ names to make the thing playable and browsable properly. Doing so with just Free Software is the goal. Being able to have a single file span multiple CDs would also help. The reason I’m willing to spend time on this rather than just using the playlists is that the iPod’s battery seems to drain much faster when using multiple files, probably because it has to seek around to find them, while a single file can be loaded incrementally without spending too much time.

In this post I really don’t have much in terms of implementation ideas; I know the first thing I have to do is find an EAC-style ripper for Linux, based on either standard cdparanoia or libcdio’s version. For those who didn’t understand that last sentence: if I recall correctly, EAC can also produce a single lossless audio file plus a CUE file where the track names are timecoded, instead of splitting the CD into multiple files, one per track. Starting from such a file would be optimal, since we’d just need to encode it to AAC to have the audio track of the audiobook file.
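For reference, such a rip boils down to one big lossless audio file plus a CUE sheet along these lines (file and track names are of course made up):

    FILE "hitchhikers-cd1.wav" WAVE
      TRACK 01 AUDIO
        TITLE "Fit the First"
        INDEX 01 00:00:00
      TRACK 02 AUDIO
        TITLE "Fit the Second"
        INDEX 01 28:42:15

The track titles and timecodes in there are exactly the information that would need to be turned into chapter markings.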

What I need to find is how the chapter information is encoded in the final file. This wouldn’t be too difficult, since the MP4 format has quite a few implementations and I already have worked on it before. The problem is that, being DRM’d, analysing the Audiobooks themselves is not the best idea. Luckily, I remembered that there is one BBC podcast that provides an MP4 file with chapter markings: Best of Chris Moyles Enhanced which most likely use the same feature. Unfortunately, the mp4dump utility provided by mpeg4ip fails to dump that file, which means that either the file is corrupt (and how does iTunes play that?) or the utility is not perfect (much more likely).
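Just to give an idea of what “inspecting” means here: the format is a sequence of size-prefixed boxes (atoms), and a trivial walker over the top-level ones can be sketched in a few lines of C. This is only a hypothetical sketch, not a tool I’m shipping; finding the chapter data would mean descending into the right boxes once I know which ones they are.

    /* Minimal sketch: walk the top-level boxes (atoms) of an MP4/QuickTime
     * file and print their type and size. A real tool would descend into
     * containers like 'moov' to hunt for the chapter information. */
    #include <stdio.h>
    #include <stdint.h>
    #include <string.h>
    #include <inttypes.h>

    int main(int argc, char *argv[])
    {
        if (argc != 2) {
            fprintf(stderr, "usage: %s file.m4b\n", argv[0]);
            return 1;
        }

        FILE *f = fopen(argv[1], "rb");
        if (!f) { perror("fopen"); return 1; }

        unsigned char hdr[8];
        while (fread(hdr, 1, 8, f) == 8) {
            /* 32-bit big-endian size, then a four-character box type */
            uint64_t size = ((uint64_t)hdr[0] << 24) | (hdr[1] << 16) |
                            (hdr[2] << 8) | hdr[3];
            char type[5];
            memcpy(type, hdr + 4, 4);
            type[4] = '\0';

            uint64_t payload;
            if (size == 1) {
                /* size == 1 means a 64-bit extended size follows the type */
                unsigned char ext[8];
                if (fread(ext, 1, 8, f) != 8)
                    break;
                size = 0;
                for (int i = 0; i < 8; i++)
                    size = (size << 8) | ext[i];
                if (size < 16)
                    break;
                payload = size - 16;
            } else if (size == 0) {
                /* size == 0 means the box extends to the end of the file */
                printf("%s: extends to end of file\n", type);
                break;
            } else if (size < 8) {
                fprintf(stderr, "corrupt box header\n");
                break;
            } else {
                payload = size - 8;
            }

            printf("%-4s  %" PRIu64 " bytes\n", type, size);

            /* skip the payload (fseek is enough for a sketch; a real tool
             * would use fseeko to cope with files larger than 2GB) */
            if (fseek(f, (long)payload, SEEK_CUR) != 0)
                break;
        }

        fclose(f);
        return 0;
    }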

So this brings me back to something I was thinking about before: the fact that we have no GPL-compatible MP4-specific library to handle parsing and writing of MP4 files. The reason for this is most likely that the standards don’t come cheap, and that most Free Software activists in the multimedia area tend to think that Xiph is always the answer (I disagree), while the pragmatic side of the multimedia area would just use Matroska (which I admit is probably my second-best choice, if only it were supported by actual devices). And again, please don’t tell me about SanDisk players and other flash-based stuff. I don’t want flash-based stuff! I have more than 50GB of stuff on my iPod!

Back to our discussion: I’m going to need to find or write some tool to inspect MP4 files. I don’t want to fix mpeg4ip, because of the MPL license it’s released under, and also because I think the whole thing is quite overengineered. Unfortunately this does not really help me much, since I don’t have the full specs of the format handy, and I’ll have to do a lot of guessing to get it to work. On the other hand, this should be quite an interesting project, for whenever I have time. If you have pointers or are interested in this idea, feel free to chime in.

My take on iTunes, iPod, Apple TV and the like

I’ve been asked a few times why I ever use an Apple TV to watch stuff on my TV, and why I’m using an iPod and buying songs from the iTunes Store. Maybe I should try to write down my opinion on the matter, which is actually quite pragmatic, I think.

I like stuff that works. Even though the Apple TV requires some fiddling from me, once the videos are in, it works. And I can be assured that if I get into bed, I can watch something, or listen to something, without further issues. Of course, it has to get the stuff in right first.

The iPod lasts almost a week without charging, and I listen to it almost every night. It plays my music in formats that I can deal with just fine on Linux: the very common AAC and the ALAC format (Apple Lossless Audio Codec). FFmpeg plays ALAC; xine and mpd use FFmpeg. And it uses a container that I don’t dislike. Sure, it could really use some more software to deal with it on Linux, like an easy way to get the album art out of it (mpd does not seem to get that), and some better tagging too; I guess I could just buy the PDFs of the standard and try to implement some library to deal with it (or extend libavformat to do that).

I have most of my music collection ripped from the original CDs I have here. I used to keep it in FLAC (even though I find its container a bit flaky), then I moved to WavPack, which had a series of advantages but still used a custom container format. A few months ago I moved everything to ALAC instead, so I have a single copy of everything, in a container format that is a standard (even if a bit of a hard one).

As far as the iTunes (Music) Store is concerned, I’m really happy that Apple is improving it and removing the DRM, even if it means that some songs will cost more than they do now. Sure, you cannot use it from Linux because it only works with iTunes, but the music in the “Plus” format, without DRM, works just fine under Linux, which is basically the only thing I care about. I’d sincerely be glad to buy TV series there if they were without DRM and in the usual compatible format; unfortunately this does not seem to be the case, yet. I bite the bullet with audiobooks, mostly because they are affordably priced even though they are locked in. This is mostly a pragmatic choice.

Sure, I’d love it if it had a web-based interface that didn’t require me to use iTunes to buy the songs, but it works well enough for me as it is, since the one alternative everybody suggested when I was looking for one was Amazon’s MP3 Store, and that does not work where I live (Italy), while the iTunes Store does. What I totally disagree with are the people who scream privacy breach because of the watermarking of the music files bought from the iTunes Store. Sure, there is my name and my ID in the file I downloaded, but why should I care? The file is only supposed to be used on my systems, isn’t it? I can play it on any device I own, as long as it understands the format, and I can re-encode it to a different format for devices that don’t. It’s not supposed to be published, I’m sure, and the only case where having that data in there is a problem is usually music piracy. Which, by the way, is not much hindered, since it’s not too difficult to just strip the data out. DRM: bad. Watermarking: not so much.

On the other hand, I really cannot get on the Xiph train with Ogg, Theora and Vorbis. Sure, they are open formats and all that, but the fact that they don’t really work on higher-end devices makes them vendor lock-ins just as bad as DRM, in my opinion. Since even the patent-freeness of those formats is not entirely clear yet (besides the fact that nobody has challenged it so far), I don’t see the point in having my music stored in a format that my devices can’t play just for the sake of it. But, I guess, I live in a part of the world where this is still sane enough to deal with.

All in all, I’d be very glad if Apple extended the coverage of Japanese music even further, since paying customs on imports is pretty bad and otherwise I cannot find most of the artists I’m interested in here in Italy.

And before I’m misunderstood: I’m not trying to just advertise for Apple. I’m saying that, pragmatically, I don’t count them out just because they sell proprietary software; besides, as you can probably tell from other posts on my blog, I tend to use or learn from their open source pieces too. I just grow tired of people saying that one should stay away from the iTunes Store because of DRM (which is going away) or watermarking (which is a good thing in my opinion).

A story about free software and free formats.

I already said I’m quite pragmatic when it comes to multimedia formats: I’m a happy FFmpeg user, and I tend to be able to watch anything FFmpeg decodes, without limiting myself to non-encumbered formats.

But, it’s true that not everybody shares this view, and it’s always a good idea to provide at least some alternative when you’re developing Free Software. Especially for those users less fortunate, living in places where software patents are a problem.

Strangely enough, there is one very widely used piece of software that does not seem to provide a free format alternative (Theora) to its users: GIMP. It supports MPEG-1 and MPEG-2 video encoding, but not Ogg/Theora.

I was contacted by Ivo Emanuel Gonçalves, from Xiph, asking if I could take a look into it. Unfortunately my knowledge of GTK+ is near zero, and I don’t even use GIMP. But I’m sure there are at least a few GIMP users reading my blog, and hopefully some of them might be able to look into it.

The GIMP developers don’t seem interested in adding Theora support proactively, but they said they’d accept patches if they were sent their way. So if anybody reading this blog is interested, please fire me an email and I’ll forward it to Ivo (unless he wants to make his email address public here directly ;) ).

Blog tam-tam is also welcome, so that we can more easily reach people interested in this, so… spread the word!

Introducing cowstats

No, it’s not a script to gather statistics about Larry; it’s a tool to get statistics about copy-on-write pages.

I’ve been writing about memory usage, RSS memory and other stuff like that on my blog for quite a while, so if you want more in-depth information about it, please just look around. If I started linking here all the posts I’ve made on the topic (okay, the last one is not a blog post ;) ) I would probably spend the best part of the night digging them up (I only linked the most recent ones on the topic here).

Trying to summarise for those who haven’t been reading my blog all this time: let’s start by saying that a lot of software, even free software, nowadays wastes memory. When I say waste, I mean it uses memory without a good reason to. I’m not saying that software which uses lots of memory to cache or precalculate stuff, and thus be faster, is wasting memory; that’s just using memory. I’m not even referring to memory leaks, which are usually just bugs in the code. I’m saying that a lot of software wastes memory when it could save memory without losing performance.

The memory I call wasted is memory that could be shared between processes but isn’t. That’s a waste because you end up using twice (or more) the memory for the same goal, which is way sub-optimal. Ben Maurer (a GNOME contributor) wrote a nice script (which is in my overlay if you want it; I should finish fixing a couple of things in the ebuild and commit it to the main tree already, the deps are already there) that tells you, for a given process, how much memory is not shared between processes, the so-called “dirty RSS” (RSS stands for Resident Set Size; it’s the resident memory, that is, the memory the process is actually using from your RAM).

Dirty RSS is caused by “copy-on-write” pages. What is a page, and what is a copy-on-write page? Well, memory pages are the unit used to allocate memory to processes (and to threads, and kernel subsystems, but let’s not go too deep there); when a process is given a page, it usually also gets some permissions on it: it might be readable, writable or executable. Trying not to go too deep into this either (I could easily write a book on it; maybe I should, actually), the important thing is that read-only pages can easily be shared between processes, and can be mapped directly from a file on disk. This means that two processes can both use the same 4KB read-only page, using just 4KB of memory, while if the same content were in a writable page, the two processes would each have their own copy of it, and would require 8KB of memory. Maybe more importantly, if the page is mapped directly from a file on disk, when the kernel needs to make space for newly allocated memory, it can just drop the page and later re-load it from the original file, rather than writing it out to the swap file and then loading it back from there.

To make it easier to load data from files on disk, and to reduce memory usage, modern operating systems use copy-on-write. The pages are shared as long as they are not changed from the original; when a process tries to change the content of a page, it is copied into a new writable page, and the process gets exclusive access to it, “eating” the memory. This is the reason why using PIC shared objects usually saves memory, but that’s another story entirely.

So we should reduce the number of copy-on-write pages, favouring read-only shareable pages instead. Great, but how? Well, the common way to do so is to make sure that you mark (in C) all the constants as constant, rather than defining them as variables, even if you never change their value. Even better, mark them static and constant.
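To give a trivial, hypothetical example of the difference (the names are made up):

    /* Goes into .data: the page holding it is writable and copy-on-write,
     * so each process that touches it ends up with a private copy. */
    static int quant_table[4] = { 1, 2, 4, 8 };

    /* Goes into .rodata: read-only, mapped straight from the file on disk
     * and shared by every process using the library. */
    static const int quant_table_ro[4] = { 1, 2, 4, 8 };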

But it’s not so easy to go over the whole codebase of long-developed software to mark everything constant, so there’s a need to analyse the software after the fact and identify what should be worked on. Up to now I’ve used objdump (from binutils) for this; it’s a nice tool to get raw information out of ELF files. It’s not easy to use, but I’ve grown used to it, so I can grok its output quite easily.

Focusing on ELF files, which are the executable and library files in Linux, FreeBSD and Solaris (plus other Unixes), the copy-on-write pages are those belonging, mostly, to these sections: .data, .data.rel and .bss (actually, there are more sections, like .data.local and .data.rel.ro, but let’s just consider those prefixes for now).

The .data section keeps the non-stack variables (which means anything declared as static but non-constant in C source) that were initialised in the source. This is probably the cause of most of the wasted memory: you define a static array in C, you don’t mark it properly constant (see this for string arrays), but you never touch it after definition.
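The string array case deserves its own made-up example, since a single const is not enough there:

    /* The strings are constant, but the array of pointers is not, so the
     * array itself still lands in a writable section (.data, or .data.rel
     * once relocations enter the picture). */
    static const char *codec_names[] = { "vorbis", "theora", "flac" };

    /* Both the pointers and the strings are constant; with PIC code the
     * pointers still need relocating, so this moves to .data.rel.ro, which
     * can at least go back to read-only once relocations are applied. */
    static const char *const codec_names2[] = { "vorbis", "theora", "flac" };

    /* No pointers at all: the whole table is plain read-only data in
     * .rodata, at the cost of padding every entry to the same length. */
    static const char codec_names3[][8] = { "vorbis", "theora", "flac" };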

The .data.rel section keeps the non-stack variables that need to be relocated at runtime. For instance it might be a static structure containing a string, or a pointer to another structure or an array. Often you can’t get rid of relocations, but they have a cost in terms of CPU time, and also a cost in memory usage, as the relocation is sure to trigger the copy-on-write… unless you use prelink, but as you’ll read at that link, it’s not always a complete solution. You can usually live with these, but if you can get rid of some instances here, it’s a good thing.

The .bss section keeps the uninitialised non-stack variables; for instance, if you declare and define a static array but don’t fill it right away, it will be added to the .bss section. That section is mapped onto the zero page (a page entirely initialised to zero, as the name suggests) with copy-on-write: as soon as you write to the variable, a new page is allocated, and thus memory is used. Usually, runtime-initialised tables fall into this section. It’s often possible to replace them (maybe optionally) with precalculated tables, saving memory at runtime.
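A made-up sketch of that last case, a runtime-initialised table against its precalculated equivalent:

    /* Sits in .bss: it costs nothing on disk, but the first call to
     * init_square_table() dirties the page in every process. */
    static unsigned char square_table[16];

    static void init_square_table(void)
    {
        for (unsigned i = 0; i < 16; i++)
            square_table[i] = (unsigned char)(i * i);
    }

    /* The precalculated alternative: same data, but constant, so it lives
     * in .rodata and is shared between processes for free. */
    static const unsigned char square_table_ro[16] = {
        0, 1, 4, 9, 16, 25, 36, 49, 64, 81, 100, 121, 144, 169, 196, 225
    };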

My cowstats script analyses a series of object files (tomorrow I’ll work on an ar parser so that it can be run on static libraries; unfortunately it’s not possible to run it on executables or shared libraries, as they tend to hide the static symbols, which are the main cause of wasted memory), looks for the symbols present in those sections, and lists them for you, or alternatively shows you some statistics (a simple table that tells you how many bytes are used in the three sections for each object file it was called with). This way you can easily see which variables are causing copy-on-write pages to be requested, so that you can try to change them (or the code) to avoid wasting memory.

I wrote this script because Mike asked me if I had an automated way to identify which variables to work on, after a long series of patches (many of which I have to fix and re-submit) for FFmpeg to reduce its memory usage. It’s now available at https://www.flameeyes.eu/p/ruby-elf as it’s simply a Ruby script using the ELF parser for Ruby that I started last May. It’s nice to see that something I did some time ago for a completely different reason now comes in useful again ;)

I mailed the results for my current partly-patched libavcodec; they are quite scary: over 1MB of copy-on-write pages. I’ll keep working so that the numbers get closer to zero. Tomorrow I’ll also try to run the script on xine-lib’s objects, as well as xine-ui. It should be interesting.

Just as a test, I also tried running the script over libvorbis.a (extracting the files manually, as for now I have no way to access those archives through Ruby), and here are the results:

cowstats.rb: lookup.o: no .symtab section found
File name  | .data size | .bss size  | .data.rel.* size
psy.o             22848            0            0
window.o          32640            0            0
floor1.o              0            8            0
analysis.o            4            0            0
registry.o           48            0            0
Totals:
    55540 bytes of writable variables.
    8 bytes of non-initialised variables.
    0 bytes of variables needing runtime relocation.
  Total 55548 bytes of variables in copy-on-write sections

(The warning tells me that the lookup.o file has no symbols defined at all; the reason for this is that the file is under one big #ifdef. The binutils tools might be improved to avoid packing such files at all, as they can’t be used for anything, bearing no symbols… although they might still carry .init sections; I admit my ignorance here).

Now, considering the focus of libvorbis (only Vorbis decoding), it’s scary to see that there are almost 55KB of memory in writable pages; especially since, looking into it, I found that they are due to a few tables which are never modified but are not marked as constant.

The encoding library libvorbisenc is even worse:

File name   | .data size | .bss size  | .data.rel.* size
vorbisenc.o      1720896            0            0
Totals:
    1720896 bytes of writable variables.
    0 bytes of non-initialised variables.
    0 bytes of variables needing runtime relocation.
  Total 1720896 bytes of variables in copy-on-write sections

Yes, that’s about 1.7MB of writable pages brought in by libvorbisenc for every process that uses it. And I’m unfortunately going to tell you that any xine frontend (Amarok included) might load libvorbisenc, as libavcodec has a Vorbis encoder which uses libvorbisenc. Not nice at all!

Tomorrow I’ll prepare a patch for libvorbis (at least) and see whether Xiph won’t ignore me this time. Once the script is able to act on static libraries, I might just run it on all the ones I have on my system and identify the ones that really need to be worked on. This of course must not get in the way of my current jobs (I’m considering this in-depth look at memory usage part of my job, as I’m probably going to need it in a course I have to teach next month), as I really need money, especially to get a newer box before the end of the year; Enterprise is getting slow.

Mike, I hope you’re reading this blog; I tried to explain what I’ve been doing in the best way possible :)

My pragmatic view on multimedia container formats

This post is brought to you by a conversation in #amarok between me, jefferai and eean. I titled it the way I did because I don’t intend to write about specific audio or video encoding formats; they are waaay out of my league, especially the video ones, and I only know a bit of the theory behind audio compression, lossy and lossless, but nothing good enough even to compare different codecs beyond a few superficial functional details.

I also call it pragmatic because what I’m going to write is not geared toward ethics, nor can it be considered a proper critique, as I don’t know all the possible alternative choices that could have been made, nor do I know the intimate details of all the formats I’ll name. What I learnt about these formats I learnt through my work on xine and little more, so it’s not really technical either.

First of all, I’m gonna say it right now: I’m biased toward the QuickTime/MP4/ISO Media format; this is due to a few different factors I’m going to explain in this post, but the main one is that this is the only format that, since I started working on xine, has given me one single bug (which was also easy to fix, but let’s leave that alone for now).

To quote from MultimediaWiki, the QuickTime format and its derived formats are the ones specified in the tiniest detail; unfortunately they are also often specified in conflicting ways. This is certainly a problem, but the need for translation into more human language also allows for different interpretations that are more useful than a single specification that phrases important information badly (non-multimedia examples of these two cases are the ELF format and the symbol versioning description on Ulrich Drepper’s page, or the fantastic OpenDocument format that OpenOffice and KOffice still handle in incompatible ways).

The most common video container format is certainly AVI (Audio Video Interleave), introduced by Microsoft; this format was a hack to begin with, and the way it works is probably more sheer luck than design. It’s not unlikely that a player has to cope with broken AVI files to be useful to users. An AVI file does not cope well with multiple streams (it can somehow handle two audio streams in a single file, but very few players support that), and has no way to handle soft subtitles nicely. Besides, it has little or no metadata.

The common free alternative is of course Xiph’s Ogg; unfortunately, to begin with, it doesn’t support all possible stream types: you can’t use, for instance, MP3 or XviD streams. To fill this hole there was a nasty hack called Ogg Media that uses partly incompatible extensions to allow more stream types, which adds one more check to do between these two formats; in addition, every stream type in Ogg files requires a different header, and thus a different parser. This, for instance, causes xine to require the presence of libtheora to parse the Theora headers and extract the raw stream from the Ogg container.

A more viable alternative is the Matroska container (MKV); it supports basically any format out there, it allows adding multiple streams and subtitles, even in really fancy formats, so it’s quite nice. Unfortunately its implementation is far from trivial; it is based on EBML, which is to binary files what XML is to text: extensible, yes, but complex too. There are two reference libraries, libebml and libmatroska, that allow easy access to Matroska files, but they are C++-based, require a fairly complete implementation of that language to work correctly, and are not welcome in many multimedia-related projects. Both FFmpeg’s libavformat and xine’s implementations are quite broken; MPlayer’s has also improved lately, but has its own troubles, as far as I know; VLC’s is certainly the best option here.

Luca has told me a few times to look at NUT, but that’s not exactly a common format.

Even if the QuickTime format has a few idiotic flaws, like the QT components themselves not being able to cope with the spec’s own extensions, such as version 1 of the mdhd atom (which also affects Front Row and iTunes on OS X, limiting the maximum length of a video stream to a few minutes when high-precision fractional timebases are used, as FFmpeg does), I find these less messy than the problems described above; add to that the almost universal availability of players for this format, and you might understand why I like it better than the alternatives.

There are also other things, like the seek tables allowing nearly perfect seeking, and the fact that we have more than one Free implementation of muxers and demuxers: FFmpeg being one; then there is the mpeg4ip package under the MPL, as free as that is; and finally GPAC.

Unfortunately the metadata support for this format is far from trivial, contrary to Vorbis comments, but it’s still easier than ID3v2, although for the latter we already have enough implementations as free software (not always good, complete or compatible with each other). For MPEG-4 files there is libmp4v2, but it’s MPL-licensed and thus GPL-incompatible, which is why distributions don’t enable read/write support for it in Amarok.

To cut this entry short (I’m once again writing from the E61 while watching I, Robot for the second time), I just want to say that, from a purely practical point of view, Apple’s format is the logical choice for sharing audio/video files between platforms.

And I think Will Smith is a great actor.

Xiph, I beg you, please do better releases

Yesterday I tried to update cdparanoia in Portage to the newest released version, 3.10_pre0. Unfortunately a) cdparanoia’s versioning scheme is crazy: the tarball is named cdparanoia-III-10_pre0.src.tgz, which is, well, pretty much outside any ordering; and b) the build system still sucks.

Now, I understand that Xiph has done a lot of good for Free Software, with their free formats and their libraries and miscellaneous software, but they really need to take a course on making releases.

libtheora has a horrible history of totally broken releases that should have been replaced “in a couple of days” and were instead left around for months; cdparanoia went for more than a year or two without releases, and now its latest release sounds like a joke to me, and a bad one too.

The configure script still uses non-standard names for the gnuconfig files (configure.guess and configure.sub rather than config.guess and config.sub), and they weren’t updated, so they don’t recognise the AMD64 platform out of the box. The makefiles are the usual mess that has to be fixed for parallel make.

I’ve given up on updating cdparanoia; it’s too much work for such a package. I’ll probably look for a simple way to make rubyripper use libcdio’s cdparanoia-compatible command, and maybe I’ll finally try to make KAudioCreator link against that rather than the original version; this way I might end up without cdparanoia on my system at all.

On a faintly related note, today I also tried to nail down a xine-lib bug with a Vorbis file encoded with an ancient version of libvorbis; unfortunately I haven’t been able to find the cause yet. I mostly stopped after two hours of trying because gdb was giving me a hell of a hard time, and it ended up freezing when I tried to put a watchpoint on a raw memory address.

And I think there are still troubles with FLAC 1.1.3 and xine-lib 1.1.3 (which Miguel released yesterday), but I haven’t been able to find a solution for them yet; luckily this is no regression, as FLAC files still play fine (using FFmpeg).

I need longer days (and pleasant nights — dang, I quote the Dark Tower unconsciously).