On audio formats

On identi.ca there has been a short discussion about audio container formats; you might remember that Ogg is an old pet peeve of mine, and thus I can really understand people not implementing it as it is.

What I said on the microblogging platform is that, in my opinion, we should be pushing for the so-called MKA format, which is the usual Matroska format, just with a single audio track rather than both audio and video tracks. I think this is one thing that makes quite a lot of sense, especially if the track you host in it is the usual Vorbis audio. It’s not just a matter of what Ogg does wrong, but also the fact that Matroska/Vorbis is a combination sponsored by no less than Google itself, with WebM.

Okay, WebM is actually a subset of Matroska, so there is no absolute certainty that a WebM decoder will play MKA; but given that most software is likely to use one of the already-present Matroska demuxers, which handle both WebM and MKA, there is really little work to be done to support this. I have tested this before: mplayer, xine and VLC all play back MKA just fine.
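
To make this concrete: remuxing an existing Vorbis file into MKA takes seconds, since no re-encoding is involved. A minimal sketch, wrapped in Python for illustration; the file names are made up, and it assumes an ffmpeg binary in the PATH:

    #!/usr/bin/env python
    # Sketch: remux Ogg/Vorbis into audio-only Matroska (MKA) without
    # re-encoding, by shelling out to ffmpeg.
    import subprocess
    import sys

    def remux_to_mka(source, target):
        subprocess.check_call([
            "ffmpeg",
            "-i", source,       # e.g. track.ogg (hypothetical name)
            "-vn",              # make sure no video track sneaks in
            "-acodec", "copy",  # copy the Vorbis bitstream as-is
            target,             # the .mka extension selects Matroska muxing
        ])

    if __name__ == "__main__":
        remux_to_mka(sys.argv[1], sys.argv[2])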

It’s not even a random chance, or a hack that might not work in the long term, or something entirely new that hasn’t been tried before. Ogg itself is able to carry either a single audio track or multiple audio and video streams; heck, even the M4A format derived from Apple’s MOV format has the same design. This means that most of the software out there is already designed to work this way, and does not expect a given container to always contain either only audio, or audio and video.

Now, speaking about the Ogg shortcomings – on which I have ranted before – I would like to point out that a lot of self-appointed Free Software advocates are still suggesting that it’s perfectly fine as a modern format because it solves a number of issues that mp3 has; if you’re one of those, I beg you to check your calendar: we’re in 2010. So yes, even though Ogg does solve some of the mp3 problems (while causing a few more), it doesn’t really compare with the modern formats in actual use, such as M4A/MP4, used by Apple, Sony, Android, …

Some of the limitations are actually inherited directly from mp3 itself, like the design that allows a single Ogg file to be created simply by concatenating a number of other files, or the fact that you need to know the content of the stream itself to be able to decode the container’s data. But I think the most iconic (sorry for the pun) shortcoming is the handling of album art/cover art.

If you want to add cover art data to an Ogg file, you have to add it as a comment (which is the freeform implementation of track tags for Vorbis; yes, for the codec, not the container); by its own design you cannot push binary data into a comment, so you have to encode it in Base64 (which is a representation of the 8-bit binary range in an ASCII-compatible alphabet). Unfortunately, encoding in this format means that the size increases by roughly a third. This wasn’t much of a problem when you used simple icons to tell one file from the other but, once again, we’re in 2010, and Apple actually spoiled us with their CoverFlow (and Amarok with CoverBling): you are now supposed to use high-resolution cover art in your files, to best show off the file you’re playing, and that extra third is now quite a lot of wasted space.
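
If you want to see the overhead for yourself, a handful of lines of Python is enough; the 4/3 ratio is inherent to the encoding, whatever the input:

    # Encode a blob the size of a typical high-resolution cover and
    # compare sizes; random data stands in for the actual JPEG.
    import base64
    import os

    cover = os.urandom(500 * 1024)
    encoded = base64.b64encode(cover)
    print("raw:    %d bytes" % len(cover))
    print("base64: %d bytes" % len(encoded))
    # Every 3 input bytes become 4 output characters, so this prints
    # an overhead of one third, before any line wrapping is added.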

So, while Vorbis as a codec is not that bad, let’s stop pretending Ogg is a fine, modern format, and move on. We have Matroska, let’s use that!

Some personal comments about Google’s WebM

So, Google finally announced what people called for: the release, as free software and free format, of the VP8 codec developed by On2 (the company that created VP3 – from which Theora is derived – and that Google acquired a while ago).

Now, Dark Shikari of x264 fame dissected the codec and, in part, the file format; his words are – not unexpectedly, especially for those who know him – quite harsh, but as Mike put it, “This open sourcing event has legs.”

It bothers me, though, that people dismissed Jason’s comments as “biased FUD” from the x264 project. Let’s leave aside the fact that I think developers who insist that other FLOSS projects spread FUD about their own are just paranoid, or are simply calling FUD what are actually real concerns.

Sure, nobody is denying that Jason is biased by his work on x264; I’m pretty sure he’s proud of what they have accomplished with that piece of software. On the other hand, his post is actually well informed and – speaking as somebody who has been reading his comments for a while – not as negative as people write it off to be. Sure, he repeats that VP8 is not on a technical par with H.264, but can you say he’s wrong? I don’t think so: he documented pretty well why he thinks so. He also has quite a few negative comments on the encoder code they released, but again that’s nothing strange, especially given the high code-quality standard that FFmpeg and x264 have got us used to.

Some people even went as far as to say that he’s spreading FUD by agreeing with MPEG-LA about the chance that some patents still apply to VP8. Jason, as far as I know, is not a lawyer – and I’d probably challenge any general lawyer to take a look at the specs and the patents and give a perfect dissection of the chance they apply or not – but I would, in general, accept his doubts. That doesn’t count for much in all this, I guess.

To put the whole situation into perspective, let’s try to guess what Google’s WebM is all about:

  • getting a better – in the technical meaning of the term – codec than H.264; or
  • getting an acceptable Free codec, sidestepping Theora and compromising with H.264.

Without agreeing on one or the other, there is no way to tell whether WebM is good or not. So I’ll start by dismissing the first option. VP8 is not something new; they didn’t develop it in the first year or so after the acquisition of On2. It had been in the works for years already, and is more or less the same age as H.264, as easily demonstrated by the fact that Adobe and Sorenson were ready to support it from day one; had it been brand new, that would have been impossible.

Jason points out weaknesses in the format (ignore the encoder software for now!), such as the lack of B-frames and the lower quality compared to the highest-possible H.264 options. I don’t expect those comments to come as news to the Google people who worked on it (unless they are in denial); most likely they knew they weren’t going to shoot H.264 down with this, but they accepted the compromise.

He also points out that some of the features are “copied” from H.264; that is most likely true, but there is a catch: while not being a lawyer, I remember reading that if you implement the same algorithm described by a patent but avoid hitting parts of the claims, you’re not infringing upon it; if that’s the case, then they might have been looking at those patents and explicitly trying to design around them. Also, if patents have a minimum of common sense, once a patent describes an algorithm, patenting an almost identical one shouldn’t be possible; that would cover VP8 if it stays near enough to H.264, but not too near. But this is pure conjecture on my part, based on bits and pieces of information I have read in the past.

Some of the features, like B-frames, that could have greatly improved compression, have been avoided; did they just forget about them? Unlikely; they probably decided that B-frames weren’t something they needed. One likely option is that they wanted to avoid the (known) patent on B-frames, as Jason points out; the other is that they might simply have decided that the extra disk space and bandwidth caused by ignoring B-frames was an acceptable downside in exchange for a format that is simpler to process in software on mobile devices, because in the immediate future no phone is going to process this format in hardware.

Both Jason and Mike point out that VP8 is definitely better than Theora; that is more than likely, given that its algorithms had a few more years to be developed. This would actually suggest that Google, too, didn’t consider Theora good enough for their needs, as most multimedia geeks have been saying all along. Similarly, they rejected the idea of using Ogg as the container format while accepting Vorbis; does that tell you something? It does to me: they needed something that actually worked (and yes, that’s a post from just shy of three years ago I’m linking to) and not merely something that was Free.

I have insisted for a long time that the right Free multimedia container format is Matroska, not Ogg; I speak from the point of view of a developer who fought long with demuxers in xine (because xine does not use libavformat for demuxing, so we have our own demuxers for everything), and who actually read through the Ogg specification and was scared. The fact that Matroska parallels most of the QuickTime Format/ISO Media/MP4 features is one very good reason for that. I’m happy to see that Google agrees with me…

Let me comment a bit on their decision to rebrand it and reduce it to a subset of features; I honestly haven’t looked at the specs for the format, so I have no idea which subset that is. I read they skipped things like subtitles (which sounds strange, given that YouTube does support them), but I haven’t read anybody commenting about them doing anything in an incompatible way. In general, selecting a subset of another format’s features is a good way to provide easy access to decoders: any decoder able to read the super-set format (Matroska) will work properly with the reduced one. The burden will be on the muxer (encoder) software, though, to decide which features to make use of.

The same has been true for the QuickTime format; generally speaking, the same demuxer (and muxer) can be shared to deal with QuickTime (.mov) files, MPEG-4 files (.mp4), iTunes-compatible audio and video files (.m4a, .m4v, .m4b), 3GPP files (.3gp) and so on. Unfortunately, here you don’t have a super-/sub-set split; you actually get different dialects of the same format, each slightly different from the others. I hope Google will be able to avoid that!
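
To give an idea of how these dialects identify themselves: the first atom of such files is usually an “ftyp” atom carrying a “major brand” (“qt  ” for QuickTime, “isom” or “mp42” for plain MP4, “M4A ” for iTunes audio, “3gp4” for 3GPP, and so on). A quick Python sketch, which only handles the common case of ftyp being the very first atom:

    # Read the major brand out of the leading "ftyp" atom of an
    # MP4-family file: 4-byte size, 4-byte type, then the brand.
    import struct
    import sys

    def major_brand(path):
        with open(path, "rb") as f:
            size, fourcc = struct.unpack(">I4s", f.read(8))
            if fourcc != b"ftyp":
                return None  # old QuickTime files may omit ftyp entirely
            return f.read(4).decode("ascii", "replace")

    if __name__ == "__main__":
        print(major_brand(sys.argv[1]))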

Let me share some anecdotal evidence of problems with these formats, something that really happened to me; you might remember I wrote that a friend of mine directed a movie last year. On the day of the first screening, he exported the movie from Adobe Premiere to MPEG-4 format; then he went to create a DVD with iDVD on his MacBook – please, no comments on the software choice, not my stuff – but… surprise! The movie was recorded, and exported, in 16:9 (widescreen) ratio, but iDVD was seeing it as 4:3 (standard)!

The problem was the digital equivalent of a missing anamorphic lens: the widescreen PAL 576i format uses non-square pixels, so together with the size in pixels of the frames, the container file needs to describe the aspect ratio of the pixels (16:15 for 4:3, 64:45 for widescreen). The problem is that the various dialects use different “atoms” to encode this information, and iDVD is unable to fetch it the way Adobe writes it. Luckily, FFmpeg saved the day: a 9-second remux of the file into the iTunes-compatible QuickTime-derived format solved the issue.
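
For the record, this is not the exact command that was run back then, but a sketch of the same fix with a current ffmpeg: remux without re-encoding while forcing the display aspect ratio, so the right pixel-ratio information ends up in the output (file names are illustrative):

    import subprocess

    subprocess.check_call([
        "ffmpeg",
        "-i", "movie-from-premiere.mp4",
        "-vcodec", "copy",     # remux only: no quality loss, runs in seconds
        "-acodec", "copy",
        "-aspect", "16:9",     # force the widescreen display aspect ratio
        "movie-for-idvd.mov",  # .mov selects the QuickTime-flavoured muxer
    ])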

This is why, with these formats, a single adaptive demuxer can be used, but a series of different muxers is needed. As I said, I sure hope Google will not make WebM behave the same way.

Besides that, I’m looking forward to the use of WebM: it would be a nice way to replace the (to me, useless) Theora with something that, even though not perfect, comes much nearer to it. The (lousy) quality of the encoder does not scare me: as Mike said, at some point FFmpeg will get from-scratch decoders and encoders, which will strive for the best results. Incidentally, I’d say that x264 is one of the best encoders around precisely because it is not proprietary software; proprietary developers tend to only care about having something working, while Free Software developers want something that works well, something they can be proud of.

Ganbatte, Google!

The mad muxer

I have expressed my preference for the MP4 format some time ago, which was meant mostly as against the Ogg format from Xiph, most of the custom audio container formats like FLAC and WavPack (up to a point; WV’s format could be much worse), and of course the AVI format. And although I do find quite a few advantages in MP4 over Matroska, I don’t despise the latter format at all.

The main problem I see with Matroska, actually, is that implementations are not widely available; I cannot play a Matroska video on my cellphone (which, in case anybody wondered, is a Nokia E71 right now, not an iPhone, and will probably never be one; I have my limits!), the PlayStation3 or the PSP. Up to last month, I couldn’t play it on my AppleTV either, which was quite a bit of a problem, especially considering a lot of Anime fansubbers use that format.

Now, since I was actually quite bothered by the fact that, even though I was transcoding them, a lot of subtitles didn’t appear properly, I decided to try out XBMC; it was quite a pleasing experience, actually: the Linux-based patch stick works like a charm, without going too far in the way of infringing Apple’s license as far as I can see, and the installation is quite quick. Unfortunately, the latest software update on AppleTV (2.3.1) seems to have broken XBMC, which no longer starts in fullscreen but rather in a Quartz window in the upper half of the screen.

So I ditched XBMC too (for now) for a derivative, Boxee, which, while being partially proprietaryware, seems to be a little more fine-tuned for the AppleTV; it’s still more Free Software than the original operating system. Both XBMC and Boxee solve my subtitle problem, since they both have a video calibration setup that lets you tell the software how much overscan the LCD panel applies, and what pixel aspect ratio it has. Quite cool; Apple should really have done that too.

Also, I started using MediaTomb to serve my content without having to copy it onto the AppleTV itself; this is working fine since I’m using ethernet-over-powerline adapters to connect the office with my bedroom, so there is enough bandwidth to stream it over. Unfortunately, here starts the first problem: while I was somehow able to get XBMC to play AVI files with external .srt subtitles, that fails on Boxee. Since the whole thing is bothersome anyway, I wanted to try an alternative: remux the content, without re-encoding it, into Matroska video files, with the subtitles embedded in them as a third track.

This seems to work fine from an AVI file, but fails badly when the source is an MP4 file: the resulting file seems corrupted to MPlayer and crashes Totem/GStreamer (that’s no news; my dmesg output easily fills up with Totem’s thumbnailer’s crashes when I open my video library). Also, I have so far been unable to properly set the language of the tracks, which would let me make the jap-audio-with-eng-subs setup automatic on XBMC. If somebody knows how to do that properly, I’d be glad to hear it.
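
For reference, what I would expect to work is mkvmerge’s --language option, which tags tracks of the following input file at mux time; a sketch, with illustrative file names, and assuming track 1 of the AVI is the Japanese audio:

    import subprocess

    subprocess.check_call([
        "mkvmerge",
        "-o", "episode.mkv",
        "--language", "1:jpn",  # track 1 of the next input (the audio)
        "episode.avi",
        "--language", "0:eng",  # the subtitle file's only track
        "episode.eng.srt",
    ])

So far, though, that hasn’t given me the automatic selection I’m after.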

Anyway, there goes another remuxing of the video library…

Encoding iPod-compatible audiobooks with Free Software

Since in the last few days I’ve been able to rest – also thanks to the new earphones – I’ve finally been able to think again about multimedia, as well as Gentoo. But just to preserve my sanity, and to make sure I do something I can reuse to rest even better, I decided to look into something new, something I would like to solve if I could: generating iPod-compatible audiobook files from the BBC Radio CDs I got.

The audiobooks you buy from the iTunes Store are usually downloaded as multiple files, one per CD of the original CD release, sometimes with chapter markings that let you skip around. Unfortunately they are also DRM’d, so analysing them is quite a bit of a mess, and I didn’t go to great lengths to identify how that is achieved. The reason why I’d like to find, or document, the audiobook format is a two-fold interoperability idea. The first part is being able to play iPod-compatible audiobooks with Free Software, with the same chapter-marking system working; the other (to me the more pressing, to be honest) is being able to rip a CD and create a file with chapter markings that works on the iPod properly. As it is, my Audiobooks section on the iPod is messed up because, for instance, each piece of The Hitchhiker’s Guide To The Galaxy, being a different track on CD, gets its own file, and is thus a separate entry in the Audiobooks series. To deal with that I had to create playlists for the various phases, and play them from there. Slightly suboptimal, although it works.

Now, the idea would be to rip a CD (or part of a CD) into a single M4B file, audiobook-style, and add chapter markings with the tracks’ names to make the thing playable and browsable properly; doing so with just Free Software is the goal. Being able to have a single file for multiple CDs would also help. The reason why I’m willing to spend time on this, rather than just using the playlists, is that the iPod’s battery seems to drain much sooner when using multiple files, probably because it has to seek around to find them, while a single file would be loaded incrementally without spending too much time.

In this post I really don’t have much in terms of implementation ideas; I know the first thing I have to do is to find an EAC-style ripper for Linux, based on either standard cdparanoia or libcdio’s version. For those who didn’t understand that last sentence: if I recall correctly, EAC can also produce a single lossless audio file, plus a CUE file where the track names are timecoded, instead of splitting the CD into one file per track. Starting from such a pair would be optimal, since we’d just need to encode the audio to AAC to have the audio track of the audiobook file.
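
To give an idea of the glue I have in mind, here is a rough sketch that turns the TITLE and INDEX 01 lines of a CUE sheet into chapter markers in FFmpeg’s FFMETADATA text format; it’s a naive parser, assuming one INDEX 01 per track, and decidedly not production code:

    import re
    import sys

    FPS = 75  # CUE INDEX timestamps count 75 frames per second

    def cue_chapters(path):
        # Keep the last TITLE seen before each INDEX 01 as that
        # chapter's name; start times are converted to CD frames.
        chapters, title = [], None
        with open(path) as cue:
            for line in cue:
                m = re.match(r'\s*TITLE "(.*)"', line)
                if m:
                    title = m.group(1)
                m = re.match(r'\s*INDEX 01 (\d+):(\d+):(\d+)', line)
                if m:
                    mins, secs, frames = map(int, m.groups())
                    chapters.append(((mins * 60 + secs) * FPS + frames, title))
        return chapters

    def write_ffmetadata(chapters, total_frames, out=sys.stdout):
        # Emit one [CHAPTER] section per track, each ending where the
        # next one starts; the last one ends at the disc's total length.
        out.write(";FFMETADATA1\n")
        ends = [start for start, _ in chapters[1:]] + [total_frames]
        for (start, title), end in zip(chapters, ends):
            out.write("[CHAPTER]\nTIMEBASE=1/%d\nSTART=%d\nEND=%d\ntitle=%s\n"
                      % (FPS, start, end, title))

A recent enough ffmpeg should then be able to attach the chapters while remuxing, with something like ffmpeg -i book.m4a -i chapters.ffmeta -map_metadata 1 -acodec copy book.m4b; whether the iPod then treats them as proper audiobook chapters is exactly what remains to be verified.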

What I need to find out is how the chapter information is encoded in the final file. This shouldn’t be too difficult, since the MP4 format has quite a few implementations and I have already worked with it before. The problem is that, being DRM’d, the audiobooks themselves are not the best thing to analyse. Luckily, I remembered that there is one BBC podcast that provides an MP4 file with chapter markings: Best of Chris Moyles Enhanced, which most likely uses the same feature. Unfortunately, the mp4dump utility provided by mpeg4ip fails to dump that file, which means that either the file is corrupt (and then how does iTunes play it?) or the utility is not perfect (much more likely).

So this brings me back to something I was thinking about before: the fact that we have no GPL-compatible MP4-specific library to handle parsing and writing of MP4 files. The reason for this is most likely that the standards don’t come cheap, and that most Free Software activists in the multimedia area tend to think that Xiph is always the answer (I disagree), while the pragmatic side of the multimedia area would just use Matroska (which, I admit, is probably my second-best choice, if only it were supported by actual devices). And again, please don’t tell me about Sandisk players and other flash-based stuff. I don’t want flash-based stuff! I have more than 50GB of content on my iPod!

Back to our discussion: I’m going to need to find or write some tool to inspect MP4 files; I don’t want to fix mpeg4ip, both because of the MPL license it’s released under and because I think the whole thing is quite over-engineered. Unfortunately this does not really help me much, since I don’t have the full specs of the format handy, and I’ll have to do a lot of guessing to get it to work. On the other hand, this should be quite an interesting project, for as soon as I have time. If you have pointers, or are interested in this idea, feel free to chime in.
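
The inspection tool I have in mind wouldn’t need to be big, either; something like the following sketch, which walks the atom tree of an MP4/QuickTime file and prints it indented, would already be a start. It knows which container atoms to recurse into and handles 64-bit sizes, but makes no attempt to interpret the leaf atoms; that’s the part where the specs, or a lot of guessing, come in:

    import struct
    import sys

    # Atoms that contain other atoms rather than raw data.
    CONTAINERS = {b"moov", b"trak", b"mdia", b"minf", b"stbl", b"edts"}

    def walk(f, start, end, depth=0):
        pos = start
        while pos + 8 <= end:
            f.seek(pos)
            size, fourcc = struct.unpack(">I4s", f.read(8))
            header = 8
            if size == 1:    # 64-bit size, stored right after the type
                size, = struct.unpack(">Q", f.read(8))
                header = 16
            elif size == 0:  # atom extends to the end of the file
                size = end - pos
            print("%s%s (%d bytes)" % ("  " * depth,
                                       fourcc.decode("ascii", "replace"), size))
            if fourcc in CONTAINERS:
                walk(f, pos + header, pos + size, depth + 1)
            pos += size

    if __name__ == "__main__":
        with open(sys.argv[1], "rb") as f:
            f.seek(0, 2)          # find the file size
            walk(f, 0, f.tell())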