My pragmatic view on multimedia container formats

This post is brought to you by a conversation in #amarok between me, jefferai and eean. I titled it the way I did because I don’t intend to write about specific audio or video encoding formats; they are waaay out of my league, especially the video ones. I only know a bit of the theory behind audio compression, lossy and lossless, and not nearly enough to compare different codecs beyond a few superficial functional details.

I also call it pragmatic because what I’m going to write is not geared toward ethics, nor can it be considered a critique, as I don’t know all the alternative choices that could have been made, nor do I know the intimate details of all the formats I’ll name. What I learnt about the formats I learnt through my work on xine and little more, so it’s not really technical either.

First of all, I’m gonna say it right now: I’m biased toward the QuickTime/MP4/ISO Media family of formats. This is due to a few different factors I’m going to explain in this post, but the main one is that, since I started working on xine, this is the only format that gave me just one single bug (which was also easy to fix, but let’s leave that alone for now).

To quote from MultimediaWiki, the QuickTime format and its derived formats are the ones that are specified in the tiniest detail; unfortunately they are also often specified in conflicting ways. This is certainly a problem, but the need to translate the specifications into more human language also leaves room for different interpretations, which are more useful than a single specification that phrases important information badly (non-multimedia examples of these two cases are the ELF format with the symbol versioning description on Ulrich Drepper’s page, and the fantastic OpenDocument format that is still defined in incompatible ways between OpenOffice and KOffice).

The most common video container format is certainly AVI (Audio Video Interleave), which was introduced by Microsoft. This format was a hack to begin with; the way it works is probably more sheer luck than design. A player pretty much has to cope with broken AVI files to be useful to users. An AVI file does not cope well with multiple streams (it can somehow handle two audio streams in a single file, but very few players support that), and has no way to handle soft subtitles nicely. Besides, it has little or no metadata.

The common free alternative is of course Xiph’s Ogg; unfortunately, to begin with, it doesn’t support all possible stream types: you can’t use, for instance, MP3 or Xvid streams. To fill this hole there was a nasty hack called Ogg Media, which uses partly incompatible extensions to allow more stream types; this adds one more check to tell the two formats apart. In addition to this, every stream type in an Ogg file requires a different header, and thus a different parser. This, for instance, causes xine to require libtheora just to parse the Theora headers and extract the raw stream from the Ogg container.
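To give an idea of what “a different header per stream type” means in practice, here is a minimal sketch that guesses the codec from the first page of an Ogg stream. The function name and the lookup table are my own illustration, not xine’s code; the table only lists a few well-known signatures and is far from complete, and telling the codec apart is just the first step, since each codec’s header still needs its own parser afterwards.

```python
# Signatures found at the start of the first packet of a logical stream.
# Illustrative subset only; a real demuxer needs many more entries.
CODEC_MAGIC = {
    b"\x01vorbis": "Vorbis",
    b"\x80theora": "Theora",
    b"Speex   ": "Speex",
    b"\x7fFLAC": "FLAC",
}


def identify_first_page(page: bytes) -> str:
    """Guess the codec carried by a stream from its first Ogg page."""
    if page[:4] != b"OggS":
        raise ValueError("not an Ogg page")
    # The fixed page header is 27 bytes long; its last byte (offset 26)
    # is the number of entries in the segment table that follows it.
    segment_count = page[26]
    payload = page[27 + segment_count:]
    for magic, name in CODEC_MAGIC.items():
        if payload.startswith(magic):
            return name
    return "unknown"
```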

A more viable alternative is the Matroska Video container (MKV); it supports basically any format out there, and it also allows multiple streams and subtitles, even in really fancy formats, so it’s quite nice. Unfortunately its implementation is far from trivial: it is based upon EBML, which is the binary counterpart of XML; extensible, yeah, but complex too. There are two reference libraries, libebml and libmatroska, that allow easy access to Matroska files, but they are written in C++, require a pretty sensible implementation of the language to work correctly, and are not welcome in many multimedia-related projects. Both FFmpeg’s libavformat and xine’s implementations are quite broken; MPlayer’s has improved lately, but has its own troubles, as far as I know; VLC is certainly the best option here.
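As a taste of where the complexity starts, every EBML element ID and size is stored as a variable-length integer: the number of leading zero bits in the first byte tells you how many bytes follow. The sketch below decodes one such integer; the function and parameter names are mine, and it deliberately ignores details a real parser has to handle, such as the reserved “unknown size” values.

```python
def read_vint(data: bytes, pos: int = 0, keep_marker: bool = False):
    """Decode one EBML variable-length integer starting at data[pos].

    Returns a (value, length_in_bytes) tuple. Element IDs conventionally
    keep the length marker bit (keep_marker=True); sizes drop it.
    """
    first = data[pos]
    if first == 0:
        raise ValueError("vints longer than 8 bytes are not handled here")
    # Leading zero bits in the first byte, plus one, give the total length.
    length = 8 - first.bit_length() + 1
    if pos + length > len(data):
        raise ValueError("truncated vint")
    # Mask out the marker bit unless the caller wants it preserved.
    value = first if keep_marker else first & ((1 << (8 - length)) - 1)
    for byte in data[pos + 1:pos + length]:
        value = (value << 8) | byte
    return value, length
```

For instance, the four bytes `1A 45 DF A3` that open an EBML file decode, marker bit kept, to the ID `0x1A45DFA3` with a length of 4.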

Luca has told me a few times to look at NUT, but that’s not exactly a common format.

The QuickTime format does have a few idiotic flaws. For instance, the QuickTime components themselves, and with them Front Row and iTunes on OS X, can’t cope with extensions in the specification such as version 1 of the mdhd atom; this limits the maximum length of a video stream to a few minutes when using high-precision fractional timebases, like FFmpeg does. Even so, I find these flaws less messy than the problems described above; add to that the almost universal availability of players for this format, and you might understand why I like it better than the others.
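To make the mdhd problem a bit more concrete: as far as I read the ISO Media layout, version 0 of the atom stores the duration in 32 bits and version 1 in 64 bits, with the duration counted in units of the timescale. The sketch below parses both layouts and computes how long a stream can be before a 32-bit duration overflows; the function names are mine, and the timescales used afterwards are only examples, not the values any particular muxer picks.

```python
import struct


def parse_mdhd(payload: bytes):
    """Parse the body of an mdhd atom (what follows the 8-byte size/type header).

    Returns a (version, timescale, duration) tuple.
    """
    version = payload[0]
    if version == 0:
        # version(1) flags(3) creation(4) modification(4) timescale(4) duration(4)
        timescale, duration = struct.unpack_from(">II", payload, 12)
    elif version == 1:
        # version(1) flags(3) creation(8) modification(8) timescale(4) duration(8)
        timescale, duration = struct.unpack_from(">IQ", payload, 20)
    else:
        raise ValueError(f"unknown mdhd version {version}")
    return version, timescale, duration


def max_seconds_v0(timescale: int) -> float:
    """Longest duration, in seconds, that a 32-bit field can express."""
    return (2**32 - 1) / timescale
```

With a timescale of 600 units per second the 32-bit limit is many weeks, but with a hypothetical timescale of 10,000,000 it drops to about 429 seconds, that is, roughly seven minutes; a player that only understands version 0 is stuck with that ceiling.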

There are other things too, like the seek tables that allow pretty much perfect seeking, and the fact that we have more than one Free implementation of muxers and demuxers: FFmpeg is one, then there is the mpeg4ip package under the MPL (as free as that is), and finally GPAC.

Unfortunately metadata support for this format is far from trivial, unlike Vorbis comments, but it’s still easier than ID3v2, although for the latter we already have enough free software implementations (not always good, complete, or compatible with each other). For MPEG-4 files there is libmp4v2, but it’s MPL-licensed and thus GPL-incompatible, which is why distributions don’t enable read/write support in Amarok.

To cut this entry short, since I’m once again writing from the E61 while watching I, Robot for the second time, I just want to say that from a purely practical point of view, Apple’s format is the logical choice for sharing audio/video files between platforms.

And I think Will Smith is a great actor.