On audio formats

Over on identi.ca there has been a short discussion about audio container formats. You might remember that Ogg is an old pet peeve of mine, so I really can understand people not implementing it as it is.

What I said on the microblogging platform is that, in my opinion, we should be pushing for the so-called MKA format, which is the usual Matroska format with just a single audio track rather than both audio and video tracks. I think this makes quite a lot of sense, especially if the track you host in it is the usual Vorbis audio. It’s not just a matter of what Ogg does wrong, but also the fact that Matroska/Vorbis is a combination sponsored by none other than Google itself, with WebM.

Okay so WebM is actually a subset of Matroska, so there isn’t an absolute certainty that using a WebM decoder will let you play MKA, but given that most software is likely going to use one of the already-present Matroska demuxers, which play both WebM and MKA, there is really little work to be done to support this. As I have tested this before, mplayer, xine and VLC all play back MKA just fine.

It’s not even just a random chance, or a hack that might not work in the long term, or something entirely new that hasn’t been tried before. Ogg itself is able to support either a single audio track or multiple audio and video streams; heck, even the M4A format derived from Apple’s MOV format has the same design. This means that most of the software out there is already designed to work this way and does not expect a given container to always contain either only audio or audio and video.

Now, speaking about Ogg’s shortcomings – on which I have ranted before – I would like to point out that a lot of self-appointed Free Software advocates are still suggesting that it’s perfectly fine as a modern format because it solves a number of the issues that mp3 has; if you’re one of those, I beg you to check your calendar: we’re in 2010. So yes, even though Ogg does solve some of the mp3 problems (but causes a few more), it doesn’t really compare with the formats in use today, such as the M4A/MP4 used by Apple, Sony, Android, …

Some of the limitations are actually taken directly from mp3 itself, like the design that allows for a single Ogg file to be created simply by concatenating a number of other files, or the fact that you need to know the content of the stream itself to be able to decode the container’s data. But I think that the most iconic one (sorry for the pun) is the support for album art/cover art.
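To make the concatenation point concrete, here is a minimal Python sketch that walks the page headers of an Ogg file and counts how many logical streams were chained together. It assumes a plain, non-multiplexed audio file (one stream at a time), does not verify the page checksums, and the helper names and the toy pages at the bottom are made up for illustration; they are not real Vorbis data.

```python
import struct

# Fixed part of an Ogg page header (27 bytes, little-endian fields).
PAGE_HEADER = struct.Struct("<4sBBqIIIB")
FLAG_BOS = 0x02  # "beginning of stream" bit in the header type flags


def count_chain_links(data: bytes) -> int:
    """Count logical streams in a non-multiplexed Ogg file via BOS pages."""
    pos = 0
    links = 0
    while pos + PAGE_HEADER.size <= len(data):
        capture, _ver, flags, _granule, _serial, _seq, _crc, nsegs = (
            PAGE_HEADER.unpack_from(data, pos)
        )
        if capture != b"OggS":
            raise ValueError(f"lost page sync at offset {pos}")
        table_start = pos + PAGE_HEADER.size
        segment_table = data[table_start : table_start + nsegs]
        if flags & FLAG_BOS:
            links += 1
        # The page body length is the sum of the lacing values.
        pos = table_start + nsegs + sum(segment_table)
    return links


def fake_page(flags: int, serial: int, payload: bytes) -> bytes:
    """Build a toy page (CRC left at zero, payload under 255 bytes)."""
    header = PAGE_HEADER.pack(b"OggS", 0, flags, 0, serial, 0, 0, 1)
    return header + bytes([len(payload)]) + payload


# Two hypothetical "files", each with a first (0x02) and a last (0x04) page.
track_a = fake_page(0x02, 1, b"header") + fake_page(0x04, 1, b"audio")
track_b = fake_page(0x02, 2, b"header") + fake_page(0x04, 2, b"audio")
print(count_chain_links(track_a + track_b))  # 2
```

Note that the sketch can find the page boundaries without knowing the codec, but it learns nothing about what is inside the pages; for that, a demuxer has to understand the codec-specific headers.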

If you want to add cover art data to an Ogg file, you have to add it as a comment (which is the freeform implementation of tracks’ tags for Vorbis; yes, for the codec, not the container); by its own design you cannot push binary data in a comment, so you have to encode it in Base64 (which is a representation of the 8-bit binary range in an ASCII-compatible alphabet). Unfortunately, encoding in this format means that the size is increased by about a third, since every three bytes of input become four characters of output. This wasn’t much of a problem when you used simple icons to tell one file from the other, but once again, we’re in 2010, and Apple actually spoiled us with their CoverFlow (and Amarok with CoverBling): you are now supposed to use high-resolution cover art in your files, to have the best effect at showing off the file you’re playing, and that extra third is now quite a lot of wasted space.
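To put a number on that overhead, here is a small Python sketch. Base64 maps every 3 input bytes to 4 output characters, so the growth is close to one third. The 600,000-byte payload is a made-up stand-in for a high-resolution cover image, and the function name is only for illustration.

```python
import base64
import os


def base64_overhead(raw: bytes) -> float:
    """Return the relative size increase caused by Base64-encoding `raw`."""
    encoded = base64.b64encode(raw)
    return (len(encoded) - len(raw)) / len(raw)


# Hypothetical cover art: 600,000 random bytes standing in for a JPEG.
cover = os.urandom(600_000)
encoded_size = len(base64.b64encode(cover))
print(f"raw: {len(cover)} bytes, Base64: {encoded_size} bytes")
print(f"overhead: {base64_overhead(cover):.1%}")
```

With these numbers the comment field carries 200,000 bytes more than the image itself needs.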

So, while Vorbis as a codec is not that bad, let’s stop pretending Ogg is a fine, modern format, and move on. We have Matroska, let’s use that!

Some personal comments about Google’s WebM

So, Google finally announced what people called for: the release as free software, and as a free format, of the VP8 codec developed by On2 (the company that developed VP3 – from which Theora is derived – and that Google acquired a while ago).

Now, Dark Shikari of x264 fame dissected the codec and in part the file format; his words are – not unexpectedly, especially for those who know him – quite harsh, but as Mike put it “This open sourcing event has legs.”

It bothers me, though, that people dismissed Jason’s comments as “biased FUD” from the x264 project. Let’s leave aside the fact that I think developers who insist that other FLOSS projects spread FUD about their own are either just paranoid, or are calling FUD what actually are real concerns.

Sure, nobody is denying that Jason is biased by his work on x264; I’m pretty sure he’s proud of what they have accomplished with that piece of software. On the other hand, his post is actually well-informed, and – speaking as somebody who has been reading his comments for a while – not as negative as people seem to make it out to be. Sure, he’s repeating that VP8 is not on technical par with H.264, but can you say he’s wrong? I don’t think so; he documented pretty well why he thinks so. He also has quite a few negative comments on the encoder code they released, but again that’s nothing strange, especially given the high code quality standard FFmpeg and x264 got us used to.

Some people even went as far as saying that he’s spreading FUD by agreeing with MPEG-LA about the chances that some patents still apply to VP8. Jason, as far as I know, is not a lawyer – and I’d probably challenge any general lawyer to take a look at the specs and the patents, and give a perfect dissection of the chance they apply or not – but I would, in general, accept his doubts. That does not count for much in all this, I guess.

To put the whole situation into perspective, let’s try to guess what Google’s WebM is all about:

  • getting a better – in the technical meaning of the term – codec than H.264; or
  • getting an acceptable Free codec, sidestepping Theora and compromising with H.264.

Without agreeing on one or the other, there is no way to tell whether WebM is good or not. So I’ll start by dismissing the first option. VP8 is not something new; they didn’t develop it in the first year or so after the acquisition of On2. It was in the works for years already, and has more or less the same age as H.264; this is easily demonstrated by the fact that Adobe and Sorenson are ready to support it from Day 1, which would have been impossible if it were that new.

Jason points out weaknesses in the format (ignore the encoder software for now!), such as the lack of B-frames, and the lower quality compared with the highest-possible H.264 options. I don’t expect those comments to come as news to the Google people who worked on it (unless they are in denial); most likely, they knew they weren’t going to shoot H.264 down with this, but they accepted the compromise.

He also points out that some of the features are “copied” from H.264; that is most likely true, but there is a catch: while not being a lawyer, I remember reading that if you implement the same algorithm described by a patent but you avoid hitting parts of the claims, you’re not infringing upon it; if that’s the case, then they might have been looking at those patents and explicitly tried to sidestep them. Also, if patents have a minimum of common sense, once a patent describes an algorithm, patenting an almost identical one shouldn’t be possible; that would cover VP8 if it stays near enough, but not too near, to H.264. But this is just pure conjecture on my part, based on bits and pieces of information I have read in the past.

Some of the features, like B-frames, that could have greatly improved compression, have been avoided; did they just forget about them? Unlikely; they probably decided that B-frames weren’t something they needed. One likely option is that they wanted to avoid the (known) patent on B-frames, as Jason points out; the other is that they might have simply decided that the extra disk space and bandwidth caused by ignoring B-frames was an acceptable downside to have a format simpler to process on mobile devices in software — because in the immediate future, no phone is going to process this format in hardware.

Both Jason and Mike point out that VP8 is definitely better than Theora; that is more than likely, given that the algorithms had a few more years to be developed. This would actually suggest that Google also didn’t consider Theora good enough for their needs, like most of the multimedia geeks have been saying all along. Similarly, they rejected the idea of using Ogg as container format, while accepting Vorbis; does that tell you something? It does to me: they needed something that actually worked (and yes that’s a post from just shy of three years ago I’m linking to) and not only something that was Free.

I have insisted for a long time that the right Free multimedia container format is Matroska, not Ogg; I speak from the point of view of a developer who fought long with demuxers in xine (because xine does not use libavformat for demuxing, so we have our own demuxers for everything), who actually read through the Ogg specification and was scared. The fact that Matroska parallels most of the QuickTime Format/ISO Media/MP4 features is one very good reason for that. I’m happy to see that Google agrees with me…

Let me comment a bit on their decision to rebrand it and reduce it to a subset of features. I have honestly not looked at the specs for the format, so I have no idea which subset that is; I read they skipped things like subtitles (which sounds strange, given that YouTube does support them), but I haven’t read anybody commenting on them doing something in an incompatible way. In general, selecting a subset of the features of another format is a good way to provide easy access to decoders: any decoder able to read the super-set format (Matroska) will work properly with the reduced one. The problem will be in the muxer (encoder) software, though, which has to decide whether or not to make use of the various features.

The same has been true for the QuickTime format; generally speaking the same demuxer (and muxer) can be shared to deal with QuickTime (.mov) files, MPEG-4 files (.mp4), iTunes-compatible audio and video files (.m4a, .m4v, .m4b), 3GPP files (.3gp) and so on. Unfortunately here you don’t have a super-/sub-set split, but rather different dialects of the same format, which differ slightly from one another. I hope Google will be able to avoid that!

Let me share some anecdotal evidence of problems with these formats, something that really happened to me. You might remember I wrote that a friend of mine directed a movie last year; on the day of the first screening, he exported the movie from Adobe Premiere to MPEG-4 format, then he went to create a DVD with iDVD on his MacBook – please, no comments on the software choice here, not my stuff – but … surprise! The movie was recorded, and exported, in 16:9 (widescreen) ratio, but iDVD was seeing it as 4:3 (standard)!

The problem was the digital equivalent of a missing anamorphic lens: the widescreen PAL 576i format uses non-square pixels, so together with the size of the frames in pixels, the container file needs to describe the aspect ratio of the pixels (16:15 for 4:3, 64:45 for widescreen). The problem starts with the various dialects using different “atoms” to encode this information; iDVD is unable to fetch it the way Adobe writes it. Luckily, FFmpeg saved the day: a 9-second run of FFmpeg, remuxing the file into the iTunes-compatible QuickTime-derived format, solved the issue.
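The arithmetic behind those two pixel ratios can be checked with a few lines of Python. This is only a sketch: the 720x576 frame size is my assumption (the usual PAL DVD frame), and the function name is made up for illustration.

```python
from fractions import Fraction


def display_aspect(width: int, height: int, par: Fraction) -> Fraction:
    """Display aspect ratio = storage aspect ratio * pixel aspect ratio."""
    return Fraction(width, height) * par


# Assumption: a 720x576 PAL frame.
print(display_aspect(720, 576, Fraction(16, 15)))  # 4/3
print(display_aspect(720, 576, Fraction(64, 45)))  # 16/9
```

The stored frame is identical in both cases; only the pixel ratio recorded in the container tells the player which shape to show, which is why losing it turns a widescreen movie into a 4:3 one.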

This is why with these formats a single, adapting demuxer can be used — but a series of different muxers is needed. As I said, I sure hope Google will not make WebM behave the same way.

Besides that, I’m looking forward to the use of WebM: it would be a nice way to replace the (to me, useless) Theora with something that, even though not perfect, sure comes much nearer. The (lousy) quality of the encoder does not scare me; as Mike, again, said, at some point FFmpeg will get from-scratch decoders and encoders, which will strive for the best results. Incidentally, I’d say that x264 is one of the best encoders because it is not proprietary software: proprietary developers tend only to care about having something working; Free Software developers want something working well and something they can be proud of.

Ganbatte, Google!

A story about free software and free formats.

I already said I’m quite pragmatic when it comes to multimedia formats, as I’m a happy FFmpeg user, and I tend to be able to watch anything FFmpeg decodes, without limiting myself to non-encumbered formats.

But, it’s true that not everybody shares this view, and it’s always a good idea to provide at least some alternative when you’re developing Free Software. Especially for those users less fortunate, living in places where software patents are a problem.

Strangely enough, there is one very widely used piece of software that does not seem to provide a free format alternative (Theora) to its users: The GIMP. It supports MPEG-1 and MPEG-2 video encoding, but not Ogg/Theora.

I was contacted by Ivo Emanuel Gonçalves, from Xiph, asking me if I could take a look into it. Unfortunately my knowledge of GTK+ is near zero, and I don’t even use GIMP. But I’m sure there are at least a few GIMP users reading my blog, and hopefully a few of them might be able to look into that.

GIMP developers don’t seem to be interested in adding Theora support proactively, but they said they’d be accepting patches if they are sent their way. So if anybody reading this blog is interested, please fire me an email and I’ll forward it to Ivo (unless he wants to make his email address public here directly ;) ).

A blog tam-tam is also welcome, so that we might reach more easily some people interested in this, so… spread the word!

My pragmatic view on multimedia container formats

This post is brought to you by a conversation in #amarok between me, jefferai and eean. I titled it as it is because I don’t intend to write about specific audio or video encoding formats; they are way out of my league, especially the video ones, and I only know a bit of the theory behind audio compression, lossy and lossless, but nothing good enough even to compare different codecs beyond a few superficial functional details.

I also call it pragmatic because what I’m going to write about is not geared toward ethics, nor can it be considered critical, as I don’t know all the possible alternative choices that could have been taken, nor do I know the intimate details of all the formats I’ll name. What I learnt about the formats I learnt through my work on xine and little more, so it’s not really technical either.

First of all, I’m gonna say it right now: I’m biased toward the QuickTime/MP4/ISO Media family; this is due to a few different factors I’m going to explain in this post, but the main one would be that this is the only format that, since I started working on xine, gave me only one single bug (which was also easy to fix, but let’s leave that alone for now).

To quote from MultimediaWiki, the QuickTime format and derived formats are the ones that are specified in the tiniest detail; unfortunately they are also often specified in conflicting ways. This is certainly a problem, but the need for translation into more human language also allows for different interpretations, which are more useful than a single specification that phrases important information badly (non-multimedia examples of these two cases are the ELF format and the symbol versioning description on Ulrich Drepper’s page, or the fantastic OpenDocument format that is still defined in incompatible ways between OpenOffice and KOffice).

The most common video container format is certainly AVI (Audio Video Interleave), which was introduced by Microsoft; this format was a hack to begin with, and the way it works is probably more sheer luck than design. A player more than likely has to cope with broken AVI files to be useful to users. An AVI file does not cope well with multiple streams (it can somehow handle two audio streams in a single file, but very few players support that), and has no way to handle soft subtitles nicely. Besides, it has little or no metadata.

The common free alternative is of course Xiph’s Ogg; unfortunately, to begin with, it doesn’t support all possible stream types: you can’t use, for instance, mp3 or Xvid streams. To fill this hole there was a nasty hack called Ogg Media that uses partly incompatible extensions to allow more stream types to be used, which adds one more check to tell the two formats apart. In addition to this, every stream type in Ogg files requires a different header, and thus a different parser. This, for instance, causes xine to require libtheora to parse the Theora headers and extract the raw stream from the Ogg container.

A more viable alternative is the Matroska Video container (MKV); it supports basically any format out there, and it also allows for multiple streams and subtitles, even in really fancy formats, so it’s quite nice. Unfortunately its implementation is far from trivial; it is based upon the EBML format, which is the binary-file counterpart to XML: extensible, yeah, but complex too. There are two reference libraries, libebml and libmatroska, that allow easy access to Matroska files, but the libraries are C++ based, require a pretty sane implementation of this language to work correctly, and are not welcome in many multimedia-related projects. Both FFmpeg’s libavformat and xine’s implementations are quite broken; MPlayer’s has improved lately, but has its own troubles, as far as I know; VLC is certainly the best option in this regard.
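To give an idea of where the EBML complexity starts, here is a minimal Python sketch of how an element size is decoded. EBML stores sizes as variable-length integers: the number of leading zero bits in the first byte tells you how many bytes follow. The function name is mine, and the sketch deliberately ignores the reserved "unknown size" values and anything longer than 8 bytes.

```python
from typing import Tuple


def read_vint(data: bytes, pos: int = 0) -> Tuple[int, int]:
    """Decode an EBML variable-length integer used as an element size.

    Returns (value, length_in_bytes).
    """
    first = data[pos]
    if first == 0:
        raise ValueError("vints longer than 8 bytes are not supported")
    # Leading zero bits in the first byte, plus one, give the total length.
    length = 9 - first.bit_length()
    if pos + length > len(data):
        raise ValueError("truncated vint")
    # Drop the length-marker bit, then append the remaining bytes.
    value = first & ((1 << (8 - length)) - 1)
    for byte in data[pos + 1 : pos + length]:
        value = (value << 8) | byte
    return value, length


# The same size (2) written with one byte and with two bytes.
print(read_vint(b"\x82"))      # (2, 1)
print(read_vint(b"\x40\x02"))  # (2, 2)
```

The fact that the same value has several valid encodings is one of the details a demuxer has to get right before it even reaches the Matroska-specific elements.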

Luca told me a few times to look at NUT, but that’s not exactly a common format.

Even if the QuickTime format has a few idiotic flaws, I find them less messy than the problems described above. One example: the QT components themselves are not able to cope with specific extensions like version 1 of the mdhd atom (and this also extends to FrontRow and iTunes on OS X), which limits the maximum length of a video stream to a few minutes when using high-precision fractional timebases, like FFmpeg does. Add to that the almost universal availability of players for this format, and you might understand why I like it better than other formats.
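The mdhd limitation is easy to quantify. In the ISO Media family, version 0 of the mdhd atom stores the duration as a 32-bit count of timescale ticks, while version 1 widens it to 64 bits. The Python sketch below shows how the maximum length shrinks as the timescale grows; the three timescale values are only illustrative picks of mine, not values taken from FFmpeg.

```python
def max_duration_seconds(timescale: int, duration_bits: int) -> float:
    """Longest duration expressible with an unsigned `duration_bits` field."""
    return (2**duration_bits - 1) / timescale


# Illustrative timescales (ticks per second); the last one stands in for
# a "high precision" timebase.
for timescale in (600, 90_000, 30_000_000):
    v0 = max_duration_seconds(timescale, 32)
    v1 = max_duration_seconds(timescale, 64)
    print(f"timescale {timescale:>10}: "
          f"v0 max {v0 / 60:12.1f} min, v1 max {v1 / 3600:.3e} h")
```

With a timescale in the tens of millions, a 32-bit duration runs out after only a few minutes, which is exactly the situation where a muxer has to fall back to version 1.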

Also there are other things, like the seek tables allowing decently perfect seeking, and the fact that we have more than one Free implementation of muxers and demuxers: FFmpeg being one, then there is the mpeg4ip package under the MPL, as free as that is free; and finally gpac.

Unfortunately the metadata support for this format is far from trivial, contrary to vorbiscomments, but it’s also easier than ID3v2, although for that one we already have enough implementations as free software (not always good, complete and compatible with each other). For MPEG-4 files, there is libmp4v2, but it’s MPL-licensed and so GPL-incompatible; that’s why distributions don’t enable read/write support in Amarok.

To cut this entry short, since I’m again writing from the E61 while watching I, Robot for the second time, I just want to say that from a purely practical point of view, Apple’s format is the logical choice to share audio/video files between platforms.

And I think Will Smith is a great actor.

Working on a few things

Let’s start by saying that at this very moment my throat hurts, badly, either because of a cold or because of the smoke from my boiler (it’s still a wood-fed boiler); my foot also hurts a bit, because of the whole Sunday spent with my good shoes on (I should have taken the other ones, I suppose), and I don’t want to get started on how my personal life is lately.

But I haven’t been taking naps all day, although I’d like to. First of all, since yesterday I got XCB and libX11/XCB working on Gentoo/FreeBSD. The patch is still to be polished before being merged, I’ll do so later on tonight or tomorrow.

I’ve also been working on xine’s FLAC support so as to fix both the OggFlac problem and the refusal of FLAC files starting with an ID3 block rather than the Streaminfo one (those are invalid FLAC files anyway, but that’s beside the point).

I have to say, I never thought a container format would be as messed up as Ogg… every stream type has its own headers; in the case of FLAC, it uses FLAC’s own metadata blocks. And FLAC is a masochist format for audio… It’s a lossless codec, so the size of the encoded files is usually not that low, but the metadata blocks try to save a few bytes by not aligning the data structures to bytes. I went crazy trying to read a 20-bit value using bitfields or other similar approaches, and there’s no way: FLAC’s metadata blocks are not mappable onto a machine-independent C structure, which makes parsing quite hard.
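For reference, this is roughly what the shift-and-mask alternative to bitfields looks like, sketched in Python rather than C. In the STREAMINFO block the 20-bit sample rate is followed by a 3-bit channel count, a 5-bit sample size and a 36-bit sample count, so the four fields together fill exactly 8 bytes. The class and function names are mine, and the sample bytes are hand-built for the example, not taken from a real file.

```python
import struct
from typing import NamedTuple


class StreamInfo(NamedTuple):
    min_block_size: int
    max_block_size: int
    sample_rate: int
    channels: int
    bits_per_sample: int
    total_samples: int


def parse_streaminfo(body: bytes) -> StreamInfo:
    """Parse the 34-byte body of a FLAC STREAMINFO block."""
    if len(body) != 34:
        raise ValueError("STREAMINFO body must be 34 bytes")
    min_bs, max_bs = struct.unpack(">HH", body[:4])
    # Bytes 4..9 hold two 24-bit frame sizes; skipped in this sketch.
    # Bytes 10..17 pack 20 + 3 + 5 + 36 = 64 bits, big-endian.
    packed = int.from_bytes(body[10:18], "big")
    return StreamInfo(
        min_block_size=min_bs,
        max_block_size=max_bs,
        sample_rate=packed >> 44,
        channels=((packed >> 41) & 0x7) + 1,        # stored as count - 1
        bits_per_sample=((packed >> 36) & 0x1F) + 1,  # stored as bits - 1
        total_samples=packed & ((1 << 36) - 1),
    )


# Hand-built sample body: 44.1 kHz, stereo, 16-bit, one million samples.
packed = (44100 << 44) | (1 << 41) | (15 << 36) | 1_000_000
sample = (struct.pack(">HH", 4096, 4096) + bytes(6)
          + packed.to_bytes(8, "big") + bytes(16))
print(parse_streaminfo(sample))
```

The same idea carries over to C: read the 8 bytes into a 64-bit integer in big-endian order and shift, instead of trying to overlay a struct on the buffer.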

Anyway, I’ll try to complete this work as soon as I can, also because I need to return to my paid job as soon as I feel better… and because when I cannot do something because of the feverish mood I’m in, I start to get frustrated and depressed… so if I’m not able to get something working soon, I’ll start screaming in rage :P

On the bright side, John Palmieri announced the D-Bus 1.0 release, which hopefully will work out of the box on FreeBSD :)