Some personal comments about Google’s WebM

So, Google finally announced what people have been calling for: the release of the VP8 codec as free software, and as a free format. VP8 was developed by On2, the company behind VP3 (from which Theora is derived), which Google acquired a while ago.

Now, Dark Shikari of x264 fame dissected the codec and in part the file format; his words are – not unexpectedly, especially for those who know him – quite harsh, but as Mike put it “This open sourcing event has legs.”

It bothers me, though, that people dismissed Jason’s comments as “biased FUD” from the x264 project. Let’s set aside the fact that, in my opinion, developers who insist that other FLOSS projects are spreading FUD about theirs are either paranoid or are simply labelling real concerns as FUD.

Sure, nobody denies that Jason is biased by his work on x264; I’m pretty sure he’s proud of what they have accomplished with that piece of software. On the other hand, his post is well-informed and – speaking as somebody who has been reading his comments for a while – not as negative as people write it off to be. Sure, he keeps repeating that VP8 is not on a technical par with H.264, but can you say he’s wrong? I don’t think so; he documented pretty well why he thinks that. He also has quite a few negative comments about the encoder code they released, but again, that’s nothing strange, especially given the high code-quality standard that FFmpeg and x264 have accustomed us to.

Some people even went as far as saying that he’s spreading FUD by agreeing with MPEG-LA about the chance that some patents still apply to VP8. Jason, as far as I know, is not a lawyer – and I’d probably challenge any general lawyer to look at the specs and the patents and give a perfect dissection of whether they apply – but I would, in general, take his doubts seriously. Not that this changes much here, I guess.

To put the whole situation into perspective, let’s try to guess what Google’s WebM is all about:

  • getting a better – in the technical meaning of the term – codec than H.264; or
  • getting an acceptable Free codec, sidestepping Theora and compromising with H.264.

Without agreeing on one or the other, there is no way to tell whether WebM is good or not. I’ll start by dismissing the first option. VP8 is not something new; they didn’t develop it in the year or so since the acquisition of On2. It had been in the works for years already, and is more or less the same age as H.264 — as easily demonstrated by the fact that Adobe and Sorenson are ready to support it since day one; had it been brand new, that would have been impossible.

Jason points out weaknesses in the format (ignore the encoder software for now!), such as the lack of B-frames and lower quality than the best H.264 profiles can deliver. I don’t expect those comments to come as news to the Google people who worked on it (unless they are in denial); most likely, they knew they weren’t going to shoot H.264 down with this, and accepted the compromise.

He also points out that some of the features are “copied” from H.264. That is most likely true, but there is a catch: while I’m not a lawyer, I remember reading that if you implement the same algorithm a patent describes while avoiding parts of its claims, you’re not infringing upon it; if that’s the case, they might have been looking at those patents and explicitly trying to engineer around them. Also, if patents have a minimum of common sense, once a patent describes an algorithm, patenting an almost identical one shouldn’t be possible; that would cover VP8 as long as it stays near enough, but not too near, to H.264. But this is pure conjecture on my part, based on bits and pieces of information I have read in the past.

Some features that could have greatly improved compression, like B-frames, have been avoided. Did they just forget about them? Unlikely; they probably decided B-frames weren’t something they needed. One likely reason is that they wanted to avoid the (known) patent on B-frames, as Jason points out; the other is that they might simply have decided that the extra disk space and bandwidth from forgoing B-frames was an acceptable downside in exchange for a format that is simpler to process in software on mobile devices — because in the immediate future, no phone is going to decode this format in hardware.

Both Jason and Mike point out that VP8 is definitely better than Theora; that is more than likely, given that its algorithms had a few more years to mature. This also suggests that Google didn’t consider Theora good enough for their needs — as most multimedia geeks have been saying all along. Similarly, they rejected the idea of using Ogg as the container format while accepting Vorbis; does that tell you something? It does to me: they needed something that actually worked (and yes, that’s a post from just shy of three years ago I’m linking to), not only something that was Free.

I have insisted for a long time that the right Free multimedia container format is Matroska, not Ogg; I speak from the point of view of a developer who fought long with demuxers in xine (because xine does not use libavformat for demuxing, so we have our own demuxers for everything), who actually read through the Ogg specification and was scared. The fact that Matroska parallels most of the QuickTime Format/ISO Media/MP4 features is one very good reason for that. I’m happy to see that Google agrees with me…

Let me comment a bit on their decision to rebrand the container and reduce it to a subset of Matroska’s features. I honestly haven’t looked at the spec for the format, so I have no idea which subset that is; I read that they skipped things like subtitles (which sounds strange, given that YouTube does support them), but I haven’t read anybody reporting that they did anything in an incompatible way. In general, selecting a subset of another format’s features is a good way to keep decoding easy: any decoder able to read the super-set format (Matroska) will handle the reduced one properly. The burden falls on the muxer (encoder) software, though, which has to decide which features to use and which to leave out.

The same has been true for the QuickTime format; generally speaking, the same demuxer (and muxer) can be shared to deal with QuickTime files (.mov), MPEG-4 files (.mp4), iTunes-compatible audio and video files (.m4a, .m4v, .m4b), 3GPP files (.3gp) and so on. Unfortunately, here you don’t have a clean super-/sub-set split; you actually get different dialects of the same format, each slightly different from the others. I hope Google will be able to avoid that!

Let me share some anecdotal evidence of problems with these formats, something that really happened to me. You might remember I wrote that a friend of mine directed a movie last year; on the day of the first screening, he exported the movie from Adobe Premiere to MPEG-4 format, then went to create a DVD with iDVD on his MacBook – please, no comments on the software choice, not my call – but… surprise! The movie was recorded, and exported, in 16:9 (widescreen) ratio, but iDVD was seeing it as 4:3 (standard)!

The problem was the digital equivalent of a missing anamorphic lens: the widescreen PAL 576i format uses non-square pixels, so in addition to the frame size in pixels, the container file needs to describe the pixel aspect ratio (16:15 for 4:3, 64:45 for widescreen). The problem is that the various dialects use different “atoms” to encode this information, and iDVD is unable to fetch it the way Adobe writes it. Luckily, FFmpeg saved the day: nine seconds of processing with FFmpeg, remuxing the file into the iTunes-compatible QuickTime-derived format, solved the issue.
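The arithmetic behind that mismatch is simple enough to sketch. Here is a small Python illustration of how the display aspect ratio falls out of the stored frame size and the pixel aspect ratio; the `display_aspect` helper is my own invention for this post, not something from FFmpeg or iDVD:

```python
from fractions import Fraction

def display_aspect(width: int, height: int, par: Fraction) -> Fraction:
    """Display aspect ratio = storage aspect ratio * pixel aspect ratio."""
    return Fraction(width, height) * par

# PAL 576i frames are stored as 720x576 pixels no matter the intended display;
# only the pixel aspect ratio (PAR) in the container tells the two cases apart.
print(display_aspect(720, 576, Fraction(16, 15)))  # 4:3 "standard" PAR
print(display_aspect(720, 576, Fraction(64, 45)))  # 16:9 widescreen PAR
```

If the demuxer cannot find the PAR atom, it effectively falls back to square pixels, and a 720×576 widescreen file gets shown at 5:4 – close enough to 4:3 that iDVD treats it as such.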

This is why with these formats a single, adapting demuxer can be used — but a series of different muxers is needed. As I said, I sure hope Google will not make WebM behave the same way.

Besides that, I’m looking forward to the use of WebM: it would be a nice way to replace the (to me, useless) Theora with something that, even though not perfect, comes much nearer. The (lousy) quality of the encoder does not scare me; as Mike said, at some point FFmpeg will get from-scratch decoders and encoders, which will strive for the best results. Incidentally, I’d say that x264 is one of the best encoders precisely because it is not proprietary software: proprietary developers tend to care only about having something working, while Free Software developers want something that works well, something they can be proud of.

Ganbatte, Google!
