Why prefixed ID3v2 tags are extremely evil

Some Amarok and xine users might remember that, with older xine-lib versions, some FLAC files didn’t play at all because a prefixed ID3v2 tag made the demuxer bail out. That problem was fixed in 1.1.3 and later versions.

Tonight a user on #amarok complained about another file not playing, but this time it was a raw AAC file (not an m4a file), a format that xine currently detects with a rather rough heuristic. I looked at the file the user provided and saw that it also had a similar tag, which made my spider sense tingle.

I tried just skipping over the ID3 tag, but that didn’t help. A deeper check showed me that the problem was nastier. The demuxer only checks a 4KB preview buffer; the tag was 1600 bytes long, which left only about 2500 bytes of actual audio data to check. That was not enough to find the two frames the demuxer was looking for. To solve this problem I added an extra check that skips over the ID3 tag if the stream is seekable; hopefully the tag is only present on files on disk, since most streams use HTTP headers to provide author and title.
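For illustration, here is a minimal Python sketch of that kind of skip. It is not the actual xine-lib code (which is C), and the names id3v2_tag_size and skip_id3v2 are made up for this example. It relies only on the ID3v2 header layout: the bytes "ID3", two version bytes, one flags byte, and a four-byte "syncsafe" size in which each byte carries only 7 bits. The size excludes the 10-byte header itself, and ID3v2.4 may append a 10-byte footer.

```python
import io

ID3V2_HEADER_SIZE = 10


def id3v2_tag_size(header: bytes) -> int:
    """Return the total length in bytes of a prefixed ID3v2 tag, or 0 if none."""
    if len(header) < ID3V2_HEADER_SIZE or header[:3] != b"ID3":
        return 0
    major_version = header[3]
    flags = header[5]
    size_bytes = header[6:10]
    # In a syncsafe integer the top bit of every byte must be clear.
    if any(b & 0x80 for b in size_bytes):
        return 0
    size = (
        (size_bytes[0] << 21)
        | (size_bytes[1] << 14)
        | (size_bytes[2] << 7)
        | size_bytes[3]
    )
    total = ID3V2_HEADER_SIZE + size
    # ID3v2.4 can flag a 10-byte footer that mirrors the header.
    if major_version >= 4 and flags & 0x10:
        total += ID3V2_HEADER_SIZE
    return total


def skip_id3v2(stream) -> int:
    """Position a seekable stream just past a leading ID3v2 tag.

    Returns the number of bytes skipped (0 if there is no tag or the
    stream cannot seek, in which case the stream is left untouched).
    """
    if not stream.seekable():
        return 0
    start = stream.tell()
    header = stream.read(ID3V2_HEADER_SIZE)
    size = id3v2_tag_size(header)
    stream.seek(start + size)
    return size


# Hypothetical example: a 1600-byte ID3v2.3 tag followed by fake audio data.
tag = b"ID3" + bytes([3, 0, 0]) + bytes([0, 0, 12, 54]) + b"\x00" * 1590
stream = io.BytesIO(tag + b"\xff\xf1audio")
print(skip_id3v2(stream))  # 1600
```

After the skip, the preview buffer can be filled starting at the first byte of real audio data, so the full 4KB is available for frame detection.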

So this is the summary of why prefixed ID3v2 tags are evil: when trying to detect the type of a stream, you usually look at the headers at the start of the data. When the tag is present, you need to read, or skip, a lot more bytes before getting to the proper data.