I just read Klausman’s entry comparing compression algorithms, and while I’m no expert at all in the field, I wanted to talk a bit about it myself, from a power user’s point of view.
Tobias’s benchmarks are quite interesting, although quite similar in nature to many others you can find out there comparing lzma to gzip and bzip2. One thing I’m glad he made explicit is that lzma is a good fit when you decompress more often than you compress. This is something a lot of people tend to skip over, with results I’d call quite catastrophic.
Keeping this in mind, you can see that lzma is not really a good fit when you compress as often as (or more often than) you decompress. When that happens is the central point here. You certainly expect a backup system to compress a lot more than it decompresses: you take daily (or more frequent) backups, but the hope is to never need to restore one of them. For Gentoo users, another place where compression happens more often than decompression is man pages and documentation. They are compressed every time you merge something, but you don’t tend to read all the man pages and all the documentation every day; I’m sure most users never read most of the documentation that gets compressed and installed. Additionally, lzma does not seem to perform as well on smaller files, so I don’t think it’s worth the extra time needed to compress the data.
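To get a feeling for the tradeoff yourself, a quick and dirty check is easy to script. The sketch below uses Python’s standard gzip, bz2 and lzma modules (so it exercises the library bindings rather than the command-line tools, and takes a single measurement, nothing like a proper benchmark), comparing compressed size and compression time for whatever file you pass on the command line. Running it on a big tarball and then on something small, like a single man page, should show how differently the three behave.

```python
#!/usr/bin/env python3
"""Rough size/time comparison of gzip, bzip2 and lzma on a single file.

Uses the standard-library bindings and a single run, so treat the
numbers as indicative only.
"""
import bz2
import gzip
import lzma
import sys
import time


def bench(name, compress, data):
    """Compress `data` once and print size, ratio and elapsed time."""
    start = time.perf_counter()
    compressed = compress(data)
    elapsed = time.perf_counter() - start
    ratio = len(compressed) / len(data)
    print(f"{name:6s} {len(compressed):12,d} bytes  ratio {ratio:.3f}  {elapsed:7.2f} s")


def main(path):
    data = open(path, "rb").read()
    print(f"input  {len(data):12,d} bytes")
    bench("gzip", lambda d: gzip.compress(d, compresslevel=9), data)
    bench("bzip2", lambda d: bz2.compress(d, compresslevel=9), data)
    bench("lzma", lambda d: lzma.compress(d, preset=9), data)


if __name__ == "__main__":
    main(sys.argv[1])
```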
One thing that Tobias’s benchmark has in common with the other benchmarks about lzma I’ve seen is that it doesn’t take memory usage much into consideration. Alas, valgrind removed the massif graphs that gave you the exact memory footprint of a process; it would have been quite interesting to see them here. I’d expect lzma to use a lot more memory than bzip2 to be so quick at decompression, and that would make it particularly bad on older systems and in embedded use cases, where one might be interested in saving flash (or disk) space.
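Short of a massif graph, a rough approximation is to run each decompressor in its own process and look at the peak RSS the kernel reports for it. The sketch below does that with Python’s bz2 and lzma modules, so it measures the library code plus the interpreter’s own overhead rather than the standalone tools; the sample.tar.bz2 and sample.tar.xz names are just placeholders for two archives of the same content, and only the difference between the two numbers is really meaningful.

```python
#!/usr/bin/env python3
"""Rough peak-memory comparison of bzip2 vs lzma decompression.

Each codec runs in a separate child process so resource.getrusage()
reports a per-codec peak RSS. The interpreter's own footprint is
included, so compare the two numbers rather than reading them as
absolute codec costs. On Linux ru_maxrss is in KiB (macOS uses bytes).
"""
import bz2
import lzma
import multiprocessing as mp
import resource

CODECS = {"bzip2": bz2, "lzma": lzma}


def _measure(codec_name, path, queue):
    """Decompress the file in this child process and report peak RSS."""
    data = open(path, "rb").read()
    CODECS[codec_name].decompress(data)
    queue.put(resource.getrusage(resource.RUSAGE_SELF).ru_maxrss)


def peak_rss(codec_name, path):
    queue = mp.Queue()
    proc = mp.Process(target=_measure, args=(codec_name, path, queue))
    proc.start()
    result = queue.get()
    proc.join()
    return result


if __name__ == "__main__":
    # Placeholder archives holding the same content; substitute your own.
    for name, path in (("bzip2", "sample.tar.bz2"), ("lzma", "sample.tar.xz")):
        print(f"{name}: peak RSS ~{peak_rss(name, path)} KiB")
```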
As for GNU’s choice to stop providing bzip2 files and to offer only gzip- or lzma-compressed tarballs, I’m afraid the decision was as much political as technical, if not more so. Both zlib (for gzip) and bzip2 (with its libbz2) have very permissive licenses, which makes them ideal even for proprietary software, or for free software released, in turn, under permissive licenses like the BSD license. lzma-utils is still free software, but under a more restrictive license, the LGPL-2.1.
While the LGPL still allows proprietary software to link the library dynamically, it is more restrictive and will likely turn away some proprietary software developers. I suppose this is what the GNU project wants anyway, but I still find it a political choice, not a technical one. It also has an effect on users: to prepare a GNU-like environment on a proprietary system, like for instance Solaris, one has to either use the bigger gzip version or install lzma-utils as well.
I’m sincerely not convinced by lzma myself. It takes way too much time during compression for me to find it useful for backups, which are my main compression task, and I’m uncertain about its memory use. The fact that bsdtar doesn’t support it directly yet is also a bit of a turn-off for me, as I’ve grown used to not needing three processes to extract a tarball. Doug’s concerns about the on-disk format also make it unlikely that I’ll start using it.
Sincerely, I’m more concerned with the age of tar itself. While there are ways to add to tar things it wasn’t originally designed for, the fact that changing an archive means fully decompressing it and then re-compressing it makes it pretty much impossible to use as a desktop compression method the way the rar, zip and ace formats (and 7z, somewhat, though as far as I can see you cannot remove a file from an archive) are on Windows. I always found it strange that the only widespread archive format supporting Unix information (permissions, symlinks and so on) is the one that was designed for magnetic tapes and is thus sequential by nature…
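To make the point about rewriting concrete, here is a small illustration with Python’s tarfile module and made-up file names: appending a member to an uncompressed archive works in place, but append mode is simply refused for a compressed one, so adding or removing anything from a .tar.gz means extracting and rebuilding the whole archive.

```python
#!/usr/bin/env python3
"""A plain tar can be appended to in place; a compressed tarball cannot.

File names below are made up; tarfile creates plain.tar if it is missing.
"""
import io
import tarfile


def add_member(tar, name, payload):
    """Add an in-memory file to an open tar archive."""
    info = tarfile.TarInfo(name)
    info.size = len(payload)
    tar.addfile(info, io.BytesIO(payload))


# Appending to an uncompressed tar works without touching existing members.
with tarfile.open("plain.tar", "a") as tar:
    add_member(tar, "note.txt", b"appended in place\n")

# The compressed variants refuse append mode outright, so any change
# means decompressing and re-creating the whole archive.
try:
    tarfile.open("archive.tar.gz", "a:gz")
except ValueError as err:
    print("append to .tar.gz not supported:", err)
```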
Granted, tar being sequential probably makes it more interesting for backing up to a flash card (and I should be doing that, by the way), but I don’t see it as very useful for compressing a bunch of random files with data in them… Okay, one of the most common uses for desktop compression has been squeezing Microsoft Office’s hugely bloated files, and both OpenOffice and, as far as I know, newer Office versions now use zip files to hold their XML, but I can still see a couple of cases where I could use a desktop compression tool from time to time…