The first two issues -- the lack of an index and the fact that you can't seek within a deflated tarball -- are true but easily handled by smarter compression. Tarsnap, for example, splits off archive headers and stores them separately in order to speed up archive scanning.
The third issue -- lack of support for modern filesystem features -- is just plain wrong. Sure, the tar in 7th edition UNIX didn't support these, but modern tars support modern filesystem features.
The fourth issue -- general cruft -- is correct but irrelevant on modern tars since the problems caused by the cruft are eliminated via pax extension headers.
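For example, GNU tar will happily emit the pax format on request (bsdtar supports it too); a quick sketch, with a placeholder directory name:

    # pax extended headers lift the old ustar limits (100-char names,
    # 8 GB member size, 1-second timestamps, small uid/gid fields)
    tar --format=pax -cf modern.tar project/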
The guy immediately loses credibility in my eyes for referring to the most popular archive format as 'WinZip'. It's the ZIP file format, designed by Phil Katz of PKWARE Inc.
What this article describes has already been solved with zip, gzip, 7z, bzip2, and forks of tar.
The problem is that at the moment there is no open standard (there are IETF proposals), since each of these is patent-, copyright-, or trademark-encumbered.
> Because tar does not support encryption/compression on the inside of archives.
Yes it does? Just encrypt/compress all the files before tarring.
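A minimal sketch of the per-file approach (filenames are placeholders; gzip -k needs a reasonably recent gzip):

    # compress and encrypt each file individually, then tar the results;
    # gpg -c does symmetric encryption and prompts for a passphrase
    for f in *.txt; do
        gzip -k "$f"          # -k keeps the original
        gpg -c "$f.gz"        # produces "$f.gz.gpg"
    done
    tar -cf archive.tar *.gz.gpg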
> Not indexed
The reason tar doesn't have an index is so that tarballs can be concatenated. Also IIRC, you only have to jump through the headers for all files. Still O(n) where n is the number of files, but you don't have to scan through all of the data.
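That hop-by-hop scan is easy to see by hand; a rough bash sketch, assuming an uncompressed plain-ustar tarball (no GNU long-name or pax entries) and GNU stat:

    offset=0
    total=$(stat -c %s archive.tar)
    while [ "$offset" -lt "$total" ]; do
        # bytes 0-99 of each 512-byte header hold the member name
        name=$(dd if=archive.tar bs=1 skip="$offset" count=100 2>/dev/null | tr -d '\0')
        [ -z "$name" ] && break          # two zero blocks mark end-of-archive
        # bytes 124-135 hold the size as octal text
        size=$(dd if=archive.tar bs=1 skip=$((offset + 124)) count=12 2>/dev/null | tr -d '\0 ')
        size=$((8#$size))
        echo "$name ($size bytes)"
        # jump over the header plus the data, rounded up to 512 bytes
        offset=$((offset + 512 + ((size + 511) / 512) * 512))
    done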
> The reason tar doesn't have an index is so that tarballs can be concatenated.
I'm curious, what's the use-case for this? Offhand, the only use for that ability I can think of is if I forgot a file in a tarball and have already deleted the originals; I can tar the missing file and cat the two tarballs.
Don't think files, think tapes. Tar stands for Tape ARchive and was originally used primarily for backing up to tapes. When working with tapes, where deleting and re-writing archives is basically impossible, concatenating an archive onto the end of an already backed-up archive to create a new, updated archive is very useful.
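For the file case it looks something like this (with GNU tar; -A is --concatenate and -i is --ignore-zeros):

    tar -cf backup.tar project/           # the original archive
    tar -cf missed.tar forgotten-file     # the file you forgot
    tar -Af backup.tar missed.tar         # splice the second archive on
    # or, since tar is just a stream of 512-byte records, plain cat
    # works too if the reader is told to skip the zero-block padding:
    cat backup.tar missed.tar > combined.tar
    tar -tif combined.tar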
Compressing before tarring is a really dumb idea and you will get terrible compression ratios - you cannot exploit data patterns across files. It could work if you could ask gzip to write some sort of global table...
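Easy to demonstrate on a directory of similar files (names are placeholders; exact ratios depend on the data):

    # whole-archive compression: gzip sees redundancy across files
    tar -cf - logs/ | gzip > whole.tar.gz
    # per-file compression: each gzip stream starts from scratch
    gzip -k logs/*
    tar -cf perfile.tar logs/*.gz
    ls -l whole.tar.gz perfile.tar   # whole.tar.gz is usually much smaller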
You also make a bad block potentially affect every file following it. Whereas if you compress pre-tar, you can find the next file boundary and recover the rest.
I think raising these concerns is fair in a world where nearly all Unix-related source code and binaries are distributed in (g/bzipped) TAR format. Unfortunately, the author never really explains why this is, or what is wrong with ZIP (i.e. why a new format is needed).
TAR is old, however, and if ZIP cannot take its place, coming up with something new is not such a bad idea. I think Apple's DMG/UDIF file format deserves to be mentioned as well: it addresses all the concerns raised (it is essentially a mountable filesystem). I'm pretty sure there is a lot to be learned from that.
"... Because tar does not support encryption/compression on the inside of archives ..."
That can be an advantage. Space isn't always what I want from backups - I want the original data back, and compression gone wrong (tar -zxvf) is just another way to lose data.
That is exactly why the lack of in-archive compression is bad: with a compressed tarball you lose the whole rest of the archive on a single bit error, while with in-archive compression you lose just the file the error is located in.
The pkzip format allows you to "zip" data uncompressed if you are worried about that. Then you can trivially unpack your files using nothing but seek and read for those cases where you also accidentally misplace your last copy of unzip.
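With Info-ZIP that's compression level 0, i.e. store:

    zip -0 archive.zip report.txt data.csv   # members are stored, not deflated
    unzip -v archive.zip                      # Method column should read "Stored"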
It also has a file description (the local header) before each file's data, plus a second copy of all of them in the central directory at the end of the whole archive, allowing fast listing of the contents and fast location of any member inside the archive, exactly what the article asks for.
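That's why listing a huge zip is near-instant: the tool seeks to the end and reads only the central directory, never the file data:

    unzip -l big.zip     # names, sizes, dates
    zipinfo big.zip      # a more ls-like listing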
The only thing pkzip doesn't cover in the original format is unix/linux-specific metadata, but maybe this was/can be added. I use Info-ZIP when the metadata doesn't matter but tar when it does (though even tar has its limitations when working with unix/linux metadata).
pkzip does reserve the possibility of an arbitrary-length extra field attached to each file. According to the spec (http://www.pkware.com/documents/casestudies/APPNOTE.TXT) this is for "additional information...for special needs or for specific platforms". All compatible zip tools are required to ignore any information in this field that they don't understand, so you can basically write whatever you want there (although the spec does offer a recommended format for the field). So if you write a special ACL-preserving zip implementation, you can still unpack the file with any other zip implementation that knows nothing of your special version.
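Info-ZIP already uses extra fields for Unix metadata; zipinfo's verbose mode will show them (output details vary by version, so treat this as a sketch):

    zip unixy.zip some-script.sh
    zipinfo -v unixy.zip | grep -i 'extra field'
    # and zip -X strips those extra attributes if you want a bare archive
    zip -X plain.zip some-script.sh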
Well, the existing Linux package managers aren't really safer as far as the archive formats go; for example, .debs can run arbitrary shell scripts during installation. The main thing that seems to add to the safety is the social practice of grabbing debs via trusted repositories using apt-get/aptitude/synaptic, rather than manually downloading them from random sites and doing dpkg -i. But if there is malware, it's even worse, because at least these shar installers are usually installed as non-root, while installing a .deb needs root.
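You can inspect those hooks before installing; a quick sketch with dpkg-deb (package name is a placeholder):

    # a .deb is an ar archive holding a control tarball and a data tarball;
    # -e (--control) extracts the maintainer scripts without installing
    dpkg-deb -e some-package.deb ./control-dir
    ls ./control-dir                 # preinst/postinst/prerm/postrm, if any
    cat ./control-dir/postinst       # the shell run as root at install time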
Commercial games with Linux versions often still use this (or a variant). Not too sure why; perhaps because it's the closest Linux equivalent to the self-extracting installer archives they use on Windows?
When dealing with users who may not be completely familiar with package management, and creating a single cross-platform package, it can be very helpful to prevent the data and its installer from being separated. It's a very simple way to bundle some logic with an archive as well, as the script can be modified after it is generated.
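The underlying trick is small; a minimal sketch of such a self-extracting script, with a made-up marker line:

    #!/bin/sh
    # installer.sh -- everything below the __ARCHIVE__ marker is a tar.gz,
    # appended after the fact with: cat payload.tar.gz >> installer.sh
    set -e
    SKIP=$(awk '/^__ARCHIVE__$/ {print NR + 1; exit}' "$0")
    tail -n +"$SKIP" "$0" | tar -xzf - -C "${1:-.}"
    # any bundled post-install logic would run here
    exit 0
    __ARCHIVE__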