Show HN: Arc – secure file archiver

josho · on May 31, 2016

This is the first I've heard of ChaCha for encryption (https://en.wikipedia.org/wiki/Salsa20#ChaCha_variant).

Apparently it's a standard that Google is pushing to replace RC4, and Google is already using it for HTTPS between google.com and Android.

If arc catches on, I'm curious whether it could support inline operations. E.g. on a 100 GB+ archive, can I read the tar index without decrypting the entire archive first? Can I extract a single file? ChaCha is a stream cipher, which, as I understand it, suggests that I can't do operations like that.

Even worse, now that I'm thinking about it: if my archive has a bit error early in the file, does that mean the entire archive can't be decrypted? Maybe for long-term storage I'm better off physically securing my archives than encrypting them, to avoid bit rot ruining everything.
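For the unauthenticated part of the question: a stream cipher doesn't propagate errors. A flipped ciphertext bit flips exactly the same plaintext bit and nothing else. A minimal sketch, using random bytes as a stand-in for a real ChaCha20 keystream (ChaCha20 isn't in Python's standard library); `xor_bytes` is a hypothetical helper:

```python
import os

def xor_bytes(data, keystream):
    # A stream cipher encrypts and decrypts the same way: XOR with the keystream
    return bytes(d ^ k for d, k in zip(data, keystream))

plaintext = b"file1.txt: hello world\nfile2.txt: more data\n"
keystream = os.urandom(len(plaintext))  # stand-in for ChaCha20 keystream output
ciphertext = bytearray(xor_bytes(plaintext, keystream))

ciphertext[3] ^= 0x01  # simulate one flipped bit early in the archive

recovered = xor_bytes(bytes(ciphertext), keystream)
damaged = [i for i, (a, b) in enumerate(zip(plaintext, recovered)) if a != b]
print(damaged)  # [3]: only the byte holding the flipped bit differs
```

With authenticated encryption the picture changes: the integrity check over the covered data fails, and whether a tool still lets you recover the undamaged parts depends on how the format groups data under each tag.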

koolba · on May 31, 2016

> Maybe for long term storage I'm better off physically securing my archives than encrypting them to avoid bit rot ruining everything.

Just have more copies on more/diverse media. Encrypted backups (with authenticated encryption) have the pleasant side effect of validating the backup on restore.
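Why authenticated encryption doubles as a restore-time integrity check: the tag covers every ciphertext byte, so a single flipped bit makes verification fail. A minimal encrypt-then-MAC sketch using HMAC-SHA256 from Python's standard library as a stand-in (arc itself uses Poly1305, as noted further down the thread); `seal` and `verify` are hypothetical helpers, and random bytes stand in for the encrypted backup:

```python
import hashlib
import hmac
import os

def seal(mac_key, ciphertext):
    # Encrypt-then-MAC: append an HMAC-SHA256 tag computed over the ciphertext
    return ciphertext + hmac.new(mac_key, ciphertext, hashlib.sha256).digest()

def verify(mac_key, sealed):
    ciphertext, tag = sealed[:-32], sealed[-32:]
    expected = hmac.new(mac_key, ciphertext, hashlib.sha256).digest()
    # compare_digest avoids leaking how many leading bytes matched
    if not hmac.compare_digest(tag, expected):
        raise ValueError("backup failed authentication: corrupted or tampered")
    return ciphertext

mac_key = os.urandom(32)
backup = seal(mac_key, os.urandom(1024))  # random bytes stand in for encrypted data

corrupted = bytearray(backup)
corrupted[100] ^= 0x80  # one bit of rot

verify(mac_key, backup)  # the intact copy passes
try:
    verify(mac_key, bytes(corrupted))
except ValueError as err:
    print(err)
```

A failed check on one copy is the signal to restore from another, which is where having several copies on diverse media pays off.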

arete · on May 31, 2016

Indeed. One use case I have in mind is using the Shamir Secret Sharing mode to create N backups on separate flash drives stored in diverse locations.
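arc's Shamir mode isn't described in this thread, so the following is only the textbook scheme, as a sketch: the secret is the constant term of a random polynomial of degree k-1 over a prime field, each flash drive gets one point on it, and any k points recover the secret by Lagrange interpolation at x = 0. The function names, the prime and the share layout are illustrative assumptions, not arc's format:

```python
import secrets

# Field modulus: the Mersenne prime 2**521 - 1, large enough to hold a 256-bit key
PRIME = 2**521 - 1

def split_secret(secret, n, k):
    """Split `secret` into n shares; any k of them reconstruct it."""
    if not 0 <= secret < PRIME or not 1 <= k <= n:
        raise ValueError("bad parameters")
    # Random polynomial of degree k-1 whose constant term is the secret
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(k - 1)]
    shares = []
    for x in range(1, n + 1):
        y = 0
        for c in reversed(coeffs):  # Horner's rule, highest-degree coefficient first
            y = (y * x + c) % PRIME
        shares.append((x, y))
    return shares

def recover_secret(shares):
    """Lagrange-interpolate the polynomial at x = 0 from k distinct shares."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = num * -xj % PRIME
                den = den * (xi - xj) % PRIME
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret

key = int.from_bytes(secrets.token_bytes(32), "big")  # e.g. a 256-bit archive key
shares = split_secret(key, n=5, k=3)  # one share per flash drive
assert recover_secret(shares[:3]) == key
assert recover_secret([shares[4], shares[1], shares[2]]) == key
```

Fewer than k shares reveal nothing about the secret, so losing or leaking up to k-1 drives is tolerated.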

valarauca1 · on May 31, 2016

If you are seriously concerned about the effects of bit rot on long-term storage, I suggest you invest in magnetic tape, which can be insured against bit rot for >300 years. (Yes, IBM and Oracle sell insurance for this, but only on IBM tapes/decks, and they'll also have to inspect your storage facility.)

With highly specialized requirements come highly specialized solutions.

kakwa_ · on June 1, 2016

The tapes themselves could last 300 years (or, for more realistic common needs, around 30 years).

However the tape reader/recorder will probably not last that long. These things need special care (don't forget the cleaning tape every other month) and can act a little weird when the mechanical parts wear down (or even before in many cases).

The LTO Ultrium standard states that a tape recorder for generation N must read/write N-1 and read N-2. Given that a new generation comes out around every 3 years, that gives you around 10 years of a strong guarantee that you can actually recover your data.

To properly manage data over a longer period, you need to migrate it to new media. In fact, the problem then becomes mostly organizational.
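The arithmetic behind the "around 10 years" estimate, as a small sketch: under the compatibility rule stated in this comment (which describes the generations current in 2016), new drives can read a tape for two generations after its own, so drives that can read it stop being released about three generations, or roughly nine years, later. The 3-year cadence is the comment's approximation, and the function names are hypothetical:

```python
CADENCE_YEARS = 3  # approximation from the comment: a new LTO generation every ~3 years

def can_read(drive_gen, tape_gen):
    # Rule as stated in the comment: a generation N drive reads N, N-1 and N-2
    return 0 <= drive_gen - tape_gen <= 2

def can_write(drive_gen, tape_gen):
    # ...and writes N and N-1
    return 0 <= drive_gen - tape_gen <= 1

def years_until_no_new_reader(tape_gen):
    """Years after tape_gen's release until a generation ships that can't read it."""
    gen = tape_gen
    while can_read(gen, tape_gen):
        gen += 1
    return (gen - tape_gen) * CADENCE_YEARS

print(years_until_no_new_reader(6))  # 9
```

The remaining margin toward ten years comes from how long already-purchased drives keep working, which is exactly the maintenance problem described above.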

barsonme · on May 31, 2016

Streams can be encrypted and decrypted, even arbitrarily. Here's a good read by Adam Langley: https://www.imperialviolet.org/2014/06/27/streamingencryptio...
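Because ChaCha20 generates its keystream in 64-byte blocks indexed by a counter, decrypting from an arbitrary offset only requires computing the blocks that cover it. A minimal, unoptimized pure-Python sketch of the RFC 7539 block function (32-bit counter, 96-bit nonce), for illustration only: it is not constant-time, not audited, may not match the exact ChaCha variant arc uses, and provides no authentication. `chacha20_xor_at` is a hypothetical helper name:

```python
import struct

MASK32 = 0xFFFFFFFF

def _rotl32(v, c):
    return ((v << c) & MASK32) | (v >> (32 - c))

def _quarter_round(s, a, b, c, d):
    s[a] = (s[a] + s[b]) & MASK32
    s[d] = _rotl32(s[d] ^ s[a], 16)
    s[c] = (s[c] + s[d]) & MASK32
    s[b] = _rotl32(s[b] ^ s[c], 12)
    s[a] = (s[a] + s[b]) & MASK32
    s[d] = _rotl32(s[d] ^ s[a], 8)
    s[c] = (s[c] + s[d]) & MASK32
    s[b] = _rotl32(s[b] ^ s[c], 7)

def chacha20_block(key, counter, nonce):
    """One 64-byte keystream block: 4 constant words, 8 key words, counter, 3 nonce words."""
    state = list(struct.unpack("<4I", b"expand 32-byte k"))
    state += struct.unpack("<8I", key)
    state.append(counter & MASK32)
    state += struct.unpack("<3I", nonce)
    w = state[:]
    for _ in range(10):  # 20 rounds = 10 double rounds (columns, then diagonals)
        _quarter_round(w, 0, 4, 8, 12)
        _quarter_round(w, 1, 5, 9, 13)
        _quarter_round(w, 2, 6, 10, 14)
        _quarter_round(w, 3, 7, 11, 15)
        _quarter_round(w, 0, 5, 10, 15)
        _quarter_round(w, 1, 6, 11, 12)
        _quarter_round(w, 2, 7, 8, 13)
        _quarter_round(w, 3, 4, 9, 14)
    return struct.pack("<16I", *((x + y) & MASK32 for x, y in zip(w, state)))

def chacha20_xor_at(key, nonce, offset, data):
    """XOR data with the keystream starting at byte `offset` of the stream."""
    out = bytearray()
    block_index, skip = divmod(offset, 64)  # jump straight to the covering block
    while len(out) < len(data):
        ks = chacha20_block(key, block_index, nonce)[skip:]
        chunk = data[len(out):len(out) + len(ks)]
        out += bytes(x ^ y for x, y in zip(chunk, ks))
        block_index += 1
        skip = 0
    return bytes(out)

key = bytes(range(32))  # demo key; use a random 32-byte key in practice
nonce = bytes(12)       # demo nonce; never reuse a (key, nonce) pair
plaintext = bytes(range(256)) * 4
ciphertext = chacha20_xor_at(key, nonce, 0, plaintext)

# Decrypt bytes 700..709 alone: only keystream blocks 10 and 11 are computed
part = chacha20_xor_at(key, nonce, 700, ciphertext[700:710])
assert part == plaintext[700:710]
```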

stouset · on May 31, 2016

Your understanding is backwards. Stream ciphers are fundamentally compatible with random-access decryption, so yes this is entirely possible.

That said, authentication of a ciphertext is, in many ways, as important as encryption. So you would need to design the archive format so that each file's contents can be authenticated and decrypted individually, instead of authenticating the contents of the entire archive as a whole.
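One way such a format could look, as a sketch under stated assumptions: each entry gets its own tag, stored in an index next to its offset and length, so extracting one file means verifying and reading only that file's bytes. Entries are treated here as already-encrypted opaque bytes (a real format would also encrypt each entry, e.g. with its own nonce, and authenticate the index itself). HMAC-SHA256 and the names `build_archive` and `read_entry` are hypothetical choices, not arc's design:

```python
import hashlib
import hmac
import secrets

def _entry_tag(mac_key, name, data):
    # Bind the tag to the entry name so entries can't be swapped between slots
    # (assumes names contain no NUL bytes, since NUL is used as the separator)
    return hmac.new(mac_key, name.encode() + b"\x00" + data, hashlib.sha256).digest()

def build_archive(mac_key, entries):
    """entries: name -> already-encrypted bytes. Returns (blob, index)."""
    blob = bytearray()
    index = {}
    for name, data in entries.items():
        index[name] = (len(blob), len(data), _entry_tag(mac_key, name, data))
        blob += data
    return bytes(blob), index

def read_entry(mac_key, blob, index, name):
    """Verify and return one entry without touching the others."""
    offset, length, tag = index[name]
    data = blob[offset:offset + length]
    if not hmac.compare_digest(tag, _entry_tag(mac_key, name, data)):
        raise ValueError(f"entry {name!r} failed authentication")
    return data

mac_key = secrets.token_bytes(32)
blob, index = build_archive(mac_key, {"a.txt": b"ciphertext of a", "b.txt": b"ciphertext of b"})
damaged = bytearray(blob)
damaged[index["b.txt"][0]] ^= 0x01  # rot inside b.txt only
print(read_entry(mac_key, bytes(damaged), index, "a.txt"))  # a.txt still verifies
```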

cyphar · on May 31, 2016

ChaCha is also going to be the default for OpenSSH encryption (it's already supported by upstream, but I don't know if they've changed the default yet).

Pirate-of-SV · on May 31, 2016

Cool! I usually use gpg-zip for this purpose on machines where I have gpg installed.

  gpg-zip --symmetric --gpg-args --cipher-algo=AES256 --output backup.tar.gpg file1 file2 file3

lucaspiller · on May 31, 2016

What do you do to keep your GPG keys safe (i.e. so you don't accidentally lose them)?

Johnny_Brahms · on May 31, 2016

In that example he's using symmetric encryption, so there's a password, not a private key.

stouset · on May 31, 2016

Symmetric encryption is still based on secret keys, not passwords. Symmetric cryptosystems that appear to use passwords just transform them with a key derivation function into a suitable-length encryption key for the underlying cipher.
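A sketch of that password-to-key step, using PBKDF2-HMAC-SHA256 from Python's hashlib as an example KDF. GnuPG's symmetric mode uses OpenPGP's own iterated-and-salted S2K rather than PBKDF2, and the iteration count here is an illustrative choice; `derive_key` is a hypothetical helper:

```python
import hashlib
import os

def derive_key(password, salt, iterations=600_000):
    """Stretch a password into a 32-byte key suitable for a 256-bit cipher."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations, dklen=32)

salt = os.urandom(16)  # stored alongside the ciphertext; it need not be secret
key = derive_key("correct horse battery staple", salt)
assert len(key) == 32
assert key == derive_key("correct horse battery staple", salt)  # same inputs, same key
assert key != derive_key("correct horse battery staple", os.urandom(16))  # new salt, new key
```

The cipher only ever sees `key`; the high iteration count exists to make each password guess expensive for an attacker.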

LeoPanthera · on May 31, 2016

This seems counter to the "unix philosophy". I would expect to use an archiving tool piped into an encryption tool. I'm not sure of the utility of something that combines the two.

stavros · on May 31, 2016

Can anyone tell me what advantage tgz has over zip? I usually curse when I have to use it, because it lacks indexes and is pretty much only good for archival tapes, if that. I wish we'd all move to a more modern format, like zip or 7zip.

hannob · on May 31, 2016

AFAIK neither zip nor 7z stores Unix permissions, file ownership, symlinks and a bunch of other things.

Other than that, the "modern" version of it is .tar.xz.
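For comparison, tar headers carry that Unix metadata directly. A sketch with Python's tarfile module that writes a .tar.xz in memory containing an executable script with an owner name and a symlink to it, then reads them back; the member names and owner are made up:

```python
import io
import tarfile

def build_tar_xz():
    """Build a small .tar.xz in memory with a script, an owner name and a symlink."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:xz") as tar:
        payload = b"#!/bin/sh\necho hello\n"
        script = tarfile.TarInfo("bin/hello.sh")
        script.size = len(payload)
        script.mode = 0o755  # permission bits are stored in the header
        script.uname = script.gname = "alice"
        tar.addfile(script, io.BytesIO(payload))

        link = tarfile.TarInfo("bin/hi")
        link.type = tarfile.SYMTYPE  # stored as a symlink, not a copy
        link.linkname = "hello.sh"
        tar.addfile(link)
    return buf.getvalue()

with tarfile.open(fileobj=io.BytesIO(build_tar_xz()), mode="r:xz") as tar:
    script = tar.getmember("bin/hello.sh")
    link = tar.getmember("bin/hi")
    print(oct(script.mode), script.uname, link.issym(), link.linkname)
    # 0o755 alice True hello.sh
```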

aroch · on June 1, 2016

I believe that, technically, you can store that data in a zip's metadata headers. It's just that pretty much every zip/unzip implementation either doesn't support it or breaks permission restoration across OSes.
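The usual place for this is the central directory's "external file attributes" field: when the "version made by" host is Unix, the high 16 bits conventionally hold the st_mode value. A sketch with Python's zipfile module, with a made-up member name:

```python
import io
import stat
import zipfile

buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    info = zipfile.ZipInfo("bin/hello.sh")
    info.create_system = 3  # 3 = Unix in the "version made by" field
    info.external_attr = (stat.S_IFREG | 0o755) << 16  # st_mode in the high 16 bits
    zf.writestr(info, b"#!/bin/sh\necho hello\n")

with zipfile.ZipFile(buf) as zf:
    mode = zf.getinfo("bin/hello.sh").external_attr >> 16
    print(oct(stat.S_IMODE(mode)))  # prints 0o755
```

Whether an extractor honors these bits is up to the tool, which is where the cross-OS breakage comes from.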

qwertyuiop924 · on June 1, 2016

Zip compresses each file separately. A folder full of zip files, each containing a single file, should take up roughly the same space as a zip archive containing all of the files. tgz doesn't do this, so it has a higher compression ratio for large collections of files. However, you cannot extract a single file from a tgz archive without decompressing everything stored before it. This is why zip was designed the way it was.
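A rough way to see the trade-off with Python's zlib (DEFLATE, the algorithm behind gzip and most zip files): compress similar files one at a time, zip-style, versus as one stream, tar.gz-style. DEFLATE only looks back 32 KB, so in a .tar.gz the savings come from similar files stored near each other; xz uses a much larger dictionary. The sample data is made up:

```python
import zlib

def per_file_size(files):
    # zip-style: each member is compressed independently
    return sum(len(zlib.compress(data, 9)) for data in files)

def solid_size(files):
    # tar.gz-style: one DEFLATE stream over all members concatenated
    return len(zlib.compress(b"".join(files), 9))

# Hypothetical sample: 20 small files sharing a common header
header = b"".join(b"Licensed under the example license, clause %d\n" % i for i in range(40))
files = [header + b"unique payload %d\n" % n for n in range(20)]
print(per_file_size(files), solid_size(files))
```

The solid stream encodes the shared header once and back-references it afterwards, while per-file compression pays for it in every member.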

netheril96 · on June 1, 2016

zip (the version used most commonly) does not support Unicode. I have a lot of zip files created by people on Windows that turn into nonsensical filenames when extracted on Linux or OS X.
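The zip format did gain a fix for this: general-purpose flag bit 11 marks a member's name as UTF-8. Without it, names are supposed to be IBM code page 437, though in practice many Windows tools wrote the local code page, which is what produces garbled names elsewhere. Python's zipfile module sets the bit when a name isn't plain ASCII, which this sketch checks; the file names and the helper `stored_flag_bits` are made up:

```python
import io
import zipfile

UTF8_NAME_FLAG = 0x800  # general-purpose bit 11: the file name is UTF-8

def stored_flag_bits(filename):
    """Write a one-member zip in memory and return the member's flag bits."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr(filename, b"data")
    with zipfile.ZipFile(buf) as zf:
        return zf.infolist()[0].flag_bits

print(bool(stored_flag_bits("résumé.txt") & UTF8_NAME_FLAG))  # True
print(bool(stored_flag_bits("report.txt") & UTF8_NAME_FLAG))  # False
```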

rsync · on May 31, 2016

It's not clear from the README... is this a client/server app, wherein I need to have 'arc' living on the server side?

Or can I just point arc at SSH/SFTP and the server can be "dumb"?

qwertyuiop924 · on June 1, 2016

The name arc has namespace collision problems. Not in software in general (I never complained about somebody calling their language Elm - that's Cantrill's job), but in archivers in particular. ARC was the format that directly preceded zip, PKZIP being the program that PK (Phil Katz) started selling after it was discovered that PKARC's source code had been copied verbatim from the source of the original ARC utility.

rsync · on June 1, 2016

Can the author compare and contrast arc with 'borg', which, it appears, has become the de facto standard for this kind of work?

arete · on June 1, 2016

I hadn't heard of borg before, but it appears to be a backup program. arc is a file archiver, like tar or zip.

cyphar · on May 31, 2016

Why use this over tarsnap's client (which is also free software and has features like local deduplication)?

lfam · on May 31, 2016

Tarsnap's client is not free software:

"2. You may use the Tarsnap client code for the sole purpose of accessing the service."

http://www.tarsnap.com/legal.html

grep4master · on May 31, 2016

Just a couple of commits from one contributor. I'll keep an eye on it; hope it gets more traction.

valarauca1 · on May 31, 2016

What HMAC is used to verify the contents of the archive before decryption takes place?

tekacs · on May 31, 2016

The MAC (not HMAC) used [0] is Poly1305 [1].

[0]: https://github.com/wg/arc/blob/cdd799359b6f7050fc5e3aa128bf1...

[1]: https://tools.ietf.org/html/draft-agl-tls-chacha20poly1305-0...
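Poly1305 is small enough to sketch: it reads the message as 16-byte little-endian chunks (each with a 0x01 byte appended), evaluates them as a polynomial at a secret point r modulo 2^130 - 5, and adds a second secret s. A minimal pure-Python sketch following RFC 7539 section 2.5, for illustration only (not constant-time). The 32-byte key is one-time: in the ChaCha20-Poly1305 construction it is derived fresh for each message from the ChaCha20 keystream. `poly1305_mac` is a hypothetical helper name:

```python
import hmac

P1305 = (1 << 130) - 5
CLAMP = 0x0FFFFFFC0FFFFFFC0FFFFFFC0FFFFFFF

def poly1305_mac(key, msg):
    """16-byte Poly1305 tag of msg under a 32-byte one-time key."""
    r = int.from_bytes(key[:16], "little") & CLAMP  # clamped evaluation point
    s = int.from_bytes(key[16:32], "little")        # final additive mask
    acc = 0
    for i in range(0, len(msg), 16):
        # Each chunk is read as a little-endian number with a 0x01 byte appended
        n = int.from_bytes(msg[i:i + 16] + b"\x01", "little")
        acc = (acc + n) * r % P1305
    return ((acc + s) % (1 << 128)).to_bytes(16, "little")

one_time_key = bytes(range(32))  # demo key; must never be reused across messages
tag = poly1305_mac(one_time_key, b"archive contents")
# Verification recomputes the tag and compares in constant time
assert hmac.compare_digest(tag, poly1305_mac(one_time_key, b"archive contents"))
```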

spilk · on May 31, 2016

Seems like it would be more unix-like to just pipe tar into gpg.

Longhanks · on May 31, 2016

[flagged]

pimlottc · on May 31, 2016

In fact, there's already an archiver named ARC from the BBS days:

https://en.wikipedia.org/wiki/ARC_(file_format)

liw · on May 31, 2016

Not only that, there's a historical program called Arc that is also a file archiver/compressor. It led to PKZIP.

liw · on June 1, 2016

Come to think of it, a variant of it is still packaged in Debian, package is called "arc".

Incidentally, for anyone choosing a name, the namecheck program from the Debian devscripts package is handy for checking if a particular name is already in use. Not perfect, but handy anyway.

ejcx · on May 31, 2016

I think it's a great name.

The author probably has this as a binary living in their PATH so they can just type

    $ arc <args>

It might be obvious, but who wants a long or strange name for a command-line utility they use often?

netheril96 · on June 1, 2016

When the name conflicts with another command line utility?

_1tan · on May 31, 2016

http://arclanguage.org

sophiebits · on May 31, 2016

Phabricator's command-line tool "Arcanist" is called arc:

https://secure.phabricator.com/book/phabricator/article/arca...

rmtew · on May 31, 2016

I used the arc archiver mentioned by others, back in the 90s if not earlier. I've probably still got archives on a disk somewhere.

todd8 · on May 31, 2016

And don't forget Arq.

jerrysievert · on May 31, 2016

ArcGIS, ArcPy, all sorts of collisions in the GIS namespace.

ddorian43 · on May 31, 2016

I'm starting to see this way too often. No one cares.

thesimon · on May 31, 2016

Sort of agree, but the title made me think of https://www.arqbackup.com

Quite a close field, to be honest.

jjnoakes · on May 31, 2016

Clearly at least one person cares, or he wouldn't have commented about it.

Not that the comment was useful.

But neither was yours.

jsmthrowaway · on May 31, 2016

What does that make yours, then?

jjnoakes · on May 31, 2016

Mine was useless too, was it not obvious?

Welcome to the club!