Show HN: Arc – secure file archiver (github.com/wg)
72 points by arete on May 31, 2016 | 43 comments



This is the first that I've heard of ChaCha for encryption (https://en.wikipedia.org/wiki/Salsa20#ChaCha_variant).

Apparently it's a standard that Google is pushing to replace RC4 and is already using for HTTPS between google.com and Android.

If arc catches on, I'm curious whether it could support inline operations. E.g., on a 100 GB+ archive, can I read the tar index without decrypting the entire archive first, or extract a single file? ChaCha is a stream cipher, which as I understand it suggests that I cannot do operations like that.

Even worse, now that I'm thinking about this: if my archive has a bit error early in the file, does that mean the entire archive cannot be decrypted? Maybe for long term storage I'm better off physically securing my archives than encrypting them to avoid bit rot ruining everything.


> Maybe for long term storage I'm better off physically securing my archives than encrypting them to avoid bit rot ruining everything.

Just have more copies on more/diverse media. Encrypted backups (with authenticated encryption) have the pleasant side effect of validating the backup on restore.


Indeed. One use case I have in mind is using the Shamir Secret Sharing mode to create N backups on separate flash drives stored in diverse locations.


If you are seriously concerned about the effects of bit rot on long-term storage, I suggest you invest in magnetic tapes, which can be insured against bit rot for >300 years (yes, IBM and Oracle sell insurance for this, but only on IBM tapes/decks, and they'll also have to inspect your storage facility).

With highly specialized requirements come highly specialized solutions.


The tapes themselves could last 300 years (or, for more realistic common needs, around 30 years).

However the tape reader/recorder will probably not last that long. These things need special care (don't forget the cleaning tape every other month) and can act a little weird when the mechanical parts wear down (or even before in many cases).

The Ultrium LTO standard states that a tape drive for version N must read/write N-1 and read N-2. Given that there is a new version around every 3 years, that gives you a strong guarantee of around 10 years on your ability to actually recover your data.

To properly manage data over a longer period, you need to migrate it to new media. At that point, the problem becomes mostly organizational.
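
The roughly ten-year figure can be checked with a little arithmetic. This is only a sketch under stated assumptions: a new generation every 3 years, drives that read two generations back, and drives that are only obtainable while their generation is the current one. The function name is made up for illustration.

```python
def recovery_window_years(years_per_generation: float = 3,
                          read_back_generations: int = 2) -> float:
    # A tape written as generation N is readable by drives of generation
    # N, N+1, ..., N+read_back_generations. The newest of those drives
    # stays current until the following generation ships.
    return (read_back_generations + 1) * years_per_generation


window = recovery_window_years()
print(f"Assumed recovery window: about {window} years")
```

With these assumed numbers the window comes out at 9 years, in line with the "around 10 years" estimate.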


Streams can be encrypted and decrypted, even arbitrarily. Here's a good read by Adam Langley: https://www.imperialviolet.org/2014/06/27/streamingencryptio...


Your understanding is backwards. Stream ciphers are fundamentally compatible with random-access decryption, so yes this is entirely possible.
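
A minimal sketch of why that works, in Python. The keystream here is a toy built from SHA-256 in counter mode; it is NOT ChaCha and not secure, it only mimics the property that any keystream block can be computed directly from its counter. All names (`keystream_block`, `xor_at`) are hypothetical.

```python
import hashlib

BLOCK = 32  # SHA-256 digest size, standing in for the cipher's block size


def keystream_block(key: bytes, nonce: bytes, counter: int) -> bytes:
    # Toy stand-in for a real block function: illustration only, not secure.
    return hashlib.sha256(key + nonce + counter.to_bytes(8, "little")).digest()


def xor_at(key: bytes, nonce: bytes, data: bytes, offset: int = 0) -> bytes:
    """Encrypt or decrypt `data` as if it began at byte `offset` of the stream."""
    out = bytearray()
    for i, byte in enumerate(data):
        pos = offset + i
        # The block for any position is derived from its counter alone.
        # (Recomputed per byte here for simplicity, not speed.)
        block = keystream_block(key, nonce, pos // BLOCK)
        out.append(byte ^ block[pos % BLOCK])
    return bytes(out)


key, nonce = b"k" * 32, b"n" * 12
plaintext = bytes(range(256)) * 4
ciphertext = xor_at(key, nonce, plaintext)

# Decrypt only bytes 500..599 without processing anything before them.
middle = xor_at(key, nonce, ciphertext[500:600], offset=500)
```

Because each keystream byte depends only on the key, nonce, and position, a flipped ciphertext bit in this kind of construction garbles only the corresponding plaintext bit rather than everything after it.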

That said, authentication of a ciphertext is, in many ways, as important as encryption. So you would need to design the archive format in such a way that individual files' contents could be authenticated and decrypted on their own, instead of authenticating the contents of the entire archive.
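
One way such a format could look, sketched with Python's standard `hmac` module. This is an assumption for illustration, not a description of how arc works: the entry layout, the field names, and the choice of HMAC-SHA256 are all hypothetical.

```python
import hashlib
import hmac


def entry_tag(mac_key: bytes, index: int, name: str, ciphertext: bytes) -> bytes:
    # Bind the tag to the entry's position and name so that entries cannot
    # be reordered or renamed without detection.
    raw_name = name.encode("utf-8")
    header = index.to_bytes(8, "big") + len(raw_name).to_bytes(2, "big") + raw_name
    return hmac.new(mac_key, header + ciphertext, hashlib.sha256).digest()


def verify_entry(mac_key: bytes, index: int, entry: dict) -> bool:
    expected = entry_tag(mac_key, index, entry["name"], entry["ciphertext"])
    return hmac.compare_digest(expected, entry["tag"])


mac_key = b"m" * 32  # hypothetical key, kept separate from the encryption key
payloads = [("a.txt", b"ciphertext of a"), ("b.txt", b"ciphertext of b")]
archive = [
    {"name": n, "ciphertext": c, "tag": entry_tag(mac_key, i, n, c)}
    for i, (n, c) in enumerate(payloads)
]

# Verify b.txt on its own, without reading a.txt.
ok = verify_entry(mac_key, 1, archive[1])

# Any change to the stored ciphertext makes verification fail.
tampered = dict(archive[1], ciphertext=b"ciphertext of B")
bad = verify_entry(mac_key, 1, tampered)
```

Each entry carries its own tag, so a reader can check and decrypt one file without touching the rest of the archive.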


ChaCha is also going to be the default for OpenSSH encryption (it's already supported by upstream, but I don't know if they've changed the default yet).


Cool! I usually use gpg-zip for this purpose on machines where I have gpg installed.

  gpg-zip --symmetric --gpg-args --cipher-algo=AES256 --output backup.tar.gpg file1 file2 file3


What do you do to keep your GPG keys safe (i.e. so you don't accidentally lose them)?


In that example he is using symmetric encryption, so a password, not a private key.


Symmetric encryption is still based on secret keys, not passwords. Symmetric cryptosystems that appear to use passwords just transform them with a key derivation function into a suitable-length encryption key for the underlying cipher.
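
A rough illustration of that transformation, using PBKDF2 from Python's standard library. This is not what gpg does internally (it has its own string-to-key scheme); the function name, salt size, and iteration count below are arbitrary choices for the sketch.

```python
import hashlib
import os


def derive_key(password: str, salt: bytes, iterations: int = 200_000) -> bytes:
    # Stretch a variable-length password into a fixed 32-byte key,
    # a size suitable for a 256-bit cipher.
    return hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, iterations, dklen=32
    )


salt = os.urandom(16)  # stored alongside the ciphertext; it is not secret
key = derive_key("correct horse battery staple", salt)
print(len(key), "byte key derived")
```

The same password and salt always yield the same key, which is how the decrypting side recovers it; a different salt yields an unrelated key.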


This seems counter to the "unix philosophy". I would expect to use an archiving tool piped into an encryption tool. I'm not sure of the utility of something that combines the two.


Can anyone tell me what advantage tgz has over zip? I usually curse when I have to use it, because it lacks indexes and is pretty much only good for archival tapes, if that. I wish we'd all move to a more modern format, like zip or 7zip.


AFAIK neither zip nor 7z stores Unix permissions, file ownership, symlinks and a bunch of other features.

Other than that the "modern" version of it is .tar.xz.


I believe, technically, you can store that data in a zip's metadata headers. It's just that pretty much no zip/unzip implementation supports it, and permission restoration breaks across OSes.


Zip compresses each file separately. A folder full of zip files, each containing a single file, should take up roughly the same space as a zip archive containing all of the files. tgz doesn't do this, so it has a higher compression ratio for large collections of files. However, you cannot extract a single file from a tgz archive without decompressing everything that precedes it. This is why zip was designed the way it was.
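
The trade-off can be seen with Python's standard `zipfile` and `tarfile` modules. This is a contrived sketch: the 50 in-memory files below are made up so that all redundancy lies between files rather than within any one of them.

```python
import hashlib
import io
import tarfile
import zipfile

# 4 KiB of incompressible bytes shared by every file, so any savings can
# only come from redundancy *between* files.
shared = b"".join(hashlib.sha256(i.to_bytes(4, "big")).digest() for i in range(128))
files = {f"file_{n:02d}.bin": shared + f"file {n}".encode() for n in range(50)}

zip_buf = io.BytesIO()
with zipfile.ZipFile(zip_buf, "w", zipfile.ZIP_DEFLATED) as zf:
    for name, data in files.items():
        zf.writestr(name, data)  # each member is deflated independently

tgz_buf = io.BytesIO()
with tarfile.open(fileobj=tgz_buf, mode="w:gz") as tf:
    for name, data in files.items():
        info = tarfile.TarInfo(name)
        info.size = len(data)
        tf.addfile(info, io.BytesIO(data))  # one gzip stream over the whole tar

zip_size = len(zip_buf.getvalue())
tgz_size = len(tgz_buf.getvalue())
print(f"zip: {zip_size} bytes, tar.gz: {tgz_size} bytes")

# zip: the central directory locates the member, which is inflated alone.
with zipfile.ZipFile(io.BytesIO(zip_buf.getvalue())) as zf:
    single = zf.read("file_07.bin")
```

On this input the tar.gz should come out far smaller, because gzip sees the repeated content across files, while the zip can hand back `file_07.bin` without inflating any other member.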


zip (the version used most commonly) does not support Unicode. I have a lot of zip files created by people on Windows that turn into nonsensical filenames when extracted on Linux or OS X.


It's not clear from the README ... is this a client/server app, wherein I need to have 'arc' living on the server side ?

Or can I just point arc to SSH/SFTP and the server can be "dumb" ?


The name arc has namespace collision problems. Not in software in general (I never complained about somebody calling their language elm - that's Cantrill's job), but in archivers in particular. Arc was the format that directly preceded zip, pkzip being the program that pk started selling after it was discovered that pkarc's source code was copied verbatim from the source for the original arc utility.


Can the author compare and contrast to 'borg' which, it appears, has become the de facto standard for this kind of work ?


I hadn't heard of borg before, but it appears to be a backup program. arc is a file archiver, like tar or zip.


Why use this over tarsnap's client (which is also free software and has features like local deduplication)?


Tarsnap's client is not free software:

"2. You may use the Tarsnap client code for the sole purpose of accessing the service."

http://www.tarsnap.com/legal.html


Just a couple of commits from one contributor. I'll keep an eye on it; hope it gets more traction.


What HMAC is used to verify the contents of the archive before decryption takes place?



Seems like it would be more unix-like to just pipe tar into gpg.


[flagged]


In fact, there's already an archiver named ARC from the BBS days:

https://en.wikipedia.org/wiki/ARC_(file_format)


Not only that, there's a historical program called Arc that is also a file archiver/compressor. It led to PKZIP.


Come to think of it, a variant of it is still packaged in Debian, package is called "arc".

Incidentally, for anyone choosing a name, the namecheck program from the Debian devscripts package is handy for checking if a particular name is already in use. Not perfect, but handy anyway.


I think it's a great name.

The author probably has this as a binary living in their PATH so they can just type

    $ arc <args>

It might be obvious, but who wants a long or strange name when you're using a command-line utility often?


When the name conflicts with another command line utility?



Phabricator's command-line tool "Arcanist" is called arc:

https://secure.phabricator.com/book/phabricator/article/arca...


I used the arc archiver mentioned by others, back in the 90's if not earlier. I've probably still got archives on a disk somewhere.


And don't forget Arq.


ArcGIS, ArcPy, all sorts of collisions in the GIS namespace.


I'm starting to see this way too often. No one cares.


Sort of agree, but the title made me think of https://www.arqbackup.com

Quite a close field, to be honest.


Clearly at least one person cares, or he wouldn't have commented about it.

Not that the comment was useful.

But neither was yours.


What does that make yours, then?


Mine was useless too, was it not obvious?

Welcome to the club!



