Append-only backups with restic and rclone

m3nu · on March 10, 2019

This bugged me too last year. So I built a hosting service for Borg backups that can have append-only access keys.

So the client machine can't change old backups. Ever. In addition you can lock your settings down with 2FA.

https://www.borgbase.com/

bartman · on March 10, 2019

Great service, especially with the addition of the monitoring features.

How do you store the data on your end?

m3nu · on March 10, 2019

Borg stores all data as plain files in 500MB segments. Every segment has a checksum, which you can validate using the `borg check` command.
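As a rough illustration of the per-segment checksum idea, here is a minimal Python sketch. It is not Borg's on-disk format or the logic behind `borg check`; the tiny segment size, the use of SHA-256, and the function names `write_segments` and `check_segments` are all assumptions made for the example.

```python
import hashlib

SEGMENT_SIZE = 8  # bytes; deliberately tiny for illustration (assumption)


def write_segments(data: bytes, size: int = SEGMENT_SIZE) -> list:
    """Split data into fixed-size segments, each stored with its checksum."""
    segments = []
    for start in range(0, len(data), size):
        chunk = data[start:start + size]
        segments.append((chunk, hashlib.sha256(chunk).hexdigest()))
    return segments


def check_segments(segments: list) -> list:
    """Return the indexes of segments whose content no longer matches its checksum."""
    return [
        i for i, (chunk, digest) in enumerate(segments)
        if hashlib.sha256(chunk).hexdigest() != digest
    ]


segments = write_segments(b"backup data that spans several segments")
healthy = check_segments(segments)

# Simulate corruption of the second segment while keeping its old checksum.
damaged = list(segments)
damaged[1] = (b"XXXXXXXX", damaged[1][1])
corrupt = check_segments(damaged)
```

Because every segment carries its own checksum, a check can point at the damaged segment instead of only reporting that something in the repository is wrong.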

The actual data lies on a plain vanilla RAID array.

tomjen3 · on March 10, 2019

That is going to be an issue for anyone with GDPR data, since you are required to be able to delete it upon request.

m3nu · on March 10, 2019

You can still use a full access key to prune old data/old archives as needed. But you would only use it from a trusted machine, or temporarily.

Also: you still need to keep some data even under GDPR. Like billing data.

dorfsmay · on March 9, 2019

This is great. I find myself having to explain and argue this point way too often:

"One issue with most backup solutions is that an attacker controlling the local system can also wipe its old backups."

regecks · on March 9, 2019

S3 also makes this kind of protection pretty easy - enable bucket versioning and allow only PutObject.
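A minimal sketch of what such a policy could look like, built as a Python dict and serialized to JSON. The bucket name and the helper `put_only_policy` are hypothetical, and this only covers the "allow only PutObject" half; bucket versioning has to be enabled separately. Most backup tools also need to read and list objects, so a real policy would likely grant more than this.

```python
import json


def put_only_policy(bucket: str) -> dict:
    """Build an IAM policy document that allows uploads and nothing else."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowUploadOnly",
                "Effect": "Allow",
                "Action": ["s3:PutObject"],
                "Resource": [f"arn:aws:s3:::{bucket}/*"],
            }
        ],
    }


policy = put_only_policy("example-backup-bucket")  # hypothetical bucket name
policy_json = json.dumps(policy, indent=2)
```

With versioning on, an overwrite through `PutObject` adds a new version instead of replacing the old one, and without any delete permission the client key cannot remove earlier versions.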

bpye · on March 10, 2019

B2 can do this too if S3 is a bit too spendy.

juskrey · on March 9, 2019

write only FTP rights

yroc92 · on March 9, 2019

What kind of solution to this problem would you propose?

Jaepa · on March 9, 2019

Well, append-only backups with something like restic and rclone seems like the obvious one.

mprovost · on March 10, 2019

Tapes sitting on a shelf is the classic solution.

m3nu · on March 10, 2019

Write-once backup media, like DVDs, were even better. But they are going out of fashion.

jeltz · on March 9, 2019

Borg Backup also has this feature, and so far my experience with it has been great.

kklimonda · on March 10, 2019

With borg, make sure that you have read and considered the drawbacks of the append-only mode: https://borgbackup.readthedocs.io/en/stable/usage/notes.html... I'm curious whether restic's append-only fares better here.

Tharre · on March 10, 2019

How could that possibly be improved? Append-only means no deleting or freeing up space by definition.

indigo945 · on March 10, 2019

Not necessarily, because the backup server could squash backups itself, even when the client is not allowed to do so.

witten · on March 10, 2019

borgmatic, a Borg Backup wrapper with a declarative config file, also supports creating append-only Borg repositories. More at https://torsion.org/borgmatic/

h1d · on March 10, 2019

duplicacy is another alternative. It has paid plans but is free for personal CLI usage.

Their study of the performance of various cloud storage providers helped me tremendously in deciding.

https://github.com/gilbertchen/cloud-storage-comparison

I make backups using multiple implementations (restic and duplicacy) to avoid the unfortunate case of one implementation corrupting encrypted or compressed data so that it becomes completely unrecoverable, or some other bug partially corrupting it.

I hate it when backups don't work when I need them, and it's not easy to do 100% verification of all backups regularly.

jsiepkes · on March 9, 2019

Worth noting that if you run your own Restic server, you can run it in append-only mode.

hendry · on March 9, 2019

Seems a bit complex. How about the remote server adding the immutable flag on the uploaded files?

ioquatix · on March 10, 2019

Or just use ZFS and send/recv over the network? It's super simple, guarantees reliability, and very easy to replicate, e.g. offline backups.

cyphar · on March 10, 2019

The nice thing about restic is that you don't need to have a smart server; you can store your backups anywhere. And it doesn't require a particular filesystem on your host. Not to mention that deduplication is very cheap due to its design (and unlike extent-based dedup, content-defined dedup also helps dedup some file changes a lot better).

ZFS, while it is definitely a great project, doesn't provide those features.

cryptonector · on March 10, 2019

You can store zfs send output as a file. You don't need a ZFS filesystem to receive into if all you're doing is backups.

akvadrako · on March 10, 2019

That's not a complete solution unless you have infinite storage. You need the ability to store 1 snapshot per month without missing any files, but they may only be contained in one hourly diff.

cryptonector · on March 11, 2019

This is true of all incremental backup systems.

cyphar · on March 11, 2019

restic stores files as a Merkle tree of blobs, with snapshots just being a particular root of the tree. This means that all snapshots are "full" (in the sense you don't need any others to reconstruct the filesystem) but are also "incremental" (in the sense that data isn't duplicated if it wasn't changed).
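A toy Python sketch of that idea, to make the "full and incremental at once" point concrete. It is not restic's repository format (which adds encryption, pack files and nested trees); the in-memory `store` and the helpers `put`, `snapshot` and `restore` are assumptions for illustration, with a single flat tree standing in for the Merkle tree.

```python
import hashlib
import json

store = {}  # content-addressed blob store: sha256 hex digest -> bytes


def put(blob: bytes) -> str:
    """Store a blob under its own hash; identical content is stored only once."""
    key = hashlib.sha256(blob).hexdigest()
    store[key] = blob
    return key


def snapshot(files: dict) -> str:
    """Store each file as a blob plus one tree blob; return the tree's hash (the root)."""
    tree = {name: put(content) for name, content in sorted(files.items())}
    return put(json.dumps(tree).encode())


def restore(root: str) -> dict:
    """Rebuild the full file set from a single root, without any other snapshot."""
    tree = json.loads(store[root])
    return {name: store[key] for name, key in tree.items()}


monday = {"a.txt": b"hello", "b.txt": b"world"}
tuesday = {"a.txt": b"hello", "b.txt": b"world, edited"}

root1 = snapshot(monday)
blobs_after_first = len(store)   # two file blobs plus one tree blob
root2 = snapshot(tuesday)
blobs_after_second = len(store)  # only the changed file and the new tree were added
```

Each root restores on its own, yet the unchanged `a.txt` is stored once and referenced by both trees. Dropping one snapshot then only means dropping its root and whatever blobs nothing else references.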

Yes, ZFS internally stores things in a similar fashion but I believe you can't get that from zfs-send (which is totally fine -- that's not the use-case they were going for, and zfs-send is incredibly useful for lots of other cases). restic is also chunked using content-defined chunking, which gives you much better deduplication than extent-based chunking (which is what ZFS, or any in-kernel filesystem, has) -- and ZFS deduplication is well-known to be very expensive to enable while restic's deduplication is (almost) free. Of course, restic doesn't get atomic snapshots without using a filesystem like ZFS -- so there are obviously benefits to using both together.
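To show why content-defined chunking deduplicates shifted data better, here is a simplified Python sketch. It is not restic's chunker (which uses a different rolling hash, much larger chunks and size limits); the CRC32-over-a-window rule, the `WINDOW` and `DIVISOR` values, and the fixed-offset `fixed_chunks` stand-in for block-aligned dedup are all assumptions for the example.

```python
import random
import zlib

WINDOW = 16   # bytes of trailing context that decide whether a boundary falls here
DIVISOR = 64  # on average, roughly one boundary every 64 bytes


def cdc_chunks(data: bytes) -> list:
    """Cut after every position whose trailing WINDOW bytes hash to 0 mod DIVISOR."""
    chunks, start = [], 0
    for end in range(WINDOW, len(data) + 1):
        if zlib.crc32(data[end - WINDOW:end]) % DIVISOR == 0:
            chunks.append(data[start:end])
            start = end
    if start < len(data):
        chunks.append(data[start:])
    return chunks


def fixed_chunks(data: bytes, size: int = 64) -> list:
    """Cut at fixed offsets, as a block-aligned scheme would."""
    return [data[i:i + size] for i in range(0, len(data), size)]


rng = random.Random(0)
original = bytes(rng.randrange(256) for _ in range(4096))
edited = b"X" + original  # insert a single byte at the front

# Count chunks that both versions have in common (these would deduplicate).
cdc_shared = len(set(cdc_chunks(original)) & set(cdc_chunks(edited)))
fixed_shared = len(set(fixed_chunks(original)) & set(fixed_chunks(edited)))
```

Because a boundary depends only on the bytes just before it, the cut points move along with the data after the insertion, so almost every chunk is found again. With fixed offsets, the one-byte shift changes every chunk.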

cryptonector · on March 11, 2019

ZFS is a Merkle hash tree. zfs send uses this.

cyphar · on March 12, 2019

Yes, but if you are storing the output of zfs send for each snapshot (incrementally) you won't get the benefit of it using a Merkle tree on the storage side of things (obviously it's used among other neat algorithms to figure out what the delta between snapshots is).

If you are using zfs recv on the remote server you will get basically the same features as restic (minus content-defined deduplication, and full-repo encryption -- ZFS has extent-based dedup and its built-in encryption is not "full-disk" since it reveals ZFS-level metadata). And you get real atomic snapshots which is better than what restic can give you because it's a userspace tool (though you can always use restic with ZFS).

I'm not sure we're actually in disagreement on how ZFS works; it's a question of whether you can get the practicalities of the benefits without having a ZFS server which holds your backups. If you just store the output of zfs send then it's also hard to expire old backups, and restoring would require applying all of the saved send payloads rather than just doing one 'zfs send' from the remote server.
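A toy Python model of that last point, under the assumption that each stored payload is a plain delta between two snapshots. It says nothing about the real zfs send stream format; `diff`, `apply_delta` and `restore` are hypothetical helpers, and a real `zfs receive` would refuse a stream whose base snapshot is missing rather than produce a partial result.

```python
def diff(old: dict, new: dict) -> dict:
    """Delta from old to new: changed or added files, plus names that were deleted."""
    changed = {k: v for k, v in new.items() if old.get(k) != v}
    deleted = [k for k in old if k not in new]
    return {"changed": changed, "deleted": deleted}


def apply_delta(state: dict, delta: dict) -> dict:
    """Apply one delta to a state and return the resulting state."""
    result = {k: v for k, v in state.items() if k not in delta["deleted"]}
    result.update(delta["changed"])
    return result


def restore(full: dict, deltas: list) -> dict:
    """Replay every stored delta, in order, on top of the full stream."""
    state = dict(full)
    for delta in deltas:
        state = apply_delta(state, delta)
    return state


snap1 = {"a": "v1"}
snap2 = {"a": "v1", "b": "v1"}  # b is created here
snap3 = {"a": "v2", "b": "v1"}  # only a changes here

full = dict(snap1)
deltas = [diff(snap1, snap2), diff(snap2, snap3)]

complete = restore(full, deltas)    # every payload is still present
pruned = restore(full, deltas[1:])  # the middle payload was "expired"
```

With the whole chain, the latest snapshot comes back intact. Once the middle payload is gone, file `b` is lost even though it still existed at the time of the last snapshot, which is why expiring stored send streams is awkward.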

cryptonector · on March 12, 2019

Sure, zfs send does not actually send a section of a blockchain. It could have, but that wouldn't have been as space-efficient.

cyphar · on March 12, 2019

In order for ZFS send to be able to provide the same features as restic it would need to output a representation of the ZFS Merkle tree as a flat filesystem (but encrypted) -- which would allow a dumb server to deduplicate the tree (and ZFS is clever enough to already know what blobs exist on the remote side). I guess this was not done because a ZFS send stream might be more efficient for transfer (as you said). But this means that its main use as a backup system requires having a ZFS server on the other end (in order to be efficient and useful as a backup store).

Again, I'm not bashing ZFS. My whole point is that restic is a neat and interesting project specifically because it doesn't require a clever server to provide its features -- that doesn't mean ZFS isn't a great project (far from it).

I use ZFS on my servers and love it, and I use restic for backups.

akvadrako · on March 11, 2019

No, it's not. In a real backup solution (Time Machine, borg, restic), if a file exists at the time of the backup, it will be included in that snapshot, even if snapshots in between are removed.

With ZFS snapshots, only blocks changed since the previous snapshot are included.

cyphar · on March 11, 2019

The term "incremental" backup (in contrast to "full" backup) is no longer as useful as it was in the 2000s -- GP is actually correct that "incremental" backup systems don't provide this feature (since incremental backups only include the difference from the previous backup).

However, modern (or, if you prefer, "real") backup systems (like borgbackup and restic) aren't "incremental" in this traditional sense; they are a mix of both "full" and "incremental" backups such that you get the benefits of both without the corresponding downsides.

akvadrako · on March 14, 2019

You'll have to go even further back than that because Time Machine was invented in 2007 and described as incremental backup.

cyphar · on March 15, 2019

Right, but my point is that "traditional" views of backups had two types of backup -- incremental and full. The "let's store the delta between snapshots" solution is what would traditionally be called an "incremental" backup while a full copy of the entire filesystem would be a "full" backup.

Time Machine might call itself an incremental backup, but this just leads to more confusion -- it's a next-gen backup system (that has the benefits of both the "incremental" and "full" approaches) just like restic, borg-backup and all other similar projects.

bpye · on March 10, 2019

I've never considered doing this, can you do it differentially?

cryptonector · on March 11, 2019

Do you mean incrementally?

Yes, zfs send sends the content that differs between snapshots (or the entire dataset up to a snapshot, if you want).

bpye · on March 11, 2019

Hm, I think I understand. So you could save the entire dataset up to snapshot x, then for x+1 you would save just a diff. Might have to play around with this. I currently use ZFS snapshots to protect against accidental data loss and I have daily backups off site but these are managed independently. This might make my life easier.

cryptonector · on March 11, 2019

Yes, zfs send can do both, "the whole thing [up to a snapshot]" and "diffs between snapshots". And it's fast.

Perfect for backups.

casylum · on March 10, 2019

Does restic support sparse files yet?

nine_k · on March 10, 2019

Noting the obvious: if you're backing up customer data, an append-only backup is not GDPR-compliant.

Xeago · on March 10, 2019

An append-only backup is not by definition non-compliant with GDPR. It's important that the individual can be assured that their personal data will not be restored back to production systems (except in certain rare instances, e.g., the need to recover from a natural disaster or serious security breach). In such cases, the user’s personal data may be restored from backups, but the controller will take the necessary steps to honor the initial request and erase the primary instance of the data again.

For example: https://www.acronis.com/en-us/blog/posts/backups-and-gdpr-ri...