Append-only backups with restic and rclone (ruderich.org)
75 points by edward on March 9, 2019 | 42 comments



This bugged me too last year. So I built a hosting service for Borg backups that can have append-only access keys.

So the client machine can't change old backups. Ever. In addition you can lock your settings down with 2FA.

https://www.borgbase.com/


Great service, especially with the addition of the monitoring features.

How do you store the data on your end?


Borg stores all data as plain files in 500MB segments. Every segment has a checksum, which you can validate using the `borg check` command.

The actual data lies on a plain vanilla RAID array.


That is going to be an issue for anyone with GDPR data, since you are required to be able to delete it upon request.


You can still use a full access key to prune old data/old archives as needed. But you use it from a trusted machine or temporarily.

Also: you still need to keep some data even under GDPR. Like billing data.


This is great. I find myself having to explain and argue this point way too often:

"One issue with most backup solutions is that an attacker controlling the local system can also wipe its old backups."


S3 also makes this kind of protection pretty easy - enable bucket versioning and allow only PutObject.
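
A minimal sketch of the "allow only PutObject" half, written as a plain Python dict serialized to JSON. The bucket name `example-backups` is a placeholder, and bucket versioning has to be enabled separately; this only shows what an upload-only policy document could look like.

```python
import json

def put_only_policy(bucket: str) -> dict:
    """Build an IAM policy document that allows uploads and nothing else."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["s3:PutObject"],
                # Objects inside the bucket; no List/Get/Delete is granted.
                "Resource": f"arn:aws:s3:::{bucket}/*",
            }
        ],
    }

policy = put_only_policy("example-backups")  # hypothetical bucket name
print(json.dumps(policy, indent=2))
```

With versioning on, an overwrite via PutObject creates a new version instead of replacing the old one, and removing a specific version needs s3:DeleteObjectVersion, which this policy does not grant.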


B2 can do this too if S3 is a bit too spendy.


Write-only FTP permissions.


What kind of solution to this problem would you propose?


Well, append-only backups with something like restic and rclone seems like the obvious one.


Tapes sitting on a shelf is the classic solution.


Write-once backup media, like DVDs, were even better. But they are going out of fashion.


Borg Backup also has this feature, and so far my experience with it has been great.


With borg, make sure that you have read and considered the drawbacks of the append-only mode: https://borgbackup.readthedocs.io/en/stable/usage/notes.html... I'm curious whether restic's append-only fares better here.


How could that possibly be improved? Append-only means no deleting or freeing up space by definition.


Not necessarily, because the backup server could squash backups itself, even when the client is not allowed to do so.


borgmatic, a Borg Backup wrapper with a declarative config file, also supports creating append-only Borg repositories. More at https://torsion.org/borgmatic/


duplicacy is another alternative. It has paid plans but is free for personal CLI usage.

Their comparison of cloud storage performance helped me tremendously in deciding.

https://github.com/gilbertchen/cloud-storage-comparison

I make backups using multiple implementations (restic and duplicacy) to avoid the unfortunate case of one implementation corrupting the encrypted or compressed data, whether through a bug that makes it completely unrecoverable or one that corrupts it only partially.

I hate it when backups don't work when I need them, and it's not easy to do 100% verification on all backups regularly.


Worth noting that if you run your own Restic server you can run it in append-only mode.


Seems a bit complex. How about the remote server adding the immutable flag on the uploaded files?


Or just use ZFS and send/recv over the network? It's super simple, guarantees reliability, and very easy to replicate, e.g. offline backups.


The nice thing about restic is that you don't need to have a smart server, you can store your backups anywhere. And it doesn't require a particular filesystem on your host. Not to mention that deduplication is very cheap due to its design (and unlike extent-based dedup, content-defined dedup also helps dedup some file changes a lot better).

ZFS, while it is definitely a great project, doesn't provide those features.
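
To make the content-defined vs. extent-based point concrete, here is a toy comparison. It is only a sketch: the CRC32 over a 16-byte window is a stand-in for a real rolling hash, the chunk sizes are tiny, and none of the names or constants come from restic or ZFS.

```python
import random
import zlib

WINDOW = 16      # bytes hashed to decide on a boundary
MASK = 0x3F      # boundary when the low 6 hash bits are zero (~64-byte chunks)
FIXED = 64       # chunk size for the fixed-size comparison

def fixed_chunks(data: bytes) -> list[bytes]:
    """Cut at fixed offsets, the way block- or extent-based dedup sees data."""
    return [data[i:i + FIXED] for i in range(0, len(data), FIXED)]

def cdc_chunks(data: bytes) -> list[bytes]:
    """Cut wherever the hash of the last WINDOW bytes matches a pattern."""
    chunks, start = [], 0
    for end in range(WINDOW, len(data) + 1):
        if zlib.crc32(data[end - WINDOW:end]) & MASK == 0:
            chunks.append(data[start:end])
            start = end
    if start < len(data):
        chunks.append(data[start:])  # trailing remainder
    return chunks

def shared(a: list[bytes], b: list[bytes]) -> int:
    """Number of distinct chunks present in both chunk lists."""
    return len(set(a) & set(b))

rng = random.Random(0)
original = bytes(rng.getrandbits(8) for _ in range(4096))
edited = b"X" + original  # insert one byte at the front

fixed_shared = shared(fixed_chunks(original), fixed_chunks(edited))
cdc_shared = shared(cdc_chunks(original), cdc_chunks(edited))
print("fixed-size chunks reused:", fixed_shared)
print("content-defined chunks reused:", cdc_shared)
```

With this input the fixed-size split should reuse essentially nothing, because every offset shifted by one byte, while the content-defined split reuses everything after the first boundary, since boundaries depend only on nearby content.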


You can store zfs send output as a file. You don't need a ZFS filesystem to receive into if all you're doing is backups.


That's not a complete solution unless you have infinite storage. You need to be able to keep, say, one snapshot per month without missing any files, but a given file may be contained in only one hourly diff.


This is true of all incremental backup systems.


restic stores files as a Merkle tree of blobs, with snapshots just being a particular root of the tree. This means that all snapshots are "full" (in the sense you don't need any others to reconstruct the filesystem) but are also "incremental" (in the sense that data isn't duplicated if it wasn't changed).

Yes, ZFS internally stores things in a similar fashion but I believe you can't get that from zfs-send (which is totally fine -- that's not the use-case they were going for, and zfs-send is incredibly useful for lots of other cases). restic is also chunked using content-defined chunking, which gives you much better deduplication than extent-based chunking (which is what ZFS, or any in-kernel filesystem has) -- and ZFS deduplication is well-known to be very expensive to enable while restic's deduplication is (almost) free. Of course, restic doesn't get atomic snapshots without using a filesystem like ZFS -- so there are obviously benefits to using both together.
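
A rough sketch of the "every snapshot is full, but nothing is stored twice" idea. This is a toy and not restic's repository format: the `Repo` class, the flat one-level tree, and the JSON encoding are all assumptions made for illustration.

```python
import hashlib
import json

class Repo:
    """Toy content-addressed store: blobs keyed by SHA-256 of their content."""

    def __init__(self) -> None:
        self.blobs: dict[str, bytes] = {}
        self.snapshots: dict[str, str] = {}  # snapshot name -> root tree id

    def put(self, data: bytes) -> str:
        blob_id = hashlib.sha256(data).hexdigest()
        self.blobs.setdefault(blob_id, data)  # storing twice is a no-op
        return blob_id

    def snapshot(self, name: str, files: dict[str, bytes]) -> None:
        # The tree maps each path to the id of its content blob.
        tree = {path: self.put(content) for path, content in files.items()}
        self.snapshots[name] = self.put(json.dumps(tree, sort_keys=True).encode())

    def restore(self, name: str) -> dict[str, bytes]:
        tree = json.loads(self.blobs[self.snapshots[name]])
        return {path: self.blobs[blob_id] for path, blob_id in tree.items()}

    def forget(self, name: str) -> None:
        """Drop a snapshot, then garbage-collect blobs no root still reaches."""
        del self.snapshots[name]
        live = set(self.snapshots.values())
        for root in self.snapshots.values():
            live.update(json.loads(self.blobs[root]).values())
        self.blobs = {k: v for k, v in self.blobs.items() if k in live}

repo = Repo()
repo.snapshot("mon", {"a.txt": b"hello", "b.txt": b"world"})
repo.snapshot("tue", {"a.txt": b"hello", "b.txt": b"world!", "c.txt": b"new"})
repo.snapshot("wed", {"a.txt": b"hello", "b.txt": b"world!", "c.txt": b"new", "d.txt": b"more"})
repo.forget("tue")  # c.txt first appeared on Tuesday
print(repo.restore("wed"))
print(len(repo.blobs), "blobs stored")
```

Forgetting the middle snapshot only removes its tree blob. Wednesday still restores completely, including the file that first showed up on Tuesday, because each root references everything it needs.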


ZFS is a Merkle hash tree. zfs send uses this.


Yes, but if you are storing the output of zfs send for each snapshot (incrementally) you won't get the benefit of it using a merkle tree on the storage side of things (obviously it's used among other neat algorithms to figure out what the delta between snapshots is).

If you are using zfs recv on the remote server you will get basically the same features as restic (minus content-defined deduplication, and full-repo encryption -- ZFS has extent-based dedup and its built-in encryption is not "full-disk" since it reveals ZFS-level metadata). And you get real atomic snapshots which is better than what restic can give you because it's a userspace tool (though you can always use restic with ZFS).

I'm not sure we're actually in disagreement on how ZFS works; it's a question of whether you can get the practical benefits without having a ZFS server which holds your backups. If you just store the output of zfs send then it's also hard to expire old backups, and restoring would require applying all of the saved send payloads rather than just doing one 'zfs send' from the remote server.
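
The expiry problem with stored send streams can be modelled in a few lines. This does not call zfs at all: the dict-based deltas and the `diff`/`restore` helpers are hypothetical stand-ins for one full stream plus incremental streams sitting as files on a dumb server.

```python
from typing import Optional

Files = dict[str, Optional[bytes]]  # None marks a deleted path in a delta

def diff(old: dict[str, bytes], new: dict[str, bytes]) -> Files:
    """Delta that turns `old` into `new` (a stand-in for an incremental stream)."""
    delta: Files = {p: c for p, c in new.items() if old.get(p) != c}
    delta.update({p: None for p in old if p not in new})
    return delta

def restore(full: dict[str, bytes], deltas: list[Files]) -> dict[str, bytes]:
    """Replay every stored delta, in order, on top of the full stream."""
    state = dict(full)
    for delta in deltas:
        for path, content in delta.items():
            if content is None:
                state.pop(path, None)
            else:
                state[path] = content
    return state

mon = {"a.txt": b"hello"}
tue = {"a.txt": b"hello", "c.txt": b"new"}
wed = {"a.txt": b"hello!", "c.txt": b"new"}

stored = [diff(mon, tue), diff(tue, wed)]   # what sits on the dumb server
print(restore(mon, stored) == wed)          # full chain replayed
print(restore(mon, stored[1:]) == wed)      # Tuesday's delta expired
```

Dropping Tuesday's delta loses c.txt from Wednesday's restore, so old increments cannot simply be deleted to free space. As far as I know a real zfs recv would reject the incremental stream outright when its base snapshot is missing, rather than produce a partial result like this toy does.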


Sure, zfs send does not actually send a section of a blockchain. It could have, but that wouldn't have been as space-efficient.


In order for ZFS send to be able to provide the same features as restic it would need to output a representation of the zfs merkle tree as a flat filesystem (but encrypted) -- which would allow a dumb server to deduplicate the tree (and ZFS is clever enough to already know what blobs exist on the remote side). I guess this was not done because a ZFS send stream might be more efficient for transfer (as you said). But this means that its main use as a backup system requires having a ZFS server on the other end (in order to be efficient and useful as a backup store).

Again, I'm not bashing ZFS. My whole point is that restic is a neat and interesting project specifically because it doesn't require a clever server to provide its features -- that doesn't mean ZFS isn't a great project (far from it).

I use ZFS on my servers and love it, and I use restic for backups.


No, it's not. In a real backup solution (Time Machine, borg, restic), if a file exists at the time of the backup, it will be included in that snapshot, even if snapshots in between are removed.

With ZFS snapshots, only blocks changed since the previous snapshot are included.


The term "incremental" backup (in contrast to "full" backup) is no longer as useful as it was in the 2000s -- GP is actually correct that "incremental" backup systems don't provide this feature (since incremental backups only include the difference from the previous backup).

However, modern (or, if you prefer, "real") backup systems (like borgbackup and restic) aren't "incremental" in this traditional sense; they are a mix of both "full" and "incremental" backups such that you get the benefits of both without the corresponding downsides.


You'll have to go even further back than that because Time Machine was invented in 2007 and described as incremental backup.


Right, but my point is that "traditional" views of backups had two types of backup -- incremental and full. The "let's store the delta between snapshots" solution is what would traditionally be called an "incremental" backup while a full copy of the entire filesystem would be a "full" backup.

Time Machine might call itself an incremental backup, but this just leads to more confusion -- it's a next-gen backup system (that has the benefits of both the "incremental" and "full" approaches) just like restic, borg-backup and all other similar projects.


I've never considered doing this, can you do it differentially?


Do you mean incrementally?

Yes, zfs send sends the content that differs between snapshots (or the entire dataset up to a snapshot, if you want).


Hm, I think I understand. So you could save the entire dataset up to snapshot x, then for x+1 you would save just a diff. Might have to play around with this. I currently use ZFS snapshots to protect against accidental data loss and I have daily backups off site but these are managed independently. This might make my life easier.


Yes, zfs send can do both, "the whole thing [up to a snapshot]" and "diffs between snapshots". And it's fast.

Perfect for backups.


Does restic support sparse files yet?


Noting the obvious: if you're backing up customer data, an append-only backup is not GDPR-compliant.


An append-only backup is not by definition non-compliant with GDPR. It's important that the individual can be assured that their personal data will not be restored back to production systems (except in certain rare instances, e.g., the need to recover from a natural disaster or serious security breach). In such cases, the user’s personal data may be restored from backups, but the controller will take the necessary steps to honor the initial request and erase the primary instance of the data again.

For example: https://www.acronis.com/en-us/blog/posts/backups-and-gdpr-ri...



