The nice thing about restic is that you don't need to have a smart server; you can store your backups anywhere. And it doesn't require a particular filesystem on your host. Not to mention that deduplication is very cheap due to its design (and unlike extent-based dedup, content-defined dedup also handles some kinds of file changes a lot better).

ZFS, while it is definitely a great project, doesn't provide those features.

You can store zfs send output as a file. You don't need a ZFS filesystem to receive into if all you're doing is backups.

That's not a complete solution unless you have infinite storage. You need to be able to keep just one snapshot per month without missing any files, but a given file may be contained in only a single hourly diff.

This is true of all incremental backup systems.

restic stores files as a Merkle tree of blobs, with snapshots just being a particular root of the tree. This means that all snapshots are "full" (in the sense you don't need any others to reconstruct the filesystem) but are also "incremental" (in the sense that data isn't duplicated if it wasn't changed).
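To make the "full and incremental at once" point concrete, here is a minimal sketch of a content-addressed store. It is a toy model and not restic's actual repository format: the `Repo` class and its `put`, `snapshot`, `restore` and `prune` methods are hypothetical names made up for illustration, and encryption, pack files and per-directory trees are left out entirely.

```python
import hashlib
import json


class Repo:
    """Toy content-addressed store: a blob's id is the SHA-256 of its bytes."""

    def __init__(self):
        self.blobs = {}

    def put(self, data):
        blob_id = hashlib.sha256(data).hexdigest()
        self.blobs.setdefault(blob_id, data)  # storing identical bytes again is a no-op
        return blob_id

    def snapshot(self, files):
        # The tree maps file name -> blob id and is itself stored as a blob.
        # Its id is the snapshot's root.
        tree = {name: self.put(content) for name, content in files.items()}
        return self.put(json.dumps(tree, sort_keys=True).encode())

    def restore(self, root):
        # Only the root is needed; no other snapshot is consulted.
        tree = json.loads(self.blobs[root])
        return {name: self.blobs[blob_id] for name, blob_id in tree.items()}

    def prune(self, keep_roots):
        # Drop every blob that is not reachable from a kept snapshot.
        live = set()
        for root in keep_roots:
            live.add(root)
            live.update(json.loads(self.blobs[root]).values())
        self.blobs = {k: v for k, v in self.blobs.items() if k in live}


repo = Repo()
monday = {"a.txt": b"alpha", "b.txt": b"beta"}
tuesday = {"a.txt": b"alpha", "b.txt": b"beta v2"}

snap1 = repo.snapshot(monday)
snap2 = repo.snapshot(tuesday)
blobs_before = len(repo.blobs)   # a.txt is stored once, although both snapshots contain it

repo.prune([snap2])              # forget Monday entirely
restored = repo.restore(snap2)   # Tuesday is still complete
```

The unchanged `a.txt` costs nothing in the second snapshot, and forgetting the first snapshot does not damage the second, because each root references every blob it needs.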

Yes, ZFS internally stores things in a similar fashion but I believe you can't get that from zfs-send (which is totally fine -- that's not the use-case they were going for, and zfs-send is incredibly useful for lots of other cases). restic also uses content-defined chunking, which gives you much better deduplication than extent-based chunking (which is what ZFS, or any in-kernel filesystem, has) -- and ZFS deduplication is well-known to be very expensive to enable while restic's deduplication is (almost) free. Of course, restic doesn't get atomic snapshots without using a filesystem like ZFS -- so there are obviously benefits to using both together.
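A rough sketch of why content-defined chunking dedups better than fixed-size blocks when bytes are inserted. This is not restic's actual chunker: the boundary rule here (CRC32 of the last `WINDOW` bytes, cut when the low bits are zero) is a stand-in chosen for brevity, it recomputes the hash per position instead of rolling it, and it has no minimum or maximum chunk size. `WINDOW`, `MASK`, `fixed_chunks`, `cdc_chunks` and `shared` are all illustrative names.

```python
import hashlib
import zlib

WINDOW = 16   # bytes examined when deciding whether to cut
MASK = 0x3F   # cut when the low 6 bits are zero, so chunks average roughly 64 bytes


def fixed_chunks(data, size=64):
    # Boundaries depend only on the offset, like fixed-size blocks.
    return [data[i:i + size] for i in range(0, len(data), size)]


def cdc_chunks(data):
    # Boundaries depend only on the preceding WINDOW bytes of content.
    chunks, start = [], 0
    for end in range(WINDOW, len(data) + 1):
        if zlib.crc32(data[end - WINDOW:end]) & MASK == 0:
            chunks.append(data[start:end])
            start = end
    if start < len(data):
        chunks.append(data[start:])
    return chunks


def shared(a, b):
    # Number of distinct chunks that appear in both chunk lists.
    return len(set(a) & set(b))


# 4096 bytes of deterministic pseudo-random data, then the same data with
# a single byte inserted at the front.
data = b"".join(hashlib.sha256(bytes([i])).digest() for i in range(128))
edited = b"X" + data

print("fixed-size:", shared(fixed_chunks(data), fixed_chunks(edited)),
      "of", len(fixed_chunks(data)), "chunks reused")
print("content-defined:", shared(cdc_chunks(data), cdc_chunks(edited)),
      "of", len(cdc_chunks(data)), "chunks reused")
```

With fixed-size blocks the one-byte insert shifts every later block, so nothing lines up any more. With content-defined cuts, every boundary after the first chunk falls at the same content as before, so everything except the first chunk is reused.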

ZFS is a Merkle hash tree. zfs send uses this.

Yes, but if you are storing the output of zfs send for each snapshot (incrementally) you won't get the benefit of it using a merkle tree on the storage side of things (obviously it's used among other neat algorithms to figure out what the delta between snapshots is).

If you are using zfs recv on the remote server you will get basically the same features as restic (minus content-defined deduplication, and full-repo encryption -- ZFS has extent-based dedup and its built-in encryption is not "full-disk" since it reveals ZFS-level metadata). And you get real atomic snapshots which is better than what restic can give you because it's a userspace tool (though you can always use restic with ZFS).

I'm not sure we're actually in disagreement on how ZFS works; it's a question of whether you can get the practical benefits without having a ZFS server which holds your backups. If you just store the output of zfs send then it's also hard to expire old backups, and restoring would require applying all of the saved send payloads rather than just doing one 'zfs send' from the remote server.
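The expiry problem with stored deltas can be sketched with a toy model. This is not the zfs send stream format; `diff` and `replay` are hypothetical helpers that treat a filesystem as a dict of file name to contents.

```python
def diff(old, new):
    """Delta from old to new: files added or changed, plus names deleted."""
    return {
        "set": {k: v for k, v in new.items() if old.get(k) != v},
        "deleted": [k for k in old if k not in new],
    }


def replay(full, deltas):
    """Restore by applying each saved delta, in order, on top of the full copy."""
    state = dict(full)
    for d in deltas:
        state.update(d["set"])
        for name in d["deleted"]:
            state.pop(name, None)
    return state


hour0 = {"a.txt": b"alpha"}
hour1 = {"a.txt": b"alpha", "report.txt": b"draft"}     # report.txt first appears here
hour2 = {"a.txt": b"alpha v2", "report.txt": b"draft"}  # report.txt unchanged

d1 = diff(hour0, hour1)
d2 = diff(hour1, hour2)

complete = replay(hour0, [d1, d2])  # needs every delta in the chain
lossy = replay(hour0, [d2])         # d1 was "expired" to save space

print("report.txt" in complete)
print("report.txt" in lossy)
```

Because `report.txt` only ever travelled in `d1`, deleting that one delta silently drops the file from every later restore. Expiring safely would mean merging deltas, which a dumb file store cannot do for you.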

Sure, zfs send does not actually send a section of a blockchain. It could have, but that wouldn't have been as space-efficient.

In order for ZFS send to be able to provide the same features as restic it would need to output a representation of the ZFS Merkle tree as a flat filesystem (but encrypted) -- which would allow a dumb server to deduplicate the tree (and ZFS is clever enough to already know what blobs exist on the remote side). I guess this was not done because a ZFS send stream might be more efficient for transfer (as you said). But this means that its main use as a backup system requires having a ZFS server on the other end (in order to be efficient and useful as a backup store).

Again, I'm not bashing ZFS. My whole point is that restic is a neat and interesting project specifically because it doesn't require a clever server to provide its features -- that doesn't mean ZFS isn't a great project (far from it).

I use ZFS on my servers and love it, and I use restic for backups.

No, it's not. In a real backup solution (Time Machine, borg, restic), if a file exists at the time of the backup, it will be included in that snapshot, even if snapshots in between are removed.

With ZFS snapshots, only blocks changed since the previous snapshot are included.

The term "incremental" backup (in contrast to "full" backup) is no longer as useful as it was in the 2000s -- GP is actually correct that "incremental" backup systems don't provide this feature (since incremental backups only include the difference from the previous backup).

However, modern (or, if you prefer, "real") backup systems (like borgbackup and restic) aren't "incremental" in this traditional sense; they are a mix of both "full" and "incremental" backups, such that you get the benefits of both without the corresponding downsides.

You'll have to go even further back than that because Time Machine was invented in 2007 and described as incremental backup.

Right, but my point is that "traditional" views of backups had two types of backup -- incremental and full. The "let's store the delta between snapshots" solution is what would traditionally be called an "incremental" backup while a full copy of the entire filesystem would be a "full" backup.

Time Machine might call itself an incremental backup, but this just leads to more confusion -- it's a next-gen backup system (that has the benefits of both the "incremental" and "full" approaches) just like restic, borg-backup and all other similar projects.

I've never considered doing this, can you do it differentially?

Do you mean incrementally?

Yes, zfs send sends the content that differs between two snapshots (or the entire dataset up to a snapshot, if you want).

Hm, I think I understand. So you could save the entire dataset up to snapshot x, then for x+1 you would save just a diff. Might have to play around with this. I currently use ZFS snapshots to protect against accidental data loss and I have daily backups off site but these are managed independently. This might make my life easier.

Yes, zfs send can do both, "the whole thing [up to a snapshot]" and "diffs between snapshots". And it's fast.

Perfect for backups.
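For anyone wanting to try the "full, then diffs" approach with plain files, a small sketch of the two `zfs send` invocations follows. The dataset name `tank/home`, the snapshot names and the helper functions are made up for illustration; the code only builds the command lines, and `save_stream` is not called here since it needs a real pool.

```python
import subprocess


def full_send_cmd(dataset, snap):
    # Everything in the dataset up to the given snapshot.
    return ["zfs", "send", f"{dataset}@{snap}"]


def incremental_send_cmd(dataset, old_snap, new_snap):
    # Only what changed between old_snap and new_snap (-i = incremental source).
    return ["zfs", "send", "-i", f"{dataset}@{old_snap}", f"{dataset}@{new_snap}"]


def save_stream(cmd, path):
    # Write the send stream to a plain file instead of piping it into zfs recv.
    with open(path, "wb") as out:
        subprocess.run(cmd, stdout=out, check=True)


# Hypothetical dataset and snapshot names.
full = full_send_cmd("tank/home", "2024-01-01")
incr = incremental_send_cmd("tank/home", "2024-01-01", "2024-01-02")
print(" ".join(full))
print(" ".join(incr))
```

Restoring from files saved this way means feeding the full stream and then each incremental stream, in order, into `zfs recv`.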

