
Filesystem-level de-duplication is scary as hell as a concept, but also sounds amazing, especially doing it at copy-time so you don't have to opt-in to scanning to deduplicate. Is this common in filesystems? Or is ZFS striking out new ground here? I'm not really an under-the-OS-hood kinda guy.



macOS and BTRFS have had it for several years. In fact I believe it’s the default behavior when copying a file in macOS using the Finder (you have to specify `cp -c` in the shell).


Windows Server has had dedupe for at least 10 years, too


Also pretty much every major SAN vendor underpinning the VMs in your local cloud provider.


> Filesystem-level de-duplication is scary as hell as a concept

What's scary about it? You have to track references, but it doesn't seem that hard compared to everything else going on in ZFS et al.

> Is this common in filesystems? Or is ZFS striking out new ground here?

At least BTRFS does approximately the same.


> > Filesystem-level de-duplication is scary as hell as a concept

> What's scary about it?

It's scary because there's only one copy when you might have expected two. A single bad block could lose both "copies" at once.


Disks die all the time anyway. If you want to keep your data, you should have at least two-disk redundancy. In which case bad blocks won't kill anything.


Besides what all the others have mentioned, you can force ZFS to keep up to 3 copies of data blocks on a dataset. ZFS uses this internally for important metadata and will try to spread them around to maximize the chance of recovery, though don't rely on this feature alone for redundancy.


The file system metadata is redundant, and on a correctly configured ZFS system your error correction is isolated and can be redundant as well.


You can do `zfs set copies=2` for a ZFS dataset if you think there's value in the extra copies. That's better than a copy of the file because with a single bad block, the checksum will fail and ZFS will retrieve the data from the good block and repair both files. In practice, using disk mirrors is better than setting the copies property.
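To make the "checksum fails, read the good copy, repair the bad one" idea concrete, here is a minimal Python sketch. It is a toy model, not ZFS code: `RedundantBlock` and its methods are hypothetical names, and SHA-256 is just a stand-in for whatever checksum the dataset actually uses.

```python
import hashlib


def checksum(data: bytes) -> str:
    # Stand-in checksum for this sketch (assumption, not necessarily what ZFS uses).
    return hashlib.sha256(data).hexdigest()


class RedundantBlock:
    """Toy model of one logical block stored as several physical copies."""

    def __init__(self, data: bytes, copies: int = 2):
        # The checksum is kept separately from the copies it protects.
        self.expected = checksum(data)
        self.copies = [bytearray(data) for _ in range(copies)]

    def read(self) -> bytes:
        # Find the first copy whose contents still match the checksum.
        good = None
        for copy in self.copies:
            if checksum(bytes(copy)) == self.expected:
                good = bytes(copy)
                break
        if good is None:
            raise OSError("every copy failed its checksum")
        # Self-heal: overwrite any damaged copy with the verified data.
        for i, copy in enumerate(self.copies):
            if checksum(bytes(copy)) != self.expected:
                self.copies[i] = bytearray(good)
        return good


block = RedundantBlock(b"family photo bytes", copies=2)
block.copies[0][0] ^= 0xFF          # simulate a single bad block
data = block.read()                 # served from the surviving copy
repaired = bytes(block.copies[0])   # the damaged copy was rewritten
```

Any number of files sharing this block get the repair for free, since they all point at the same checksummed data. If both copies sit on the same dead disk, of course, none of this helps.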


That's what data checksumming and mirroring are for.


3-2-1..

3 Copies

2 Media

1 offsite.

If you follow that, then you would have no fear of data loss. If you are putting 2 copies on the same filesystem, you are already doing backups wrong.


I always like to add: 0 stress restore.

You should also be able to restore your data in a calm, controlled, and correct manner. Test your backups to be sure they work, and to be sure that you're still familiar with the process. You don't want to be stuck reverse-engineering your backup solution in the middle of an already-panicky data loss scenario.

Remain calm, follow the prescribed steps, and wait for your data to come back.


Copying a file isn't great protection against bad blocks. Modern SSDs, when they fail, tend to fail catastrophically (the whole device dies all at once), rather than dying one block at a time. If you care about the data, back it up on a separate piece of hardware.


> What's scary about it?

Just that I'm trusting the OS to re-duplicate it at block level on file write. The idea that block by block you've got "okay, this block is shared by files XYZ, this next block is unique to file Z, then the next block is back to XYZ... oh we're editing that one? Then it's a new block that's now unique to file Z too".

I guess I'm not used to trusting filesystems to do anything but dumb write and read. I know they abstract away a crapload of amazing complexity in reality, I'm just used to thinking of them as dumb bags of bits.
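For what it's worth, the bookkeeping described above can be sketched in a few lines of Python. This is a toy model under heavy assumptions (in-memory, tiny blocks, hypothetical `BlockStore` and `File` classes) and says nothing about how ZFS lays things out on disk. It only shows the "shared until someone writes" logic with reference counts.

```python
class BlockStore:
    """Toy refcounted block store (illustrative only)."""

    def __init__(self):
        self.blocks = {}   # block id -> bytes
        self.refs = {}     # block id -> number of files pointing at it
        self.next_id = 0

    def alloc(self, data: bytes) -> int:
        bid = self.next_id
        self.next_id += 1
        self.blocks[bid] = data
        self.refs[bid] = 1
        return bid

    def share(self, bid: int) -> int:
        # A clone just bumps the reference count; no data is copied.
        self.refs[bid] += 1
        return bid

    def release(self, bid: int) -> None:
        self.refs[bid] -= 1
        if self.refs[bid] == 0:
            # Nothing references the block any more, so it can be freed.
            del self.blocks[bid]
            del self.refs[bid]


class File:
    def __init__(self, store: BlockStore, block_ids):
        self.store = store
        self.block_ids = list(block_ids)

    @classmethod
    def create(cls, store: BlockStore, chunks):
        return cls(store, [store.alloc(c) for c in chunks])

    def clone(self) -> "File":
        return File(self.store, [self.store.share(b) for b in self.block_ids])

    def write_block(self, index: int, data: bytes) -> None:
        # Copy-on-write: never touch the old block, allocate a new one
        # and drop this file's reference to the old one.
        old = self.block_ids[index]
        self.block_ids[index] = self.store.alloc(data)
        self.store.release(old)

    def read(self) -> bytes:
        return b"".join(self.store.blocks[b] for b in self.block_ids)


store = BlockStore()
x = File.create(store, [b"AAAA", b"BBBB", b"CCCC"])
z = x.clone()                  # all three blocks shared by x and z
z.write_block(1, b"zzzz")      # middle block is now unique to z
```

After the write, `x` still reads back its original contents, the first and last blocks are still shared, and only the edited block exists twice.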


ZFS is COW with checksums, so it wouldn't edit the same blocks in place, and you can send snapshots to another pool (which may or may not use deduplication). Deduplication does come with a performance cost, though. I had all my photos spread out on various disks and external media, sometimes with extra copies (as I did not trust certain disks). If I remember correctly, I went from 3.6T to 2.2T usage by consolidating all my photos to a deduplicated pool. All fine, but the zpool wanted way more RAM and felt slower than my other pool. After I removed duplicates (with the help of https://github.com/sahib/rmlint ), I migrated my photos to an ordinary zpool instead.
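A rough Python sketch of why write-time dedup trades RAM for disk space: every unique block needs an entry in a lookup table keyed by its checksum, and that table has to be consulted on every write. The function name, the 4-byte block size, and the file names below are all made up for illustration; real block sizes and table formats are very different.

```python
import hashlib


def dedup_store(files: dict, block_size: int = 4):
    """Store files as lists of block checksums, keeping one copy per unique block."""
    table = {}      # checksum -> block data; grows with the number of unique blocks
    layouts = {}    # file name -> ordered list of checksums
    logical = 0     # bytes the files claim to occupy
    for name, data in files.items():
        keys = []
        for off in range(0, len(data), block_size):
            block = data[off:off + block_size]
            key = hashlib.sha256(block).digest()
            table.setdefault(key, block)   # store only the first occurrence
            keys.append(key)
            logical += len(block)
        layouts[name] = keys
    physical = sum(len(b) for b in table.values())
    return table, layouts, logical, physical


# Hypothetical files: two identical copies plus one partial overlap.
files = {
    "disk1/img.raw": b"AAAABBBBCCCC",
    "usb/img_copy.raw": b"AAAABBBBCCCC",
    "disk2/other.raw": b"AAAADDDD",
}
table, layouts, logical, physical = dedup_store(files)
ratio = logical / physical
rebuilt = b"".join(table[k] for k in layouts["usb/img_copy.raw"])
```

By the same arithmetic, going from 3.6T to 2.2T is a ratio of roughly 1.6x, or about 39% saved. Whether that is worth the table overhead is the judgment call; removing duplicate files up front gets the same saving with no table at all.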


If you're on ZFS you're probably using snapshots, so all that work is already happening.


I mean, it's happening whether you're using snapshots or not; ZFS basically never overwrites things in place. Snapshots just mean the old copy isn't freed as unreferenced, because the snapshot still points at it.


CoW is always happening but without snapshots you can know that every block has exactly zero or one references and it's much simpler. No garbage collection, no complicated data structures, all you need is a single tree and a queue of dead blocks.


It could, but for ZFS, I believe it still uses all the same machinery either way, it just short circuits a lot of paths.



