Hacker News new | past | comments | ask | show | jobs | submit login

TL;DR they rediscovered the context-dependent variable block size technique that Tarsnap uses: https://www.tarsnap.com/download/EuroBSDCon13.pdf



They don't claim to have invented CDC, or FastCDC, they just made and are sharing a useful implementation of it.

And if that Tarsnap presentation is from 2013, and FastCDC was published in 2016 [1] according to Wikipedia [2], then presumably Tarsnap didn't invent FastCDC either.

[1] https://www.usenix.org/system/files/conference/atc16/atc16-p...

[2] https://en.wikipedia.org/wiki/Rolling_hash#Gear_fingerprint_...


Another well-cited predecessor is "A low-bandwidth network file system." (https://dl.acm.org/doi/abs/10.1145/502034.502052), which was published in 2001. It uses Rabin fingerprinting to define chunk boundaries.


The Tarsnap technique is interesting but that presentation is a bit hard to follow. What are the examples for values alpha and p?




Consider applying for YC's Spring batch! Applications are open till Feb 11.

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: