
Because people use it as a unique identifier for data.

Let's say you have a file on Dropbox and they use SHA1 for data de-duplication. Someone else creates a different file with the same SHA1. Dropbox will recognize the SHA1, assume the two files are identical, and do one of two things:

- sync their file on top of yours, or;

- sync your file to their file system

(not a real example, and an actual implementation would be less naive, but this is the kind of thing that can happen)
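
To make the failure mode concrete, here is a toy sketch of a store that de-duplicates purely by hash. Everything in it is hypothetical: `NaiveStore`, `weak_digest` and `find_collision` are names made up for illustration, and the "hash" is deliberately crippled (only the first byte of SHA1) so that a collision can be found by brute force in a few hundred tries. It says nothing about how Dropbox actually works.

```python
import hashlib


def weak_digest(data: bytes) -> str:
    # Stand-in for a broken hash: keep only the first byte of SHA1
    # (two hex characters), so collisions are trivial to find.
    return hashlib.sha1(data).hexdigest()[:2]


class NaiveStore:
    """Toy content store that trusts the digest as a unique identifier."""

    def __init__(self):
        self.blobs = {}  # digest -> bytes

    def put(self, data: bytes) -> str:
        key = weak_digest(data)
        # Naive de-duplication: if the digest is already known,
        # assume the content is identical and keep the old bytes.
        self.blobs.setdefault(key, data)
        return key

    def get(self, key: str) -> bytes:
        return self.blobs[key]


def find_collision(target: bytes) -> bytes:
    # Brute-force different content with the same (weak) digest.
    want = weak_digest(target)
    i = 0
    while True:
        candidate = b"attacker-%d" % i
        if candidate != target and weak_digest(candidate) == want:
            return candidate
        i += 1


store = NaiveStore()
mine = b"my tax return"
key_mine = store.put(mine)

theirs = find_collision(mine)
key_theirs = store.put(theirs)

# Same key, different content: whoever uploaded second gets the first
# uploader's bytes back when they sync.
print(key_mine == key_theirs, store.get(key_theirs) == theirs)
```

Which of the two outcomes above you get depends only on whether the store keeps the first or the last blob it saw for a given digest.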



I would expect a tiered approach when it comes to deleting data. E.g. before deleting you actually compare byte by byte, but because that is too expensive to run against every file, you use the hash to narrow down which files you compare against each other (and maybe even a cheaper hash in front of that one, which determines whether you spend CPU cycles on the expensive hash at all).

Perhaps that's how it's actually done, I'm not sure.
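
Roughly what I have in mind, as a sketch rather than a claim about any real system. The function name `find_duplicates`, the choice of CRC32 as the cheap hash and SHA-256 as the expensive one, and holding files in memory as a dict are all assumptions made for illustration.

```python
import hashlib
import zlib
from collections import defaultdict


def find_duplicates(files: dict) -> list:
    """Return (kept, duplicate) name pairs whose bytes are truly equal.

    `files` maps a file name to its content as bytes.
    """
    # Tier 1: length plus a cheap checksum. Anything unique here
    # never reaches the expensive hash.
    cheap = defaultdict(list)
    for name, data in files.items():
        cheap[(len(data), zlib.crc32(data))].append(name)

    pairs = []
    for names in cheap.values():
        if len(names) < 2:
            continue

        # Tier 2: expensive hash, only for the remaining candidates.
        strong = defaultdict(list)
        for name in names:
            strong[hashlib.sha256(files[name]).digest()].append(name)

        for group in strong.values():
            # Tier 3: byte-by-byte comparison before anything is
            # treated as a duplicate, so a hash collision alone can
            # never cause data to be dropped.
            kept = []  # names whose content is distinct so far
            for name in group:
                for rep in kept:
                    if files[rep] == files[name]:
                        pairs.append((rep, name))
                        break
                else:
                    kept.append(name)
    return pairs


example = {
    "a.txt": b"hello",
    "b.txt": b"hello",
    "c.txt": b"world",
    "d.txt": b"hello!",
}
print(find_duplicates(example))
```

The point of the last tier is that the hashes only decide what is worth comparing; the decision to delete rests on the actual bytes.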



