2. The theory of CRC comes from noisy telecom lines, where the noise has a weird nonlinear random distribution, making it impossible to "vectorize" the problem (aka parallelize it) (http://scienceblogs.com/goodmath/2007/07/27/fractal-dust-and...)

A dict/document/transaction may be a fractal-like structure, but the probability of alteration does not follow the same shape/locality.

After reading this http://www.slideshare.net/Dataversity/redis-in-action , I am surprised.

Why the heck did they decide to focus on CRC? Premature optimization?

First CRC16 is in fact used as a "perfect hash function". It is confusing to focus on the implementation and not the purpose. It is used to evenly distribute data in buckets. A 'perfect hash function' is what they should use and focus on (CRC being a special subcase among the totally legitimate candidates for a perfect hash function, but what if tomorrow a super hyper fast perfect hash function comes out ... ).
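To make the "evenly distribute data in buckets" use concrete, here is a minimal sketch, assuming a 16-bit CRC reduced modulo a bucket count. `NUM_BUCKETS`, `bucket_for`, `bucket_histogram` and the `user:N` keys are hypothetical names chosen for illustration, not anything taken from the slides. Python's `binascii.crc_hqx` is used only because it is a CRC16 (polynomial 0x1021) that ships with the standard library.

```python
import binascii
from collections import Counter

NUM_BUCKETS = 16  # hypothetical bucket count, chosen only for the example


def bucket_for(key: bytes, num_buckets: int = NUM_BUCKETS) -> int:
    """Map a key to a bucket index using a 16-bit CRC as the hash."""
    # crc_hqx computes a CRC16 with polynomial 0x1021; 0 is the initial value.
    return binascii.crc_hqx(key, 0) % num_buckets


def bucket_histogram(keys, num_buckets: int = NUM_BUCKETS) -> Counter:
    """Count how many keys land in each bucket."""
    return Counter(bucket_for(k, num_buckets) for k in keys)


# Hypothetical keys; print the histogram to eyeball how even the spread is.
keys = [f"user:{i}".encode() for i in range(10000)]
hist = bucket_histogram(keys)
for bucket in sorted(hist):
    print(bucket, hist[bucket])
```

Nothing here depends on the hash being a CRC: swapping the body of `bucket_for` for any other well-mixing hash keeps the rest unchanged, which is the point about purpose versus implementation.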

The CRC64 is used for a valid case of CRC BUT in an invalid domain.

Mathematically I could prove they are wrong. I know where they have a probability problem and how they are sharpening the transition between the functional and dysfunctional states, and how they are making it so irreversible that they will be FUBAR.

1) Of course they are leaving their ass open to an attack by image by a malevolent attacker who has access to the storage they checksum (it is used almost as a digital signature, so it should be a crypto hash function).

2) The probability of random collisions is not balanced by the fact that a cause altering the signal normally also has some odds of altering the CRC, and normally the signal also serves as a validation of the CRC. That is why a CRC in real life should be a fixed fraction of the signal, according to the probability of co-alteration of both. So if you have a tolerance for faulty storage, as a result you will use it, and you might also reconstruct data from randomly corrupted data with the same CRC. It will be a mess on the very exceptional but predictable day shit happens, that's all I am saying. Every safeguard will be green :)
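A rough sketch of point 1, under stated assumptions: `zlib.crc32` stands in for the CRC64 (the Python standard library has no CRC64), and `SECRET`, the payloads and the helper names are all hypothetical. It only shows that anyone who can rewrite the storage can also recompute an unkeyed checksum, while a keyed MAC cannot be recomputed without the key.

```python
import hashlib
import hmac
import zlib

SECRET = b"hypothetical-secret-key"  # assumption: a key the attacker does not have


def crc_tag(payload: bytes) -> int:
    """Unkeyed checksum: anyone can compute it."""
    return zlib.crc32(payload)


def hmac_tag(payload: bytes, key: bytes) -> bytes:
    """Keyed MAC: needs the secret key to compute."""
    return hmac.new(key, payload, hashlib.sha256).digest()


def verify_crc(payload: bytes, tag: int) -> bool:
    return zlib.crc32(payload) == tag


def verify_hmac(payload: bytes, tag: bytes, key: bytes) -> bool:
    return hmac.compare_digest(hmac_tag(payload, key), tag)


original = b"balance=100"
tampered = b"balance=999"

# The attacker rewrites the payload and simply recomputes the CRC next to it.
forged_crc = crc_tag(tampered)
crc_accepts_forgery = verify_crc(tampered, forged_crc)

# Without the key, the best the attacker can do is reuse the old tag.
old_tag = hmac_tag(original, SECRET)
hmac_accepts_forgery = verify_hmac(tampered, old_tag, SECRET)

print("CRC accepts tampered data:", crc_accepts_forgery)
print("HMAC accepts tampered data:", hmac_accepts_forgery)
```

This says nothing about accidental corruption, where a CRC is a reasonable tool; it only illustrates why a checksum used "almost as a digital signature" needs a key or a signature behind it.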

Something seems fishy.




First CRC16 is in fact used as a "perfect hash function"

It must be implemented by all clients too, so it needs to be simple and readily available.

open to an attack by image by a malevolent attacker

That wasn't one of the design goals.

if you have a tolerance for faulty storage,

There's always the problem of storage corrupting only the checksum even if your data is okay, leaving you in a pretty pickle.
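A small sketch of that pickle, assuming a record stored as a payload plus a CRC (`zlib.crc32` as a stand-in; `store`, `is_consistent` and the sample bytes are hypothetical). A single flipped bit in either the checksum or the payload produces the same symptom, a mismatch, and the check alone cannot say which side was hit.

```python
import zlib


def store(payload: bytes):
    """Return the payload together with its checksum, as it would sit on disk."""
    return payload, zlib.crc32(payload)


def is_consistent(payload: bytes, checksum: int) -> bool:
    """True when the stored checksum matches the stored payload."""
    return zlib.crc32(payload) == checksum


payload, checksum = store(b"perfectly good data")

# Case A: the data is fine, one bit of the stored checksum flipped.
damaged_checksum = checksum ^ 0x1
case_checksum_hit = is_consistent(payload, damaged_checksum)

# Case B: the checksum is fine, one bit of the stored data flipped.
damaged_payload = bytes([payload[0] ^ 0x1]) + payload[1:]
case_payload_hit = is_consistent(damaged_payload, checksum)

# Both cases report a mismatch; the reader cannot tell them apart.
print(case_checksum_hit, case_payload_hit)
```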


Parent is referring to a multithreaded algorithm, not one that uses SIMD instructions, which is what "vectorizing" actually means.



