
Wait, people shard on a key other than something approximately random, like an sha1 hash!?


Where I work (Etsy) we keep an index server that maps each user to a shard on an individual basis. There are a number of advantages to it. For example, if one user generated a ton of activity they could in theory be moved to their own server. Approximately random works for the initial assignment.

Flickr works the same way (not by coincidence, since we have several former Flickr engineers on staff).
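For anyone who wants to picture the index-server approach, here is a minimal sketch, not Etsy's or Flickr's actual code. The `ShardDirectory` class, its method names, and the shard names are all hypothetical, and a plain dict stands in for the index server. A new user gets an approximately random shard from a SHA-1 hash, and after that the stored mapping is the source of truth, so a heavy user can be moved individually.

```python
import hashlib


class ShardDirectory:
    """Hypothetical in-memory stand-in for an index server."""

    def __init__(self, shards):
        self.shards = list(shards)   # pool used for initial assignment
        self.assignments = {}        # user_id -> shard name

    def _initial_shard(self, user_id):
        # Approximately random initial placement via a SHA-1 hash.
        digest = hashlib.sha1(str(user_id).encode("utf-8")).digest()
        return self.shards[int.from_bytes(digest[:8], "big") % len(self.shards)]

    def shard_for(self, user_id):
        # The stored mapping wins; the hash is only consulted once per user.
        if user_id not in self.assignments:
            self.assignments[user_id] = self._initial_shard(user_id)
        return self.assignments[user_id]

    def move_user(self, user_id, shard):
        # The target does not have to be in the initial-assignment pool,
        # so a busy user can get a server of their own.
        self.assignments[user_id] = shard


directory = ShardDirectory(["shard0", "shard1"])
first = directory.shard_for(42)
directory.move_user(42, "dedicated0")
```

The copying of the user's rows between databases is left out; only the lookup side is shown.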


Sounds like a good system. I've noticed that people tend to do things like shard based on even/odd, and then they realize that they need three databases.

I've never had either problem though... but if I ever need to shard I plan on doing it based on object ID. Then one request can be handled by multiple databases, "for free", improving both throughput and response time.


Even/odd isn't the end of the world, but you would then be best off jumping to mod 4.
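A rough way to see why mod 4 is the friendlier jump, assuming integer keys and plain `key % n` placement (the `fraction_moved` helper is made up for this illustration): count how many keys land on a different shard number after the change.

```python
def fraction_moved(old_n, new_n, num_keys=12000):
    """Fraction of integer keys whose shard changes going from mod old_n to mod new_n."""
    moved = sum(1 for key in range(num_keys) if key % old_n != key % new_n)
    return moved / num_keys


to_four = fraction_moved(2, 4)    # each old shard splits cleanly in two
to_three = fraction_moved(2, 3)   # keys get shuffled between all shards
```

Going from 2 to 4, half the keys move, and every key that moves goes from shard `s` to shard `s + 2`, so each old database just splits in half. Going from 2 to 3, two thirds of the keys move, and they move in every direction.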


Actually, for anonymous sharding (without a central index), a consistent hash is about the closest you can get to ideal distribution and flexibility. I haven't looked but I presume that's what mongo uses under the hood for their auto-sharding, too.
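For reference, a minimal consistent-hash ring might look like the sketch below. This is a generic illustration and says nothing about what mongo actually does; the `HashRing` class, the `vnodes` parameter, and the node names are assumptions made up for the example. Each node is hashed onto the ring at several points, and a key belongs to the first node point at or after the key's own hash, wrapping around at the end.

```python
import bisect
import hashlib


def _hash(key):
    # 64-bit position on the ring, derived from SHA-1.
    return int.from_bytes(hashlib.sha1(key.encode("utf-8")).digest()[:8], "big")


class HashRing:
    """Hypothetical consistent-hash ring with virtual nodes."""

    def __init__(self, nodes, vnodes=100):
        self.vnodes = vnodes
        self._ring = []  # sorted list of (point, node) tuples
        for node in nodes:
            self.add_node(node)

    def add_node(self, node):
        # Several points per node smooth out the distribution.
        for i in range(self.vnodes):
            bisect.insort(self._ring, (_hash(f"{node}#{i}"), node))

    def node_for(self, key):
        # (h,) sorts before any (h, node), so this finds the first point >= h.
        idx = bisect.bisect_left(self._ring, (_hash(key),))
        return self._ring[idx % len(self._ring)][1]  # wrap around the ring


ring = HashRing(["db-a", "db-b", "db-c"])
keys = [f"user:{i}" for i in range(1000)]
before = {k: ring.node_for(k) for k in keys}
ring.add_node("db-d")
moved = [k for k in keys if ring.node_for(k) != before[k]]
```

The property that matters: after adding `db-d`, the only keys that change owner are the ones that now belong to `db-d`. Nothing gets shuffled between the existing nodes, unlike a plain mod N scheme.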



