Hacker News new | past | comments | ask | show | jobs | submit login

It's a bit more than that - sharding is fundamentally a way of spreading writes. You're limited in the number of writes you can make by the speed at which the disk rotates, so this quickly becomes a bottleneck, particularly in heavily-trafficked social sites. Sharding lets you spread your writes out among an arbitrary number of machines, so you can grow with traffic.

Big sites that are primarily read-hungry often don't need to shard at all, instead relying upon replication. Look at Craigslist - they have a single master database, about 20 slaves, and an archive cluster for postings that nobody looks at anymore. Or PlentyOfFish - 1 machine for billions of pageviews. Social sites with lots of user-created content need to shard much earlier - LiveJournal has a fraction of the users as Craigslist, but they had to start sharding around 2002/2003.

There're locality concerns as well, but they tend to be secondary to the write issue. Disks are big, and with good caching you can often avoid hitting the disk at all.




"Or PlentyOfFish - 1 machine for billions of pageviews. Social sites with lots of user-created content need to shard much earlier"

Is that really true? I find it hard to imagine...but something deep down inside of me says "Yes it's possible". Oh my...


Saying it's accurate is the kindest way to put it: http://news.ycombinator.com/item?id=23497




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: