I was at the presentation - it's a very smart system. They basically maintain a whole bunch of counters for any particular thing that's being tracked. For example, say someone clicks on a t.co link to blog.example.com/foo at 11:41am on 1st Feb. Rainbird would increment counters for:
t.co click: com (all time)
t.co click: com.example (all time)
t.co click: com.example.blog (all time)
t.co click: com.example.blog /foo (all time)
t.co click: com (1st Feb 2011)
t.co click: com.example (1st Feb 2011)
t.co click: com.example.blog (1st Feb 2011)
t.co click: com.example.blog /foo (1st Feb 2011)
t.co click: com (11am-12 on 1st Feb)
t.co click: com.example (11am-12 on 1st Feb)
t.co click: com.example.blog (11am-12 on 1st Feb)
t.co click: com.example.blog /foo (11am-12 on 1st Feb)
t.co click: com (11:41-42 on 1st Feb)
t.co click: com.example (11:41-42 on 1st Feb)
t.co click: com.example.blog (11:41-42 on 1st Feb)
t.co click: com.example.blog /foo (11:41-42 on 1st Feb)
So that's 16 counters to track one link, but it means they can do fast, denormalised queries in realtime to track how that link is performing.
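As a rough illustration, here's how those 16 keys could be enumerated in code. This is a sketch with an invented key format (Rainbird's actual Cassandra schema isn't described above); the hierarchy levels and time buckets just mirror the list.

```python
from datetime import datetime

# Hypothetical key format, purely for illustration; Rainbird's real schema
# isn't shown in the comment above.
HIERARCHY = [
    "com",
    "com.example",
    "com.example.blog",
    "com.example.blog /foo",
]

def counter_keys(event, when):
    """One key per (time bucket, hierarchy level) combination: 4 x 4 = 16."""
    buckets = [
        "alltime",
        when.strftime("%Y-%m-%d"),        # day bucket
        when.strftime("%Y-%m-%d %H:00"),  # hour bucket
        when.strftime("%Y-%m-%d %H:%M"),  # minute bucket
    ]
    return [f"{event}:{level}:{bucket}"
            for bucket in buckets
            for level in HIERARCHY]

keys = counter_keys("t.co click", datetime(2011, 2, 1, 11, 41))
assert len(keys) == 16
```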
It's not just for t.co links - they can use it for internal server monitoring tools, tweet counts, advertising metrics... pretty much anything that involves counting at scale.
It's possible to build a similar system for much smaller scale applications using atomic counters in Redis - I've been experimenting with something like that for some of my own projects.
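For what it's worth, a minimal sketch of that Redis approach, assuming the redis-py client and reusing the counter_keys helper from the sketch above: each click becomes one pipelined batch of INCRs, and a read is a single GET on a pre-aggregated counter.

```python
from datetime import datetime

import redis  # assumes the redis-py client and a local Redis server

r = redis.Redis()

def record_click(event, when):
    # One round trip: atomically INCR every hierarchy/time-bucket counter,
    # reusing counter_keys() from the sketch above.
    pipe = r.pipeline()
    for key in counter_keys(event, when):
        pipe.incr(key)
    pipe.execute()

def clicks(event, level, bucket):
    # Reads are a single GET on an already-aggregated counter.
    value = r.get(f"{event}:{level}:{bucket}")
    return int(value) if value else 0

record_click("t.co click", datetime(2011, 2, 1, 11, 41))
print(clicks("t.co click", "com.example.blog", "2011-02-01"))
```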
If you follow various Twitter tech talks you'll notice that they spend a lot of time talking about 'fan-out'. They basically prefer writing many things asynchronously at write time over having to aggregate later. Makes sense: disk is cheap and only getting cheaper, but the processing time available to return data in real time is finite.
VoltDB looks pretty awesome, but I'm pretty concerned by its inability to join a table to itself, to have more than six tables in a join, or to aggregate or select distinct over arbitrary values. http://community.voltdb.com/docs/ReleaseNotes/index
At VoltDB, we're working on improving our SQL support. Initially, we focused on core SQL useful for OLTP. We've added functionality with every release and we plan to continue in 2011.
For example, the March release of VoltDB will support more than 6 table joins and has some improvements to aggregation and distinct code. It also has a much more usable explain plan feature.
We feel that VoltDB offers one of the richer query interfaces of systems that scale to its level, but we don't plan to sit still.
I think that at the rates Twitter is writing counter data (Many TBs per day denormalized, ~0.5TB normalized), a RAM-based solution like VoltDB would be prohibitively expensive. Rainbird allows Twitter to use cheap disk-based storage but still get acceptable (sub-second) latency.
Perhaps, but it really depends on what you want to do exactly. You can probably store a lot less and get the same throughput.
- VoltDB isn't log-structured, so you only have to store the current state; RAM limits the size of that state, not how fast you can mutate it. We see use cases with utter firehoses of data that update just tens or hundreds of gigabytes of state.
- Beyond normalization, you can probably reduce the number of redundant counters, e.g. use SQL to count which URLs start with "amazon" (see the sketch at the end of this comment). This would be painful in many systems, but depending on the query, it can often be done at scale in VoltDB.
- The byte overhead per counter is also likely much lower in an ACID/Relational store.
Finally, VoltDB is designed to migrate data to disk-based stores (such as Hadoop or an OLAP store) as memory fills up. This is a feature we're working very hard on and see as a big differentiator. It adds complexity if you need to query across stores, but you get a best-of-both-worlds feature set.
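To make the "count with a query instead of a redundant counter" point concrete, here's a toy sketch. It uses sqlite3 purely as a stand-in (not VoltDB) and an invented single-table schema; the idea is just that one normalized row per click lets a prefix rollup be computed at query time.

```python
import sqlite3

# sqlite3 is only a stand-in here (not VoltDB), with a made-up schema.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE clicks (domain TEXT, path TEXT, minute TEXT)")
db.executemany(
    "INSERT INTO clicks VALUES (?, ?, ?)",
    [
        ("com.amazon.www", "/dp/123", "2011-02-01 11:41"),
        ("com.example.blog", "/foo", "2011-02-01 11:41"),
    ],
)

# One normalized row per click; the "amazon" rollup is computed at query
# time instead of being kept as a separate pre-aggregated counter.
(count,) = db.execute(
    "SELECT COUNT(*) FROM clicks WHERE domain LIKE 'com.amazon%'"
).fetchone()
print(count)  # 1
```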
I'm missing the point, so I hope the HN community can enlighten me, but: why do you need Cassandra and all that jazz, if you're just incrementing counters?
A decent memcached instance on modern hardware can easily push several hundred thousand updates/sec. Couldn't you do the same with a pool of sharded memcached servers? Compute the MD5 hash of the string you want to count (which can be one of the 16 strings from the top-rated comment above), and just use that as the key.
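As a toy sketch of that sharding idea, with plain dicts standing in for the memcached instances, since the point is just the hash-based shard selection; a real setup would use a memcached client's atomic incr instead.

```python
import hashlib

# Plain dicts stand in for 20 memcached instances; a real deployment would
# use the client library's atomic incr rather than this in-process dict.
SHARDS = [{} for _ in range(20)]

def incr(counter_key):
    # Hash the counter key to pick a shard, then bump the counter there.
    digest = hashlib.md5(counter_key.encode()).hexdigest()
    shard = SHARDS[int(digest, 16) % len(SHARDS)]
    shard[counter_key] = shard.get(counter_key, 0) + 1
    return shard[counter_key]

incr("t.co click:com.example.blog:2011-02-01 11:41")
```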
We've seen people push 750K QPS on InnoDB via HandlerSocket http://news.ycombinator.com/item?id=1886137 , so imagine what you could do with a sharded pool of 20 InnoDB servers.
Again: if I'm missing something, I'd love to learn.
As far as datastores go, Cassandra is write-optimized: writes are faster than reads. So for this use case (heavy denormalization) it is a good fit.
Also, Cassandra reduces the operational complexity of having a logical store which spans multiple hosts. Your memcache example does not get persistence for free, and sharding mysql is something you have to do manually. The interface to a Cassandra cluster is the same regardless of how many nodes you are running.
Scaling writes is "hard". Incrementing counters is obviously write-heavy, and Cassandra aims to make it easier.
The day half your memcached data center loses power will be a sad sad day indeed, and the story of 750k QPS on InnoDB was about read traffic, not writes.
Watch this talk[0] from QCon SF November 2010. Ryan King goes into a lot of detail about many NoSQL implementations at Twitter. This is the first time I had heard about distributed counting by way of Cassandra to implement tweet counts specifically, and obviously to count other things in general. In particular, Cassandra is pointed out as having tremendous write availability to enable this sort of thing. He also mentions various specifics of how the feature is coded/designed and the different approaches they had to take until they got to a final version.
On another note, last year Twitter open sourced Gizzard.
http://engineering.twitter.com/2010/04/introducing-gizzard-f... (ironically, nobody seems to have tweeted this article!!)
Out of natural curiosity, I downloaded the repo and tried to understand what it was.
Later, it also seemed the buzz surrounding it (on HN at least) didn't last long, and I forgot about Gizzard completely.
Are there startups/big companies using Gizzard or Twitter's other open source stuff?
I played with Gizzard for one of our services, with Redis as the backend. Twitter engineers on the #twinfra IRC channel helped me a lot in understanding the Gizzard and FlockDB source code. Not using it in production yet, though.
Twitter is clearly brilliant at creating a viral and useful product. I admire everything they've achieved in terms of user adoption and usefulness.
They've proven without a doubt that technology is only one ingredient, and that it doesn't have to work well for a web or mobile business to grow.
They're the last company I'll look to for technology to use in my business or as an example on how to run operations. I have no interest in any of their open sourced products or in any advice from them on how to run my data center.
I honestly did not understand if you were being sarcastic in your third paragraph.
I'm still not sure, but it seems you're peeved that Twitter hasn't been perfect and not impressed with Rainbird. (Right?)
This is a bit unfair I think? No one's really waiting for your advice on how to run your data center, whereas Twitter is dealing with the sort of volume that your data center probably couldn't begin to handle without finding some of the same solutions Twitter is now presenting.
Now we could possibly (depending on what you actually meant) get into a debate about Twitter's failures, but that's really not interesting, because your second paragraph is true: technological perfection is not enough by itself.
That doesn't mean Twitter doesn't have anything interesting to say about technological perfection.
But mostly I posted this in order to express the sheer confusion I felt reading your post. If you don't like Twitter, just say so.
I agree with his general sentiment, I'd only phrase it differently.
The point is not that they've been "through some rough patches". The point is that they failed for years to come up with a reliable implementation of a solved problem: pub/sub messaging.
Twitter is not "large" by any means. Your telco, stock exchange and many other companies have dealt with the same problem space for decades. They reliably dispatch orders of magnitude higher throughput under much more complex routing conditions, and many of them operate under SLAs that mandate five or even six nines of availability.
Sure, Twitter is (thankfully) not dispatching emergency calls, so their requirements are lower. However, given their track record, they're in no position to give technology advice either.
50 million tweets/day[1] is not a serious workload for a messaging system.
The very existence of that "insult" amuses me. I miss the days when cooler than cool meant ice cold.
<meta>I posted this partially to experimentally examine the downvote clumping effect I've noticed on hn, where some reasonable responses to highly downvoted posts get downvoted by association.</meta>
Meh? Ultimately the backend for it is Cassandra, correct? So they're, what, going to open source their implementation of doing stats in Cassandra? It's great for people to open source code, but I think it's more of a championing of Cassandra than it is a 'code dump'.
Also in that vein, the Formspring comment / gist link is very commendable and awesome!