This is interesting stuff (especially Cassandra), but I have to confess I've had nothing but troublesome experiences with Thrift, and right now I'd say it's the weakest link in this structure.
Thrift is fantastic when you're working in a C++-ish mode where CORBA-style IDL was as good as it got. But when it comes time to build something more flexible in terms of the protocol, or to interface with scripting languages like Python, which depend on a more loosely typed kind of architecture, you're going to hit a lot of growing pains.
For example, I work on a project (the public code is on my GitHub; look for Fuzed) that we'd like to provide a generic Thrift interface for. But so far the Thrift infrastructure, despite being entirely capable of it, doesn't seem to show much interest in embracing flexible or generic protocols. Everything needs to be big-banged out up front, and this just doesn't jibe with larger meta-projects building out fault-tolerant and flexible infrastructure whose needs Hadoop doesn't meet.
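To make "generic" a bit more concrete, here is a rough Python sketch of the kind of loosely typed dispatch I have in mind. None of this is Thrift's API or Fuzed's code: `HANDLERS`, `register`, `dispatch`, and the envelope shape are all hypothetical names made up for illustration. The point is only that handlers can be added at runtime, without regenerating stubs from an interface file.

```python
from typing import Any, Callable, Dict

# Hypothetical registry of handlers; none of these names come from Thrift.
HANDLERS: Dict[str, Callable[..., Any]] = {}


def register(name: str) -> Callable[[Callable[..., Any]], Callable[..., Any]]:
    """Add a handler at runtime, with no interface file to regenerate."""
    def wrap(fn: Callable[..., Any]) -> Callable[..., Any]:
        HANDLERS[name] = fn
        return fn
    return wrap


def dispatch(envelope: Dict[str, Any]) -> Dict[str, Any]:
    """Route a loosely typed request shaped like {"method": ..., "params": {...}}."""
    handler = HANDLERS.get(envelope.get("method"))
    if handler is None:
        return {"ok": False, "error": "unknown method"}
    try:
        return {"ok": True, "result": handler(**envelope.get("params", {}))}
    except TypeError as exc:
        # Wrong or missing parameters surface at call time, not at compile time.
        return {"ok": False, "error": str(exc)}


@register("echo")
def echo(text: str) -> str:
    return text
```

The tradeoff is the obvious one: you give up the up-front checking an IDL buys you, and mistakes show up as runtime errors instead.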
This post's intent is not to say, "Thrift is terrible." Indeed, it is awesome at what it currently does. But if you want to go beyond that, you're going to have to invest significant time and resources to get the protocol that binds all these amazing services up to snuff.
Scribe is a Thrift service for distributed logfile collection. It was designed to run as a daemon process on every node in your data center and to forward log files from any process running on that machine back to a central pool of aggregators. Because of its ubiquity, a major design point was to make Scribe consume as little CPU as possible.
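The forwarding pattern described above can be sketched roughly as follows. This is not Scribe's implementation: the `LogForwarder` class, the `send` callable, and the `(category, message)` entry shape are assumptions made for illustration. Batching is one common way to keep a per-node daemon cheap, since it amortizes per-message overhead across many entries.

```python
from typing import Callable, List, Tuple

Entry = Tuple[str, str]  # (category, message); shape assumed for illustration


class LogForwarder:
    """Buffers entries locally and forwards them upstream in batches."""

    def __init__(self, send: Callable[[List[Entry]], bool], batch_size: int = 100):
        self.send = send  # hypothetical transport; returns True on success
        self.batch_size = batch_size
        self.buffer: List[Entry] = []

    def log(self, category: str, message: str) -> None:
        """Queue one entry and flush once the batch is full."""
        self.buffer.append((category, message))
        if len(self.buffer) >= self.batch_size:
            self.flush()

    def flush(self) -> bool:
        """Try to send everything buffered; keep the entries if the send fails."""
        if not self.buffer:
            return True
        if self.send(list(self.buffer)):
            self.buffer.clear()
            return True
        return False
```

Keeping the buffer when a send fails means a later `flush()` can retry once an aggregator is reachable again; a real daemon would also need to cap the buffer or spill it to disk.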