I'm on the data team at Librato, happy to answer any questions. Some of the Librato team will also be at re:Invent next week, so we'd be happy to discuss more in person with anyone attending.
How much of your decision to replace Storm stemmed from "this would be a fun technical challenge" that you economically justified by cutting server costs? It's hard to tell from the blog post whether you view SuperChief as an "indictment" of running Storm at scale, or whether you just thought building your own stream processor would be a great way to learn new things and cut server costs at the same time.
This was not an "indictment" of Storm. We did the analysis and determined that, in order to achieve the goals of the business, we were not going to be able to take our current system to 10x scale without a significant investment. With any complex, distributed computing system there is always the question of when it makes sense to replace the portions of the system that you are not able to reason about well enough to scale.
SuperChief is not an off-the-shelf framework that anyone will be able to drop in to fulfill their stream processing requirements. We ran on Storm for 2.5 years prior to moving to SuperChief, and during that time we developed the understanding of the system and of our own workload needed to build it. SuperChief builds on many assumptions and understandings of our workload that we did not have when we started with Storm.
For us, we decided this was the path that would lead to the quickest and highest reward, and we were able to justify the effort not only by halving our infrastructure footprint, but by being able to scale the next 10x. Other companies have come to similar realizations, e.g. Twitter with Heron.
Makes total sense, especially the part "Superchief is not an off-the-shelf-framework that anyone will be able to drop-in to fulfill their stream processing requirements".
So, essentially, Storm solved a more generic problem than you needed it to solve. Superchief is a narrower system that more closely aligns with your problem domain, and this let you not only simplify a little but also gain efficiency. Makes sense!
(p.s. the reason I ask is because I am deeply committed to the Apache Storm open source community, as one of the co-authors of streamparse and pystorm.)
first, thanks for the post--a gold mine of distributed stream-processing knowledge of the type it's probably only possible to acquire the hard way.
so obviously Librato develops on the JVM. If you were to begin SuperChief today, now that Akka Streams is 1.0, would you have considered using it? Also, Apache Storm is true one-at-a-time streaming; is SuperChief the same, or micro-batch? And finally, did I read correctly that you are using ZooKeeper but a separate library for leader election? Does this work with the znodes or in place of them?
Thanks! SuperChief does one-at-a-time streaming. The library for leader election is just a wrapper around an Apache Curator LeaderLatch with some instrumentation and logging, so we can reuse it in other services.
As someone about to invest big time in Spark Streaming for processing event series data, I'm wondering if you could comment on what makes it a bad fit for your workload.
See the other comment about Spark and frameworks. If you are just starting out with stream processing, I would highly recommend using the best-of-breed frameworks. Frameworks provide abstractions that let you focus on solving your domain problems quickly; you don't want to have to worry about the plumbing until you've solved those.
Any consideration given to Scala or Clojure, or did raw performance (via Java 8) win over greater abstraction capabilities from the start?
IIRC, Twitter's Storm replacement, Heron, is written in C++ -- i.e. they didn't go with Spark/Scala, which, given that Twitter is probably the largest Scala shop in the world, speaks volumes about the volume of data these systems need to handle (read: Spark is far from slow).
Hi, I'm the author of the blog post and wrote SuperChief. We chose Java 8 because it's the language we're most comfortable with on the data team and find it easiest to reason about. We also reused some of the existing bolt code we wrote for Storm to run the time series aggregations. It was already written in Java, so we just had to make it thread safe. I suspect you could get similar performance with other JVM-based languages.
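To illustrate the kind of change "making it thread safe" implies (this is a hypothetical sketch, not the actual bolt code): a single-threaded running-sum aggregation over a plain HashMap can be made safe for concurrent processing threads with concurrent primitives like ConcurrentHashMap and DoubleAdder.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.DoubleAdder;

// Hypothetical example: a metric-name -> running-sum aggregation that
// multiple processing threads can update without external locking.
class ThreadSafeAggregator {
    private final ConcurrentHashMap<String, DoubleAdder> sums = new ConcurrentHashMap<>();

    // Safe to call from any thread: computeIfAbsent atomically creates
    // the adder for a new metric, and DoubleAdder.add is lock-free.
    void add(String metric, double value) {
        sums.computeIfAbsent(metric, k -> new DoubleAdder()).add(value);
    }

    double sum(String metric) {
        DoubleAdder adder = sums.get(metric);
        return adder == null ? 0.0 : adder.sum();
    }
}
```

DoubleAdder is a good fit here because writes heavily outnumber reads in an aggregation hot path; it sacrifices a cheap read for striped, contention-free writes.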
I was interested in looking at Spark for this for a bit, but coming from Storm we decided we wanted something stripped down and more purpose-built.
Rather than build your own, did you consider moving to another distributed stream processing platform such as Spark Streaming or Flink? If so, curious to know what disqualified them.
Yes, we did survey the solutions out there at the time. At the end of the day, the decision came down to the fact that we knew the domain and the problem we were trying to solve with Storm very well. The topologies we've replaced with SuperChief had been running on Storm for years, and their business logic has not changed significantly. Given the team's experience, it was easier for us to replace the streaming plumbing than to become experts on another abstraction framework.
So this is definitely not a dig at any framework. If we were starting to build stream processing into our infrastructure today, we would most likely start on an existing framework. I also imagine that as our needs grow we will leverage these frameworks in future proof-of-concept streaming projects that may or may not make sense to move to SuperChief.
We grab a timestamp when data hits our API (about as close to the source as we can get) and use that to order time samples. For time series, the ordering requirements aren't as strict as in other problem domains, since we can usually go back and overwrite previous time series data points.
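A minimal sketch of why loose ordering suffices here (hypothetical code, not SuperChief's): if each sample carries the timestamp assigned at the API, a late-arriving sample simply overwrites the point for its interval, so arrival order doesn't matter.

```java
import java.util.concurrent.ConcurrentSkipListMap;

// Hypothetical example: points keyed by the ingest timestamp.
// A sample that arrives late still lands on the right interval;
// the map stays sorted by time regardless of arrival order.
class SeriesWindow {
    private final ConcurrentSkipListMap<Long, Double> points = new ConcurrentSkipListMap<>();

    // timestampSec was captured when the data hit the API, not on arrival.
    void record(long timestampSec, double value) {
        points.put(timestampSec, value); // last write wins for the interval
    }

    Double valueAt(long timestampSec) {
        return points.get(timestampSec);
    }
}
```

For example, recording (t=100, 1.0), then (t=200, 2.0), then a late correction (t=100, 1.5) leaves the series correctly ordered with 1.5 at t=100.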
Do you use batching to reach that scale of throughput? Streams are sometimes pre-aggregated data, and it wasn't clear whether you maintained the granularity through the changes.
I can't speak for their implementation but batching is not necessary. Stream processing complex JSON documents and storing the documents to disk at rates of 500k documents/second per server is demonstrably achievable on some scale-out systems.
The internal architectures make an enormous difference in throughput. A proper high-performance stream processing engine does not look anything like the "Hadoop in RAM" style model.
> Stream processing complex JSON documents and storing the documents to disk at rates of 500k documents/second per server is demonstrably achievable on some scale-out systems
So is it per server or scaled out? I thought SSDs have capped around 100k discrete writes per second (P/E aka write cycles).
Can you give an example? I've been unable to practically reach more than 10k/sec/server using a number of technologies and combinations to collect from a socket, parse JSON, and write to a socket. That's just my specific use case.
Looking at the top end of Intel's SSD lineup, I see that they have a product that advertises up to 175k IOPS of random 4K writes. Is this what you are referring to?
There's no batching; we have a 1-to-1 mapping of Kafka messages to measurements we receive from our API, though that could change over time. SuperChief just reads the messages, and each message is passed off to another thread for processing.
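The handoff described above can be sketched roughly like this (hypothetical code, not SuperChief's; the consume loop and message type are stand-ins for the Kafka consumer): each message maps to one measurement and is submitted to a worker pool, so the read loop never blocks on processing.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical example: one consumed message == one measurement,
// no batching; processing happens on a separate worker thread.
class MessageDispatcher {
    private final ExecutorService workers;
    private final AtomicLong processed = new AtomicLong();

    MessageDispatcher(int threads) {
        workers = Executors.newFixedThreadPool(threads);
    }

    // Called from the consume loop for every message individually.
    void dispatch(String message) {
        workers.submit(() -> {
            // parsing and aggregation of the measurement would happen here
            processed.incrementAndGet();
        });
    }

    // Stop accepting work and wait for in-flight messages to finish.
    long drain() {
        workers.shutdown();
        try {
            workers.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return processed.get();
    }
}
```

Keeping the consume loop free of processing work is what makes one-at-a-time handling viable at high message rates.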
Yes, we do plan to at some point. We want to build up longer production confidence before putting it out there for others, and to write the documentation necessary to make it useful for other teams.