Open-Sourcing Twitter Heron (blog.twitter.com)
289 points by samber on May 25, 2016 | 63 comments



As someone who has used Heron (along with MillWheel, Spark Streaming and Storm), I feel like this announcement is too late. The biggest thing Heron offers is raw scale, but since they decided to reuse the existing Storm API, it has the same shitty spout/bolt API that Storm offers. In contrast, Spark Streaming, Flink and Kafka Streams all offer a map/flatMap/filter/sink-based functional API. At Twitter most teams used Summingbird on top of Heron to get that functional API, but Summingbird didn't get a lot of traction outside Twitter and I am not sure how actively maintained the OSS version of Summingbird is. Even if you bite the bullet and decide to use Summingbird with Heron, you will still miss out on a lot of use cases: Summingbird was mostly focused on read/transform/aggregate/write, whereas most streaming problems I have seen outside of Twitter involve read/transform/aggregate/decision/write. I suppose you could implement decisioning in Summingbird, but I haven't seen it done.
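To make the API contrast concrete, here is a rough sketch; the bolt uses the pre-1.0 Storm base classes (backtype.storm.*) that Heron's API mirrors, while the bolt class itself and the functional one-liner at the end are purely illustrative, not code from any of these projects.

    import backtype.storm.topology.BasicOutputCollector;
    import backtype.storm.topology.OutputFieldsDeclarer;
    import backtype.storm.topology.base.BaseBasicBolt;
    import backtype.storm.tuple.Fields;
    import backtype.storm.tuple.Tuple;
    import backtype.storm.tuple.Values;

    // Storm/Heron style: every processing step is a bolt class, wired into a
    // topology by component name and stream grouping.
    public class ExtractHashtagsBolt extends BaseBasicBolt {
        @Override
        public void execute(Tuple tuple, BasicOutputCollector collector) {
            for (String word : tuple.getString(0).split("\\s+")) {
                if (word.startsWith("#")) {
                    collector.emit(new Values(word));
                }
            }
        }

        @Override
        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            declarer.declare(new Fields("hashtag"));
        }
    }

    // Functional style (Flink/Spark/Kafka Streams), paraphrased rather than any
    // engine's exact signatures: the same step collapses into an operator chain.
    //   tweets.flatMap(splitIntoWords).filter(w -> w.startsWith("#")).addSink(sink);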

Comparing Heron to Google's MillWheel is interesting because of the design choices each made. Heron supports at-least-once and at-most-once message guarantees, but at Twitter most jobs ran with acking turned off, so it was at-most-once with acknowledged data loss (they had a batch job doing mop-up work to pick up the missing data). Google, on the other hand, implemented exactly-once semantics with idempotent sinks, watermarking to manage out-of-order messages, and deduping support. Since both Flink and Spark will be implementing the Apache Beam model (the successor to MillWheel), the only reason I can see for someone picking Heron over Flink/Spark is that they are operating at a massive scale that Flink/Spark don't support yet.
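For anyone who hasn't seen the exactly-once trick up close, the core idea is roughly this: every message carries a unique ID, the sink is written so that applying the same message twice has no extra effect, and replays after a failure therefore become no-ops. A toy sketch in plain Java (made-up names, not MillWheel's actual API; a real system would bound the dedup state with watermarks and persist it durably):

    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    // Toy idempotent sink: duplicate deliveries of the same messageId are ignored.
    public class IdempotentCounterSink {
        private final Map<String, Long> counts = new ConcurrentHashMap<>();
        private final Map<String, Boolean> applied = new ConcurrentHashMap<>();

        /** messageId is assumed to be assigned once, upstream, per logical message. */
        public void write(String messageId, String key) {
            if (applied.putIfAbsent(messageId, Boolean.TRUE) != null) {
                return; // retry or redelivery: already counted, do nothing
            }
            counts.merge(key, 1L, Long::sum);
        }

        public long count(String key) {
            return counts.getOrDefault(key, 0L);
        }
    }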


Storm is a low-level system for managing (optionally) transactional multi-machine tasks. It makes no assumptions about what is being processed (e.g. analytics, data transforms). The primitives you are talking about exist in the child project Trident, which runs on top of Storm; a rough sketch of what that looks like is below. Storm itself is no more an analytics tool than a web server is. It is a lower-level tool.
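Sketched from memory of the Trident word-count tutorial (pre-1.0 storm.trident.* packages; the spout is assumed to be defined elsewhere), so treat the details as approximate:

    import backtype.storm.topology.IRichSpout;
    import backtype.storm.tuple.Fields;
    import backtype.storm.tuple.Values;
    import storm.trident.TridentTopology;
    import storm.trident.operation.BaseFunction;
    import storm.trident.operation.TridentCollector;
    import storm.trident.operation.builtin.Count;
    import storm.trident.testing.MemoryMapState;
    import storm.trident.tuple.TridentTuple;

    public class TridentWordCountSketch {
        // User-defined function: split each sentence tuple into word tuples.
        public static class Split extends BaseFunction {
            @Override
            public void execute(TridentTuple tuple, TridentCollector collector) {
                for (String word : tuple.getString(0).split(" ")) {
                    collector.emit(new Values(word));
                }
            }
        }

        // Higher-level primitives (each/groupBy/persistentAggregate) on top of raw Storm.
        public static TridentTopology build(IRichSpout sentenceSpout) {
            TridentTopology topology = new TridentTopology();
            topology.newStream("sentences", sentenceSpout)
                    .each(new Fields("sentence"), new Split(), new Fields("word"))
                    .groupBy(new Fields("word"))
                    .persistentAggregate(new MemoryMapState.Factory(), new Count(),
                                         new Fields("count"));
            return topology;
        }
    }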


The parent also ignored the time-to-process difference, which is drastically lower in Storm. It has its flaws, but scale is not the only metric to use as a decider.


> they are operating at a massive scale that Flink/Spark don't support yet

Flink certainly scales just fine, for what it's worth. Flink 1.0 is quite good, and I'd consider what I'm doing "massive scale"; the ease of 1MM+ QPS with decent p95 latency via Flink surprised me compared to other systems that I investigated in this space. Most hip-fired benchmarks, including that awful Yahoo! one that everybody cites, use Flink poorly.

The rest of your comment is great and I couldn't agree more; spot-on analysis. Twitter misfired here: buying out Nathan Marz, neglecting Storm in favor of Heron while the rest of the field advanced (notably Google's open-source work and Flink), announcing Heron as so much better but keeping it close to the chest for a while, and then losing out on both of their streaming engines over time. Storm and Heron both feel like too little too late, particularly Storm's recent (vast) performance improvements, which a lot of folks I know kinda shrugged at, and which is kinda too bad.

The Dataflow/Beam/Flink stuff is the compelling horse right now, to me. Just my personal opinion.


> Most hip-fired benchmarks, including that awful Yahoo! one that everybody cites, use Flink poorly

Why are Yahoo! benchmarks awful? How did they manage to use Flink poorly?

For people who don't know what he is referring to, check this: https://yahooeng.tumblr.com/post/135321837876/benchmarking-s...


I had a look at the open-source Summingbird as a possible way to implement a (soft) real-time project I have, because I'm not especially Java-ish and Storm does not seem to play that nicely with Scala (I've been told it works decently with Clojure, though; that might have been a solution to my non-Javaness), but it looked somewhat stale.

Ditched it and decided to do it in Spark with Scala (making it a good excuse to learn Scala). With so many real-time options popping up all around, deciding which to pick is getting harder and harder.


Thank you for this comment. It contained orders of magnitude more useful information about the API choices and data models that define this system than the linked article itself.


They have dropped Clojure completely. All the critical-path messaging in Storm was done using Clojure. They dropped Netty too; the actual messaging layer (the "stream manager") seems to be C++. Perhaps that explains the latency and CPU improvements. The architectural changes mentioned account for better cluster utilization, fault tolerance and the back-pressure implementation, but don't explain why raw streaming performance is so much better in Heron.

Edit: I'm not so sure what they are using for networking. It seemed like cpp-netlib at first, but I don't think so now.


Looks like a homegrown networking framework: https://github.com/twitter/heron/tree/master/heron/common/sr...

Their default event loop implementation uses libevent, and they're using protobuf in some of their higher-level networking classes, but the networking code itself seems to pretty much be plain sockets (with a thin portability layer on top in a few places).



So that's what Twitter has been working on all this time.


Feel like they should feature the link to the repo much more prominently in the article!

For anyone curious, it's https://github.com/twitter/heron


And they built a site for the project with a good Getting Started section: http://twitter.github.io/heron/

Getting Started: http://twitter.github.io/heron/docs/getting-started/


Great news! Here is the paper if you're interested in Heron. http://dl.acm.org/citation.cfm?id=2742788&CFID=620516550&CFT...


I'm genuinely curious as to why new products are constantly written in Java. My experience with the language is far from pleasant. Is it because people actually like the language? Is it because there is no other alternative when it comes to solid development? ...?


Let's say you want a GC'd language (no C, C++) on *nix (no C#), good/varied libraries to work with (no esoteric languages), good performance (no Ruby, Python), reasonable options for implementing concurrency (no JavaScript); what's left?

Looks like a JVM language (Java, Scala, etc), or Golang. Java has better tooling and more mature implementations. I personally find modern Java nicer to write than Golang (though Scala nicer than both). These days, it comes down to whether the JVM memory overhead is a big deal on the specific project, and if not, for the class of projects discussed in the previous paragraph, I'm probably choosing Scala (but if my teammates object, then it's back to Java).


> Let's say you want a GC'd language (no C, C++)

IOW, "Let's say you want largely random pauses to impact latency non-deterministically." I'm not knocking GC, but it has serious consequences in systems that make this a weird assumption with which to start, particularly the "big" GC languages that you named. Scala in particular with its immutability generates a lot of garbage if you are not careful, and we were constantly fighting GC pressure (and, indirectly, our achievable concurrency and efficiency) at a household name Scala app. The number of JVM developers who are aware of this and capable with the memory subsystem -- i.e., off-heap strategies such as that used in Flink (relevantly) and some of the clever speedups in netty -- are dwarfed by the number of JVM developers in the whole, so you really need someone who understands these problems to efficiently scale a JVM language.

There are better systems languages if you can move away from GC, such as Rust. And modern C++ is fantastic for systems, and it's too bad it gets overlooked because of bam, your first assumption there. Most systems at Google are in C++ because they invested into the infrastructure that you need to support it. It should tell you something that they're heavily involved in each new version of C++.

(I'm talking about systems, not applications. GC is a viable tradeoff for programmer productivity in applications.)


> you really need someone who understands these problems to efficiently scale a JVM language

Yes. It's not a problem to find and hire one or two of those guys to write a core, and then hire a ton of regular Java developers to write the ton of code you need around that core, educated and managed by those two smart guys.

Compare that to the problem you have going the C, C++, Rust or Nim way. Now you need to hire a ton of guys who can write good C++ code (because it must at least be good enough not to crash every 5 seconds), or even worse, a ton of guys who can write any (not even good) Nim code.


Your entire second paragraph indicates to me that you haven't spent time at a well-oiled C++ shop or potentially haven't worked with modern C++ at all. What does "crash every 5 seconds" even mean? Do you think outside of Java we are all sitting around pounding rocks together and dealing with segfaults all day long? Are you saying JVM languages don't NPE in poor hands?

There are three focus areas of wrongness in your comment (maybe four if I'm feeling particularly culturally trolly) but I'll stick with that one and leave the others to others.


I think he is casting aspersions on the average programmer. I'm on a significant chunk of PRs across projects at my employer and I have seen crazy crazy shit from Java devs with years of experience. In some cases these same devs hold Java certifications from Sun/Oracle. I shudder to think of what would happen if they decided to have these guys write C or C++.


I'm excited for Rust for the reasons you talk about, but when I tried it a year ago, FFI was broken. I'd like to find a reason to retry it soon!


Do you remember how it was broken? I wasn't aware of any problems, but if you can remember, I can give you an update.


"Scala in particular with its immutability generates a lot of garbage if you are not careful, and we were constantly fighting GC pressure (and, indirectly, our achievable concurrency and efficiency) at a household name Scala app. The number of JVM developers who are aware of this and capable with the memory subsystem -- i.e., off-heap strategies such as that used in Flink (relevantly) and some of the clever speedups in netty -- are dwarfed by the number of JVM developers in the whole, so you really need someone who understands these problems to efficiently scale a JVM language."

Jed, once again you are laying claim to experience you do not have. If you had ever once touched Scala at Foursquare, it would be a different story. This is just a bunch of tired platitudes.


> what's left?

- Erlang (this fits your criteria at least as well as Java...)

- OCaml (concurrency isn't the best)

- Haskell (also fits your criteria very well)

- D (easy to use C libraries and sometimes C++)

- Rust (not GCed, but why is GCed a requirement?)

- Mercury (admittedly pretty obscure, but using C libraries is easy enough and it fits everything else)

Java only seems like a good choice if you ignore all the languages that are a better choice than Java. IMO Erlang seems like the choice here (or Elixir if you don't like Erlang syntax).


Let's also say that you want to be able to hire programmers to write and maintain it. And it would be nice to make it open to many more people so they could find and fix bugs.

After that, what's left?

At the moment, none of the above languages have any significant talent pool to draw from. If you're operating at "Twitter-scale" (I guess that's now a thing), you also need to be able to spin up a team of people that can get to work quickly without having to learn a language as they go. That eliminates a great number of languages just for logistical reasons. Rust might get there (and I hope it does!), but it's not there yet.

Maybe Twitter already has a significant number of skilled Java engineers, so they decided to use what they already knew really well. There's nothing wrong with that. Choosing any of the other languages you listed would have been a far riskier strategy.


None of these languages comes close to Java's level of library support and popularity. Additionally, it becomes a people problem when you use a language people aren't as familiar with, which leads to people building connectors that don't perform as well as the original libraries.

With Java, you can directly import a lot more, especially since the vast majority of OSS in this ecosystem is written in a JVM language.


I think you're underestimating the value of the enormous Java/JVM ecosystem.

That said, I'm hoping to deploy my first Rust project to production soon. It's come a long way for being such a young language.


> I think you're underestimating the value of the enormous Java/JVM ecosystem.

You could be right. For a number of the languages I mentioned I banked on really good FFIs allowing easy usage of C libraries. Perhaps in some domains, including stream processing, there are more and better Java or C++ libraries for them to build on.


Hey, I'm just super curious to learn more about the Rust ecosystem. Would you be up for sharing any details of what your Rust project might be? Thanks!


Aside from Erlang, which is far too slow (most Erlang apps use C for the data plane), the others are far too risky in terms of hiring, future support and general maintainability; there are also far too many unknowns about them. Rust shows a lot of promise and may very well be suitable for this kind of thing, but it may still be too young for an important infrastructure project at Twitter. It is also harder to program in than Java, although for an infrastructure project such as this, that may not be a deal breaker.

Another reason to run on the JVM is that often your applications run on the JVM, and the JIT -- which is getting better and better, and will get incredibly good in Java 9 -- can really do cool stuff when it optimizes across the app/infrastructure line. I've just seen a paper[1] showing a 3x performance boost from rewriting parts of SQLite in Python and letting the JIT optimize the DB together with the app.

[1]: http://arxiv.org/abs/1512.03207


    Aside from Erlang, which is far too slow (most Erlang apps use C for the data plane)
I'm going to call "citation needed" on that. Riak and ejabberd have a reputation for scale and performance, and the only C I can see referenced is where existing libraries like zlib get used.

Edit: You are still right. The hiring issues are relevant, and apply to Erlang also.


Is there any kind of overview of the JIT improvements coming in Java 9, or do you have to follow the relevant mailing lists/commits?


I'm referring to Graal, which would be available in Java 9 as an alternative JIT to HotSpot's C2 compiler: https://wiki.openjdk.java.net/display/Graal/Publications+and...


> OCaml (concurrency isn't the best)

Pet peeve: OCaml's concurrency is pretty good, it's parallelism that it struggles with.


Finding developers is hard. Finding developers who know those languages to support your software is near impossible.


Java isn't that bad. There's a lot of crappy code culture that comes from the uber-enterprisey arena, but in more reasonable hands it's quite capable.

It's also worth noting that Java comes with the JVM and the HotSpot JIT, which earn it a bit of swagger.


- Erlang/Elixir

  - OP mentioned performance. Erlang is terrible for CPU-bound tasks.

- Haskell

  - OP said non-esoteric.

- Rust

  - There's a significant ramp-up before you learn to deal with the compiler and lifetimes.


> Erlang

Erlang's CPU performance is not bad IME (though admittedly I can't think of anything I've implemented in both Erlang and Java to compare). But that's beside the point; you pick Erlang for networked and I/O bound tasks. Twitter mentioned running Heron on several hundred machines—at that scale, single-thread performance starts to matter less and Erlang's scaling efficiency closes the gap.

> Haskell

Haskell may have been esoteric in 2005, but it's hardly so anymore. (Also, I mention Mercury, and Haskell is the language you decide is too esoteric‽)

> Rust

So you're saying they wouldn't use it because they didn't know it. I suspect that's the real reason they chose Java—not because it is necessarily the best language suited for the task, but because it's the best of the languages Heron's developers knew.


> Erlang's CPU performance is not bad IME

Erlang is very slow when it comes to CPU bound tasks. It's nowhere near Java for tasks like this.


What is the definition of "best" anyway? If you really just care about pure performance, probably nothing can beat hand-written native code. When you say "best" it probably means most productive in the short/long term, and in that case it's not only about what the language is good at; after all, it's humans who use the language. Just as Esperanto is an awesome concept that nobody uses, which makes it a useless language, the more people use a language the more valuable it becomes. And yeah, Haskell is esoteric. I am confident that I know way more programming languages than the average programmer, but I don't program in Haskell. Pick 10 random programmers and ask them if they can program in Haskell; at best you'll find one (and that's probably being too optimistic).


> on *nix (no C#)

C# runs absolutely fine on *nix.


Yes, it is possible to run C# on *nix using Mono but the ecosystem isn't as rich as that of Java.


It has good tooling, IDE support, lots of people know it, it's fast, relatively easy to write, and accessible from Scala, Groovy, Clojure, etc. Java is a step removed from having to write a system in C or C++, for when you don't need absolute control over every single detail (garbage collection is acceptable). Most development happens in Java, C#, or C++, just not on HN.

There's something of an inertial effect with Java, where people continue to write Java because everyone else writes it (and thus it continually improves). What other platforms do you know of that have a similar track-record of performing in large-scale software?


As I understand it, Twitter is heavily invested in Java and the JVM. They maintain their own fork of the JVM that they tweak for low latency. As a starting point for speed and stability, the JVM is top of the line.

I like coding in Java 8. There's lots of new syntactic sugar that lets you cut down on the boilerplate, and cool new standard library features like streams that let you do functional programming. The best part about Java 8 streams is that they perform really well: you get more readable code along with better performance.
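For example, something like this (toy code with a made-up Tweet class) is both readable and fast enough for most uses:

    import java.util.List;
    import java.util.Map;
    import java.util.stream.Collectors;

    public class StreamsExample {
        // Made-up holder type just for the example.
        static class Tweet {
            final String author;
            final String text;
            Tweet(String author, String text) { this.author = author; this.text = text; }
        }

        // Java 8 stream style: filter, then group and count, with no hand-written loops.
        static Map<String, Long> longTweetsPerAuthor(List<Tweet> tweets) {
            return tweets.stream()
                         .filter(t -> t.text.length() > 100)
                         .collect(Collectors.groupingBy(t -> t.author, Collectors.counting()));
        }
    }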


I think predictable performance is a major reason. Also, there are loads of developers who are still using Java, so you don't need to worry about expanding your teams and teaching hordes of developers a new development setup.


I actually like Java pretty well. The language itself is mostly pretty OK, the tooling is super mature, and it's typed, which is huge. The community has a lot of smart people. Definitely not something super exciting or my favorite language, but overall it's just pretty smooth to work with.

I'm curious what you found unpleasant about it.


Go is good but not as mature. Other than that, there's C++ and Erlang/Elixir. There's not a whole lot of mature choices out there. For most critical, production-grade systems that need to be secure, scale and perform predictably, you don't want to blaze the trail.


I think your last point hits the nail on the head. There's a large amount of risk (especially at Twitter's size), and nobody wants to be saddled with an enormous and embarrassing failure. In addition, you need many hands to build the system, and finding developers for <favorite lang> that ALSO have relevant experience could be difficult. You could make arguments that developers in <favorite lang> are 10x as productive as your typical Java dev, but few people are going to take that chance. I'd say even Scala is still considered bleeding-edge at a lot of companies.


It's not bad once you strip out most of the crap that it tries to foist upon you.

But honestly...? It's taught in schools a lot, so more people are familiar with it. The greatest disservice to an entire generation of programmers is Java being the standard at universities.


Well, sometimes things other than developer ergonomics matter.


Wonder how this compares to Storm 1.0


Why am I being down-modded? When Twitter first announced Heron (~a year ago), they compared it with the version of Storm available then. Since then Storm has improved its latency and ability to scale out, added back-pressure, and in some cases is 10x faster than the previous version of Storm (0.10.0?). I was wondering how Heron compares to Storm's performance now.


Does anyone know if Storm's ShellBolt is compatible with Heron, or is there a better way in Heron to run non-JVM bolts?
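For context, this is roughly the Storm-side shape I mean: a thin JVM wrapper that launches an external script (splitter.py here is hypothetical) and speaks the multi-lang protocol over stdin/stdout.

    import backtype.storm.task.ShellBolt;
    import backtype.storm.topology.IRichBolt;
    import backtype.storm.topology.OutputFieldsDeclarer;
    import backtype.storm.tuple.Fields;
    import java.util.Map;

    // Standard Storm multi-lang pattern: the JVM side only starts the subprocess.
    public class SplitterShellBolt extends ShellBolt implements IRichBolt {
        public SplitterShellBolt() {
            super("python", "splitter.py");
        }

        @Override
        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            declarer.declare(new Fields("word"));
        }

        @Override
        public Map<String, Object> getComponentConfiguration() {
            return null;
        }
    }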


I've seen a decent amount of complaining about the difficulty of supporting a Kafka infrastructure. I'd be interested in thoughts from people who have used Heron on how it is to run in production.


We open-sourced Kafka support (spout, bolt & example topology): https://github.com/twitter/heron/pull/751

We run Kafka on Mesos (https://github.com/mesos/kafka), which allows broker clusters to be offered as a service to users and apps.

Vagrant setup for running the example: https://github.com/elodina/heron/blob/v2/contrib/kafka9/vagr...
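If it helps, the topology wiring follows the standard Storm-compatible shape. Sketch only, with the spout and bolt passed in as the generic interfaces; the concrete Kafka classes are the ones in the PR above:

    import backtype.storm.topology.IBasicBolt;
    import backtype.storm.topology.IRichSpout;
    import backtype.storm.topology.TopologyBuilder;

    public class KafkaTopologyWiring {
        // Wiring only: the actual Kafka spout and downstream bolt come from the linked PR.
        public static TopologyBuilder build(IRichSpout kafkaSpout, IBasicBolt sinkBolt) {
            TopologyBuilder builder = new TopologyBuilder();
            builder.setSpout("kafka-spout", kafkaSpout, 2);
            builder.setBolt("sink", sinkBolt, 2)
                   .shuffleGrouping("kafka-spout");
            return builder;
        }
    }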


Hearing about a Clojure project being dropped feels like a slap in the face. I only wonder if it was more of a political decision.


How does Heron compare to Spark streaming?


Well, the big one is that Heron is built to support real-time streaming while Spark Streaming is not, given its choice to use micro-batching. If latency matters, that matters to you.

I'd appreciate it if it were called Spark Microbatching, but we can't have everything.


The amount of FUD thrown around about micro batching is really kind of silly.

How many streaming analytics use cases are ok with the JVM, ok with 10s of millis of latency, but not ok with 100s of millis of latency?


Remove "analytics" from your statement and you'll arrive at the answer, because while analytics is driving the streaming space as we speak it is literally one of the most boring applications of streaming imaginable. I positively cannot get excited about engagement numbers from an event pipeline. Intrusion detection, fraud detection, about half of infrastructure monitoring tasks, event sourcing... You're also glossing over the impact of microbatching which looks more like seconds than milliseconds, as well as what microbatching does to your windowing abilities (such as having to double-process to work around the microbatch interval as applied to your desired windowing semantics).

Microbatch latency immediately rules out several useful applications of streaming. I also didn't say anything about the JVM nor it being okay for my purposes. You did.

I can back up my statements on streaming from Flink to Samza to Dataflow/Beam to Storm to MillWheel to Spark "Streaming" and back again because it has been my primary focus (literally thinking of nothing else) for a couple years. Please accuse someone else of FUD because that's a relatively veiled way to say "you don't know what you're talking about," and I assure you that you're (condescendingly) wrong. I think you're also coming at that angle from interpreting me as negative on Spark Streaming. Read carefully.


> You're also glossing over the impact of microbatching which looks more like seconds than milliseconds

I've run Spark Streaming jobs at 250ms batch times. This comment you made, right here, is why I used the word FUD.


Okay, the better question then would be: how does this compare with writing a processor on top of Akka Streams?


If Twitter's stock price continues to fall, maybe they will open-source the entire stack.



