ErLLVM: LLVM backend for high performance Erlang

asb · on March 24, 2014

This backend has been merged into the Erlang/OTP master branch: https://github.com/erlang/otp/commit/9d46875b53ffb21bc55aec4...

If you're interested in LLVM you might want to subscribe to LLVM Weekly (which I author). http://llvmweekly.org/issue/12

dochtman · on March 24, 2014

LLVM Weekly is awesome, thanks for doing that!

sitkack · on March 24, 2014

You need to listen to asb, it is a compiler.

DAddYE · on March 24, 2014

Thanks! Subscribed!

delinka · on March 24, 2014

I think I need some guidance in understanding this project. The typical LLVM-based compiler 'stack' looks like this:

  [Front-end] -> [LLVM AST] -> [Back-end]

Where the front end compiles your language of choice to LLVM's AST format, and a back-end translates AST to machine code. All the really awesome code optimization stuff happens at the AST level before the back-end gets involved.

So is ErLLVM a back-end that emits Erlang (such that any front-end language can be translated to Erlang)? Or is it an LLVM front-end that compiles Erlang to LLVM AST? And in the latter case, if LLVM already targets ARM, why does ErLLVM need to do work related to ARM?

masklinn · on March 24, 2014

> Or is it an LLVM front-end that compiles Erlang to LLVM AST?

This, kind-of. It's a new HiPE[0] backend. See [1] for how LLVM integrates into the existing Erlang pipeline.

[0] http://www.it.uu.se/research/group/hipe/

[1] http://erllvm.softlab.ntua.gr/documentation/design/pipeline/

signa11 · on March 24, 2014

If you have not already seen this: http://www.erlang-factory.com/upload/presentations/519/erllv... it might serve as a high-level overview of what is being attempted here.

chrisfarms · on March 24, 2014

Can't remember which talk it was from but I remember enjoying Joe Armstrong speaking about the importance of a language being "correct" rather than fast:

    The performance problem is solved. You just wait 20 years.

alexchamberlain · on March 24, 2014

Doesn't work any more... Now, we need people to invent decent compilers and languages.

duaneb · on March 24, 2014

Nonsense, it still works. Hardware prices are still falling. People aren't writing web apps without garbage collection. Performance/$ is still going to rise.

mzl · on March 25, 2014

While I do understand what you mean about writing web apps in managed languages to optimize programmer productivity instead of speed of code, garbage collection is not slow in general. In fact, garbage collection can be faster than manual collection in some cases, for several reasons (e.g., GC may require operations based on the size of the working set only, while manual collection needs to touch each piece of garbage explicitly).

There are a lot of things that make GC performance an issue, including stop-the-world behaviour for some collectors making real-time operations impossible, as well as the tendency for programmers to produce a lot more garbage in their programs. In general, I think that most people overstate the perceived inefficiencies of GC when it comes to throughput (predictable latency is the main culprit IMHO).
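
To make the working-set point concrete, here is a minimal toy sketch in Python. It assumes a made-up heap model (a dict mapping object ids to the ids they reference) and hypothetical helper names; it is not how any real collector is implemented. A copying collector only visits what is reachable from the roots, while manual freeing does one operation per dead object.

```python
def copying_collect(heap, roots):
    """Toy semispace copy: visit only objects reachable from the roots."""
    to_space = {}
    worklist = list(roots)
    visited = 0
    while worklist:
        obj = worklist.pop()
        if obj in to_space:
            continue  # already copied
        to_space[obj] = heap[obj]
        visited += 1
        worklist.extend(heap[obj])
    return to_space, visited


def manual_free(heap, garbage):
    """Toy manual management: one explicit operation per dead object."""
    freed = 0
    for obj in garbage:
        del heap[obj]
        freed += 1
    return freed


def demo(live_count=10, garbage_count=10_000):
    heap = {}
    # Live data: a chain 0 -> 1 -> ... -> live_count - 1, rooted at 0.
    for i in range(live_count):
        heap[i] = [i + 1] if i + 1 < live_count else []
    # Garbage: objects that nothing references.
    for j in range(live_count, live_count + garbage_count):
        heap[j] = []
    _, gc_work = copying_collect(heap, roots=[0])
    garbage = [k for k in heap if k >= live_count]
    manual_work = manual_free(dict(heap), garbage)
    return gc_work, manual_work
```

With the default arguments the collector does work for the 10 live objects and never looks at the 10,000 dead ones, whereas the manual path touches every dead object. Real allocators and collectors have many other costs that this ignores.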

orkoden · on March 25, 2014

The problem with garbage collection is not only the speed penalty. Garbage collection will also use a lot more memory than manual memory management or automatic reference counting. This is especially bad when running in a memory constrained environment and dealing with data intensive tasks. Think image or video manipulation on a mobile or embedded device.

mzl · on March 25, 2014

Reference counting is very expensive time-wise and memory-wise if you have lots of small objects. Memory-wise because of the additional counting-field needed, and time-wise because of the counter update at every gain and loss of a reference. These updates can also easily become huge concurrency bottlenecks, since they introduce false write sharing. Another issue with reference counting is of course that cyclic structures are hard to handle.

It is true that GC works best when you have sufficient amounts of free memory compared to the amount of garbage produced. On any kind of memory-constrained (embedded) device I would naturally view everything about memory allocations as critical issues to handle, with clear memory usage budgets.
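
The reference-counting costs above can be observed in CPython, which happens to use reference counting plus a cycle collector. This is a small sketch under that assumption (the `Node` class and function names are made up for illustration, and other Python implementations may behave differently):

```python
import gc
import sys
import weakref


class Node:
    """Tiny object with one outgoing reference."""
    def __init__(self):
        self.other = None


def refcount_delta():
    """Taking one more reference updates the object's counter."""
    obj = Node()
    before = sys.getrefcount(obj)
    alias = obj  # gaining a reference writes to the object's header
    after = sys.getrefcount(obj)
    del alias
    return after - before


def cycle_needs_tracing():
    """A two-node cycle is not reclaimed by reference counts alone."""
    was_enabled = gc.isenabled()
    gc.disable()  # keep the cycle collector from running on its own
    try:
        a, b = Node(), Node()
        a.other, b.other = b, a      # a <-> b cycle
        probe = weakref.ref(a)       # observe a without keeping it alive
        del a, b                     # counts never reach zero
        alive_after_del = probe() is not None
        gc.collect()                 # tracing pass finds the dead cycle
        alive_after_gc = probe() is not None
        return alive_after_del, alive_after_gc
    finally:
        if was_enabled:
            gc.enable()
```

The per-object counter is the memory overhead, the update on every new reference is the time overhead (and the write that causes sharing problems across threads), and the second function shows why cycles need something beyond counting.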

kalleboo · on March 24, 2014

> Performance/$ is still going to rise

But is performance/core also going to keep rising? Or do we need smart compilers/languages to make use of all the cheap cores?

lucian1900 · on March 24, 2014

We need a correct, expressive language that easily lets you make use of all those cheap cores. Sort of like Erlang.

duaneb · on March 24, 2014

I mean, this conversation has been around for easily 10 years. We live in a world where we already have to do this: look at all the progress in the last decade on the prevalence of futures, async i/o, and channels/queues/whatever. Memory bandwidth and latency have proven to be far more of a performance drag, and they have improved far more slowly; future performance problems are likely to come from areas other than single-core clock speed.

eternalban · on March 24, 2014

Well, that -- the Memory bottleneck -- is precisely why we need a rethink of platforms and tool chains.

Languages will provide the semantics of data locality (in the context of the memory hierarchy), and compilers will optimize for that.

[nop edit]

drkrab · on March 25, 2014

As it turns out, Erlang is already providing much better memory locality than your average language. Each process runs in a contiguous memory area (heap+stack) which fits perfectly with a many-core world.

duaneb · on March 24, 2014

> is precisely why we need a rethink of platforms and tool chains.

My point was, people are already thinking. It's not an easy problem and there's no obvious way forward.

jlouis · on March 24, 2014

This limitation is only true if you can't utilize multiple small cores. Erlang would absolutely rock on a 1024 core machine, where you need almost no scheduler.

sitkack · on March 25, 2014

Yeah a wafer scale cpu with hundreds of gigabytes sprinkled between those cores. Cores aren't the issue, memory bandwidth is. 90% of the silicon will be devoted to communication. Kinda like a brain...

pekk · on March 24, 2014

People are inventing decent languages, but that doesn't mean they will be used, since adoption is a matter of fashion and historical accident as much as anything.

VeejayRampay · on March 25, 2014

I mostly program in Ruby, a language created 20 years ago, and I know for a fact that this quote isn't always accurate :/

thinkpad20 · on March 24, 2014

"Currently, ErLLVM supports the AMD64 and x86 architectures. There is also some ongoing work for ARM."

Isn't the whole idea of LLVM that you don't need to support individual architectures, because LLVM can be separately compiled into whatever architecture you need?

masklinn · on March 24, 2014

Here, LLVM is not used to output an independent and self-contained binary (that would hardly make sense). It's used as a backend for HiPE's native code generation[0], and it's possible that the rest of HiPE (or part of the integration itself[1]) is not quite ARM compatible (especially ARM64)

[0] http://erllvm.softlab.ntua.gr/documentation/design/pipeline/

[1] "The linearized RTL code is translated to LLVM Assembly. […] After the translation is completed, the LLVM code is printed to a file and the LLVM toolchain is invoked in order to produce an object file. […] The lib/hipe/llvm/elf64_format.erl module is responsible for extracting the binary code and all other necessary information from the object file."
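
To give a feel for what "extracting information from the object file" involves, here is a minimal Python sketch of just the first step, reading an ELF64 file header. It follows the standard ELF64 header layout; the function names and the synthetic header are made up for illustration, and this is not a description of how lib/hipe/llvm/elf64_format.erl is actually written.

```python
import struct

# ELF64 file header, little-endian, 64 bytes in total.
ELF64_HEADER = struct.Struct("<16sHHIQQQIHHHHHH")

ET_REL = 1       # e_type: relocatable object file
EM_X86_64 = 62   # e_machine: AMD64


def parse_elf64_header(data):
    """Return the header fields needed to locate the section table."""
    if len(data) < ELF64_HEADER.size:
        raise ValueError("too short for an ELF64 header")
    (ident, e_type, e_machine, _version, _entry, _phoff, e_shoff,
     _flags, _ehsize, _phentsize, _phnum, e_shentsize, e_shnum,
     e_shstrndx) = ELF64_HEADER.unpack_from(data)
    if ident[:4] != b"\x7fELF":
        raise ValueError("bad ELF magic")
    if ident[4] != 2:  # EI_CLASS: 2 means 64-bit
        raise ValueError("not a 64-bit ELF file")
    if ident[5] != 1:  # EI_DATA: 1 means little-endian
        raise ValueError("only little-endian is handled in this sketch")
    return {
        "type": e_type,
        "machine": e_machine,
        "section_table_offset": e_shoff,
        "section_entry_size": e_shentsize,
        "section_count": e_shnum,
        "section_names_index": e_shstrndx,
    }


def make_fake_header(shoff=64, shnum=3):
    """Build a synthetic header for demonstration (not a usable object file)."""
    ident = b"\x7fELF" + bytes([2, 1, 1]) + bytes(9)
    return ELF64_HEADER.pack(ident, ET_REL, EM_X86_64, 1, 0, 0, shoff,
                             0, 64, 0, 0, 64, shnum, shnum - 1)
```

From the section table offset and count, a real extractor would go on to find the code section, the symbol table and the relocations. That format-specific work is one place where per-architecture effort can show up even though LLVM itself handles the code generation.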

wmobit · on March 24, 2014

No, unfortunately front ends still need to be aware of some of the ABI details of the target to produce the IR for it.

cryptolect · on March 24, 2014

I thought Erlang was already fast? Are there any before/after benchmarks available post-LLVM?

masklinn · on March 24, 2014

> I thought Erlang was already fast?

Erlang has a pretty good runtime, is fast at IO and is excellent at concurrency and reliability[-2]. Erlang itself is a "fairly slow" language: for imperative, CPU-bound and non-distributable[-1] workloads you'll probably be closer to Python than to C.

However, note that this does not make Erlang faster, it's "just" an LLVM-based backend for HiPE.

HiPE (High Performance Erlang) is a more than 15-year-old native compiler for Erlang (integrated into the standard distribution), but with Erlang being what it is (a dynamically typed functional language) it's best used as a form of "manual JIT" by profiling, extracting and compiling small subsets of the application: indiscriminately used on everything it will just generate boatloads of slow and useless "native" code which is just a deoptimization over standard BEAM (it will likely run at the same speed but eat more memory).

Part of HiPE's pipeline is the generation of native code itself, and the goal of this project (ultimately) is to replace existing hand-rolled codegen with LLVM (and benefit from its optimizations), thus simplifying maintenance by moving codegen to a third-party project. Most of HiPE (integration with the VM and generation of HiPE's own IR before codegen) remains, and according to [0] ErLLVM is slightly slower than existing HiPE backends at the moment.

If you want more info on ErLLVM, [0] has a bunch. For HiPE, [1] is the easy mode and [2] is hard mode

[-2] reliability is the original goal of Erlang, and contrary to most languages Erlang reached concurrency and distribution with reliability, rather than performance, in mind: it comes from telcos, where things have to keep running and a single machine means your stuff is down when (not if) the machine dies or crashes

[-1] Erlang's VM is fully multithreaded, by default.

[0] http://www.erlang-factory.com/upload/presentations/519/erllv...

[1] http://www.slideshare.net/didip/high-performance-erlang [Comic Sans warning]

[2] http://user.it.uu.se/~pergu/papers/erlang03.pdf "All you wanted to know about the HiPE compiler"

rdtsc · on March 24, 2014

Well, this started as an academic interest project. Erlang is fast for IO and for highly concurrent workloads with low granularity. It is not that fast for, say, SIMD-type calculations or math operations. So this is a research project to see if it can go faster in those areas.

Currently it is about 50/50: some workloads are faster with this back-end, some are not. Erlang was not designed for Computer Shootout type problems to start with, so maybe those don't matter as much, but it is always nice to have extra speed just in case (even if for marketing purposes).

netcraft · on March 24, 2014

The only exposure I have had with erlang was with rabbitmq and it was blazing fast - I wonder if this would affect it much?

jlouis · on March 24, 2014

There are some parts of RabbitMQ which are highly CPU-bound. This will probably help RabbitMQ in those areas.

jasonlotito · on March 24, 2014

Hopefully someone with more knowledge of all the moving parts can answer this: would this benefit Elixir in any way, considering it also compiles down to BEAM, or is it specifically for Erlang? I ask this because the documentation makes reference to first compiling to BEAM, and then compiling modules using LLVM.

rdtsc · on March 24, 2014

I believe this will benefit Elixir as well. LLVM code would work on the BEAM bytecode (via the HiPE backend). You can already use a HiPE-enabled erl interpreter with Elixir today, so this won't change.

jlouis · on March 24, 2014

You would be correct, sir :)