Most message passing on BEAM is done via copying. Pony can share memory directly because the compiler enforces safe sharing via deny capabilities. You can only send references with one of three capabilities to another actor:
iso -> mutable memory that you have the sole reference to
val -> immutable reference. you can read but you can't write
tag -> an opaque reference. good for sending messages to actors or doing identity comparisons
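Not Pony code, but a rough Rust analogy of those three sharing modes, with ownership transfer standing in for iso, Arc for val, and a channel Sender playing the role of a tag-like opaque handle:

    use std::sync::{mpsc, Arc};
    use std::thread;

    fn main() {
        let (tx, rx) = mpsc::channel::<Vec<u64>>();

        // iso-like: move the sole reference to mutable data to another "actor".
        let mut exclusive = vec![1, 2, 3];
        exclusive.push(4);
        tx.send(exclusive).unwrap(); // ownership moves; the sender can no longer touch it

        // val-like: share immutable data; readers everywhere, writers nowhere.
        let shared: Arc<Vec<u64>> = Arc::new(vec![5, 6, 7]);
        let reader = Arc::clone(&shared);
        let sum = thread::spawn(move || reader.iter().sum::<u64>());

        // tag-like: an opaque handle that can only be used to send messages,
        // not to read or write the data behind it.
        let tx2 = tx.clone();
        thread::spawn(move || tx2.send(vec![8]).unwrap());

        println!("{:?}", rx.recv().unwrap()); // [1, 2, 3, 4]
        println!("{}", sum.join().unwrap());  // 18
        println!("{:?}", rx.recv().unwrap()); // [8]
    }

Pony enforces these distinctions per reference capability rather than per type, but the flavor is similar: the compiler only lets you send what is safe to share or give away.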
The type system certainly helps as well. It allows us to give LLVM hints on optimizations it can make. The type system also helps because we can do "dangerous but fast" things safely because we can prove they are safe in this instance.
Compiling to native code via LLVM is another area for performance wins.
That said, Erlang has over 20 years of rock-solid production usage behind it, and that in itself is quite a selling point. At this point in time, I'd suggest Pony to Erlang/Elixir users if they need more performance; otherwise, I'd stick with Erlang for now.
> The type system also helps because we can do "dangerous but fast"
> things safely because we can prove they are safe in this instance.
This is an area of some static type systems that I'm really interested in; it feels counter-intuitive at first. You'd imagine that flexibility is what lets you do the things you need to do to go fast, but in many cases, restrictions actually are. Cool stuff.
Absolutely. In modern JavaScript engines, for example, dynamic typing makes something as basic as |foo.bar = 7| extremely complicated internally. In SpiderMonkey, |foo| could be a native object with |bar| in some varying location on a fixed or dynamic slot, or it could involve a proxy, a setter, an unboxed object or unboxed expando object, or a DOM or cross-compartment wrapper... To make the whole thing efficient, a particular property access could go through engine code, specialized JIT-generated native code after enough type information has been collected, or one of three inline cache systems, which generate multiple native code stubs |switch|ed on checks based on previous (slower) executions. A given get or set could even pass through more than one of the above, if too many checks fail and bailouts are required. And |bar| could be located directly on |foo| or on some object in |foo|'s prototype chain, requiring on-the-fly verification of additional invariants to ensure correctness.
Static typing would mean the engine can know for sure what |foo| is and where to look for |bar|, allowing faster, guaranteed-correct code to be emitted ahead of time. Dynamic typing makes it harder to offer speed, correctness, security, and good memory usage all at once.
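A toy illustration of that difference (in Rust, not actual engine internals): with a static type, the offset of bar is known at compile time, while a fully dynamic object has to fall back to a lookup plus checks.

    use std::collections::HashMap;

    // Statically typed: `bar` lives at a fixed offset inside Foo, so
    // `foo.bar = 7` compiles down to a single store.
    struct Foo {
        bar: u64,
    }

    // Caricature of a fully dynamic object: properties live in a map, so
    // every access is a hash lookup plus checks, unless a JIT can
    // specialize it away using collected type information.
    struct DynObject {
        properties: HashMap<String, f64>,
    }

    fn main() {
        let mut foo = Foo { bar: 0 };
        foo.bar = 7;

        let mut obj = DynObject { properties: HashMap::new() };
        obj.properties.insert("bar".to_string(), 7.0);

        println!("{} {:?}", foo.bar, obj.properties.get("bar"));
    }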
If you think that's interesting, you might want to check out ATS¹ and Mercury². ATS is wicked fast and doesn't even do some of the optimizations it's theoretically capable of (I think its alias analysis is fairly primitive). It compiles to C, but can use type information to remove bounds checks in many cases. Linear types mean memory and concurrency safety with no runtime overhead. (You're on the Rust team, right? So I suppose you're familiar with linear types—ATS's are much more powerful than Rust's affine types though.)
Mercury has uniqueness types, so it can remain referentially transparent while compiling to code that mutates. The compiler has fairly advanced automatic parallelization and can in some cases do compile-time garbage collection (i.e. it knows at compile time when an object will become inaccessible).
The great part about ATS that I wish Rust had is that you can define linear types for C libraries, and in general the type system is strong enough that you don't need unsafe{} sections.
You can do exactly the same thing in Rust, just not in the same statement as importing the functions (which are just that, importing the functions). I regard this as one of the most powerful parts of Rust: wrapping unsafe code/APIs into safe interfaces without cost.
Also, I think saying that ATS has no unsafe{} sections is misleading: it isn't explicitly marked in the source, but the compiler still cannot check that the "ownership" annotations in the imports are correct, or that, say, the preconditions of the functions (which may lead to undefined behaviour when violated) are satisfied. In other words, all of that code is implicitly surrounded in an `unsafe` block.
(The linearity is essentially handled by destructors: the common case is that the clean-up is just that, clean-up, and so destructors work well. It is definitely more annoying to 100%-type-check APIs that have more interesting clean-up/closing procedures, but these are rarer.)
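A minimal sketch of that pattern in Rust, assuming a hypothetical C library that exposes widget_open/widget_close (those names are invented for illustration, and this won't link without such a library):

    use std::ffi::c_void;
    use std::os::raw::c_int;

    // Hypothetical C API; these declarations stand in for a real library's header.
    extern "C" {
        fn widget_open(id: c_int) -> *mut c_void;
        fn widget_close(handle: *mut c_void);
    }

    // Safe wrapper: a handle can only be obtained through `open`, and Drop
    // guarantees `widget_close` runs exactly once, so callers never write `unsafe`.
    pub struct Widget {
        raw: *mut c_void,
    }

    impl Widget {
        pub fn open(id: i32) -> Option<Widget> {
            // The unsafe block is confined here; this module is responsible
            // for upholding whatever invariants the C library documents.
            let raw = unsafe { widget_open(id) };
            if raw.is_null() { None } else { Some(Widget { raw }) }
        }
    }

    impl Drop for Widget {
        fn drop(&mut self) {
            unsafe { widget_close(self.raw) }
        }
    }

This is the "linearity via destructors" point from the parenthetical: the wrapper type has no Clone, and the close call rides on the destructor at the end of its lifetime.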
I think people tend to think that C lets you go fast because of the tricks it lets you get away with and how "close to the metal" you are. That's partially correct, but C is also an obnoxiously hard language to optimize because of that same flexibility: with unrestricted pointers the compiler often can't prove that two accesses don't alias, so it has to be conservative.
Pony is compiled to native, highly optimized code.
BEAM is just interpreted. Even when the code spends most of its time waiting, it has to wait much less than on BEAM. That's why Pony can beat C++ with OpenMP on comparable tasks.
Its garbage collector is superior, having to do much less work than BEAM's.
The data workload for each thread is much smaller: objects are tiny, and messages are mostly passed by reference (shared) rather than copied.
Erlang does more. It already supports distributed actors, so there's a little overhead from that as well.
One nice thing about the Pony GC is that it can collect actors that are waiting for messages but will never receive one. In some other languages I've used, I end up with processes idling in a receive on a channel, living forever but never able to exit because nothing is ever going to put a message in the channel.