Most message passing on BEAM is done via copying. Pony can share memory directly because the compiler enforces safe sharing via deny capabilities. You can only send references with one of three capabilities to another actor:
iso -> mutable memory that you have the sole reference to
val -> immutable reference. you can read but you can't write
tag -> an opaque reference. good for sending messages to actors or doing identity comparisons
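Not Pony code, but a rough Rust analogy of those three sharing modes, with ownership transfer standing in for iso, Arc for val, and a channel Sender playing the role of a tag-like opaque handle:

    use std::sync::{mpsc, Arc};
    use std::thread;

    fn main() {
        let (tx, rx) = mpsc::channel::<Vec<u64>>();

        // iso-like: move the sole reference to mutable data to another "actor".
        let mut exclusive = vec![1, 2, 3];
        exclusive.push(4);
        tx.send(exclusive).unwrap(); // ownership moves; the sender can no longer touch it

        // val-like: share immutable data; readers everywhere, writers nowhere.
        let shared: Arc<Vec<u64>> = Arc::new(vec![5, 6, 7]);
        let reader = Arc::clone(&shared);
        let sum = thread::spawn(move || reader.iter().sum::<u64>());

        // tag-like: an opaque handle that can only be used to send messages,
        // not to read or write the data behind it.
        let tx2 = tx.clone();
        thread::spawn(move || tx2.send(vec![8]).unwrap());

        println!("{:?}", rx.recv().unwrap()); // [1, 2, 3, 4]
        println!("{}", sum.join().unwrap());  // 18
        println!("{:?}", rx.recv().unwrap()); // [8]
    }

Pony enforces these distinctions per reference capability rather than per type, but the flavor is similar: the compiler only lets you send what is safe to share or give away.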
The type system certainly helps as well. It allows us to give LLVM hints on optimizations it can make. The type system also helps because we can do "dangerous but fast" things safely because we can prove they are safe in this instance.
Compiling to native code via LLVM is another area for performance wins.
That said, Erlang has over 20 years of rock-solid production usage behind it, and that in itself is quite a selling point. At this point in time, I'd suggest Pony to Erlang/Elixir users if they need more performance; otherwise, I'd stick with Erlang for now.
> The type system also helps because we can do "dangerous but fast"
> things safely because we can prove they are safe in this instance.
This is an area of some static type systems that I'm really interested in; it feels counter-intuitive at first. You'd imagine that flexibility is what lets you do the things you need to do to go fast, but in many cases, restrictions actually are. Cool stuff.
Absolutely. In modern JavaScript engines, for example, dynamic typing makes something as basic as |foo.bar = 7| extremely complicated internally. In SpiderMonkey, |foo| could be a native object with |bar| in some varying location on a fixed or dynamic slot, or it could involve a proxy, a setter, an unboxed object or unboxed expando object, or a DOM or cross-compartment wrapper... To make the whole thing efficient, a particular property access could go through engine code, specialized JIT-generated native code after enough type information has been collected, or one of three inline cache systems, which generate multiple native code stubs |switch|ed on checks based on previous (slower) executions. A given get or set could even pass through more than one of the above, if too many checks fail and bailouts are required. And |bar| could be located directly on |foo| or on some object in |foo|'s prototype chain, requiring on-the-fly verification of additional invariants to ensure correctness.
Static typing would mean the engine can know for sure what |foo| is and where to look for |bar|, allowing faster, guaranteed-correct code to be emitted ahead of time. Dynamic typing makes it harder to offer speed, correctness, security, and good memory usage all at once.
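A toy illustration of that difference (in Rust, not actual engine internals): with a static type, the offset of bar is known at compile time, while a fully dynamic object has to fall back to a lookup plus checks.

    use std::collections::HashMap;

    // Statically typed: `bar` lives at a fixed offset inside Foo, so
    // `foo.bar = 7` compiles down to a single store.
    struct Foo {
        bar: u64,
    }

    // Caricature of a fully dynamic object: properties live in a map, so
    // every access is a hash lookup plus checks, unless a JIT can
    // specialize it away using collected type information.
    struct DynObject {
        properties: HashMap<String, f64>,
    }

    fn main() {
        let mut foo = Foo { bar: 0 };
        foo.bar = 7;

        let mut obj = DynObject { properties: HashMap::new() };
        obj.properties.insert("bar".to_string(), 7.0);

        println!("{} {:?}", foo.bar, obj.properties.get("bar"));
    }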
If you think that's interesting, you might want to check out ATS¹ and Mercury². ATS is wicked fast and doesn't even do some of the optimizations it's theoretically capable of (I think its alias analysis is fairly primitive). It compiles to C, but can use type information to remove bounds checks in many cases. Linear types mean memory and concurrency safety with no runtime overhead. (You're on the Rust team, right? So I suppose you're familiar with linear types—ATS's are much more powerful than Rust's affine types though.)
Mercury has uniqueness types, so it can remain referentially transparent while compiling to code that mutates. The compiler has fairly advanced automatic parallelization and can in some cases do compile-time garbage collection (i.e. it knows at compile time when an object will become inaccessible).
The great part about ATS that I wish Rust had is that you can define linear types for C libraries, and in general the type system is strong enough that you don't need unsafe{} sections.
You can do exactly the same thing in Rust, just not in the same statement as importing the functions (which are just that, importing the functions). I regard this as one of the most powerful parts of Rust: wrapping unsafe code/APIs into safe interfaces without cost.
Also, I think saying that ATS has no unsafe{} sections is misleading: it isn't explicitly marked in the source, but the compiler still cannot check that the "ownership" annotations in the imports are correct, or that, say, the preconditions of the functions (which may lead to undefined behaviour when violated) are satisfied. In other words, all of that code is implicitly surrounded in an `unsafe` block.
(The linearity is essentially handled by destructors: the common case is that the clean-up is just that, clean-up, and so destructors work well. It is definitely more annoying to 100%-type-check APIs that have more interesting clean-up/closing procedures, but these are rarer.)
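A minimal sketch of that pattern in Rust, assuming a hypothetical C library that exposes widget_open/widget_close (those names are invented for illustration, and this won't link without such a library):

    use std::ffi::c_void;
    use std::os::raw::c_int;

    // Hypothetical C API; these declarations stand in for a real library's header.
    extern "C" {
        fn widget_open(id: c_int) -> *mut c_void;
        fn widget_close(handle: *mut c_void);
    }

    // Safe wrapper: a handle can only be obtained through `open`, and Drop
    // guarantees `widget_close` runs exactly once, so callers never write `unsafe`.
    pub struct Widget {
        raw: *mut c_void,
    }

    impl Widget {
        pub fn open(id: i32) -> Option<Widget> {
            // The unsafe block is confined here; this module is responsible
            // for upholding whatever invariants the C library documents.
            let raw = unsafe { widget_open(id) };
            if raw.is_null() { None } else { Some(Widget { raw }) }
        }
    }

    impl Drop for Widget {
        fn drop(&mut self) {
            unsafe { widget_close(self.raw) }
        }
    }

This is the "linearity via destructors" point from the parenthetical: the wrapper type has no Clone, and the close call rides on the destructor at the end of its lifetime.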
I think people tend to think that C lets you go fast because of the tricks it lets you get away with and how "close to the metal" you are. That's partially correct, but C is also an obnoxiously hard language to optimize because of that same flexibility: with unrestricted pointers the compiler often can't prove that two accesses don't alias, so it has to be conservative.
Pony is compiled to native, highly optimized code.
BEAM is just interpreted. Even when the code spends most of its time waiting, it has to wait much less than on BEAM. That's why Pony can beat C++ with OpenMP on comparable tasks.
Its garbage collector is superior, having to do much less work than BEAM's.
The data workload for each thread is much smaller: objects are tiny, and messages are mostly passed by reference (shared) rather than copied.
Erlang does more. It already supports distributed actors, so there's a little overhead from that as well.
One nice thing about the Pony GC is that it can collect actors that are waiting for messages but will never receive one. In some other languages I've used, I end up with processes idling in a receive on a channel, living forever but never able to exit because nothing is ever going to put a message in the channel.