Erlang's not about lightweight processes and message passing (2023) (stevana.github.io)
320 points by todsacerdoti 1 day ago | 185 comments

The amazing thing about Erlang and the BEAM is its depth of features. For the OP, the Behaviour/Interface of Erlang is the biggest takeaway. For me, it is how you need far, far fewer development resources to build complex systems than you would in any other language (given comparable experience in both stacks). And for many, it is the lightweight processes and programming model.

OTP itself has so much in it. We've been working on compiling Elixir to run on iOS devices. Not only can we do that through the release process, but using the ei library that ships with Erlang we can also compile a node in C that interfaces with any other Erlang node over a typical distributed network, just as you would for Erlang, Elixir, Gleam, etc. Furthermore, Erlang's rpc library lets us make function calls from C and interface with our Elixir application. Yes, the encoding/decoding has overhead and FFI would be faster, but we're still well within our latency budget, and we stood this up in a few days without ever having heard of it before.
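For anyone curious, the Erlang-side equivalent of those cross-node calls is the stock rpc module (the C side uses ei_rpc for the same thing); the node and module names below are made-up examples:

    %% From any connected node; 'app@ios-device' and the
    %% Elixir module name are hypothetical.
    Result = rpc:call('app@ios-device', 'Elixir.MyApp.Api', status, []).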

The larger point here is that Erlang has been solving many of the problems that modern tech stacks are still struggling with, and it solved them, at scale and at reasonable implementation cost, decades ago. I know HN has a bit of a click-bait love affair with Erlang/Elixir, but it hasn't translated into adoption, and there are companies burning money trying to do what you get out of the box for free with the Erlang stack.


I went from a company that used Elixir in the backend to one that uses Nodejs.

I had gone in neutral about Nodejs, having never really used it much.

The projects I worked on were backend data pipelines that did not even process that much data. And yet somehow it was incredibly difficult to isolate the main bug. Along the way, I found out all sorts of things about Node.js, and when I compared it with Elixir/Erlang/OTP, I came to the conclusion that Node.js is unreliable by design.

Don't get me wrong. I've done a lot of Ruby work before, and I've messed with Python. Many current-generation language platforms are struggling with building reliable distributed systems, things that the BEAM VM and OTP platform had already figured out.


Elixir never performs all that well in microbenchmarks. Yet in every application where I've seen an Elixir/Erlang project compared to a more standard Node, Python, or even C# project, the Elixir one generally has far better performance and feels much faster, even under load.

Personally I think much of it is due to async being the predominant model in Node and Python. Async seems much harder than actors, or even threads, when debugging performance issues. Sure, async feels easier at first, but it lets small bits of bloat add up, and those become very difficult to debug and track down. It makes profiling harder, etc.

In BEAM, every actor has its own queue, so it's trivial to inspect and analyze performance bottlenecks. Async, by contrast, puts everything into one giant processing queue, and every async function call gets extra overhead added. It all adds up.
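As a rough sketch of that inspection (Pid is bound elsewhere; the threshold is illustrative):

    %% Mailbox depth of a single process:
    {message_queue_len, N} = erlang:process_info(Pid, message_queue_len),

    %% Or scan the whole node for backed-up mailboxes:
    [{P, Len} || P <- erlang:processes(),
                 {message_queue_len, Len} <-
                     [erlang:process_info(P, message_queue_len)],
                 Len > 100].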


This has to do with how async works without preemption and resource limits.

There's a counter-intuitive thing when trying to balance load across resources: applying resource limits helps the system run better overall.

One example: when scaling a web app, there comes a point when scaling up the database doesn't seem to help. So we're tempted to increase the connection pool, because that looks like the bottleneck. But increasing the pool can make the overall system perform worse, because oftentimes it is slow, poorly performing queries that are clogging the system.

Another example: one of the systems I worked on has over 250 node runtimes running on a single, large server. It used pm2 and did not apply cgroups to limit CPU resources. The whole system was a hog, and I temporarily fixed it by consolidating things to run on about 50 node runtimes.

When I moved them over to Kubernetes, I also applied CPU resource limits, each in its own pod. I set the limits based on what I measured when they were all running on PM2 ... but the same code running on Kubernetes ran with 10x less CPU overall. Why? Because the async code was not allowed to grab as much CPU as it could for as long as it could, and the kernel scheduler was able to schedule fairly. That allowed the entire system to run with fewer resources overall.

There's probably some math by which folks who know operations research can prove all this.
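There is: Little's law (L = λW) plus the textbook M/M/1 result for mean time in system,

    W = 1 / (μ − λ)

which blows up as the arrival rate λ approaches the service rate μ. Capping concurrency keeps λ safely below that cliff, which is exactly the "limits make the whole system run better" effect described above. (These are standard queueing results, not anything specific to this system.)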


> When I moved them over to Kubernetes, I also applied CPU resource limits, each in its own pod. I set the limits based on what I measured when they were all running on PM2 ... but the same code running on Kubernetes ran with 10x less CPU overall. Why? Because the async code was not allowed to grab as much CPU as it could for as long as it could, and the kernel scheduler was able to schedule fairly. That allowed the entire system to run with fewer resources overall.

As someone who has advocated against Kubernetes CPU limits everywhere I've worked, I'm really struggling to see how they helped you here. The code used 10x less CPU with CPU limits, with no adverse effects? Where were all those CPU cycles going before?


The adverse effect is higher latency, as the execution gets delayed due to throttling.

Parent probably didn't care about latency.


> The code used 10x less CPU with CPU limits, with no adverse effects?

The normal situation is that defective requests get much larger latency, while the correct requests run much faster.

It's a problem in the cases where the first set isn't actually defective. But it normally takes a reevaluation of the entire thing to solve those, and the unlimited situation isn't any good either.


> Async by contrast puts everything into one giant processing queue

I don't know about Node, but C# has async contexts you can use.


...which is so much harder than actors.

> Async by contrast puts everything into one giant processing queue

How can you make performance claims while getting the details completely wrong?

Neither .NET's nor Rust's (Tokio) async implementation works this way. They use all available cores (unless overridden) and implement a work-stealing thread pool. .NET in addition uses hill-climbing and a cooperative blocking detection mechanism to quickly adapt to workloads and ensure optimal throughput. All that while spending 0.1x the CPU on computation compared to BEAM, and with a much lower memory footprint. You cannot compare Erlang/Elixir with top-of-the-line compiled languages.


That sounds about right for .NET. One of the Elixir projects I worked on lived alongside a C# .NET app, the latter being a game server backend. The guy who architected and implemented it made it so that large numbers of people could interact in realtime without having to shard. It's pretty amazing stuff in my book.

On the other hand, I have yet to have to implement a liveness probe for an Elixir app, and I've had to do that with .NET because it can and does freeze. That game server also didn't use all the available cores as effectively as the Elixir app. We also couldn't attach a REPL directly to the .NET app, though we certainly tried.

I would be curious to see if Rust works out better in production.


I think you read my reply incorrectly. Also, would you attach “repl” to your C++/Rust (or, God forbid, Go) application?

Sigh. I swear, the affliction of failing to understand the underlying concepts upon which a technology A or B is built is a plague upon our industry. Instead, everything clearly must fit into the concepts limited to whatever “mother tongue” language a particular developer has mastered.


> I swear, the affliction of failing to understand the underlying concepts upon which a technology A or B is built is a plague upon our industry. Instead, everything clearly must fit into the concepts limited to whatever “mother tongue” language a particular developer has mastered.

Ironic, since any time you post about a programming language it's to inform that C# does it better.

Not just here; someone with your nick also whined that the creator of C# made a technically deficient decision in choosing Go over C# to implement TypeScript.

It's hard for a rational person to believe that someone would make the argument that the creator of the language must have made a mistake just because he reached for (in his words) a more appropriate language in that context.

You have a blind spot when it comes to C#. You also probably already know it.


> Not just here; someone with your nick also whined that the creator of C# made a technically deficient decision in choosing Go over C# to implement TypeScript.

You know you could have just linked the reply instead? It states "C#, F# or Rust". But that wouldn't sound that nice, would it? I use and enjoy multiple programming languages and it helps me in day-to-day tasks greatly. It does not prevent me from seeing how .NET has flaws, but holistically it is way less bad than most other options on the market, including Erlang, Go, C or what have you.

> It's hard for a rational person to believe that someone would make the argument that the creator of the language must have made a mistake just because he reached for (in his words) a more appropriate language in that context.

So appeal to authority trumps observable consequences, technical limitations, and arguments about lackluster technical vision at Microsoft? Interesting. No, I think it is the kind of people who refuse to engage with the subject on its own merits that are the problem, relegating all the argumentation to the powers that be. Even in a team environment, sure, it is easier to say "team/person X made choice Y", but you could also, if the situation warrants it, expand on why you think that way; and if you can't, maybe you shouldn't be making a statement?

So no, "TypeScript, including Anders Hejlsberg, choosing Go as the language to port TS compiler to" does not suddenly make pigs fly, if anything, but being seen as an endorsement from key C# figure is certainly a bad look.


> So appeal to authority trumps observable consequences, technical limitations, and arguments about lackluster technical vision at Microsoft?

Your argument is that you have a better grasp of "technical limitations" than Anders Hejlsberg?

You'll forgive the rest of us for not buying that; he has proven his chops, you haven't, especially as the argument (quite a thorough explanation of the context) from the TypeScript team is a lot more convincing than anything we've seen from you (a few nebulous phrases about technical superiority).

> but being seen as an endorsement from key C# figure is certainly a bad look.

Yeah, well, the team made their decision with no regard to optics. That lends more weight to their decision, not less.


What are you saying?

> Neither .NET's nor Rust's Tokio async implementations work this way.

Well that’s great. I didn’t mention Rust in that list because it does seem to perform well. Its async is also known to be much more difficult to program.

> and having much lower memory footprint. You cannot compare Erlang/Elixir with top of the line compiled languages.

And yet I do and have. Despite all the cool tech in C# and .NET, I’ve seen simple C# web apps struggle to even run on Raspberry Pis for IoT projects while Elixir ones run very well.

Also note Elixir is a compiled language and BEAM has JIT nowadays too.

I did hesitate to add C# to that list because it is an impressive language and can perform well. I also know the least about its async.

Nothing you said really counters that async as a general paradigm is more likely to lead to worse performance. It’s still more difficult to profile and tune than other techniques even with M:N schedulers. Look at the sibling post talking about resource allocation.

Even for Rust, there was an HN post recently where they got a Rust service to run a fair bit faster than their initial Golang implementation. After months of extra work, that is. They mentioned that Golang’s programming model made it much easier to write fairly performant networking code. Since Go doesn’t use async, it seems reasonable to assume goroutines are easier to profile and track than async, even if I lack knowledge of Go’s implementation details on the matter. Now, I am assuming their Rust implementation used async, but I don’t know for sure.


> Also note Elixir is a compiled language and BEAM has JIT nowadays too.

Let's see it perform faster than Python first :)

Also, if the target is supported, .NET is going to unconditionally perform faster than Elixir. This is trivially provable.

> Nothing you said really counters that async as a general paradigm is more likely to lead to worse performance. It’s still more difficult to profile and tune than other techniques even with M:N schedulers. Look at the sibling post talking about resource allocation.

Can you provide any reference to support this claim as far as actually good implementations go? Because so far it looks like vibe-based reasoning with zero knowledge to substantiate the opinion presented as fact.

That's not surprising, however - Erlang and Elixir as languages tend to leave their heavy users with big knowledge and understanding gaps, and their communities are rather dogmatic about BEAM being the best thing since sliced bread. Lack of critical thinking leads to such a sorry place.


> Can you provide any reference to support this claim as far as actually good implementations go?

Ah yes, now to the No True Scotsman fallacy: async only works well when it’s “properly implemented”, which means only .NET.

Even some .NET folks prefer the actor model for concurrent programming:

> Orleans is the most underrated technology out there. Not only does it power many Azure products and services, it is also the design basis for Microsoft Service Fabric actors, which also power many Azure products. Virtual actors are the perfect solution for today’s distributed systems.

> In my experience Orleans was able to handle insane write load (our storage/persistence provider went to a queue instead of direct, it was eventually consistent) so we were able to process millions of requests without breaking a sweat. Perhaps others would want more durability, we opted for this as the data was also in a time series database before Orleans saw it.

https://www.reddit.com/r/dotnet/comments/16kk2l1/comment/k0x...

https://learn.microsoft.com/en-us/dotnet/orleans/benefits

Ironically what got me into Elixir was learning about Orleans and how successful it was in scaling XBox services.

> Because so far it looks like vibe-based reasoning with zero knowledge to substantiate the opinion presented as fact.

Aside from personal experience and years of writing and deploying performance sensitive IoT apps?

Well quick googling shows quite a few posts detailing async issues:

> What tools and techniques might be suited for this kind of analysis? I took a quick glance at a flamegraph but it seems like I would need a relatively deep understanding of the async runtime internals since most of what I see looks like implementation details.

https://www.reddit.com/r/rust/comments/uph4tf/profiling_with...

> Reading a 1GB file in 100-byte chunks leads to at least 10,000,000 IOs through three async call layers. The problem becomes catastrophic since these functions are essentially language-level abstractions of callbacks, lacking optimizations that come with their async nature. However, we can manually implement optimizations to alleviate this issue.

https://www.ajanibilby.com/blog/async-js-performance-apr23/

> Asynchronous Rust seems to perform worse than multi-threaded Rust implementations.

https://dev.to/deepu105/concurrency-in-modern-programming-la...

> Under realistic conditions (see below) asynchronous web frameworks are slightly worse throughput (requests/second) and much worse latency variance.

https://calpaterson.com/async-python-is-not-faster.html

> I’m not going to say all async frameworks are definitely slower than threads. What I can say confidently is that asyncio isn’t faster, and it’s more efficient only for huge numbers of mostly idle connections. And only for that.

https://emptysqua.re/blog/why-should-async-get-all-the-love/

https://users.rust-lang.org/t/my-benchmark-done-elixir-is-fa...

https://blog.blackfire.io/the-challenges-of-async-python-obs...


Do you realize that the actor model and virtual/green threads/stackful coroutines vs. stackless coroutines (async/await) and similar are orthogonal concepts?

Also picking asyncio from Python. Lol. You can't be serious, can you?

The only impression I get is that most Elixir/Erlang practitioners simply have a very ossified perception and deep biases that prevent them from evaluating implementation/design choices fairly and reaching balanced conclusions about where their capabilities lie. The link salad you posted is a very far cry from answering my question, e.g. about the performance issues in .NET's and Rust's async implementations.

It's impossible to have a conversation with someone deeply committed to their bias and unwilling to accept that BEAM is not the shining paragon of concurrent and multi-threaded runtimes it once was.


I'd appreciate an in-depth write-up about deficiencies you found in Node and how Erlang fixes them

Starting with the most general: Nodejs suffers in the same way that other async systems do -- the lack of preemption means that certain async threads can starve other async threads. You can see this on GUI desktop apps when the GUI freezes because it wasn't written in a way to take that into account.

In other words, the runtime feature that Nodejs is the most proud of and markets to the world as its main advantage does not scale well in a reliable way.

The BEAM runtime has preemption and will degrade in performance much more gracefully. In most situations, because of preemption (and hot code reloading), you still have a chance of attaching a REPL to the live runtime while under load. That allows someone to understand the live environment and maybe even hot patch the live code until the real fix can run through the continuous delivery system.
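For instance, attaching a remote shell to a running node is a one-liner (the node names and cookie here are made up):

    $ erl -name debug@127.0.0.1 -setcookie mycookie -remsh app@10.0.0.5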

I'm not going to go into the bad Javascript syntax bloopers that still haunt us and are only partially mitigated by Typescript. That is documented in "Javascript: The Good Parts". Or how the "async" keyword colors function calls, forcing everything in a call chain to also be async, or forcing you to use the older callbacks. Most people I talk to who love Typescript don't consider those to be issues.

The _main_ problems are:

1. Async threads can easily get orphaned in Nodejs. This doesn't happen when using OTP on BEAM because you typically start a gen_server (or a gen_*) under a supervisor. Even processes that are not supervised can be tracked. Because pids (identifiers to processes) are first-class primitives, you can always access the scheduler, which will tell you _all_ of the running processes. If you were to attach a Nodejs REPL, you couldn't really tell. This is because there is no encapsulation of the process, no way to track when something went async, and no way to send control messages to those async processes.

2. Because async threads are easily orphaned, errors that get thrown are easily lost. The response I get from people who love Typescript on Nodejs tells me that is what the linter is for. That is, we're going to use an external tool to enforce that all errors get handled, rather than having the design of the language and the runtime handle the error. In the BEAM runtime, an unhandled error within a process crashes that process, without crashing anything else; processes that are monitoring the crashed process get notified by the runtime that it has crashed. The engineer can then define the logic for handling that crash (retry? restart? throw an error?). A minimal sketch of points 1 and 2 follows this list.

3. The gen_server behavior in OTP defines ways to send control messages. This allows more nuanced approaches to managing subsystems than just restarting when things crash.
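A minimal sketch of points 1 and 2 (no supervisor here, just a raw spawn_monitor from a shell):

    %% Every process is visible to the runtime:
    length(erlang:processes()),

    %% And a crash becomes a message to whoever monitors the process,
    %% instead of a silently lost error:
    {Pid, Ref} = spawn_monitor(fun() -> exit(boom) end),
    receive
        {'DOWN', Ref, process, Pid, Reason} ->
            %% decide here: retry? restart? escalate?
            io:format("worker died: ~p~n", [Reason])
    end.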

I'm pretty much at the point where I would not really want to work on deploying Nodejs on the backend. I don't see how something like Deno would fix anything. Typescript is incapable of fixing this, because these are design flaws in the runtime itself.


Just to further hammer on point 2 and how it’s a problem in the real world: Express, probably the go-to server library for close to a decade, has only within the last couple of months sorted out not completely swallowing, by default, any error that happens in async middleware. And only because some new people came in to finally fix it! It’s absolutely insane how long that took and how easy it was to get stung by that issue.

I'm not a Nodejs developer. Could someone please explain, in technical terms, what it means for a thread to "get orphaned"?

Bandwidth vs latency. Erlang is designed to keep low latency under load with graceful degradation.

The problem with Node is observability. They've optimized away observability to the point where it's hard to find performance problems compared to the JVM or BEAM.

I have been looking for an Erlang thing akin to Apache Airflow or Argo Workflows. Something that allows me to define a DAG of processes, so that they run one after the other. How would you implement something like that?

Have a look at GenStage, Flow, Broadway, or Oban Pro in Elixir land. But OTP alone can get you pretty far.

>Node.js is unreliable by design

Well, it's just a hack and some C libraries on top of a browser Javascript engine.

No big thought went into it, either before or after it got big.


That’s not fair.

Adding to this, the primitives Erlang and its descendants give you are very easy to work with, and therefore very easy to test.

Take GenServer, the workhorse of most BEAM systems. Everything it does is basically just calling various functions with simple parameters. So you can test it just by calling those functions directly, passing parameters manually, and asserting on the output. No need to set up complex testing harnesses capable of dealing with asynchronous code, no need to insert pauses and wait for code to finish running in your tests. It's something a lot of juniors tend to miss, but it's liberating once figured out.
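For a hypothetical counter server, that style of test is just (the module and its return values are assumptions about the example implementation):

    %% counter implements the gen_server callbacks; no running
    %% process is needed to exercise them:
    {ok, 0} = counter:init([]),
    From = {self(), make_ref()},   %% a plausible From tuple
    {reply, 1, 1} = counter:handle_call(increment, From, 0).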


C nodes are underappreciated. We have one (Cgo) for communicating between Go and Elixir services running in the same Kubernetes pod. The docs are also pretty good for Erlang and its C libs.

> I know HN has a bit of a click-bait love relationship with Erlang/Elixir but it hasn't translated over to adoption and there are companies that are just burning money trying to do what you get out of the box for free with the Erlang stack.

Do you or the community have a sense why that is?


Elixir is "bad" because it is not a friendly language for people who want to be architecture astronauts at the code level (you can definitely be an architecture astronaut at the process management level but that's a very advanced concept). And a lot of CTOs are architecture astronauts.

That's the opposite of my experience. I tend to see those "architecture astronauts" on teams using other language platforms, while the folks I work with in Erlang or Elixir tend to be pragmatic and willing to dig down the stack to troubleshoot problems.

That's what I wrote! (Read the first three words with a heap of sardonicism). Edited to add quotes around bad

Apologies for my ignorance but what's an "architecture astronaut"?

Here's the original article: https://www.joelonsoftware.com/2001/04/21/dont-let-architect...

> When you go too far up, abstraction-wise, you run out of oxygen. Sometimes smart thinkers just don’t know when to stop, and they create these absurd, all-encompassing, high-level pictures of the universe that are all good and fine, but don’t actually mean anything at all.

> These are the people I call Architecture Astronauts. It’s very hard to get them to write code or design programs, because they won’t stop thinking about Architecture. They’re astronauts because they are above the oxygen level, I don’t know how they’re breathing. They tend to work for really big companies that can afford to have lots of unproductive people with really advanced degrees that don’t contribute to the bottom line.


Joel was wrong about one thing, they also work at startups. My roommate worked at a startup where the senior frontend developer was basically building react in svelte + zod. Once a week he would see all his work deleted and completely rewritten in a fever dream PR that the senior produced. Completely impossible for grug developer to follow what's going on, his job eventually became "running this guy's code through chatgpt and adding comments and documentation".

Not just that, but there is no giant gorilla backing BEAM. Google pushes Go and Java; Microsoft pushes Node and C#.

My personal opinion as a fan and adopter of the stack is that the benefit is often seen down the line, with the upfront adoption cost being roughly the same.

E.g. the built in telemetry system is fantastic, but when you are first adopting the stack it still takes a day or two to read the docs and get events flowing into - say - DataDog, which is roughly the same amount of time as basically every other solution.

The benefit of Elixir here is that the telemetry stack is very standardized across Elixir projects and libraries, and there are fewer moving pieces - no extra microservices or docker containers to ship with everything else. But that benefit comes 2 years down the line when you need to change the telemetry system.


There's no killer app, as in a reason to add it to your tech stack.

The closest I've come across was trying to maintain an ejabberd cluster and add some custom extensions.

Between mnesia and the learning curve of the language itself, it was not fun.

There are also no popular syntax-alikes. There is no massive corporation pushing Erlang either directly or indirectly through success. Supposedly Erlang breeds success but it's referred to as a "secret" weapon because no one big is pushing it.

Erlang seems neat but it feels like you need to take a leap of faith and businesses are risk averse.


> There is no massive corporation pushing Erlang either directly or indirectly through success.

Isn't there this "small" company that has a chat app that is using erlang :P


Well, jayd did the same thing as that small company (which I joined in 2011 when it was small and left in 2019 when it was not so small): ran ejabberd to solve a problem. In our case, Erlang subsumed pretty much the rest of our service over time. When I started, chat was Erlang, but status messages, registration, and contacts were PHP with MySQL, and media was PHP (with no database); those all got sucked into Erlang with mnesia because it was better for us.

But I guess it doesn't always work that way. FB chat was built on ejabberd and then migrated away.


Well, if we're talking medium-size companies - hard to bring any new language.

If we're talking about a pure modern-tech company - good luck bringing anything other than JS, because of the "more developers == more growth" mentality.

So it either ends up being used where decision makers know (or want to learn) Erlang/Elixir, or when every other possibility has been exhausted.


These incremental benefits don't translate to an order of magnitude more productivity, or stability, or profitability. Given the choice, as a business owner, future proofing is about being able to draw from the most plentiful and cheapest pool of workers. The sausage all looks the same on the outside.

That is not true, especially with Section 174 (for the US). Right now, if you want to hire an Elixir engineer, you're better off finding a generalist willing to learn and use Elixir, and you would probably get someone who is very capable.

With Section 174 in play in the US, it tends to drive companies toward hiring specialists and attempting to use AI for the rest of it.

My own experience is that ... I don't really want to draw from the most plentiful and cheapest pool of workers. I've seen the kind of tech that produces. You basically have a small handful of software engineers carrying the rest.

Elixir itself is a kind of secret, unfair advantage for the tech startups that use it.


>you're better off finding a generalist willing to learn and use Elixir, and you would probably get someone who is very capable.

This is a thing I really don't get. People are like "but what about the hiring pool". A competent software engineer will learn your stack. It's not that hard to switch languages. Except maybe going from Python to C++.


I'm biased, because I worked at WhatsApp, but it may be one of the most famous users of Erlang... and from its start until when I left (late 2019) I think we only hired three people with Erlang experience. Everyone else who worked in Erlang learned on the job.

We seemed to do pretty well, although some of our code/setup wasn't very idiomatic (for example, I'm pretty sure we didn't use the Erlang release feature properly at all)


After you've done releases a few times, it ends up being quite easy. The biggest issue I had was complete release (aka Erlang itself) updates.

Admittedly, I didn't have a whole company core product riding on my upgrades.


We just pushed code, compiled, and hotloaded... Pretty much ignoring the release files; we had them, but I think the contents weren't correct and we never changed the release numbers, etc.
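For reference, the shell-level version of that hot-loading workflow is just (module name hypothetical):

    %% compile and hot-load in one step from the Erlang shell:
    c(my_module).
    %% or load an already-compiled .beam, purging the old version:
    code:purge(my_module), code:load_file(my_module).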

For OTP updates, we would shut down beam in an orderly fashion, replace the files, and start again. (Potentially installing the new one before shutting down, I can't remember.)

Post-Facebook, it's more boring OS packages and slow rollouts than hotloading.


For a lot of people learning a new stack is a big perk to switching jobs.

Erlang looks weird--Prolog-based syntax, tail-recursive loops, extensive pattern matching.

Also, a lot of the power of Erlang is the OTP (Open Telecom Platform) even more than Erlang, itself. You have to internalize those architectural decisions (expect crashes--do fast restart) to get the full power of Erlang.

Elixir seems like it has been finding more traction by looking more like mainstream languages. In addition, languages on the BEAM (like Elixir) made the BEAM much better documented, understood and portable.


Eventually, you use Elixir enough, and Erlang starts looking pretty.

My line is, the three things Elixir has over Erlang are protocols, macros, and a syntax that doesn't summon Cthulhu.

Are you really programming if the constant threat of the old one isn't looming just beyond your vision?

Anyway, the options seem to be either summoning transcendent threats by superficial syntax or by well entrenched semantics. There seems to be no other choice.

I’ve worked with a few individuals, mostly managers, who intended to write books informed by our experiences. It was always frustrating for me to see that we disagreed about what aspects of our work made us successful. There was always something they minimized as being nice that I felt was essential.

And here we see someone claiming that lightweight processes and message passing aren’t the secret sauce, missing that Erlang as Communicating Sequential Processes is indivisible from those qualities, and then repeatedly mentioning CSP as part of the secret sauce.

Examples:

> The application programmer writes sequential code, all concurrency is hidden away in the behaviour;

> Easier for new team members to get started: business logic is sequential, similar structure that they might have seen before elsewhere;

> Supervisors and the “let it crash” philosophy, appear to produce reliable systems. Joe uses the Ericsson AXD301 telephone switch example again (p. 191):

Behaviors are interesting and solve a commonly encountered problem in the 80’s that was still being solved in some cases in the 00’s, but it’s a means as much as an end in Erlang. It’s how they implemented those other qualities. But I don’t know if they had to, to make Erlang still mostly be Erlang.


Erlang isn't CSP, it's the Actor model. https://en.wikipedia.org/wiki/Actor_model

CSP is what inspired the golang channels, via occam and some other languages. The whole synchronization on unbuffered channels is the most obvious differentiator, though there are others like the actor concept of pattern matching over a mailbox.

The whole CSP vs actor debate is quite interesting when you get down to it because they superficially look kind of similar but are radically different in implications.


Watch the guys who came up with Erlang, the Actor model, and CSP discuss it: https://youtu.be/37wFVVVZlVU One of my favorite videos on youtube.

Totally! I love the Alan Kay and Armstrong one too https://www.youtube.com/watch?v=fhOHn9TClXY

There are a lot of languages that now claim to be 'Actor Model' and have only a shade of Erlang's fault tolerance and load balancing. That term no longer has the gravitas it once had.

The actor model in general doesn't really care about fault tolerance in the way that erlang does.

And this would be part of that “minimized as being nice that I felt was essential” thing I mentioned in my original comment.

Sure. I would argue that Erlang isn't really the actor model, because error handling was the first priority, and that is what drove them to build processes with message passing as the primitive -- so it just happened to look like the actor model.

Is Erlang considered CSP? I've always thought it wasn't really, and had its own thing called 'actors', which have ids and can communicate directly, vs CSP processes, which are anonymous and use channel messaging.

I've always thought the actor model made more sense, but highly YMMV.


The erlang docs only go as far as saying it’s functionally similar to CSP.

I think the term Actor Model has been so semantically diluted at this point that the phrase also understates what Erlang has as well.

Neither CSP nor AM require process isolation to work, which means they can work when they work but fail much much worse. They are necessary but insufficient.


It's like saying they are both Turing Complete or that SML modules and Haskell typeclasses are functionally equivalent (even though their use in practice is quite different).

Actors must always have a known address to be accessed, and you share them by sharing addresses. You also wouldn't pass an actor to an actor; you'd pass an address instead. CSP channels are first-class: you can create anonymous channels and even pass channels through other channels. This is similar to languages with lambdas and first-class functions vs. languages where every function has a name and functions cannot be passed to other functions.

Actors are naturally async-only and (for example) make no attempt to solve the two generals problem while CSP implementations generally try to enforce synchronization. CSP also enforces message order while actors don't guarantee that messages will be received in the order they were sent.

These are all more theoretical than actual though. CSP channels may be anonymous to the programmer, but they all get process IDs just like Actors would. Actors may seem async, but they can (and no doubt do in practice) make stronger guarantees about message order and synchronicity when on the same CPU. Likewise, CSP would give the illusion of synchronicity and ordering across CPUs where none actually exists (just like TCP).
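To make the address-sharing point concrete, a minimal Erlang exchange (purely illustrative):

    %% An actor is reached by its pid, and pids travel inside
    %% messages like any other value:
    Worker = spawn(fun() ->
                       receive {From, ping} -> From ! {self(), pong} end
                   end),
    Worker ! {self(), ping},
    receive {Worker, pong} -> ok end.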


And what you’re saying is like saying that learning every distinct discipline is a unique experience. All knowledge was discovered by humans (even if by discovering it from nature), and all pedagogy addresses the same type(s) of brains. All disciplines contend with the same universal laws of physics. There is a lot less difference than people think between learning the flute and learning woodworking, even though the flautist may be much more discouraged from bringing their own hand-made gadgets to work.

But yes, many solutions are isomorphic because they are dealing with the same information on the same Turing machine. That doesn't mean it's stupid to bring it up, but it can mean that there's less upside to switching solutions than people think.

The Chesterton’s Fence here though is that you can implement the Actor Model without the BEAM’s process isolation, and the supervisor tree that goes with it. If you insist on doing so, which several languages have, then the finer distinctions between CSP and Actor pale in comparison to the BEAM.


Managers make up their own narrative based on vibes.

I came here looking for information about why Ericsson stopped using Erlang, and for more information about Joe's firing.

The short answer seems to be that they pivoted to Java for new projects, which marginalized Erlang. Then Joe and colleagues formed Bluetail in 1998. They were bought by Nortel. Nortel was a telecom giant forming about a third of the value of the Toronto Stock Exchange. In 2000 Nortel's stock reached $125 per share, but by 2002 the stock had gone down to less than $1. This was all part of the dot com crash, and Nortel was hit particularly hard because of the dot com bubble burst corresponding with a big downturn in telecom spending.

It seems safe to look at Joe's layoff as more of a "his unit was the first to slip beneath the waves on a sinking ship" situation, as they laid off 60,000 employees, more than two thirds of their workforce. The layoff was not a sign that he wasn't pulling his weight; it was part of a desperate move, not to be taken as a sign of the ineffectiveness of that business unit.


It's very weird to me to see the word "fired" in this context. "Laid off" is more appropriate. "Fired" is very value-laden and implies fault and termination with cause. Which I'm sure if that was somehow actually true the original article author would know nothing about, nor would it be any of their business.

I've just gotten back into Erlang because of the lightweight processes and message passing; so far behaviours have been secondary (i.e., I'm just learning about them)!

The project is about bringing visual Flow-Based Programming (FBP)[1] to Erlang. FBP seems made for Erlang, and I would have been surprised if there wasn't something like this already, but there does not seem to be.

My go-to tool for FBP is Node-RED, hence the basic idea is to bolt a Node-RED frontend onto an Erlang backend and have every node be a process. Node-RED's frontend is great for modelling message passing between nodes, so there is a very simple one-to-one mapping to Erlang's processes and messages.
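As a sketch of that mapping (everything here is invented for illustration), each node can be a process that applies its node function and forwards the result along its wires:

    %% One Node-RED node == one Erlang process; Wires is the list
    %% of pids of the downstream nodes. Lives in some module.
    node_loop(NodeFun, Wires) ->
        receive
            {msg, Payload} ->
                Out = NodeFun(Payload),
                [Next ! {msg, Out} || Next <- Wires],
                node_loop(NodeFun, Wires)
        end.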

I've implemented some basics and started to create some unit tests as flows, to slowly build up functionality. I would really like this to be 100% compatible with Node-RED's NodeJS backend. For more details, the github repo --> https://github.com/gorenje/erlang-red

Overall, Erlang is amazingly well suited to this, and I'm astonished that no one else has done anything like it - or have they?

[1] = https://jpaulm.github.io/fbp/index.html


Oh that's really cool to see! I always thought a visual programming language on the BEAM would be fun

Love the idea as well! Would I be wrong in thinking that, at a high level, FBP is like Erlang processes where message flow is one-way?

This is a really cool idea!

Thank you, it's also a lot of fun to do :)

Hopefully I can get some useful functionality together without hitting my Erlang coding limits!

Any help is greatly appreciated :+1:


To me, Erlang/Elixir’s power is not necessarily the Actor model implementation, the matching from Prolog, immutability, behaviours, etc., but Joe’s desire to demonstrate you could do more with less.

It is a well thought out and trued system of computation that has a consistency rarely witnessed in other languages, much less the “web”. It is not perfect. But it is pretty impressive.

Unfortunately, I find what simplicity empowers pretty underappreciated in the software world. Complexity allows people to become specialists, managers to have big teams and lots of meetings, and experts to stay experts.

Erlang was being developed in a period where companies were trying to implement software solutions with smaller headcounts, limited horsepower, etc. A multi decade outpouring of cash into the domain has made the value of “less will mean more for all of us in good ways” less of an attractor.


Alan Kay once said that you get simplicity by choosing a slightly more complicated building block.

It appears to me that erlang does this.


Reminds me of Rich Hickey's talk about Simple VS Easy.

You've just convinced me to spend some more time with Erlang! I've dabbled a bit and, at least on the surface, prefer erlang syntax over elixir.

Me too, as weird as it might sound

I mostly prefer the Elixir syntax. But I don’t care for all of the “ends”. Wish they had taken more inspiration from Python in that department.

And I am grateful for the ends, wish Python had them as well. Python already marks the start of every block with a :, one more keyword and the semantic whitespace madness could end.

Interesting. My very personal opinion is that punctuation like characters should be used to structure/shape code, and words/tokens should be used for expressions.

I accept that others, such as yourself, have different opinions.

I think the reason I feel the way I do is that I’ve read a lot over the years (like, non-fiction), so I’ve been heavily conditioned to look for infix characters to separate and define the shape of code, and strings of words to express things. But that may be a self-introspective reach too far.


For me the most interesting concept in Erlang/BEAM is that partial recovery is built in from the ground up. When an unexpected state is encountered, instead of either killing the entire process or trying to proceed and risking corruption, you just roll back to a known good state, at the most granular level possible. This idea was researched many years ago under the name of "microreboots" (associated with "crash-only software"), but only Erlang/BEAM made it a first-class concept in a production system.
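In OTP terms that policy is just data in a supervisor; a minimal sketch (the worker module is hypothetical):

    %% The child is restarted with fresh, known-good state from its
    %% init/1 whenever it crashes; up to 5 restarts in 10 seconds
    %% before the failure escalates up the tree.
    init([]) ->
        SupFlags = #{strategy => one_for_one, intensity => 5, period => 10},
        Child = #{id => worker,
                  start => {my_worker, start_link, []},
                  restart => permanent},
        {ok, {SupFlags, [Child]}}.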

You still have to be careful with supervision trees and parts of the tree restarting. For example, your system might work if the whole Erlang OS process is suddenly killed and restarted, but it might start corrupting data if only parts of the Erlang process tree are restarted. Erlang gives you a good model for working on these problems, but it doesn't allow you to completely turn off your brain. If you walk in thinking that you can just let things restart and everything will be fine, you might end up getting burnt.

> You still have to be careful with supervision trees and parts of the tree restarting [...] Erlang gives you a good model to work with these problems but it doesn't allow you to completely turn off your brain.

Erlang gives architects the tools to restart as little, or as much, of the tree as they like, so I hope they have their brains fully engaged when working on the infrastructure that underlies their projects. For complex projects, it's vital to think long and hard about state interactions and sub-system dependencies, but the upside for Erlang is that this infrastructure is separated from sequential code via behaviours, and if the organization is big enough, the behaviours will be owned by a dedicated infrastructure team (or person) and consumed by product teams, with clear demarcations of responsibilities.


Yes, you can design your system pathologically to make it wrong.

> When an unexpected state is encountered, instead of either killing the entire process or trying to proceed and risking corruption, you just roll back to a known good state, at the most granular level possible.

> but only Erlang/BEAM made it a first-class concept in a production system.

Exceptions?


In most languages that have exceptions you don't have the same guarantees, because values are not immutable: if they were mutated, they stay mutated. The language can roll back the stack using exceptions, but it can't roll back the state.

The BEAM runtime and all languages that target it, including Erlang, do not allow mutation (ETS and company excepted). This means that on the BEAM you can not only roll back the stack, you can also roll back the state safely. This is part of what the poster meant by "the most granular level possible".
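A sketch of what that buys you inside a server callback (apply_request is a made-up helper):

    handle_call(Req, _From, State) ->
        try
            NewState = apply_request(Req, State),
            {reply, ok, NewState}
        catch
            _:_ ->
                %% State was never mutated, so "rolling back" is just
                %% replying with the old value.
                {reply, {error, badarg}, State}
        end.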


Can you explain how exceptions (partial stack unwinding while carrying a value) do this?

Maybe their idea is that you can have a thread that processes work from a queue and catch any exceptions thrown during that processing and just continue processing other work.

Erlang, OTP, and the BEAM offer much more than just behaviours. The VM is similar to a virtual kernel, with supervisors, isolated processes, and a distributed mode that treats multiple (physical or virtual) machines as a single pool of resources. OTP provides numerous useful modules, such as Mnesia (a database) and atomic counters/ETS tables (for caching), among others. The runtime also supports bytecode hot-reloading, a feature used to apply patches without any system downtime. While the syntax is not very screen reader-friendly, it is digestible.
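For instance, the atomic counters mentioned above are a couple of ETS calls (table and key names invented):

    ets:new(stats, [named_table, public, set]),
    ets:insert(stats, {hits, 0}),
    NewCount = ets:update_counter(stats, hits, 1).   %% atomic increment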

Apache Mesos[1] is the only thing that comes to my mind as a similar platform to BEAM in its ability to treat multi-machine resources as a single pool.

Over a year ago, my private consulting company decided to adopt Erlang as our backend language. After some time, we started exploring BEAM's internals to, for example, replace the TCP-based stack with QUIC and integrate some Rust patches. A truly fantastic choice for lightweight, high-throughput systems that only fail in the case of kernel panic or power loss. We are currently working on very "busy", concurrent software like a film/game production tracker and pipeline manager, and are now also preparing R&D for private hospital management services.

[1]: https://mesos.apache.org/


Before you ask: we're not ever going to fully adopt Elixir (or Gleam), as its ecosystem is built around the Phoenix framework and external services/databases. We would have to maintain internal bindings/implementations of things that are unmaintained on Elixir's side. Also worth mentioning that it has a large amount of syntax sugar, and its users have that weird fetish for abstracting stuff into DSL interfaces.

I couldn't understand your comment well, but I am making a SQLite library for Elixir (via Rust bindings), so that would be one less dependency on external systems. I happen to believe that most projects don't need a full-blown database server.

All the people from the Elixir community I've met kept telling me "Mnesia sucks, use Postgres instead" - through the Ecto DSL, of course. Same goes for pushing Redis and gRPC. Most of them will try to convince you to use Phoenix instead... Also, there are very few references on how to use e.g. Cowboy or Bandit without the Plug DSL.

Bandit is coupled to Plug, so there isn't really a way to use it without Plug. But if you just don't want to use Plug.Router, you can always make your own router (with whatever perf implications that may or may not have). Plug.Router and Phoenix.Router are just middleware and Plug a middleware specification. You can do a case statement on `%Plug.Conn{}.request_path` or `.path_info`.

For Cowboy, what's wrong with the docs? Erlang translates to Elixir pretty cleanly (list comprehensions and records notwithstanding): prefix atoms with `:`, downcase variables, `%` maps, and `~c""` Erlang strings. If you're really itchy (as in the look of using atoms as modules makes you itchy), you can alias the Erlang modules as Elixir module atoms: `alias :cowboy_router, as: CowboyRouter`.
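For the record, the raw Cowboy 2.x setup is small; in Erlang (the handler module here is hypothetical) it is roughly:

    Dispatch = cowboy_router:compile([
        {'_', [{"/", my_handler, []}]}
    ]),
    {ok, _} = cowboy:start_clear(my_listener,
                                 [{port, 8080}],
                                 #{env => #{dispatch => Dispatch}}).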


"This begs the question: why aren’t language and library designers stealing the structure behind Erlang’s behaviours, rather than copying the ideas of lightweight processes and message passing?"

Because the function signatures of Erlang's behaviours are critically tied to Erlang's other functionality, specifically its unusual use of immutability. You need a separate init call for its servers because of that, and a very distinct use of state management to work exactly the same way.

But to achieve the same goals in other languages, you almost always shouldn't directly copy what Erlang is doing. In fact when I see "Look! I ported gen_server into $SOME_OTHER_LANGUAGE" and I see exactly and precisely the exact interface Erlang has, I know that the port doesn't deeply understand what Erlang is doing.

When I ported the idea of supervisor trees into Go [1], I did so idiomatically. It turns out in modern Go the correct interface for "a thing that can be supervised" is not precisely the same signature that Erlang has, but

    // A Service is anything that can be supervised: it runs until
    // its Context is cancelled, then returns.
    type Service interface {
        Serve(context.Context)
    }
That's all you need and all you should use... in Go. Your other language may vary. Go doesn't need a "handle_event/2" because it has channels, and you should use those, not because they are "better" or "worse" but because that's what this language does. In another language you may use something else. In another infrastructure you may end up sending things over Kafka or some cloud event bus rather than "calling a handle_event/2". The key is in building an event-based system, not copying the exact implementation Erlang has.

A peculiar issue the Erlang community has is getting excessively convinced that there's something super-mega-special about the exact way Erlang does it, and that if you do it any other way it is ipso facto wrong and therefore not reliable. This may have been true in 2005; it is not true in 2025. Where once Erlang had almost the only sensible answer, in 2025 the problem is poking through the ocean of answers deluging us! While I recommend learning from Erlang about reliable software, I strongly recommend against just blind-porting out the exact way Erlang achieves it into any other language. It is in almost any other language context the wrong answer. Even other immutable languages generally vary enough that they can't just copy the same structure.

[1]: https://jerf.org/iri/post/2930/


To follow on from your excellent post, I think a reasonable next question is, "why have these kinds of approaches and ideas in other languages and systems succeeded in gaining market adoption, but Erlang/Elixir has not?"

This to me is the most interesting question about Erlang, and I say this as someone who works professionally in Elixir.

It's _clear_ that there is incredible appetite for tools that help us design reliable concurrent systems given the wild success of things like k8s, Kafka, AWS's distributed systems products, etc., but why hasn't Erlang/Elixir been able to capture that share?

My friends and I debate this all the time, but I don't know the answer.


Talk to some engineering managers. Their concern is hiring people to get the job done. You can't easily hire devs for obscure languages like Erlang and Elixir, and if you can find any that are looking for a gig, they want too much money. On the contrary, if you are hiring for C++/C#/Java/JS/TS, your problem is separating good candidates from bad, but good ones are available.

Likewise, most devs don't want to learn an obscure language for one job even if they are more than capable. Either they get stuck doing that language or they earn a hole in their resume instead of additional experience in what future employers care about.

Finally, the vast majority of applications and systems don't need ultra high reliability and don't have the budget for it. It isn't clear that downtime impedes success for anything but the most critical businesses.


I mean, if you learned Erlang on the job to build reliable systems with it, you don't have to put an "obscure" language on your resume. You can put "highly fault-tolerant systems" on your resume, and when asked about it in an interview, you've got the chops to back that claim up, while many other people don't. It is very far from a "hole" in one's CV. Any engineer worth their salt in a hiring process will recognize this. It is a matter of learning new things, instead of repeating the same experience of some NodeJS or Java CRUD over and over again. If I were hiring, and I met someone with that kind of Erlang experience, I would hope I could hire them, and that they would not be too expensive for me. I would set them to work on the interaction between the system parts and let them work on reliability and latency.

It is a matter of someone having the same 2 years of experience over and over again, or someone learning many things. Personally I would welcome a chance to learn more Erlang on the job and build something with it.

Unfortunately, businesses want the fresh graduate with 10y of work experience, who already knows their complete stack. Maybe not so much in the Erlang world, but in general. Learning on the job?? Pah! You already ought to know! Just another reason to pay less!

And Erlang jobs are rare. I am between jobs, so if someone happens to know a remote job, where I could start working and learn more Erlang (have only looked at the beginning of "Learn you some Erlang for great Good"), please let me know. I would be happy to have that "hole" as part of my CV :D


> Any engineer worth their salt in a hiring process will recognize this.

Sure, agreed, but you aren't even going to get to the point of an engineer recognising this, because you'll fail the gauntlet of HR with its tick-boxes for tech stacks.

You could be extremely battle-hardened on fault-tolerant distributed systems from being in the Erlang trenches for the last 3 years, but because the HR person couldn't tick off one of "Node.js", "Java", "C#/.Net" or "Python", your application won't ever be seen by an engineer.


> Likewise, most devs don't want to learn an obscure language for one job even if they are more than capable. Either they get stuck doing that language or they earn a hole in their resume instead of additional experience in what future employers care about.

This is less of an issue with accumulated experience. Personally I would actually welcome the kind of job that would involve learning a new niche language, since I already have >10 years of experience in several mainstream languages, and there's diminishing returns wrt resumes and interviews past this point.


Well I'm a senior looking for an Elixir job. What constitutes "wanting too much money" btw? Really curious.

I think that’s mostly due to Erlang looking too alien compared to mainstream languages. Elixir is changing that but it arrived a bit late.

"but why hasn't Erlang/Elixir been able to capture that share?"

Because Erlang has a well-integrated collection of what are by 2025 standards mediocre tools.

There is value to that integration, and I absolutely won't deny that.

However, the state of the art has moved beyond Erlang in a number of ways, and you're taking a pretty big penalty to stick to BEAM on a number of fronts now. Its performance is sub-par, and if you're running a large cluster, that's actually going to matter. Erlang qua Erlang I'd call a subpar language, and Elixir qua Elixir is merely competitive; there are many places to get similar capabilities, with a wide variety of other available cost/benefit choices. Erlang's message bus is not terribly resilient itself; modern message busses can be resilient against individual nodes in the message bus going down, and it's a powerful pattern to have multiple consumers against a single queue, which Erlang's focus on PIDs tends to inhibit. Erlang's message bus is 0-or-1 when as near as I can tell the rest of the world has decided, correctly IMHO, that 1-or-n is superior. Erlang is fairly insular; once you have to hook up one non-BEAM service to the system, well, you're going to do that over some sort of message bus or something, and you pretty quickly get to the point that you might as well let that be your core architecture rather than the BEAM cluster. Once you're heterogeneous, and BEAM is just another node on the net, there isn't necessarily a lot of reason to stay there. And as a system scales up, the pull to heterogeneity approaches infinity; takes a lot of work to go to an entire company and force them to work entirely in BEAM.

Plus, some of the problems Erlang solved in one way have developed better solutions. Erlang solves the problem of multiple code bases possibly simultaneously existing in the same cluster by basically making everything untyped. That was a nifty solution for the 1990s, but today I think we've gotten a lot better at having typed data structures that still retain backwards compatibility if necessary. So throwing away the entire type system, including all the methods and inheritance or composition or whatever, to solve that problem is a heck of a blow.

I do want to close out with a repetition of the fact that there is value in that solid integration. More people today are aware of the various tools like "message busses", but it is still clearly not as common knowledge as I'd like, and I still see entire teams struggling along basically crafting an ad-hoc, half-specified custom message bus every so often, which in 2025 is insane. (I have written a couple of services where I have basically had to provide HTTP "REST" endpoints that end up just being proxies on to my internal message bus that my system is really based on, because they'd rather POST HTTP than have to use a message bus library, even though it doesn't really buy them anything.) Erlang does help educate people about what are now the basics of cloud architecture. And that "well-integrated collection of mediocre tools" can still solve a lot of problems. Many sins can be forgiven by 32 4GHz cores backed by high-powered RAM, disk, and networking.

But it would take a lot of backwards-incompatible changes to create a BEAM 2.0 that would be competitive on all fronts... if indeed such a thing is even possible. The variety of techs exist for a reason. It stinks to have to paw through them sometimes, but the upside is you'll often find the exact right solution for your needs.


"It's _clear_ that there is incredible appetite for tools that help us design reliable concurrent systems given the wild success of things like k8s, Kafka, AWS's distributed systems products, etc., but why hasn't Erlang/Elixir been able to capture that share?"

Because Erlang is a runtime + language and Kubernetes is a neutral platform. You can build concurrent and reliable solutions without locking yourself into a single language.

Someone can start by just porting their Python code to Kubernetes to make it more reliable and fault tolerant.


Decision making in computing is mostly a matter of fashion / cargo-culting.

And Resume padding

Go is my favorite language but:

> Go doesn't need a "handle_event/2" because it has channels, and you should use those

Of what type? But most importantly, channels are local to the process, so you need glue to make it networked. (I assume Erlang has networked message handling abstracted away.) In addition, I've seen 3-4 different variations of your proposed pattern for long-running, server-like things.

I agree fully that porting should make use of idiomatic constructs. But I also think languages can have hidden mechanics that lose the valuable essence while porting – a form of anti-relativism of PLs, if you will.

It’s entirely possible to me that this ”oh a channel? just wrap it in X” is much more detrimental to interop than it sounds. For instance, take http.Handler in Go. Similarly simple, but what are the real-world implications of having it in std? An ecosystem of middleware that is largely compatible with one another, without pre-coordination (a non-std http server X can be used with auth middleware Y and logging middleware Z). Similar things can be said about io.Reader and friends. These extremely simple interfaces are arguably more valuable than the implementations.

If, and I’m speculating here, Erlang got many of the interfaces for reliable distributed systems right, that can be what enables the whole.


"Of what type?"

Of the type of the messages you're sending. Which can either be an interface for multiple messages, or you can use multiple channels with one type each. I've done both. This is not an important question when actually programming in Go.

"But most importantly, channels are local to the process, so you need glue to make it networked."

This is an important consideration if you are using Go. Although I would observe that it isn't so much that "channels don't do network" as that "channels are a local tool"; e.g., we do not complain that OS mutexes are not "network capable", because they're intrinsically local. Network locking uses different solutions, and we don't really consider etcd a competitor to a local "lock" call.

But there are dozens of message busses in the world now, and Erlang's isn't really all that competitive modulo its integration.


I don't think behaviours are all that interesting; after all, other programming languages have them.

Rather, what is interesting about the BEAM is that throwing an error is so graceful that it's not such a sin to just throw an error. In other words, a component that CAN error or get into a weird state can be shoved into a behaviour that CANNOT. And by default you are safe from certain operational errors becoming logic or business errors.

For example: you might have a defined "get" interface that doesn't return an error -- let's say it starts as an in-memory K/V store and it returns an optional(value), which is NULL in the case that the key didn't exist.

But suppose you want to have two datastores that the same interface targets, so you might abstract that to a filesystem, and you could have a permission error. And returning "NULL" is not actually "correct". You should throw, because that bubbles up the error to ops teams instead of swallowing it whole. A panic in this case is probably fine.

What if now you're going over a filesystem that's over the network, and the line to the datacenter was backhoe'd and there was a 10 millisecond failover by your SDN -- returning "NULL" is really not correct, because consumers of your getter are liable to have a bad time managing real consistency business cases that could cost $$$. And in this case a panic is not necessarily great, because you bring down everything over a minor hiccup.

The other power of throwing errors + behaviors is that it makes trapping errors with contextual information reporting (e.g. a user-bound 500 error with stack trace information sent somewhere where ops can take a gander) really easy and generically composable; that's not so for error monads or panics.
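
To make the K/V example above concrete, here's a minimal sketch (module name, API, and the map-backed store are all invented for illustration) of a gen_server that answers the "key absent" business case explicitly but lets any operational error crash the process, leaving recovery to its supervisor:

    -module(kv_store).
    -behaviour(gen_server).
    -export([start_link/0, get/1]).
    -export([init/1, handle_call/3, handle_cast/2]).

    start_link() ->
        gen_server:start_link({local, ?MODULE}, ?MODULE, [], []).

    get(Key) ->
        gen_server:call(?MODULE, {get, Key}).

    init([]) -> {ok, #{}}.

    %% "Key absent" is a business case and gets an explicit reply.
    %% Anything unexpected (corrupt state, a failing backing store
    %% swapped in later, ...) simply crashes the process; the
    %% supervisor restarts it with clean state instead of the error
    %% being swallowed as a NULL.
    handle_call({get, Key}, _From, State) ->
        Reply = case maps:find(Key, State) of
                    {ok, Value} -> {ok, Value};
                    error       -> undefined
                end,
        {reply, Reply, State}.

    handle_cast(_Msg, State) -> {noreply, State}.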

Anyways it was always strange to me that erlang-inspired actor system programming languages came out that obsessed over "never having errors" as a principle (like ponylang) because that's throwing out a big part of erlang.


I do not agree with the contents of this article. Behaviors are possible because of the underlying architecture of the system. Behaviors are not interfaces – they are more like abstract objects in a language like Java, in that they implement basic, self-contained functionality hidden behind a collaboration interface. But they couldn't do much without the underlying infrastructure that makes sure that every process is totally separate from other processes, that all processes can be safely closed without leaking memory or resources, and that you cannot just share a rogue pointer between two different processes.

What Joe did in his thesis is to show you how you can build reliable systems (and, up to a point, reliable distributed systems) by using a given set of Lego blocks.

The reason why you need the Erlang VM to implement something like that properly – and why you cannot do that fully on a different VM – is that without the underlying plumbing, supervision trees would be leaky. In Java, you cannot kill a thread that is holding on to resources and hope that everything will always go well, and you do not have ways to monitor different processes.


From this article and others, it’s still unclear to me what the state-handling and state-sharing model of Erlang is. Presumably, the granularity of the crashing/restarting sequential processes is also the granularity of in-memory state sharing. But what about external state, like databases, queues, file systems? For example, if a process has taken an item off a queue and then crashes before having fully processed it, how is that accounted for? Or you might not even know from the outside if it has been fully, partially, or not at all processed yet. This is an example where correct error handling or not crashing is crucial, in my experience. Or what about processing pipelines where a component in the middle crashes. Is there something like that in Erlang? Is there an article explaining Erlang from that perspective?

> For example, if a process has taken an item off a queue and then crashes before having fully processed it, how is that accounted for?

I have worked with people who had deployed huge amounts on the BEAM, had a real problem with the answer to that, and resorted to magical thinking.

When erlang processes "crash", assuming the whole system didn't crash, they almost certainly alerted a monitoring process of the fact, so that a process can be quickly restarted. This is the core of how supervision trees in erlang are built.

There are a lot of subtleties to that. The whole system may or may not be a single BEAM instance, and if more than one then they can be distributed, i.e. processes on one machine receive failure messages from processes on others, and can restart the processes elsewhere. These mechanisms on a practical basis are sufficient to automatically pick up the majority of transient failures. (I should add there are two classic ways to blow up a BEAM instance which make this less good than it should be: a bad C function call in a NIF, a "native implemented function", or posting messages to a process faster than it can consume them, which will eventually cause an OOM.)
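
As a rough sketch of what that looks like in code (module and child names invented), a supervision tree is declared inside the application itself:

    -module(worker_sup).
    -behaviour(supervisor).
    -export([start_link/0, init/1]).

    start_link() ->
        supervisor:start_link({local, ?MODULE}, ?MODULE, []).

    init([]) ->
        %% Restart crashed children one at a time; if there are more
        %% than 5 restarts within 10 seconds, this supervisor gives up
        %% and crashes, escalating the failure to *its* supervisor.
        SupFlags = #{strategy => one_for_one,
                     intensity => 5,
                     period => 10},
        Child = #{id => worker,
                  start => {worker, start_link, []},
                  restart => permanent},
        {ok, {SupFlags, [Child]}}.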

But this differs from the underlying philosophy of the runtime, which is that things are only done when they're done, and you should expect failures at any time. This maps onto their messaging paradigm.

What you actually sound like you want is a universe more like FoundationDB and QuiCK https://www.foundationdb.org/files/QuiCK.pdf where the DB and worker queue all live in one single transactional space, which certainly makes reasoning about a lot of these things easier, but has nothing to do with erlang.


> what about [...] if a process has taken an item off a queue and then crashes before having fully processed it

> you might not even know from the outside if it has been fully, partially, or not at all processed yet

Erlang does not propose a unique solution to distributed problems, just good primitives.

So the answer would be the same: you'd keep track, in the queue, of whether the element was popped but not completed, and you'd report back to the queue that processing failed and that the element should be fully put back.

So in Erlang you might monitor a worker process and requeue items handled by processes that failed.
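
A bare-bones sketch of that pattern (function and variable names invented): the queue owner monitors the worker and decides from the exit reason whether the item goes back on the queue:

    %% Hand Item to a worker; requeue it if the worker dies before
    %% finishing. Uses the stdlib queue module for the backlog.
    dispatch(Item, WorkFun, Backlog) ->
        {Pid, Ref} = spawn_monitor(fun() -> WorkFun(Item) end),
        receive
            {'DOWN', Ref, process, Pid, normal} ->
                {done, Backlog};                    % fully processed
            {'DOWN', Ref, process, Pid, _Reason} ->
                {retry, queue:in(Item, Backlog)}    % crashed: put it back
        end.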


Thanks. So Erlang is really only about managing process lifetimes and simple RPC? In my experience processes often have meaningful internal state, meaningful in the sense that it matters if it gets lost due to a crash. If I understand correctly, Erlang doesn’t provide any particular model or mechanisms for dealing with that?

Like fidotron said, a process's internal state is lost if it crashes (or exits).

If you want that state to be durable, you need to store it durably. Mnesia provides (optional) distributed transactions which may be appropriate for durability needs (lots of details). Or you could externalize durability to other systems.

Erlang is wonderful, but it's not magic. It won't prevent hardware failures, so if an Erlang process fetches something from a queue and the CPU stops for whatever reason, you've got a tricky situation. Erlang does offer a way for a process to monitor other processes, including processes on remote nodes, so your process will be notified if the other process crashes or if the other node is disconnected; but if the other node is disconnected, you don't know what happened to the other process --- maybe it's still running and there's a connectivity issue, maybe the whole host OS crashed. You could perhaps set bidirectional monitors, and then know that the remote process would be notified of the disconnection as well, if it was still running... but you wouldn't know if the process finished (successfully or not) after the connectivity failed but before the failure was detected and processed.
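
For illustration (the registered name and node are made up), a remote monitor looks like this, and the noconnection exit reason is exactly the ambiguous case described above:

    Ref = erlang:monitor(process, {worker, 'app@otherhost'}),
    receive
        {'DOWN', Ref, process, _Who, noconnection} ->
            %% Node unreachable: the remote process may have finished,
            %% failed, or still be running. You can't tell from here.
            unknown;
        {'DOWN', Ref, process, _Who, Reason} ->
            {remote_process_died, Reason}
    end.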


> In my experience processes often have meaningful internal state, meaningful in the sense that it matters if it gets lost due to a crash.

The erlang process state will simply be what it holds on its stack and private heap. (Ignoring things like ETS tables for the moment.)

Erlang has the concept of ports, used to interface to the world outside, that provide a sort of hook for cleanup in the event of a crash. Ports belong to processes, in the event of a crash all associated ports are cleaned up. You can also set this sort of thing up between purely erlang processes as well.

As the other commenter observed, erlang gives you the primitives to make distributed systems work; it does not prescribe solutions, especially around distributed transactions, which imo is one of the reasons some of the hype around the BEAM is misguided.


Erlang at least used to come with an in-memory database called Mnesia, that in the places I've encountered it depended on replicating all the state to every server, which usually caused some scaling issues.

There's nothing outright stopping you from doing proper design and building separate erlang services that exchange state with regular protocols, but there does seem to be a temptation to just put all erlang in one big monolith and then run into very hard memory and scaling issues when usage and data grows.

One high profile erlang user in the payment industry was mainly constrained by how big a server they could buy, as all their code ran on a single server with a hot standby. They have since moved to java, and rethought how they managed shared state

Facebook managed to get ejabberd, the xmpp server written in erlang, to back their first Messenger, but it involved sharding to give each ejabberd-instance a small enough data set to cope, and a clever way to replicate presence data outside of erlang (storing it in compact memory blocks on each ejabberd server, and shipping them wholesale to a presence service at a regular cadence).

Pretty soon they tore ejabberd out, metaphorically burned it in a field and salted the earth... but how much of that was the fault of erlang itself, and how much was the issue of having one corner of erlang in a largely C++ world, isn't known to me.


Mnesia isn't in-memory only. It also journals to disk. You can also use disc-only tables that don't hold the whole table in memory (but from what I've read, perf sucks... otoh, a lot of what people say about Mnesia conflicts with my experience, so maybe disc_only_copies is worth trying).

OTP ships with mnesia_frag which allows fragmenting a logical table into many smaller tables. You don't need to have all of the tables on all of the nodes that share an mnesia schema. That's at least one way to scale mnesia beyond what fits in memory on a single node. Single nodes are pretty big though; we were running 512GB mnesia nodes 10 years ago on commodity hardware, and GCP says 32TB is available. You can do a lot within a limit of 32TB per node.
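
For example, a fragmented, disc-backed table might be declared like this (table name and sizes are invented; see the mnesia_frag documentation for the full set of options):

    mnesia:create_table(session,
        [{attributes, [key, value]},
         %% Split the logical table into 16 fragments spread over the
         %% node pool, each fragment replicated to 2 disc copies.
         {frag_properties, [{n_fragments, 16},
                            {node_pool, [node() | nodes()]},
                            {n_disc_copies, 2}]}]).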

There's other ways to shard too, at WhatsApp pre-FB, our pattern was to run mnesia schemas with 4 nodes where one half of the nodes were in service, the other was in our standby colo, all nodes had all the tables in this schema, and requests would be sharded so each schema group would only serve 1/N users and each of the two active nodes in a schema group would get half of the requests (except during failure/maintenance). We found 4 node schemas were easiest to operate, and ensuring that in normal operations, a single node (and in most cases, a single worker process) would touch specific data made us comfortable running our data operations in the async_dirty context that avoids locking.

We did have scaling challenges (many of which you can watch old Erlang Factory presentations about), but it was all surmountable, and many of the things would be easier today given improvements to BEAM and improvements in available servers.


> For example, if a process has taken an item off a queue and then crashes before having fully processed it, how is that accounted for?

I'm not sure I understand the question - all queue systems I've used separate delivery and acknowledgement, so if a process crashes during processing the messages will be redelivered once it restarts.

Do you have a concrete example of a flow you're curious about?

Maybe these could help:

- https://ferd.ca/the-zen-of-erlang.html

- https://jlouisramblings.blogspot.com/2010/11/on-erlang-state...


To me the most important aspect of Erlang is the runtime's scheduler, which is preemptive instead of cooperative. This allows the message passing, sequential code and lightweight processes to be much more effective than in any other general language or framework using cooperative scheduling (like async runtimes or coroutines in Rust, .Net, Kotlin, Lua).

You can write actually synchronous code in Erlang and the runtime makes it so that no process blocks any other process by preempting them on a schedule.


Sounds a lot like Go

Yes, goroutines are preemptive, too. And they took a lot of inspiration from Erlang and the BEAM when they designed goroutines and channels.

In Go you manage your goroutines and channels explicitly, while the BEAM runs all processes for you. I've seen Robert Virding run an infinite loop in one Erlang process while the rest were serving requests: the core with the loop stayed at 100%, but 0 requests were dropped and the latency and throughput were more or less the same. Pretty crazy capabilities.

You can do the same in Go but it's a lot more manual.
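
A small-scale version of that demo is easy to reproduce in an erl shell; this sketch pins one scheduler with a busy loop while a second process stays responsive, because the BEAM preempts on a reduction budget rather than waiting for anyone to yield:

    spawn(fun Loop() -> Loop() end),   % hog one scheduler core forever
    Echo = spawn(fun Serve() ->
                     receive {ping, From} -> From ! pong end,
                     Serve()
                 end),
    Echo ! {ping, self()},
    receive pong -> still_responsive end.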


A question about erlang:

Haskell taught me a lot about programming, things that I still use now, even though I only write Python.

Does learning erlang teach you a new way of thinking? Or does it just make you wish you had erlang language features and libraries when not writing erlang?


IMHO it will teach you a new way of thinking but that way is not as generally applicable as what most people take away from Haskell.

Haven't dug deep into Erlang, but from what I've read, I never thought Erlang's big idea is "lightweight processes and message passing".

Always thought the key differentiator is having a kind of "orchestrator/supervisor" for those processes.

(Which "lightweight processes and message passing" facilitates, sure, but it's more than those)


"In February 1998 Erlang was banned for new product development within Ericsson"

False statement. Ericsson still uses Erlang, for example in their MME. Source: I used to work at Ericsson.


It's not false: Erlang was indeed banned at Ericsson, which caused Joe Armstrong to leave. They later reversed course and brought him, together with the language, back. This is a well-documented fact in the history of the language.

It is simultaneously possible that Ericsson banned Erlang in 1998 (a statement claimed multiple times by the creators of Erlang) and that Ericsson rescinded the ban later in 2004, when they hired back Joe Armstrong.

"5.2 Erlang is banned

Just when we thought everything was going well, in 1998, Erlang was banned within Ericsson Radio AB (ERA) for new product development. This ban was the second most significant event in the history of Erlang: It led indirectly to Open Source Erlang and was the main reason why Erlang started spreading outside Ericsson.

The reason given for the ban was as follows:

The selection of an implementation language implies a more long-term commitment than the selection of a processor and OS, due to the longer life cycle of implemented products. Use of a proprietary language implies a continued effort to maintain and further develop the support and the development environment. It further implies that we cannot easily benefit from, and find synergy with, the evolution following the large scale deployment of globally used languages. [26] quoted in [12].

In addition, projects that were already using Erlang were allowed to continue but had to make a plan as to how dependence upon Erlang could be eliminated. Although the ban was only within ERA, the damage was done. The ban was supported by the Ericsson technical directorate and flying the Erlang flag was thereafter not favored by middle management."

And to be completely fair....

"6.2 Erlang in recent times

In the aftermath of the IT boom, several small companies formed during the boom have survived, and Erlang has successfully rerooted itself outside Ericsson. The ban at Ericsson has not succeeded in completely killing the language, but it has limited its growth into new product areas.

The plans within Ericsson to wean existing projects off Erlang did not materialise and Erlang is slowly winning ground due to a form of software Darwinism. Erlang projects are being delivered on time and within budget, and the managers of the Erlang projects are reluctant to make any changes to functioning and tested software.

The usual survival strategy within Ericsson during this time period was to call Erlang something else. Erlang had been banned but OTP hadn’t. So for a while no new projects using Erlang were started, but it was OK to use OTP. Then questions about OTP were asked: “Isn’t OTP just a load of Erlang libraries?”—and so it became “Engine,” and so on."

A History of Erlang Joe Armstrong Ericsson AB

©2007 ACM 978-1-59593-766-7/2007/06-ART6

https://lfe.io/papers/%5B2007%5D%20Armstrong%20-%20HOPL%20II...

There's probably a discussion on precisely what this means, but such descriptions as "Erlang is banned" has significant and credible precedent.


Yeah, I don't know why this falsehood continues to persist. WhatsApp and Ericsson engineers continue to work together to evolve Erlang, alongside a bunch of other people across the industry.

Source: I work at WhatsApp


It's not a falsehood. The confidence of working at Ericsson and WhatsApp seems to have kept you both from just looking it up.

And there is a small team of Ericsson full time devs working on developing the language itself and the BEAM.

Is there any additional context here? (Is this a common misperception that you’ve come across?)

Not the original commenter, but from hearing Joe Armstrong speak on multiple occasions my recollection is that Erlang was blacklisted for some length of time (which allowed it to become an open source language), but eventually the company realized it still offered significant value.

I posted an excerpt in a different comment, but a decent source for this is section 5.2 of the document:

https://lfe.io/papers/%5B2007%5D%20Armstrong%20-%20HOPL%20II...


My impression from Ericssonland:

Around year 2008 being an Erlang coder was often more or less seen as being a COBOL coder in Sweden. Bluetail had sort of failed, having burned lots of VC, iirc.

So Erlang was something weird and custom that Ericsson used to build software for legacy phone exchanges. I remember that a colleague's wife working at Ericsson had received on-the-job training from essentially zero programming knowledge to become an Erlang developer in order to maintain some phone exchange software.

It's been fascinating to see it morph into something cool. Whatsapp, etc.


FWIW among PL nerds Erlang was "cool" in 2000 to 2007, too. It was constantly talked about on Lambda The Ultimate, I would have loved to have used it at my work at the time... I saw it used at multiple startups in the 2008-2010 period, and it eventually got deployed for the backend of Facebook's initial Messenger version, among other places.

If anything, it fell out of favour and lost its hype wave for some time after that, while other languages copied aspects of the Actor model... and mostly the BEAM hype came back in the form of Elixir.


One thing I haven't seen being discussed is the BEAM internals becoming a little long in the tooth. We still have static reductions before the scheduler switches to another task, the priority system in scheduling is a bit dodgy, flipping vmargs is kinda complex, lock counting and crash dump tooling kinda suck, etc.

BEAM is great, although it's definitely missing something like pprof for go or java flight recorder.


Thanks for posting this. What a fantastic post on the history of Erlang. Very illuminating (all of this is new to me).

Only ever played with ejabberd but felt it was a marvel of high speed engineering.

I also felt it had strong "halt and catch fire on error" properties.

Am I maligning it, and Erlang?


In 2003 I joined a startup building a horizontally scalable archive. You could add nodes to add capacity for storing data and metadata, and the system could tolerate up to a configured number of failures and carry on without loss of data or service. (This was not a general-purpose file system, it was for write-once/read-many objects.)

We built the system in Java and C. The distribution layer was done completely in Java. It was only after the system was done that I discovered Erlang. I REALLY wish I had known about it earlier. Erlang solved so many of the problems we had to solve by ourselves.


Even these days, now that Java has Virtual Threads?

I disagree. Interfaces are a trivial concept that can get bolted-on to any language. Even in languages without an official interface construct, you can replicate them in the program space.

The BEAM succeeds because you can run 1M processes on a single node, represent complex distributed state machines with ease, and restart portions of the system with zero downtime. Among many other things.

I really don't think behaviors/interfaces is the most critical piece.


I see your point to a degree.

That's kind of how Erlang is. At first, anything Erlang has, some other system has too:

Isolated process heaps? - Just use OS processes

Supervision trees? - Use kubernetes.

Message passing? - Not a big deal, I can write two threads and a shared queue in Java.

Hot code loading? - Java can do that too

Low latency processing? - I can tune my LMAX disruptor to kick Erlang's butt any day.

Now, getting all that into one platform or library is the main idea. OS processes are heavyweight; running 2M of them on a server is not easy. You could use green threads or promises, but then you've lost the isolated-heap bit.

You can use kubernetes to some degree but it does not do nested supervision trees well. I guess it would work, but now you have your code, and you have pods and controllers, and volumes and all the shit.

You can do message passing with "actor" libraries in many languages. But you cannot do pattern matching on receive, and it doesn't transparently integrate with sending messages across nodes to another thread.
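
For instance, a selective receive (names invented here) scans the mailbox for the first message matching a pattern, leaving everything else queued, and the same code works whether the sender is local or on another node:

    wait_for(ReqId) ->
        receive
            %% Only the reply tagged with our request id is taken;
            %% unrelated mail stays in the mailbox for later receives.
            {reply, ReqId, Result} ->
                {ok, Result}
        after 5000 ->
            timeout
        end.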

You can do hot code loading, but how do you deal with runtime data structures and state? Erlang is built around that: since a gen_server's state is immutable and explicit, it has callbacks to upgrade not just the code but the state itself.


I know this is minor to your point, but I think you can get pattern matching on receive with Akka Typed.

That's a fair point. Akka did go pretty far, this is pretty neat: https://doc.akka.io/libraries/akka-core/current/typed/intera...

The BEAM's real special sauce is in its preemptive scheduler in that it is impossible for one process to take down the whole system, even if bad processes are eating up the entire CPU. This cannot be done in any other language.

Worth noting as a disclaimer for people reading this: this is only true so long as external interfaces are not used. As soon as the BEAM calls out to other binaries, you lose that guarantee.

A disclaimer to the disclaimer is that you should be isolating that binary on dedicated nodes if there are stability concerns. Inter-node communication is trivial on the BEAM and it can be tweaked in the future when the scale of the system calls for it.

If you write an Erlang function in C you can actually call a function in the NIF API (enif_consume_timeslice) that tells the scheduler how much of your timeslice you've consumed and whether you should yield.

> Isolated process heaps? - Just use OS processes

OS procs are heavyweight. Erlang procs are ~2KB each. You can spin up millions on one box while traditional OS procs would melt your machine. Not even in the same league.

> Supervision trees? - Use kubernetes.

Comparing Kubernetes to Erlang supervision trees misses the mark. K8s is infrastructure that requires significant configuration and maintenance outside your codebase. Erlang's supervision is just code - it's part of your application logic. And those nested supervision trees? Good luck implementing that cleanly in K8s without a ton of custom work.

> Message passing? - Not a big deal, I can write two threads and a shared queue in Java.

Basic threads and queues don't compare to Erlang's sophisticated message passing. Pattern matching on receive makes for cleaner, more maintainable code, and the transparent distribution across nodes comes standard. Building equivalent functionality from scratch would require significantly more code and infrastructure.

"Any sufficiently complicated concurrent program contains an ad-hoc, informally-specified, bug-ridden, slow implementation of half of Erlang."

> Hot code loading? - Java can do that too

Java can reload classes, but Erlang was designed for seamless evolution of running systems. The gen_server callbacks for upgrading state during hot code loading show how deeply this concept is integrated into Erlang's core.
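
As a sketch of that callback (the state shapes here are invented): during a hot upgrade the VM hands the running state to code_change/3, so the process can migrate it to whatever layout the new code expects:

    %% Old code kept {state, Count}; the new version also tracks a
    %% timestamp, so the live state is rewritten during the upgrade.
    code_change(_OldVsn, {state, Count}, _Extra) ->
        {ok, {state, Count, erlang:system_time(second)}}.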


It's funny reading your response because I think you skipped the second half of the comment! Your rebuttals to the rebuttals are pretty similar.

It's Friday and my brain is cooked - you're right.

No worries at all, I figured you'd notice eventually. I essentially agreed with you; I was just listing the common perceptions and noting how Erlang managed to do well against them.

I haven’t used it enough to be able to say yet, but I believe the BEAM avoids part of the problem Ian Cooper (Where Did It All Go Wrong?) rediscovered, which is that microservices don’t min-max the inter- versus intra-modular friction in systems.

I would not say that Beam eliminates this problem in any way, but I do think it lowers the slope of the line. The self-consistent idioms and functionality, especially with deployment, auto recovery and load balancing, reduce the inter-module friction. It makes a system where 12 engineers can easily manage 30 endpoints, and your surface area can still follow a power law.


Discussed at the time:

Erlang's not about lightweight processes and message passing - https://news.ycombinator.com/item?id=34545061 - Jan 2023 (274 comments)


Erlang is my favorite language but getting a job writing Erlang feels impossible. I make it a habit to ctrl-F every Who’s Hiring? thread and find Elixir occasionally and Erlang never.

Can you articulate the kinds of business problems Erlang is particularly well-suited to solve?

When you choose Erlang for a project, what kind of return on investment do you think it typically offers? Does it lead to significant cost savings or help generate more revenue in ways that other languages might not?

In situations where Erlang is chosen, what are some concrete examples of how it has demonstrably increased efficiency, reduced errors, or enabled new business opportunities that wouldn't have been as feasible with other technologies?

Edit: I guess if I'd done any research myself before asking, I might've found this: https://www.erlang-solutions.com/blog/which-companies-are-us...


Interesting. It strikes me that some of this rhymes with the platform abstraction of roc[1]

[1] https://www.roc-lang.org/platforms


"In 1998 Ericsson decided to ban all use of Erlang. The people responsible for developing it argued that if they were going to ban it, then they might as well open source it. Which Ericsson did and shortly after most of the team that created Erlang quit and started their own company."

Bwahahaha. Reminds me of the JRuby team, who left Sun as a single unit and resumed work as a team at another company (I can't remember where) when Oracle acquired Sun.


I think if you ever find yourself saying Erlang is about one thing, you've lost the plot. Erlang has a bunch of features that each would be powerful in their own right, but they aren't special--there are other ways to solve the same problems, and some of those ways even have merit. But Erlang chose the set of features it did with a very high-level understanding of how those various features would interact. I honestly think it's one of the greater achievements humanity has accomplished. It lacks the obvious elegance that, say, Lisp has, but I think the sum of parts in Erlang was probably a lot harder to make.

(2023)

Someone explain to me why I should prefer Erlang/BEAM/Elixir over something like Akka.NET?

With the latter I get a huge ecosystem of packages and wide compatibility with platforms and tooling and also a robust and scalable actor model.

Learning Erlang or any related language meanwhile feels like learning Tolkien’s Elvish for the purposes of international trade.


No, we can't explain to you why our blub language should be preferred to your blub language. It's your job to make that determination on your own.

I can come back in 5 years to explain to you what is annoying about Akka.NET compared to the BEAM and vice versa. An expert in the BEAM who lacks experience in C# is not going to be able explain to an expert in C# who lacks experience in the BEAM why BEAM is better.

You're asking for something incredibly rare - a person who is an expert in both runtimes and can concisely explain to you the tradeoffs of each.


_Supposedly_ they are more convenient if you are willing to tolerate abysmally subpar efficiency, exotic semantics and lacking ecosystem.

If you want to do exclusively distributed computing at the application level - Erlang/Elixir will be better. They can offer a nice North Star of where the UX of Akka.net/Orleans should sit (and, arguably, Orleans is not exactly nice to use in comparison).

Otherwise, aside from educational purposes, they are not worth spending your time on. Just skip to F# over Elixir, because Elixir is not a serious language, lacking base language primitives and operations one would expect a standard library to offer. It's neither productive nor fast.


Is it just me or does Erlang's syntax look a little bit nicer than Elixir's?

It's inspired/descended from Prolog, and my impression is that many people find it a bit odd. It is at first, but I quickly adjusted to it and quite like it now.

Erlang's syntax takes a bit of getting used to, but it's very pleasant to use once you're familiar with it. I like it a lot.

I learned Erlang at school and used to prefer its syntax for years. However, after giving Elixir a chance and writing 1000 loc I was converted. Now I look at snippets of Erlang in docs with mild disgust.

Elixir came from Ruby developers and thus has similarly verbose syntax and macros. Erlang's syntax came from Prolog, which was used to implement the first compiler and is why Erlang's syntax is more concise.

I'm an outsider to this ecosystem, but I've seen a few people share that same opinion. They prefer the explicitness of Erlang.

Elixir is still very explicit from a syntactic point of view. Macros allow hiding a significant amount of boilerplate in certain behaviours, though. So it's a matter of preference, of course.

gleam is probably my favorite middle ground between elixir and erlang.

TLDR title for erlang people: erlang is not just erlang but erlang + OTP.

"In February 1998 Erlang was banned for new product development within Ericsson—the main reason for the ban was that Ericsson wanted to be a consumer of software technologies rather than a producer." - The creator of the language banned any use of it internally.

Being a consumer rather than a producer of tech is strictly a business decision. There's significant cost to producing and maintaining a language, and Ericsson no longer wanted to pay the upkeep.

That's not necessarily an indictment on the language itself. The alternative would have been to keep using it while also open sourcing it, but I'm guessing they just wanted to be able to hire cheaper C developers or whatever the flavor of the time was.


No, the company the creators worked for. And six years later they hired Armstrong again and silently lifted the ban.

It is wildly disingenuous to just copy paste that line from wikipedia and not the rest of the paragraph.

> In February 1998, Ericsson Radio Systems banned the in-house use of Erlang for new products, citing a preference for non-proprietary languages.[15] The ban caused Armstrong and others to make plans to leave Ericsson.[16] In March 1998 Ericsson announced the AXD301 switch,[8] containing over a million lines of Erlang and reported to achieve a high availability of nine "9"s.[17] In December 1998, the implementation of Erlang was open-sourced and most of the Erlang team resigned to form a new company, Bluetail AB.[8] Ericsson eventually relaxed the ban and re-hired Armstrong in 2004.

- edit, poster was quoting a quote in the article, not wikipedia, the article is the one omitting the context


But from the quote it seems that for reasons unrelated to the language itself?

Right. Companies need to figure out what they are going to be good at and what they buy from someone else. There is no one correct answer, and often the best answer changes over time (both reasons that different companies will have different answers). My company architects have been debating what IPC framework we should adopt for longer than it would take me to just write one from scratch - but they are correct to debate this instead of writing one, because adopting means we don't have to maintain it, and eventually that maintenance should cost more than all the debate. (Note I didn't say will, though I think it is likely.)


