In Erlang the fault-tolerant behavior is not builtin either, only tools to make ...

hosh · on Jan 27, 2023

Our team worked on an Elixir app for a couple of years before splitting the game logic off into Dotnet. Scaling the Dotnet server was a very different beast:

- It wasn't designed to crash on failure. It uses thread pools with no supervision trees. We had to add liveness probes to check whether it is still alive. With Elixir, I've only ever needed readiness checks.

- No REPL. With a REPL in production, we can debug things live and even try patches to see if they work. Can't do that with Dotnet. That also contributes to reliability.

Now, cluster size does matter. Erlang and the BEAM were designed for vertical scaling, so you can minimize cluster size by biasing towards it. That's what we do on our systems. There's a way to do that with Kubernetes so that we scale vertically during our daily traffic cycle.

At some point, though, you start looking at a partial clustering topology for BEAM, or at one of the many process registries that are better suited to dynamic membership. (The one bitwalker wrote comes to mind.)

throwawaymaths · on Jan 27, 2023

That's wrong. Fault tolerance is basically the default. Yes, you have to build a supervision tree, but unless you're writing a one-off script, you have to build it anyway to do anything.
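
For anyone following along, here is a minimal sketch of what "building a supervision tree" looks like in Elixir. The `Demo.Counter` module and its `crash/0` function are made up for illustration and are not from anyone's app in this thread; only `GenServer` and `Supervisor` are standard library.

```elixir
defmodule Demo.Counter do
  # Hypothetical worker: holds an integer and can be told to crash.
  use GenServer

  def start_link(initial) do
    GenServer.start_link(__MODULE__, initial, name: __MODULE__)
  end

  def increment, do: GenServer.call(__MODULE__, :increment)
  def value, do: GenServer.call(__MODULE__, :value)
  def crash, do: GenServer.cast(__MODULE__, :crash)

  @impl true
  def init(initial), do: {:ok, initial}

  @impl true
  def handle_call(:increment, _from, n), do: {:reply, n + 1, n + 1}
  def handle_call(:value, _from, n), do: {:reply, n, n}

  @impl true
  def handle_cast(:crash, _n), do: raise("boom")
end

# One supervisor, one child. With :one_for_one, only the child that
# died is restarted, using the same start argument (0 here).
{:ok, _sup} = Supervisor.start_link([{Demo.Counter, 0}], strategy: :one_for_one)
```

After `Demo.Counter.crash()`, the supervisor starts a fresh process under the same name, but its state is back to the initial `0`. The restart comes for free; deciding what state to recover, and how, is still up to you.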

mapcars · on Jan 30, 2023

Well, you're confirming what I said: you have to build it. It's not that Erlang programs automatically never fail and always handle problems correctly as required.