Can you elaborate? I’m intrigued.

mikepurvis · 2024-08-04T02:14:06 1722737646

Not a huge Erlang person, but I think the idea is that redundancy and robustness are managed by the BEAM process on each host, so that layer is much higher in the stack. Compare that with k8s, where it’s like: okay, die if something goes wrong, and the container orchestrator makes you a whole new chroot as if you just rebooted.

SoftTalker · 2024-08-04T02:24:02 1722738242

And multihost is included out of the box, you don't really need to do anything special.

__MatrixMan__ · 2024-08-04T03:02:39 1722740559

Well, you're hearing it from a guy who has written hardly any Elixir or any other BEAM language. (For a proper intro I recommend this video https://youtu.be/JvBT4XBdoUE). Less practitioner, more fanboy. So I may not be the best source. But I'll try anyhow.

The BEAM is a virtual machine, I guess kinda like the JVM. So just like you can write Java or Kotlin or Clojure or a million other JVM languages, so too can you write Erlang or Elixir or Gleam (I like the look of Gleam)... And expect similar interoperability.

The BEAM has its roots in the telecom world. So while Sun Microsystems was doing the Java thing to make webservers or applets or whatever for the JVM, Ericsson was doing Erlang things to make things like long distance phone calls happen on the BEAM.

(I'm not a fan of Java, I just think it's a decent thing to compare with in this case)

The BEAM folks take a different approach to concurrency than is common elsewhere. BEAM processes are much more lightweight than OS processes, so while it might be insane to run a separate copy of your server for each user, it's less insane to run a separate BEAM process for each user.

BEAM processes interact through message passing. Of course most other processes do too, but only because the developer built it that way. With the BEAM it's built in: each process periodically checks its mailbox for a message which matches its criteria, and if there's no message, it sleeps until it is revisited by the scheduler. There's no async/await business. They're all single threaded and sequential. Instead, you achieve coordination by having many of them, some of which are in charge of starting/stopping/organizing others. (I guess they build structures out of these things called "supervision trees" but I don't precisely know what that is).
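To make the mailbox idea concrete, here is a rough sketch in Python rather than a BEAM language. It is only an analogy: the `Actor` class, the `counter` handler and the message shapes are all made up for illustration, OS threads are far heavier than BEAM processes, and the pattern-matching ("selective") receive is not modeled. What it does show is one sequential loop per "process", private state, and coordination purely by sending messages.

```python
import queue
import threading


class Actor:
    """Toy stand-in for a BEAM process: private state plus a mailbox.

    Hypothetical illustration only; this is not how the BEAM is implemented.
    """

    def __init__(self, handler, initial_state=0):
        self.mailbox = queue.Queue()
        self._handler = handler
        self._state = initial_state
        self._thread = threading.Thread(target=self._loop, daemon=True)
        self._thread.start()

    def send(self, message):
        # Sending never touches the actor's state directly; it only enqueues.
        self.mailbox.put(message)

    def _loop(self):
        # Single-threaded and sequential: one message at a time, in order.
        while True:
            message = self.mailbox.get()  # blocks (sleeps) until mail arrives
            if message == "stop":
                break
            self._state = self._handler(self._state, message)

    def join(self):
        self._thread.join()


def counter(state, message):
    """Handler: returns the new state for each (kind, payload) message."""
    kind, payload = message
    if kind == "add":
        return state + payload
    if kind == "get":
        payload.put(state)  # payload is the caller's own reply mailbox
        return state
    return state  # unknown messages are ignored


def demo():
    actor = Actor(counter)
    actor.send(("add", 2))
    actor.send(("add", 3))
    reply = queue.Queue()
    actor.send(("get", reply))
    total = reply.get(timeout=1)
    actor.send("stop")
    actor.join()
    return total
```

Calling `demo()` sends two `add` messages and then asks for the total through a reply mailbox, so the caller never reads the actor's state directly.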

This has all grown up in a world where nodes are expected to be physically separate (like either end of a phone call) so you end up with a bit more fault tolerance than if each process is expected to be on the same machine.

In Kubernetes you've got this mountain of yaml which you craft to tell the container orchestrator how to treat your app. And then you've got your app itself which is probably not written in yaml. So I find it very jarring to switch between my dev hat and my ops hat.

And Kubernetes... That's Google's baby, right, so it makes sense that it doesn't feel the same as the underlying app. As a cloud provider, they need a rather high wall between the app and the infra. But I think it causes all kinds of problems. At least in my world, the apps are either in Python or Go, so when there's a problem someone will come along and solve it with yaml-glue to add an additional container which may or may not resemble the app which has the problem.

My brain struggles to hop from Python to Yaml to Go (and there's usually some bash in there too).

The BEAM, by contrast, expects processes to start and stop other processes. So your orchestration logic and your application logic are in the same language. You don't have to express your wishes in yaml and then navigate all of these superfluous layers (e.g. the container entrypoint script, port forwarding, in-cluster DNS, etc) to have your wish granted. That kind of communication is handled by the BEAM's inbuilt message passing system.

If I got to rebuild our stack from scratch I'd use Kubernetes as a cloud-provider-agnostic interface to get access to compute, but instead of expressing anything about the app in YAML, I'd handle all of that extra stuff (e.g. log scraping, metric aggregation, whatever hacky fix is needed today...) in the BEAM, right alongside my app.

People like to say "build security into the app" or "build observability into the app", but standard practice is to bolt on solutions that don't resemble the app at all. My (probably flawed) perspective is that if you scratch those itches within the BEAM, then you're going to end up with fewer superfluous layers of abstraction. Also fewer distinct niches that you now must find a specialist to fill when the old one quits. Also, you end up more in control of your app: since you more or less wrote the orchestrator, you're relying less on the cloud provider to be a reliable puppet master.

---

It's slow going, one class per semester, but I've been taking biology classes on the side. I sometimes think about making a break for it and trying to build something like farmbot but for driving a microscope, or a pipette, or maintaining the temperature/pH/etc in a bioreactor.

These are, for now, just dreams.

Sorry for the diatribe, but you did ask me to elaborate :)

DevOfNull · 2024-08-04T05:38:52 1722749932

Different person, but thank you for the writeup! Very interesting. For anyone else reading: Please write more comments like this, they're one of the best parts of HN.

tonyarkles · 2024-08-04T12:51:15 1722775875

To elaborate a little bit on the supervision tree thing then: there's a bunch of different behaviours you can associate with process failure depending on your needs. Let's say you have a Postgres connection pool and for some reason the pool manager process dies. You can set it up so that the death of the manager will:

- kill all of the child processes that the pool was managing

- return an error to all of the request handlers who had active queries going while not touching the request handlers who didn't

- restart the pool manager

- once it's running, respawn the managed pool processes

This is all machinery that's pre-built into the OTP runtime. While that's all happening your app as a whole can keep trucking along and everything that doesn't need to make a database query carries on without even noticing that something was amiss.
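The sequence above can be sketched as a small simulation. This is plain Python, not OTP: `Worker`, `PoolSupervisor` and `handle_manager_crash` are hypothetical names invented for the example, and in a real Erlang/Elixir app you would declare a restart strategy on a supervisor instead of writing these steps by hand. The sketch only shows the order of events and that idle request handlers are left alone.

```python
class Worker:
    """Stand-in for a lightweight process that is either alive or dead."""

    def __init__(self, name):
        self.name = name
        self.alive = True

    def stop(self):
        self.alive = False


class PoolSupervisor:
    """Hypothetical supervisor for a connection pool manager and its children."""

    def __init__(self, pool_size):
        self.pool_size = pool_size
        self.restarts = 0
        self._start()

    def _start(self):
        # Start the manager first, then the connections it manages.
        self.manager = Worker("manager")
        self.connections = [Worker(f"conn-{i}") for i in range(self.pool_size)]

    def handle_manager_crash(self, handlers):
        """React to the manager dying.

        `handlers` maps a request handler name to True if it had an active
        query when the crash happened. Returns the handlers that get an error.
        """
        self.manager.stop()
        # 1. Kill all of the child processes the pool was managing.
        for conn in self.connections:
            conn.stop()
        # 2. Only handlers with an active query see an error.
        errored = [name for name, has_query in handlers.items() if has_query]
        # 3 and 4. Restart the manager, then respawn the pool processes.
        self.restarts += 1
        self._start()
        return errored


def demo():
    sup = PoolSupervisor(pool_size=3)
    old_connections = sup.connections
    errored = sup.handle_manager_crash({"req-a": True, "req-b": False})
    return sup, old_connections, errored
```

In `demo()`, `req-a` had a query in flight and gets an error, while `req-b` never notices; the supervisor ends up with a fresh manager and three fresh connections.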

The slogan "let it crash" gets tossed around the Elixir/Erlang community quite a bit. This is referring to Erlang processes (the internal lightweight processes, not the host process with a formal OS PID associated with it). Your whole app doesn't die, just the broken parts, and the OTP supervisor subsystem brings them back to life quickly.