BEAM is such an awesome and beautiful VM. Not only is concurrency at a scale unimaginable in other languages the norm for BEAM, but the isolation of running processes didn't hit me until I asked an Elixir dev about running BEAM in Docker. The confused look told me I was asking an odd question: processes in BEAM are as isolated from each other as console apps running in different VMs in VMWare server. These processes can send messages to each other but they cannot access each other's memory. It was truly a mind-blowing revelation!
> The confused look told me I was asking an odd question: processes in BEAM are as isolated from each other as console apps running in different VMs in VMWare server. These processes can send messages to each other but they cannot access each other's memory.
The isolation is really not that high. erlang:process_info[1] allows one process to observe an awful lot about another. sys:get_status [2] often works (depending on software design choices) and provides even more data. I would not run two applications that needed isolation for security from each other in the same BEAM or as part of the dist network.
> The confused look told me I was asking an odd question: processes in BEAM are as isolated from each other as console apps running in different VMs in VMWare server. These processes can send messages to each other but they cannot access each other's memory. It was truly a mind-blowing revelation!
Not quite: the messages sent are kept in shared memory (which is reference-counted). Cycles can't occur due to immutability. But with everything else, you're correct. One reason for this, is that every thread has its own garbage collector! This allows multithreaded garbage collection without stop-the-world pauses.
Source: Did a bunch of research into BEAM for implementing my own multithreaded GC.
> This allows multithreaded garbage collection without stop-the-world pauses.
This was one of the things that really floored me when I learned about how BEAM was designed. It's incredible that how all the seemingly small changes come together in a way that works really well for the problem domains Erlang was used for.
Anyone interested in this subject would be extremely well-served by picking up a copy of the late, great Joe Armstrong's "Programming Erlang." Putting together a minimal "server" and then slowly evolving it into a GenServer is incredibly fun and rewarding. And then the kicker is you get hot-swapping for (almost) free! Learning about Erlang hot-swapping was probably one of my most favorite moments in my brief time using Erlang before moving onto Elixir, where you get it all right out of the box.
> the messages sent are kept in shared memory (which is reference-counted)
Only large binaries and slices of them use reference counting, other things are copied. And of course, if the process is remote, everything is copied :)
Bin0 was on the process heap, but Bin1 is a refcounted binary with 256 bytes of storage. If you're putting Bin1 into ets/mnesia, use binary:copy/1 to get a clean copy that doesn't reference a lot of extra storage.
Interesting. I'm glad to be wrong because then I learn something!
I'm not sure if you have answers, but I have questions:
1. How do they determine what size to start sharing/refcounting at? Now that I'm actually considering it, it's obvious that they'd copy small stuff--an obvious lower limit would be sizeof(<reference_counter_type>) + sizeof(<pointer_type>) because smaller than that, it actually is faster/less memory to copy. But the numbers you're mentioning are higher than that--is there some metadata being stored beyond the reference count and pointer?
2. Is copied data stored in thread storage and garbage collected by that thread's garbage collector? That would be my guess.
Obviously the source of truth for all these questions is the code[1], but it's been a while since I dug in there. Maybe it's time to go again.
There's no thread-local GC, it's really per process. For instance a scheduler thread that's running a process and hits the need to do an expensive GC may delegate the task to a dirty scheduler (another type of thread, dedicated to long blocking tasks), and go back to running other processes in the meanwhile.
It's frequent for processes to move around between schedulers, they steal work from each other.
Regarding question 1, my understanding is that all non-binaries messages are copied regardless of size. Only binaries ever use shared memory. Early in BEAM development they debated optimizing shared memory for local messages but ended up not using it, partly due to the extra code complexity. Always copying messages means there’s little difference between local or remote messaging code paths and removes ref counting. Haven’t read the source myself regarding that, but it would be an interesting price to study.
I’d bet optimizing local message passing would be a good feature of a BEAM implementation in Rust since the ref counting and other details would be less worrisome. Another research topic would be how BEAM would compare to say Java/Dotnet runtimes on cpu cores with lessened hardware shared memory guarantees and relying on message copying between cores instead.
Wouldn't the point of running in Docker still be to guarantee version, environment, and config compatibility? Just because you run an Elixir app on your local machine doesn't mean it will work worry-free on Prod. Without Docker, you might have a different version of BEAM installed (or none at all), etc.
I think OTP Releases would still depend on the target host being configured correctly and having all dependencies installed that your application needs to run.
Containers are all about consistency of the environment. For example, maybe your Elixir app wants to log out to something like Splunk or DataDog...if those agents aren't installed on the host, it doesn't matter what facilities BEAM provides. Or maybe your app depends on certain configs, libraries, or files being available at specific paths on the system. Things like that would be captured in the image your containers are running and guarantee consistency across machines and environments.
This is a question that I see hotly debated a lot in Elixir circles. BEAM and the Elixir toolset do have a lot of tools that overlap with Docker, so there's a lot of people that advocate for "not bothering" with Docker, while there's other people that are familiar with Docker (and perhaps not as familiar with BEAM) that will tend to use Docker. They both have more or less the same capability to deliver the same functionality.
IMO, in most use cases it makes sense to just use whichever you're more familiar with and will be more productive with. After all, that is the entire point of said tools...
Elixir Releases/Distillery definitely do overlap with Docker deployment techniques. The BEAM also has scaling capabilities that are tantamount to K8s horizontal scaling.
Deployment doesn't overlap. Erlang releases bundle up your code but you still need libraries like openssl, to have a nice bundle that can actually just run anywhere Docker is the way to go.
And horizontal scaling between the two do not overlap either, not sure what you mean. Packing containers (processes) efficiently into a cluster of nodes is not something Erlang does.
Edit: Erlang/OTP does offer fail over for what it calls "distributed applications" but based on a static set of nodes -- not horizontal scaling, anymore than other languages/frameworks do by letting you spawn new instances...
>Deployment doesn't overlap. Erlang releases bundle up your code but you still need libraries like openssl, to have a nice bundle that can actually just run anywhere Docker is the way to go.
And with Docker you still need libraries like, y'know Docker for your code to run.
>Packing containers (processes) efficiently into a cluster of nodes is not something Erlang does.
How much have you used BEAM? It definitely does do that. Scalability is one of Erlang/Elixir's primary benefits, and node clustering is exactly how it does that.
BEAM doesn't do everything containers or a container orchestration system can, but in view of my relatively short time as a regular Erlang programmer (definitely less than 10 years), the spirit of tidepod's point accords with my experience (with a weaker interpretation of his term "node clustering"). BEAM and OTP don't do automatic load distribution between nodes, but can be fashioned to do so easily with application support.
We never used pool. The nodes were mapped onto heterogenous machines sharing the host with a 3rd-party daemon. It's configuration changes even took place through a module update hook written in Erlang itself. We both deployed new code and distributed work "manually" across them entirely on OTP.
[NOTE] It it surprising, or was to me, that there are problems with having a fairly small number of nodes fully connected. I'm lucky enough to have avoided learning this the hard way, but imagine this could serve as a painful backbone to an "Erlang deployment war story".
Sure, you can write it yourself, and probably it is even easier to write in Erlang -- up to a limit due to the issues with distributed Erlang discussed above.
I worry about, and have seen this both in the first hype phase of Erlang a number of years ago, the misconception about what Erlang offers and the resulting frustration, blaming it on the tool and quitting.
Hoping in the chapters I'm working on for https://adoptingerlang.org/docs/production/ I can better explain the benefits of running on k8s (or similar), while also making clear it certainly isn't the right choice in all cases.
Nice synopsis in the docs. I’d suggest that instead of saying k8s and OTP don’t overlap that it’s more that "they don’t completely overlap" Fenn diagram style. That’d be a bit more nuanced answer and help people judge which category they fall into.
There are many cases where having an Erlang cluster and doing ‘naive pooling’ at the application level would get you 80% of what k8s + routing layer would get you. I’m assuming more of an an all Erlang/Elixir environment which many small startups could get by with. Even then Docker + Fargate or whatnot would still simplify deployment. Personally I harbor a secret desire to see if I could replicate part of the k8s interface using Nerves images + some otp tooling. Probably better things to do with my life though. ;)
Also by ‘heart’ are you referring to how ‘epmd’ works or Erlang distribution?
For 'heart' here I'm referring to the heart program that'll start if you run erl with `-heart` http://erlang.org/doc/man/heart.html -- it will restart a down node.
I think part of the confusion is you in general don't need k8s, but when your size and requirements get to a point that k8s makes sense (which arguably lowers as hosted k8s becomes better) it is not in conflict with your also use of Erlang/Elixir.
I find distributed Erlang much nicer in k8s env (I don't have to maintain it obviously) where I get an IP per node, can add a k8s service making it possible to use DNS SRV queries to find all other nodes and letting k8s worry about where pods run and keeping them up.
Plus there is configuration management and consistent storage (resources in etcd).
I'm not a BEAM user, so I may be wrong, but as I understand it, that's not really true. Two VMs are isolated at a much more fundamental level which matters for security and containment. For example, two BEAM process, while they don't have direct access to each other's memory in the sense of say global shared variables, still have access to the same physical memory, the same disk partition, the same file descriptors, the same sockets, etc..
So I'd imagine there's potential utility in still running inside docker, as a means to deploy the expected OS, the correct assets, the version of BEAM itself, etc. And there are reasons to isolate all that with VMs, to prevent sharing of machine resources like disk, file descriptors, sockets etc.
BEAM is amazing. The scaling is insane. I built a business texting platform using it (https://pigeonsms.com).
As we brought on our first few customers I kept expecting we would need to increase our hosting capacity. Here we are 2 years later and we are still running the whole thing with our initial hosting setup.
It seems it definitely hits a niche market in the current time but say in 2030, I’d imagine most people will have migrated over their phone setups to a modern text-supported by default one ?
> Finally, I've seen various reports that the practical size limit of a BEAM cluster is in the range of 50-100 nodes. The reason for this is that BEAM cluster establishes a fully connected mesh (each node maintains a TCP connection to all other nodes), so at some size this starts to cause problems. As far as I know, the OTP team is working to improve this, but as of OTP 22 it is still not done.
I've run clusters of 1-2k machines at my last job (maybe it was bigger, but I can't remember for sure). Holding a TCP connection to each other node is not a problem --- we certainly had a lot more connected clients than connected servers, tuning memory for buffers can be an issue on low ram systems. Global can get to be a problem, I'm not sure of the state in open source OTP, but if you have multiple nodes contending on the pg2 global lock for a group, it can get really slow; there's ways to make that better, but you do need to be careful not to introduce new deadlocks. If it's still using the simple method of try to lock everyone, if unsuccessful unlock and wait a bit and try again doesn't work well under significant contention or if one (or more) nodes is unhealthy and running slowly, but staying online.
The quality of network needed really depends on your tick timeouts, and the amount of data you're transmitting. Dist will work with slow and lossy networks as long as it can get a ping transmitted often enough. I think the default tick time is 30 seconds, and four failed ticks disconnects, so you really just need pings coming through once every two minutes, and for your OS not to give up on the TCP connection.
It wouldn't work well for mobile, but between two reasonably connected datacenters, it should be fine. Anyway, dist should only be used between nodes at the same trust level --- anything you can do on one node can be done from the other node; consider it a bidirectional shell. I've debugged plenty of cases where an intermediate link was congested resulting in very low throughput, and tens of minute message delays on dist; it was still working ok --- just anything synchronous would take forever.
>First, in my opinion distributed BEAM is mostly intended to run on a network which is fast and more reliable (such as local network). While in theory it can also work on a less reliable/slower network (e.g. geographically dispersed machines connected via Internet), in practice you might experience more frequent netsplits which could cause various problems, such as worse performance or less consistency.
This is exactly why I wasn't excited about LiveView[1]. It felt like a step backwards in terms of human-centric design. Another tool that makes us consider our network bars first and our life second.
In general, I'm kind of disappointed that Elixir isn't leading the way on decentralized and offline-first technology, but I guess it's a limitation of BEAM running on small/low-powered devices?
Have you actually tried it on a cellphone though? LiveView works decently enough for me over my cell phone connection even when using a vpn. I wouldn’t want say a todo app made using LiveView, but for dashboards and other sites which require server data anyways it does fine. Oddly I find LiveView can even feel subjectively faster than an SPA making a discrete request and then rendering the result.
Not with the GRiSP board, but I’m working with Nerves and embedded Elixir. Even using Erlang distribution to communicate between devices over an internal Ethernet lan. Works pretty well actually.
The speaker mentions that Redis, MongoDB and background jobs were replaced by Erlang.
What does he mean by that exactly? Erlang provides some persistent state storage? Or is he just saying he used Erlang database drivers to access Redis / MongoDB?
You can use ETS and Mnesia, and overall it is built to be distributed so you can pass messages between processes/nodes without needing something like RabbitMQ.
Most of the time Erlang and OTP provide what you need already without having to reach for an external tool. (Obviously depending on your use case)
Furthermore, job processing is much easier with Erlang because the processes and the scheduling mechanism - there could be no job, only processes doing their things.