We developed an Online Charging System in Erlang that served a couple million subscribers for close to three years. I found the whole experience fairly terrible.
Erlang is a nice enough language and I don't mind the syntax, but it's also kinda cumbersome and verbose at times. For example, adding `true -> ok` to every if statement gets old fast. Similarly, Erlang/OTP is a nice platform but some parts are fairly bad. Looking at you, `inets/httpc`.
But the really big problem was the whole ecosystem. The build process was needlessly complicated. Surprisingly enough, multicore support was not great. Performance was not that great. It seemed like a lot of libraries were abandoned after 2013. I could go on.
We ended up rewriting the whole thing in Java and, so far, it has worked out great. And after Java 8, Java the language is not so bad. I still have somewhat fond memories of Erlang but I don't miss it.
Not to refute your experience, but "adding true -> ok to every if statement" sounds like an anti-pattern. A couple of notes to elaborate:
1. If-statements should rarely be used in Erlang because case is preferred
2. If you're returning ok everywhere, you could also just let it crash by doing something akin to true = function(). Always go for let it crash first, unless you really know you can handle the error sensibly (and then use a case-statement)
3. httpc is kind of known to not be the most formidable HTTP client. Hackney is a more modern and better choice. The good thing with httpc is that it is included with OTP, otherwise there are better choices (scalability- and feature-wise)
4. When someone says "multicore support" and "performance" are not great, it's usually one of two things: (a) they're developing a use case that is not a fit for Erlang (e.g. compute heavy) or (b) they don't know how to use Erlang properly (e.g. overly complex process setups)
Now, many of these things are not obvious to people new to Erlang and take some experience to learn. This, I would say, is the bigger problem with the Erlang ecosystem.
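Points 1 and 2 above can be sketched roughly as follows. This is only an illustration: the module and function names (`if_vs_case`, `maybe_log/1`, `must_succeed/1`) are made up for the example and do not come from anyone's code in this thread.

```erlang
-module(if_vs_case).
-export([maybe_log/1, must_succeed/1]).

%% Point 1: a `case` on the boolean instead of `if ... true -> ok end`.
%% Both branches are spelled out, so there is no dangling catch-all guard.
maybe_log(Deduct) ->
    case Deduct of
        true -> io:format("deducting credit...~n");
        false -> ok
    end.

%% Point 2: "let it crash" by asserting the expected result with a match.
%% If Fun() returns anything other than `true`, the match fails and the
%% calling process exits with a badmatch error.
must_succeed(Fun) ->
    true = Fun(),
    ok.
```

Whether crashing is acceptable still depends on what supervises the process and on what state is lost when it dies, which is the disagreement in the rest of this thread.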
    if
        DeductCreditInRatingGroup ->
            log("deducting credit...");
    end;
unfortunately you cannot write it this way. You have to write it as:
    if
        DeductCreditInRatingGroup ->
            log("deducting credit...");
        true -> ok
    end;
Not a huge deal but it's annoying.
(2) The let-it-crash attitude is something I never understood. It's just not something you can do most of the time. If a subscriber consumed 2MB and you let it crash in the middle of the rating function, you just "lost" 2MB. With a couple of million subscribers on LTE that turns into very expensive error handling very fast.
(3) We actually ended up using hackney.
(4) We were not doing anything compute heavy. I think our application was well architected and we had no problem fitting it into OTP "framework". Comparing CPU and memory usage to our Java implementation, Java is more performant.
Nowadays though, you should really use the new logger introduced in Erlang/OTP 21. There you can do run-time filtering on many different parameters, or even write your own handlers. Using if-statements is usually considered an anti-pattern in Erlang, and I think it is very common among people coming from languages where only if-statements exist. Problems are almost always better solved by using case, or even better, pattern matching in function heads.
2. Sounds like one of the rare cases where you care about the error and can also handle it. In the use case you mentioned, I would buffer the data outside of the loop that produces it and wrap the loop in a try-catch statement. That way, you don't lose the data if one iteration crashes.
4. Erlang usually ends up being good at latency and concurrency (scalability over cores), together with a smaller and easier to read code base. I'm curious about your case, and what would have made Java a better fit. If you share a lot of global objects, Java might be faster for some things (at the cost of concurrency usually). Erlang has tools for that as well though (e.g. ETS and recently persistent_term).
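A minimal sketch of the two suggestions in this reply: pattern matching in function heads instead of `if`, and keeping the input outside the code that may crash so nothing is lost. All names here (`rating_sketch`, `log_deduction/1`, `rate_all/2`) are hypothetical, and the rating function is passed in as a parameter because the real one is not shown anywhere in the thread.

```erlang
-module(rating_sketch).
-export([log_deduction/1, rate_all/2]).

%% Function heads replace the `if`: one clause per case, no `true -> ok`.
%% logger:info/1 is the OTP 21+ logger API; with the default primary log
%% level the message may be filtered out, but the call still returns ok.
log_deduction(true) -> logger:info("deducting credit...");
log_deduction(false) -> ok.

%% Rates every record with RateFun. A record whose rating raises is kept
%% in the Failed list together with the exception class and reason, so it
%% can be retried or reported instead of being silently lost.
rate_all(RateFun, Records) ->
    lists:foldl(
        fun(Record, {Rated, Failed}) ->
            try RateFun(Record) of
                Result -> {[Result | Rated], Failed}
            catch
                Class:Reason -> {Rated, [{Record, Class, Reason} | Failed]}
            end
        end,
        {[], []},
        Records).
```

Both result lists come back in reverse input order because they are built by prepending. A real system would also need to decide where the failed records are persisted, which this sketch does not address.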
(1) was not about logging. I was trying to illustrate a valid use case for an `if` statement that requires a `true` guard expression. We used lager[1] for logging. I understand that the new logger was inspired by or loosely based on lager.
(2) I would argue that it's a rare case you don't care about errors.
(4) Multicore support in Erlang was not always great and you would run into all kinds of problems after 2-4 cores. Then for a while it was 12-16 cores. Not sure what the limit is these days.
Again, I am not arguing Java over Erlang. There is no lack of great examples of large successful projects running on Erlang. For me and my team Erlang was just too much hassle for not much gain. Your mileage may vary.
Can't comment on your other issues (I'll say I have noticed the complete opposite in practice, where the same team working with Java vs Erlang led to far more reliable, performant systems in Erlang, but I obviously can't speak to your experience), but for (2) -
You always care about exceptions/errors. The thing is, most you won't predict/can't handle. Those you do predict, and know how to handle, you should, because they're not really errors/exceptions at that point; they're just edge cases.
The point of Erlang is not to make it so you just throw instead of addressing an edge case; it's to make it so you reason about how, if something you don't predict and don't know how to handle happens, you can get back into a known good state. It's actually phenomenal at doing that. I've had complicated user facing production systems work without noticeable issue for years (even while under active development) in Erlang. I've never seen that in any other language. Not to say it can't happen, I just haven't seen it, and the operational lift to achieve that was no different for us than just "make it work"; we were spending the same amount of time thinking about failure cases with the Erlang system as other languages. The difference was a better approach to handling the things we didn't think of.
1. My point still stands, using if-statements is not common in Erlang exactly for the reason you mention.
2. In Erlang it usually only makes sense to care about errors which you can actually handle in a sensible way. Everything else you usually "let crash" and let the supervisor tree deal with recovering. Defensive programming is another anti-pattern in Erlang in my opinion.
4. Not sure what you mean by "not great". With Erlang, if you have a completely parallelizable problem you should see close to linear scalability. If it drops off after that, it's either because the problem is not parallelizable enough or because of your architecture.
Yeah, I'm not arguing Erlang over Java either. I'm just pointing out that blanket statements like this are usually because of architecture or design rather than shortcomings of the language itself.
2) Let it crash attitude is often misinterpreted. It means your system should be fault-tolerant in general, so you can focus on the happy case instead of trying to predict all humanly possible failure modes. If you _expect_ invalid input, then handle it properly.
> Let it crash attitude is something I never understood. It's just not something you can do most of the time. If subscriber consumed 2MB and you let it crash in the middle of rating function you just "lost" 2MB. With couple of million subscribers on LTE that turns into very expensive error handling very fast.
Let it crash is the only sensible thing to do. Depending on the type of error, the 2MB could be absolute garbage. I would rather stop sending any message than send potentially wrong data without knowing.
I can't refute your personal experience, but 10 years of Erlang work left me longing for more. It sounds as if perhaps Erlang was not used optimally, or, alternatively, best practices may not have been followed.
10 years ago the ecosystem was horrendous, and is much better now. Granted, the build experience could be better but I didn't find it that bad.
I find that people's experiences are relative to what they are accustomed to. Early years for me included compiling and linking C in Windows (segmented) and that was truly horrible!
Replace "Erlang" with any other language and the statement still holds true.
If you want the obvious explanation, it's because we all have different experiences, and no one tool will be the biggest win for all teams across all domains.
Interesting, do you think Elixir would be better in your case? Also what was the amount of requests per second? Looks like you didn't really need Erlang's scalability.
No; as I understand it Elixir's only advantage is, arguably, better syntax. I don't mind Erlang's syntax that much.
We were handling, in peak hours, over 100k requests per second spread over 3 servers.
Funny you should mention scalability as something that somehow justifies Erlang's other shortcomings. I don't think scalability is something Erlang/OTP does out of the box. One thing we learned the hard way is that Erlang does not do overload well at all. Erlang mailboxes are unbounded and it can get very messy very fast. I always found that design choice odd given Erlang's origins in telecommunications.
Now this is not to say you can't do massively scalable systems in Erlang, it's just that there's no silver bullet when it comes to scalability.
Scalability isn't something any language does out of the box. The design and implementation of the application determines whether it scales or not. In that sense, Erlang scales about as well as any other language. It does have some features that might make scalability easier to implement compared to some other languages, but you have to know how to use it properly.
You can get just about anything to "scale" given architectural compensations (like scads of containers), so this, ipso facto, doesn't mean much.
> so why not use a language that's nicer to work with?
Well, why not indeed? But here are some thoughts about that.
Nothing is a panacea, Erlang included. Some of the ecosystem is ugly and hard to work with, but improving. Erlang is in a vicious cycle:
* Erlang ecosystem sucks
* Fewer people want to use it
* Fewer people are available to improve the ecosystem
* Fewer people are available to fill jobs for Erlang
* Less demand for Erlang
* Loop
Erlang gives you certain things that no other language I know of does, and these things make it a more robust, easier to program, highly concurrent solution without needing architectural compensations for those things.
That doesn't mean that you can magically throw 1 million TPS at Erlang running on N boxes and your internal queues/mailboxes won't overflow. You, along with the rest of the world, need backpressure handling mechanisms.
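One very simple form of the backpressure mentioned here is load shedding: look at the receiver's mailbox length before sending and refuse when it is too long. The sketch below is an assumption-laden illustration, not a recommended production design; `shed` and `send_if_room/3` are invented names, and the check only works for processes on the local node.

```erlang
-module(shed).
-export([send_if_room/3]).

%% Sends Msg to the local process Pid only while its mailbox holds fewer
%% than Max messages. erlang:process_info/2 returns `undefined` for a
%% process that is no longer alive.
send_if_room(Pid, Msg, Max) ->
    case erlang:process_info(Pid, message_queue_len) of
        {message_queue_len, Len} when Len < Max ->
            Pid ! Msg,
            ok;
        {message_queue_len, _} ->
            {error, overloaded};
        undefined ->
            {error, noproc}
    end.
```

The check and the send are not atomic, so several senders can still overshoot Max slightly. Synchronous calls with timeouts or a dedicated queueing process are other common approaches.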
What Erlang gives you that Java doesn't is totally decoupled processes that cannot directly affect each other. No mutexes or semaphores, no shared global memory.
It also gives you ultra-lightweight processes that garbage-collect independently of each other, that are cheap enough to start and stop that you can create one per incoming connection and make it completely unaffected by any trouble that any other process gets into. This is not true of threads in Java.
It gives you a supervisory framework that, if used judiciously (yeah, yeah, no true Scotsman, but that doesn't make this any less valid) gives you an ultra-robust system that degrades gracefully under error or overload conditions.
It gives you linked processes that detect a broken process and brings down the others (if you so choose) to avoid leaving orphaned processes cluttering up the system, plus automatic restart, baked in. These do come free of charge when using OTP supervisory frameworks in the recommended fashion.
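The link mechanism described here can be shown without any OTP behaviours. In this sketch (module name `link_demo` is made up) the parent traps exits, which is roughly what a supervisor does internally, and receives a message when the linked worker dies. A real supervisor would restart the worker at that point.

```erlang
-module(link_demo).
-export([run/0]).

%% Spawns a linked worker that exits abnormally and returns the exit
%% reason the parent observes. Trapping exits turns the exit signal into
%% an ordinary {'EXIT', Pid, Reason} message instead of killing the parent.
run() ->
    Old = process_flag(trap_exit, true),
    Pid = spawn_link(fun() -> exit(boom) end),
    Result =
        receive
            {'EXIT', Pid, Reason} -> {worker_died, Reason}
        after 1000 ->
            timeout
        end,
    %% Restore the caller's previous trap_exit setting.
    process_flag(trap_exit, Old),
    Result.
```

Without `trap_exit`, the abnormal exit would propagate over the link and take the parent down too, which is the "brings down the others" behaviour mentioned above.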
I think that in some cases, Erlang has been overhyped and people come into it expecting it to magically solve all the hard problems. People that overhype Erlang do it a disservice. For example, the stupid "nine nines" thing was an almost once-off, limited situation in a very constrained environment and I am sure many Erlangists wish it had never been mentioned.
And I wish nobody had ever said the words "let it crash", because they have been misunderstood and taken out of context and again done Erlang a disservice. Just like the phrase "premature optimization is the root of all evil" has caused untold damage when it was (and is often) taken out of context.
The whole "let it crash" thing was shorthand for "if a process cannot reasonably handle an error condition itself, let it die and be reincarnated by the supervisory process". It does not mean "just let everything crash". For example, if you have a system that has long-lived connections, maybe (say) http/2 connections, or XMPP, or Apple push, you most definitely do not want to let it crash without doing everything possible to recover from errors such that the connection stays up under as many error conditions as possible.
But for the most part, a process should not be written defensively and try to recover from errors where it has no business doing so - it should leave it to the supervisor.
I have written Erlang systems, heavily used in production, that ran reliably, without crashing, for many months. Usually they were only stopped to do an OS upgrade. They weren't error-free - we found out about some persistent errors by checking the logs occasionally, then fixing them and often hot-patching the fix into the production systems.
Again, Erlang is not the world's best programming language or ecosystem. It is, within its design envelope, one of the finest soft real-time distributed multiprocessing environments around when used, like anything, with skill and careful architectural design.
If you use Java I'm assuming you are not using processes. If so, I believe Phoenix is no worse than Java applications in terms of ease of use, scalability, etc.
And in terms of performance, it does indeed require some design experience with Erlang processes. But it shouldn't be bad. After all, WhatsApp and Discord have run on top of it, with very few engineers, to support massive numbers of customers. I doubt an online charging system could be more chatty.
Although not in Erlang, Discord uses Elixir which runs on the BEAM VM to much success as well for our real time distributed system as well as VOIP signaling components. Looking back years later - it’s safe to say we wouldn’t have chosen anything else.
Not OP, but have used Elixir pretty extensively; the major selling points for me are the friendlier syntax and more active community support, plus with rebar3 + Elixir's seamless Erlang interop you really don't give up anything, you can use any existing Erlang/Elixir libraries with a single syntax. It's pretty great.
One of the big differences between Erlang and Elixir is indeed that Elixir allows you to write macros in the language itself, using quote / unquote constructs just like lisps.
This makes it easy to generate code at compile time, which is then executed at runtime without any performance penalty.
A large part of Elixir itself is actually implemented as macros. For instance, the "unless" construct:
    defmacro unless(condition, do: do_clause, else: else_clause) do
      quote do
        if(unquote(condition), do: unquote(else_clause), else: unquote(do_clause))
      end
    end
Your question is very vague. What exactly do you mean by "the same results"? And which "alternative tools"? How do those tools provide "an alternative"?
I don't think you're getting downvoted because you're not praising a piece of technology; you're getting downvoted because your tone seems rude, arrogant, and like you're on a witch-hunt rather than an honest inquiry. While tech fan-clubs do exist on HN, in my experience they respond quite constructively to reasoned criticism--if it's not phrased in an unkind way.
I'm not for or against Erlang. While honestly attempting to learn something new today, I became frustrated with how devoid of content most of the comments in this thread are. "I'm {senior title}, and {tech name} is great" is as far as most of the comments go. That's just extremely mediocre.
A comment section isn't the place to look for detailed content so why get frustrated about the brevity here? There are plenty of places to deep dive in the internet.
Having worked extensively with both Erlang and Elixir in the last 7 years, the main advantage for me has always been the smaller code base that those projects tend to produce when compared to other languages. Especially for projects that have some long-running components to them the Erlang built-in features allow writing really short yet fault resistant systems.
The Supervisor system, the per process garbage collection (no stop the world), 'start_link', "Let it crash", and pattern matching are for me the superpowers of beam. Elixir is contributing a great package ecosystem, 'mix format', and better features for the new live web of websockets
At least some of those examples are highly misleading:
> There is no better example of Erlang’s reliability than the English National Health Service (NHS). [...] Using Riak (written in Erlang), the NHS has managed 99.999% availability for over five years.
From what I could find out, NHS's Spine2 is mostly Python, with "a bit" of Erlang and Javascript. They do use Riak, but they also use Redis, RabbitMQ, Python/Tornado, Ubuntu and lots of other components.
This sentence could have said "Using redis (written in C)", or "Using Tornado (written in Python)", and still be fully correct.
> Almost all of the bespoke code is written in Python, but the message-passing architecture of the system and the approach to managing availability was directly inspired by reading about Erlang (in particular your book).
This approach is pretty interesting - I’d love to read about implementing Erlang-style availability in other languages.
The claim relates specifically to the spine 2 project, and is as measured over the 5 years since it went live. There are a lot of systems which depend on Spine 2 for their own availability, it is not the sort of thing that can go down at any time without a lot of people noticing.
I'm biased in that I've been heavily involved in the project, but I will stand by the five years of five nines availability claim.
The lead engineer/designer on the project makes this claim in these slides. I don't know whether this gives the claim more veracity in your eyes, but it's there.
I have never programmed Erlang but it feels like it is currently the only language that is some kind of secret weapon. It has an aura similar to the one Lisp had before: that with Erlang you can do stuff beyond "normal" languages.
> I have never programmed Erlang but it feels like it is the currently only language that is some kind of secret weapon. It has similar aura than Lisp
It has a good marketable brand and 'appearance' for sure. A good Aura and level 3 magicks, but once you start leveling up your spell tree, like other comments in the thread have alluded to, the cracks start to show and you bleed out mana trying to scale what you initially thought would be an effortless process.
A relational/object hybrid data model that is suitable for telecommunications applications.
A DBMS query language, Query List Comprehension (QLC) as an add-on library.
Persistence. Tables can be coherently kept on disc and in the main memory.
Replication. Tables can be replicated at several nodes.
Atomic transactions. A series of table manipulation operations can be grouped into a single atomic transaction.
Location transparency. Programs can be written without knowledge of the actual data location.
Extremely fast real-time data searches.
Schema manipulation routines. The DBMS can be reconfigured at runtime without stopping the system.
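To make the "atomic transactions" item in the list above concrete, here is a small sketch using an in-memory table. The table name `account`, its fields, and the module name `mnesia_demo` are assumptions for the example. It relies on Mnesia's default of RAM-only tables on the local node, so no disk schema has to be created first.

```erlang
-module(mnesia_demo).
-export([run/0]).

%% Creates a RAM-only `account` table, seeds two rows, moves 30 units
%% between them inside one transaction, and returns both balances.
run() ->
    ok = mnesia:start(),
    case mnesia:create_table(account, [{attributes, [id, balance]}]) of
        {atomic, ok} -> ok;
        {aborted, {already_exists, account}} -> ok
    end,
    {atomic, ok} = mnesia:transaction(fun() ->
        ok = mnesia:write({account, alice, 100}),
        ok = mnesia:write({account, bob, 0})
    end),
    %% Either both writes below are applied or neither is.
    {atomic, ok} = mnesia:transaction(fun() ->
        [{account, alice, A}] = mnesia:read(account, alice),
        [{account, bob, B}] = mnesia:read(account, bob),
        ok = mnesia:write({account, alice, A - 30}),
        ok = mnesia:write({account, bob, B + 30})
    end),
    {atomic, Balances} = mnesia:transaction(fun() ->
        [{account, alice, A}] = mnesia:read(account, alice),
        [{account, bob, B}] = mnesia:read(account, bob),
        {A, B}
    end),
    Balances.
```

Records are stored as plain Erlang tuples, which is the "store basically any term without conversions" property praised in the reply that follows.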
Yeah mnesia is extremely clever and very fun to learn. Also being able to store basically any term without having to do conversions is very useful.
I've found some practical aspects a bit painful though. It took me a while to figure out how to make backups, migrate schemas to different sets of nodes, that sort of thing. Also there are size limits on disk-based tables which are a bit limiting, and while you can use table fragmentation to get around that to some extent it doesn't seem straightforward to use (I haven't dared so far). I also don't like the way it deals with netsplits - when nodes reconnect it tends to require manual action to resolve.
'I also don't like the way it deals with netsplits'
I actually prefer it. Too often systems out there don't tell you what they do in the event of a netsplit. They may have marketing copy somewhere that tells you what they try to do, but then you see the Jepsen tests and realize that's a lie, or is naive, or whatever.
Mnesia makes no secrets about it. In the event of a partition it stays partitioned, operating as separate nodes. It's up to you to figure out what to do about that. You can grab a Raft implementation to perform leader election, with the tradeoffs that entails. You want it to self-heal and deconflict based on arbitrary logic, for an eventually consistent system? You can do that too. You want to pretend it doesn't happen, stick your fingers in your ears, "LALALALA" until it does and requires manual intervention? That's fine too! What it DOESN'T do is give you a false sense of security while handwaving away the decisions and implementation concerns that were made to determine CAP behavior, which I find pretty much every other distributed data store does.
It definitely has pain points in learning it, and it also has some very definite limitations, but as a baked in, minimally biased distributed store, it's a battery I really loved having included.
Erlang is highly optimized to build reliable network communication systems (application-level routers, message brokers, stream aggregators, perhaps MMO game engines, chat backends, etc.) The secret weapons are actor model and process separation (like microservice architecture, but within a single app), supervision (isolated processes are restarted on failure, there is no difference between known "application" errors and exceptions), and beyond state of the art protocol parsing and construction facilities, for both binary and text protocols.
For some tasks (scientific computing, UIs, offline analytics) Erlang might not be the best choice. However, within its domain of applications, it really shines.
Its strengths lie in its concurrency, simple message passing, distributed design, and robust VM.
The really interesting part is message passing and easy peasy baked-in communication that allows you to send messages to other Erlang nodes. So building applications that talk to each other over networks is just a normal function of the language and not a higher level dependency such as a library.
The concurrency is also pretty neat as erlang treats all threads of execution as an erlang process. So merging that with the distribution and simple message passing you can easily build stuff that scales without much external tooling.
It's almost an OS unto itself. There are two interesting IoT systems which take this quite far: GRiSP, which is the Erlang BEAM ported to bare metal using RTEMS, and Nerves, which runs the BEAM directly on top of a Linux kernel, making the BEAM the userspace.
Tons of languages are secret weapons if you use them for what they were created for.
Here are some more outside of Erlang, that I know of and I'm sure there's more.
APL/J/K - mathematical algorithms
Prolog - logic, tons of business rules
Forth/Lisp - bottom up programming, when you have an idea of the primitives you need to tackle a problem, but not exactly sure how to put it together.
Assembly - When you must absolutely run impossibly fast especially on very small CPUs
AWK/Perl - slice and dice text files
For these languages, I don't substitute for any language. I'll never slice and dice with Java, Python, Go. I don't care. Awk or Perl. I'll never implement tons of rules in any other language, I don't care what logic library they implement without first doing so in Prolog.
For the solo programmer, the above languages hold very true. I have played around with Erlang since Prolog influenced its syntax, but I have yet to build a large scale fault tolerant system that needs it. The knowledge is tucked away at the back of my memory should I ever need it.
I feel that Python, Javascript and Go can also be some sort of secret weapon if used in the right place.
My understanding is that it has an advantage in the concurrency domain. It also has relatively unique 'hot patch' support, but I'm not sure how valuable that actually is.
And then you've got everything. As long as formats are reasonably documented, it's easy to parse them. Sometimes, you can even parse a size and use the size in the same match.
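The "parse a size and use the size in the same match" remark can be illustrated with a length-prefixed frame. The frame layout here (16-bit big-endian length, then payload) is an invented example format, and `frame:parse/1` is a hypothetical name.

```erlang
-module(frame).
-export([parse/1]).

%% Parses one frame: a 16-bit big-endian length followed by exactly that
%% many payload bytes. Len is bound by the first segment and reused as the
%% size of the second segment within the same pattern.
parse(<<Len:16, Payload:Len/binary, Rest/binary>>) ->
    {ok, Payload, Rest};
parse(_) ->
    %% Not enough bytes yet for a complete frame.
    incomplete.
```

For instance, `frame:parse(<<3:16, "abcxy">>)` matches the first clause with a three-byte payload and leaves `<<"xy">>` as the remainder.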
More often remarked; because of the language constraints, most notably a lack of shared memory between processes and immutable variables, most programming ideas end up expressed in a way that's amenable to massive concurrency, while being comprehensible. Most Actors have easy to understand behavior --- they may accept messages, leading to them sending messages and changing their internal state. From there, you may need to puzzle out the overall system behavior, but often times, getting each individual Actor's behavior correct, leads to correct (if hard to verify) system behavior.
Hot loading code reduces deployment time, which increases developer productivity. When you've got a million users connected to a machine, it takes a lot of time to move them to other machines so you can do a traditional stop / start cycle; hot loading means you can fix things in seconds (and, of course, it means you can break things in seconds too). You can, of course, hotload in C, and probably other languages, but very few people do it. It's comparable to pushing PHP files though.
And, of course, the most important thing is ejabberd is in Erlang, and it looks like it has what we need for a chat server, and I heard some other people scaled it really far. ;)
I’d not consider relup and module loading the same feature. It is rare to see relup (though I have used it, it’s tricky to make work) but hot code loading of modules was something we used extensively both during development but also in production to test out patches for problems. It was super easy to use and never a big problem.
Keep in mind that we rolled out the patch incrementally to one or two machines and then to more. If it looked good we’d roll it out to the rest of the cluster. From there we’d make note of the result and get the change ported into a proper release.
Each time I hear stories that both clustering and hot upgrades, module loading, etc are hard or rarely used I wonder if the person is just repeating someone else’s rumor. They’re great features and work fine if you do a little homework (just like learning anything, don’t treat it like magic).
This was from my time at Cloudant/IBM, though it's far from the only case I’ve seen. We ran over 1000 machines this way with some clusters growing to more than 200 nodes using distributed Erlang (something I keep hearing is hard or impossible, it’s not).
Not sure if you're asking, but I'm not repeating someone else's rumor. I started the set-up I work with using hot loading (after all, that is one of the things that attracts people to Erlang), by which I mean just reloading modules on the VM. Unfortunately I would get module order wrong from time to time or I would forget dependencies, and I've seen many badfun errors as well (I admit I used too many anonymous functions). With only 1 node doing the work (I haven't used Erlang on large clusters) that's a bit more risk, and those problems have gone away completely since I switched to a strategy where the whole system just stops and starts.
I see. Fair enough. It's definitely something I know can work but like most things it does take work and sometimes it's not worth the trade in effort for the capabilities one might gain otherwise.
Badfun errors are a great example of something tricky if you're not used to thinking about how the compiler translates these to private functions via lambda lifting. It's the same reason funs are not something you should rush to pass around over a disterl cluster. Recursive functions which keep funs around in a loop also become a problem if they're long lived.
I usually point out that first class modules or MFA tuples are more idiomatic in Erlang than opaque first class functions as values but we're off into the weeds here. It's a good example of where effort and gotchas become a barrier for many.
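A rough sketch of the MFA-tuple style mentioned above, with made-up names (`callbacks`, `double/1`, `run/1`). The stored value is just a tuple of atoms and arguments, so it carries no reference to a particular compiled version of the module, unlike an anonymous fun.

```erlang
-module(callbacks).
-export([run/1, double/1]).

double(X) -> X * 2.

%% Calls a stored {Module, Function, Args} tuple. The function is looked
%% up by name at call time, so it resolves to whatever version of Module
%% is currently loaded.
run({M, F, A}) ->
    apply(M, F, A).
```

For example, `callbacks:run({callbacks, double, [2]})` evaluates `callbacks:double(2)`. An external fun such as `fun callbacks:double/1` behaves similarly with respect to code reloading, since it also refers to the function by name.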
It’s rare in production, but fantastic in development. Often I don’t even restart my Elixir app when switching between git branches, because the hot reload is enough.
Mainly because the relup functionality is a bit complicated and not always clear, Learn You Some Erlang explains some context [1]. It does depend a bit on the situation, in some cases it's easy to recompile a module in a running system and it might not be much of a problem. But when there are multiple modules being updated, which depend on each other, which possibly require updates to some data structures, it becomes very messy very quickly. Also when modules contain anonymous functions this can cause some issues (badfun errors) because the old version might get lost, but can still have references in the running system.
> hopefully it's not just because of the ease of containerization/docker that is leading to reduced hot loading
It's because it's not super easy to properly architect in, it's hard to test, and it requires supporting running multiple versions of the code concurrently (and the ability to migrate data on the fly).
>You can absolutely do anything you can do in Erlang in another language. Some tasks will be easier, or harder, however.
Lots of languages have actor systems. But how many have preemptive multi-tasking on these actor systems? I can't think of any at the moment. I am well versed in Akka for Scala and the lack of preemptive multi-tasking for actors is a big pain in the ass. Erlang's advanced BEAM and its support for this is its main selling point, imo.
Erlang's advantage is being able to build a featureful, secure, scalable and performant enough communication backend in very little time and effort. This is a niche in which it can not be beaten.
You have a very naive view of large-scale distributed systems. For companies that truly need them, "time" and "effort" are never a consideration. Furthermore, the challenges facing these companies in battling project delays and cost overruns are 100% organizational and political, not technical.
Being able to build these systems with significantly less time and effort makes them more approachable for smaller teams and companies, where time and effort can be more important than organizational politics.
> For companies that truly need them, "time" and "effort" are never a consideration.
I'm not sure that's true. Sure, a few of the most famous software companies in the world have built their own at considerable cost. But numerous other huge, important companies are dependent on large-scale distributed systems whose major drawbacks (reliability and maximum scale, usually) and major benefits (simplicity, time/resources saved on not having to hire tons of specialists, quick development time) are based precisely in the time and effort constraints under which those systems were developed.
a) Erlang (as opposed to something else) is only useful for large distributed systems.
b) If your organization is at the scale where you need a large distributed system, then the problems in your project aren't related to code or to coding speed.
a) Every programming language is a "very productive language" that "works great". Nobody ever thought to themselves "let's make a programming language that's hard to use and wastes time". (And even if they did, they would have never convinced other people to use it.)
b) No. A large distributed system is not something you stumble onto, it's a result of many years of organic growth.
Sorry but it seems you have a naive view of large companies :-) They are giant spendthrifts, are bogged down in tons of bureaucracy, and overload developers at 1.5–2x allocation. With these conditions, you'll be lucky if you can get the compute capacity you need to run critical services. Oh, and also if some architect doesn't come around and instruct you to write everything in NodeJS 'because we can backfill NodeJS devs easily'.
Sure if you truly need something you will build it no matter what time or effort is involved, so in this way it is not a consideration. On the other hand if you can build it in half the time or cost you might find that some considering will be done as to which path should be chosen.
Time/effort/security/performance, really? IIRC the reason to consider Erlang is purely to do with building a fault-tolerant, highly distributed system.
Not sure what you mean by introspection in real time ... but if you mean runtime introspection, this is not that much of a niche feature. If you mean remote debugging, again, not that much of a niche feature. The JVM and JavaScript provide both.
The fact that the entire ecosystem is based around the actor model is one of the primary ones. It has allowed code that isn't simply a library, but effectively an entire application, to be released as open source and dropped into your system, because the platform has tools like Erlang Term Storage, Mnesia, and others that make this easy.
Reductions. How do you ensure in other languages (or in other VMs) that every actor will get the same cpu time and that none can block the others? That's a tricky problem to solve as a lib.
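A rough way to see this for yourself, as a sketch only (the module name `preempt_demo` and its functions are made up for illustration): start more busy-looping processes than there are schedulers, then check that a process spawned afterwards still gets to run. Each spinner is preempted once it has used up its reduction budget, so the latecomer is not starved.

```erlang
-module(preempt_demo).
-export([run/0]).

%% A tight loop that never blocks or yields voluntarily.
spin() -> spin().

run() ->
    %% Start twice as many spinners as there are online schedulers,
    %% so every scheduler has more busy work than it can finish.
    N = erlang:system_info(schedulers_online) * 2,
    Spinners = [spawn(fun spin/0) || _ <- lists:seq(1, N)],
    Parent = self(),
    %% This process is created after all the spinners are already running.
    spawn(fun() -> Parent ! {self(), hello} end),
    Result = receive
                 {_Pid, hello} -> ok
             after 5000 ->
                 timeout
             end,
    %% Clean up the spinners; they would otherwise run forever.
    lists:foreach(fun(P) -> exit(P, kill) end, Spinners),
    Result.
```

Calling `preempt_demo:run()` should return `ok` rather than `timeout`, even though every scheduler is saturated with loops that never give up the CPU on their own.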
The story of freebsd & erlang "needing" to be patched seems to be greatly exaggerated. Especially when it turns out that elixir/phoenix also achieved the same "2 million connections on single server" without needing those optimizations.
> The story of freebsd & erlang "needing" to be patched seems to be greatly exaggerated.
The story is greatly underreported and all focus is only on "they run Whatsapp on Erlang with just ~50 engineers".
Highscalability lists just some of the patches and optimisations they have here [1] and here [2]
Here's an incomplete list of patches only. There's also tuning and optimisation:
Erlang: Fixed head-of-line blocking in async file IO by patching BEAM, added round-robin scheduling for async file IO, added multiple instrumentation patches. Instrumented scheduler to get utilization information, statistics for message queues, number of sleeps, send rates, message counts, etc. Made lock counting work for larger async thread counts. Patched to dial down spin counts so the scheduler wouldn’t spin.
BSD: Backported a TSC time counter. Backported igb network driver.
More Mnesia (Erlang) patches discussed here: [3]
Are you ready to do this for your Whatsapp?
> elixir/phoenix also achieved the same "2 million connections on single server" without needing those optimizations.
There's more needed to run a chat server than just "2 million empty connections".
Note how I said it was an incomplete list of patches only.
There's also significant tuning and optimisation, both for the Erlang VM and FreeBSD.
There are also things like (quotes from Highscalability):
"Mnesia: Using no transactions, but with remote replication ran into a backlog. Parallelized replication for each table to increase throughput."
"When Rick is going through all the changes that he made to get to 2 million connections a server it was mind numbing. Notice the immense amount of work that went into writing tools, running tests, backporting code, adding gobs of instrumentation to nearly every level of the stack, tuning the system, looking at traces, mucking with very low level details and just trying to understand everything. That’s what it takes to remove the bottlenecks in order to increase performance and scalability to extreme levels."
Or even the things like "What has hundreds of nodes, thousands of cores, hundreds of terabytes of RAM? The Erlang/FreeBSD-based server infrastructure at WhatsApp". Oh, wait. Erlang's default distribution mechanism grinds to a halt when there are more than ~60-80 nodes. And Mnesia has a 2GB limit on table sizes. So you have to work around those limitations yourself.
There are no magic bullets. Erlang will only take you so far. The rest (80-90% of the way) you have to take on your own, and you have to know what you're doing, and what needs to be done: patches, tuning, workarounds, limits of the systems you work with etc.
> Erlang's default distribution mechanism grinds to a halt when there are more than ~60-80 nodes. And Mnesia has a 2GB limit on table sizes. So you have to work around those limitations yourself.
I've seen people say these, and I have no idea where they come from. If you have a decent network, dist works fine at well over 80 nodes, but everyone says it doesn't work. pg2/global has some sharp edges if you're trying to have many nodes acquire the same global lock when you have a lot of nodes (a few hundred) or a smaller number if you have a lot of latency between them. There's options though -- maybe you don't need to acquire the same lock on all nodes, or maybe you can look in pg2.erl and global.erl and wiggle the locking code until it no longer live locks.
The supposed Mnesia 2GB limit is a bunch of hooey. Yes, disc_only_copies has (or had) that limit, because dets has that limit. Yes, it's a sharp edge, because there's no warning about it. However, a 2GB dets table is awful to work with anyway. You want to use disc_copies or ram_copies for big tables. Also, mnesia_frag is well supported, so if you really wanted to, you could make your disc_only_copies table 1024 fragments, and have 2 TB of dets, if that's how you wanted to roll.
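To make the fragment arithmetic concrete, here is a minimal sketch. The module name `frag_demo`, the table name `big_table` and the single-node pool are assumptions for illustration; it also assumes `mnesia:create_schema([node()])` has been run and Mnesia is started, since disc_only_copies needs a disc schema.

```erlang
-module(frag_demo).
-export([max_bytes/1, create/1]).

%% Upper bound on total dets storage when each fragment is capped at 2 GB.
max_bytes(NFragments) ->
    NFragments * 2 * 1024 * 1024 * 1024.

%% Create a fragmented table with one disc_only_copies replica per fragment.
%% Assumes a disc schema already exists on node() and Mnesia is running.
create(NFragments) ->
    mnesia:create_table(big_table,
        [{attributes, [key, value]},
         {frag_properties, [{node_pool, [node()]},
                            {n_fragments, NFragments},
                            {n_disc_only_copies, 1}]}]).
```

`frag_demo:max_bytes(1024)` gives 2199023255552 bytes, i.e. 2 TB. Reads and writes on a fragmented table have to go through `mnesia:activity/4` with `mnesia_frag` as the access module so that keys are hashed to the right fragment.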
And yes, if you're going to hyperscale, you're going to need a couple people who know how to figure out what your system is doing. Is there a language/environment where that's not true?
I claim, without real proof, that Erlang's BEAM VM and OTP standard library are easier to understand and tweak when you do hit problems. You'll note however, that Rick Reed's first presentation was when he had been at WhatsApp for about a year, and he had zero experience with Erlang before that.
That’s a bit misleading. Many of their patches were needed and had been contributed upstream by the time Phoenix was being tested. Still, this is a great benefit of sharing the ecosystem: everyone gets to benefit from this work.
Awesome write-up, Francesco! To add to the success stories, our team's previous startup (Bugsense) was a huge user of Erlang, which allowed us to scale to hundreds of thousands of concurrent devices with just a handful of servers [1]. It was a no-brainer that we would now use Elixir for our current startup (AgentRisk [2,3]) and we couldn't be happier with our choice, especially paired with Phoenix, which is hands-down the best web framework we've ever worked with.
Fairly blind question: do Erlang|Elixir and BEAM make any sense in a pseudo-embedded, edge compute or IoT environment with constrained resources (CPU and memory), or even on ARM architecture? Aside from the developer-side things, if AMQP were to be the centralized communications hub (e.g. RabbitMQ), does using Nerves on devices offer any performance benefit, meaning with BEAM and the app(s) running there?
The intent would be to pre-process data and stream inward, or simply be a command-and-control interface -> process and provide response.
Yep, know Nerves. Like the toolset and the model. Really asking about the underlying Erlang/Elixir that Nerves provides. The delivery flow (think image to SD card) fits the more traditional embedded/realtime model - versus OS, and installing app.
Well, a Nerves image is a Unix kernel + Erlang/Elixir & BEAM and whatever else you need, so it's a completely stripped-down image. It uses the whitelist model, so you add what you need instead of removing what you don't need. This is both more secure and more performant since you don't have hundreds of apps/services running in the background; you only run what you need.
You can control everything remotely with nerves-hub: push updates, debug, etc.
Remember you only have to burn the image once; after that it's all updates.
You can build a poncho application where everything is side-by-side. Let's say you have your app-firmware (you build the image from this); then you build separate app-ui, app-logic, app-whatever etc., hard-code their paths as dependencies in app-firmware, and each app has its own configuration.
Everything will be bundled into a single image and protects you from things leaking in from the sides rather than from above.
They make the most sense if you want lots and lots of simultaneous clients. Plus it gives you a little bit of safety if you write crappy code.
Typically not suited for embedded.
(Assuming you mean embedded as in severely limited in resources, and not "modern embedded" where you have lots and lots of unused system resources. If you mean the latter then anything goes.)
It’s nice in that the actor model provides good concurrency. One great benefit is that each "device" or connection essentially runs as a single sequential process. That makes data stream processing pretty nice on embedded. Like processing serial port data. Though it sounds like you almost just want a job queue, which would be pretty easy but not really benefit from actor model.
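As a sketch of that "one sequential process per device" idea (the module `device_proc`, its functions and the newline-delimited framing are all made up for illustration, not taken from any real driver): the process receives raw chunks as messages, keeps the partial buffer in its loop argument, and forwards each complete line to an owner process.

```erlang
-module(device_proc).
-export([start/1, feed/2, stop/1]).

%% Spawn one process for one device; complete lines go to Owner.
start(Owner) ->
    spawn(fun() -> loop(Owner, <<>>) end).

%% Hand a chunk of raw bytes to the device process.
feed(Pid, Bytes) when is_binary(Bytes) ->
    Pid ! {data, Bytes},
    ok.

stop(Pid) ->
    Pid ! stop,
    ok.

%% Messages are handled strictly one at a time, so the buffer needs no locks.
loop(Owner, Buf0) ->
    receive
        {data, Bytes} ->
            Buf1 = emit_lines(Owner, <<Buf0/binary, Bytes/binary>>),
            loop(Owner, Buf1);
        stop ->
            ok
    end.

%% Send every complete newline-terminated line to Owner; return the remainder.
emit_lines(Owner, Buf) ->
    case binary:split(Buf, <<"\n">>) of
        [Line, Rest] ->
            Owner ! {line, self(), Line},
            emit_lines(Owner, Rest);
        [Partial] ->
            Partial
    end.
```

Feeding it `<<"ab">>` and then `<<"c\nde\n">>` delivers `<<"abc">>` followed by `<<"de">>` to the owner. The chunk boundaries never leak into the parsing logic, which reads like ordinary sequential code.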
One example where a Nerves setup would be great is say a redundant onsite mini-cluster for processing and redundancy. As an alternative to say a k8s setup to manage devices with Nerves you could readily flash your "app code" to multiple RPi’s, connect them on a lan, and not have to have a separate clustering layer.
For sure, serial data is a factor. I need to dig into the support of GPIO and other serial interfaces (e.g. USB) - but that's a great point. The image distro model is really cool, and need to evaluate that further as well.
Drop by the Nerves channel on the Elixir-Lang Slack. It’s pretty active and people are pretty helpful.
PS: I enjoy writing streaming code in Elixir (vs more procedural or OO methods). This is a snippet I use to decode a SLIP-encoded binary UART stream with a CRC check:
Stream.repeatedly(fn -> receive_data_packet(timeout) end)
|> Stream.transform(<<@frame_end, @frame_end>>, &frame_splitter(&1, &2, {separator, max_buffer}))
|> Stream.map(&decode_slip(&1))
|> Stream.map(&frame_header(&1))
|> Stream.reject(&( &1[:code] == -1))
|> Stream.each(fn x -> if !x[:crc_check] do Logger.error("parser crc error: #{inspect x}") end end)
|> Stream.reject(&( &1[:crc_check] == false))
...
I worked with Erlang lately in a side project. I think the actor model is really cool (and the FP style), but the error messages are not helpful at all (at least for a beginner). I wonder how other people manage.
Elixir gives you much more friendly error messages at runtime (I believe part of this feature was contributed to the Erlang VM, so improved it for all languages):
Runtime error messages or compilation error messages? I assume compilation. This is true. Compilation messages are bare bones.
But on the other hand the language compared to other modern languages is tiny and figuring out why something does not compile is a non-issue beyond an absolute beginner.
More often than not after I did figure out what the error was, looking back at the initial error message it told me exactly what I should have looked at to begin with.
So, if you can read them right, they are pretty good.
I use Elixir in production and Erlang in side projects.
I'm using Erlang for a production system at work and I completely agree, the error messages tend to be terrible. Once you've run into many errors you start recognizing patterns though (part of the problem is that the errors are not English sentences but data dumps with indicators like `badfun`, these you just have to learn to interpret) and Erlang makes up for the bad error messages in other ways. But yes, often I end up reasoning through the code to solve the problem, with the error message only roughly pointing me in the right direction, to a larger extent than in other languages.
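For anyone who hasn't seen these data dumps, here is a small sketch that triggers a few common ones and returns the raw error reasons. The module `err_demo` and its helper functions are made up for illustration; the reason terms themselves are what the runtime reports.

```erlang
-module(err_demo).
-export([reasons/0, call/1, match/1, pick/1]).

%% Run Fun and return the raw error reason instead of crashing.
reason(Fun) ->
    try Fun() of
        _ -> no_error
    catch
        error:Reason -> Reason
    end.

call(F) -> F().                              % calling something that is not a fun
match(X) -> {ok, _} = X.                     % pattern match that can fail
pick(X) -> case X of a -> 1; b -> 2 end.     % case with no catch-all clause

reasons() ->
    [reason(fun() -> call(not_a_fun) end),        % {badfun, not_a_fun}
     reason(fun() -> match(nope) end),            % {badmatch, nope}
     reason(fun() -> pick(c) end),                % {case_clause, c}
     reason(fun() -> list_to_integer("x") end)].  % badarg
```

The pattern to learn is that the first element names the kind of failure and the second is the offending value, so `{badmatch, nope}` reads as "a pattern match failed, and the value on the right-hand side was `nope`".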
I have been building grouper.ai to integrate voice AI systems with 2600hz's Kazoo APIs, and have enjoyed doing it as a greenfield side project. Nerves for hardware on embedded Linux, C/C++ for the voice APIs, ports in Erlang/Elixir to talk the low-level calls, using Elixir Phoenix in the cloud on the BEAM. I have also used Erlang in my day job for custom industrial M2M protocol translation (serial / Ethernet), very successfully.
I have seen that Erlang is used in RabbitMQ. If I recall right, their argument was that Erlang is particularly well suited for that kind of asynchronous message queue stuff. I don't know all too much about Erlang, so I cannot judge if that claim makes sense.
I'm an Erlang novice, so take this with a grain of salt: Erlang's use in RMQ seems to have helped them immensely through the first few years of the system's life. They got an "all in one box" performant message broker off the ground, and relatively reliable, quite quickly.
However, RMQ did hit numerous issues that they had to work around on the Erlang platform. They had to engineer what was effectively their own scheduler on top of parts of the native Erlang one in order to prevent some starvation cases [1], and I have also heard that issues with Mnesia are at the root of some of the pathological behaviors (data loss) of some versions of RabbitMQ when restarting from a crash (this is admittedly anecdotal, so take it with a grain of salt).
My only exposure to Erlang has been through RabbitMQ and while I can't comment much on the language itself, managing the packages is a dependency nightmare. Like some twisted, mutant, supervillain version of Python, a given version of RabbitMQ will only work with a certain version of Erlang (with horrible, hard-to-understand error messages on incorrect versions), and on RedHat/CentOS, the Erlang packages all misadvertise what exact versions they supply, necessitating awkward forcing of the dependencies or else Yum screws everything up.
We're happily plugging along on KAZOO [0] since 2010 at 2600Hz. Erlang is a force-multiplier for us. Approaching 300K LOC and ~10 Erlang engineers (depends if Karl is writing code or CTO-ing that day) today.
Something I find interesting is that the tech stack is often chosen by non-engineers/founders. So languages and frameworks are marketed to business people. Freelance work is often weak on the requirements, but they have strong opinions on what language and framework to use.
That's often because most clients will already have some sort of environment set up, running some OS and with certain features installed and most importantly: they don't want to have 5 different projects working on 5 different frameworks because that'd be a nightmare to maintain. Once a company gets into a tech stack, they'll want to stay within it as much as possible and that makes plenty of sense.
You don't want to put all your eggs in the same basket, not in this age when a framework can become obsolete in just two years. It only makes sense if you already have a small team - you want to use what they know best. But if you are growing you can have many small teams. You will also exhaust the talent supply if you lock yourself down to only one platform/language/framework.
If you really want to find yourself digging into a lot of different features in the Elixir ecosystem, you can’t go wrong with collaborative tools. If you really want to play with the latest hotness in Elixir, and dig into a challenge or two, I’d suggest building a light multi-user blogging system that leverages Phoenix LiveView for real-time Markdown->HTML previews, while allowing simultaneous editing/previewing by two users. Put a DB behind it so you can also play with Ecto. Outside of the 3 lines of JS you need to wire up LiveView, limit yourself to writing 0 additional lines of JS. Do everything in Elixir and eex/leex templates only. Since you’re focused on learning, skip dealing with user auth stuff. There will be some challenge to deal with handling multiple edits at the same time. May as well play with doing DB persistence in your LiveView, without any refresh. :)
Online gaming can be a good option—if building games is your thing. Personally, I find game-based demos/exercises with certain tools to be a bit too far removed from my daily work to help certain concepts really take root.
I'd use this if someone made a good one! I'm currently using Transmission on Mac, which has more than its own fair share of issues, most notably the UI freezes in any action that takes time, like moving a file across hard disks or to a network drive, but IIRC all the others I have tried (Deluge, qBittorrent) also have multiple similar issues.
That system was the backend of the functionality currently known as Facebook Messenger, not WhatsApp.
One might note, snidely [1], that three-four years after freezing development of an Erlang-based messaging system, starving maintenance work on that system of engineering effort, devoting massive engineering resources to a from-scratch C++ rewrite, and in the meantime blaming the language for relatively minor system design issues that could have been improved with a fraction of the effort (including, but not limited to, Erlang's ability to wrap allegedly critical C++ components) … Facebook plowed $19B into the acquisition of an Erlang-based messaging system.
[1] as a main author of the Facebook Chat version written in Erlang
> after freezing development of an Erlang-based messaging system...
> Facebook plowed $19B into the acquisition of an Erlang-based messaging system.
They didn't care what it was written in. They would've spent it on WhatsApp even if it was written in PHP (like Slack). They paid for users and market penetration.
At that scale you are generally modifying your entire stack or building it from scratch no matter what you are using. Also they were on a pretty old version and a number of their fixes are in Erlang itself now.
WhatsApp engineers have been quite clear that using Erlang was a great choice.
WhatsApp did modify Erlang but it's interesting that Elixir/Phoenix was able to hit the same scale on a single server without any such optimizations. [1]
> I wasn’t thinking we could actually get WhatsApp-like scale, because when I read about WhatsApp, they were using FreeBSD and they forked Erlang and made some optimizations, they fine-tuned FreeBSD… So I was thinking that it was gonna be very difficult to try to replicate that kind of scale.
> So we were doing extra work, and it was really fulfilling to actually see that with minor changes in our initial best-effort approach with just a few tweaks was able to go to something that was able to get millions of connections. That was incredibly fulfilling to come full circle, and also it’s a great brag slide now, of showing that two million connections chart.
> WhatsApp used to use Erlang, not anymore. Facebook rewrote it completely in C++.
Highly doubt it. Here is Maxim Fedorov talking about scaling their cluster to 10000 nodes in 2018: https://www.youtube.com/watch?v=FJQyv26tFZ8. It would be pretty ridiculous for them to completely rewrite that in C++ one year later.
I did not downvote but possibly because the idea of "good" and "better" programming languages does not mean much. Different tools have different purposes. If you want to write a network driver for the Linux kernel C++ is probably more suitable but if you quickly want to build a simple chat system Erlang is almost certainly going to make your life a lot easier.
I think it is because with Erlang you get BEAM, and it is the real treasure of the Erlang ecosystem. It not only manifests the philosophy of Erlang well but, more specifically, provides a battle-tested runtime for concurrent applications that you cannot get with C++, as you have to (re-)write that concurrent runtime each time.
Exactly. I worked at a few smaller but established FinTech companies delivering trading systems to banks, hedge funds, etc. that had built their own unique systems written in C++ for concurrent applications. These systems were battle-tested but they were not open-sourced, and anyway they could not easily be re-purposed.
Yes, it is battle-tested, and the result of that testing was a big failure: memory-related issues and vulnerabilities alone have caused billions of dollars of damage, and are on their way to causing more.
It is generally not a good idea at all to use C++ in network-facing applications.
The first maxim of software architecture: You Are Not Google.
(And even at Google most systems running C++ code are their core indexing and analytical systems, which are not directly related to their Internet-facing perimeter; there are some exceptions, of course).
That said, looking back at previous jobs there are many places -- particularly in ad-tech stuff -- where I used Java server side that I now think C++ would have been more appropriate (or these days, Rust). I wasted a lot of hours tuning for garbage collection that I'd love to have back.
And yes, there is a crapload of stuff that is internet facing that is C++.
Well, at some point it is a matter of personal preferences and tradeoffs. One time, I used to work on a high-frequency trading system written in Java... yes, in the environment with zero tolerance to GC latency. The core system was written with “off-heap Java” style, with memory blocks preallocated... and for the periphery everybody could use regular GC-enabled Java.
Could have written everything in C++, but everybody hated C++ so much they preferred heavily modified Java compilers and environments.
Afterwards, some parts of the core were rewritten in Rust... and there were no significant performance or other gains, so it was left as is.