Hacker News

When I say this out loud, it sounds like a stupid question, but I'm still curious...

Outside of the fact that different programming languages run on BEAM and the JVM, what's the difference between the two?

I believe BEAM ships with supervisors, which is why Elixir can utilize them... what else is different?




The BEAM is a preemptive VM. That means no single long-running process can hog resources.

The BEAM was built on a principle of responsiveness, since it was originally designed for telephony.

Here's an in-depth article on preemption in the BEAM that I shared with a friend this morning: https://hamidreza-s.github.io/erlang/scheduling/real-time/pr...


> The BEAM is a preemptive VM.

The BEAM Book shared by armitron disagrees: https://happi.github.io/theBeamBook/#_scheduling_non_preempt...

Though an Erlang programmer never sees this in practice, because the compiler inserts yield points appropriately.


Maybe on the definition of "preemptive", but it should be treated as a preemptive VM in practice.


Nah. You can screw with the BEAM’s reduction-scheduling even in pure Erlang code: just write a really long function body consisting only of arithmetic expressions (picture an unrolled string hash function.) Since it contains no CALL or RET ops, the scheduler will never hit a yield point, even after going far “into the red” with the current process’s reduction budget.

You just never see this in real Erlang code, because who would code that way? If you want to be idiomatic, use recursion. If you want to be fast, use a NIF. Why ever use an unrolled loop body?

But it can happen, and therefore, the BEAM does not guarantee preemption, even in Erlang. Reduction scheduling isn’t a “kind of” preemptive scheduling. It’s its own thing.
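
A minimal sketch of the kind of function body the parent describes (module name and constants invented; imagine the pattern repeated for thousands of lines):

```erlang
%% A body consisting only of arithmetic compiles to BEAM code with no
%% CALL/RET ops, so it contains no yield points and cannot be preempted
%% until it returns.
-module(no_yield).
-export([hash_unrolled/1]).

hash_unrolled(X0) ->
    %% picture this repeated thousands of times, as in an unrolled
    %% string hash function:
    X1 = X0 * 31 + 7,
    X2 = X1 * 31 + 11,
    X3 = X2 * 31 + 13,
    X3 band 16#FFFFFFFF.
```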


I mean, like I said, it should be treated as a preemptive VM in practice. In reality it's not, but for most practical cases it's easier to understand it in terms of preemption.


As a developer, yes. As an ops person trying to figure out why your deployed release isn’t making its deadlines, no.


I want that.

I hate looking at spinners on my computer because some software is hung up here or there.

I am prototyping a "desktop" search application which hits a number of sources and interacts with them by their protocol (maybe it gets full hits from bing but just gets identifiers from pubmed and has to look up the identifiers)

This can never show a spinner; instead it must keep the UI live all the time, filling in results as soon as they become available.

I have been looking at asyncio in Python and it is an unholy mess. Just as Java was the first environment to be really thread-safe (e.g. they found that the memory model of the underlying C was broken and implemented the first sound memory model), BEAM seems to be the first environment designed with responsiveness in mind.


The WinRT environment enforces non-blocking async.

Any operation in the library that can take time is async. Async is everywhere.

I've always worked on embedded UIs where we had a physical watchdog timer (an independent chip that power-cycles your main device if a keep-alive function isn't called periodically) that was set to ~30ms.

In test, anything running longer than 30ms had to be broken up.

The entire system was written in C and C++.

The lesson is you can write responsive apps in any language, but the entire underlying stack needs to cooperate. IIRC WinRT's requirement for "async" is operations that take more than 1 second. I disagree with that, I'd set the limit at 100ms, but I guess that'd annoy programmers even more. :)


I think WinRT is a non-starter for two audiences.

(1) People who know Windows will find it hard to switch to the async model. (2) People who don't know Windows will have access to a disorientingly large API.


To be fair, Python asyncio is a really ugly interface for something that other languages (e.g. JS and others) have done a lot better. Maybe look at other languages than Python for async programming.


I spend a quarter of the day looking at laggy JS apps so I would rule that out.


A language providing good support for asynchronous programming does not mean that developers use those features well. ;)

And even if they do, that does not mean that the application as a whole performs well. You can write beautiful code using async/await that performs awfully if you await the wrong things.


Nice, I'll check it out!


The big features for me are:

Threads are cheap enough that it makes sense to spawn a thread per task and write the task in a straightforward way -- you don't have to interact with the event loop, you don't have to write awkward continuations, you can often just do one thing at a time, and it's all straightforward. If you need to parallelize sub-requests, that's not too bad either.

Asynchronous message passing is a super useful primitive.
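
A small sketch of both points (all names invented): a counter process owns its state privately, and callers interact with it only through asynchronous messages.

```erlang
-module(counter).
-export([start/0, increment/1, value/1]).

%% The process owns its state; no other process can touch Count directly.
start() ->
    spawn(fun() -> loop(0) end).

loop(Count) ->
    receive
        increment ->
            loop(Count + 1);
        {value, From} ->
            From ! {count, Count},
            loop(Count)
    end.

%% Callers lock nothing; they just queue messages to the owner.
increment(Pid) -> Pid ! increment.

value(Pid) ->
    Pid ! {value, self()},
    receive {count, N} -> N end.
```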

"No shared state" means you need to think about problems in terms of getting requests to the right worker that has the state, instead of locking the right thing; that turns locking problems into queueing problems, and queuing problems are easier. There are ways to share state (ets), and if you're not careful, you can get into the same locking problems that are easy to get into with traditional shared memory threading environments.

Hot code loading is lovely. It's theoretically possible to do something similar in Java, but nobody (to my knowledge) actually does. It's so much more convenient to update the code than to drain servers and restart them. There's certainly an abundance of ways to ruin your day with hot loading, but it's magic when it works.
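
At its simplest, hot loading is just pushing a new version of a module into the running VM (module name hypothetical):

```erlang
%% In the Erlang shell: c/1 compiles my_module.erl and loads it into the
%% running node; code:load_file/1 loads an already-compiled .beam file.
%% Existing processes pick up the new code at their next fully-qualified
%% call into the module.
c(my_module).
code:load_file(my_module).
```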


Supervisors in Erlang are mostly, if not entirely, built on top of BEAM, not something in BEAM itself. BEAM was written to provide the primitives the supervisors need, but it's just primitives, not "supervisors".

BEAM is distinguished by being a bytecode interpreter that implements preemption at the bytecode level; by the lightweight processes it implements and the communication mechanisms between them, particularly the "linking" mechanic that lets one process easily watch another in various ways; and by the services it provides to those processes, which can be seen by querying their per-process metadata.
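
A sketch of those primitives (function name invented): monitoring is the one-way form of linking, and supervisors are built from exactly these link/monitor mechanics.

```erlang
%% Watch another process and learn how it died.
watch() ->
    Pid = spawn(fun() -> receive crash -> exit(boom) end end),
    Ref = erlang:monitor(process, Pid),
    Pid ! crash,
    receive
        {'DOWN', Ref, process, Pid, Reason} ->
            {worker_died, Reason}    %% Reason is boom here
    end.
```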


One of the best references on BEAM is here: https://happi.github.io/theBeamBook/

You can skim it in order to see the major differences with the JVM. For me, the fact that BEAM has out of the box m:n threading [lightweight processes] and that each process has its own heap makes it stand out compared to the JVM.


Aside from preemption, the other notable difference is process management. BEAM processes are incredibly lightweight in terms of RAM and CPU usage. You can create millions of them because they do not map directly to kernel threads, and thus do not incur the overhead associated with them.

In contrast, the JVM uses kernel threads (except for old, obsolete JVM implementations that supported green threads running in userspace), which are much more expensive. Because of this the JVM offloads thread scheduling to the OS, while BEAM handles process scheduling internally.
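
For instance, spawning a hundred thousand idle processes is unremarkable on the BEAM (the count here is arbitrary):

```erlang
%% Each process is a few hundred words of heap, not a kernel thread.
Pids = [spawn(fun() -> receive stop -> ok end end)
        || _ <- lists:seq(1, 100000)],
length(Pids),            %% 100000
[P ! stop || P <- Pids].
```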


Lightweight meaning 0.5kb for a BEAM process vs 1024kb for a JVM thread. A goroutine in Go is 2kb, for comparison's sake.


> lightweight meaning 0.5kb for a BEAM process

It's quite a bit more than that. According to the official docs[0] it's 309 words without SMP or HiPE; on my machine (with both) it's 336 words.

That's 1.2~1.3k on a 32-bit system and 2.4~2.6k on a 64-bit system.

[0] http://erlang.org/doc/efficiency_guide/processes.html
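
You can also measure it on your own system: `erlang:process_info/2` with the `memory` item reports a process's total size in bytes (it varies with word size and VM version).

```erlang
%% In the Erlang shell: spawn an idle process and ask for its footprint.
Pid = spawn(fun() -> receive stop -> ok end end),
{memory, Bytes} = erlang:process_info(Pid, memory),
Pid ! stop.
```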


Production Erlang systems tend to take advantage of hibernating many of their “millions of processes”, though, since it’s a rare architecture where those million processes are all hot. If you look at the memory usage of such systems, it’s a lot less than that figure would imply, mainly because of process hibernation.


That is true; in the best-case scenario (a process storing no data whatsoever) hibernating can remove the 233 words of preallocated minimal heap (and stack), yielding 76 words without HiPE or SMP, or 103 with them: 304 / 608 / 412 / 824 bytes (plain/32-bit, plain/64-bit, HiPE+SMP/32-bit, HiPE+SMP/64-bit).
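
A worker might hibernate itself after an idle period; a hypothetical sketch (the 5-second timeout and `handle_job/1` are invented):

```erlang
%% erlang:hibernate/3 discards the call stack and shrinks the heap to a
%% minimum; the next message wakes the process in idle_worker/1 again.
idle_worker(State) ->
    receive
        {work, Job} ->
            handle_job(Job),
            idle_worker(State)
    after 5000 ->
        erlang:hibernate(?MODULE, idle_worker, [State])
    end.
```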


I came across that number a couple of years ago when I was doing research for a presentation. It may have been from a book but I'll see if I can dig up the source.


As derefr notes, it might have been the size of a hibernating (rather than live) process. That's 0.3~0.4k on 32-bit systems, though 0.6~0.8k on 64-bit ones.


Except the JVM is going to get fibers, coroutines and tail calls in a future release.

"Project Loom with Ron Pressler and Alan Bateman" at this week's JVMLS 2018

https://www.youtube.com/watch?v=J31o0ZMQEnI


Interesting. It seemed to be an independent library before. Maybe shoving it inside the JVM will finally lead to more usage.


I have been out of Java programming for a while. Why/when were green threads deprecated?


They were considered a workaround for architectures that didn't have native multi-threading. I don't remember when they were dropped, but it was well before the Oracle acquisition.


The JVM ran all its green threads on a single CPU, even if multiple CPUs were present.

When multi-core systems became more common, Sun faced the choice of overhauling the green-threads implementation or switching to OS threads. I don't know why they made the choice they did, but it was likely a combination of time pressure (writing and tuning a good m:n green-threads implementation is hard) and doubts about the usefulness of green threads on multicore systems.


Preemption is pretty nice. You can have a for loop that runs forever without it taking over the system.

It does processes nicely; they're green threads, IIRC, and very fast to create.

Clustering is built into it with security.

It's built for soft real time, so reliability is pretty nice; it's much more battle-tested than perhaps the JVM.

That it's not owned by Oracle is a big deal for me.


>Clustering is built into it with security

Just to note, distributed erlang is NOT secure AT ALL. You absolutely need your own source of network isolation to keep it secure.

By default the EPMD daemon (which maps node names to ports in a cluster) listens on an open port, and nodes accept connection requests that present the correct secret cookie (a short authentication string). There is zero protection against brute-force attempts to guess a cookie. If your EPMD and node ports are open to the internet, it is trivial to gain access to your entire cluster and execute arbitrary code.

See: https://insinuator.net/2017/10/erlang-distribution-rce-and-a...


What's "soft" real time mean, vs .... "hard" real time? Real real time?


"Real time" is defined in terms of deadlines, a system is real-time if operations have specific deadlines, times they take to execute.

The level of real-timeness is defined by what happens when an operation misses its deadline:

* In a hard real-time system, missing a deadline is total system failure. Deadlines tend to be tight, and system correctness or integrity may not be maintained in the face of a missed deadline. Rocket guidance systems, for instance: if a process misses its deadline the rocket can veer off-course, blow up, abort, or…; other examples of hard real-time systems are engine controls and medical devices.

* In a firm real-time system, a missed deadline leads to QoS degradation, and the result of the operation is ignored/discarded after a deadline miss. Manufacturing systems tend to be like this: missing deadlines will usually ruin parts (possibly for some time afterwards), but production can usually recover and continue.

* Soft real-time is a relaxation of firm, where the result of the operation is still valid after a deadline miss, but its value usually diminishes as the deadline recedes into the past. Real-time audio transmission (phones and such) usually falls here: you want your audio delivered "immediately" but can still use it after a small delay (a missed deadline); as the delay grows, delivering the data becomes less and less useful, e.g. a few hundred milliseconds of delay on a phone line is annoying, while a few minutes makes it useless.


This is a fantastic explanation of the differences with really good analogies. Thanks.


hard vs. soft real time is a matter of latency tolerance. human-facing systems (like telecom) need to be low latency from a human perspective, which is a higher threshold than what you could tolerate in a mechanical context where 100ms could mean damaging some expensive hardware.


The difference is if your system requirements tolerate missing a timing deadline.

Hard real-time: antilock brakes

Soft real-time: video game frame rate



The JVM is extremely good at dynamic compilation. I think BEAM only has very basic dynamic optimisation capabilities and I don't think BEAMJIT or HiPE were ever as successful as was hoped. But I'm not an expert in BEAM.



