I'm always surprised and delighted to see so many "first implementations" of languages in prolog. It was described to me in college as "just a novelty" but in fact it's used extensively in the JVM internals [1] and apparently is the starting-point impl for other languages.
But Prolog still seems so... awkward. I wonder why languages like Haskell or OCaml aren't more de facto for these purposes; they seem to have similar expressive power for parser/grammar things, and with less inside-out paradigms (imho).
Still, we can inspect the general pattern that is generated as a result of the compilation, and can therefore write, for example, test cases that subsume many possible concrete cases!
On a general note, one reason why Prolog is so popular for writing compilers is that the language has a built-in mechanism for writing parsers, and in fact describing all kinds of sequences (for example, of tokens, of instructions etc.) in a very convenient way.
For example, the above three clauses can be written equivalently as a so-called definite clause grammar (DCG):
This makes it very easy to see the key aspects of the code. At the same time, the description is so general that it can be used in all directions, to generate, parse and test instructions.
The "inside out" paradigm is exactly what Prolog is good for. Try to write complex procedural programs in it and you will be driven to insanity by the cuts, by using logical failure to express procedural success, etc.
The original sin of conventional programming languages is that we write a sequence of operations that happen one after another in time. Thus we find concurrency, dealing with asynchronous events, and reacting effectively to the environment difficult.
Prolog makes writing simple parsers simple. Certain kinds of search problems are easy to solve, and some kinds of operations (say involving transitive closure) can be defined in two or three lines of Prolog that hopefully you could implement in 50 lines in a more normal language after looking at an algorithms book.
There has been a huge amount of research and commercial implementation of things that are like Prolog but different, for instance having built-in constraint satisfaction solving, combinatorial optimization, etc. Prolog is backwards-chaining (it finds rules to justify a conclusion), but other engines are forward-chaining, like Drools, the Jena rules engine, and IBM ILOG.
I would love to see this approach get more popular.
There is a good reason for using a Prolog-like language in a compiler-related project. Prolog has been used for rapid prototyping of compilers since the end of the '70s, and it's still ahead of its time. If you take a look at the code in Warren's old article [1], for example, you'll see that even today almost nobody uses such a high-level description of a compiler. The main power of Prolog is not in parsing: Prolog is unbeatable at program analysis and transformation. See Datalog (you may even find it in the Dragon book nowadays!), see Stratego/XT; they were all inspired by good old Prolog.
By the way, Prolog is also good at describing the formal semantics of PLs. A good PL reference or standard should include a clean formal notation for the semantics of the language, not some bureaucratic English text... It's already 2018; we need this for compiler developers and for a deeper understanding of a language.
> in fact it's used extensively in the JVM internals
I've not seen Prolog used in the JVM internals myself - it's just being used in this document as an inference notation to explain things as far as I know. Unless you know specifically where it's used, which would be interesting?
Well, OCaml is indeed rather popular for implementing compilers for new languages (sometimes later bootstrapped), including ones which became rather popular. I can think of F#, Rust, Elm, Haxe, FlowType for instance.
There's also a project implementing a Prolog-like interpreter for type checking and other logic in Rust; I think it works at the MIR level, but I could be wrong on that.
Having some experience with both languages (though not professionally), I personally much prefer Prolog to Haskell. There's an initial hurdle with Prolog because it uses such an unfamiliar paradigm, but once you get good with it, programming in Prolog is pure pleasure for me.
I like Haskell just fine, it's an excellent language, but my heart belongs to Prolog.
Probably because they predate Haskell. In the case of Java, that's not true, but Haskell would have been only about 5 years old at the time, and was largely academic.
Prolog and Erlang in the same thread - is it Christmas already? Two of my favourite languages, both hugely undervalued.
I remember the moment I first "got" Prolog. Having only programmed in C/C++ to that point, it took some mental contortion. But the "oh!" moment was amazing. Backward chaining & logical variables are just so incredibly elegant. I still use transitive closure as an example of just how powerful the language is:
path(A, B) :-
    link(A, B).
path(A, B) :-
    link(A, I),
    path(I, B).
That's it. Stunningly simple.
As for Erlang - as others have noted - it was conceived right from the start as a language for concurrency. It's had 20 years to mature and robustly solve problems that inherently single-threaded languages have barely even recognised yet (C/C++/Java/C#/python/javascript/rust/...). Supervisors / error handling and threading are two notable examples. It makes me sad to look at the spread of "async". An inelegant sticking plaster that adds yet another concurrency construct and turns it into a tactical decision - on every method - and at the call site too.
Erlang shows that a single, well-conceived and executed construct is both possible and preferable. Lovely.
Async is terrible as a primitive, but as sugar on top of real concurrency it's not terrible. I use Task.Supervisor.async() all the time in Elixir, with confidence that I have access to all the parts I might need for disaster recovery should I need it later.
You might be interested in the idea of Web Prolog, a language being written by Torbjorn Lager. It aims to provide similar concurrency features as Erlang, but retaining the logic variables, backtracking and unification etc provided by standard prolog.
> In computer science, the concept of transitive closure can be thought of as constructing a data structure that makes it possible to answer reachability questions. That is, can one get from node a to node d in one or more hops? A binary relation tells you only that node a is connected to node b, and that node b is connected to node c, etc.
That snippet defines a predicate path(A, B) by specifying two cases:
1. a link(A, B) exists
-or-
2. some element I exists, such that a link(A, I) exists and path(I, B) exists.
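For readers more comfortable with conventional languages, the same reachability idea can be sketched in Python. This is just an illustration, not a translation of any real system: the `links` set of edge pairs stands in for the link/2 facts, and the `seen` guard handles cycles.

```python
def path(links, a, b, seen=frozenset()):
    """True if b is reachable from a in one or more hops."""
    # Clause 1: path(A, B) :- link(A, B).
    if (a, b) in links:
        return True
    # Clause 2: path(A, B) :- link(A, I), path(I, B).
    # 'seen' guards against cycles, which a naive Prolog version
    # would also need to handle (e.g. via tabling) on cyclic graphs.
    return any(
        path(links, i, b, seen | {a})
        for (x, i) in links
        if x == a and i not in seen
    )

links = {("a", "b"), ("b", "c"), ("c", "d")}
print(path(links, "a", "d"))  # True
print(path(links, "d", "a"))  # False
```

Even in this tiny sketch, the Python version needs explicit search bookkeeping that the two Prolog clauses get for free from backtracking.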
The TL;DR is: reliability was the main goal, and using isolated processes which can react to each others' failures, even when running on separate machines, was the way they found to get reliability. This turned out to be nice for scalability also.
Nah. You can screw with the BEAM’s reduction-scheduling even in pure Erlang code: just write a really long function body consisting only of arithmetic expressions (picture an unrolled string hash function.) Since it contains no CALL or RET ops, the scheduler will never hit a yield point, even after going far “into the red” with the current process’s reduction budget.
You just never see this in real Erlang code, because who would code that way? If you want to be idiomatic, use recursion. If you want to be fast, use a NIF. Why ever use an unrolled loop body?
But it can happen, and therefore, the BEAM does not guarantee preemption, even in Erlang. Reduction scheduling isn’t a “kind of” preemptive scheduling. It’s its own thing.
I mean, like I said, it should be treated as a preemptive VM in practice. In reality it's not, but for most practical cases it's easier to understand in terms of preemption.
I hate looking at spinners on my computer because some software is hung up here or there.
I am prototyping a "desktop" search application which hits a number of sources and interacts with them by their protocol (maybe it gets full hits from bing but just gets identifiers from pubmed and has to look up the identifiers)
This can never show a spinner but instead keep the UI live all the time, filling in results as soon as they become available.
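A minimal asyncio sketch of that pattern, with made-up fetchers (fetch_bing and fetch_pubmed are stand-ins for real protocol clients, not actual APIs):

```python
import asyncio

# Hypothetical per-source fetchers; real ones would speak each backend's
# protocol (full hits from one source, identifier lookups from another).
async def fetch_bing(query):
    await asyncio.sleep(0.05)   # stand-in for network I/O
    return ("bing", [f"{query} hit"])

async def fetch_pubmed(query):
    await asyncio.sleep(0.1)    # a slower source
    return ("pubmed", [f"{query} id"])

async def search(query):
    results = {}
    tasks = [f(query) for f in (fetch_bing, fetch_pubmed)]
    # as_completed yields each result as soon as it is ready, so the UI
    # can be updated incrementally instead of blocking on the slowest source.
    for fut in asyncio.as_completed(tasks):
        source, hits = await fut
        results[source] = hits  # in a real app: push to the UI here
    return results

print(asyncio.run(search("erlang")))
```

The key piece is asyncio.as_completed: results arrive in completion order, so nothing ever waits on the slowest backend before showing anything.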
I have been looking at asyncio in Python and it is an unholy mess. Just as Java was the first environment to be really thread-safe (e.g. they found that the memory model of the underlying C was broken, and implemented the first sound memory model), BEAM seems to be the first environment designed with responsiveness in mind.
The WinRT environment enforces non-blocking async.
Any operation in the library that can take time is async. Async is everywhere.
I've always worked on embedded UIs where we had a physical watchdog timer (an independent chip power cycles your main device if a keep alive function isn't called periodically) that was set to ~30ms.
In test, anything running longer than 30ms had to be broken up.
The entire system was written in C and C++.
The lesson is you can write responsive apps in any language, but the entire underlying stack needs to cooperate. IIRC WinRT's requirement for "async" is operations that take more than 1 second. I disagree with that, I'd set the limit at 100ms, but I guess that'd annoy programmers even more. :)
(1) People who know windows will find it hard to switch to the async model
(2) People who don't know windows will have access to a disorientingly large API
To be fair, Python asyncio is a really ugly interface for something that other languages (e.g. JS and others) have done a lot better. Maybe look at other languages than Python for async programming.
A language providing good support for asynchronous programming does not mean that developers use those features well. ;)
And even if they do, that does not mean that the application as a whole performs well. You can write beautiful code using async/await that performs awfully, if you wait on the right things.
Threads are cheap enough that it makes sense to spawn a thread for a task, and write the task in a straight forward way -- you don't have to interact with the event loop, you don't have to do awkward continuations, you can often just do one thing at a time, and it's all straight forward. If you need to parallelize sub requests, that's not too bad either.
Asynchronous message passing is a super useful primitive.
"No shared state" means you need to think about problems in terms of getting requests to the right worker that has the state, instead of locking the right thing; that turns locking problems into queueing problems, and queueing problems are easier. There are ways to share state (ETS), and if you're not careful, you can get into the same locking problems that are easy to get into with traditional shared-memory threading environments.
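The "worker owns the state, everyone else sends it requests" idea can be sketched in Python with plain threads and queues (the counter and message shapes here are invented for illustration):

```python
import queue
import threading

# A single worker owns the state; other threads send it requests
# through a queue instead of locking shared memory.
def counter_worker(requests):
    state = {"count": 0}           # owned exclusively by this thread
    while True:
        op, reply = requests.get()
        if op == "stop":
            break
        if op == "incr":
            state["count"] += 1
        reply.put(state["count"])  # respond on the caller's reply queue

requests = queue.Queue()
worker = threading.Thread(target=counter_worker, args=(requests,))
worker.start()

def incr():
    reply = queue.Queue(maxsize=1)
    requests.put(("incr", reply))
    return reply.get()

values = [incr() for _ in range(3)]
requests.put(("stop", None))
worker.join()
print(values)  # [1, 2, 3]
```

No lock ever protects `state`: only one thread touches it, and the queue serializes access, which is roughly what an Erlang process mailbox gives you for free.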
Hot code loading is lovely. It's theoretically possible to do something similar in Java, but nobody (to my knowledge) actually does. It's so much more convenient to update the code than to drain servers and restart them. There's certainly an abundance of ways to ruin your day with hot loading, but it's magic when it works.
Supervisors in Erlang are mostly, if not entirely, built on top of BEAM, not something in BEAM itself. BEAM was written to provide the primitives the supervisors need, but it's just primitives, not "supervisors".
BEAM is distinguished by being a bytecode interpreter that implements preemption at the bytecode level; by the lightweight processes it implements and the communication mechanisms between them, particularly the "linking" mechanism that lets one process easily watch another in various ways; and by the services it provides to those processes, which can be seen by querying the processes' metadata.
You can skim it in order to see the major differences with the JVM. For me, the fact that BEAM has out of the box m:n threading [lightweight processes] and that each process has its own heap makes it stand out compared to the JVM.
Aside from preemption, the other notable difference is process management. BEAM processes are incredibly lightweight in terms of RAM and CPU usage. You can create millions of them because they do not map directly to kernel threads, and thus do not incur the overhead associated with them.
By contrast, the JVM uses kernel threads (except for old, obsolete JVM implementations that supported green threads running in userspace), which are much more expensive. Because of this the JVM offloads thread scheduling to the OS, whereas BEAM handles process scheduling internally.
Production Erlang systems tend to take advantage of hibernating many of their “millions of processes”, though, since it’s a rare architecture where those million processes are all hot. If you look at the memory usage of such systems, it’s a lot less than that figure would imply, mainly because of process hibernation.
That is true, best-case scenario (a process storing no data whatsoever) hibernating can remove the 233 words minimal heap (and stack) preallocated, yielding 76 words without hipe or smp, or 103 with them, for 304 / 608 / 412 / 824 bytes (nothing/32, nothing/64, hipe+smp/32, hipe+smp/64).
I came across that number a couple of years ago when I was doing research for a presentation. It may have been from a book but I'll see if I can dig up the source.
They were considered a workaround for architectures that didn't have native multi-threading. I don't remember when they were dropped, but it was well before the Oracle acquisition.
The JVM ran all its green threads on a single CPU, even if multiple CPUs were present.
When multi-core systems became more common, Sun faced the choice of overhauling the green-threads implementation or switching to OS threads. I don't know why they made the choice they did, but it was likely a combination of time pressure (writing and tuning a good m:n green-threads implementation is hard) and doubts about the usefulness of green threads on multicore systems.
Just to note, distributed erlang is NOT secure AT ALL. You absolutely need your own source of network isolation to keep it secure.
By default EPMD (the daemon that maps node names to ports in a cluster) listens on an open port, and the distribution protocol accepts node connections that present the correct secret cookie (a short authentication string). There is zero protection against brute-force attempts to guess the cookie. If your EPMD and distribution ports are open to the internet, it is trivial to gain access to your entire cluster and execute arbitrary code.
"Real time" is defined in terms of deadlines: a system is real-time if its operations have specific deadlines within which they must complete.
The level of real-timeness is what happens when an operation misses its deadline:
* In a hard real-time system, missing a deadline is total system failure. Deadlines tend to be tight, and system correctness or integrity may not be maintained in the face of a missed deadline. Take rocket guidance systems: if a process misses its deadline, the rocket can veer off-course, blow up, abort, etc. Other examples of hard real-time systems are engine controls and medical devices.
* In a firm real-time system, a missed deadline leads to QoS degradation, and the result of the operation is ignored/discarded after the deadline passes. Manufacturing systems tend to be like this: missing deadlines will usually ruin parts (possibly for some time afterwards), but production can usually recover and continue.
* Soft real-time is a relaxation of firm: the result of the operation is still valid after a deadline miss, but its value usually diminishes as the deadline recedes into the past. Real-time audio transmission (phones and the like) usually falls here: you want your audio delivered "immediately" but can still use it after a small delay (a missed deadline); however, as the delay grows, delivering the data becomes less and less useful, e.g. a few hundred milliseconds of delay on a phone line is annoying, and a few minutes makes it useless.
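The three tiers above differ only in what happens to a late result. A toy Python sketch of the firm vs. soft policies (the DEADLINE value and the linear value-decay function are made up purely for illustration):

```python
DEADLINE = 0.1  # seconds; an illustrative per-operation budget

def handle_result(result, elapsed, mode):
    """Toy deadline policies: 'firm' discards late results outright,
    while 'soft' keeps them but weights their value down with lateness."""
    if elapsed <= DEADLINE:
        return result, 1.0   # on time: full value
    if mode == "firm":
        return None, 0.0     # late: the result is thrown away
    # soft: still usable, but worth less the later it arrives
    lateness = elapsed - DEADLINE
    return result, max(0.0, 1.0 - lateness / DEADLINE)

print(handle_result("sample", 0.05, "firm"))  # ('sample', 1.0)
print(handle_result("sample", 0.15, "firm"))  # (None, 0.0)
print(handle_result("sample", 0.15, "soft"))  # ('sample', ~0.5)
```

A hard real-time policy has no row in this table at all: a miss is not a degraded result but a system failure.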
Hard vs. soft real time is a matter of latency tolerance. Human-facing systems (like telecom) need to be low-latency from a human perspective, which is a higher threshold than what you could tolerate in a mechanical context, where 100ms could mean damaging some expensive hardware.
The JVM is extremely good at dynamic compilation. I think BEAM only has very basic dynamic optimisation capabilities and I don't think BEAMJIT or HiPE were ever as successful as was hoped. But I'm not an expert in BEAM.
The thing is that MRI Ruby is effectively single-threaded (unless you use things like concurrent-ruby, which isn't that performant anyway), and threads can be blocked waiting for I/O (in fact they are, most of the time); and it is in I/O where Erlang really shines, thanks to BEAM's nature of lightweight preemptive processes.
CRuby is multi-threaded with a GIL. CRuby also supports lightweight cooperative threads called Fibers, which are used to build things like: https://github.com/postrank-labs/goliath/
JRuby has no GIL, has a JIT, and is significantly faster than BEAM.
CRuby/MRI/YARV is almost as fast as BEAM at web stuff in terms of total throughput and JRuby is significantly faster. Sinatra + Sequel is faster at some things and slower at others. The default Rails setup makes huge tradeoffs away from performance.
Always happy to see BEAM get faster but it won't match a NIF where you can control memory layout, mutate anything, use raw vector instructions. It's a killer combo though. Networking, memory management, high level application control in Elixir. Hot spots in NIFs. Kind of gets you the best of both worlds in terms of performance vs maintainability.
[1] https://docs.oracle.com/javase/specs/jvms/se8/jvms8.pdf