Implementing a JIT Compiled Language with Haskell and LLVM (stephendiehl.com)
161 points by andars on Sept 12, 2015 | 26 comments



I've implemented this toy compiler (minus JIT) in Java, following the tutorial on the LLVM website (which uses C). After browsing through the Haskell code, it is incredible how much more suitable a statically typed, functional language is for compiler development compared to an imperative language like Java. The code is much terser and, most importantly, more readable.


Yes, languages like Haskell and Lisp are excellent for compiler development, since a compiler is one big pure function that involves a lot of transformations on recursive data structures. And what are language academics interested in doing? Implementing languages. So the languages they design are often suited to that task.


I think the greatest features of the ML family for compiler development are its AST representations (as algebraic data types) combined with pattern matching. When you change the AST, it is very easy to go and fix all the AST transformations to handle fewer/more cases or to fix the types.
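For example, a tiny sketch (a hypothetical Expr AST, not from the article): compile with -Wall, and adding a constructor to Expr makes GHC warn about every pass that no longer handles all cases.

  {-# OPTIONS_GHC -Wall #-}

  -- A toy expression AST as an algebraic data type.
  data Expr
    = Lit Int
    | Add Expr Expr
    | Mul Expr Expr  -- add, say, `Neg Expr` and `fold` below warns
    deriving Show

  -- One pass: constant folding, matching every constructor explicitly.
  fold :: Expr -> Expr
  fold (Lit n)   = Lit n
  fold (Add a b) = case (fold a, fold b) of
    (Lit x, Lit y) -> Lit (x + y)
    (a', b')       -> Add a' b'
  fold (Mul a b) = case (fold a, fold b) of
    (Lit x, Lit y) -> Lit (x * y)
    (a', b')       -> Mul a' b'

  main :: IO ()
  main = print (fold (Add (Lit 1) (Mul (Lit 2) (Lit 3))))  -- Lit 7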

There are other great features for compiler development in the ML family, but almost all of them revolve around static types.

Can you explain what you mean when you say Lisp is excellent for compiler development? When changing the AST, how would you go about changing all the code that needs to be changed?


ML has ADTs and pattern matching out of the box. Fine. With Lisp you can quickly build your own ADTs, with some enhanced functionality that is not possible in the ML family.

See a couple of examples of this:

[1] http://www.cs.indiana.edu/~dyb/pubs/nano-jfp.pdf

[2] https://github.com/combinatorylogic/mbase


> > When changing the AST, how would you go about changing all the code that needs to be changed?

Is there an answer to this question?


Have you seen Nanopass or MBase? They address this very question.


All programs are compilers when it comes down to it. Even web apps compile web requests to HTML. There are just two kinds of programs: compilers, and daemons (which are just compilers wrapped in an infinite loop).
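To make that concrete, a hedged sketch (handle is a made-up stand-in, with String in place of real request and HTML types):

  import Control.Monad (forever)

  -- The "compiler": one pure function from request to HTML.
  handle :: String -> String
  handle request = "<html><body>" ++ request ++ "</body></html>"

  -- The "daemon": the same compiler wrapped in an infinite loop.
  main :: IO ()
  main = forever (getLine >>= putStrLn . handle)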


Many would disagree. Most online programs have various deadlines and scheduling requirements that are nothing at all like compilers. Their challenge is allocating computing resources among concurrent requests in a way that tries to ensure timeliness. Some of them are also required to guarantee some coherence of distributed data. Their concerns include scheduling fairness, safety, liveness, timeliness and coherence -- very different from those of a compiler. They're not a compiler "running in a loop".

I think the best classification of programs was given by Gérard Berry[1], creator of Esterel, a very influential imperative, fully verifiable language for reactive systems[2]:

Computerized systems can be divided into three broad categories:

* Transformational systems compute output values from input values, and then stop. Most numerical computation programs, payroll programs, and compilers are transformational.

* Interactive systems constantly interact with their environment in such a way that the computers can be viewed as the masters of the interaction. A user calls for services, the system listens to him when it can, and it delivers the services when they are available. Operating systems, centralized or distributed databases, and the Internet are interactive.

* Reactive systems, also called reflex systems, continuously react to stimuli coming from their environment by sending back other stimuli. Contrarily to interactive systems, reactive systems are purely input-driven and they must react at a pace that is dictated by the environment. Process controllers or signal processors are typical reactive systems.

In another paper (The Foundations of Esterel) he expounds:

* In interactive systems... [t]he computer (network) is the leader of the interaction, and clients wait to be served. The main concerns are deadlock avoidance, fairness, and coherence of distributed information.

* In reactive systems, the pace of the interaction is determined by the environment, not the computers. Most often, clients cannot wait. The main concerns are correctness (safety) and timeliness.

[1]: http://en.wikipedia.org/wiki/G%C3%A9rard_Berry

[2]: http://home.ku.edu.tr/~stasiran/ecoe560/Papers/EsterelPrimer...


That sounds like an optimization pass of a compiler. I'm not convinced it's meaningfully different.

My background is linguistics, so perhaps I'm biased to see everything as a compiler. However, automata theory generally provides a similar perspective.


> I'm not convinced it's meaningfully different.

It is, because the problems and algorithms involved are completely different. An optimizer might address some of those problems in the generated code; it may even be a solver for some of them (e.g. register allocation, which is one kind of scheduling -- i.e. resource allocation -- problem), but it is not itself a solution to scheduling and coherence problems.

As evidence, I'd offer the fact that the verification of online programs requires temporal logic (which is strictly more powerful than, say, session types) -- Esterel itself is based on temporal logic, and TLA+ uses temporal logic to verify algorithms used in online programs -- while, AFAIK, no temporal logic is required to verify a compiler. That their verification requires different logic shows that they solve a qualitatively different problem.


Would you say something like Clojure is also suitable for compiler development or not?


Yes, it has macros, so you can easily build something similar to the Nanopass framework on top of it.


Well, any Turing-complete language is suitable for compiler development work, but yeah, Clojure is designed around symbolic computation, recursion and things like that.


Can we stop pulling out this Turing-complete quip?

That's like asking on a cooking forum how to open a coconut, and the reply being well any physical object with mass can technically open a coconut with the correct application of force, but yeah, you should probably use a machete.


BTW, the guy you replied to is part of the team working on what is probably the most significant compiler technology breakthrough of the last decade (Truffle/Graal). So he's not just "technically right"; he has actually contributed to the world's most revolutionary compiler (which is written in neither Haskell nor a Lisp).


I was replying to what he said, not who he is.


I think that's relevant context which proves that your analogy in this particular case is inappropriate. He's a chef working in one of the world's greatest restaurants and he's not using a machete to open the coconut. Therefore, there are probably good reasons to prefer another tool and the advantages of a machete are probably not that big (or are offset by other factors). Again -- in this particular case.

So while "any Turing-complete language" is an over-generalization, so is "use the right tool for the job". Programming languages are not at all like knives or hammers, as they are 1/ terribly expensive, 2/ require constant training, 3/ require specialization, and 4/ are each designed to be general-purpose. It's more productive to use the one language you know best than to keep switching to a completely different one that's marginally better at the particular task at hand.


I was trying to challenge the idea that there is anything magic about compilers. Some people think compilers do magic things that they could never understand, but in reality all the techniques could be implemented in VB6 if you wanted. You don't need a server, a database driver, middleware or whatever. If your language can read from somewhere and write to somewhere, that's all you need. That's what makes compilers perfect little programs in my mind.
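In that spirit, the whole shape of a compiler fits in a few lines (a toy sketch; the three passes are obviously placeholders):

  -- Read from somewhere, transform, write to somewhere.
  main :: IO ()
  main = interact (emit . optimize . parse)
    where
      parse    = words               -- source text -> "token" list
      optimize = filter (not . null) -- stand-in for a real pass
      emit     = unlines             -- "tokens" -> target text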


I'm using this one, thanks!


BTW, you do not need a Turing-complete language to build a compiler. It is much better to use a non-Turing-complete one -- ideally, a total language.


This may be common knowledge for a lot of people, but I have found the author's "What I Wish I Knew When Learning Haskell" article to be awesome too:

http://dev.stephendiehl.com/hask/


It just so happens that the author is hiring Haskell engineers right now to do related work: https://twitter.com/smdiehl/status/642790047163375616


Last semester, for a compiler course I was taking, we had to implement a compiler for a subset of Go (called GoLite), with free rein over language choice. I chose to implement my compiler[0] using Haskell and LLVM, and this article helped a _lot_ when I was writing codegen, especially since it was my first real Haskell project.

However, some of the most interesting and important issues are left out of it, for example implementing exceptions, threading, imports, garbage collection, closures, etc. I would love a follow-up version of this article that discusses how to implement a compiler with these more modern features (specifically in LLVM).

[0] github.com/xldenis/mgc
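Since closures came up: the core idea there is closure conversion, sketched below over a toy lambda AST (my own sketch, not code from the article or from mgc). Each lambda's free variables become an explicitly captured environment.

  import Data.List (nub, (\\))

  -- A toy lambda-calculus AST.
  data Expr
    = Var String
    | Lam String Expr
    | App Expr Expr
    | Closure [String] String Expr  -- captured vars, parameter, body
    deriving Show

  -- Free variables of an expression.
  free :: Expr -> [String]
  free (Var x)          = [x]
  free (Lam x b)        = free b \\ [x]
  free (App f a)        = nub (free f ++ free a)
  free (Closure vs x b) = nub (vs ++ (free b \\ [x]))

  -- Closure conversion: every Lam becomes a Closure that names
  -- exactly the variables it captures from its environment.
  convert :: Expr -> Expr
  convert (Var x)   = Var x
  convert (Lam x b) = Closure (free b \\ [x]) x (convert b)
  convert (App f a) = App (convert f) (convert a)
  convert e         = e  -- source programs contain no Closure nodes

  main :: IO ()
  main = print (convert (Lam "x" (Lam "y" (App (Var "x") (Var "y")))))
  -- Closure [] "x" (Closure ["x"] "y" (App (Var "x") (Var "y")))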


Sorry for the off-topic question, but does anyone know of performance benchmarks comparing LLVM vs. BEAM?


They're not really comparable things. One is a bytecode VM (BEAM) and the other is a bunch of compilation infrastructure. Also, BEAM is optimized for heavy network-oriented concurrency and is only incidentally not-slow on the CPU. If you run a CPU-only race with a language targeting BEAM (Erlang, Elixir, LFE, etc.), you're going to be disappointed.


Thank you for the compare/contrast!



