As I believe that there are a lot of compiler people around at HN, I would like to ask:
What would be a good starting point to learn about writing compilers? I have written a small tree-walk interpreter (and am now following up with the bytecode-based one) by working through Crafting Interpreters[0], but I would like some more learning resources.
I would ideally like an online book (I prefer readable resources) similar in style to Crafting Interpreters, but focused more on compilers & optimization.
I have heard about the Dragon book, but it is too big and pricey :-( and I'm not sure I could get through the entire thing.
My humble opinion: do not get the Dragon book - it puts too much focus on parser theory and implementation which is in my opinion not the hard part of compiler construction.
Most modern languages use a handwritten recursive descent parser anyway and the interesting part starts after you have the AST (code generation and register allocation).
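To show how little machinery a hand-written recursive descent parser actually needs, here is a minimal sketch in Python for a toy arithmetic grammar (the grammar and AST shape are my own illustration, not from any particular compiler):

```python
# Toy grammar (hypothetical, for illustration only):
#   expr   -> term (('+' | '-') term)*
#   term   -> factor (('*' | '/') factor)*
#   factor -> NUMBER | '(' expr ')'
import re

def tokenize(src):
    return re.findall(r"\d+|[()+\-*/]", src)

class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def next(self):
        tok = self.peek()
        self.pos += 1
        return tok

    def expr(self):
        node = self.term()
        while self.peek() in ("+", "-"):
            op = self.next()
            node = (op, node, self.term())   # AST node: (op, lhs, rhs)
        return node

    def term(self):
        node = self.factor()
        while self.peek() in ("*", "/"):
            op = self.next()
            node = (op, node, self.factor())
        return node

    def factor(self):
        tok = self.next()
        if tok == "(":
            node = self.expr()
            assert self.next() == ")", "expected ')'"
            return node
        return ("num", int(tok))

ast = Parser(tokenize("1 + 2 * 3")).expr()
print(ast)  # ('+', ('num', 1), ('*', ('num', 2), ('num', 3)))
```

Note how precedence falls out of the call structure (expr calls term calls factor) with no tables or generators involved; from here on, the interesting work happens on the returned AST.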
> My humble opinion: do not get the Dragon book - it puts too much focus on parser theory and implementation
This is my strong opinion as well. Almost everyone fixates on parsing.
I've been working on an interpreter/compiler full time since 2013, so coming up for eight years. I think I've spent maybe three or four days working on the parser. It's a vanishingly small part of the job.
I wish compiler courses started with an AST, and then parsing was a separate course unrelated to compilers.
Yes. A lot of one-semester compiler classes spend way too much time on lexing and parsing. Amusingly, their coverage of parsing is limited to LALR. It almost seems as if profs are thinking: if I had to suffer through LALR(1) table generation, then you're going to have to suffer through it as well.
By comparison, at Berkeley they recreate 61C Machine Structures every few years. Out goes the unimportant, in comes the new.
Lexing and parsing isn't even a vanishingly small part of the job. It's usually someone else's job. Someone who works on LLVM code generation for the XYZ processor is the wrong person to keep the Clang front end up to date with the latest C++ committee's standard iteration.
They should push DFAs and grammars into theory classes. That said, I do like the 2nd edition of the Dragon book, particularly the later chapters.
In my experience, compilers are taught over three semesters or so: lexing/parsing is one course (or rolled into a prerequisite), the standard compilers is the second, and advanced (graduate-level) compilers is the third course.
Berkeley just has CS164 [1] which is all in one. They haven't taught a graduate compiler section in years (decades?). Stanford is on the quarter system. They have an even more compressed single quarter compiler class, CS143 [2], but they also have a graduate advanced topics class, CS243 [3] as well. Moreover, CS243 doesn't require 143 as a prerequisite. As an aside, Berkeley is very strong at computer architecture (RISC, RISC-V, ...) while Stanford is very strong at compilers (Aiken, Lam, Ullman, Hennessy, ...)
Yes, that's probably the case because there is nice theory behind it (DFAs for lexing and grammars for parsing).
What I can really recommend (and others and I have already recommended it countless times) is TECS: The Elements of Computing Systems (also known as "Nand2Tetris"). Although it also covers other parts like the hardware stuff, it lets you implement
- an assembler
- a VM
- a compiler that compiles a simplified Java-like language to this VM
It takes a bottom-up approach, but if you go through all the projects you get a really good view of the big picture. However, as it does code generation via an intermediate VM language, it doesn't really touch register architectures as a compilation target. Anyway, I highly recommend that book.
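To illustrate why targeting a stack VM keeps code generation simple (and why it sidesteps register allocation entirely), here is a minimal sketch of compiling an expression AST to stack instructions. The AST shape and instruction names are my own illustration, not the actual Hack VM syntax from the book:

```python
# Walk the AST in post-order: emit children first, then the operator.
# Operands end up on the stack exactly when the operator needs them,
# so no registers ever have to be assigned.
def compile_expr(node, out):
    op, *args = node
    if op == "num":
        out.append(f"push {args[0]}")
    else:
        lhs, rhs = args
        compile_expr(lhs, out)
        compile_expr(rhs, out)
        out.append({"+": "add", "-": "sub", "*": "mul"}[op])

code = []
compile_expr(("+", ("num", 1), ("*", ("num", 2), ("num", 3))), code)
print(code)  # ['push 1', 'push 2', 'push 3', 'mul', 'add']
```

Compiling the same AST to a register machine would need to decide where each intermediate result lives, which is exactly the part the VM approach defers.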
The small amount of time spent on writing the parser might have something to do with how well-understood that area of computer science is. If nobody had ever written those books which were mostly about parsing, then maybe writing the parser would be the hardest part.
I don’t think so. Almost everyone abandons all the theory and just writes recursive descent. It's not the case that they're using it but have just come to see it as normal. They're not using it.
Yeah, I always write the grammar before writing the recursive descent parser. I can't imagine doing otherwise. It would be a mess.
I'd go further and say that for most people, the first time you read or write a context-free grammar is when you're trying to write a parser. So the theory is motivating the practice.
Parsing C++ is not that hard if you use a hand-rolled recursive descent parser. On the other hand, hacking some parser generator framework to be able to parse C++ is hard and IMO mostly pointless busywork.
That's not what 'hard' means in this context. Requiring template instantiation (or some other interplay I'm not aware of) doesn't make parsing harder - it just means it can't be done independently.
And what's more that kind of interplay is easier outside a parser framework!
Many languages are not very pure for parsing. If you're writing a parser using lex and bison, this means bending over backwards to make it work.
In the context of parsing, the situation where all the state machine/language theory really has its place is when you have some reason to care about compiler passes and to generate output while reading input. That is, when you are not constructing an intermediate in-memory representation of the whole compilation unit, presumably because you are running on a system with (very) limited memory. For programming language compilers this ceased to be an interesting limitation in the '80s.
I concur, an incredible book. Simple, neophyte-friendly, yet deep. Unlike the Dragon Book, which should have been called the Parsing Book, this one is actually focused on compiling and languages.
I've been picking up older (but still perfectly relevant) textbooks from half-price books for years now. I have a hardcover copy of the dragon book that I bought for $5.99.
I have written the Go arm64 and sparc64 compilers, as well as several other proprietary production compilers for several languages.
The Dragon Book is the best, for sure, and in my opinion essential material, though many people find its style too dry; it's basically a reference, not a tutorial. If you want something more hands-on, I recommend this book about LCC[1], a C compiler. It's not a generic book about writing compilers, but I believe it to be very useful anyway because it's very hands-on. The current version of LCC is here[2]. Not sure if you can regenerate the book out of the current code or not (LCC is written in literate programming style, and the book used to be just the literate part).
[1] A Retargetable C Compiler: Design and Implementation (Addison-Wesley, 1995, ISBN 0-8053-1670-1)
Yes, I found The Dragon Book to be a fundamental reference back in my university days studying compilers. Many sections of it are still relevant to this day.
Also, I think I might have seen your name before whilst looking at the Solaris port of Go years back. Mind you, I'm looking at bringing Golang to Haiku (0) with changes similar to those for Solaris and Windows.
What are you interested in learning? Compilers is quite a broad field, consider:
* Lexing/parsing
* Classical scalar optimizations
* Loop (vector) optimizations
* Code generation trifecta of scheduling, register allocation, and peepholes
* Dataflow and static analysis
* Interprocedural, link-time, and whole-program optimization
* JITs (to the extent that it differs from the above)
* Garbage collection
* A variety of advanced topics, such as decompilation, dynamic binary analysis/rewriting, superoptimization, formal verification, etc.
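To make the "dataflow and static analysis" item concrete, here is a toy backward liveness computation over straight-line three-address code. This is a hypothetical sketch of my own; real compilers iterate this to a fixpoint over a control-flow graph, and liveness is the core input to register allocation:

```python
# Each statement is (dest, operands). Walking backwards:
#   live_before = (live_after - {dest}) | set(operands)
def liveness(block):
    live = set()
    result = []
    for dest, operands in reversed(block):
        live.discard(dest)          # dest is defined here: killed going up
        live |= set(operands)       # operands are used here: live before
        result.append((dest, operands, frozenset(live)))
    return list(reversed(result))   # live-in set attached to each statement

block = [("t1", ["a", "b"]),   # t1 = a + b
         ("t2", ["t1", "c"]),  # t2 = t1 * c
         ("d",  ["t2"])]       # d  = t2
for dest, ops, live_in in liveness(block):
    print(dest, sorted(live_in))
```

Two variables whose live ranges never overlap can share a register, which is why this analysis sits right next to register allocation in the "trifecta" above.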
There's no resource that's going to cover everything well, and some of them (particularly the last topic) are going to be best covered by reading the research papers that come out at the premier compiler conferences such as PLDI. The Dragon book, for example, mostly covers parsing, classical scalar optimizations, and code generation, giving you very little to go on for interprocedural optimization. Allen & Kennedy's "Optimizing Compilers for Modern Architectures" is going to be a better resource for loop optimization.
If you want to get these textbooks for a cheaper price, try finding a nearby university, looking up when they offer a compiler course, and seeing if the bookstore has a used copy before the semester starts. I was lucky in that the compiler class had been cancelled immediately before the semester started, so the entire pallet of books was available, giving me my pick of the highest-quality used copies.
> What would be a good starting point to learn about writing compilers?
You also might find the LLVM/Clang Kaleidoscope tutorials interesting as a way to get started writing an LLVM-based compiler. Adrian Sampson also has a great article introducing LLVM for newcomers [1].
> You also might find the LLVM/Clang Kaleidoscope tutorials interesting as a way to get started writing an LLVM-based compiler.
Hmm, I've always assumed that the tutorials were for the people who already 'know' compilers enough, it's great that it's more accessible than I thought.
Definitely going to look through them. Great resources :-)
Pascal is the ideal language to learn compilers, because the language is so easy to parse. Nonetheless, it is still a real language, much closer to C than, let's say, Lisp.
I wasn’t trying to say Pascal is problematic beyond toolchain logistics. Pascal is no longer mainstream, and using it probably means using a text editor or an unfamiliar IDE. Compared to more familiar languages, you’re pretty much on your own to figure out tools unless you already use Vim or Emacs. Which is why I mentioned Vim and Emacs.
Now you have three problems...without regular expressions.
The book isn't for people who are able to write compilers.
When I discovered Hacker News, I'd never heard of Python. When I joined, I used Visual Studio Express, or rather Expresses because each was language locked. Obviously, I ran Windows. It was still four years until I bit the Emacs bullet.
Yeah, installing and running Pascal wouldn't be a problem for me now. But I remember a time when it would have been. I remember what that was like. Half of all programmers are below average. I'm certainly one of them.
So they are in for a surprise, because even on FOSS-friendly OSes programming languages aren't installed by default unless one chooses either a full installation or a development profile.
Being part of distros doesn't mean they are installed, especially on commercial UNIXes since compiler tools became a separate product.
Better to read the history of gcc: it was the commercialization of UNIX C compilers, introduced by Sun, that made people start paying attention to gcc. Until then it was a largely ignored effort.
Finally, I don't believe that anyone skillful enough to learn how to install Linux is unable to install a compiler.
I don't understand why you're comparing Pascal to vim and Emacs. You don't even need to use Pascal to create a compiler in Pascal, do you understand that? I once created a simple Pascal compiler using C, and it was a simple project, much easier than parsing any other language. You can use any language you want, even Python would work.
The best thing about compilers is that they have gentle ramps for learning. If you start simple enough, e.g. https://github.com/oriansj/M2-Planet, the rest just becomes "add one more thing"...
It gives some introduction to how a modern compiler like GCC is designed/structured. For context, the author, Siddhesh, is a current glibc maintainer.
Another option is to just find undergrad course materials for a compilers course and just go through it. Lecture slides can be great; sometimes they have other reading materials too. That should get you up to speed with the fundamentals fairly quickly, and most likely that's all you're looking for.
If after that you're decently good with those and want more, you can look at grad-level materials, though in my personal opinion the materials for those can be quite a bit more dry (and painful) if you try to self-study, at least unless you find a particularly good source.
Have you tried the book "Game Scripting Mastery"? It lays out the detailed steps to construct a C-like scripting language and a virtual machine that runs a customized bytecode.
In case you're worried this is outdated: note that it is for version 10 of GCC (the PDF is a pre-release), as stated on the front page of the document. I am curious how they plan to keep it updated; I could not easily find the repository for it.
GCC 10 is marked as "prerelease BETA" in my package manager, so I know that this PDF is up-to-date. I was looking to use it as a cheap way to keep track of new developments ;)
[0]: http://craftinginterpreters.com