Thought Experiment: An Introductory Compilers Class (semantic-domain.blogspot.com)
96 points by ingve on Feb 17, 2020 | 22 comments



I disagree a bit with the premise. I feel that the most important part of a compilers course is communicating that compilers are not magic, that none of this is magic. Good developers need to be able to see through their abstractions when necessary, and it's too easy to treat compilers as a black box.

As a slightly different concern, I think the curriculum above runs the risk of giving the impression that even if compilers aren't magic, they definitely need a lot of heavy duty abstract math. I'm not in favor of this either.

With that in mind, my ideal compilers course would probably take an intuitive approach to lexing, recursive descent parsing, and extremely simple tree-walking, stack-based code generation. Keep calm, it's all just data and basic algorithms.
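
For illustration, here's a rough sketch of that whole pipeline in Python (my own toy example, all names made up): a regex lexer, a recursive descent parser for arithmetic, and a post-order tree walk that emits stack-machine code.

    import re

    def lex(src):
        # Tokens: integers, operators, parentheses; whitespace is skipped.
        return re.findall(r"\d+|[-+*/()]", src)

    def parse(tokens):
        # Grammar: expr   := term (('+'|'-') term)*
        #          term   := factor (('*'|'/') factor)*
        #          factor := NUMBER | '(' expr ')'
        pos = [0]
        def peek():
            return tokens[pos[0]] if pos[0] < len(tokens) else None
        def eat():
            tok = tokens[pos[0]]
            pos[0] += 1
            return tok
        def factor():
            if peek() == "(":
                eat()
                node = expr()
                assert eat() == ")", "expected ')'"
                return node
            return ("num", int(eat()))
        def term():
            node = factor()
            while peek() in ("*", "/"):
                node = (eat(), node, factor())
            return node
        def expr():
            node = term()
            while peek() in ("+", "-"):
                node = (eat(), node, term())
            return node
        return expr()

    def gen(node, out):
        # Post-order walk: children first, then the operator -- stack code.
        if node[0] == "num":
            out.append("push %d" % node[1])
        else:
            op, left, right = node
            gen(left, out)
            gen(right, out)
            out.append({"+": "add", "-": "sub", "*": "mul", "/": "div"}[op])

    code = []
    gen(parse(lex("1 + 2 * (3 - 4)")), code)
    print("\n".join(code))  # push 1, push 2, push 3, push 4, sub, mul, add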

My college education had the effect of lowering my expectations of my fellow students, so a more sophisticated course may be appropriate in some cases. YMMV.


Personally, I feel that a compilers course should actually be about learning how to build a compiler. After all, the only reason to take a compiler course is because you want to--it's not going to be on the necessary track for a degree. The attitude of "you're not going to remember anything about compilers anyway, so let's make this an advanced math class" is quite frankly insulting, especially to people who actually want to take a compilers course to learn about compilers.


I mean, I wouldn't object to having compilers be mandatory. Out of the other side of my mouth, I like to say that a compiler has great properties for teaching actual software engineering, e.g. managing complexity, dealing with changes (hey kids, we're adding a new syntax feature! Now we're adding closures!), while still teaching you that compilers aren't magic... so that's another approach I would endorse. At that level, you actually would want some formality, but you could add it gradually.


I think it is required at some institutions (pretty sure it was at CMU when I was there), because not only is it useful to understand such a fundamental daily tool at a deeper level, but as the post says:

>everything in compilers is packed to the gills with absolutely gorgeous algorithms

So you get to see practical usage of some advanced things rather than just abstractions.


> it's not going to be on the necessary track for a degree

Compilers was a required 2-class series for my CS degree.


> After all, the only reason to take a compiler course is because you want to--it's not going to be on the necessary track for a degree.

I think that depends on the program?


"I disagree a bit with the premise. I feel that the most important part of a compilers course is communicating that compilers are not magic, that none of this is magic"

It depends what you mean by "magic". I think compilers are one of the hardest things for a programmer to learn naively. I've had co-workers who were semi-self-taught and who simply didn't get the idea of an abstract language - it really is a mathematical construction that's somewhat difficult (making a language into an "object" is generally a mistake). For example, it's much more coherent to write a parser that checks whether a string X properly specifies a file than to just start writing loops and checking values (of course, you really want to just ask the system, but that's a different point).
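
A toy example of what I mean (mine, not anything standard): the grammar-first version of "does this string specify a dotted module path?" is a direct transcription of the grammar, where the hand-rolled loop version tends to sprout edge-case bugs.

    import re

    # Grammar: path := name ('.' name)* ; name := [A-Za-z_][A-Za-z0-9_]*
    NAME = r"[A-Za-z_][A-Za-z0-9_]*"
    PATH = re.compile(r"%s(\.%s)*\Z" % (NAME, NAME))

    def is_valid_path(s):
        # The regex is read straight off the grammar above.
        return PATH.match(s) is not None

    assert is_valid_path("pkg.module.func")
    assert not is_valid_path("pkg..func")  # empty component
    assert not is_valid_path(".pkg")       # leading dot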

A smart-ish person can just wing it for a variety of programming tasks, say simple string processing or computer games, but there are definitely places in compiler construction where you have to know what you're doing, where you have to learn an abstract idea and not just a bunch of specifics.

What I'd like, in a sense, is a map showing which parts of this process have nothing standard about them and which parts are all pretty much the same, because most compiler-construction texts don't seem to mark out the territory in anything like this way.


I agree strongly about the "not magic" part, and think that there's no better way to show that than to have the goal of the course be to create a self-compiling compiler.

As a matter of fact, a whole course could be devoted to the study of this masterpiece: https://github.com/rswier/c4


> Keep calm, it's all just data and basic algorithms

This is a good approach for all introductory systems courses. It's also a good approach for most real-world systems.


I'd agree with this general take on things. While I've never actually built a 'real' compiler and probably never will, even a shallow knowledge of lexing and tree-walking has been majorly useful several times in my career, even though I work mostly in frontend web development, since it makes for highly efficient DSLs and configuration tree transformations.
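
For example (a sketch of mine, not from any real codebase): a few lines of tree-walking cover "expand ${VAR} in every string leaf of a config tree", which is exactly the kind of transformation that comes up in frontend tooling.

    import os

    def expand(node):
        # Recursively walk a JSON-style config tree; rewrite string leaves.
        if isinstance(node, dict):
            return {k: expand(v) for k, v in node.items()}
        if isinstance(node, list):
            return [expand(v) for v in node]
        if isinstance(node, str):
            return os.path.expandvars(node)  # expands $VAR and ${VAR}
        return node

    config = {"paths": {"home": "${HOME}/app", "scratch": ["${TMPDIR}/a"]}}
    print(expand(config))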


I tried to add a comment on the source page, but that's not possible if you don't have a Google account. So I'll add it here.

There are two great resources for an introduction to compilers. First, "The Unix Programming Environment" has a chapter on programming tools which builds the equivalent of the Unix 'dc' desk calculator. This includes variables and procedures, so it's non-trivial.

The second one is "Let's Build a Compiler", first posted to USENET by (as I recall) Jack Crenshaw. This is notable because it builds a recursive-descent compiler, rather than the usual LALR ones.

Considering the intent was a course for beginners, I think much of what that page wants to cover is more suitable for a _second_ course. If you go through the UPE book first, you'll be in a much better position to slog through the Dragon book and get all those low-level things.


Introductory compilers classes should drop LR parsing as a topic. It takes a huge amount of time to teach, and while it is mathematically elegant, it isn't really a good approach to writing a compiler front end that produces high-quality error messages. Teach recursive descent and move on to more interesting topics.
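
To illustrate the error-message point with a made-up sketch (a hand-written helper of mine, not any real parser's API): because the call site knows exactly what it expected and why, it can attach a precise, contextual hint, which is much harder to retrofit onto a table-driven LR parser's generic error states.

    def expect(tokens, pos, kind, hint=""):
        # tokens are (kind, text) pairs; fail with a targeted message.
        if pos >= len(tokens) or tokens[pos][0] != kind:
            found = tokens[pos][1] if pos < len(tokens) else "end of input"
            msg = "expected %s at token %d, found %r" % (kind, pos, found)
            raise SyntaxError(msg + ((" (" + hint + ")") if hint else ""))
        return tokens[pos], pos + 1

    # "let x 5" -- the '=' is missing.
    tokens = [("let", "let"), ("ident", "x"), ("num", "5")]
    _, p = expect(tokens, 0, "let")
    _, p = expect(tokens, p, "ident")
    try:
        expect(tokens, p, "eq", hint="every 'let' needs an '=' before its value")
    except SyntaxError as e:
        print(e)  # expected eq at token 2, found '5' (every 'let' needs ...)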


I'd go even further. Minimize the exposure time to parsers.

We started my undergrad compilers class with several weeks on parsing. I think it's by far the hardest aspect in all of compilerdom -- and as a Lisp programmer, I didn't understand the need to be tortured that way.

I do understand on a theory level why parsers would get put first in a compiler class, but they're a tremendous barrier to entry and just make the whole subject seem inaccessible. There's lots of other ways I can think of which would make it easier for newcomers to get some traction on the subject of compilers.


I don't agree that parsing is the hardest aspect in all of compilerdom. It's more that parsing is the least interesting part of what makes a compiler. Perhaps, as you allude to, use Lisp or S-expressions as the frontend language; then you don't need much of a parser and can get straight to the more interesting bits.
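
For what it's worth, a complete S-expression reader really is tiny -- here's a rough Python sketch (mine, assuming atoms are just strings):

    def read_sexp(src):
        # Pad parens with spaces, split, then build nested lists recursively.
        tokens = src.replace("(", " ( ").replace(")", " ) ").split()
        def build(i):
            if tokens[i] == "(":
                items, i = [], i + 1
                while tokens[i] != ")":
                    item, i = build(i)
                    items.append(item)
                return items, i + 1
            return tokens[i], i + 1
        sexp, _ = build(0)
        return sexp

    print(read_sexp("(define (square x) (* x x))"))
    # ['define', ['square', 'x'], ['*', 'x', 'x']]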


Let's go another step. Move parsing into a mandatory class on automata theory, where you can properly dig into the machine models underlying different parsing algorithms.
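
To make the machine-model connection concrete, a toy sketch (mine): the lexer's idea of a number literal is just a DFA whose transition table you can run directly.

    # DFA for unsigned decimals: accepts "12" and "3.14", rejects "3." and ".5".
    # States: 0 = start, 1 = integer part, 2 = just saw '.', 3 = fraction.
    DELTA = {
        (0, "digit"): 1, (1, "digit"): 1, (1, "dot"): 2,
        (2, "digit"): 3, (3, "digit"): 3,
    }
    ACCEPTING = {1, 3}

    def accepts(s):
        state = 0
        for ch in s:
            kind = "digit" if ch.isdigit() else ("dot" if ch == "." else "other")
            state = DELTA.get((state, kind))
            if state is None:
                return False  # no transition: reject immediately
        return state in ACCEPTING

    assert accepts("12") and accepts("3.14")
    assert not accepts("3.") and not accepts(".5") and not accepts("1.2.3")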


On the other hand, LR parsers are fast. That's why golang originally used one, although the Go compiler has since switched to a hand-written recursive-descent parser.


This is more like "Compilers for functional languages I", to be followed by about two more courses.

Non-optimizing compilers for simple imperative languages are straightforward. There's a front end, which turns

    a = b + c;
into

    b c + a =
which can be represented as a tree. Then, for each operator, you generate code. So you generate something like

    load a0,b     ; a0 := b
    add a0,c      ; a0 := a0 + c
    store a0,a    ; a := a0
Then you show how that can be generalized to cover functions, arrays, loops, etc. That's Compilers 101, Wirth's classic book on Pascal.

This article takes a completely different approach. All functional, all the time. Not sure whether this is good or bad, but it's definitely different. Is what this guy is promoting mainstream in advanced compiler work today, or is he off in his own world?

One issue with functional everything is that naive functional compilation is very inefficient. If you write

    f(g(a)) 
where a is some kind of big collection, and g produces a similar collection as output, then you have an intermediate variable which is itself a big collection. Much of functional programming optimization is about trying not to generate big intermediate values, but converting the problem to a pipeline instead.
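
In Python terms (my illustration, not from the article): the naive composition materializes the whole intermediate collection, while the fused pipeline keeps one element in flight at a time.

    # Naive: g builds a full intermediate list before f ever runs.
    def g(xs):
        return [x * 2 for x in xs]   # materializes every element at once

    def f(ys):
        return sum(ys)

    big = range(10_000_000)
    naive = f(g(big))                # peak memory ~ one whole copy of the data

    # Fused: a generator pipeline -- what deforestation/fusion aims to produce.
    fused = sum(x * 2 for x in big)  # O(1) extra space
    assert naive == fused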


About the intermediate-values paragraph: R and Pandas both have that issue. (Although I'm not sure if it's a bottleneck in typical code, since there are other problems too.)

But Julia and the newer Weld compiler apparently have loop fusion:

https://julialang.org/blog/2017/01/moredots/

https://dawn.cs.stanford.edu/2018/07/30/weldopt/


My favorite introductory compiler book is Niklaus Wirth's "Compiler Construction", which implements a compiler end-to-end with full source code, in a bit more than 100 pages: https://inf.ethz.ch/personal/wirth/CompilerConstruction/inde...


Rather than teach compiler construction, maybe we should instead teach language design with an added implementation component. Compiler construction on its own seems like too dry a topic for most students, and I say this as a PL PhD. Language design might attract more interest, and it provides more context on why you would want to write a compiler in the first place.


That's interesting. What would you teach in the language design part of the course? By the time compilers are taught (2nd or 3rd year undergraduate), few students understand advanced language concepts, whether call/cc, GADTs, higher-kinded types, type-based lifetime tracking, or message-passing vs shared-memory parallelism, let alone the tradeoffs involved in the design space.

Note also that there is no scientific consensus on what makes for a good programming language.


I feel like this might still be too high level. A better approach may be to teach by the example of writing a basic FORTH environment from machine instructions, then basic helpers written in the language, and working up from there.

Various optimizations and refinements can be added in once there's something to optimize.
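
As a taste of how small that core is, here's a toy sketch of mine (in Python rather than machine instructions, so illustrative only): a FORTH-style outer interpreter is just a stack, a dictionary of words, and a token loop.

    stack = []
    words = {
        "+":   lambda: stack.append(stack.pop() + stack.pop()),
        "*":   lambda: stack.append(stack.pop() * stack.pop()),
        "dup": lambda: stack.append(stack[-1]),
        ".":   lambda: print(stack.pop()),
    }

    def run(src):
        # The classic outer loop: known word? execute it. Else push a number.
        for tok in src.split():
            if tok in words:
                words[tok]()
            else:
                stack.append(int(tok))

    run("3 dup * 4 + .")  # prints 13: square 3, then add 4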

Edit: I was probably thinking of this version that I likely read a couple years ago:

https://news.ycombinator.com/item?id=13505285

Quoting their post

"" rwmj on Jan 28, 2017 [-]

This is a how-to-write-a-FORTH tutorial that I wrote a few years ago. It's particularly nice that you get to see how various control structures are implemented, like IF, CASE and even comments!

Part 1: http://git.annexia.org/?p=jonesforth.git;a=blob;f=jonesforth...

Part 2: http://git.annexia.org/?p=jonesforth.git;a=blob;f=jonesforth...

(github mirror: https://github.com/AlexandreAbreu/jonesforth/blob/master/jon... https://github.com/AlexandreAbreu/jonesforth/blob/master/jon... )

Previous HN comments: https://news.ycombinator.com/item?id=10187248

""

The immediate reply covers my feelings for how the __introductory__ class should proceed: minimal assembly and maximum programmer clarity.

"" rsaarelm on Jan 28, 2017 [-]

Working through Jones Forth got me up to speed on both understanding how Forth works and getting my hands dirty with some practical assembly coding.

Once I got a better idea of Forth, I also realized that Jones stays in assembly for rather long. He builds up the entire interpreter from raw assembly words, because you need the interpreter to start parsing textual Forth source. But if you could somehow write Forth before the interpreter is put together, it would be quite natural to switch to Forth much earlier. And it turns out you can do that. Jones even defines the `defword` macro. But for some reason he makes very little use of it. Rewriting most of the code leading up to INTERPRET using defword was a fun exercise.

The next step would've been making the whole system self-bootstrapping by rewriting the assembler in Forth, but x86 machine code generation is hairy enough that I bailed out at this point. ""



