Hacker News new | past | comments | ask | show | jobs | submit login
Compiling a Lisp (2020) (bernsteinbear.com)
114 points by swatson741 7 months ago | hide | past | favorite | 32 comments



As mentioned by the author, "An Incremental Approach to Compiler Construction" [0] is one of the best papers I've ever read and I highly recommend reading it and implementing even just the first few steps. It was the first resource I found that made writing compilers approachable. I made it through about half of the steps (about as many as the author of this post) before setting it aside for awhile. I'd like to pick it back up and keep going sometime. I also recommend writing the compiler in lisp if you already know lisp, rather than C. :)

[0] http://scheme2006.cs.uchicago.edu/11-ghuloum.pdf


https://mitpress.mit.edu/9780262047760/essentials-of-compila...

Essentials of Compilation by Siek (Racket and Python versions available) are book-length treatments of that paper's topic.


Ah, published in 2023! I missed this release. Ordered a copy. Thanks for the link!


Now I have to look into these as well for my library. Thanks for the heads up. :)

By the way, for the Python folks with a Pi around, this one is also an interesting read to dive into what it means to write a compiler

"Writing Interpreters and Compilers for the Raspberry Pi Using Python"

https://www.amazon.com/Writing-Interpreters-Compilers-Raspbe...


Whoa, what a treasure trove! https://bernsteinbear.com/pl-resources/


Related:

Compiling a Lisp to x86_64 (2020) - https://news.ycombinator.com/item?id=26631308 - March 2021 (31 comments)

Compiling a Lisp to x86-64: Labelled procedure calls - https://news.ycombinator.com/item?id=24931577 - Oct 2020 (13 comments)

Compiling a Lisp: Instruction encoding interlude - https://news.ycombinator.com/item?id=24822534 - Oct 2020 (2 comments)

Compiling a Lisp to x86-64: Heap allocation - https://news.ycombinator.com/item?id=24746587 - Oct 2020 (3 comments)

Compiling a Lisp to x86-64: If expressions - https://news.ycombinator.com/item?id=24711072 - Oct 2020 (4 comments)

Compiling a Lisp to x86-64: Let expressions - https://news.ycombinator.com/item?id=24652842 - Oct 2020 (39 comments)

Compiling a Lisp: Reader - https://news.ycombinator.com/item?id=24580453 - Sept 2020 (27 comments)

Compiling a Lisp to x86-64: Primitive binary functions - https://news.ycombinator.com/item?id=24490108 - Sept 2020 (1 comment)

Compiling a Lisp to x86-64: primitive functions - https://news.ycombinator.com/item?id=24386826 - Sept 2020 (29 comments)


I'm a big LISP fan. Clojure syntax is my favorite, but Common lisp's ability to compile to native AND be interpreted is so damn impressive.

Over the years I've toyed with running uLisp running on microcontrollers. There has been some work to add macros, and the running of assembly...so of course the next step would be to write a small compiler that can output RISKV and Arm thumb assembler so that I can reproduce the Genera LISP machine on and esp32 or teensy 4.0 driven handheld computer. That's the dream at least.


There's an effort afoot to bring this to the Clojure world, with the lovely name jank: https://jank-lang.org


I guess to some extent there is also Ferret (Clojure to C++, https://ferret-lang.org/) and GraalVM if you want to compile Clojure.

Thanks for the tip about jank!


His interpreter series is “sadly” in ocaml. See https://bernsteinbear.com/blog/lisp/00_fundamentals/ hard to follow using 2 unfamiliar languages.


Lol they were written 4 years apart and don't even try to implement the same Lisp. Two different projects for two different learning experiences


It's always amazes me how common lisp is so much more dynamic than, say, python, and is still compiled down to native code, and has running speed in the same ballpark as java/go/c#.

Unfortunately it's not a simple language, and has much historical baggage (the standard has mandated Roman literals in formating, really guys?..)


I dont think it is uniquely difficult compared to any other programming language. If anyone reading this has given up multiple times trying to learn lisp i encourage to keep trying. Its not really that different than python or javascript (Here me out). At first i genuinely couldnt understand it at all despite being well versed in tons of other languages, but after a few weeks, one day my brain suddenly could read it as if it was c++ or python. It was an instantanious switch.

There is a sort of shape translation, like an affine transformation between lisp code and algol family stuff like c, etc. Once your brain sees the shapes of the paragraphs, and the function and class def shapes, you will not feel lisp is really different from the other languages except in form. ( excluding the macros. those are truly unique)


> like an affine transformation between lisp code and algol family stuff like c

I wonder how far one might go introducing common lisp with parallel examples in javascript and python -- something like a Rosetta Stone approach, also likening ')' and '}' and <dedent>.


Very far. actually thats probably a really good way to do it.

lisp for python devs, amd its just basic python shit on the left page, and the clisp version on the right page.

We do the same thing for human language books. Ive got a copy of the little prince where the left is in english and the right is in japanese. the bookstore is full of these.


Well, the funny thing is that for base Common Lisp, it's less dynamic than you think.

The functions and symbols in the core packages (e.g. COMMON-LISP package) are considered fixed and immutable. For example, out of the box, the compiler can assume that things like '+' aren't going to be redefined behind its back. It's assumptions like that can lead to very efficient code. But, at the same time, the core lisp language is actually not very dynamic, its more like your typical, compiled language in that sense.

In contrast to say, Scheme, which has no presumptions. Out of the box, naive Scheme compilers can not make such assumptions. So, if you have something like:

    (set! a (+ x (- y (+ z w))))
The compiler actually has to look up (dispatch) '+' twice! Did '-' change the definition of '+'? Maybe...you don't know. So you can't (again naively) inline anything (not even counting the issues with numeric tower and types of the variables). So, what would ideally be a couple of simple machine instructions turns in to a huge endeavor.


It is true that in Scheme you can shadow built-ins if you want, but a compiler would have to be extremely naive, perhaps lacking support for macros entirely, to not recognize primitive `+` from a user defined `+`. One of the very first transformations applied to the user program would be alpha conversion, where all user defined variables are replaced with program-unique names. At that point, the compiler would be absolutely sure that `+` is the primitive addition operator and compile it directly as an add primcall.


I believe it is legal to actually reassign the global builtins, not just shadow them. Proving that a program doesn't do this is difficult to impossible, and I think some schemes give you an option to tell the compiler you won't do it. Chez Scheme has a section in their manual recommending pulling top level declarations into lets that it can analyze locally in order to optimize them.


Scheme always felt less dynamic than Common Lisp to me, with less repl-messing capability.

Both Scheme and CL let's you completely shadow any symbol in a given compilation unit, however.


If Common Lisp allowed additional methods to be added to + (if they were generic functions, and if the new methods did not override the standard ones) without changing the existing methods, it would be possible to be just as efficient. That's because one could just dispatch as now and instead of erroring if nothing applied, instead check the new methods.


for those wanting generic +, equality and comparison in CL, there's a nice library: https://alex-gutev.github.io/generic-cl/


What would be nice is a way of having that generic +, but making it as efficient as builtin + on the standard numeric types. CL implementations do not currently have that capability (except perhaps if they make builtin + rather slow).


I'm guessing there are very few people out there that think of programs as that dynamic? Especially considering that the message you are quoting started from python, how does it compare there?

I'm also curious to know why I would want something that was so dynamic that + could be redefined on me.


One that comes immediately to mind is that you might want + to handle vectors as well as scalars. Something like:

    (define scalar-sum +)

    (define (+ . arguments)
        (if (vectors? 'arguments)
            (vector-sum 'arguments)
            (apply scalar-sum arguments)))


    vectors? and vector-sum not shown for clarity.

Edit: or maybe you want any sum in excess of (say) $5,000 to alert a manager.

    (define old-plus +)
    (define (+ . args)
      (let ((value (apply old-plus args)))
        (if (> value 5000)
            (begin
              (alert-manager)
              value)
            value)))
The nice thing here is that this Just Works anywhere + appears in the code. You don't need to put the conditional and the call to alert-manager everywhere you're calculating a sum. Of course, in a real implementation you'd want to pass stuff to alert-manager like what module you're in, the transaction number, etc.


Fair, I should have tried to use my imagination there. Basically operator overloading? I forgot that generic functions are different beasts, and I've rarely had need for this, myself.


> For example, out of the box, the compiler can assume that things like '+' aren't going to be redefined behind its back.

Because it is undefined behavior to do so.


> the standard has mandated Roman literals in formating

By the way, in case anyone is wondering what this is, it just means that you can use the FORMAT function to output Roman numbers.

  (format t "~@r" 4)
  ;; => IV
You can also do:

  (format t "~r" 4)
  ;; => four
and just interpolating the number would use "d":

  (format t "The number is: ~d" 4)
  ;; => The number is: 4


As far as I understand, the SBCL compiles each function separately, and can fall back to very slow codepaths in doing so. Getting peak performance out of Common Lisp takes some effort.


Wouldn't that just be while developing using the repl? I would think compile-file would compile the whole file, not just 1 function at a time ...?


> historical baggage (the standard has mandated Roman literals in formating, really guys?..)

Come now, it's not _that_ old :)

It's just a bit of fun, not really historical baggage...


I did a code golf competition in college once where we could pick whatever language we wanted per problem. One problem was to format a number as a Roman numeral, and of course I won that one by just calling the built in Lisp function. Good times.


I could have sworn it was bernstainbear.com...




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: