"But if you try to implement a Lisp interpreter in a low-level language by translating that metacircular interpreter into it (as I did later that year) you run into a problem. The metacircular interpreter glosses over a large number of things that turn out to be nontrivial to implement — indeed, about half of the code is devoted to things outside of the scope of the metacircular interpreter. Here’s a list of issues that the Lisp 1.5 metacircular interpreter neglects, some semantic and some merely implementational:..."
Amen! Those metacircular interpreters are neat, and very useful from a learning-programming-languages standpoint, but they gloss over about 90% of the details and pain of language implementation.
I was really looking forward to Paul Wilson's An Introduction to Scheme and its Implementation (http://www.aboulsaadat.com/wael/teaching/324/exdocs/schintro...), but at the time it was really incomplete. Looks like there's more now, but still no garbage collection as far as I can see.
It only depends on about 350 lines of LISP code for standard functions like CADR, APPEND, READ, etc., plus about 400 lines of C for low-level stuff (variable binding and garbage collection).
The 350 lines of LISP standard functions even contain low-level things like interning of symbols:
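For flavor, interning in Lisp terms looks something like this (an illustrative sketch with made-up names like *oblist* and my-intern, not the actual code from the book): look the name up in the table of known symbols, and create and record a new symbol only on first sight, so that symbols with the same name are EQ afterwards.

    (defvar *oblist* '())   ; table of all known symbols: (name . symbol)

    (defun my-intern (name)
      (let ((found (assoc name *oblist*)))
        (if found
            (cdr found)                    ; already known: reuse it
          (let ((sym (make-symbol name)))  ; first sight: make a fresh one
            (push (cons name sym) *oblist*)
            sym))))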
This is very interesting, thank you! At first glance, it does perhaps gloss over a few things still: instruction encoding, jump-target backpatching (or its moral equivalent), the system call interface, the fetch-execute cycle, and a debugger; but not, for example, function entry and exit, or, as you point out, garbage collection or the reader. Also, on p. 16, there's an error seemingly due to the author not knowing about the Y-combinator:
> The LABEL special form, as implemented by EVAL, was a clever construct for defining anonymous recursive functions. Anonymous functions ("lambda functions") cannot usually recurse (apply themselves), just because they are anonymous, so there is no name that can be used to invoke them.
In case you are still reading this: the book is about ancient LISP (LISP 1.0, 1.5, PDP-6 LISP), which was dynamically scoped. The Y combinator requires lexical scoping, though.
Hmm, that's a good point. The general-case solution they adopted for not having lexical scoping at the time was FUNARG, but of course that's just as much of a kludge as LABEL (which was, yes, in 1959 Lisp, before FUNARG). There are cheats like this, of course:
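Something along these lines (my reconstruction, assuming QUASIQUOTE-Y simply builds the self-application with quasiquote: it takes a λ-expression that expects itself as its first argument and wraps it so that it gets passed to itself):

    (defun quasiquote-y (f)
      ;; F is a (quoted) two-argument λ-expression whose first
      ;; argument is F itself; return a one-argument "function"
      ;; (a plain list!) that applies F to itself.
      `(lambda (n) (funcall ',f ',f n)))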
Perhaps surprisingly, this does work in dynamically-scoped Lisp-2s like Elisp (which, unlike Lisp 1.5, doesn't have FUNARG), even for λ-expressions:
    (funcall
     (quasiquote-y '(lambda (f n)
                      (if (< n 2) 1
                        (* n (funcall f f (- n 1))))))
     4)
Defining a version of QUASIQUOTE-Y that works for the following subtly different LAMBDA form has so far escaped me but seems like it ought to be feasible:
    (lambda (f n)
      (if (< n 2) 1
        (* n (funcall f (- n 1)))))
Now, of course, in modern Lisps, we consider it a barbaric practice to treat a list beginning with the symbol LAMBDA as a function as such, as opposed to a piece of code which could be EVALed to produce a function. But this ancient abomination remains honored in the dark precincts of Elisp, and of course was entirely supported in Lisp 1.5 (though doing the equivalent of QUASIQUOTE would have been considerably more awkward; FUNARG was the favored design error of the day) and McCarthy 1959.
ftp://publications.ai.mit.edu/ai-publications/pdf/AIM-199.pdf ("The function of FUNCTION") relates the 1970 perspective on this problem.
The FUNARG device was defined in the LISP 1.5 manual, but never implemented. Later versions of MACLISP did a limited version of FUNARG (called FUNCTION*) that solved the downward, but not the upward FUNARG problem.
Then dynamic scoping interferes badly with tail call elimination (TCE), because bindings do not have indefinite extent and need to be removed at some point. The exact point of removal leads to all kinds of interesting problems. The short version is, you can have TCE or correct bindings, but not both. So a dynamically scoped Y will kind of "work", but eventually either cause a FUNARG problem or overflow the stack.
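Concretely, the dilemma looks like this (a sketch with made-up names, in dynamic-binding Elisp):

    (defvar x)                  ; X is dynamically scoped

    (defun outer (x) (inner))   ; the call to INNER is in tail position
    (defun inner () x)          ; reads the current dynamic binding of X

    (outer 42) ;=> 42

    ;; To eliminate OUTER's frame before calling INNER, you must decide
    ;; what to do with its binding of X: unbind first, and INNER sees
    ;; the wrong X (a FUNARG-style bug); unbind only after INNER
    ;; returns, and the "tail" call still consumes stack.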
Really? I had no idea! I thought FUNARG did get implemented (Moses implies on p. 13 that it was implemented by 1967, which is later than the Lisp 1.5 Manual (1962)), but maybe you're right that it only got implemented for the downward funarg problem, which is sort of the less interesting one — with dynamic binding, in the more common cases, you could solve the downward funarg problem by naming your variables things like CARBONATE-MAYONNAISE (thanks qq) instead of X, similar to the variable capture problem in Common Lisp ("unhygienic") macros.
Moses points out in his paper that none of this is particularly difficult if you use deep binding ("deep access", as he calls it) rather than shallow binding ("shallow access"): you "just" "abandon a stack and use a structure which is quite close to an alist", but of course that means that now all your variable lookups take O(ASSQ) time, as he sort of points out in footnote 4 on p. 12.
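That is, with deep access every variable reference is a linear scan of an alist-shaped environment, something like this (my sketch, not Moses's code):

    (defun deep-lookup (var env)
      ;; ENV is an alist of (name . value); inner bindings shadow
      ;; outer ones simply by sitting closer to the front.
      (let ((binding (assq var env)))
        (if binding (cdr binding)
          (error "Unbound variable: %s" var))))

    (deep-lookup 'y '((x . 1) (y . 2) (y . 0))) ;=> 2, the innermost Y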
It's interesting that (though he cites Landin!) the idea of lexical scoping seems not to have entered his mind; he seems to be confusing it with what he calls "evaluating functions against the binding environment", which is similar but not quite the same.
https://openlibra.com/en/book/the-funarg-problem-explained is Weizenbaum's "unpublished" 1968 memo "The FUNARG problem explained" that Moses was working from. On skimming the memo, I think Weizenbaum did understand things in the modern sense and in particular solved it in a way that permitted proper garbage collection — but in SLIP-OPL, not LISP. Weizenbaum says (p. 55):
> Our solution to these problems is mainly that of providing a tree structured symbol table in place of the more usual stack structured symbol table. That solution is, of course, not original with us. It shows up, for example, in some versions of LISP and in Landin's SECD machine.
And at the opening of his memo he claims that "the original LISP implementation solved that problem," which I take to mean the problem as he understood it; in particular, he critiques Bobrow's 1967 paper on FUNARG with shallow binding: "Still, years after that implementation, the problem remains ill understood. Bobrow, for example, published a faulty solution for it only recently."
Of course, the fact that Weizenbaum thinks the original LISP "solved that problem" doesn't strongly imply that it solved it correctly, much less with TCE.
He doesn't seem to mention TCE, unless I missed it. I think the modern understanding of the importance of TCE dates from Scheme and the whole "ΛTU" series of AI memos — at the time I think it was considered merely an optimization. But you're certainly correct that if you want to loop indefinitely with Y you need precise TCE.
Interestingly the Bobrow paper seems to have been published a year earlier as a BBN Scientific Report: https://apps.dtic.mil/dtic/tr/fulltext/u2/647601.pdf but I haven't compared this version to the ACM version. On p.15 he describes in a footnote the degree to which his approach, which definitely will not work for upwards funargs, does work in BBN LISP:
> *With appropriate passing down of pushdown list [call stack] pointers we can even achieve the same effect as the standard FUNARG device on the 7094 for functional arguments. This FUNARG device preserves local context well enough to handle the Knuth's [sic] "Man and Boy" compiler problem (Algol Bulletin 18.1).
I admit I'm certainly no expert on late-1960s LISP code; my acquaintance is mostly limited to what appears in popular AI memos and HOPL papers.
Early LISP history is a bit blurry, and ancient implementations are full of surprises! Of course, the FUNARG problem was solved in theory in the LISP 1.5 manual, but, as you noted, at the cost of linear symbol lookup. This is probably the reason why this solution never made it into the actual implementation. Lexical scoping in combination with shallow binding did not even appear on the horizon before the LTU papers were published. Program performance was important on machines operating in the 200k instructions per second range, and lexical scoping appeared to be expensive (it is, compared to dynamic scoping).
TCE was also a non-issue before the LTU papers, because it was (1) not necessary, since you could just use a tag body, and (2) impossible to get right with dynamic scoping. Either you unbind after tail-calling and get no TCE, or you unbind before tail-calling and get a FUNARG problem.
If you want to experiment with LISP 1.5, there are 709 emulators out there, complete with original LISP tapes. E.g.: http://www.cozx.com/dpitts/ibm7090.html
Be prepared for some weirdness, though! Programs are entered as

    CONS (FOO BAR)

meaning (CONS (QUOTE FOO) (QUOTE BAR)), there are no comments, and in-situ lambda functions do not work. For instance,
> If you want to counter Ken Thompson's "Trusting Trust" attack, you would want to start with a minimal compiler on a minimal chip; StoneKnifeForth might be a good approach.
Some people have been working on fairly good Forths for different, minimal hardware including the Parallax boards.
The GA144 seems more like a replacement for a CPLD+MCU, so I wouldn't call it "high performance" in the sense that a mid ranged arm board will outperform it in terms of raw throughput for most tasks.
Well, I think an FPGA might be a closer equivalent. The GA144 can do something like 100 billion 18-bit integer additions per second, or two billion 17×17-bit integer multiplications (my rough notes on http://www.greenarraychips.com/home/documents/greg/WP003-100... are where I'm getting that). With a CPLD like the US$1.00 Altera 5M40ZE64C5N you might be able to get 100 million 16-bit integer adds per second, three orders of magnitude slower — and CPLDs don't usually have multipliers on them, so multiplication will be slower by the same amount.
What do you mean by "a mid ranged arm board"? This Blue Pill is a US$2 ARM board, and a MacBook Pro is a US$2048 ARM board, so maybe something like the geometric mean, US$64? Maybe like https://www.digikey.com/en/products/detail/stmicroelectronic... a US$54 STM32F750N8† eval board?
Well, that has 64 KB of Flash and 340 KB of RAM (an order of magnitude more than the 9216 18-bit words on the GA144), and a screen, and a lot of I/Os and integrated peripherals, and the MCU has a bunch of fixed-function blocks like AES and SHA-2. But "in terms of raw throughput for most tasks" I don't think it's anywhere close: it's just a 216-MHz Cortex-M7, which is a 32-bit 2.1DMIPS/MHz dual-issue core with a 6-stage pipeline, "DSP instructions", and floating point. The "DSP instructions" turn out to be multiply-accumulate and integer SIMD, so you can do four 16-bit multiply-accumulates per cycle instead of one, if the pipeline doesn't stall. Or eight 8-bit multiply-accumulates.
That adds up to about 0.9 billion 16-bit multiplies per second, which is indeed in the same league as the GA144. But it's also only 0.9 billion 16-bit adds per second, or 1.8 billion 8-bit adds, which is short of the GA144's 100 billion adds per second by about a factor of 64! While using an order of magnitude more power (thus "GreenArrays", I suppose.) And it's a very inflexible 0.9 billion multiplies — it's hard to not stall the pipeline and to ensure that you're not accidentally serializing on a single functional unit. The GA144 cores don't have pipelines — but you'll probably build pipelines when you floorplan your app on it.
I mean, I imagine the AES core on the STM32 is actually faster than AES on the GA144, which takes 38 μs per 128-bit block using 17 of the 144 processors.※ That's 420 kilobytes per second, or 3.3 megabytes per second if you replicate it 8 times. I don't know how slow the STM32's AES hardware is but I bet it takes less than 70 clock cycles per byte. That's a bit faster than (one core of) this laptop's CPU, an i7-3840QM at 2.8 GHz, with no AES-NI, which gets 2.1 megabytes per second. But it's not very flexible. You can switch those 17 cores to run a different application in about a microsecond; the AES hardware is forever AES hardware.
With floating-point math I think the comparison might be arguable. The GA144 has to implement floating point in software, like we did on our 486es, which usually imposes a speed penalty on the order of 32×. The ARM core has single- and double-precision floating-point hardware. The GA144's throughput is around 128× as many instructions per second as the ARM's, but it has to multiply a single multiplier bit at a time, so it might end up being slower in this case.
Believe me, I love me some "mid ranged arm boards" for "raw throughput" (see http://canonical.org/~kragen/sa-beowulf/ for some context) but I think an FPGA is a closer analogy. A small FPGA, like a US$6 ICE40UP5K, which has eight 16-bit multipliers running, I think, up to about 100 MHz, so it can do nearly one billion multiplies per second. But when it comes to addition, or state machines, or square roots, or FFTs, the GA144 hopelessly outclasses it. You need to move up to a much larger FPGA to get into the same order of magnitude.
So why would anyone use a CPLD or an FPGA or an MCU when they could use a GA144? Because you can program the MCU in C or Python, and you can program the CPLD or FPGA in Verilog, but if you want to use the GA144 you have to program it in Forth. Worse, until recently you had to program the GA144 in a dialect of colorForth. With an FPGA nextpnr does the floorplanning for you; with arrayForth you have to do the floorplanning. There are efforts to improve this situation, like Chlorophyll‡ ("Compared to MSP430, GA144 is 19 times more energy efficient and 23 times faster when running this [accelerometer-based hand-gesture recognition] application"), but they're still research projects, and they still require you to learn new languages.
C was a research project in 1972, Verilog was a research project in 1986, and yosys was a research project in 2014 (and still is if you want to use the 5M40ZE64C5N), but now they're battle-tested tools that lots of people know how to use. The GreenArrays tools that would allow you to program at a higher level are not there yet, because Chuck Moore doesn't think that's the way to do things.
Disclaimer: I haven't used any of the chips mentioned above except the i7, so probably some of the things I said above are wrong. Corrections would be greatly appreciated.
I built an AOT compiler for Whitespace with its own SSA-form intermediate representation that lowers to LLVM IR. Whitespace is very similar to a minimal Forth, except, well, syntax. Since StoneKnifeForth has no filesystem access, only stdin and stdout, the principles could be easily adapted for Whitespace.
Oh hey, I'm glad people are enjoying SKF. Last week I was thinking that it might be interesting to port it to RISC-V, which involves maybe a certain amount of perverse instruction encoding pain, but is otherwise very much simpler than i386.
Is it Forth week on HN :-? I like this but only allowing single letter identifiers is an unfortunate choice - although I'm sure it made parsing much simpler.
How strange, I just picked up a book from my bookshelf, my old copy of Starting Forth by Leo Brodie. I was trying to remember if Forth was created by Charles Moore or Calvin Mooers. Charles Moore invented Forth.
I really enjoyed reading Starting Forth when it came out. I've used Starting Forth together with Threaded Interpretive Languages: Their Design and Implementation by R. G. Loeliger to create my own Forth system. This second book is not about "threads" as we know them today; threading is a way to very efficiently dispatch the execution of Forth primitives.
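The idea of threaded code in a nutshell: a compiled word is just a sequence of pointers to primitives, and a tiny "inner interpreter" walks that sequence, dispatching each one. A toy sketch in Lisp (my illustration of the technique, not Loeliger's code):

    (defvar stack '())  ; the shared data stack

    (defun prim-dup () (push (car stack) stack))
    (defun prim-*   () (push (* (pop stack) (pop stack)) stack))

    ;; : SQUARE DUP * ;  compiles to a "thread" of primitives:
    (defvar square (vector #'prim-dup #'prim-*))

    (defun execute (thread)
      (dotimes (ip (length thread))      ; IP walks the thread,
        (funcall (aref thread ip))))     ; dispatching each primitive

    (push 3 stack)
    (execute square)
    (car stack) ;=> 9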
Calvin Mooers is a language inventor too. I've also implemented a version of his language TRAC. TRAC is mentioned prominently in a book that I really enjoyed while in grad school in the 70's, Computer Lib by Ted Nelson, described by Wikipedia as the first book on personal computers.
The appeal of Forth, and even more so of TRAC, is that they are very simple languages to implement. It is completely possible to implement either language in assembly language, but C would be a bit easier. As a project to learn Perl, back when it was new, I wrote a TRAC interpreter over a couple of days.
TRAC, like the roughly contemporaneous language GPM invented by Christopher Strachey, is a macro processor. GPM and TRAC are "Turing complete" macro processors. So they can be used to write arbitrary programs and are in fact designed for this purpose.
Some links/references:
Leo Brodie, Starting FORTH: An Introduction to the FORTH Language and Operating System for Beginners and Professionals, Prentice-Hall, 1987. https://www.forth.com/starting-forth/
Speaking as somebody who submitted a couple of Forth links (but not this one) - I saw there were two Forth related links on the front-page at one point yesterday, was reminded of the famous "Erlang Day" incident in HN history and thought "wouldn't it be cool if HN were flooded with Forth links for a little while?" and submitted a couple. Can't speak for anybody else, but I'm guessing other people had a similar reaction.
These things tend to happen on HN - someone discovers something new or new to them and posts something and triggers others to read up on the subject and/or reminds people of related projects, and suddenly you have a slew of submissions about similar things.
IIRC the reason I only allowed single-byte identifiers was to simplify the symbol table, which is implemented by these four lines of code:
: Type Four* header 6144 + + ; ( Table of definition Types: 1=code, 2=data )
: Addr Type 1024 + ; ( table of definition Addresses )
( register a colon definition )
: Colon %flush dup Type 1 xchg ! HERE @ xchg Addr ! ;
( register a variable definition )
: Var %flush dup Type 2 xchg ! HERE @ xchg Addr ! ;
Abbreviating this as follows would clearly be highly immoral, so I would never do that:
: T F h 6144 + + ; : A T 1024 + ; : C % d T 1 x ! H @ x A ! ; : V % d T 2 x ! H @ x A ! ;
"Parsing" is kind of the same as it would be if more bytes in the name were significant, because the other bytes in the name are still there in the source and the compiler still has to skip them:
What's simpler is that it doesn't have to store them anywhere, or implement any kind of string compare operation, and consequently doesn't use C@, and consequently doesn't have to implement C@ either. It implements @ and ! and C! (which it calls "store"; I should have called it "c!" since capital "C" was already taken for "colon"). It needs C! ("store") for emitting sub-word-length instructions and at any rate for backpatching jumps.
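In Lisp terms the whole scheme is just two 256-entry tables indexed directly by the identifier's single byte, so no names are stored and no string compare is ever needed (a sketch of the idea with made-up names, not the actual SKF code):

    (defvar def-types (make-vector 256 0))  ; 1=code, 2=data
    (defvar def-addrs (make-vector 256 0))

    (defun register-colon (byte here)   ; like Colon above
      (aset def-types byte 1)
      (aset def-addrs byte here))

    (defun register-var (byte here)     ; like Var above
      (aset def-types byte 2)
      (aset def-addrs byte here))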
It's definitely a questionable choice, though! I think it might be worth using two significant letters instead of one for a "tiny boot FORTH".
Traditional FORTHs, you probably recall, used a fixed-size four-byte "name" in their dictionaries: a length byte and the first three letters of the name. Chuck Moore received some criticism for this choice, in response to which he wrote a letter to a popular FORTH magazine arguing, effectively, that thr-- let---- was ade----- to tel- wor-- apa--; col------- bet---- wor-- wer- suf--------- rar- wit- the fir-- thr-- let---- bei-- sig--------, and of cou--- it was mor- eff------. FTP and SMTP commands are all four letters long for the same reason.
Well, not all the hackers have been driven from "Hacker News" yet, I guess. Where there are hackers, there will be FORTH — as Devine Lu Linvega said, "Lisp is a slippery slope that leads inevitably to FORTH." DSSP was an independent Soviet invention (on a balanced-ternary machine!) but its differences from ANS FORTH are smaller than the differences between Moore's early text interpreters and polyForth or ANS FORTH.
FORTH's a bit of an occupational hazard of hacking, you might say. Like suicide, or being burned at the stake. (It's been two days since the anniversary of Giordano Bruno being burned at the stake for being a hacker, though he was too early to write a FORTH himself.)
No! I remodeled the basement, picked the lock, hopped a freight train, and married into the gentry years ago. Since then, I've founded a company, been betrayed by my closest friends, written a book (which turned out to be unreadable), set myself on fire repeatedly, violently subdued villainous bandits, fallen in love with beautiful women, broken their hearts, had my heart broken by them, and digitized the Oxford English Dictionary.
I've read through ancient, forgotten books in faraway lands; I've learned strange tongues and seen fortunes won and lost; I've comforted the grieving, and I've restrained deadly robots as they wriggled to escape my grasp.
I've comforted the suicidal, and I've broken the trust of those who placed their hopes in me.
After midnight, I danced in the moonlight with a beautiful Brazilian as we stood guard outside her now-demolished ecovillage; she told me she'd thought she'd lost the ability to fall in love. We didn't need any music.
I've studied at the knee of wise elders, meditated in monasteries, translated poetry, and embraced madmen who predicted my demise in their psychotic ravings.
I've made foolish mistakes whose consequences maimed those I loved, and I've averted catastrophe innumerable times.
And most of that is even true.
Thanks for asking! I hope you're doing well too :)
Too much? I've been posting things I think other people might find interesting while researching implementing a Forth compiler after having just done a Scheme.
I love that it produces actual runnable binaries instead of emitting assembly or C, like so many tiny compilers do. I know it's technically the job of the assembler and the linker, but without these parts it kinda feels like cheating.
For this sort of thing I bounce back and forth between targeting C vs ASM. C is so universal and there are so many tools you can use to harden it. On the other hand, you're still using a big ol' abstraction layer between C and the machine. On the gripping hand, CPUs are doing so many things under the hood now that ASM can be considered an abstraction layer obscuring the actual machine, so you kind of need a modern compiler to have a hope of taking advantage of all that without doing a ton of research.
Over on the "αcτµαlly pδrταblε εxεcµταblε" page (https://justine.lol/ape.html) there's a subsection called "x86-64 Linux ABI Makes a Pretty Good Lingua Franca"; it seems to me to be a reasonable suggestion (although I've never liked x86 myself). You can bundle an x86 emulator with your code to run on other architectures.
I like the Project Oberon 32-bit RISC developed by Prof. Wirth as a target. It's really simple for educational purposes. (Meaning you can more easily teach more people Oberon RISC than, say, RISC-V. Although see the "Selfie" project: https://selfie.cs.uni-salzburg.at/ "An educational software system of a tiny self-compiling C compiler, a tiny self-executing RISC-V emulator, and a tiny self-hosting RISC-V hypervisor.")
There are many emulators for Oberon RISC already, including one in Javascript, so you can run the old Oberon OS in your browser! Launch an emulator with a disk image from here: https://schierlm.github.io/OberonEmulator/
I did something along the lines of TFA a long time ago, a system that could boot from a floppy and recompile itself from source (never did anything useful with it), but today I daily use a Forth dialect written in C that is designed to play nice with C and the OS.
C is a sure bet. Interpreters that rely too much on JIT are simply not available on processors where the JIT is not available (duh), which was often the case for ARM before it became really popular. Same thing with relying on an emulator (also, you are running a VM in another kind of VM; it's usually bad for performance).
As far as speed is concerned, if your Forth implementation makes it really easy to extend, you can always get something fast enough by re-coding the critical parts in C. That's why I stopped worrying about which implementation technique is the fastest and chose the one that made it the most trivial to extend the interpreter.
> That's why I stopped worrying about which implementation technique is the fastest and chose the one that made it the most trivial to extend the interpreter.
I like that, it appeals to my sense of pragmatism.
I wrote a Joy interpreter in Nim the other day, and when it came time to switch from machine ints to BigInts it was a breeze. There was a library already written, I updated the type and the compiler told me what else to change, the whole thing was over in fifteen minutes. And if I ever find that integer math is a performance bottleneck there is also a wrapper for GMP ready to go.
- - - -
BTW, I'm a big fan of Oberon, and I think Astrobe is awesome and admire you for making and supporting it. Cheers!
Is that dialect written in C available online? I'm collecting implementations at the moment. If you know any other C implementations besides the popular ones I'd be grateful for pointers to those too.
I've been wanting to put it online for a long time to show how I do it for cases like this, but I need to strip it first of semi-confidential stuff, because I hack it a lot to do work-related things. And it also needs some documentation to be useful as an "educational" Forth. Maybe one day...
For Forth implementations, although I suspect it might be yourself (you gotta admit this sort of fetish is not very common ;-), Charles Childers [1] is collecting them.
pfe ("portable Forth environment") was the first one I used; it was included in Slackware96 and included a Tetris game. You're probably aware GForth, BigForth, and F-83 are mostly written in Forth. I recently read through Retro-Forth, which compiles to an interpreter for a bytecode called Nga, written in C. There was another one I ran into recently but can't find now... aha, SOD32: https://lennartb.home.xs4all.nl/forth.html
Nowadays it's sometimes adequately fast to implement a JIT by spawning gcc or tcc and then running dlopen() on the result ;) It also makes the FFI easier :)
Technically I should still have a floppy and its backup, but the old laptop I used to make it is dead. It's kind of difficult to find floppy drives these days, and the whole thing is so old that the floppies might no longer be readable.
The dlopen trick is something I'll remember if I need that sort of stuff, but I've grown wary of FFIs. With libraries, there are often #define constants or macros that make FFIs a bit awkward. I prefer static linking if the lib is not too big, or dynamic loading but still with light bindings if not. If you know the internals of Lua a bit, I use a very similar method to extend Forth's "vocabulary", and the loading (and manual linking with dlsym) is done only when the extension is invoked and initializes itself.
Very nice.
Building a new FORTH using FORTH might have been easier; FORTH is traditionally very good at rebuilding itself. e.g. WimpForth on a PI400 recompiles its own kernel to an absolute image (executable) in 00:00:007.
FORTH week on HN is great.
The main trouble with FORTH is remembering to stop playing with FORTH and actually write the App.
Yeah, it probably would have been easier to metacompile from an existing FORTH, but the whole point was to not have a lot of dependencies on existing code.