A Lisp interpeter in a thousand lines of Bash

VLM · on Jan 13, 2014

On a nostalgic note: a little over 30 years ago I was playing with Randall Beer's LISP interpreter, which ran (very slowly) in MS BASIC on a TRS-80 Model III. It's on page 176 at this link (the first article in a multi-part series):

https://archive.org/details/80-microcomputing-magazine-1983-...

I distinctly remember, as a kid, that it was very slow indeed, but interesting.

I got lost reading the ads. In retrospect, computing used to be a much more expensive hobby than it is today, not just in relative terms but in absolute terms. Then again, people are much poorer now, so it's required.

Anyway, since 1983 he has become a neuroscience prof, and he mentions his BASIC LISP on his homepage:

http://mypage.iu.edu/~rdbeer/

The line numbers are not consecutive, but I think he's well under a thousand lines of BASIC; there just aren't enough pages of code in the listing to exceed that.

And yes, this was considered reasonable coding style back then. That is why my generation never shrank in terror at the sight of bad Perl code. Why yes, this is a bit hard to read, but I've certainly seen worse...

Morgawr · on Jan 13, 2014

Slightly related to the "let's implement Lisp on weird languages/platforms" theme, here's awklisp, a project that apparently helped inspire Dipert to write Gherkin: https://github.com/darius/awklisp

brudgers · on Jan 13, 2014

In 1983, the TRS-80 was a mainstream computer. BASIC was also a mainstream language - so was Assembly. Even a decade later, before the explosion of the internet, getting one's hands on a C compiler typically meant paying for a commercial implementation, probably on floppies and delivered by sneakernet.

VLM · on Jan 13, 2014

This is true, but I think by the standards of the 2010s most of the defining characteristics of early-'80s MS BASIC would be considered really weird.

Line numbers. Control flow exclusively by IF/THEN and FOR/NEXT. We love GOTO (noobs probably don't know what a GOTO is, or why some considered it harmful). No namespaces: everything lived in one giant global namespace where T$ meant the same thing everywhere. No naming conventions for variables (like Hungarian notation or CamelCase). No one used revision control. No unit tests. No symbolic debuggers (they were coming soon, and something like them existed for assembly, but this was 1983 and we weren't writing assembly). For better or worse, no regexes. No object orientation, purely procedural. Everything is in one file because quite a few people had no disk drive and relied on cassette tapes. Line editing was a little crude compared to modern vim/emacs and the pretenders to the throne. No IDEs until Turbo Pascal and the like came along (or was QuickBASIC first? Either way, it would be a while...).

By modern standards, MS BASIC is pretty weird.

brudgers · on Jan 13, 2014

BASIC is perhaps one of the less weird things about computing in the early '80s. It was the high point for women studying computing at universities - well over a third of all computer science degrees were awarded to women, three times the ratio found today. [1]

But more to your point, by the 1990s the idea that computer languages should be understandable by non-programmers with a reasonable education and a general familiarity with the principles of computing was dead.

Think of COBOL. If its approach hadn't been abandoned for the obfuscations of C++ and Perl, Eric Evans wouldn't have written a book and gone on the lecture circuit to spread the gospel of ubiquitous language. There's a reason the unschooled learned HTML and PHP in the 1990s - they were accessible to moderately educated people and let them do useful work, just as COBOL and BASIC were designed to.

The TRS-80 ran a version of Tiny BASIC. The use of line numbers and GOTO allowed BASIC to run closer to the metal - GOTO 15 is essentially an assembly language JUMP to wherever the assembler mapped the instruction on line 15. Without it, BASIC would not have been such a successful path to higher-level programming for serious programmers steeped in assembly language. GOTO is handy if you want to translate from Knuth's MIX without a lot of fuss.

[1] http://en.wikipedia.org/wiki/Women_in_computing#The_Gender_G...

mikeash · on Jan 13, 2014

Many older BASIC implementations are interpreted, and it looks like Tiny BASIC was as well:

http://en.wikipedia.org/wiki/Tiny_BASIC

It's strange to look back and see just how popular virtual machines were at the time. BASIC typically used one, as did many other languages. Smalltalk is famous for using a virtual machine, for example. Microsoft's original Mac apps all ran bytecode in a virtual machine.

It seems crazy, because these computers were already tremendously slow, relatively speaking, and adding a virtual machine makes it much worse. However, it was ultimately a useful tradeoff because these machines were even more limited on RAM than they were on CPU power, and using a virtual machine with bytecode that allowed for an efficient instruction encoding could save a lot of space. It doesn't matter how fast your code runs if it doesn't fit in RAM, after all.

brudgers · on Jan 13, 2014

Speed is of course relative. Interpreted BASIC is faster than working things out by hand. For something critical, Assembly Language was always available.

On a TRS-80 Model I, compiling means the compiler, the input and the output all have to live in 4 KB of RAM (or 16 KB in later versions).

Considering that the Level I Tiny BASIC interpreter lived in a 4 KB ROM, Level II lived in a 12 KB ROM, and mass storage for most early machines was audio tape (not only slow but also notoriously prone to not loading files correctly), compiling code would have been great for masochists and not so good for people who were just trying to get something done.

And that's before considering the complexities of tuning a compiler to optimize code.

VLM · on Jan 14, 2014

Having been there: there was an intermediate step, tokenized code. String literals are stored as plain ASCII, but as you enter source code, a "THEN" (as in IF/THEN) gets tokenized into a single byte, hex 0xD6 or something like that. So the poor CPU doesn't have to run a full lexer at runtime to work out whether a "T" starts "TO" or "THEN"; it just matches the byte 0xD6, which is much faster. This works really well if your language has about 128 tokens or fewer, so that each one fits in a byte with the high bit set. It can also save a huge amount of memory, depending on your coding style, I suppose.

Tokenization also allows some syntax error detection to occur as you type code in, which was interesting. I don't remember enough about this. Obviously some mistakes won't tokenize at all or will tokenize into gibberish.

So Tiny BASIC stored plain old ASCII in memory and saved plain old ASCII to cassette tape. Level II MS BASIC stored tokens in memory, although it could optionally save pure ASCII to cassette tape. This created some interesting software distribution and compatibility issues, since it was sort of, kind of, halfway possible to save something on Level I and load it into Level II (and vice versa) if you were careful.
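A toy Bash sketch of the keyword "crunching" described above, for illustration only. Assumptions, all hypothetical: the hex token values are made up (THEN borrows the 0xD6 guess from the comment; none of these are the real Level II table), words are split on whitespace, and quoted strings get no special treatment, whereas real MS BASIC crunched keywords without needing spaces and left string literals alone. The function name tokenize_line is invented here.

```bash
#!/usr/bin/env bash
# Toy BASIC keyword tokenizer ("cruncher"). Needs Bash 4+ for declare -A.
# ASSUMPTION: the hex values below are hypothetical, not the real
# TRS-80 Level II token table; THEN uses the 0xD6 guess from the comment.
declare -A TOKENS=(
  [PRINT]=B2 [IF]=8F [THEN]=D6 [GOTO]=8D [FOR]=81 [TO]=BD [NEXT]=87
)

# tokenize_line LINE
# Splits LINE on whitespace (no globbing, thanks to read -a) and prints
# it with each known keyword replaced by a readable <XX> marker standing
# for its one-byte token. Other words pass through unchanged.
tokenize_line() {
  local word
  local -a words out=()
  read -ra words <<< "$1"
  for word in "${words[@]}"; do
    if [[ -n ${TOKENS[$word]+set} ]]; then
      out+=("<${TOKENS[$word]}>")
    else
      out+=("$word")
    fi
  done
  printf '%s\n' "${out[*]}"
}

tokenize_line "IF X THEN GOTO 15"   # prints: <8F> X <D6> <8D> 15
tokenize_line "FOR I = 1 TO 10"     # prints: <81> I = 1 <BD> 10
```

Each keyword shrinks to one byte (the four characters of THEN become a single token), and at run time the interpreter compares one byte instead of re-lexing the text.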

VLM · on Jan 13, 2014

The TRS-80 Model III's Level II BASIC didn't run Tiny BASIC anyway; it was licensed MS BASIC, same as BASICA on DOS. Applesoft BASIC was MS BASIC plus some graphics.

Level I BASIC was pretty much a Model I, 1979-only thing. I believe Level I was technically available for the Model III, but...

The article was more or less contemporary with the Model 4, which had 80 columns, used a licensed LDOS instead of TRSDOS, and, I'm pretty sure, came with Level II BASIC only. So by the time of the article, Level I BASIC was about two generations and four years out of date.

Also, I recall Radio Shack sold the Level II upgrade EPROM for something ridiculous like $19, so a Level I-only machine was probably either a 1979 experience (before the release of Level II) or somewhat unusual in not having been upgraded.

ams6110 · on Jan 14, 2014

IIRC, Borland Turbo C was available by the mid/late 1980s, and was not too expensive.

FigBug · on Jan 13, 2014

Long ago I wrote a Scheme interpreter using Prolog. It's a horrible mess because I had no idea what I was doing.

https://github.com/FigBug/scheme

RBerenguel · on Jan 13, 2014

You should see some code I have written to parse a few English grammar subsets... Now that is messy; your code is extremely clean!

breadbox · on Jan 13, 2014

I also took the time to type in Randall Beer's Lisp from 80 Micro. In fact, I'm pretty sure that was my first time accessing a working Lisp interpreter. (Mostly working, anyway. I think I had a typo that I never managed to track down.) Unfortunately, I didn't really know what to do with it. I didn't really learn how to use Lisp until I spent several days trying to write Lisp programs in college and finally started to understand the functional paradigm. But yeah, that may have been one of the longest magazine programs I ever typed in.

wfn · on Jan 14, 2014

> since 1983, [Randall Beer] became a neuroscience prof and mentions his BASIC LISP on his homepage

> http://mypage.iu.edu/~rdbeer/

Just wanted to say thanks for linking to this fine chap. He seems to have a cross-section of interests that I can really relate to and draw from. In particular, some people here might be interested in the intersection of cellular automata theory and (what it has to say about) cognition, autonomous systems, etc. (see e.g. [1, 2]). This is one of his approaches to understanding dynamical systems and how coordinated behaviour can arise in them. See his publication list, too. [3]

Apparently formal treatments of autopoiesis have been developed for quite some time (the currently recommended books seem to date from 1980 onward). Interesting indeed! :)

[1]: http://pubs.cogs.indiana.edu/pubspdf/34233/34233_varela.pdf

[2]: http://mypage.iu.edu/~rdbeer/Papers/Beer2014.pdf

[3]: http://mypage.iu.edu/~rdbeer/pubs.html

mikeash · on Jan 13, 2014

This is fascinating. I assumed it was yet another ridiculous attempt to build something in an environment completely unsuited for it, but it seems that they are serious. But they're also sufficiently aware of the craziness of the project that the first thing they do is explain just why the heck they're doing it:

https://github.com/alandipert/gherkin/wiki/Why-gherkin%3F

The short version is that bash is the closest thing to being universally available on every UNIXoid system no matter what, and so by writing stuff in bash, you make it so that it can run everywhere. But because bash sucks to program in, this is a minimalist interpreter for a sane language. You can then write programs in that language, and they will only depend on bash and on this interpreter, and the interpreter is simple enough not to need any sort of complex installation.

I can't quite think of a use case for this where it's not worth e.g. installing Python first, but it's an interesting project all the same.

lloeki · on Jan 13, 2014

> I can't quite think of a use case for this where it's not worth e.g. installing Python first

Wherever you get Bash, you can reasonably assume Perl 5 (unless you're in an initrd or something). Even on an old AIX 4 box I had Perl 5.005 readily available.

Nonetheless I wish there were more actual shells that were not sh descendants.

mmastrac · on Jan 13, 2014

Bash is probably a stretch in most initrds anyway (IIRC, Red Hat used a very light sh-like shell).

chubot · on Jan 13, 2014

Yeah, I guess bash is technically more available than a C compiler. But I think for the overwhelming number of use cases, you could just do "cc -o lisp single-file-lisp-interpreter.c" (there are many compact options). Not to mention just installing Python :)

Also I wonder if awk would have been a better language than bash. I think awk is more available across various Unixes.

If anything, the lengths they're willing to go to here point to the brokenness of package managers. There are still a lot of disadvantages to a package manager versus cp-ing some code.

seryoiupfurds · on Jan 14, 2014

> lisp in awk

Here you go!

http://www.cs.cmu.edu/afs/cs/project/ai-repository/ai/lang/l...

rogerrohrbach · on Jan 15, 2014

I made that 25 years ago. I've just downloaded it onto my MacBook Air, changed the shebang to reference /usr/bin/awk, and fired it up ("./walk walk.w p -"); it works as advertised. You can still format the doc with "nroff -ms walk.ms", too.

chubot · on Jan 16, 2014

That's awesome! A testament to the timeless power and stability of Unix :)

parfe · on Jan 13, 2014

It's written for Bash 4.0, so it won't run on OS X without additional work, because Apple refuses to ship GPLv3 software (OS X ships Bash 3.2, the last GPLv2 release). Otherwise, a really cool project. I love Bash software (when other people do the coding).

gaius · on Jan 13, 2014

I started a project like this about 10 years ago, but then I discovered that you could just compile Lisp on your own workstation and upload it to prod with a .sh extension, and no one would actually check; they would just blindly run it. Not even the size was suspicious. I used the same trick a bit later with OCaml and Haskell: you just compile them as whatever.py and no one's any the wiser.

shawndumas · on Jan 13, 2014

ReadMe ==> https://github.com/alandipert/gherkin/blob/master/README.md

cbsw · on Jan 14, 2014

+ - * / don't even support more than two arguments. (+ 1 2 3) would be 3. Stupid.

crnixon · on Jan 14, 2014

Cool, awesome, great point. I tried Googling for your implementation and couldn't find it. Could you drop a link?

wooby · on Jan 14, 2014

Yes, the arithmetic primitives don't support variadic arguments. You can see how + works - and that it only deals with the first two arguments - here: https://github.com/alandipert/gherkin/blob/278354246aebf14b8.... A pull request fixing this would be welcome.

In the meantime, for variadic addition, one can do:

  (load-file "core.gk")
  (reduce + 0 '(1 2 3)) ;=> 6
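Since gherkin's primitives are written in Bash, here is a hypothetical sketch of the core idea behind such a fix: fold over every argument in "$@" instead of reading only the first two. It is NOT gherkin's actual code (its internals represent Lisp values differently), and the function names add and sub are invented for illustration.

```bash
#!/usr/bin/env bash
# Hypothetical variadic arithmetic helpers (not gherkin's implementation).
# Each folds its operator over all integer arguments, left to right,
# so (+ 1 2 3) behaves like ((1 + 2) + 3).

add() {
  local acc=0 n
  for n in "$@"; do
    (( acc += n ))
  done
  echo "$acc"
}

# Subtraction: with one argument, negate it (as Lisps usually do);
# otherwise subtract every later argument from the first.
sub() {
  local acc=$1 n
  shift
  if (( $# == 0 )); then
    echo $(( -acc ))
    return
  fi
  for n in "$@"; do
    (( acc -= n ))
  done
  echo "$acc"
}

add 1 2 3    # prints 6
sub 10 1 2   # prints 7
sub 5        # prints -5
```

The same fold-over-all-arguments pattern would extend to * and /.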

mzs · on Jan 13, 2014

There's a UUOC (useless use of cat) in strmap_file; in fact, all those uses of head, tr, and tail could probably just be handled by sed.
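To make the suggestion concrete, here are a few generic before/after pairs. These are illustrations only, NOT strmap_file's actual code, and the function names are hypothetical; each pair produces identical output on ASCII input.

```bash
#!/usr/bin/env bash
# Generic illustrations, NOT strmap_file's actual code: the function
# names are hypothetical, and each pair produces identical output.

# Useless use of cat: cat only feeds the file to tr.
upper_uuoc()     { cat "$1" | tr '[:lower:]' '[:upper:]'; }
# A plain redirect does the same job with one process fewer.
upper_redirect() { tr '[:lower:]' '[:upper:]' < "$1"; }

# head + tail to pull out line 3 ...
line3_pipeline() { head -n 3 "$1" | tail -n 1; }
# ... versus one sed: -n suppresses auto-print, 3p prints line 3.
line3_sed()      { sed -n '3p' "$1"; }

# head + tr (first two lines, uppercased) ...
head2_upper_pipeline() { head -n 2 "$1" | tr '[:lower:]' '[:upper:]'; }
# ... versus one sed: y/// transliterates (ASCII letters only here),
# and 2q prints line 2 and then quits.
head2_upper_sed() {
  sed 'y/abcdefghijklmnopqrstuvwxyz/ABCDEFGHIJKLMNOPQRSTUVWXYZ/;2q' "$1"
}

demo=$(mktemp)
printf 'alpha\nbeta\ngamma\ndelta\n' > "$demo"
line3_sed "$demo"         # prints: gamma
head2_upper_sed "$demo"   # prints ALPHA and BETA on two lines
rm -f "$demo"
```

For large files, sed '3q;d' stops reading after line 3, much as head does, whereas sed -n '3p' scans to the end.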

finin · on Jan 13, 2014

interpeter => interpreter