There is a very cool paper about structuring query compilers that I read a couple of years ago and came back to recently while doing some experimentation.
The paper[0], "How to Architect a Query Compiler, Revisited", presents an approach to structuring compilers based on Futamura Projections. In practice this means that you write most of your high-level code as though it was an interpreter, but then mainly rely on constructors and operator overloading (and occasionally special functions) to emit compiled code for your expressions and nodes under the hood.
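Roughly, the trick looks like the following sketch (my own illustration in Python, not the paper's actual API): a wrapper type carries a symbolic code fragment, and overloaded operators emit code instead of computing.

    # Hypothetical sketch of the staging idea: a Rep value is a symbolic
    # code fragment, and operator overloading emits code rather than computing.
    class Rep:
        def __init__(self, code):
            self.code = code  # a C-like expression as a string

        def __add__(self, other):
            return Rep(f"({self.code} + {other.code})")

        def __mul__(self, other):
            return Rep(f"({self.code} * {other.code})")

    # Written like ordinary interpreter code, but feeding it Rep values
    # generates code for the expression instead of evaluating it.
    def square_plus_one(x):
        return x * x + Rep("1")

    print(square_plus_one(Rep("row[0]")).code)  # ((row[0] * row[0]) + 1)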
The paper should be interesting to anybody who is also interested in the posted link.
"based on Futamura Projections" is kind of cheating: they don't have a `mix` specialization function, which is the entire point of Futamura's paper. They're doing automatic symbolic execution using operator overloading, but I'm looking extremely suspiciously at them referring to that as "the first Futamura Projection without mix".
Working through _Crafting Interpreters_ atm. It's both an educational and enjoyable read. I find the author's ability to reason across programming languages a skill to aspire to. It feels like a meta way of thinking about programming languages: any language is an instantiation of a set of parameters, fixed through design decisions.
> Second, we get to write an actual, honest-to-God compiler. It parses source code and outputs a low-level series of binary instructions.
Your compiler doesn't need to emit byte code or machine code to be real or useful. High-level-to-high-level compilers can be incredibly useful and relatively easy to write. For optimal runtime performance, you can target C or some other low-level language and let it deal with register allocation, instruction selection, and shared-library/executable generation.
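As a toy illustration of that approach (my own sketch, not from the book): walk a tiny expression AST and emit a C expression string, and let a C compiler handle everything below that.

    # Toy sketch: compile a tiny expression AST, tuples of (op, lhs, rhs),
    # into a C expression string; the C compiler does the low-level work.
    def to_c(node):
        if isinstance(node, tuple):
            op, lhs, rhs = node
            return f"({to_c(lhs)} {op} {to_c(rhs)})"
        return str(node)  # leaf: a variable name or literal

    print(to_c(("-", ("-", "A", "B"), "C")))  # ((A - B) - C)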
A transpiler is a sort of compiler. A single compiler might emit LLVM, C, or JavaScript, depending on the target. That doesn't mean that it's only a compiler sometimes, it means that some of the modes of the compiler transpile.
Recursive descent is great for parsing most of an Algol-like programming language.
Once you get into expressions, you'll really want to do precedence climbing or Pratt parsing to get things working correctly.
This assumes that you want PEMDAS-like operator precedence in the first place. It does seem like a waste once you've implemented it and realize how little it's actually utilized by most code. Maybe the Lisp and forth people have a point...
Even with recursive descent, you still need a strategy to deal with left-associative binary operators. This is where Pratt parsing comes in: not to replace recursive descent, but to supplement it.
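For reference, here's a minimal precedence-climbing loop (hypothetical token handling; operands are assumed to be single atoms):

    # Minimal precedence-climbing sketch; tokens is a flat list like
    # ["A", "-", "B", "-", "C"], consumed from the front.
    PREC = {"+": 1, "-": 1, "*": 2, "/": 2}

    def parse_expr(tokens, min_prec=1):
        lhs = tokens.pop(0)  # operands assumed to be single atoms
        while tokens and tokens[0] in PREC and PREC[tokens[0]] >= min_prec:
            op = tokens.pop(0)
            # PREC[op] + 1 makes same-precedence operators left-associative
            rhs = parse_expr(tokens, PREC[op] + 1)
            lhs = (op, lhs, rhs)
        return lhs

    print(parse_expr(["A", "-", "B", "-", "C"]))  # ('-', ('-', 'A', 'B'), 'C')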
Yep. And whenever I've implemented a recursive-descent parser and dealt with left-associative operators, I've just inlined the equivalent of additive-expression2 and the loop. E.g.:
def parseAddExpr():
    # Looping (rather than recursing) is what makes the tree left-associative.
    # peekToken/consumeToken/BinaryExpr are whatever helpers your parser has.
    expr = parseMultExpr()
    while peekToken() in ["+", "-"]:
        op = consumeToken()
        expr = BinaryExpr(op, expr, parseMultExpr())
    return expr
Honestly, I really like recursive-descent parsing, even for expressions. It's very straightforward once you've learned a few simple patterns like that.
Decently large precedence hierarchies (15-20) don't bother me; I'll typically just number the levels rather than use names like "term", "factor", "additive", or "multiplicative". It can get a little verbose, yes, but it's easy to write and skim through. (Golang-like from what I've read, though I'm not a user of Golang.)
But I also find it scales better once you start adding things like ternary expressions, function calls, and type casts -- things that aren't just simple binary operators. Handling those feels a lot less clean to me with Pratt, precedence climbing, shunting yard, etc.
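For example, a ternary slots straight into its own precedence level (hypothetical sketch, reusing parseAddExpr from above; expect and TernaryExpr are assumed helpers):

    # Hypothetical sketch: the conditional operator gets its own level
    # above the binary operators; no extra Pratt machinery needed.
    def parseTernaryExpr():
        cond = parseAddExpr()
        if peekToken() == "?":
            consumeToken()
            thenExpr = parseTernaryExpr()  # `?:` is right-associative
            expect(":")                    # assumed helper: consume or error
            elseExpr = parseTernaryExpr()
            return TernaryExpr(cond, thenExpr, elseExpr)
        return cond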
One downside I'll admit for recursive descent, though, is that it's a bit harder to support a language with user-defined operators. I don't like those much anyway, so that's not much of a loss to me.
In the second grammar, additive-expression2 still contains a recursive call in the right-hand term, so it would produce a parse tree that's right-associative. For example, given the string `A - B - C` you'd get:
(additive-expression
  (multiplicative-expression A)
  (additive-expression2
    -
    (multiplicative-expression B)
    (additive-expression2
      -
      (multiplicative-expression C)
      <null>
    )
  )
)
Which corresponds to the right-associative expression `(A - (B - C))`. You could do some post-processing on the results to invert it and make it left-associative, but I don't think the two grammars are equivalent.
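The post-processing fix is doable, for what it's worth: flatten the right-leaning spine of same-precedence operators, then fold it back up to the left. A sketch using tuples as AST nodes (it assumes operands are atoms and every node on the right spine is the same precedence level):

    # Sketch: rebuild a right-leaning chain like ('-', 'A', ('-', 'B', 'C'))
    # into the left-associative ('-', ('-', 'A', 'B'), 'C').
    def reassoc_left(node):
        ops, operands = [], []
        while isinstance(node, tuple):  # walk down the right spine
            op, lhs, rhs = node
            ops.append(op)
            operands.append(lhs)
            node = rhs
        operands.append(node)
        expr = operands[0]
        for op, rhs in zip(ops, operands[1:]):  # fold back up, leaning left
            expr = (op, expr, rhs)
        return expr

    print(reassoc_left(("-", "A", ("-", "B", "C"))))  # ('-', ('-', 'A', 'B'), 'C')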
I get what you're saying: the grammars are equivalent in the sense that they match the same language.
Maybe "equivalent" was the wrong word to use on my part. My point is that this refactoring doesn't actually give us an implementation that solves the issue of left-associativity. As you said, you still need to introduce some sort of loop to do this. It doesn't have to be complex, but it is different from the right-associative case where expressions are built up directly through recursion. My argument is that once you introduce explicit iteration, you're no longer really doing recursive descent, you're doing Pratt or shunting-yard or something else.
[0]: https://www.cs.purdue.edu/homes/rompf/papers/tahboub-sigmod1...