Summary of parsing with derivatives (building on Brzozowski's 1964 derivatives of regular languages) for the unfamiliar (like me):

For example, with respect to the character f, the derivative of the language for which [[L]] = {foo, frak, bar} is D_f(L) = {oo, rak}. Because foo and frak start with the character f and bar does not, we keep only foo and frak and then remove their initial characters, leaving oo and rak. We repeat this process with each character in the input until it is exhausted. If, after every derivative has been performed, the resulting set of words contains the empty word, then there is some word in the original language consisting of exactly the input characters, and the language accepts the input. All of this processing takes place at parse time, so there is no parser-generation phase.
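Here's that set-based picture as a few lines of Python (my own illustration, not code from the paper):

    def derive(words, c):
        # Derivative of a finite language with respect to c: keep the
        # words that start with c, then drop that first character.
        return {w[1:] for w in words if w[:1] == c}

    def accepts(words, s):
        for c in s:
            words = derive(words, c)
        return "" in words  # empty word left over => input is in the language

    assert derive({"foo", "frak", "bar"}, "f") == {"oo", "rak"}
    assert accepts({"foo", "frak", "bar"}, "frak")
    assert not accepts({"foo", "frak", "bar"}, "fro")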
The specific algorithm they build on (Might 2011) extends this from regular languages to context-free grammars, memoizing repeated derivatives of the same arguments and computing them lazily to handle recursive grammar references.
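To show what the structural version looks like, here's a toy for the regular-expression fragment only (again my own sketch, not the paper's code; the CFG algorithm additionally memoizes derive() on (node, character) pairs and builds results lazily so recursion terminates, both of which this toy omits):

    from dataclasses import dataclass

    @dataclass(frozen=True)
    class Empty: pass   # the empty language: matches nothing

    @dataclass(frozen=True)
    class Eps: pass     # matches only the empty word

    @dataclass(frozen=True)
    class Char:
        c: str

    @dataclass(frozen=True)
    class Alt:
        left: object
        right: object

    @dataclass(frozen=True)
    class Seq:
        first: object
        second: object

    def nullable(l):
        # Does the language contain the empty word?
        if isinstance(l, Eps): return True
        if isinstance(l, Alt): return nullable(l.left) or nullable(l.right)
        if isinstance(l, Seq): return nullable(l.first) and nullable(l.second)
        return False    # Empty, Char

    def derive(l, c):
        # The language of all w such that c+w is in l.
        if isinstance(l, Char):
            return Eps() if l.c == c else Empty()
        if isinstance(l, Alt):
            return Alt(derive(l.left, c), derive(l.right, c))
        if isinstance(l, Seq):
            d = Seq(derive(l.first, c), l.second)
            # If first can match empty, c may also start a word of second.
            return Alt(d, derive(l.second, c)) if nullable(l.first) else d
        return Empty()  # Empty, Eps

    def accepts(l, s):
        for c in s:
            l = derive(l, c)
        return nullable(l)

    word = lambda w: Seq(Char(w[0]), word(w[1:])) if len(w) > 1 else Char(w[0])
    lang = Alt(word("foo"), word("frak"))  # the {foo, frak} part of the example
    assert accepts(lang, "frak") and not accepts(lang, "fra")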
But this algorithm was implemented in a mind-bendingly slow way: for example, they report that a 31-line Python file took three minutes to parse!
They prove that the worst case is actually cubic time and provide a concise cubic-time implementation in Racket: https://bitbucket.org/ucombinator/derp-3 . Their implementation is orders of magnitude faster than prior PWD implementations, but still orders of magnitude slower than a Bison parser for the same grammar, which they attribute to language choice (C vs. Racket).
Please correct me if I'm wrong about any of this! Summarizing research like this helps me understand it, but that doesn't mean I actually understand it.
Yeah, that's accurate! I'm glad we finally know the running time for parsing with derivatives.
Let me add some more background, about stuff that I would expect to be covered in this paper's related work section, except that it doesn't have one...
Most well-known parsing algorithms that handle arbitrary CFGs run in cubic time, so this paper's O(n^3) running time is about the best you would expect. Some other parsing algorithms that also handle arbitrary CFGs in cubic time include CYK [1], Earley [2], and GLR [3]. I used to be excited about parsing with derivatives because of its simplicity, but CYK and Earley parsers are both actually simpler than parsing with derivatives, once you've thrown this paper's optimizations in.
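To back up the simplicity claim, a bare-bones CYK recognizer fits in about twenty lines (my own sketch, assuming the grammar has already been converted to Chomsky normal form):

    def cyk(terminal_rules, binary_rules, start, s):
        n = len(s)
        spans = {}  # spans[(i, length)] = nonterminals deriving s[i:i+length]
        for i, ch in enumerate(s):
            spans[(i, 1)] = {A for A, t in terminal_rules if t == ch}
        for length in range(2, n + 1):
            for i in range(n - length + 1):
                cell = set()
                for split in range(1, length):
                    left = spans[(i, split)]
                    right = spans[(i + split, length - split)]
                    cell |= {A for A, (B, C) in binary_rules
                             if B in left and C in right}
                spans[(i, length)] = cell
        return n > 0 and start in spans[(0, n)]

    # a^n b^n for n >= 1:  S -> A B | A X,  X -> S B,  A -> 'a',  B -> 'b'
    terminals = [("A", "a"), ("B", "b")]
    binaries = [("S", ("A", "B")), ("S", ("A", "X")), ("X", ("S", "B"))]
    assert cyk(terminals, binaries, "S", "aabb")
    assert not cyk(terminals, binaries, "S", "aab")

The three nested loops over length, start position, and split point are exactly where the cubic bound comes from.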
One important note: this holds for fully general CFGs. If the grammar is restricted somehow, for example to LR(k), then the best complexity is linear (there are even parsers that manage both LR(k) grammars in linear time and general parsing in cubic time).
A boolean matrix multiplication problem can be turned into a CFG parsing problem (this is Lee's reduction). Therefore, an algorithm that solves CFG parsing can be turned into an algorithm that solves boolean matrix multiplication.
Sorry, yes, I deleted a line accidentally, but it turns out you can show both directions (Valiant reduced parsing to boolean matrix multiplication; Lee reduced boolean matrix multiplication to parsing), and to prove a tight bound, you need to show they reduce to each other without changing the complexity class.
Out of curiosity, do any of the other algorithms have the ability to be "paused" arbitrarily, i.e. with inverted control? I'm thinking specifically of parsing input from an asynchronous interface.
Derivatives do that well, since input is essentially pushed into the parser.
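Concretely, here's my own toy reusing the set-based derivative from upthread: all parser state is one value held between feeds, so you can pause at any chunk boundary, e.g. between reads from an async source.

    class PushParser:
        def __init__(self, words):
            self.words = set(words)

        def feed(self, chunk):
            # Push input in; the only state is the current derivative.
            for c in chunk:
                self.words = {w[1:] for w in self.words if w[:1] == c}

        def accepted(self):
            return "" in self.words

    p = PushParser({"foo", "frak", "bar"})
    p.feed("fr")   # pause here, go do other work...
    p.feed("ak")   # ...resume whenever more input arrives
    assert p.accepted()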
Pausing is very straightforward with bottom up algorithms like GLR. The entire state of the parse is stored in an explicit stack data structure, as opposed to implicitly in the program's call stack as with top-down algorithms. So resuming the parse is as simple as resuming the parse loop using a previously-stored stack.
I think this is a bit of an orthogonal issue. Inversion of control is a problem that can be solved at the language level with coroutines / continuations / threads / etc without forcing you to choose a special type of algorithm.
> CYK and Earley parsers are both actually simpler than parsing with derivatives, once you've thrown this paper's optimizations in.
Yeah, GLR is very elegant as well; I'm a little puzzled by this paper's initial claim (and similar claims in the Yacc is Dead paper):
> Current algorithms for context-free parsing inflict a trade-off between ease of understanding, ease of implementation, theoretical complexity, and practical performance. No algorithm achieves all of these properties simultaneously.
GLR has good performance and isn't much harder to understand than LR, which is a very elegant theory. So I assume that 'ease of implementation' is the property the authors think GLR lacks. But ease is a little hard to measure until you have an equally fast and production-ready implementation of both algorithms.