Yeah, that's accurate! I'm glad we finally know the running time for parsing with derivatives.
Let me add some more background, about stuff that I would expect to be covered in this paper's related work section, except that it doesn't have one...
Most well-known parsing algorithms that handle arbitrary CFGs run in cubic time, so this paper's O(n^3) running time is about the best you would expect. Some other parsing algorithms that also handle arbitrary CFGs in cubic time include CYK[1], Earley[2], and GLR[3]. I used to be excited about parsing with derivatives because of its simplicity, but CYK and Earley parsers are both actually simpler than parsing with derivatives, once you've thrown this paper's optimizations in.
One important note is that this holds for arbitrary grammars, with no restrictions. If the grammar is restricted somehow, for example to LR(k), then the best complexity is linear (there are even parsers that manage both LR(k) grammars in linear time and general grammars in cubic time).
A boolean matrix multiplication problem can be turned into a CFG problem. Therefore, an algorithm that solves CFG problems can be turned into an algorithm that solves boolean matrix multiplication.
Sorry, yes, I deleted a line accidentally. It turns out you can show both directions, and to prove a tight bound you need to show that they reduce to each other without changing the complexity class.
Out of curiosity, do any of the other algorithms have the ability to be "paused" arbitrarily, i.e. with inverted control? I'm thinking specifically of parsing input from an asynchronous interface.
Derivatives do that well, since input is essentially pushed into the parser.
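To make the push-style interface concrete, here is a minimal sketch of Brzozowski derivatives for regular expressions (the CFG version in the paper layers laziness, memoization, and fixed points on top of this same idea; all names here are my own, not from the paper):

```python
from dataclasses import dataclass

# Regular-expression AST nodes.
@dataclass(frozen=True)
class Empty: pass          # matches nothing
@dataclass(frozen=True)
class Eps: pass            # matches only the empty string
@dataclass(frozen=True)
class Char:
    c: str                 # matches a single character
@dataclass(frozen=True)
class Alt:
    l: object
    r: object              # l | r
@dataclass(frozen=True)
class Seq:
    l: object
    r: object              # l followed by r

def nullable(r):
    """Does r match the empty string?"""
    if isinstance(r, Eps): return True
    if isinstance(r, Alt): return nullable(r.l) or nullable(r.r)
    if isinstance(r, Seq): return nullable(r.l) and nullable(r.r)
    return False

def deriv(r, c):
    """Derivative of r with respect to character c: the language of
    strings w such that r matches c + w."""
    if isinstance(r, Char):
        return Eps() if c == r.c else Empty()
    if isinstance(r, Alt):
        return Alt(deriv(r.l, c), deriv(r.r, c))
    if isinstance(r, Seq):
        d = Seq(deriv(r.l, c), r.r)
        return Alt(d, deriv(r.r, c)) if nullable(r.l) else d
    return Empty()

def feed(r, chunk):
    """Push a chunk of input into the parser; returns the new state."""
    for c in chunk:
        r = deriv(r, c)
    return r

# The parser state is just a value, so it suspends for free between chunks:
ab = Seq(Char('a'), Char('b'))
state = feed(ab, 'a')      # first chunk arrives
state = feed(state, 'b')   # rest arrives later
assert nullable(state)     # 'ab' is accepted
```

Each call to `deriv` consumes exactly one character, so an asynchronous source can hand the parser input at whatever pace it likes; the "paused" parse is simply the current expression value.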
Pausing is very straightforward with bottom-up algorithms like GLR. The entire state of the parse is stored in an explicit stack data structure, as opposed to implicitly in the program's call stack as with top-down algorithms. So resuming the parse is as simple as resuming the parse loop using a previously-stored stack.
I think this is a bit of an orthogonal issue. Inversion of control is a problem that can be solved at the language level with coroutines / continuations / threads / etc without forcing you to choose a special type of algorithm.
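For example, in a language with generators, a pull-style parser that blocks on `next_token()` can be inverted into a push-style one without touching the algorithm. A minimal sketch (the toy `parser` and `push_driver` names are mine, purely for illustration):

```python
def parser():
    """Pull-style recognizer for the string 'ab'; each `yield` plays
    the role of a blocking next_token() call."""
    tok = yield
    if tok != 'a':
        return False
    tok = yield
    return tok == 'b'

def push_driver(gen, tokens):
    """Push tokens into the suspended parser one at a time."""
    next(gen)                    # run the parser to its first yield
    for tok in tokens:
        try:
            gen.send(tok)        # parser resumes, then suspends again
        except StopIteration as done:
            return done.value    # parser finished with a verdict
    return None                  # parser still wants more input

assert push_driver(parser(), 'ab') is True
assert push_driver(parser(), 'ax') is False
```

The parser is written as if it pulls input, but control is inverted at the language level: it suspends at each `yield` until the driver pushes the next token in.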
> CYK and Earley parsers are both actually simpler than parsing with derivatives, once you've thrown this paper's optimizations in.
Yeah, GLR is very elegant as well; I'm a little puzzled by this paper's initial claim (and similar claims in the Yacc is Dead paper):
> Current algorithms for context-free parsing inflict a trade-off between ease of understanding, ease of implementation, theoretical complexity, and practical performance. No algorithm achieves all of these properties simultaneously.
GLR has good performance and isn't much harder to understand than LR, which is a very elegant theory. So I assume that 'ease of implementation' is the property the authors think GLR lacks. But ease is a little hard to measure until you have equally fast, production-ready implementations of both algorithms.
[1] https://en.wikipedia.org/wiki/CYK_algorithm [2] https://en.wikipedia.org/wiki/Earley_parser [3] https://en.wikipedia.org/wiki/GLR_parser
[EDIT: Previously, I stated "CFGs, in general, can't be parsed in better than cubic time.", which is incorrect. Thanks, pacala.]