
Yeah, that's accurate! I'm glad we finally know the running time for parsing with derivatives.

Let me add some more background, about stuff that I would expect to be covered in this paper's related work section, except that it doesn't have one...

Most well-known parsing algorithms that handle arbitrary CFGs run in cubic time, so this paper's O(n^3) running time is about the best you would expect. Other parsing algorithms that handle arbitrary CFGs in cubic time include CYK[1], Earley[2], and GLR[3]. I used to be excited about parsing with derivatives because of its simplicity, but CYK and Earley parsers are both actually simpler than parsing with derivatives, once you've thrown in this paper's optimizations.

[1] https://en.wikipedia.org/wiki/CYK_algorithm [2] https://en.wikipedia.org/wiki/Earley_parser [3] https://en.wikipedia.org/wiki/GLR_parser

[EDIT: Previously, I stated "CFGs, in general, can't be parsed in better than cubic time.", which is incorrect. Thanks, pacala.]
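To make the derivative idea concrete, here's a minimal sketch for plain regular expressions (the CFG version in the paper layers laziness, memoization, and fixed points on top of this; the encoding and names here are mine, not the paper's):

```python
# Brzozowski derivatives for regexes, as tagged tuples (illustrative only).
EMPTY, EPS = ("empty",), ("eps",)  # matches nothing / matches ""

def char(c): return ("char", c)
def cat(a, b): return ("cat", a, b)
def alt(a, b): return ("alt", a, b)
def star(a): return ("star", a)

def nullable(r):
    """Does r match the empty string?"""
    tag = r[0]
    if tag == "eps" or tag == "star": return True
    if tag == "empty" or tag == "char": return False
    if tag == "cat": return nullable(r[1]) and nullable(r[2])
    if tag == "alt": return nullable(r[1]) or nullable(r[2])

def deriv(r, c):
    """The language of deriv(r, c) is {w : c+w matched by r}."""
    tag = r[0]
    if tag in ("empty", "eps"): return EMPTY
    if tag == "char": return EPS if r[1] == c else EMPTY
    if tag == "alt": return alt(deriv(r[1], c), deriv(r[2], c))
    if tag == "star": return cat(deriv(r[1], c), r)
    if tag == "cat":
        d = cat(deriv(r[1], c), r[2])
        return alt(d, deriv(r[2], c)) if nullable(r[1]) else d

def matches(r, s):
    for c in s:            # push each character into the parser
        r = deriv(r, c)
    return nullable(r)
```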




CFG parsing can be done in better than O(n^3) time by reducing it to Boolean matrix multiplication, which takes about O(n^2.37).

http://theory.stanford.edu/~virgi/cs367/papers/valiantcfg.pd...

https://en.wikipedia.org/wiki/Coppersmith%E2%80%93Winograd_a...

Agreed that CYK and Earley are rather simple to implement, though Earley has an edge: its best case is O(n), whereas CYK's best case is O(n^2).
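For reference, here is roughly what CYK looks like as a compact recognizer for a grammar in Chomsky normal form (my own toy rule encoding, not any particular library). The three nested loops over span length, span start, and split point are exactly where the O(n^3) comes from:

```python
def cyk(word, start, unit, binary):
    """unit: list of (A, terminal); binary: list of (A, B, C) for A -> B C."""
    n = len(word)
    if n == 0:
        return False
    # T[(i, l)] = set of nonterminals deriving the length-l substring at i
    T = {}
    for i, c in enumerate(word):
        T[(i, 1)] = {A for A, a in unit if a == c}
    for l in range(2, n + 1):           # span length
        for i in range(0, n - l + 1):   # span start
            cell = set()
            for k in range(1, l):       # split point -> cubic overall
                left, right = T[(i, k)], T[(i + k, l - k)]
                cell |= {A for A, B, C in binary
                         if B in left and C in right}
            T[(i, l)] = cell
    return start in T[(0, n)]
```

Usage with a toy CNF grammar for { a^n b^n : n >= 1 } (S -> AB | AX, X -> SB, A -> a, B -> b): `cyk("aabb", "S", [("A","a"),("B","b")], [("S","A","B"),("S","A","X"),("X","S","B")])` accepts, while `"aab"` is rejected.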


> CFG parsing can be done better than O(n^3), reducing to Boolean matrix multiplication, which is about O(n^2.37).

Is there a known lower bound on running time to parse CFGs, like O(n^2)?


It is known that any fast CFG parsing algorithm can be turned into a fast boolean matrix multiplication algorithm.

So if CFG parsing was n^2, so is boolean matrix multiplication :)

See http://arxiv.org/abs/cs/0112018


One important note is that this holds for general grammars, which can contain anything. If the grammar is restricted somehow, for example to LR(k), then the best complexity is linear (there are even parsers that manage both LR(k) grammars in linear time and general parsing in cubic time).


I'm confused by this argument. To show that

> if CFG parsing was n^2, so is boolean matrix multiplication

wouldn't you need to show that any boolean matrix multiplication can be turned into CFG parsing (and not the converse)?


Great question, I just learned something.

A boolean matrix multiplication problem can be turned into a CFG problem. Therefore, an algorithm that solves CFG problems can be turned into an algorithm that solves boolean matrix multiplication.
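For concreteness, boolean matrix multiplication is computing (XY)[i][j] = OR over k of (X[i][k] AND Y[k][j]). The naive version below is the same kind of triple loop as CYK's table fill, which is part of why the two problems end up linked:

```python
def bool_matmul(X, Y):
    """Naive O(n^3) boolean product of two n x n matrices of bools."""
    n = len(X)
    return [[any(X[i][k] and Y[k][j] for k in range(n))
             for j in range(n)]
            for i in range(n)]
```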


Sorry, yes, I deleted a line accidentally. It turns out you can show both directions, and to prove a tight bound you need to show they reduce to each other without changing the complexity class.


Out of curiosity, do any of the other algorithms have the ability to be "paused" arbitrarily, i.e. with inverted control? I'm thinking specifically of parsing input from an asynchronous interface.

Derivatives do that well, since input is essentially pushed into the parser.


Pausing is very straightforward with bottom up algorithms like GLR. The entire state of the parse is stored in an explicit stack data structure, as opposed to implicitly in the program's call stack as with top-down algorithms. So resuming the parse is as simple as resuming the parse loop using a previously-stored stack.
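A toy illustration of the point (not real GLR, just a shift/reduce-style fold where all state is an explicit stack, with a made-up one-rule grammar "a b -> S"):

```python
def step(stack, c):
    """One parser step: shift c, then reduce 'a b' -> 'S' if possible."""
    stack = stack + [c]            # shift
    if stack[-2:] == ['a', 'b']:   # reduce
        stack = stack[:-2] + ['S']
    return stack

stack = []
for c in "ab":
    stack = step(stack, c)

saved = stack          # "pause": the explicit stack IS the whole state

for c in "ab":         # "resume": keep folding from the saved stack
    saved = step(saved, c)
```

Since no state hides in the call stack, `saved` can be stored anywhere (or in several versions, as GLR's graph-structured stack does) and picked up later.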


I think this is a bit of an orthogonal issue. Inversion of control is a problem that can be solved at the language level with coroutines / continuations / threads / etc without forcing you to choose a special type of algorithm.
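E.g. in Python a generator gives you push-style input for free, whatever the underlying algorithm. A toy paren balancer, just to show the control flow:

```python
def balancer():
    """Each send(c) pushes one character in and yields whether the
    input so far is a balanced paren string."""
    depth, ok = 0, True
    while True:
        c = yield (ok and depth == 0)
        if c == '(':
            depth += 1
        elif c == ')':
            depth -= 1
            if depth < 0:   # closed more than we opened
                ok = False
```

Usage: `g = balancer(); next(g)` primes it, then the asynchronous producer calls `g.send(c)` whenever a character arrives.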


Earley parsing can handle asynchronous input just fine.


> CYK and Earley parsers are both actually simpler than parsing with derivatives, once you've thrown in this paper's optimizations.

Yeah, GLR is very elegant as well; I'm a little puzzled by this paper's initial claim (and similar claims in the Yacc is Dead paper):

> Current algorithms for context-free parsing inflict a trade-off between ease of understanding, ease of implementation, theoretical complexity, and practical performance. No algorithm achieves all of these properties simultaneously.

GLR has good performance and isn't much harder to understand than LR, which is a very elegant theory. So I assume that 'ease of implementation' is the property the authors think GLR lacks. But ease is a little hard to measure, until you have an equally fast and production-ready implementation of both algorithms.



