> Parser generators look promising until you actually start building anything of moderate complexity
I've done both, by hand and with parser generators (flex/bison and antlr), and getting the machine to do the boring work is a total fuckload[0] faster and more productive.
Edit: and unless you know what you're doing, you will screw up when hand-writing a parser. I know of a commercial reporting tool whose hand-written parser couldn't reliably parse valid input in its own embedded language.
What do you think is special about recursive descent parsing that makes you more likely to screw up unless you know what you're doing?
My experience has been the exact opposite - particularly as the language gets complicated and/or weird, in which case the generated parser becomes horribly brittle. Adding an innocent-looking new rule to your working Antlr or Yacc grammar feels like fiddling with a ticking bomb - it might work straight away, or it could explode in your face, leaving you with hours or days of whack-a-mole with grammar ambiguities.
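The textbook miniature of that effect is the dangling-else. A minimal bison sketch (hypothetical token names, no actions) already trips it:

    %token IF EXPR THEN ELSE OTHER
    %%
    stmt
        : IF EXPR THEN stmt             /* after "IF EXPR THEN stmt", seeing ELSE: */
        | IF EXPR THEN stmt ELSE stmt   /* shift it, or reduce the rule above?     */
        | OTHER
        ;
    %%

Feed that to bison and it warns about a shift/reduce conflict (bison quietly resolves it by shifting, so ELSE binds to the nearest IF - usually what you want). Now imagine the conflict report pointing at a state deep inside an SQL-sized grammar.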
I didn't say anything about recursive descent parsing wrt screwing up, I just said "hand-writing a parser". Nothing about the flavour.
I guess our experiences differ, but I don't know why. I have written a seriously major SQL parser in antlr and had no problems. And it was huge and complex - well, that's TSQL for you.
It may be that you have been parsing non-LR(1) grammars in bison, which could prove a headache, but... well, IDK. Maybe I've been lucky.
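(By non-LR(1) I mean things like grammars where the parser has to find the middle of the input, which no fixed lookahead can do. A tiny hypothetical sketch that bison flags straight away:

    %token A
    %%
    /* even-length runs of A: not LR(k) for any k, because with finite
       lookahead the parser can't tell where the middle of the input is */
    s
        : A s A
        | %empty
        ;
    %%

bison reports a shift/reduce conflict here: on seeing another A it can't decide whether to keep shifting or to reduce the empty rule at the middle.)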
That's an interesting coincidence - the biggest parser I wrote was also an SQL parser using Antlr. In fact, SQL was only part of it - it was a programming language that supported multiple flavours of embedded SQL (DB2 and Oracle). It worked, but I always dreaded the moment it would have to be changed to support a new feature in a new release of Oracle (or DB2).
I don't think it's an LR vs LL thing either. I feel that there is no sense of locality with parser generators; it's a bit like quantum mechanics - every rule has a "connection" to every other one. Change one seemingly small part of the Antlr grammar and some "far away" seemingly unrelated parts of the grammar can suddenly blow up as being ambiguous.
Coincidence indeed - I'm currently modifying my giant SQL grammar and building an AST off it. And struggling a bit with that, but that's mainly down to me, not antlr.
It is strange that we're having such different experiences of it. I don't recognise your quantum view of it either: antlr rules, and bison's, are very context-specific, as they can only be triggered in the context of larger rules, and only when matching a given piece of the target language (SQL here). They get triggered only in those very specific cases. I've never had your struggles with it. I don't understand.
I completely agree, yacc/bison are my go-to tools - the big difference is that you are building trees from the bottom up rather than top down. If you're building an AST it probably doesn't matter; however, a bunch of things (constant folding, type propagation) tend to go in the same direction, so sometimes you can combine stuff, as in the sketch below.
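A minimal bison sketch of that combining (default int semantic values, hypothetical rule names) - constant folding done directly in the reduction actions, so a constant subexpression never materialises as a tree at all:

    %token NUM
    %left '+' '-'
    %%
    expr
        : expr '+' expr   { $$ = $1 + $3; }   /* fold as we reduce, bottom-up */
        | expr '-' expr   { $$ = $1 - $3; }
        | NUM             { $$ = $1; }
        ;
    %%

The folding rides along with the reductions, which is the "same direction" I mean.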
[0] 3.14159 shedloads in metric