As others have said, parsers for complex formats (whether text-based or binary) are exceptionally hard. There are some classes of C/C++ software where the choice of language doesn't have such a striking effect. For parsers, the effect is hard to ignore.
But parsers are also the kind of stuff you almost always end up writing in C/C++, and there are semi-compelling reasons for doing so - chiefly, performance and flexibility. You can disagree and make your pitch for OCaml or JavaScript or whatever, but really, if we had clearly superior choices, we wouldn't be dealing with this problem today (or it would be a much more limited phenomenon). There are some interesting contenders, but the revolution won't happen tomorrow, no matter how much we talk about it on HN.
Perhaps a more fitting conclusion is that if you are parsing untrusted documents, our brains are too puny to get it right, and the parser really needs to live in a low-overhead sandbox. Mechanisms such as seccomp-bpf offer a really convenient and high-performance way to pull it off.
I agree with sandboxing, but one problem is that by using something like seccomp-bpf, you're turning highly portable C code into a highly platform-specific and even architecture-specific system. (Well, it won't be architecture-specific if you use plain old seccomp strict mode, which is probably all you need.)
You also need to deal with serialization and deserialization of the parse tree, which isn't super hard, but people still get it wrong.
Writing parsers is already hard, but writing secure parsers is at least 2x harder than that, or maybe 4x if you have to port it to 3 platforms. There really needs to be some kind of "libsandbox" for parsers.
I think that Haskell's parser combinator libraries, like Parsec, are clearly superior for most parsing tasks.
Parsec is about as concise as a formal grammar, but it's ordinary code rather than input to a code generator, so you don't have the extra complexity of a parser-generator toolchain. Parsec parsers are type-safe, so there's no way you'd get something back that wasn't a valid AST (or a parse error). Error handling is also quite good.
Parsec in particular is not especially fast - there are other libraries like attoparsec and cereal (for binary serialization) which trade off some of the flexibility of parsec for improved performance.
I think Haskell programmers mostly use these libraries, because they are clearly superior to the alternatives (for instance, there's a lot less interest in regexes in Haskell-land, because when you have a really good parsing library there's much less reason to use them). C programs like SQLite aren't using this approach because monads can't be expressed in the C type system, and there is no syntactic sugar for monads in C, so Parsec would end up much less pleasant and much less safe than in Haskell.
Many of the horribly vulnerable parsers are generated with Bison / Flex, so it's not exactly a robust solution. Plus, especially for binary formats (images, videos, etc), it's hand-written or bust.