As others have said, parsers for complex formats (whether text-based or binary) are exceptionally hard. There are some classes of C/C++ software where the choice of language doesn't have such a striking effect. For parsers, the effect is hard to ignore.
But parsers are also the kind of stuff you almost always end up writing in C/C++, and there are semi-compelling reasons for doing so - chiefly, performance and flexibility. You can disagree and make your pitch for OCaml or JavaScript or whatever, but really, if we had clearly superior choices, we wouldn't be dealing with this problem today (or it would be a much more limited phenomenon). There are some interesting contenders, but the revolution won't happen tomorrow, no matter how much we talk about it on HN.
Perhaps a more fitting conclusion is that if you are parsing untrusted documents, our brains are too puny to get it right, and the parser really needs to live in a low-overhead sandbox. Mechanisms such as seccomp-bpf offer a really convenient and high-performance way to pull it off.
I agree with sandboxing, but one problem is that by using something like seccomp-bpf, you're turning highly portable C code into a highly platform-specific and even architecture-specific system. (Well, it won't be architecture-specific if you use plain old seccomp strict mode, which is probably all you need.)
You also need to deal with serialization and deserialization of the parse tree, which isn't super hard, but people still get it wrong.
Writing parsers is already hard, but writing secure parsers is at least 2x harder than that, or maybe 4x if you have to port it to 3 platforms. There really needs to be some kind of "libsandbox" for parsers.
I think that Haskell's parser combinator libraries, like Parsec, are clearly superior for most parsing tasks.
Parsec is about as concise as a formal grammar, but it's ordinary code rather than input to a code generator, so you don't have the extra complexity of a parser-generator toolchain. Parsec parsers are type-safe, so there's no way you'd get something back that wasn't a valid AST (or a parse error). Error handling is also quite good.
Parsec in particular is not especially fast - there are other libraries like attoparsec and cereal (for binary serialization) which trade off some of the flexibility of parsec for improved performance.
I think Haskell programmers mostly use these libraries, because they are clearly superior to the alternatives (for instance, there's a lot less interest in regexes in Haskell-land, because when you have a really good parsing library there's much less reason to use them). C programs like SQLite aren't using this approach because monads can't be expressed in the C type system, and there is no syntactic sugar for monads in C, so Parsec would end up much less pleasant and much less safe than in Haskell.
Many of the horribly vulnerable parsers are generated with Bison / Flex, so it's not exactly a robust solution. Plus, especially for binary formats (images, videos, etc), it's hand-written or bust.