Excellent links. And this was just on my mind in fact.
In most web-servers there is a list of regexes - typically starting stems, almost more glob-like - that are defined in precedence order to match requests to handlers.
Are there any libraries that can treat this as a single effective operation, rather than just testing against each regex as an island until there's a match?
Are you are interested in the parsing of the HTTP requests? I think I've recently seen some hand-coded state-based C implementation (I guess it can even be O(n)?) in some fast open-source HTTP server.
The last time I looked at nginx it parsed the HTTP request with a hand-coded state machine but routes that are defined by regular expression matching were handled one expression at a time.
Routes that are not defined by regular expressions used a much faster tree based method.
Lex and Flex both generate C (Flex has an option to output C++ instead, IIRC), yes. There are similar tools for other languages though: Java has JLex, ANTLR, etc. for example.
Re: the config file - how often does it change? Does it really need to be deferred to runtime, or is it just startup time? My (probably out-of-date) understanding of rewrite rules in apache, for example, is that you need to bounce the server for it to pick up changes; that would also be sufficient for it to pick up changes to the flex file and rebuild. Running flex and compiling the output is quick enough to be instantaneous from the point of the admin restarting a server.
If it really can change at runtime, I guess the point is to avoid downtime while updating the configuration? There are ways around this too, some more practical than others. If your servers are behind a load balancer you can do a rolling restart to pick up changes. If that's not an option it's possible, though probably not trivial, to hot-patch the native code. YMMV. :-)
They also extend it to weighted regular expressions and even lazy, infinite regular expressions.