How Regexes Work (plover.com)
73 points by aleyan on Nov 12, 2022 | 10 comments



Probably taken from yesterday's discussion: Ken Thompson's NFA regex patent

https://news.ycombinator.com/item?id=33566557


Always a worthy, related mention: https://swtch.com/~rsc/regexp/regexp1.html

Another fun idea is the inverse of the usual regex setup. A regex engine compiles the needle into a state machine and can then quickly run many haystacks through it. Sometimes you need the opposite optimisation: compile the entire haystack down to a state machine, and you can run many needles through it! It becomes a very primitive search index.
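
To make the "compile the haystack" direction concrete, here is a minimal sketch using a suffix automaton, one possible way to build such a machine (not necessarily what the comment had in mind). It only answers literal substring queries, not arbitrary regexes, and the class and method names are made up for illustration.

```python
class SuffixAutomaton:
    """State machine that accepts exactly the substrings of one haystack."""

    def __init__(self, haystack: str) -> None:
        # Parallel arrays, one entry per state; state 0 is the start state.
        self.next = [{}]    # outgoing transitions: char -> state index
        self.link = [-1]    # suffix links
        self.length = [0]   # length of the longest string reaching each state
        last = 0
        for ch in haystack:
            last = self._extend(last, ch)

    def _new_state(self, length: int, link: int, transitions: dict) -> int:
        self.next.append(transitions)
        self.link.append(link)
        self.length.append(length)
        return len(self.next) - 1

    def _extend(self, last: int, ch: str) -> int:
        """Add one haystack character; return the state for the whole prefix."""
        cur = self._new_state(self.length[last] + 1, -1, {})
        p = last
        # Walk suffix links, adding the missing transition on ch.
        while p != -1 and ch not in self.next[p]:
            self.next[p][ch] = cur
            p = self.link[p]
        if p == -1:
            self.link[cur] = 0
            return cur
        q = self.next[p][ch]
        if self.length[p] + 1 == self.length[q]:
            self.link[cur] = q
            return cur
        # Split q: the clone keeps the shorter strings that used to reach q.
        clone = self._new_state(self.length[p] + 1, self.link[q], dict(self.next[q]))
        while p != -1 and self.next[p].get(ch) == q:
            self.next[p][ch] = clone
            p = self.link[p]
        self.link[q] = clone
        self.link[cur] = clone
        return cur

    def contains(self, needle: str) -> bool:
        """Run one needle through the machine: O(len(needle)) per query."""
        state = 0
        for ch in needle:
            state = self.next[state].get(ch)
            if state is None:
                return False
        return True


index = SuffixAutomaton("abracadabra")
print(index.contains("cad"), index.contains("cab"))
```

The haystack is processed once, and each later query costs time proportional to the needle's length, regardless of how long the haystack is.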


The haystack idea reminds me of this wonderful post from the creator of both the Rust regex crate and the ripgrep command line utility (among other awesome things).

https://blog.burntsushi.net/transducers/


Interesting read.

Also discussed only 1 year ago:

https://news.ycombinator.com/item?id=28243636 (174 points, 10 comments)


That link is a very high quality write up.

I’m having a hard time understanding how your “haystack compilation” would work. I get the desire to be able to efficiently run many regexes on the same large file, but how could you optimize the file for arbitrary regex matching?


January 2007

No thanks


How is the age relevant here? The article doesn't deal with current events or the newest hottest framework. It explains algorithms and abstract ideas which haven't changed in the last 15 years.


I worked briefly with Mark-Jason Dominus in 2000. He added instrumentation to the Perl regex engine so that we could develop a regex debugger for the forthcoming Komodo editor. Each time the engine advanced to the next step, we would get a callback that included vital information such as the current position in the regex string and in the target string.

Today, there are numerous excellent tools like this on the web. Back in 2000, it was dark magic and so much fun to work on.
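
For a rough idea of what a per-step callback looks like, here is a toy sketch in Python. It is not the Perl engine's instrumentation or any real debugger API: `match_here` and `on_step` are hypothetical names, and the matcher only handles literals, `.`, and `x*`, anchored at the start of the text.

```python
from typing import Callable, List, Tuple


def match_here(
    pattern: str,
    text: str,
    on_step: Callable[[int, int], None],
    pi: int = 0,
    ti: int = 0,
) -> bool:
    """Toy backtracking matcher that reports every step it takes.

    on_step receives the current index into the pattern and into the text.
    """
    on_step(pi, ti)
    if pi == len(pattern):
        return True  # whole pattern consumed: match (text may have leftovers)
    if pi + 1 < len(pattern) and pattern[pi + 1] == "*":
        # Zero or more of pattern[pi]: try the shortest repeat count first.
        while True:
            if match_here(pattern, text, on_step, pi + 2, ti):
                return True
            if ti < len(text) and pattern[pi] in (".", text[ti]):
                ti += 1  # consume one more repetition and retry
            else:
                return False
    if ti < len(text) and pattern[pi] in (".", text[ti]):
        return match_here(pattern, text, on_step, pi + 1, ti + 1)
    return False


trace: List[Tuple[int, int]] = []
matched = match_here("ab*c", "abbc", lambda p, t: trace.append((p, t)))
for p, t in trace:
    print(f"pattern[{p}:] = {'ab*c'[p:]!r:8} text[{t}:] = {'abbc'[t:]!r}")
```

A debugger UI would hang off the callback: highlight both positions, pause, and let the user step forward.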


There are websites that can step through a regex execution (e.g. Regex101), but I still don't have a regex debugger with arbitrary callbacks! When do I get one in Python?


Curious if this exponentially slow regex is the reason Notepad++ crashes on loading modestly sized (10 MB) JSON?
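
For reference, this is roughly what "exponentially slow" means for a backtracking matcher. The sketch below says nothing about Notepad++ itself; it uses the a?^n a^n pattern from the swtch.com article linked above, run through a hand-written toy backtracker that counts its own steps (the function name is made up).

```python
from typing import Tuple


def count_backtracking_steps(n: int) -> Tuple[bool, int]:
    """Match n optional 'a's followed by n required 'a's against 'a' * n.

    Returns (matched, number of recursive calls made by the toy backtracker).
    """
    # Each pattern item is (character, is_optional).
    pattern = [("a", True)] * n + [("a", False)] * n
    text = "a" * n
    steps = 0

    def match(pi: int, ti: int) -> bool:
        nonlocal steps
        steps += 1
        if pi == len(pattern):
            return ti == len(text)
        ch, optional = pattern[pi]
        # Greedy: first try consuming a character for this item.
        if ti < len(text) and text[ti] == ch and match(pi + 1, ti + 1):
            return True
        # If that fails and the item is optional, try skipping it.
        return optional and match(pi + 1, ti)

    return match(0, 0), steps


for n in (5, 10, 15):
    matched, steps = count_backtracking_steps(n)
    print(n, matched, steps)
```

The only successful path skips every optional item, and the greedy matcher tries that combination last, so the step count roughly doubles each time n grows by one.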



