Advanced Regular Expressions (smashingmagazine.com)
47 points by pchristensen on May 7, 2009 | 25 comments



How did this get to the top of HN? It seems like a boring blogspam rehash of a bunch of very basic CS concepts, and not even well written...


I wish HN had downvotes for articles instead of harsh top comments.


Or if the "flag" button made posts' gravity higher so that they fell to the bottom sooner.


...basic CS concepts...

Something ironic, which also gets back to basic CS concepts, is that modern scripting languages have added so much power to "Regular Expressions" that they are no longer strictly regular expressions: they are no longer equivalent to finite automata. In many cases "Regular Expressions" are outright Turing complete.

And so the abuse of our field's terminology continues.
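
To make the point above concrete, here is a minimal sketch in Python (chosen only for illustration; the thread does not name a language). It uses a backreference to recognise strings of the form ww, a language that no finite automaton can accept. The helper name `is_square` is hypothetical.

```python
import re

# (.+) captures some non-empty prefix, and \1 demands the same text again.
# The set of strings "w followed by w" is not a regular language, so a
# pattern that recognises it is more than a textbook regular expression.
SQUARE = re.compile(r"(.+)\1")


def is_square(text: str) -> bool:
    """Return True when text is some non-empty string repeated twice."""
    return SQUARE.fullmatch(text) is not None


print(is_square("abcabc"))  # the whole string is "abc" + "abc"
print(is_square("abcab"))   # odd length, so it cannot be w + w
```

This only shows that backreferences go beyond finite automata; it says nothing either way about the stronger Turing-completeness claim.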


DFA vs NFA. I thought it was humorous that the most important features of modern regex engines are the things that prevent them from meeting the criteria of the original definition for a regex engine. And of course, with a true DFA regex engine, you'll never suffer from catastrophic backtracking or exponential performance degradation with a bad regex or bad subject string.
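
A rough sketch of the catastrophic backtracking mentioned above, assuming Python's standard `re` module as the backtracking engine. The subject string is kept short so the slow case still finishes quickly; the helper `time_match` is hypothetical and the timings will vary by machine.

```python
import re
import time


def time_match(pattern: str, text: str):
    """Run re.match once and return (match_or_None, elapsed_seconds)."""
    start = time.perf_counter()
    result = re.match(pattern, text)
    return result, time.perf_counter() - start


# A run of 'a' followed by a character that makes the overall match fail.
subject = "a" * 18 + "b"

# Nested quantifiers: the engine tries many ways of splitting the run of
# 'a' characters between the inner and outer '+' before it gives up.
nested_result, nested_secs = time_match(r"(a+)+$", subject)

# The same intent without nesting fails after a linear amount of work.
flat_result, flat_secs = time_match(r"a+$", subject)

print(f"nested: {nested_secs:.4f}s, flat: {flat_secs:.6f}s")
```

In a backtracking engine, each extra `a` in the subject should roughly double the time for the nested pattern, which is the behaviour a DFA-based engine avoids.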


Consider the word ‘smashing’. Using the above regular expression, the regex engine will first try to match the pattern ‘hi’ in ‘smashing’. It will not find a match.

... what?


I find when I move across languages (which happens frequently) or even tools (like grep), regexp implementation often differs enough to throw me off and introduce subtle bugs.

Is there a page that shows a mapping between different implementations of regular expressions and which languages and applications they're used in? Is there a better way to figure these things out than hunt for documentation every time?


A tool I find useful enough to warrant purchasing and running in a VM or Wine is RegexBuddy. It is great for debugging / optimizing regexes and supports several different language flavors. Still waiting for Vim and PHP flavors though. :) It allows you to generate code snippets in several languages that use the regex you constructed, and will assist you in performing cross-language translations.


I've written an automated tool to convert Vim-style regexes to Perl-style. This is kind of a dramatic case as the dialect is very different, but it was a fun thing to write.


Is it bi-directional? Is it on the vim.org website somewhere? I find Vim's escape-heavy syntax cumbersome and I frequently wish I could write a quick Perl-syntax regex and convert it to Vim.

Of course, Vim is one of the few regex engines to support variable-width negative look-behinds, so I guess that counts for something. :)
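
For comparison, a small sketch using Python's standard `re` module, which is one engine that rejects variable-width look-behinds at compile time. The helper `compiles` is hypothetical.

```python
import re


def compiles(pattern: str) -> bool:
    """Return True if Python's re module accepts the pattern."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False


# Fixed-width negative look-behind: accepted.
print(compiles(r"(?<!ab)c"))

# Variable-width negative look-behind: rejected, because the engine
# needs to know how far back to step before trying the sub-pattern.
print(compiles(r"(?<!a+)c"))
```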


Only Vim -> Perl is fully fleshed out right now. Doing it the other way would be fairly easy, though, because so much would be reused. Also, with the Vim regexes I had to worry about magicness and so on; the Perl side would be easier to parse.
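
To illustrate what "magicness" involves, here is a toy sketch in Python. It is not the tool described above, whose code is not shown in this thread. It assumes Vim's default "magic" mode, where `\(`, `\)`, `\|`, `\+` and `\?` are operators and the bare characters are literals, and it ignores bracket classes, `\{}` counts, `\v` and everything else. The function name `vim_magic_to_perl` is hypothetical.

```python
import re

VIM_ESCAPED_OPERATORS = "()|+?"  # operators only when backslashed in Vim magic mode


def vim_magic_to_perl(pattern: str) -> str:
    """Translate a small subset of Vim 'magic' syntax to Perl-style syntax."""
    out = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\" and i + 1 < len(pattern):
            nxt = pattern[i + 1]
            if nxt in VIM_ESCAPED_OPERATORS:
                out.append(nxt)        # \( in Vim becomes ( in Perl
            elif nxt in "<>":
                out.append(r"\b")      # \< and \> are word boundaries
            else:
                out.append(ch + nxt)   # leave other escapes untouched
            i += 2
        elif ch in VIM_ESCAPED_OPERATORS:
            out.append("\\" + ch)      # bare ( is a literal in Vim
            i += 1
        else:
            out.append(ch)
            i += 1
    return "".join(out)


converted = vim_magic_to_perl(r"\(foo\|bar\)\+")
print(converted)
print(bool(re.fullmatch(converted, "foobar")))
```

A real converter would first have to track which magic mode is active, since `\v`, `\m`, `\M` and `\V` change which characters need a backslash.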

It's not on vim.org. I had the code floating around with an MIT license, but it's offline right now. I thought about building a page where you could do the translation (I figured Perl -> Vim is what people would want more) or just use a simple GET request from other code.

Maybe I should do the Perl -> Vim bit and make it available.


Can you make vim use pcre?


From my look in the Vim sources, it would be tons of work to truly make it use pcre internally.

But I think it wouldn't be hard to do a plugin that translates a pcre into a Vim regex, allowing you to search or replace using pcres.

My code, though, is not integrated with Vim; it takes a free-floating regex and converts it. I used it to convert all the regexes in Vim syntax files and build a syntax highlighter. It was a silly project, just for fun.


indeed. grep -P is your friend. (except older grep versions seem to segfault on me constantly with various -P regexes)


sudo aptitude install txt2regex


I hadn't seen recursive regex before (almost a tautology). The operator in the article, "(?R)", isn't in Perl (v5.8.8).


Yeah. PHP syntax. From what I understand, they stuffed it in there mostly to allow easier processing of nested HTML tags and such.


I did some experimenting with recursive regexes after reading this article. I hadn't heard of it before and it sounded like an interesting concept.

However, captures in the recursion can't be retrieved after it gets returned. This may not be surprising, but it does reduce the power of recursion somewhat.


coz you can't expect anyone using PHP to, you know, write a proper parser...


The first four of these I wouldn't call "advanced"...


And yet I was pleasantly surprised. I'm quite used to reading "advanced" articles on various programming topics and finding that everything in them is so basic that I can't learn even the smallest piece of information from them.

This one still had some quite interesting pieces of information that I hadn't heard of or hadn't studied in detail yet.


not for full-time programmers, but SM's audience seems to be more designers who may dip into code now and again


Welcome to Smashing Magazine.


I'm a big fan of letting named groups be my documentation, instead of using the # comment notation.


Named groups have their use and are nice when the language supports them. There is a lot to be said for the (?xi) flags to allow a complicated regex to be indented and commented usefully.
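
A short sketch of both styles together, assuming Python's `re` syntax (`(?P<name>...)` for named groups and `(?xi)` at the start of the pattern for the verbose and case-insensitive flags). The pattern and the name `DATE` are made up for illustration.

```python
import re

# (?x) lets the pattern be indented and commented; (?i) ignores case.
DATE = re.compile(r"""(?xi)
    (?P<day>   \d{1,2} )                 # day of month, one or two digits
    \s+
    (?P<month> jan | feb | mar | apr | may | jun
             | jul | aug | sep | oct | nov | dec )   # three-letter month
    \s+
    (?P<year>  \d{4} )                   # four-digit year
""")

match = DATE.fullmatch("7 MAY 2009")
print(match.groupdict())
```

The named groups document what each piece captures, while the comments explain why the pattern is shaped the way it is, so the two approaches complement each other.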



