I think everyone who doesn't know regex should make learning regex a priority. (...

Someone1234 · on March 16, 2018

Which version of RegEx? I've "learned" RegEx two or three times and then switched language/platform and had everything I previously learned no longer work reliably.

You might think I am just talking about Microsoft's quirky implementation, but even in the Linux-sphere it isn't consistent. See:

http://www.greenend.org.uk/rjk/tech/regexp.html

You take a complex format string which was designed to use the fewest characters rather than with clarity in mind, you then have every major application and library diverge on basic support and spec for features, and then you have all of them hack on support for Unicode in their own unique way.

Regular Expressions likely won't ever die, but I for one would happily switch to an alternative with better readability, Unicode support from day zero, and fewer niche features to keep things uniform. I'm tired of re-learning RegEx only to have everything I've learned either be forgotten or not work the second I switch apps.

OskarS · on March 16, 2018

That's an overstatement of the differences between various regex engines. They all follow the basic standards, with [] being character classes, () being submatches, * being "0 or more", + being "1 or more", etc.
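
To make the shared core concrete, here is a minimal sketch using Python's `re` module. The `key=value` pattern and the sample strings are made up for illustration; other engines may spell the details differently.

```python
import re

# [] is a character class, () is a capturing group (submatch),
# + means "1 or more", * means "0 or more".
pattern = re.compile(r"([A-Za-z_]+)=([0-9]*)")

m = pattern.fullmatch("retries=42")
key, value = m.group(1), m.group(2)

# * accepts an empty run of digits ...
empty_value = pattern.fullmatch("retries=").group(2)

# ... while + insists on at least one character, so this does not match.
no_key = pattern.fullmatch("=42")
```

Here `key` and `value` hold the two submatches, `empty_value` is the empty string, and `no_key` is `None`.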

The two main differences between various engines are which characters are "literal" and which characters are "magic" (Vim's engine is particularly annoying here), and how to write the "convenience character classes" (like what the shorthand for "alphanumeric character class" is). But these are minor issues: once you've learned how to write a regex, they are trivial to look up.

Knowledge of regular expressions transfers from one engine to another just fine.

thefifthsetpin · on March 16, 2018

I generally include either \v or \V in my vim regex, at which point I no longer have to think about which characters are magic. I suppose this means that I agree that vim's default is annoying here, but imho vim more than makes up for that by making magic configurable.

ken · on March 16, 2018

> They all follow the basic standards, with [] being character classes, () being submatches

You've already described a feature which has different syntax in one of the primary regex dialects I use (Emacs).

pygy_ · on March 17, 2018

That's the syntactic differences, but there are also semantic ones.

Most notably, the choice operator can either be ordered like in PEGs (if the first branch matches, the other isn't evaluated) or pick the branch that produces the longest match, CFG-like.
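
A rough sketch of the difference, in Python. Python's `re` is a backtracking engine with ordered choice. The `longest_match` helper is a hypothetical emulation of longest-match semantics that only handles a flat list of alternatives, not a general POSIX-style matcher.

```python
import re

text = "foobar"
alternatives = ["foo", "foobar"]

# Ordered choice: the first branch that matches wins, so "foobar" is never tried.
ordered = re.match("|".join(alternatives), text).group()

def longest_match(branches, s):
    """Try every branch at position 0 and keep the longest match (None if none match)."""
    matches = (re.match(branch, s) for branch in branches)
    return max((m.group() for m in matches if m), key=len, default=None)

# Longest-match choice: every branch is considered and the longest result is kept.
longest = longest_match(alternatives, text)
```

Same pattern text, same input, two different answers: `ordered` is `"foo"` and `longest` is `"foobar"`.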

kbenson · on March 16, 2018

For the most part, it's just a matter of knowing if you're using POSIX Basic Regular Expressions (BRE), POSIX Extended Regular Expressions (ERE), or Perl regular expressions.

Learn those, or at least the main differences between them, and the vast majority of the regular expression engines in software you use will become more recognizable.

egeozcan · on March 16, 2018

You wouldn't be programming in regex, and for small things a Google search for the platform quirks is usually faster than writing a parser, isn't it?

I'd agree with you if everything weren't so easy to look up.

amelius · on March 16, 2018

But don't forget to point at the limitations. For example, you can't use regexps to match an arbitrary but equal number of nested opening and closing parentheses.
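
The reason is that a finite automaton has a fixed number of states, so it cannot count to an arbitrary nesting depth. A minimal sketch of the usual workaround in Python, with a hypothetical `is_balanced` helper that keeps an unbounded counter instead:

```python
def is_balanced(s: str) -> bool:
    """Return True if every '(' in s is closed by a later ')' and vice versa."""
    depth = 0
    for ch in s:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                # A ')' appeared before its matching '('.
                return False
    # Anything left open means the string is unbalanced.
    return depth == 0
```

Characters other than parentheses are ignored, so this also works on text that merely contains parentheses.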

lr4444lr · on March 16, 2018

I presume you're familiar with the infamous "can you parse HTML with regex"?

https://stackoverflow.com/questions/1732348/regex-match-open...

amelius · on March 16, 2018

Another potential problem with regexps is that the underlying finite state machine can grow exponentially in the size of the expression.
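
A classic illustration, sketched in Python under the assumption that the engine compiles to a deterministic automaton: the language "the n-th symbol from the end is `a`", i.e. `(a|b)*a(a|b){n-1}`. The NFA below has n+1 states, but the subset construction reaches 2^n DFA states. The function names are made up for this example.

```python
from collections import deque

def nfa_step(states: frozenset, symbol: str, n: int) -> frozenset:
    """One step of the (n+1)-state NFA for '(a|b)*a(a|b){n-1}'."""
    nxt = set()
    for q in states:
        if q == 0:
            nxt.add(0)            # state 0 loops on every symbol
            if symbol == "a":
                nxt.add(1)        # guess that this 'a' is n-th from the end
        elif q < n:
            nxt.add(q + 1)        # count the remaining symbols
        # state n is accepting and has no outgoing transitions
    return frozenset(nxt)

def dfa_state_count(n: int) -> int:
    """Count the DFA states reachable via the subset construction."""
    start = frozenset({0})
    seen = {start}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for symbol in "ab":
            target = nfa_step(current, symbol, n)
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return len(seen)

counts = [dfa_state_count(n) for n in range(1, 6)]
```

The counts double with every step of n, while the expression itself only grows linearly. Engines that simulate the NFA directly avoid building all of those states up front.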

oftenwrong · on March 16, 2018

regex engines like PCRE can:

    ^(\((?1)?\))$
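
For anyone who doesn't read PCRE: `(?1)` re-enters group 1 recursively, so the pattern matches n opening parentheses followed by n closing ones. Below is a hand-written Python mirror of that pattern, as a sketch. The function names are hypothetical, and the `try` block only checks that Python's standard `re` module rejects the `(?1)` syntax.

```python
import re

def match_group1(s: str, i: int):
    # Mirrors group 1, i.e. "(" then an optional recursive group 1 then ")".
    # Returns the index just past the match, or None if there is no match at i.
    if i >= len(s) or s[i] != "(":
        return None
    i += 1
    inner = match_group1(s, i)      # the optional recursive call, (?1)?
    if inner is not None:
        i = inner
    if i < len(s) and s[i] == ")":
        return i + 1
    return None

def matches_nested(s: str) -> bool:
    # The ^ and $ anchors: group 1 has to consume the entire string.
    return match_group1(s, 0) == len(s)

# The stdlib `re` module does not understand the recursion syntax.
try:
    re.compile(r"^(\((?1)?\))$")
    stdlib_supports_recursion = True
except re.error:
    stdlib_supports_recursion = False
```

Note that this accepts `((()))` but not `()()`: the pattern describes a single nest, not arbitrary balanced strings.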

swiley · on March 16, 2018

If it can then it's not "regular expressions."

kbp · on March 16, 2018

But when most of the commonly used "regular expression" libraries aren't regular, I think if you really mean solving something with only regular expressions, you should probably specify that explicitly. The term's been corrupted enough that using it by itself to rule out things like backreferences isn't clear communication.

loup-vaillant · on March 16, 2018

That's a shame, because regular grammars have a very important property: they're processed with a Finite State Automaton. This makes them blazing fast and quite memory efficient. (Heck, even with a non-deterministic one they're fast.)
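
To show why, here is a minimal table-driven DFA in Python for the made-up expression `ab*c`. The state names and the transition table are assumptions for this sketch. The matcher reads each character exactly once and remembers nothing but the current state.

```python
# Hypothetical DFA for the regular expression ab*c.
TRANSITIONS = {
    ("start", "a"): "seen_a",
    ("seen_a", "b"): "seen_a",
    ("seen_a", "c"): "done",
}
ACCEPTING = {"done"}

def dfa_accepts(text: str) -> bool:
    """Run the DFA over text: one table lookup per character, constant memory."""
    state = "start"
    for ch in text:
        state = TRANSITIONS.get((state, ch))
        if state is None:
            # No transition defined: the input can no longer match.
            return False
    return state in ACCEPTING
```

There is no backtracking anywhere, so the running time is linear in the input length regardless of the pattern.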

dozzie · on March 16, 2018

Which is a very computer-science-y approach, and a totally uninteresting one at that, because for any practical purpose or distinction the (?PARNO) syntax is still embedded in the same regexp engine.

If you didn't know, a regexp engine with capture groups and backreferences is already stronger than what formal language theory calls "regular expressions".
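
A small sketch of that in Python's `re`: the strings a^n b a^n (the same number of `a`s on both sides of a `b`) do not form a regular language, yet a backreference to a capture group matches them. The pattern and the variable name are made up for illustration.

```python
import re

# \1 must repeat exactly what group 1 captured, which forces both runs
# of 'a' to have the same length.
same_run = re.compile(r"(a+)b\1")

balanced = same_run.fullmatch("aaabaaa") is not None
lopsided = same_run.fullmatch("aaabaa") is not None
```

`balanced` is `True` and `lopsided` is `False`, which no finite automaton could decide for arbitrary n.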

posterboy · on March 16, 2018

I appreciate the sentiment as much as you do, but it is not quick to learn: you'd either stop short or basically learn, well ... patterns, some for different use cases, and categorize many different patterns that achieve the same thing, while the typical pupil already has problems with simple arithmetic, calculus, etc. So the question would be why the patterns aren't abstracted behind a nice composable GUI [1]. Not to mention the confusion around the various, ever so slightly differing applications.

Also, you don't wanna spoon-feed students, or they'll never learn to fish. You would indeed have to go as far as implementing a regex engine or implementing, I don't know, a certain finite automaton in regex. I'm kidding, but all you could realistically achieve would be the usage of a catalogue like command-line-fu or stackexchange, unless the whole thing fits into a broader CS syllabus/curriculum.

[1] Provocative statement: sed and awk are breaking the "do one thing and do it well" idea of Unix.