
I think everyone who doesn't know regex should make learning regex a priority. (However, I find that lookahead and lookbehind in particular do not tend to come in handy very often. So maybe just make a mental note that they exist and then look them up when you need them.) Just learn the basics and maybe take a very quick look at the theory, finite automata (maybe the name puts people off, but it's just a couple of circles connected to other circles with a bunch of characters written on the connecting lines. I'm pretty sure you could explain it all in a few sentences and a few examples). You'll get an intuitive feeling for what you can and can't do with regular expressions.
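
To make the "circles and lines" picture concrete, here is a minimal sketch in Python of a hand-written automaton. The state names and the `accepts` helper are made up for illustration; the machine is meant to accept the same strings as the regex `ab*c`.

```python
# Each state is a "circle"; each (state, character) pair is a labelled line
# pointing at the next circle. Hypothetical automaton for the regex ab*c.
TRANSITIONS = {
    ("start", "a"): "seen_a",
    ("seen_a", "b"): "seen_a",  # loop: any number of b's
    ("seen_a", "c"): "done",
}
ACCEPTING = {"done"}


def accepts(text: str) -> bool:
    """Follow the lines character by character; accept if we end in 'done'."""
    state = "start"
    for ch in text:
        state = TRANSITIONS.get((state, ch))
        if state is None:  # no line to follow from here: reject
            return False
    return state in ACCEPTING
```

The machine only ever remembers which circle it is in, which is roughly why regular expressions cannot count.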

You don't even have to be able to code to make use of regular expressions. You can use regular expressions when searching and replacing in editors (even slightly barebones editors like gedit or kate). You can transform input data from almost any format into any other format using nothing but your editor and a series of replace statements. (No computations though.)
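
As a rough illustration of that kind of format conversion, here is the same idea written with Python's `re.sub` instead of an editor dialog. The date format and the `reformat_dates` name are just assumptions for the example; in an editor you would type the pattern and replacement into the search/replace fields, and the group syntax may be `\1` or `$1` depending on the editor.

```python
import re


def reformat_dates(text: str) -> str:
    """Rewrite DD/MM/YYYY dates as YYYY-MM-DD using capture groups."""
    # (\d{2}) day, (\d{2}) month, (\d{4}) year; \b keeps us off longer digit runs.
    return re.sub(r"\b(\d{2})/(\d{2})/(\d{4})\b", r"\3-\2-\1", text)
```

The replacement only moves captured text around, which matches the "no computations" caveat above.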

I think they should teach regex in high school. Many people working in non-IT office jobs could benefit from knowing regex, and I think it's really quick to learn this. (Now if only Excel's/Word's search/replace supported regex...)




Which version of RegEx? I've "learned" RegEx two or three times and then switched language/platform and had everything I previously learned no longer work reliably.

You might think I am just talking about Microsoft's quirky implementation, but even in the Linux-sphere it isn't consistent; see:

http://www.greenend.org.uk/rjk/tech/regexp.html

You take a complex format string which was designed to use the fewest characters rather than for clarity, you then have every major application and library diverge on basic support and spec for features, and then you have all of them hack on support for UNICODE in their own unique way.

Regular Expressions likely won't ever die, but I for one would happily switch to an alternative with better readability, UNICODE support from day zero, and fewer niche features to keep things uniform. I'm tired of re-learning RegEx only to have everything I've learned either be forgotten or stop working the second I switch apps.


That's an overstatement of the differences between various regex engines. They all follow the basic standards, with [] being character classes, () being submatches, * being "0 or more", + being "1 or more", etc.
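
For reference, a small sketch of those four constructs in Python's `re` module. The `key=value` input is an invented example, and other engines may need different escaping for the same pattern.

```python
import re

# [] character class, () submatch, * zero or more, + one or more
pattern = re.compile(r"([a-z]+)=([0-9]*)")

m = pattern.fullmatch("width=120")
key, value = m.group(1), m.group(2)

# * allows zero digits, so an empty value still matches;
# + requires at least one letter, so an empty key does not.
empty_value = pattern.fullmatch("width=").group(2)
no_key = pattern.fullmatch("=120")
```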

The two main differences between various engines are which characters are "literal" and which characters are "magic" (Vim's engine is particularly annoying here), and how to write the "convenience character classes" (like what the shorthand for "alphanumeric character class" is). But these are minor issues: once you've learned how to write a regex, they are trivial to look up.

Knowledge of regular expressions transfers from one engine to another just fine.


I generally include either \v or \V in my vim regex, at which point I no longer have to think about which characters are magic. I suppose this means that I agree that vim's default is annoying here, but imho vim more than makes up for that by making magic configurable.


> They all follow the basic standards, with [] being character classes, () being submatches

You've already described a feature which has different syntax in one of the primary regex dialects I use (Emacs).


Those are the syntactic differences, but there are also semantic ones.

Most notably, the choice operator can either be ordered like in PEGs (if the first branch matches, the other isn't evaluated) or pick the branch that produces the longest match, CFG-like.
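
A small sketch of the difference, using Python's `re`, which takes the ordered approach. The two helper names are made up, and `longest_choice` only emulates longest-match behaviour for a flat list of alternatives; it is not how a POSIX engine is actually implemented.

```python
import re


def ordered_choice(alternatives, text):
    """Python's re tries branches left to right and keeps the first that matches."""
    m = re.match("|".join(alternatives), text)
    return m.group() if m else None


def longest_choice(alternatives, text):
    """Emulate longest-match semantics by trying every branch separately."""
    matches = []
    for alt in alternatives:
        m = re.match(alt, text)
        if m:
            matches.append(m.group())
    return max(matches, key=len) if matches else None
```

With the alternatives `a` and `ab` against the input `ab`, the ordered version stops at `a` while the longest-match version returns `ab`.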


For the most part, it's just a matter of knowing if you're using POSIX Basic Regular Expressions (BRE), POSIX Extended Regular Expressions (ERE), or Perl regular expressions.

Learn those, or at least the main differences between them, and the vast majority of the regular expression engines in software you use will become more recognizable.


You wouldn't be programming in Regex and for small things a google search for the platform quirks is usually faster than writing a parser, isn't it?

I'd agree with you if everything weren't so easy to look up.


But don't forget to point at the limitations. For example, you can't use regexps to match an arbitrary but equal number of nested opening and closing parentheses.
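
For comparison, a minimal sketch of what the job takes outside of regex: one counter that can grow without bound, which is exactly the thing a finite automaton does not have. The `balanced` name is just for illustration.

```python
def balanced(text: str) -> bool:
    """Check that every '(' has a matching ')' in the right order."""
    depth = 0
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:  # a ')' with nothing open
                return False
    return depth == 0
```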


I presume you're familiar with the infamous "can you parse HTML with regex"?

https://stackoverflow.com/questions/1732348/regex-match-open...


Another potential problem with regexps is that the underlying finite state machine can grow exponentially in the size of the expression.
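
A sketch of the textbook case, assuming we are talking about the deterministic automaton: for `(a|b)*a(a|b){n-1}` ("the n-th character from the end is an a"), the NFA has n+1 states, but the subset construction reaches 2^n DFA states. The function name and state numbering below are my own.

```python
def dfa_state_count(n: int) -> int:
    """Count DFA states reachable by subset construction for (a|b)*a(a|b){n-1}."""

    def step(states: frozenset, ch: str) -> frozenset:
        nxt = set()
        for s in states:
            if s == 0:
                nxt.add(0)          # state 0 loops on every character
                if ch == "a":
                    nxt.add(1)      # guess that this 'a' is the n-th from the end
            elif s < n:
                nxt.add(s + 1)      # count the remaining characters
            # state n is accepting and has no outgoing transitions
        return frozenset(nxt)

    start = frozenset({0})
    seen = {start}
    todo = [start]
    while todo:
        current = todo.pop()
        for ch in "ab":
            nxt = step(current, ch)
            if nxt not in seen:
                seen.add(nxt)
                todo.append(nxt)
    return len(seen)
```

Engines that simulate the NFA directly, or that backtrack, avoid building this table, at the cost of other trade-offs.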


Regex engines like PCRE can:

    ^(\((?1)?\))$


If it can then it's not "regular expressions."


But when most of the commonly used "regular expression" libraries aren't regular, I think if you really mean solving something with only regular expressions, you should probably specify that explicitly. The term's been corrupted enough that using it by itself to rule out things like backreferences isn't clear communication.


That's a shame, because regular grammars have a very important property: they're processed with a Finite State Automaton. This makes them blazing fast and quite memory efficient. (Heck, even with a non-deterministic one they're fast.)


Which is a very computer-science-y approach, and totally uninteresting at that, because the (?PARNO) syntax is still embedded in the same regexp engine for any practical purpose or distinction.

If you didn't know, a regexp engine with capture groups and backreferences is already stronger than what in formal language theory is called "regular expressions".
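
A quick sketch of that extra power using a backreference in Python's `re`: the pattern below accepts some number of a's, a b, and then the same number of a's again, which is a standard example of a language no finite automaton can recognize. The helper name is invented.

```python
import re

# (a+) captures a run of a's; \1 demands exactly the same text again.
SAME_ON_BOTH_SIDES = re.compile(r"(a+)b\1")


def same_on_both_sides(text: str) -> bool:
    return SAME_ON_BOTH_SIDES.fullmatch(text) is not None
```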


I appreciate the sentiment, but it is not quick to learn: you'd either stop short or basically learn, well ... patterns, some for different use cases, and end up categorizing many different patterns that achieve the same thing, while the typical pupil already has problems with simple arithmetic, calculus etc. So the question would be why the patterns aren't abstracted behind a nice composable GUI [1]. Not to mention the confusion around the various ever so slightly differing applications.

Also, you don't wanna spoon-feed students, or they'll never learn to fish. You would indeed have to go as far as implementing a regex engine or implementing, I don't know, a certain finite automaton in regex. I'm kidding, but all you could realistically achieve would be the usage of a catalogue like command-line-fu or stackexchange, unless the whole thing fits in a broader CS syllabus/curriculum.

[1] Provocative statement: sed and awk are breaking the "do one thing and do it well" idea of Unix.



