Hacker News
Ask HN: Why no regex AND?
3 points by _acme on Aug 12, 2016 | 11 comments
Why does the syntax for regular expressions include an "or" (union) operator but not an "and" (intersection) operator?


FWIW, http://adsabs.harvard.edu/abs/2008arXiv0802.2869G says: "Similarly [to complement], when constructing a regular expression defining the intersection of a fixed and an arbitrary number of regular expressions, an exponential and double exponential size increase, respectively, can in worst-case not be avoided."

I found the 'greenery' project, whose page at https://qntm.org/greenery says:

> Elementary automata theory tells us that the intersection of any two regular languages is a regular language, but carrying out this operation on actual regular expressions to generate a third regular expression afterwards is much harder than doing so for the other operations under which the regular languages are closed (concatenation, alternation, Kleene star closure).

The developer provides a Python library that lets you do this:

  >>> from greenery.lego import parse
  >>> print(parse("(ab{0,3})*") & parse("(abba)*"))
  (ab{2}a)*
  >>> print(parse("((ss*)t*)") & parse("((ss*)+(tt*))"))
  s+t+
I have no other experience with the code, but it was nice to know that such a package exists.

The author also wrote that it was "the most algorithmically complex thing I've ever implemented."
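The closure result in the quote above is usually shown with the product construction: run two DFAs in lockstep over the same input and accept only when both accept. The sketch below is a textbook illustration, not greenery's implementation. The `DFA` class, the `intersect` helper, and the two example languages are hypothetical, and the sketch assumes both automata share an alphabet and have total transition functions. The product has one state per pair of states; turning an automaton back into a regular expression is a separate step whose output can grow very large, which is roughly where the difficulty described in the quote lies.

```python
from dataclasses import dataclass
from itertools import product


@dataclass
class DFA:
    """A deterministic finite automaton with a total transition function."""
    states: frozenset
    alphabet: frozenset
    delta: dict  # maps (state, symbol) -> next state
    start: object
    accepting: frozenset

    def accepts(self, text: str) -> bool:
        state = self.start
        for ch in text:
            state = self.delta[(state, ch)]  # KeyError if ch is outside the alphabet
        return state in self.accepting


def intersect(d1: DFA, d2: DFA) -> DFA:
    """Product construction: accepts exactly the strings both d1 and d2 accept."""
    assert d1.alphabet == d2.alphabet, "this sketch assumes a shared alphabet"
    states = frozenset(product(d1.states, d2.states))
    # Each product state (p, q) steps both machines on the same symbol.
    delta = {
        ((p, q), ch): (d1.delta[(p, ch)], d2.delta[(q, ch)])
        for (p, q) in states
        for ch in d1.alphabet
    }
    # A pair is accepting only if both components are accepting.
    accepting = frozenset(
        (p, q) for (p, q) in states if p in d1.accepting and q in d2.accepting
    )
    return DFA(states, d1.alphabet, delta, (d1.start, d2.start), accepting)


# Hypothetical example languages over {a, b}.
AB = frozenset("ab")

# State = parity of the number of 'a's read so far (0 means even).
even_as = DFA(
    states=frozenset({0, 1}),
    alphabet=AB,
    delta={(0, "a"): 1, (1, "a"): 0, (0, "b"): 0, (1, "b"): 1},
    start=0,
    accepting=frozenset({0}),
)

# State = whether the last symbol read was 'b'.
ends_in_b = DFA(
    states=frozenset({False, True}),
    alphabet=AB,
    delta={(False, "a"): False, (True, "a"): False,
           (False, "b"): True, (True, "b"): True},
    start=False,
    accepting=frozenset({True}),
)

both = intersect(even_as, ends_in_b)
for word in ["", "b", "ab", "aab", "abab", "aba"]:
    print(repr(word), both.accepts(word))
```

Here `both` accepts exactly the strings over {a, b} that contain an even number of a's and end in b, so "b", "aab", and "abab" are accepted while "", "ab", and "aba" are not.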


A regex is one big implicit AND already. The OR is an exception to the normal rule.

/(abc|def)(123|456)/

You can read that as "(abc OR def) AND (123 OR 456)". The string "abc789" wouldn't match, for instance.
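This reading can be checked with Python's `re` module, where `re.fullmatch` requires the entire string to match; a minimal sketch:

```python
import re

pattern = re.compile(r"(abc|def)(123|456)")

for s in ["abc123", "def456", "abc789", "123abc"]:
    # Succeeds only if the first group matches AND the second group matches right after it.
    print(s, pattern.fullmatch(s) is not None)
```

"abc123" and "def456" match; "abc789" fails on the second group, and "123abc" fails because the groups must appear in order.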


If I understand what OP means by "and", your regex isn't it: it means "(abc OR def) FOLLOWED BY (123 OR 456)", not "(abc OR def) AND (123 OR 456)". Let's look at another example, with a hypothetical & operator:

   /(\D\S)+/
   /(\D|\S)+/
   /(\D&\S)+/
Given the string "b5 ", the first regex matches "b5"; the second matches the whole string, because every character is either not a digit or not whitespace; and the third matches only "b", because that is the only character that is both not a digit and not whitespace.
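Python's `re` has no `&` operator, but for single characters the hypothetical third pattern can be emulated with a lookahead: `(?=\D)\S` consumes one character only if it is both a non-digit and a non-whitespace character. The negated class `[^\d\s]` expresses the same set. A sketch using `re.match`, which anchors at the start of the string:

```python
import re

text = "b5 "

concat = re.match(r"(\D\S)+", text)  # a non-digit followed by a non-whitespace char
union = re.match(r"(\D|\S)+", text)  # each char is a non-digit OR a non-whitespace char
# Emulated intersection: the lookahead checks \D, then \S consumes the same character.
intersection = re.match(r"(?:(?=\D)\S)+", text)
# The same set as a single negated class: neither a digit nor whitespace.
negated_class = re.match(r"[^\d\s]+", text)

print(repr(concat.group()))         # 'b5'
print(repr(union.group()))          # 'b5 '
print(repr(intersection.group()))   # 'b'
print(repr(negated_class.group()))  # 'b'
```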


\D is a subset of \S, so the \S accomplishes nothing (said another way, there is no character that matches \D that doesn't also match \S).

Secondly, there are very few intersecting character classes (sets) that I'm aware of, and in all cases, you could achieve the desired result more clearly in other ways.

Said another way: "AND" would just make regexes even harder to understand/approach, and that is almost always undesirable.


I was just trying to illustrate how the logic of AND differs from what was shown above, not give a useful example.

A more practical example might be something like

    /(10|22)(.*crab.*&.*apple.*)90/ 
so that it only matches strings where the content between the numeric codes contains both "crab" and "apple", in either order.
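Without an `&` operator, one way to get this behaviour is to match the outer structure first and then test the captured middle against each ANDed pattern separately. This sketch assumes whole-string (full-match) semantics, which makes the split into `10|22`, the middle, and `90` unambiguous; `matches_crab_and_apple` is a hypothetical helper name.

```python
import re

OUTER = re.compile(r"(10|22)(.*)90")
AND_PARTS = [re.compile(r".*crab.*"), re.compile(r".*apple.*")]


def matches_crab_and_apple(s: str) -> bool:
    """Emulate /(10|22)(.*crab.*&.*apple.*)90/ with full-match semantics."""
    m = OUTER.fullmatch(s)
    if m is None:
        return False
    middle = m.group(2)
    # The middle must satisfy every pattern joined by the hypothetical '&'.
    return all(p.fullmatch(middle) for p in AND_PARTS)


for s in ["10 crab apple 90", "22appleXcrab90", "10 crab only 90", "33 crab apple 90"]:
    print(s, matches_crab_and_apple(s))
```

The first two strings match (in either order of the words); the third lacks "apple", and the fourth fails the `(10|22)` prefix.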

To be clear, I don't know that it's useful enough to warrant inclusion in a regex engine. I'm just trying to provide a useful illustration.


That is concatenation. Intersection would be the ability to write something like "abc....&....xyz" and have it match the same thing as "abc.xyz".
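The equivalence can be sanity-checked by brute force, treating intersection as "the whole string matches both patterns" and comparing against `abc.xyz` for every string of length 0 to 7 over a small alphabet (335,923 strings). This is an exhaustive check of that finite sample, not a proof, and the alphabet `abcxyz` is an arbitrary choice.

```python
import itertools
import re

left = re.compile(r"abc....")
right = re.compile(r"....xyz")
combined = re.compile(r"abc.xyz")


def in_intersection(s: str) -> bool:
    # Membership in the intersection: the whole string must match both patterns.
    return left.fullmatch(s) is not None and right.fullmatch(s) is not None


mismatches = []
hits = []
for n in range(8):
    for chars in itertools.product("abcxyz", repeat=n):
        s = "".join(chars)
        inter = in_intersection(s)
        if inter:
            hits.append(s)
        if inter != (combined.fullmatch(s) is not None):
            mismatches.append(s)

print(mismatches)  # []
print(hits)        # the six strings "abc?xyz", one per letter of the alphabet
```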


Use this syntax for logical AND:

  (?=...)(?=...)
http://stackoverflow.com/a/24102539
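In engines that support lookahead (Python's `re` does; POSIX basic and extended regular expressions do not), the trick generalizes: each `(?=...)` tests one pattern from the same starting position without consuming input. Anchoring each lookahead with `\Z` makes it test the whole remaining string, so the combination behaves like an intersection of full-string matches. `regex_and` is a hypothetical helper, not part of any library.

```python
import re


def regex_and(*patterns: str) -> re.Pattern:
    """Build one regex that full-matches only strings every pattern full-matches."""
    # (?=(?:p)\Z) asserts that p matches from here to the end of the input
    # without consuming anything; the final .* then consumes the string.
    lookaheads = "".join(f"(?=(?:{p})\\Z)" for p in patterns)
    return re.compile(lookaheads + ".*", re.DOTALL)


crab_and_apple = regex_and(r".*crab.*", r".*apple.*")
seven_chars = regex_and(r"abc....", r"....xyz")

print(bool(crab_and_apple.fullmatch("apple then crab")))  # True
print(bool(crab_and_apple.fullmatch("crab only")))        # False
print(bool(seven_chars.fullmatch("abcQxyz")))             # True
print(bool(seven_chars.fullmatch("abcxyz")))              # False
```

Without the `\Z` anchors, each lookahead would only need to match a prefix, so `regex_and(r"abc", r"abcd")` would also accept "abcdef".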


This syntax is not supported by the tools I use. If we're going to add to the language, why not just add a true intersection operator? My question was meant to be more theoretical - what was the rationale behind incorporating union into the basic syntax, but not intersection, when regular expressions were first adapted to text processing?


I do have awk available, which allows one to combine regular expressions using logical operators, but I'm interested in the historical and theoretical aspects of my inquiry, if any.


AND is explicit in the groupings.


Could you please clarify what you mean by this? Are you referencing concatenation as being a form of logical AND?



