What are your Ruby Regex Idioms?

sant0sk1 · on Nov 30, 2009

I use Rubular all the time for testing my regexes with ease. Give it a shot, it's really well done:

samstokes · on Nov 30, 2009

Looks like a useful tool. I've been using a similar one: http://regex.powertoy.org/

The latter supports several languages' regex dialects, and I like the visual way it displays group matches, particularly when groups are nested. It can be a bit clunky, though, whereas Rubular is pretty clean and usable.

nanijoe · on Nov 30, 2009

This (Ruby) regex thing still gives me a headache, esp. when I'm trying to figure out other people's code.

carbon8 · on Nov 30, 2009

You can also use this as String#slice, which is a little clearer:

    "foo@example.com".slice(/@(.*)/, 1)
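A quick sketch of the equivalence, for anyone skimming: `String#slice` and `String#[]` are two names for the same method, and the second argument selects which capture group comes back. The variable names here are only for illustration.

```ruby
email = "foo@example.com"

# String#slice and String#[] are interchangeable; the trailing 1 asks for
# the first capture group rather than the whole match.
domain_via_slice = email.slice(/@(.*)/, 1)
domain_via_index = email[/@(.*)/, 1]

puts domain_via_slice   # example.com
puts domain_via_index   # example.com
```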

sh1mmer · on Nov 30, 2009

I love regex, but man it's still crazy voodoo. I hope those lines were well commented.

iamwil · on Nov 30, 2009

I didn't see it mentioned, but, btw, this code lets a method introspect to find the name of the method that's currently being executed.

  caller[0][/`([^']*)'/, 1]

Note that you only need to do this for Ruby 1.8. In Ruby 1.9, there will be __method__() and __callee__() methods that do the same thing.
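A minimal sketch of the 1.9-style alternative, assuming Ruby 1.9 or later. The method name `current_method_name` is made up for the example. Parsing `caller` strings is worth avoiding where possible, since the backtrace text format is not something to rely on across Ruby versions.

```ruby
# Hypothetical method, used only to illustrate __method__.
def current_method_name
  # __method__ returns the name of the enclosing method as a Symbol,
  # with no backtrace parsing involved.
  __method__
end

puts current_method_name.inspect   # :current_method_name
```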

sofal · on Nov 30, 2009

One of my favorite patterns:

  _, username, domain = */([^@]+)@(.+$)/.match("foo@example.com")

jherdman · on Nov 30, 2009

If you're capturing something you don't want, you can always start the group with '?:' to make it a non-capturing group.
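As a rough illustration, here is the email pattern from this thread with the username group made non-capturing, so only the domain shows up in the captures. The variable names are assumptions for the example.

```ruby
# (?:...) groups without capturing, so the username is matched but not kept.
email_match = /(?:[^@]+)@(.+)$/.match("foo@example.com")

puts email_match.captures.length   # 1
puts email_match.captures.first    # example.com
```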

FWIW, I personally prefer the #match method. Some of the examples in this article just make me want to hurt people. Especially usage of $1, $2, etc.

tierack · on Nov 30, 2009

In that case, /([^@]+)@(.+$)/.match("foo@example.com")[0] returns the whole string matched by the expression, and [1] and [2] return the first and second groups. That's why the throwaway is there. To get rid of the throwaway, this is a possibility:

  username, domain = */([^@]+)@(.+$)/.match("foo@example.com").captures

Of course, your string had better match; otherwise match returns nil and you'll get a NoMethodError.
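One possible way to guard against the failed-match case, sketched with a hypothetical helper `split_email` and a constant `EMAIL_PATTERN` that are not from the thread:

```ruby
EMAIL_PATTERN = /([^@]+)@(.+$)/   # same pattern as above, named for reuse

# Hypothetical helper: returns [username, domain], or [nil, nil] when
# the string does not match, instead of raising NoMethodError.
def split_email(address)
  m = EMAIL_PATTERN.match(address)
  m ? m.captures : [nil, nil]
end

username, domain = split_email("foo@example.com")
puts username            # foo
puts domain              # example.com

username, domain = split_email("not an email")
puts username.inspect    # nil
puts domain.inspect      # nil
```

The splat form with the throwaway variable also tolerates a failed match, since splatting nil leaves nothing to assign and the variables simply end up nil.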

steveklabnik · on Nov 30, 2009

$1 and $2 are probably still there due to Ruby's Perl heritage. I don't particularly mind their use, but that's probably due to my long history with Perl.

lzell · on Nov 30, 2009

Nice. I am intrigued by the leading underscore; are you just using that as a throwaway variable? You could get the username and domain on their own with:

  username, domain = [*/([^@]+)@(.+$)/.match("foo@example.com")][1..-1]

And it is super readable.

twoism · on Nov 30, 2009

this works as well...

  username, domain = */([^@]+)@(.+$)/.match("foo@example.com").captures

zzleeper · on Nov 30, 2009

Woah.. I want that for Python..

jacobolus · on Nov 30, 2009

I sure don’t. Putting a regular expression in a slice for a string runs totally counter to all existing Python idioms, and would utterly confuse anyone trying to read the code. There are definitely other ways of accomplishing this with reasonably compact syntax that are also intuitive and clear. (But I'm unconvinced it needs special syntax at all. It saves a couple lines here and there, maybe, but at the cost of making the language more complex.)

  caller[0][/`([^']*)'/, 1]

isn't really so much shorter than:

  re.search("`([^']*)'", caller[0]).group(1)

and the latter is quite a bit more explicit.

samstokes · on Nov 30, 2009

> I'm unconvinced it needs special syntax at all.

What's interesting is that it's actually not special syntax - at least, it's not special syntax for regex usage. The syntax obj[arg, arg, arg] looks weird because in most languages things that look like subscripts can't have multiple arguments, but it's just sugar for the method call obj.[](arg, arg, arg), which is pretty mundane. The semantics are just the semantics of that method on the class of 'obj', and can therefore be defined in library or user code.
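To make that concrete, a small sketch with a made-up class (`Grid` is hypothetical, not from any library): defining a method named `[]` is all it takes for subscript syntax with several arguments to work.

```ruby
# Hypothetical class, only to show that obj[a, b] is an ordinary method call.
class Grid
  def initialize(rows)
    @rows = rows
  end

  # Defining [] enables the subscript syntax; it can take any number of arguments.
  def [](row, col)
    @rows[row][col]
  end
end

grid = Grid.new([[1, 2], [3, 4]])
puts grid[1, 0]       # 3
puts grid.[](1, 0)    # 3, the same call written without the sugar
```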

Note I'm not defending the specific use of it here to write odd-looking regex code - I just think it's interesting that the language is flexible enough to allow that kind of usage without special-casing it.

Raganwald wrote a blog post [1] along similar lines, arguing that it's a core part of Ruby's language philosophy, about this snippet:

    (1..100).inject(&:+)

[1]: http://weblog.raganwald.com/2008/02/1100inject.html

jacobolus · on Nov 30, 2009

Fair enough. I don’t like the special semantics then, that say that string objects take regular expressions as possible slice indices. It's trying to be too cute, and compactness comes at the direct expense of clarity. Basically, the reason I don't like Ruby generally is that everything tries to be too clever by half, with too many behaviors crammed into too few parts. It's sort of halfway from Python to Perl.

carbon8 · on Nov 30, 2009

"say that string objects take regular expressions as possible slice indices."

I think it would be more accurate to say that Ruby's String#[] aka String#slice method accepts arguments other than just indices (including regular expressions and other strings) for returning substrings/subpatterns. It's just a different approach, intuitive and rather uncontroversial, IMO.
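For reference, a short sketch of several argument types `String#[]` accepts, assuming Ruby 1.9 or later (where an integer index returns a one-character string):

```ruby
s = "foo@example.com"

puts s[0]               # f            (integer index)
puts s[0, 3]            # foo          (start, length)
puts s[4..-1]           # example.com  (range)
puts s["example"]       # example      (substring, if present)
puts s["bar"].inspect   # nil          (substring absent)
puts s[/@(.*)/, 1]      # example.com  (regexp plus capture group)
```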

samstokes · on Nov 30, 2009

I mostly agree.

In a dynamically-typed language there's no reason not to pass a Regexp, or indeed a Fruitbat, to a slice expression, so long as it behaves in some expected way. What I don't like about it is that there's no obvious convention for what that should be. String#[] has two fairly different behaviours depending on whether the argument behaves in one way or another. Because they have the same syntax, they look like they should mean similar things, but whether they're similar is debatable at best; and even if that case is debatable, the semantics of this are completely different:

    Proc.new {|x, y| x + y}[2, 3]   # => 5

(That is, Proc#[] is aliased to Proc#call, to work around the fact that Ruby's method call syntax is a special case whose semantics are not programmable, and can't be used on Proc objects.)
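A brief sketch of the equivalent ways to invoke a Proc, assuming Ruby 1.9 or later for the `.()` form. The name `add` is just for the example.

```ruby
add = Proc.new { |x, y| x + y }

puts add[2, 3]        # 5
puts add.call(2, 3)   # 5
puts add.(2, 3)       # 5, shorthand for .call
```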

More generally, I think it's a net win that the language detaches syntax from semantics in this way, and allows the semantics to be programmed; but for readability, syntax should still suggest semantics - i.e. a convention that should be followed when defining the semantics. The authors of the Ruby standard library didn't follow such a convention for [].

emmett · on Nov 30, 2009

Your code does not accomplish the required task. The goal is to return nil when there is no match, or the nth match if there is.

  >>> re.search('a(x)a', 'axa').group(1)
  'x'
  >>> re.search('a(x)a', 'aya').group(1)
  Traceback (most recent call last):
    File "<stdin>", line 1, in <module>
  AttributeError: 'NoneType' object has no attribute 'group'

The idiomatic Python for the job is:

  match = re.search("`([^']*)'", caller[0])
  if match:
      match.group(1)
  else:
      None

(or some nonsense involving try/except, which would be even worse).
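For comparison, a sketch of how the Ruby subscript form under discussion behaves on the same two inputs: it returns nil on a failed match rather than raising.

```ruby
puts "axa"[/a(x)a/, 1].inspect   # "x"
puts "aya"[/a(x)a/, 1].inspect   # nil
```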

jacobolus · on Nov 30, 2009

Ah, you’re right. In many cases, I don’t want to return None when there’s no match, but for short scripts quickly tossed together, the extra few lines can be a bit annoying.

  try: re.search("`([^']*)'", caller[0]).group(1)
  except: None

grandalf · on Nov 30, 2009

  (foo){2,5}