Hacker News new | past | comments | ask | show | jobs | submit login
Regex that only matches itself (codegolf.stackexchange.com)
322 points by biot on Nov 11, 2016 | hide | past | favorite | 59 comments



Mostly unrelated, but I use the opposite trick to ensure that the "grep" process itself doesn't show up when I "ps | grep" for a process.

Something like:

    ps -ef | grep "myprogra[m]"
Is still a regex that matches the string "myprogram", but it doesn't match itself, so the "grep myprogram" doesn't show up in the output. Easier than doing " | grep -v grep" or similar.


Wow, thanks. I've been doing the pipe trick for 15 years.

Sokath, his eyes open


I just use pgrep for this now.


Not all platforms have pgrep, so I had to look up a trick like that in GP this weekend


This is ridiculous. How does one go about deriving something like this? Really impressive.


If you think that's crazy, check out https://www.infoq.com/presentations/miniKanren at the 31:30 mark. These two guys teach their logic programming system a subset of scheme, then they can run it "backwards" to find quines - programs that evaluate to themselves.


That is amazing. I don't want to spoil the fun but I highly recommend that video!


Once you've seen the trick for making quines you can apply it to any language.


Reminds me of a quine - a non-empty computer program which takes no input and produces a copy of its own source code as its only output

https://en.m.wikipedia.org/wiki/Quine_(computing)


It is literally a quine, viewed as a program in a language where (for example) the code consists of a regular expression and the output is the sequence of all strings matching that regular expression.


In the esolang community, the name for this variation of Quine is a Narcissist or Narcissus program. You can see a bunch of examples in a number of different languages on the Rosetta Code website.

https://rosettacode.org/wiki/Narcissist


It also reminded me about quines, I especially like the Quine Relay project which uses 100 different programming languages (each producing an input for the next one) to finally produce its own code https://github.com/mame/quine-relay


Stubbornness and vanity. Except in code form.


I wish I was as smart as these people.


These are smart people of course, but it's easy to get overly impressed by these results because all the scaffolding is missing. Behind such a result, there is a lot of tinkering, many leads going into dead ends. By the time you hit a solution, you have in your mind a theory of why it works and you can start cutting it down by removing anything non essential. You end up with a clean result that seems almost magical.


If you know anything about regular expressions, perhaps considering yourself a fairly decent expert... you might even know how to write basic recursive regexes. Then you are like me and think you know how to begin ripping apart any regex.

Then you see "<^<()(?R){2}>\z|\1\Q^<()(?R){2}>\z|\1\Q>", and you can't even begin to parse it. For one, I didn't know you could add a quantifier to a recursive pattern. Then there is a backreference to what appears to be an empty capture, which is part of an alternation for fuck's sake, which also happens to include the \Q notation.

I imagine I am in the top 20-30% of regular expression users. And I straight up do not understand a single aspect of how this expression manages to match anything, let alone its intended purpose.

It doesn't matter how this regex was found. Whether it was obtained through an entirely manual process or assisted by some kind of AI, it's beyond incredible. 99.9999xxx% of senior software developers who have been programming for 30+ years could not come up with this if they were given 10 years to try and discover it out without 3rd party assistance.

>> These are smart people of course, but it's easy to get overly impressed

How bloody arrogant, and frankly a bullshit statement from someone who has no idea what they are talking about. This is amazing. Someone would have to sit down with me for at least a week - probably 3+ months - to even begin to explain to me exactly how this works.

Why did I take this comment so personally? I don't know, it sounds like the parent has never even seen a regular expression, and yet is trying to promote their own sense of worth as if they're a genius when in fact they have an IQ of 40. That somehow the people who managed to figure this out are only lucky to have done so, rather than there being any significant time or skill involved. I bet you there are less than a thousand human beings on Earth who could have solved this. It's likely that the people who fucking designed and wrote PCRE could not have even solved this.

Basically, stfu. Why do you have to comment only to diminish someone else's success story? You couldn't have solved this, so don't even try to minimize the achievements of others.


I didn't read that as trying to diminish the brilliance of the original accomplish, but just trying to frame it less as "magic" or "genius" that is incomprehensible and more as an impressive human feat by a human who had to do hard human work to get there. I believe the intent was to encourage us not to lessen the extent to which we believe that this is something that only a higher class of people can do. Ie: yes, we can't comprehend it on first glance, but they didn't come up with it instantly either. They spend time gradually working and modeling the problem in their head so that the end result is comprehensible to them.


Thank you, that was indeed my intent here. I don't want to take away anything from the achievement, but you explained it better than I did.


     Everybody is a Genius. But if you judge a fish by
     its ability to climb a tree, it will live its whole
     life believing that it is stupid.

     - Albert Einstein


Probably not Einstein http://quoteinvestigator.com/2013/04/06/fish-climb/ (not that it matters in any deep way).


"The problem with quotes on the internet is that it can be difficult to verify their authenticity." - Abraham Lincoln


I remember seeing this on Facebook and looking in to it. Turns out some fish do climb trees!

Also no evidence that Einstein said it. Indeed he seems to contradict the general idea behind it.

Not everyone is a genius. Being amazing at woodwork or running is not "genius", it's admirable but different. Some people are [intellectual/intelligent] geniuses, period. Not everyone is.

The best way to take this aphorism IMO is "don't stress out if you're a fish and you're no good at climbing trees, if your thing is swimming that's good too".


As a non-fish that can't climb trees I find this even more depressing.


I'm inclined to say that some people 'just get' certain things like this or complex SQL. How they got to 'getting it', I don't know.


For starters, they've probably spent 100 to 1000x more time doing it than you or I have.

Perhaps their talent is actually just an attraction to studying it really hard. I, for instance, do not have the patience to study regex so much as to ever get the point where I could easily answer this question.


And monads.


My suspicion is that no one wants to explain monads because if they did it would be painfully obvious that they're a purgatorial way of getting around the basic mismatch between functional purity and actual computers.


Monads for all your I/O yeah, probably kind of dumb. Monads as a tool for dealing with cross-cutting concerns? Ridiculously useful. http://reasonablypolymorphic.com/blog/ideas-and-men has some basic examples.


In Haskell and PureScript, Monad is a typeclass just like any other.

They are an abstraction, effectful computation is not inherent to them, they represent computational contexts, e.g. the list monad is used to model non-determinism, the maybe monad represents computations that can fail, both of these are pure.

This is an excellent set of posts for building an intuition about monads: http://mvanier.livejournal.com/3917.html


> the list monad is used to model non-determinism

See this is the kind of thing that made learning Haskell so hard for me. I'm not a math guy, I'm terrible at math. I have no idea what "non-determinism" means, and I'd venture to guess that list monad is used to model "ordered collections of things".


Scala calls the monad bind "flatMap". Java actually implements it for Streams.

If you have a `Stream<Deck> decks;` of card decks, and want to get all the cards from all the decks, you might try to do something like `decks.map(deck -> deck.getCards().stream())`, but that will give you the type `Stream<Stream<Deck>>`. That sucks, we just want a single stream, not a stream of streams, so we call the function that flattens at the same time. `decks.flatMap(deck -> deck.getCards().stream())`.

Nondeterminism is just a standard use for this construction: given one choice, generate more choices, given that choice, generate more choices, do some computation with the choices, and then return the list of possible final results.


I'm not a math guy either and I did indeed find it difficult to penetrate the inner sanctum of Haskell for this very reason. I won't say I've completely succeeded in that effort, but I've come quite a long way.

I'm fairly certain that in this context, non-determinism means that there can be more than one answer/result.


"non-determinism" implies unordered.


See, to me, the sequence of keypresses I make while banging on my keyboard seem like it could be represented as an ordered list. I may be missing something, though.


If something is unordered it does not mean it may not be ordered. It only says you cannot assume that it is ordered. (A Collection in Java/C# is an unordered contract, but a List is also a Collection.)

But the idea of a list also suggests an ordering to me.


Actually, it's the other way around. Monad tutorials are so frequent that it's become a meme among Haskellers that everyone will eventually write a monad tutorial to relay his own mental model of monads.


And as someone who's read what is probably most of those tutorials - I fulfill the other half of the meme.

"To understand monads, you first need to understand monads."


Sorry -- you're right. I should have said "explain monads well."


Naw, everyone who understands monads wants nothing more then to explain it. Hence the profusion of tutorials and over-focusing on the concept.


This is probably a stupid question but what is the minimal formal language that has an accepting path as a statement in the language?


A formal language is basically just a mathematical set of acceptable strings made up of symbols in a (typically finite) alphabet; thus the smallest is probably the empty set, or if you insist on an accepting path, a set containing only the empty string.

The interesting part comes up when you want to match an infinite set of strings with an algorithm. Probably the simplest such algorithm would be "regardless of input, accept" which isn't exactly useful but would suffice.

It's dissatisfying to have an algorithm that puts weird, arbitrary restrictions on composing languages; for example it's usually the case that if you take a language L1 and a language L2 that are both acceptable to the algorithm, the concatenation (for every string in L1 and every string in L2, the concatenation of their strings is in the new language) language is usually acceptable to the algorithm.

Some commonly used languages do not have all of these properties; in particular the intersection of two context-free languages may not be a CFL, nor the complement. Deterministic context free languages are even more restrictive. You seem to need to give some things up as you move up the language hierarchy in power; in particular string homomorphism and intersection seem to be mutually exclusive in languages more powerful than regular expressions.

If you're interested, the theory of abstract families of languages has been studied, although I do not know a lot about it myself.


I think I might be missing something. Wouldn't something like "a" be a valid 1-byte regular expression that only matches instances of "a"?


No. "a" produces a match when run on "a" and "bad" and "aaaa" and many more things.

The regex ^a$ only matches "a", but it doesn't match itself.


That isn't a match...

I can still see the neat aspect of the post. But that is not a match.


What do you mean?


Regex libraries often distinguish between a string containing a regex, and matching it.

The string "aaaa" is a Regular Expression that describes a language with exactly one string: "aaaa".

Some regular expression libraries have two operations: They can tell you if a string contains a match somewhere within it, or is exactly a match. Some other libraries only do the first operation, and you need to explicitly ask for "^aaaa$" to get an exact match.

This is one of the things on the differences between the formal definition of a Regular Expression, and a regex library.


Yeah. I want trying to be a jerk. Pedantic some, but I thought the difference between match and find was pretty general in regex land.


> NOTE, delimiters must be included

Your regex is actually /a/ which does not match the string "/a/".


I would say that adding slashes is quite arbitrary constraint, and it is really language specific syntax that is limited to minority of languages (I believe it was introduced in Perl). If you do regexp in Java, Python, C you don't use slashes.

Edit: looks like slashes weren't even used in the example.


It's a pretty boring question if you don't make that requirement, though. The whole exercise is arbitrary, so it doesn't bother me.


Perl's use of this syntax had ancestors in awk, sed, ed, and qed - originally just for string replacements in qed until Ken Thompson wrote a version that did regexps too. It dates back to 1967. https://www.bell-labs.com/usr/dmr/www/qed.html

Deutsch&Lampson's original paper on qed showing s/// used for string substitution prior to it gaining regexps is here: http://research.microsoft.com/en-us/um/people/blampson/04-On...

Incidentally the name 'grep' originated here, 'G/RE/P' was a qed command to do a Global Regular Expression match and Print the matches.


Okay, I'll quote some more from the question:

> […] for the sake of the challenge, let the expression be delimited (starting symbol, expression, ending symbol ex: /fancypantpattern/ or @[^2048]@), if you want to argue quotes as your delimiter, so be it. I think given the apparent difficulty of this problem it won't make much of a difference.


Ahh. Thank you.


Completely unrelated: http://codegolf.stackexchange.com/questions/99026/build-me-a... (using Trumpscript)


funny how many times this needed to be submitted before it stuck. https://news.ycombinator.com/from?site=stackexchange.com


Wow, and none of them even have [dupe] on them.

HN's story ranking system is undisclosed to prevent startups and/or VCs from gaming things AFAIK, but I'm wondering if it pushes things up based on time of day.

Alternatively, maybe the people who check /new are only around at certain times.

In any case, I'm definitely checking /new a lot more myself. Most of the things I've posted haven't "stuck" either, I realize just how many people this seems to be hitting.


>I'm wondering if it pushes things up based on time of day. //

What do you mean by that? I don't think HN sniffs location so hour would it even work with a global audience?


That is an incredibly good point, I certainly wasn't thinking clearly while I was figuring that out, hah

I remember seeing someone do some data analysis of all of HN and noted that there was a certain time of day that always seemed to be consistently upvoted the most. I think I was overextrapolating that data, especially considering I've no idea what the article was called now so I couldn't review it.


Straight-up wizardry.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: