Mostly searching for regexes in code and in filenames. But yes, history would so...

rpedela · on Dec 2, 2018

I believe GitHub uses ES, and currently, allowing users to run regex searches can bog down the entire cluster if a regex is malformed. Solr has the same problem. I believe there was some effort to address this at the Lucene level, but I am not sure of its status.

In other words, I agree with you, but I also know it is an extremely hard problem to solve at GitHub's scale.

avar · on Dec 2, 2018

If they use ES (or another Lucene-based engine), they could already be doing fuzzy search via ES's own ngram support. At that point, indexed regex search as described in the article isn't far away. You just need to bridge the gap between a regex and a trigram index.
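A minimal sketch of what "bridging the gap" can look like, as a toy in-memory trigram index in Python. This is not ES/Lucene code, and `TrigramIndex`, `literal_runs` and `required_trigrams` are hypothetical names. The regex-to-trigram step is deliberately conservative: it only analyses patterns built from literals and `. ^ $ * + ?`, and falls back to scanning every document for anything with groups, alternation, character classes, braces or backslashes. A real implementation would walk the parsed regex instead. The key property is that the index only narrows the candidate set, and the actual regex still decides what matches.

```python
import re
from collections import defaultdict

# Regex syntax this sketch does not analyse; its presence forces a full scan.
UNSUPPORTED = set("\\|()[]{}")


def trigrams(text):
    """All overlapping 3-character substrings of text."""
    return {text[i:i + 3] for i in range(len(text) - 2)}


def literal_runs(pattern):
    """Literal substrings every match must contain, or None if the pattern
    uses syntax we don't model. Dropping information is always safe: it only
    widens the candidate set, never loses a real match."""
    if UNSUPPORTED & set(pattern):
        return None
    runs, cur = [], []

    def flush():
        if cur:
            runs.append("".join(cur))
            cur.clear()

    for ch in pattern:
        if ch in "*?":
            if cur:
                cur.pop()  # the preceding literal may be absent from a match
            flush()
        elif ch == "+":
            flush()  # preceding literal is required, but may repeat
        elif ch in ".^$":
            flush()  # wildcard/anchor: breaks adjacency between literals
        else:
            cur.append(ch)
    flush()
    return runs


def required_trigrams(pattern):
    runs = literal_runs(pattern)
    if runs is None:
        return set()
    return set().union(*(trigrams(r) for r in runs))


class TrigramIndex:
    def __init__(self, docs):
        self.docs = docs                  # doc_id -> file contents
        self.postings = defaultdict(set)  # trigram -> {doc_id, ...}
        for doc_id, text in docs.items():
            for t in trigrams(text):
                self.postings[t].add(doc_id)

    def candidates(self, pattern):
        required = required_trigrams(pattern)
        if not required:
            return set(self.docs)  # nothing to filter on: scan everything
        return set.intersection(
            *(self.postings.get(t, set()) for t in required))

    def search(self, pattern):
        regex = re.compile(pattern)  # raises re.error on invalid syntax
        return sorted(d for d in self.candidates(pattern)
                      if regex.search(self.docs[d]))


if __name__ == "__main__":
    idx = TrigramIndex({
        "a.py": "def parse_config(path):",
        "b.py": "def parse_args(argv):",
        "c.go": "func ParseConfig() {}",
        "d.py": "path = parse_con",
    })
    print(idx.candidates("parse_con.*path"))  # a.py and d.py (set order varies)
    print(idx.search("parse_con.*path"))      # ['a.py']
```

Here `parse_con.*path` yields the required runs `parse_con` and `path`, so only `a.py` and `d.py` survive the trigram intersection; the regex then rejects `d.py` because the words appear in the wrong order. A pattern like `parse_(args|config)` falls back to a full scan in this sketch, which is exactly the case a smarter regex analysis (turning alternations into OR-ed trigram queries) would handle.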

rpedela · on Dec 2, 2018

Agreed, but it's not obvious to me how you would bridge that gap without a lot of custom code.

enriquto · on Dec 2, 2018

Yeah, that would be wonderful indeed! GitHub's current search functionality is nearly useless, which is a shame.

sp1982 · on Dec 3, 2018

I am hacking on something that allows searching by identifier. Feel free to play around with it at https://www.codegrep.com; it currently indexes a few thousand popular repositories.