
Mostly searching for regexes in code and in filenames. But yes, history would sometimes be useful too.

And this is really the bare minimum. An even better search would e.g. allow searching for identifiers (comments and strings disregarded).
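To make "identifiers only" concrete, here is a minimal sketch for Python source, assuming the standard-library `tokenize` module is an acceptable way to separate names from comments and strings. The helper names `identifiers` and `find_identifier` are made up for illustration; a real code search would need a tokenizer per language.

```python
import io
import keyword
import tokenize

def identifiers(source):
    """Yield (name, line_number) for each identifier in Python source.

    Comments and string literals arrive as separate token types,
    so they are skipped; language keywords are filtered out too.
    """
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.NAME and not keyword.iskeyword(tok.string):
            yield tok.string, tok.start[0]

def find_identifier(source, name):
    """Return the line numbers where `name` occurs as an identifier."""
    return [line for ident, line in identifiers(source) if ident == name]

sample = '''# parse is mentioned here
msg = "parse this"
def parse(x):
    return x
result = parse(1)
'''

print(find_identifier(sample, "parse"))  # prints [3, 5]
```

The comment on line 1 and the string on line 2 both contain "parse" but are not reported, which is exactly what a plain text or regex search cannot do.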




I believe GitHub uses ES, and allowing users to run regex queries can currently bog down the entire cluster if the regex is malformed. This is a problem with Solr too. I believe there was some effort to resolve this at the Lucene level, but I am not sure of the status.

In other words, I agree with you, but I also know it is an extremely hard problem to solve at GitHub's scale.


If they use ES (or other Lucene) they could already be doing fuzzy search via ES's own ngram support. At that point indexed regex search as described in the article isn't far away. You just need to bridge the gap between a regex and a trigram index.
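A rough sketch of the idea, in plain Python rather than ES or Lucene: build a trigram index, use it to narrow the candidate set, then run the real regex only on the candidates. The big assumption here is that the caller supplies a literal string that every match must contain (`required_literal`); deriving that automatically from an arbitrary regex is the actual gap, and this sketch does not attempt it. All function names are hypothetical.

```python
import re
from collections import defaultdict

def trigrams(text):
    """Return the set of all 3-character substrings of text."""
    return {text[i:i + 3] for i in range(len(text) - 2)}

def build_index(docs):
    """Map each trigram to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for gram in trigrams(text):
            index[gram].add(doc_id)
    return index

def search(docs, index, pattern, required_literal):
    """Regex search that uses the trigram index as a prefilter.

    required_literal is an assumption supplied by the caller: a string
    that every match of pattern must contain. If it is shorter than
    three characters, no filtering happens and every document is scanned.
    """
    candidates = set(docs)
    for gram in trigrams(required_literal):
        candidates &= index.get(gram, set())
    regex = re.compile(pattern)
    # The index only rules documents out; the regex confirms real matches.
    return sorted(d for d in candidates if regex.search(docs[d]))

docs = {
    "a.py": "def parse_config(path): pass",
    "b.py": "def render(template): pass",
    "c.py": "config = load()",
}
index = build_index(docs)
print(search(docs, index, r"parse_\w+\(", "parse_"))  # prints ['a.py']
```

The index can produce false positives but never false negatives, as long as the required literal really is required, so the final regex pass keeps the results exact.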


Agreed, but it's not obvious to me how you would bridge that gap without a lot of custom code.


Yeah, that would be wonderful indeed! The current search functionality of GitHub is nearly useless, which is a shame.


I am hacking on something that allows searching based on identifiers. Feel free to play around at https://www.codegrep.com, which currently indexes a few thousand popular repositories.



