
The short answer is context. Google's internal code search is so good because it is tied into their build system. This means that when you search, you know exactly which files to consider. Without that context, you are making an educated guess about which files are relevant.



How exactly does integration with the build system help Google? Could you give a specific example?


Try clicking around https://source.chromium.org/chromium/chromium/src, which is built with Kythe (I believe, or perhaps it's using something internal to Google that Kythe is the open source version of).

By hooking into C++ compilation, Kythe gives you things like _macro-aware_ navigation. Instead of trying to process raw source text off to the side, it uses the same data the compiler used to compile the code in the first place. So things like cross-references are "perfect", with no false positives in the results: Kythe knows the difference between two symbols with the same name in two different source files, whereas a search engine naively indexing source text, or even something with limited semantic knowledge like tree-sitter, cannot perfectly make the distinction.
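
To make the same-name distinction concrete, here is a toy sketch in Python. It is not Kythe's actual schema or API; the file names, the `REFERENCES` table, and the helper functions are all made up for illustration. The only assumption is that a compiler-backed indexer hands you references already resolved to the definition they bind to.

```python
from collections import defaultdict

# Hypothetical "compiler output": each reference is already resolved to the
# definition it binds to. A symbol is identified by (defining file, name),
# not by name alone. All paths and line numbers here are invented.
REFERENCES = [
    # (file containing the reference, line, resolved symbol)
    ("net/socket.cc", 10, ("net/util.cc", "foo")),
    ("net/http.cc", 42, ("net/util.cc", "foo")),
    ("ui/button.cc", 7, ("ui/util.cc", "foo")),
]


def text_search(name):
    """Name-only lookup: every use of the identifier, whatever it binds to."""
    return sorted((f, line) for f, line, (_, n) in REFERENCES if n == name)


def build_xref_index(references):
    """Group reference sites by the resolved symbol they point at."""
    index = defaultdict(list)
    for f, line, symbol in references:
        index[symbol].append((f, line))
    return index


def cross_references(index, symbol):
    """Reference sites for one specific definition, with no name collisions."""
    return sorted(index.get(symbol, []))


if __name__ == "__main__":
    index = build_xref_index(REFERENCES)
    print(text_search("foo"))
    print(cross_references(index, ("ui/util.cc", "foo")))
```

Searching for the name `foo` returns all three call sites, while asking for references to the `foo` defined in `ui/util.cc` returns only the one in `ui/button.cc`. The hard part in practice is producing the resolved symbol column, which is exactly what hooking into the compiler buys you.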


Yes, the clicking around on semantic links on source.chromium.org is served off of an index built by the Kythe team at Google.

The internal Kythe has some interesting bits (mostly around scaling) that aren't open sourced, but it's probably doable to run something on chromium scale without too much of that.

The grep/search box up top is a different index, maintained by a different team.


If you want to build a product with a build system, you need to tell it what source to include. With this information, you know what files to consider, and if you are dealing with a statically typed language like C or C++, you have build artifacts that can tell you where the implementation was defined. All of this takes the guesswork out of answering questions like "What foo() implementation was used?"

If all you know are repo branches, the best you can do is return matches from different repo branches with the hopes that one of them is right.

Edit: I should also add that with a build system, you know what version of a file to use.
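
A rough sketch of that idea in Python, under made-up assumptions: the target labels (written Bazel-style), the `TARGETS` graph, and the `DEFINES` table are all hypothetical, and a real build system would give you this information from its own metadata rather than from hand-written dictionaries.

```python
# Hypothetical build graph: target -> its own sources and its dependencies.
TARGETS = {
    "//app:server": {"srcs": ["app/main.cc"], "deps": ["//lib:foo_fast"]},
    "//app:tool": {"srcs": ["tool/main.cc"], "deps": ["//lib:foo_portable"]},
    "//lib:foo_fast": {"srcs": ["lib/foo_fast.cc"], "deps": []},
    "//lib:foo_portable": {"srcs": ["lib/foo_portable.cc"], "deps": []},
}

# Hypothetical table of which file defines which function names.
DEFINES = {
    "app/main.cc": ["main"],
    "tool/main.cc": ["main"],
    "lib/foo_fast.cc": ["foo"],
    "lib/foo_portable.cc": ["foo"],
}


def sources_for(target, targets):
    """Collect every source file in the transitive closure of a target."""
    seen, stack, files = set(), [target], set()
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        files.update(targets[current]["srcs"])
        stack.extend(targets[current]["deps"])
    return files


def implementations(symbol, target=None):
    """Files defining `symbol`; restricted to one target's sources if given."""
    candidates = sorted(f for f, syms in DEFINES.items() if symbol in syms)
    if target is None:
        return candidates  # no build context: every match is a guess
    in_build = sources_for(target, TARGETS)
    return [f for f in candidates if f in in_build]


if __name__ == "__main__":
    print(implementations("foo"))
    print(implementations("foo", "//app:server"))
```

Without a target, `implementations("foo")` returns both candidate files and you have to guess. With `//app:server` as context, only `lib/foo_fast.cc` is left, because the other definition is never part of that build.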



