I'm working on an ANN plugin for Elasticsearch. All data is stored on disk, you automatically get horizontal/distributed scaling handled by ES, and you can combine ANN queries with Elasticsearch queries. http://elastiknn.klibisz.com/

It's currently not as fast as the in-memory alternatives, though it's not a perfect apples-to-apples comparison: data is stored on disk, it's a JVM implementation rather than C/C++, and it's optimized for single queries rather than bulk.




YSK that ANN is already in the process of being added to Lucene:

https://github.com/elastic/elasticsearch/issues/42326

A different implementation is already in OpenDistro for Elasticsearch:

https://opendistro.github.io/for-elasticsearch-docs/docs/knn...


Yep, I'm aware.

The Lucene implementation seems early and slow-moving. It looks like they are trying to create new storage formats and use graph-based search methods. OpenDistro wrapped a C++ binary that also uses a graph-based method. It works quite well, but only for L2 similarity, and it comes with the operational burden of running a rather large sidecar process completely separate from the JVM.

The approach I've taken is to support five similarity functions (L1, L2, Angular, Jaccard, Hamming), support sparse and dense vectors, implement everything inside the JVM with no sidecar processes and no changes to Lucene, and to use hashing-based search methods (i.e. LSH). IMO the last point has a clear advantage over using graph-based methods, because the hashes are treated just like regular text tokens which is clearly the optimal access pattern in ES/Lucene. Of course it will likely lose to a C++ implementation in terms of raw speed because it's the JVM, but IMO that matters less than making the plugin trivial to run and scale.
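To make the "hashes as text tokens" point concrete, here is a minimal Python sketch of the general idea, not the plugin's actual code. It uses random-hyperplane LSH for angular similarity, which is one standard choice; the names `make_hyperplanes`, `hash_tokens`, and `TokenIndex` are made up for illustration, and a plain dict stands in for the Lucene inverted index.

```python
import random
from collections import Counter, defaultdict


def make_hyperplanes(dims, tables, bits, seed=0):
    """Draw `tables` groups of `bits` random Gaussian hyperplanes (hypothetical helper)."""
    rng = random.Random(seed)
    return [
        [[rng.gauss(0, 1) for _ in range(dims)] for _ in range(bits)]
        for _ in range(tables)
    ]


def hash_tokens(vec, hyperplanes):
    """Turn a dense vector into one string token per hash table.

    Each bit records which side of a hyperplane the vector falls on, so
    vectors separated by a small angle tend to share tokens.
    """
    tokens = []
    for table_id, planes in enumerate(hyperplanes):
        bits = "".join(
            "1" if sum(p * x for p, x in zip(plane, vec)) >= 0 else "0"
            for plane in planes
        )
        tokens.append(f"{table_id}_{bits}")
    return tokens


class TokenIndex:
    """Toy inverted index: token -> set of doc ids (stand-in for Lucene postings)."""

    def __init__(self, hyperplanes):
        self.hyperplanes = hyperplanes
        self.postings = defaultdict(set)

    def add(self, doc_id, vec):
        for token in hash_tokens(vec, self.hyperplanes):
            self.postings[token].add(doc_id)

    def candidates(self, vec, k):
        """Return up to k doc ids ranked by number of matching hash tokens."""
        counts = Counter()
        for token in hash_tokens(vec, self.hyperplanes):
            for doc_id in self.postings.get(token, ()):
                counts[doc_id] += 1
        return [doc_id for doc_id, _ in counts.most_common(k)]
```

The lookup is just "which docs contain these terms, and how many of them", which is the same access pattern as a keyword query. A real setup would presumably re-rank the candidates with the exact similarity function afterwards.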

I don't think there's a definitively better approach yet. It's an interesting problem and it'll be interesting to see what ends up working well.



