
Aren’t all approaches to search indices essentially based on vector similarity?

I get that here they are applying "non-traditional" machine-learned magnitudes and scoring schemes, but couldn't you do the same thing on, say, Solr with a little magic in the indexing and query pipelines?




I worked on a heavily hacked Solr installation that used a neural network to cook documents down to 50-d vectors, which were then similarity-searched. It also used an "information theoretic" similarity function for the ordinary word-frequency vectors, before Solr officially supported that.

Indexing time and throughput were awful, but searches were quick enough, and quality on the task of "write a paragraph about your invention and find related patents" was fantastic.
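
A minimal sketch of the vector-similarity half of that setup (the encode() stub, the toy corpus, and the brute-force scan are my own stand-ins, not the actual Solr plumbing): once documents are cooked down to fixed-size vectors, "find related patents" is just a cosine-similarity ranking against the query paragraph's vector.

    import numpy as np

    def encode(text, dim=50):
        """Stand-in for the neural network that cooks a document down to a 50-d vector."""
        rng = np.random.default_rng(abs(hash(text)) % (2**32))
        v = rng.normal(size=dim)
        return v / np.linalg.norm(v)

    # Hypothetical corpus: one 50-d vector per patent document.
    docs = ["patent about widgets", "patent about gadgets", "patent about sprockets"]
    doc_vecs = np.stack([encode(d) for d in docs])

    def related(paragraph, k=2):
        """Rank every document by cosine similarity to the query paragraph."""
        scores = doc_vecs @ encode(paragraph)
        return [docs[i] for i in np.argsort(-scores)[:k]]

    print(related("a paragraph describing my invention"))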


Hmmm… no?

Consider word embeddings: to find the most similar words to some input, you need to do a nearest-neighbor search over roughly 300 dimensions. That's a common task, and doing it efficiently with an index is largely unsolved. Postgres, for example, lets you create indices over its "cube" datatype, but those are slower than the naive brute-force approach.
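
To make the baseline concrete, here is a minimal brute-force nearest-neighbor sketch (the tiny vocabulary and random vectors are hypothetical placeholders): an exact scan over all 300-d vectors is a single matrix-vector product per query, which is the bar a tree-style index like cube's has to beat and, in high dimensions, usually doesn't.

    import numpy as np

    # Hypothetical vocabulary of 300-d word embeddings (GloVe-sized vectors).
    vocab = ["king", "queen", "man", "woman"]
    emb = np.random.default_rng(1).normal(size=(len(vocab), 300))
    emb /= np.linalg.norm(emb, axis=1, keepdims=True)  # unit-normalize once

    def nearest_words(word, k=3):
        """Exact nearest neighbors by cosine similarity: one pass over all vectors."""
        q = emb[vocab.index(word)]
        scores = emb @ q                      # O(vocab_size * 300) per query
        order = np.argsort(-scores)
        return [vocab[i] for i in order if vocab[i] != word][:k]

    print(nearest_words("king"))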



