You can also get efficient FTS with this method if you implement indexing in use...

piyh · on July 31, 2021

Is there a dumbed down version of this indexing conversation for someone who understands b-trees, but BM25 or user space indexing?

peterhunt · on July 31, 2021

Well, this post is in the context of an e2e encrypted DB, but it's subject to the same constraints: https://medium.com/@ZeroDB_/scalable-full-text-search-over-e...

If you understand btrees you understand the hardest part already :)

Basically, you need to design a search index that examines the fewest DB pages in order to find the result. The Lucene scoring method stores a mapping of term -> document[] sorted in relevance order. The main idea is that you can examine only the first n documents for each term in the search query in order to find the most relevant search results. Picking n is sort of tricky, but if your index is stored in this way it's possible to fulfill a large % of queries efficiently without downloading the whole index.

Here's a little Python implementation of what I mean by a "user space implementation". Note that it's a toy but it performs pretty well on some demo sklearn data sets: https://gist.github.com/petehunt/724eeb77189332db101ad7b0db8...