Major props to the authors of this library. I re-built https://progscrape.com [1...

snorremd · on May 27, 2024

It is a very nice library. I’m using it for a very much work-in-progress incremental email backup CLI tool for email providers that use JMAP.

I wanted users to be able to search their backups. As I’m using Rust, Tantivy looked like just the right thing for the job. Indexing an email happens so fast that I did not bother to move the work to a separate thread. And search across thousands of emails seems to be no problem.

If anyone wants to add search to their Rust application, they should take a look at Tantivy.

CaptainOfCoit · on May 28, 2024

Tiny bug report: https://progscrape.com/?search=grep shows "Error: PersistError(UnexpectedError("Storage fetch panicked"))"

mmastrac · on May 28, 2024

It looks like certain search queries failed to parse on my end, and that bug wedged a mutex. Deploying a fix now. Thanks!
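The comment doesn't say exactly what the bug or the fix was. Assuming the index sits behind a Rust `std::sync::Mutex` and the parse failure turned into a panic while the lock was held (the "Storage fetch panicked" message hints at a panic, but that is a guess), this minimal sketch shows how one bad query can poison the lock for every later request, plus one defensive pattern: parse first, return errors as values, and only then lock. `Index`, `parse_query`, `search_fragile` and `search_safe` are hypothetical names, not progscrape's code.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical stand-in for a search index shared between request handlers.
struct Index {
    docs: Vec<String>,
}

/// Hypothetical query parser: rejects unbalanced quotes and empty queries.
fn parse_query(raw: &str) -> Result<Vec<String>, String> {
    if raw.matches('"').count() % 2 != 0 {
        return Err(format!("unbalanced quote in query: {raw}"));
    }
    let terms: Vec<String> = raw
        .split_whitespace()
        .map(|t| t.trim_matches('"').to_lowercase())
        .filter(|t| !t.is_empty())
        .collect();
    if terms.is_empty() {
        return Err("empty query".to_string());
    }
    Ok(terms)
}

/// Counts documents containing any of the query terms.
fn count_matches(index: &Index, terms: &[String]) -> usize {
    index
        .docs
        .iter()
        .filter(|d| terms.iter().any(|t| d.contains(t.as_str())))
        .count()
}

/// Fragile: takes the lock first, then panics on a bad query. The guard is
/// dropped while unwinding, which marks the mutex as poisoned.
fn search_fragile(index: &Mutex<Index>, raw: &str) -> usize {
    let guard = index.lock().unwrap();
    let terms = parse_query(raw).unwrap(); // panics while the guard is held
    count_matches(&guard, &terms)
}

/// Safer: parse before locking and surface failures as errors.
fn search_safe(index: &Mutex<Index>, raw: &str) -> Result<usize, String> {
    let terms = parse_query(raw)?;
    let guard = index.lock().map_err(|_| "index lock is poisoned".to_string())?;
    Ok(count_matches(&guard, &terms))
}

fn main() {
    let index = Arc::new(Mutex::new(Index {
        docs: vec!["grep is a tool".to_string(), "tantivy is a library".to_string()],
    }));

    // A malformed query is just an error; the lock stays healthy.
    assert!(search_safe(&index, "\"grep").is_err());
    assert_eq!(search_safe(&index, "grep"), Ok(1));
    assert!(!index.is_poisoned());

    // The fragile path panics while holding the lock. It runs in a thread so the
    // panic doesn't end main (the panic message is still printed to stderr).
    let shared = Arc::clone(&index);
    let _ = thread::spawn(move || search_fragile(&shared, "\"grep")).join();
    assert!(index.is_poisoned());

    // Now even a valid query fails until the poisoning is dealt with.
    assert!(search_safe(&index, "grep").is_err());

    // The data itself is intact and can be recovered explicitly.
    let recovered = index.lock().unwrap_or_else(|poisoned| poisoned.into_inner());
    println!("docs still present: {}", recovered.docs.len());
}
```

Parsing outside the critical section also keeps the lock held for less time. Whether to recover a poisoned lock with `into_inner` or treat it as fatal depends on whether the protected state can be left half-updated by the panicking code.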

OtomotO · on May 27, 2024

Thanks for that! A couple of days ago I used meilisearch for a quick proof of concept, but I'll check out tantivy again via your repo.

I basically just need a fulltext search.

worble · on May 28, 2024

If you just need full-text search and you're already using Postgres, you can get quite far with its own primitives:

https://www.postgresql.org/docs/current/textsearch.html

https://www.crunchydata.com/blog/postgres-full-text-search-a...

burntsushi · on May 28, 2024

AFAIK, PostgreSQL doesn't provide a way to get the IDF of a term, which makes its ranking function pretty limited. tf-idf (and its variants, like Okapi BM25) are kinda table stakes for an information retrieval system IMO.
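For reference, this is the usual form of Okapi BM25. The IDF factor is the corpus-wide statistic being discussed: it depends on how many documents contain the term, not just on the document being scored. Here N is the number of documents, n(q) the number containing term q, f(q, D) the occurrences of q in document D, |D| the length of D, avgdl the average document length, and k1 (commonly about 1.2) and b (commonly 0.75) are tuning constants. The "1 +" inside the logarithm is the non-negative IDF variant used by Lucene; the classic formulation omits it.

```latex
\mathrm{IDF}(q) = \ln\!\left(1 + \frac{N - n(q) + 0.5}{n(q) + 0.5}\right)

\mathrm{score}(D, Q) = \sum_{q \in Q} \mathrm{IDF}(q)\,
  \frac{f(q, D)\,(k_1 + 1)}{f(q, D) + k_1\left(1 - b + b\,\frac{|D|}{\mathrm{avgdl}}\right)}
```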

I'm not saying PostgreSQL's functionality is useless, but if you need ranking based on the relative frequency of a term in a corpus, then I don't believe PostgreSQL can handle that unless something has changed in the last few years. Usually the reason to use something like Lucene or Tantivy is precisely for its ranking support that incorporates inverse document frequency.
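To make the IDF point concrete, here is a small std-only Rust sketch of BM25 ranking over an in-memory corpus. It is not how Tantivy or Lucene implement scoring (they keep document frequencies in the index rather than recomputing them); `tokenize`, `Corpus` and `rank` are hypothetical names, the tokenizer is deliberately naive, and k1 = 1.2, b = 0.75 are the commonly used defaults, assumed here.

```rust
use std::collections::{HashMap, HashSet};

// Common BM25 defaults (assumed, not taken from any particular engine's config).
const K1: f64 = 1.2; // term-frequency saturation
const B: f64 = 0.75; // strength of document-length normalization

/// Lowercase and split on non-alphanumeric characters. Real engines do much more
/// (stemming, stop words, Unicode segmentation); this is only for illustration.
fn tokenize(text: &str) -> Vec<String> {
    text.split(|c: char| !c.is_alphanumeric())
        .filter(|w| !w.is_empty())
        .map(|w| w.to_lowercase())
        .collect()
}

struct Corpus {
    docs: Vec<Vec<String>>,
    /// Document frequency: how many documents contain each term at least once.
    df: HashMap<String, usize>,
    avg_len: f64,
}

impl Corpus {
    fn new(texts: &[&str]) -> Self {
        let docs: Vec<Vec<String>> = texts.iter().map(|t| tokenize(t)).collect();
        let mut df = HashMap::new();
        for doc in &docs {
            // Count each term once per document, however often it repeats there.
            let unique: HashSet<&String> = doc.iter().collect();
            for term in unique {
                *df.entry(term.clone()).or_insert(0) += 1;
            }
        }
        let total: usize = docs.iter().map(|d| d.len()).sum();
        let avg_len = if docs.is_empty() { 0.0 } else { total as f64 / docs.len() as f64 };
        Corpus { docs, df, avg_len }
    }

    /// Corpus-level IDF: terms found in few documents get a large weight.
    /// Uses the non-negative "1 +" form; classic BM25 IDF can go negative.
    fn idf(&self, term: &str) -> f64 {
        let n = self.docs.len() as f64;
        let nq = self.df.get(term).copied().unwrap_or(0) as f64;
        (1.0 + (n - nq + 0.5) / (nq + 0.5)).ln()
    }

    fn score(&self, query_terms: &[String], doc_idx: usize) -> f64 {
        let doc = &self.docs[doc_idx];
        // Guard against an all-empty corpus (avg_len == 0).
        let avg = if self.avg_len > 0.0 { self.avg_len } else { 1.0 };
        let len_norm = 1.0 - B + B * doc.len() as f64 / avg;
        query_terms
            .iter()
            .map(|term| {
                let tf = doc.iter().filter(|t| *t == term).count() as f64;
                self.idf(term) * tf * (K1 + 1.0) / (tf + K1 * len_norm)
            })
            .sum()
    }

    /// Returns (document index, score) pairs, best match first.
    fn rank(&self, query: &str) -> Vec<(usize, f64)> {
        let terms = tokenize(query);
        let mut scored: Vec<(usize, f64)> =
            (0..self.docs.len()).map(|i| (i, self.score(&terms, i))).collect();
        scored.sort_by(|a, b| b.1.total_cmp(&a.1));
        scored
    }
}

fn main() {
    let corpus = Corpus::new(&[
        "grep searches text with regular expressions",
        "rust search library with ranking",
        "postgres full text search",
        "tantivy is a search library written in rust",
    ]);
    // "search" occurs in three of the four documents and "grep" in only one,
    // so a "grep" match outweighs a "search" match.
    println!("idf(search) = {:.3}", corpus.idf("search"));
    println!("idf(grep)   = {:.3}", corpus.idf("grep"));
    for (doc, score) in corpus.rank("search grep") {
        println!("doc {doc}: {score:.3}");
    }
}
```

The document that matches only the rare term ranks first, ahead of the three that match only the common one. That weighting needs the `df` table, a statistic over the whole corpus, which a function looking at one document and one query at a time has no way to compute.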

philippemnoel · on May 28, 2024

Postgres's FTS is actually quite solid! You can get very far with just the built-in tsvector. The ranking could be improved, though, which was one of the reasons for creating pg_search in the first place: https://github.com/paradedb/paradedb/tree/dev/pg_search (disclaimer: I work on pg_search @ ParadeDB)

burntsushi · on May 29, 2024

Okay, but I didn't say it wasn't solid. I just said its ranking wasn't great because it lacks IDFs. It seems like we must be in violent agreement, given that you work on something that must be adding IDFs to PostgreSQL FTS. :P