
> Finally, building HNSW indices in Postgres is still extremely slow (even with parallel index builds), so it is difficult to experiment with index hyperparameters at scale

For anyone coming across this without much experience here: when building these indexes in pgvector, it makes a massive difference to raise maintenance_work_mem above the default. You can do that either on a separate db like whakim mentioned, or only for specific maintenance periods, depending on your use case.

```
SHOW maintenance_work_mem;
SET maintenance_work_mem = X;
```
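To make that concrete, here is a rough sketch of a session-scoped HNSW build. The table `items`, its `embedding` column, the index name, the `4GB` value, and the worker count are all assumptions for illustration, not recommendations; `m` and `ef_construction` are written out at pgvector's documented defaults.

```sql
-- Hypothetical schema for illustration; use your own table and dimensions.
CREATE EXTENSION IF NOT EXISTS vector;
CREATE TABLE IF NOT EXISTS items (
    id        bigserial PRIMARY KEY,
    embedding vector(3)
);

-- Check the current value (the Postgres default is 64MB).
SHOW maintenance_work_mem;

-- Assumed values: size these to the memory and cores you can actually spare.
-- SET only affects the current session, so other connections are untouched.
SET maintenance_work_mem = '4GB';
SET max_parallel_maintenance_workers = 4;

-- Build the HNSW index with cosine distance.
CREATE INDEX IF NOT EXISTS items_embedding_hnsw_idx
    ON items USING hnsw (embedding vector_cosine_ops)
    WITH (m = 16, ef_construction = 64);

-- Return to the server defaults once the build is done.
RESET maintenance_work_mem;
RESET max_parallel_maintenance_workers;
```

Scoping the change to one session is what makes the "specific maintenance periods" approach workable: the higher limit only applies while the build is running.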

In one of our semantic search use cases, we control the ingestion of the searchable content (laws, basically) so we can control when and how we choose to index it. And then I've set up classic relational db indexing (in addition to vector indexing) for our quite predictable query patterns.

For us that means our actual semantic db query takes about 10ms. That's starting from tens of millions of entries, filtering down to ~50k relevant ones (by jurisdiction, in our case), and then performing vector similarity search with a topK/limit.
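A filter-then-search query of that shape could look roughly like the sketch below. Everything named here is hypothetical (the `laws` table, its columns, the index name, the 3-dimensional vectors, the `'DE'` value); it is not our actual schema, just the general pattern.

```sql
-- Hypothetical schema; the tiny vector(3) dimension is only for illustration.
CREATE EXTENSION IF NOT EXISTS vector;
CREATE TABLE IF NOT EXISTS laws (
    id           bigserial PRIMARY KEY,
    jurisdiction text NOT NULL,
    body         text NOT NULL,
    embedding    vector(3) NOT NULL
);

-- Classic relational index for the predictable filter.
CREATE INDEX IF NOT EXISTS laws_jurisdiction_idx ON laws (jurisdiction);

-- $1 = jurisdiction, $2 = query embedding, $3 = topK.
PREPARE search_laws (text, vector, int) AS
    SELECT id, body, embedding <=> $2 AS cosine_distance
    FROM laws
    WHERE jurisdiction = $1
    ORDER BY embedding <=> $2
    LIMIT $3;

EXECUTE search_laws('DE', '[0.1, 0.2, 0.3]', 10);
```

When the filter is selective, the planner may use the btree index and rank only the matching rows by exact distance rather than going through the HNSW index. `EXPLAIN ANALYZE` on your own data will show which plan you actually get.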

It's all built into our ORM, with no round-trip latency to Pinecone and no syncing issues.

EDIT: I imagine whakim has more experience than me and YMMV, just sharing a lesson learned. Even with higher maintenance_work_mem, index building is still super slow for HNSW.




Thanks for sharing! Yes, getting good performance out of pgvector with even a trivial amount of data requires a bit of understanding of how Postgres works.



