
In my experience, storing RAG chunks with a little bit of context helps retrieval a lot. You can then skip the whole "rerank" step and halve your cost and latency.
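For anyone wondering what "a little bit of context" could look like in practice, here is a minimal sketch. It assumes the context is just the document title and the nearest section heading, prepended to the chunk before it is embedded; the `Chunk` fields and the `with_context` helper are hypothetical names for illustration, not part of any particular library.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    doc_title: str  # hypothetical field: title of the source document
    section: str    # hypothetical field: nearest heading above the chunk
    text: str       # the raw chunk text


def with_context(chunk: Chunk) -> str:
    """Return the string to embed: a short context header plus the chunk text."""
    header = f"Document: {chunk.doc_title} | Section: {chunk.section}"
    return f"{header}\n{chunk.text}"


chunk = Chunk(
    doc_title="Billing FAQ",
    section="Refunds",
    text="Requests are processed within 5 business days.",
)
# This string, not the bare chunk text, is what would be passed to the embedding model.
to_embed = with_context(chunk)
```

On its own, the chunk text says nothing about refunds or billing; the header is what would let a query like "how long do refunds take" land near it in embedding space.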

As embedding and generative models keep getting better, the need for a rerank step will be optimized away.




Huh? Reranking is always a boost on top of retrieval. Regardless of the chunking method or model you use, reranking with a good model will always result in a higher MRR. And improvements in embedding models will never solve the problem of merging lexical and vector search results. Rank fusion and score fusion are both flawed, since lexical and vector results are hardly comparable, and boosting only works sometimes, whereas rerankers generally do a pretty good job at this. Performance is indeed the biggest issue here: rerankers are slow as hell and simply not feasible for some use cases.
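To make the terms concrete, here is a rough sketch of one common rank fusion scheme, reciprocal rank fusion (RRF), together with an MRR calculation you could use to compare it against a reranker on your own data. The function names, the toy document IDs, and the constant `k=60` (a commonly used default) are assumptions for illustration, not anyone's production setup.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of doc IDs using only positions, not raw scores."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by doc ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


def mean_reciprocal_rank(results, relevant):
    """Average over queries of 1 / (rank of the first relevant doc), 0 if none is found."""
    total = 0.0
    for ranking, rel in zip(results, relevant):
        for rank, doc_id in enumerate(ranking, start=1):
            if doc_id in rel:
                total += 1.0 / rank
                break
    return total / len(results)


lexical = ["a", "b", "c"]  # toy lexical (e.g. BM25) ranking
vector = ["c", "a", "d"]   # toy vector search ranking
fused = reciprocal_rank_fusion([lexical, vector])
mrr = mean_reciprocal_rank([fused], [{"c"}])
```

RRF sidesteps the incomparable-scores problem by throwing the scores away, which is also its weakness: a doc that one retriever is very confident about gets no more credit than one it ranked first by a hair. Computing MRR for the fused list and for the reranked list on the same labeled queries is one way to check how much the reranker actually buys you.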



