In my experience, storing RAG chunks with a little surrounding context helps retrieval a lot; then you can skip the whole "rerank" step and roughly halve your cost and latency.
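To make that concrete, here's a minimal sketch of the kind of thing I mean (the title/section fields and helper name are just illustrative, not any particular framework's API):

```python
# Sketch: prepend lightweight document context (title + section) to each
# chunk before embedding, so the chunk is self-describing at retrieval time.

def contextualize(title: str, section: str, chunk: str) -> str:
    # Embed this contextualized text; store the raw chunk as the payload.
    return f"Document: {title}\nSection: {section}\n\n{chunk}"

chunk = "Revenue grew 3% over the previous quarter."
text_to_embed = contextualize("ACME Q2 2023 10-Q", "Management Discussion", chunk)
print(text_to_embed)
```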
As embedding and generative models get better over time, the need for a rerank step will be optimized away.
Huh? Reranking is always a boost on top of retrieval. Regardless of the chunking method or embedding model you use, reranking with a good model will yield higher MRR.
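To show what that boost looks like in practice, here's a minimal cross-encoder reranking sketch using sentence-transformers (the model name is just a common example, and the query/candidates are made up):

```python
from sentence_transformers import CrossEncoder

# A cross-encoder scores each (query, candidate) pair jointly, which is
# what gives the accuracy boost over bi-encoder retrieval alone.
reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

query = "how do I rotate API keys?"
candidates = [
    "Rotate keys from the dashboard under Settings > API.",
    "Our pricing tiers are described on the billing page.",
    "API keys can be regenerated via the /keys/rotate endpoint.",
]

scores = reranker.predict([(query, c) for c in candidates])
reranked = [c for _, c in sorted(zip(scores, candidates), reverse=True)]
print(reranked[0])  # the most relevant candidate comes out on top
```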
And improvements in embedding models will never solve the problem of merging lexical and vector search results either. Rank/score fusion is flawed because the two sets of scores are hardly comparable, and boosting only works sometimes. Rerankers, on the other hand, generally do a pretty good job at this.
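For anyone unfamiliar, "rank fusion" here usually means something like reciprocal rank fusion (RRF); a sketch is below. Note that it uses ranks only, precisely because the raw BM25 and cosine scores live on incomparable scales (k=60 is the conventional default from the original RRF paper):

```python
from collections import defaultdict

# Reciprocal rank fusion: merge lexical and vector result lists using
# ranks alone, since their raw scores are not directly comparable.
def rrf(result_lists: list[list[str]], k: int = 60) -> list[str]:
    fused = defaultdict(float)
    for results in result_lists:
        for rank, doc_id in enumerate(results, start=1):
            fused[doc_id] += 1.0 / (k + rank)
    return sorted(fused, key=fused.get, reverse=True)

bm25_hits = ["doc3", "doc1", "doc7"]
vector_hits = ["doc1", "doc9", "doc3"]
print(rrf([bm25_hits, vector_hits]))  # doc1 and doc3 surface first
```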
Performance is indeed the biggest issue here. Rerankers are slow as hell and simply not feasible for some use cases.