Thanks for sharing the article. How exactly are you combining BM25 and fastText? Are you combining the BM25 (TF-IDF-style) score with the embedding distance, and if so, what weight does each one get?
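Just so it's clear what I'm asking, here's a minimal sketch of the kind of combination I have in mind (purely my assumption, not the article's code; `minmax`, `hybrid_scores`, and `alpha` are names I made up): min-max normalize each score list, then take a weighted sum.

```python
import numpy as np

def minmax(x):
    """Scale scores to [0, 1] so BM25 scores and cosine similarities are comparable."""
    x = np.asarray(x, dtype=float)
    rng = x.max() - x.min()
    return (x - x.min()) / rng if rng > 0 else np.zeros_like(x)

def hybrid_scores(bm25_scores, embedding_sims, alpha=0.5):
    """alpha weights BM25; (1 - alpha) weights the embedding similarity."""
    return alpha * minmax(bm25_scores) + (1 - alpha) * minmax(embedding_sims)

# Hypothetical per-document scores for a single query.
bm25_scores = [12.3, 4.1, 0.0, 7.8]        # raw BM25 scores
embedding_sims = [0.62, 0.81, 0.12, 0.55]  # fastText cosine similarities
print(hybrid_scores(bm25_scores, embedding_sims, alpha=0.6))
```

Or is it a rank fusion rather than a score fusion? That's really what I'm curious about.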
Neat. I wonder how GPT-4's query expansion would compare with SPLADE or similar masked-BERT methods. Also, if you really want to go nuts, you can apply term expansion to the document corpus too.
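Something like the following is what I mean by expanding both sides (a toy sketch: the `expand()` stub stands in for whatever generator you'd use, GPT-4, a learned expansion model, etc., and rank_bm25 is just the library I happen to reach for):

```python
from rank_bm25 import BM25Okapi

def expand(text):
    # Stand-in for an LLM / learned expansion model; hard-coded purely for illustration.
    related = {"search": ["retrieval", "lookup"], "ranking": ["scoring", "ordering"]}
    return [t for w in text.lower().split() for t in related.get(w, [])]

docs = ["search engines rank documents", "ranking models score passages"]

# Document-side expansion: append generated terms before indexing.
expanded_docs = [d.lower().split() + expand(d) for d in docs]
bm25 = BM25Okapi(expanded_docs)

# Query-side expansion: append generated terms to the query tokens.
query = "search quality"
query_tokens = query.lower().split() + expand(query)
print(bm25.get_scores(query_tokens))
```

The nice thing about doing it on the document side is that the expensive generation happens offline at index time, while query-side expansion keeps the index small but adds latency per query.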
Very cool! Glad to see continued research in this direction. I’ve really enjoyed reading the Mixedbread blog. If you’re interested in retrieval topics, they’re doing some cool stuff.
Sounds a lot like BM25-weighted word embeddings (e.g. fastText).
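For anyone unfamiliar with the term, here's roughly what that usually looks like (my own toy sketch: the 3-d random vectors stand in for real fastText embeddings, and `bm25_weight` / `doc_vector` are names I made up):

```python
import math
import numpy as np

docs = [["search", "engines", "rank", "documents"],
        ["neural", "models", "rank", "passages"]]

# Toy 3-d vectors standing in for fastText word embeddings.
rng = np.random.default_rng(0)
emb = {w: rng.normal(size=3) for d in docs for w in d}

N = len(docs)
avgdl = sum(len(d) for d in docs) / N
df = {}
for d in docs:
    for w in set(d):
        df[w] = df.get(w, 0) + 1

def bm25_weight(term, doc, k1=1.5, b=0.75):
    """Standard BM25 term weight (idf * saturated tf) for a term in a document."""
    idf = math.log(1 + (N - df[term] + 0.5) / (df[term] + 0.5))
    tf = doc.count(term)
    return idf * tf * (k1 + 1) / (tf + k1 * (1 - b + b * len(doc) / avgdl))

def doc_vector(doc):
    """Document embedding = BM25-weighted average of its word vectors."""
    weights = np.array([bm25_weight(w, doc) for w in doc])
    vectors = np.stack([emb[w] for w in doc])
    total = weights.sum()
    return weights @ vectors / total if total > 0 else vectors.mean(axis=0)

print(doc_vector(docs[0]))
```

You then compare the query vector against these with cosine similarity, which is why it ends up feeling so close to a BM25 + fastText hybrid.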