Sorry, random question - do vector DBs work across models? I'd guess no, since embeddings are model-specific afaik, but that would mean a vector DB locks you into a single LLM, and even within that a single version, like Claude 3.5 Sonnet, so you couldn't move to 3.5 Haiku, Opus, etc., never mind ChatGPT or Llama, without reindexing.
Vector databases are there to store vectors and calculate the distance between vectors.
The embeddings model is the model that you pick to generate these vectors from a string or an image.
You give "bart simpson" to an embeddings model and it becomes (43, -23, 2, 3, 4, 843, 34, 230, 324, 234, ...)
You can imagine it like geometric points in space (well, it's a vector though), except that instead of being in 2D or 3D space, they typically live in a much higher number of dimensions (e.g. 768).
When you want to find similar entries, you just generate a new vector for "homer simpson" (64, -13, 2, 3, 4, 843, 34, 230, 324, 234, ...) and send it to the vector database, which returns the nearest neighbors (= the existing entries with the smallest distance to it).
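If it helps, here is a rough sketch of that lookup in plain Python. Everything in it is made up for illustration: the 4-dimensional vectors stand in for real embeddings, and `cosine_distance` and `nearest_neighbors` are hypothetical helper names, not any particular database's API. A real vector DB would use an approximate index instead of sorting everything.

```python
import math

def cosine_distance(a, b):
    """1 - cosine similarity; smaller means the vectors point in a more similar direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def nearest_neighbors(store, query_vector, k=2):
    """Brute-force search: rank every stored entry by distance to the query."""
    ranked = sorted(store.items(), key=lambda item: cosine_distance(item[1], query_vector))
    return [name for name, _ in ranked[:k]]

# Made-up vectors; a real embedding model would produce hundreds of dimensions.
store = {
    "bart simpson": [0.9, 0.8, 0.1, 0.0],
    "marge simpson": [0.8, 0.9, 0.2, 0.1],
    "tax form 1040": [0.0, 0.1, 0.9, 0.8],
}
query = [0.85, 0.85, 0.1, 0.05]  # pretend this is the embedding of "homer simpson"

print(nearest_neighbors(store, query, k=2))
```

With these toy numbers the two Simpsons entries come back and the tax form does not.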
To generate these vectors, you can use any model that you want, however, you have to stay consistent.
It means that once you have picked an embedding model, you are "forever" stuck with it for that index: there is no practical way to project from one vector space to another, so switching models means re-embedding everything.
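One way to make that consistency rule hard to break by accident is to record the embedding model next to the vectors and refuse anything produced by a different one. This is only a sketch of the idea: `TaggedVectorStore` and the model names are hypothetical, and I'm not claiming any real vector DB does it this way.

```python
class TaggedVectorStore:
    """Toy in-memory store that remembers which embedding model filled it."""

    def __init__(self, embedding_model):
        self.embedding_model = embedding_model
        self.vectors = {}

    def add(self, key, vector, embedding_model):
        # Vectors from another model live in a different space, so reject them.
        if embedding_model != self.embedding_model:
            raise ValueError(
                f"store was built with {self.embedding_model!r}, got {embedding_model!r}; "
                "re-embed the data instead of mixing models"
            )
        self.vectors[key] = vector

store = TaggedVectorStore("embed-model-v1")  # hypothetical model name
store.add("bart simpson", [0.1, 0.2], "embed-model-v1")

try:
    store.add("homer simpson", [0.3, 0.4], "embed-model-v2")
except ValueError as err:
    print("rejected:", err)
```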
that sucks :(. I wonder if there are other approaches to this, like simple word lookup, with storing a few synonyms, and prompting the LLM to always use the proper technical terms when performing a lookup.
A back-of-the-book index or an inverted index can be stored in a set store and give decent results that compare to vector lookups. The issue with them is that you have to run an extraction inference to get the keywords.
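A minimal sketch of the inverted-index idea with a small synonym table, roughly what the parent suggested. Big assumption: keywords here come from a naive lowercase whitespace split, which stands in for the extraction step mentioned above. `build_inverted_index`, `lookup`, and the sample docs are all made up for illustration.

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Map each keyword to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():  # naive stand-in for keyword extraction
            index[word].add(doc_id)
    return index

def lookup(index, query, synonyms=None):
    """Return ids of documents matching any query word or one of its synonyms."""
    synonyms = synonyms or {}
    hits = set()
    for word in query.lower().split():
        for term in [word] + synonyms.get(word, []):
            hits |= index.get(term, set())
    return hits

docs = {
    "d1": "resetting the router firmware",
    "d2": "billing and invoices",
}
synonyms = {"modem": ["router"]}  # hand-maintained synonym list

index = build_inverted_index(docs)
print(lookup(index, "modem", synonyms))
print(lookup(index, "invoices"))
```

Note that "reset" would not match "resetting" here; a real setup would need stemming or better extraction, which is exactly the cost mentioned above.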
The sibling comments seem to be correct in their technical explanations, but miss the meaning I'm getting from your question.
My understanding is you want to know "are vector DBs compatible with specific LLMs, or are we stuck with a specific LLM if we want to do RAG once we've adopted a specific vector store?"
And the answer to that is that the LLM never sees the vectors from your DB. Your LLM only sees what you submit as context (i.e. the "system" and "user" prompts in chat-based models).
The way RAG works is:
1 - end-user submits a query
2 - this query is embedded (with the same model that was used to compile the vector store) and compared (in the vector store) with existing content, to retrieve relevant chunks of data
3 - and then this data (in the form of text segments) is passed to the LLM along with the initial query.
So you're "locked in" in the sense that you need to use the same embedding model for storage and for retrieval. But you can definitely swap out the LLM for any other LLM without reindexing.
An easy way to try this behavior out as a layperson is AnythingLLM, an open-source desktop client that lets you embed your own documents and use RAG locally with open-weight models, or swap out any of the LLM APIs.
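The three steps can be sketched in a few lines to show where the LLM plugs in. Heavy assumptions: `toy_embed` is a bag-of-words counter standing in for a real embedding model, the "vector store" is just a list, and `llm_a` / `llm_b` are fake functions that echo the prompt. None of this is a real library's API.

```python
import math
from collections import Counter

def toy_embed(text):
    """Stand-in for a real embedding model: a bag-of-words count vector."""
    return Counter(text.lower().split())

def cosine_similarity(a, b):
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(index, query, k=1):
    """Step 2: embed the query with the SAME model as the index, return the top chunks."""
    q = toy_embed(query)
    ranked = sorted(index, key=lambda entry: cosine_similarity(entry[1], q), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]

def answer(query, index, llm):
    """Step 3: the LLM only ever receives text, never vectors."""
    context = "\n".join(retrieve(index, query))
    prompt = "Context:\n" + context + "\n\nQuestion: " + query
    return llm(prompt)

# Build the index once, with one embedding model.
chunks = ["homer works at the nuclear plant", "lisa plays the saxophone"]
index = [(chunk, toy_embed(chunk)) for chunk in chunks]

# Two fake "LLMs"; swapping them requires no reindexing.
def llm_a(prompt):
    return "[llm_a] " + prompt

def llm_b(prompt):
    return "[llm_b] " + prompt

print(answer("where does homer work", index, llm_a))
print(answer("where does homer work", index, llm_b))
```

The index is built once and both fake LLMs reuse it; only changing `toy_embed` would force a rebuild.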
An embedding is a transformation that lets us find semantically relevant chunks in a catalogue given a query. Using some nearness criterion, you retrieve the "semantically relevant" chunks, which are then fed to the LLM along with the query, and the LLM is asked to synthesize the best answer. The Vespa docs are very good if you are thinking of building in this space. The retrieval part is independent of synthesis, which is why it has its own separate leaderboard on Hugging Face.