I want to get the memory part of LangChain down: a vector store + a local database + a client to chat with an LLM (a GPT4All model that can be swapped for the OpenAI API just by switching the base URL).
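For example, a minimal sketch with the openai Python client; the local URL below is GPT4All's OpenAI-compatible API server on its default port (an assumption, adjust to your setup), and the model names are placeholders:

```python
from openai import OpenAI

# Local GPT4All, via its OpenAI-compatible API server:
client = OpenAI(base_url="http://localhost:4891/v1", api_key="not-needed")
# Hosted OpenAI instead: just drop base_url and use a real key
# client = OpenAI(api_key="sk-...")

resp = client.chat.completions.create(
    model="mistral-7b-instruct-v0.1.Q4_0.gguf",  # local model file; e.g. "gpt-4" on OpenAI
    messages=[{"role": "user", "content": "Hello!"}],
)
print(resp.choices[0].message.content)
```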
Sorry for my ignorance, but "memory" refers to the process of using embeddings for QA, right?
The process is roughly as follows (a minimal code sketch follows the prompt template below):
Ingestion:
- Compute embeddings for your documents (turning text into arrays of numbers)
- Store your documents in a Vector DB
Query time:
- Compute an embedding for the query
- Find the documents most similar to the query by comparing the query embedding against the document embeddings in the vector DB
- Construct prompt with format:
"""
Answer question using this context:
{DOCUMENTS RETRIEVED}
Question: {question}
Answer:
"""
Is that correct? Now, my question is: can the models be swapped easily, or does that require recomputing the embeddings (and a new ingestion)?
The embeddings can come from a different model than the one you pass them to as context, so you could upgrade the summariser model without touching the embeddings. (Swapping the embedding model itself is what forces a recalculation and re-ingestion, since vectors from different embedding models aren't comparable.)
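To make that concrete, a small sketch of the decoupling (model name is a placeholder; the retrieved context could come from the numpy sketch above):

```python
from openai import OpenAI

client = OpenAI()  # the generator; swap it freely, the stored embeddings are untouched

def answer(question: str, context: str) -> str:
    # context was retrieved via the embedding model; any chat model can consume it
    prompt = f"Answer question using this context:\n{context}\nQuestion: {question}\nAnswer:"
    resp = client.chat.completions.create(
        model="gpt-4",  # upgrading this requires no re-embedding or re-ingestion
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content
```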
> I want to get the memory part of LangChain down: a vector store + a local database + a client to chat with an LLM (a GPT4All model that can be swapped for the OpenAI API just by switching the base URL).
https://github.com/aldarisbm/memory
It's still got a ways to go; if someone wants to help, let me know :)