Yep, that's the way it's currently implemented in langchain.
The 4 is a hyperparameter you can change, though, so you could set it to 10 as well.
The way it works is that it first looks up the N most relevant documents (N being 4 in the default case) in the FAISS store, using the distance between embedding vectors for the lookup.
Then it uses GPT-3 to summarize each of those N entries with respect to the question, and finally all the summaries together with the question are combined into the final answer.
That way you can trace which source the answer came from and point to its URL at the end.
When you make N larger, it just gets more expensive in terms of your API costs.
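If you want to bump N up, it's basically just the retriever's k parameter. A rough sketch of what that looks like, assuming an OpenAI embeddings + FAISS + RetrievalQAWithSourcesChain setup (the texts/urls here are placeholders, not the actual project's code):

    from langchain.llms import OpenAI
    from langchain.embeddings import OpenAIEmbeddings
    from langchain.vectorstores import FAISS
    from langchain.chains import RetrievalQAWithSourcesChain

    # Placeholder corpus; in practice these come from the crawled pages.
    texts = ["First page contents...", "Second page contents..."]
    urls = ["https://example.com/a", "https://example.com/b"]

    store = FAISS.from_texts(
        texts,
        OpenAIEmbeddings(),
        metadatas=[{"source": u} for u in urls],  # lets the chain report sources
    )

    # N lives here: bump the retriever's k from the default 4 up to 10.
    retriever = store.as_retriever(search_kwargs={"k": 10})

    chain = RetrievalQAWithSourcesChain.from_chain_type(
        llm=OpenAI(temperature=0),
        chain_type="map_reduce",  # summarize each hit, then combine into one answer
        retriever=retriever,
    )

    result = chain({"question": "What does the site say about pricing?"})
    print(result["answer"])
    print(result["sources"])  # the URL(s) the answer was drawn from

The "map_reduce" chain type is what does the summarize-each-hit-then-combine step described above; "stuff" would instead paste all N documents straight into one prompt.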
Looks interesting! Have you considered a proper vector database like Qdrant (https://qdrant.tech)? FAISS runs on a single machine, but if you want to scale things up, then a real database makes it a lot easier. And with a free 1GB cluster on Qdrant Cloud (https://cloud.qdrant.io), you can store quite a lot of vectors. Qdrant is also already integrated with Langchain.
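For anyone curious, swapping the store is pretty much a drop-in change on the LangChain side. A minimal sketch, with the cluster URL, API key, and texts as placeholders for your own:

    from langchain.embeddings import OpenAIEmbeddings
    from langchain.vectorstores import Qdrant

    texts = ["First page contents...", "Second page contents..."]
    metadatas = [{"source": "https://example.com/a"}, {"source": "https://example.com/b"}]

    # Same embeddings as before, but stored in a Qdrant Cloud collection
    # instead of a local FAISS index.
    store = Qdrant.from_texts(
        texts,
        OpenAIEmbeddings(),
        metadatas=metadatas,
        url="https://YOUR-CLUSTER.cloud.qdrant.io",
        api_key="YOUR_QDRANT_API_KEY",
        collection_name="site_docs",
    )

    docs = store.similarity_search("What does the site say about pricing?", k=4)

Everything downstream (the retriever and the QA-with-sources chain) stays the same.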
Using something like Weaviate, which can be started in Docker with a one-liner, gives you the ability to move toward or away from dense vectors by concept. While doing the dot products with manual code is fairly easy, letting Weaviate do the lifting (for the embeddings as well) makes things super simple.
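The "move toward / away by concept" part maps onto Weaviate's nearText query. A sketch with the v3 Python client, assuming a local Docker instance and a hypothetical "Doc" class configured with the text2vec-openai vectorizer so Weaviate generates the embeddings itself:

    import weaviate

    # Local instance started via the Docker one-liner.
    client = weaviate.Client("http://localhost:8080")

    result = (
        client.query
        .get("Doc", ["text", "source"])
        .with_near_text({
            "concepts": ["vector search"],
            # nudge the results toward or away from related concepts
            "moveTo": {"concepts": ["open source"], "force": 0.5},
            "moveAwayFrom": {"concepts": ["pricing"], "force": 0.3},
        })
        .with_limit(4)
        .do()
    )

    print(result["data"]["Get"]["Doc"])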