RAG really is not the right tool for that, and it does not even prevent hallucinations. It is useful because it can retrieve information from documents the model never saw during training, without requiring costly re-training. It is not fine-tuning either, and it cannot fundamentally change the model’s behaviour.
In my experience, it really does reduce the risk of hallucination, especially if paired with a prompt that explicitly instructs the model to use only facts from the context window.
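Something along these lines (the exact wording is made up, just to illustrate the kind of instruction I mean):

```python
# Illustration only: the wording here is an example, not a recipe,
# and is worth tuning for your own documents and model.
SYSTEM_PROMPT = (
    "Answer using ONLY the facts provided in the context below. "
    "If the context does not contain enough information to answer, "
    "say so instead of guessing. Do not use outside knowledge."
)
```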
Another strategy is to provide unique identifiers for the RAG chunks dropped into the context and ask the model to cross-reference them in its response. You can then check that the response is citing evidence from the context with a simple pattern match.
Not OP, but for the "unique identifiers", you can think of it like the footnote style of markdown links. Most of these models are fine-tuned to handle markdown well, and a short identifier is less likely to be hallucinated (my philosophy anyway), so it usually works pretty well. As an example, something like this can work:
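(The ids, chunk text, and example answer below are all made up, just to show the shape of it.)

```python
import re

# Rough sketch: label each retrieved chunk with a short id, ask the model
# to cite ids footnote-style, then verify the citations against what was
# actually in the context.
chunks = {
    "S1": "Acme's Q3 revenue was $12M, up 8% year over year.",
    "S2": "The Q3 report was published on 2023-10-26.",
}

context = "\n".join(f"[^{cid}]: {text}" for cid, text in chunks.items())
prompt = (
    "Answer using only the sources below. After each claim, cite the "
    "source id in footnote style, e.g. [^S1].\n\n"
    f"{context}\n\nQuestion: What happened to revenue in Q3?"
)

# Suppose the model comes back with something like this:
answer = (
    "Revenue grew 8% year over year to $12M [^S1], according to the "
    "report published on 2023-10-26 [^S2]."
)

# The simple pattern match: every cited id must be one we actually provided.
cited = set(re.findall(r"\[\^([A-Za-z0-9_-]+)\]", answer))
unknown = cited - chunks.keys()
print("cited:", cited, "unknown:", unknown)  # unknown should be empty
```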
Instead of having the LLM generate the links, couldn’t you use a combination of keyword matching and similarity between the model output and the retrieved results to automatically add citations? You could use a smaller NLP model or even a rule-based system to extract entities or phrases to compare. I’m sure this is already being done by Bing, for example.
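Roughly what I have in mind, as a naive sketch (crude keyword overlap standing in for a proper NER model or embedding similarity, and all names here are made up):

```python
import re

# Naive sketch: attach citations after generation by matching each output
# sentence against the retrieved passages on keyword overlap.
def keywords(text):
    # crude "entity/phrase" extraction: lowercased words of 4+ letters
    return set(re.findall(r"[a-z]{4,}", text.lower()))

def add_citations(answer, passages, threshold=0.3):
    out = []
    for sentence in re.split(r"(?<=[.!?])\s+", answer):
        kw = keywords(sentence)
        best_id, best_score = None, 0.0
        for pid, passage in passages.items():
            overlap = len(kw & keywords(passage)) / max(len(kw), 1)
            if overlap > best_score:
                best_id, best_score = pid, overlap
        out.append(f"{sentence} [{best_id}]" if best_score >= threshold else sentence)
    return " ".join(out)

passages = {
    "doc1": "Acme reported Q3 revenue of $12M, up 8% year over year.",
    "doc2": "Acme's CEO resigned in March.",
}
print(add_citations("Acme's revenue grew 8% to $12M in Q3. The weather was nice.", passages))
```

A real system would obviously want better phrase extraction and a tuned threshold, but the shape is the same.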
You definitely can do that. It’s just sometimes simpler to dump lots of stuff in context and then check it wasn’t made up.
I like the idea of using markdown footnotes. I think that would work well - ChatGPT does handle markdown really well.