"Mayo’s LLM split the summaries it generated into individual facts, then matched those back to source documents. A second LLM then scored how well the facts aligned with those sources, specifically if there was a causal relationship between the two."
It doesn't sound novel based on the article. I built something similar over a year ago. Here's a related example from LangChain, "How to get a RAG application to add citations": https://python.langchain.com/docs/how_to/qa_citations/
I don't think you're getting it, it's not traditional RAG citations.
They are checking the _generated_ text by trying to find documents containing the facts, then rating how relevant (causally related) those facts are. This is different from looking up documents to generate an answer for a prompt; it's the reverse. Once the answer has been generated, they essentially fact-check it.
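Roughly, in code (my own sketch, not Mayo's actual pipeline; the model name and prompts are placeholders):

    # "Reverse RAG" sketch: the answer is already generated; now verify it
    # against the source corpus. Everything here is illustrative.
    from openai import OpenAI

    client = OpenAI()

    def ask(prompt: str) -> str:
        resp = client.chat.completions.create(
            model="gpt-4o-mini",  # placeholder model
            messages=[{"role": "user", "content": prompt}],
            temperature=0,
        )
        return resp.choices[0].message.content

    def extract_facts(answer: str) -> list[str]:
        # Step 1: split the generated answer into individual, checkable facts.
        out = ask("Split the following text into individual factual claims, "
                  "one per line, no numbering:\n\n" + answer)
        return [line.strip() for line in out.splitlines() if line.strip()]

    def find_candidate_sources(fact: str, sources: list[str]) -> list[str]:
        # Step 2: match each fact back to documents that might contain it.
        # (Naive keyword overlap here; a real system would reuse its retriever.)
        terms = {w.lower() for w in fact.split() if len(w) > 4}
        return [s for s in sources if terms & {w.lower() for w in s.split()}]

    def grade_support(fact: str, passages: list[str]) -> str:
        # Step 3: a second LLM rates how well the fact is grounded in the sources.
        return ask("Fact: " + fact + "\n\nSource passages:\n" + "\n---\n".join(passages)
                   + "\n\nReply SUPPORTED, PARTIAL, or UNSUPPORTED with one sentence of reasoning.")

    def fact_check(answer: str, sources: list[str]) -> list[tuple[str, str]]:
        results = []
        for fact in extract_facts(answer):
            passages = find_candidate_sources(fact, sources) or ["(no matching source found)"]
            results.append((fact, grade_support(fact, passages)))
        return results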
In the publishing industry we call that "cooking a press release": the "news" article was entirely written and mailed in by the PR department of the subject (the Mayo Clinic here), and the "journalist" just copies and pastes it. At most they will reword a couple of paragraphs, not for fear of looking bad, but just to make it fit the word count required for the column they are publishing under.
> where the model extracts relevant information, then links every data point back to its original source content.
I use ChatGPT. When I ask it something 'real/actual' (non-dev), I ask it to give me references in the same prompt. So when I ask it to tell me about "the battle of XYZ", I also ask for websites/sources, which I then click and check whether the quote is actually from there (a quick Ctrl+F will bring up the name/date/etc.).
Since I've done this I get near-zero hallucinations. They did the same.
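If you want to do that check programmatically rather than by hand, it's only a few lines; rough sketch (you still have to parse the URL/quote pairs out of the model's answer yourself, and the example pair at the bottom is just an illustration):

    # A scripted version of "open the cited page and Ctrl+F the quote".
    import re
    import urllib.request

    def page_text(url: str) -> str:
        html = urllib.request.urlopen(url, timeout=10).read().decode("utf-8", "ignore")
        return re.sub(r"<[^>]+>", " ", html)  # crude tag stripping, fine for a spot check

    def verify_citations(pairs: list[tuple[str, str]]) -> None:
        for url, quote in pairs:
            try:
                found = quote.lower() in page_text(url).lower()
            except OSError:
                found = False  # unreachable page counts as a failed check
            print(("FOUND  " if found else "MISSING") + f" {quote!r} @ {url}")

    verify_citations([
        ("https://en.wikipedia.org/wiki/Battle_of_Hastings", "14 October 1066"),
    ])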
I was waiting so long for this to finally arrive in Firefox (and now I can't seem to unsubscribe from Bugzilla for some reason -- I guess "because Bugzilla"). However, in true FF fashion, I'm sure it'll be another 10 years before the "Copy link to selection" arrives like its Chrome friend, so I have an extension to tide me over :-/
TBH, I also previously just popped open dev-tools and pasted the copied text into console.log("#:~:text="+encodeURIComponent(...)), which was annoying, for sure, but I didn't do it often enough to enrage me. I believe there's a DOM method to retrieve the selected text which would have made that much, much easier, but I didn't bother looking it up.
I have an application that does this. When the AI response comes back, there's code that checks the citation pointers to ensure they were part of the request and flags the response as problematic if any of the citation pointers are invalid.
The idea is that, hopefully, requests that end up with invalid citations have something in common and we can make changes to minimize them.
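A hypothetical version of that check, assuming the request carries numbered source snippets and the model is told to cite them as [n]:

    import re

    def validate_citations(response_text: str, provided_ids: set[int]) -> dict:
        # Pull out every "[n]"-style citation pointer the model emitted.
        cited = {int(m) for m in re.findall(r"\[(\d+)\]", response_text)}
        invalid = cited - provided_ids       # pointers to snippets we never sent
        return {
            "cited": sorted(cited),
            "invalid": sorted(invalid),
            "problematic": bool(invalid),    # flag for review/logging
        }

    # We sent snippets [1], [2], [3]; the model cited [2] and a non-existent [7].
    print(validate_citations("Stated in the discharge note [2] and the lab report [7].", {1, 2, 3}))
    # -> {'cited': [2, 7], 'invalid': [7], 'problematic': True}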
This sounds like a good technique that can be fully automated. I wonder why this isn't the default behavior or at least something you could easily request.
There was an article about Sam Altman that stated that ex/other OAI employees called him some bad_names and that he was a psychopath...
So I had GPT take on the role of an NSA cybersecurity and crypto profiler, read the thread and the article, and put together a profile dossier of Altman, citing its sources...
And it posted a great list of the deep-psychology and other books it used to make its claims.
Those claims, basically, were that Altman is a deep opportunist and shows certain psychopathological tendencies.
Frankly, the conclusion wasn't as interesting as how it cited the expert sources and the books it used in the analysis.
However, after this, OpenAI's newer models were less capable of producing this type of report, which was interesting.
It sounds like they go further by doing output fact extraction & matching back to the RAG snippets. Presumably this is in addition to matching back the citations. I've seen papers describe doing that with knowledge graphs, but at least for our workloads, it's easy to verify directly.
As a team that has done similar things for louie.ai (think real-time reporting, alerting, chat, and BI on news, social media, threat intel, operational databases, etc.), I find this interesting less for breaking new ground than for confirming the quality benefit as these techniques get used more broadly in serious contexts. Likewise, hospitals are quite political internally about this stuff, so seeing which use cases got the green light to go all the way through is also interesting.
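For the "verify directly" part, one lightweight option (not claiming this is what Mayo or anyone else ships) is to fuzzy-match each extracted fact against the snippets that were actually in the prompt, no second LLM call needed:

    # Fuzzy-match each extracted fact against the RAG snippets from the prompt.
    # Window size and threshold are arbitrary examples, not tuned values.
    from difflib import SequenceMatcher

    def best_snippet_score(fact: str, snippets: list[str], window: int = 200) -> float:
        best = 0.0
        for snip in snippets:
            # Slide a window so a long snippet doesn't dilute a short matching span.
            step = max(1, window // 2)
            for i in range(0, max(1, len(snip) - window + 1), step):
                ratio = SequenceMatcher(None, fact.lower(), snip[i:i + window].lower()).ratio()
                best = max(best, ratio)
        return best

    def verify_facts(facts: list[str], snippets: list[str], threshold: float = 0.55):
        results = []
        for fact in facts:
            score = best_snippet_score(fact, snippets)
            results.append((fact, round(score, 2), score >= threshold))
        return results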
It doesn’t solve the biggest problem with RAG, which is retrieving the correct sources in the first place.
It sounds like they just use a secondary LLM to check whether everything that was generated can be grounded in the provided sources. It might help with hallucinations, but it won't improve the underlying retrieval performance.
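If all you want is that grounding check, it can be a single extra pass over the already-retrieved sources; bare-bones sketch (placeholder model and prompt, not anyone's production setup):

    # One secondary-LLM pass that checks whether the generated answer is
    # grounded in the sources that were in the prompt.
    from openai import OpenAI

    client = OpenAI()

    def grounding_check(answer: str, sources: list[str]) -> str:
        prompt = ("You are a verifier. For each sentence of the ANSWER, say whether it is "
                  "fully supported by the SOURCES, and list any sentence that is not.\n\n"
                  "SOURCES:\n" + "\n---\n".join(sources) + "\n\nANSWER:\n" + answer)
        resp = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[{"role": "user", "content": prompt}],
            temperature=0,
        )
        return resp.choices[0].message.content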
This is just standard practice AFAICT. I've done it. Everybody I know who's built apps for unstructured document retrieval etc. is doing it. It works better than the naïve approach, but there are plenty of issues and plenty of tuning with this approach too.
What they're describing as "reverse RAG" sounds a lot to me like "RAG with citations", which is a common technique. Am I misunderstanding?