
Can someone more versed in the field comment on whether this is just an ad or actually something unique or novel?

What they're describing as "reverse RAG" sounds a lot to me like "RAG with citations", which is a common technique. Am I misunderstanding?




"Mayo’s LLM split the summaries it generated into individual facts, then matched those back to source documents. A second LLM then scored how well the facts aligned with those sources, specifically if there was a causal relationship between the two."

It doesn't sound novel from the article. I built something similar over a year ago. Here's a related example from Langchain "How to get a RAG application to add citations" https://python.langchain.com/docs/how_to/qa_citations/
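For comparison, the conventional "RAG with citations" pattern is roughly the following. This is a minimal sketch, not Langchain's actual API; retrieve and llm are placeholder stubs standing in for your vector store and model:

    # Conventional "RAG with citations": retrieve first, then ask the model
    # to answer while citing the retrieved snippets by index.
    # `retrieve` and `llm` are placeholder stubs, not a real library API.

    def retrieve(query: str, k: int = 4) -> list[str]:
        raise NotImplementedError("wire up your vector store here")

    def llm(prompt: str) -> str:
        raise NotImplementedError("wire up your model API here")

    def answer_with_citations(question: str) -> str:
        snippets = retrieve(question)
        context = "\n".join(f"[{i}] {s}" for i, s in enumerate(snippets))
        return llm(
            "Answer using only the sources below, citing each claim "
            f"with its source index, e.g. [0].\n\nSources:\n{context}\n\n"
            f"Question: {question}"
        )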


I don't think you're getting it: it's not traditional RAG citations.

They are checking the _generated_ text: splitting it into facts, trying to find documents containing those facts, then rating how relevant (causally related) they are. This is different from looking up documents to generate an answer for a prompt; it's the reverse. Once the answer has been generated, they essentially fact-check it.
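In pseudocode, that reverse pass might look something like this. A sketch under my reading of the article, with placeholder retrieve/llm stubs and an invented 0-to-1 scoring prompt:

    # Reverse pass: split the *generated* answer into atomic facts, search
    # the corpus for each fact, then have a second LLM score whether the
    # retrieved sources actually support (are causally related to) the fact.
    # `retrieve` and `llm` are placeholder stubs for your RAG components.

    def retrieve(text: str) -> list[str]: ...
    def llm(prompt: str) -> str: ...

    def verify_answer(answer: str, threshold: float = 0.7) -> list[dict]:
        facts = llm("Split this text into one atomic factual claim per "
                    "line:\n" + answer).splitlines()
        report = []
        for fact in (f.strip() for f in facts if f.strip()):
            sources = retrieve(fact)  # look up documents for the fact itself
            score = float(llm(
                "From 0 to 1, how well do these sources support the claim? "
                f"Reply with only a number.\n\nClaim: {fact}\n\nSources:\n"
                + "\n".join(sources)))
            report.append({"fact": fact, "score": score,
                           "grounded": score >= threshold})
        return report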


A consultant sold them something with a high margin, they need to justify the bill.


What I imagine:

1. Use LLM, possibly already grounded by typical RAG results, to generate initial answer.

2. Extract factual claims / statements from (1). E.g. using some LLM.

3. Verify each fact from (2). E.g. using a separate RAG system where the prompt focuses on this single fact.

4. Rerun the system with the results from (3) and possibly (1).

If so, this (and variations on it, sketched below) isn't really anything new; these kinds of workflows have been in use for years (time flies!).
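If that's the shape of it, the whole thing is a feedback loop. A sketch under those assumptions (all helpers are placeholder stubs; the retry policy is invented for illustration):

    # Steps 1-4 as a loop: generate, extract facts, verify each fact with a
    # focused RAG query, then regenerate with the failures folded back into
    # the prompt. All helpers are placeholder stubs for LLM/RAG calls.

    def generate(question: str, feedback: str = "") -> str: ...
    def extract_facts(answer: str) -> list[str]: ...
    def verify_fact(fact: str) -> bool: ...  # single-fact RAG check (step 3)

    def grounded_answer(question: str, max_rounds: int = 3) -> str:
        answer, feedback = "", ""
        for _ in range(max_rounds):
            answer = generate(question, feedback)             # step 1
            unsupported = [f for f in extract_facts(answer)   # step 2
                           if not verify_fact(f)]             # step 3
            if not unsupported:
                break
            feedback = ("These claims were unsupported; revise or drop "
                        "them:\n" + "\n".join(unsupported))   # step 4
        return answer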


The article is too high level to figure out exactly what they are doing.


In the publishing industry we call that "cooking a press release": the "news" article was entirely written and mailed out by the subject's PR department (Mayo Clinic here), and the "journalist" just copies and pastes. At most they'll reword a couple of paragraphs, not for fear of looking bad, but just to make it fit the word count required for the column they're publishing under.

So, yes, an advertisement.


Isn't that essentially how the AP has functioned for over a century? (Consume press release, produce news article, often nearly verbatim.)


You’re thinking of PR Newswire.

The AP pays reporters to go out and report.


I read a lot of AP articles that aren't verbatim press releases... you must be reading the classifieds or something.


> where the model extracts relevant information, then links every data point back to its original source content.

I use ChatGPT. When I ask it something 'real/actual' (non-dev), I ask it to give me references in every prompt. So when I ask it to tell me about "the battle of XYZ", I ask within the same prompt for websites/sources, which I then click and check to see whether the quote is actually from there (a quick Ctrl+F will bring up the name/date/etc.)

Since I've done this, I get near-zero hallucinations. They did the same.
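That manual Ctrl+F step is scriptable, for what it's worth. A deliberately naive sketch using requests (no HTML stripping, no fuzzy matching):

    # Check whether a quoted snippet actually appears on the cited page.
    # Naive on purpose: raw page text, case-insensitive substring match.
    import requests

    def quote_appears(url: str, quote: str) -> bool:
        page = requests.get(url, timeout=10).text
        return quote.casefold() in page.casefold()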


> (a quick Ctrl+F will bring up the name/date/etc.)

Have you tried asking for the citation links to also include a text fragment to save you the searching? (e.g. https://news.ycombinator.com/item?id=43372171#:~:text=a%20qu... )
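The link format is just the page URL plus a #:~:text= directive with the quoted text percent-encoded, so building one is a two-liner, e.g.:

    # Build a text-fragment link: page URL + "#:~:text=" + encoded snippet.
    from urllib.parse import quote

    def fragment_link(url: str, snippet: str) -> str:
        return f"{url}#:~:text={quote(snippet)}"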


I feel that this is under{rated,used}.


I was waiting so long for this to finally arrive in Firefox (and now I can't seem to unsubscribe from Bugzilla for some reason -- I guess "because Bugzilla"). However, in true FF fashion, I'm sure it'll be another 10 years before the "Copy link to selection" arrives like its Chrome friend, so I have an extension to tide me over :-/


Do you know how to use this feature in, say, Vivaldi, if it is even possible? I want to select text and have it appended to the URL.


I don't use Vivaldi, so I don't know what its limitations are, but as I mentioned, I have to run with https://addons.mozilla.org/en-US/firefox/addon/link-to-text-... (source: https://github.com/GoogleChromeLabs/link-to-text-fragment Apache 2) because FF itself doesn't offer that functionality natively.

TBH, I also previously just popped open dev-tools and pasted the copied text into console.log("#:~:text=" + encodeURIComponent(...)), which was annoying, for sure, but I didn't do it often enough to enrage me. There's a DOM method to retrieve the selected text (window.getSelection()) which would have made that much, much easier, but I didn't bother looking it up at the time.

For posterity, I also recently learned that Chrome just appends the :~: onto any existing anchor, e.g. <https://news.ycombinator.com/item?id=43336609#43375605:~:tex...>, but of course Firefox hurp-derps that style of link.


I installed "link-to-text-fragment", but: https://news.ycombinator.com/reply?id=43376342&goto=threads%... does not seem to work. :(

It only highlights once I've copied a link to the selected text; if I click on the link itself, it does not seem to work, though yours did.


I have an application that does this. When the AI response comes back, there's code that checks the citation pointers to ensure they were part of the request and flags the response as problematic if any of the citation pointers are invalid.

The idea is that, hopefully, requests that end up with invalid citations have something in common and we can make changes to minimize them.
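A sketch of that kind of check (hypothetical format: assumes citations come back as bracketed IDs like [doc-3], and that you kept the set of IDs you sent with the request):

    # Flag responses whose citation pointers weren't part of the request.
    # Assumes bracketed IDs like [doc-3]; adapt the regex to your format.
    import re

    def invalid_citations(response: str, provided_ids: set[str]) -> set[str]:
        cited = set(re.findall(r"\[(doc-\d+)\]", response))
        return cited - provided_ids

    # e.g. invalid_citations("Aspirin thins blood [doc-3][doc-9].",
    #                        {"doc-3"}) -> {"doc-9"}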


This sounds like a good technique that can be fully automated. I wonder why this isn't the default behavior or at least something you could easily request.


I do this as well.

There was an article about Sam Altman that stated that ex/other OAI employees called him some bad_names and that he was a psychopath...

So I had GPT take on the role of an NSA cybersecurity and crypto profiler, read the thread and the article, and put together a profile dossier of Altman, with cited sources...

And it posted a great list of the deep-psychology and other books it used to make its claims, which basically were that Altman is a deep opportunist and shows certain psychopathological tendencies.

Frankly, the statement itself wasn't as interesting as how it cited the expert sources and the books it used in the analysis.

However, after this, OAI's newer models were less capable of doing this type of report, which was interesting.


Well, the title said "secret" after all ...


Reverse RAG sounds like RAG with citations where you then also verify the citations (i.e., go in reverse).


It sounds like they go further by doing output fact extraction & matching back to the RAG snippets, presumably in addition to matching back the citations. I've seen papers describe doing that with knowledge graphs, but at least for our workloads, it's easy to verify directly.

As a team that has done similar things for louie.ai (think real-time reporting, alerting, chat, & BI on news, social media, threat intel, operational databases, etc.), I find this interesting less for breaking new ground than for confirming the quality benefit as these techniques see broader use in serious contexts. Likewise, hospitals are quite political internally about this stuff, so seeing which use cases got the green light to go all the way through is also interesting.


It doesn’t solve the biggest problem with RAG, which is retrieving the correct sources in the first place.

It sounds like they just use a secondary LLM to check whether everything that was generated can be grounded in the provided sources. It might help with hallucinations, but it won't improve retrieval itself.


Can't fool the patent examiners if you don't give it a name like that.


There's probably a patent for: "Just double-checking before answering the user".


I wish somebody would release an AI that did it.


Isn't that what the "Reasoning" or "Thinking" features are for? Sort of...


This is just standard practice AFAICT. I've done it. Everybody I know who's built apps for unstructured document retrieval etc. is doing it. It works better than the naïve approach, but there are plenty of issues, and plenty of tuning, with this approach too.


They leverage https://en.wikipedia.org/wiki/CURE_algorithm alongside many subsequent LLMs to do ranking and scoring.


It does sound like that.

I guess they have data they trust.

If that data ever gets polluted by AI slop, then you have an issue.



