Show HN: Alerting in realtime RAG: spot changes to LLM answers, using few tokens (github.com/pathwaycom)
8 points by janchorowski on Nov 17, 2023 | 5 comments
Hi, I'm Jan, CTO @ Pathway.

A use case we have been working on with LLMs is letting people know when the answer to their query changes due to revisions of the source documents. Obviously, we want to avoid periodically re-running all queries through the LLM.

Why I think it's cool:

- We don't spin in a loop, repeatedly re-querying the LLM.

- Alerts are LLM-deduplicated, so users aren't spammed over typo fixes.

- Best of all, our framework, Pathway, takes care of handling the updates; the example looks nearly like a regular, static RAG chatbot.
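
To make the first point concrete, here is a conceptual sketch of the flow in plain Python. It is not the Pathway implementation; all the names (register_alert, on_documents_changed, the callables passed in) are illustrative stand-ins. The idea is simply to react to document changes, recompute only the affected answers, and notify only when an answer meaningfully changes:

    # Conceptual sketch only, not the Pathway code. Keep a registry of
    # alerted queries with their last answers; when source documents change,
    # recompute only the affected queries and alert only on real changes.

    tracked = {}  # query -> (retrieved_doc_ids, last_answer)

    def register_alert(query, retrieve, answer):
        """Answer the query once and remember which documents it used."""
        doc_ids, context = retrieve(query)
        tracked[query] = (set(doc_ids), answer(query, context))

    def on_documents_changed(changed_doc_ids, retrieve, answer,
                             answers_differ, notify):
        """Called on a document update event (not on a timer)."""
        for query, (doc_ids, old_answer) in list(tracked.items()):
            if not doc_ids & set(changed_doc_ids):
                continue  # this query's context is untouched; no LLM call
            new_doc_ids, context = retrieve(query)
            new_answer = answer(query, context)
            tracked[query] = (set(new_doc_ids), new_answer)
            if answers_differ(old_answer, new_answer):
                notify(query, old_answer, new_answer)  # alert on real changes only
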

More context + GIF of how it works for Google Drive document alerts: https://pathway.com/developers/showcases/llm-alert-pathway

Happy to hear your thoughts!




Thanks for sharing, Jan!

This real-time alerting use case can also be useful in many other areas. I am thinking of fraud detection, customer support, medical diagnosis and treatment, or manufacturing, to predict when equipment will fail and alert when maintenance is needed. Or even monitoring model performance, since LLMs can occasionally produce unexpected or undesirable outputs.


Jan, can you explain briefly how the deduplicator checks if the new answer is significantly different? Is there code in the repository we can take a look at?


Sure: when a new response is produced because some source documents have changed, we ask an LLM to compare the two responses and tell us whether they are significantly different. Even a simplistic prompt, like the one used in the example, will do:

    Are the two following responses deviating?
    Answer with Yes or No.

    First response: "{old}"

    Second response: "{new}"
(used in https://github.com/pathwaycom/llm-app/blob/69709a2cf58cdf6ea...)
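A minimal sketch of that comparison step, calling the OpenAI chat API directly with the prompt above (the repo wires this through Pathway; the model choice and exact wiring here are illustrative, not the code at the link):

    from openai import OpenAI

    client = OpenAI()

    PROMPT = (
        "Are the two following responses deviating?\n"
        "Answer with Yes or No.\n\n"
        'First response: "{old}"\n\n'
        'Second response: "{new}"'
    )

    def answers_differ(old: str, new: str) -> bool:
        """Ask the LLM whether the new answer meaningfully deviates."""
        reply = client.chat.completions.create(
            model="gpt-3.5-turbo",
            messages=[{"role": "user", "content": PROMPT.format(old=old, new=new)}],
            temperature=0,
        )
        return reply.choices[0].message.content.strip().lower().startswith("yes")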


Couldn't you just compare the similarity of the embeddings? I imagine that would work in the vast majority of cases and save a lot of LLM calls.


That's a good idea. The deduplication criterion is easy to change; using an LLM is faster to get started, but after a while a corpus of decisions builds up and can be used either to select another mechanism or, e.g., to train one on top of BERT embeddings.
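
For reference, a sketch of the embedding-based variant the commenter suggests: flag a change only when the cosine similarity of the two answers' embeddings drops below a threshold. The embedding model and the threshold value are illustrative assumptions, not part of the example in the repo:

    import numpy as np
    from openai import OpenAI

    client = OpenAI()

    def answers_differ_by_embedding(old: str, new: str,
                                    threshold: float = 0.90) -> bool:
        """Treat the answer as changed when the embeddings are dissimilar."""
        resp = client.embeddings.create(
            model="text-embedding-ada-002", input=[old, new]
        )
        a, b = (np.array(d.embedding) for d in resp.data)
        cosine = float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
        return cosine < threshold  # low similarity -> alert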



