
Is it even technically possible for the model to know it's hallucinating?



You won't catch everything, but I've found that if you specify it's okay to respond that it doesn't know the answer, essentially that accuracy on a smaller subset of answers beats responding to everything, it will often say it doesn't know rather than making something up.
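For example, something along these lines in the system prompt (wording is just illustrative, not a magic formula):

    Answer only from the provided context. If the context does not contain
    the answer, reply "I don't know." Saying "I don't know" is better than
    giving a plausible-sounding guess.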

Another option could be (assuming this is an LLMChain that adds the most similar n embeddings to the prompt before passing it to the LLM): run basic entity recognition to pull the asset name out of the query first, and build the prompt dynamically, so that if the most similar results don't contain the asset name, you skip the examples and respond that you don't know. Rough sketch below.
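A minimal sketch of that flow, where `retriever`, `llm` and `known_assets` are hypothetical placeholders rather than any particular LangChain API:

    def answer(query, retriever, llm, known_assets):
        # naive "entity recognition": look for a known asset name in the query
        asset = next((a for a in known_assets if a.lower() in query.lower()), None)
        chunks = retriever(query)  # top-n most similar chunks, as strings
        if asset and not any(asset.lower() in c.lower() for c in chunks):
            # retrieval didn't surface the asset, so refuse instead of guessing
            return "I don't know anything about {}.".format(asset)
        context = "\n".join(chunks)
        return llm("Answer using only this context:\n{}\n\nQ: {}".format(context, query))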


So no then


I feel like "hallucinating" is the wrong word. It is essentially predicting the next word based on its neural net and training data. If the training data doesn't contain the right information, it will predict things that are wrong.

While knowing might be impossible, it seems like the model could attach a confidence level and only give answers that exceed some threshold. It'd be a bit like asking a human "are you sure about that?"

And in practice, I really don't think it is that different. We humans effectively make things up all the time. Sometimes we are well aware of our educated guesses and sometimes we are less aware.

It isn't realistic to expect an artificial intelligence to be vastly better than human intelligence in this regard.


The catch is that the models don't have confidence. They cannot distinguish between knowing something for sure, guessing something and hallucinating false knowledge.

Perhaps researchers will find some clever solution to mitigate this, but for now hallucinating is a pretty good word precisely because the model doesn't give confidence.


The Internal State of an LLM Knows When It's Lying

https://arxiv.org/abs/2304.13734


It's not surprising that this information is somehow encoded in the internal state, but it is surprising that they were able to read it out to some degree.

I don't think the model can access this information, but an external "lie detector" would be interesting. Thanks for the paper.
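If anyone wants to poke at the idea, here's a rough sketch of that kind of external probe: take a hidden layer's activations for a statement and train a simple classifier on true/false labels. The model choice, layer and toy dataset here are placeholders, not the paper's actual setup.

    import torch
    from transformers import AutoTokenizer, AutoModel
    from sklearn.linear_model import LogisticRegression

    tok = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModel.from_pretrained("gpt2", output_hidden_states=True)

    def hidden_vec(text, layer=-1):
        # mean-pool one hidden layer's activations over the token positions
        with torch.no_grad():
            out = model(**tok(text, return_tensors="pt"))
        return out.hidden_states[layer][0].mean(dim=0).numpy()

    statements = ["Paris is the capital of France.",
                  "Paris is the capital of Spain.",
                  "Water boils at 100 degrees Celsius at sea level.",
                  "Water boils at 40 degrees Celsius at sea level."]
    labels = [1, 0, 1, 0]  # 1 = true, 0 = false

    probe = LogisticRegression(max_iter=1000).fit(
        [hidden_vec(s) for s in statements], labels)
    print(probe.predict([hidden_vec("Berlin is the capital of Germany.")]))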


That seems wrong... don't LLMs run on probability distributions? Won't they have confidence built in?


I don't think the probability of tokens can be mistaken for confidence that the knowledge is correct. I'd take it more as a measure of how naturally the tokens fit into the sentence.
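You can see the distinction by looking at next-token probabilities directly. A rough illustration with a small HF model (model and prompt are arbitrary): the distribution tells you which continuations read naturally, not which ones are true.

    import torch
    from transformers import AutoTokenizer, AutoModelForCausalLM

    tok = AutoTokenizer.from_pretrained("gpt2")
    lm = AutoModelForCausalLM.from_pretrained("gpt2")

    def next_token_probs(prompt, k=5):
        # probabilities over the vocabulary for the next token only
        with torch.no_grad():
            logits = lm(**tok(prompt, return_tensors="pt")).logits[0, -1]
        probs = torch.softmax(logits, dim=-1)
        top = torch.topk(probs, k)
        return [(tok.decode(int(i)), float(p)) for i, p in zip(top.indices, top.values)]

    # a fluent but wrong continuation can still come out high-probability
    print(next_token_probs("The capital of Australia is"))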


Makes sense, but I wonder if there's some way to retrofit that functionality onto it.



