If I had a senior member of the team who was incredibly knowledgeable but occasionally lied, albeit in a predictable way, I would still find that valuable. Talking to people is a very quick and easy way to get information about a specific subject in a specific context, so I could ask them targeted questions that are easy to verify; the worst thing that happens is I 'waste' a conversation with them.
Sure, but LLMs don't lie in a predictable way. It's just their nature to output statistical sentence continuations, with a complete disregard for the truth. Everything they output is suspect, especially the potentially useful stuff that you don't know whether it's true or false.
They do lie in a predictable way: if you ask them for a widely available fact you have a very high probability of getting the correct answer; if you ask them for something novel you have a very high probability of getting something made up.
If I'm trying to use some tool that just got released or just got a big update, I won't use AI; if I want to check the syntax of a for loop in a language I don't know, I will. Whenever you ask it a question you should have an idea in your mind of how likely you are to get a good answer back.
I suppose, but they can still be wrong on common facts that are counter-intuitive, like the number of R's in strawberry.
I saw an interesting example yesterday of the type "I have 3 apples, my dad has 2 more than me ...", where of the top 10 predicted tokens about half led to the correct answer and about half didn't. And it wasn't the most confident predictions that led to the right answer - it was pretty much random.
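If anyone wants to poke at this themselves, here's a rough sketch of how to inspect the top next-token predictions with HuggingFace transformers. "gpt2" is just a placeholder for whatever model you have locally (almost certainly not the one in the example above), and the ending of the prompt is my own guess at how the elided question continues, so the probabilities you see will differ:

    # Peek at the 10 most likely next tokens for an "apples"-style prompt.
    # "gpt2" is only a stand-in model; the prompt ending is an assumed completion.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2")

    prompt = "I have 3 apples, my dad has 2 more apples than me. Together we have"
    inputs = tokenizer(prompt, return_tensors="pt")

    with torch.no_grad():
        logits = model(**inputs).logits[0, -1]   # logits for the next token only
    probs = torch.softmax(logits, dim=-1)

    top = torch.topk(probs, k=10)                # ten most likely continuations
    for p, idx in zip(top.values, top.indices):
        print(f"{tokenizer.decode(int(idx))!r}\t{p.item():.3f}")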
The trouble with LLMs vs humans is that humans learn to predict facts (as reflected in feedback from the environment, checked by experimentation, etc.), whereas LLMs only learn to predict word statistics over the sentence soup of their training set. It's amazing that LLM outputs are coherent as often as they are, but entirely unsurprising that they are often just "sounds good" flow-based BS.
I think maybe this is where the polarisation between those who find chatGPT useful and those who don't comes from. In this context, the number of r's in strawberry is not a fact: it's a calculation. I would expect AI to be able to spell a common word 100% of the time, but not to be able to count letters. I don't think that in the summary of human knowledge that has been digitised there are many people asking 'how many r's are there in strawberry', and if there are, I think the common reply would be '2', since the context is the second r (people confuse strawbery and strawberry, not strrawberry and strawberry).
Your apples question is the same: it's not knowledge, it's a calculation, it's intelligence. The only time you're going to get intelligence from AI at the moment is when you ask a question that a significantly large number of people have already answered.
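To make the calculation-vs-recall distinction concrete: both examples are trivial for a few lines of code that actually operate on the characters and numbers, precisely because they're computed rather than looked up. A throwaway Python sketch (the apples question was cut off above, so the arithmetic just picks one plausible reading):

    # Counting letters is a character-level calculation, not a recalled fact.
    word = "strawberry"
    print(word.count("r"))        # 3

    # The apples example is likewise arithmetic, not lookup.
    # (The original question was elided; "how many does my dad have?" is one reading.)
    my_apples = 3
    dads_apples = my_apples + 2   # "my dad has 2 more than me"
    print(dads_apples)            # 5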
True, but that just goes to show how brittle these models are - how thin the dividing line is between primary facts present (hopefully consistently so) in the training set, and derived facts that are potentially more suspect.
To make things worse, I don't think we can even assume that primary facts are always going to be represented in abstract semantic terms independent of the source text. The model may have been trained on a fact but still fail to reliably recall/predict it because of "lookup failure" (the model fails to reduce the query text to the necessary abstract lookup key).