It is frustrating to read through demos which feature a hypothetical AI that greatly exceeds the capabilities of any actual LLM, and which do not consider the serious risk of learners being misled by confabulations.
It is especially frustrating because I recently tested GPT-4o on a factual question and got 1,000 words that were completely wrong, including fake citations.
It is especially frustrating to read this sci-fi daydreaming after talking to a high school science teacher who was forced to use generative AI tutors in their classes this year, even though these tutors are poorly tested and seem to have a ~20% confabulation rate. This particular teacher is technically sophisticated but even they sometimes get confused and misled by the chatbot. Students don't have a chance.
I think Matuschak has valuable insights on learning in general. But it seems incomplete to go through this AI thought experiment without discussing how inadequate current AI is to the task. "Technology will get better" but what if it takes 50 years?
No, I tested the paid GPT-4 last year on similar questions (animal cognition) and it was so bad I decided it was a waste of money. I don't care whether it has gotten better in the past year, and I'm certainly not spending money to find out. Last I checked, the best LLMs still have a 5-15% confabulation rate on simple document summarization. In 2023, GPT-4 had a ~75% confabulation rate on my animal cognition questions, but even 5% is not reliable enough for me to want to use it.
The high school AI tutor probably wasn't using GPT-4, but the district definitely paid a lot of money for the software.
I also hate this entire argument, that AI confabulations don't matter for free products. Unreliable software like GPT-4o shouldn't be widely released to the public as a cool new tech product, and certainly not handed out for free.
Humans have been doing that for years. The AI problem is so prevalent because it seems to put a magnifying lens up to the worst parts of ourselves, namely how we process information and deceive each other. As it turns out, liars and cheats tend to build more liars and cheats ("garbage in, garbage out"), which leaves me scratching my head as to what anyone thought was going to happen as LLMs got more powerful. Many seem afraid to have that conversation, though.
I have tried some chemistry problems on the latest models and they still get simple math wrong (for example, botching conversions between micrograms and milligrams) unless you tell them to think carefully.