I actually find the first example strikingly similar to what would happen if a human were given a secretly broken tool.
If somebody asked me, "Hey, what's the area of a 1235 x 738 m room? Here's a calculator," and the calculator gave me 935420, I would just say "935420 square meters" as long as the rough number of digits seemed correct (and maybe as long as the last digit was 0, though I suspect for a less mathy person even that wouldn't matter).
Should the calculator give me "3984" instead, I would be like "wait, that's not right, let me calculate this by hand", which is what GPT-3 tried to do. It's just way worse at doing math "by hand".
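To make that sanity check concrete, here's a rough sketch in Python of the kind of check I mean (the function name, rounding choices and thresholds are just mine for illustration, not anything GPT-3 or the article actually does):

    def looks_plausible(a, b, calculator_result):
        # Rough plausibility check for a multiplication result, mirroring the
        # informal check above: about the right number of digits, and a last
        # digit the true product could actually end in.
        # Order-of-magnitude check: round the operands crudely
        # (1235 -> 1000, 738 -> 700) and compare digit counts.
        rough = round(a, -3) * round(b, -2)
        if abs(len(str(calculator_result)) - len(str(rough))) > 1:
            return False
        # Last-digit check: only the last digits of a and b matter here.
        return calculator_result % 10 == (a % 10) * (b % 10) % 10

    print(looks_plausible(1235, 738, 935420))  # True  (plausible, though 1235 * 738 = 911430)
    print(looks_plausible(1235, 738, 3984))    # False (far too few digits)

The broken calculator's answer sails through a check like that, which is roughly why I'd repeat it without blinking; 3984 doesn't, which is why I'd reach for pen and paper.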
A very interesting anecdote, and another article that makes me wonder how a statistical prediction model can come so close to actual reasoning while seemingly using completely different tools to get there.
> how a statistical prediction model can come so close to actual reasoning while seemingly using completely different tools to get there.
There is a theory of brain function (predictive coding / predictive processing) that posits that the brain works by predicting the expected input from the senses and then comparing that with the actual input. This doesn't seem a million miles from a language model operating on words rather than sense data.
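Purely as a toy illustration of that idea (my own sketch, not a claim about how the brain or GPT-3 actually implements it), the predictive-processing story boils down to a loop like this: predict the next input, measure the error against what actually arrives, and nudge the internal state to shrink that error.

    import random

    # Toy predictive-coding loop: keep an internal estimate, predict the next
    # observation from it, and update the estimate in proportion to the
    # prediction error. All constants here are arbitrary illustrative choices.
    estimate = 0.0        # internal model of the "world" (here, a single number)
    learning_rate = 0.1
    true_signal = 5.0     # the hidden quantity the senses keep reporting on

    for step in range(50):
        prediction = estimate                             # expected input
        observation = true_signal + random.gauss(0, 0.5)  # noisy actual input
        error = observation - prediction                  # prediction error
        estimate += learning_rate * error                 # reduce future error

    print(round(estimate, 2))  # drifts toward ~5.0

A language model doing next-word prediction is doing something formally similar, just over tokens rather than sense data.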
But that's just the thing. If we were having a normal conversation, I would react the same and ask you where you had found a room this big.
However, in the context of receiving the same run of text that GPT-3 did, printed on a sheet of paper, I would probably just go "eh, this is a math question, don't be a smartass".
The Aerium Hangar in Germany is the largest hangar in the world (it's not used as a hangar). It's 66,000 square metres, less than 1/10th of a 1235 x 738 m space!
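For reference, the arithmetic behind that comparison (just plain multiplication, nothing from the article):

    >>> 1235 * 738
    911430
    >>> 66000 < 911430 / 10
    True

so the Aerium's floor area really is under a tenth of that hypothetical room.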
More commonly found than a private airplane hangar, for Brits at least, is a private train.