I quizzed ChatGPT about math a few days back. It correctly answers that 1023 is odd, correctly answers that the square of an odd number is also odd, and yet is about 50/50 on whether 1023^2 is even or odd.
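(The standard argument: an odd number can be written as 2k + 1, and (2k + 1)^2 = 4k^2 + 4k + 1 = 2(2k^2 + 2k) + 1, which is odd by construction. So 1023^2 must be odd, no multiplication required.)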
The author of the linked post repeatedly refers to ChatGPT's "reasoning". But ChatGPT doesn't reason; it just generates text. Sometimes that text might look like reasoning to a human (it's the sort of text a human who had reasoned through the question would generate), but no reasoning was involved. It's important to understand this.
> Based on my teaching experience, I’d say chatGPT scored like a second year math major.
Based on my experience, ChatGPT scored worse than my 5-year-old, who gets the 1023^2 question right every time.
> 1023 squared is even because it is equal to 1,046,529, which is an even number. To determine if a number is even or odd, you can check whether it is divisible by 2. If it is divisible by 2, it is even. If it is not divisible by 2, it is odd.
> In general, any number that is a perfect square will be even if the number being squared is odd, and it will be odd if the number being squared is even. This is because odd numbers, when squared, always result in an odd number, and even numbers, when squared, always result in an even number.
> 1023 squared is 1048129, which is an odd number. To determine whether a number is even or odd, you can check whether it is divisible by 2. If it is, it is an even number. If it is not, it is an odd number. In this case, 1048129 is not divisible by 2, so it is an odd number.
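Both quoted answers are wrong in different ways: the first computes the correct value (1,046,529) but mislabels it as even, while the second asserts the correct parity but gives a wrong value (1,048,129). A quick Python check, just to verify the quoted numbers:

```python
# Compare ChatGPT's two quoted answers for 1023 squared with the actual value.
actual = 1023 ** 2
print(actual)              # 1046529: matches the first answer's value
print(actual % 2 == 1)     # True: 1046529 is odd, so calling it "even" was wrong
print(actual == 1048129)   # False: the second answer's value is simply incorrect
```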
It reminds me of study methods that rely on "memorizing" as opposed to thinking about the problem and applying various methods to solve it. It can solve some algorithm problems at a certain level, but struggles to produce a detailed answer in some scenarios. I guess it's not surprising, given that ML is essentially exploring the local solution space of some abstract problem; it reminds me of a Taylor series expansion or a breadth-first search of an abstract problem space.
One day it will "dual major" in math and then learn to determine when the situation demands rigor (hard calculation) and when it demands "correct word estimation" (its current interaction mode).