
GPT3 can't do math



Well, things are moving quickly now.

> Release Notes (Jan 30)

> We’ve upgraded the ChatGPT model with improved factuality and mathematical capabilities.

I just asked it a question from a puzzle site, and the answer was correct; however, a different question I posed was answered incorrectly, though that one was a logic problem: [0]

Prompt: In a guessing game, five friends had to guess the exact number of balls in a box. The friends guessed 31, 35, 39, 49, and 37, but none of the guesses was right. The guesses were off by 1, 9, 5, 3, and 9 (in a random order). Can you determine the number of balls in the box?

ChatGPT: Yes, we can determine the number of balls in the box.

ChatGPT: Since the guesses were off by 1, 9, 5, 3, and 9, in some order, the possible numbers of balls in the box are: 30, 32, 34, 36, 38, 40, 48, 46, 44, 42

ChatGPT: By checking all possible combinations of the guesses and the numbers they were off by, we can find that the number of balls in the box is 40.
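
For anyone who wants to sanity-check this, here's a quick brute-force sketch (my own throwaway Python, not anything from the thread): it tries every plausible count and keeps the ones whose distances from the five guesses match the stated offsets exactly.

    # Brute-force check of the guessing puzzle: keep every ball count whose
    # distances from the five guesses are exactly the stated offsets.
    from collections import Counter

    guesses = [31, 35, 39, 49, 37]
    offsets = Counter([1, 9, 5, 3, 9])

    candidates = [n for n in range(1, 101)   # any generous range of counts
                  if Counter(abs(n - g) for g in guesses) == offsets]
    print(candidates)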

---

Here is a more purely mathematical problem, which it answered correctly:

Prompt: What is the radius of a circle that has a circumference of 3.14 meters?

ChatGPT: The formula for the circumference of a circle is given by 2πr, where r is the radius. To find the radius of a circle with a circumference of 3.14 meters, we can solve for r by rearranging this formula:

ChatGPT: r = C/(2π), where C is the circumference.

ChatGPT: Plugging in the given value of C = 3.14, we have:

ChatGPT: r = 3.14 / (2π) = 0.5.

ChatGPT: So, the radius of a circle with a circumference of 3.14 meters is approximately 0.5 meters.

[0] https://gpuzzles.com/mind-teasers/guess-game-logic-problem/


It can do some trivial problems, but not others, and it can't consistently do real math (proving theorems), because it doesn't understand logic. It can only imitate it some of the time for trivial or often-presented proofs when the necessary steps have become embedded in its next-token-generation model.

It's still getting this wrong, for instance:

Prompt: You call heads/tails for 1000 coin tosses. You know the coin being used is weighted so that it comes up heads 52% of the time. Each correct coin toss prediction earns you $5. What should your strategy be, for calling heads/tails, in order to maximize how much money you receive?

ChatGPT: To maximize your money, you should call heads 52% of the time and tails 48% of the time, as those are the probabilities of getting heads and tails, respectively. This will give you an expected payout of $5 * 0.52 = $2.60 per head and $5 * 0.48 = $2.40 per tail, which is the highest expected value given the information about the coin being weighted.
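
To make the mistake concrete, here's a quick expected-value comparison (my own sketch, assuming the payout structure from the prompt): always calling heads beats the probability-matching strategy ChatGPT suggests.

    # Expected winnings over 1000 tosses at $5 per correct call, for a coin
    # that lands heads 52% of the time.
    p_heads, tosses, payout = 0.52, 1000, 5

    # Always call heads: you are right 52% of the time.
    always_heads = tosses * payout * p_heads

    # Probability matching: call heads 52% of the time, tails 48% of the time.
    matching = tosses * payout * (p_heads ** 2 + (1 - p_heads) ** 2)

    print(f"always heads:         ${always_heads:.2f}")   # 2600.00
    print(f"probability matching: ${matching:.2f}")       # about 2504.00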

It's getting closer to being able to prove that the square root of 2 is irrational, but it still flubs a step.

Prompt: Prove that the square root of 2 is irrational.

ChatGPT: ... This means that p can be expressed as 2r for some integer r. Substituting this into the equation above gives us:

ChatGPT: 2q^2 = (2r)^2 = 4r^2

ChatGPT: Dividing both sides by 4 gives us:

ChatGPT: q^2 = r^2

ChatGPT: So q must also be even, which means that both p and q have a common factor of 2...

It failed basic algebraic manipulation (dividing 2q^2 = 4r^2 by 2 should give q^2 = 2r^2, not q^2 = r^2, and it's that factor of 2 that makes q even), but it kept going with the right reasoning simply because that's what it's been trained on, not because it was reflecting on the equation it had just come up with. That's not math. That's mimicry.

Even if it's seen something many times in training, its accuracy gets worse with longer trains of "thought", as expected, because a language model like that has ADHD. It can't focus on one logical thread, and other "knowledge" leaks in. Ask it to derive the quadratic formula by completing the square, which requires many steps of accurate algebraic manipulation. It goes through some irrelevant algebraic steps, sometimes not even getting the manipulation right, and finally (sometimes, depending on how you ask) manufactures the correct answer out of thin air as the last step. I finally got it to answer correctly by giving it the initial equation and telling it that the first step is to divide both sides by a, but it still combined a few steps instead of going methodically one step at a time as prompted.
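
For reference, here is the derivation it's being asked to reproduce, written out one step at a time (my own LaTeX sketch of the standard completing-the-square argument, not anything the model produced):

    \begin{align*}
    ax^2 + bx + c &= 0 \\
    x^2 + \frac{b}{a}x + \frac{c}{a} &= 0 && \text{divide both sides by } a \\
    x^2 + \frac{b}{a}x &= -\frac{c}{a} \\
    x^2 + \frac{b}{a}x + \left(\frac{b}{2a}\right)^2 &= \left(\frac{b}{2a}\right)^2 - \frac{c}{a} && \text{complete the square} \\
    \left(x + \frac{b}{2a}\right)^2 &= \frac{b^2 - 4ac}{4a^2} \\
    x + \frac{b}{2a} &= \pm\frac{\sqrt{b^2 - 4ac}}{2a} \\
    x &= \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}
    \end{align*}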


Seems to have improved already.

I just tried the coin toss prompt and got the following:

Your strategy should be to always call "heads," since it has a higher probability of landing on heads (52%) compared to tails (48%). By consistently choosing heads, you'll earn more money in the long run.

And asked it to prove that the square root of 2 is irrational:

The proof that the square root of 2 is irrational is a well-known proof that uses contradiction. Here's one way it can be shown:

Suppose, for the sake of contradiction, that the square root of 2 is rational, meaning that it can be expressed as a fraction of two integers, a/b (where b≠0). Then, we can square both sides of the equation a/b = √2, which gives us:

a^2/b^2 = 2

Since a/b is in its lowest form (the numerator and denominator have no common factors other than 1), it follows that a^2 and b^2 have no common factors other than 1. But then, 2 must also be a perfect square, which it is not. Therefore, our assumption that the square root of 2 is rational leads to a contradiction, and we must conclude that it is indeed an irrational number.


This "proof" is actually a pretty good manifestation of the current limitations of GPT3; it mangles together two different proofs in a way that does not make sense.

It starts by setting up for an elementary proof (a/b = sqrt(2) with a, b minimal implies 2|a^2, which implies 2|a, which implies 4|a^2, which implies 2|b^2, which implies 2|b, a contradiction). But instead of following through, it finishes with a different proof based on the rational root theorem, and makes up some bullshit along the way to make the argument flow nicely (it does not follow from a^2 / b^2 = 2 with a and b having no common factors that 2 must be a perfect square).
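
For completeness, the rational-root-theorem argument it seems to be reaching for goes like this (my own summary, written in LaTeX), and it never needs the bogus "2 must be a perfect square" step:

    $\sqrt{2}$ is a root of $x^2 - 2$, a monic polynomial with integer
    coefficients. By the rational root theorem, any rational root $p/q$ in
    lowest terms must have $p \mid 2$ and $q \mid 1$, so the only candidates
    are $x \in \{\pm 1, \pm 2\}$. None of these satisfies $x^2 = 2$, so
    $x^2 - 2$ has no rational root and $\sqrt{2}$ is irrational.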



