A few of the comments in this thread seem to be misusing mathematics to lend themselves more credence. At the risk of responding to low-quality flamebait, here are some problems with your statements.
1. P = NP refers to two very specific sets of problems (which might actually be the same set), not any general question. There are problems that we know fall into neither P nor NP (for example, the Halting Problem). Also, whether or not P = NP is an open question, which is almost the opposite of a fact. For what the "easy to verify" half of NP actually means, see the sketch just below.
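Here is a minimal sketch of what membership in NP buys you, using subset-sum (a classic NP-complete problem) as a toy example. The function names and numbers are mine, purely for illustration: verifying a proposed certificate takes polynomial time, while the only solvers we know are exponential in the worst case.

    from itertools import combinations

    def verify(numbers, target, indices):
        # Polynomial-time certificate check: this "easy to verify"
        # property is what puts subset-sum in NP.
        return sum(numbers[i] for i in set(indices)) == target

    def solve(numbers, target):
        # Brute force over all 2^n index subsets. No polynomial-time
        # algorithm is known, and none exists unless P = NP.
        for r in range(len(numbers) + 1):
            for combo in combinations(range(len(numbers)), r):
                if sum(numbers[i] for i in combo) == target:
                    return combo
        return None

    nums = [3, 34, 4, 12, 5, 2]
    cert = solve(nums, 9)         # exponential search finds (2, 4): 4 + 5 = 9
    print(verify(nums, 9, cert))  # the check itself is cheap -> True

Note that the asymmetry runs one way only: cheap verification says nothing about cheap solving, which is exactly why P = NP remains open.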
2. You claim: "Evaluating if an answer is correct or not is easier than coming up with a correct answer from scratch." This is the right idea, but it is not quite correct.
The correct statement is: "Evaluating whether an answer is correct is no harder than coming up with a correct answer from scratch."
This is because evaluating an answer can still be just as hard as the original problem. In fact, sometimes it's uncomputable (if the original problem is also uncomputable). To use an example from above, consider the question: "Does a program x halt?" If I tell you "no", it could be impossible to verify my answer unless you have solved the halting problem yourself; see the sketch below.
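Here is a toy sketch of that asymmetry (my own model, with programs represented as Python generators so that "steps" can be counted): a "yes, it halts" answer carries a natural certificate, the step count, and can be checked by simulation, while a "no, it never halts" answer has no finite certificate at all.

    def halts_quickly():
        for _ in range(10):
            yield  # one step of computation per yield

    def runs_forever():
        while True:
            yield

    def verify_halts(program, claimed_steps):
        # A "yes" answer is checkable: simulate up to the claimed
        # number of steps and see whether the program really stops.
        steps = program()
        for _ in range(claimed_steps):
            try:
                next(steps)
            except StopIteration:
                return True   # halted within the bound, as claimed
        return False          # claim not confirmed within the bound

    print(verify_halts(halts_quickly, 100))  # True
    print(verify_halts(runs_forever, 100))   # False, but this tells you
                                             # nothing about "never halts"

There is no corresponding verify_never_halts: no finite simulation can confirm a "no" answer, which is why that direction is as hard as the halting problem itself.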
To bring this back to reality: if GPT-4 is wrong about some complex medical question, it doesn't follow that detecting the mistake is mathematically easier than solving the problem from scratch.