> really dumb [...] take over all our jobs

Perhaps worse than the vacillation between terrible answers and great answers: you simply can't tell which kind of answer you've got until you've sunk a bunch of effort into validating or implementing it. (Perhaps only then finding that the system invented some core APIs, cited non-existent sources, or made algebra errors.)

Almost an echo of P/NP categorizations: It's tough when the effort of fully verifying a proposed answer is too close to the effort of just solving it normally.