Roughly I'd agree, although I don't have hard numbers, and I'd say GPT-4 in 2023...

dwohnitmok · 2025-11-27T06:04:58 1764223498

> I'd say GPT-4 in 2023 vs GPT-3 as the last major "wow" release from a purely-model perspective. But they've also gotten a lot faster, which has its own value. And the tooling around them has gotten MASSIVELY better

Tooling vs model is a false dichotomy in this case. The massive improvements in tooling are directly traceable back to massive improvements in the models.

If you took the same tooling and scaffolding and stuck GPT-3 or even GPT-4 in it, they would fail miserably and from the outside the tooling would look abysmal, because all of the affordances of current tooling come directly from model capability.

All of the tooling approaches of modern systems were proposed and prototypes were made back in 2020 and 2021 with GPT-3. They just sucked because the models sucked.

The massive leap in tooling quality directly reflects a concomitant leap in model quality.

azinman2 · 2025-11-27T01:48:12 1764208092

How do you avoid overfitting with the automated prompts? It seems to add lots of exceptions from what I've seen in the past versus generalize as much as a human would.

adastra22 · 2025-11-27T01:53:21 1764208401

Ask the agent "Is this over-fitting?"

I'm not joking.