The wiki is wrong or at least misleading. Branch prediction is a form of speculative execution.
Even 486 (and possibly 386) had branch prediction (although a trivial one).
P6 was a huge deal because it added out of order execution.
Edit: I guess it is a matter of semantics: in the classical 5 stage in order RISC, instructions after the speculated branch are fetched, decoded, etc, but they won't reach the execution stage before the speculation is resolved, so only the branch is technically "executed" speculatively at the fetch stage. So there is less state to unwind, compared to a true OoO machine that can run ahead.
according to this thing I found [0] the 486 has a predictor for the branch not taken. Not sure what that means but it looks like it's mostly for the instruction fetch/decode based on other notes. Ironically sounds close to your 5 stage RISC description, extra ironic because 486 is 5 stages too...
P5 has a more 'real' BP but also had the U/V Pipelining.
But yes as far as OoO goes, I think for x86 it was Nx586 first, then PPro (P6), then Cyrix 6x86 [1] and AMD K6 (courtesy of Nextgen tech) the next year.
As far as I know 486 predicted all branches as not taken, so it would fetch and decide instructions after a branche and throw that away when the speculation was proven wrong.
I guess calling this speculative execution is a stretch.
Even 486 (and possibly 386) had branch prediction (although a trivial one).
P6 was a huge deal because it added out of order execution.
Edit: I guess it is a matter of semantics: in the classical 5 stage in order RISC, instructions after the speculated branch are fetched, decoded, etc, but they won't reach the execution stage before the speculation is resolved, so only the branch is technically "executed" speculatively at the fetch stage. So there is less state to unwind, compared to a true OoO machine that can run ahead.