Except modern x86 architectures don't have "pipelines"... at least not like ARM's. Modern x86 cores have decoders that translate the machine code into micro-ops, which are then stored in a micro-op cache (separate from the L1 instruction cache) and executed from there.
Heck, the entire concept of Intel's "Hyperthreading" kills any pipeline determinism you would hope to achieve.
Consider your typical ARM read-after-write stall: add r1, r2 / add r1, r3. A stall needs to be inserted into the pipeline because the second add reads r1 before the first add has finished writing it.
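To make the hazard concrete, here's a toy in-order pipeline model. It's a sketch, not any real ARM core (real pipelines have forwarding paths that often hide exactly this stall): it charges one issue slot per instruction, plus a one-cycle bubble whenever an instruction reads the register the previous one wrote.

```python
# Toy in-order pipeline model (illustrative only, not a real ARM core).
# Each instruction is (dest, srcs). A read-after-write hazard inserts a
# one-cycle bubble when an instruction reads a register written by the
# immediately preceding one, assuming no forwarding path exists.

def cycles_in_order(program, stall_penalty=1):
    cycles = 0
    last_dest = None
    for dest, srcs in program:
        cycles += 1  # one issue slot per instruction
        if last_dest is not None and last_dest in srcs:
            cycles += stall_penalty  # bubble for the read-after-write hazard
        last_dest = dest
    return cycles

# add r1, r2 / add r1, r3: the second add reads r1, written by the first
prog = [("r1", ("r1", "r2")), ("r1", ("r1", "r3"))]
print(cycles_in_order(prog))  # 3: two issue slots plus one stall cycle
```

With an independent second instruction (say, add r4, r5) the same model reports 2 cycles, since no bubble is needed.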
In contrast, the typical Out-of-Order (OoO) x86 core (Steamroller, Ivy Bridge, and Bay Trail) will look ahead at future instructions for something independent to execute during the pipeline stall. In the case of Hyperthreaded CPUs (Intel i7, i3, older Atoms), Intel will grab instructions from _another thread_ to execute while the stall resolves.
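The look-ahead can be sketched with the same kind of toy model: when the next instruction depends on the last result, issue a later independent one instead. This is purely illustrative; a real scheduler tracks dozens of in-flight micro-ops across many execution ports, not a one-deep dependency window.

```python
# Toy out-of-order issue model (illustrative only). Each instruction is
# (dest, srcs). Each cycle, pick the first pending instruction that does
# not read the previously issued result; only stall if nothing qualifies.

def cycles_out_of_order(program, stall_penalty=1):
    pending = list(program)
    cycles = 0
    last_dest = None
    while pending:
        cycles += 1
        for i, (dest, srcs) in enumerate(pending):
            if last_dest not in srcs:  # independent of the last result
                last_dest = dest
                pending.pop(i)
                break
        else:
            cycles += stall_penalty  # nothing independent: stall anyway
            last_dest = pending.pop(0)[0]
    return cycles

# Two dependent adds followed by an independent one. An in-order issue of
# this sequence costs 4 cycles (issue, stall, issue, issue); the OoO model
# slips the independent add into the would-be stall cycle.
prog = [("r1", ("r1", "r2")), ("r1", ("r1", "r3")), ("r4", ("r5", "r6"))]
print(cycles_out_of_order(prog))  # 3
```

The design point: reordering doesn't remove the dependency, it just finds other work to hide the latency behind.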
And worse yet, as Intel and AMD upgrade their systems, there are sometimes massive architectural differences which cause regressions in some code. The infamous AMD Bulldozer (2011 chip) regressed on FPU and vectorized code, for instance, while AMD tried to raise the core count and improve integer performance.
If you really want to step into this arena, just use Agner Fog's optimization manual.
http://www.agner.org/optimize/instruction_tables.pdf
But... it isn't as simple as ARM, because x86 CPUs are far more complicated. Out-of-order processors just kill any determinism you have... let alone hyperthreading and the other tricks these bigger CPUs pull.
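For a rough sense of how instruction tables like Agner Fog's get used, here's a back-of-the-envelope cycle estimate. The latency and throughput numbers below are made-up placeholders; substitute the real figures from the tables for your exact microarchitecture before trusting any estimate.

```python
# Back-of-the-envelope loop cost from instruction tables. The numbers are
# PLACEHOLDERS, not real figures; look up your microarchitecture in
# Agner Fog's instruction tables.
LATENCY = {"load": 4, "mul": 3, "add": 1}       # cycles until result usable
RECIP_THROUGHPUT = {"load": 0.5, "mul": 1, "add": 0.5}  # cycles per op, independent ops

def dependent_chain_cycles(ops, iterations):
    # If every op feeds the next, latencies add up along the chain.
    return sum(LATENCY[op] for op in ops) * iterations

def independent_cycles(ops, iterations):
    # If ops are independent, throughput (not latency) is the limit.
    return sum(RECIP_THROUGHPUT[op] for op in ops) * iterations

print(dependent_chain_cycles(["load", "mul", "add"], 100))  # 800
print(independent_cycles(["load", "mul", "add"], 100))      # 200.0
```

The latency-vs-throughput split is the key distinction those tables encode: a dependent chain pays full latency, while independent work can be bounded by throughput alone.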
Out-of-order processors do not "kill" determinism. If you can guarantee your initial state is the same, you will get the same result in the same number of cycles, repeatably. The fact that the time it takes is non-obvious does not make it non-deterministic.
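A trivial way to see the distinction: any machine governed by fixed rules, started from the same state, reaches the same result in the same number of steps every time. Here a toy interpreter stands in for a core's fixed scheduling logic (this is not a CPU model, just an illustration of determinism).

```python
# Determinism illustration: fixed rules plus fixed initial state give the
# same result and the same step count on every run. (Toy interpreter, not
# a CPU model.)
def run(state, program):
    steps = 0
    for op, reg, val in program:
        if op == "add":
            state[reg] += val
        elif op == "mul":
            state[reg] *= val
        steps += 1
    return state, steps

prog = [("add", "r1", 5), ("mul", "r1", 3), ("add", "r1", -2)]
runs = [run({"r1": 1}, prog) for _ in range(1000)]
print(all(r == runs[0] for r in runs))  # True: same input, same output, same step count
```

The step count being hard to predict from the outside is a documentation problem, not non-determinism.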
Now, hyperthreading does introduce real non-determinism from the standpoint of a single-threaded calculation, because that thread doesn't control the other thread's use of shared processor resources. But hyperthreading introduces that sort of non-determinism even on in-order processors.
Well sure, different implementations behave differently. It still has nothing to do with being OoOE...
A 486 and an original Atom are both in-order implementations of the i386 architecture (though both also implement other extensions), but knowing how code runs on one tells me nothing about the other. If I just have an ISA reference for the i386, without additional documentation or hardware to test on, I won't be able to predict how code I write performs on either of them.