
Which has to be done after every instruction (http://boston.conman.org/2015/09/05.2), but it is quite slow. Using a conditional jump after each instruction is faster than using INTO (http://boston.conman.org/2015/09/07.1).



My guess would be a pipelining issue where `INTO` isn't treated as a `Jcc`, but as an `INT` (mainly because it is an interrupt). Agner Fog's instruction tables[0] show (for the Pentium 4) `Jcc` takes one uOP with a throughput of 2-4. `INTO`, OTOH, when not taken uses four uOPs with a throughput of 18! Zen 3 is much better with a throughput of 2, but that's still worse than `JO raiseINTO`.

[0]: https://www.agner.org/optimize/instruction_tables.pdf
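
For anyone who wants to see the "check after every add" pattern without writing assembly, here is a minimal Rust sketch. `checked_sum` is a hypothetical helper name, not something from the linked posts. I would expect a compiler targeting x86-64 to lower each `checked_add` to an `ADD` followed by a conditional jump on the overflow flag (the `JO` style), since `INTO` is not a valid instruction in 64-bit mode, but that is an assumption about codegen worth confirming in a disassembler.

```rust
/// Hypothetical helper: sums a slice of i32 values, checking for signed
/// overflow after every single addition.
///
/// Returns `None` as soon as any intermediate add overflows, which mirrors
/// the "conditional jump after each instruction" approach from the thread.
fn checked_sum(values: &[i32]) -> Option<i32> {
    let mut total: i32 = 0;
    for &v in values {
        // `checked_add` returns None on signed overflow; `?` bails out early.
        total = total.checked_add(v)?;
    }
    Some(total)
}
```

With this, `checked_sum(&[1, 2, 3])` returns `Some(6)` and `checked_sum(&[i32::MAX, 1])` returns `None`. Every loop iteration carries its own overflow branch, which is exactly the "a branch on pretty much every add" situation being argued about.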


It's more complicated than micro-benchmarks like that show. When you do overflow checking, you do it on pretty much every add, so you end up polluting your branch predictor with `jo` instructions everywhere, and that can lead to worse overall performance.



