
You need to predict the target (with a BTB) if you want zero-bubble calls (1-cycle latency).

Otherwise it takes 3 cycles (on the P550) to take a direct call. It doesn't matter that it's unconditional: the front end can't see the call until after decoding. Sure, it's only two extra cycles, but that's a full six instructions on this small 3-wide core.

And an indirect call? Even if the target is known, it's going to need to fetch it from the register file, which requires going through rename. And unless you put a fast-path in with an extra register-file read-port, you need to go through dispatch and the scheduler too. Probably takes 6-10 cycles to take an indirect branch without a BTB.
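
To make the arithmetic concrete, here is a rough sketch of the slot-counting behind those numbers. It assumes the cost is simply (extra cycles) x (pipeline width) and ignores anything the rest of the pipeline might hide; the function name and the 1-cycle predicted baseline are my own framing, not anything from SiFive documentation.

```python
def lost_issue_slots(taken_latency_cycles: int, width: int,
                     predicted_latency_cycles: int = 1) -> int:
    """Issue slots wasted when a taken call is not predicted.

    Simplified model (an assumption): every cycle beyond the predicted-case
    latency is a full bubble across all `width` slots.
    """
    bubble_cycles = taken_latency_cycles - predicted_latency_cycles
    return bubble_cycles * width


WIDTH = 3  # the P550 is 3-wide, per the comment above

# Direct call without a BTB hit: 3 cycles instead of 1.
print("direct call:", lost_issue_slots(3, WIDTH), "slots")

# Indirect call without a BTB: the estimated 6-10 cycle range.
print("indirect call:", lost_issue_slots(6, WIDTH), "to",
      lost_issue_slots(10, WIDTH), "slots")
```

Under that model, the 6-10 cycle estimate for an unpredicted indirect call works out to somewhere between 15 and 27 lost slots, versus 6 for the direct call.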

On bigger designs it's even more essential to predict unconditional branches, as their instruction caches take multiple cycles, and then there are quite substantial queues between fetch and decode, and between predict and fetch.





