Would be interested in seeing the profile. Re-ordering it to the following (which the compiler might do too):
(low & high & 1) + (low >> 1) + (high >> 1)
Should compute in a single cycle if the register coloring is working in the pipeline. The <reg> >> 1 terms come out of the barrel shifter stage and the low & high & 1 resolves in the load, so you end up with a single sum of three operands. Since it's being stored in a separate register, that would avoid a write stall in the pipeline as well.
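For anyone who wants to try the re-ordered form, here is a minimal C sketch. The thread doesn't say what types are involved, so the function name midpoint, the helper midpoint_self_check, and the choice of uint32_t are all my assumptions. Note the parentheses: in C, + binds tighter than >>, which binds tighter than &, so the expression has to be grouped explicitly.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (name and types are assumptions, not from the thread).
 * Returns floor((low + high) / 2) without ever forming low + high,
 * so the intermediate sum cannot wrap around. */
static uint32_t midpoint(uint32_t low, uint32_t high)
{
    /* (low & high & 1u) adds back the 1 that is lost when both
     * operands are odd and each shift drops its low bit. */
    return (low & high & 1u) + (low >> 1) + (high >> 1);
}

/* Hypothetical self-check: compare against a 64-bit reference
 * for a handful of edge-case values. */
static void midpoint_self_check(void)
{
    static const uint32_t samples[] = {
        0u, 1u, 2u, 3u, 7u, 0x7fffffffu, 0x80000000u, 0xfffffffeu, 0xffffffffu
    };
    const unsigned n = sizeof samples / sizeof samples[0];

    for (unsigned i = 0; i < n; i++) {
        for (unsigned j = 0; j < n; j++) {
            uint64_t wide = (uint64_t)samples[i] + samples[j];
            assert(midpoint(samples[i], samples[j]) == (uint32_t)(wide / 2));
        }
    }
}
```

I stuck to unsigned on purpose: right-shifting a negative signed value is implementation-defined in C, so the identity isn't guaranteed there. Whether this actually issues in one cycle depends on the target and the compiler, so the profile is still the thing to look at.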
Gimmie a sec; I've got bugger all to do at work. I'll compile it and profile it.