So why isn't the compiler smart enough to see that the 'optimised' version is the same?
Surely it understands "step()" and can optimize the "step()==0.0" and "step()==1.0" cases separately?
This is presumably always worth it, because you would at least remove one multiplication (usually turning it into a conditional load/store/something else)
It may very well be. It's the kind of optimisation that some compilers will likely do some of the time, but it is definitely also possible to write a version that the compiler can't grok.
The other part of the optimization issue is that the compiler can't take too long trying anything and everything. Most of the optimization happens on the driver side, and anything that takes too long shows up as shader compilation stutter. I can't say whether this particular optimization is or isn't done currently; it's just something you always have to think about.