How do you know it would only be a few percent? What if such a switch made your code ten times slower? Without actually reading the clang code and seeing what all the optimization passes do, I don't see any a priori reason to assume the cost would be a few percent rather than 10x.