> In C++ with operator overloading it's easy to do stupid things like in-place addition (e.g. operator+= for a vec4f) or start transposing matrices in place. This will make the compiler emit memory load/store instructions when you'd want to have these values in registers.
That sounds like a failure of inlining. If you're at the point where a single stack spill makes a difference, you won't want to pay the cost of the calling convention spills either. And if you are inlining, then SROA and mem2reg will easily remove those load/store instructions. Modern compiler optimizations make what you describe not a problem anymore.
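To make the argument concrete, here is a minimal sketch of the pattern under discussion. It assumes a hypothetical scalar `vec4f` whose `operator+=` works in place on its members, plus a hypothetical `sum` helper. The names are illustrative only. SROA and mem2reg are LLVM pass names (GCC has its own scalar-replacement pass). The comments describe what an optimizer can typically do after inlining, not a guaranteed result.

```cpp
// Hypothetical scalar 4-float vector, written in the "in-place" style being discussed.
struct vec4f {
    float x, y, z, w;

    // In-place addition: updates the members of *this.
    // Member functions defined in the class body are implicitly inline.
    vec4f& operator+=(const vec4f& o) {
        x += o.x; y += o.y; z += o.z; w += o.w;
        return *this;
    }
};

// Non-mutating addition built on operator+=; lhs is taken by value and modified.
inline vec4f operator+(vec4f a, const vec4f& b) {
    a += b;
    return a;
}

// Hypothetical helper: sum an array of vectors.
// If operator+= is inlined, scalar replacement (SROA) can split `acc` into four
// floats and mem2reg can keep them in registers. At -O0, or if inlining fails,
// each += usually goes through loads and stores of `acc` in memory.
inline vec4f sum(const vec4f* v, int n) {
    vec4f acc{0.0f, 0.0f, 0.0f, 0.0f};
    for (int i = 0; i < n; ++i)
        acc += v[i];
    return acc;
}
```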
> Modern compiler optimizations make what you describe not a problem anymore.
Yes, it should get inlined, but I've seen this fail in a recent-ish GCC (4.6 to 4.8 or so). And there's still the issue of debug builds being slower even if the compiler works perfectly in optimized builds.
Operator overloading will only work with SIMD if you write your vector class using intrinsics or SIMD extensions anyway. If you're doing scalar loads and stores, you can't rely on getting SIMD instructions in the output.
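The following is a minimal sketch of a vector class "written using SIMD extensions". It assumes GCC or Clang, because `__attribute__((vector_size(16)))` is a compiler extension and not standard C++. The `vec4f` and `v4sf` names are hypothetical. The overloaded operators forward to whole-vector arithmetic on the underlying type. On x86-64 this typically compiles to single packed instructions such as `addps`/`mulps` rather than four scalar operations.

```cpp
// GCC/Clang vector extension: four floats packed into one 16-byte SIMD value.
typedef float v4sf __attribute__((vector_size(16)));

// Hypothetical SIMD-backed vec4f; all arithmetic operates on the whole vector.
struct vec4f {
    v4sf v;

    vec4f() : v{0.0f, 0.0f, 0.0f, 0.0f} {}
    vec4f(float x, float y, float z, float w) : v{x, y, z, w} {}

    // One vector add/multiply per call instead of four scalar operations.
    vec4f& operator+=(const vec4f& o) { v += o.v; return *this; }
    vec4f& operator*=(const vec4f& o) { v *= o.v; return *this; }

    // Read a single lane (subscripting vector types is a GCC/Clang extension).
    float operator[](int i) const { return v[i]; }
};

// Non-mutating operators built on the compound-assignment forms.
inline vec4f operator+(vec4f a, const vec4f& b) { return a += b; }
inline vec4f operator*(vec4f a, const vec4f& b) { return a *= b; }
```

Because the class stores a `v4sf` rather than four separate floats, the compiler does not have to rediscover the SIMD structure from scalar loads and stores.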
> Yes, it should get inlined, but I've seen this fail in a recent-ish GCC (4.6 to 4.8 or so)
Seems like a pretty bad GCC bug then, one that should be fixed upstream.
I really dislike it when code avoids functions because of fear that they won't be inlined (or to try to work around compiler bugs to that effect), because doing this dramatically reduces code maintainability and safety in exchange for very little benefit, given the inline hint keyword and __attribute__((always_inline)).
GCC's inlining has been a bit brittle, especially when dealing with vector arguments, and it also depends on the ABI (e.g. passing 4x double vectors without AVX enabled, etc.). It's much better in GCC 5.x now.
I generally use always_inline for vector arithmetic functions, just to be sure. You never want to have a function call to do just a few SIMD instructions.
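A minimal sketch of that habit, assuming GCC or Clang: a hypothetical `VEC_INLINE` macro combines the `inline` hint with `__attribute__((always_inline))`, and it is applied to small hypothetical vector helpers (`madd`, `hsum`, `dot4`) built on the GCC/Clang vector extension. MSVC spells the equivalent `__forceinline`, which this sketch does not cover.

```cpp
// Hypothetical macro: `inline` alone is only a hint; always_inline forces it on GCC/Clang.
#define VEC_INLINE inline __attribute__((always_inline))

// GCC/Clang vector extension: four packed floats.
typedef float v4sf __attribute__((vector_size(16)));

// Lane-wise multiply-add, a * b + c. A real call would cost more than the
// one or two SIMD instructions this contains, so force it inline.
VEC_INLINE v4sf madd(v4sf a, v4sf b, v4sf c) { return a * b + c; }

// Horizontal sum of the four lanes.
VEC_INLINE float hsum(v4sf a) { return a[0] + a[1] + a[2] + a[3]; }

// 4-wide dot product: lane-wise multiply, then horizontal sum.
VEC_INLINE float dot4(v4sf a, v4sf b) { return hsum(a * b); }
```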