The fence instruction that Intel recommends (lfence) is way slower than the techniques described here. We measured a 5x slowdown in WebAssembly when we tried using it.
Also we have been working on these mitigations since well before Intel made their suggestion.
Max() is not a CPU instruction. It's an abstraction, a function that could be implemented either using a branch (which defeats the whole purpose) or, on some architectures, with something like cmovXX.
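To make the distinction concrete, here is a minimal sketch in C of the two ways a max could be written. The function names are made up for illustration, and nothing here is taken from an actual mitigation. The first form leaves the choice between a branch and a conditional move to the compiler; the second builds the result from a bit mask so there is no conditional jump in the source, although a compiler is still free to transform either one.

```c
#include <stdint.h>

/* Hypothetical name. The compiler may lower this to a conditional
 * branch or to a conditional move; the source does not decide. */
static uint32_t max_ternary(uint32_t a, uint32_t b) {
    return (a > b) ? a : b;
}

/* Hypothetical name. Selects the larger value with a mask instead of
 * control flow: mask is all ones when a > b, all zeros otherwise. */
static uint32_t max_masked(uint32_t a, uint32_t b) {
    uint32_t mask = (uint32_t)0 - (uint32_t)(a > b);
    return (a & mask) | (b & ~mask);
}
```

Both return the same value for every input. Whether the generated code is actually free of branches has to be checked in the compiler output for the target in question.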
Perhaps they wanted a solution that also works on Arm, which I think doesn't have cmovXX. Or maybe Intel CPUs speculate through a cmovXX when it is used on an array index.