I used to use ATLAS quite a bit; it effectively benchmarks your CPU and memory configuration at installation time to empirically determine where to switch from 'small n' algorithms to asymptotically faster ones, like Strassen.
Of course, if you only put resources into optimizing the cubic-time version, Strassen will look slow by comparison. Better to take a serious look at both and empirically figure out where the trade-off point is...
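To make that concrete, here's a minimal sketch (not ATLAS itself) of the usual approach: Strassen recursion that falls back to the plain cubic multiply below a tunable cutoff, which is exactly the parameter you'd tune empirically per machine. The `CUTOFF` value is a made-up placeholder, and for simplicity this assumes square matrices whose size is a power of 2.

```python
import numpy as np

CUTOFF = 128  # hypothetical crossover point; tune empirically per machine

def strassen(A, B):
    n = A.shape[0]
    if n <= CUTOFF:
        return A @ B  # below the cutoff, the cubic algorithm (here via BLAS) wins
    h = n // 2
    A11, A12, A21, A22 = A[:h, :h], A[:h, h:], A[h:, :h], A[h:, h:]
    B11, B12, B21, B22 = B[:h, :h], B[:h, h:], B[h:, :h], B[h:, h:]
    # Strassen's seven products (vs. eight for the naive block multiply)
    M1 = strassen(A11 + A22, B11 + B22)
    M2 = strassen(A21 + A22, B11)
    M3 = strassen(A11, B12 - B22)
    M4 = strassen(A22, B21 - B11)
    M5 = strassen(A11 + A12, B22)
    M6 = strassen(A21 - A11, B11 + B12)
    M7 = strassen(A12 - A22, B21 + B22)
    # Reassemble the four quadrants of the product
    C = np.empty((n, n), dtype=A.dtype)
    C[:h, :h] = M1 + M4 - M5 + M7
    C[:h, h:] = M3 + M5
    C[h:, :h] = M2 + M4
    C[h:, h:] = M1 - M2 + M3 + M6
    return C
```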
For small, dense matrices you're not going to beat the cubic algorithm, and that's the most common case in a lot of applications.
For sparse matrices, use a sparse variant of the cubic algorithm. For large dense matrices, the sub-cubic algorithms might win, but n would have to be so large that you'd also have to factor in the cost of going out of core.
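And here's a rough, hand-wavy way to find the crossover empirically for the dense case, in the spirit of what ATLAS does at install time: time both multiplies across a range of sizes and see where the curves cross on your hardware. This assumes the `strassen` sketch above; the sizes are arbitrary.

```python
import time
import numpy as np

for n in (256, 512, 1024, 2048):
    A = np.random.rand(n, n)
    B = np.random.rand(n, n)
    t0 = time.perf_counter(); A @ B;          t_cubic = time.perf_counter() - t0
    t0 = time.perf_counter(); strassen(A, B); t_str   = time.perf_counter() - t0
    print(f"n={n:5d}  cubic={t_cubic:.3f}s  strassen={t_str:.3f}s")
```

On most machines the cubic/BLAS version dominates for far longer than the big-O analysis would suggest, which is the whole point of measuring rather than guessing.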