I use fast RNGs for kernel projections, FHT-accelerated JL transforms, and data generation for numerical experiments. I don’t need cryptographic security for these purposes.
And if you're generating floating point numbers, maybe you don't have to worry about the lsbs on their own either? The "failed" tests discard exactly the bits that matter most when floating point randomness is constructed, i.e. they test only the low bits, while float construction relies on the high ones.
I don't know the details of which bits matter in the ziggurat algorithm, which is the one I use. Is this the case for all floating point random number generators?
The PRNGs you mentioned before all generate integers. Two conversions have to be done "right" to give correct results: from the integer PRNG output to floating point, and from one integer range to another (which the ziggurat algorithm needs). I can imagine that even with splitmix64 somebody not knowing what has to be done could get these wrong, so if you aren't sure about those steps you should check their quality yourself. If they are done right, I'd personally expect xoroshiro128+ to be "good enough" even with the specific weakness you worried about (poorer quality in the lsb bits). It's important, of course, not to throw away the highest bits.
On another note, 2om3r questions the author's speed measurements, and I think he has a point: the measurements should be made in the context of real use, otherwise the compiler is able to "cheat" (optimize pieces of the code away) if the example is too simple.