Just the straight/naive rewrite was ~3 times faster for my benchmark (which was running the program on the real dataset) and then I went down the rabbit hole and optimized it further and ended up ~5 times faster. Then slapped Rayon on top and got another ~2-3x depending on the number of cores and disk speed (the problem wasn't embarrassingly parallel, but still got a nice speedup).
Of course, all of this was mostly unneeded, but I just wanted to find out what am I getting myself into, and I was very happy with the result. My move to Rust was mostly not because of speed, but I still needed a fast language (where OCaml qualifies). This was also before the days of multicore OCaml, so nowadays it would matter even less.
How much of that do you think comes from reduced allocations/indirections? Now I really want to try out OxCaml and see if I can approximate this speedup by picking up low hanging fruits.
Of course, all of this was mostly unneeded, but I just wanted to find out what am I getting myself into, and I was very happy with the result. My move to Rust was mostly not because of speed, but I still needed a fast language (where OCaml qualifies). This was also before the days of multicore OCaml, so nowadays it would matter even less.