I think what the GP is arguing is that if you want to be really fast you should forget hand-rolling math over Java arrays and just wrap BLAS, LAPACK, etc., which are written in "close to the metal" languages, optimized within an inch of their lives, have been around for decades, and are used by other projects with similar goals (numpy/scipy, etc.). As the GP says, Java libraries that already do this wrapping are available, so it would probably be a pretty trivial task.
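For a sense of what that route looks like, here is a minimal sketch assuming the netlib-java binding (com.github.fommil.netlib.BLAS); other JVM BLAS wrappers expose a similar dgemm call mirroring the Fortran interface, and the library choice here is just my assumption for illustration:

```java
import com.github.fommil.netlib.BLAS;

public class BlasExample {
    public static void main(String[] args) {
        int n = 2;
        // BLAS follows Fortran conventions, so matrices are flat,
        // column-major double[] arrays.
        double[] a = {1, 3, 2, 4};   // [[1 2], [3 4]]
        double[] b = {5, 7, 6, 8};   // [[5 6], [7 8]]
        double[] c = new double[n * n];

        // C := 1.0 * A * B + 0.0 * C
        BLAS.getInstance().dgemm("N", "N", n, n, n,
                1.0, a, n, b, n, 0.0, c, n);

        // Prints [19.0, 43.0, 22.0, 50.0] (column-major result)
        System.out.println(java.util.Arrays.toString(c));
    }
}
```

Note the column-major layout: your data has to be arranged the way the Fortran routines expect before you can hand it off, which leads into the point below.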
I don't know why no one has answered this (it is brought up a few places in this thread), but if I had to guess why they didn't want to go this route, I would say it's the trade-off of not having your data in native Java data structures. They presumably have a fairly involved pipeline/topology of computations that data flows through. In the interests of readable and maintainable code, having a nice declarative data representation is a big plus, and doing the computation with native Java data structures is apparently fast enough for their needs.
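To make the other side of the trade-off concrete, here is a hedged illustration of the "plain Java arrays" route: the same kind of multiply written over flat, row-major double[] arrays, so the data stays on the Java heap and never crosses a JNI boundary. The method name and layout are mine, not anything from the thread:

```java
public final class PlainJavaGemm {
    // c[i][j] += a[i][k] * b[k][j], all stored row-major in flat arrays.
    // The i-k-j loop order keeps the inner loop striding sequentially
    // through b and c, which the JIT handles reasonably well.
    static void multiply(double[] a, double[] b, double[] c, int n) {
        for (int i = 0; i < n; i++) {
            for (int k = 0; k < n; k++) {
                double aik = a[i * n + k];  // hoisted out of the inner loop
                for (int j = 0; j < n; j++) {
                    c[i * n + j] += aik * b[k * n + j];
                }
            }
        }
    }
}
```

You give up the decades of BLAS tuning, but the data representation stays whatever your pipeline already uses, with no marshalling step in the middle.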