But your post makes more claims than merely computers not being able to arbitrage data structures.
In your first example, while I don't have any experience with Scala, any competent compiler for C++, Haskell, Rust, etc. will inline a function like "map", so there is no "function call overhead" whatsoever. (This example is too simple to benefit from stream fusion as such.) The resulting machine code will look fairly similar; if there is a performance difference, it would be far more subtle than your 10x, and would probably come down to whether the map implementation in question is specialized for the Array->Array case. A specialized map will allocate the right number of elements up front and perhaps even skip bounds checks that the compiler might or might not be able to optimize out of the explicit version. An unspecialized one will not only include bounds checks but also have to grow the result array repeatedly, copying the elements written so far to each new allocation.
Yes, in the conclusion I remark that a Sufficiently Smart Compiler can do all of this. I don't know that either Haskell or C++ compilers will inline your maps, but if so, that's very cool.
And yes, specializing for the Array -> Array case and eliminating bounds checks will be necessary to get real performance.
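To make the Array -> Array distinction concrete, here is a rough Rust sketch of the three variants being discussed. The function names (double_loop, double_map, map_growing) are made up for illustration and are not from the post, and the doubling operation is just a stand-in for whatever the original example computes. It assumes the standard library behaves as documented: a slice iterator reports its exact length, so collect() can size the output once.

```rust
// Hypothetical names for illustration; none of these come from the original post.

/// Explicit loop: allocate the output once, then fill it by index.
/// The indexing is bounds-checked unless the compiler can prove it safe.
fn double_loop(xs: &[f64]) -> Vec<f64> {
    let mut out = vec![0.0; xs.len()];
    for i in 0..xs.len() {
        out[i] = 2.0 * xs[i];
    }
    out
}

/// "Specialized" map: the slice iterator knows its exact length,
/// so collect() can reserve the right capacity up front.
fn double_map(xs: &[f64]) -> Vec<f64> {
    xs.iter().map(|x| 2.0 * x).collect()
}

/// "Unspecialized" map: the output starts empty and grows as elements
/// are pushed, reallocating and copying whenever capacity runs out.
fn map_growing<T, U>(xs: &[T], f: impl Fn(&T) -> U) -> Vec<U> {
    let mut out = Vec::new();
    for x in xs {
        out.push(f(x));
    }
    out
}
```

All three return the same result for the same input, e.g. double_map(&[1.0, 2.5]) and map_growing(&[1.0, 2.5], |x| 2.0 * x) both give [2.0, 5.0]. Whether they compile to similar machine code depends on the optimizer and would need to be checked by benchmarking or reading the generated assembly.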