Did you read what I wrote? I'm not making any claims about numerical performance. I'm saying there are better choices (in terms of being easy for the programmer to write and debug) for programming other aspects, like networking, asynchronous coordination, etc.
As I understand it, pretty much all array operations in NumPy call into compiled C/C++ libraries for CPU work, and GPU array libraries with a NumPy-style API do the same for GPU operations.