Last I checked, accessing global variables is slow in Python because every lookup goes through the module's dict (a hash table). Local variables, on the other hand, live in a fixed-size array in the frame, which the interpreter can index directly.
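You can see the difference in the bytecode. A quick sketch (function names are made up for illustration): a module-level name compiles to LOAD_GLOBAL, a local name to LOAD_FAST.

```python
import dis

x = 1

def f():
    return x  # module-level name: LOAD_GLOBAL (dict lookup)

def g():
    y = 1
    return y  # local name: LOAD_FAST (array index)

dis.dis(f)  # shows a LOAD_GLOBAL for x
dis.dis(g)  # shows a LOAD_FAST for y
```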
I suspect wrapping the benchmark code (and all the variables) in a function, then invoking that once, would substantially improve the raw Python numbers.
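A minimal timing sketch of that claim (names and the loop body are invented for illustration, not the original benchmark): the same counting loop, once against a global and once against a local.

```python
import time

N = 2_000_000

counter = 0  # global: every read/write is a dict operation

def bump_global():
    global counter
    for _ in range(N):
        counter += 1

def bump_local():
    c = 0  # local: stored in the frame's fast-locals array
    for _ in range(N):
        c += 1
    return c

def bench(fn):
    t0 = time.perf_counter()
    fn()
    return time.perf_counter() - t0

t_g = bench(bump_global)
t_l = bench(bump_local)
print(f"global: {t_g:.3f}s  local: {t_l:.3f}s")
```

On CPython the local version is typically noticeably faster, though the exact ratio varies by version and machine.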
By the time I figured this out, I'd already written a pile of Python code that was mysteriously >10x slower than it would be in Perl 5.
The while loop looks up i, n, z, x, and y on every iteration.
The for loop looks up i, z, x, and y on every iteration.
The list comprehension looks up i, x, and y on every iteration.
The numpy one doesn't look up anything per element; the inner loop runs in compiled C.
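The four variants can be sketched like this. This is a hypothetical stand-in for the benchmark (the actual code isn't shown here), but it has the same lookup pattern at module level:

```python
import numpy as np

n = 100_000
x = 1.5
y = 2.5

# while loop: touches i, n, z, x, y each iteration
z = []
i = 0
while i < n:
    z.append(x * i + y)
    i += 1

# for loop: range() handles the counter, so n is no longer
# consulted per iteration -- that leaves i, z, x, y
z = []
for i in range(n):
    z.append(x * i + y)

# list comprehension: the list is built internally, so z drops
# out too -- just i, x, y remain
z = [x * i + y for i in range(n)]

# numpy: one vectorized expression, no per-element name lookups
z = np.arange(n) * x + y
```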
If you check the proportions, you'll find this is consistent with this guy's data.