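In CPython (before 3.12, which inlines comprehensions into the enclosing function), a list comprehension is compiled to its own small function that is called with the iterator. Internally it looks like this: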

    dis.dis(lambda y: [x + 1 for x in y])
      1           0 LOAD_CONST               1 (<code object <listcomp> at 0x106f441e0, file "<ipython-input-2-1f82f6afa2d5>", line 1>)
                  2 LOAD_CONST               2 ('<lambda>.<locals>.<listcomp>')
                  4 MAKE_FUNCTION            0
                  6 LOAD_FAST                0 (y)
                  8 GET_ITER
                 10 CALL_FUNCTION            1
                 12 RETURN_VALUE

    Disassembly of <code object <listcomp> at 0x106f441e0, file "<ipython-input-2-1f82f6afa2d5>", line 1>:
      1           0 BUILD_LIST               0
                  2 LOAD_FAST                0 (.0)
            >>    4 FOR_ITER                12 (to 18)
                  6 STORE_FAST               1 (x)
                  8 LOAD_FAST                1 (x)
                 10 LOAD_CONST               0 (1)
                 12 BINARY_ADD
                 14 LIST_APPEND              2
                 16 JUMP_ABSOLUTE            4
            >>   18 RETURN_VALUE

If you write the equivalent by hand, you get very similar disassembled code:

    def foo(y):
        def bar():
            l = []
            a = l.append
            for x in y:
                a(x + 1)
            return l
        return bar()
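
One way to check the comparison yourself is to collect the opcode names each version compiles to. This is only a sketch: `opnames`, `with_comprehension` and `with_loop` are names made up for the illustration, and the loop version is flattened into a single function rather than nested like `foo`/`bar`. The helper walks nested code objects, so it should work whether or not your CPython compiles the comprehension into a separate code object.

```python
import dis


def opnames(code):
    """Collect opcode names from a code object and any code objects nested in it."""
    names = {instr.opname for instr in dis.get_instructions(code)}
    for const in code.co_consts:
        if hasattr(const, "co_code"):  # nested function or comprehension body
            names |= opnames(const)
    return names


def with_comprehension(y):
    return [x + 1 for x in y]


def with_loop(y):
    l = []
    a = l.append  # bound method, looked up once outside the loop
    for x in y:
        a(x + 1)
    return l


comp_ops = opnames(with_comprehension.__code__)
loop_ops = opnames(with_loop.__code__)

# Opcodes that appear in only one of the two versions
print(sorted(comp_ops - loop_ops))
print(sorted(loop_ops - comp_ops))
```

The exact opcode names differ between CPython versions, so the printed differences will not match the listing above on every interpreter.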

A hand-written `for` loop can't use the `LIST_APPEND` opcode; it has to call the bound `append` method on every iteration instead. Also, note that CPython doesn't try to preallocate the list for a comprehension. Doing `l = [None] * len(y)` and then filling it by iterating over a range might beat a comprehension for long lists.
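
Whether preallocating actually wins is something to measure rather than assume. A rough timing sketch with `timeit` follows; the function names, the list size and the repeat count are arbitrary choices for the illustration, and `preallocated` assumes `y` supports `len()` and indexing.

```python
import timeit


def comprehension(y):
    return [x + 1 for x in y]


def loop_append(y):
    l = []
    a = l.append
    for x in y:
        a(x + 1)
    return l


def preallocated(y):
    # Assumes y is a sequence: it needs len() and integer indexing.
    l = [None] * len(y)
    for i in range(len(y)):
        l[i] = y[i] + 1
    return l


data = list(range(100_000))

for fn in (comprehension, loop_append, preallocated):
    elapsed = timeit.timeit(lambda: fn(data), number=10)
    print(f"{fn.__name__:>14}: {elapsed:.3f}s for 10 runs")
```

The numbers depend heavily on the CPython version and the list length, so treat any single run as a hint only.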

The performance story with numba and numpy is basically that you have a boundary between Python and their internal representation of data, and crossing that boundary is slow.

For instance, in numpy, even if you specify a dtype, `np.array` must check every element of a Python list and cast it to that type. Likewise, `tolist` must construct a new Python object for every value in the array.
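
A small sketch of that round trip, assuming numpy is installed; the variable names are just for illustration.

```python
import numpy as np

values = [1, 2, 3]  # a list of Python int objects

# Crossing into numpy: each element is inspected and cast to float64.
arr = np.array(values, dtype=np.float64)

# Crossing back: tolist() builds a fresh Python float for every element.
back = arr.tolist()

print(arr.dtype, type(arr[0]).__name__)
print(back, type(back[0]).__name__)
```

Both conversions touch every element, so their cost grows with the size of the data.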

Once you're dealing with ndarrays, though, operations are fast because the array has a single type and the code is heavily optimized. But iterating over an ndarray from Python tends to be slower than iterating over native Python containers.
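
To see the difference on your own machine, you could time the same "add one" operation three ways. This is a rough sketch assuming numpy is installed; the size and repeat count are arbitrary.

```python
import timeit

import numpy as np

data = list(range(100_000))
arr = np.arange(100_000)

# Python loop over a native list
t_list = timeit.timeit(lambda: [x + 1 for x in data], number=10)

# Python loop over an ndarray: every element is boxed into a numpy scalar first
t_iter = timeit.timeit(lambda: [x + 1 for x in arr], number=10)

# Staying inside numpy: a single vectorized operation
t_vec = timeit.timeit(lambda: arr + 1, number=10)

print(f"list comprehension over list:    {t_list:.4f}s")
print(f"list comprehension over ndarray: {t_iter:.4f}s")
print(f"vectorized arr + 1:              {t_vec:.4f}s")
```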

Numba is a mixed bag. It's no worse than numpy at handling ndarrays, but it's bad with Python containers. It has to inspect the arguments to call a JITted function, and the cost of that can be far more surprising than with numpy.

But if you can combine many JITted functions, you can keep a lot of logic on the compiled side and completely avoid that introspection, in a way you can't with numpy alone. The difficulty is simply that Numba has to reimplement many of the numpy features in an early-binding language.