> Alternatively: do you know how much your computer could do in a second, but is...

CydeWeys · on Oct 25, 2015

Presumably they passed flags to gcc (or whichever compiler they used) to disable these optimizations, otherwise, you're right, they wouldn't make much sense if the code wasn't actually being run as presented.

kbenson · on Oct 25, 2015

Interestingly, there's quite a bit that's needed to ensure you bypass some compiler optimizations. Here's an ey-opening (for me) talk[1] on benchmarking C++ and compilers which touched on it.

1: https://www.youtube.com/watch?v=nXaxk27zwlk

radicalbyte · on Oct 25, 2015

I had the same problem: of course an empty loop will run at the maximum because any worthwhile compiler would optimize the loop away.

logicallee · on Oct 25, 2015

The real head-scratcher is why the Python version, whose loop is:

  def f(NUMBER):
      for _ in xrange(NUMBER):
          pass

runs one tenth as fast! Is it due to the xrange construct? I'm sure that there must be a saner way to make a loop in Python that doesn't bloat to ten times as many cycles as in C. . .

ProblemFactory · on Oct 25, 2015

It runs at 1/10 speed because it's interpreted bytecode, and it allocates a new PyInt object for each number in the loop.

You can get it back close to native speed by adding a type annotation and compiling the python file with Cython: http://docs.cython.org/src/userguide/language_basics.html#in...

    def f(NUMBER):
        cdef int i
        for i in range(NUMBER):
            pass

andreasvc · on Oct 25, 2015

Not just close to native speed, this will get you the exact same C code for the inner loop.

mulander · on Oct 25, 2015

I wonder if [x for x in xrange(NUMBER)] would be any faster. Yes it allocates an array but list comprehension is executed differently by python. I recall a talk by some Dropbox guy stating that it was one of their optimization strategies.

Didn't find the original talk but found this: https://wiki.python.org/moin/PythonSpeed/PerformanceTips#Loo...

andreasvc · on Oct 25, 2015

Nitpick: that creates a list, not an array. It could be faster, but it's not the same benchmark. When you get to the point where the speed of loops matters it's better to reach for Cython, Numba, PyPy, etc.

dalke · on Oct 25, 2015

Why wonder when you can test?

    % python -mtimeit '[x for x in xrange(1000)]'
    10000 loops, best of 3: 70.2 usec per loop
    % python -mtimeit 'for x in xrange(1000): pass'
    10000 loops, best of 3: 29 usec per loop

    % python -mtimeit '[x for x in xrange(100000)]'
    100 loops, best of 3: 7.03 msec per loop
    % python -mtimeit 'for x in xrange(100000): pass'
    100 loops, best of 3: 2.5 msec per loop

reubenmorais · on Oct 25, 2015

10x is about what you'd expect for the interpreter overhead in a simple snippet like that.

cma · on Oct 25, 2015

Unfortunately python doesn't use a tagged pointers optimization even though I believe pointers are always aligned so there are some spare bits that could be used to indicate pointer vs. int.

I think the only reason they haven't is the legacy C interface; it's a shame they didn't do it for Python 3, but I guess it could have delayed conversion of c libraries even more.

andreasvc · on Oct 25, 2015

Specifically, an interpreter without JIT. I'd reckon that LuaJIT or Javascript could come much closer to C.

warbiscuit · on Oct 25, 2015

For me, running their sample script using the PyPy JIT resulted in a 12x speedup compared to cpython.

Though that still wasn't as fast as gcc on my system (core i5 3.4ghz; using 1e8 rounds):

* cpython 2.7.4 - 1.131s

* pypy 2.6.0 - 0.090s

* gcc w/ "-O2" - 0.050s

reubenmorais · on Oct 25, 2015

Right, those are usually called JIT compilers, not interpreters.

andreasvc · on Oct 26, 2015

The distinction is not all that useful, and is blurred nowadays. You could say that CPython compiles code to bytecode, but the bytecode is then interpreted; however, none of that matters, because it's doing very little optimization and that means it's slow. Javascript is typically also not compiled ahead-of-time to native code, but it can still be very fast because of JIT; at the same time it's still very much a dynamic language which makes it share features with interpreted languages.

I'd say JIT is neither about compilation nor interpretation, but about optimization.