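In Python 2.7, shouldn't you be using xrange() rather than range()? xrange() produces its values lazily, one at a time (strictly speaking it's a lazy sequence object, not a generator), whereas range() builds the entire list in memory before you iterate over it.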
In case anyone isn't aware: in Python 3, range()'s implementation was effectively replaced with that of xrange().
For me, range (the posted source) took 95 seconds, and xrange was only a little better at 85 seconds.
I think most of the benefit of xrange comes from the decreased memory usage, not from lower CPU usage. But xrange is definitely closer to what the other code is doing.
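To illustrate the memory point, here is a rough sketch. It's written for Python 3, where range() is the lazy one, so list(range(n)) stands in for what Python 2's range() would build. The helper name container_sizes is made up for this example, and sys.getsizeof only counts the container itself, not the integer objects a list points to, so the real gap is larger than what it reports.

```python
import sys

def container_sizes(n):
    """Return (lazy_bytes, list_bytes) for a range of n integers."""
    lazy = range(n)          # Python 3 range: lazy, like Python 2's xrange
    eager = list(range(n))   # fully materialized list, like Python 2's range
    return sys.getsizeof(lazy), sys.getsizeof(eager)

lazy_bytes, list_bytes = container_sizes(1000000)
print("lazy range object:", lazy_bytes, "bytes")
print("materialized list:", list_bytes, "bytes (not counting the ints themselves)")
```

The lazy object's size doesn't grow with n, while the list's does, which fits with the timings above barely changing even though the memory footprint is very different.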