OK, so for fun I ran this on my machine (MacBook Pro, 2.53GHz, OS X 10.8.4, 8GB RAM, Python 2.7.2 shipped by Apple). I had to emulate the old timing module using code from [1].
Method 1: 0.115 seconds
Method 2: I gave up after >120 seconds
Method 3: 0.265 seconds
Method 4: 0.160 seconds
Method 5: 0.220 seconds
Method 6: 0.098 seconds
I ran each one a few times to make sure the times were roughly correct. Method 6 is still the fastest, but the naive Method 1 is really close; it has obviously been optimized. Actually, they're all pretty close, with the obvious (and hideous) outlier of MutableString.
EDIT:
I just remembered I have an old version of PyPy (1.8) on my Mac. Thought I'd give that a try.
Method 1: I gave up after >120 seconds
Method 2: I gave up after >120 seconds
Method 3: 0.090 seconds
Method 4: 0.102 seconds
Method 5: 0.430 seconds
Method 6: 0.102 seconds
Method 1 is a problem again, and Method 5 (the pseudo file) is noticeably slower. Otherwise the results aren't too far off.
Similar relative results here, except that Method 1 is always the fastest, even better than Method 6. I ran with large loop counts (10 million and 30 million) to reduce measurement noise. Profiling was done with cProfile on a Core i3 2.53GHz with 6GB RAM, Python 2.7.3 on Ubuntu 12.04.
For a 10M loop count: Method 1 -> 1.599 s, Method 6 -> 1.91 s
For a 30M loop count: Method 1 -> 4.967 s, Method 6 -> 5.871 s
If you really care about efficiency (and are using PyPy), here's an actually efficient way to concatenate strings:
from __pypy__.builders import StringBuilder

def concat():
    # The constructor takes an optional size estimate, letting the
    # builder preallocate its buffer up front.
    builder = StringBuilder()
    for i in xrange(10000):
        builder.append(str(i))
    return builder.build()
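(__pypy__ only exists under PyPy, so portable code would want to guard the import; a minimal sketch, assuming a ''.join fallback on CPython:)

try:
    from __pypy__.builders import StringBuilder  # PyPy-only module
except ImportError:
    StringBuilder = None  # on CPython, just use ''.join instead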
Note these tests were run on Python 2.2, which makes the results old enough to almost certainly be invalid. Last time I checked, straight string concatenation was faster than everything else for the types of inputs I was dealing with.
straight string concatenation was faster than everything else
As I understand it, this is implementation dependent. CPython has gone to some lengths to optimize its string concatenation code to do things behind the scenes that are basically equivalent to some of the faster methods given in the article. Other Python implementations have not necessarily done that, so string concatenation can be a lot slower.
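(Concretely, the loop in question is roughly the article's Method 1, sketched below under that assumption; since around Python 2.4, CPython can often grow the left-hand string in place when nothing else references it, which is why naive concatenation looks so good there.)

def method1(loop_count):
    out_str = ''
    for num in xrange(loop_count):
        # On CPython this += may resize out_str in place (refcount 1);
        # other implementations copy the whole string every iteration.
        out_str += str(num)
    return out_str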
How would a method 7 using generator expressions fare on the same system, like:
return ''.join(`num` for num in xrange(loop_count))
On one hand, it avoids creating a temporary list in memory. On the other, it can't know in advance how long the final output of the loop will be and so couldn't use tricks like preallocating enough RAM.
$ python -m timeit '"-".join(str(n) for n in range(100))'
10000 loops, best of 3: 40.3 usec per loop
$ python -m timeit '"-".join([str(n) for n in range(100)])'
10000 loops, best of 3: 33.4 usec per loop
$ python -m timeit '"-".join(map(str, range(100)))'
10000 loops, best of 3: 25.2 usec per loop
I just ran those with Python 2.7 (ASCII and Unicode) and Python 3.2 (Unicode). "-".join(map(str, range(100))) was always the fastest, with ASCII (2.7) < Unicode (3.2) < Unicode (2.7). Makes sense, but worth keeping in mind when porting code from Python 2 to 3.
Does join scan the entire input to know the exact size, and allocate that at the start? Or does it do the usual resizing thing and double the capacity as it grows? A generator saves a temporary list, but it adds a lot of overhead for the generator switches; it might only pay off when the input is really big.
Does join scan the entire input to know the exact size and allocate that at the start?
Two passes wouldn't actually be quadratic, just 2n, and in fact that's essentially what CPython's join does: it first materializes the input into a sequence, then sums the lengths, allocates the result once, and copies everything in. The quadratic algorithm that the join idiom, as recommended by the Python documentation, exists to avoid is the one you get from repeatedly reallocating and copying a growing string.
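For intuition, here's a rough sketch of that two-pass strategy, using a bytearray as a stand-in for the preallocated buffer (a hypothetical illustration, not CPython's actual implementation):

def join_sketch(sep, items):
    items = list(items)  # materialize, since we traverse the input twice
    if not items:
        return ''
    # Pass 1: sum the lengths so we can allocate exactly once.
    total = sum(len(s) for s in items) + len(sep) * (len(items) - 1)
    buf = bytearray(total)  # single allocation, no capacity doubling
    # Pass 2: copy each piece (and separator) into place.
    pos = 0
    for i, s in enumerate(items):
        if i:
            buf[pos:pos + len(sep)] = sep
            pos += len(sep)
        buf[pos:pos + len(s)] = s
        pos += len(s)
    return str(buf)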
In Python 2, itertools.imap will be faster since it doesn't build an intermediate list. Also, if you are memory constrained, xrange doesn't build a list either, whereas range does (though the article notes that range was slightly faster than xrange in its tests; I don't know whether that's still true in Python 2, since the article is 9 years old).
In Python 3, the map builtin is basically equivalent to what itertools.imap was in Python 2, so it is the best choice. (Also, the range builtin in Python 3 no longer builds a list.)
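For concreteness, the lazy equivalents side by side:

# Python 2: neither imap nor xrange builds an intermediate list
from itertools import imap
'-'.join(imap(str, xrange(100)))

# Python 3: map and range are already lazy
'-'.join(map(str, range(100)))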
I tried it with n=1000 and n=10000, and found that using xrange made it slightly faster, as did using itertools.imap, but the difference was pretty small.
(Here I'm using the stock Python 2.7.1 on a 2011 MacBook Pro.)
That's what I would expect, since xrange works like a generator; but the internals of its implementation in CPython back when the article ran its original tests were evidently less efficient than they are now.
the difference was pretty small
I suspect that's because the strings being concatenated are really small (the largest will only be 5 bytes for n=10000). I would expect the difference to get bigger as the individual strings get larger.
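That would be easy to probe with bigger chunks, e.g. (untimed, hypothetical commands in the same style as above):

$ python -m timeit -s 'chunk = "x" * 1000' '"-".join(chunk for n in range(100))'
$ python -m timeit -s 'chunk = "x" * 1000' '"-".join([chunk for n in range(100)])'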
Doing stuff efficiently is great when there's some benefit from efficiency. But I've seen a lot of fragile code (and implementation specific hacks) created based on benchmarks, when the time savings were meaningless in practice.