Efficient String Concatenation in Python (skymind.com)
78 points by _fsbf on Sept 7, 2013 | 26 comments



Note that since this article was written (2004), CPython performs an in-place optimisation for assignments of the form

    s1 += s2
There are details at point six of http://docs.python.org/2/library/stdtypes.html#sequence-type..., where it also says that str.join() is preferable.
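
(A minimal illustration of the two idioms; the strings here are placeholders:)

    # the in-place form that CPython can resize without copying
    # (CPython-specific; other implementations may copy each time)
    s1 = ''
    for s2 in ('a', 'b', 'c'):
        s1 += s2

    # the portable idiom the docs recommend
    s1 = ''.join(('a', 'b', 'c'))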


OK, so for fun I ran this on my machine (MacBook Pro, 2.53GHz, 10.8.4, 8GB RAM, Python 2.7.2 shipped by Apple). I had to emulate the old timing module using code from [1].

    Method 1: 0.115 seconds
    Method 2: I gave up after >120s
    Method 3: 0.265
    Method 4: 0.160
    Method 5: 0.220
    Method 6: 0.098
I ran each one a few times to make sure the times were roughly correct. Method 6 is still the fastest, but the naive method 1 is really close; it obviously got optimized. Actually, they're all pretty close, with the obvious (and hideous) outlier of MutableString.

EDIT:

I just remembered I have an old version of PyPy (1.8) on my Mac. Thought I'd give that a try.

    Method 1: I gave up after >120s
    Method 2: I gave up after >120s
    Method 3: 0.090 seconds
    Method 4: 0.102
    Method 5: 0.430
    Method 6: 0.102
Method one is a problem again, and method 5 (the pseudo file) is noticeably slower. Otherwise the results aren't too far off.

[1] http://effbot.org/librarybook/timing.htm
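
(For anyone reproducing this: the emulation from [1] amounts to roughly the following, assuming the old timing module's start/finish/milli interface:)

    # stand-in for the long-removed SGI timing module
    # (assumed interface: start() / finish() / milli())
    import time

    _t0 = _t1 = 0.0

    def start():
        global _t0
        _t0 = time.time()

    def finish():
        global _t1
        _t1 = time.time()

    def milli():
        return int((_t1 - _t0) * 1000)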


Similar relative results here, except that method 1 is always the fastest, beating even method 6. I used large loop counts (10 million & 30 million) to reduce measurement noise. Profiling was with cProfile on a Core i3 2.53GHz, 6 GB RAM, Python 2.7.3 on Ubuntu 12.04.

for 10M loop count, method 1 -> 1.599 s, method 6 -> 1.91 s

for 30M loop count, method 1 -> 4.967 s, method 6 -> 5.871 s

Summary: The KISS s1 += s2 always wins
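
(For context, methods 1 and 6 are roughly the following, reconstructed from the article, with loop_count as in the original:)

    # method 1: naive concatenation
    def method1(loop_count):
        out_str = ''
        for num in xrange(loop_count):
            out_str += str(num)
        return out_str

    # method 6: join a list comprehension
    def method6(loop_count):
        return ''.join([str(num) for num in xrange(loop_count)])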


Does PyPy have the same heuristic? If not, I wouldn't recommend relying on it.


If not, I would recommend submitting a bug to PyPy.


No it doesn't, and that heuristic isn't possible in PyPy: it relies on inspecting the reference count and then mutating an "immutable" object in place. It's pretty awful.


If GP is referring to CPython patch #980695, it appears to be available in PyPy, but requires an additional flag.

"We have it, just not enabled by default. --objspace-with-strbuf I think" [1]

[1] https://mail.python.org/pipermail/python-dev/2013-February/1...


A very old article from 2004 (please edit the title). The tests cover only CPython 2, and a very old version at that.


If you really care about efficiency (and are using PyPy), an actually efficient way to concat strings:

    from __pypy__.builders import StringBuilder  # PyPy only

    def concat(loop_count):
        builder = StringBuilder()  # optionally pass a size estimate
        for i in xrange(loop_count):
            builder.append(str(i))
        return builder.build()


Note these tests were run on Python 2.2, which makes the results old enough to almost certainly be invalid. Last time I checked, straight string concatenation was faster than everything else for the types of inputs I was dealing with.


straight string concatenation was faster than everything else

As I understand it, this is implementation dependent. CPython has gone to some lengths to optimize its string concatenation code to do things behind the scenes that are basically equivalent to some of the faster methods given in the article. Other Python implementations have not necessarily done that, so string concatenation can be a lot slower.
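
(A rough way to see this on CPython 2; a sketch, timings illustrative only. The optimisation is gated on the string's reference count, so keeping a second reference defeats it:)

    import timeit

    fast = "s = ''\nfor i in xrange(10000):\n    s += 'x'"
    # the alias below bumps the refcount and forces a full copy on each +=
    slow = "s = ''\nfor i in xrange(10000):\n    alias = s\n    s += 'x'"
    print timeit.timeit(fast, number=10)
    print timeit.timeit(slow, number=10)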


How would a method 7 using generator expressions fare on the same system, like:

    return ''.join(`num` for num in xrange(loop_count))
On one hand, it avoids creating a temporary list in memory. On the other, it can't know in advance how long the final output of the loop will be and so couldn't use tricks like preallocating enough RAM.


Interestingly enough, this is pretty much the example given for the timeit module:

http://docs.python.org/2/library/timeit.html

Results:

    $ python -m timeit '"-".join(str(n) for n in range(100))'

    10000 loops, best of 3: 40.3 usec per loop

    $ python -m timeit '"-".join([str(n) for n in range(100)])'

    10000 loops, best of 3: 33.4 usec per loop

    $ python -m timeit '"-".join(map(str, range(100)))'

    10000 loops, best of 3: 25.2 usec per loop


I just ran those with Python 2.7 (ASCII and Unicode) and Python 3.2 (Unicode). "-".join(map(str, range(100))) was always the fastest, with ASCII (2.7) < Unicode (3.2) < Unicode (2.7). Makes sense, but worth keeping in mind when porting code from Python 2 to 3.

Full results: https://gist.github.com/dbarlett/6479378


Does join scan the entire input to learn the exact size and allocate that up front? Or does it do the usual resizing thing, doubling the capacity? A generator will save a temporary list, but it will incur a lot of overhead from the generator switches. It might be beneficial if the input is really big.


Does join scan the entire input to know the exact size and allocate that at the start?

No, because that would make the algorithm quadratic again (one loop for the scan, second loop for the concatenation), and the whole point of the join idiom, as recommended by the Python documentation, is to avoid a quadratic algorithm.


Two loops are still linear as long as they are not nested ;-)


Oops, yes, good point. :)


I believe that the current most efficient way of doing this is

    ''.join(map(str, range(n)))


In Python 2, itertools.imap will be faster since it doesn't realize a list. Also, if you are memory constrained, xrange doesn't realize a list either, whereas range does (but the article notes that range was slightly faster than xrange in his tests--I don't know if that's still true in Python 2, the article is 9 years old).

In Python 3, the map builtin is basically equivalent to what itertools.imap was in Python 2, so it is the best choice. (Also, the range builtin in Python 3 no longer realizes a list.)
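
(A sketch of the variants, assuming Python 2 as in the thread:)

    # Python 2: map builds an intermediate list, imap does not
    from itertools import imap

    s = ''.join(map(str, xrange(10000)))   # builds a list of 10000 strings
    s = ''.join(imap(str, xrange(10000)))  # lazy, no intermediate list

    # In Python 3, map is already lazy and range no longer builds a list,
    # so ''.join(map(str, range(10000))) is the natural spelling there.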


I tried it with n=1000 and n=10000, and found that using xrange made it slightly faster, as did using itertools.imap, but the difference was pretty small.

(Here I'm using the stock Python 2.7.1 on a 2011 MacBook Pro.)


using xrange made it slightly faster

That's what I would expect, since xrange works like a generator; but the internals of its implementation in CPython back when the article ran its original tests were evidently less efficient than they are now.

the difference was pretty small

I suspect that's because the strings being concatenated are really small (the largest will only be 5 bytes for n=10000). I would expect the difference to get bigger as the individual strings get larger.


How have I never heard of backtick quotes? Apparently they are an abomination [1], so I will promptly force-forget them.

[1] http://stackoverflow.com/a/7490311/807674


Doing stuff efficiently is great when there's some benefit from efficiency. But I've seen a lot of fragile code (and implementation specific hacks) created based on benchmarks, when the time savings were meaningless in practice.


For comparison, PHP has mutable strings. Some totally unscientific averaged benchmarks (Atom D2700, PHP 5.4.4-14 amd64):

    $buff = '';
    for ($i = 0; $i != 100000; ++$i) {
            $buff .= $i;
    }
Runs in about 0.06s

    $buff = array();
    for ($i = 0; $i != 100000; ++$i) {
            $buff[] = $i;
    }
    $ret = implode('', $buff);
Runs slower, in about 0.10s

    $ret = implode('', range(0, 100000));
Takes roughly the same time, 0.10s


How about using Python 3.3.2?

I did many kinds of performance tests when implementing my PyClockPro caching lib, even though I don't use strings in it.



