Efficient String Concatenation in Python (skymind.com)
78 points by _fsbf on Sept 7, 2013 | 26 comments



Note that since this article was written (2004), CPython performs an in-place optimisation for assignments of the form

    s1 += s2
There are details at point six of http://docs.python.org/2/library/stdtypes.html#sequence-type..., where it also says that str.join() is preferable.
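
(A minimal illustration of the two idioms; the strings here are placeholders:)

    # the in-place form that CPython can resize without copying
    # (CPython-specific; other implementations may copy each time)
    s1 = ''
    for s2 in ('a', 'b', 'c'):
        s1 += s2

    # the portable idiom the docs recommend
    s1 = ''.join(('a', 'b', 'c'))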


OK, so for fun I ran this on my machine (MacBook Pro, 2.53GHz, 10.8.4, 8GB RAM, Python 2.7.2 shipped by Apple). I had to emulate the old timing module using code from [1].

    Method 1: 0.115 seconds
    Method 2: I gave up after >120s
    Method 3: 0.265
    Method 4: 0.160
    Method 5: 0.220
    Method 6: 0.098
I ran each one a few times to make sure the times were roughly correct. Method 6 is still the fastest, but the naive method 1 is really close; it obviously got optimized. Actually, they're all pretty close, with the obvious (and hideous) outlier of MutableString.

EDIT:

I just remembered I have an old version of PyPy (1.8) on my Mac. Thought I'd give that a try.

    Method 1: I gave up after >120s
    Method 2: I gave up after >120s
    Method 3: 0.090 seconds
    Method 4: 0.102
    Method 5: 0.430
    Method 6: 0.102
Method one is a problem again, and method 5 (the pseudo file) is noticeably slower. Otherwise the results aren't too far off.

[1] http://effbot.org/librarybook/timing.htm
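
(For anyone reproducing this: the emulation from [1] amounts to roughly the following, assuming the old timing module's start/finish/milli interface:)

    # stand-in for the long-removed SGI timing module
    # (assumed interface: start() / finish() / milli())
    import time

    _t0 = _t1 = 0.0

    def start():
        global _t0
        _t0 = time.time()

    def finish():
        global _t1
        _t1 = time.time()

    def milli():
        return int((_t1 - _t0) * 1000)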


Similar relative results here, except that method 1 is always the fastest, beating even method 6. I used large loop counts (10 million & 30 million) to reduce measurement noise. Profiling was with cProfile on a Core i3 2.53GHz, 6 GB RAM, Python 2.7.3 on Ubuntu 12.04.

for 10M loop count, method 1 -> 1.599 s, method 6 -> 1.91 s

for 30M loop count, method 1 -> 4.967 s, method 6 -> 5.871 s

Summary: The KISS s1 += s2 always wins
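
(For context, methods 1 and 6 are roughly the following, reconstructed from the article, with loop_count as in the original:)

    # method 1: naive concatenation
    def method1(loop_count):
        out_str = ''
        for num in xrange(loop_count):
            out_str += str(num)
        return out_str

    # method 6: join a list comprehension
    def method6(loop_count):
        return ''.join([str(num) for num in xrange(loop_count)])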


Does PyPy have the same heuristic? If not, I wouldn't recommend relying on it.


If not, I would recommend submitting a bug to PyPy.


No it doesn't, and that heuristic isn't possible in PyPy: it relies on inspecting the reference count and then mutating an "immutable" object in place. It's pretty awful.


If GP is referring to CPython patch #980695, it appears to be available in PyPy, but requires an additional flag.

"We have it, just not enabled by default. --objspace-with-strbuf I think" [1]

[1] https://mail.python.org/pipermail/python-dev/2013-February/1...


A very old article from 2004 (please edit the title). The tests cover only CPython 2, and a very old version at that.


If you really care about efficiency (and are using PyPy), an actually efficient way to concat strings:

    from __pypy__.builders import StringBuilder  # PyPy only

    def concat(loop_count):
        builder = StringBuilder()  # optionally pass a size estimate
        for i in xrange(loop_count):
            builder.append(str(i))
        return builder.build()


Note these tests were run on Python 2.2, which makes the results old enough to almost certainly be invalid. Last time I checked, straight string concatenation was faster than everything else for the types of inputs I was dealing with.


straight string concatenation was faster than everything else

As I understand it, this is implementation dependent. CPython has gone to some lengths to optimize its string concatenation code to do things behind the scenes that are basically equivalent to some of the faster methods given in the article. Other Python implementations have not necessarily done that, so string concatenation can be a lot slower.
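
(A rough way to see this on CPython 2; a sketch, timings illustrative only. The optimisation is gated on the string's reference count, so keeping a second reference defeats it:)

    import timeit

    fast = "s = ''\nfor i in xrange(10000):\n    s += 'x'"
    # the alias below bumps the refcount and forces a full copy on each +=
    slow = "s = ''\nfor i in xrange(10000):\n    alias = s\n    s += 'x'"
    print timeit.timeit(fast, number=10)
    print timeit.timeit(slow, number=10)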


How would a method 7 using generator expressions fare on the same system, like:

    return ''.join(`num` for num in xrange(loop_count))
On one hand, it avoids creating a temporary list in memory. On the other, it can't know in advance how long the final output of the loop will be and so couldn't use tricks like preallocating enough RAM.


Interestingly enough, this is pretty much the example given for the timeit module:

http://docs.python.org/2/library/timeit.html

Results:

    $ python -m timeit '"-".join(str(n) for n in range(100))'

    10000 loops, best of 3: 40.3 usec per loop

    $ python -m timeit '"-".join([str(n) for n in range(100)])'

    10000 loops, best of 3: 33.4 usec per loop

    $ python -m timeit '"-".join(map(str, range(100)))'

    10000 loops, best of 3: 25.2 usec per loop


I just ran those with Python 2.7 (ASCII and Unicode) and Python 3.2 (Unicode). "-".join(map(str, range(100))) was always the fastest, with ASCII (2.7) < Unicode (3.2) < Unicode (2.7). Makes sense, but worth keeping in mind when porting code from Python 2 to 3.

Full results: https://gist.github.com/dbarlett/6479378


Does join scan the entire input to learn the exact size and allocate that up front? Or does it do the usual resizing thing, doubling the capacity? A generator will save a temporary list, but it will incur a lot of overhead from the generator switches. It might be beneficial if the input is really big.


Does join scan the entire input to know the exact size and allocate that at the start?

No, because that would make the algorithm quadratic again (one loop for the scan, second loop for the concatenation), and the whole point of the join idiom, as recommended by the Python documentation, is to avoid a quadratic algorithm.


Two loops are still linear as long as they are not nested ;-)


Oops, yes, good point. :)


I believe that the current most efficient way of doing this is

    ''.join(map(str, range(n)))


In Python 2, itertools.imap will be faster since it doesn't realize a list. Also, if you are memory constrained, xrange doesn't realize a list either, whereas range does (but the article notes that range was slightly faster than xrange in his tests--I don't know if that's still true in Python 2, the article is 9 years old).

In Python 3, the map builtin is basically equivalent to what itertools.imap was in Python 2, so it is the best choice. (Also, the range builtin in Python 3 no longer realizes a list.)
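
(A sketch of the variants, assuming Python 2 as in the thread:)

    # Python 2: map builds an intermediate list, imap does not
    from itertools import imap

    s = ''.join(map(str, xrange(10000)))   # builds a list of 10000 strings
    s = ''.join(imap(str, xrange(10000)))  # lazy, no intermediate list

    # In Python 3, map is already lazy and range no longer builds a list,
    # so ''.join(map(str, range(10000))) is the natural spelling there.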


I tried it with n=1000 and n=10000, and found that using xrange made it slightly faster, as did using itertools.imap, but the difference was pretty small.

(Here I'm using the stock Python 2.7.1 on a 2011 MacBook Pro.)


using xrange made it slightly faster

That's what I would expect, since xrange works like a generator; but the internals of its implementation in CPython back when the article ran its original tests were evidently less efficient than they are now.

the difference was pretty small

I suspect that's because the strings being concatenated are really small (the largest will only be 5 bytes for n=10000). I would expect the difference to get bigger as the individual strings get larger.


How have I never heard of backtick quotes? Apparently they are an abomination [1], so I will promptly force-forget them.

[1] http://stackoverflow.com/a/7490311/807674


Doing stuff efficiently is great when there's some benefit from efficiency. But I've seen a lot of fragile code (and implementation specific hacks) created based on benchmarks, when the time savings were meaningless in practice.


For comparison, PHP has mutable strings. Some totally unscientific averaged benchmarks (Atom D2700, PHP 5.4.4-14 amd64):

    $buff = '';
    for ($i = 0; $i != 100000; ++$i) {
            $buff .= $i;
    }
Runs in about 0.06s

    $buff = array();
    for ($i = 0; $i != 100000; ++$i) {
            $buff[] = $i;
    }
    $ret = implode('', $buff);
Runs slower, in about 0.10s

    $ret = implode('', range(0, 100000));
Takes roughly the same time, 0.10s


How about using Python 3.3.2?

I did many kinds of performance tests when implementing my PyClockPro caching lib, even though I don't use strings in it.



