Python: Faster Way (uni.me)
144 points by kozlovsky on Dec 30, 2014 | hide | past | favorite | 64 comments



And this, ladies and gentlemen, is how not to make speed comparisons.

#7 surprised me: why would 'if a:' be slower than 'if not a:'? So I ran the code myself, and ran it a few times. And lo and behold, the third digit varies greatly between runs. And the timings of the first and second variants only differ in the third digit.

I ran the first and second variant ten times, and got these values:

first: 0.138 +- 0.005

second: 0.137 +- 0.004

So there is no significant difference in my timings, though individual runs differed as much as 0.132 vs. 0.150.

Though kudos for providing the full code and the raw output. It made it quite easy to reproduce!

These are the timings I measured, in octave/matlab syntax:

    t1 = [0.132781028748, 0.140866041183, 0.13587808609, 0.138177871704, 0.137129068375, 0.150414943695, 0.138676166534, 0.137069940567, 0.13484787941, 0.136043071747];
    mean(t1)
    std(t1)

    t2 = [ 0.134783983231, 0.143576860428, 0.138277769089, 0.142880916595, 0.140572071075, 0.136868000031, 0.131128787994, 0.134744167328, 0.132730007172, 0.137500047684];

    mean(t2)
    std(t2)


I was going to post this: if you take the average of `N` independent trials, your confidence interval shrinks as `O(1/sqrt(N))`.

Hence with the 10^6 trials done in the article, we should expect a deviation of a couple of points in the third digit.
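
A quick way to see that scaling (just a simulation with made-up numbers, assuming numpy is available):

    import numpy as np

    rng = np.random.default_rng(0)

    # The spread of the sample mean shrinks roughly as 1/sqrt(N), so a
    # millionfold increase in trials only buys about three extra digits.
    for n in (100, 10000, 1000000):
        means = [rng.normal(0.0, 1.0, n).mean() for _ in range(50)]
        print(n, np.std(means))   # roughly 1/sqrt(n): ~0.1, ~0.01, ~0.001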


Taking the average is a bad idea for performance benchmarks like this running under an operating system. The minimum is much more stable. (Easy to verify experimentally: do 10x10 runs of a short benchmark and compare the variance of 10 minima to the variance of 10 averages.)
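
A rough sketch of that experiment (the toy statement is arbitrary; any short benchmark will do):

    import timeit
    import statistics

    STMT = "sum(range(1000))"   # placeholder micro-benchmark

    mins, avgs = [], []
    for _ in range(10):
        # 10 timings of the same statement within one process
        times = timeit.repeat(STMT, number=1000, repeat=10)
        mins.append(min(times))
        avgs.append(statistics.mean(times))

    # OS noise only ever adds time, so the minimum is typically far less
    # noisy across runs than the average.
    print("stdev of minima:  ", statistics.stdev(mins))
    print("stdev of averages:", statistics.stdev(avgs))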


True, but the minimum might also be dependent on the various caches being primed from previous runs of the same code - a condition that will not appear during normal use.


Caches being primed is a concern, but not when choosing between minimum and average. Neither is affected much if the first run takes longer than the other nine.


IPython's %timeit magic works by running the function in three loops. It returns the minimum of the three averages. This seems like a good compromise.
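
Roughly, in plain timeit terms (the statement here is just an example), that comes down to:

    import timeit

    # Several timing loops; report the best one as a per-loop average,
    # which is approximately what %timeit prints as "best of 3".
    number, repeat = 100000, 3
    raw = timeit.repeat("{'a': 1, 'b': 2}", number=number, repeat=repeat)
    print("best of %d: %.1f ns per loop" % (repeat, min(raw) / number * 1e9))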


... and since you don't know how big the constant factor is, you still have to do the experiment a few times.

Also, there might be factors you are not aware of that differ between executions of the same program (maybe address space randomization got extremely lucky or unlucky in one run? Maybe your OS has a bug that prioritizes processes differently when the PID is a power of 2? Who knows?), or some background process (cron?) woke up. So even if your number of in-process repetitions is sufficiently high, you should still run the whole thing a few times and check whether the numbers are stable.


Don't forget CPU frequency, which may slowly step up during the first few runs.


You're not supposed to run benchmarks on anything but the 'performance' CPU governor.


Though, sometimes you want to benchmark with 'ondemand' just to see how the cpu scaling affects your code.

It is known that 'ondemand' may make multi-threaded programs run slower on a multi-core system. The problem, which I have observed and reproduced myself, occurs when two threads are alternately scheduled: one thread runs for some time, then wakes up the other and goes to sleep, ad infinitum. With such a workload, the scheduler/governor can end up in a situation where the work keeps migrating from one CPU core to the other, with no core reaching enough average load to be switched to a higher frequency.

I can't find a reference for this, but it was easy to reproduce by writing a simple program with pthreads doing what is described above. My actual application where I encountered this problem consisted of an event loop thread and a worker thread doing computational tasks (encryption). The most interesting observation was how disabling one of the two CPU cores (yes Linux can do that at runtime) sped up my application and benchmark considerably.


I don't think it's implied that all of the listed differences are significant. I think it's designed to show you that sometimes alternatives make a difference, and other times they simply do not.

It makes the page a good resource if you're wondering if doing something a certain way has a performance impact. If it's all the same to use {} vs. dict(), why not be in the habit of using the one that's a little faster?

Moreover, the tests are organized in a meaningful way. Tests 13 and 14 are quite interesting when seen right next to each other. Showing just test 13 could be deceptive, leading you to think that operators are faster than methods. Seeing 14 right afterward, however, shows you that isn't the case, and provides some clues as to why each works the way it does. It's just enough of a hint to make you curious about what's actually going on, which I really enjoy.

I don't think the fault is with the author, but with your assumptions about the intent.

You may argue that exposition would improve this, but I think the lack of explanation is not only elegant, but better at providing insight for the reader. I don't think there's a right answer here, though.


> I don't think it's implied that all of the listed differences are significant. I think it's designed to show you that sometimes alternatives make a difference, and other times they simply do not.

The heading, "Python: faster way" doesn't suggest that to me.

And when I see comments like https://news.ycombinator.com/item?id=8814309 I feel I'm not the only one who came to the conclusion that the first column was meant to be the fastest.

And how should the reader judge the significance of timing differences without knowing any variance or spreading of the measurements?

Maybe some short headings like "80% faster", "no significant difference" would make things clearer. Or color coding.


A single read from disk dwarfs all of these differences.

Source: http://www.dis.uniroma1.it/~demetres/didattica/ae/upload/lec... , slide 24


Examples b and c in test #6 are something you should NEVER do in Python. Using "is" to compare integers is a huge no-no, and will only appear to work for small integers (between -5 and 256 in CPython) that are cached by the interpreter. "is" does identity checking, not equality checking.

    >>> a = 500
    >>> b = 500
    >>> a is b
    False
    >>> a == b
    True
    >>> a = 2
    >>> b = 2
    >>> a is b
    True
    >>> a == b
    True


I honestly do not get the point of this page.

None of the time differences across the test cases is significant. Concerning yourself with this is premature optimization of the highest order, especially in a language like Python. One should definitely be more concerned about writing clearer and more idiomatic code.


I have to agree, though there is some information to be gleaned here, precisely because the author has gone to great lengths to show that there is no point in trying to get clever with alternative, non-idiomatic ways of doing things. You can use the cleanest idioms with your mind at ease.

It reminds me a bit of some of my big data jobs. Sometimes you spend days and days torturing data to find interesting aspects, only to find nothing and end in "failure". Only it isn't failure because you've eliminated a bunch of dead-end avenues that you might otherwise have wondered about, freeing you to pursue more promising uses for your resources.


This is mainly what I took from it. The faster way here is almost always the most idiomatic way to do it in the examples, and the reasons why are usually pretty obvious.

And while the individual differences are small, in a language already as slow as Python, starting from the faster, idiomatic code from the start is likely to save a lot of headache down the line.


If anything, he has proven exactly what you're saying: that it is OK to be explicit rather than picky about these things, and that code readability can easily be a priority in Python.

There are also a few exceptions (the update() method on a dict and list comprehensions).

As a dev somewhere on the junior python dev < me < senior python dev scale, this was very cool to read and very quick to take in. Kudos on the presentation.


Agreed, although it's interesting to see the disassemblies.

Python is a language about readability and conciseness first, performance second.

I would have liked to see `map` used in the list comprehension examples, as well as the tests run on Python 3.


It's just a reference, it's not making a value judgement about the different ways to evaluate the truthiness of a list or something.


Someone popping up and trotting out the 'no premature optimization' argument every time an article is posted about optimization is getting very tiresome.

Not all optimization is premature. Sometimes you find that you do need to optimize, and then having to wade through a morass of comments complaining about premature optimization to track down the actually relevant content and comments is a pain in the arse.


But his first point still stands - none of these optimisations are very important.


Two points:

- The time differences displayed here are just not significant on a desktop OS such as Windows or Linux. Maybe Windows decided to swap, or collect your email (or, more likely, run a virus check) while the test was running? The error bars on these numbers will make them insignificant.

- When using Python, readability >> performance. If you want performance, use C; don't even try to optimise Python, it's not worth the effort.


In Python it's worth optimizing your algorithm, e.g. replacing an O(n²) algorithm with an O(n log n) one, but trying to shave 0.1% of performance off a tight loop is the wrong approach. Instead, reach for NumPy / Numba, Shedskin, etc., down to plain C.
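
A minimal example of the kind of algorithmic change meant here (contrived helper names, purely for illustration):

    # Finding duplicates with a nested scan is O(n^2); a set-based pass is O(n).
    # That kind of change dwarfs any micro-optimisation of the loop body.
    def has_duplicates_quadratic(items):
        for i, x in enumerate(items):
            if x in items[i + 1:]:   # list membership test: O(n) per element
                return True
        return False

    def has_duplicates_linear(items):
        seen = set()
        for x in items:
            if x in seen:            # set membership test: O(1) on average
                return True
            seen.add(x)
        return False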


I've written a few specific python apps where optimization at this level was worth contemplating in a couple spots after profiling (basically, tight loops). That said, replacing the algorithm has usually been a much smarter play.


There are a few benchmarks where the difference is a factor of two or so, which is actually significant. It does require a careful reading and a healthy degree of skepticism about the close results (the majority) to get something useful from this; it could do with a summary.


Those are the quirks of CPython 2.7. The best optimisation is to run with PyPy: those benchmarks run about 10 times faster.

After that, if you want to optimise, you need an application-level benchmark. A microbenchmark won't work, because type specialisation runs on (traces of) the program as a whole.


This just feels like picking up pennies in front of a steam roller.


Could someone please explain test #11? How is using %s instead of %d twice as fast even though the same thing is being calculated? I always use %d because it seems like the proper way to do it.


I looked at the source of 2.7:

The function PyString_Format (Objects/stringobject.c) does the formatting for the % operator. For %s it calls _PyObject_Str which in turn calls "str()" on the object. For %d it calls formatint (located in the same file).

The str() implementation for ints is in int_to_decimal_string (Objects/intobject.c) and it's incredibly simple:

    do {
        *--p = '0' + (char)(absn % 10);
        absn /= 10;
    } while (absn);

The code for formatint is way more complex, and it contains two calls to the native snprintf:

        PyOS_snprintf(fmt, sizeof(fmt), "%s%%%s.%dl%c",
                      sign, (flags&F_ALT) ? "#" : "",
                      prec, type);
        // ...
        PyOS_snprintf(buf, buflen, fmt, -x);

The native snprintf is heavier because it handles precision, zero-padding and stuff like that.

I believe this is why %d is slower: %s is a straight "divide-by-10" loop, while %d makes two library calls to the full-blown snprintf. However, I didn't actually profile the code because I don't have a debug build, so I might be completely wrong.

Edit: Link to code: https://github.com/python/cpython/blob/a5c7a4257507a77699a1a...
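
If you just want to sanity-check the timing claim without a debug build, a quick and dirty comparison (numbers will vary by machine and build):

    import timeit

    print(timeit.timeit("'%s' % 12345", number=1000000))  # str() path
    print(timeit.timeit("'%d' % 12345", number=1000000))  # formatint/snprintf path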


I met someone who was specifically using `dict()` instead of `{}` because, he told me, the CPython parser was much slower on the latter than on the former, and that it was noticeable on a larger code base.

(That being said I think using Python should be about writing maintainable code, not about micro-optimization.

I don't claim the purpose of that site is to push you to use those micro-optimizations, even if the title seems to suggest so. Including the disassembly makes for a nice visualization.)


> I met someone who was specifically using `dict()` instead of `{}` because, he told me, the CPython parser was much slower on the latter than on the former, and that it was noticeable on a larger code base.

I've been told by code reviewers to use micro-optimisations like this before.

My default reply is "I think this way is faster, since the shorter code results in fewer cache misses. I'd be happy to change it if you show me some benchmarks."

Of course, the cache may or may not affect anything; the point is to burst their philosophical bubble with an example of why measurements are needed.

There's also the good old fashioned "If we cared that much about performance, we wouldn't be using the branching logic of a templating system in an interpreted language."
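
For what it's worth, the benchmark itself is a one-liner, so there's little excuse not to run it (results are obviously machine-dependent):

    import timeit

    print(timeit.timeit("{}", number=1000000))      # literal: a single BUILD_MAP
    print(timeit.timeit("dict()", number=1000000))  # name lookup plus a call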


Optimize for readability. Optimize for maintenance. Optimize for consistency. Once those three are done, you can optimize for performance.


But you can also habitually change a few things in your initial coding style so that they produce less computational effort to begin with...

I thought it was a good article.


1) We don't know which one takes more computational effort, hence the need for realistic benchmarks with decent confidence intervals.

2) If benchmarks show one style to be slightly faster than another, then I will take note. I've already noted that these benchmarks have no associated statistical information; they're just numbers quoted to an arbitrary precision.

3) If I ever find myself needing such slight increases in speed, it's probably a sign that there's a looming disaster, since it means that a) there's a bottleneck in the program, b) I can't improve the algorithm, c) porting that section to native code hasn't been enough.

If it's a case of death by a thousand papercuts, where slight slow-downs throughout an application are actually making it unusable, then I'll roll up my sleeves and patch the interpreter/compiler (this is easier in self-hosting languages, but doable in all).


No need to get all carried away talking about rewriting compilers or pulling out statistical confidence intervals...

If a certain style gives a slight performance improvement, and this can be demonstrated, and that style is equally readable and maintainable, there is simply no reason not to adopt it as a habit. That's all I'm saying.


If those few things do not break readability, maintenance, consistency, or correctness, why not?

So first you get good enough at the language to know you are not breaking those. Then you don't adopt micro-optimizations, because you are already good enough at the language to know what they'll break.


Why? Because most of the differences they show come down to a global lookup being more costly than a local one.

Something that may very well change quickly: dict() being slower than {}, and so on...


Meh. My opinion is that it's easier to just change the code. The reality is there's probably no noticeable difference, and if it's important enough to them that they flagged it in a code review, they might actually try the benchmark, and then you're just wasting everybody's time.


It depends on the team I guess, but in my situation all changes had to be checked in to git, rebased against the live branch, pass code review, have the pass logged against that git commit, have paperwork filed in triplicate indicating the pass, etc. so it was a PITA to restart that process for such insignificant changes.

Much easier to argue with the reviewer; plus, if it stops them doing it again in the future, all the better ;)

It's funny that even with all of this process in place, things broke routinely when trivial syntax errors would slip through. There were no automated tests or monitoring systems, of course. It seemed like the process was to keep developers in check rather than ensure a working product; I think someone watched Superman 3 or Office Space and got paranoid.

Needless to say, I'm not there anymore ;)


    >>> timeit(a, number=1000000)
    ... 0.102125167847

Measurements shouldn't be quoted without some kind of range, confidence interval, error bar, etc.

I've not come across timeit before, but maybe they could learn a thing or two from systems like Criterion http://www.serpentine.com/criterion/


I found the most enlightening part to be the module 'dis', which I was totally oblivious of, but will partake of often from here on in.
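
For anyone else who hadn't come across it, a two-minute example (exact opcode names vary between Python versions):

    import dis

    def literal():
        return {}

    def constructor():
        return dict()

    dis.dis(literal)      # a single BUILD_MAP instruction
    dis.dis(constructor)  # a global name lookup followed by a call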


Profile dammit!

Things that improve by a few tenths of a second over a million iterations are NOT WORTH YOUR TIME.


Make It Work -> Make It Right -> Make It Fast


These all seem like simple peephole optimizations to me. So why doesn't the Python compiler (and yes, Python has a compiler) do these optimizations?

(I am aware that some of these are not in fact equivalent, but at least some of them are)

On an unrelated note, the website would be easier to grok if it did diff-like highlighting of changed lines.


Most of them are not equivalent. dict() is not the same as {}, since someone might have overridden the dict function to do something completely different. Realizing that dict() and {} are, in fact, equivalent in this particular case requires whole-program analysis, which maybe PyPy is capable of.
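
A toy illustration of why the compiler can't just fold `dict()` into `{}` (deliberately contrived):

    # Shadowing the builtin makes dict() mean something else entirely,
    # so the call has to go through a name lookup at runtime.
    def dict(*args, **kwargs):
        return "surprise"

    def f():
        return dict()   # -> "surprise"

    def g():
        return {}       # the literal can't be hijacked this way

    print(f())
    print(g())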


At least tests 3, 4, 5, 6, the first 3 cases of test 7, the first and third cases of test 11, 13, 16, 17, 18, 20, 21 cases 1 and 2, 22, the first two cases of 24, 25, 26, 27, 28, 29 tests 2 and 4, 30, and 32 can be optimized without this issue, if not more.

As I said, "some of these are not in fact equivalent, but at least some of them are".


That's not right; #5 depends on what True is defined to be:

    True = 0
    def a():
        a = True
        if a:
            return True
        return False
    print a()

There are likely similar subtleties with the other examples, but I don't know the Python spec well enough to see them. You asked why the Python compiler didn't optimize these snippets more, and that is the answer.


    >>> True = False
    SyntaxError: can't assign to keyword
...


The examples are for Python 2, not 3. But it's blatantly obvious that you are more interested in "not being wrong" than in discussing Python's compiler, so I'll stop here.


Almost none of them are significant.


I'm more surprised that the disassemblies differ at all with half of these tests.


I am wondering what exactly is measured here (by timeit). For example, for test 5 there is a noticeable difference between the timings of variants b and c despite them compiling to the same list of instructions (and I ran the tests myself).


The first has

    12 COMPARE_OP               8 (is)
while the second has

    12 COMPARE_OP               2 (==)


You are right! I overlooked this. Thanks!


You mean test 6?


For the tests where the speed difference is significant, these are really areas where the Python optimizer (I assume the bytecode compiler does some optimization?) needs to improve. Humans shouldn't have to concern themselves with such small differences in the literal representation of code.


What's up with test 11, variant 1 vs 3?


I guess that the float formatter has more work to do than the string one.


This is so easy and quick to read, yet very interesting. Good job, and thanks for making it.


Here are my annotations for how I'm reading these. Feel free to correct or update:

1. Faster to use dict literals than the `dict()` constructor

2. Faster to sort a list in-place than to use `sorted()`

3. Faster to use multiple assignment than multiple individual assignments

4. Faster to evaluate multiple individual conditions than a single multi-comparison condition

5. Faster to use idiomatic implied truth value than to compare to `True`

6. Faster to use `!=` than `is not`

7. Fastest way to check for empty list `a` is the idiomatic `if not a`

8. Fastest way to check that an object is not null is `if a`

9. `enumerate` is faster than doing it yourself

10. Constructing strings with `+=` is faster than string surgery

11. Fastest way to place a number in a string is "%s" (not "%d")

12. For built-in types, using built-in functions (e.g. `len`) is faster than calling methods (e.g. `a.__len__`)

13. For built-in types, using built-in operators (e.g. `+`) is faster than calling methods (e.g. `a.__add__(b)`)

14. But for custom-defined classes, the opposite of #13 is true.

15. Adding up a bunch of numbers using `+` is faster than using `sum`

16. When initializing a large list, it is faster to use list comprehensions than to append each element individually

17. The fastest way to build a large dict is to use a dictionary comprehension

18. Same as #17

19. Not sure what the author is trying to demonstrate here. That `bool(a)` is faster when `a` is an empty list?

20. Faster to use implied truth value of an expression than to compare explicitly to `True` or `False`

21. The fastest way to refer to a list is to use the list literal (duh...)

22. It is faster to perform individual comparisons than to check for membership with `in`

23. Faster to `map` the `str` function to a list of values and join, than to manually build up the string. Also, when doing this, faster to use `xrange` than `range`

24. Faster to unpack than to access each list element individually. Also, as in #1, faster to use dict literal than the `dict()` constructor.

25. Faster to just return the boolean value of an expression than to check for it and then decide what to return.

26. Faster to give explicit values to keyword arguments than to expand a dict with kwargs.

27. To switch the values of `a` and `b`, it is faster to use a third variable than to do it in-place using unpacking.

28. Multiple assignment is faster than unpacking, which is faster than multiple individual assignments.

29. Fastest way to get integer part of a real is to use `math.floor`

30. Similar to #26 - it is faster to give explicit values to positional arguments than to expand a list of args.

31. Faster to access local vars than globals.

32. When trying to safely access a key in a dictionary, it is faster to use the idiomatic `D[k] if k in D` than to use the `get` method.


With PyPy:

- `is` is very slightly faster

- '%d' is slightly faster

Yet another caveat: the differences are often less than 1%, which is not statistically significant.


I'd expand on #9.

It's generally faster and more readable to use an idiomatic Python iteration than to use indexing inside the loop. The difference was small over 5 elements, but `enumerate(a)` will become dramatically faster than `range(len(a))` if you have a large list. A better comparison would be `enumerate` vs `xrange`.
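
A rough way to see the gap on a bigger list (numbers are machine-dependent; on Python 2 you'd use xrange in the first statement, since range builds a full list of indices):

    import timeit

    setup = "a = list(range(100000))"

    # indexing inside the loop
    print(timeit.timeit("for i in range(len(a)):\n    x = a[i]",
                        setup=setup, number=100))
    # idiomatic iteration
    print(timeit.timeit("for i, x in enumerate(a):\n    pass",
                        setup=setup, number=100))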


I'd call any difference less than 1% equivalent. Even if it were statistically significant, it's not worth paying attention to.

My interpretation of many of these tests is that there isn't a difference.



