And this, ladies and gentlemen, is how not to make speed comparisons.
#7 surprised me: why would 'if a:' be slower than 'if not a:'? So I ran the code myself a few times. And lo and behold, the third digit varies greatly between runs, and the timings of the first and second variant differ only in the third digit.
I ran the first and second variant ten times, and got these values:
first: 0.138 +- 0.05
second: 0.137 +- 0.04
So there is no significant difference in my timings, though individual runs differed as much as 0.132 vs. 0.150.
Though kudos for providing the full code and the raw output. It made it quite easy to reproduce!
These are the timings I measured, in octave/matlab syntax:
Taking the average is a bad idea for performance benchmarks like this running under an operating system. The minimum is much more stable. (Easy to verify experimentally: do 10x10 runs of a short benchmark and compare the variance of 10 minima to the variance of 10 averages.)
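A minimal sketch of that experiment with Python's timeit (the statement being timed here is just a stand-in):

import timeit

# Repeat a short benchmark several times; the statement is a placeholder.
runs = timeit.repeat("sum(range(100))", repeat=10, number=100000)

mean = sum(runs) / len(runs)
spread = (sum((r - mean) ** 2 for r in runs) / len(runs)) ** 0.5

# The mean gets pulled around by OS noise (scheduling, background jobs);
# the minimum reflects the cleanest runs and tends to be far more stable.
print("min  %.4f" % min(runs))
print("mean %.4f +/- %.4f" % (mean, spread))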
True, but the minimum might also be dependent on the various caches being primed from previous runs of the same code - a condition that will not appear during normal use.
Caches being primed is a concern, but not when choosing between minimum and average. Neither is affected much if the first run takes longer than the other nine.
... and since you don't know how big the constant factor is, you still have to do the experiment a few times.
Also there might be factors you are not aware of that differ between executions of the same program (maybe address space randomization got extremely lucky or unlucky in one run? Or your OS has a bug that prioritizes processes differently when the PID is a power of 2? Who knows?), or some background process (cron?) woke up. So even if your number of in-process repetitions is sufficiently high, you still run the thing a few times and check whether the numbers are stable.
Though sometimes you want to benchmark with 'ondemand' just to see how the CPU scaling affects your code.
It is known that 'ondemand' may make multi-threaded programs run slower when run on multi-core system. The problem which I have myself observed and reproduced is when two threads are alternately scheduled to run - one thread runs for some time, then wakes up the other and goes to sleep, ad infinitum. With such a workload, the scheduler/governor may enter a situation where the workload moves from one CPU core to the other (*), with no core reaching enough average load to switch it to a higher frequency.
I can't find a reference for this, but it was easy to reproduce by writing a simple program with pthreads doing what is described above. My actual application where I encountered this problem consisted of an event loop thread and a worker thread doing computational tasks (encryption). The most interesting observation was how disabling one of the two CPU cores (yes Linux can do that at runtime) sped up my application and benchmark considerably.
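For anyone who wants to play with this, here is a rough Python re-creation of that ping-pong workload (the original reproduction was a C program with pthreads; with CPython's GIL the two threads never overlap anyway, but the hand-off pattern is the point):

import threading

# Two events act as a baton: each thread does a short burst of work,
# wakes the other, then goes idle until it is woken again.
ping, pong = threading.Event(), threading.Event()

def worker(my_turn, other_turn, rounds=10000):
    for _ in range(rounds):
        my_turn.wait()
        my_turn.clear()
        sum(i * i for i in range(1000))  # short burst of CPU work
        other_turn.set()                 # hand off and sleep

t1 = threading.Thread(target=worker, args=(ping, pong))
t2 = threading.Thread(target=worker, args=(pong, ping))
t1.start(); t2.start()
ping.set()  # start the ping-pong
t1.join(); t2.join()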
I don't think it's implied that all of the listed differences are significant. I think it's designed to show you that sometimes alternatives make a difference, and other times they simply do not.
It makes the page a good resource if you're wondering if doing something a certain way has a performance impact. If it's all the same to use {} vs. dict(), why not be in the habit of using the one that's a little faster?
Moreover, the tests are organized in a meaningful way. Tests 13 and 14 are quite interesting when seen right next to each other. Showing just test 13 could be deceptive, leading you to think that operators are faster than methods. Seeing 14 right afterward, however, shows you that isn't the case, and provides some clues as to why each works the way it does. It's just enough of a hint to make you curious about what's actually going on, which I really enjoy.
I don't think the fault is with the author, but with your assumptions about the intent.
You may argue that exposition would improve this, but I think the lack of explanation is not only elegant, but better at providing insight for the reader. I don't think there's a right answer here, though.
> I don't think it's implied that all of the listed differences are significant. I think it's designed to show you that sometimes alternatives make a difference, and other times they simply do not.
The heading, "Python: faster way" doesn't suggest that to me.
Examples b and c on test #6 are something you should NEVER do in Python. Using "is" to compare integers is a huge no-no, and will only work for small integers (I think between -5 and 256) that are cached by the interpreter. "is" does identity checking, not equality checking.
>>> a = 500
>>> b = 500
>>> a is b
False
>>> a == b
True
>>> a = 2
>>> b = 2
>>> a is b
True
>>> a == b
True
None of the time differences in any of the test cases are significant at all. Concerning yourself with this is premature optimization of the highest order, especially in a language like Python. One should definitely be more concerned about writing clearer and more idiomatic code.
I have to agree, though there is some information to be gleaned, precisely because this guy has gone very far to show us that there is no point in trying to get clever with alternative, non-idiomatic ways of doing things. It puts your mind at ease to use the cleanest idioms.
It reminds me a bit of some of my big data jobs. Sometimes you spend days and days torturing data to find interesting aspects, only to find nothing and end in "failure". Only it isn't failure because you've eliminated a bunch of dead-end avenues that you might otherwise have wondered about, freeing you to pursue more promising uses for your resources.
This is mainly what I took from it. The faster way here is almost always the most idiomatic way to do it in the examples, and the reasons why are usually pretty obvious.
And while the individual differences are small, in a language already as slow as Python, starting from the faster, idiomatic code from the start is likely to save a lot of headache down the line.
If anything, he has proven exactly what you're saying: that it is OK to be explicit rather than picky about these things, and that code readability can easily be a priority in Python.
There are also a few exceptions (the update() method on a dict and list comprehensions).
As a junior python dev < me < senior python dev, this was very cool to read and very quick to take in. Kudos on the presentation.
Someone popping up and trotting out the 'no premature optimization' argument every time an article is posted about optimization is getting very tiresome.
Not all optimization is premature. Sometimes you find that you do need to optimize, and then having to wade through a morass of comments complaining about premature optimization to track down actual relevant content and comment is a pain in the arse.
- The time differences displayed here are just not significant on a desktop OS such as Windows or Linux. Maybe Windows decided to swap or fetch your email (or, more likely, run a virus check) while the test was running? The error bars on these numbers make them insignificant.
- When using Python, readability >> performance. If you want performance, use C; don't even try to optimise Python, it's not worth the effort.
In Python it's worth optimizing your algorithm, e.g. replacing an O(n²) algorithm with an O(n log n) one, but trying to shave 0.1% of performance off a tight loop is the wrong approach. Instead, reach for Numpy / Numba, Shedskin, etc., down to plain C.
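A contrived example of the kind of change that is actually worth making (here quadratic down to roughly linear, in the same spirit as the O(n log n) case); the function names are made up:

def has_duplicates_quadratic(items):
    # O(n^2): compares every pair.
    for i, x in enumerate(items):
        for y in items[i + 1:]:
            if x == y:
                return True
    return False

def has_duplicates_linear(items):
    # O(n) on average: one pass with a set of items already seen.
    seen = set()
    for x in items:
        if x in seen:
            return True
        seen.add(x)
    return False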
I've written a few specific python apps where optimization at this level was worth contemplating in a couple spots after profiling (basically, tight loops). That said, replacing the algorithm has usually been a much smarter play.
There are a few benchmarks where the difference is a factor of two or so, which is actually significant. It does require a careful reading and a healthy degree of skepticism about the close results (the majority) to get something useful from this--could do with a summary.
Those are the quirks of CPython 2.7.
The best optimisation is to run with PyPy — those benchmarks run about 10 times faster.
After that, if you want to optimise, you need an applicative benchmark.
A microbenchmark won't work because type specialisation runs for (traces of) the program as a whole.
Could someone please explain Test #11? How is using %s instead of %d twice as fast even though the same thing is being calculated? I always use %d because it seems like the proper way to do it.
The function PyString_Format (Objects/stringobject.c) does the formatting for the % operator. For %s it calls _PyObject_Str which in turn calls "str()" on the object. For %d it calls formatint (located in the same file).
The str() implementation for ints is in int_to_decimal_string (Objects/intobject.c) and it's incredibly simple:
do {
    *--p = '0' + (char)(absn % 10);
    absn /= 10;
} while (absn);
The code for formatint is way more complex, and it contains two calls to the native snprintf.
The native snprintf is heavier because it handles precision, zero-padding and stuff like that.
I believe this is why %d is slower: %s is a straight divide-by-10-and-subtract loop, while %d makes two library calls to the full-blown snprintf. However, I didn't actually profile the code because I don't have a debug build, so I might be completely wrong.
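For what it's worth, the claim is easy to check from Python itself (exact numbers will vary by build):

import timeit

# Same value, two format codes; on CPython 2.7 the %s path goes through
# str() while %d goes through the heavier snprintf-based formatter.
print("%%s: %.3f" % timeit.timeit('"%s" % 12345', number=1000000))
print("%%d: %.3f" % timeit.timeit('"%d" % 12345', number=1000000))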
I met someone who was specifically using `dict()` instead of `{}` because, he told me, the CPython parser was much slower on the latter than on the former and that it was noticeable on a larger code base.
(That being said I think using Python should be about writing maintainable code, not about micro-optimization.
I don't claim the purpose of that site is to push you to use those micro-optimizations, even if the title seems to suggest so. Including the disassembly makes for a nice visualization.)
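Whatever the truth of the parser story, the disassembly does show where the runtime difference comes from; a minimal look with the dis module:

import dis

def literal():
    return {}

def constructor():
    return dict()

dis.dis(literal)      # a single BUILD_MAP opcode
dis.dis(constructor)  # LOAD_GLOBAL dict + CALL_FUNCTION: a name lookup plus a call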
> I met someone who was specifically using `dict()` instead of `{}` because, he told me, the CPython parser was much slower on the latter than on the former and that it was noticeable on a larger code base.
I've been told by code reviewers to use micro-optimisations like this before.
My default reply is "I think this way is faster, since the shorter code results in fewer cache misses. I'd be happy to change it if you show me some benchmarks."
Of course, the cache may or may not affect anything; the point is to burst their philosophical bubble with an example of why measurements are needed.
There's also the good old fashioned "If we cared that much about performance, we wouldn't be using the branching logic of a templating system in an interpreted language."
1) We don't know which one takes more computational effort, hence the need for realistic benchmarks with decent confidence intervals.
2) If benchmarks show one style to be slightly faster than another, then I will take note. I've already noted that these benchmarks have no associated statistical information; they're just numbers quoted to an arbitrary precision.
3) If I ever find myself needing such slight increases in speed, it's probably a sign that there's a looming disaster, since it means that a) there's a bottleneck in the program, b) I can't improve the algorithm, c) porting that section to native code hasn't been enough.
If it's a case of death by a thousand papercuts, where slight slow-downs throughout an application are actually making it unusable, then I'll roll up my sleeves and patch the interpreter/compiler (this is easier in self-hosting languages, but doable in all).
No need to get all carried away talking about rewriting compilers or pulling out statistical confidence intervals...
If a certain style has a slight performance improvement, and this can be manifested, and that style is equally as readable and maintainable, there is simply no reason not to adopt it as habit. That's all I'm saying.
If those few things do not break readability, maintainability, consistency, or correctness, why not?
So, first you get good enough at the language to know you are not breaking those. Then you don't adopt the micro-optimizations, because you are already good enough at it to know what they'll break.
Meh. My opinion is that it's easier to just change the code. The reality is there's probably no noticeable difference, and if it's important enough to them that they marked it on a code review they might actually try the benchmark, and then you're just wasting everybody's time.
It depends on the team I guess, but in my situation all changes had to be checked in to git, rebased against the live branch, pass code review, have the pass logged against that git commit, have paperwork filed in triplicate indicating the pass, etc. so it was a PITA to restart that process for such insignificant changes.
Much easier to argue with the reviewer; plus, if it stops them doing it again in the future, all the better ;)
It's funny that even with all of this process in place, things broke routinely when trivial syntax errors would slip through. There were no automated tests or monitoring systems, of course. It seemed like the process was to keep developers in check rather than ensure a working product; I think someone watched Superman 3 or Office Space and got paranoid.
Most of them are not equivalent. dict() is not the same as {}, as someone might have overwritten the dict function to do something completely different. Realizing that dict() and {} are, in fact, equivalent in this particular case requires whole-program analysis, which maybe PyPy is capable of.
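A contrived example of why the compiler can't just rewrite one into the other:

# Contrived, but legal: rebinding the name makes dict() and {} diverge,
# so the compiler can't treat them as interchangeable without proving
# the name still refers to the built-in.
def dict():
    return {"shadowed": True}

print(dict())  # {'shadowed': True}
print({})      # {}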
At least tests 3, 4, 5, 6, the first 3 cases of test 7, the first and third cases of test 11, 13, 16, 17, 18, 20, 21 cases 1 and 2, 22, the first two cases of 24, 25, 26, 27, 28, 29 tests 2 and 4, 30, and 32 can be optimized without this issue, if not more.
As I said, "some of these are not in fact equivalent, but at least some of them are".
That's not right; #5 depends on what True is defined as:
True = 0
def a():
    a = True
    if a:
        return True
    return False
print a()
There are likely similar subtleties with the other examples, but I don't know the Python spec well enough to see them. You asked why the Python compiler didn't optimize these snippets more and that is the answer.
The examples are for Python 2, not 3. But it's blatantly obvious that you are more interested in "not being wrong" than discussing Python's compiler so I'll stop here.
I am wondering what exactly is measured here (by timeit). For example, for test 5 there is a noticeable difference between the timings of variants b and c despite them compiling to the same list of instructions (and I ran the tests myself).
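One way to double-check both parts of that (I don't know exactly what variants b and c look like, so the two functions below are placeholders with deliberately identical bodies):

import dis
import timeit

# Placeholders: substitute the real code for variants b and c of test 5.
def variant_b(a=[1]):
    if a:
        return True
    return False

def variant_c(a=[1]):
    if a:
        return True
    return False

# If the disassemblies match...
dis.dis(variant_b)
dis.dis(variant_c)

# ...then any remaining timing difference should be measurement noise.
print(timeit.timeit(variant_b, number=1000000))
print(timeit.timeit(variant_c, number=1000000))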
For the tests where the speed difference is significant, these are really areas where the python optimizer (I assume the bytecode compiler does some optimization?) needs to improve. Humans shouldn't be concerned with such small differences in the literal representation of code.
Here are my annotations for how I'm reading these. Feel free to correct or update:
1. Faster to use dict literals than the `dict()` constructor
2. Faster to sort a list in-place than to use `sorted()`
3. Faster to use multiple assignment than multiple individual assignments
4. Faster to evaluate multiple individual conditions than a single multi-comparison condition
5. Faster to use idiomatic implied truth value than to compare to `True`
6. Faster to use `!=` than `is not`
7. Fastest way to check for empty list `a` is the idiomatic `if not a`
8. Fastest way to check that an object is not null is `if a`
9. `enumerate` is faster than doing it yourself
10. Constructing strings with `+=` is faster than string surgery
11. Fastest way to place a number in a string is "%s" (not "%d")
12. For built-in types, using built-in functions (e.g. `len`) is faster than calling methods (e.g. `a.__len__`)
13. For built-in types, using built-in operators (e.g. `+`) is faster than calling methods (e.g. `a.__add__(b)`)
14. But for custom-defined classes, the opposite of #13 is true.
15. Adding up a bunch of numbers using `+` is faster than using `sum`
16. When initializing a large list, it is faster to use list comprehensions than to append each element individually
17. The fastest way to build a large dict is to use a dictionary comprehension
18. Same as #17
19. Not sure what the author is trying to demonstrate here. That `bool(a)` is faster when `a` is an empty list?
20. Faster to use implied truth value of an expression than to compare explicitly to `True` or `False`
21. The fastest way to refer to a list is to use the list literal (duh...)
22. It is faster to perform individual comparisons than to check for membership with `in`
23. Faster to `map` the `str` function to a list of values and join, than to manually build up the string. Also, when doing this, faster to use `xrange` than `range`
24. Faster to unpack than to access each list element individually. Also, as in #1, faster to use dict literal than the `dict()` constructor.
25. Faster to just return the boolean value of an expression than to check for it and then decide what to return.
26. Faster to give explicit values to keyword arguments than to expand a dict with kwargs.
27. To switch the values of `a` and `b`, it is faster to use a third variable than to do it in-place using unpacking.
28. Multiple assignment is faster than unpacking, which is faster than multiple individual assignments.
29. Fastest way to get integer part of a real is to use `math.floor`
30. Similar to #26 - it is faster to give explicit values to positional arguments than to expand a list of args.
31. Faster to access local vars than globals.
32. When trying to safely access a key in a dictionary, it is faster to use the idiomatic `D[k] if k in D` than to use the `get` method.
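For that last one, a quick (hypothetical) way to measure it yourself:

import timeit

setup = "D = {'k': 1}"

# Conditional lookup vs. the get() method; the expression skips a method
# call but does two hash lookups when the key is present.
print(timeit.timeit("D['k'] if 'k' in D else None", setup=setup))
print(timeit.timeit("D.get('k')", setup=setup))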
It's generally faster and more readable to use an idiomatic Python iteration than to use indexing inside the loop. The difference was small over 5 elements, but `enumerate(a)` will become dramatically faster than `range(len(a))` if you have a large list. A better comparison would be `enumerate` vs `xrange`.
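A rough way to see that on a bigger list (numbers will vary):

import timeit

setup = "a = list(range(100000))"

# Index-based loop vs. enumerate over the same list.
print(timeit.timeit("for i in range(len(a)): x = a[i]", setup=setup, number=100))
print(timeit.timeit("for i, x in enumerate(a): pass", setup=setup, number=100))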