Go actually went in the other direction for a bunch of reasons (e.g. hash-collision DoS) and made key order quasi-random when iterating. Small maps used to maintain order, but a change was made to randomize that too, so people didn't rely on it and get stung when their maps grew larger: https://github.com/golang/go/issues/6719
Right, the startling thing about Python's previous dict was that it was so terrible that the ordered dict was actually significantly faster.
It's like if you did such a bad job making a drag racer that the street legal model of the same car was substantially faster over a quarter mile despite also having much better handling and reliability.
In some communities the reaction would have been to write a good unordered dict which would obviously be even faster, but since nobody is exactly looking for the best possible performance from Python, they decided that ordered behaviour was worth the price. And existing Python programmers could hardly complain, since it was still faster than what they'd been tolerating previously.
Randomizing is the other choice if you actually want your maps to be fast and want to resist Hyrum's law, but see the absl experience: they initially didn't bother to randomize tiny maps, then the order of those tiny maps changed for technical reasons and... stuff broke. Because hey, in testing I made six of this tiny map and they always had the same order, therefore (ignoring the documentation imploring me not to) I shall assume the order is always the same...
> In some communities the reaction would have been to write a good unordered dict which would obviously be even faster
Actually an ordered dictionary has improved performance over an unordered dictionary for the kinds of common Python workloads you encounter in the real world. The reason is that the design is only incidentally ordered: it arises from trying to improve memory efficiency and iteration speed. The dict ends up ordered because the real k/v pairs are stashed in a dense array which the hash table indexes into, and populating that array is most efficient in insertion order. For pure "unordered map" type operations the newer implementation is actually a tiny bit slower.
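The layout can be sketched roughly like this — a toy illustration of the compact-dict idea, not CPython's actual implementation (resizing and deletion are omitted for brevity):

```python
class CompactDict:
    """Toy compact dict: sparse index table + dense entries array."""

    def __init__(self, size=16):
        # size must be a power of two; resizing is omitted, so keep
        # the number of items well below `size`
        self._indices = [None] * size   # sparse table: slot -> entry index
        self._entries = []              # dense array of (hash, key, value)

    def _slots(self, key):
        # linear probing over the sparse index table
        mask = len(self._indices) - 1
        i = hash(key) & mask
        while True:
            yield i
            i = (i + 1) & mask

    def __setitem__(self, key, value):
        for slot in self._slots(key):
            idx = self._indices[slot]
            if idx is None:             # empty slot: append to dense array
                self._indices[slot] = len(self._entries)
                self._entries.append((hash(key), key, value))
                return
            if self._entries[idx][1] == key:   # existing key: overwrite
                self._entries[idx] = (hash(key), key, value)
                return

    def __getitem__(self, key):
        for slot in self._slots(key):
            idx = self._indices[slot]
            if idx is None:
                raise KeyError(key)
            if self._entries[idx][1] == key:
                return self._entries[idx][2]

    def __iter__(self):
        # iteration never touches the sparse table: it just walks the
        # dense array, which yields keys in insertion order for free
        return (entry[1] for entry in self._entries)
```

Note where the trade-off lands: lookups do the same hash-table work as before plus one extra indirection through `_indices`, which is the "tiny bit slower" part, while iteration becomes a linear walk of a dense array and ordering falls out as a side effect.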
The main thrust of your claim obviously can't be true and I'm not sure what confusion could lead you to believe that.
Maybe it's easier to see if we're explicit about what the rules are: OrderedDict (now the Python dict) has exactly the same features as a hypothetical UnorderedDict, except OrderedDict has the additional constraint that iterating it yields the key/value pairs in the order in which they were inserted, while UnorderedDict can do as it pleases here.
This means OrderedDict is a valid implementation of UnorderedDict. So, necessarily, OrderedDict does not have, as you claim, "improved performance over an unordered dictionary". At the very worst it breaks even and performance is identical. This is why it's remarkable that Python's previous dict was worse.
But that's a pretty degenerate case; we can also see that after deletion OrderedDict must spend some resources keeping the ordering constraint intact. An UnorderedDict needn't do that, so we can definitely do better than OrderedDict.
>The main thrust of your claim obviously can't be true
It's surprising that iterating a dense array is faster than iterating a hashmap? I don't think you are parsing the parent post correctly.
If dictionaries are commonly iterated in python, then iterating an array of 100 items that fits in one cache-line will be faster than iterating a hashmap which might have 100 items in 100 cache lines.
The claim was that: "Actually an ordered dictionary has improved performance over an unordered dictionary"
Having a dense array is not, as it seems both you and chippiewill imagine, somehow a unique property of ordered dictionaries. An unordered dictionary is free to use exactly the same implementation detail.
The choice to preserve order is in addition to using dense arrays. The OrderedDict must use tombstones in order to preserve ordering, and then periodically rewrite the entire dense array to remove tombstones, while a hypothetical UnorderedDict needn't worry because it isn't trying to preserve ordering so it will be faster here despite also having the dense arrays.
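As a hedged sketch of that difference — toy lists of (key, value) pairs standing in for the dense entry array, not CPython's code:

```python
# Ordered deletion: leave a tombstone so the survivors keep their
# relative order; a later compaction pass rewrites the array to
# reclaim the holes.
def delete_ordered(entries, key):
    for i, e in enumerate(entries):
        if e is not None and e[0] == key:
            entries[i] = None           # tombstone
            return

def compact(entries):
    # periodic rewrite that an ordered implementation must pay for
    return [e for e in entries if e is not None]

# Unordered deletion: swap the last entry into the hole. O(1), no
# tombstones, no compaction pass -- but insertion order is destroyed.
def delete_unordered(entries, key):
    for i, e in enumerate(entries):
        if e[0] == key:
            entries[i] = entries[-1]
            entries.pop()
            return
```

The swap-remove version is exactly the freedom an UnorderedDict buys by dropping the ordering guarantee.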
"iterating an array of 100 items that fits in one cache-line will be faster"
On today's hardware a cache line is 64 bytes, so fitting 100 "items" (each 3x 64-bit values, so typically total 2400 bytes with today's Python implementation) in a cache line would not be possible. A rather less impressive "almost three" items fit in a cache line.
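The arithmetic, assuming the three 64-bit words per entry (hash, key pointer, value pointer) described above:

```python
ENTRY_BYTES = 3 * 8     # hash + key pointer + value pointer
CACHE_LINE = 64         # bytes, typical current hardware

total = 100 * ENTRY_BYTES            # 2400 bytes for 100 entries
per_line = CACHE_LINE / ENTRY_BYTES  # ~2.67 entries per cache line
```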
To be sure, the dense array is faster for this operation. The problem is that that's not an optimisation which results from being ordered. It's just an implementation choice, and an UnorderedDict is free to make the same choice.
Enjoying and appreciating this discussion. Is this in the same neighborhood of why the reworked dict implementation in Python 3.6 had insertion order as an implementation detail, but explicitly claimed it was not a feature that should be relied upon? At least until Python 3.7 cemented the behavior as a feature.
The problem with Python here is that CPython is not only the reference implementation but the de facto specification. So dicts are still "supposed to be" unordered collections, but now dicts must also preserve insertion order as per the docs and the reference implementation, so all alternative implementations must also conform to this even if it doesn't make sense for them, or they must specifically choose to be non-conformant on this point.
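Concretely, since Python 3.7 every conforming implementation must make this pass:

```python
d = {}
for word in ["zebra", "apple", "mango"]:
    d[word] = len(word)

# insertion order, not hash order or sorted order
assert list(d) == ["zebra", "apple", "mango"]

# popitem() returning the most recently inserted pair (LIFO)
# is likewise guaranteed since 3.7
assert d.popitem() == ("mango", 5)
```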
Of course in this case, the order-preserving optimization was actually first implemented by an alternative implementation (PyPy), but I don't think that changes the issue.
Right, but that's kind of my point. Adding it to the language spec now creates an additional and frankly somewhat unnecessary point of compliance for other implementations. Python is already so damn big and complicated; my opinion is that we shouldn't make its spec even more complicated, even if its reference implementation adds more features like this.
> Right, the startling thing about Python's previous dict was that it was so terrible that the ordered dict was actually significantly faster.
I've never heard that before and it would be really surprising, given that Python's builtin dict is used for everything from local symbol lookup to object field lookup. Do you have more information?
Note that this applies more to Python than to efficient languages. In Python, objects are big and require an indirection. In faster languages, many objects can be smaller than a pointer and stored inline, so dictionaries with vectorized lookups can generally be made faster.
`select{..}` cases with multiple valid channel operations also select randomly.
I really like it; it helps you discover (and fix) order-dependent logic WAY earlier. Though I would really like some way to influence how long it blocks before selecting one, to simulate high-load scenarios and trigger more logical races.