Hacker News new | past | comments | ask | show | jobs | submit login
Python dicts are now ordered (softwaremaniacs.org)
734 points by signa11 on Feb 7, 2020 | hide | past | favorite | 436 comments



I live in the Python bubble, so I haven't realized until this post that so few people knew about that.

This post is massively popular despite talking about a feature we had since Python 3.6, in 2016, that was posted on HN at the time and that is featured in most popular tutorials.

A good reminder that most of the world doesn't revolve about my favorite language. And that information is not that fast to spread.


> about a feature we had since Python 3.6, in 2016

No, you had that feature in one implementation. Now it's in the language specification. That's vastly different because only now you can rely on it without fearing it can go away with the next release.


First, cPython is 99% of Python deployments. So much that if a script works on it but not on another implementation, people often consider the later broken.

As this post proves that most people don't even know about this feature, you can be pretty sure the vast majority of people don't know about pypy, micropython, etc.

Secondly, even if you want to nit pick, Python 3.7 made it official more than one year ago. We are currently in the 3.9 alpha.

In fact, 3.7 didn't touch the implementation, just merely declared "yep, good idea, let's keep it that way".


There's also the indication that this is official, and not just an implementation quirk. I still used OrderedDict in 3.6 because there was no guarantee 3.7 (or later) wouldn't silently break ordering on the standard dict. At least now it's unlikely to change because it will be considered a breaking change.


BDFL declared Python dict to be ordered in 2017 — It can't be more official than that for Python https://mail.python.org/pipermail/python-dev/2017-December/1... i.e., Python 3.7 (whatever implementation must keep the insertion order). CPython keeps the order since Python 3.6. Pypy even before that.


A barely legible email dump with 50 concurrent answers is supposed to be the clear official statement?

Not to mention that developers just begin migrating to python 3 in 2017 and code still had to work on 2.7 that doesn't order.


The official release notes for the following release is certainly a clear official statement.

https://www.python.org/downloads/release/python-370/

“The insertion-order preservation nature of dict objects is now an official part of the Python language spec.” (June 27, 2018)


I knew about it, because everyone was talking about it. The origination of the actual information is irrelevant. It was basically interesting gossip for a few weeks.


The ruling is the very first and only line from the link:

> Make it so. "Dict keeps insertion order" is the ruling. Thanks!


There's a great talk by David Beazley[0], which I don't remember its title right now, where he bashes all previous python versions except the latest at the time[1], which was 3.6. And he says "since I'm not a core developer, I can tell you this: rely on dicts being ordered so much that eventually they'll make it official". Guess he "won" in the end ;)

[0] Well, he usually gives great talks anyway

[1] Alright, he literally started many talks doing that


Raymond Hettinger does or did a few talks on dictionaries too. He covers the history of dictionaries in python and how they evolved. It was mentioned in the article's comment section.

https://www.youtube.com/watch?v=p33CVV29OG8


Yes, David Beazley's talks are always wonderful!


Whoa, calm your horses angry boy.


That still was announced a while ago. When reading the title I though it was something new (maybe now Python decided to keep the keys sorted? ;)


> rely on it without fearing it can go away with the next release

This is Python we're talking about.


There is no language spec for Python, the CPython implementation defines the language.


The language reference is a specification for the language. The library reference is a specification for the standard library.

https://docs.python.org/3/reference/index.html


I have lived in Python ~80% of my career (~50% now) currently), although a large chunk of my career was writing 2.7 compatible with 2.6 (thanks RHEL). For the last year and a half I have written 3.6+ exclusively and I didn't realize that regular dicts were now ordered.

I don't use dicts if I require ordering, but if I did I would probably still reach for collections.OrderedDict...

I wonder what the implications are of dict ordering with the "new" (new for me...) function args and *kwargs...from the hip it sounds pretty handy.


The implications are that now you get to see the order in which keyword args were specified by the caller. And in classes, you get to see the order in which attributes were defined. Both can come in very handy.


Dict ordering started from the need to have class attributes, args and kwargs ordered if I recall.


No, that was a happy coincidence. The goal of the dict implementation was efficiency. Ordering was a side-effect.


Most of the python I've written and maintain (which, honestly, isn't too much) was 2.7, so it is news to me too. Nice!


This is awesome, because the ordered map is the best data structure out there for easy & predictable programming, possibly only barring the array.

There's so many cases where it's a benefit for map entries to retain order (and none where it's a problem). PHP really got this one right (and immediately messed it up by mixing ordered maps with arrays into a big soup, but hey, PHP). And so did, JS, sorta-kinda-by-accident.¹

When I went from PHP to Ruby back in the day, Hashes not being ordered definitely was my biggest gotcha. Everything about Ruby (except the deployment) was nicer, but the hashes... I've spent serious amounts of extra thinking just to make code work that would've worked out of the box in PHP.

Yay for ordered maps! Every language should have them, they're the best! There's just so much ergonomics packed in such a simple API.

1) JS objects are ordered maps, iff the keys are strings that cannot be parsed as an integer (really) (yeah it's nuts) (but still better than unordered maps!!)


> JS objects are ordered maps

So currently (now that there are Symbols in JS), the order is: array-indexed properties first (integers), then string-key properties, in insertion order, then Symbol-keyed properties in insertion order.

The reason for the funny behavior in treating integer keys differently is that property keys are always treated as strings, so obj["3"] and obj[3] can't be distinguished, and arrays are also ordinary objects, so setting obj[3] was made to do the same thing on any object rather than special-casing arrays and non-array plain objects...

Today, JavaScript is a small, reasonably elegant scripting language with reasonable semantics, buried in a medium-sized, reasonably expressive scripting language with reasonable but different semantics, added 20 years later. The kitchen-sink disease is far enough along that it'll probably never recover the appeal it once had as a beginners' language. It still has the advantage of backwards compatibility, and it's become more practical as a compilation target.


> The reason for the funny behavior in treating integer keys differently is that property keys are always treated as strings, so obj["3"] and obj[3] can't be distinguished, and arrays are also ordinary objects, so setting obj[3] was made to do the same thing on any object rather than special-casing arrays and non-array plain objects...

Note that special-casing arrays is precisely what implementations used to do (i.e., for non-Array objects, the order in most VMs was simply "properties in insertion order").

V8 was the first JS VM to stop special-casing Array (and this was done before Chrome went public, and was the behaviour in the first Chrome beta). It turned out that virtually no websites relied on "array index" properties appearing in enumeration order (there was some breakage, but it was relatively few and far between, and believed to be worthwhile for the gains in cache-hits when accessing properties), and this also allowed the more compact array representations to be used for them.


I forgot about that when writing my comment but yes, V8 made that change along with "shadow classes" or whatever the object specialization stuff was called. Heady days for JS performance.

In retrospect if something was going to be standardized I'd have preferred it to be the older behavior which was simpler to explain, but so it goes.


SpiderMonkey actually had inline caches before V8 shipped if I'm not mistaken; the motivating factor for the behaviour change in V8 was just to improve the cache-hit ratio.


JavaScript is a small, reasonably elegant scripting language consisting entirely of edge cases.

/s


I did not even know that integer is a good key for an object. Is this being used anywhere outside arrays?


You can use it anywhere you want. Whether it's a good idea or not, like most anything in JS, is a matter of opinion.


> PHP really got this one right (and immediately messed it up by mixing it with arrays, but hey, PHP).

No, it didnt. What PHP calls "array" is actually an Ordered Map, as you alluded to:

https://yaml.org/type/omap

it literally says that in the documentation, paragraph one, sentence one:

> An array in PHP is actually an ordered map.

https://php.net/types.array

The issue, if one exists, is that PHP doesnt have a sequence type:

https://yaml.org/type/seq

but one is provided as a package:

https://php.net/class.ds-sequence

https://github.com/php-ds/extension


Sure, PHP arrays are actually ordered maps, but it's super confusing when used as arrays. I mean, look at code like this:

    $arr = [];
    $arr[2] = "two";
    $arr[0] = "zero";
    $arr[1] = "one";
    echo join($arr, ", ");
    // two, zero, one
In any other language, the result of similar code would be "zero, one, two". You'd need to `ksort` this thing to get it to behave like a normal bog-standard boring array.

Everybody's writing PHP code pretending that it has arrays when really, it does not. The gotchas are way bigger than the gotchas in Python <3.7's unordered maps. It's a mess.


Why wouldn‘t you just push/append to the end of the array with '$arr[]'? There is no need for explicit keys in PHP. The gotcha is writing code in PHP pretending you are using a different language.


I literally just learned "bog-standard" 10 minutes ago and now I see it here. And no, don't Baader-Meinhof me, I would have noticed this weird term in random text in the past.


Would you though? Appears at least once a week: https://hn.algolia.com/?dateRange=all&page=0&prefix=false&qu...


>I would have noticed this weird term in random text in the past.

Well, no so weird, I'm not a native english speaker, and I've seen the term hundreds of times over the years...

It's also quite common on HN:

https://www.google.com/search?q=site%3Anews.ycombinator.com+...


I'm aware of the terms "bog" and "bog-standard", but not being from the UK, I'm not sure what is so standard about bogs.


Baader-Meinhof denial!


Baader-Meinhof Dunning-Kruger?


I have to disagree also.

The order inserted makes most sense in most cases. Hand crafting integer based array indexes out of order is rare. And as you say a simple ksort call guarantees sort order.

    ksort($arr);
    echo join($arr, ", ");
    // one, two, three


Exactly. Of you hand craft an array you probably need it in that order.


Then why not put it in order while you're hand crafting it?


I disagree. The behavior you exhibit makes more sense than the behavior you imagine.

You start with an _empty_ array, then (somehow) set the third element of that array? That makes very little sense.

Had you _initialized_ your example as an array with a length of 3, your desired behavior would, indeed, manifest.


The problem is PHP calling an ordered map an array. Array in pretty much every other language means a sequence indexed by integers.

Ideally the maintainers of PHP would rename it and deprecate the use of `array()` over a long period of time.


The PHP documentation refers to them as "associative arrays", which is technically correct, but I agree I'm not sure how they could have conflated these concepts so badly.


I believe the "associative array" terminology came from Perl. Though Perl's associative arrays don't have any guaranteed order. And Perl also has normal arrays.


>You start with an _empty_ array, then (somehow) set the third element of that array? That makes very little sense.

Well, there's no reason a language could not initialize an empty array to some initial capacity, either on declaration or when you add an indexed element.


I guess my reasoning is that:

$arr = []

doesn't appear to be initializing to any capacity. So that after:

$arr[2] = "two"

I would expect:

count($arr) === 1

And honestly, sure. A language could be designed to pad the array with empty values if you specify an index greater than the upper bound (or lesser than the lower bound). I believe a sibling mentioned matlab did something like that.

But is that really better behavior or just different behavior? I don't really see an important difference. My point was that the criticism was fairly weak. PHP, indeed, has a very useful abstraction in its concept of "array". Most of the hate should be directed at the name, not the construct.


In that situation, Matlab allocates an array of length 3, fills it with “empty” values (depending on the type), and then sets the third element to the value. That’s what I’d have expected to happen here too...


Would you expect it to do that if somebody wrote the following?

    $id = 12835151;
    $arr[$id] = Get_thing_with_id( $id );


If $arr were an array I would, though I'd cringe at the wasted space (unless it were a sparse array).

Obviously, I'd agree that a map/associative array/dict/etc is a better choice here; I'm just annoyed about the name.


In my bog standard PHP I would do $thing = Get_thing_with_id( $id ); if(!empty($thing)){process($thing);}


yes.


So how can you access, say, the third element of your ordered map in PHP ?


    array_values($arr)[2];


I mean youd probably have to iterate. Either that or use a package with proper array support.


That's messed up


In practice, it's almost never an issue. The standard lib has push/pop/shift/pad methods available.

So example, if you had numeric indices that you wanted to guarantee that index order is equal to insertion order, you'd create an array using $my_array = array_pad([], 20, null); If it's an array that you didn't create, the ksort(...) method will order the array for you as expected.

Edit: after reading through the documentation on array_pad, it looks like it will even correct key order in existing arrays. https://www.php.net/manual/en/function.array-pad.php#87735


>There's so many cases where it's a benefit for map entries to retain order (and none where it's a problem)'

Don't know if this could actually be the case in practice, but theoretically the ordering could allow for a timing attack to glean some bit of information when performing a linear scan of the map (size of the map, relative location of the data, etc).

Just a contrarian thought given the definitive statement of "none where it's a problem". I'm generally of the same opinion as you though; mostly if not entirely harmless, potentially helpful in certain applications.


>Don't know if this could actually be the case in practice, but theoretically the ordering could allow for a timing attack to glean some bit of information when performing a linear scan of the map (size of the map, relative location of the data, etc).

An attack where the attacker has access to your ...program code and can run instructions there? In that case, leaks from a "timing attack" would be the least of your worries...

If they just provide an input to your program somehow externally, then whether you put that input into a map or an ordered map or not is an implementation detail. You could make your program rid of the "timing attack" in 100s of ways... (or have one, in 100s of ways). That doesn't make an ordered map more unsafe than any of the 1000s of ways to have a timing attack.


I think it's also worth noting that requiring an ordering is preventing a performance optimization

This doesn't impact most high level python code, and an OrderedDict is a very reasonable default. But there's a reason why Google's c++ map intentionally randomizes iteration order

(Hint: it allows the hash map and hash function to be extremely high performance while allowing themselves the flexibility to change the hash function)


Python dicts are not ordered maps, they are dicts preserving insertion order.


The title of the post is misleading: it means "ordered by insertion order" (not sorted).

Previously it was unpredictable, and that's why collections have OrderedDict; which makes sense instead of trusting an implementation detail of CPython from 3.6.

Is not mentioned in the docs: https://docs.python.org/3/library/stdtypes.html#mapping-type...

EDIT: but the behaviour is documented in OrderedDict itself! https://docs.python.org/3/library/collections.html?highlight...

Interesting!


I wonder why OrderedDict isn't outright deprecated (or reimplemented as a trivial wrapper over dict).

You can get the first/last key of a 3.8 dict with dict.keys() and reversed(dict.keys()) which you can then delete to reimplement OrderedDict.popitem.

Deleting a key and reinserting it will let you reimplement move_to_end.

The only cool thing that OrderedDict's implementation might be useful for is to move to front (or anywhere else not the back) but it doesn't expose that api.


When OrderedDict is compared with another OrderedDict, it also checks whether the order of items is the same (this way also violating LSP, but that's another story), dict instances don't do that.


Thanks, I just learned something.


...thats the same thing. Maybe youre thinking about "sorted maps"?


It's not the same thing. Compare two OrderedDicts that weren't constructed in the same order and the comparison will fail. Compare two dicts that weren't constructed in the same order and the comparison will pass.

The difference is whether order is part of its identity. A dict is still just a set of pairs, not a list of them. It just happens that it also guarantees now that if you _iterate_ over the set you'll walk the keys in insertion order.


Good point, but close enough! My points about how it makes the data structure more awesome stands.

FWIW I don't think people agree on whether "element order is part of the value's identity" is part of the definition of "ordered map". There's plenty other languages & libs with ordered maps where the order isn't part of its identity but they still use the term "ordered map" in the name or the docs. Eg PHP's arrays or ImmutableJS's OrderedMap.


Could you explain some examples of what this is useful for? What sort of algorithms or operations do you have in mind where you both want insertion order, and also key-based lookup in the same data structure?

I've made heavy use of all kinds of maps, and of queues and channels and arrays, but I don't recall ever noticing a situation where I wanted the properties of both mixed into the same data structure.

I'd love to learn more about useful tools to add to my toolbox!


For "algorithms", I mean, some really do care about insertion order. Like idk, if you have a priority queue, then you generally want FIFO ordering when the priority is the same? It like a pretty obvious desire in most cases... imagine a thread scheduler or a packet scheduler or what have you.

But generally it's about determinism and avoiding loss of information, not just whether a particular algorithm needs it. For example, you'd want serialize(obj) and serialize(deserialize(serialize(obj))) to produce the same output, otherwise you e.g. might not be able to cache stuf. But for a data structure like a hashtable, it's pretty tough (not logically impossible, but rather pointlessly difficult) to make that happen without preserving insertion order.

As another example, it's incredibly handy for a user to see items in the order in which they were inserted. Like say you're parsing command-line arguments, and the command is ./foo --x=y --w. If the user sees {x: y, w: None} then that tells them w was passed after x. That can be extremely useful for debugging; e.g. maybe you expected the caller to specify w earlier, in a different context and for an entirely different reason. Seeing that it came afterward immediately tells you something is wrong. But when such information is lost it's harder to debug code.


If --w can be specified in two places for "an entirely different reason", then an unordered data structure is simply inappropriate, full stop. That's not a question of debugging, that's a question of correctness.


  console.log(object)

  {
    keys
    in
    headache
    less
    order
  }
I find this use case severely undervalued in this thread and have no idea why. It helps so much in logging and/or debugging.


I wrote a program that read in a proprietary data format, ran a few transformations, then outputted CSVs. Using an ordered map was super-useful, because it gave me deterministic output between runs. Otherwise I ended up with CSVs with differently ordered rows...


I think there is a lot of counters you might want both.

For example, supposed you have a list of ids accessing some system, you may want to count the raw number, but then also have some transformations thereof that are still ordered by count.

Example

  l = getListOfAccessIds()
  counts = Counter(l)
  total = np.sum(counts.values())
  fractions = {k:v/total for k,v in  counts.most_common()}

  #use the fact new dictionary is still ordered by most common
  plt.semilogy(fractions.values())
  plt.plot(np.cumsum(fractions.values()))

  #still works like dict
  print(fractions[id_of_interest])


Very few algorithms really care, rather, it's when you're composing them that maintaining insertion order avoids a lot of boilerplate in having to cart that information around.

For the same reason, though, it's potentially a bad idea. All these algorithms that generate dicts generally won't promise to maintain insertion order, they just happen to by chance. Then consumers come to depend on it and be surprised when it inevitably changes.


I use ordered maps to keep track of state snapshots/patches of redux/mobx-state-tree stores.

It allows me to both apply them sequentially as generated, or to "jump to" a particular point in time.


Say, a simple LRU cache:

* Inserting into the cache is just normal insertion.

* To delete excessive items, simply iterate to get the first few items' keys, then delete.

* To lookup, simply do a key-based lookup, then delete and re-insert.


Working with GraphQL.


Point of order: Ruby’s hashes do preserve insertion order. Python dicts are now the same.


Good for Ruby! Nice to hear.

I'm pretty sure they didn't in Ruby ~1.8 though. It's been awhile :-) Can't remember exactly which version, but it bit me more than once so I'm pretty sure :)


Yup, I think Ruby 1.9 or so made hashes ordered.


Awesome! Thanks for the heads-up both of you :-)


Ruby hashes are currently ordered and have been for quite a while now. Ruby 1.9 maybe?

But back in the day they indeed were not, for appropriate values of back in the day. :)

it did make a LOT of things more convenient and less buggy when they made ruby hashes ordered, I recall. I didn't expect it would matter much, by found myself loving it. I believe the ruby maintainers investigated and determined they could do it with very little performance hit. It turns out that having a repeatable and predictable order, that also matches insertion order, is what a lot of people end up assuming whether they realize it or not, just makes everything smoother when it is.


> There's so many cases where it's a benefit for map entries to retain order (and none where it's a problem)

Of course there’s a trade off. There has to be.

In this case it prioritizes usage patterns oriented around insertion and enumeration over usage patterns involving repeated removal Or modification of items.

as items are removed, the insertion-ordered index will either fragment, require removal operations to not be constant time, or be implemented using a data structure that has per-item storage overhead (such as a linked list).

also there are two possible, reasonable things for an ordered list to do when you write a new value to an existing key - leave the key in original insertion position, or move it to the end. Whichever one the implementation chooses, people who want the other will need to implement their own key ordering state alongside that which the collection is doing for them.


> and immediately messed it up by mixing ordered maps with arrays into a big soup, but hey, PHP

And here I thought Lua was the only language insane enough to do something like that.

Just realized Lua tables aren't ordered, but it still mixes arrays and maps / tables into a single data structure.


In JS's defense, it really tried as well. Fortunately though they failed harder at it, which is why arrays and objects are probably more different from one another than Eich had intended there in that mad 2 week language design frenzy. Good for us!

Back when JS got popular and the majority of web programmers knew PHP, there were dozens of tutorials out there trying to convince you not to do stuff like this:

    var a = new Array();
    a["moo"] = "five";
    a["foo"] = "six";
After all, well, it works! Like in PHP! So what's the problem? :-) Most people felt the same way about these tutorials as people feel about monad tutorials nowadays.

Good times!


Also don’t forget about JS Map which is ordered

https://developer.mozilla.org/en-US/docs/Web/JavaScript/Refe...



That's not about JS Map objects, that's about regular objects which don't keep their properties in insertion order.


You missed the point.



Am I the only one that thinks this is a stupid decision? This will silently break code that starts to rely on this behaviour that gets executed on Python3.5 and lower. I would consider changing how a builtin works to be a major breaking change. It would have been fine if this was a change between 2 and 3 but on a minor version? Thats insane.


I'm not clear on how this break existing code.

Code that assumed it was arbitrary, would expect to handle any arbitrary order, including a happens-to-be sorted order.

Code that assumed it was random, like actually inserted by random(), was already broken, because that simply isn't the case.

Code that assumed the order would stay constant was relying on implementation-specific behavior, and could potentially break on any version update; as with any reliance on implementation-specific behavior, you'd break if the dictionary code ever got touched -- even if it were for a bugfix.

Code that ordered the dictionary keys before iterating are now slightly innefficient due to extra work of sorting a sorted list.


It doesn’t break existing code. Code written for Python 3.7 might break on older versions of Python


What you describe is forward compatibility, and Python (and most other programming language) doesn't have it.


Usually languages don’t have it in a very explicit way though. If I try to use f-strings in python 3.5 I get a very explicit syntax error. If I rely on ordered insertions in Python3.5 I get a potentially difficult to diagnose bug.


Why does it matter if it's explicit or not. If python doesn't support forward compatibility, you should know that if you write code for 3.7, it's not gonna work in 3.5. Doesn't seem like a big deal to me.


If you’re the only one writing it and you’re the only one running it, it’s probably fine. But if I’m putting a file out there that will only work in 3.7, it’d be nice if any potential users of that file would get a good error message if they try to run it on 3.5, rather than wrong results.

I could potentially assert a version, but do I really want to do that each time I wrote something that might be used somewhere else?

And yes of course I could add a line of documentation, but there is a 100% chance I’d still get bug reports from people on 3.5.


Setuptools solves this, add a python version specifier to your setup.py or pyproject.toml file.

If you are just distributing raw python files then congratulations you’ve just realised why packaging is valuable.


If I rely on a python version and I expect other people to use it, I add a version if statement on top. I hate those packaging tools that insist on installing stuff in your system and create a frankendebian when really all I want to do is run a single py file standalone once. Often have to do chenanigans like "python3 -c 'from sometool import __app__'".

If you want to install it, go ahead and copy or symlink it in your ~/bin or whatever you fancy (that's your personal preference anyway unless I'd specifically package it for some OS like Debian). I don't want to have to use some setup.py that I have no clue where in my OS it installs things.


Yes, a failure to understand how your tools work or how to use them effectively does indeed make things harder.


Well I know how my tools work, I don't know how this custom file works that is duplicated and delivered with each project.


> I don't know how this custom file works that is duplicated and delivered with each project.

It's not duplicated and in most cases it's not even delivered as part of the installation.

> If you want to install it, go ahead and copy or symlink it in your ~/bin or whatever you fancy

That's exactly what pip will do if invoked with `--user`.

> I don't want to have to use some setup.py that I have no clue where in my OS it installs things.

It installs it to a single place. Run `python3 -m site` and look at `USER_BASE`.

To avoid a lot of this, use pipx[1] to keep things even more isolated.

> Often have to do chenanigans like "python3 -c 'from sometool import __app__'".

You're doing things wrong because you don't know the tooling. You'd also typically just do `python3 -m sometool`.

Things that are distributed as a single file are either so simplistic and have no other dependencies that you can just make do, or written by someone who doesn't know what they are doing and so you're going to have a bad time.

1. https://github.com/pipxproject/pipx


Most projects use setuptools for package management, which ensures that the environment is as expected.


If you write code that doesn't work in 3.5, you should check the version at startup and exit


Yes, that would be a good idea. But if you ever run scripts you didn’t write, there’s the potential people didn’t do this, and you have the potential for hard to discover bugs. The language should be designed such that bugs are difficult to encounter, this is an instance where it wasn’t.


#!/usr/bin/env python3.7


so now you have to install a specific python version for your script to work?


Only if you're using version specific features.


What happens when python 3.8 comes out? Everybody needs to go into your script to change the hashbang every time a new release comes?


you can just hashbang to python3 which is a symlink to whichever python3.X the user has.

If your code won’t work for older versions you can make an explicit check that the version of python is greater than whatever you need.


Major version changes are for API compatibility. If there is a change which makes all other 3.* incompatible, then it should be a major version increase.


You're describing something like semver, but Python doesn't do semver.


And this change would be fine even if Python did.


Who told you that? I'm looking at PEP 606, it says nothing like what you claim. https://www.python.org/dev/peps/pep-0606/


ECMAScript recently got stable array sorting. That can cause precisely the same kind of backwards-incompatible but difficult to diagnose bug.


TIL

I suppose Python doesn't even have backwards compatibility within the same major release as we saw with the addition of the async keyword in Python3.5. Many older Python 3 packages broke because they expected that to be a legal identifier for a variable name.


tensorflow didn't work on 3.7 for a solid 8 months because some people at google very unwisely decided that `async` and `await` were great choices for variable names, despite PEP492 landing in 2015.


that's because tensorflow is advertisement for Google and while it's technically open-source, it doesn't stand for any kind of community-project, it's all there to show off (and ingrain in its users) the way Google wants things to Go (just look at the byzantine Bazel-build-processes - tensorflow taking hours to build and pytorch about 10minutes...).


my torch builds also take hours.

facebook is just as capable of writing hot garbage, sadly.


Yeah, still it's an order of magnitudes faster...


Python has never tried to maintain backwards compatibility within a major release


You're right, I read the gp too quickly.

But in the case of downgrading, I'm fairly sure there's a number of other breaking changes that can't trivially downgrade minor versions. Like f-strings were only introduced in python3.6 as I recall. Async keyword only exists as of 3.4 as well I think?


I think the argument is that if you run code with f-strings and walruses on Python 3.5, the code will break noisily. Whereas if your code implicitly relies on ordered dicts, it could break silently. Syntax errors rarely cause subtle, hard to track bugs.


Introducing things is different than changing things.


Sure, but you can't safely take everything from a higher version to a lower version in any case; if insertion order became gauranteed due to a bugfix, and wasn't backported, you'd be in the same boat.

The only way to consistently code cross-version is to start with the lowest you plan to support (assuming the higher versions are actually backwards-compatible).

Does any language gaurantee that code is both backwards and forwards compatible?


Issue seems to be silent incorrect behavior, what happens if you attempt to run python code containing f-strings using an older python version. Does it raise an exception? That's good! What happens now if you write code for 3.7 which takes advantage of the new ordering and someone grabs it from your repo and runs it using 3.2, it would happily give incorrect results and noone is the wiser.


If you expect this situation you can assert the language version.


But the whole point is that some developer won’t expect that someone would run their code on an older Python, isn’t it?


Both of these would be syntax errors if you tried to execute them in earlier python versions. This change might break software completely silently.


If you know you're supporting old code, use OrderedDict.

you arguably ought to anyway, for explicitness.


OrderedDict is slow and expensive though: it maintains ordering through a doubly linked list.

It has useful features for manipulating ordering but while I've regularly needed had use for maintaining insertion ordering I can't remember ever needing to move items around within a map.


If memory serves me correctly, ever since dicts became ordered the OrderedDict simply became a subclass of dict, so it will have exactly the same performance characteristics.


OrderedDict was always a subclass of dict maintaining additional information (which is not free, it has to store and manipulate two pointers per entry).

It remains so today, ordereddict is not an alias or trivial facade to the normal dict because it has to maintain its doubly linked list and implement a bunch of additional operations based on that e.g. pop first or move to end.


Yup.

    >>> isinstance(OrderedDict(), dict)
    True


That was already the case in python 2.7 or 3.1.


The trouble is I publish a (new) code that advertises itself as working on 3.x and then it turns out it is being used by a person who only had the version prior to this change.

That said, Go made a similar change (from insertion-order to explicitly-randomized) and world didn’t end. So there’s that.


If your code relies on a minimum python version, you can add `python_requires=">=3.5"` to your setup.py [https://packaging.python.org/guides/distributing-packages-us...] to ensure it's not installed on older releases.

That field itself is kinda new; but if needing to block users with older versions, that shouldn't be an issue.


personally, i just drop a f-string in setup.py and that'll filter out any python for which this issue pertains.


This might not work if you’re distributing a wheel.


Python 3.4 is EOL anyway so there's no need to do this. Anybody running 3.4 is already unsupported.


and 3.5 dies in september. hurray!


I thought Go made the change from undefined behaviour with an underlying implementation that was insertion order in a map with 8 or fewer entries, to similarly undefined behaviour with an implementation that randomised lookups in those cases. Any code that has ever relied on any kind of ordering in Go maps will almost certainly be wrong, even random ordering, because the distribution of the "random" ordering is biased.

See https://medium.com/i0exception/map-iteration-in-go-275abb76f...


FWIW, when Go made that change, it was a much less-widely-used language (smaller blast radius).


Yeah I think this is probably my main issue. I don't think it's reasonable to ask users of your code to always use 3.7+ instead of 3.6 if they are usually expected to be compatible. And it's also unnecessary to break such compatibility for something like preferring dict over OrderedDict anyways. At least I would try to avoid any such issues by still using OrderedDict.

That said, I have no idea about the internals of dict. I assume no performance was sacrificed for this change.


It actually improves performance. Or at least, it comes along with a set of performance improvements that give you ordering for free. Raymond Hettinger has a great talk on it: https://www.youtube.com/watch?v=npw4s1QTmPg&t=1s


Using OrderedDict is actually nice in this case, even if the default dict has the same ordering. That way you're explicitly saying you rely on that behaviour and it makes reading the code easier.


right, so making it implicit is bad design


It's part of the zen of Python: Explicit is better than implicit.


Python is nothing like its guiding principles. That's why it's Zen -- the principles are a collection of contradictory statements, given what you will encounter in real world Python. You're meant to be confused and meditate on it.


Well... full support for 3.6 ended in December of 2018 (now it only has security fixes), the older versions are already unsupported.

Also, this change was implemented 3.6, but in 3.7 they officially documented it as a language feature (i.e. that all other Python implementations also need to preserve the order).


Furthermore, there's a lot more stuff that are not backward compatible after 3.6 or 3.7, and if you're writing a library that targets other versions, I would hope that you have tests for all said versions.


> Code written for Python 3.7 might break on older versions of Python

That's a truism. For all versions of Python. If you use feature of python ver X, you should not be surprised that it doesn't run on versions less than X that lack that feature!!!

If you write a Python library and use feature of Python ver X and don't mark library as only >= Python ver X, you are doing it wrong and a horrible person.


What versions of Python didn't have this behaviour? It was there but just not guaranteed.

I don't see how anything could break (unless there are alternative implementations with different behaviour I'm not aware of?)


> It was there but just not guaranteed.

Dicts have been effectively ordered since 3.6. Iteration order was literally randomised (at process start) before 3.6. I'm also not sure whether the behaviour under deletion was changed between 3.6 and 3.7 so it's possible that there are subtle differences there.


CPython is mot the only Python. Portability is an issue also.


This ja why it is now a language feature. Every python implementation that claims python 3.7 compatability must implement this.

The other why around is also true, code that relies on it must specify that it requires python >= 3.7


Portability isn't affected; if they claim compatibility with python3.7, then they claim their dicts have insertion-ordered keys.

If they claim compatibility with only up to python3.6, they can have whatever order they choose.

The only issue with portability is that I think the main reason it was made a gaurantee is that cpython found the new, presumably optimized, implementation came with insertion order for free, so they went ahead and gauranteed it. But that might not be an optimal strategy in other areas, but they're forced to follow along anyways.

But actually moving cpython code to say ironpython should not be impacted, unless ironpython lies about it's compatibility


That also goes both way: Pypy defaulted to ordered dicts a few years before cpython did.


The insertion order dict implementation actually comes from pypy


The insertion order dict implementation comes from Raymond Hettinger who is amongst other things a core CPython developer. pypy pulled the trigger on using it first (and probably has optimisations CPython doesn't). PHP also used it before CPython did, IIRC. And possibly Rust (as the third-party indexmap).


Do you mean to say the future should be constrained by the past?

I get the whole principal of least surprise, but not at the expense of progress.


> Code that ordered the dictionary keys before iterating are now slightly innefficient due to extra work of sorting a sorted list.

Dicts are ordered, specifically insertion-ordered, not sorted.

Sometimes they'll be sorted:

  D = {i:i for i in range(10)}
... specifically, when you insert them in sorted order. But then you can break the sortedness on your next insert:

  D[-1] = -1
What it allows is parallel iteration:

  zip(D.keys(), D.values())
now being synonymous with

  D.items()
This enables, nay, encourages people to write code that is very subtly broken in 3.5 and below.


The property you mention at the end has been true of dicts since long before 3.5, at least since 2.7 if not forever. See https://docs.python.org/2/library/stdtypes.html#dict.items :

> If items(), keys(), values(), iteritems(), iterkeys(), and itervalues() are called with no intervening modifications to the dictionary, the lists will directly correspond. This allows the creation of (value, key) pairs using zip(): pairs = zip(d.values(), d.keys()).


Sorry, yes, I was hasty. Ordered dicts supports such a thing with intervening modifications:

  oldkeys = list(D.keys())
  D[z] = foo(z)
now if you take

  zip(oldkeys, D.values())
then you're guaranteed to iterate over oldkeys, with the proper values associated with those keys -- if z was an oldkey, its value got updated; otherwise, it comes after oldkeys and gets dropped out of zip.

The subtlety of this is what I, and perhaps others, find the most jarring.


Dicts are ordered in certain cases in python 2, and it was relied upon by some users (I saw it in the wild).

Baking this as a language guarantee is the only protection against Hyrum's Law.


This won't break existing code. It will break new, good code that gets back-ported from eg. a 3.7 tutorial to a 3.5 environment, without any syntax errors.

> Changed in version 3.7: LIFO order is now guaranteed. In prior versions, `popitem()` would return an arbitrary key/value pair. [1]

If they added a new `popordereditem()` method, good 3.7 code would use that, and an attempt to run that code on 3.5 would throw a reasonable, useful error message. If they wanted to play it safe like Rust or Go, they'd add the ordered method and make popitem() deprecated or make it artificially use random/arbitrary order so you can't accidentally depend on a new implementation detail and have a test case work.

Also, your case 4 does deserve some protection because while bad code it's hard to test for bad but working code. Implementation-specific or undefined behavior that works is the worst kind of problem, because it's hardest to test against. Compile-time syntax errors are the easiest, runtime errors are testable with a solid test harness, some possible can be identified as warnings with linters, but sloppy code requires manual inspection to detect.

Actually, no, I take that back: Implementation-specific or undefined behavior that works sometimes! is the worst kind of problem. That's what this code enables; if your test case is `dict({'one': True, 'two': True})` and you have a unit test where you popitem() to assert that you get 'two' and then 'one' it will pass the test harness on 3.7, it will pass on 3.6 because of implementation-specific behavior, and it will pass on 3.5 because the hash map arbitrarily puts those particular items in order. But it will silently get it wrong when you pass in user-supplied data. Shudder.

[1]: https://docs.python.org/3/library/stdtypes.html#dict.popitem


> Code that ordered the dictionary keys before iterating are now slightly innefficient due to extra work of sorting a sorted list.

I think TimSort is extremely efficient for presorted lists, so even that isn't a major impediment.


I'm of two minds. In principle, no, I don't love the idea of a change like this happening on a minor version.

But, from a pragmatic angle, it strikes me as a genius move. As the article said, this was a natural by-product of a performance enhancement that was made in 3.6. In principle, that was fine, because there was officially no predictable ordering, so a change to how it was being ordered in practice shouldn't materially affect anyone. And nobody really paid much notice to it then.

And that's where the danger lies - there's risk there, because, as you said, someone who's used to Python 3.6 might have problems switching to 3.5. And that's probably a greater risk if they left it out of the spec, because people would have continued to not pay it much notice. By making it official in 3.7, though, they've sent out a message that the Internet hate machine has ensured that everyone will hear. That probably, in practice, actually reduces the risk.

And, while many understandably see this as violating the spirit of semver, it unambiguously does not break the letter: Minor changes are only supposed to avoid breaking the software's compatibility with code written against an old version. There was never any rule saying that old versions have to be compatible with code written against the new version.


Python (the language) doesn't follow semver anyway.


Go purposely "randomizes" order of a map when it's iterated over, which you can see running this demo:

https://play.golang.org/p/DISpyv0Zuq_j

HN discussed this a little 6 years ago: https://news.ycombinator.com/item?id=7655948

As crawshaw pointed in that discussion, it helps catch people inadvertently relying on map order in tests or other places in their code.


More important (in my opinion), adding a random seed to your hash prevents collision attacks, where an attacker tries to ruin your performance by sending many values (e.g. HTTP params) that would hash to the same bucket.


Many (most?) programming language runtimes now do this, especially the web-focused ones. But it's completely orthogonal to insertion-ordered iteration (which is implemented by a list).


That surprised me, when I first saw it. If someone takes a minute to read the docs (e.g. Python), one would never build something knowing that order is not guaranteed (e.g. order in dictionaries). But seemingly people do.


Had some code break while porting from 2 to 3 because it assumed dict order and it was part of the official Python build system, the old spark parser for the AST module.


My main issue with it is that if someone figures out an even better way to do dictionaries but it doesn't preserve insertion order, then that technique can't be used. But I certainly wouldn't call it a stupid decision, the problem is that once a behaviour is observable then people absolutely will start relying on it.


All they have to do is give it a different name, such as unordered_dict.


I might have preferred odict be added and dict left alone, so that you didn't have to import OrderedDict when really you just want the new dict. If dict and odict mapped to the same thing for a while, fine, but dict could diverge again if desired.


The problem is that when you behave a certain predictable and deterministic way users will leverage that even if it's not specified.

That is actually why the Python maintainers decided to make this behaviour official: the ordering was the consequence of changes in implementation details, but over 3.6's lifecycle they feared it would cause compatibility issues as users would start relying on the ordering properties of CPython and that would be an issue for alternate implementations (though pypy had switched to the same implementation even earlier so was also insertion-ordered even when running in Python 2.7 mode).

Providing stronger guarantees was considered useful and unlikely to be severely detrimental in the long run, so the project decided to make it part of the language spec.


"users will leverage that even if it's not specified" -- reminds me of the classic xkcd, "Workflow": https://xkcd.com/1172


An xkcd in the thousands is now considered "classic"? Man I'm getting old.


Which means no one gets it for free when it is added to the language.


You could add an ordered dict by a different name and also have dict be the more efficient ordered version so that people get the efficiency gain for free but don't have to worry and forward compatibility.


I think this is a key point. Since they advertised this, it became a commitment for all future versions. Sure they can break again in 4.0


Maintaining insertion order is strictly a product of adding an additional list to whatever more efficient map you implement. Add something to a dict, push it onto the end of the list. When iterating over the keys, use the list instead of the dict structure.

It does come at a cost, but I think Python aims for ease of use over runtime speed and memory efficiency, so it seems perfect for them.


> I would consider changing how a builtin works to be a major breaking change.

This one is unusual because it won't break old code being brought forward, only the other way around, and theoretically there are lots of things going the other direction that would break (though most of them are explicit, not silent).


In other words, it breaks backwards compatibility, but not forward compatibility.


That's not what people call backward compatibility.

Backward compatibility is the ability to run old code with new interpreters. This is not broken here.

What's broken is the ability to run new code on old interpreters. But this is already broken at every python update (new operators, methods, syntactic sugar..). We could call it reverse backward compatibility.


I suppose the meaning of the term depends on what we consider to be the subject of the compatibility.

- backwards compatible code: new code can run on an old interpreter

- backwards compatible interpreter: old code can run on a new interpreter

EDIT: After some thought, you're right. The second description is the reasonable interpretation.

"python 3.7 is backwards compatible with python 3.6"


This confusion reminds me of a scene from the 2002 film of Dave Barry's "Big Trouble".

(Driving to an airport) "Okay, we gotta pick a road. Arrivals or departures? We're arriving, but then we're departing."

https://en.wikiquote.org/wiki/Big_Trouble

https://www.youtube.com/watch?v=EjHHzZAUxbM&t=300


No, this change is backward compatible but not forward compatible. Old code (written for old interpreter) works in new interpreter; but new code (written for new interpreter) is broken in old interpreter.


That is an original line of thinking in the world of software, “Let’s not add a feature to a new version because it wouldn’t work if used in an older version”.


It would silently introduce bugs, that's the whole issue.


Right. In fact, if it explicitly broke something there'd be less of an issue. Instead, it will appear to work/interpret but you'll get unreliable results. That makes the bug more nebulous.


I think you are giving it bigger weight than it deserves. Writing Python code I would not want to depend on internal order of dict elements without a very good reason. If I am writing some low level code that could benefit from it, maybe I would use this feature, but I would then encapsulate it and insert a version check to complain.

This is like many evaluation guarantees in latest C++-NN standards. Good to know, might be useful for performance tuning of automatic code generators, etc. , but irrelevant most of the time. My 2c.


You can't walk Python code backwards anyway without already considering the minor version differences. eg. f'strings'.


At least f"strings" are an explicit failure case and are obvious when you run unit tests / coverage


Not only f strings, but any change really.. New operators, new methods on standard library objects... Python never guaranteed compatibility of new code with old interpreters.


If you rely on this feature, surely you know it’s new to 3.7 and document your project accordingly...? It’s not like code magically appears, somebody has to write it.


3.5 is EOL soon, and it's already uncommon to write code using a new interpreter that targets an older interpreter - there's more than just ordered dicts that you'll be missing if you do so.

People will just do what they've always done - they'll be aware of the differences, or they'll test.

There's nowhere I have seen that mentions dicts are ordered without mentioning in the same paragraph the version since which this has been the case, so anyone who knows they're ordered will be aware.

Tldr its fine


It's not breaking, it was implementation defined, now it's guaranteed order.

If a new guarantee or behaviour were a breaking change, every non-patch release of a semversioned tool (which python isn't) would be major.


> It's not breaking, it was implementation defined

And randomised at interpreter start, since Python 3.3.


> It's not breaking

And that's sort of the problem. If it was broken, you'd know it and you'd fix/rearchitect it. Instead, it will appear to work.


Sort of. You'd only know based on the interpreter you were using. You wouldn't know unless you tested other Python interpreters.

For example, if you wrote something in pypy it would be ordered versus cpython.


Code that assumes no order cannot be broken by a specific order unless it was setup to assume everything but that specific order in which case it was already broken, but only randomly showing it’s broken nature.


Actualy, dict are already ordered since a long time(since python 3 I think), but it wasn't certified. It was just the result of a new implementation of dict structure.

I can't imagine how it can break something. In which case can you have an advantage to have an unordered list? Biggest downside can be about perf, but I don't think it's the case here.


python 3.6 is when they reworked the dictionary and had the side effect of being ordered.


For anyone who came to Unit Testing with the XP era and happened to use Java, Java 5 introduced changes to hash table ordering that somehow resulted in Junit 3.0 running some tests in a different order.

It's important to discover coupling between your tests, this just isn't the way most people want to do it.


I don't see exactly how one would rely on dicts not being sorted, as it was already answered by another comments.

And beyond this, even if maintaining backward compatibility is important, I don't think it should prevent software from being improved, and dicts being ordered is pretty awesome. I'm pretty sure relying on unsorted dicts might involve pretty hacky code.

I recently stopped writing a z-order test application because I remembered that dicts are not sorted, so it made my goal a little pointless. Meanwhile C++'s std::map is sorted. I think there was already sorted dict in python, but I don't remember.

EDIT: I'm wrong, I'm mixing "ordered" and "sorted"


I agree, but more from the point of view on what if a new unordered implement if discovered 10 years from now that’s better? Would a new type have to be added?


I think you bring up a good point. If someone hands me a python script and told me it was 3.7 I would only hear 3 and expect it to run or fail, not to run with differing outputs!

Dot-upgrades do allow for breaking changes, but this one can break silently and I’m not sure where such breakages neither belongs in python nor where they should belong.

Personally I’m happy to get ordered ducts sooner rather than later.


I think if you are a library developer targeting different versions of Python, you generally know what features you can and can't use from new versions of Python (ie not many), and ideally you are also testing with each version in CI.


No, I agree, it's stupid. To me it feels like a kind of conceptual backwash from JS. The whole P3 fiasco looks really silly from far enough away.

It's like, you have a great marriage with everything working out and some fine children who are doing great too, but you get a divorce anyway to pursue your dreams of being a conceptual artist.

(BTW, in case anyone is keeping track, I was going to maintain P2 but got caught up in Prolog, of all things, and now Python looks just as crude as everything else and I don't have the requisite love anymore to shoulder the work. I might make a Python 2 interpreter in Prolog though! That would be fun.)


By that logic, any change would be considered "breaking" because people will expect that the change also exists in older versions.


This was my initial thought as well. Seems like people are celebrating, but this is going to be a source of difficult to figure out bugs.


PHP has had this undocumented feature since forever and so did all mainstream Javascript engines.

Big whoop


> Javascript

Javascript doesn't specify performance characteristics of objects and arrays, so even with implementations, one object could be a hashtable, another a balanced tree.


The implementation doesn't matter to EGreg's point, which is that ES standardized that iterating over object keys are generally required to return keys in insertion order.

In the past (from before this standardization) Chrome had in fact changed the object iteration order due to an optimization, and had to revert it after lots of complaining on their bug tracker.

(To be precise, the spec still requires array index properties to be returned ahead of other properties regardless of insertion order. The behavior that Chrome "reverted" to is this new one, so not exactly the same as the original behavior.)


This is an amazing contribution to the language. A mixture of speed and convenience, probably made by volunteers. As for people criticizing a change to what use to be a non-deterministic ordering of a dict iteration; I don't know what to say to them, other than, are you serious? There are people out there who are working for us, they work for free and they did some heavy lifting to give us this. They might read what you wrote and think, "Why bother? Maybe I should spend my weekends playing with my kids instead."


> As for people criticizing a change to what use to be a non-deterministic ordering of a dict iteration; I don't know what to say to them, other than, are you serious? There are people out there who are working for us, they work for free and they did some heavy lifting to give us this. They might read what you wrote and think, "Why bother? Maybe I should spend my weekends playing with my kids instead."

I don't agree with the complaints about ordering ducts, but this is a ludicrous response. Appreciation of volunteers' work on Python isn't diminished by criticism of language features or changes. In fact, it enhances it: if people have opinions about the project you work on, it's a good sign that it's important, significant work. I've contributed to OSS projects before, and the ones in use by more than just my friends and I were naturally the ones that felt the most meaningful.


As it’s widely known, more often than not “criticism” of open source software quickly devolves into hate and toxicity. Helpful criticism is great but be careful with defending the “criticism culture” around foss, it’s often angry unhappy people that want it all for free on a golden platter, and yesterday.


Sure, hate and toxicity themselves should be called out. But that's not what's happening here, and not what GP comment was referring to. It doesn't make any sense to throw out the baby of legitimate criticism with the bathwater of toxicity.


That does not matter, nothing is above criticism.


It does matter. If people get hate for doing work on open source they will stop doing work on open source. Yes you can be critical of everything, but the hate that some people spew toward certain open projects is not only nauseating but actively damaging to the community.


As far as I know, it was implemented by Raymond Hettinger. This is a very interesting talk about the new tech:

https://www.youtube.com/watch?v=p33CVV29OG8&t=1202s


Indeed, here is the mail from Raymond Hettinger explaining the new design on the python-dev list: https://mail.python.org/pipermail/python-dev/2012-December/1... Quite amazing that it is faster, more memory efficient and convenient for the end user.


Raymond came up with the idea, PyPy implemented it, and then INADA Naoki implemented it for CPython.


If I remember correctly his talk. He went to CPython first and it was rejected, but PyPy welcomed it.


It was proposed by Raymond Hettinger, but was actually first implemented by PHP and Pypy.


Other languages don't generally have special order guarantees about standard maps. This seems very idiosyncratic.


For almost all intents and purposes, object keys have been ordered in JavaScript since ES2015. Map has always been ordered.

https://www.stefanjudis.com/today-i-learned/property-order-i...


Users of JSON, probably the most common data interchange format on the planet, frequently have implicit requirements about key ordering. It is highly convenient to be able to parse a JSON string into a native Python data structure, add a field, emit it back, and preserve the ordering.


in what reasonable use case would the order of the properties on an object matter? I can't think of one


when you are diffing the serialized output?


From what I recall of the JSON standard itself, there's no guarantee about key ordering being significant. If you're diffing serialized output to compare two JSON objects you need to be serializing it in a consistent format, otherwise even whitespace is going to throw you off.


It's significant to a human that wants to know what has changed.


If a human is inspecting serialized JSON using pen and paper, the human is presumably clever enough to match up key for key regardless of ordering.

If the human is using a computer to compare two JSON payloads (as the use of a diffing algorithm suggests), the human and computer should be clever enough as a team to realize that they could just deserialize and reserialize each JSON payload such that the keys were lexicographically sorted and the data was pretty-printed in the exact same way before running it through the diffing algorithm. `jq -Sc` would do the trick.


Most diff programs don't have what you describe. And in a lot of cases you don't have the easy ability to "do stuff" before running the input through a diffing algorithm.


> Most diff programs don't have what you describe.

That’s why pipes were invented.

> And in a lot of cases you don't have the easy ability to "do stuff" before running the input through a diffing algorithm.

Well, in that case you’re not going to be able to meaningfully compare two JSON payloads because neither order nor white space have any semantic meaning in JSON. I’m really curious what you’re talking about though, since if you’re working on the shell you can easily use jq to do as I describe.


So you can sort the keys at serialization time rather than paying a performance penalty all the time?


I tracked down a bug once where someone was hashing state that happened to include Python dicts serialized to JSON, which caused different hash values for the same entity. I know, I know, insert tears of blood emoji, and any engineer worth his salt wouldn't be so sloppy. But you don't always get to choose to work only with top talent that understands why you should design things elegantly as a baseline[1]. In these cases, removing implicit gotchas like "the same code has the same behavior" (ie iteration over the same dict) is valuable.

[1] Well you _can_ choose to do so, and I recently have, but it's not without its costs.


I can't speak for all "other" languages but Ruby made the change years ago from 1.8 to 1.9. Ruby calls them hashes but they're standard maps. Obviously Ruby and Python are very similar but I think the point stands.


Golang map iteration is returned in random order specifically to make sure that people don't rely on the order.

I think this feature says a lot about the philosophy of Python vs Go.


Sounds like a fast and idiomatic way to shuffle a deck of cards is then to convert to a map and back.


Convert a deck of cards ([]Card?) to a map (map[Card]bool?) and back just to shuffle? That's unlikely to be faster or more idiomatic than a straightforward implementation of the Fisher-Yates shuffle[1]. Try writing the code to do it both ways and compare.

[1]: https://en.wikipedia.org/wiki/Fisher%E2%80%93Yates_shuffle


I think "idiomatic" would be using rand.Perm, which implements the Fisher%E2%80%93Yates shuffle. But aside from whether it's idiomatic, converting to a map isn't random enough.


With Golang 1.11, I get orderings like this. S is spades, H is hearts, D is diamonds, and C is clubs, because HN eats the Unicode characters I was actually using.

     S9  SJ  SK  HQ  D9  S4  S5 S10  HJ  S8  H6  DA  D2
     D4  DQ  C6  C8  CJ  H3  H9  DK  C3  CQ  SA  S2  S3
     S6  SQ  H4 H10  D8 D10  C4  CK  S7  H7  D5  D6  D7
     C7  H2  HK  D3  DJ  C2  C5  C9  HA  H5  H8  CA C10
On average you would expect about one card in a shuffled deck to be (cyclically) followed in the deck by its succeeding card, and on about one out of 50 shuffles, that card would be followed by its successor. Here we have S4 S5, DA D2, SA S2 S3, and D5 D6 D7.

This is very strong evidence of bias.

I'm interested to hear if my Golang can be made more idiomatic, or if there are bugs in it: http://canonical.org/~kragen/sw/dev3/mapshuffle.go

In particular it seems like there ought to be a less verbose way to express the equivalent of the Python list({card: True for card in deck}) in Golang.


No, it isn't random enough for that.


IIRC it used to just start iteration at a pseudorandom index and then iterate normally. I looked at it couple years ago, don't know if they changed it.


It seems to do that, but I think it also tweaks the hash function for each newly created map, because in the code linked from https://news.ycombinator.com/item?id=22278753 I'm not getting rotations of the same iteration order when I generate two maps. There doesn't seem to be any randomness in mapiternext() itself.

You could imagine that the hash function itself might do an adequate job of randomizing the order of the cards, though, especially if salted with a per-map salt. SipHash, for example, would probably not have any detectable biases in the distribution of the permutations thus produced. But whatever hash function Golang is using for my structs has an easily visible bias, as described in that comment.


What does it say? That Python is willing to spend more computation time for simplicity, without waiting for the programmer to ask for it? That Python will make semantic changes in minor version bumps of the language?


The new-ish ordered dictionary is actually faster. It is also completely backward compatible for any but the most contrived example.

Now wasting computation on randomizing... that is a problem.


The ordering property is a side effect of the new and more cpu/memory efficient data structure. It would be very surprising to say no to a 2x performance jump to avoid the (sometimes useful) ordering.


Tcl's dicts are ordered.


Java added LinkedHashMap a while ago. I tend to default to it except for small temporary use cases where order absolutely won't matter or have an outside impact...


LinkedHashMap is not Java's "standard maps" though. If you tend to use it by default it's really a personal idiosyncrasy, especially as LLHM tend to be larger and slower than regular hashmaps due to having to maintain a doubly linked list.

Python has had one such in the standard library for a decade or so.


LHM is not slower for iteration (it's faster actually).

LHM indeed pays 2 references per node but they are well worth as it has deterministic ordering/iteration and I have witnessed numerous cases with HashMap that show up in production only due to iteration order (esp after rehashing). The code is broken but the cases did not reproduce during testing...

Now if the two extra references are an issue, consider that HashMap is quite space inefficient with having a dedicated node per each entry - that's the main price. The (simple) node memory footprint is 36bytes on heaps less than 32GB, i.e. compact pointers. The extra references add another 8 bytes for having an insert or access order.

If the goal is getting a compact low memory footprint, HashMap is not fit for purpose. Overall it's the jack of all trades and even got reworked (java8) to support tri-based collision resolution for keys that implement Comparable.

Couple years back I wrote CompactHashMap (under CC0) that has an average cost of 10bytes per entry and it's extremely compact for small sizes with having only 2 fields on its right own, so even small/empty maps are tiny. In microbenchmarks (same used in openjdk) it handily (2x) beats java.util.HashMap on "Traverse key or value", get/put have similar performance, and "Search Absent" is worse.

The point is: LHM should be the go-to hashmap for java as being node based hashmap is justified (unlike java.util.HashMap)


Yep. Also, LinkedHashMap maintains ordering based on insertion order. So, iff you insert your data ordered how you want it, it's behaving as an ordered map. If you want true, self-reordering map, what you want is a TreeMap. Beauty with that one is that you have full control over defining the custom ordering function, because we're not always indexing a map by primitives or autoboxed type. I try to use it sparingly though as you're paying log(n) on pretty much all map operations.

While we're here, another tidbit that's often overlooked in the Java collections: If you reallly care about iteration performance, your data is without nulls, your data is already ordered how you like it or you don't care about ordering, your qty items >= 10, and you don't need random access, then ArrayDeque is gonna be your horse because of how much better it co-locates its contents in memory and how much less overhead is required to maintain it during each operation compared to all the other List implementations, including ArrayList and LinkedList.


In regards to sibling replies, does it feel to anyone else like everything (except golang) is converging towards PHP's array() ?


Interestingly enough here it was kinda but kinda not the other way around: historically PHP used a closed-addressing hash map and threaded a doubly linked list through it to maintain the insertion order.

But the dict TFA talks about doesn't use a doubly linked list, or closed addressing, its ordering is a side-effect of its implementation but not originally a core goal (memory saving and iteration speed were). It'd probably been proposed by others before but it came to wider attention after Raymond Hettinger (a core dev) proposed it a few years back[0]. PHP actually released it first[1], closely followed by pypy[2]. CPython only got around to it some time later[3]

[0] https://mail.python.org/pipermail/python-dev/2012-December/1...

[1] https://nikic.github.io/2014/12/22/PHPs-new-hashtable-implem...

[2] https://morepypy.blogspot.com/2015/01/faster-more-memory-eff...

[3] https://docs.python.org/3/whatsnew/3.6.html?highlight=3.6#wh...


JavaScript Maps are iterated over in insertion order.


std::map in C++ stores keys in order. You have to use std::unordered_map to not get that behavior.


Yes, but that’s different kind of order. Python dicts order is insertion order, while std::map is key order.


People who bother to complain are those who actually care about your thing. People who do not care simply leave without ever telling you why. Your complainers are often your most dedicated and invested users.


Though in this specific case, my sense has been that your complainers are often your most dedicated and invested users of a version of the language that was officially retired 38 days ago.

Python is just bizarrely political. I suspect that it's now cursed for all time to have every future PEP become a battle in a proxy war over PEP 3000.


Sometimes they're just concern trolls though.


Especially with the (lack of) change to sets I'm interested to benchmark some regular things before and after the change.

    set(dict.keys())
    set(dict.values())
    thing1 = dict(...)
    thing2 = copy.deepcopy(thing1)
I feel like there could be some other testcases. I'm wholly in support of the change (regardless of benchmarks) but depending on the results I could see some arguments against.


1. sets don't use the new dict's implementation

2. IIRC the new dicts are no(t significantly) slower than the old dicts, however they use less memory, and iterate faster

The iteration order is actually a side-effect of implementation details, the original goals were a more compact representation and a faster iteration: https://mail.python.org/pipermail/python-dev/2012-December/1...


> 1. sets don't use the new dict's implementation

Yes, of course. That's why I'm wondering if turning dict keys into a set is slower now.


I was under the impression python dict ordering was deterministic, just not in an order recognisable by humans, i.e. ordered by hashes. Was this not the case?


From the docs: "CPython implementation detail: Keys and values are listed in an arbitrary order which is non-random, varies across Python implementations, and depends on the dictionary’s history of insertions and deletions."

So I would claim that equates to non-deterministic


I read that as: if two dicts are constructed in the same way and use the same python implementation then they'd be the same. If you change the way the dicts were generated or the underlying implementation then they're not the same.

To me, nondeterministic is when the input does not change but the output does. The docs essentially say to not rely on the order, not that the result is not reproducible.


CPython hashes are intentionally non-deterministic, so even the same exact Python install will produce different hash orderings on the same code, if you run it twice (although you can make it deterministic by using PYTHONHASHSEED).


I’d call undependable but maybe it’s the same thing in practice?


That's generally what people mean when they say "nondeterministic" in the context of computing. Yeah, in philosophy it generally means something like "the future is not completely determined by the past," but in computing it means something closer to "the programmer cannot reasonably determine the behavior and thus should not depend on specific behavior."


In computing it means a given set of inputs lead to a given set of outputs. It has nothing to do with how difficult it is for a programmer to reason about. Deterministic builds, deterministic tests, etc.


But what counts as "input" will vary based on who you ask and under what context.


> Was this not the case?

From the origin to Python 3.2, iteration order was arbitrary but deterministic (in CPython, it was not guaranteed to be determinstic in other implementations)

From Python 3.3 to Python 3.5, iteration order was non-determinstic (across runs, it was deterministic per-process instance)

In Python 3.6 it's unspecified but deterministic and follows insertion order (in CPython)

In Python 3.7, Python 3.6's behaviour was specified and documented


I don't have any criticism if the performance of this new dict stays the same as earlier and that's a huge if. Considering python3 is already a crawling language compared to peers like java and php, I think devs should focus more on that before handing out features.


> I don't have any criticism if the performance of this new dict stays the same as earlier and that's a huge if.

1. it's been there since 2016, the new dict was released as part of Python 3.6 (though the ordering was only an implementation detail until 3.7)

2. the new dict is much more compact (20~25%) and should have much faster iteration speed, those were actually the original goals, ordered iteration was a side-effect

3. the new dict should not benchmark significantly slower across the board, it wouldn't have been merged if it were a regression


Great decision IMO. I remember maintaining a bunch of Python code that we supported on both OSX and Windows. Out of all the platform-specific bugs we had (where it worked on one OS and not the other), one of the most common causes was code that relied on a certain key order. And we knew that relying on key order was bad, we're supposed to use things like OrderedDict, blah blah blah. It was still a really easy mistake to make, just like it's really easy to have memory safety bugs in C.

At some point, if a human error is common enough, it makes pragmatic sense to change the design so the error is impossible.


One particular annoyance of OrderedDict objects was initialising them with literals. You couldn't just do:

    od = OrderedDict({
        "y": "first",
        "x": "second",
    })
because, by the time the data got to OrderedDict's constructor, it had already been through a dict. Instead you had to supply a list of lists (or tuple of tuples etc.):

    od = OrderedDict([
        ("y", "first"),
        ("x", "second"),
    ])
For hierarchically nested dictionaries, these are quite a bit harder to read.


But you can also just create them with named arguments, like this:

    collections.OrderedDict(y = "first", x = "second")


This also wasn't guaranteed to work since kwargs is a dict under the hood.


FWIW they could have used a special-purpose dict guaranteeing order for that. I don't know if they ended up using the normal dicts but the proposal to keep kwarg ordering predates the new dicts, and the PEP even notes that it's unclear the new dicts would be merged.

Same with class attribute ordering, PEP 520 still talks about using OrderedDict, however it was only merged after the new dicts, and even if ordering was not defined at the time (as part of Python-the-language) PEP 520 could use this internal property just fine and the ordereddict-based version was stripped out, the PEP was essentially accepted as a no-op (the ordering properties would just be officially documented).


kwarg order wasn't guaranteed either until Python 3.6.


The Go designers went the other way (as they often do):

> When iterating over a map with a range loop, the iteration order is not specified and is not guaranteed to be the same from one iteration to the next.

Actually it is not only "not guaranteed to be the same", the runtime actively makes sure that the iteration order is actually different so you don't even start to rely on it...


That’s fairly common, as it is protection against denial-of-service attacks that use hash collisions to make code perform poorly (https://events.ccc.de/2011/12/28/crypto-talk-at-28c3-effecti...)

If I read https://lwn.net/Articles/474912/ correctly, Perl fixed that in 2003. C#, Java and Swift do it, too. As does (or, reading this, did?) Python (https://bugs.python.org/issue13703)


> C#, Java and Swift do it, too.

Welllllllll

In the case of C# it is/was more a matter of picking the -correct- Dictionary.

'OrderedDictionary' has been around since Net 2.0 (2005 or so,) as has 'SortedDictionary<TKey, TValue>'

But for some ungodly reason, there is no 'OrderedDictionary<TKey,TValue>' included in the framework that you can use. All you get normally is the non-generic one (where you have to type-check everything...) There are internal implementations used by the framework, but you'd have to do some hacky things to even get it to work.

However, in my 12 years of C# programming, I've only run into the need for this _once_, and it was to assist interfacing with a VB6 monstrosity.


The hash value order and insertion order are unrelated.


That seems like it would be a performance hit to actively make it different? That seems like a very weird and strange design decision if true


It's a very small performance hit because it doesn't do the work to be uniformly random, just "not always the same". If you think about how you have to iterate through a hash table or similar data structure anyhow, it's either O(1) or O(log n) paid once per "range" on the data structure, which is dwarfed by the actual act of ranging on the data structure.

Go's philosophy is also definitely willing to pay that price to avoid a large class of known bugs that has hit all kinds of code bases. It is not about being the fastest language. As compiled languages go, it's solidly middle-tier, and not likely to go up much from there. (Among the C-style compiled languages, it's low-tier on performance, at around half the speed of C in general. However there's enough compiled languages like Haskell that are still generally relatively slow so that Go is mid-tier for compiled languages over all.)


It is a performance hit and I initially had the same reservations. Importantly, the implementation imposes a small overhead on the access time of iteration over a map, but not in a way which changes the scaling behavior. Other map operations are not implicated.

I think the decision was wise - it prevents user code from relying on ordered iteration behavior, which allows the go team to switch to different map implementations without fear of breaking user code.

If the order had been unspecified, but iteration was still ordered in practice, there'd undoubtedly be (incorrect) user code that relied on that behavior.

Amusingly, I now run into engineers who are under the misconception that map iteration is random, and proceed to use maps as an RNG. That's an unfortunate mistake because Go's map iteration is quite non-uniform - it's shuffled just enough to appear random to the untrained eye, while be performant.


It actually doesn't make that much of a difference. See this commit for actual benchmarks about the impact: https://github.com/golang/go/commit/3be4d95731a17073afb1f69b...


> That seems like it would be a performance hit to actively make it different

Not really. I don't know the go implementation but you can get this behaviour by adding an arbitrary "startup defined" seed to your hash function and that does the trick.

It also gives the benefit of making hash table attacks harder.


Python kinda sorta did the same from Python 3.3 to 3.5: iteration order was not actively randomised but the hash was "seeded" (for DDOS resistance) so iteration order would only be coherent within a run, which in effect made it unreliable.

Python 3.6 changed the underlying implementation to one which incidentally made iteration order not rely on the hash function, and as that seemed like a useful property and one which risked causing compatibility issues with alternate implementation (developers would come to rely on cpython's iteration order and code would break on jython or whatever) they decided to make it part of the spec for 3.7.


In case anyone else is curious: https://github.com/golang/go/issues/6719


Ah i love go. While its mega cumbersome sometimes and lacks generics (WHY????) , i found that i write pretty stable code in it that still looks good after years (^^)


It’s kind of annoying, but does catch an easily missed novice mistake.


I haven't done a ton of Python, but I can't really think of a situation where relying on a dict to be ordered is an easy mistake to make. Do you have an example?


I saw this in Perl a long time ago but it could just as easily have happened in Python. The dict (Perl hash) was a set of mappings for template replacement of "from" strings to "to" strings.

    "FOO" => "bar" # Replace FOO with bar
The author had considered the case where one key (FOO) might be a left-substring of another (FOOBAR), and so reversed the output from keys() before iterating over the hash. This ensured that FOOBAR, which sorts later, would be substituted before FOO (else FOOBAR -> barBAR, oops).

Unfortunately, despite empirical evidence to the contrary at the time, keys() does not guarantee sort-order results https://perldoc.perl.org/functions/keys.html

This worked for a while (appeared to work?) and but was subsequently caught by another developer, who added a sort().

(Get a templating library alreay yeesh).

Incidentally Ruby made the transition to sorted Hash some years ago (1.9 maybe?) and I don't remember any great fall-out.


The Perl example sounds like someone incorrectly assuming that the output from `keys` be ordered, when it's explicitely not:

> Hash entries are returned in an apparently random order. The actual random order is specific to a given hash; the exact same series of operations on two hashes may result in a different order for each hash.

Applying a `sort` to the output of `keys` is second nature to me - I'm always aware that hash entries are unordered in Perl.


I definitely recall there being no fall out or or problems when ruby did it in I think 1.9. Semantically, I can't think of how there would be. Since previously order was unpredictable and not guaranteed, it could be anything at all; always using insertion order is, after all, one possible value of "anything at all".

Only possible downside might be performance (or memory). I recall the ruby maintainers ensuring they had an implementation with very little if any performance/memory implication.


The most common situation I saw was where the entries from the dictionary were being iterated over and used to perform edits on a structure whose edits were nearly commutative.

For example, suppose you have a dictionary containing PriorityItem values. PriorityItems are sorted by priority, while the value only matters for equality (affects __eq__ but not __lt__). If you have a dictionary whose values are PriorityItems and you say `sorted(dictionary.values())` the order of the output could vary w.r.t. equal priority items. So if you wrote a test asserting a process hiding a dictionary had a stable ordering of the items, it might consistently pass by accident on your machine and then fail elsewhere.


Internally, Google modifies Java and Python (and Go was made this way and Chrome made JS this way) so that the dictionary traversal is deterministic.


I'm glad to see other languages finally catching up to PHP. I'm joking (kind of) but after a lot of years of doing this, I've begun de-prioritizing pure abstractions and favoring the way that humans tend to do things on their own.

Technically this is along the lines of the worse-is-better philosophy. The single biggest cost in software development is friction. Performance, size, etc are all less important, because they become less important with each passing year as computers grow more powerful. And along those lines, I think that silly concepts like assigning letter names to drives in Windows, or having a paltry few registers in x86, made those architectures more accessible to the masses than the more generalized Unix and Mac platforms. Not that I completely agree with that, just, it's all I can come up with (other than cost and familiarity) for why people are so attached to worse-is-better paradigms.

Preserving insertion order has some small cost associated with it, but that's going to be dwarfed by the cost of bugs introduced when junior devs are surprised by unordered dictionaries. And the pedantic people can just use an unordered dictionary manually since they are already converting pure abstractions to whatever imperfect implementations are provided by languages anyway.


> I'm glad to see other languages finally catching up to PHP

Now if python could just implement PHP's date function the world would be a simpler place... ( https://simonwillison.net/2003/Oct/7/dateInPython/ )


>Performance, size, etc are all less important, because they become less important with each passing year as computers grow more powerful.

That isn't as true as it once was. We are running up against fundamental limits and people are even starting to talk about the environmental impact of computing.


It should be noted that the ordering of python dict occurred as a side-effect of a new implementation with better memory profile (and as good or better performances).


For those that don't know PHP associated arrays are equivalent pythons dictionaries. They are implemented as an ordered map.

https://www.php.net/manual/en/language.types.array.php

Its pretty much the main data type used in php. Once you get used to it is pretty powerful for dynamic languages. They even have built in sorts:

https://www.php.net/manual/en/array.sorting.php

They are "simple arrays" that have numeric indexes, but I find those used less frequently. (They can morph into the key->value pair).


Your point got me thinking...

PHP arrays may have been one of the "killer features" that lead to it taking hold of the web. It's so frictionless, powerful, and you can bend them to your will with ease.


"Ordered dict" is ambiguous, and from the title I thought that key order was meant. Reading the article, I see that it's actually insertion order. Which makes much more sense. Key order would have been a much more significant change.


Not to mention there's often no meaningful way to order by key value since keys can be any hashable values -- so things like this break this idea:

    _dict = {}
    _dict.update({MyObject: 3})
    _dict.update({'MyObject': 3})
There's no ordering over _most_ hashable values since they span multiple types, so insertion ordering is the only sane way to do it.


A sorted map would be tree-based and require its keys to be orderable but not hashable. For instance Rust's HashMap has its key bound on Hash + Eq, while BTreeMap's is bound on Ord.

They're different data structures, with different use cases and different requirements.

> There's no ordering over _most_ hashable values since they span multiple types, so insertion ordering is the only sane way to do it.

Python 2 actually had total ordering of all values. The result was usually stupid but it was there.


>The result was usually stupid but it was there

This is what I meant by "sane" in my comment. It's _way_ more meaningful to the user if data's insertion-ordered.


Well, keys of sorted dictionaries don’t need to be hashable but do need an ordering, whether intrinsic or user-provided. A sorted map (usually implemented in terms of a binary search tree) is simply a separate data type with different requirements than a hash table based one.


> I thought that key order was meant.

Surely that would be a sorted dict not an ordered dict.


It's not ambiguous, because it refers to the long pre-existing OrderedDict class.


Yes, I see that, but the title is unclear.


Where "now" dates back to 2018? https://docs.python.org/3/whatsnew/3.7.html


Came to say the same.. insertion order of dict has been here for years for 90+% of Python users (CPython implementation) and a part of the language spec also for almost as many years.

I guess it’s good to spread awareness to HN readers who apparently were unaware, but the headline is very misleading.


90% seems optimistic. I don't think that many Python projects are running the most recent version in production. From personal experience, upgrading to 3.7 was especially a pain because of packages that used "async" for variable/kwarg names.


That’s a very good point, in that case even 50% would probably be optimistic.

We can probably attribute this article’s novelty to the slow adoption of Python 3. In that way it’s probably a good thing that it’s such a popular topic, even if it is old news.


Even further (2016) if you count the fact they were ordered as an implementation detail in CPython 3.6.


I still see dict ordering as an implementation detail; not a technical one but a descriptive one. If you want to rely on insertion order, use collections.OrderedDict. It communicates your intention far better, and there should be no overhead.


OrderedDict is less efficient. It's best to replace usage with basic dict where possible to improve efficiency. There is one obscure feature of OrderedDict that isn't in the basic, so it can't always be swapped.


Really? I would have figured they'd just do something like:

    class OrderedDict(dict):
        pass
(Plus a little bit of API shimming.)


No, OrderedDict has it's own C implementation which was created just before it was decided that dict would preserve order across iteration.

Further there is a big difference, regular dict preserves order across iteration but OrderedDict treats order up to equality.

I.e. this returns True:

    {1: 1, 2: 2} == {2: 2, 1: 1}
Where as this returns False:

    OrderedDict({1: 1, 2: 2}) == OrderedDict({2: 2, 1: 1})
To make that difference speedy it needs to be done on the C level.


Also ordereddicts provide methods to move items to the start or end, and remove items specifically at the start or end, not so for regular dicts.


dict has popitem for removing at the end. That used to be arbitrary, but now it (de facto) means last-inserted.


For the former, I realize that's how it's done now, but there's nothing forcing it to stay that way.

For the latter issue, as explained above, can't they just implement a replacement __eq__ only for OrderedDict, and still re-use the new dict implementation?

Similarly, any subtle difference can be shimmed on top of the new implementation inside Python, no?


Depends what OrderedDict was being used for. If used for some of the obscure ordering features, replacement with dict + a shim might be a slowdown.


The builtins have some optimizations from the compiler and interpreter that Python-defined classes don't benefit from. The dict type especially so, because it's used to implement so many things.


OrdetedDict = dict


Cool, but, this just encourages people to rely on hashmaps being ordered by insertion. The nature of a hashmap is not one where you rely on the ordering. This is going to trip up newbies who then move to other languages.

Some comments on the post don't seem to understand the changes, for example, the C++ comment saying that std::map is ordered, so this isn't a strange change. The difference is that std::map is a red-black tree implementation, which has a traversal order, not a hashmap with insertion order. So there is already confusion.


Brandon Rhodes gave a great talk at PyCon 2017 about the evolution of dictionaries in Python.

I really appreciate the people who dive deep into these implementation details, and continue to optimize the languages we all use.

https://www.youtube.com/watch?v=66P5FMkWoVU


I usually feel like a reasonably bright person, and then I go to a talk like this and suddenly remember how terribly much I don't (and will never) know. There are some very sharp people at play here.


This is a solid argument for why you should have more datatypes available to you. If your program relies on an insertion order, say so. If anyone ever moves to another implementation, they will thank you for it.


Ordered dictionaries already existed as a separate type. Much of the gain was in the use of dicts for core language features.

In particular, Python 3.6 guarantees that the order in which class attributes are defined is preserved, and that functions which take variadic keyword arguments receive them in the order in which they were passed. It permits other implementations to use types other than dict to accomplish that.

Dataclasses make use of ordering of dictionaries in a subtle way. If you write:

  class Foo:
      bar: int
      baz: str
Then Foo.__annotations__ is a dictionary {'bar': int, 'baz': str}. The @dataclass decorator transforms that into standard boilerplate, and it needs to know the order to do that.


And this is all the more reason using a type would make sense. Curious why things use ordered? Look at all the places that declare they need ordered. :)


You don't know in advance when ordering might be useful, though.

Class __annotations__ was new in 3.6. For its original use there's no need for ordering. They were ordered, because dicts were ordered, but only as an implementation detail.

Dataclasses were added to 3.7, after 3.6's features made a nice syntax possible. If __annotations__ hadn't happened to be ordered already then I would guess it wouldn't have been made ordered just for dataclasses - dataclasses just wouldn't have existed.

Making everything ordered in one go opens up possibilities you haven't even thought of yet.


That argument cuts both ways. You don't know when the benefits of random will be there...

As soon as you do something that cares about order, state how it is derived. Sometimes, insertion order is right. Sometimes, not.

Don't get me wrong alists are nice. And order def matters to those. But it is part of their definition. And reinsertion changes the order in obvious ways. Not even clear what it does to just "dict".

Now, I will concede this is overblown. Life will easily go on.


It's extra annoying because collections.OrderedDict has been in the standard library for ages.


I fail to see what's annoying how. OrderedDict provide features for manipulating ordering, but they are expensive and slower. dict initially became ordered as a consequence of implementation details changes, and that was considered useful enough (or likely enough to trigger compatibility issues with third party implementation) that the core team decided to standardise it rather than break it (one of the advantages of the new implementation is iteration performances, shuffling iteration would probably have slowed it back down more than the old implementation)


Annoying is that it is not obvious why the two exist, by their name. They are both ordered. One is orderable in that you can change is order. I would not guess that looking at this.

For a language that prides itself on being readable, that is an amusing quirk.


For some definition of "now". Python 3.7 was released in June 27, 2018. So "now" is "for the past year and a half". Python 3.6, which implemented the change originally, was released in December 23, 2016. By that measure "now" actually means "for the past 3 years".


The whole Python 3 affair made me realise that a lot of people upgrade their tools like once every 5-10 years. Which is shocking, in an industry as fast as this. I’m not saying you should churn your whole build setup every 6 months (I think release speed for Python is a bit excessive at the moment, for example), but complaining in 2020 for a feature that was released almost two years ago, was in development for a year or so before that, and was widely publicised... I mean, c’mon.


This is a refreshing update for those of us that like having code which will always act in the same way across multiple invocations.

Now, if only Lua could follow the same path with their “tables” (“tables” is what Lua programmers call their form of Python’s “dictionaries” and Perl’s “hashes”).

I just spent eight hours earlier this week debugging Lua code which would run differently on different invocations of the same code.

The standard way to iterate in a table with Lua is like this:

  for key, value in pairs(foo) do
One problem: The order we get elements from the table “foo” is undefined, and it can change between different invocations of the same Lua code, even if the elements were put in the table in the same order. In order to fix things so that we can iterate a table in a consistent manner, this is my fix (public domain [1], if those who want to copy and paste it):

  function sorted_table_keys(t)
    local a = {}
    local b = 1
    for k,_ in pairs(t) do -- pairs() use OK; will sort 
      a[b] = k
      b = b + 1
    end
    table.sort(a, function(y,z) return tostring(y) < tostring(z) end)
    return a
  end
Then we iterate the table like this:

  for _, key in ipairs(sorted_table_keys(foo)) do
    local value = foo[key]
(In Lua, two dashes indicates a comment.)

Note that this code will not always sort in the same order all tables. If we have a table with the keys 1 (as a number) and "1" (as a string), iteration order is still undefined.

The code is open source, and is a procedural (“random”) map generator for Doom written mainly by Andrew Apted which I have added some features and fixed some bugs with. It’s here: https://github.com/samboy/ObHack and the issue is here: https://github.com/samboy/ObHack/issues/4

[1] The project I added this code to is GPL, but this function, which I wrote entirely by myself, is one I am donating to the public domain.


You can write an iterator yourself, no need to leave ?pairs idiom:

  function sortpairs(t)
    local keys = { }
    for key in pairs(t) do
      table.insert(keys, key)
    end
    table.sort(keys, function (a, b)
      return tostring(a) < tostring(b)
    end)
    local i = 1
    return function (t)
      local key = keys[i]
      i = i + 1
      if key ~= nil then
        return key, t[key], i-1
      end
    end, t
  end

  t = {a=10, c=20, d=30, b=40, 50}
  for k,v,i in sortpairs(t) do
    print(k, v, i)
  end


What Lua is lacking here (and why the above iterator function needs 17 lines) is the ability to have “for” go through a list (without converting the list in to values returned by an iterator function), which would let us quickly and easily sort lists that “for” can use. Something like:

  d = {"foo": 2, "bar": 1, "zoo": 4}
  for k in sorted(d.keys()):
    print k
(I’m not advocating Python here, since Perl has a similar way of using “for” to go through lists which can also be easily sorted)

However, with Lua, “for” only accepts a numeric range, or an iterator function, so customizing “for” requires understanding function closures: Understanding how a function, when called multiple times, stores variables altered in previous invocations of the function, and understanding how to give those variables initial values (usually in the “function factory” function which creates the function we use).

In other words, “for”, in most modern high-level languages, can be one of:

1. for variable in [something that specifies a numeric range]

2. for variable in [iterator function]

3. for variable in [list]

But Lua only has “something that specifies a numeric range” and “iterator function”; it can not natively go through a list.


You can convert a list first and then feed it to a simple iterator. I don’t fully understand what your exact real-code issues can be, but hope this snippet may help:

  function vs(t)
    local i = 0
    return function (t)
      i = i + 1
      return t[i]
    end, t
  end

  function sorted(t, cmp)
    table.sort(t, cmp or function (a, b)
      return tostring(a) < tostring(b)
    end)
    return t
  end

  function keys(t)
    local keys = { }
    for key in pairs(t) do
      table.insert(keys, key)
    end
    return keys
  end

  t = {a=10, c=20, d=30, b=40, 50}
  for k in vs(sorted(keys(t))) do
    print(k, t[k])
  end
I.e. if “natively” means strictly “for in t” that generates values, then no, Lua can’t do that. But if “for in vs(t)” is okay, then that vs() is the solution.


That looks good, and I think putting these in a prominent place of the Lua documentation (along with a notice that the code is public domain) would help us who are used to the AWK/Perl/Python/PHP way of having “for” natively traverse a list without needing a complicated list-to-iterator function that uses function closure (i.e. the iterator function remembers the value “i” -- I’m writing this for the lurkers because code like this can be difficult to follow).

One honest question: Is there any reason why the function factory (i.e. a function which returns a function) which converts a list (Actually, table with ascending integer indexes) in to an iterator Lua can use with “for” returns both the element and the entire table here? Here is the code I am asking about:

    return function (t)
      i = i + 1
      return t[i]
    end, t
I’m curious why we’re returning both the table element for the iterator and the entire table.


1) I am personally annoyed as someone who has had to unwind knotty dict soup python codebases.

2) This will make life easier for thousands of programmers and prevent a massive number of extremely difficult bugs from hurting users.


It seems people dont realize that you can have your cake and eat it too. You can have both orderered and unordered maps in the same language:

https://yaml.org/type/map

https://yaml.org/type/omap

No reason to argue about which one to make "dict". In fact it would be better to have both because youre taking a performance hit (a significant one) by ordering the entries.


Python has had both ordered and unordered maps in the same languages for 10 years or so (OrderedDict was added to the stdlib in 2009).

In 2016, the standard dict was moved to a new and more efficient implementation (more compact & faster to iterate). It was naturally ordered, and so as a side-effect of this change and since the CPython devs did not explicitly scramble it, iteration on dicts became both deterministic and specific (it would always follow insertion order).

This was considered both a boon because conserving iteration order is strictly more useful than not doing it and a woe because it would cause compatibility issues with third-party implementations or further changes down the line as users would absolutely start relying on the iteration order even if it was still considered unspecified.

So in the next major release they decided to resolve the issue by making this behaviour part of the language. It restricts future design space, but they can always add an unordered dict as part of the stdlib if that ever truly becomes useful so meh.


The new dict does not come with a performance hit, let alone a significant one. It’s much more memory efficient and generally slightly faster.


pretty strong evidence to the contrary

https://apps.dtic.mil/dtic/tr/fulltext/u2/a627127.pdf

do you have any references to back your claim that its actually faster?


https://morepypy.blogspot.com/2015/01/faster-more-memory-eff...

The new python dict implementation is benchmarked as faster in both microbenchmarks and in practice for almost all real-world workloads.

Note that this isn't a sorted map, like C++'s "Ordered Map", but an Ordered map. C++ get's the name wrong. Items aren't ordered by key comparison, but ordered by insertion time.


Good link, thanks


Not sure why this is posted and upvoted to front page now. After all this is a major bullet point in py37 What’s New, and even py38 has been out for a while.

Anyway, I’ll keep using collections.OrderedDict (except for personal scripts) until py35 EOL.


Probably because it was posted to lobste.rs and got good traction there:

https://lobste.rs/s/htcz5f


I think this change is great, but this really only becomes news again once all major LTS are shipping with at least Python 3.7, right? No one can really use it in code they plan to distribute at the moment. Maybe I am underestimating the amount of Python code that is meant for internal or personal use only.


We've been running FROM python:3.7 in production for months. Works just fine. ;)


Many apps bundle Python along with their app, either within a container image or as a complete portable copy distributed with the binary. We've been building FROM python:3.8 for a few months.


3.6 is ordered as well. All other versions are about to be EOL soon.


This 37 minute Pycon video covers the essential details of how Python dictionaries are implemented: https://www.youtube.com/watch?v=npw4s1QTmPg&t=3s


Wait, if it is held in a separate dense array, is removal of a key from a dictionary O(N)?


You don't actually remove the entry, you just mark it as deleted.

Eventually if too many things are deleted you repack the array. Still amortized O(1). (No different than a hash table in general, which will need to recopy the underlying array when it grows.)


Brilliant: Now your map contains an allocator and a garbage collector.


sure, but as mentioned in the parenthetical, that was already true even for a non-ordered hash table


I _think_ this may be the implementation: https://github.com/python/cpython/blob/60ac6ed5579f6666130fc...

There are copious notes in there about the implementation and it indicates linked lists are used.

EDIT: Blargh, it's here: https://github.com/python/cpython/blob/60ac6ed5579f6666130fc... . That file references this blog post which indicates they may be flagging and repacking as dilap commented: https://morepypy.blogspot.com/2015/01/faster-more-memory-eff...


You can use memcpy to remove items from the middle of a memory segment. It's probably not a single memory segment either; I imagine it works more like a Golang slice.


Great, you recovered the memory. But now you have to update all the pointers to the array in the hashmap....


Memcpy is certainly O(N). But yeah, if they implement it with multiple slices (e.g. a B tree) it could be fast.


Hrm, I'm pretty sure I meant memmove: https://golang.org/src/runtime/slice.go

EDIT: I could have sworn they were shifting entire pages around without using a cycle per byte but I can't find any reference to that now haha.


Java has many types of Map, Set, and List implementations. And that's just in the core library. Here are some Map implementations that come with Java: TreeMap, a HashMap, A LinkedHashMap, ConcurrentHashMap, EnumMap. There are many more and they each have their own uses, features, and performance/memory characteristics.

I always wondered why languages like ruby, javascript, and python don't include a bit more choice on this front. It seems like sets are still somewhat of a novelty for javascript developers and people just wing it with half-assed solutions involving stupid O(N) contains operations.


The historical background for those interested…

The ordered dict for Python was originally an implementation detail, part of a more compact dict representation that was first proposed for Python in 2012 [1], implemented for PyPy in 2015 [2], and merged into CPython 3.6 in 2016 [3] [4].

[1]: https://mail.python.org/pipermail/python-dev/2012-December/1...

[2]: https://morepypy.blogspot.com/2015/01/faster-more-memory-eff...

[3]: https://mail.python.org/pipermail/python-dev/2016-September/...

[4]: https://news.ycombinator.com/item?id=12460936

The decision to add it officially to the language spec wasn’t made until Python 3.7 in 2017. [5]

[5]: https://mail.python.org/pipermail/python-dev/2017-December/1... "Guido says ‘Make it so.’"


It sure is annoying seeing people complain about a well-received feature implemented 3 years ago that has already transitioned smoothly.


question: does the following expression still evaluate to True with ordered dicts?

    {'a':1, 'b':2}  ==  {'b':2, 'a':1}


nevermind, i thought this was a future change, but it's already live. i've been using 3.7.4, and can confirm that two dictionaries with equivalent key,value pairs, even if inserted in a different order, are considered abstractly equal.

this coincidentally came up for me this morning on a small interview challenge: https://dev.to/candidateplanet/comment/l8o2


Yes. Dict comparison doesn't rely on iteration order.


thanks :-)

i mean, i would hope not if it's an implementation detail and not a change in the abstraction itself... feels a bit like it is breaking the abstraction, though.


The definition of a dict hasn't changed. It's still defined as a set of key:value pairs. It just happens to also have a particular guarantee now that iterating over the keys will return them in insertion order.


Funny, I've never expected or needed dict to be ordered. I've known about collections.OrderedDict but never really found much use for it. A dictionary is a hashtable and the order generally isn't important for typical uses of a hashtable. So, what is it that people are putting in dicts where the order of insertion actually matters?


Pretty important to note that it's "insertion order". I immediately thought, ordered by what comparator function?


A great talk behind the redesign which achieved this can be found here:

BayPIGgies at LinkedIn June 2017: "Techniques for Design Reviews" by Raymond Hettinger https://www.youtube.com/watch?v=cNqJDRsefg8

There's a lengthy lead in, but its worth listening to.


I've always thought this was a bit annoying in Python, good to know a (very) small itch is now scratched.


It's been this way since python3.6 in 2016, so you really only have to worry about very old versions



Unfortunately ordered dict doesn't provide index access. In many cases, it's still required to maintain parallel list + dict approach, even if dictionary is ordered. List for index access and dict for fast look ups and containing the data.


I'd love to know more about where an ordered dict comes in handy. Anyone have use cases? Otherwise, guaranteeing this behavior just seems ‾\_(ツ)_/‾

If this is useful, I wonder if an ordered set is useful.


> If this is useful, I wonder if an ordered set is useful.

An ordered set is a unique list. It's useful. However it's not present in Python, dicts and sets have separate implementations and sets were not moved over to the ordered implementation (because the ordering was initially a side-effect of a change in implementation which was not considered useful or advantageous for sets).


An ordered multiset is a list, not an ordered set.


Imagine a trie implemented as a tree of dictionaries (ignore thinking about whether a tree of arrays may actually be better for now). Let’s say you want to implement an autocomplete algorithm based on a dictionary of words from a corpus. The autocomplete algorithm is naive: if a user types in “abc” you recommend whichever word starting in “abc” has the most occurrences in the corpus.

With ordered dicts you can use the trie to compute your autocomplete very quickly with no additional data structures. There are lots of other trie operations which are more efficient with ordered dicts too. I’m sure someone can come up with less complicated examples but this was the first one I thought of


This isn't a sorted dictionary, it's an insertion order dictionary.


Well, you can convert any sorted dictionary to an insertion order dictionary by copying it over, but that does the make the original example pretty clunky.

I guess a better use case is if you needed some sort of priority/FIFO logic along with indexing. Hard to think of something that doesn’t feel contrived, but I guess imagine having a queue where determining membership or updating fields of an element based on ID can be done in O(1) for any element?


I use OrderedDict in code that generates JSON records so the key orders are preserved to help human readability. LRU caches are another use case.


I've used TreeMap in Java many times. I don't recall exactly why I used it, I think at least some of the cases were simply that it was convenient to be able to store an ordered list of two different types without having to go through the trouble of making a full blown class that holds those types.


But how does ordered map(self-binary search trees) handle hash-only key? I mean you can compare lexicographical order with character strings or numbers, but not a class instance right?


It'd be nice if using collections.OrderedDict could elicit a warning. Is there a tool like flake8/autoflake for flagging things that used to be Pythonic but no longer are?


How does one have insertion order and maintain O(1) deletions?


From TFA "... it brings Python on par with PHP"

Faint praise indeed


I had firsthand experience with this otherwise useful feature masking an obvious bug. In one codebase there was function along the lines of (for calling into MSSQL with it's "peculiar" stored procedures):

    def call_stored_procedure(name, **args):
        cursor.call_proc(name, *args.values())
Of course this is obviously wrong and on Python 3.6 this will break horribly on first use, while in Python 3.7 this worked as long as the keyword arguments were in the correct order.


On 3.6 it will work, it's just relying on an implementation detail (dicts are ordered in 3.6, it's just that it is considered an implementation detail, not a guarantee)


A bit of a side topic, but when Judge Alsup (himself a descendent of Boole and with a middle name of Haskell, and a programmer to boot) said "It is so ordered" (https://casetext.com/case/oracle-america-inc-v-google-8) in the Oracle Google case, I could only think that's what a computer scientist would say when quicksort was done.


Must say I am glad about that. It confused me when I first discovered this and was told it was not a bug


I've written a lot of Python, but more Java. This is where I have a gripe with "batteries included." In Java, I'd have to think slightly about this, then use a LinkedHashMap. It's been in Java since *2002. It also has a Set flavor. Python just doesn't have as rich of a collection of included data structures, and the APIs are more limited.


What java calls linkedhashmap, Python calls ordereddict.

It's not quite as old as java's linkedhashmap but is no spring chicken either (it's a bit above 10 years old).


Where's orderedset, though?


Nowhere because so far the core team has remained unconvinced. People have brought up replacing standard set implementation by a variant of dict's (even expressing surprise that that wasn't already the case) but no dice yet.

There are discussions on the subject on python-dev once in a while e.g. https://mail.python.org/pipermail/python-dev/2019-February/1...


You call that a failing, and others call that a feature. I love the fact that I haven't had to think about ridiculous nit-pick level data-structure trade offs. In python you have a very basic set of data structures, and short of doing something ridiculously novel or exotic, that's all you need. It's part of the beauty of the language.


OrderedDict has existed in python since 2.7, released in 2009.


I agree, and I think it's troublesome that dict now means something more specialized than it did before and the language doesn't communicate this. The dict is dead, long live the dict.

I suspect it's the result of my Java goggles, because the distaste comes from a sense that this is analagous to if the Map interface suddenly always meant LinkedHashMap.


> The dict is dead, long live the dict.

That’s like saying binary search tree is dead because the implementation uses a red-black tree.


Title should say what "ordered" means. Is it by key, or by insertion-order?


This has been true since 3.7 why is this such a seemingly major revelation on HN?


So isn't that a backward-incompatible change for versions prior to 3.7?


Ruby introduced similar change in 1.9 back in 2007, and we're fine.


Unpopular opinion: I'd rather have classes that have precise meanings such as in Java. ArrayList, HashMap, TreeMap.

Having classes with ambiguous names makes it seem like a toy language.


You ordered snake dicks?!


Javascript taught me to never assume any ordering in a dict/map/hashmap (whatever the term)


Well, JavaScript Map preserves insertion order when iterating entries, and as of ES2015 (i.e. ES6), for-in also iterates the (string, non-numeric) properties of an object in insertion order.


That's fine, as long as you adjust your assumptions when working with other languages.


Holding as little assumptions as possible works as a general rule no matter which language you're using.


It's a safe default, if maybe not a performant one.


Is the ordering there at least fixed, or can it vary from run to run even on the same data, or between implementations or versions of the same implementation?

EDIT: to clarify, I was asking about JavaScript there.


It is deterministic, but it's an odd rule-set. For example: `var o = { 3: 'foo', 1: 'bar', b: 'baz', a: 'quux' }` Will yield: `{1: "bar", 3: "foo", b: "baz", a: "quux"}`.

Numeric keys get sorted. String keys are insertion-order. If key order is a priority, a Map should be used instead. Often when JavaScript's objects are used and a particular sort order is required, an accompanying (sorted) array is used.


(deleted--missed context)


Go do `var o = { 3: 'abc' }` and then `o[1] = 'def';` Now tell me the output of `o`. (Doing this in your console should suffice). Is that insertion order?


If the ordering is determined by a hash function and it's always the same it is likely a defect in the implementation, because keys can be easily chosen to cause massive performance degredation; essentially a DoS attack. That's why many dynamic languages these days use things like siphash with a random key.


I'll argue that having the order depend on the hash function, rather than insertion order, is the problem. It's bad to have unnecessary nondeterminism in software. It makes testing and reproducing bugs unnecessarily difficult.


These collections are often default defined as unordered collections; it’s just that people discovered over time that some were iterable even if the language docs said otherwise.


I don't think ordering and iteration are related, are they? Whether something is sortable has no impact on whether it is enumerable.


What would it even mean for a container to be ordered if you can't iterate through it?


How I think about it is if something is unordered then the iteration strategy is a question.




Consider applying for YC's Spring batch! Applications are open till Feb 11.

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: