Hacker News new | past | comments | ask | show | jobs | submit login
Ask HN: Why do I always want to use an ordered dictionary?
7 points by noaharc on Feb 26, 2009 | hide | past | favorite | 15 comments
I routinely find myself yearning for this data structure in almost every project, but I've scoured the web, and inevitably I just find forum postings saying something along the lines of "dictionaries are unordered, idiot". I usually end up just making a list and dictionary, but it strikes me as inelegant. And, if it were truly a common problem, I can't believe someone hasn't made this data structure before. So I must be doing something wrong in the way I'm thinking about these problems. Any idea what it is?



Here's a library for it in Python: http://www.voidspace.org.uk/python/odict.html


This looks great. Thank you so much!


For posterity, here is a good discussion about the existence (or lack thereof) in Python: http://groups.google.com/group/comp.lang.python/browse_frm/t....


I usually want that when what I really need is ordering but for some reason I want key-based/random access to the values.

I think Ruby added this recently. I'm not using it yet. A crappy hack can just dump the next/prev key into the value, plus a pointer to the first, and drop them all in a hash. It's not really pretty, but would give you ordering if you want it. Another hack is to store just the keys in a sorted list and keep the values in the hash (but this is equally inelegant to the list+hash hack and it has the same maintenance overhead).

I think most of the ordered-hashes are implemented with something like list pointers between the hash keys, but then again I haven't looked too closely under that kimono.

Edit: trees are also a good compromise.


As for whether or not the data structure exists, it does. :) The std::map in C++ is a sorted dictionary, for instance.

But it can be uncommon to require this, usually for two reasons. One, the data set may be huge. And two, much of your code may not even "care" if the data is sorted. In either of those cases, if automatic sorting occurs, a lot of compute time may be wasted for no net benefit. (A solution is to have data sorted "on demand" within the code that actually cares about sort order.)


Often, an algorithm that requires iteration in a certain order is a sign that you are keeping information implicitly in the ordering instead of explicitly in a more appropriate data structure. You also may be trying to pack lots of unrelated functionality into a single object when multiple objects would serve better.

But it's hard to say without better examples, and there are times when you do want order and lookup. Can you give a couple situations where you want ordered dictionaries?


Right now I'm working on a large data set where there is a set of players and each player has a score. I'm using a Python dictionary from player ID to the player score, but I also want an easy way to grab the top N players from the set. The players' scores are updated continuously, so the quick mapping from a player ID to their updated score is also critical (if the scores were constant I'd just put it in a list, sort it, and be done with it).


Hmmm, does an ordered dictionary actually solve this problem? It seems like you want to retrieve based on player id, but get the top N based on score. An ordered dictionary would be ideal if you wanted to retrieve and get the top N based on the same key.

I guess you could use two data structures, with one of them being an ordered dictionary. But then the ordered dictionary doesn't actually have to support lookup; you could just use a heap.


Ruby 1.9 makes Hash ordered by default.

The major reason I care about it is that when loading data from yaml files the key order doesn't get messed up. In ruby 1.8 the order got messed up.

So yeah.. use ruby 1.9 :)


Java has TreeMap for that purpose.


Seems totally reasonable. The rise of Ruby shows that modern languages should optimize for programmer efficiency first and speed second.

Modern scripting languages have several data structures: mostly arrays, hashes and strings. But it is a hassle move data between them.

The next language should only have one, the ordered dictionary.

x = [], x << "hi" # 1=>"hi", x["seven"]=6 # 1=>"hi", "seven"=>6,

Etc


Very thought-provoking idea.

So, here I am designing the Next Big Language. And I'll take your suggestion.

Now, first issue: if I say x = "dog", then the value of x is really

1 => 'd', 2 => 'o', 3 => 'g'

Right? And despite the fact that this is an O.D., it should print in a string-like way: dog. Now suppose I do

x['z'] = (1 => 2, 2 => 3)

Now x is

1 => 'd', 2 => 'o', 3 => 'g', 'z' => (1 => 2, 2 => 3)

How should it print now?

Next issue: If I do y[1] = 'a' and y[3] = 'b', and then y << 'c', what is the value of y?

Final issue: Do the above questions really matter?

One could argue in the first case that if I want something to print nicely, then I will probably keep it in a "reasonable" form, and not mix data types in odd ways. So it really doesn't matter much how the O.D. I came up with prints.

Similarly, in the second case, if I want a collection to work well as a stack or queue, then I am probably not going to use the bracket operator to do strange things with it. So it doesn't matter which of the multiple reasonable options for the behavior of << is chosen.

Or maybe it does matter?


Hey, Someone took it seriously!

I would assume that strings and "ordinary" O.D. would differ in their object type: each would inherit from an abstract O.D. and each would have a different print/display function (you could have a "hidden" variable specifying this function if you wished to keep in all concrete).

If I do y[1] = 'a' and y[3] = 'b', and then y << 'c', what is the value of y?

probably y = { 1=>'a', 3=>'b', 4=>'c'}, though like any good OO/metaprogramming language, "<<" would be an overidable function.

"Do the above questions really matter?"

I am arguing for the next language and by that token, I haven't answered all the questions this raises. I think old languages like lisp and new languages like Ruby prove that new languages can unleash new possibilities, even more than fifty years after the original computer languages were created.

Thanks for your detailed reply, seriously...


Just to pose the counter argument, is it that hard to move data between hashes and lists? In reasonable languages it is at most a single function call.

Javascript has eliminated many differences between arrays and hashes (objects) but this leads to many bugs. For example, you have a third-party library that adds a .toString function on every object. But you also have code that iterates over the keys in an array, and now that iteration code hits a "toString" key instead of just the array indices you expect. Admittedly you could blame this on the object-hash identification rather than the array-hash identification....


1) As others have pointed out, creating an ordered hash out of an array and unordered hashed takes doing. Conversion between (ordered) arrays and unordered hashes doesn't solve this problem. 2) By my previous reasoning, I think identifying objects and hashes is also the-way-to-go, once you can figure out for to do it correctly. Javascript clearly has some problems. One possible solution is having different functions to iterate different types of keys, so objects/members could be added without being seen by data iterators.

Of course, I don't claim to have put all this is into the codified form it needs in order to form the basis of a language. I'll do that if and when I ever create a new language.




Consider applying for YC's Spring batch! Applications are open till Feb 11.

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: