Very fast, scalable mutable maps and hashes for Haskell (donsbot.wordpress.com)
31 points by dons on Sept 27, 2009 | 27 comments



I'm a lazy bastard and didn't read the complete article, but isn't mutability pretty much a "NO!" in Haskell? I mean, it was the Haskell folks who wound up implementing monads just to emulate mutable state without anything actually having to be mutable.

Or are they finally going to look for the sweet spot between the two extremes of the im/mutability continuum...


Well, at a conceptual level, mutability doesn't really mean what it sounds like in Haskell. Anything that looks like it has mutable state is (roughly speaking) syntactic sugar for sequencing operations via function composition with a guarantee that the pre-"mutated" version of the data will no longer be used.

Of course, with a guarantee that only one copy will be in use at a time, you're free to implement the data as a mutable structure, as a practical optimization.
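
A minimal sketch of that idea (made-up names, not from any real library): do-notation over a state type desugars into plain function composition that threads the old value into the next step, so the pre-"mutated" copy is never touched again.

    -- Sketch only; Counter, incr, and thenC are hypothetical names.
    type Counter a = Int -> (a, Int)    -- a "stateful" action over an Int

    incr :: Counter ()
    incr n = ((), n + 1)                -- hand back the new state

    thenC :: Counter a -> Counter b -> Counter b
    thenC f g = \s -> let (_, s') = f s in g s'   -- sequencing = composition

    threeIncrs :: ((), Int)
    threeIncrs = (incr `thenC` incr `thenC` incr) 0   -- ((), 3)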


It sounds like you are describing linear types or uniqueness types, not Haskell's monad solution to mutable state. (Haskell supports neither linear nor uniqueness types.) The aerial view of Haskell mutable state is that every function that wants to "really" mutate state secretly takes a "world" parameter and returns a new "world" parameter. Mutable variables are just references to some named "thing" in this world value. A variable appears to mutate because the thing it refers to changes as the world is swapped out from underneath it. There is of course only one world value (RAM) and this is really and truly being mutated. Only at the most abstract level can this be considered immutable, but there are still theoretical advantages to operating at this level; e.g., there is an implementation of the IO monad (IOSpec, I think?) that truly does keep a copy of the world around, which is useful for testing and such.

The main difference is that Haskell maintains uniqueness of only one secret parameter, the world, while linear or uniqueness types can do it on a per-value basis (which sounds to me like what you described). The bad news is that this is much less interesting from a theoretical point of view---within the monad, side effects are still unconstrained---but from a practical point of view it means imperative programming in Haskell is pretty accessible, funny vocabulary notwithstanding.
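
If a picture helps, the usual cartoon of that world-passing model is something like the following (a conceptual sketch with stand-in names; GHC's actual IO threads an unboxed State# RealWorld token rather than anything you could hold a copy of):

    -- Conceptual model only; World and IO' are stand-in names.
    newtype World = World ()                  -- "the entire outside world"
    newtype IO' a = IO' (World -> (a, World))

    -- Sequencing threads the single world value through each step,
    -- which is what keeps the "mutations" ordered.
    bindIO :: IO' a -> (a -> IO' b) -> IO' b
    bindIO (IO' f) k = IO' $ \w ->
      let (a, w') = f w
          IO' g   = k a
      in  g w'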


Yes, you're correct, that did sound more like linear types. I was trying for a simple description of why mutability can be regarded as an optimization/implementation detail given certain guarantees.

That said, Haskell does allow more than just "the state of the world"--for instance, the State monad can contain whatever you like and (if memory serves me) is a strict superset of the IO monad.


Mutability and immutability aren't really on a continuum. They represent different semantics. Haskell embeds mutability semantics within immutability semantics by using monads. So mutability is "100% available" in Haskell; you just have to tag it as mutability.
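
Concretely, with the standard IORef machinery from base, the tag is right there in the types:

    import Data.IORef (newIORef, readIORef, modifyIORef)

    -- Real in-place mutation, but the IO in every type signature
    -- tags it as such.
    main :: IO ()
    main = do
      r <- newIORef (0 :: Int)
      modifyIORef r (+ 1)
      modifyIORef r (+ 1)
      readIORef r >>= print   -- prints 2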


Sort of true; monads like Reader/Writer/State can make pure computations appear impure.

Making impure computations appear pure is another issue, to which monads are orthogonal. (It boils down to making sure the program can only ever see the "current" state; then it doesn't matter whether something is actually mutable. I don't have a good bite-sized explanation, but I will try to find some relevant papers for you.)
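
In the meantime, the ST monad is probably the bite-sized example (a sketch using the standard Control.Monad.ST and Data.STRef from base): mutation happens inside, but runST's type guarantees that callers can only ever observe the final value, so the function really is pure.

    import Control.Monad.ST (runST)
    import Data.STRef (newSTRef, modifySTRef, readSTRef)

    -- Internally mutable, externally pure: the mutable cell cannot
    -- escape runST, so no caller can observe an intermediate state.
    sumST :: [Int] -> Int
    sumST xs = runST $ do
      acc <- newSTRef 0
      mapM_ (\x -> modifySTRef acc (+ x)) xs
      readSTRef acc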


All programming languages must have a side-effect somewhere, or else you would never be able to see it.

I believe this work was spurred by my comments on Haskell here:

http://www.codexon.com/posts/why-arent-functional-languages-...

Within 10 minutes of publishing the article, there was a flood of angry Haskell supporters ready to decry every single point I had. Needless to say I still stand by what I wrote. And I take this as further evidence that I had struck a chord.


Trolls, too, "strike a chord". That doesn't mean they're right. Just saying.

I just read your post and the comments. I think Don (who, coincidentally, posted this article as well) did a pretty thorough job of dissecting your arguments. What, exactly, is your purpose in reiterating them?


He absolutely did not dissect my arguments. His responses were either canned or simply irrelevant.

1. Haskell does not have good hash tables

His response: just use HsJudy. He forgets to mention that it isn't even a hash table, and it's LGPL to boot. An amusing fact is that the default hash table implementation is so slow that someone had to implement their own from scratch for the Language Shootout benchmark:

http://shootout.alioth.debian.org/u32q/benchmark.php?test=kn...

2. Haskell's lazy evaluation has bad consequences for SMP programming.

His response: First he tries to argue against it. Then when I convince him that I am not just making things up, he finally backtracks and says that's why he made a library to cover up the shortcomings.

3. Haskell's default static bundling of LGPL GMP.

I didn't use the word "distribution", but I was talking in part about commercial usage, where this is a reasonable assumption. He decided to harp on this single specific word and declared the whole point to be invalid.


These are all problems (to someone), but these aren't the reasons why people don't use Haskell. You are projecting your minor complaints onto the world as a whole, and that simply doesn't make for a good blog post.

1. I have written a bit of Haskell software. I have never suffered any performance problems by using a tree instead of a hash table. O(1) is nice, but O(log n) is very small even for very large n (log₂ of a billion keys is only about 30). "You are not Google", but even Google does fine with trees. If you are actually experiencing a performance problem, try HsJudy. If that doesn't solve your real-world performance problem, then you can complain. Right now you sound like you just read the Wikipedia page for "hash table" but have never had a dataset bigger than 10 elements.

2. Lazy evaluation does have consequences, sure... but so does strict evaluation. You are trading one set of problems for another. What's nice about Haskell is that you can use strict evaluation when you think it's required (see the sketch at the end of this comment).

3. LGPL is not a problem for me.

Any software I would distribute to outside users would be GPL'd anyway. Any software that is worth money to keep secret is not distributed anyway. (Not everyone is like this, of course, but like I said... this isn't a problem for me.)

Anyway, you should just say you don't like Haskell if you don't like Haskell. Instead you are trying to say that it is objectively bad, which requires more proof than your blog post provided.
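
(The opt-in strictness from point 2, sketched with bang patterns; hSum is a made-up example. The bangs force the accumulator on every step, so no chain of lazy thunks builds up.)

    {-# LANGUAGE BangPatterns #-}

    -- Strict exactly where it matters: acc is forced on each step.
    hSum :: [Int] -> Int
    hSum = go 0
      where
        go !acc []       = acc
        go !acc (x : xs) = go (acc + x) xs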


1. Using the "You are not Google" argument is a blatant disregard for the facts. If you do the math, you will see that tries and ternary trees often lose performance at 60,000+ elements, and that's before accounting for the cache misses that come from jumping through nodes. I'm sorry, but you are the one who sounds like you've never had a dataset bigger than 10 elements.

And speaking of Google, have you ever heard of this?

http://code.google.com/p/google-sparsehash/

2. The point is that lazy evaluation has more problems than strict evaluation, not fewer. If you read the comments, you will see that dons actually agreed with me after arguing the contrary. Even Simon Peyton Jones believes that "the next Haskell should be strict".

3. LGPL is a problem for many companies. If you don't really care about it, good for you. You are in the minority.

In conclusion, I thought that Hacker News would be immune from Haskell groupthink, but I was unfortunately wrong.


> In conclusion, I thought that Hacker News would be immune from Haskell groupthink, but I was unfortunately wrong.

I'm not Hacker News, I'm one poster.


And that's why I have multiple downvotes and everyone else arguing against me has multiple upvotes...


> An amusing fact is that the default hash table implementation is so slow that someone had to implement their own from scratch for the Language Shootout benchmark

Is that a fact!

You mean they didn't just translate the hash table from Clean, that I translated from the simple_hash.h that Doug Bagley provided for ye olde Great Computer Language Shootout back at the end of the millennium?

Is your fact also why the Pascal program includes a Pascal translation of simple_hash.h?


> Is that a fact!

Yep it is a fact. Did you disprove it? No. The benchmark is right there in front of you, showing that Haskell takes 10x longer than C++.

> Is your fact also why the Pascal program includes a Pascal translation of simple_hash.h?

What is your point? Haskell supporters often cite its speed and brevity on the Shootout. Instead of using the bundled hash table, the fastest Haskell program had to implement its own, losing out on brevity. If this is not indicative of a problem in GHC, then what is?


> Yep it is a fact. Did you disprove it? No. ... What is your point?

Listen and you might hear the point.

Initially, simple_hash.h was a basis for comparison across languages.

Initially, the obvious comparison was between Haskell and Clean, so it was obvious to translate the hash table from the Clean program.

Your speculation about why the Haskell program includes a hash table is wrong.


Actually you are wrong. Next time be sure to read the entire article before you make uninformed assumptions.

You've obviously missed important information, such as the link below, and unsurprisingly made a Haskell fanboy comment without verifying the truth of your statements.

http://www.haskell.org/haskellwiki/Shootout/Knucleotide#On_m...

> This benchmark is completely bottlenecked by Data.Hashtable (in GHC 6.4.1) and this discussion is based on the assumption that I am using Hashtable optimally... The entry below runs 16x slower than the DMD entry on my powerbook G4. Profiling shows the bottleneck. I downloaded simple_hash.h and shamelessly optimized it to replace Data.Hashtable for exactly this benchmark code.


It's a common misconception that programming languages must deal with side effects. A program can be a pure function if it is a single function that takes its parameters when it is run and returns a set of results when it exits. It's interactive programs, which take a sequence of inputs or generate a sequence of outputs as they are running, that have to deal with side effects.
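
In Haskell, the Prelude's interact function is pretty much that model in miniature: the whole program is one String -> String function applied to all of stdin. A sketch:

    -- A batch program as (nearly) one pure function:
    -- all of stdin goes in, the transformed text comes out.
    main :: IO ()
    main = interact (unlines . map reverse . lines)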


Is it another change to HN that handles don't show up sometimes? Any background for this change or is it a bug?


His name is 'by'.


Oh! I'm so used to seeing "X points by somebody" that I didn't even think of that.


Giggle. It was my little joke that it used to say 'by by'. Maybe I'll have to think of a new username now the adjacent 'by' has gone. It's only an invented username. I know HN likes real names, but I didn't want to be completely public.


> Needless to say I still stand by what I wrote. And I take this as further evidence that I had struck a chord.

No, you are just wrong. As dons told you, "There are reasons why Haskell is not popular. These aren't those."

(You didn't "strike a chord", it's more like you just pressed random keys on the piano. While some people consider dissonance music, most don't.)

> All programming languages must have a side-effect somewhere, or else you would never be able to see it.

This is technically true, but not on the right level of abstraction. Our current computing machines work solely on mutating chunks of memory. This makes every program one big side effect.

But just because computers work that way doesn't mean that the application programmer needs to view them that way. The application programmer can imagine the computer as an evaluator of pure functions, and program accordingly. The runtime will turn this idealized view of computation into something our current computers can run.

But that's the runtime's job, not yours. As far as you're concerned, there is no such thing as side effects. That is what everyone is referring to when they mention pure functional programming.


> No, you are just wrong. As dons told you, "There are reasons why Haskell is not popular. These aren't those."

I really like this Haskell groupthink here. I am not the only one to have raised these points before, and those people were equally ostracized. The level of replies has been nothing more than "you are wrong", and you wonder why Haskell still isn't more widely accepted.

And drawing an arbitrary line where purity can begin and end does not change the definition of side effects.


I don't know why I continue to engage you, but...

> I really like this Haskell groupthink here.

There is no groupthink here. You made your argument. Nobody cares.

Where we stand now: you think Haskell is unusable because of the three points you mention. I disagree. That is all we have. You could be doing the community a huge favor by pointing these things out, and "we" are just too dumb to realize you're right. But probably not. These are just your pet annoyances that you feel like talking about.

(I have my own pet annoyances, especially with libraries that unnecessarily use IO when they could use ST or even hide the IO completely. Notice how I don't whine about this stuff, but rather simply try to avoid this in my own code.)

You don't like Haskell or OCaml, clearly. So don't use them! (But realize that nobody really cares.)

> And drawing an arbitrary line where purity can begin and end does not change the definition of side effects.

The point is you write code as though there are no side effects. There are side effects, but they are hidden from the programmer by the runtime system and by the libraries in use.

Abstraction is the key to staying sane as a programmer.


There really is Haskell groupthink here.

I commented on this story nearly a day late, and the only people who scour these comments, downvoting everything remotely anti-Haskell, are the Haskell fanatics.

These are also not my pet annoyances. These problems have been pointed out in the past (which you would have known if you had bothered to read the article).

The fact that Haskell has done nothing about them for years is damning evidence.

> You don't like Haskell or OCaml, clearly. So don't use them!

This is where you and many Haskell fanatics are wrong. Just because I am criticizing Haskell doesn't mean I don't like Haskell. I want it to be better so that people will stop using languages like Java, with its last-generation paradigm.

Instead of taking the criticism in stride, people like you attack the person, even going so far as to criticize Simon Peyton Jones, one of the designers of Haskell, for saying something bad about the language.


Oi! Those bar graphs confused me for so very long. A better graphic would have been a scatterplot or a time-series chart, and the x-axis label should be something like "repetition no." or "trial no."



