
Oh dear. I love F# and would really like to motivate myself to "learn me some Haskell," but these sorts of articles discourage me enormously.

Please, if there are any people here who have successfully written production code in Haskell that needs as low a memory footprint and as much performance as possible, and have reached something like 1.5x C-level memory consumption and 0.75x C-level performance, I would be really grateful for any references. I cannot wait to shed the chains of C++, but stories like this one, which follow this pattern, give me the impression we're not there yet:

1. Awesome language <3 <3 <3

2. Write production code

3. Hit a mysterious performance problem (too much memory, too slow, or both)

Note: My interest is in dense computational kernels that are not I/O bound.

Edit: The language is obviously great for problems akin to syntax translation; I'm just constantly hoping someone would do something industrial-strength with it and write about it. Jane Street is extremely bullish about OCaml (https://blogs.janestreet.com/). I've yet to see a similar perpetual love letter to Haskell as a language from industrial users.




There is a simple way to implement a dense computational kernel in Haskell: write it in C++ and call it using the foreign function interface.

Seriously, this is not (as I understand it) a problem Haskell is designed to solve. Haskell is expressive, functional, lazy, etc., and that is all great, but if you care about the actual instructions that get executed on the CPU, Haskell doesn't care, and that's by design. For me, writing performant Haskell always descends into a game of figuring out why the compiler didn't unbox something or other, or why it didn't realize that a certain variable can be made strict.

My advice would be: if performance is really the main thing you need, study C++ and learn how to write clean code in C++. [0] And in the end, you can always call C++ functions from Haskell.
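To make that last point concrete, here is a minimal sketch of what such a binding looks like, using `sqrt` from libm as a stand-in for a real C/C++ kernel (a C++ function would just need `extern "C"` linkage):

```haskell
{-# LANGUAGE ForeignFunctionInterface #-}
module Main where

-- Bind C's sqrt from math.h; GHC links against libm by default.
foreign import ccall "math.h sqrt"
  c_sqrt :: Double -> Double

main :: IO ()
main = print (c_sqrt 2.0)  -- ~1.4142
```

A real dense kernel would take pointers into pinned or foreign-allocated arrays rather than a scalar, but the declaration form is the same.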

If you really need to use Haskell, then learn in detail how the runtime and the compiler work, so that their output is predictable.

The chains of C++ are what ensure that C++ code can be mapped almost directly to machine-level instructions.

[0] Maybe you already know C++ better than I do, in which case ignore me.


Thanks. This was what I was expecting.

Although I was hoping the FFI was not the answer: http://rwmj.wordpress.com/2011/09/21/which-foreign-function-...

Richard Jones is fairly erudite in the art and the industry (he is one of the authors of http://gchandbook.org/ and quite proficient in various programming languages in general), and if he says the Haskell FFI is painful to work with, I get a really bad feeling.

I was hoping the answer would have been "There is this stream computation library that is really neat and just works...".


That's strange to read. The Haskell community considers GHC's FFI very good. Perhaps things have changed in the last three years. (I've never used the FFI, so I cannot respond directly.)


> There is this stream computation library

Sometimes Haskell lets you easily implement algorithms that would be, comparatively, much more painful to write in C++. There are stream fusion libraries, and it's important to know them. You can do a lot with plain Haskell before you need to start worrying about CPU instructions. Things like linear algebra and matrix multiplication kernels are the exceptions.
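For example (a sketch assuming the `vector` package, one of the stream fusion libraries mentioned), this pipeline of enumerate/filter/map/sum fuses into a single allocation-free loop at -O2 rather than building three intermediate vectors:

```haskell
import qualified Data.Vector.Unboxed as V

-- Sum of the squares of the even numbers in [1..n].
-- Fusion collapses the whole pipeline into one tight loop.
sumOfSquaresOfEvens :: Int -> Int
sumOfSquaresOfEvens n =
  V.sum . V.map (\x -> x * x) . V.filter even $ V.enumFromN 1 n

main :: IO ()
main = print (sumOfSquaresOfEvens 10)  -- 4+16+36+64+100 = 220
```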

> Haskell FFI is painful to work with

That link is three years old, and I think I disagree with some of the things he says. I agree that the Haskell FFI is poorly documented. But he also says it is "deeply strange". Without using any extra tools, I would say it is a little laborious, but conceptually very simple. He also says he couldn't figure out how to return anything other than integers.

Think about it this way: Haskell makes no guarantees about how its objects are laid out in memory. If you pass a native Haskell object (a boxed int, or a struct) to C, the C code will never be able to unwrap it without guessing at the memory layout. What Haskell expects is for you to do this yourself: if a structure consists of an integer and a double, you define (yourself) a function that takes a pointer and lays out the integer and the double in memory as your C code would expect to find them.

Then every FFI call takes the form:

1. Allocate a chunk of memory.

2. Store your Haskell object into that memory, using your layout. This is what the `poke` function does.

3. Pass the pointer to a C function.

4. C function reads and writes to that chunk of memory.

5. When C function returns, read from the memory and pack the results into a Haskell object. This is the `peek` function.

5a. Quite often a C function will return a success code anyway, and the actual return value will be written through a pointer passed to the C function. This is really common.

6. Deallocate the chunk of memory.

6a. Notice that no assumptions were made about internal object layout, except that you have to know what layout the corresponding C structure is expected to have.

This might also be why it's unintuitive how to return complicated objects from C: the complicated objects need to be explicitly unpacked and stored into a Haskell object. There are tools that automate this process, but tool-less FFI is straightforward once you figure it out.
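The steps above can be sketched with a `Storable` instance. The struct and its field offsets are illustrative (a hypothetical C `struct { int32_t tag; double value; }` with the usual padding on x86-64); `poke` is step 2 and `peek` is step 5:

```haskell
import Foreign

-- Haskell mirror of a hypothetical C struct { int32_t tag; double value; }
data Pair = Pair Int32 Double deriving (Eq, Show)

instance Storable Pair where
  sizeOf    _ = 16  -- 4-byte int32, 4 bytes padding, 8-byte double
  alignment _ = 8
  poke ptr (Pair t v) = do         -- step 2: lay fields out as C expects
    pokeByteOff ptr 0 t
    pokeByteOff ptr 8 v
  peek ptr = Pair                  -- step 5: read the fields back
    <$> peekByteOff ptr 0
    <*> peekByteOff ptr 8

main :: IO ()
main = alloca $ \p -> do           -- step 1: allocate (freed on exit, step 6)
  poke p (Pair 7 3.5)              -- a C call would receive p here (steps 3-4)
  result <- peek p
  print result
```

No assumptions about GHC's internal heap layout are made; only the C-side layout has to be known, exactly as step 6a says.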

Here is the layout GHC actually uses: https://ghc.haskell.org/trac/ghc/wiki/Commentary/Rts/Storage...

This is maybe a bit unintuitive, but it makes sense to me.


Thanks for your overview. I had not yet done a deep analysis of the FFI myself; your summary clarified things.


http://benchmarksgame.alioth.debian.org/ has some pretty performant Haskell, though it uses some really dirty tricks.

It's generally more straightforward to optimize CPU-bound Haskell because there are fewer external moving parts to wrestle with (e.g. the OS's graphics or networking stack).

Any code that doesn't allocate ought to be as fast as C compiled with Clang, as it's pretty much up to LLVM to optimize those inner loops. GHC's deforestation is the big killer feature for making number-crunching code more elegant.
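A sketch of what such non-allocating code looks like: a tail-recursive loop with strict (bang-patterned) accumulators, which GHC at -O2 compiles down to a tight loop over unboxed machine integers:

```haskell
{-# LANGUAGE BangPatterns #-}

-- Sum 1..n with a strict accumulator; the bangs keep the loop state
-- unboxed so the hot loop performs no heap allocation.
sumTo :: Int -> Int
sumTo n = go 0 1
  where
    go !acc !i
      | i > n     = acc
      | otherwise = go (acc + i) (i + 1)

main :: IO ()
main = print (sumTo 100)  -- 5050
```

Without the bang patterns, laziness could pile up thunks in `acc` instead, which is exactly the kind of thing profiling (or reading the Core output) catches.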

Once one goes the way of fancy tree structures and all that, well, one is writing very different code than one would in C, so expect different performance tradeoffs. GHC should still do a very good job, however, and Haskell's various synchronization primitives are quite good too, if your domain requires them.


Thanks... but the Alioth benchmarks do not really measure production readiness, although they are very educational.

To me it seems that for writing high-performance dense numerical code the tradeoff is still: either write idiomatic code in a horrible language, or write horrible kludges in a beautiful language. Sigh.

Would there be any "Scientific Haskell" resources available that straightforwardly explain how to write performant code that wrangles data from one sequence to another?


Have you checked out ATS[0] and the ATS book[1]?

0: http://www.ats-lang.org/

1: Sorry, I can't find the link at the moment. I'll try to update this later.


Is this the book you are referring to? http://www.ats-lang.org/DOCUMENT/INT2PROGINATS/HTML/book1.ht...

Thanks, interesting. Hopefully it will not become yet another dead and forgotten ML dialect.


Perhaps this will interest you: https://www.youtube.com/watch?v=McFNkLPTOSY


Thanks for the reference! Seems to be precisely the sort of material I was looking for.


Isn't Rust pretty much designed exactly to fulfill what you're hoping to achieve?

Functional programming language features, with predictable performance.


That is the design intent, but the design effort itself seems to be ongoing along with the implementation. So I would say that its design, once complete, will probably be aligned with the constraints I implied. But it's not ready yet...


I write performance-sensitive Haskell all the time.

In fact, my main focus for the past two years has been designing better array computation tools (for dense and sparse matrix computation), some of which I'm finally in the process of open sourcing.

I have tools and tricks that would drive humans mad before they could figure out how to port the full awesomeness of what I've done to C++. The main place where I'll use C in my Haskell is for writing SIMD kernels or wrapping up some unrolled ASM from OpenBLAS/BLIS, like this monstrosity: https://github.com/xianyi/OpenBLAS/blob/develop/kernel/x86_6...

For CPU/RAM-throughput-bound workloads that aren't allocation heavy, that's pretty much the only time I'll break out C: for wrapping up those crazy kernels that are tuned to a specific CPU microarchitecture.

I also know quite a few people who use Haskell as a metalanguage for some domain-specific EDSL they compile using LLVM as a library. I know of people who have proprietary tool chains using modern GHC Haskell for really, really performance-sensitive workloads in this fashion in high-frequency trading, computer vision, and scientific computing (and I have the fortune to consider them my friends).

BTW, over on Reddit, the amazing Austin Seipp (who does a LOT of work on GHC) has a super articulate post about performance engineering in Haskell: http://www.reddit.com/r/haskell/comments/2jbl78/from_60_fram...

Anyways, at the end of the day, engineering is about building software that works in finite time. The performance bits will only bite in your innermost loop, and the FFI overhead in Haskell for calling into C should be about 2-5 ns. When an FFI call takes less than ~10 µs, it's safe to use the "unsafe" FFI; for operations that may take > 10 µs, you should always use the default/"safe" FFI convention, or someone will come knocking at your door late at night, quite angry.
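The two conventions differ only in the import declaration. A sketch, binding `cos` from libm both ways to show the syntax side by side:

```haskell
{-# LANGUAGE ForeignFunctionInterface #-}
module Main where

-- "unsafe": skips the RTS safe-call bookkeeping (a few ns of overhead)
-- but blocks the Haskell capability for the call's duration, so it is
-- only appropriate for calls known to return quickly.
foreign import ccall unsafe "math.h cos"
  fastCos :: Double -> Double

-- "safe" (the default): full-overhead convention; other Haskell threads
-- keep running even if the C call blocks for a long time.
foreign import ccall safe "math.h cos"
  safeCos :: Double -> Double

main :: IO ()
main = print (fastCos 0, safeCos 0)  -- (1.0,1.0)
```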

I should also disclose I'm slowly working out a few crazy extensions to GHC for low-level performance engineering, though those might not make it into GHC until 7.12 at the current rate things are going in my life :)

The one area where GHC doesn't shine is in many-core, allocation-heavy workloads, but I've yet to see anyone do well by default in that regime in any language! :)

EDIT: also, profile before doing performance engineering. And before that, choose the right algorithms! :)

EDIT: to further elaborate on some of the tech I've got: I have a way of doing array computation that gives me a very nice blend of guaranteed good memory locality and extensibility that I've not seen in any other array computation tooling I've been able to lay my hands on, at least on this planet :)


Thank you, your post was most encouraging! I think I'll have to presume that the adept Haskell devs are mostly developing, and not popularizing their advancements much.


Well, they do whatever they have to do to pay the bills! I think a lot of these tool chains, or fragments thereof, are going to be open sourced eventually, but all engineering (open or not) is being paid for by someone, at least indirectly. So a lot of these tools are only going to make that move once the authors can "pay" for the time to do so. ('Cause life is more than just software, I'm told!)

You can talk with a lot of people doing numerical-computing things in Haskell on the #numerical-haskell channel on Freenode.



