Haskell in the Datacentre (simonmar.github.io)
276 points by dmit on Dec 14, 2016 | 43 comments



> At Facebook we run Haskell on thousands of servers

Wow, I didn't know this. Does anyone know which programs they use Haskell for? The article doesn't say.



It's for spam detection. Everything that gets posted on Facebook is processed by this code. There's a talk on it from a functional programming conference a year or two ago. I'll see if I can find it and edit this post.



I remember this one: https://m.youtube.com/watch?v=UMbc6iyH-xQ Is that it?


No, but I apparently didn't bookmark it and I'm having trouble finding it again.

This might be it; I don't have time to verify right now: http://cufp.org/2015/fighting-spam-with-haskell-at-facebook....


Simon Marlow gave a talk at ICFP in 2014 or 2015 that might be what you're referring to.


Sorry, couldn't look it up very well from my phone. I'm fairly certain you're referring to Simon Marlow's "Fighting Spam with Haskell" from ICFP 2015.

Video: https://www.youtube.com/watch?v=IOEGfnjh92o

Abstract: http://cufp.org/2015/fighting-spam-with-haskell-at-facebook....


Really great to read more about real world large scale Haskell use.

I'd be curious to know whether the success FB has had with Haskell for spam filtering means they might consider Haskell for other parts of their stack too. Does anyone have any insight on this?


Happy to see they pushed the changes upstream to GHC!


> GHC’s runtime is based around an M:N threading model which is designed to map a large number (M) of lightweight Haskell threads onto a small number (N) of heavyweight OS threads. [...] To cut to the chase, we ended up increasing N to be the same as M (or close to it), and this bought us an extra 10-20% throughput per machine.

Ah, yes. As a Go developer, I really wish Go moved to a 1:1 threading model.


1:1 in Go would mean moving servers from blocking-style code to callbacks/futures.

(I believe the story is slightly different in Haskell.)

Nice sequential blocking-style code is my favorite thing about Go. I realize it's not C, and it makes the runtime's work much harder, but the payoff in complex servers is completely worth it.


> 1:1 in Go would mean moving servers from blocking-style code to callbacks/futures.

No, it would not mean such a thing at all. Where did you get this idea?

In fact, on some operating systems/linker combinations, gccgo uses a 1:1 threading model.

> Nice sequential blocking-style code is my favorite thing about Go.

Mine too.


It would in practice.

A typical server under load has more outstanding requests to answer than it can have OS threads. It is not uncommon in a Go server at Google to find a million or so goroutines.
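For a sense of scale, here's a minimal sketch of a million parked goroutines (illustrative only; a real server's goroutines would mostly be blocked on network I/O, and you need a few GB of RAM for the stacks):

    package main

    import (
        "fmt"
        "runtime"
        "sync"
    )

    func main() {
        const n = 1000000
        block := make(chan struct{})
        var wg sync.WaitGroup
        wg.Add(n)
        for i := 0; i < n; i++ {
            go func() {
                defer wg.Done()
                <-block // parked: a few KB of stack, no OS thread held
            }()
        }
        fmt.Println("goroutines:", runtime.NumGoroutine())
        close(block)
        wg.Wait()
    }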

If you want to have an OS thread that can also run C code, it needs a large enough stack to run C code. A million such stacks is not practical.

On an OS like Linux that lets you avoid creating stacks with threads, and which has quite lightweight kernel-side accounting for threads, it would be possible to give each goroutine a thread. You would still need an N:M pool for cgo.

As an added problem, 1:1 is slower for certain kinds of common channel operations. Anywhere you have an unbuffered channel and block waiting for a value, a 1:1 model requires a context switch to pass the value, whereas M:N means the userland scheduler can switch goroutines on the OS thread.
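Concretely, the pattern in question is just an unbuffered rendezvous like this (a minimal sketch, not a serious benchmark):

    package main

    import "fmt"

    func main() {
        ch := make(chan int) // unbuffered: every send is a rendezvous
        done := make(chan struct{})
        go func() {
            sum := 0
            for v := range ch {
                sum += v
            }
            fmt.Println("sum:", sum)
            close(done)
        }()
        for i := 0; i < 100000; i++ {
            // M:N: the runtime passes the value and switches
            // goroutines on the same OS thread. 1:1: each handoff
            // costs a kernel context switch (e.g. a futex wake).
            ch <- i
        }
        close(ch)
        <-done
    }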

It is precisely these benchmarks that led Ian to implement M:N in gccgo. If there are combinations where it is 1:1, that is either a new decision he has made in the last year, or (more likely) OS/ARCH combinations that he hasn't moved to M:N yet.

I have seen a similar attempt at 1:1 lightweight tasks in C++ hit exactly this. Without the ability to preemptively move tasks between OS threads, it ran into performance problems. Programs that needed the speed in that model had to switch to a futures style of programming.


> A typical server under load has more outstanding requests to answer than it can OS threads.

First of all, I object to the use of "can" here. There was a time when this was true, but, as I said, that time has passed. There is no problem running millions of kernel threads today.

That being said, there are advantages to not doing that. Right now Go uses a network-polling mechanism like epoll to implement network I/O without consuming threads. This is totally orthogonal to the 1:1 vs. N:M discussion. Go can use 1:1 scheduling while still performing asynchronous I/O behind the scenes, just as it does today.

The nature of the mapping between g's and m's does not influence the other {g, g, ...} -> {g0{m}} mapping used to implement network I/O.
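To make that concrete: blocking-style code like this toy echo server already runs on top of the netpoller today, and nothing about it would need to change under 1:1 (a sketch, illustrative only):

    package main

    import (
        "io"
        "log"
        "net"
    )

    func main() {
        ln, err := net.Listen("tcp", "localhost:0")
        if err != nil {
            log.Fatal(err)
        }
        log.Println("echo server on", ln.Addr())
        for {
            conn, err := ln.Accept()
            if err != nil {
                log.Fatal(err)
            }
            // Each connection gets a goroutine that "blocks" on reads.
            // The runtime parks it on epoll/kqueue; no OS thread is
            // held while waiting, however runnable goroutines are
            // mapped onto threads.
            go func(c net.Conn) {
                defer c.Close()
                io.Copy(c, c)
            }(conn)
        }
    }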

> It is not uncommon in a Go server at Google to find a million or so goroutines.

No problem with that, they can continue to do so.

> If you want to have an OS thread that can also run C code, it needs a large enough stack to run C code. A million such stacks is not practical.

C code needs its own stack, but this is again orthogonal to scheduling. Right now there is an N:1 mapping {g, g, ...} -> {g0} which makes it easy to switch to the g0 stack. This would have to change to an N:M mapping {g, g, ...} -> {g0, g0, ...}, and Go code would have to acquire C stacks, just as it acquires any other resource it needs to use.

This is not expensive; in fact, C calls are expensive now precisely because there is a deep interaction between calling C code and the scheduler (which costs thousands of cycles). All these cycles would go away. In a 1:1 model all you'd have to do is acquire a C stack, which is very fast and happens in userspace in the uncontended case (the common case), probably under a hundred cycles.

> You would still need an N:M pool for cgo.

As explained above, you'd still need an N:M pool, but it would no longer need to be integrated with the scheduler, making it much simpler to implement (and more performant).
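A hypothetical sketch of the kind of decoupled pool I mean (the names are made up, and sync.Pool merely stands in for the uncontended userspace fast path; the real thing would be runtime-internal, with mmap'd stacks and guard pages):

    package main

    import "sync"

    const cStackSize = 1 << 20 // 1 MiB: roomy enough for typical C code

    var cStacks = sync.Pool{
        New: func() interface{} { return make([]byte, cStackSize) },
    }

    // callOnCStack models "acquire a stack, switch to it, call C,
    // release it". Note that no scheduler interaction is needed.
    func callOnCStack(f func(stack []byte)) {
        stack := cStacks.Get().([]byte) // fast path: no syscall in the common case
        defer cStacks.Put(stack)
        f(stack) // stand-in for switching SP and calling the C function
    }

    func main() {
        callOnCStack(func(stack []byte) { _ = len(stack) })
    }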

> Anywhere you have an unbuffered channel and block waiting for a value, a 1:1 model requires a context switch to pass the value, whereas M:N means the userland scheduler can switch goroutines on the OS thread.

Only if you naively implement channels as mutex-based queues.

There is still a runtime underneath that can switch stacks between threads. I want to move general scheduling decisions out of the Go runtime and into the kernel, but pointwise, stateless, swap-of-control operations can still happen synchronously in the Go runtime.


I do not believe the networking issue is orthogonal. I believe you will find that any communication between two OS threads is significantly slower than switching one thread between an active and a parked goroutine. Futexes are slow and spinlocks burn CPU. (This is exactly the case the C++ model I mentioned ran into, and why gccgo got an N:M model.)

But as I said I'm happy to be convinced. The easiest way to demonstrate it would be to move gccgo linux/amd64 back to 1:1 without hurting the channel benchmarks. You could use that as an argument that fanning out an epoll loop among threads can be made fast.


> First of all, I object to the use of "can" here. There was a time where this was true, but, as I said, that time has passed. There is no problem running millions of kernel threads today.

I also object to the use of "can": just because we can doesn't mean we should.

Last I checked, creating a thread (on Windows) took 16 ms. Creating 1 million threads would then take 16 ms × 1,000,000 = 16,000 seconds, more than 4 hours. That's a hell of a startup time! =D

(And don't bother saying Linux is faster. It's not relevant unless it's by 4 orders of magnitude ;) ).


I should add: if you believe these are surmountable problems, please take it out of the forum and write a proposal. Everyone who works on the Go runtime would love to see it radically simplified. We put up with the complexity only because we believe it necessary.


> I should add: if you believe these are surmountable problems, please take it out of the forum and write a proposal.

Yes, I've been wanting to write a proposal for over 5 years now; if only there were more time in the world...

Coincidentally, I also have a method for implementing Go stacks (growing and shrinking) that does not involve copying stacks and avoids the hot-split problem.

Late edit:

In fact, this method would work particularly well for "growing" the stack to call C code in the 1:1 model proposed above.

It would make calling a C function just another function call with some marker on it (like we have NOFRAME and NOSPLIT now). And when we switch to a more ABI-compatible calling convention (which we want to do anyway, irrespective of all this), there would not even be a need for a C translator written in assembly any more.

Quite literally we would get cgo for free.


If you do find the time, count me in as being very interested in seeing the proposal for growing/shrinking Go stacks.


Sure thing.


Well then, add the needed preemptive moving of tasks between threads. Erlang does it, and the force behind it was far less well funded than most recent stuff like Go...

OK, I'm being a bit tongue-in-cheek, but still.


> (I believe the story is slightly different in Haskell.)

Haskell would lose more: not just features, but also its "blocking-style" syntax (which is based on callbacks, though nobody ever thinks about it that way).


No, Haskell can abstract over the callbacks-vs-blocking choice and let you write straight-line code that works with either. I mean, that makes sense, as the two are fundamentally the same thing at a super-low level anyway. Haskell just supports enough abstraction to let you do either with the same interface to higher-level code.


Interesting, could you say more about your desire for 1:1 threading in Go? Do you feel that there are few users who need a huge number of goroutines, or that 1:1 is sufficiently lightweight for many of those users?


I routinely write programs that require a huge number of goroutines. Many Go programs require a huge number of goroutines. Good thing modern kernels deal with large numbers of threads just fine. People misjudge how cheap kernel threads really are, because there was a time when they were expensive. That time was long ago. Today, threads are very lightweight. For example, the Solaris kernel schedules interrupts as threads; they are really that lightweight.
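You can get a feel for this in today's Go by wiring goroutines to OS threads (a rough experiment, not production code; the count and the thread cap below are arbitrary):

    package main

    import (
        "fmt"
        "runtime"
        "runtime/debug"
        "sync"
    )

    func main() {
        const n = 20000
        debug.SetMaxThreads(n + 100) // the runtime's default cap is 10000
        block := make(chan struct{})
        var wg sync.WaitGroup
        wg.Add(n)
        for i := 0; i < n; i++ {
            go func() {
                defer wg.Done()
                // LockOSThread wires this goroutine to its own OS
                // thread; while it is parked below, that kernel
                // thread sits idle, much as it would in a 1:1 model.
                runtime.LockOSThread()
                <-block
            }()
        }
        fmt.Println("goroutines, each on a dedicated thread:", runtime.NumGoroutine())
        close(block)
        wg.Wait()
    }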

There is a reason POSIX threading is implemented in the kernel on every operating system, even if they all started as N:M or M:1 implementations.


Related to this discussion, this talk on user-scheduled OS threads is really interesting; it's a shame no proposed implementation or experimental patches have appeared: https://www.youtube.com/watch?v=KXuZi9aeGTw


Can I start a thread and have it run code on a different core within one microsecond?


If you can, I hope you won't.

What could you be hoping to accomplish?


I think the bigger question is why they're using a C++ Thrift server. Haskell is good at transforming data; they should be able to use an OS socket and do the message decoding/encoding in pure Haskell.


> the bigger question is why they're using a C++ Thrift server

Because it's already there and it already works just fine.

It's so frustrating to see this sort of comment on nearly every post about a language outside the mainstream. It's a religious argument and not a productive one.


> Because it's already there and it already works just fine.

Well clearly it didn't work just fine if its threading model was interacting poorly with that of their Haskell code. If there's a mismatch between those two components then that will continue to bite them in the future, and it's legitimate to ask whether moving the boundary would be a better way to solve the problem.


There is an old saying: "We did what we did, not because it was easy, but because we thought it was going to be easy."

When they realized the bug was there, the path forward probably seemed like patching rather than a Haskell rewrite, and that's a reasonable mistake to make.


You're right, it makes more sense to have both in C++.


That was the solution tried last time, and it didn't work out well. And what's holding things up is nothing Haskell-specific; it's just wasted effort to reimplement good, broadly used code at Facebook.


I think the "why not write it all in Haskell" argument is a bit spurious. Only the engineers on the team knew best the advantages and disadvantages of the various approaches. Presumably there were very good reasons to integrate it the way that they did, whether driven by architecture, performance, knowledge, library, smallest-effective-change, or other reasons. It's clearly worked out very well in the most basic sense (it does the job). And on top of that, we now have bug fixes and other improvements submitted to ghc. Seems like a win-win.



Surely a bigger question is how a site built on PHP transformed into one making PHP compilers, JS frameworks, Haskell, C++, etc.

They seem to have succeeded in creating a culture of coders/hackers who were opinionated/empowered enough to use their tool of choice, and to persuade others of its advantages.

This doesn't really happen in large-cap companies (though it does happen in empowered teams, and even then only somewhat; I'm thinking of finance). And that speaks to the advantages.


It's a big company. Things just go in all directions when you've got thousands of devs.


If you hire developers at scale, you probably get quite a bit of variety.


At a guess, the server probably provides a unified front-end to services written in multiple languages.


This was covered in a previous post: "Haskell is sandwiched between two layers of C++ in Sigma. At the top, we use the C++ thrift server. In principle, Haskell can act as a thrift server, but the C++ thrift server is more mature and performant. It also supports more features. Furthermore, it can work seamlessly with the Haskell layers below because we can call into Haskell from C++. For these reasons, it made sense to use C++ for the server layer." https://code.facebook.com/posts/745068642270222/fighting-spa...


Don't confuse the features of a language with its actual run-time productivity.



