No More Callbacks: 10,000 Actors, 10,000 Threads, 10,000 Spaceships (paralleluniverse.co)
339 points by pron on Oct 16, 2013 | 143 comments



Nice exemplar. Back when Java was being created, James Gosling was pretty insistent that concurrency be lightweight and scalable. When I ported it from SunOS 4 to Solaris 2.0 I had to move from the really lightweight setjmp()/longjmp() threads that he had implemented to the thread system that Solaris had defined. There was a huge negative impact on performance (as I recall about 15x slower). That sucked because one of the coolest demos at the time had a little world in it where 'Fang' (the Java mascot) lived, and a bunch of things in that world were all animated with threads. Looking at the 'fiber' model for threads I think they are much closer to what we should have done in the first place.

The thought was to have a billion threads on a SPARCStation 10 (that is like an old Pentium machine now). We never got close but it was a great goal. Definitely going to have to go back and revisit this topic now. Thanks for the excellent demo to play with!


I got to about 500K processes on Erlang's VM on an i7 with lots of memory.

These people got up to 2M concurrent TCP connections:

http://blog.whatsapp.com/index.php/2012/01/1-million-is-so-2...

And on top of that, it uses isolated heaps, which means completely concurrent GC. That is beautiful, I think.

I know Erlang syntax is not to everyone's taste (I do like it though). There is also Elixir (http://elixir-lang.org/). But the underlying BEAM VM is an awesome piece of technology.

> The thought was to have a billion threads on a SPARCStation 10

There are some green-thread C libraries I've been playing with, like Protothreads ( http://dunkels.com/adam/pt/index.html ) and libconcurrency ( https://code.google.com/p/libconcurrency/ ). I think they use the setjmp()/longjmp() trick. Some use the POSIX getcontext()/setcontext() calls.

I still like Erlang VM best.


> There are some green-thread C libraries

Shameless plug for CPC, a compiler translating green-threads to event-driven code: http://gabriel.kerneis.info/software/cpc/

Used to compile coroutines in QEMU during the latest GSoC: http://gabriel.kerneis.info/research/files/qemu-cpc.pdf


Curse you; you added five papers to my already-full reading list :)


After you've been writing a bit of Erlang, you find yourself always putting a "." at the end of lines in other languages' interpreters.

I actually really like Erlang and would love to get more familiar with it, but there are very few small projects that are suited to Erlang's domain.


One option is to port your personal site/blog to Zotonic CMS and play around with that.


Or port your favourite library from another language; Erlang is quite lacking in that regard. I had a go at porting RSpec:

https://github.com/lucaspiller/espec


ChicagoBoss offers a pretty low barrier for entry as well.


I love Erlang. I've been learning Elixir and am recording screencasts at http://elixirsips.com. It's so good for concurrency.


BTW, the only reason the demo runs 10K or a few 10Ks of actors is because this simulation is very much CPU-bound, and no matter how you schedule tasks – you can't create more CPU power in software. In less computation intensive scenarios, Quasar supports millions of actors.


Erlang is really good, especially for engineers with a hardware background, who may find FP & Message Passing are more intuitive than OOP & Context Switching.

But it's quite difficult to hire for.


"FP & Message Passing are more intuitive than OOP & Context Switching."

Huh? OOP & Message Passing!

http://lists.squeakfoundation.org/pipermail/squeak-dev/1998-...


Difficult as in you are looking for Erlangers but cannot find them? I haven't seen many postings looking for Erlang programmers.


Seems to be the same thing with Haskell and other languages, including Golang, that many groups on HN are passionate about. There aren't many people hiring for skills in these languages. When I ask some start-up founders I know, who have built their companies on Python and JavaScript, whether they have looked at Golang, they reply 'What's that?'.


I'd be happy if someone would hire for Python. Most of it is Java, PHP and C# around here. So sad.


Where's "here"? I constantly get emails from recruiters looking for Python devs.

(I'm in the SF area, but a lot of the ads I see are for New York and parts of the Midwest.)


Quite frankly, "here" could be anywhere that's not near the Bay Area.


I think so too. I'm located in Germany, but I think this situation applies to most of the world except the west coast of the U.S.


Yeah I don't see many positions for Haskell, either (and I'd like to). OOP class systems are so irritating when you could get more powerful type-checking from less boilerplate. Maybe I should look into F#? AFAICT it seems to be Haskell's sister, who left academia to do finance and is pulling seven figures in her early twenties.


My group in Azure is hiring. I don't think we have any Haskell in production, but I sure bet we could get away with using F# and the new immutable collections for a lot of new stuff. zip [marshray@; maray] [live.com; @microsoft.com] Email me, I'd love some coworkers who were really into FP.


This sounds interesting. If you don't mind I'd like to get in touch also.


Yes that'd be awesome!


Python is 22 years old, Golang is 4 years old. Give it time.


Haskell is even older, but you don't see many job ads for it. I don't think it has much to do with the age of the language. Node.js is an example of a very young platform that many, many startups are using. (Although, to be fair, JavaScript has been around for quite some time now.)


In order for a programming language to get used:

* it has to fill a need people have.

* people have to know about it.

* it has to have a mature implementation.

Age helps with point #2 and #3.

With regard to JavaScript, the web browser pretty much forces you to use it. So it doesn't really matter how good or bad it is, it will have adoption.


Space pilots and nuclear plant engineers are hard to hire because you need hardware to train. Hiring software developers just requires willingness to pay more money than market.


People have been using Netty (Java) to handle over 500K connections with NIO on one server. http://urbanairship.com/blog/2010/08/24/c500k-in-action-at-u...

That's 3 years ago. I'm sure more can be handled now, with more memory and more cores.

Bundle Netty with a distributed system like Hazelcast and you can easily scale out to more machines.


Quasar is distributed on top of our own IMDG called Galaxy. Our goal is not to give you scalability, but to give you scalability while keeping familiar and simple programming models. There is absolutely no reason not to use Netty with Quasar. In fact, we're doing exactly that in an internal project.


I set up 1M connections to a Netty service we host a couple of years back; it's kind of a fun exercise. I would avoid something like Hazelcast if you want to do something that parallelized. We used Terracotta, and the entire concept of shared memory in a distributed system becomes a major headache.


Can you give some insights on why to avoid Hazelcast or friends for distributed system?

I can give my sample usages. I mainly used it for organizing the cluster automatically, like managing membership, server joining or leaving, maintaining the master server address list. With a distributed hash scheme like Soda, any member server can accept work from any client. Scaling out is very simple, just boot up a machine from an image. It would use Hazelcast to join the cluster and become available.

I also used it for distributing jobs; it's nice to be able to say: queue these jobs and the distributed workers pick them up automatically.

I avoid using it for shared memory stuff, using it like Memcached. That's what Memcached, Redis, and friends are for.


Yes, but you're dealing with callbacks then. In the context of the OP, Netty is an example of what they wanted to escape from.


I also remember user-space threading in Java (where N threads would be multiplexed onto M kernel threads), although my understanding was that we moved to the 1:1 threading model because it was faster and had fewer pathological edge cases.

It sounds like you're saying that it wasn't faster? Can you explain a bit more about the motivation if so?

Any thoughts on whether we should have user-space threading in the JVM today? Maybe it could be implemented in RoboVM as an experiment (rather than wrestling with the JVM immediately)?


So the "original" move to native threads was motivated by the fact that Sun didn't want Java to be SunOS only (there were a number of SunOSism's in it when it was Oak). That the system threading system would improve independently of the language and have fewer edge cases were put forth as good reasons to do that.

Initially it was slower because context switches are slower, and anything that had to context switch into and out of the kernel was thus slower. Later there were other reasons. "User level" or as I prefer "Single Context" [1] threads have the ability to be very fast because they have extra knowledge about what is going on and access to other address spaces (so you don't have to save something if you know how it's used). There has been lots of great work on this since the early 90's when this decision was made so I suspect the issue would be significantly less.

[1] At the time kernel threads were 'single context' and quite fast. Fast enough to be used for interrupts. There is a great paper on the threading system in Solaris which describes them. -- http://dl.acm.org/citation.cfm?id=202217


Naturally, we've experimented a lot measuring context-switch cost. While pretty bad on OS X, on Linux it's quite good in many circumstances (sleep, IO interrupt) but not so good when a thread waits to be woken up by another application thread (as opposed to being woken up by some kernel event). Aside from consuming less memory, Quasar fibers shine exactly in that last scenario: when fibers wait on other fibers (they could be waiting for an actor or channel message, a condition variable, or an in-memory DB transaction). Fibers use Doug Lea's awesome fork/join thread pool for scheduling.

I hope to write about our context-switch results some day.


Isn't the Java mascot named Duke?


The guy who drew him (Joe Palrang) called him 'Fang' and most of the rest of the folks in the Java group did as well. When we went public with Java Sun PR didn't like the name we'd given him so they picked Duke. I think Wayne Rosing suggested that name but my memory of that is quite fuzzy.


He looks like a fang though. Maybe that's what they've nicknamed him internally.


Well written, good description and nice demo.

Would love to see more on how this is different from (or better than) Akka. The programming model is actually close to Akka's (with actor systems, supervision, receive methods, message passing, etc.).

The article states that Akka has no true lightweight threads. The guys behind Akka have gotten it running at 50M messages/second[1], and performance vs. Erlang seems to be good as well [2][3].

Perhaps a benchmark would be great.

Thanks for sharing.

[1] http://letitcrash.com/post/20397701710/50-million-messages-p...

[2] http://uberblo.gs/2011/12/scala-akka-and-erlang-actor-benchm...

[3] http://musings-of-an-erlang-priest.blogspot.pt/2012/07/i-onl... (discussing millions of messages is a good signal IMHO).


The main capability provided by Quasar is the fiber, or the lightweight thread. It is the same as a normal Java thread in the sense that it can block – on IO, on a DB call, or on a synchronization mechanism. This makes the programming experience very natural. The actor and the channel abstractions build upon fibers.

Akka doesn't have lightweight threads at all. You implement a message-handling method, but it must not block on, say, a DB call, lest it block the entire thread it runs in. An Akka actor simply must not issue a DB call: it's as simple as that.

With Quasar things are different: you pull messages rather than implement a callback; you can block: on IO, DB, lock or anything else. The programming then is not only simpler, but also more powerful. For example, Quasar supports selective receive - just like Erlang.
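
For a flavor of the model, here is a rough sketch of a minimal Quasar actor in Java. The class and method names (BasicActor, doRun, receive, spawn) follow Quasar's documentation; exact signatures may differ between versions, so treat this as an illustration rather than the definitive API:

  import co.paralleluniverse.actors.ActorRef;
  import co.paralleluniverse.actors.BasicActor;
  import co.paralleluniverse.fibers.SuspendExecution;

  public class Echo extends BasicActor<String, Void> {
      @Override
      protected Void doRun() throws InterruptedException, SuspendExecution {
          for (;;) {
              String msg = receive();  // blocks the fiber only, not an OS thread
              System.out.println("got: " + msg);
          }
      }
  }

  // Elsewhere, on a fiber or thread:
  //   ActorRef<String> echo = new Echo().spawn();
  //   echo.send("hello");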


> An Akka actor simply must not issue a DB call

That's untrue. With Akka's pipe pattern, you can take the results of any future and pipe it back to the sender, including using a map on the future if you like. This is how we do reactive database calls in Akka. For example:

  import akka.pattern.pipe    // enables future pipeTo sender
  import scala.concurrent._   // provides future { ... }
  import context.dispatcher   // implicit ExecutionContext for future/map

  def receive = {
    case msg => {
      val f = future { myDatabaseResult(msg) }
      f map { result => myTransformResult(result) } pipeTo sender
    }
  }
At no point does this actor block. Assuming you even have something like Play calling this actor, you wouldn't be blocking there, either, you'd take the result from the actor, likely map it to a result, and Play would asynchronously return that. My basic rule is that if you're typing Await.result anywhere in your Play/Akka code, you're doing it wrong.


There are many ways to do asynchronous programming employing functional approaches. The difference is that with Quasar you can use them if you like, but you don't have to. You can issue a plain-old JDBC call, and at no point will the thread block, either; only the actor will. And that's the point: it's simple, familiar and intuitive. You don't need to learn so many unfamiliar patterns. You program as you normally would a single thread.


Ok, so issuing a blocking call is "simple, familiar and intuitive". Invoking a Future or a Promise is "so many unfamiliar patterns".

Yes Sir, with this attitude I hope to make remarkable progress in my tech career :) Seriously, there is nothing mysterious or magical about shoving a "plain-old JDBC call" into a Future.

http://en.wikipedia.org/wiki/Future_(programming)

Remarkable demo, btw. But let's not run down other approaches simply because one might, god forbid, have to "learn so many unfamiliar patterns".


> Ok, so issuing a blocking call is "simple, familiar and intuitive". Invoking a Future or a Promise is "so many unfamiliar patterns".

You got it! Great job!

> Yes Sir, with this attitude I hope to make remarkable progress in my tech career :)

Well, one way to not make remarkable progress in your career is to use fads, acronyms, and unnecessarily complicated constructs. Why use futures in that example when actors perfectly model the problem domain? Are you showing off that you know about Futures and that they are easy?


I think you're pretending that removing the ceremony of the Future isn't a significant difference. Your point is correct, but you're missing the forest for the trees.


A future is a simple blocking mechanism. The Scala example above uses something called a future, but it isn't one. It's an "Rx" functional future – cool and often useful, but it's yet another construct that isn't part of the actor model. I'm happy to use Rx, but I wouldn't use it in an actor.

There are many ways to tackle concurrency, but IMO it's best to keep them separate as much as possible, or you quickly lose track of what's happening when.


Well, he is kind of right. I'm familiar with enterprise IT, and there are very mediocre programmers at work. I'll bet you most of them have never heard of "futures". Sad, but that's how I experienced it.


> I'm familiar with enterprise IT

I am also familiar with snobby wannabe functional programmers who, instead of opening the goddam file and reading it, are creating homomorphic endofunctors wrapped in futures with double memoization and distributed locks, so that nobody on the fucking team knows what's going on.

These people are 10x more dangerous than mediocre programmers who just find the simplest way to get the work done and ship the product.

Eventually 1% of the wannabes might get enlightened and realize that simple basic code is usually better than using every single programming concept wrapped in 100 lines of code that nobody (including themselves 2 weeks later) can understand.


That's quite a feat of mind-reading you performed there. The fascination with technology, rather than just solving the problem at hand via the shortest critical path, is a thing that has been puzzling me for a long time. At some level technology is so fascinating in its own right that the temptation to lose sight of the goal is ever present, and many people succumb to that temptation.

Imo it's just another variation on the Yak Shaving theme with a dose of procrastination thrown in for good measure.


“Well, Mr. Frankel, who started this program, began to suffer from the computer disease that anybody who works with computers now knows about. It's a very serious disease and it interferes completely with the work. The trouble with computers is you play with them. They are so wonderful. You have these switches - if it's an even number you do this, if it's an odd number you do that - and pretty soon you can do more and more elaborate things if you are clever enough, on one machine.

After a while the whole system broke down. Frankel wasn't paying any attention; he wasn't supervising anybody. The system was going very, very slowly - while he was sitting in a room figuring out how to make one tabulator automatically print arc-tangent X, and then it would start and it would print columns and then bitsi, bitsi, bitsi, and calculate the arc-tangent automatically by integrating as it went along and make a whole table in one operation.

Absolutely useless. We had tables of arc-tangents. But if you've ever worked with computers, you understand the disease - the delight in being able to see how much you can do. But he got the disease for the first time, the poor fellow who invented the thing.”

― Richard P. Feynman, Surely You're Joking, Mr. Feynman!


> The fascination with technology, rather than just solving the problem at hand via the shortest critical path, is a thing that has been puzzling me for a long time

Exactly. And fascination with technology is important; it is what keeps people learning and searching, finding better tools. The problem is that it also has a pathological side.

Like the tool analogy: just because I found an experimental, electronic, automatic, voice-activated nail gun with blinking lights doesn't mean I should use it when building my own house. If all I need is to hammer a few nails, a regular trusted hammer will do.


I fully agree; that's why I don't like Scala. Readability and simplicity above everything.

That doesn't mean that you shouldn't be familiar with the basic concepts of the programming language that you use, and I feel the silicon valley bubble is sometimes unaware that not all programmers are startup hot shots.

You actually support my argument, introducing the concept of futures is just an added layer of unnecessary "cleverness" that could be avoided with quasar/pulsar.


People who don't follow your favourite programming paradigm are mediocre? Did you mean that or is it just a poorly constructed paragraph?


Certainly not. I was just responding to dxbydt, who was questioning the statement that futures and promises are unfamiliar to many programmers. I think they are, because I've seen some corporate IT departments from the inside, and it's just a different environment.

In no way do I endorse the use of futures or whatnot, nor are they my "favourite programming paradigm".

I am simply advocating the simplest solution that works, and avoiding constructs like promises/futures seems like a good idea in that regard.


Wouldn't a thread still have to block somewhere? How would you call the synchronous code without blocking on any threads?


You block the lightweight thread (fiber), rather than the OS thread. Fibers are implemented as continuations scheduled by a very good multi-threaded scheduler (ForkJoinPool).
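
As a concrete illustration, a minimal fiber in Java might look like this (a sketch using Quasar's Fiber and SuspendableRunnable classes as described in its docs, not the demo's actual code):

  import co.paralleluniverse.fibers.Fiber;
  import co.paralleluniverse.fibers.SuspendExecution;
  import co.paralleluniverse.strands.SuspendableRunnable;

  public class FiberHello {
      public static void main(String[] args) throws Exception {
          Fiber<Void> f = new Fiber<Void>(new SuspendableRunnable() {
              @Override
              public void run() throws SuspendExecution, InterruptedException {
                  Fiber.sleep(100);  // parks the continuation; the ForkJoinPool
                                     // worker thread moves on to other fibers
                  System.out.println("hello from a fiber");
              }
          }).start();
          f.join();
      }
  }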


When a query is executed with JDBC, the execute() method does not return until the DB responds. The method must have been invoked in some OS thread by the Quasar scheduler. Wouldn't that thread block as long as execute() doesn't complete?

To put it differently, can I make 10000 concurrent HTTP requests (to different domains), using a non-NIO HTTP client library, without ending up with one OS thread per request?

If I can, how does the scheduler manage it?


> can I make 10000 concurrent HTTP requests (to different domains), using a non-NIO HTTP client library, without ending up with one OS thread per request?

No. We provide you with a standard HTTP client API (JAX-RS client) that gives you a blocking API. Under the hood it uses asynchronous IO. We then transform callbacks to fiber-blocking operations. So you use a standard blocking API, that is implemented asynchronously.

JDBC is a little more complicated as there is no async JDBC standard. What we do, then, is run the thread-blocking call in a separate IO workers pool. Those worker threads will block, but your API call will just block the fiber, letting other fibers use the same OS thread for something else until the JDBC call completes, at which point the IO worker will wake up your fiber.
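
A sketch of that callback-to-fiber-blocking transformation, assuming Quasar's FiberAsync bridge class (its exact type parameters and method signatures changed across versions); ToyDriver and Callback here are invented purely for illustration:

  import java.util.concurrent.ExecutorService;
  import java.util.concurrent.Executors;
  import co.paralleluniverse.fibers.FiberAsync;

  interface Callback { void done(String result); }

  // Hypothetical callback-based driver, standing in for an async IO library.
  class ToyDriver {
      static final ExecutorService pool = Executors.newCachedThreadPool();
      static void fetchAsync(final String key, final Callback cb) {
          pool.submit(new Runnable() {
              public void run() { cb.done("value-of-" + key); }
          });
      }
  }

  // Bridges the callback to a fiber-blocking call: run() parks the fiber
  // until the callback fires and asyncCompleted() is invoked.
  class FetchAsync extends FiberAsync<String, RuntimeException> {
      private final String key;
      FetchAsync(String key) { this.key = key; }
      @Override
      protected void requestAsync() {
          ToyDriver.fetchAsync(key, new Callback() {
              public void done(String result) { asyncCompleted(result); }
          });
      }
  }

  // Inside a fiber:  String v = new FetchAsync("k").run();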


Does there need to be a hand-written integration for every kind of blocking resource, or is there magic happening here?

What happens if a thread loads from a MappedByteBuffer, and the OS needs to read from disk to satisfy the load? What happens if a thread loads from some far corner of main memory that has been paged out, and the OS needs to read from disk to satisfy the load?

Those situations go beyond the power of lightweight threading systems I have seen before. They're not fatal problems, nor even serious ones for most programs, but they're part of the reason that lightweight threading hasn't, and can't, become the general solution to threading. Well, not until scheduler activations make a comeback, at least. That doesn't mean that lightweight threading is not a hugely valuable thing to have on an opt-in basis, though, as this demonstration, er, demonstrates.


> Does there need to be a hand-written integration for every kind of blocking resource, or is there magic happening here?

Yes (to hand-written integration), but it's incredibly simple. We can transform any callback-based API into a fiber-blocking API within a few hours of work.

> What happens if a thread loads from a MappedByteBuffer, and the OS needs to read from disk to satisfy the load? What happens if a thread loads from some far corner of main memory that has been paged out, and the OS needs to read from disk to satisfy the load?

The thread will be blocked, but it isn't likely to be much of a problem. It's perfectly OK for the fiber to block the thread occasionally (hopefully rarely) – the work-stealing scheduler can deal with that because it runs in a thread pool; if one thread blocks, others will steal its work and do it. It's just not OK for fibers to block their thread very often. The scenarios you've described involve cache misses and page faults, and so are rare by design.

> Those situations go beyond the power of lightweight threading systems i have seen before. They're not fatal problems, nor even serious ones for most programs, but they're part of the reason that lightweight threading hasn't, and can't, become the general solution to threading.

I agree. Quasar fibers are by no means meant to serve as a replacement for threads. They are specifically targeted for cases when you want lots of concurrent "threads" that interact very often by passing information (either via messages or a shared data structures), and so block and wait for each other a lot.

If you have a long-running computation: use a plain thread.

Actually, one of the cool things in Quasar is an abstraction called a strand. A strand is simply either a thread or a fiber. All of the synchronization mechanisms (channels, condition variables etc.) provided by Quasar work with strands - not directly with fibers - so you can use them both in fibers or threads. In fact, you can run a Quasar actor in a thread rather than a fiber.


How do you determine whether a method call is thread-blocking or not?

Should libraries be Quasar-aware so as not to end up in the IO workers pool when invoked?


Every fiber-blocking operation eventually ends up with a call to Fiber.park(). If you want to call park in your function, or call a function that calls park, etc., you need to let Quasar know that your function is "suspendable". There are several ways to do that: you can declare the method to throw a SuspendExecution exception, you can annotate it with a @Suspendable annotation, or you can declare it suspendable programmatically or in an external text file.
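
A tiny sketch of the first two options (the try/catch-and-rethrow in the annotated variant is the idiom Quasar's docs show, since SuspendExecution is never actually thrown at runtime):

  import co.paralleluniverse.fibers.Fiber;
  import co.paralleluniverse.fibers.SuspendExecution;
  import co.paralleluniverse.fibers.Suspendable;

  class Waits {
      // Option 1: declare SuspendExecution. It is never really thrown; the
      // declaration marks the method for Quasar's bytecode instrumentation.
      static void pause() throws SuspendExecution, InterruptedException {
          Fiber.sleep(10);
      }

      // Option 2: the annotation instead of the checked exception.
      @Suspendable
      static void pauseToo() throws InterruptedException {
          try {
              Fiber.sleep(10);
          } catch (SuspendExecution e) {
              throw new AssertionError(e);  // unreachable
          }
      }
  }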


Ah, so I cannot just use any Java library that performs IO if I have to get quasar's benefits.


Correct, but we'll provide implementations for the most popular IO APIs (NIO, REST services, web sockets, JDBC, etc.), so while it is a limitation, I think it's a small one.

Once the documentation is complete you'll know how to transform any callback-based asynchronous call into a fiber-blocking one, so you can integrate your own libraries with Quasar. It's very, very simple.


Scala has an `async`/`await` feature (like C#) now, which hides the Future ceremony and gives sequential syntax. https://github.com/scala/async

I guess this does still require the `await` word, but I think it's good to have a magic word for "suspend execution now" so you can see where it's happening.

An OS thread still has to block on blocking IO calls somewhere, of course. There's no "syntactic" fix for that on the JVM - you have to actually port the blocking IO to nonblocking IO - AFAIK nobody can magically fix JDBC to be nonblocking from outside JDBC.


Essentially, Quasar provides async and await for all JVM languages. async is called `Fiber.start()`, and await is called `Fiber.park`.

Other than working for all JVM languages, Quasar fibers are more general in that they can spawn many functions (they have a stack), while async is limited to a single expression block. Because of this, we can hide the "await" deep inside the JDBC call stack.

Under the hood, they are similar: both instrument your code. Only async does this at the language level (it's a Scala macro), while fibers do it at the bytecode level.


> Akka doesn't have lightweight threads at all. You implement a message-handling method, but it must not block on, say, a DB call, lest it block the entire thread it runs in. An Akka actor simply must not issue a DB call: it's as simple as that.

That's not quite true. Yes, you block that thread but this is why you configure actors with blocking calls to use their own dispatcher. For DB queries you typically put querying actors onto a thread-pool dispatcher where pool size ~= available DB connections.

Selective receive is also quite easy to implement in Akka using stash().


I am really looking forward to where actor frameworks like Akka are going, but that seems like a leaky abstraction to me.

I shouldn't have to define extra actors or figure out thread-pool concurrency levels to do blocking operations like lock acquisition, waiting on conditions, IO, etc. A framework that doesn't allow that adds zero value to me, because I am not willing to ask others to reason about that sort of thing.

Not having that kind of transparency makes it difficult and dangerous to convert existing code bases that use traditional concurrency primitives.

I would like to have thousands or millions of actors, but for now I am stuck going to 1:1 with threads.


I agree. That's why Quasar starts by providing true lightweight threads - you can block, wait on conditions - whatever. Only, you can have millions of those.

On top of these fibers, Quasar gives you Go-like channels and/or Erlang-like actors (I say Erlang-like because they follow the Erlang model closely: you pull messages rather than implement message callbacks, they have selective receive, etc.).


Quasar looks very cool. How does pre-emption work? I know Erlang's VM counts an actor's "reductions" -- bytecode instructions -- and after a certain number preempts that actor and lets others run. How does that work in Quasar? Does an actor have to explicitly yield, sleep, do IO or run receive?


We've experimented with reduction-based preemption but saw no perceivable performance benefit (you can look at the Fiber class code and see them commented out). We might bring it back if we find a good use for it.


Does that mean this system is inappropriate for CPU-bound tasks?


The spaceships demo is very much CPU bound.

But, at least for the time being I wouldn't run a long-running, CPU heavy computation in a fiber, but in a plain thread. Fibers work best when they block often.


stash is not the same as selective receive at all. With selective receive you can do simple, intuitive nested receive blocks (see the Pulsar Clojure examples here: http://blog.paralleluniverse.co/post/49445260575/quasar-puls... or any Erlang code).

And yes, you can configure Akka for certain types of usage, but it is anything but simple. We value the simplicity and intuitiveness that are at the core of Clojure and Erlang.


I didn't say that stash() is the same, just that it's quite easy to use it to implement selective receive in Akka in combination with hot-swapping an actor's behavior with become/unbecome. We use this pattern quite successfully in our code base.


> An Akka actor simply must not issue a DB call: it's as simple as that.

Must not issue a synchronous db call, if I'm reading you correctly. I assume most DBs also provide async interfaces, or you could create one yourself.


That is not a correct assumption. Most DB vendors only provide JDBC drivers. There are some async drivers, but there is no standard for them, and availability and quality vary substantially. This problem is repeated for other network libraries - if they weren't written specifically for NIO then you have to KNOW that, and you have to be sure you run that actor in its own thread. It's a leaky abstraction.


How does Quasar know when it should defer execution of something that might block? Does the user declare this in some fashion? In the article, it looks like this is done with try-with-resources blocks.


The try-with-resources block is used to delineate an atomic transaction, and is part of the SpaceBase API. An example in the code of a blocking call is the call to receive.

Quasar identifies blocking methods if they declare that they throw a SuspendExecution exception, are marked with a @Suspendable annotation, or are listed in an external file. Pulsar, Quasar's Clojure API, marks suspendable functions differently, but that's an implementation detail.


My first thought was "why don't they use Akka"?

> Akka has no true lightweight threads (the actors are actually callbacks)

Would you care to elaborate? I'm not too familiar with the internals of Akka, but they definitely don't use "heavyweight" threads (which I assume are threads that are 1:1 mapped to OS threads).

Also, I didn't get "the actors are actually callbacks". Yes, there may be callbacks involved internally (why not?), but there is a big difference whether I am sending a message to an actor (which may be processed at any time) vs. calling a callback (which is immediately executed on the very same thread that I'm running on).

Sorry if this sounds dismissive, but I'd really like to learn why you choose to implement your own solution, because you've obviously put some time into evaluating what is out there.


https://github.com/scala/async is the syntactic sugar to write sequential nonblocking code in Scala (no callbacks). Though functional-style code works well also if you know it.




See my reply to jaimefjorge


I'm going to be "that" guy and ask... why actors? Why not agents?

The concept of agents (as defined by Rich Hickey in a lot of his Clojure talks) is all about globally shared, immutable and persistent state that you can act upon.

With actors you still need to have the actor manage its own mailbox of requests and handle them; the actor has to define its behavior.

With agents you don't have to ask for the world to stop in order to communicate; you can read the current snapshot of the world (aka no request to view the state, no database queries) and send transformation functions on the data of that specific agent, which will then be processed by the agent's thread in an ordered way.

I'd love to see more insight on the choice for this, it's interesting as I am currently working on a similar project.


> I'm going to be "that" guy and ask... why actors? Why not agents?

Because actors are an established paradigm that has been around for a while. I haven't heard Hickey's talk on "agents", but based on your description, how are agents radically different?

What stops actors from reading snapshots of the world? They can 1) subscribe to a "world" actor and get a publication when it changes, or 2) if the database is immutable (I guess you are hinting at Datomic or Clojure's data structures here?) an actor can also call a function. Remember, actors in practice are there to help isolate concurrency contexts. Reading truly immutable data is safe, so actors could just periodically read this immutable data (just think of the database as a function). It would be awkward having to process messages from the mailbox, timing out, then reading world state, processing world state, going back to processing messages, etc. I like 1) better.

> the actor has to define its behavior.

How does an agent bypass defining its behavior? Doesn't an agent have a piece of code that specifies what that agent does?

> and send transformation functions on the data of that specific agent, which will be then processed by the agent's thread in an ordered way.

So this basically centralizes the state of all the agents in one central location that is an immutable database? Hmm, interesting. It is a different way of looking at it, I guess. Each actor usually handles its own internal state privately. I guess we also assume that there is something underneath that constantly distributes this incrementing tree of states across the whole system. I don't know, I would rather think of actors explicitly choosing to send their state to a system halfway across the world than rely on another layer distributing state. Maybe it is just a matter of mental models here...


The best summary of actors vs. agents I've heard is Jonas Bonér's, one of the head Akka engineers:

"With actors you send state to the behavior, with agents you send behavior to the state"


The substantial difference is that actors act on messages, while agents are only pieces of data with an assigned thread (in a pool, doesn't matter) that is scheduled to retrieve the functions passed via send and send-off. There is an amazing talk from Rich about Clojure concurrency where he implements an ant simulation with agents; it's really great. He also mentions the difference between the Erlang and the Clojure models in a much better way than I possibly could.


Actually, the biggest thing Quasar gives you is lightweight threads. That's the hard part. Building actors, agents or dataflow variables on top of that is very easy.

Having said that, I do think actors have some advantages over agents when it comes to fault tolerance. Actors better isolate and communicate faults.


Yes, you are entirely correct; my point about actors vs. agents was only taking into consideration a shared-memory situation (a single local multicore processor). In the case of a distributed system with multiple nodes located on different memory spaces and machines, the Erlang approach is much superior, especially for actor discovery, monitoring (i.e. heartbeat) and fault tolerance in general.


> Writing correct and efficient multi-threaded code is at once necessary and extremely difficult.

I do not agree with this. The original statement he is quoting says "can be very challenging". Yes, if you are designing something very state heavy and your design is somehow flawed or too complex then you can run into issues. However, in most cases threads are no more complex than callbacks, actors, etc. In fact, from what I've seen, concurrent code eventually all converges to some semblance of the actor model anyways.

Where the actors/green threads/etc. really shine is having huge numbers of them. OS threads still have very large overhead compared to lighter weight green threads, so you can spin up many magnitudes more of them than you have CPU cores.

Also, in lots of languages multi-core != concurrent. You can have 10,000 actors using a single core. In fact writing a scheduler that can efficiently distribute actors between different cores is probably where the complexity Doron Rajwan refers to lies.


> However, in most cases threads are no more complex than callbacks, actors, etc. In fact, from what I've seen, concurrent code eventually all converges to some semblance of the actor model anyways.

Threads are the WorseIsBetter approach to concurrency; they're incredibly simple to implement, but that just means that the difficulties are pushed onto the users (i.e. developers using the framework/library).

Threads may be a good idea for code which has no 'design flaws' and is not 'too complex', but as we all know everything has bugs and everything is more complex than it seems. The arguments in favour of higher-level concurrency models are basically the same as for tests and version control: if you don't use them, you're making a dangerous gamble which may exact a large price down the road.

Concurrency models like callbacks and actors can make dangerous things more difficult; if we use the callback examples from the article:

> It’s hard for a programmer to reason about which line of code executes on which thread, and passing information from one callback to another is cumbersome as well.

Of course, this is the point of callbacks. The callback model tells us to reason using function arguments and function calls, so of course we can't map lines of code to threads, since neither lines of code nor threads have any place in a callback model. Likewise for passing data between callbacks; the problem with threads is that everything is shared all of the time, which makes it incredibly difficult to enforce invariants. When using callbacks, everything is local by default and transferring data between threads requires explicit channels, e.g. free variables.

In the actor model the safety comes from messages having no ordering or latency guarantees, so we can't assume that our data is always up to date.

With higher-level concurrency models we end up screaming at our IDEs as we try to contort our code to fit the paradigm. This is how it should be, since this means nothing's gone wrong.

With low-level concurrency models, the machine gladly accepts our dangerously broken code; the number of interleavings is so huge that our tests never hit an error case (or more likely, some of the bugs are so obscure that it never occurred to us to test them). Six months later the application explodes, and as we sift through the pieces we find the true extent of the problem, and discover that subtly corrupt output has permeated every aspect of the business and we can't trust anything that's been done since that code went live.


> the problem with threads is that everything is shared all of the time, which makes it incredibly difficult to enforce invariants.

Your entire comment comes down to this, and my point is that this is not a problem. Design your threaded code around a simple principle: one thread's code must never touch another thread's data. Now you have safe threaded code. If you want to add some limited well-documented cases where you break that golden rule, go for it and reap the performance benefits.

There are some things that some threading models can be criticized for. For example, POSIX threads cannot be killed if they get stuck. However, threads are a powerful tool. The idea that you can share the in-memory code between all your threads is great. Additionally, you can share state and you control how and when it is shared. Want complete isolation? Communicate via queues! Want some shared state for performance reasons? Go for it! Want complete and utter chaos that will blow up as soon as you look at it funny? Let threads access other threads' data at will.
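
A minimal sketch of that "communicate via queues" discipline with plain JDK threads, where the BlockingQueue is the only shared object:

  import java.util.concurrent.BlockingQueue;
  import java.util.concurrent.LinkedBlockingQueue;

  public class QueueIsolation {
      public static void main(String[] args) throws InterruptedException {
          // The queue is the only thing both threads touch; all other data stays private.
          final BlockingQueue<String> inbox = new LinkedBlockingQueue<String>();

          Thread worker = new Thread(new Runnable() {
              public void run() {
                  try {
                      String msg;
                      while (!(msg = inbox.take()).equals("STOP"))
                          System.out.println("got: " + msg);
                  } catch (InterruptedException e) {
                      Thread.currentThread().interrupt();
                  }
              }
          });
          worker.start();

          inbox.put("hello");
          inbox.put("STOP");
          worker.join();
      }
  }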

Your argument is similar to arguing that table saws are terrible because one cannot guarantee that they will never cut off your fingers.

Edit: one other problem with callbacks. AFAIK, no implementation of callback-based concurrency is able to take advantage of multiple hardware cores for true parallelism. In the meantime OS schedulers already take care of distributing OS threads between CPU cores, and some green thread implementations do this as well.


So, you're not wrong ... but the table-saw argument is a straw-man.

There's a company that sells a revolutionary table saw with an intelligent saw stop precisely because experienced, skilled practitioners regularly cut off their fingers.

In general "be smarter / do better" is not a reasonable prescription for large numbers of people. Empirically, if people are fucking up, it makes sense to analyze why and to give them automatic solutions to their fuck-ups.


I don't see it as a straw-man as I see threads as a tool. Existence of the actor model does not detract from the value that OS threads provide, the same way that existence of Common Lisp does not detract from the value that C provides. They are both tools. It's just that some tools are more dangerous than others. In other words, I don't believe that threads are a "worse is better" approach. There are things that can be improved about the specific implementations of threading, but on the whole, the paradigm is far from broken.

> Empirically, if people are fucking up, it makes sense to analyze why and to give them automatic solutions to their fuck-ups.

The problem is that other implementations of concurrency are not as widely adopted and people tend to fall back on threads (especially OS threads) when they really don't need them. But when you really do need threads, very few things are a good substitute.

P.S.: I am aware of the table saw you refer to, and this is the kind of improvement that tooling around threads could use. Note that this new table saw does not completely re-design how you interact with the blade in order to provide the safety.


> Your entire comment comes down to this, and my point is that this is not a problem. Design your threaded code around a simple principle: one thread's code must never touch another thread's data. Now you have safe threaded code. If you want to add some limited well-documented cases where you break that golden rule, go for it and reap the performance benefits.

How do you know which code is touching which data, particularly if you're using libraries? Heck, we can't even reliably keep track of which data another piece of data belongs to - even with code written and audited by experts, memory leaks get found all the time. Just as memory management is too hard to do in complex programs without language support, isolating data to the appropriate threads is too hard to do in complex programs without language support.


Bullshit. You know that you are not violating your one golden rule by only having the one golden rule. Break the fingers of any developers who violate it. Testing is important, but there is a certain level at which mistrust of your code becomes paranoia. How do you know that your code is not littering the disk with debug files, declaring global variables, adding rogue macros, etc.?

As for libraries, don't use ones where you have not seen the source or good docs that make the guarantees that satisfy you. Thread safety is one of many reasons for this.

As for memory management being too complex for large projects, see Linux kernel, BSD kernels, nginx, apache, and a million other large projects written in C.

The only thing I agree with you on is that often times language support makes things easier. However, using "unsafe" languages does not make large projects impossible.


> How do you know that your code is not littering the disk with debug files, declaring global variables, adding rogue macros, etc.?

I use a language in which functions that perform disk I/O look different (and are typed differently, so this is not just convention but compiler-enforced) from functions that don't, functions that mutate state look different from functions that don't, and macros don't exist.

Yes, you can forcibly cast around these things. But you have to do so explicitly. Whereas in most threaded languages, access to a variable that's owned by another thread looks exactly like access to a variable that's owned by the current thread.

> As for memory management being too complex for large projects, see Linux kernel, BSD kernels, nginx, apache, and a million other large projects written in C.

I do. I watch the growing list of security advisories for each of them with a mixture of amusement and frustration.


> Threads are the WorseIsBetter approach to concurrency;

Threads/Actors are the obvious way to do concurrency. Like a sibling comment, I think your comment confuses the comparison. The semantic difference is between threads _and_ actors vs. callbacks.

An actor is a sequential context, ideally isolated, but it can run concurrently with other actors. Think of a group of entities in a game. Each one executes some simplified sequence of operations: do x, do y, do z, then go back to x. But there are multiple such entities running in parallel. Another example is handling web requests. A web GET request is dispatched, a new actor is spawned, and it reads the request body, processes it, maybe reads some data from a database and returns the response – very sequential. But there are multiple such requests running concurrently.

Callbacks also form sequences of calls, but there is no explicit concurrency context. If the sequence is simple it works OK, but if it is not, it is very easy to get tangled. You are processing one sequence, but another piece of input comes in and a parallel sequence of callbacks has started; unless the data is immutable and you have pure functions, at some point it becomes a tangled mess.

> With higher-level concurrency models we end up screaming at our IDEs as we try to contort our code to fit the paradigm.

That is why you'd want to run isolated concurrency contexts (actors). You can do this by making copies of data and storing them locally, talking to threads via queues only, or spawning OS processes. That is how you decompose a highly concurrent system. Using callbacks is not going to fix the problem; it is only going to make it worse.


> Threads/Actors are the obvious way to do concurrency.

Sure. Actors or other forms of CSPs. But I think that a necessary component is some form of a shared data-structure that works alongside, rather than interfere with, your threading model.

Erlang has ETS tables, which are a little limited – not saying that there aren't better concurrent, shared data structures in Erlang, just that even a language that works purely with the actor model admits that such a data structure is necessary.


Necessary in all cases or necessary in some cases? My take on this is that you can successfully pass state around if it's small enough and only one actor cares about it at a time. Once it gets big enough you probably want to use an external service to store and synchronize it (a database), and then it matters less how your program is structured.

I suppose the exception to this might be gaming and simulations where what's more important is speed as opposed to durability of your data, yet you have lots of state to keep track of.


If it were that simple, people wouldn't be spending so much time configuring caches or using Redis. I think most non-trivial applications require some central, shared data store. More often than not, this data store becomes a bottleneck that limits scaling. Databases compete with one another over which interferes with scaling the least.

If you accept the premise in the opening quote about Amdahl’s law, then you must consider that any global or semi-global lock has a huge impact on scalability. Sometimes we have no choice, but I believe that we can and should remove many single-points-of-synchronizations while still keeping the programming model relatively simple. I also believe that rather than hindering scalability, a database can help achieve it.


That is definitely true. Databases are necessary. In fact in-memory data stores that can handle large volumes of data are not all that useful since they usually lack things like backups, etc. Not everyone is writing a RabbitMQ-like system. And of course locking plays a central role in all of this.

What I am saying is that when you accept that synchronization is going to be handled by your database of choice, it becomes somewhat less important how you actually structure your application in terms of performance. There are reasons not to use callbacks, but if you go with threads, actors, processes, etc. is now a choice between how you want to utilize memory and to an extent which technology your runtime supports best.


> However, in most cases threads are no more complex than callbacks, actors [...]

You are mixing the two concepts. The distinction is between threads/actors vs callbacks. Not threads and callbacks vs actors.

If you disregard isolated heaps and memory, an actor is just a thread plus a queue. Other threads write to the queue, and the thread gets messages from the queue and executes them.
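
A bare-bones illustration of that definition in plain Java (TinyActor is a made-up name; no library involved):

  import java.util.concurrent.BlockingQueue;
  import java.util.concurrent.LinkedBlockingQueue;

  // An "actor" reduced to its essentials: a thread draining its own mailbox.
  abstract class TinyActor<M> implements Runnable {
      private final BlockingQueue<M> mailbox = new LinkedBlockingQueue<M>();
      private final Thread thread = new Thread(this);

      void start()     { thread.start(); }
      void send(M msg) { mailbox.offer(msg); }  // other threads write to the queue

      public void run() {
          try {
              while (true)
                  handle(mailbox.take());        // the actor's thread executes messages
          } catch (InterruptedException e) {
              Thread.currentThread().interrupt(); // treat interrupt as shutdown
          }
      }

      protected abstract void handle(M msg);
  }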

The real distinction is callbacks vs. threads and actors, though. And I agree with the original point: callback-based concurrency is more complicated and more challenging to write than thread-based concurrency.

> Also, in lots of languages multi-core != concurrent. You can have 10,000 actors using a single core. In fact writing a scheduler that can efficiently distribute actors between different cores is probably where the complexity Doron Rajwan refers to lies.

What do you mean by multi-core? Languages don't come with cores, hardware does. Do you mean that CPU-bound units of concurrency (threads, actors, processes, co-routines) can be dispatched onto multiple CPUs if those exist? Yeah, some languages (or more precisely their runtimes and libraries) can't do that. Python, for example, has the GIL, so CPU-bound threads don't help. But threads work great for IO-bound work.


> If you disregard isolated heaps and memory an actor is just a thread plus a queue. Other threads write to the queue and the threads gets messages from queue and executes them.

Exactly. Turns out, this is what good thread design looks like anyways, no matter if they are OS or green threads. However, if you put on your safety goggles and lead apron, you can also do other "unsafe" things which may lead to performance boosts. For example, why toss the giant JSON blob into the queue intended for the JSON decoder, when you can just put a pointer to the blob? Of course, then the burden of cleaning up the blob is up to you, the developer, not the runtime.

> What do you mean by multi-core?

I mean distributing N actors/green threads/etc. to run in parallel over M cores. This is not a trivial "write it in a weekend" type of task and support for it may or may not be built into language+runtime. For example, Erlang had concurrency but not parallelism for a time.


It's also not generally necessary.


Interesting. I'm attempting to do pretty much the same thing in C: http://github.com/reginaldl/librinoo


Well done, man. Seriously. Every time someone starts talking about how x language makes it "easy" to do some kind of backflip, I start peering over the fence. Then someone almost immediately implements it in a C library -- or indeed, gets there first.

But have you given any thought to the critical and urgent problem of running 10,000 Actors, 10,000 Threads, and 10,000 Spaceships?


We often forget that even though this common problem of replacing callbacks is getting more critical and urgent, people have already thought about it and offered some solutions. Maybe not in higher-level languages (although I think Go does a great job there). In C, I have in mind glibc's ucontext, for example. I'm trying to improve on that through rinoo. So to answer your question, if you look at the wiki section you'll see test results I've done running 20,000 actors. Of course, once you handle "actors" correctly (which should definitely be called fibers) you shouldn't use that many threads (with too many you'll end up spending most CPU cycles scheduling your threads). However, rinoo handles multi-threading as well. I'm currently writing docs about it.


concurrency --- albeit not at this scale --- is something that you sometimes have to deal with at a low level when writing android apps. animating custom views, for example, often winds up involving direct use of Runnables rather than (what i assume are) system-level AsyncTasks. a lot of the die-callbacks-die neatness on the java side of this relies on a coroutine library, but that library doesn't run on android. there is a continuation library that does > http://commons.apache.org/sandbox/commons-javaflow/ which could be used to create coroutines and from there user-level threads

... but if we just want some generic kind of concurrency-niceness on a java virtual machine, might it make more sense to use scala rather than write your own lightweight thread library? is the user-space thread implementation really necessary or even helpful if you're abstracting toward actors anyway? do these questions even make sense to anyone?


Quasar gives you fibers. On top of them you can build actors, Go-channels, or data flow variables.

Scala gives you no advantage here. None of its concurrency constructs really require Scala. There is no reason not to implement them in Java and use them in any JVM language. More specifically, Quasar actors are more general and powerful than Scala actors because they run in true lightweight threads and can block. Also, a lot of people don't like Scala.


Title: "...10,000 Threads..."

Post: "...10,000 Fibers..."

sigh


Edit "...10,000 Actors..."

Comments "...10,000 Co-routines ... "

;-)


We are working on a fully composable framework, and concurrency is done as follows (upper-case = Object, lower-case = property):

AsyncRun ( part SomeObject )

multiple items can run in parallel like this:

AsyncRun ( part SomeObjectA SomeObjectB .. )

synchronization:

AsyncSync ( part AsyncRun ( part SomeObjectA SomeObjectB .. ))

locking a property:

AsyncRun ( part AsyncLock ( lockName = "someName", part = SaveUser ( ... ) ) )

On main thread (for UI/UX):

MainThreadRun ( part SomeObject )


Looks very nice!

Does this play well with existing JVM threading support? More specifically, if there is a call to a synchronized method inside of a fiber and another JVM thread has entered the monitor, will this block the entire fiber scheduling thread?

The reason I ask is I'd like something that plays well with legacy code.


A synchronized method would block the entire thread, but calls to ReentrantLock.lock, or any other java.util.concurrent class, can be turned from thread-blocking into fiber-blocking.


Is this similar to green threads / "greenlets" in Python? They look to be the same concept.


One thing required for an Actor model that is missing from greenlets and Python in general is the ability to have isolated contexts. Basically, each Actor should have its own global state and shouldn't be able to share state with any mechanism other than message passing.

In Python with greenlets, state can leak between green threads through module globals and other module state.


Well, greenlets can and do have their own isolated contexts, but you're right, they can indeed leak state. Thanks for the clarification.


I'm not too familiar with Python greenlets, but I believe they don't exploit multi-core hardware as well due to the GIL. Quasar fibers run and scale extremely well on multi-core hardware.


One can always start up multiple Python processes, though.


You can, but communication between processes is usually more difficult than communication between threads. Sometimes you want to do more than pass messages. For example, you might want to do something as simple as increment a single counter. Sharing a counter efficiently among processes is not so easy.
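Within a single JVM process the counter really is trivial; a minimal example with java.util.concurrent (standard API, nothing Quasar-specific):

    import java.util.concurrent.atomic.AtomicLong;

    public class SharedCounter {
        public static void main(String[] args) throws InterruptedException {
            final AtomicLong hits = new AtomicLong();
            Runnable work = new Runnable() {
                public void run() {
                    for (int i = 0; i < 1000000; i++)
                        hits.incrementAndGet(); // lock-free CAS increment
                }
            };
            Thread a = new Thread(work);
            Thread b = new Thread(work);
            a.start(); b.start();
            a.join(); b.join();
            System.out.println(hits.get()); // always prints 2000000
        }
    }

Across processes, by contrast, you need shared memory segments, a file, or a server round-trip for the same effect.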


It is true that communication between processes is more difficult, but solving IPC also lets you distribute actors across the network for free, so there is a lot of advantage to be gained by allowing out-of-process actors.

I don't understand your counter example at all. Are you saying that you sometimes share memory (the counter) between Actors in the same process? Because that is not Actors, and you should probably find something else to call it.


If your only communication mechanism is actors, then you're right: several processes would be fine. But I think that almost any software built with actors requires some shared memory as well. Even Erlang has ETS. In the demo, we use an in-memory spatial database. It's this shared data store that's hard to get right with several processes, especially if it is supposed to help with parallelization and scheduling.

BTW, Quasar also has distributed actors.


And we nowadays have the hardware resources to run this on one CPU per spaceship, at least theoretically:

http://blog.metaobject.com/2007/09/or-transistor.html

Needs some interconnect, of course...


I think I'm being dense... can someone explain the difference between a 'blocking' fiber and Ada's task/rendezvous constructs? Both seem like synchronous message passing mechanisms?


How does this compare with Grand Central Dispatch on the Mac?


I can't compare the principles by which they work, but in terms of which to choose, I think you will never have to make that decision, since GCD is only for (Objective-)C programs, while Quasar is only for the JVM.


I'm not a fan of this approach. I like what Go does with channels and I like what D does with synchronized functions. It's simple and powerful, and there's no magic. Fuck magic.


Quasar gives you channels just like Go: you can have primitive channels, select from several channels at once, or do anything else you'd do with Go. As a bonus, it performs better than Go.
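For example, a select over two channels might look something like this; it's a sketch from memory of Quasar's Selector API, so the exact class and method names may differ:

    import co.paralleluniverse.fibers.SuspendExecution;
    import co.paralleluniverse.strands.channels.Channel;
    import co.paralleluniverse.strands.channels.SelectAction;
    import co.paralleluniverse.strands.channels.Selector;

    public class SelectSketch {
        // Called from within a fiber: returns whichever message
        // arrives first on either channel, like Go's select.
        static String firstOf(Channel<String> a, Channel<String> b)
                throws SuspendExecution, InterruptedException {
            SelectAction<String> sa = Selector.select(
                    Selector.receive(a),
                    Selector.receive(b));
            return sa.message();
        }
    }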


Have you published the benchmark source code used for the comparison anywhere? I am interested in figuring out the bottlenecks that make Go perform worse.


The other day some guys proudly re-implemented jemalloc in pure Java (https://blog.twitter.com/2013/netty-4-at-twitter-reduced-gc-...), and now these guys have re-implemented half of Erlang.)

Isn't it better (and bitter) to face reality and just use Erlang or Go, or at least to ask oneself why everything should be stuffed into the JVM in 2013?)


Go has nothing to do with it. The JVM is a superset of Go, and Go's strengths lie mostly in its short startup time.

Erlang is a different matter. We love Erlang. But the JVM ecosystem is two or three orders of magnitude bigger, and the JVM serves other requirements as well, like excellent performance (performance is not Erlang's strongest suit, and more than a few Erlang projects require C code to meet performance requirements).

Some projects will be best served by Erlang, but many will benefit from Erlang's capabilities on the JVM.

In short – Erlang is awesome, the JVM is awesome, extremely popular and very successful. Why not combine their strengths? We already have a full Erlang implementation for the JVM, and we think that Pulsar (Quasar's Clojure API) really brings together the best of both Erlang and Clojure.

There are many technical advantages to using the JVM, too. Because it has really good low-level concurrency constructs, you can implement state-of-the-art concurrent data structures in Java. This is downright impossible in Erlang, as in all pure functional languages. This is as it should be, because these languages work at a higher level. The problem is that BEAM, Erlang's VM, operates at that same level too, so if you want to write a concurrent data structure for Erlang you'll need to do it in C. That's a lot harder than it may seem, because many of these data structures require a good GC, and BEAM's GC can't help because it only manages process-private heaps.
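As a concrete example of the kind of structure meant here (a textbook lock-free structure, not Quasar code), a Treiber stack built on a single compare-and-swap loop:

    import java.util.concurrent.atomic.AtomicReference;

    public class TreiberStack<E> {
        private static class Node<E> {
            final E item;
            Node<E> next;
            Node(E item) { this.item = item; }
        }

        private final AtomicReference<Node<E>> top = new AtomicReference<Node<E>>();

        public void push(E item) {
            Node<E> newHead = new Node<E>(item);
            Node<E> oldHead;
            do {
                oldHead = top.get();
                newHead.next = oldHead;
            } while (!top.compareAndSet(oldHead, newHead)); // retry if another thread won the race
        }

        public E pop() {
            Node<E> oldHead;
            Node<E> newHead;
            do {
                oldHead = top.get();
                if (oldHead == null)
                    return null; // empty stack
                newHead = oldHead.next;
            } while (!top.compareAndSet(oldHead, newHead));
            return oldHead.item;
        }
    }

Note that pop relies on the garbage collector to sidestep the ABA problem, which is exactly the help BEAM's process-private GC can't give a shared structure.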


I really appreciate your drive and effort, thank you for the reply.

In my opinion, however, as I gather from the writings of Mr. Armstrong, (one of) the fundamental problems with the JVM is that it lacks process isolation, and when it crashes, everything crashes completely. He explicitly pointed this out in his thesis: the JVM cannot provide fault-tolerance because it is a mere user-level multi-threaded process.

As a person who has had the experience of running huge Java crapware like Business Objects, I can tell you that yes, it crashes, and it crashes often, and when it crashes there are situations in which there is no way to preserve data integrity and plain re-installation is required.

I am also not quite sure about any superior concurrency constructs that aren't based on OS primitives, but I am not a Java guy.

Go is a way of doing things without a VM.)


> In my opinion, however, as I gather from the writings of Mr. Armstrong, (one of) the fundamental problems with the JVM is that it lacks process isolation, and when it crashes, everything crashes completely. He explicitly pointed this out in his thesis: the JVM cannot provide fault-tolerance because it is a mere user-level multi-threaded process.

This is true in general, but not entirely accurate. When a Java thread crashes, it doesn't bring down the whole JVM any more than a crashing Erlang process brings down BEAM. Just the one thread dies. With Quasar you get the same isolation for fibers.
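This is easy to verify with plain Java: an uncaught exception kills the one thread while the process keeps running.

    public class ThreadCrash {
        public static void main(String[] args) throws InterruptedException {
            Thread t = new Thread(new Runnable() {
                public void run() {
                    throw new RuntimeException("boom"); // uncaught: kills only this thread
                }
            });
            t.start();
            t.join();
            System.out.println("JVM is still running"); // prints; the process survives
        }
    }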

It is true, however, that one thread in Java can negatively impact the performance of another by triggering a GC, while in Erlang each process has its own private heap. The Erlang approach (or the BEAM approach, rather, as it's a feature of the VM, not the language) provides this isolation because Erlang was designed for systems where fault-tolerance is the number-one concern. But it has its cost, too: the lack of a global heap makes it impossible to implement useful shared data structures, so Erlang provides some simple shared data structures (like ETS) implemented in C, but those aren't garbage-collected.

Also, the JVM has a big performance advantage over BEAM. That's why quite a few Erlang projects need to code some performance critical functions in C. But once you do that, you lose Erlang's isolation guarantees: a failed C function could bring down the entire application, and one that's stuck in an infinite loop will affect the performance of other processes.

> I am also not quite sure about any superior concurrency constructs that aren't based on OS primitives, but I am not a Java guy.

You can start by looking here: http://docs.oracle.com/javase/7/docs/api/java/util/concurren...

None of these classes uses kernel mutexes or other kernel synchronization mechanisms.


> When a Java thread crashes, it doesn't bring down the whole JVM any more than a crashing Erlang process brings down BEAM. Just the one thread dies.

I think this is inaccurate also.) Technically there is no memory protection between pthreads, so a "crashed" pthread could damage shared data or another thread's stack. It is, however, not the JVM's problem but a problem with pthreads as a concept, and Armstrong argued that only a share-nothing (process-based) architecture can be fault-tolerant, and that pthreads are just "broken by design".


True, but this is not black-and-white; it's a matter of degree. Erlang processes also share memory: ETS. A crashed process could well leave an ETS table in a state that is invalid at the application level. So isolation is a sliding scale. With Quasar we try to tip the scale closer to Erlang's isolation levels, but, as I've said, shared data structures can be extremely useful, too.

If fault-tolerance is your most important requirement, one that far exceeds all others in importance, then by all means use BEAM. It was designed for precisely that kind of application.

If, however, fault-tolerance is just one of several important requirements, then the JVM will be the better choice in many circumstances.


Your experience of the JVM crashing, and crashing often, is, I believe, rather unusual.

I have been working with the JVM for many years, on large and complex systems, and I recall exactly two actual crashes: one due to a HotSpot compiler bug, and one due to a stack-overflow bug in the garbage collector.

I have been part of various Java communities, in physical and virtual space, for as many years; we spend a lot of our time talking (or shouting) about the problems we face with Java and the JVM, and although there is no shortage of them, crashes are not something I hear other people complaining about either.


I do remember Google Wave developers also mentioning crashes on this very site. Actually, according to them, it was one of the problems with the Wave project. I cannot provide an exact citation, but it is quite easy to cross-check.

Another well-known story is when digg.com was switched to Cassandra. It kept crashing under load.

To be clear: when you give it RAM twice the size of the workload, run only one JVM process on the whole server, and have no fluctuation in either data flow or connection rate (they call it tuning), well, it works.


I think the approach is interesting, but I don't understand how this is considered theoretical. 10,000 elements for an N-body problem is expected.

What I am more confused about is how this is considered peak optimization.

Assuming they are using doubles and doing both a read and a write, I get the following computation:

10,000 ships × 10 cycles/s × 8 bytes × 2 = 1.6 MB/s, i.e. about 12.8 megabits per second, vs. the theoretical bandwidth of PCIe at 40 Gb/s.

Are they computationally limited, and what is their memory access pattern?


This is far from an optimal simulation, because the framework is so general. The spatial database gives you true isolated transactions that require (fiber-blocking) locks.

In fact, the code is very naive, and that's our main point. Even naive code can scale well with this approach. We care more about scaling than sheer performance.

So your calculation is wrong, in that we're not trying to approach a theoretical limit (which would require optimizing the algorithm) but to demonstrate the scaling of a naive algorithm. For example, instead of a single spatial join, each spaceship queries its surroundings: this is asymptotically worse (n² vs. n) than a single join.


>On my 4-core (8 virtual cores) i7 MacBook, with 10,000 spaceships, I get close to 10 simulation cycles per second. [...]

> When running the simulation synchronously, i.e. with a phaser, performance drops to about 8 cycles per second on my development machine.

> Performance – we are able to fully exploit the computing power of modern multi-core hardware.

So, 25% faster with 8 cores counts as "fully exploit[ing] the computing power of modern multi-core hardware"? WTF?


When he says he's using a phaser, he means that updates happen in lockstep. Each update still happens on multiple cores, but each fiber will not move on to the next update until all of the other fibers have finished the current update.

So it's not synchronous in the sense that it's running everything sequentially.


Exactly. Still parallel, but don't start the next cycle until the previous one has completed.
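For reference, the lockstep pattern with plain threads and java.util.concurrent.Phaser (a standard-library sketch, not the demo's actual code): each worker computes its cycle in parallel, then waits at the phaser before starting the next one.

    import java.util.concurrent.Phaser;

    public class Lockstep {
        public static void main(String[] args) {
            final int parties = 4;
            final Phaser phaser = new Phaser(parties); // all parties pre-registered
            for (int i = 0; i < parties; i++) {
                final int id = i;
                new Thread(new Runnable() {
                    public void run() {
                        for (int cycle = 0; cycle < 3; cycle++) {
                            System.out.println("worker " + id + " finished cycle " + cycle);
                            phaser.arriveAndAwaitAdvance(); // wait for everyone before the next cycle
                        }
                    }
                }).start();
            }
        }
    }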


8 cores != 8x speedup



