Async-await on stable Rust (rust-lang.org)
1102 points by pietroalbini on Nov 7, 2019 | 380 comments



This is a major milestone for Rust usability and developer productivity.

It was really hard to build asynchronous code until now. You had to clone objects used within futures. You had to chain asynchronous calls together. You had to bend over backwards to support conditional returns. Error messages weren't very explanatory. You had limited access to documentation and tutorials to figure everything out. It was a process of walking over hot coals before becoming productive with asynchronous Rust.

Now, the story is different. Better yet, a few heroes of the community are actively writing more educational materials so that newcomers can become productive with async programming much faster than it took the rest of us.

Refactoring legacy asynchronous code to async-await syntax offers improved readability, maintainability, functionality, and performance. It's totally worth the effort. Do your due diligence in advance, though, and make sure your code is actually a good candidate for refactoring. Niko wasn't kidding about this being a minimum viable product.


Are there any resources you could point to for learning more about how to use this async programming?



Thanks for the links. But they are not clickable :)


done.


So I guess Generic Associated Types should be next then?

I was trying to refactor one of my Rust projects the other day and almost immediately got hit by the "No async fn in traits" and the "No lifetime on Associated Type" truck. Then a few days later, this article comes along: https://news.ycombinator.com/item?id=21367691.

If GAT can resolve those two problems, then I guess I'll just add that to my wish list. Hope the Rust team keeps up the awesome work :)


They'll probably be a while still as they're quite a complex feature. But they'll be useful for all sorts.


Yes, this is amazing! And refactoring is actually quite easy in my experience, at least with the two projects I've done this with.


I think it's straightforward, but it took me quite a while to do correctly in my dns project.


Are there any lessons that would be useful for others to know about?


Yes, I think so. I’ve been planning on writing a blog post.


I’ve been playing with async/await in a vertical that's the polar opposite of its typical use case (high-TPS web backends), and I believe this was the missing piece to unlock great ergonomic and productivity gains for systems development: embedded no_std.

Async/await lets you write non-blocking but highly interleaved firmware/apps in allocation-free, single-threaded environments (bare-metal programming without an OS). The abstractions around stack snapshots allow seamless coroutines, and I believe they will make Rust pretty much the easiest low-level platform to develop for.


Have you ever heard of Esterel or Céu? They follow the synchronous concurrency paradigm, which apparently has specific trade-offs that give it great advantages on embedded (IIRC the memory overhead per Céu "trail" is much lower than for async threads (on the order of bytes), fibers or whatnot, but computationally it scales worse with the number of trails).

Céu is the more recent one of the two and is a research language that was designed with embedded systems in mind, with the PhD theses to show for it [2][3].

I wish other languages would adopt ideas from Céu. I have a feeling that if there was a language that supports both kinds of concurrency and allows for the GALS approach (globally asynchronous (meaning threads in this context), locally synchronous) you would have something really powerful on your hands.

EDIT: Er... sorry, this may have been a bit of an inappropriate comment, shifting the focus away from the Rust celebration. I'm really happy for Rust for finally landing this! (but could you pretty please start experimenting with synchronous concurrency too? ;) )

[0] http://ceu-lang.org/

[1] https://en.wikipedia.org/wiki/Esterel

[2] http://ceu-lang.org/chico/ceu_phd.pdf

[3] http://sunsite.informatik.rwth-aachen.de/Publications/AIB/20...


> Er... sorry, this may have been a bit of an inappropriate comment, shifting the focus away from the Rust celebration.

I think if it's interesting and spurs useful conversation, it's appropriate, tangent or not. I for one am thankful for your suggested links; they look interesting.


It's not the same, but Rust async/await tasks are also on the order of bytes, and you can get similar "structured concurrency"-like control flow with things like `futures::join!` or `futures::select!`.
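
For instance, a minimal sketch (assuming the futures 0.3 crate as the executor) of driving two futures concurrently on one thread and waiting for both:

    use futures::executor::block_on;

    async fn fetch_a() -> u32 { 1 }
    async fn fetch_b() -> u32 { 2 }

    fn main() {
        // Both futures make progress concurrently within one task, on one
        // thread; the enclosing async block completes when both are done.
        let (a, b) = block_on(async { futures::join!(fetch_a(), fetch_b()) });
        assert_eq!(a + b, 3);
    }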

Céu looks very neat; I suspect (having not read much about it yet) that async codebases could take a lot of inspiration from it already.


> you can get similar "structured concurrency"-like control flow with things like `futures::join!` or `futures::select!`.

That sounds very promising, I should give it a closer look (I don't program in Rust myself so I only read blogs out of intellectual curiosity)


Or you can use something like protothreads or async.h [1] if you're stuck with C/C++ and need something lightweight.

[1] https://news.ycombinator.com/item?id=21033496


Céu compiles to C actually, and I believe Francisco Sant’Anna (the author of the language) makes a direct comparison to protothreads and nesC in one of his papers (probably the PhD thesis). I had not seen async.h before though, interesting!

[1] http://ceu-lang.org/publications.html


I also think async (the paradigm) is kind of weird in the Rust world. I agree with https://journal.stuffwithstuff.com/2015/02/01/what-color-is-....


The solution suggested by that article is to use M:N threading, which was tried in Rust and turned out to be slower than plain old 1:1 threading.

If you don't want to deal with async functions, then you can use threads! That's what they're there for. On Linux they're quite fast. Async is for when you need more performance than what 1:1 or M:N threading can provide.


> If you don't want to deal with async functions, then you can use threads!

Truly? If some very popular libs become async (like actix or reqwest, which I use), can I TRULY ignore it and not split my world into async/sync?


You can easily convert async to sync by just blocking on the result. The other way (sync to async) is more difficult and requires proxying out to a thread pool, but it's also doable.
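
A rough sketch of the first direction, assuming the futures crate's `block_on` (other executors have an equivalent):

    use futures::executor::block_on;

    async fn fetch_greeting() -> String {
        // Imagine an async network call here.
        "hello".to_string()
    }

    fn main() {
        // Synchronous code can treat the async fn as a blocking call by
        // parking the current thread until the future resolves.
        let greeting = block_on(fetch_greeting());
        println!("{}", greeting);
    }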


An `async` function _can_ call blocking functions, of course, it just blocks the entire thread of execution which could otherwise continue making progress by polling another future.


Both actix and reqwest are already async; the fact that you haven't noticed yet just shows that you are already ignoring it.


But the signatures are not. If they mark the types/functions async, what happens?

Must I rewrite all the calls?


Async is just syntactic sugar for Future<...>; you can poll() it manually.

You can take the event loop, run it on one thread and do whatever you wish on other threads.

Fearless concurrency, after all.
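
For the curious, a minimal sketch of polling by hand (assuming the futures crate only for its no-op Waker):

    use std::future::Future;
    use std::task::{Context, Poll};
    use futures::task::noop_waker;

    async fn two() -> i32 { 1 + 1 }

    fn main() {
        // `two()` just builds the future; nothing runs until poll().
        let mut fut = Box::pin(two());
        let waker = noop_waker();
        let mut cx = Context::from_waker(&waker);
        // A real executor would only re-poll after the waker is notified.
        match fut.as_mut().poll(&mut cx) {
            Poll::Ready(v) => println!("ready: {}", v),
            Poll::Pending => println!("still pending"),
        }
    }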


You can just `.await` any Future returned directly by an API, even if it's not marked as an async fn.


How hard was it tried?

I imagine there's a reason that languages like Go adopted M:N threading... obviously part of the reason is that it's way more scalable, but userspace threading is also supposed to be faster, as context switches don't need to switch to kernel space... Were the problems tight loops (which AFAIK is also a problem in Go)? Or maybe it's just so much easier / more efficient if you also have GC...


In my opinion they tried hard enough. Here are the links to the discussions of that time if you are into that kind of thing:

https://mail.mozilla.org/pipermail/rust-dev/2013-November/00...

https://mail.mozilla.org/pipermail/rust-dev/2013-November/00...

https://github.com/rust-lang/rfcs/blob/master/text/0230-remo...

It is clear that they really wanted to make green threading work, but in the end it proved incompatible with other goals of the language.

The main problem as I understand it is with the stack: you can't make the stack big from the beginning (or your threads wouldn't be lightweight anymore), so you need to grow it dynamically. If you grow it by adding new segments to a linked list, you get the "stack thrashing"/"hot split" problem: if you are unlucky and have to allocate/deallocate a segment in a tight loop, performance suffers horribly.

Go solved the problem by switching to contiguous relocatable stacks: if there is not enough space in the stack, a bigger contiguous chunk of memory is allocated and the old contents of the stack are copied there (à la C++ vector). Now there is a problem with references to stack-allocated variables: they become invalid. In Go this problem is solvable because it is a managed, garbage-collected language, so the runtime can simply rewrite all the references to point to the new locations, but in Rust that is infeasible.


The reason is that M:N threading inherently requires a language runtime, and the advantage of increased scalability (GHC threads use less than 1KB of memory) comes with the disadvantage that FFI calls are really quite expensive to do, because you need to move to a separate native thread in case the FFI call blocks.

These languages (Erlang, Haskell, Go, ...) have no ambition to be useful for system programming; they're not intended as a replacement for C/C++ in that domain, unlike Rust.


Author's solution is threads:

> But if you have threads (green- or OS-level), you don’t need to do that. You can just suspend the entire thread and hop straight back to the OS or event loop without having to return from all of those functions.

Correct me if I'm wrong, but wasn't the lack of threads one of the biggest reasons why NodeJS originally outperformed most of its competitors?

Spinning up threads for each concurrent request was expensive, and (nonblocking) async code was by comparison ridiculously cheap, so the lack of overhead meant Node could just make everything async, instead of trying to decide up-front which tasks deserved which resources.

Granted, it's been over a decade since Node came out. Maybe thread overhead has gotten a lot better? But barring faulty memory, I definitely remember a number of people explaining to me back then that being single-threaded was the point.


I'm pretty sure that was the official talking point at the time, and some people may have even been motivated enough to actually believe it.

Of course that all changed once Node got threads.


Did it, though? AFAIK Node still doesn't have OS threads, which are the expensive version. The "About" page still says that Node is "designed without threads": https://nodejs.org/en/about/


All 5 of his points seem to be about 2015 JavaScript only. Some of them don't even apply to modern JavaScript; I don't see any that apply to Rust.


While I'm happy to believe you, with a short comment like that I just have to take your word for it. Could you give a short explanation of how Rust already addresses each of these points, for those of us who don't program in it?


This article is 90% a rant against “callback hell” that js was facing before async/await was introduced. The remaining 10% stays valid even in the presence of async/await, but the trade-off (ignored by the article) of the alternative is having to manually deal with synchronization primitives (at least channels), which would make zero sense given that JavaScript is a single-threaded environment.

Rust is a different beast, you can have whichever model you like best (OS threads, M:N threads (with third party libs), async/await) but async/await is by far the most powerful, that's why it's such a big deal that it lands on Rust stable.


I hadn't heard of "synchronous concurrency", but looking at the Céu paper (only briefly so far), I think the model looks very close to how `async`/`await` works in Rust - possibly even isomorphic. This is really exciting, because Rust's model is unique among mainstream languages (it does not use green threads or fibers, and in fact does not require heap allocation), and I wasn't previously aware of any similar models even among experimental languages.

I'll open a thread on users.rust-lang.org to discuss the similarities/differences with Céu, but for now, here's the main similarity I see:

A Céu "trail" sounds a lot like an `async fn` in Rust. Within these functions, `.await` represents an explicit sync-point, i.e., the function cedes the runtime thread to the concurrency-runtime so that another `async fn` may be scheduled.

(The concurrency-runtime is a combination of two objects, an `Executor` and a `Reactor`, that must meet certain requirements but are not provided by the language or standard library.)


That is a correct understanding of how `await` works in Céu. What is important to note however is that all trails must have bounded execution, which means having an `await` statement. One abstraction that Céu uses is the idea that all reactions to external events are "infinitely" fast - that is, all trails have bounded execution, and computation is so much faster than incoming/outgoing events that it can be considered instantaneous. It's a bit like how one way of looking at garbage collected languages is that they model the computer as having infinite memory.

Having said that, the combination of par/or and par/and constructs for parallel trails of execution, and code/await for reactive objects is like nothing I have ever seen in other languages (it's a little bit actor-modelish, but not quite). I haven't looked closely at Rust though.

Céu also claims to have deterministic concurrency. What it means by that is that when multiple trails await the same external event, they are awakened in lexical order. So for a given sequence of external events in a given order, execution behavior should be deterministic. This kind of determinism seems to be true for all synchronous reactive languages[0]. Is the (single-threaded) concurrency model in Rust comparable here?

The thesis is a bit out of date (only slightly though), the manual for v0.30 or the online playground might be a better start[1][2].

[0] which is like... three languages in total, hahaha. I don't even remember the third one beyond Esterel and Céu. The determinism is quite important though, because Esterel was designed with safety-critical systems in mind.

[1] https://ceu-lang.github.io/ceu/out/manual/v0.30/

[2] http://ceu-lang.org/try.php


I'm not sure I see the connection between bounded execution and having an `await` statement. Rust's `async` functions, just like normal functions, definitely aren't total or guaranteed to terminate. They also technically don't need to have an `await` expression; you could write `async fn() -> i32 { 1 + 1 }`. But they do _no_ work until they are first `poll`ed (which is the operation scheduled by the runtime when a function is `await`ed). So I believe that is equivalent to your requirement of "having an `await` statement". You could even think of them as "desugaring" into `async fn() -> i32 { async {}.await; 1 + 1 }` (though of course that's not strictly accurate).
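
A tiny illustration of that laziness (assuming the futures crate's `block_on` as the executor):

    use futures::executor::block_on;

    async fn noisy() -> i32 {
        println!("running");
        1 + 1
    }

    fn main() {
        let fut = noisy();        // nothing is printed yet: the future is inert
        println!("created");
        let n = block_on(fut);    // "running" appears only once it's polled
        assert_eq!(n, 2);
    }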

I think it's reasonable to consider well-written `poll` operations as effectively instantaneous in Rust, too, though since I haven't finished the Céu paper yet I don't yet understand why that abstraction is important to the semantics of trails.

As for `par`/`or` and `par`/`and`, I expect these are equivalent to `Future` combinators (which are called things like `.or_else`, `.and_then`, etc).

Since Executors and Reactors are not provided by the language runtime, I believe it would be easy (possibly even trivial) to guarantee the ordering of `Future`-polling for a given event. I am guessing that for Executors that don't provide this feature, the rationale is to support multi-threaded waking (i.e. scheduling `poll` operations on arbitrary threads taken from a thread pool). When using `async`/`await` in a single thread, I'd be mildly surprised if most Executors don't already support some kind of wake-sequencing determinism.

In any case, I've now opened a discussion on users.rust-lang about whether Rust can model synchronous concurrency: https://users.rust-lang.org/t/can-rusts-async-await-model-th...


In Céu, an `await` statement cedes execution to the scheduler. It is the point at which a trail goes to sleep until being woken up again. So "bounded" here means "is guaranteed to go to sleep again (or terminate) after being woken up". The language will not compile loops unless it can guarantee that they either terminate or contain an `await` statement.

I'll try to explain how the "instantaneousness" matters for the sake of modelling determinism. Take the example of reacting to a button press: whenever an external button press event comes in, all trails that are currently awaiting that button press are awakened in lexical order. Now imagine that there are multiple trails in an infinite loop that await a button press, and they all take longer to finish than the speed at which the user mashes a button. In that case the button events are queued up in the order that they arrive. If the user stops then eventually all button presses should get processed, in the correct order.

The idea is that external events get queued up in the order that they arrive, and that the trails react in deterministic fashion to such a sequence events: if the exact same sequence of events arrives faster or slower, the eventual output should be the same. So while I might produce congestion if I mash a button very quickly, it should have the same results as when I press them very slowly.

Now, you may think "but what if you were asked to push a button every two seconds, with a one second time-window? Then the speed at which you press a button does matter!" Correct, but in that case the two-second timer and one-second time window also count as external events, and when looking at all external events it again only matters in what order all of these external events arrive at the Céu program.

Lacking further knowledge about Rust I obviously cannot say anything about the rest of your comment, but I hope you're right about the `Future` combinators because I really enjoyed working with Céu's concurrency model.


Keep in mind that the current release only brings an MVP of the async/await feature to the table. The two things I've missed are no_std support and async trait methods, but there are reasons these haven't been completed yet. That doesn't mean they won't be available in the future, just that the team has prioritized releasing a significant part of the feature that many will already find useful.
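
To make the trait limitation concrete: `async fn` isn't allowed in trait definitions yet, so the usual workaround (and roughly what the async-trait crate automates) is to return a boxed future by hand. A sketch, with `Fetch` and `Fixed` as made-up names:

    use std::future::Future;
    use std::pin::Pin;

    trait Fetch {
        // Instead of `async fn fetch(&self) -> String`, return a boxed future.
        fn fetch(&self) -> Pin<Box<dyn Future<Output = String> + '_>>;
    }

    struct Fixed;

    impl Fetch for Fixed {
        fn fetch(&self) -> Pin<Box<dyn Future<Output = String> + '_>> {
            Box::pin(async { "hello".to_string() })
        }
    }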


Related: async/await is also amazingly useful for game development. Stuff like writing an update loop, where `yield` is "wait until next frame to continue".

This can let you avoid a lot of the pomp and circumstance of writing state machines by hand.
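
As a sketch of the idea, with `next_frame()` as a purely hypothetical engine hook that would resolve when the next frame starts:

    // Hypothetical helper: in a real engine this would await a frame event.
    async fn next_frame() {}

    // Straight-line animation logic instead of a hand-written state machine;
    // every `.await` means "resume here on the next frame".
    async fn open_door(mut height: f32) -> f32 {
        while height < 1.0 {
            height += 0.1;
            next_frame().await;
        }
        height
    }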


I played around with exactly that concept during Ludum Dare last month, if you're interested: https://github.com/e2-71828/ld45/blob/master/doc.pdf


I don't know any Rust, but that was a really good read. What is your background, and what materials did you use getting into Rust?

Your explanations were surprisingly simple


Thanks. I was a software engineer at several different SF-area companies for about a decade, and then I decided to take some time off from professional programming. I'm now a full-time student in a CS Master's program. As I'll eventually need to produce a thesis, this was partly an exercise to practice my technical writing.

I picked up Rust for my personal projects a couple of years ago, and mostly worked from the official API docs, occasionally turning to blog posts, the Rust book, or the Rustonomicon. Because I didn't have any time pressure, I ignored the entire crate ecosystem and opted to write everything myself instead. This has left some giant gaps in my Rust knowledge, but they're in a different place than is typical.

As far as the explanations go, I realized that good authors don't say important things only once: they repeat everything a few different times in a few different ways. So, I tried to say the exact same things in the prose as in the code listings, and trusted that the fundamental differences between Rust and English would make the two complement each other instead of simply being repetitive.


The Fuchsia team also uses Rust's async/await this way in the implementation of their network stack.


Yeah, that's a super interesting idea that I'm also toying with in my head. One issue however is the "allocation-free" part. Sooner or later you typically hit a situation where you need to box a Future - either because you want to spawn it dynamically or need dynamic dispatch and type erasure. At that point of time you will need an allocator.

I'm still wondering if lots of the problems can be solved with an "allocate only on startup" instead of a "never allocate" strategy, or whether full dynamic allocation is required. Probably needs some real-world apps to find out.


I'm not familiar with rust futures stuff, but having researched similar problems in C++, the important part is giving the control of allocation (when and how) to the application.


Right. There are some possibilities with custom allocators, but that is still kind of an unexplored area in Rust, especially the error handling around it. Looking forward to learning more.


I've dipped my toes into embedded rust from time to time. Can you give me an example or two of how you would use async/await in an embedded environment? I'm just curious about how it would work.


Another infrequent toe dipper here - say you need to send some command to a peripheral, wait for the device to acknowledge it, then send more data. The naive implementation would poll for the interrupt or just go to sleep, meaning no other user code can run in that time. A naive async implementation would spread your code all over the place - it wouldn't just be a function call with three statements. There are libraries that can give you lightweight tasks, but you might need to keep separate stacks for each of the suspended tasks; there might be others that don't, but require the code to be written in a non-trivial way. Rust with async/await on no_std can give you the best of both worlds: easy-to-read sequential code with the best possible performance.
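
Something like this is the shape of it - a sketch where the HAL functions and command constants are hypothetical stand-ins for whatever your peripheral driver exposes:

    // Hypothetical command codes and async HAL stubs, just to show the shape.
    const CMD_INIT: u8 = 0x01;
    const CMD_DATA: u8 = 0x02;

    async fn send(_cmd: u8) { /* write the register, await the "ready" flag */ }
    async fn wait_for_ack() { /* suspend until the interrupt signals an ack */ }

    // Reads like the naive three-statement version, but every `.await` is a
    // point where the executor can run other tasks instead of busy-waiting.
    async fn configure_peripheral() {
        send(CMD_INIT).await;
        wait_for_ack().await;
        send(CMD_DATA).await;
    }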


>poll for the interrupt

Errr, no. You either poll, or set up an interrupt to catch an event. The point of an interrupt is to allow other code to continue to run while waiting for a peripheral.


Sorry, I mean polling for the ready register, not interrupt.


Which executor do you use? Tokio is a bit heavy, no?


Not the parent, but I've been keeping my eye on https://github.com/Nemo157/embrio-rs


Not much documentation at the GitHub repo. Can you provide a quick overview?


Not much I can say other than “an executor designed specifically for embedded.” I don’t do embedded dev myself, so it’s more of a curiosity thing for me than something that I can give you a lengthy explanation of.


Embrio provides only a very limited executor: it can drive only a single Future, which gets re-polled every time a certain system interrupt occurs. You can obviously have subtasks of that single task (e.g. via join! and friends), but you can't have other dynamic tasks.

jrobsonchase has implemented a more dynamic solution which uses custom allocators for futures: https://gitlab.com/polymer-kb/firmware/embedded-executor

I think there might also be other solutions out there. I remember the Drone OS project also doing something along those lines.


I'd like to hear more about this - I thought Tokio was just some tools on top of MIO which is just `select` (or similar). Is Tokio heavy?


It's like old Windows programming (Windows 3.x). Cooperative multitasking is the process-control version of manual unsafe memory management. When your app suddenly freezes, you've discovered a bug.

It's for cases where alternatives don't exist or they are too expensive.

Language level green threads are safer abstraction over asynchronous I/O operations.


The difference is that in an embedded context hopefully everything is written and tested all together by the same entity, rather than a mish-mash of high and low quality code held hostage by the weakest link.


> I believe will make rust pretty much the easiest low-level platform to develop for.

This stuff is easily available for C. It's just a function call instead of syntactic sugar. On Windows, use Fibers. On Linux, there is a (deprecated) API as well (forget the name). Or use a portable wrapper library.

Or just write a little assembly. Can't be hard.


How exactly do I write embedded code for a tiny chip on a different architecture with just 4K of RAM and no OS that uses the Win32 fibers?


Probably not using green threads / async at all? That would require a number of stacks as well as a little runtime, which will quickly eat up your 4K.


Just to clarify for those following along, Rust async code does not use green threads and doesn't require a stack per task.


I'm missing some of the technical details here, but from a quick glance of the article it seems like Rust's futures are lazy. I.e. a stack would only be allocated when the future is actually awaited. But in order to execute the relevant code, a call stack per not-finished future is still needed, or am I missing something?


Afaik Rust's futures compile to a state machine, which is basically just a struct that contains the current state flag and the variables that need to be kept across yield points. An executor owns a list of such structs/futures and executes them however it sees fit (single-threaded, multi-threaded, ...). So there is no stack per future. The number of stacks depends on how many threads the executor runs in parallel.
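
A rough hand-rolled illustration of the kind of state machine that gets generated (simplified; the real transform handles pinning, more states, and doesn't require Unpin):

    use std::future::Future;
    use std::pin::Pin;
    use std::task::{Context, Poll};

    // Roughly what `async fn add_later(x: u32, sub: impl Future<Output = u32>) -> u32
    // { x + sub.await }` boils down to: an enum holding the data that must
    // survive across the yield point (here just `x` and the sub-future).
    enum AddLater<F> {
        Waiting { x: u32, sub: F },
        Done, // a generated state machine would transition here after completing
    }

    impl<F: Future<Output = u32> + Unpin> Future for AddLater<F> {
        type Output = u32;

        fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<u32> {
            match self.get_mut() {
                AddLater::Waiting { x, sub } => match Pin::new(sub).poll(cx) {
                    Poll::Ready(y) => Poll::Ready(*x + y),
                    Poll::Pending => Poll::Pending,
                },
                AddLater::Done => panic!("polled after completion"),
            }
        }
    }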


> the current state flag and the variables that need to be kept across yield points.

you mean like... a stack frame?


Like a stack frame, but allocated once of a fixed size, instead of being LIFO allocated in a reserved region (which itself must be allocated upfront, when you don't know how big you're going to end up).

The difference being: if your tasks need 192B of memory each and you spawn 10 of them, you just consumed a little less than 2kB. With green threads, you have 10 times the starting size of your stack (generally a few kB). That makes a big difference if you don't have much memory.


So that's actually green threads in my book (in a good implementation I expect to be able to configure the stack size), with the nice addition that the language exposes the needed stack size.


It's a stackless coroutine. AFAIK, the term “green thread” is usually reserved for stackful coroutines, but I guess it could also be used to talk about any kind of coroutine.


It's more efficient (potentially substantially more so). In a typical threaded system you have some leaf functions which take up a substantial amount of stack space but don't block, so you don't need to save their state between context switches. In most green-threaded applications you still need to allocate this space (times the number of threads).

The main advantage of this kind of green thread is that you can separate out the allocation which you need to keep between context switches (which is stored per task) from the memory you only need while actually executing (which is shared between all tasks). For certain workloads this can be a substantial saving.

In principle you can do this in C or C++ by stack switching at the appropriate points, but it's a pain to implement, hard to use correctly (the language doesn't help you at all), and I've not seen any actual implementations of this.


Kinda like a stack frame, but more compact and allocated in one go beforehand.


Because of syntactic restrictions on how await works, at most you need to allocate a single function frame, never a full stack, and often it doesn't even need to be allocated separately and can live on the stack of the underlying OS thread.


So that async function cannot call anything else?


They can, but the function itself cannot be suspended by calling something else (i.e. await being a keyword enforces this), so any function that is called can use the original OS thread's stack. Any called function can in turn be an async function, and will return a future [1] that in turn captures that function's stack. So yes, a chain of suspended async functions sort of looks like a stack, but its size is known at compile time [2].

[1] I'm not familiar with rust semantics here, just making educated guesses.

[2] Not sure how Rust deals with recursion in this case. I assume you get a compilation error because it will fail to deduce the return value of the function, and you'll have to explicitly box the future: the "stack" would then look like a linked list of activation records.
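
For [2], a rough sketch of what the explicitly boxed version could look like:

    use std::future::Future;
    use std::pin::Pin;

    // A directly recursive `async fn countdown(n: u32)` won't compile (the
    // future type would contain itself); boxing each level turns the "stack"
    // into a chain of heap allocations, one per suspended call.
    fn countdown(n: u32) -> Pin<Box<dyn Future<Output = ()>>> {
        Box::pin(async move {
            if n > 0 {
                countdown(n - 1).await;
            }
        })
    }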


Async functions don't really call anything by themselves: the executor does, and any function called in the context of an async function runs on the executor's stack. You just end up with a fixed number of executors running on their own OS threads with a normal stack.


[flagged]


What gives you doubts I'm serious?


Proposing asm and deprecated APIs, "just use a portable wrapper library" as though there are any such (good) libraries, type safety, etc.


The fact is, all "green threads" do is swap between call stacks and register sets. The call stack is usually just a register (the stack pointer) itself. How hard can it be to save a register and restore it later?

> as though there are any such (good) libraries

Win32 Fibers are extremely easy to use (and not deprecated). I used them once, it was a blast. I wrote a simple wrapper around them in less than 50 lines of straightforward code, exposing maybe a struct and 3-4 function calls. Never had any problems.

That makes writing your own little non-preemptive scheduler extremely easy as well. That might be as little as another 50-100 lines of code. So you get a lot of control at almost no price.

POSIX C has swapcontext() / setcontext() which must be pretty similar (although I'm not sure I've ever used it) and if I understand correctly it was only obsoleted due to a syntactical incompatibility with C99 or something like that...

The only problem here is that green threads are a little bit of an awkward computational model in many cases. But that's not a syntax problem.


An implementation of green threads-style register saving cannot be as memory-efficient as async/await, because it's more dynamic.

Green threads must save all (callee-save) registers, support arbitrary dynamic stack growth, and work with unsuspecting frames in the middle of the stack.

Async/await has the "thread" itself do all the state saving, so it can save only what's actually live. It computes a fixed stack size up front, so can be done with zero allocation. And because all intermediate stack frames are from async functions, they can also be smaller- transient state can be moved to the actual thread stack, potentially shrinking the persistent task state.

(This also has benefits for cancellation- there is no need to use the exception/stack unwinding ABI to tear down a "stack," because it is instead a normal struct with normal drop glue.)


> Win32 Fibers are extremely easy to use (and not deprecated).

Apparently MS is considering deprecating them: http://www.open-std.org/JTC1/SC22/WG21/docs/papers/2018/p136...

That's in general an interesting paper about M:N threads, fibers, green threads, or whatever you want to call them.


Yeah, so what they write in section 3.1 is basically what I stated elsewhere on this comments page. It's probably not that Win32 Fibers is a bad implementation of Green Threads, but more that Green Threads is an awkward model to work with in many cases.

> Current recommendation is to avoid using fibers and UMS. This advice from 2005 remains unchanged: “... [I]nstead of spending your time rewriting your app to use fibers (and it IS a rewrite), instead it's better to rearchitect your app to use a "minimal context" model - instead of maintaining the state of your server on the stack, maintain it in a small data structure, and have that structure drive a small one-thread-per-cpu state machine.”


This is a big improvement; however, this is still explicit/userland asynchronous programming: if anything down the call stack is synchronous, it blocks everything. This requires every component of a program, including every dependency, to be specifically designed for this kind of concurrency.

Async I/O gives awesome performance, but further abstractions would make it easier and less risky to use. Designing everything around the fact that a program uses async I/O, including things that have nothing to do with I/O, is crazy.

Programming languages have the power to implement concurrency patterns that offer the same kind of performance, without the hassle.


Yup. See https://journal.stuffwithstuff.com/2015/02/01/what-color-is-... for a pretty good explanation of the async/await problem.

I know Rust is all about zero-cost abstractions, "but at what cost" beyond just runtime cost? I appreciate their principled approach to mechanical sympathy and interop with the C abstract machine, but I'm just not enthused about this particular tradeoff.

An alternative design would have kept await, but supported async not as a function-modifier, but as an expression modifier. Unfortunately, as the async compiler transform is a static property of a function, this would break separate compilation. That said, I have to wonder if the continuously maturing specialization machinery in Rustc could have been used to address this. IIUC, they already have generic code that is compiled as an AST and specialized to machine code upon use, then memoized for future use. They could specialize functions by their async usage and whether or not any higher-order arguments are themselves async. It might balloon worst-case compilation time by ~2X for async/non-async uses (more for HOF), but that's probably better than ballooning user-written code by 2X.


Async adds a runtime requirement, so Rust cannot just take the same approach as Go. You only have a scheduler if you instantiate one. And having a runtime or not has nothing to do with the C abstract machine, but with precise control on what the code does.

For instance, Go does not give you the option to handle this yourself: you cannot write a library for implementing goroutines or a scheduler, since it's embedded in every program. That's why it's called a runtime. In Rust, every bit of the async implementation (futures, schedulers, etc.) is a library, with some language support for easing type inference and declarations. This should already tell you why they took this approach.

Regarding async/await and function colors (from the article you posted), I would much rather prefer Rust to use an effect system for implementing this. However, since effects are still much into research and there is no major language which is pushing on this direction (maybe OCaml in a few years?) it seems like a long shot for now.


Go is even more different. Rust async has explicit yields with await, but Go does it implicitly at various key locations. This is actually pretty surprising to many folks; it was in the past, and maybe still is, possible to deadlock Go with a certain incantation of tight looping.

Another difference with Rust is that async functions are stackless. Go has to contend with having a non-conventional stack, and interoperability with C code (and sometimes syscalls, vDSO, etc.) that expects a relatively large stack requires pivoting to a different stack.

I do wish Rust could solve this problem, but the two approaches are very different indeed. I think it’s a fact of life that accidental blocking will exist in Rust, approaches that prevent it are complicated and indeed have runtime costs.


> [...] Go does it implicitly at various key locations. This is actually pretty surprising to many folks, it was in the past and maybe still is possible to deadlock Go with a certain incantation of tight looping.

This is being worked on here: https://github.com/golang/go/issues/24543


Is there any documentation on the latest status on this, how far along they've come and what technical solutions they're considering / have settled on?


The proposal is marked accepted, and commits are being made to reference it, so I suppose the design doc and that issue are most likely the source of truth.


Fascinating. This would make Goroutines one giant leap closer to being thread-like. It feels weird how much of the OS scheduler is being “duplicated” but you can’t really argue much with the results.


> so Rust cannot just take the same approach as Go

I did not suggest that it do so. In fact, I suggested an approach that wouldn't fit for Go!

In Go, you've got pretty strict modular compilation. Because of this, specialization -- which is pervasive in Rust -- is virtually absent in Go. Rust's rich generics system demands robust specialization machinery. This machinery, which handles instantiation of explicit generics, can be repurposed to handle automatic instantiation of implicit generics. In this case, I'm suggesting that a function could be compiled as either synchronous or asynchronous as determined by the use-site.

> In Rust, every bit of async implementation (futures, schedulers, etc) is a library

Nothing about my suggestion precludes this.


Exactly! Most code doesn't care whether it's sync or async, the compiler should decide which specialization to use when the program is built. The only places where you usually care is either 1: the very top of the program (some event loop handler called near main) or 2: the very bottom, when you're implementing an IO library. The specialization is decided by what you do at the very top; that decision propagates down the call chain, specializing everything in the middle that doesn't care; until it gets to a function in your IO library where it chooses between the pre-defined sync or pre-defined async implementations.

So you have a synchronous HTTP server and you decide you want to make it async? Ok, no problem: switch to an async-enabled request handler in main, and boom, everything that you wrote is recompiled into a state machine à la async, and at the very bottom where it actually calls the library to write out to the socket, the library knows what the context is and can choose the async-enabled implementation.

I'm glossing over some important details and restrictions that might make this more complicated in practice, but I think it should at least be possible for functions to opt-in to 'async-specializability' to avoid having to rewrite the world for async.


> An alternative design would have kept await, but supported async not as a function-modifier, but as an expression modifier.

How is that different than an `async` block?


It's not. I didn't realize Rust had that, but the rest of my comment still applies: it would be nice if asynchrony was transparent to intermediate functions and compiled via specialization.


> Programming languages have the power to implement concurrency patterns that offer the same kind of performances, without the hassle.

Can you give one that reaches this goal? Go is often cited in that regard, but it doesn't really fit your description since it trades performance for convenience (interactions with native libraries are really slow because of that) and still doesn't solve all problems, since hot loops can block a whole OS thread, slowing down unrelated goroutines. (There's some work in progress to make the scheduler able to preempt tight loops, though.)


I think Project Loom would fit. It's a remarkably sane approach:

https://cr.openjdk.java.net/~rpressler/loom/Loom-Proposal.ht...


As a Java developer I'm really looking forward for Project Loom. I think it's a great approach that avoids the pitfalls of the two-colored functions approach.

However, Project Loom doesn't fit into the "zero cost abstraction" paradigm of Rust. Project Loom requires quite a lot of support from the JVM runtime:

> The main technical mission in implementing continuations — and indeed, of this entire project — is adding to HotSpot the ability to capture, store and resume callstacks not as part of kernel threads. JNI stack frames will likely not be supported.

It also still requires manual suspension, so JIT-compiled tight loops will likely still cause a problem.


I think that that makes it fairly similar to Rust's approach then... compiler support for suspension, and cooperative scheduling.


Clojure's pmap function (parallel map) is pretty cool: http://clojure.github.io/clojure/clojure.core-api.html#cloju...


There is rayon in Rust land. This is a pretty different (simpler) thing.


Rayon uses iterators. Iterators are the dual of first order functions. Iterators are superior because they are more general. (You can implement pmap using iterators. You cannot implement iterators using pmap.)


Does Rayon use iterators, or just iterator-like combinators? I don’t think a for-loop would work with Rayon, would it?

You can also definitely implement iterators using first-order functions:

    const iterate = array => {
      const step = index => ({
        value: array[index],
        next: () => step(index + 1)
      });
      return step(0);
    }

    // add some combinators on top
First order functions can in fact implement anything if you’re willing to accept some bad syntax and speed — that’s the Church-Turing equivalence.


Hm. You're right. I hadn't used Rayon in so long I forgot it is indeed iterator-like combinators using an entry point called `par_iter`.


Python's greenthread implementations are quite good, monkey-patching all calls to work asynchronously with low mental overhead and automatic library compatibility (except for libraries that call native code, obviously).

Of course Python generally fails to offer "the same kind of performance" for anything limited by CPU or memory, so it technically doesn't fit the description either.


> since hot loops can block a whole OS thread

Asking as a beginner, what does the above mean?

Not sure what a hot loop means, or why it blocks an OS thread.


Go creates the illusion of preemptive multithreading by having implicit safe-points for cooperative multithreading. Each IO operation is such a safe-point. If you write an infinite loop like `for {}` where there are no IO operations in loop body, it will block indefinitely. This will prevent the underlying OS thread from being available to other goroutines. The same thing can happen even if you do have IO operations in there, but the work being performed is dominated by CPU time instead of IO.


Note that this is being fixed in the next release: https://golang.org/issue/10958


That's cool it's finally coming; this issue has been open since 2015! I wonder what performance impact this will have.


Maybe this is a dumb question, but what if a language with implicit safepoints lets you opt out with something like a nosafepoint block/pragma?


An “OS thread” is the most basic part of the program that can actually do things. When it is blocked, it can’t do anything.

OS threads can be expensive, so libraries try to only create a few OS threads.

A “hot loop” (usually, I think, called a “tight loop”) is a loop that does a lot of work for a relatively large amount of time all at once. Things like “looping through a list of every building in Manhattan and getting the average price over two decades.”

With some code like networking, you end up having to wait on other parts of the computer besides the CPU often, for things like uploads and downloads.

“Asynchronous programming” tries to make it easy to keep the CPU busy doing helpful things, even while some of its jobs are stuck waiting on uploads/downloads/whatever. This keeps the program efficient, because it can do a little bit of many tasks at once instead of having to complete each task entirely before moving on to the next.

The problem comes when you have a tight loop in a thread that is mostly expecting to be doing asynchronous work around I/O or networking. It is basically trying to juggle with one hand. The program can’t multitask as well, and you end up having to wait longer before it can start each piece of work you want.


Hot loops are often "expensive", or where your program is spending a lot of its time.

If you're doing a hot loop, which may be synchronous, you will block the event loop attached to that OS thread, because a hot loop is presumably not yielding control until it is done.


Hmm, why is that a bad thing? If your entire thread is synchronous, shouldn't it technically be blocking? Or is the OS thread the main thread?


The problem is that the async model is a form of cooperative multithreading, so if one computation runs for a long time without returning to the main event loop, it can increase the latency for responses to other events. E.g., if one HTTP request takes a long time to process, and many of the worker-pool OS threads are handling such a request, response time goes up for all the other requests. OS-level concurrency is preemptive (timesliced), so one busy thread doesn't block other requests, but of course with much higher overhead in other ways. Best practice is usually to keep event handlers on event-loop threads short and push heavy computations to other OS-level worker threads.
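
In Rust terms, a rough sketch of that best practice (assuming a recent Tokio, whose `task::spawn_blocking` moves work onto a dedicated blocking pool):

    // Heavy CPU-bound work is pushed onto Tokio's blocking thread pool so the
    // event-loop threads stay free to service other requests.
    #[tokio::main]
    async fn main() {
        let sum = tokio::task::spawn_blocking(|| {
            // The "tight loop" that would otherwise hog an executor thread.
            (0u64..10_000_000).sum::<u64>()
        })
        .await
        .expect("blocking task panicked");
        println!("{}", sum);
    }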


Ah, that makes sense. I think another user pointed out that spawning n goroutines does not actually spawn n physical threads, but rather queues n tasks onto m threads in the pool, so if we exhaust the m threads, n-m tasks will be blocked.

Thanks for the explanation

I wonder at what point each trade-off makes sense (i.e. what is considered heavy computation vs. light computation; it's probably related to OS thread allocation time).


> interactions with native libraries are really slow because of that

Can you explain a bit? What is the connection between concurrency implementation (which I am assuming you are talking about multiplexing multiple coroutines over the same OS thread) and say slowness in cgo? Having to save the stack? I don't get it.


FFI is slow because the main Go compiler uses a different calling convention than everything else. I couldn't tell you if or how that's related to its concurrency features.


It's an important optimization (otherwise each goroutine would use a lot of memory), but it's not required. The stack allocation strategy has changed a couple of times in the main compiler; gccgo originally supported it, and cgo functions behave like normal Go modulo the stack.


> Can you give one that reaches this goal?

First-class continuations. They are an efficient building block for everything from calling asynchronous I/O routines without turning your code into callback spaghetti, to implementing coroutines and green threads. Goroutines are a special and poorly implemented case of continuations. Gambit-C has had preemptive green threads with priorities and mailboxes at less than 1KiB per thread memory overhead for over a decade now, all built on top of first-class continuations.

http://www.iro.umontreal.ca/~gambit/Gambit-inside-out.pdf


First-class continuations are too unrestricted to match the performance of Rust-style async/await.

If you take on a few limitations you can get there, but then those are exactly the things people complain about.


It is worse than that. First-class continuations add a small overhead for stack operations, and they always have more memory overhead than callbacks/async transformation into callbacks. But the comment I was replying was about Go, not Rust. For the small overhead that first-class continuations have, they provide a simple and efficient way to handle coroutines, generators, asynchronous I/O, and green threads.


Erlang probably comes closest?


Not an Erlang user, but my understanding is that the Erlang VM (BEAM) schedules on function calls. Which works fine for that use, since Erlang does looping with tail calls, but is not a solution for procedural languages.


The Erlang VM is fully preemptive and schedules on something called a reduction. You can call an external function via FFI that can screw things up, but otherwise it doesn't necessarily schedule on what you might consider a function call.


In practice, it's enough to preempt at potentially-recursive function entry and loop backedges. A thread which doesn't recurse or loop has to come to an end pretty quickly, at which point the scheduler will get a go.


There's no fundamental difference between inserting a yield before a tail call and inserting a yield before a continue statement or closing bracket of the for block, is there?


Fundamentally no, but it's a more difficult thing to do in practice at least. Unless you want to give up other optimizations like loop unrolling and other things like that since you end up losing the fact that the loop exists after some optimization passes.


BEAM (Erlang VM) provides pre-emptive scheduling


True for functions written in Erlang/Elixir, but not in NIF functions implemented in C.


Thanks to dirty schedulers that is not a big issue: https://medium.com/@jlouis666/erlang-dirty-scheduler-overhea...


Thanks for the link. It's a very deep and interesting technical post.


> If anything down the callstack is synchronous, it blocks everything.

If you use a work-stealing executor, tasks will get executed on another thread, so the impact of accidentally blocking is lowered. Tokio implements such an executor.


I haven't used tokio but in my Scala days the execution context was backed by a thread pool. One blocking call wouldn't kill you because it would just tie up one thread, but the thread pool would quickly get exhausted and lock up the application. Does tokio have the same problem?


There is a maximum number of threads and it's by default set to the # of cores (based on the docs for tokio-executor's ThreadPool and Builder). The docs also say that the # of threads starts at 0 and will increase, so one can do the Scala strategy of starting with large threadpools - one of my projects last year defaulted to 100-200 threads per pool to avoid just this problem.

I think the question you're asking is, "are deadlocks possible?" and the answer to that seems like it would be yes. I would hope that Rust's memory model makes accidental memory sharing dependencies & deadlocks harder to cause, but you can always create 8 threads waiting on a mutex and a 9th that will release it, and have the 8 spinlock on the waiting part.

The "async-std" library, one of a few attempts to write async-aware primitive functions around blocking calls to the filesystem, mutexes, etc, implements async wrappers around mutex and others that should ensure that yield points are correctly inserted and the task goes back into the queues if it's blocked.

That seems to me the right solution - make sure all your blocking calls with a potential for deadlocking check "am I blocked?" first and if so, put themselves back onto the task queue instead of spinning.


Rust cannot guarantee the lack of deadlocks or overall thread-safety. It does guarantee no data races, though.


Yes, is there some way I could have been more precise about that in my comment?


It will certainly have its limits. I don't know whether tokio spawns new threads if it detects all others are used up - likely not at this point in time. However, work-stealing at least mitigates the impact of some accidentally blocking code, e.g. if a syscall takes longer than expected, or if a library holds a mutex for longer than necessary. It shouldn't be used as an excuse for simply blocking everywhere in an async task executor - but it will reduce the impact, and give developers some time and wiggle room to improve the associated code.


> Async I/O gives awesome performance, but further abstractions would make it easier and less risky to use. Designing everything around the fact that a program uses async I/O, including things that have nothing to do with I/O, is crazy.

Microsoft kind of tried to do this with the new APIs for UWP: pretty much everything is async, the blocking versions of APIs were all eliminated, so there was no way for the async-ness to "infect" otherwise synchronous code. It was actually a pretty nice way to program; it's a shame it never took off.


They're finally opening the APIs (already have, I think, to some extent) for use with normal desktop apps and "UWP" apps outside of the Store. You can even embed the new UI stuff inside of Forms and WPF apps via XAML islands.

They also lowered their portion of the revenue share considerably for Store apps, afaik.


The JavaScript world is pretty close to this. Not quite everything is async, but almost everything is async-first.


The javascript world was forced into this. Since it doesn't (or at least didn't) expose threads, everything had to be non-blocking. Otherwise programs would be non-responsive all the time.


They do, but only at the expense of interoperability. Any "green thread" solution breaks down once you have to invoke code written in something else while allowing it to call you back. Async futures, on the other hand, can map to any C ABI with callbacks.

So until there's some standardized form of seamless coroutines on OS level, that is sufficiently portable to be in wide use, we won't see widespread adoption of them outside of autarkic languages and ecosystems like Erlang or Go.


Targeting the browser via WASM, we don't even have synchronous I/O for many/most things - I've been looking forward to async/await as a means of reining in some of the awkward APIs and callback hell.


I am waiting for a language to solve this with the type system and compiler. Give me the ability to mark a thread as async only and a clean (async) interface to communicate with a sync thread. If my async code tries to do anything sync, don't let it compile.


Whether a function is async-safe isn't black and white. A function that performs calculations may return instantly on a small dataset but block for "too long" on large parameters. On what is "too long" will vary widely depending on your application.


But whether a function is not async-safe is pretty black and white (i.e. there's a lot of clearly unsafe code). Even a definition as simple as "performs blocking I/O" would be extremely helpful.

This is the value people get out of async-only environments like Node. Yes, you still have to worry about costly for loops, but I don't have to worry about which database driver I'm using because they are all async. In a mixed-paradigm language like Rust, I would really appreciate the compiler telling me I grabbed the wrong database driver.


What about long-running computations? They're not really async-safe (they will block the thread), but they don't perform any IO.


I think the suggestion is “clearly blocking io should be marked as such (with some marker like unsafe) and other functions can be marked “blocking” if the creator decides it is blocking.”

In the worst case a problem could still occur, but once found, the “problem” function can be marked appropriately. At the least that would start to solve the issue.


There are plenty of synchronous calls in Node and JS. Not everything is async: https://nodejs.org/api/fs.html#fs_dir_readsync

And that's just I/O; most function calls are synchronous too.


Aren't you effectively asking the compiler to solve the halting problem?

I think the best you could do would be heuristics - having inferred or user-supplied bounds on the complexity of functions, having rough ideas on how disk or network latency will affect the performance of functions, and bubbling that information up the call tree. It wouldn't be perfect, but it could be useful.


Compilers can "avoid" the halting problem if the underlying language is not Turing complete / can express function totality.

You can even express algorithmic complexity in a language.

But it's a bit more complex than that, really. You will have a harder time saying "this loop will block the event system for longer than I'd like".


Sorry, how is Rust not that language? You can use single threaded executors that only require Send and not Sync on data being executed on.


Nit: single-threaded executors also don't require "Send", since they don't move things between threads. That allows you to e.g. use non-atomically refcounted things (Rc<T>) on single-threaded executors, which you can't use with the multithreaded versions.
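
A small sketch with the futures crate's single-threaded LocalPool, where an `Rc` (which is !Send) is perfectly fine inside a spawned task:

    use std::rc::Rc;
    use futures::executor::LocalPool;
    use futures::task::LocalSpawnExt;

    fn main() {
        let mut pool = LocalPool::new();
        let spawner = pool.spawner();

        // Rc<T> is !Send, but spawn_local never moves the task to another
        // thread, so this compiles and runs fine.
        let shared = Rc::new(41);
        spawner
            .spawn_local(async move { println!("{}", *shared + 1) })
            .expect("failed to spawn");

        pool.run();
    }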


Yes, you're quite correct, and I don't think a nit. That is much more accurate.


Does the compiler prevent me from calling a library that calls a library that performs a sync action?


What is a 'sync action'?

Async is fundamentally cooperative multitasking. There is no real difference between the 'blocking'-ness of iterating over a large for-loop and doing a blocking I/O action - the rest of your async tasks are blocked either way while another task is doing something.


While the behavior of a large for-loop and a blocking I/O action doesn't change the event loop, I'd still appreciate the compiler helping me identify the blocking I/O loop. I'll take whatever help I can get.


I think I can agree with you, but first I think we'd have to somehow define how to even approach this feature.

Eg, fundamentally I feel like you're making a distinction between blocking I/O and a "blocking" for loop. At the end of the day, they're the same in my view - one is just more likely to be costly.

So I think for this feature to be done right, we'd have to somehow be able to analyze the likelihood of an expensive operation - and the negative consequence that action might have on the rest of the workload. Eg, I would want the same hypothetical behavior and compile-time warnings/errors that a huge file-load might cause, with a huge loop.

Otherwise a simple function call which involves no I/O and looks innocent could have the same terrible behavior as some I/O call does.

Defining that, and informing the compiler, seems obscenely difficult. To that degree, I think any interaction with any sort of heap-y thing like iterating over a Vec would have to be a compile error if used in a Future context.

_everything_ would have to be willing to yield. Not sure I like it. Interesting thought experiment though. I imagine some GC languages do exactly this.


I mean, what you're asking for is just profiling, but only of your async methods. I'm not familiar enough with the details to predict how async messes with profilers, so maybe it'd need support. But using a flamegraph profiler would show a big chunk of time in a function that only has small amounts of time spent deeper in the stack.


No, but if that library claims to be an async library, wouldn't that be a bug in the library?

Edit: I'm interpreting your use of sync here as "blocking" and not as Sync in Rust, meaning safe to share across threads. To be clear in my initial response I was talking about shared memory across threads, and may have misunderstood your original statement.


Yes. And I want the language to use types and compilers to eliminate that whole class of bugs.


I'm not sure this is really possible. Given that async programming is cooperative by nature, how do you tell the difference between a blocking IO task, and a really long running loop in a piece of code that is itself blocking others from executing because it's doing too much work?

The blocking IO might be something Rust could create a type for, to denote that it's not async and therefore warn you in some way, but I think that one is easy to detect in testing.


> I think that one is easy to detect in testing

I have seen that not detected in testing too many times. With a work stealing execution context the code will still run fine unless under heavy load (which will exhaust the thread pool and lock the application).


“Non-blocking” code is basically just code that takes a short enough amount of time that we don’t care that it blocks the thread. It’s inherently a matter of judgement.


Static analysis should help with this. Basically it should identify every call site where I/O happens (and other syscalls), and then you have to check them that they are invoked with the right async/nonblocking dance.

This is basically a code audit problem.

Of course something like taint analysis could also work. Every such callsite should be counted as tainted unless it gets wrapped with something that's whitelisted (or uses the right marker type wrapper).

Even effects as types can't help much, because the basic interfaces to the kernels (Linux, WinNT, etc.) are not typesafe, and as long as the language provides FFI/syscall interfaces you have to audit/trust the codebase/ecosystem.


There are two ways a language can help

1. A performant "i/o" layer in the standard library that allows a large amount of concurrent activity (forget thread vs coroutine differences).

2. Ability of programmer to express concurrency. Ideally, this has nothing to do with "I/O". If I am doing two calculations and both can run simultaneously, I should be able to represent that. Similarly for wait/join.

Explicitly marking a thread as async-only will just force everyone else (who need sync and cannot track/return a promise/callback to their caller) write a wrapper around it for no reason.


The alternative is to write SansI/O code (https://sans-io.readthedocs.io/), so that your program doesn't have to think about that.

Besides, you don't have to put async/await everywhere: if your code is not performing IO, it can completely ignore this concern.

The problem is that most of your code mixes I/O and non-I/O code, and people just don't think about it. E.g. a Django website is not just a web server, but also has plenty of calls to the session store, the cache backend, the ORM, etc.

Now you could argue that the compiler/interpreter is supposed to hide the sync/async choice from the code user. Unfortunately, this hides where the concurrency happens, and things have dependencies on each other. Some are exclusive, some must follow each other, some can be parallel but must all finish together at some point, some access concurrent resources...

You must have control over all that, and for that to happen, you can either:

- have guard code around each place you expect concurrency. This is what we do with threads, and it sucks. Locking is hard, and you always miss some race condition because it can switch anywhere.

- have implicit but official switch points and silos you must know by heart. This is what gevent does. It's great for small systems, not so much at scale.

- have explicit switch points and silos: async/await, promises, go-routines. This is tedious to write, but the dangerous spots are very clear and it forces you to think about concurrency upfront.

The last one is the least bad system we have managed to write.


> this is still explicit/userland asynchronous programming: If anything down the callstack is synchronous, it blocks everything. This requires every component of a program, including every dependency, to be specifically designed for this kind of concurrency.

Welcome to the 1980s world of cooperative multitasking, but now with "multi-colored functions."


Or just use a preemptive scheduler (such as a regular OS scheduler). Or just be explicit, and take difficulties with being explicit as an indication that the data flow is maybe not very well designed.

I don't know, maybe there are valid applications for await (such as heavily trafficked web servers, where you might want to have 10s of thousands of connections that would be too expensive to model as regular threads, but you still just want some cheap persistence of state and it's not a big problem that the state is lost on server reboot). I can't say, I'm not in web dev.

But I bet it's much more common that await is simply a little overhyped and often one of the other options (real threads or explicit state data structures) is actually a better choice.


> Or just use a preemptive scheduler (such as a regular OS scheduler).

Well... I can't help but whenever I see the await stuff it reminds me of times where I had to do cooperative multitasking and was longing for OS and/or CPU support for something which is non-invasive to my algorithms. But then I'm not sure whether I'm the grumpy old man or it is just history repeating.


await allows you to write concurrent (for IO) or parallel (for CPU) code as if it were serial.

The issue it solves is programmers having trouble executing parallel code in their heads; when the relationships become intricate (a computation graph) they just break down and write buggy software.

A scheduler is targeted at use cases. A preemptive scheduler optimizes for latency and fairness and applies to real-time (say live audio/video work or games), but for most other use cases you want to optimize for throughput.

With real threads you risk oversubscription, or you risk not getting enough work; hence the task abstractions and a runtime that multiplexes and schedules all those tasks on OS threads. Explicit state data structures are the bane of multithreading: you want to avoid state, since it creates contention points, requires synchronization primitives, and is hard to test. The beauty of futures is that you create an implicit state machine without writing the state yourself.


You don't need await for data (CPU) parallelism. You'd typically use something like https://github.com/rayon-rs/rayon or OpenMP instead.
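For illustration, this is roughly the sort of thing rayon gives you (a minimal sketch; the function name is made up):

    use rayon::prelude::*;

    // CPU-bound work spread across a thread pool, no async/await involved.
    fn sum_of_squares(input: &[i64]) -> i64 {
        input.par_iter()       // parallel iterator over the slice
             .map(|&x| x * x)  // chunks are processed on worker threads
             .sum()            // results are reduced back to a single value
    }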


OpenMP does not handle nested parallelism.

Compute-bound parallelism is not always data parallelism; for example a tree search algorithm would need to spawn/async on each tree node.


You can't "avoid state". State is essential to any computation.

The only question is, can you express some of the state-relations as serial code? If "relationships became intricate (a computation graph)", chances are, you shouldn't use serial code anymore, because that splits dependencies into a two-class society: those that are expressed using code and those that are expressed using dead data. It is usually preferable to specify everything as dead data if the relationships get complex, and to then write a sort of virtual machine that "executes" the data.

So it's in fact the simple cases that lend themselves to being expressed as serial code. I won't argue with that there are nice looking example usages. Problem is, as always, systems that help making the simple things easy often make the hard things impossible.


Haskell avoids state.

When I say state I mean "shared state" is the bane of multithreading. Amdahl's law shows that if 95% of your code is parallel you limit your speedup to 20x even with thousands of cores, and any shared state contributes to those 5%.


> Haskell avoids state.

That's a myth. You need state for computation. It's not a language issue. You need just as much state in Haskell as you need in other languages. In Haskell you just specify each little component as a function which takes as a parameter what is effectively global state.

Which, by the way, is usually a bad idea because it causes so much boilerplate. Plus, it makes it unclear how many instances of a given concept can actually exist in a running program.


Isn't this the most convenient setup though? I'm most familiar with async/await in UI programming and you most often have a main thread for synchronization. You want to assume that most of your main thread work is synchronous and non-yielding until you explicitly yield. Seems like it would be a lot harder to use main thread synchronization in the style you're suggesting.

Maybe I just can't imagine it. What's a good language that shows off the style you're suggesting?


> What's a good language that shows off the style you're suggesting?

JavaScript.

What you describe sounds like native UI work since forever before javascript. "Don't block the main thread" and all that.

JavaScript is different in that it's a single thread with an event loop. Synchronous functions execute until they end. Asynchronous functions are handled by the event loop, which "loops" over the pool and runs each one for some time, then switches to another, concurrently (think round robin). What happens when the runtime is running an asynchronous function and inside it reaches a synchronous one? It stops the round-robin and executes this function until it ends.

What OP wants is a language like javascript but without having to write code distinguishing synchronous and asynchronous functions and instead having some other tool to tell the runtime when a function is synchronous or asynchronous without having to write it again.


Yes, I realize all this. My question is how you can have such a system and still keep UI thread synchronization without having the opposite problem of marking all your synchronous methods.


In a strict language? I don't think it's possible, because if you take a closer look you'll see that it's not enough to mark functions as sync or async, since inside the functions each line of code can be considered a synchronous function in its own right.

What you want is something like Haskell, which is lazy and where it's not about "executing statements" but rather "evaluating expressions".


Not the OP, but Go doesn't have this problem because all I/O is async under the hood, but it exposes a sync interface. This means the entire Go ecosystem is bought into a single concurrency model and runtime, which some find irksome, but it works pretty well most of the time. Of course, Go also lacks Rust's static safety features, but I think that's orthogonal to its concurrency approach.


We tried this in Rust and found it was slower than 1:1 threading.


What you tried wasn't "this", though. It was one particular implementation of lightweight threading that has to cope with Rust's peculiarities, special requirements and compilation targets. There is absolutely nothing essential about lightweight threads that prevents them from emitting essentially the same code as the stackless-coroutine approach. It's just that in Rust it might be very hard or even not worth it, given the language's target audience.


I don't understand what your objection is. It's a given that what I wrote applies to Rust. This is a thread about Rust. I didn't say that M:N threading is always slower than 1:1.

Besides, fibers don't emit essentially the same code as async code. One has a stack, and the other doesn't. That's a significant difference.


If the stack could be sufficiently small, it's not that different from heap-allocated async state. But you probably need segmented stacks, or at least separate stacks for async-preemptible and non-async-preemptible code (has anyone tried making a system like this?)


It isn't a given that M:N threading is slower than 1:1 threading even in Rust. A particular implementation you tried exhibited that behavior.

> One has a stack, and the other doesn't. That's a significant difference.

They both have some memory area to which they write state. Calling it "a stack" refers to the abstraction in the programmer's mind, not to how the memory is actually written/read. It is true that in order to support recursion, a thread might need to dynamically allocate memory, but so would async/await, except it'll make it more explicit.


> It isn't a given that M:N threading is slower than 1:1 threading even in Rust. A particular implementation you tried exhibited that behavior.

I don't see any way around the problems of segmented stacks and FFI. There is no way to implement stack growth by reallocating stacks and rewriting pointers in Rust, even in theory. It would break too much code: there is a lot of unsafe (and even safe!) code out there that assumes that stack pointer addresses are stable. In fact, async/await in Rust had to introduce a new explicit pinning concept in order to solve this exact problem while remaining backwards compatible. And when calling the FFI, you have to switch to a big stack, which was an insurmountable performance problem. Rust code by its nature is FFI-heavy; it's part of the niche that Rust finds itself in.


You can make what are virtually zero-cost copies from what you call a "big stack" to a resizable stack with virtual memory tricks. You don't even need to copy the entire stack, but cleverly rewrite the return address stored on the stack to do this kind of "code-switching". But it does mean doing backend manipulations in a platform-dependent way. There are several good ways to do this, none of them particularly easy. What is perhaps impossible is allowing FFI code to block the lightweight thread, but async/await doesn't solve this, either.


> You can make what are virtually zero-cost copies from what you call a "big stack" to a resizable stack with virtual memory tricks. You don't even need to copy the entire stack, but cleverly rewrite the return address stored on the stack to do this kind of "code-switching".

We tried it. It was too slow.


OK. It really is hard when you're what you call "FFI-heavy" and don't like a significant runtime. So Rust has several *self-imposed* constraints (whether they're all essential for its target domains is a separate discussion, but some of those constraints certainly are) that make this task particularly hard, but my point is that there is nothing fundamental to n:m threading that makes it slower than async/await, and async/await does fundamentally come at the significant cost of a particularly viral form of accidental complexity.


Fibers under the magnifying glass [1] might be a relevant paper here. Its conclusion, after surveying many different implementations, is that lightweight threads are slower than stackless coroutines.

[1]: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p136...


No, its conclusion is that fibers with certain properties in C/C++ are slower -- and particularly hard to implement correctly -- than stackless coroutines in C/C++. That's because of the particular characteristics of those languages. In fact, you'll note that the only negative thing he says about Go is that it incurs an overhead when interacting with non-Go code.


And that overhead is a deal-breaker in Rust.


Sure, but that overhead is also not essential, but a feature of Go's particular implementation. Fibers aren't one thing and there are many, many ways of implementing them. As I said before, implementing them for Rust well would have likely required changes to LLVM and Web Assembly, and even then it would be harder than async/await, perhaps to the point of being too hard to be worth it and probably against aspects of Rust's philosophy (I would say that that is the main difference between the two: achieving similar performance is much easier for the language implementors with async/await). But it's just not true that there is something essential about them that makes them slower. After all, you're running all of your code inside a particular implementation of threads.


> Sure, but that overhead is also not essential, but a feature of Go's particular implementation.

The only way to get around the FFI performance problem would be for all fibers to have big stacks. At that point you've thrown away their biggest selling points: high scalability and fast spawning.


> The only way to get around the FFI performance problem would be for all fibers to have big stacks.

I don't know all of Rust's specific constraints, but it is not the case in general. There are two levels of FFI support in this context, based on whether you want to allow FFI to block the lightweight thread (perhaps through an upcall), or not. Only if you want to allow that do you need "big stacks", but even then they can be "virtually big" but "physically small". If you don't, then all you need to do is to temporarily run FFI code on a "big stack", but you know that all the FFI frames are gone by the time you want to block. Depending on your FFI, if you don't allow the FFI code to hold pointers into your language's stack, you're all good.


> Depending on your FFI, if you don't allow the FFI code to hold pointers into you language's stack, you're all good.

Rust does this with virtually all FFI.


It's not that crazy. There's an async ecosystem of libraries, you opt in to using them, if you run a blocking method inside an executor then you obviously pay the cost for that. Granted most executors run on multiple threads so you'd only be blocking one thread.

Still, in practice it's just not that hard, use async methods if you're writing async code.


Is it a single threaded state machine?


> Async I/O gives awesome performance

No, it doesn't really. 'Async' is a strictly Python problem, due to the insanity of the GIL. Predictably, the Python solution to it is also insane.

Why you have to turn a sane language like Rust into an insane one by cargo-culting a solution to a non-problem is a mystery to me.

Oh well, good thing at least C++ hasn't dropped the ball.


It’s important to distinguish 3 categories of coroutine implementations: stackful, stackless-on-heap, stackless-as-struct. C++ is stackless-on-heap. Stackless-as-struct is essentially creating an anonymous type (a la lambda), used to save the data across suspension points. This is the approach taken by Rust for its async/await implementation. I believe there were declaration/definition and ABI concerns about this approach for C++:

> "While it is theoretically not an insurmountable challenge , it might be a major re-engineering effort to the front-end structure and experts on two compiler front ends (Clang, EDG) have indicated that this is not a practical approach."

So the short answer seems to be that due to technical debt on the part of existing C++ compilers and their "straight" pipeline, the front-end cannot anticipate the size necessary for some book-keeping information traditionally handled by the code-generator. I'll take Richard Smith's word on Clang.


You are mixing up a lot of stuff here.

- Asynchronous programming is completely orthogonal to Python

- If you don't need it, don't use it. Rust without asynchronous functions still looks and works like before

- By the way: welcome to C++20's coroutines


> - Asynchronous programming is completely orthogonal to Python

Theoretically, yes. In practice, people are just copying Python mistakes word-for-word, in an attempt to score the "coolness" factor that async garnered in the Python community. (Never mind that the "coolness" came from solving a problem that doesn't even exist in other languages...)

If you want to see what asynchronous programming would look like if it weren't copied from Python, look at the recent C++ proposal.

> - If you don't need it, don't use it. Rust without asynchronous functions still looks and works like before

The problem is that every Python library now comes in two versions, regular and 'async'. Even if you don't need or care about anything 'async'.

I guess splintering your code base into two incompatible flavors is just the pythonic way of doing things? Now that the 2 vs 3 insanity is finally dying down, they needed to start a fresh one?

Why is Rust trying hard to repeat these mistakes?

> - By the way: welcome to C++20's coroutines

That I already answered above.


This is big! Turns out that syntactic support for asynchronous programming in Rust isn't just syntactic: it enables the compiler to reason about the lifetimes in asynchronous code in a way that wasn't possible to implement in libraries. The end result of having async/await syntax is that async code reads just like normal Rust, which definitely wasn't the case before. This is a huge improvement in usability.


Why would it be different for async code than sync code? The goal of Rust's checker is to track lifetime of an object so for example it knows that at the end of a function the object should be freed. Async shouldn't matter here.


The point is that Rust's borrow checker can't reason about lifetimes very well over function boundaries. It can reason about coarse things that are expressible in the type language, but everything more nuanced than that, such as reasoning about how control flow affects the lifetimes, is limited to inside function bodies.

The difference between synchronous code and async code implemented as libraries is that async code involves jumping in and out of functions a lot, while employing runtime library code in between. A piece of code that is conceptually straightforward, may, in the async case, involve multiple returns and restores. In the sync case it doesn't need to do that, since it just blocks the thread and does the processing in other threads and in kernel land.

Rust's async/await support makes it possible to write code that is structurally "straightforward" in a similar way to how synchronous code would be. That allows the borrow checker to reason about it in a similar way it would reason about sync code.


> The point is that Rust's borrow checker can't reason about lifetimes very well over function boundaries. It can reason about coarse things that are expressable in the type language, but everything more nuanced than that, such as reasoning about how control flow affects the lifetimes is limited to inside function bodies.

BTW this is a big pain point for me (unrelated to async). Code like this:

  let field_ref = &mut self.field;
  self.helper_mutating_another_field();
  do_something(field_ref);
gets rejected because self.helper_mutating_another_field() will mutably borrow the whole struct. The workaround is either to inline the helper or to factor out a smaller substruct so that the helper can borrow just that, which doesn't always look good.

Of course it is preferable that all information needed for the caller to check if the call is correct is contained in the function signature but it truly is frustrating to see the function body right there, know that it doesn't violate borrowing rules and still get the code calling it rejected.


Couldn't you just pass in the (other) borrowed field as an argument for the function? If you need it to work without adding the argument when called outside of the class, you could overload it with a version that borrows the field and passes it to the version that takes the field as an argument, right?

I'm newish to Rust, so this is just an intuitive guess. Please let me know if I'm wrong.


Yes, it works, but feels unnatural. I also don't like the possibility (quite remote, I admit) that the function gets called with a field from another instance.
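For what it's worth, a minimal sketch of that workaround (hypothetical struct, field, and helper names):

    struct State {
        field: Vec<u8>,
        counter: u32,
    }

    impl State {
        // Borrows only the counter, so a live borrow of `field` elsewhere is fine.
        fn helper_mutating_another_field(counter: &mut u32) {
            *counter += 1;
        }

        fn run(&mut self) {
            let field_ref = &mut self.field;                         // borrow one field
            Self::helper_mutating_another_field(&mut self.counter);  // disjoint borrow, accepted
            field_ref.push(42);                                      // original borrow still usable
        }
    }

The downside is exactly the one above: nothing stops a caller from passing in a counter that belongs to a different instance.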


The latter case should be impossible if the method is private?


In Rust, privacy isn't enforced like that. Private just means that things outside the module can't access them. There's no concept of privacy at the instance level.


Yes. So don't make any methods that could potentially be passed incorrect arguments public.


That still doesn't preclude the possibility that within the module the function gets called with a field from another instance. I think the idea is that by making it a method that just takes a reference to self, it's impossible to accidentally mutate a field on a different instance, while taking a reference to the field itself doesn't prevent the programmer from accidentally calling it with the wrong instance of the field.


Yes, that's the usual workaround.


Yes, encapsulation sometimes conflicts with the borrowing rules. After all, this is not so surprising: enforcing the borrowing rules requires following every concrete access to resources, while encapsulation aims to abstract them away.


It's really tricky; there's a tension here between just making your code work, and not making it brittle to further modification.


If you're familiar with this, can you describe some of those new concepts in ... slightly more detail? I say slightly, because I'm still seeking high level explanations, but at the same time I'm curious what new features might be making this async lifetime talk possible.

To further frame that question: I had assumed Rust was implementing Async within the capabilities of normal Rust. Such that, if lifetimes were being managed across function bounds, I assumed the lifetime would have "simply" been bound to the scope of the executor, polling the future you're actively waiting on.

However quickly I can see confusions in that description. Normal lifetimes within a function would need to bubble up and be managed by a parent, since that function's lifetime is, as you put it, being jumped in and out of frequently.

So I imagine that is partly where new features come into play? Giving Rust the ability to take a normal lifetime and extend or manage it in new ways?


I'm not going to elaborate super deeply in a HN comment, but here's the gist: inside a function body, you can take a reference to a thing, and store that reference in a variable. Then if that function "yields" to the executor, we have a situation that we couldn't have with sync code: in sync code, the "yielding" only happens at the function return, and by that point, all the internal references must be gone, as the function stack frame is going to disappear. But with async, as the yielding can happen without the function actually ending, we have to be able to store the stack frame, and the references stay alive. The new feature helps the compiler to reason about this. Before, the "yields" were implemented as just returns, so the compiler didn't allow borrowing over yield points.
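A minimal sketch of the kind of code this enables (the awaited function is just a placeholder):

    // Placeholder for any asynchronous operation.
    async fn some_async_op() {}

    async fn read_first(data: Vec<u8>) -> Option<u8> {
        let first = data.first();   // borrow of a local variable
        some_async_op().await;      // yield point: the borrow lives across it
        first.copied()              // still usable after resuming
    }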


Ah hah, that sounds awesome, thanks!

Yea my past experience with Futures in Rust, mostly manually, made them feel like they were "No magic, just Rust" and so the Yields were just a lot of returns.

I had thought the .await impl was largely just standardization around the syntaxes. Something you could do manually, so to say.

The yielding holding the stack frame, in my head then, sounds quite similar to actually returning. However, it's a special return that stashes that stack somewhere for later use. Neat!


The challenge with async code is that the state across yield points is put into a struct, and if a reference to a stack variable is used across yield points, then that struct is self-referential, which Rust can't reason about (yet?). I honestly do not know how much can be recreated with `unsafe` and `Pin` vs how much is built-in.

I'd love for Rust to eventually get a Move trait (something like C++ move constructors) to resolve this. Besides some complexity in designing it, there is resistance from some corners about having anything execute on a move.


That is exactly what `async` blocks are about: they do support self-referential structs, and `Pin` is what allows us to use those in a safe way, as `Pin`ned data can't be moved again (unless the data is `Unpin`, which self-referential structs are not), so the self-references are safe.


In Rust, it is normally not possible, or at least very difficult, to create structs where one field references another, and if you were to create a future that borrows some field, awaits a future, and uses the borrowed field, the resulting future will need to have a field with a reference to another field.

This is the challenge and async await lets you make this kind of self referential types without unsafe code.


That's just the "Pin" type, which is heavily used in async code behind (and occasionally in front) the scenes, but is by no means restricted to it.


It's impossible to use the guarantees provided by the `Pin` type without `unsafe` code, except in `async fn`.


Isn't it kind of a poor design choice that Rust will not actually begin execution of the function until `.await` is called? If I didn't want to execute the function yet, I wouldn't have invoked it. Awaiting is a completely different concept than invoking, why overload it?

If you want to defer execution of a promise until you await it, you can always do that, but this paradigm forces you to do that. The problem is then, how do I do parallel execution of asynchronous tasks?

In JavaScript I could do

   const results = await Promise.all([
     asyncTaskA(),
     asyncTaskB(),
     asyncTaskC()
   ]);
and those will execute simultaneously and await all results.

And that's me deferring execution to the point that I'd like to await it, but in JavaScript you could additionally do

   const results = await Promise.all([
     alreadyExecutingPromiseA,
     alreadyExecutingPromiseB,
     alreadyExecutingPromiseC
   ]);
Where I pass in the actual promises which have returned from having called the functions at some point previously.

So how is parallel execution handled in Rust?


> Isn't it kind of a poor design choice that Rust will not actually begin execution of the function until `.await` is called?

Begin execution where?

If every future started executing immediately on a global event loop, that event loop would need to allocate space for every future on the heap. A heap allocation for every future is exactly the sort of overhead that Rust is trying so carefully to avoid. With Rust futures, you can have a large call tree of async functions calling other async functions. Each one of those returns a future, and those futures get cobbled together by the compiler into a single giant future of a statically known size. Once that state machine object is assembled, you can make a single heap allocation to put it on your event loop. Or if you're going to block the current thread waiting on it, depending on what runtime library you're using, you might even get away with zero heap allocations.

This sort of thing is also why Rust's futures are poll-based, rather than using callbacks. Callbacks would force everything to be heap allocated, and would work poorly with lifetimes and ownership in general.


This! Even if some of us would prefer "hot" futures that immediately start execution, from a conceptual point of view they are simply not possible in Rust's async/await model. Executing a Future requires it to be in its final memory position and be pinned there forever. Otherwise borrowing across await points would not work. But if every future immediately started in its final place you would not be able to return them, and composition would not be possible. That is why we have a Create - Pin - Execute and Await workflow.

These issues don't exist in JavaScript and other languages since objects are individually allocated on the heap there anyway.


In Rust, you can use a future adapter that does this:

    futures::join!(asyncTaskA(), asyncTaskB(), asyncTaskC())
See the join macro of futures[0]. The way it works is, it will create a future that, when polled, will call the underlying poll function of all three futures, saving the eventual result into a tuple.

This will allow making progress on all three futures at the same time.

[0] https://docs.rs/futures/0.3.0/futures/macro.join.html


I don't know Rust but I understand the question if await is the only way to yield execution.

Without a yield instruction it's strange to ask "how do I start all these futures before I await", and join does make sense because it does both of those things. But other languages can start futures, yield, reenter, start more futures, and wait for them all while making progress in the mean time.

I'm curious what the plan is there.


Join doesn't do everything. It's just a way to take multiple futures, and return a Future wrapping them all. I think there's a misunderstanding of how Futures and Executors interact in rust here which is why everyone is having a hard time understanding things.

A future in rust is really just a type that implements a trait with a `poll` function, whose return type is either "Pending" or "Ready<T>". When you create a future, you're just instantiating a type implementing that function.

For a future to make progress, you need to call its poll function. A totally valid (if very inefficient) strategy is to repeatedly call the Future's poll function in a loop until it returns Ready.

All the `await` keyword does is propagate the Pending case up. It can be expanded trivially to the following:

    match future.poll() {
        Pending => return Pending,
        Ready(val) => val
    }
Now, when we create a top-level Future, it won't start executing. We need to spawn it on an Executor. The Executor is generally provided by a library (tokio, async-std, others...) that gives you an event loop and various functions to perform async IO. Those functions will generally be your leaf-level/bottommost futures, which are implemented by having the poll function register interest in a given resource (file descriptor or what not) so that the currently executing top-level Future will be woken up by the Executor when that resource has progressed.

So if you want to start a future, you will either have to spawn it as a top-level future (in which case you cannot get its output or know when it finishes executing unless you use channels) or you join it with other sub-futures so that all your different work will progress simultaneously. Note that you can join multiple times, so you can start two futures, join them, and have one of those futures also start two futures and join them; there's no problem here.
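As a deliberately naive sketch of the "poll in a loop" strategy above (assuming the `futures` crate's noop_waker and pin_mut helpers; real executors park until the Waker fires instead of spinning):

    use std::future::Future;
    use std::task::{Context, Poll};
    use futures::pin_mut;
    use futures::task::noop_waker;

    // Spin-polls the future until it is Ready. Very inefficient, but it shows
    // that "executing" a future is nothing more than calling poll repeatedly.
    fn busy_block_on<F: Future>(fut: F) -> F::Output {
        let waker = noop_waker();
        let mut cx = Context::from_waker(&waker);
        pin_mut!(fut);                      // pin the future on this stack frame
        loop {
            match fut.as_mut().poll(&mut cx) {
                Poll::Ready(val) => return val,
                Poll::Pending => std::thread::yield_now(),
            }
        }
    }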


This is essentially what I assumed, and what I believe ralusek assumes to be the case. This does not change our question.

What would be the syntax for how you would spawn a future, add it to the current Executor, cooperatively yield execution in the parent such that progress could be made on the child, but also return execution to the parent if the child yields but does not complete?

In C# I believe you could simply call

    Task.Run(Action);
This will schedule the action on the current executor. If you yield execution through await, that task will (have a chance to) run. The crux of the question is the fact that you do not need to await that specific task, you just need to cooperatively yield execution such that the executor is freed up.

    var shortTask = Task.Run(Task.Delay(100).Wait);
    var longTask = Task.Run(Task.Delay(500).Wait);
    await Task.Delay(1000);
  
    await shortTask;
    await longTask;
In C# this will take ~1000 milliseconds as the child futures yield execution back to the parent such that it can start its own yielding task.


With async-std, you can just do

    let future = async_std::task::spawn(timer_future());
And then do more work. The timer_future will run cooperatively, and the future returned by spawn is merely a “join handle”.

But this is a feature of the executor, not something that’s part of the core rust async/await. With tokio, you’d have to spawn a separate future and use channels to get the return value.


It's the same. :)

    use std::time::Duration;
    use async_std::task;
    
    let shortTask = task::spawn(task::sleep(Duration::from_secs(1)));
    let longTask = task::spawn(task::sleep(Duration::from_secs(2)));

    task::sleep(Duration::from_secs(5)).await;
    shortTask.await;
    longTask.await;
or simply:

    futures::join!(
        task::sleep(Duration::from_secs(1)),
        task::sleep(Duration::from_secs(2)),
        task::sleep(Duration::from_secs(5)),
    );


I don't like this at all. Having to rely on futures::join! means that I don't have the flexibility to control the execution of these things unless Rust adds that specific utility, right?

In JS, for example, the `bluebird` library is a third party utility for managing execution of functions. You can do things like

    const results = await Promise.map(users, user => saveUserToDBAsync(user), { concurrency: 5});
And I pass in thousands of users, and can specify `concurrency: 5` to know that it will execute no more than 5 simultaneously.

Implementation of this behavior in user space is trivial in JS; is it possible in Rust?


The async map you laid out above could be accomplished with a `Stream` in async Rust. You can turn the array into a `Stream`, have a map operation to return a future, and then use the `buffered` method to run N at once:

https://docs.rs/futures/0.3.0/futures/stream/trait.StreamExt...

Not only does the `futures` crate provide most things you'd ever want, it also has no special treatment – you can implement your own combinators in the same way that `futures` implements them if you need something off of the beaten path.
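For instance, the buffered approach sketched above might look roughly like this (the user type and save function are stand-ins for your own):

    use futures::stream::{self, StreamExt};

    // Stand-in for saveUserToDBAsync; any async fn works here.
    async fn save_user(user: u32) -> u32 {
        user
    }

    async fn save_all(users: Vec<u32>) -> Vec<u32> {
        stream::iter(users)
            .map(save_user)           // turn each item into a future
            .buffered(5)              // keep at most 5 futures in flight at once
            .collect::<Vec<_>>()      // gather the results in order
            .await
    }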


It feels like you're complaining about things in the Rust language without taking the time to understand how the language idioms work. RTFM.

Additionally, you're making snarky comments about how you don't like how the base language doesn't handle something like JS...then reference a third party JS library. Base JS doesn't solve your 'problem' either.

To answer your question, async/await provides hooks for an executor (tokio being the most common) to run your code. You do things like that in the executor.

https://docs.rs/tokio/0.2.0-alpha.6/tokio/executor/index.htm...


I reference a third-party library to show that you have full control of this stuff in user-space. Implementing Promise.all or Promise.map in userspace is trivial.

I'm not complaining about things without taking the time to understand how the language works, I'm giving examples of things that don't seem possible based off of my understanding of how the language works...in hopes that someone will either clarify or accept that this is a shortcoming.


Rust and JS have very very different execution models. You can absolutely control how many futures are allowed to make progress at the same time fully in userspace. If you're joining N futures, and want to only allow M futures to make progress at a time, make an adapter that only calls the poll function of M futures at a time, until those futures return Ready.

Rust gives you all the flexibility you need here. It might not be trivial yet because all the adapters might not be written yet, but that's purely a maturity problem.

The `join` macro does nothing magical. Go check out its implementation, and it will make it obvious how to implement a concurrency argument.


Fortunately, `futures::join!` isn't provided by Rust- it's provided by library code. I am not aware (off the top of my head) of an existing equivalent to your `Promise.map` example, but it is also implementable in userspace in Rust.

The primitive operation provided by a Rust `Future` is `poll`. Calling `some_future.poll(waker)` advances it forward if possible, and stashes `waker` somewhere for it to be signaled when `some_future` is ready to run again.

So the implementation of `join` is fairly straightforward: It constructs a new future wrapping its arguments, which when polled itself, re-polls each of them with the same waker it was passed.

There are also more elaborate schemes- e.g. `FuturesUnordered` uses a separate waker for each sub-future, so it can handle larger numbers of them at some coordination cost.
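A rough, simplified sketch of that polling structure (all names here are made up, and real join implementations handle pinning without the Unpin shortcuts below):

    use std::future::Future;
    use std::pin::Pin;
    use std::task::{Context, Poll};

    struct Join2<A: Future, B: Future> {
        a: A,
        b: B,
        a_out: Option<A::Output>,
        b_out: Option<B::Output>,
    }

    impl<A, B> Future for Join2<A, B>
    where
        A: Future + Unpin,
        B: Future + Unpin,
        A::Output: Unpin,
        B::Output: Unpin,
    {
        type Output = (A::Output, B::Output);

        fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output> {
            // Fine because every field is Unpin, so Join2 itself is Unpin.
            let this = self.get_mut();
            // Re-poll each sub-future with the same waker, but only while it is
            // still pending; finished outputs are stashed until both are ready.
            if this.a_out.is_none() {
                if let Poll::Ready(v) = Pin::new(&mut this.a).poll(cx) {
                    this.a_out = Some(v);
                }
            }
            if this.b_out.is_none() {
                if let Poll::Ready(v) = Pin::new(&mut this.b).poll(cx) {
                    this.b_out = Some(v);
                }
            }
            if this.a_out.is_some() && this.b_out.is_some() {
                Poll::Ready((this.a_out.take().unwrap(), this.b_out.take().unwrap()))
            } else {
                Poll::Pending
            }
        }
    }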


The macro is just a wrapper around this (and a couple of other functions): https://docs.rs/futures/0.3.0/futures/future/fn.join.html

And from a quick scan of the source it doesn't look like anything there is impossible to implement in userspace: https://docs.rs/futures-util/0.3.0/src/futures_util/future/j...


The join macro is also implemented “in user space”.


Thank you, so what is the syntax that is used in order to execute without awaiting? Do you create a thread for each?


This is where things get tricky: Rust didn't standardize an event loop - it only standardized common traits that allow implementing one, and a way to communicate whether a computation needs more time or already has a result.

If what you want to do is run multiple CPU-bound computations and have a central event loop awaiting the result, then yes, you'll need to spawn threads and use some kind of channel to transfer the state and result. If what you want is to run multiple IO-bound queries, then you'll want to use the facilities of the event loop of your choice (tokio, async-std, etc...) to register the intent that you're waiting for more data on a file descriptor.

The "proper" way to execute without awaiting it on the current future is usually to spawn another future on the event loop. The syntax to do that with tokio is

    use tokio;
    let my_future = some_future();
    tokio::spawn(my_future);


You do not create a thread for each- that would defeat most of the purpose of futures.

Instead you call `Future::poll`, which runs a future until it blocks again, and provide it a way to signal when it is ready.

That signal would be handed off to an event loop (which tracks things executing on other hardware like network or disk controllers) or another part of the program (which will be scheduled eventually).


In Rust, everything is in ‘user space’ since nothing is managed by the language itself; you have full control over how the Futures are executed. Even the executor (the “event loop” in js parlance) is a third-party library. You can't get more control in js than what Rust gives you.


"At the same time"? How does that happen on a single thread?


It just interleaves execution. This really seems more similar to python's generator/yield semantics.


That's how it is implemented under the covers: it uses the nightly-only generators feature, which builds a state machine of the entire future.


Executors are not inherently single threaded. They can be, they can also not be.


> Awaiting is a completely different concept than invoking, why overload it?

The wording is a bit imprecise; you can't 'await' something to invoke it, exactly. It also won't begin execution until .await is called. What happens is, at some point, you have a future that represents the whole computation, and you pass it to an executor. That's when execution starts.

There's a join macro/function that works the same way as Promise.all, and is what you'd use for parallelism.


In rust, they separate the concept of a future from how it's actually run. Tokio is one way to run a bunch of futures, and it has many different options.

For instance, I use the CurrentThread runtime in tokio, because I'm using the rust code as a plugin to Postgres, and it accesses non-thread-safe APIs.

What you are asking for is essentially for the futures runtime to be hidden from you. That's fine for some languages that already have a big runtime and don't need the flexibility to do things differently, but doesn't work for rust.


JavaScript automatically starts the tasks and this is bad design IMHO. One loses referential transparency and the ability to run the workflow with different schedulers. It looks like Rust has done a better job.


On referential transparency, I couldn't tell from the blog but are the Rust futures memoized/cached?

If they are, then they're still not referentially transparent. But if they aren't then it might be a bit of a surprise to developers coming from other languages (especially ones not familiar with something like an IO monad).


Rust futures are just types that implement an interface providing a method to advance execution if possible and signal when they have more work to do.


JavaScript does not automatically start the tasks. It executes the function if you...execute the function. If you just want to pass around something with deferred execution, you can just pass the function around, or wrap it in a closure.


If I understand correctly, applying a JavaScript Async function is effectful and starts the background tasks. A better design is for the async function to yield an Async computation when applied. These computations can then be composed and passed around with full referential transparency. Only when we run the final composed Async computation, should the effects happen (threads running our computation). This is how the original F# asynchronous workflows were designed, which predate the C# and JavaScript implementations. Thankfully Rust works this way too!


It does automatically start the tasks. JavaScript asyncs are "hot". You can simulate "cold" asyncs using a function, as you describe, but in other languages this is how they work by default.


I’m not sure exactly what hot is here, but if that’s the case then all javascript functions are hot. It’s behaving consistently with any other function. I assume the point here is that Rust in this case changes the way function calls work based on async, which is, well, inconsistent.


An async function immediately returns a Future to the caller, so in that sense it is fully consistent with a sync function that immediately returns a value.

If you want the asynchronously computed value, then the async equivalent to "foo()" is "foo().await", and that is also fully consistent - the function body starts running, and returns the value once it's done.

But there's no sync equivalent of invoking an async function without awaiting it, which is where the hot/cold distinction manifests. Thus, there's no inconsistency, because there's nothing to be consistent with.


Are you sure? How would the JavaScript functions execute simultaneously on a single thread?

Async is about interleaving computations on a single thread.


Javascript has two differences that contribute here:

* First, it runs the callee synchronously until the first await, which can fire off network requests, etc.

* Second, continuations are pushed onto queues immediately- the microtask queue that runs when the current event handler returns, for example.

Rust does neither of these things:

* Calling an async function constructs its stack frame without running any of its body.

* Continuations are not managed individually; instead an entire stack of async function frames (aka Futures) is scheduled as a unit (aka a "task").

So if you just write async functions and await them, things behave much more like thread APIs- futures start running when you pass a top-level future to a "spawn" API, and futures run concurrently when you combine them with "join"/"select"/etc APIs.


> * Calling an async function constructs its stack frame without running any of its body.

This is actually possible to do by using async blocks instead of async functions. E.g. you can write this:

    fn test() -> impl Future<Output = ()> {
        // Run some things directly here
        println!("I am running in the current frame");
        async {
            // Run some things in the await
            println!("I am running in the .await");
        }
    }


They're asynchronous tasks, there would be no point in using promises with synchronous tasks. So we're not talking about simultaneous execution of functions that are making use of the single thread, we're talking about serial execution of functions on a single thread which are basically doing nothing more than scheduling an asynchronous task and not bothering to wait for the result.

Think of it this way. I have 3 letters I need to send, and I'm expecting replies for each. A single threaded, synchronous language, would basically send the first letter, wait for the reply, send the second letter, wait for the reply, then send the third and wait for the reply. In JS, you're still single threaded, but you just recognize that there is no point in sitting around and waiting before moving onto the next item. So you send the first letter, and when it would be time to wait, you continue executing code, so you immediately send the next letter, and then finally send the 3rd.

How they're scheduled simultaneously on a single thread is exactly what makes JS so fast for IO. Once it starts making an http call, db call, disk read, etc, it will release the thread to begin execution of the next item in the event loop (which is the structure JS uses under the hood to schedule tasks).

So what really happens is when we call

    asyncA();
    asyncB();
JS will go into `asyncA`, run the code, and at some point it will get to a line that does something like "write this value to the database." This will be an asynchronous behavior, that it knows will be handled with a callback or a Promise, so it will immediately continue execution of the code. So now it pops out of executing `asyncA` and executes `asyncB`, meanwhile the call to the DB has gone out and we don't care if it has finished, we'll await both of these when we need them.


Async doesn't have to be single-threaded generally. JavaScript is limited to a single-threaded model, but in other languages (like clojure), futures are tasks run on some other thread. I don't use rust and this article doesn't make any mention of threads so I'm not really sure what's going on here.


> futures are tasks run on some other thread.

This is usually false, in Rust and elsewhere. Futures are tasks that run on whatever thread is scheduling them. They are an even lighter-weight version of green threads/fibers/M:N.


But only if you await them in that thread. If you `spawn` them, they can be executed in another free thread; although you'll miss the return value in this case.

So for people new to async/await in rust, how a web server usually works:

A request/response future is spawned onto the runtime, and a thread decides to take the future for execution if the runtime uses multithreading.

Inside of this request/response future whatever futures are .awaited will continue the execution in the same thread when ready.

Users can also spawn futures inside the request/response future, which might go to another thread. The result of this future is not available for the spawning thread.

Some executors have special functions, such as `blocking` that will execute the block in another thread, giving the result back to the awaiting thread when finished.


Funny fact: code in the second snippet (awaiting on promises invoked before) can result in node throwing unhandled promise rejection errors.

More info: https://dev.to/gajus/handling-unhandled-promise-rejections-i...


Those don't actually execute simultaneously in Javascript, though. In Javascript, C# and I suspect Java, each usage of await generates a state machine that can _defer execution_. So while one function is blocked, another can continue to execute. The concept is the same in Rust, except in Rust a nested await does not generate a new state machine. The entire async operation just uses a single state machine.

The reason Rust does this is because its priorities are different than these other languages. Rust is built around zero-cost abstractions because it is intended to be as fast as possible while still safe. That's one of many reasons why Rust is considered a systems language and Javascript is not. They're for different things.

For more on how async in Rust works, I'd invite you to read the actual manual on the subject (linked from the announcement): https://rust-lang.github.io/async-book/06_multiple_futures/0...


Java actually doesn't have async/await as a language level feature. It does have the Future interface, but it's just an interface, and not special in any way.


Got it, thanks for the info!


This is just the lazy way of handling tasks. Rust allows you to join two futures, and awaiting on the resulting future will run the underlying two in parallel and wait for them.

In a language like Haskell where everything is lazy, it's not uncommon for someone to model their logic in such a way that computations that are not necessary are never run but appear to be used in code anyway.

Depending on what you want to do, it's also possible to start threads/green-threads (using something like Tokio), and use message passing for async tasks where you do not need to process the result synchronously.


Python has the hybrid approach. Just calling an async function only returns a future; its execution hasn't started yet. As with Rust it will only start when you await it. However, there is also a way to execute it in the background before the await by calling create_task(future).

I found that most code that actually uses create_task tends to be super complex and often bug-ridden, since eventually you will have to await it anyway, and it is easy to miss this, which will leave errors unhandled, especially when propagating cancel() to all these executing futures floating in the air.


You would use the join method to combine the futures (https://docs.rs/futures/0.2.1/futures/trait.FutureExt.html#m...).

Both will execute at the same time and you'll get a tuple of results.


Note that your first promise doesn't actually do anything in parallel. It just schedules all of those tasks at once. Execution will still be concurrent.


This is going to open the flood gates. I am sure a lot of people were just waiting for this moment for Rust adoption. I for one was definitely in this boat.

Also, this has all the goodness: open source, high-quality engineering, design in the open, a large number of contributors to a complex piece of software. Truly inspiring!


Maybe. I love Rust and use it for all my work and hobby programming. With that said, I'm not in a super rush to use Async as it stands now.

This is a foundational implementation and while you _can_ use it, you are also likely to run into a host of partially implemented support problems. No fault of anyone, just a lot left to do. Examples being, you may run into needing async FS ops, so you bring in one of those libs. You may need async traits, so you bring in macro helper libs, you may need async DB interaction, so you check your db lib and hope they have support. Etcetcetc.

None of those are problems to stop you from using Async if you want. They're merely reasons you may not want to use Async quite yet.

Personally I've found Rust's development cycle to be such a joy, including manual threading, that I can get by happily without Async for the time being.

But, I'm not currently doing a lot of work where I'm dying to use greenthreads-like paradigms. Threading works fine in my web frameworks, db pooling, parallel libs like Rayon, etc. So because I've got all the power in Rust I need currently I just have no reason to use Async.

WITH THAT SAID.. use it if you want it! I just might not recommend it to people coming into Rust unless they explicitly need Async. It's bound to introduce a few - hopefully mild - headaches, currently.


It's probably worth noting that an async scheduler (executor in Rust terms) is required for this to be useful, hard to write yourself, and not provided by the standard library.

There are crates that provide ready-made ones, and that will work for almost all cases, but it's another dependency that you have to evaluate and stay on top of.

It is entirely possible to do yourself, though. Last month, I dove into the details during a game jam. Not much of a game came out, but I did manage to get a useful async system up and running from scratch:

https://github.com/e2-71828/ld45/blob/master/doc.pdf (cf. Chapter 3)


Indeed.

On a semi-related note, any thoughts on how you could merge Async with non-Async code? Eg, I've got a large codebase that is threaded but not Async. In the future, I might upgrade the web server to be Async and slowly start porting code.

I had planned/hoped that I could make my own Async/Thread bridge, such that non-Async code would live in its own thread, and I would make a special Future ask a Mutex in another thread if data is available, taking special care not to lock the Future's thread.

The goal of course is to not have to rewrite the entire app's blocking code at once.

Does this sound stupid to you?


You probably want channels.

There's an implementation in the standard library: https://doc.rust-lang.org/std/sync/mpsc/

Or a faster/more ergonomic one in the crossbeam-channel library: https://docs.rs/crossbeam-channel/0.4.0/crossbeam_channel/
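
One possible shape for that bridge, sketched with the async-aware channel from the futures crate (an assumption on my part; you could equally keep std or crossbeam channels on the sync side, it's mainly the receiving end that wants to be async-friendly):

    use futures::channel::mpsc;
    use futures::StreamExt;
    use std::thread;

    fn main() {
        let (tx, mut rx) = mpsc::unbounded::<u64>();

        // The existing blocking code lives on its own thread and just
        // pushes results into the channel; unbounded_send doesn't block.
        thread::spawn(move || {
            for i in 0..3 {
                tx.unbounded_send(i).unwrap();
            }
        });

        // The async side awaits values without blocking the executor;
        // the loop ends once the sender is dropped.
        futures::executor::block_on(async move {
            while let Some(v) = rx.next().await {
                println!("got {}", v);
            }
        });
    }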


I've used them a ton in Go, and a tiny bit in Rust. I would think channels could fail here though - for the same reason that a Mutex could fail, no? Channels can block a thread if you wait for data.

You could of course have a channel that you simply use in a non-blocking fashion, asking if data is ready. You could also implement similar behavior in a Mutex.

However, I do think you're right, a channel would likely fit this paradigm better; and with less chance of accidentally blocking the executor/future thread. Appreciate the feedback, thanks!


You should be able to come up with something. Fundamentally, the role of the executor is to be the bridge between sync and async code: it isn't async itself but is responsible for making all the async code move forward. No async code will do anything when the executor isn't active.

The way I split up the game code, raw event handling and rendering was written as a standard, synchronous, game loop. The game logic update was written with async code, and once per frame, I call the executor which keeps control until none of the async tasks can make further progress.

You can probably do something similar in your case: find somewhere in the existing program where it isn't disruptive to do arbitrary work, and have the executor do its work there.
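
As a rough illustration of that pattern (assuming the futures crate's LocalPool; the game-specific parts are just placeholder comments):

    use futures::executor::LocalPool;
    use futures::task::LocalSpawnExt;

    fn main() {
        let mut pool = LocalPool::new();
        let spawner = pool.spawner();

        // Game logic written as an async task.
        spawner
            .spawn_local(async {
                // ... update logic, awaiting timers/events ...
            })
            .unwrap();

        // Synchronous game loop: once per frame, drive the async tasks
        // until none of them can make further progress.
        for _frame in 0..3 {
            // handle_input();
            pool.run_until_stalled();
            // render();
        }
    }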


That looks like a great document. It's nice to know that people occasionally still write literate programs, and do it well.


The library support's not quite there yet. But I'd bet on it being pretty good by the end of the year. A lot of popular libraries have async support on their master branches.


Agreed - just trying to temper expectations. I'd hate for new users to come in and be frustrated due to implied "completeness" of Async. While Async itself might be "complete", as a broad concept it really needs so much more to be end-user complete.

Traits are a big one for me. Though, I've seen good things with the Async Trait Macro libraries. So hopefully those do well.

It doesn't make this Async-Stable event any less big of course, not trying to downplay it. We can't really move forward without this, so I'm super happy to see it merged! I'm just hoping to manage expectations :)


It is a good idea to temper expectations.

1) this is an MVP, some features like you mentioned are not yet implemented

2) the ecosystem was preparing for this release, but it will likely have a bit of churn until things settle down now that the feature is in stable

3) there was a lot of diagnostics work to make the feature minimally understandable when something is incorrect, but it will still be a source of frustration: it's not as polished as other constructs that have had the benefit of several releases, where we could see how they were misused and add targeted errors. Some things are already tackled in the new beta, but I foresee a lot of work ahead of us.


Agree. I tried starting with Rust a couple of times, but complicated async IO was exactly the reason I waited. Especially knowing that async/await was coming, there was zero reason to learn the old way. But even when it hit the nightly channel, it was impossible to use until major libraries exposed the new API. So I'm waiting now until Tokio and the other mentioned libraries fully implement this. The next year is likely going to be the year of Rust, at least for me.


Are there that many people looking for a new low level language for server side software?


Rust isn't only great because it's low level. Things like sum types (called enums in rust), pattern matching and expression orientation mean that it is often much more expressive than other languages for high level code.


ML-inspired languages have all these features too; is the advantage of Rust over those just that it’s more mainstream, the ecosystem is bigger, etc.?


That, and that it has better support for imperative features than most ML languages. You can combine your fancy combinators with mutable variables and for-loops when you just want to get something done quickly.

In general, Rust just has all the little details right. It's hard to describe that in concrete terms, but it makes using it a very smooth and satisfying process. I get a similar feeling when using postgres: there's usually a nice way of doing what I want, and I rarely come up against unwelcome surprises.


ML has always had easy mutables in the form of references, and the closest deployed languages to Standard ML (OCaml and Reason) have always had for-loops and while-loops. Mutable references are used frequently.

Rust is great because it's low-level, high-performance, non-garbage-collected, and its primary inspiration for higher-level programming is languages like ML and Haskell.


Having used OCaml and Reason, I'd say the documentation and compiler messages of Rust are more helpful, IMHO.


Thanks! I actually do use Rust as my main professional language (since my company’s product is based on timely-dataflow and differential-dataflow, so it was the only natural choice).

But I’m always curious about how it stacks up against other languages I’m less familiar with that share similar ideas.


Right. You could use F#, and get the pretty big .NET ecosystem with it. I think some people are mainly attracted to Rust due to its novel and open-source attributes. There is certainly a hype factor involved in it. In my opinion Rust is great (I use it), but I think its main strengths are in different domains than web applications.


One of my favorite features of Rust as someone who dabbles is that thread safety is expressed in the type system. F# is perfectly happy to let me share mutable data across threads without any synchronization, while the Rust compiler knows whether something can be safely accessed because you've either transferred ownership, or the type is thread safe.

The ownership system is also a very clever approach to managing mutable state. Usually in F# you'd get something of the same effect by using immutable structures and creating new copies, which does have a small performance impact.
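
A tiny illustration of what "thread safety in the type system" looks like in practice (my own example, not from the parent): the compiler rejects sending a non-thread-safe type like Rc to another thread, while Arc is fine.

    use std::rc::Rc;
    use std::sync::Arc;
    use std::thread;

    fn main() {
        let local = Rc::new(5);
        // thread::spawn(move || println!("{}", local));
        // ^ compile error: `Rc<i32>` cannot be sent between threads
        //   safely (it does not implement `Send`)
        let _ = local;

        let shared = Arc::new(5);
        thread::spawn(move || println!("{}", shared))
            .join()
            .unwrap();
    }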


IMO both language types offer shared state and immutable data structures, so I don't see them as mutually exclusive. As the Rust folks I've heard say, "sharing and mutation are fine, just not together". It's a question of whether the problem you're trying to solve is better modeled as a highly mutable one with sharing (e.g. systems programming with contended system resources), or whether you want to share data across many threads simultaneously and are happy to trade single-threaded performance for greater multi-threaded throughput (writes are rare, so sharing and occasionally locking on an atomic ref is OK, and you want structural sharing of data). Rust helps the coder avoid the issues of the traditional "sharing and mutation" paradigm, especially without a GC; languages like F# have some modern tools for this, like atomic refs for data sharing, flagging mutables, async lock types, support for imperative programming, etc. Both approaches work and IMO suit different kinds of problems; it's good to have both available.


I call Rust a hybrid of Haskell and C++: it has most of the powerful type features you come to rely on with the MLs, but with the additional low-level control and default imperative behavior of C/C++.

I came from Haskell to Rust and it really is a joy to program in.


Yes!

Rust is the first language that is both truly good as a low-level systems language and also truly good as a high-level modern language at the same time.

To me, it's amazing to finally have another option aside from just C/C++.

Technically, I might have had some other options before, but rust gets it right.


I would say Ada/SPARK gets a lot of things right, too. What it lacks is hype and thus, a vibrant ecosystem. It can be a huge deal-breaker to many people.


Ada does have a good ecosystem and community, just not what we'd call modern and mainstream.

I looked specifically at Ada, and for my use case macros are quite important. If I were to use Ada, I would need a separate code gen step.

I also need concurrency without making new threads. I know Ada can do async but it doesn't have something like tokio.

A shame. I wanted to give Ada a try, but didn't get very far before losing hope.


Have you taken a look at D? It might be what you're looking for. Although I wouldn't consider it mainstream.


I was planning to look more carefully at D if rust didn't work out.


I spend the majority of my time working with high level Rust while doing web development and writing networked services. It's really great to have the option to go low-level when I need to, though. The ecosystem doesn't have everything I wanted / needed so I occasionally go low, but it's infrequent as there are already a lot of great tools available that I make use of.


I think so. Of course, this is my belief, based on different opinions/experiences shared by people over the last 3-4 years.


Rust is more high level than Java.


I've been working with alpha futures, tokio, hyper, etc with async/await support on rust beta (didn't use async std yet) and can attest to them working quite well. There was quite a learning curve for me to know when to use arc, mutex, different stream combinators, and understand raw polling, but after I did, writing the code became a bit easier. I suggest anyone wanting to learn to grab a tcp-level networking project/idea/protocol and grind on it for days, heh.


Could you share some resources? I'm trying to move a Hyper + Tokio-core + Futures project to the newer versions and am struggling..


I don't really have any except to use the alphas of tokio and toy with it. The tokio alpha docs and the async book[0] have some info, but both are a bit incomplete.

0 - https://rust-lang.github.io/async-book/


For JavaScript developers expecting to jump over to Rust and be productive now that async/await is stable:

I'm pretty sure the state of affairs for async programming is still a bit "different" in Rust land. Don't you need to spawn async tasks into an executor, etc.?

Coming from JavaScript, the built in event-loop handles all of that. In Rust, the "event loop" so to speak is typically a third party library/package, not something provided by the language/standard itself.


> I'm pretty sure the state of affairs for async programming is still a bit "different" in Rust land.

There are some differences, yes.

> Don't you need to spawn async tasks into an executor, etc.?

Correct, though many executors have added attributes you can tack onto main that do this for you via macro magic, so it'll feel a bit closer to JS.
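
For instance, with tokio's attribute macro (assuming the tokio crate with its macros and runtime features enabled), main can read a lot like top-level async code in JS:

    #[tokio::main]
    async fn main() {
        // The attribute macro builds the executor and blocks on this
        // future, so the body can use .await directly.
        let value = async { 40 + 2 }.await;
        println!("{}", value);
    }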


The same day as async/await hits stable, the next Prisma alpha is released and is the first alpha that's based on Futures and async/await.

https://github.com/prisma/prisma-engine/

Been working with the ecosystem since the first version of futures some years ago, and I must say that, the way things are right now, it's definitely much, much easier.

There are still optimizations to be made, but IO is starting to be in good shape!


What is this? The Github repo doesn't offer a description.


Maybe the website tells more? https://www.prisma.io/

Basically we offer a code generator for typescript, migrations and a query engine to simplify data workflows. Go support is coming next.


Maybe, but you linked the repo, and it doesn't even have a link to the website, let alone a description.

Looks like a great project, best of luck.


Sorry, didn't mean this to be product advertisement, so I wanted to just link to the core code.

The user-facing product is typescript and go and in different repositories. The backend is Rust, and we jumped onto the async/await train some months ago already. Wanted to share some experience and how quickly, in the end, we were able to get a working system out with the new APIs.


I am perfectly happy for you to advertise; it's just that most people reading this probably are not Rust developers, so it would be great to know what the project is about.


Exciting! I know this has been a long time coming, so congrats to everyone for finally landing it in stable.

As a rust noob, small question based on the example given: Why does `another_function` have to be defined with `async fn`? Naively, I would expect that because it calls `future.await` on its own async call, that from the "outside" it doesn't seem like an async function at all. Or do you have to tag any function as async if it calls an async function, whether or not it returns a future?


An async function compiles down to a function that returns an iterable-ish state machine thing (a Future), which needs to be stepped through. The await keyword indicates a yield point.

It's kind of like how if you declare a Python function containing the yield keyword, then the function returns an iterable rather than simply executing top to bottom.

In the same way that Python yield only makes sense from within an iterable, Rust's await keyword only makes sense inside of a Future. Outside of a Future, there'd be no concept of a yield. This is why the "outer" function must be declared async.


Even if you don't await anything, async fns behave differently. They will get compiled to a function which returns a Future. When you call the function, nothing happens. The body will only be executed when someone starts polling the Future. This will also have a big impact on all lifetimes, since they now become part of the returned Future type.

Now, would you tag something as async if not required? Likely not - it just makes things more complicated. One exception is when you expect you'll need to modify the body of the function in the future to make use of await, and you want to maintain compatibility.


You have to tag a function as async if it interacts with anything async and doesn't create an executor itself. Executors have functionality that lets you start an async fn/block in the currently instantiated executor and return immediately. Such a function wouldn't need to be marked async, IIUC.

Note that marking a function as async is only syntactic sugar for a function that returns a future.
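
A small sketch of that sugar (using the futures crate's block_on just to drive the example):

    use std::future::Future;

    // This:
    async fn add_async(a: u32, b: u32) -> u32 {
        a + b
    }

    // ...is roughly equivalent to this:
    fn add_desugared(a: u32, b: u32) -> impl Future<Output = u32> {
        async move { a + b }
    }

    fn main() {
        // Neither body runs until the returned future is awaited/polled.
        let total = futures::executor::block_on(async {
            add_async(1, 2).await + add_desugared(3, 4).await
        });
        println!("{}", total); // 10
    }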


Does Rust offer composable futures? Something like Java's CompletableFuture or Clojure's https://github.com/leonardoborges/imminent ?


Yes you can use functions like CompletableFuture’s thenApply.

However, Rust Futures have a very different implementation compared to CompletableFuture.
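
For example, with the FutureExt combinators from the futures crate (my own sketch; map plays roughly the role of thenApply):

    use futures::FutureExt; // futures = "0.3"

    fn main() {
        // map transforms the output of a future once it completes,
        // much like CompletableFuture::thenApply.
        let doubled = async { 21 }.map(|x| x * 2);
        let result = futures::executor::block_on(doubled);
        println!("{}", result); // 42
    }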


If you're just starting to learn Rust, I suggest waiting a little before using async. It's awesome, BUT libraries, tutorials, etc. will need a while to update from the prototype version of Futures (AKA v0.1) to the final version (std::future). Changes made during standardization were relatively minor, but there's no point learning two versions of Futures and dealing with temporary chaos while the ecosystem switches to the final one.


This seems very similar to Python's approach, which I've been finding poor to use.

I was wondering if a more pleasant approach would be to add a 'defer' keyword to return a future from an async call, and have the default call be to await and return the result (setting up a default scheduler if necessary). Requiring the await keyword to be inserted in the majority of locations seems poor UX, as is requiring callsites to all be updated when you update your synchronous API to async.


This is massive and props to the team! I have a small prototype for interop between 0.1 and 0.3 futures which are also compatible with async/await first order.

Excited for where this takes us! Can't wait for tokio 2.0 now.

https://github.com/cdbattags/async-actix-web


Can someone explain the benefit of this programming paradigm to other applications besides web servers/IO bound tasks?


My other question of similar nature also goes unanswered :(

Maybe the primary benefit is that it's new and sexy


Will the Rust Programming Language book be updated with async/await?


At some point, yes. Carol and I have not figured out how we want to do it.


Quick please :)


I have honestly never enjoyed learning a language more than I've been enjoying Rust, the docs and material are so thorough and clear. Very excited to tackle this topic.


How does Rust compare to Nim? It seems Nim is as fast as C, with static binaries and an ergonomic UX. Whereas Rust looks like C++ mixed with bath salts.


Nim dev here.

If you are talking about async/await:

- for concurrency it has been in the standard library for a couple of years. Also you can implement it as a library without compiler support.

- for parallelism, Rust is in advance. Nim has a simple threadpool with async/await (spawn/^), it works but it needs a revamp as there is no load balancing at all.

You can also fall back on raw pthreads/Windows fibers and/or OpenMP for your needs, or even OpenCL and CUDA.

Regarding the revamp you can follow the very detailed Picasso RFC at https://github.com/nim-lang/RFCs/issues/160 and the repo I'm currently building the runtime at https://github.com/mratsim/weave.

Obviously I am biased as a Nim dev that uses Nim for both work and hobby, so I'd rather have others that have tried both comment on their experience.


Looks like someone needs to update https://areweasyncyet.rs/



And they've fixed it! :)


Font so big that you have to zoom out to read anything.


I have never used async/await and can't really understand it. Is it like coroutines in lua? It seems very similar. But I guess it's not limited to one thread like lua? What makes it better than threaded IO? Losing the overhead of threads?

I like coroutines, but that's mostly because they are not threads: they only switch execution on yield, and that makes them easy to reason about :)


> Is it like coroutines in lua?

idk about Lua, but afaik python's async is pretty much implemented on top of coroutines ("generators"). `await` is basically `yield`

> I like coroutines but thats mostly because they are not threads, they only switch execution on yield, and that makes them easy to reason about :)

pretty sure i've heard the same thing said about async io!


I guess the thing that puzzles me is that if they are run threaded (concurrently) they seem just as hard to reason about as threaded IO to me.

And that would leave the motivation to be performance gains by being able to reduce the amount of threads I guess

(And that it is cool of course :)


the coroutines are run sequentially by an "event loop"/"coroutine runner" that wakes them up and lets them run for a bit when appropriate, kind of like an OS's scheduler (on a single core machine). if the runner is the OS, `await foo(..)` is kind of like a syscall, where the coroutine is suspended and control is handed back to the runner to do whatever is requested.

i guess the difference from normal threads (preemptive multitasking) is that you explicitly mark your "yield points" – places where your code gives control back to the runner – with `await` (cooperative multitasking). some believe that this makes async stuff easier to reason about, since in theory you can see the points where stuff might happen concurrently

honestly i'm out of my depth re: async IO, haven't used it all that much :/ but if you're comfortable with python and want to dig into the mechanism of async/await, i really recommend this article: https://snarky.ca/how-the-heck-does-async-await-work-in-pyth...

it's long, but i found it very helpful – it actually explains how it all works without handwaviness. at the end the author implements a toy "event loop" that can run a few timers concurrently, which really made it click for me!


This is great news!

I tried out Rust for a typical server-side app over a year ago (JSON API and PostgreSQL backend), and the lack of async-await was the main reason I switched back to Typescript afterwards, even though Diesel is probably the best ORM I've ever worked with. Time to give it a try again.


This is awesome... been waiting on this... looking a lot at rocket and yew (really new with rust), and had been waiting to see the async stuff shake out before continuing (been holding for a few months now), may take a bit of time over the weekend to start up again.


Can anybody share their thoughts on which key features / libs are still missing in Rust?


Binary crates. Sandboxed builds (I'll continue to use crates that have a build.rs which can do anything it wants, but I don't like it). Namespacing crates (a la Java) to deal with the name squatting issue.


Generics and compile time computation. I think Rust should achieve feature parity with c++ templates and constexpr.


An interesting presentation on the history of futures in Rust (from RustLatam 2019):

https://www.youtube.com/watch?v=skos4B5x7qE


Rust beginner here who writes a lot of async code in Node.js. If I am to start writing async code in Rust, should I directly pick up async-await? Or should I first understand how it is done in the current way?


You should start with async-await, but know that the ecosystem is in the middle of catching up, and so you may run into packages that are more awkward to use at the moment.


All we need are a few migration guides for Tokio, Hyper, Futures...


I would be interested to know if a comparison between async and sync APIs at the performance level exists. I have this intuitive but probably wrong feeling that async often implies memory overhead.


Is there anyone who has used rust instead of C++? What's your opinion on it?


Can anyone help explain if Rust asyncs are hot (as in JavaScript, C++) or cold (as in F#)?


Cold.


They are cold


Thank you Alex Crichton, Niko Matsakis and all other core devs, Rust is by far the most well designed programming language I've ever dealt with. It's a masterpiece of software engineering IMO.


How does rust perform in parallel on the same memory?

I heard it uses locks?

This is not on the same memory right? https://news.ycombinator.com/item?id=21469295

If you want to do joint (on the same memory) parallel HTTP with Java I have a stable solution for you: https://github.com/tinspin/rupy


What do you mean by “on the same memory” exactly?

If you want two threads running in parallel to concurrently access the same memory location, you don't need synchronization if you only perform reads, but you do need it if there is at least one write. Like in any other language (this comes directly from how the CPU works).

The good thing with Rust is that you can't shoot yourself in the foot: you can't accidentally have an unsynchronized mutable variable accessible from two threads; the compiler will show you an error (unless you explicitly opt out of this safety by using unsafe primitives, in which case the borrow checker will let you go).


I mean just like it reads: I want two (or more) threads to write the same memory at the same time. This is a problem Java "solved"/"worked around" with a complete memory model rewrite for the whole JDK and the concurrency package in 2004 (1.5).

The solutions range from mutexes to copy-on-write and more.


My understanding is that Rust is basically equivalent to C in this regard, in terms of concurrent writes to the same memory being per se undefined behavior. I think the JVM translates concurrent memory writes into hardware-appropriate atomics, so maybe a better translation from Java to Rust would be a large `Vec<AtomicUsize>` or something like that, rather than raw memory?


you can use whatever synchronization techniques you want, from lock free data structures/queues, to mutexes, to unadulterated unsafe reads/writes to raw pointers.

std provides some useful helpers like Arc and Mutex (Arc<Mutex<Type>> is always safe to use across threads, as the lifetime is managed safely by Arc and accesses handled by Mutex).

There are crates out there for things like lock free sync and other GC if you need it. Rust doesn't force you into a single world view.
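
A minimal example of the Arc<Mutex<T>> pattern mentioned above (my own sketch):

    use std::sync::{Arc, Mutex};
    use std::thread;

    fn main() {
        // Arc manages the shared lifetime, Mutex serializes the writes.
        let counter = Arc::new(Mutex::new(0u32));

        let handles: Vec<_> = (0..4)
            .map(|_| {
                let counter = Arc::clone(&counter);
                thread::spawn(move || {
                    *counter.lock().unwrap() += 1;
                })
            })
            .collect();

        for handle in handles {
            handle.join().unwrap();
        }
        println!("{}", *counter.lock().unwrap()); // 4
    }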


The same as any language, if you were to write safe code at least. By safe code I mean, if you wrote Go in such a way that race conditions could not happen, you typically would write it in the same way in Rust. Often this involves Mutexes, but there are plenty of libraries that set up foundations for parallel behavior without Mutexes.

So it can operate on "the same memory", and there are a whole lot of ways to manage it safely. The right tool for the right job, really.


This isn't quite right; it feels like the standard response from a developer with some hubris. Humans are fallible and mushy, and depending on their current state are error-prone. So why not codify some "best practices" as primitives built into a language like Rust?

There are ways to perform parallel work that is "safe". Rust solves this with the borrow checker implementation; another clear "safe" way would be to make everything immutable, as seen in Haskell, OCaml, and F#, where it doesn't matter who gets to what first if the underlying thing will never change.

Mutexes, locks, and all the other ways of doing "safe" parallel work aren't primitives that are cooked into the language.


Sorry, maybe I misunderstood OP. I was responding to what I thought was a general, vague question about how something safe might work in Rust, compared to other languages (Go in my example).

So while I don't think what I said was incorrect re: Go vs Rust concurrency, perhaps I misunderstood OPs question.

> Mutexes and locks and all the other ways of doing parallel work that is "safe" isn't a primitive thats cooked in with the language.

I didn't understand OPs question as strictly features cooked into the language, so I was not saying that.

With that said, if you're not allowing for Mutexes for "safe things that are cooked into the language" I feel like you won't like Rust. This is an odd metric, though.

edit: words


The level of fanboyism in the comments is saddening. Many other fast and productive languages have had async for a while.


Features aren't just checkboxes. It's about the whole package and how they make everything work together.

I need a language with minimal runtime, good support for the C ABI and structure formats, decent macros, good ecosystem and community for mainstream programming needs, and support for async.

Where else can I find that whole package?

Go has too much of a runtime, GC, and doesn't support C structure formats very well (i.e. best case you need to copy/translate them to Go; you can't operate on them directly).

C++ can do it, but the details are ugly and the result unsatisfying.

Ada lacks macros and a mainstream ecosystem.

So... what do I use if not rust?


There is a reality behind the hype. I know, because I've been working on https://github.com/jeff-davis/postgres-extension.rs, and I've hit real problems. Those problems have been resolved one by one over the years.

I can't imagine doing that project in any other language.


Or maybe there are just a lot of people who like using rust because it fits their use cases very well and are excited about the release of a big new feature that’s been in development for a long time?


>...because it fits their use cases very well and are excited about the release of a big new feature.

a bit too excited. GP isn't wrong, Go pretty much has the same thing and I have never seen so much fanboyism for a single feature ever in my career.

I don't get it, but that might be because I am a manager.


https://github.com/jeff-davis/postgres-extension.rs

See my example where a pure-rust postgres extension can start a background worker, listen on a port, and handle concurrent network requests, multiplexing them over a single backend calling a non-thread-safe internal postgres API.

That's possible because rust has no runtime to get in the way, it can use C structs natively and seamlessly, tokio offers the CurrentThread runtime (which doesn't spawn new threads), and a bunch of other details that need to be done right.

Try doing that in a pure-Go postgres extension.


Go and Rust are completely different in many ways. It's kinda like you're asking why someone would want a hybrid minivan when they've been making hybrid sedans for over 10 years.


Go's is a little different. I can't run Go on something with 2k of RAM like an Arduino, but Rust's async structure is actually extra helpful there.


Can Rust's async even work on an Arduino or any bare-metal system?


Yes. Right now the implementation requires TLS but that will be going away.


To whoever downvoted Steve, he's talking about thread local storage, not transport level security.


Thanks for the clarification! I didn't even notice I was downvoted, and now you're downvoted... I don't get it.


HN be a harsh mistress.


I'm pretty sure "a lot of people" here never go past a "Hello World" project in Rust.


I'm pretty sure you're wrong in the same way you're sure that you're right. Can you elaborate on why someone who can't even get past a "Hello World" project would care about a feature that "other languages" have had for years? I don't get it; I think maybe they hate Rust and would rather go throwing shade at the language and ruin other people's feelings instead.


I was looking forward to this: http://n-gate.com/hackernews/2019/11/07/0/



