
I’ve been playing with async/await in a vertical that is the polar opposite of its typical use case (high-TPS web backends), and I believe this was the missing piece for unlocking further ergonomic and productivity gains in systems development: embedded no_std.

Async/await lets you write non-blocking, highly interleaved firmware and apps in allocation-free, single-threaded environments (bare-metal programming without an OS). The abstractions around stack snapshots allow seamless coroutines, and I believe they will make Rust pretty much the easiest low-level platform to develop for.
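To make "allocation-free, single-threaded" concrete, here is a minimal sketch of an executor that needs no heap at all. `block_on` and `noop_raw_waker` are illustrative names, not from any particular crate; the no-op waker and the busy loop are simplifying assumptions, where real firmware would sleep until an interrupt wakes it. Only `core` APIs are used (including `core::pin::pin!`, Rust 1.68+), so the same shape should carry over to `no_std`.

```rust
use core::future::Future;
use core::hint::spin_loop;
use core::pin::pin;
use core::ptr;
use core::task::{Context, Poll, RawWaker, RawWakerVTable, Waker};

// A waker whose operations all do nothing. This executor re-polls in a loop,
// so it never needs to be woken; real firmware would sleep (e.g. on WFI) and
// let interrupt handlers wake it instead.
fn noop_raw_waker() -> RawWaker {
    fn clone(_: *const ()) -> RawWaker {
        noop_raw_waker()
    }
    fn noop(_: *const ()) {}
    static VTABLE: RawWakerVTable = RawWakerVTable::new(clone, noop, noop, noop);
    RawWaker::new(ptr::null(), &VTABLE)
}

/// Drives one future to completion with no heap allocation: the future is
/// pinned in this stack frame and polled until it returns `Poll::Ready`.
pub fn block_on<F: Future>(fut: F) -> F::Output {
    // SAFETY: every vtable function ignores the data pointer, so null is fine.
    let waker = unsafe { Waker::from_raw(noop_raw_waker()) };
    let mut cx = Context::from_waker(&waker);
    let mut fut = pin!(fut);
    loop {
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return out;
        }
        spin_loop(); // placeholder for "sleep until an interrupt arrives"
    }
}
```

The whole task state lives inside `fut`, whose size the compiler knows, which is what makes the no-allocation version possible.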




Have you ever heard of Esterel[1] or Céu[0]? They follow the synchronous concurrency paradigm, which apparently has specific trade-offs that give it great advantages on embedded systems (IIRC the memory overhead per Céu "trail" is in the order of bytes, much lower than for async threads, fibers, or whatnot, but computationally it scales worse with the number of trails).

Céu is the more recent one of the two and is a research language that was designed with embedded systems in mind, with the PhD theses to show for it [2][3].

I wish other languages would adopt ideas from Céu. I have a feeling that if there was a language that supports both kinds of concurrency and allows for the GALS approach (globally asynchronous (meaning threads in this context), locally synchronous) you would have something really powerful on your hands.

EDIT: Er... sorry, this may have been a bit of an inappropriate comment, shifting the focus away from the Rust celebration. I'm really happy for Rust for finally landing this! (but could you pretty please start experimenting with synchronous concurrency too? ;) )

[0] http://ceu-lang.org/

[1] https://en.wikipedia.org/wiki/Esterel

[2] http://ceu-lang.org/chico/ceu_phd.pdf

[3] http://sunsite.informatik.rwth-aachen.de/Publications/AIB/20...


> Er... sorry, this may have been a bit of an inappropriate comment, shifting the focus away from the Rust celebration.

I think if it's interesting and spurs useful conversation, it's appropriate, tangent or not. I for one am thankful for your suggested links; they look interesting.


It's not the same, but Rust async/await tasks are also in the order of bytes, and you can get similar "structured concurrency"-like control flow with things like `futures::join!` or `futures::select!`.

Céu looks very neat. I suspect (having not read much about it yet) that async codebases could already take a lot of inspiration from it.


> you can get similar "structured concurrency"-like control flow with things like `futures::join!` or `futures::select!`.

That sounds very promising; I should give it a closer look. (I don't program in Rust myself, so I only read blogs out of intellectual curiosity.)


Or you can use something like protothreads or async.h [1] if you're stuck with C/C++ and need something lightweight.

[1] https://news.ycombinator.com/item?id=21033496


Céu compiles to C actually, and I believe Francisco Sant’Anna (the author of the language) makes a direct comparison to protothreads and nesC in one of his papers (probably the PhD thesis). I had not seen async.h before though, interesting!

[1] http://ceu-lang.org/publications.html


I also think async (the paradigm) is kind of weird in the Rust world. I agree with https://journal.stuffwithstuff.com/2015/02/01/what-color-is-....


The solution suggested by that article is to use M:N threading, which was tried in Rust and turned out to be slower than plain old 1:1 threading.

If you don't want to deal with async functions, then you can use threads! That's what they're there for. On Linux they're quite fast. Async is for when you need more performance than what 1:1 or M:N threading can provide.


> If you don't want to deal with async functions, then you can use threads!

Truly? If some very popular library that I use becomes async (like actix or reqwest), can I TRULY ignore it and not split my world into async/sync?


You can easily convert async to sync by just blocking on the result. The other way (sync to async) is more difficult and requires proxying out to a thread pool, but it's also doable.
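A std-only sketch of both directions, assuming no async runtime is available: `block_on` parks the calling thread until the future's waker unparks it, and `spawn_blocking` runs a blocking closure on a fresh thread and hands back a future. Both names are hypothetical here; production runtimes ship hardened versions (Tokio has `Runtime::block_on` and `tokio::task::spawn_blocking`, for example).

```rust
use std::future::Future;
use std::pin::{pin, Pin};
use std::sync::{Arc, Mutex};
use std::task::{Context, Poll, Wake, Waker};
use std::thread::{self, Thread};

// Async -> sync: park the calling OS thread until the future's waker unparks it.
struct ThreadWaker(Thread);

impl Wake for ThreadWaker {
    fn wake(self: Arc<Self>) {
        self.0.unpark();
    }
}

pub fn block_on<F: Future>(fut: F) -> F::Output {
    let waker = Waker::from(Arc::new(ThreadWaker(thread::current())));
    let mut cx = Context::from_waker(&waker);
    let mut fut = pin!(fut);
    loop {
        match fut.as_mut().poll(&mut cx) {
            Poll::Ready(out) => return out,
            // park() may return spuriously; that only costs an extra poll.
            Poll::Pending => thread::park(),
        }
    }
}

// Sync -> async: run the blocking closure on a dedicated thread and expose
// its result as a future that resolves when the thread finishes.
struct Slot<T> {
    result: Option<T>,
    waker: Option<Waker>,
}

pub struct Blocking<T> {
    slot: Arc<Mutex<Slot<T>>>,
}

pub fn spawn_blocking<T, F>(f: F) -> Blocking<T>
where
    T: Send + 'static,
    F: FnOnce() -> T + Send + 'static,
{
    let slot = Arc::new(Mutex::new(Slot { result: None, waker: None }));
    let remote = Arc::clone(&slot);
    thread::spawn(move || {
        let value = f();
        // Store the result and take the waker under the lock; wake outside it.
        let waker = {
            let mut s = remote.lock().unwrap();
            s.result = Some(value);
            s.waker.take()
        };
        if let Some(w) = waker {
            w.wake();
        }
    });
    Blocking { slot }
}

impl<T> Future for Blocking<T> {
    type Output = T;

    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<T> {
        let mut s = self.slot.lock().unwrap();
        match s.result.take() {
            Some(value) => Poll::Ready(value),
            None => {
                // Checked and registered under one lock, so no wakeup is lost.
                s.waker = Some(cx.waker().clone());
                Poll::Pending
            }
        }
    }
}
```

One OS thread per call is the simplest possible proxy; a real implementation would hand the closure to a thread pool instead.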


An `async` function _can_ call blocking functions, of course, it just blocks the entire thread of execution which could otherwise continue making progress by polling another future.


Both actix and reqwest are already async, the fact that you haven't noticed yet just shows that you are already ignoring it.


But the signatures are not. If they mark the types/functions async, what happens?

Do I have to rewrite all the calls?


Async is just syntactic sugar for `Future<...>`; you can `poll()` it manually.

You can take the event loop, run it on one thread and do whatever you wish on other threads.

Fearless concurrency, after all.
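A small illustration of the "just sugar" point, using only `std`: `YieldOnce` is a hand-written `Future`, `answer` is an `async fn` that awaits it, and `answer_plus_one` is a plain function returning `impl Future`. All of these names, plus the `poll_by_hand` helper that calls `poll()` directly and counts the calls, are made up for this sketch; a no-op waker suffices because the helper re-polls in a loop.

```rust
use std::future::Future;
use std::pin::{pin, Pin};
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// A hand-written future: Pending on the first poll, Ready on the second.
pub struct YieldOnce(bool);

impl Future for YieldOnce {
    type Output = ();
    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<()> {
        if self.0 {
            return Poll::Ready(());
        }
        self.0 = true;
        cx.waker().wake_by_ref(); // ask the executor to poll again
        Poll::Pending
    }
}

// `async fn` is sugar for a function returning `impl Future<Output = u32>`...
pub async fn answer() -> u32 {
    YieldOnce(false).await;
    42
}

// ...so a plain function returning a future can be `.await`ed just the same.
pub fn answer_plus_one() -> impl Future<Output = u32> {
    async { answer().await + 1 }
}

struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Calls `poll()` directly until the future is ready; returns (polls, output).
pub fn poll_by_hand<F: Future>(fut: F) -> (usize, F::Output) {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut fut = pin!(fut);
    let mut polls = 0;
    loop {
        polls += 1;
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return (polls, out);
        }
    }
}
```

`poll_by_hand(answer())` needs two polls: the first stops at the `.await` on `YieldOnce`, the second finishes with `42`.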


You can just `.await` any Future returned directly by an API, even if the function isn't marked as an `async fn`.


How hard was it tried?

I imagine there's a reason that languages like Go adopted M:N threading... obviously part of the reason is that it's way more scalable, but userspace threading is also supposed to be faster, as context switches don't need to switch to kernel space... were the problems tight loops (which AFAIK is also the problem in Go)? Or maybe it's just so much easier / more efficient if you also have GC...


In my opinion they tried hard enough. Here are the links to the discussions of that time if you are into that kind of thing:

https://mail.mozilla.org/pipermail/rust-dev/2013-November/00...

https://mail.mozilla.org/pipermail/rust-dev/2013-November/00...

https://github.com/rust-lang/rfcs/blob/master/text/0230-remo...

It is clear that they really wanted to make green threading work, but in the end it proved incompatible with other goals of the language.

The main problem as I understand it is with the stack: you can't make the stack big from the beginning (or your threads wouldn't be lightweight anymore), so you need to grow it dynamically. If you grow it by adding new segments to a linked list of stack segments, you get the "stack thrashing"/"hot split" problem: if you are unlucky and have to allocate/deallocate a segment in a tight loop, performance suffers horribly. Go solved the problem by switching to contiguous relocatable stacks: if there is not enough space in the stack, a bigger contiguous chunk of memory is allocated and the old contents of the stack are copied there (à la C++ vector). Now there is a problem with references to stack-allocated variables: they become invalid. In Go this problem is solvable because it is a managed, garbage-collected language, so the runtime can simply rewrite all the references to point to the new locations, but in Rust it is infeasible.


The reason is that M:N threading inherently requires a language runtime, and the advantage of increased scalability (GHC threads use less than 1KB of memory) comes with the disadvantage that FFI calls are really quite expensive to do, because you need to move to a separate native thread in case the FFI call blocks.

These languages (Erlang, Haskell, Go, ...) have no ambition to be useful for system programming; they're not intended as a replacement for C/C++ in that domain, unlike Rust.


Author's solution is threads:

> But if you have threads (green- or OS-level), you don’t need to do that. You can just suspend the entire thread and hop straight back to the OS or event loop without having to return from all of those functions.

Correct me if I'm wrong, but wasn't the lack of threads one of the biggest reasons why NodeJS originally outperformed most of its competitors?

Spinning up threads for each concurrent request was expensive, and (nonblocking) async code was by comparison ridiculously cheap, so the lack of overhead meant Node could just make everything async, instead of trying to decide up-front which tasks deserved which resources.

Granted, it's been over a decade since Node came out. Maybe thread overhead has gotten a lot better? But barring faulty memory, I definitely remember a number of people explaining to me back then that being single-threaded was the point.


I'm pretty sure that was the official talking point at the time, and some people may have even been motivated enough to actually believe it.

Of course that all changed once Node got threads.


Did it, though? AFAIK Node still doesn't have OS threads, which are the expensive version. The "About" page still says that Node is "designed without threads": https://nodejs.org/en/about/


All 5 of his points seem to be 2015 Javascript only. Some of them don't even apply to modern Javascript; I don't see any that apply to rust.


While I'm happy to believe you, with a short comment like that I just have to take your word for it. Could you give a short explanation of how Rust already addresses each of these points for those of us who don't program in it?


This article is 90% a rant against “callback hell” that js was facing before async/await was introduced. The remaining 10% stays valid even in the presence of async/await, but the trade-off (ignored by the article) of the alternative is having to manually deal with synchronization primitives (at least channels), which would make zero sense given that JavaScript is a single-threaded environment.

Rust is a different beast, you can have whichever model you like best (OS threads, M:N threads (with third party libs), async/await) but async/await is by far the most powerful, that's why it's such a big deal that it lands on Rust stable.


I hadn't heard of "synchronous concurrency", but looking at the Céu paper (only briefly so far), I think the model looks very close to how `async`/`await` works in Rust - possibly even isomorphic. This is really exciting, because Rust's model is unique among mainstream languages (it does not use green threads or fibers, and in fact does not require heap allocation), and I wasn't previously aware of any similar models even among experimental languages.

I'll open a thread on users.rust-lang.com to discuss the similarities/differences with Céu, but for now, here's the main similarity I see:

A Céu "trail" sounds a lot like an `async fn` in Rust. Within these functions, `.await` represents an explicit sync-point, i.e., the function cedes the runtime thread to the concurrency-runtime so that another `async fn` may be scheduled.

(The concurrency-runtime is a combination of two objects, an `Executor` and a `Reactor`, that must meet certain requirements but are not provided by the language or standard library.)


That is a correct understanding of how `await` works in Céu. What is important to note however is that all trails must have bounded execution, which means having an `await` statement. One abstraction that Céu uses is the idea that all reactions to external events are "infinitely" fast - that is, all trails have bounded execution, and computation is so much faster than incoming/outgoing events that it can be considered instantaneous. It's a bit like how one way of looking at garbage collected languages is that they model the computer as having infinite memory.

Having said that, the combination of par/or and par/and constructs for parallel trails of execution, and code/await for reactive objects is like nothing I have ever seen in other languages (it's a little bit actor-modelish, but not quite). I haven't looked closely at Rust though.

Céu also claims to have deterministic concurrency. What it means by that is that when multiple trails await the same external event, they are awakened in lexical order. So for a given sequence of external events in a given order, execution behavior should be deterministic. This kind of determinism seems to be true for all synchronous reactive languages[0]. Is the (single-threaded) concurrency model in Rust comparable here?

The thesis is a bit out of date (only slightly though), the manual for v0.30 or the online playground might be a better start[1][2].

[0] which is like... three languages in total, hahaha. I don't even remember the third one beyond Esterel and Céu. The determinism is quite important though, because Esterel was designed with safety-critical systems in mind.

[1] https://ceu-lang.github.io/ceu/out/manual/v0.30/

[2] http://ceu-lang.org/try.php


I'm not sure I see the connection between bounded execution and having an `await` statement. Rust's `async` functions, just like normal functions, definitely aren't total or guaranteed to terminate. They also technically don't need to have an `await` expression; you could write `async fn two() -> i32 { 1 + 1 }`. But they do _no_ work until they are first `poll`ed (which is the operation scheduled by the runtime when a function is `await`ed). So I believe that is equivalent to your requirement of "having an `await` statement". You could even think of them as "desugaring" into `async fn two() -> i32 { async {}.await; 1 + 1 }` (though of course that's not strictly accurate).
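A quick way to see the "no work until first polled" behaviour, sketched with a thread-local counter (`RUNS` and `laziness_demo` are illustrative names): calling `two()` only builds the future, and its body runs on the first `poll`.

```rust
use std::cell::Cell;
use std::future::Future;
use std::pin::pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

thread_local! {
    // Counts how many times the body of `two` has started executing.
    static RUNS: Cell<u32> = Cell::new(0);
}

async fn two() -> i32 {
    RUNS.with(|r| r.set(r.get() + 1));
    1 + 1
}

struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Returns (body runs after calling `two()`, body runs after the first poll,
/// result of that first poll).
pub fn laziness_demo() -> (u32, u32, Poll<i32>) {
    let start = RUNS.with(|r| r.get());
    let fut = two(); // builds the state machine; runs none of the body
    let before_poll = RUNS.with(|r| r.get()) - start;
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let first = pin!(fut).poll(&mut cx); // the body runs here
    let after_poll = RUNS.with(|r| r.get()) - start;
    (before_poll, after_poll, first)
}
```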

I think it's reasonable to consider well-written `poll` operations as effectively instantaneous in Rust, too, though since I haven't finished the Céu paper yet I don't yet understand why that abstraction is important to the semantics of trails.

As for `par`/`or` and `par`/`and`, I expect these are equivalent to `Future` combinators: a `select`-style combinator for `par`/`or` (finish when either branch finishes) and a `join`-style one for `par`/`and` (finish when both do).
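For comparison, hedged sketches of what `par`/`and` and `par`/`or` could look like as hand-rolled Rust combinators (`join2`, `select2`, `Either`, `Yield` and `block_on` are all hypothetical names; the `futures` crate's `join!`/`select!` mentioned upthread are the production equivalents). `join2` finishes when both branches have finished; `select2` finishes with whichever branch completes first and drops the other, which roughly mirrors how a `par/or` aborts its remaining trails.

```rust
use std::future::{poll_fn, Future};
use std::pin::{pin, Pin};
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// `par/and` analogue: completes once *both* futures have completed.
pub async fn join2<A: Future, B: Future>(a: A, b: B) -> (A::Output, B::Output) {
    let mut a = pin!(a);
    let mut b = pin!(b);
    let (mut ra, mut rb) = (None, None);
    poll_fn(|cx| {
        // Poll each unfinished branch, always `a` before `b`.
        if ra.is_none() {
            if let Poll::Ready(v) = a.as_mut().poll(cx) {
                ra = Some(v);
            }
        }
        if rb.is_none() {
            if let Poll::Ready(v) = b.as_mut().poll(cx) {
                rb = Some(v);
            }
        }
        if ra.is_some() && rb.is_some() {
            Poll::Ready((ra.take().unwrap(), rb.take().unwrap()))
        } else {
            Poll::Pending
        }
    })
    .await
}

pub enum Either<L, R> {
    Left(L),
    Right(R),
}

/// `par/or` analogue: the first future to finish wins and the other is
/// dropped (cancelled) when `select2` returns. Ties go to `a`.
pub async fn select2<A: Future, B: Future>(a: A, b: B) -> Either<A::Output, B::Output> {
    let mut a = pin!(a);
    let mut b = pin!(b);
    poll_fn(|cx| {
        if let Poll::Ready(v) = a.as_mut().poll(cx) {
            return Poll::Ready(Either::Left(v));
        }
        if let Poll::Ready(v) = b.as_mut().poll(cx) {
            return Poll::Ready(Either::Right(v));
        }
        Poll::Pending
    })
    .await
}

/// Test scaffolding: Pending `n` times (requesting a re-poll), then Ready.
pub struct Yield(pub u32);

impl Future for Yield {
    type Output = ();
    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<()> {
        if self.0 == 0 {
            return Poll::Ready(());
        }
        self.0 -= 1;
        cx.waker().wake_by_ref();
        Poll::Pending
    }
}

struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Test scaffolding: a busy-polling executor.
pub fn block_on<F: Future>(fut: F) -> F::Output {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut fut = pin!(fut);
    loop {
        if let Poll::Ready(v) = fut.as_mut().poll(&mut cx) {
            return v;
        }
    }
}
```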

Since Executors and Reactors are not provided by the language runtime, I believe it would be easy (possibly even trivial) to guarantee the ordering of `Future`-polling for a given event. I am guessing that for Executors that don't provide this feature, the rationale is to support multi-threaded waking (i.e. scheduling `poll` operations on arbitrary threads taken from a thread pool). When using `async`/`await` in a single thread, I'd be mildly surprised if most Executors don't already support some kind of wake-sequencing determinism.
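A minimal sketch of a single-threaded executor with that kind of wake-sequencing determinism, under the simplifying assumption that every task is re-polled once per "event" (so no real wakeups are needed). `run_in_order`, `trail` and `NextEvent` are hypothetical names; tasks are polled strictly in the order they were spawned, similar in spirit to Céu waking trails in lexical order.

```rust
use std::cell::RefCell;
use std::future::Future;
use std::pin::Pin;
use std::rc::Rc;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

type Task = Pin<Box<dyn Future<Output = ()>>>;

struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Stand-in for "await the next external event": Pending once, then Ready.
/// It never wakes anyone because `run_in_order` re-polls every task anyway.
struct NextEvent(bool);

impl Future for NextEvent {
    type Output = ();
    fn poll(mut self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<()> {
        if self.0 {
            Poll::Ready(())
        } else {
            self.0 = true;
            Poll::Pending
        }
    }
}

/// Each round ("event") polls every unfinished task exactly once, always in
/// spawn order, so the interleaving is identical on every run.
pub fn run_in_order(mut tasks: Vec<Task>) {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    while !tasks.is_empty() {
        // retain_mut visits tasks in order and drops the finished ones.
        tasks.retain_mut(|t| t.as_mut().poll(&mut cx).is_pending());
    }
}

/// A "trail" that logs one step per event.
pub fn trail(name: &'static str, steps: u32, log: Rc<RefCell<Vec<String>>>) -> Task {
    Box::pin(async move {
        for i in 0..steps {
            log.borrow_mut().push(format!("{name}{i}"));
            NextEvent(false).await;
        }
    })
}
```

With a trail `a` of 2 steps spawned before a trail `b` of 3 steps, the log always comes out as `a0 b0 a1 b1 b2`.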

In any case, I've now opened a discussion on users.rust-lang about whether Rust can model synchronous concurrency: https://users.rust-lang.org/t/can-rusts-async-await-model-th...


In Céu, an `await` statement cedes execution to the scheduler. It is the point at which a trail goes to sleep until it is woken up again. So "bounded" here means "is guaranteed to go to sleep again (or terminate) after being woken up". The language refuses to compile a loop unless it can guarantee that the loop terminates or that every iteration reaches an `await` statement.

I'll try to explain how the "instantaneousness" matters for the sake of modelling determinism. Take the example of reacting to a button press: whenever an external button press event comes in, all trails that are currently awaiting that button press are awakened in lexical order. Now imagine that there are multiple trails in an infinite loop that await a button press, and they all take longer to finish than the speed at which the user mashes a button. In that case the button events are queued up in the order that they arrive. If the user stops then eventually all button presses should get processed, in the correct order.

The idea is that external events get queued up in the order that they arrive, and that the trails react deterministically to such a sequence of events: if the exact same sequence of events arrives faster or slower, the eventual output should be the same. So while I might produce congestion if I mash a button very quickly, it should have the same results as when I press it very slowly.

Now, you may think "but what if you were asked to push a button every two seconds, with a one-second time window? Then the speed at which you press a button does matter!" Correct, but in that case the two-second timer and the one-second time window also count as external events, and when looking at all external events, again only the order in which they arrive at the Céu program matters.

Lacking further knowledge about Rust I obviously cannot say anything about the rest of your comment, but I hope you're right about the `Future` combinators because I really enjoyed working with Céu's concurrency model.


Keep in mind that the current release only brings an MVP of the async/await feature to the table. The two things I've missed are both no_std support and async trait methods, but there are reasons these haven't been completed yet. That doesn't mean they won't be available in the future, just that the team has prioritized to release a significant part of the feature that many will already find useful.


related: async/await is also amazingly useful for game development. Stuff like writing an update loop, where `yield` is "wait until next frame to continue".

This can let you avoid a lot of the pomp and circumstance of writing state machines by hand.
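A sketch of that pattern, assuming a game loop that polls one script per frame: `next_frame()` returns a future that is pending once and ready on the following frame, so `next_frame().await` plays the role of `yield`. `patrol`, `simulate` and `NextFrame` are illustrative names, not an engine API.

```rust
use std::cell::Cell;
use std::future::Future;
use std::pin::Pin;
use std::rc::Rc;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Pending on the frame it is first polled, ready on the next one.
pub struct NextFrame {
    yielded: bool,
}

impl Future for NextFrame {
    type Output = ();
    fn poll(mut self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<()> {
        // No wakeup needed: the game loop polls the script every frame anyway.
        if self.yielded {
            Poll::Ready(())
        } else {
            self.yielded = true;
            Poll::Pending
        }
    }
}

pub fn next_frame() -> NextFrame {
    NextFrame { yielded: false }
}

/// Script: move right for 3 frames, wait 2 frames, move back for 3 frames.
async fn patrol(x: Rc<Cell<i32>>) {
    for _ in 0..3 {
        x.set(x.get() + 1);
        next_frame().await;
    }
    for _ in 0..2 {
        next_frame().await;
    }
    for _ in 0..3 {
        x.set(x.get() - 1);
        next_frame().await;
    }
}

struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Game loop: poll the script once per frame and record the position.
pub fn simulate(frames: usize) -> Vec<i32> {
    let x = Rc::new(Cell::new(0));
    let mut script = Box::pin(patrol(Rc::clone(&x)));
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut done = false;
    let mut trace = Vec::with_capacity(frames);
    for _ in 0..frames {
        if !done {
            done = script.as_mut().poll(&mut cx).is_ready();
        }
        trace.push(x.get());
    }
    trace
}
```

`simulate(10)` records the positions `[1, 2, 3, 3, 3, 2, 1, 0, 0, 0]`: three frames of movement, a two-frame pause, three frames back, after which the finished script is no longer polled. The same behaviour written by hand would need an explicit enum of phases plus a frame counter.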


I played around with exactly that concept during Ludum Dare last month, if you're interested: https://github.com/e2-71828/ld45/blob/master/doc.pdf


I don't know any Rust, but that was a really good read. What is your background, and what materials did you use to get into Rust?

Your explanations were surprisingly simple.


Thanks. I was a software engineer at several different SF-area companies for about a decade, and then I decided to take some time off from professional programming. I'm now a full-time student in a CS Master's program. As I'll eventually need to produce a thesis, this was partly an exercise to practice my technical writing.

I picked up Rust for my personal projects a couple of years ago, and mostly worked from the official API docs, occasionally turning to blog posts, the Rust book, or the Rustonomicon. Because I didn't have any time pressure, I ignored the entire crate ecosystem and opted to write everything myself instead. This has left some giant gaps in my Rust knowledge, but they're in a different place than is typical.

As far as the explanations go, I realized that good authors don't say important things only once: they repeat everything a few different times in a few different ways. So, I tried to say the exact same things in the prose as in the code listings, and trusted that the fundamental differences between Rust and English would make the two complement each other instead of simply being repetitive.


The Fuchsia team also uses Rust's async/await this way in the implementation of their network stack.


Yeah, that's a super interesting idea that I'm also toying with in my head. One issue however is the "allocation-free" part. Sooner or later you typically hit a situation where you need to box a Future - either because you want to spawn it dynamically or need dynamic dispatch and type erasure. At that point of time you will need an allocator.

I'm still wondering if lots of the problems can be solved with an "allocate only on startup" instead of a "never allocate" strategy, or whether full dynamical allocation is required. Probably needs some real world apps to find out.
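One data point for the "allocate only on startup" question: dynamic dispatch and type erasure do not by themselves require `Box`. In this sketch (`run_all`, `Yield` and `demo` are hypothetical names), two futures of different types are pinned in a stack frame and handed to the executor as `Pin<&mut dyn Future>`, and even the waker is allocation-free. What it does not cover is spawning an unknown number of tasks at runtime, which is where an allocator or a preallocated pool comes back in.

```rust
use core::future::Future;
use core::pin::{pin, Pin};
use core::ptr;
use core::task::{Context, Poll, RawWaker, RawWakerVTable, Waker};

fn noop_raw_waker() -> RawWaker {
    fn clone(_: *const ()) -> RawWaker {
        noop_raw_waker()
    }
    fn noop(_: *const ()) {}
    static VTABLE: RawWakerVTable = RawWakerVTable::new(clone, noop, noop, noop);
    RawWaker::new(ptr::null(), &VTABLE)
}

/// Pending `n` times, then Ready.
pub struct Yield(pub u32);

impl Future for Yield {
    type Output = ();
    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<()> {
        if self.0 == 0 {
            return Poll::Ready(());
        }
        self.0 -= 1;
        cx.waker().wake_by_ref();
        Poll::Pending
    }
}

/// Polls a fixed set of differently-typed futures round-robin without boxing:
/// type erasure goes through `Pin<&mut dyn Future>` fat pointers, and the
/// futures themselves live wherever the caller put them.
pub fn run_all<const N: usize>(tasks: [Pin<&mut dyn Future<Output = u32>>; N]) -> [u32; N] {
    // SAFETY: the vtable functions never touch the (null) data pointer.
    let waker = unsafe { Waker::from_raw(noop_raw_waker()) };
    let mut cx = Context::from_waker(&waker);
    let mut slots = tasks.map(Some);
    let mut results = [0u32; N];
    let mut remaining = N;
    while remaining > 0 {
        for (slot, out) in slots.iter_mut().zip(results.iter_mut()) {
            let finished = match slot {
                Some(task) => match task.as_mut().poll(&mut cx) {
                    Poll::Ready(value) => {
                        *out = value;
                        true
                    }
                    Poll::Pending => false,
                },
                None => false,
            };
            if finished {
                *slot = None; // never poll a completed future again
                remaining -= 1;
            }
        }
    }
    results
}

/// Usage: two futures of different types, pinned in this stack frame.
pub fn demo() -> [u32; 2] {
    let mut a = pin!(async { 1u32 });
    let mut b = pin!(async {
        Yield(2).await;
        2u32
    });
    let tasks: [Pin<&mut dyn Future<Output = u32>>; 2] = [a.as_mut(), b.as_mut()];
    run_all(tasks)
}
```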


I'm not familiar with rust futures stuff, but having researched similar problems in C++, the important part is giving the control of allocation (when and how) to the application.


Right. There are some possibilities with custom allocators, but that is still a largely unexplored area in Rust, especially the error handling around it. Looking forward to learning more.


I've dipped my toes into embedded rust from time to time. Can you give me an example or two of how you would use async/await in an embedded environment? I'm just curious about how it would work.


Another infrequent toe dipper here. Say you need to send some command to a peripheral, wait for the device to acknowledge it, then send more data. The naive implementation would poll for the interrupt or just go to sleep, meaning no other user code can run in that time. A naive async implementation would spread your code all over the place; it wouldn't just be a function call with three statements. There are libraries that can give you lightweight tasks, but you might need to keep a separate stack for each suspended task; others avoid that, but require the code to be written in a non-trivial way. Rust with async/await on no_std can give you the best of both worlds: easy-to-read sequential code with the best possible performance.
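A rough simulation of that scenario in plain `std` Rust. Everything here, including `Device`, its busy-countdown "status register", `wait_ready` and `run_counting`, is hypothetical; real code would use a HAL and an interrupt-driven waker. The driver reads as three sequential steps, yet every wait for the ack hands control back to the executor instead of spinning.

```rust
use std::cell::{Cell, RefCell};
use std::future::{poll_fn, Future};
use std::pin::pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Simulated peripheral: after each write, its status register reads "busy"
/// `busy_reads` times before it reports "ready" (the acknowledgement).
pub struct Device {
    busy_reads: u32,
    countdown: Cell<u32>,
    pub written: RefCell<Vec<u8>>,
}

impl Device {
    pub fn new(busy_reads: u32) -> Self {
        Device { busy_reads, countdown: Cell::new(0), written: RefCell::new(Vec::new()) }
    }

    fn write(&self, byte: u8) {
        self.written.borrow_mut().push(byte);
        self.countdown.set(self.busy_reads);
    }

    fn status_ready(&self) -> bool {
        match self.countdown.get() {
            0 => true,
            n => {
                self.countdown.set(n - 1);
                false
            }
        }
    }
}

/// Instead of spinning on the ready register, suspend so other tasks can run.
async fn wait_ready(dev: &Device) {
    poll_fn(|cx| {
        if dev.status_ready() {
            Poll::Ready(())
        } else {
            // On real hardware an interrupt handler would call the waker;
            // this simulation just asks to be polled again.
            cx.waker().wake_by_ref();
            Poll::Pending
        }
    })
    .await
}

/// Straight-line driver code: send the command, wait for the ack, send data.
pub async fn send_command(dev: &Device, cmd: u8, payload: &[u8]) {
    dev.write(cmd);
    wait_ready(dev).await;
    for &byte in payload {
        dev.write(byte);
        wait_ready(dev).await;
    }
}

struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Busy-polling executor that reports how many polls the future needed.
pub fn run_counting<F: Future>(fut: F) -> (usize, F::Output) {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut fut = pin!(fut);
    let mut polls = 0;
    loop {
        polls += 1;
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return (polls, out);
        }
    }
}
```

With `Device::new(2)`, sending `0xA0` plus a two-byte payload takes 7 polls; each `Pending` in between is a point where another task could have run.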


>poll for the interrupt

Errr, no. You either poll, or set up an interrupt to catch an event. The point of an interrupt is to allow other code to continue to run while waiting for a peripheral.


Sorry, I mean polling for the ready register, not interrupt.


Which executor do you use? Tokio is a bit heavy, no?


Not the parent, but I've been keeping my eye on https://github.com/Nemo157/embrio-rs


Not much doc at the github. Can you provide a quick overview?


Not much I can say other than “an executor designed specifically for embedded.” I don’t do embedded dev myself, so it’s more of a curiosity thing for me than something that I can give you a lengthy explanation of.


Embrio provides only a very limited executor: it can drive only a single Future, which gets re-polled every time a certain system interrupt occurs. You can obviously have subtasks of that single task (e.g. via join! and friends), but you can't have other dynamic tasks.

jrobsonchase has implemented a more dynamic solution which uses custom allocators for futures: https://gitlab.com/polymer-kb/firmware/embedded-executor

I think there might also be other solutions out there. I remember the Drone OS project also doing something along those lines.


I'd like to hear more about this - I thought Tokio was just some tools on top of MIO which is just `select` (or similar). Is Tokio heavy?


It's like old Windows programming (Windows 3.x). Cooperative multitasking is the process-control equivalent of manual, unsafe memory management: when your app suddenly freezes, you've discovered a bug.

It's for cases where alternatives don't exist or they are too expensive.

Language level green threads are safer abstraction over asynchronous I/O operations.


The difference is that in an embedded context hopefully everything is written and tested all together by the same entity, rather than a mish-mash of high and low quality code held hostage by the weakest link.


> I believe will make rust pretty much the easiest low-level platform to develop for.

This stuff is easily available for C. It's just a function call instead of syntactic sugar. On Windows, use Fibers. On Linux, there is a (deprecated) API as well (I forget the name). Or use a portable wrapper library.

Or just write a little assembly. Can't be hard.


How exactly do I write embedded code for a tiny chip on a different architecture with just 4K of RAM and no OS that uses the Win32 fibers?


Probably not using green threads / async at all? That would require a number of stacks as well as a little runtime, which will quickly eat up your 4K.


Just to clarify for those following along, Rust async code does not use green threads and doesn't require a stack per task.


I'm missing some of the technical details here, but from a quick glance at the article it seems like Rust's futures are lazy, i.e. a stack would only be allocated when the future is actually awaited. But in order to execute the relevant code, a call stack per unfinished future is still needed, or am I missing something?


Afaik Rust's futures compile to a state machine, which is basically just a struct that contains the current state flag and the variables that need to be kept across yield points. An executor owns a list of such structs/futures and executes them however it sees fit (single-threaded, multi-threaded, ...). So there is no stack per future. The number of stacks depends on how many threads the executor runs in parallel.
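A hand-written approximation of such a state machine, for illustration only (the compiler's actual generated layout is an implementation detail). The hypothetical `DoubleThenInc` is an enum with one variant per state, each holding only what is still needed after that point, and it behaves like the `async fn double_then_inc` next to it.

```rust
use std::future::Future;
use std::pin::{pin, Pin};
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Pending on the first poll (after requesting a re-poll), Ready on the second.
pub struct YieldOnce(bool);

impl Future for YieldOnce {
    type Output = ();
    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<()> {
        if self.0 {
            return Poll::Ready(());
        }
        self.0 = true;
        cx.waker().wake_by_ref();
        Poll::Pending
    }
}

/// The `async fn` version.
pub async fn double_then_inc(a: u32) -> u32 {
    let b = a * 2;
    YieldOnce(false).await;
    b + 1
}

/// Hand-written equivalent: a "current state" tag plus the live variables.
/// Note that `a` is no longer stored once the future has passed `Start`.
pub enum DoubleThenInc {
    Start { a: u32 },
    Suspended { b: u32, inner: YieldOnce },
    Done,
}

impl Future for DoubleThenInc {
    type Output = u32;
    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<u32> {
        loop {
            match &mut *self {
                DoubleThenInc::Start { a } => {
                    let b = *a * 2;
                    self.set(DoubleThenInc::Suspended { b, inner: YieldOnce(false) });
                }
                DoubleThenInc::Suspended { b, inner } => {
                    if Pin::new(inner).poll(cx).is_pending() {
                        return Poll::Pending; // yield point: state stays in `self`
                    }
                    let out = *b + 1;
                    self.set(DoubleThenInc::Done);
                    return Poll::Ready(out);
                }
                DoubleThenInc::Done => panic!("polled after completion"),
            }
        }
    }
}

struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Busy-polling executor that reports how many polls the future needed.
pub fn run_counting<F: Future>(fut: F) -> (usize, F::Output) {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut fut = pin!(fut);
    let mut polls = 0;
    loop {
        polls += 1;
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return (polls, out);
        }
    }
}
```

Both versions need two polls and produce `41` for an input of `20`; neither needs a stack of its own between polls.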


> the current state flag and the variables that need to be kept across yield points.

you mean like... a stack frame?


Like a stack frame, but allocated once of a fixed size, instead of being LIFO allocated in a reserved region (which itself must be allocated upfront, when you don't know how big you're going to end up).

The difference being: if your tasks need 192B of memory each, and you spawn 10 of them, you just consumed a little less than 2kB. With green threads, you have 10 times the starting size of your stack (generally a few kB). That makes a big difference if you don't have much memory.
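The fixed, compile-time size can be observed directly with `std::mem::size_of_val`, without polling anything. In this sketch (function names are illustrative), a 1 KiB buffer that stays live across an `.await` has to be part of the future, so that future is at least 1024 bytes; when the buffer is dead before the `.await`, current rustc typically leaves it out of the stored state, though the exact layout is not guaranteed.

```rust
use std::future::Future;
use std::mem::size_of_val;
use std::pin::Pin;
use std::task::{Context, Poll};

/// Pending once, then Ready: enough to create one suspension point.
pub struct YieldOnce(bool);

impl Future for YieldOnce {
    type Output = ();
    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<()> {
        if self.0 {
            return Poll::Ready(());
        }
        self.0 = true;
        cx.waker().wake_by_ref();
        Poll::Pending
    }
}

/// `buf` is still needed after the `.await`, so it must live in the future.
pub async fn keeps_buffer() -> u8 {
    let buf = [7u8; 1024];
    YieldOnce(false).await;
    buf[1023]
}

/// `buf` is dead before the `.await`, so it only needs ordinary stack
/// space while the future is being polled.
pub async fn drops_buffer() -> u8 {
    let last = {
        let buf = [7u8; 1024];
        buf[1023]
    };
    YieldOnce(false).await;
    last
}

/// Both sizes are fixed at compile time; nothing is polled or allocated.
pub fn future_sizes() -> (usize, usize) {
    (size_of_val(&keeps_buffer()), size_of_val(&drops_buffer()))
}
```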


So that's actually green threads in my book (in a good implementation I expect to be able to configure the stack size), with the nice addition that the language exposes the needed stack size.


It's a stackless coroutine. AFAIK, the term “green thread” is usually reserved to stackful coroutines, but I guess it could also be used to talk about any kind of coroutines.


It's more efficient (potentially substantially more so). In a typical threaded system you have some leaf functions which take up a substantial amount of stack space but don't block, so you don't need to save their state between context switches. In most green-threaded applications you still need to allocate this space (times the number of threads). The main advantage of this kind of green threads is that you can separate the memory you need to keep between context switches (which is stored per task) from the memory you only need while actually executing (which is shared between all tasks). For certain workloads this can be a substantial saving. In principle you can do this in C or C++ by stack switching at the appropriate points, but it's a pain to implement, hard to use correctly (the language doesn't help you at all), and I've not seen any actual implementations of this.


Kinda like a stack frame, but more compact and allocated in one go beforehand.


Because of syntactic restrictions on how await works, at most you need to allocate a single function frame, never a full stack, and often it doesn't even need to be allocated separately and can live on the stack of the underlying OS thread.


So that async function cannot call anything else?


They can, but the function itself cannot be suspended by calling something else (i.e. await being a keyword enforces this), so any function that is called can use the original OS thread stack. Any called function can in turn be an async function, and will return a future[1] that in turn captures that function's stack. So yes, a chain of suspended async functions sort of looks like a stack, but its size is known at compile time [2].

[1] I'm not familiar with rust semantics here, just making educated guesses.

[2] Not sure how Rust deals with recursion in this case. I assume you get a compilation error because it will fail to deduce the return value of the function, and you'll have to explicitly box the future: the "stack" would then look like a linked list of activation records.
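On footnote [2]: a self-recursive `async fn` would need an infinitely large state machine, so rustc rejects it unless the recursive call goes through an indirection. A common workaround, sketched here with a hypothetical `fib`, is to return a boxed, type-erased future; Rust 1.77 and later also accept a recursive `async fn` as long as the recursive call is boxed. The suspended calls then do form a linked list of heap-allocated frames, much as the footnote guesses.

```rust
use std::future::Future;
use std::pin::{pin, Pin};
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Recursive async computation. Each recursive call is boxed, so every
/// future's own state stays a fixed size; the chain of pending calls lives
/// on the heap instead of inside one ever-growing state machine.
pub fn fib(n: u32) -> Pin<Box<dyn Future<Output = u64>>> {
    Box::pin(async move {
        if n < 2 {
            u64::from(n)
        } else {
            fib(n - 1).await + fib(n - 2).await
        }
    })
}

struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Minimal busy-polling executor, just to run the example.
pub fn block_on<F: Future>(fut: F) -> F::Output {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut fut = pin!(fut);
    loop {
        if let Poll::Ready(v) = fut.as_mut().poll(&mut cx) {
            return v;
        }
    }
}
```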


Async functions don't really call anything by themselves: the executor does, and every function called in the context of an async function runs on the executor's stack. You just end up with a fixed number of executors, each running on its own OS thread with a normal stack.


[flagged]


What gives you doubts I'm serious?


Proposing asm and deprecated APIs, "just use a portable wrapper library" as though there are any such (good) libraries, type safety, etc.


The fact is, all "green threads" do is swapping between call stacks and call register sets. The call stack is usually just a register itself. How hard can it be to save a register and restore it later?

> as though there are any such (good) libraries

Win32 Fibers are extremely easy to use (and not deprecated). I used them once, it was a blast. I wrote a simple wrapper around them in less than 50 lines of straightforward code, exposing maybe a struct and 3-4 function calls. Never had any problems.

That makes writing your own little non-preemptive scheduler extremely easy as well. That might be as little as another 50-100 lines of code. So you get a lot of control at almost no price.

POSIX C has swapcontext() / setcontext() which must be pretty similar (although I'm not sure I've ever used it) and if I understand correctly it was only obsoleted due to a syntactical incompatibility with C99 or something like that...

The only problem here is that green threads are a little bit of an awkward computational model in many cases. But that's not a syntax problem.


An implementation of green threads-style register saving cannot be as memory-efficient as async/await, because it's more dynamic.

Green threads must save all (callee-save) registers, support arbitrary dynamic stack growth, and work with unsuspecting frames in the middle of the stack.

Async/await has the "thread" itself do all the state saving, so it can save only what's actually live. It computes a fixed stack size up front, so can be done with zero allocation. And because all intermediate stack frames are from async functions, they can also be smaller- transient state can be moved to the actual thread stack, potentially shrinking the persistent task state.

(This also has benefits for cancellation- there is no need to use the exception/stack unwinding ABI to tear down a "stack," because it is instead a normal struct with normal drop glue.)
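A small sketch of that cancellation point (`Guard`, `task` and `cancel_demo` are illustrative names): the task is polled once, parks forever at an `.await`, and is then cancelled simply by dropping the future, which runs the destructor of the guard it holds across that `.await`.

```rust
use std::cell::Cell;
use std::future::{pending, Future};
use std::rc::Rc;
use std::sync::Arc;
use std::task::{Context, Wake, Waker};

/// Stands in for a resource owned by the task (a lock, a buffer, ...).
struct Guard(Rc<Cell<bool>>);

impl Drop for Guard {
    fn drop(&mut self) {
        self.0.set(true);
    }
}

async fn task(released: Rc<Cell<bool>>) {
    let _guard = Guard(released);
    pending::<()>().await; // suspended here forever
}

struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Polls the task once, then cancels it by dropping the future.
/// Returns (pending after first poll, released before drop, released after drop).
pub fn cancel_demo() -> (bool, bool, bool) {
    let released = Rc::new(Cell::new(false));
    let mut fut = Box::pin(task(released.clone()));
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let pending_now = fut.as_mut().poll(&mut cx).is_pending();
    let before = released.get();
    drop(fut); // ordinary drop glue for the live state; no unwinding involved
    (pending_now, before, released.get())
}
```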


> Win32 Fibers are extremely easy to use (and not deprecated).

Apparently MS is considering deprecating them: http://www.open-std.org/JTC1/SC22/WG21/docs/papers/2018/p136...

That's in general an interesting paper about M:N threads, fibers, green threads, or whatever you want to call them.


Yeah, so what they write in section 3.1 is basically what I stated elsewhere on this comments page. It's probably not that Win32 Fibers is a bad implementation of Green Threads, but more that Green Threads is an awkward model to work with in many cases.

> Current recommendation is to avoid using fibers and UMS. This advice from 2005 remains unchanged: “... [I]nstead of spending your time rewriting your app to use fibers (and it IS a rewrite), instead it's better to rearchitect your app to use a "minimal context" model - instead of maintaining the state of your server on the stack, maintain it in a small data structure, and have that structure drive a small one-thread-per-cpu state machine.”



