Virtual Threads: New Foundations for High-Scale Java Applications (infoq.com)
213 points by axelfontaine on Sept 29, 2022 | 174 comments



This is a great writeup, and reignites my interest in Java. (I've long considered "Java Concurrency in Practice" to be the _best_ Java book ever written.)

I haven't been able to figure out how the "unmount" of a virtual thread works. As stated in this article:

> Nearly all blocking points in the JDK have been adapted so that when encountering a blocking operation on a virtual thread, the virtual thread is unmounted from its carrier instead of blocking.

How would I implement this logic in my own libraries? The underlying JEP 425[0] doesn't seem to list any explicit APIs for that, but it does give other details not in the OP writeup.

[0] https://openjdk.org/jeps/425


> How would I implement this logic in my own libraries?

There's no need to if your code is in Java. We had to change low-level I/O in the JDK because it drops down to native.

That's not to say every Java library is virtual-thread-friendly. For one, there's the issue of pinning (see the JEP) that might require small changes (right now the problem is most common in JDBC drivers, but they're already working on addressing it). The bigger issue, mostly in low-level frameworks, is implicit assumptions about a small number of shared threads, whereas virtual threads are plentiful and are never pooled, so they're never shared. An example of such an issue is in Netty, where they allocate very large native buffers and cache them in ThreadLocals, which assumes that the number of threads is low, and that they're reused by lots of tasks.
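To make the pattern concrete, the problematic shape looks roughly like this (a sketch, not Netty's actual code):

    import java.nio.ByteBuffer;

    class BufferCache {
        // Works well when a small, pooled set of threads serves many tasks:
        // each thread pays for the allocation once and then reuses the buffer.
        private static final ThreadLocal<ByteBuffer> BUFFER =
                ThreadLocal.withInitial(() -> ByteBuffer.allocateDirect(1 << 20)); // ~1 MB

        static ByteBuffer acquire() {
            return BUFFER.get();
        }
    }

With virtual threads, each thread is created for a single task and never reused, so every task ends up allocating its own large buffer and the cache never pays for itself.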


> An example of such an issue is in Netty, where they allocate very large native buffers and cache them in ThreadLocals, which assumes that the number of threads is low, and that they're reused by lots of tasks.

Fixing Netty is very high yield. Every modern Java server application I'm aware of uses Netty. Quarkus, Vertx, Micronaut, Java-GRPC, ...

Then Graal? Virtual threads in Graal with a Netty that isn't 60MB would be superb.

What about just shimming Netty? Is that in Oracle's scope? There are already selectable backends for Netty. Why not have "virtualthread-graalcompatible" that uses your already fixed Java IO? It would reduce so much pain, and make Java competitive with golang for the first time ever.


GraalVM native images have recently gained support for virtual threads. So you can have AOT compiled fast starting binaries that use virtual threads, if you want (or very soon at least, I can't recall if it's out yet or not).

The main gap vs Go would then be the speed of the AOT compile. But you normally develop on HotSpot anyway.

Netty already works with Loom. There are people doing experiments with it where it even shows some small performance gains. They are incrementally improving it so it works better when Loomified, but it does work.


Graal also has a “fast build” mode, likely still way slower than Go’s compilation, but there is that. It is meant for development though; you will likely want an optimized build for prod. But yeah, one should probably just develop in the traditional way, and then test it out in native after a few iterations.


It is surely not out yet, latest stable GraalVM is still based on Java 17.


I expect Netty to make the appropriate changes, but Helidon have a new server called Nima (https://medium.com/helidon/please-welcome-helidon-n%C3%ADma-...) that's been built from the ground up to be virtual-thread-friendly.


Do you need Netty in a virtual thread world? Imo Netty made non-blocking IO in Java tractable, but virtual threads do it better and more broadly, so what role does Netty play now? What other than thread efficiency does it bring that can’t be achieved more easily now?


Conversely, some applications would like a leaky abstraction they have some control over. It will likely remain beneficial to tie some caching to a carrier thread.

As a member of the Cassandra community I’m super excited to get my hands on virtual threads come the next LTS (and Cassandra’s upgrade cycle), as it will permit us to solve many outstanding problems much more cheaply.

I hope by then we’ll also have facilities for controlling the scheduling of virtual threads on carrier threads. I would rather not wait another LTS cycle to be able to make proper use of them.


LTS is a designation by our sales organisation for arbitrarily chosen versions so they can offer a support service for legacy codebases -- i.e. people willing to pay for the privilege of not getting new features [1]. Why anyone would wait for something intended for the sole purpose of not adding new features to get a new feature --and so enjoying the very worst of both worlds -- is beyond me. The development organisation has no consideration of support offerings. All releases are equal, and the assumption is that those who want new features obviously do not want LTS and vice-versa.

Anyway, the mention of the perennially misunderstood Java LTS is a pet peeve of mine, so I'm sorry if this comment was overly aggressive.

[1]: There are many legacy applications that aren't actively developed. They have no use for new features, and new features sometimes require changing configurations -- a hassle they don't have the people to do. So LTS is a subscription service that allows them to get releases without new features so they can keep running legacy apps without much maintenance. It's a great service, but obviously the opposite of what actively maintained codebases want; for them we have the regular upgrade model.


Well, it is not just Oracle that has adopted the LTS designations. AdoptOpenJDK and others are also selecting the same LTS versions to provide longer term support promises for, including security and other improvements.

A major project like Cassandra that is non-trivial to upgrade (but is desirable to upgrade, and to have security fixes for) simply cannot hop Java versions every year and impose that additional burden on our users, and nor can we pick a Java version that is not guaranteed security updates past some near term horizon. So we pick versions that people are expected to have available to them in their environment for the lifetime of that release.

Honestly I’m not sure what you’re upset about, I am a bit surprised at the vehemence of your response to that element of my comment. Also a little disappointed you didn’t engage with the rest of my comment; I hope that doesn’t mean I also end up disappointed with the near future of virtual threads.


A major project like Cassandra will find that it is easier to use the current version (before LTS existed, people had to upgrade to the six monthly feature releases, but because they didn't get a new version number people didn't care as much). If it does cause trouble, let us know, because LTS really isn't intended for actively maintained projects that want new features and isn't the recommended path for them. Just note that the free upgrade services called LTS are not quite the same; they just include backports from mainline and don't support the whole JDK.

Anyway, I'm sorry about my tone. I know that the change in the version numbering scheme confused people into picking the wrong upgrade path for themselves, and it's our fault for miscommunicating. But I don't know when features will land, or when those who want new features with an LTS service will be able to use them. But I can say that our process assumes that those who want long-term support are trying to avoid new features and are happier when a big feature misses the next release with LTS, so while missing one release normally means a mere 6 month delay, those who wait for LTS for actively developed codebases (even though it's due to a misunderstanding) might have to wait a further couple of years.


Well, whatever each of our perceptions about the utility of selecting an LTS, there are realities we all occupy - and LTS releases are a part of Cassandra's reality for the time being. Perhaps that will change in future, but I do not anticipate it very soon.

But, I will be pushing for the adoption of virtual threads once they become more useful for the community (which I think depends on the previously mentioned improvements). So, whatever the realities JEP425 operates within, I do hope these improvements land by Java 21, so that my job is made easier.

Either way, really excited about the work, whenever it transpires that we can use it. Thanks for your efforts delivering it so far.


Thank you and good luck!


The good work to slim down and better compartmentalize the JDK has historically created enough backward incompatibility risks for me that I prefer staying on the same version longer than 6 months. If I want security updates for the version I’m on, LTS is the best (only?) way.


I think Java has never had better backward compatibility than now. The difficulties migrating to 9 were 1. due to 9 being the last major release ever, and 2. libraries that hacked JDK 8's internals and were not portable, so they broke in a big release. The overall upgrade costs now are also lower than ever before, and we know that because some companies do understand that using the current version is easier and cheaper than an old one. Having said that, if you want to stay on an old version for a long time, then yes, use one with LTS, but then you might as well upgrade very slowly (not every two years) because upgrades will be less pleasant.


I agree, Java compatibility is much better now. For Java 9 there were also runtime breaking changes like the removal of javax classes and the removal of JavaFX.

>6 months is not really a long time for enterprise software.


But Java has always had semi-annual feature releases, and there wasn't even LTS -- people had to upgrade to a new feature release every six months. It's just that we dropped major releases altogether and then gave the feature releases new version numbers, and that confused many people (who might not have even been aware that some of the minor releases in the past were actually quite significant feature releases). In other words, people upgraded to new feature releases every six months in the Java 7 and 8 era, too; now with major releases gone it's even easier, so it doesn't make sense that projects that were fine with such upgrades in the past all of a sudden need the new LTS model when things are even easier than before.


Would those intermediate releases make breaking runtime changes like dropping nashorn, removing APIs and changing default encoding modes?

That’d be pretty bad behavior when maintaining backward compatibility.

I know of a very large education company that trained their support staff to downgrade the Java 8 version of end users when they experienced problems (until they dropped Java on the front-end for web). Maybe the feature releases are why?


They're not "intermediate releases". In the past there were three kinds of releases, major (every few years), feature (aka "limited update", every six months), and patch (every quarter). Now there are two: feature and patch, with the feature releases getting the integer numbers now that major releases are gone. Oracle's sales arbitrarily selects some feature versions for which to offer an LTS service, and other companies follow their choice. BTW, they can choose to offer LTS even for releases that have already been made and retroactively make them "LTS releases." There's absolutely nothing special about them, and the development of the JDK ignores the availability of such offerings. We produce feature releases, and if someone wants to pick some of them to offer support services for longer durations than for other versions -- that's up to them.

Feature releases, now and before, sometimes made what you call "breaking runtime changes" that might require changing the command line. Actual breaking changes to APIs are rare, now as before (e.g. the last major release, 9, removed some 6 methods, I think, and that was probably the biggest such change in Java's history, although the future degradation of the Security Manager is probably bigger). One difference between feature releases now and then is that, with major releases gone, feature releases can change the spec. This virtually always means adding new APIs.

> Maybe the feature releases is why?

Feature releases existed in Java 8, too, people just forget because they didn't get their own version number back when major releases existed. They were even less reliable back then. The biggest factor in Java compatibility issues is without a doubt libraries relying on JDK internals. That was less of a problem in the 6-7 era for the simple reason that Java stagnated due to lack of resources in Sun's last years. JDK 16 finally turned on strong encapsulation, so this problem is likely to recede.

I'm not saying that upgrading feature releases is risk-free, but it's always been that way, only people forgot or didn't notice so much with the old numbering scheme, and LTS wasn't available then at all. And it's also likely that the upgrades now are slightly more difficult, but in exchange there is no need for major upgrades ever again. For actively developed code, upgrading with every feature release is overall easier, cheaper and safer than staying on an old version, skipping updates, and doing a big transition every few years.

When the version number scheme changed and LTS was introduced, many companies got confused and stopped their practice of upgrading to new feature releases. At the same time, many don't understand what LTS is or that the free offerings don't actually maintain the full JDK, just backport fixes from mainline to the intersection of the existing features (e.g. Nashorn and CMS aren't getting maintained in the free "LTS" offerings).


> Anyway, the mention of the perennially misunderstood Java LTS is a pet peeve of mine, so I'm sorry if this comment was overly aggressive.

I, for one, appreciate the repeat. I usually have to go hunting for one of your much older Reddit comments when this topic comes up at work.


Java Concurrency in Practice is a fantastic book. I had DL as a professor for about a half dozen courses in undergrad, including Concurrent and Parallel Programming. Absolutely fantastic professor, with a lot of insight into how parallel programming really works at the language level. One of the best courses I've taken.


Yeah Java gets a lot of grief, but I learned a lot about concurrent programming from making sure I really understood every line of code in this book.


I’m honestly so envious you had him as a professor.


Seems like a good development. I've been doing Node.js for the last few years after letting go of Java. But there's something uneasy about async/await. For one thing it's difficult to debug how the async functions interact.


Debugging asynchronicity is complex in any language, no? Blocking endpoints are mostly irrelevant because they alter the temporal flow of things, which makes your code executed in debug mode not 100% « isomorphic » with your code executed in run mode.


That's seamless though. So if you have a failure you get a single stack trace that includes everything. In JS the debuggers sometimes glue stack traces together which works for basic stuff but incurs a major runtime overhead and doesn't work for production failures.

Locally the concept of multi-threaded debugging is easier than async-await since a single flow typically maps to a single thread and you can just step over it. If something happens asynchronously it's just IO and you can ignore that part. As far as you're concerned it's just one thread that you're stepping over/into. Variable context, scope etc. are maintained the same and passed seamlessly.


I'm trying to understand what exactly is different about debugging async/await vs. debugging threads. Isn't making an async-call the same as starting a new thread, from the programmer's point of view?

In my environment, the WebStorm debugger, I can debug async-calls in which I have halted in the stack. But I cannot inspect the variable-values in the earlier "trace" that started the async-call.

Is it just a matter of debugger capabilities or is there something that makes thread-based debugging fundamentally less confusing?

Ah maybe I get it. When starting an async-call to read a file for instance the value of variables is no longer available in the callback. Whereas in a (real) thread they are, because from my point of view the thread was simply "sleeping" while the IO was happening. When the IO is over I'm back in the same context, except I now have the read file-contents available for my inspection.

So, reading a file in a thread-based system does NOT require you to start a new thread, whereas in async-await you essentially do have to create a new async-context (which is like a new thread) to read a file. No?


WebStorm is indeed amazing. It takes separate stack traces and glues them together which means it needs to keep old stack traces in memory then constantly compare the objects passed along to figure out which stack to glue where.

As I said, it's problematic for production. So in the IDE you can indeed see the glued stack but in the browser... Or on the server...

Then there's the context. Since glued stack traces could be in-theory separate threads (at least in Java async calls) you might get weird behaviors where values of objects across the stack can be wildly different.

And no. You don't have a separate thread doing the IO. That's exactly the idea Loom is solving. Java's traditional stream IO is thread based. But we have a faster underlying implementation in NIO which uses the native select calls. The idea here is that a thread receives a callback when the connection has data waiting and can wait for data. This means the system can wake up the thread as needed, very efficiently. So there's no additional thread.


Yes I think I got that. I was not saying that Java creates a new thread for every IO operation but that async/await in JavaScript etc. must do something like starting a new "pseudo-thread". And that is why debugging in Java is easier - because it doesn't need to start a new thread. That's what I was trying to understand. Thanks.


Debugging is hard regardless of the concurrency model but building an understanding of what the code is supposed to be doing is way easier when the code reads sequentially versus async.

As far as debugging changing the scheduling of the program, it's not so bad when the tooling evolves out of the concurrency model, which I imagine will happen once virtual threads catch on in java. For example, in erlang, you can trace processes on a running system by pid, and basically get a sequence diagram of execution with messages between processes and function calls within a process, as well as the actual terms themselves. Because execution doesn't pause, you can even do it in production (if you're careful...). So while it's not a traditional debugger, in the "pause execution here" sense, it's still a way to inspect the system that fits well into an actor model. If such a thing doesn't exist in java already, I'm sure it will soon.


So can you actually see a diagram showing one process sending a message to another? Doesn't it become a rather large diagram easily?


Yes, and yes can become big, but you can scope it to certain processes, function calls, modules, pattern matches, etc, etc. So it's fine if you know what you're doing.

Legend has it that a major cell network was briefly taken offline due to a poorly thought out trace on a production system.


No?


Does your library use any of the JDK's blocking APIs like Thread.sleep, Socket or FileInputStream directly or transitively? If so, it is already compatible. The only thing you should check is if you're using monitors for synchronization, which currently cause the carrier thread to get pinned. The recommendation is to use locks instead.


Monitors as in the synchronized keyword? Because that is kind of a big one.


Yes but it only really matters if you're blocking on IO whilst inside a synchronized block. If you're using it to protect in-memory data structures then it's not a big deal.
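For example, the pattern to watch for looks roughly like this (a sketch with made-up method names):

    import java.io.IOException;
    import java.io.InputStream;
    import java.util.concurrent.locks.ReentrantLock;

    class PinningExample {
        private final ReentrantLock lock = new ReentrantLock();

        // Problematic today: blocking on I/O while holding a monitor pins the
        // virtual thread to its carrier for the duration of the read.
        synchronized byte[] readWithMonitor(InputStream in) throws IOException {
            return in.readNBytes(1024);
        }

        // Virtual-thread friendly: blocking inside a j.u.c. lock lets the virtual
        // thread unmount from its carrier while it waits.
        byte[] readWithLock(InputStream in) throws IOException {
            lock.lock();
            try {
                return in.readNBytes(1024);
            } finally {
                lock.unlock();
            }
        }
    }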


Agree - Java Concurrency in Practice was a revelation when it came out. Well the whole concurrent api by Dr. Lea made it all so much saner. Very excited for virtual threads!


I don't know how they did it, but you could use that jep id as a query in the jdk issue tracker [0], and then use the issue tracker id to find the corresponding github issue [1]. (I had hoped for commits with that prefix, but there don't seem to be any for that issue.)

[0] https://bugs.openjdk.org/browse/JDK-8277131?jql=issuetype%20...

[1] https://github.com/openjdk/jdk/pull/8787


> I haven't been able to figure out how the "unmount" of a virtual thread works.

The native stack is just memory like any other, pointed to by the stack pointer. You can unmount one stack and mount another by changing the stack pointer. You can also do it by copying the stack out to a backing store, and copying the new thread's stack back in. I think the JVM does the latter, but I'm not an expert.


Well one way is to replace "synchronized" blocks with ReentrantLocks wherever you can.


I would guess that LockSupport.park() and friends have also been adapted to support virtual thread unmounting.
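A quick sketch of what that should mean in practice (assuming JDK 19+ preview APIs):

    import java.util.concurrent.locks.LockSupport;

    class ParkDemo {
        public static void main(String[] args) throws InterruptedException {
            Thread vt = Thread.ofVirtual().start(() -> {
                // Parking unmounts the virtual thread; its carrier is free to run others.
                LockSupport.park();
                System.out.println("unparked on " + Thread.currentThread());
            });
            Thread.sleep(100);      // crude: give the virtual thread time to park
            LockSupport.unpark(vt);
            vt.join();
        }
    }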


I think it is a really important development in the Java space. One reason I plan to use it soon is that it does not bring in the complex programming model of the "reactive world" and hence a dependency on tons of reactive libraries.

I tried moving a plain old Tomcat-based service to a scalable Netty-based reactive stack, but it turned out to be too much work and an alien programming model. With Loom/virtual threads, the only thing I will be looking for is a server supporting virtual threads natively. Helidon Nima would fit the bill here, as all other frameworks/app servers have so far just been slapping virtual threads on their thread-pool-based systems. And unsurprisingly that is not leading to the great perf expected from a virtual-thread-based system.
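The programming model I'm hoping to end up with is roughly this (a sketch; the endpoint and counts are made up):

    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    class BlockingStyle {
        public static void main(String[] args) {
            HttpClient client = HttpClient.newHttpClient();
            // One cheap virtual thread per task; plain blocking code, no reactive operators.
            try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
                for (int i = 0; i < 10_000; i++) {
                    executor.submit(() -> {
                        HttpRequest req = HttpRequest
                                .newBuilder(URI.create("https://example.com/orders"))
                                .build();
                        // Blocking call; the virtual thread unmounts while it waits.
                        HttpResponse<String> resp =
                                client.send(req, HttpResponse.BodyHandlers.ofString());
                        return resp.statusCode();
                    });
                }
            } // close() waits for the submitted tasks
        }
    }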


> How long until OS vendors introduce abstractions to make this easier? Why aren't there OS-native green threads, or at the very least user-space scheduling affordances for runtimes that want to implement them without overhead in calling blocking code?

Windows has had Fibers[0] for decades (IIRC since 1996 with Windows NT 4.0)

0. https://learn.microsoft.com/en-us/windows/win32/procthread/f...


Fibers are just one part of it. You also need a scheduler and a way to hook into existing blocking APIs.


Copying virtual stacks on a context switch sounds kind of expensive. Any performance numbers available? Maybe for very deep stacks there are optimizations whereby you only copy in deeper frames lazily under the assumption they won't be used yet? Also, what is the story with preemption - if a virtual thread spins in an infinite loop, will it effectively hog the carrier thread or can it be descheduled? Finally, I would be really interested to see the impact on debugability. I did some related work where we were trying to get the JVM to run on top of a library operating system and a libc that contained a user level threading library. Debugging anything concurrency related became a complete nightmare since all the gdb tooling only really understood the underlying carrier threads.

Having said all that, this sounds super cool and I think is 100% the way to go for Java. Would be interesting to revisit the implementation of something like Akka in light of this.


Yes, lazy copying tricks are employed, and some work around stack frames is delayed until the stack is moved by the GC, on the assumption most will not live that long.

There was a lot of work done on debugging so standard Java debuggers work well.


That's pretty cool! What about the blocking issue? Presumably also if you are using JNI all bets are off?


If you are using JNI then that puts a native frame on the stack and pins the virtual thread. Loom’s strategy works because we know what can be on the Java stack, and that nothing external points into it, but we don’t know that for native frames.


> Also, what is the story with preemption - if a virtual thread spins in an infinite loop, will it effectively hog the carrier thread or can it be descheduled?

As I understood it, there's no preemption. You're supposed to not do busy waiting on virtual threads (and better not do it at all, use wait-notify or a barrier or whatever). Virtual threads are for I/O-bound tasks. For CPU-bound tasks you'd want an OS thread per CPU core anyway.


They did implement pre-emption at some point and there are parts of that still in the code. It's intended to let you more easily manage load e.g. by de-scheduling complicated batch jobs that are low priority and then re-scheduling when the service is under lighter load. But it won't ship as public API as a part of Loom, maybe it will never ship (generally once OpenJDK finishes a project it gets de-staffed and they don't go back to it).


Stacks in idiomatic Java aren't usually that deep (pointers instead of in place values) so this isn't that big of a deal, unlike say in C++.


So right now it seems like you can replace the thread pool Clojure uses for futures etc with virtual threads and go ham. You could even write an alternative go macro to replace the bits of core.async where you’re not supposed to block. Feels like Clojure could be poised to benefit the most here, and what a delight it is to have such a language on a modern runtime that still gets shiny new features!


You wouldn’t need the go macro for anything with vthreads. If you continued to use core.async, you’d switch to the blocking variants, and ignore go entirely.


Yep, core.async will work as is, just use <!! and friends.


This is good.

I implemented a userspace 1:M:N multiplexer (one timeslicing thread, M kernel threads, N lightweight threads) in Java, Rust and C.

I preempt hot for and while loops by setting the looping variable to the limit from the kernel multiplexing thread.

It means threads cannot have resource starvation.

https://github.com/samsquire/preemptible-thread

The design is simple. But having native support as in Loom is really useful.


I like it! Do you have any sense for what the perf hit is for making those loops less hot to enable pre-emption?


There is no if statement in the hot loop or in the kernel thread so there is no performance cost there.

The multiplexing thread is separate from the kernel thread, so you could say it's 1:M:N thread scheduling. I should have been clearer in my comment. There are 3 types of threads.

The multiplexing thread timeslices the preemption of the lightweight threads and kernel threads every 10 milliseconds. That is, it stops all the loops in the lightweight thread and causes the next lightweight thread to execute.

So there is no overhead except for a structure variable retrieval in the loop body.

Rather than

    for (int i = 0; i < 1000000; i++) {

    }

We have

    register_loop(thread_loops, 0, 0, 1000000);

    for (; thread_loops[0].index < thread_loops[0].limit; thread_loops[0].index++) {

    }

    handle_virtual_interrupt();

And in the thread multiplexer scheduler, we do this

    thread_loops[0].index = thread_loops[0].limit;
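In the Java version the same scheme can be sketched with a volatile loop descriptor, which also makes the cross-thread write to the index well defined, at the cost of a volatile access per iteration (names here are illustrative):

    class LoopDescriptor {
        volatile long index;   // volatile so the scheduler's write is always observed
        volatile long limit;
    }

    class Worker implements Runnable {
        final LoopDescriptor loop = new LoopDescriptor();

        public void run() {
            loop.limit = 1_000_000;
            for (loop.index = 0; loop.index < loop.limit; loop.index++) {
                // hot loop body
            }
            handleVirtualInterrupt();   // hand control to the next lightweight thread
        }

        void handleVirtualInterrupt() { /* scheduling hand-off goes here */ }
    }

    // In the multiplexing/scheduler thread, preemption is simply:
    //     worker.loop.index = worker.loop.limit;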


But that means `thread_loops[0].index` must be read / written with atomic ops, right?

I saw that's how wasmtime solves pre-emption of wasm threads with their "epoch-based interruption". [1]

I see your Rust implementation does indeed use atomic ops.

Have you _measured_ that there is no overhead to atomics?

[1] https://docs.rs/wasmtime/1.0.1/wasmtime/struct.Config.html#m...

(Also it says that malicious guest code cannot skip the check, which is nice)


I didn't use atomic ops in Java or C.

It still works.

The risk is that a thread fails to reschedule/be preempted/end the loop in this timeslice but if the exact same interleaving happens to every other timeslice after, it shall never preempt. But this is unlikely in practice.

The chance of the increment at the end of a loop racing with the write that sets the index to the limit, and thereby preventing preemption, is unlikely but possible. I assume the body of the loop has the most execution time, not the loop conditional and increment.

I've not measured the overhead of the atomics. In theory the atomics are on the slow path of the loop - they happen during preemption and at the end of a hot loop iteration. The loop itself is the hot part that we don't want to slow down. The worst we do is slow down the execution of the next hot loop with an atomic set.

Atomic reads are cheap. I'm not sure if atomic writes are cheap if uncontended.


The data race on the volatile version of the C code is technically UB, so you are at the mercy of the architecture, optimization settings and phase of the moon.

This is a great answer for more detail: https://stackoverflow.com/a/60482370


Thanks Matt. I shall need to do more testing.

The right answer is probably use atomics but that has a performance cost.

I am still trying to find out if atomics suffer under contention.

My perfect scenario is that the atomic increment should be as cheap as the non-atomic increment.

But I think the bus pausing and cache coherence protocols mean that the data is flushed from the store buffer to memory, which is slow. I don't know if it acts as a lock and has an uncontended option.


It occurred to me I can detect a potential race happening in the multiplexing thread. The code in the multiplexing thread could be slow and it would be fine (it doesn't affect hot loop performance). We essentially track whether or not we set the looping variable to the limit, and if it changes back due to a data race, we set it back again to correct it.


So happy this is finally coming out! After years of using the library that inspired this (fibers), I'm so stoked this is coming to the wide outside world of Java. There's just no comparison in how understandable and easy to program and debug this is compared to callback and event based programming.


Reading through the source code examples has me rethinking my dislike for Java. It sure seems far less verbose and kinda nice actually.


Modern Java is a lot less boilerplaty than old enterprise Java.


Modern Java people are a lot less boilerplaty than old enterprise Java people.


And which modern web framework is least boilerplate? Springboot?


I'm a big fan of Spark.

https://sparkjava.com/


Why Spark over Javalin? I switched to Javalin last year, it seems to have more active development.


Spark is pretty complete. I'm not sure what I'd add to it or change about it that wouldn't be making it bloated.

Seems Javalin is largely comparable, but with a bit more cruft (offering async calls sticks out as an unusual decision given the way the language is heading).


What places are doing modern java development?


AWS, Netflix to name a few I know.


Definitely reconsider it. While Java is still often used in an “enterprise” style, modern Java brought in plenty of advancements and it is a quite expressive, while still readable, language.

Which is kind of strange, I never understood people being so hyped about Go, while straight up refusing to touch Java for “verbosity”, when in fact the latter is much more terse. We sure are not too rational when it comes to tooling :D


I only shudder at really learning it, because if I got a job doing Java I'd be doing horrible Java < 8 codebase work.


After all the hoopla surrounding concurrency models, it seems that languages are conceding that green threads are more ergonomic to work with. Go and Java have it, and now .NET is even experimenting with it.

How long until OS vendors introduce abstractions to make this easier? Why aren't there OS-native green threads, or at the very least user-space scheduling affordances for runtimes that want to implement them without overhead in calling blocking code?


> Why aren't there OS-native green threads, or at the very least user-space scheduling affordances for runtimes that want to implement them without overhead in calling blocking code?

Green threads are, definitionally, not OS threads, they are user space threads. So you will never see OS-native green threads as it's an oxymoron. The way many green thread systems work is to either lie to you (you really only have one OS thread, the green threads exist to write concurrent code, which can be much simpler, but not parallel code using Pike's distinction), or to introduce multiple OS threads ("carrier threads" in the terms of this article) which green threads are distributed across (this is what Java is doing here, Go has done for a long time, BEAM languages for a long time, and many others).

EDIT:

To extend this, many people think of "green threads" as lightweight threading mechanisms. That's kind of accurate for many systems, but not always true. If that's the sense that's meant, then OS-native lightweight threads are certainly possible in the future. But there's probably not much reason to add them when user space lightweight concurrency mechanisms already exist, and there's no consensus on which ones are "best" (by whatever metric).


Kernel support for green threads (or better, M:N) was a big topic 20 years ago. Look for "scheduler activation" for example.

It kind of petered out as kernel-only threads were fast enough with significantly less complexity. It might be worth revisiting again.


Yeah, scheduler activations were one approach, albeit one that wasn't deemed effective. I think FreeBSD ended up removing the feature.

A more recent attempt that's interesting is ghOSt from Google: https://storage.googleapis.com/pub-tools-public-publication-....


The problem solved by SA is equivalently solved by a scalable multiplex IO syscall like kevent and userland scheduling. There's no particular need for the upcall if blocking information is available another way. I think it's dead.


You might want to preempt after scheduling quanta expirations, page faults, etc. And there is still the issue of existing code that use the standard blocking system APIs.

Also a synchronous API can be more efficient if it is not going to block and only the kernel can know.

In practice you are right, so far SA hasn't been worth the complexity.


> To extend this, many people think of "green threads" as lightweight threading mechanisms. That's kind of accurate for many systems, but not always true. If that's the sense that's meant, then OS-native lightweight threads are certainly possible in the future. But there's probably not much reason to add them when user space lightweight concurrency mechanisms already exist, and there's no consensus on which ones are "best" (by whatever metric).

Wouldn't it make sense to implement them kernel-side when looking at how every programming language seems to have to reinvent the wheel regarding green threads?


Green threads (today) aren't a singular thing, the definition is that they're in user space not kernel space. They are implemented in a variety of ways:

https://en.wikipedia.org/wiki/Green_thread

Do you imitate a more traditional OS-thread style with preemption, do you use cooperating tasks, coroutines, what? Since there is no singular best or consensus model, there is little reason for an OS to adopt wholesale one of these variations at this time.

The original green threads (from that page) shared one OS thread and used cooperative multitasking (most coroutine approaches would be analogous to this). But today, like with Go and BEAM languages, they're distributed across real OS threads to get parallelism. Which approach should an OS adopt? And if it did, would other languages/runtimes abandon their own models if it were significantly different?


Preemptive threads with growable stacks. There was some discussion around getting segmented stacks into the kernel, but I'm not sure that's the best approach. There might have to be some novel work done in making contiguous stacks work in a shared address space.


I think the reasons green threads can work in languages is that the runtime understands the language semantics, and can take advantage of them. The OS doesn't understand the language and its concurrency semantics, and only has a blob of machine code to work with.


Not really tbh. The Go runtime has a work-stealing scheduler and does a lot of work to provide the same abstractions that pthreads have, but for goroutines.


Erlang runtime has a work stealing scheduler but it has specific constraints like "how to make sure each running execution context gets a fair share of time and can't lock up the VM" (I think you can still lock up the Go VM with an infinite loop)... So each VM has different concerns.


> still lock the go VM with an infinite loop

This is no longer an issue now with Go's async preemption. Also, this is less a "language-level concern" and more a consequence of a poor implementation of preemptive scheduling. Moreover, this is something OS threads already support.


Yes, but os threads come with overheads during context switches and for sure during the startup/teardown because you have to take a trip to the kernel to do those things. I don't see how it should be the os's business how to do m:n green threads. I think BEAM, for example can do a lot of clever coordination-free scheduling (it implements mutexes which are aware of the scheduling layout and I think don't bother to coordinate if they aren't necessary). In Erlang, the GC is green-thread aware, since green threads in Erlang don't share much memory -- I don't think there is a good generalized way for an OS to know these sorts of things without kneecapping some pl or another.

Arguably a better place for this is the standard library of your (low level) PL. I think rust and zig do this.


Project Loom experimentally proved that the benefit of virtual threads lies not in their fast context switches, but in the throughput afforded by having so many threads executing at once.

I am aware of GC-aware green threads. They allow you to combine your GC pause and scheduler pause into one. Again, this is not the end of the world, even when you don’t have shared mutable state. If anything, it makes it easier to build a runtime on top of.


> If anything, it makes it easier to build a runtime on top of.

This is not the point. The point is that it would be very difficult to make an OS green thread that is aware of the runtime's GC, given that an OS will probably want to also support non-gc languages.


> constraints like "how to make sure each running execution context gets a fair share of time and can't lock up the VM

IIRC that's because it preempts based on reduction counts rather than waiting on some thread to decide to yield itself.


Yeah, that's not easy for an OS to figure out from below.


> If that's the sense that's meant

Yeah that's what I meant, a lightweight threading mechanism provided by the OS.

> there's probably not much reason to add them when user space lightweight concurrency mechanisms already exist

Yeah... I don't think there's consensus on that. It seems that many people find OS threads to be an understandable concurrency model, but find them too heavyweight. So the languages end up introducing other abstractions at either the type-level (which has other benefits mind you!) or runtime to compensate.


> How long until OS vendors introduce abstractions to make this easier?

The OS-level abstraction is called M:N threads. It has always been supported by Java on Solaris. But it's not really popular elsewhere.


Maybe ironically, given all the green thread experiments in industry, Microsoft just removed their user-mode scheduling library in Windows 11.

https://learn.microsoft.com/en-us/windows/win32/procthread/u...


BSD variants and SunOS used to have N:M APIs but they got deprecated and removed a long time ago. It's hard to provide a consistent behavior that works for everyone.


I said in a sibling comment that scheduler activations may have been a flawed idea, but I don’t think the space of user-space scheduling APIs is fully explored. Google’s ghOSt work is an example of that. If io-uring is proof of anything, it’s that there’s still fundamental changes we can make in how we schedule work with the kernel.


There are: Fibers in NT, for example.


> After all the hoopla surrounding concurrency models, it seems that languages are conceding that green threads are more ergonomic to work with. Go and Java have it, and now .NET is even experimenting with it.

Green threads were always better, the issue was compatibility with native (C) libraries. 20 years ago a language whose C FFI was as cumbersome as Go's would be laughed at.

> How long until OS vendors introduce abstractions to make this easier? Why aren't there OS-native green threads, or at the very least user-space scheduling affordances for runtimes that want to implement them without overhead in calling blocking code?

Most of the point of green threads is to avoid the OS getting in the way. Rather than OSes adding green threading, the future looks like sidestepping more and more of the traditional OS, ending up with unikernels or something that looks like them. The likes of graal native image are already a step in this direction.


> Green threads were always better

There are a lot of people who advocate otherwise, but I don’t think current languages have good enough mechanisms for effect polymorphism to make effects ergonomic.

> Most of the point of green threads is to avoid the OS getting in the way.

I see the OS as an environment to help me run programs. Language runtimes have been taking over that role over the last few decades, sure. However, whether or not unikernels are the way forward remains to be seen. I still see it worthwhile to improve the OS, however.


The extent of the OS's relevancy is inversely proportional to the richness of the language runtime and ecosystem.

Or as Dan Ingalls from Smalltalk fame would put it, "an operating system is a collection of things that don't fit inside a language; there shouldn't be one".

Incidentally that is pretty much true for those of us doing distributed computing in cloud environments with managed languages; their runtimes could even be running directly on top of a type 1 hypervisor for all I care.


I spent the last 5 years learning reactive programming. I hate it. I'm looking forward to going back to something solid.


Java Reactive works around the shortcomings of the platform by basically inventing its own language inside Java. A high cost for the developer.

Under the hood, the virtual thread feature does more or less the same as reactive. Async calls via epoll, a scheduler in user space, .....


How does this compare to Processes in Elixir/Erlang -- is Java now as lightweight and performant?


according to the article:

> Virtual threads have more in common with the user-mode threads found in other languages, such as goroutines in Go or processes in Erlang -- but have the advantage of being semantically identical to the threads we already have.


And the disadvantages of global shared memory, stop the world pauses etc. For sure it is a step in the right direction but it depends on where you stand wrt having shared mutable state in your programming model. It's also not clear to me whether virtual threads can lock up carrier threads (e.g. due to an infinite loop) or are somehow preemptible (c.f. erlang reduction counts).


Java hasn't had significant stop-the-world pauses for a few years now. ZGC, the new low-latency collector, has <1ms pauses for heaps up to at least 4 TB. As to shared mutable state, virtual threads are completely orthogonal to that. You can write code with shared state or without, or, if you want language guarantees, you can use languages that greatly limit it, like Clojure. I would love to see a new implementation of Erlang for the Java platform.


They can lock up carrier threads and are not preemptible.

It is a bit subjective, but regarding threading Java often chooses to expose the basic primitives as is, and lets you build on top. Erlang is an opinionated specialization of concurrent programming at large, which may be a better fit for certain problems, but not for others.

Also, I don’t think global shared memory is a problem; one is free to use ExtentLocals (threadlocals, but virtual thread friendly) in Java. Also, while small actor-scoped heaps are cool, I don’t think that they have anything on the state-of-the-art GCs Java employs, and stop-the-world pauses are not really a problem that comes up, and if they do it is likely some programmer error.


" Operating systems typically allocate thread stacks as monolithic blocks of memory at thread creation time that cannot be resized later. This means that threads carry with them megabyte-scale chunks of memory to manage the native and Java call stacks."

This extremely common misconception is not true of Linux or Windows. Both Windows and Linux have demand-paged thread stacks whose real size ("committed memory" in Windows) is minimal initially and grows when needed.


Do they shrink too? How many threads can be created before address space is exhausted (even if the memory isn't backed by pages, the address space is still reserved)?


The stack for any thread other than the first is just memory like any other allocation. You can free it, resize it, copy it elsewhere, whatever you want to do. Literally just a pointer in a register. People work up weird mythologies about it, but the stack can be anything you want if you're willing to write code to manage it.


You'll run out of physical memory for the first page of the stack long before you run out of room in the virtual address space.


Really hope this makes it to Android. (probably need to wait for a decade or two though)


Is this like async/await machinery in .NET?


Yes but transparent


for the record, i really don't know much about threads, so the following questions are probably kinda stupid.

first question: so, as the article states, the ONLY performance upside of virtual threads (versus os threads) is the number of inactive threads, thanks to lower per-thread memory overhead.

for some reason i was expecting to read something about context switching cost too.

as far as i understand, virtual thread context switches are most likely somewhere between a lot cheaper than and roughly as expensive as their carrier thread context switches, depending on how much memory has to be copied around and how to find the next thread to execute.

the problem here is that virtual context switches may be cheaper, but have to be executed in addition to the os thread context switches, so the overall efficiency is actually lower because more work is spent scheduling (os vs. os+virtual).

to minimize this it might be possible for privileged applications to disable os thread context switching for the carrier threads as long as there are active virtual threads. that way, the context switching and scheduling overhead is reduced from "os vs. os+virt" to "os vs. virt". i.e. as soon as there are active virtual threads the carrier thread is excluded for os scheduler until there aren't any active virtual threads anymore (or, alternatively, the virtual thread pool is empty).

is this a thing? does this make sense? would it be worth it? do operating systems even support "manual" (i.e. by the app) thread scheduling hints? or are the carrier threads only rarely taken out of schedule because they're not really put to sleep as long as there are active virtual threads anyway, making this a non-issue?

second question: as far as i understand blocking os threads, the scheduler stores which thread is waiting on which io resource and the appropriate thread gets woken up once a waited-on io resource is available. this is not much of a problem with a few hundred or thousand os threads, but now with virtual threads, the io resource must now be linked to the os thread for the virtual thread executor's scheduler by the os and then to the virtual thread waiting on the resource by the virtual thread scheduler. so for example if there are 100.000 inactive virtual threads waiting for a network response and one arrives, the os scheduler has to match it to an os thread first (the one the vt scheduler runs on) and then the vt scheduler has to match it to one of the virtual threads. i.e. two lookups in hashtables with 100.000 entries each (one io to os threads, the other io to vt). is this how it works or do i misunderstand this? as async models have the same issue but work fine i guess this isn't really a problem in practice. also, as far as i understand, the os thread woken up is given a kind of resource id it's been woken up for, instead of ("well, you went to sleep for a certain resource id so it's obvious which one you've been woken up for" in blocking IO).


You do not have to context switch the kernel thread when context switching the multiplexed user thread. So the context switch should in principle be much faster. There are second order effects of course, for example the new user thread might touch cold cache lines so the context switch speed up might not make much of a difference.

Normally on an M:N setup the kernel threads are pinned one per physical hardware thread (i.e. core or SMT thread), so as long as they are the only program running on that core, they are never preempted.


> to minimize this it might be possible for privileged applications to disable os thread context switching for the carrier threads as long as there are active virtual threads. that way, the context switching and scheduling overhead is reduced from "os vs. os+virt" to "os vs. virt". i.e. as soon as there are active virtual threads the carrier thread is excluded for os scheduler until there aren't any active virtual threads anymore (or, alternatively, the virtual thread pool is empty).

> is this a thing? does this make sense? would it be worth it? do operating systems even support "manual" (i.e. by the app) thread scheduling hints? or are the carrier threads only rarely taken out of schedule because they're not really put to sleep as long as there are active virtual threads anyway, making this a non-issue?

It's normally recommended to run with as many "physical" threads as you have CPU cores, and then the OS scheduler can generally just do the right thing (assuming nothing else is running on the machine) - you don't need to do any context switches if you have as many processors as there are OS threads wanting to run. Most OSes do offer a way to "pin" a thread to a processor (at varying levels of hint/requirement) but I've only seen them used when doing fairly extreme performance tuning.


It's funny how it took Java almost a decade to finally implement goroutines.


You mean the co-routines that Modula-2 already had in 1978, or Erlang in 1986, not such big achievement on Go's part.

By the way, JVM green threads were available in 1996, however eventually red threads became the default implementation across all major JVM vendors.


> You mean the co-routines that Modula-2 already had in 1978, or Erlang in 1986, not such big achievement on Go's part.

What are you implying, that Go should've implemented goroutines before Go even existed?


That just like everything else in Go, it is only new for those that don't have their CS history up to date.


I never said a word about its novelty, I only pointed out it's funny how it took 10 years for Java to implement exactly the same functionality, because of how complex Java is. Even when implemented, some potholes in the implementation have yet to be fixed.

There's no need for such implicit insults.


Where is the decade coming from? From what I know the loom project was started a few years ago, perhaps in 2019.


As opposed to what platform?

Also, they did that in a completely backwards compatible way, so that programs that were written before Go was even a thing will also benefit. So, if you will, Java has, out of the door, orders of magnitude more “goroutine”-aware code as is than Go likely ever will.


> As opposed to what platform?

Go.

> programs that were written before Go was even a thing will also benefit

As far as I understand virtual threads, they have a new API for creating them, so older programs will at the very least need some source change and a recompilation in order to use them. Of course, if there's a JVM flag to automatically use virtual threads in place of OS threads, please correct me :)
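For reference, the creation APIs look along these lines (a sketch):

    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.ThreadFactory;

    class VirtualThreadApis {
        public static void main(String[] args) throws Exception {
            // Starting a single virtual thread explicitly
            Thread vt = Thread.startVirtualThread(() -> System.out.println("hello"));
            vt.join();

            // Code that already goes through an ExecutorService can often switch
            // by swapping in a virtual-thread factory or executor.
            ThreadFactory factory = Thread.ofVirtual().name("handler-", 0).factory();
            try (ExecutorService pool = Executors.newThreadPerTaskExecutor(factory)) {
                pool.submit(() -> System.out.println("task on " + Thread.currentThread()));
            }
        }
    }

So it is a source change, though for code that is already structured around an ExecutorService it can be a small one.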

> Java has out of the doors orders of more “goroutine”-aware code as is, than Go will likely ever will.

I honestly doubt that older code is "goroutine"-aware, as virtual threads are used in a different way than ordinary OS threads. I don't think there's any code that spawned hundreds of thousands of OS threads before, which would be the main benefit of using virtual threads.


The section "What about async/await?", which compares these virtual threads to async/await is very weak. After reading this article, I came away with the impression that this is a dramatically worse way to solve this problem than async/await. The only benefit I see is that this will be simpler to use for the (increasingly rare) programmers who are not used to async programming.

The first objection in the article is that with async/await you may forget to use an async operation and could instead use a synchronous operation. This is not a real problem. Languages like JavaScript do not have any synchronous operations so you can't use them by mistake. Languages like python and C# solve this with simple lint rules that tell you if you make this mistake.

The second objection is that you have to reimplement all library functions to support await. This is a bad objection because you also have to do this for virtual threads. Based on how long it took to add virtual threads to Java vs adding async/await to other languages, it seems like virtual threads were much more complicated to implement.

The programming model here sounds analogous to using gevent with python vs python async/await. My opinion is that the gevent approach will die out completely as async/await becomes better supported and programmers become more familiar.

EDIT: Looking more at the "Related Work" section at the bottom. I think I understand the problem here. The "Structured Concurrency" examples are unergonomical versions of async/await. I'm not sure what I'm missing but this seems like a strictly worse way to write structured concurrent code.

Java example:

    Response handle() throws ExecutionException, InterruptedException {
        try (var scope = new StructuredTaskScope.ShutdownOnFailure()) {
            Future<String>  user  = scope.fork(() -> findUser());
            Future<Integer> order = scope.fork(() -> fetchOrder());

            scope.join();           // Join both forks
            scope.throwIfFailed();  // ... and propagate errors

            // Here, both forks have succeeded, so compose their results
            return new Response(user.resultNow(), order.resultNow());
        }
    }
Python equivalent

    async def handle() -> Response:
      # scope is implicit, throwing on failure is implicit.
      user, order = await asyncio.gather(findUser(), findOrder())

      return Response(user, order)
You could probably implement a similar abstraction in Java, but you would need to pass around and manage the scope object, which seems cumbersome.


async/await require yet another world that's parallel to the "thread" world but requires its own "colour" and set of APIs. So now you have two kinds of threads, two kinds of respective APIs, and two kinds of the same concept that has to be known by all of your tools (debuggers, profilers, stacktraces).

> This is a bad objection because you also have to do this for virtual threads

No. We had to change a bit of the implementation -- at the very bottom -- but none of the APIs, as there is no viral async colour that requires doubling all the APIs.

You're right that implementing user-mode threads is much more work than async/await, which could be done in the frontend compiler if you don't care about tool support (although we very much do), but the result dominates async/await in languages that already have threads (there are different considerations in JS) as you keep all your APIs and don't need a duplicate set, and a lot of existing code tools just work (with relatively easy changes to accommodate for a very high number of threads).

> The "Structured Concurrency" examples are unergonomical versions of async/await.

They're very similar, actually.

We've made the Java example very explicit, but that code would normally be written as:

    Response handle() throws ExecutionException, InterruptedException {
        try (var scope = new StructuredTaskScope.ShutdownOnFailure()) {
            var user  = scope.fork(() -> findUser());
            var order = scope.fork(() -> fetchOrder());

            scope.join().throwIfFailed();
            return new Response(user.resultNow(), order.resultNow());
        }
    }
But when the operations are homogeneous, i.e. all of the same type rather than different types as in the example above (Java is typed), you'll do something like:

    try (var scope = new StructuredTaskScope.ShutdownOnFailure()) {
        var fs  = myTasks.stream().map(scope::fork).toList();
        scope.join().throwIfFailed();
        return fs.stream().map(Future::resultNow).toList();
    }
Of course, you can wrap this in a higher level `gather` operation, but we wanted to supply the basic building blocks in the JDK. You're comparing a high-level library to built-in JDK primitives.
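
For illustration, such a wrapper could look roughly like this (just a sketch, assuming the JDK 19 incubator API where fork returns a Future; `gather` is not an existing JDK method):

    // Assumes imports from java.util.concurrent and jdk.incubator.concurrent.StructuredTaskScope.
    static <T> List<T> gather(List<Callable<T>> tasks)
            throws ExecutionException, InterruptedException {
        try (var scope = new StructuredTaskScope.ShutdownOnFailure()) {
            var futures = tasks.stream().map(scope::fork).toList();
            scope.join().throwIfFailed();
            return futures.stream().map(Future::resultNow).toList();
        }
    }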

Work is underway to simplify the simple cases further so that you can just use the stream API without an explicit scope.


This makes sense, especially the bit about tooling. I'm unfamiliar with the state of Java tooling beyond very simple tasks.

On the other hand using things like debuggers and reading stack traces in python/js "just work" for me. Maybe because the tooling and the language have evolved together over a longer period of time.

I also feel like the reimplementation of all functions to support async is not a big deal because the actual pattern is generally very simple. You can start by awaiting every async function at the call site. New libraries can be async only.


> On the other hand using things like debuggers and reading stack traces in python/js "just work" for me. Maybe because the tooling and the language have evolved together over a longer period of time.

Well, Python and JS don't have threads, so async/await are their only concurrency construct, and it's supported by tools. But Java has had tooling that works with threads for a very long time. Adding async/await would have required teaching all of them about this new construct, not to mention the need for duplicate APIs.

> I also feel like the reimplementation of all functions to support async is not a big deal because the actual pattern is generally very simple. You can start by awaiting every async function at the call site.

First, you'd still need to duplicate existing APIs. Second, the async/await (cooperative) model is inherently inferior to the thread model (non-cooperative) because scheduling points must be statically known. This means that adding a blocking (i.e. async) operation to an existing subroutine requires changing all of its callers, who might be implicitly assuming there can't be a scheduling point. The non-cooperative model is much more composable, because any subroutine can enforce its own assumptions on scheduling: If it requires mutual exclusion, it can use some kind of mutex without affecting any of the subroutines it calls or any that call it.
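
For example, a subroutine that needs mutual exclusion around its own blocking call can just take a lock, without any of its callers or callees changing (a rough sketch; blockingFetch is a stand-in for any blocking call):

    // java.util.concurrent.locks.ReentrantLock
    private final ReentrantLock lock = new ReentrantLock();

    String updateAndFetch() {
        lock.lock();
        try {
            // the virtual thread may block (and unmount) here; callers are unaffected
            return blockingFetch();
        } finally {
            lock.unlock();
        }
    }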

Of course, locks have their own composability issues, but they're not as bad as async/await (which corresponds to a single global lock everywhere except around blocking, i.e. async, calls).

So when is async/await more useful than threads? When you add it to an existing language that didn't have threads before, and so already had an implicit assumption of no scheduling points anywhere. That is the case of JavaScript.

> New libraries can be async only.

But why if you already have threads? New libraries get to enjoy high-scale concurrency and old libraries too!


I agree with your point that for CPU bound tasks, the threading model is going to result in better performing code with less work.

As for the point about locks, I think this one is also a question of IO-bound vs CPU bound work. For work that is CPU bottlenecked, there is a performance advantage to using threads vs async/await.

As for the tooling stuff, I'm still not really convinced. Python has almost always had threads and I've worked on multimillion-line codebases that were in the process of migrating from thread-based concurrency to async/await. Now JS also has threads (workers). I also use coroutines in C++ where threads have existed for a long time. I've never had a problem debugging async/await code in these languages, even with multiple threads. I guess I've just had good experiences with tooling, but it doesn't seem that hard to retrofit a threaded language like C++/Python.


> I guess I've just had good experiences with tooling, but it doesn't seem that hard to retrofit a threaded language like C++/Python.

But why would you want to if you can make threads lightweight (which, BTW, is not the case for C++)? By adding async/await on top of threads you're getting another incompatible and disjoint world that provides -- at best -- the same abstraction as the one you already have.


I think the async/await debugging experience is easier to understand. For example, in the structured concurrency example, it seems like it would require a lot of tooling support to get a readable stack trace for something like this (in Python):

Code

    import asyncio

    async def right(directions):
      await call_tree(directions)

    async def left(directions):
      await call_tree(directions)

    async def call_tree(directions):
      if len(directions) == 0:
        raise Exception("call stack");

      if directions[0]:
        await left(directions[1:])
      else:
        await right(directions[1:])

    directions = [0, 1, 0, 0, 1]
    asyncio.run(call_tree(directions))

Trace

    Traceback (most recent call last):
      File "/Users/mgraczyk/tmp/test.py", line 19, in <module>
        asyncio.run(call_tree(directions))
      File "/usr/local/Cellar/python@3.9/3.9.13_1/Frameworks/Python.framework/Versions/3.9/lib/python3.9/asyncio/runners.py", line 44, in run
        return loop.run_until_complete(main)
      File "/usr/local/Cellar/python@3.9/3.9.13_1/Frameworks/Python.framework/Versions/3.9/lib/python3.9/asyncio/base_events.py", line 647, in run_until_complete
        return future.result()
      File "/Users/mgraczyk/tmp/test.py", line 16, in call_tree
        await right(directions[1:])
      File "/Users/mgraczyk/tmp/test.py", line 4, in right
        await call_tree(directions)
      File "/Users/mgraczyk/tmp/test.py", line 14, in call_tree
        await left(directions[1:])
      File "/Users/mgraczyk/tmp/test.py", line 7, in left
        await call_tree(directions)
      File "/Users/mgraczyk/tmp/test.py", line 16, in call_tree
        await right(directions[1:])
      File "/Users/mgraczyk/tmp/test.py", line 4, in right
        await call_tree(directions)
      File "/Users/mgraczyk/tmp/test.py", line 16, in call_tree
        await right(directions[1:])
      File "/Users/mgraczyk/tmp/test.py", line 4, in right
        await call_tree(directions)
      File "/Users/mgraczyk/tmp/test.py", line 14, in call_tree
        await left(directions[1:])
      File "/Users/mgraczyk/tmp/test.py", line 7, in left
        await call_tree(directions)
      File "/Users/mgraczyk/tmp/test.py", line 11, in call_tree
        raise Exception("call stack");
    Exception: call stack


No, the existing tooling will give you such a stack trace already (and you don't need any `async` or `await` boilerplate, and you can even run code written and compiled 25 years ago in a virtual thread). But you do realise that async/await and threads are virtually the same abstraction. What makes you think implementing tooling for one would be harder than for the other?


How does the tooling know to hide the call to "fork" in the scoped task example?


JDK methods can be annotated as "internal" and optionally hidden in stack-traces, but in this case it's unnecessary. The fork call takes place on the parent thread and isn't part of any stack trace when an exception in a child occurs. The regular structuring of exception stack traces takes care of the rest.

Remember that Java has been multithreaded since its inception, and virtual threads don't change any of the threading model. They just make Java threads cheap. It's as if they've been there all along.
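
As a rough sketch (JDK 19 preview APIs; handle() and the Request type are just stand-ins), decades-old blocking code can simply be submitted to a virtual-thread-per-task executor:

    // One cheap virtual thread per task; handle() is ordinary blocking code.
    void handleAll(List<Request> requests) {
        try (var executor = Executors.newVirtualThreadPerTaskExecutor()) {
            for (var req : requests) {
                executor.submit(() -> handle(req));
            }
        }  // close() waits for the submitted tasks to finish
    }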


Wouldn't the exception actually come from throwIfFailed? Or does it come from resultNow? How does the tool show it as corresponding to the call to findUser?


throwIfFailed throws an exception that wraps one thrown by the child as a "caused by" and lists their stack traces. Java users have had this for many years, but threads were simply costly, so they were shared among tasks. The new thing structured concurrency brings -- in addition to making some best practices easier to follow -- is that the runtime now records parent-child relationships among threads (that now make sense when threads are no longer shared). You can see these relationships and the tree hierarchy for the entire application with a new JSON thread-dump.


I can see you having objections to their arguments against async/await, but what makes you say async/await is somehow the better solution?


There are a few reasons.

async/await allows you to do multiple things in parallel. I don't see how you can do that in the virtual threading model, although I haven't used it and only read this article. You would have to spin up threads and wait for them to finish, which IMO is much more complicated and hard to read.

javascript

    async function doTwoThings() {
      await Promise.all([
        doThingOne(),
        doThingTwo(),
      ]);
    }
python

    async def do_two_things():
      await asyncio.gather(
        do_thing_one(),
        do_thing_two(),
      )

Another issue is building abstractions on top of this. For example, how do you implement "debounce" using virtual threads? You end up unnaturally reimplementing async/await anyway.

Finally, it's generally much easier to implement new libraries with a promise/future-based async/await system than with a system based on threads, but I'm not familiar enough with Java to know whether this is actually a good objection. It's possible they make it really easy.


> async/await allows you to do multiple things in parallel. I don't see how you can do that in the virtual threading model, although I haven't used it and only read this article. You would have to spin up threads and wait for them to finish, which IMO is much more complicated and hard to read

I think there's a fundamental point of confusion here. In both python and JS, you can't do anything in parallel, since node/v8 and cpython are single-threaded (yes if you dip down into C you can spawn threads to your heart's content). You can only do them concurrently, since only when a virtual thread blocks can you move on and schedule another thread in your runtime.

In c++ (idk the java syntax, imagine these are runtime threads):

    std::thread t1(doThingOne, arg1);
    std::thread t2(doThingTwo, arg2);
    t1.join();
    t2.join();
    // boost has a join_all
I'm sure there's some kind of `join_all` function in Java somewhere. IMO this is even more clear than your async/await example: we have a main thread, it spawns two children, and then waits until they're done before proceeding.
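
For reference, a rough Java equivalent with virtual threads might look like this (a sketch only; doThingOne/doThingTwo stand in for ordinary blocking methods):

    void doTwoThings() throws InterruptedException {
        Thread t1 = Thread.ofVirtual().start(() -> doThingOne());
        Thread t2 = Thread.ofVirtual().start(() -> doThingTwo());
        t1.join();
        t2.join();
    }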

The traditional problem with async/await is that it forces a "are your functions red or blue" decision up-front (see classic essay https://journal.stuffwithstuff.com/2015/02/01/what-color-is-...).

> Finally, it's generally much easier to implement new libraries with a promise/future-based async/await system than with a system based on threads

How so? Having written a bunch of libraries myself, I have to say that not worrying about marking functions as async or not is a great boon to development. Just let the runtime handle it.


The first part is semantics; yes, I understand that Python runs one OS thread at a time because of the GIL (for now). Just pretend I used the word "concurrent" instead of "parallel" in all the places necessary to remove the semantic disagreement.

Whether threads and joining vs async/await is clearer is a matter of taste and familiarity. I find async/await much more clear because that's what I am more used to. Others will disagree, that's fine. I suspect more people will prefer async/await as time goes on but that's my opinion.

> not worrying about marking functions as async or not is a great boon to development.

I don't really see why this is a big deal. You can change the function and callers can change their callsite. There are automated lint steps for this in python and javascript that I use all the time. It's not any different to me than adding an argument or changing a function name.


Part of the difference with Java is that a lot of libraries haven’t changed in twenty years because they already work. Adding async/await would probably mean writing an entirely new library and scrapping the old already working code, while green threads allow the old libraries to silently become better.


When you add an argument you don't have to change all callsites of transitive callers.


The only difference between how virtual threads work and how async/await work is that you don't need to use await and don't need to declare async. Just call .get() on a Future when you need a value - that is basically "await".

    void doTwoThings() throws ExecutionException, InterruptedException {
      var f1 = doThingOne();
      var f2 = doThingTwo();
      var thingOne = f1.get();
      var thingTwo = f2.get();
    }


How do you implement doThingOne?

You should read the "structured concurrency" link in the article. You have to explicitly wrap the call to doThingOne in a future under a structured concurrency scope. The code example you wrote is not going to be possible in Java without implementing doThingOne in a complicated way.


You don't have to use structured concurrency. The two methods use a virtual thread executor and return the result of submitting work. Structured concurrency is great but not necessary.

https://www.reactivesystems.eu/2022/06/17/introduction-to-pr...
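
For example, roughly like this (a sketch only; fetchThingOneBlocking is a stand-in for whatever blocking work the task does, and in practice the executor would be shared or injected):

    // Executors.newVirtualThreadPerTaskExecutor() starts one virtual thread per submitted task (JDK 19+).
    private final ExecutorService exec = Executors.newVirtualThreadPerTaskExecutor();

    Future<String> doThingOne() {
        return exec.submit(() -> fetchThingOneBlocking());
    }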


great, now you have futures AND virtual threads. soo much better!


It is significantly better. The issue with futures is chaining them together and having to thread continuations by hand. Async/await is an improvement over that, but when you can just block instead they become much nicer to use.


Java Futures are lobotomized.

What is really nice is to treat Futures as monads (as in Scala, and FP languages like Haskell). Then you can chain them together using calls to flatMap aka bind. Then you can use for-comprehensions (or the equivalent):

  for {
    a <- taskA
    b <- taskB
    c <- taskC
  } yield a + b + c
There are a ton of other fancy tricks found in Haskell or libraries such as Cats.


You can have futures with "real threads" as well; futures are merely an abstraction for getting the "output" of a thread in a non-blocking fashion.


> async/await allows you to do multiple things in parallel. I don't see how you can do that in the virtual threading model, although I haven't used it and only read this article.

The description of this is that the virtual threads can move between platform threads, quoting from the article:

> The operating system only knows about platform threads, which remain the unit of scheduling. To run code in a virtual thread, the Java runtime arranges for it to run by mounting it on some platform thread, called a carrier thread. Mounting a virtual thread means temporarily copying the needed stack frames from the heap to the stack of the carrier thread, and borrowing the carrier's stack while it is mounted.

> When code running in a virtual thread would otherwise block for IO, locking, or other resource availability, it can be unmounted from the carrier thread, and any modified stack frames are copied back to the heap, freeing the carrier thread for something else (such as running another virtual thread.) Nearly all blocking points in the JDK have been adapted so that when encountering a blocking operation on a virtual thread, the virtual thread is unmounted from its carrier instead of blocking.

This allows for parallelism so long as the system is multicore and the JVM has access to multiple platform threads to distribute the virtual threads across.


Two separate threads run in parallel, but one thread cannot do two subtasks in parallel without submitting parallel jobs to an executor or a StructuredTaskScope subtask manager. It's basically forcing the developer to do all the hard work and boilerplate that async/await saves you.


> It's basically forcing the developer to do all the hard work and boilerplate that async/await saves you.

It doesn't. Both require the exact same kind of invocation by the user. Neither automatically parallelises operations that aren't explicitly marked for parallelisation.


Note that those do not natively reflect structured concurrency - and a Promise.any version does even less so.

The code above would "return" once the first Promise runs to completion. The computation that is described by the second Promise would, however, continue to run - which could cause unexpected side effects. The goal of structured concurrency is to totally avoid those by making sure the lifetime of child tasks is always contained within the lifetime of their parent tasks.

Bare async/await only marginally guarantees that, and the amount depends on the language. E.g. the JS code above will never guarantee it, since each async function is really a separate entity on the event loop. Rust async/await can do it if all you use are low-level future combinators (like futures::join! or select!). But if the code spawns any child tasks, then all bets are off too.

The Java example with the scoped executor will have proper guarantees. Waiting on the executor means waiting for all tasks to complete. And as soon as the first task errors, other tasks are "asked" to stop early via an InterruptedException.


Ah thanks, I didn't think of it like that. I've never used actual structured concurrency in JavaScript or Rust, but in Python I have worked on codebases that do this. Seems increasingly common.

It seems like most of the time you don't want or need the full flexibility of async/await, but I don't think the guaranteed structure is worth it to me if the language doesn't support it natively. Too much boilerplate, and static analysis is usually pretty good about catching mistakes, in Python at least.


> async/await allows you to do multiple things in parallel. I don't see how you can do that in the virtual threading model

When comparing to JS, it is the other way around, unless you are talking only about IO-bound tasks, where nodejs delegates to a thread pool (libuv).


With virtual threads, you need to write fork/join code to do two subtasks. With async await, you call two async functions and await them. So the virtual threading model ends up requiring something that looks like a worse version of async await to me.


  t1 = async(Task1)
  t2 = async(Task2)
  await t1
  await t2

  t1 = fork(Task1)
  t2 = fork(Task2)
  t1.join()
  t2.join()
What's the difference?


If Java adds some nice standardized helpers like this, they will look equivalent. The current proposal is not this clean but that doesn't mean it won't be possible. The key difference is that async/await implies cooperative multitasking. Nothing else happens on the thread until you call await. I find that an easier model to think about, and I opt into multithreading when I need it.

Anyway, Rust does this using roughly the syntax you described (except no need to call "fork"). Languages that use async/await do not require you to say "async" at the call site.


I was using `async` as shorthand for "whatever it takes in a particular async implementation to launch an asynchronous task". Here's an example that more or less mirrors what I showed there:

  async def main():
      task1 = asyncio.create_task(
          say_after(1, 'hello'))

      task2 = asyncio.create_task(
          say_after(2, 'world'))

      print(f"started at {time.strftime('%X')}")

      # Wait until both tasks are completed (should take
      # around 2 seconds.)
      await task1
      await task2

      print(f"finished at {time.strftime('%X')}")
https://docs.python.org/3/library/asyncio-task.html

It was a response to a particular thing you said, but I didn't quote because I thought it was obvious (my bad):

> With virtual threads, you need to write fork/join code to do two subtasks. With async await, you call two async functions and await them.

fork/join and async(for you: "whatever it takes in a particular async implementation to launch an asynchronous task")/await have the same pattern (if you need the synchronization point). That example above is a fork/join pattern but without the words "fork" and "join".


> Languages that use async/await do not require you to say "async" at the call site.

Nope, but if you start calling an async function from within a non-async function, then that function now has to become async, and all of the callers of that function have to become async. This isn't a problem with the green threads approach.


In Python you won't get any concurrency by calling two async functions then awaiting each of them, but in Javascript you will.


> Languages like JavaScript do not have any synchronous operations so you can't use them by mistake.

Can you explain what you mean by this? Isn't it the opposite - Javascript has a synchronous execution model?


Aside from NodeJS-specific APIs, JS as a whole does not generally have any synchronous I/O, locks, threads etc. SharedArrayBuffer is probably the notable exception as it can be used to build synchronous APIs that implement that functionality if I’m not mistaken.

Unless by synchronous you meant single threaded in which case JS is indeed single threaded normally (unless you’re using things like Web Workers).


I mean what the article calls a "synchronous blocking method", which javascript (mostly) does not have.


Keeping familiarity aside, why would one use Java/JVM instead of nodejs for the server of a web app? I need to call SOAP services.


What are the benefits of using nodejs over Java?


SSR React, using the same language in the server as in the browser and not wrangling between two completely different ecosystems.


I get that the other way around by using Scala.js in the browser; I find it a much nicer language to work with than Javascript or even Typescript.


Java has a solid frontend framework. I discussed it recently in relation to SSR React:

https://nocodefunctions.com/blog/java-frontend-web-app/


When one cares about performance.


And maintainability. And quality of libraries.



