I wonder if the trick might be to repurpose technology from server hardware: partition the physical NIC into virtual PCI-e devices with distinct addresses, and map to user-space processes instead of virtual machines.
So in essence, each browser tab or even each listening UDP socket could have a distinct IPv6 address dedicated to it, with packets delivered into a ring buffer in user-mode. This is so similar to what goes on with hypervisors now that existing hardware designs might even be able to handle it already.
I've often pondered whether it would be possible to assign every application/tab/domain/origin a different IPv6 address to exchange data with - to make tracking people just a tad harder, but also to simplify per-process firewall rules. With the bare minimum allocation, a /64, you could easily host billions of addresses per device without running out.
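As a toy sketch of how that per-origin mapping could work: hash each origin into the low 64 bits of a routed /64 (the documentation prefix 2001:db8::/32 is used here as a stand-in). This hand-waves away the hard part - actually binding sockets to those addresses would need something like Linux's AnyIP or IP_FREEBIND:

```python
import hashlib
import ipaddress

def origin_address(prefix: str, origin: str) -> ipaddress.IPv6Address:
    """Derive a stable per-origin address inside a routed /64.

    The low 64 bits come from a hash of the origin string, so every
    origin gets its own distinct address with no coordination needed.
    """
    net = ipaddress.IPv6Network(prefix)
    assert net.prefixlen == 64, "need a full /64 for a 64-bit suffix"
    suffix = int.from_bytes(hashlib.sha256(origin.encode()).digest()[:8], "big")
    return net[suffix]

# Each origin maps to its own address in the same /64:
a = origin_address("2001:db8:1:2::/64", "https://example.com")
b = origin_address("2001:db8:1:2::/64", "https://example.org")
print(a)
print(b)
```

Since the mapping is deterministic, a per-process firewall rule can be written against the address without ever inspecting payloads.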
I think there may be a limit to how many IP addresses NICs (and maybe drivers) can track at once, though.
What I don't really get is why QUIC had to be invented when multi-stream protocols like SCTP already exist. SCTP brings the reliability of TCP with the multi-stream system that makes QUIC good for websites. Piping TLS over it is a bit of a pain (you don't want a separate handshake per stream), but surely there could be techniques to make it less painful (leveraging 0-RTT? Using session resumptions with tickets from the first connected stream?).
First and foremost, you can't use SCTP on the Internet, so the whole idea is dead on arrival. The Internet only really works for TCP and UDP over IP - anything else, you have a loooooong tail of networks which will drop the traffic.
Secondly, the whole point of QUIC is to merge the TLS and transport handshakes into a single packet, to reduce RTT. This would mean you need to modify SCTP anyway to allow for this use case, so even what little support exists for SCTP in the wild would need to be upgraded.
Thirdly, there is no reason to think that SCTP is better handled than UDP at the kernel's IP stack level. All of the problems of memory optimizations are likely to be much worse for SCTP than for UDP, as it's used far, far less.
I don't see why you can't use SCTP over the Internet. HTTP/2 has fallbacks for broken or generally shitty middleboxes; I don't see why weird corporate networks should hold back the rest of the world.
TLS already does 0-RTT so you don't need QUIC for that.
The problem with UDP is that many optimisations are simply not possible. The "TCP but with blackjack and hookers" approach QUIC took makes it very difficult to accelerate.
SCTP is Fine™ on Linux but it's basically unimplemented on Windows. Acceleration beyond what these protocols can do right now requires either specific kernel/hardware QUIC parsing or kernel mode SCTP on Windows.
Getting Microsoft to actually implement SCTP would be a lot cleaner than to hack yet another protocol on top of UDP out of fear of the mighty shitty middleboxes.
WebRTC decided they liked SCTP, so... they run it over UDP (well, over DTLS over UDP). And while HTTP/2 might fail over to HTTP/1.1, what would an SCTP session fall back to?
The problem is not that Windows doesn't have in-kernel support for SCTP (there are several user-space libraries already available, you wouldn't even need to convince MS to do anything). The blocking issue is that many, many routers on the Internet, especially but not exclusively around all corporate networks, will drop any packet that is neither TCP nor UDP over IP.
And if you think UDP is not optimized, I'd bet you'll find that the SCTP situation is far, far worse.
And regarding 0-RTT, TLS 0-RTT only works for resumed connections, and it is still actually 1 RTT (for the TCP connection establishment). New connections over TLS still need 2-3 round trips (1 for TCP, plus 1 for TLS 1.3 or 2 for TLS 1.2); over QUIC they only need 1, because the crypto handshake rides along with the transport handshake. With QUIC, you can also have true 0-RTT traffic, sending the (encrypted) HTTP request data in the very first packet you send to a host [that you communicated with previously].
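To make the round-trip accounting concrete, here's a back-of-the-envelope sketch - the labels and counts just encode the breakdown above, not measurements of any real stack:

```python
def first_byte_delay(rtt_ms: float, transport: str, resumed: bool = False) -> float:
    """Round trips elapsed before the server can act on the first byte
    of (encrypted) request data, per the breakdown above."""
    round_trips = {
        ("tcp+tls1.2", False): 3,  # 1 for TCP + 2 for the TLS 1.2 handshake
        ("tcp+tls1.2", True):  2,  # 1 for TCP + 1 abbreviated resumption
        ("tcp+tls1.3", False): 2,  # 1 for TCP + 1 for TLS 1.3
        ("tcp+tls1.3", True):  1,  # 1 for TCP; 0-RTT data in the TLS flight
        ("quic",       False): 1,  # transport + crypto handshake combined
        ("quic",       True):  0,  # true 0-RTT: request in the first packet
    }
    return round_trips[(transport, resumed)] * rtt_ms

for t in ("tcp+tls1.2", "tcp+tls1.3", "quic"):
    print(t, first_byte_delay(100, t), first_byte_delay(100, t, resumed=True))
```

On a 100 ms path, that's the difference between 300 ms to first byte (fresh TLS 1.2 over TCP) and 0 ms of waiting (resumed QUIC).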
How is userspace SCTP possible on Windows? Microsoft doesn't implement it in WinSock and, back in the XP SP2 days, Microsoft disabled/hobbled raw sockets and has never allowed them since. Absent a kernel-mode driver, or Microsoft changing their stance (either on SCTP or raw sockets), you cannot send pure SCTP from a modern Windows box using only non-privileged application code.
Per these Microsoft docs [0], it seems that it should still be possible to open a raw socket on Windows 11, as long as you don't try to send TCP or UDP traffic through it (and have the right permissions, presumably).
Of course, to open a raw socket you need privileged access, just like you do on Linux, because a raw socket allows you to see and respond to traffic from any other application (or even system traffic). But in principle you should be able to make a Service that handles SCTP traffic for you, and a non-privileged application could send its traffic to this service and receive data back.
I did find some user-space library that is purported to support SCTP on Windows [1], but it may be quite old and not supported. Not sure if there is any real interest in something like this.
Interesting. I think the service approach would now be viable since it can be paired with UNIX socket support, which was added a couple of years ago (otherwise COM or RPC would be necessary, making clients more complicated and Windows-specific). But yeah, the lack of interest is the bigger problem now.
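The service split could look something like this sketch: a privileged service owns the (hypothetical) raw SCTP socket, and unprivileged clients hand it payloads over a UNIX-domain socket. Here the "service" just echoes, standing in for the real SCTP machinery; AF_UNIX works out of the box on Linux/macOS, and recent Windows builds support it too, though Python's support there is spottier:

```python
import os
import socket
import tempfile
import threading

def service(listener: socket.socket) -> None:
    """Privileged side: would translate payloads to/from raw SCTP.
    Here it just echoes with a prefix so the round trip is visible."""
    conn, _ = listener.accept()
    with conn:
        data = conn.recv(4096)
        conn.sendall(b"sctp-echo:" + data)  # would go out over raw SCTP

# Service end: bind a UNIX-domain socket at a well-known path.
path = os.path.join(tempfile.mkdtemp(), "sctp-proxy.sock")
listener = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
listener.bind(path)
listener.listen(1)
threading.Thread(target=service, args=(listener,), daemon=True).start()

# Client end: no privileges needed, just a connect to the path.
client = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
client.connect(path)
client.sendall(b"hello")
reply = client.recv(4096)
print(reply)
```

The real version would of course need per-client association state and some access control on the socket path, but the plumbing is this simple.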
SCTP works fine on the Internet, as long as your egress comes from a public IP and you don't perform NAT. So with IPv6 it's not an issue at all, unless you sit behind middleboxes.
Residential networks - not many. Corporate networks, on the other hand, are a different story, which is why a Happy Eyeballs-style fallback at the transport layer would still be needed for a gradual rollout anyway.
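A transport-layer happy-eyeballs could be as dumb as this sketch: attempt SCTP with a short timeout and fall back to TCP when the local stack lacks SCTP or a middlebox eats the packets. IPPROTO_SCTP needs kernel support (e.g. Linux with the sctp module loaded), hence the getattr fallback to protocol number 132:

```python
import socket

def connect_with_fallback(host: str, port: int, timeout: float = 0.25):
    """Happy-Eyeballs-style transport fallback (sketch): try SCTP first,
    fall back to TCP if SCTP is unsupported, refused, or times out.
    Returns (connected_socket, protocol_name)."""
    try:
        s = socket.socket(socket.AF_INET, socket.SOCK_STREAM,
                          getattr(socket, "IPPROTO_SCTP", 132))
        s.settimeout(timeout)
        s.connect((host, port))
        return s, "sctp"
    except OSError:  # no SCTP support, connection refused, or timeout
        s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        s.settimeout(timeout)
        s.connect((host, port))
        return s, "tcp"
```

A production version would race the two attempts concurrently (as RFC 8305 does for IPv6/IPv4) rather than trying them serially, and remember per-destination which transport worked.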
I doubt there is, because it's just not a very popular thing to even try. Even WebRTC, which uses SCTP for non-streaming data channels, uses it over DTLS over UDP.
Just an idle thought...