Most application webservers (by default) handle one request per thread. For mostly IO-bound work (which many projects are), it makes sense to me that threads become a bottleneck in fairly ordinary scenarios.
The scenario where your IO could handle way more than a thousand concurrent requests if only the thread overhead were reduced? When does that ever happen?
Each OS thread costs memory. With the version of Java I have, the default is to allocate 1 MB of stack per thread, so 10,000 threads would reserve about 10 GB of RAM even if we configured ulimit to allow that many threads. In contrast, asking the kernel to do buffered reads of 10,000 files in parallel requires much less memory, especially if most of those are actually the same physical file. Of course, they won't be read fully in parallel.
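To make that concrete, here is a minimal sketch (my own, not from the post above: the class name and thread count are arbitrary, and it assumes JDK 21+ for Thread.ofPlatform) that starts platform threads which do nothing but park, so the per-thread stack reservation shows up in the process's memory usage:

import java.util.concurrent.locks.LockSupport;

public class ThreadMemory {
    public static void main(String[] args) {
        // Each platform thread reserves its own stack (often 1 MB by default, tunable with -Xss).
        // OS limits (ulimit -u, /proc/sys/kernel/threads-max) may cap how many can be created.
        int n = args.length > 0 ? Integer.parseInt(args[0]) : 10_000;
        for (int i = 0; i < n; i++) {
            Thread.ofPlatform().start(() -> {
                while (true) LockSupport.park(); // keep the thread (and its stack) alive
            });
        }
        System.out.println("Started " + n + " platform threads; inspect the process's memory now.");
        while (true) LockSupport.park(); // keep the JVM up for inspection
    }
}

Swapping Thread.ofPlatform() for Thread.ofVirtual() in the same sketch makes the difference visible, since virtual threads do not each reserve a fixed native stack.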
For example, this program:
import java.io.IOException;
import java.nio.file.FileSystems;
import java.nio.file.Files;

public class Test {
    public static void main(String[] args) throws InterruptedException {
        // Start 20,000 virtual threads, each copying abc.txt to stdout.
        var threads = new Thread[20000];
        for (int i = 0; i < 20000; i++) {
            threads[i] = Thread.ofVirtual().start(() -> {
                try {
                    Files.copy(FileSystems.getDefault().getPath("abc.txt"), System.out);
                } catch (IOException e) {
                    System.err.println("Error writing file");
                    e.printStackTrace();
                }
            });
        }
        // Wait for every copy to finish.
        for (int i = 0; i < 20000; i++) {
            threads[i].join();
        }
    }
}
Run as `java Test > ./cde.txt`, it takes about 4.5 s on my WSL2 system with 2 cores and writes a 2 GB output file (abc.txt being 100 KB), roughly 450 MB/s. Even this would be within a typical HTTP timeout, though users would certainly not be happy. I'm pretty sure a native Linux system on a machine beefy enough to be used as a web server would have no problem serving even larger files over a network like this.
1. You are not solving a real problem. The use case you describe (basically a CDN) is already exotic, and the scenario where such a system would already have been implemented with Java and its basic IO seems implausible.
2. You did not compare against fewer threads to see whether threads, rather than IO, are actually the bottleneck (see the sketch below for one way to do that). Also, all your threads are competing for stdout.
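For what it's worth, here is a rough sketch of what that comparison could look like (mine, not from the post above: the pool size of 8, the thread-pool approach, and the reuse of abc.txt are assumptions). Run the same 20,000 copies on a small fixed pool of platform threads and time it against the virtual-thread version:

import java.io.IOException;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class FewerThreads {
    public static void main(String[] args) throws InterruptedException {
        long start = System.nanoTime();
        // A handful of OS threads instead of 20,000 virtual ones.
        ExecutorService pool = Executors.newFixedThreadPool(8);
        for (int i = 0; i < 20000; i++) {
            pool.submit(() -> {
                try {
                    Files.copy(FileSystems.getDefault().getPath("abc.txt"), System.out);
                } catch (IOException e) {
                    e.printStackTrace();
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.HOURS);
        // Print timing to stderr so it does not mix with the copied data on stdout.
        System.err.println("Took " + (System.nanoTime() - start) / 1_000_000 + " ms");
    }
}

If this finishes in about the same time as the 20,000-virtual-thread version, the disk and the shared stdout, not the number of threads, are the limiting factor.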