Why wouldn’t the first assumption be that the bulk of these differences come from the compiler and runtime library (memcpy, etc.) rather than from the OS itself?
I appreciate that to the end user it doesn’t matter which parts of the stack are better tuned, but framing this as “Windows vs. Linux” seems unjustified without more evidence.
I can't find any mention of the compilers, compiler settings or whether any prebuilt binaries were used. That makes it even harder to distinguish between compiler and runtime performance impacts.
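To make the point concrete, here is a rough sketch of the kind of build metadata I'd expect a benchmark report to print next to its numbers. It is only an illustration in Python, not anything the benchmark in question actually does: `describe_toolchain` and `time_copy` are names I made up, and the copy timing is a stand-in for a memcpy-heavy workload.

```python
import platform
import time


def describe_toolchain():
    """Collect build details a benchmark report would need to state (hypothetical helper)."""
    # libc_ver() returns empty strings on systems where it cannot detect a libc.
    libc_name, libc_version = platform.libc_ver()
    return {
        "os": platform.system(),
        "machine": platform.machine(),
        # Compiler used to build this interpreter binary, as reported by Python.
        "interpreter_compiler": platform.python_compiler(),
        "libc": f"{libc_name} {libc_version}".strip() or "unknown",
    }


def time_copy(size_bytes=16 * 1024 * 1024, repeats=5):
    """Return the best wall-clock time in seconds for copying a buffer (hypothetical helper)."""
    source = bytearray(size_bytes)
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        copy = bytes(source)  # bulk copy handled by the interpreter and its C runtime
        best = min(best, time.perf_counter() - start)
        assert len(copy) == size_bytes
    return best


if __name__ == "__main__":
    report = describe_toolchain()
    report["copy_seconds"] = time_copy()
    for key, value in report.items():
        print(f"{key}: {value}")
```

With those fields next to each result, two runs on different operating systems can at least be checked for matching compilers and runtime libraries before anyone attributes the gap to the OS.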