> My experiments suggest that about a quarter of this could be eliminated simply by enabling and then mandating the use of LZMA (XZ) compression for sdists and wheels (which currently must both use ordinary Gzip compression).
That won't happen, as there is a slow effort behind the scenes to get Zstandard into the standard library, which is a far better option.
Why is the effort "slow"? As far as I can tell, LZMA still compresses better than ZSTD; ZSTD is just faster. So LZMA seems to be the better solution to the bandwidth cost problem.
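If anyone wants to check that claim on their own data, here's a quick sketch (the `zstandard` import is the third-party python-zstandard binding, and the level choices are just one plausible pairing, not what PyPI would use):

```python
import lzma
import zstandard  # third-party python-zstandard package, assumed installed

def compression_ratios(raw: bytes) -> dict[str, float]:
    """Compare LZMA (xz) and Zstandard output sizes on the same input."""
    xz = lzma.compress(raw, preset=9)                       # slower, usually smaller
    zst = zstandard.ZstdCompressor(level=19).compress(raw)  # faster, usually a bit larger
    return {"xz": len(xz) / len(raw), "zstd": len(zst) / len(raw)}

# e.g. compression_ratios(open("some_wheel_payload.bin", "rb").read())
```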
While I agree with your conclusion, the correct analysis needs to look at both the expected transfer rate and the decompression speed, because otherwise you might end up picking a very slow algorithm (like, literally 10 KB/s). LZMA is thankfully reasonably fast to decompress (> 10 MB/s), so it is indeed a valid candidate, though the exact choice would depend heavily on what is being compressed and who would actually do the expensive compression.
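To make that tradeoff concrete, a rough back-of-the-envelope model (all sizes and throughput numbers below are illustrative assumptions, not measurements):

```python
def end_to_end_seconds(compressed_mb: float, decompressed_mb: float,
                       link_mb_s: float, decomp_mb_s: float) -> float:
    # Transfer time is driven by the compressed size; decompression time by
    # the unpacked size and the decompressor's output throughput.
    return compressed_mb / link_mb_s + decompressed_mb / decomp_mb_s

# Hypothetical 40 MB xz wheel unpacking to 120 MB, on a 10 MB/s link,
# decompressing at ~100 MB/s: 40/10 + 120/100 = 5.2 s.
# A slightly larger hypothetical 45 MB zstd wheel decompressing at
# ~1000 MB/s: 45/10 + 120/1000 = 4.62 s.
```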
Currently the model is that compression is done by the package uploaders, but I don't see a reason why uploaded files couldn't be (re)compressed on the server. Again, there would be vastly fewer compression events than decompression (after download) events. Aside from that, it would be better if the standards allowed for multiple compression formats. Any effort to start using a new format for files that already exist in a different one could then be phased in (prioritizing the packages where the biggest savings are possible).
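As a minimal sketch of what server-side recompression could look like (assuming an uploaded `.tar.gz` sdist, and nothing about how PyPI actually stores files):

```python
import gzip
import lzma
import shutil

def recompress_sdist(gz_path: str, xz_path: str) -> None:
    # Decompress the uploaded .tar.gz stream and re-wrap the inner tar as
    # .tar.xz; the tar contents themselves are untouched.
    with gzip.open(gz_path, "rb") as src, lzma.open(xz_path, "wb", preset=9) as dst:
        shutil.copyfileobj(src, dst)

# recompress_sdist("package-1.0.tar.gz", "package-1.0.tar.xz")
```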
> I don't see a reason why uploaded files couldn't be (re)compressed on the server.
The reason would of course be the computational expense and the added latency before a package becomes available, which vastly limits the algorithm choice. LZMA is probably still okay under this constraint (its compression speed is on the order of 1 MB/s at the slowest setting), but the gain from using LZMA instead of Zstandard is not that big anyway.
I presume that the vast majority of big wheel files are for compiled and bundled shared libraries. (Plain Python source files are much easier to compress.) You can estimate the potential gain by looking at various compression benchmarks, such as the `ooffice` file from the Silesia corpus [1], and the difference is not that big: 2426 KB for 7zip (which uses LZMA2 with a BCJ filter) and 2618 KB for zstd 0.6.0, less than a 10% difference. And I believe, from experience, that BCJ is responsible for most of that difference, because x86 machine code is fairly difficult to compress without some rearrangement. The filter is much faster than compression and at least as fast as decompression (>100 MB/s), so there is not much reason to use LZMA when you can instead do the filtering yourself.
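For reference, the 7zip-style configuration quoted above (LZMA2 with a BCJ filter) can be reproduced with Python's stdlib `lzma` module; the preset here is an arbitrary choice for the sketch:

```python
import lzma

def compress_native_code(raw: bytes) -> bytes:
    # The x86 BCJ filter rewrites relative call/jump targets into absolute
    # ones, so repeated addresses become literal repeats; LZMA2 then does
    # the actual compression.
    filters = [
        {"id": lzma.FILTER_X86},
        {"id": lzma.FILTER_LZMA2, "preset": 9 | lzma.PRESET_EXTREME},
    ]
    return lzma.compress(raw, format=lzma.FORMAT_XZ, filters=filters)
```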
I think you're thinking of UV; Rye is a project that they adopted. As far as I understand, UV is where development is happening, and Rye is actually soft-deprecated.
Very interesting; I'm quite excited for these future performance improvements! In small-to-medium sized projects the time isn't very noticeable, but in larger projects it definitely is noticeable in CI.
edit: I should say for larger test suites rather than larger projects as such, but larger projects do tend to have larger test suites.
Yes, I saw the issue was already reported in hatch and being worked on so I assumed it'd make its way into rye eventually! Thanks for closing the loop!