
Deduplicating compressed data is difficult if you want to be able to reconstruct the original. Zip, for example, has many ways to compress the same data, so it's difficult to decompress a zip and keep the information necessary to reconstruct the exact same compressed file.
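
To make that concrete, here is a minimal sketch using Python's `zlib` module (DEFLATE is the usual compression method inside zip). The sample data and variable names are made up for illustration. The same input compressed at two levels decompresses to identical bytes, but the compressed streams differ, so the decompressed data alone does not tell you which stream to rebuild:

```python
import zlib

# Hypothetical sample payload; any compressible input shows the same effect.
data = b"hello hello hello hello " * 100

fast = zlib.compress(data, 1)  # fastest setting
best = zlib.compress(data, 9)  # best-compression setting

# Both streams decode to the same original bytes...
same_payload = zlib.decompress(fast) == zlib.decompress(best) == data
# ...but the compressed bytes themselves are not equal.
same_bytes = fast == best

print(same_payload, same_bytes)
```

A real zip adds more degrees of freedom on top of this (entry order, timestamps, extra fields, the encoder implementation), which is why exact reconstruction generally needs extra metadata recorded at decompression time.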

I'm aware of one tool for zip:

https://pypi.org/project/deterministic-zip/

Of course, that only helps if it's used to create the original zip.

Is there a list of "reversible" formats somewhere? Are there other ways to deal with this?




It would be great if this was built into zip and tar.

I've been doing an awful find, xargs, and touch combo to get deterministic zips, but the CLI args vary between Unix distributions.

It would be better if zip and tar could just take a single timestamp arg, and use that for all entries.
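
In the meantime, a rough sketch of the "single timestamp for all entries" idea with Python's standard `zipfile` module. The `deterministic_zip` helper, the `FIXED_TIME` constant, and the fixed `0o644` permissions are my own assumptions, not an existing tool's behavior. It sorts the entries and stamps each one with the same time:

```python
import io
import zipfile

# Assumed default: 1980-01-01 is the earliest date the zip format can store.
FIXED_TIME = (1980, 1, 1, 0, 0, 0)


def deterministic_zip(files, date_time=FIXED_TIME):
    """Build a zip in memory from a dict of {archive_name: bytes}.

    Every entry gets the same timestamp, and entries are written in
    sorted order so the input ordering does not affect the output.
    """
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name in sorted(files):
            info = zipfile.ZipInfo(name, date_time=date_time)
            info.compress_type = zipfile.ZIP_DEFLATED
            info.create_system = 3            # pin "Unix" so the host OS doesn't leak in
            info.external_attr = 0o644 << 16  # fixed permissions for every entry
            zf.writestr(info, files[name])
    return buf.getvalue()


a = deterministic_zip({"b.txt": b"two", "a.txt": b"one"})
b = deterministic_zip({"a.txt": b"one", "b.txt": b"two"})
print(a == b)
```

This should give byte-identical output across runs on the same Python and zlib build; a different zlib version can still produce a different compressed stream, so it does not solve the general reconstruction problem.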



