Hacker News new | past | comments | ask | show | jobs | submit login

> Git’s packfile format uses delta compression, which is essentially “storing diffs”.

FWIW it's closer to lossless compression e.g. LZ77's behaviour: git's delta compression has two commands which are LITERAL and COPY, and technically it doesn't need to copy linearly from the base object, it would be unlikely[0] but a delta could start by COPY-ing the end of the base object, then add literal data, then COPY the start of the base object (or COPY the end again, or whatever, you get the point).

[0] outside of handcrafting, as the packing process would need to find such a situation which it would estimate useful




Many years ago, I wrote a some code that made pack files with out of order copy commands. (Just messing around in Go trying to learn how pack files worked.)

I was too lazy to implement the diffing algorithm, so I used the suffixarray from the go standard library and greedily grabbed the longest match at each point in the input. It's been a while, but I think I got a 12% improvement on my test case. (Probably wouldn't do that well in general.)

I'm assumed the git code just runs off of a diff (longest common substring), but I never checked.




Consider applying for YC's Fall 2025 batch! Applications are open till Aug 4

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: