The former, until you run git gc, in which case the latter happens.
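
If you want to see this for yourself, something like the following works (a rough sketch; the pack filename is whatever git happens to generate):

    # before gc: every version of a file sits under .git/objects as
    # its own full, zlib-compressed blob
    find .git/objects -type f | head

    # after gc the blobs are delta-compressed into a pack; deltified
    # objects show a chain depth and a base object in this listing
    git gc
    git verify-pack -v .git/objects/pack/pack-*.idx | head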



Wow! Running git gc just shrunk my .git folder from 18MB to only 3MB.
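
Easy to check on any repo (the sizes here are just the ones from this comment; yours will vary):

    du -sh .git    # e.g. 18M before
    git gc
    du -sh .git    # e.g. 3M after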


git gc runs automatically when the number of "loose" objects (objects that were created by e.g. git add and that haven't been packed yet) crosses some threshold. I don't remember what the threshold is, though.
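
If you're curious, the threshold is configurable; something like this should show it (the 6700 figure is git's documented default, if I remember right):

    # the auto-gc threshold; unset means git's built-in default
    # (approximately 6700 loose objects, per git-config(1))
    git config --get gc.auto

    # count loose vs. packed objects in the current repo
    git count-objects -v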


I believe loose objects are also packed when pushing to a remote or fetching from it. I wonder if the remote's .git folder is also about 3 MB in size.


Indeed, the git protocol exchanges packs.
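
You can watch the exchange happen, if you're curious:

    # dump the pack protocol traffic during a fetch
    GIT_TRACE_PACKET=1 git fetch origin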


Did it have binary data? If so, try moving to LFS and running the BFG cleaner, and watch it shrink even more.
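
Roughly something like this (the patterns, size limit, and jar/repo names are just examples):

    # rewrite history so matching files live in LFS instead
    git lfs migrate import --include="*.png,*.zip"

    # or strip big blobs from history with the BFG, then prune
    java -jar bfg.jar --strip-blobs-bigger-than 10M my-repo.git
    cd my-repo.git
    git reflog expire --expire=now --all
    git gc --prune=now --aggressive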


No binary data. Just an ordinary git repo with a bunch of source code files (mostly C and Lua).


That's even more impressive, really. I reckon it has a decent amount of history and was never pruned before.


Do you know if there's an easy way to do this without running git gc? Say, by just supplying a diff or something?

In particular, I'm thinking of the case of an append-only log file: committing the changes every so often without having to make a full blob copy of the (possibly large) file each time.


There isn't one at the moment.

The closest would be to create a pack manually that contains a diff against the previous version, but that'd require manual work.

It would be possible, for example, to modify git-fast-import to a) allow it to take diffs as input and b) allow it to store those diffs (these are two different things to deal with). The downside is that the more packs there are, the slower object lookup becomes, which can make everything much slower. Newer versions of git have cross-pack indexes to deal with that, but I don't think they're enabled by default.
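
(For reference, the cross-pack index is the multi-pack-index; turning it on by hand looks roughly like this, assuming a git new enough to ship it:)

    # build one index spanning all packs, so lookups don't have to
    # binary-search each pack's index separately
    git multi-pack-index write

    # tell git to use the multi-pack-index when reading objects
    git config core.multiPackIndex true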

Another option would be to add a new format for loose objects that allows storing diffs, but that has backwards-compatibility implications.


This is sort of a worst-case scenario for git: there are patch-based DVCSes that would handle this scenario better (Darcs and Pijul), but those have their own set of trade-offs (I think they end up being slower for large histories).


Applying a patch in Darcs used to be O(2^n), and is now apparently O(n^2), where n is the size of the history.

Applying a patch in Pijul is O(p c log n), where p is the size of the patch and c is the size of the largest "deletion-insertion conflict" p is involved in. A "deletion-insertion conflict" is a situation where Alice deletes a block of text while Bob adds something in that same block.

Note that this is a rough bound: all non-conflicting operations in a patch are O(log n), and only those involved in a "deletion-insertion conflict" are O(c log n).

So, Pijul is in fact faster than Git for merging (and rebasing). The only tradeoff at the moment is that going arbitrarily far back in history isn't as fast as it could be (this will be fixed very soon).


Interesting, I’ve been following these systems for a while, but I didn’t realize that pijul had solved the performance issues.

Is pijul’s on-disk format stable yet?


> Is pijul’s on-disk format stable yet?

Probably. The patch format is very unlikely to change. The repository format may change a little bit still.

I'd say it's probably ok to try and learn it now, but you should maybe wait for a few weeks before using it for something serious. On the other hand, we use it for itself, and I use it personally for most of my projects.


Eh? Why would git gc get rid of the old blob? It will still be referenced by the old commits.


It doesn't. It packs it.


Is git gc the only thing that can trigger packing?



There is also git pack-objects and git repack.
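
For completeness, a sketch of each (the second one is the plumbing-level route):

    # porcelain: repack everything into a single pack and
    # delete the now-redundant old packs
    git repack -a -d

    # plumbing: feed object names to pack-objects yourself
    git rev-list --objects HEAD | git pack-objects .git/objects/pack/pack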



