I have to admit that I learned a lot of these things fairly recently. The large repository stuff has been added into core piece by piece by Microsoft and GitHub over the last few years, so it's hard to find one place that describes everything they've done. Hope it's helpful.
I've also had some fun conversations with the Mercurial guys about this. They've recently started writing some Hg internals in Rust and are getting some amazing speed improvements.
I'm also thinking of doing a third edition of Pro Git, so if there are other things like this that you have learned about Git the hard way, or just want to know, let me know so I can try to include it.
"git fza" shows a list of modified/new files in an fzf window, and you can select each file with tab plus arrow keys. When you hit enter, those files are fed into "git add". Needs fzf: https://github.com/junegunn/fzf
"git gone" removes local branches that don't exist on the remote.
"git root" prints out the root of the repo. You can alias it to "cd $(git root)", and zip back to the repo root from a deep directory structure. This one is less useful now for me since I started using zoxide to jump around. https://github.com/ajeetdsouza/zoxide
This one I'm less sure about. I haven't yet gotten it to the point where I really like using it, but I'm sharing since someone might find it useful as a starting point:
It's intended to be used for creating a cherry-picking branch. You give it a branch name, let's say "node", and it creates a branch with that as its parent, and the short commit hash as a suffix. So running "git brancherry node" creates the branch "node-abc1234" and switches to it.
The intended workflow is that you cherry-pick into that branch and create a PR, which then gets merged into the parent.
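For anyone who wants to try "git root": the alias definition isn't shown above, so this is only a sketch that assumes it wraps `git rev-parse --show-toplevel`. It sets the alias inside a throwaway repo (the temp directory and nested path are made up for the demo).

```shell
# Throwaway repo with a deep directory (hypothetical layout).
repo=$(mktemp -d)
git init -q "$repo"
mkdir -p "$repo/a/b/c"
cd "$repo/a/b/c"

# Assumed definition of the alias; add --global to use it in every repo.
git config alias.root 'rev-parse --show-toplevel'

# Prints the absolute path of the top of the working tree.
root=$(git root)
echo "$root"

# Jump back to the repo root from the deep directory.
cd "$root"
```

With the alias set globally, a shell alias or function that runs `cd "$(git root)"` gives the "zip back to the root" behaviour described above.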
Be prepared to hand out the difftastic URL and install instructions a lot :) I get asked "what git setting is that?" when I do diffs while sharing my screen.
meta-tip: you can also put your aliases that start with '!' into stand-alone shell scripts named `git-fza` (e.g.) and then call it as `git fza` which will search your PATH for `git-fza` and invoke it as if it's built-in.
I do this for some of my more complicated aliases because I generally think it's poor form to embed shell scripts into configuration languages. (Looking at you, yaml.)
The scripts have to be in your `PATH` and be executable from wherever you're running `git`.
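A minimal sketch of that meta-tip, assuming a made-up script called `git-hello` in a temporary directory (neither name comes from the comment above):

```shell
# Hypothetical script directory and script name, for illustration only.
bindir=$(mktemp -d)
cat > "$bindir/git-hello" <<'EOF'
#!/bin/sh
# Arguments after "git hello" are passed straight through.
echo "hello from git-hello: $*"
EOF
chmod +x "$bindir/git-hello"

# Put the directory on PATH using an absolute path.
PATH="$bindir:$PATH"
export PATH

# git finds git-hello on PATH and runs it like a subcommand.
out=$(git hello world)
echo "$out"
```

The script can be written in any language, as long as the file is executable and its name starts with `git-`.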
Say you have a script named `git-foo`. At the shell prompt, all of these should work:
$ which git-foo
$ git-foo
$ git foo
If the first or second commands fail, then `git-foo` is not in your PATH or is not executable. If those both work but the third command fails, I have no explanation. Here's the code which runs commands:
It basically prefixes your `PATH` (or a suitable default if `PATH` isn't set) with `GIT_EXEC_PATH` (defaulting to a compiled in value if not set) and then uses the normal Unix execvp machinery to run the command.
You can try:
$ GIT_TRACE=1 git foo
But I'm not sure that will tell you anything helpful.
$ which git-foo
$ type git-foo
git-foo is /Users/my.user/.local/bin/git-foo
$ git-foo --help
Help output from git-foo
$ GIT_TRACE=1 git foo
14:26:18.849078 git.c:749 trace: exec: git-foo
14:26:18.849815 run-command.c:657 trace: run_command: git-foo
git: 'foo' is not a git command. See 'git --help'.
I can't say I've ever seen `which` and `type` disagree before ...
And today I learned that while bash expands `~` in PATH entries, other programs do not. The fix was changing my PATH from:
git has had this behavior for at least a decade. Also, macOS does not ship with git: it's installed as part of the Command Line Tools package and/or Xcode, and is reasonably up to date.
For many years now, macOS has included what are effectively wrappers in /usr/bin for the various development tools; they use the xcode-select mechanism to run the actual command. If neither Xcode nor the CLT package is installed, you'll get a prompt to install the CLT package.
In the part about whitespace diffs, you might want to mention ignore-revs-file [0]. We check an ignore-revs file into the repo, and anyone who does a significant reformat adds that SHA to the file to avoid breaking git-blame.
One other thing you might want to mention, which is obvious after thinking about it, is that updating the ignore-revs file has to occur in a commit after the one that you want to ignore, since you don't know what that first commit's ID is till after you make it. :-)
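Here is a rough sketch of that two-commit workflow in a throwaway repo. The file name `.git-blame-ignore-revs` is a common convention and an assumption here, as are the demo identity and the sample C file; `blame.ignoreRevsFile` is the real config key.

```shell
# Demo identity so commits work in a clean environment (made-up values).
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)
cd "$repo"
git init -q

# Commit 1: the original line.
printf 'int x=1;\n' > f.c
git add f.c
git commit -q -m 'add f.c'
orig=$(git rev-parse HEAD)

# Commit 2: a whitespace-only reformat of the same line.
printf 'int x = 1;\n' > f.c
git commit -q -am 'reformat'
reformat=$(git rev-parse HEAD)

# Commit 3: only now is the reformat's ID known, so it is recorded afterwards.
echo "$reformat" > .git-blame-ignore-revs
git add .git-blame-ignore-revs
git commit -q -m 'ignore reformat commit in blame'

# Tell blame to skip the listed commits (run from the repo root).
git config blame.ignoreRevsFile .git-blame-ignore-revs

# First field of the porcelain output is the commit blamed for line 1.
blamed=$(git blame --porcelain f.c | head -n 1 | cut -d' ' -f1)
echo "line 1 is blamed on $blamed"
```

Without the config setting, `git blame --ignore-revs-file .git-blame-ignore-revs f.c` does the same for a single invocation.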
One thing about git I learned the hard way is the use of diffs and patches (more accurately, 3-way merges) for operations like merging, cherry picking and rebasing. Pro-git (correctly) emphasizes the snapshot storage model of git - it helps a lot in understanding many of its operations and quirks. But the snapshot model can cause confusion in the case of the aforementioned operations - especially rebasing.
For example, I couldn't understand why the deletion/dropping of a commit during a rebase caused changes to all subsequent commits. After all, I only asked for a snapshot to be dropped. I didn't ask for the subsequent snapshots to be modified.
Eventually, I figured out that it was operating on diffs, not snapshots (though storage was still exclusively based on snapshots). The correction on that mental model allowed me to finally understand rebasing. (I did learn later that they were 3-way merges, but that didn't affect the conclusions).
That assumption was eventually corroborated somewhere in Pro-Git or the man pages. But I couldn't find those lines again when I searched a second time. I feel that these operations can be better understood if their diff/patch nature is emphasized a bit more. My experience training people in rebasing also supports this.
PS: Thanks for the book! It's a fantastic example of what software documentation should look like.
I guess while it's true the storage layer is snapshot based, as you say, that only gets you so far conceptually, and it's probably best to focus on the _operation_ you're doing, as rebase, cherry-pick, apply-patch, etc are easier to think in terms of diffs.
When I used to use Phabricator, the fact that I could always fall back to handing it a raw patch file to submit changes also made it easier to reason about (regardless of what the server and client were actually doing).
What I'd stress is that rebasing is nothing other than automated cherry-picking, and it's hard to imagine cherry-picking as anything other than a 3-way merge or patch operation.
> Eventually, I figured out that it was operating on diffs, not snapshots
The snapshot includes all the history that led to the current snapshot. So even if you did a squash instead of dropping, you're changing everything that depends on that.
> The snapshot includes all the history that led to the current snapshot
Git snapshots don't contain any history, other than the commit chain (reference to the parent commit/s) in the commit object. While the storage format is a bit complex, they behave fundamentally like a copy of the working tree at the point of commit.
> So even if you did a squash instead of dropping, you're changing everything that depends on that
Squashes don't change the subsequent commits/snapshots either, other than the commit ID and chain. The tree itself remains untouched. You can verify this.
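One way to check this, as a sketch in a throwaway repo: four commits, then the second and third are squashed together, and the tree at the tip is compared before and after. The `sed` command is only a non-interactive stand-in for editing the todo list by hand, and the identity values are made up.

```shell
# Demo identity so commits work in a clean environment (made-up values).
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)
cd "$repo"
git init -q

# Four commits, each appending one line to the same file.
for n in 1 2 3 4; do
  echo "line $n" >> file.txt
  git add file.txt
  git commit -q -m "commit $n"
done

old_commit=$(git rev-parse HEAD)
old_tree=$(git rev-parse 'HEAD^{tree}')

# Rebase the last three commits; the todo list's second line ("commit 3")
# is changed from pick to squash, folding it into "commit 2".
# GIT_EDITOR=true accepts the combined commit message unchanged.
GIT_SEQUENCE_EDITOR="sed -i.bak '2s/^pick/squash/'" GIT_EDITOR=true \
  git rebase -q -i HEAD~3

new_commit=$(git rev-parse HEAD)
new_tree=$(git rev-parse 'HEAD^{tree}')

echo "commit: $old_commit -> $new_commit"
echo "tree:   $old_tree -> $new_tree"
```

The commit ID of the tip changes because its parent changed, while the tree ID it points to stays the same.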
Yes, but since they change the commit id, and since each commit has a pointer to its parent commit, you have to rewrite all the commit objects, even if the tree doesn't change.
The context of my first comment is about how a rebase affects the contents of the tree. It's about predicting what happens to subsequent snapshots. Bringing commit ID and parent commit into that context complicates and obscures the point I'm trying to convey.
I met you and we chatted for a bit at a bar after hours at a tech conference years ago, before you mentioned towards the end that you were a GitHub co-founder. You actually gave me some advice that has worked out well for me. Just wanted to say thanks!
One question that I have is what is happening to large file support within Git? Has that been merged into the core, since Microsoft's changes have also made it into core? Obviously there is a difference between supporting very many small files and a few very large files, but wouldn't it make sense to roll LFS into core as well?
What a great question. If I recall correctly, the LFS project is a Go project, which makes it difficult to integrate with Git core. However, I believe that the Git for Windows binary _does_ include LFS out of the box.
There was a discussion very recently about incorporating Rust into the Git core project that I think had a point about LFS then being viable for some reason, but I'd have to find the thread.
I remember watching your FOSDEM talk on YouTube, where you asked whether people have rerere turned on _and_ know what it is, in one question. I have it on, but only the faintest of clues what it is! Just git things, I suppose.
Hey little feedback on the terminal images in your posts. I'm viewing this on a phone, and it would be better if the terminal images were just the terminal (some are) and not surrounded by a large blank space which is your wallpaper. This would make it a bit easier to read on small screens, without the need to zoom in!
First off, I loved your presentation. And your book. As someone who actually bothers to read most of GitHub's "Highlights from Git" blog posts, I was somewhat familiar with some of them, but it was still very informative.
Also liked your side-swipe at people who prefer rebase over merge, I'm a merge-only guy myself...
I also took a look at GitButler and it looks like it could potentially solve one of my pain points.
If you're looking for things which are confusing to beginners, for a future version of your book, there are many useful / interesting / sometimes entertaining git discussions/rants here on HN. One of the recent ones is:
I watched the FOSDEM talk yesterday, and I laughed hard when I heard "Who use git blame -L? Does anybody know what that does?" because it suddenly looked like the beginning of a git wat session. But it was really informative, I learned a lot of new things! Thanks
A part of Git's complexity is due to the fact that it was originally meant to be just the plumbing. It was expected that more user-friendly porcelain would be written on top of the git data model. Perhaps that is still the best bet at having a simple and consistent UI. Jujutsu and Got (game of trees) are possible examples.
It's a collection of hacky tools for manipulating a DAG of objects, identified by a SHA-1 hash. If you look at it this way, you wouldn't expect any consistency in the CLI interface.
I don’t think this is a fair characterization. The reason git is confusing is that its underlying model doesn’t resemble our intuitive conceptual model of how it ought to work.
This was classic Torvalds — zero hand holding. But he gets away with it because the way git works is brilliantly appropriate for what it’s intended to do (if you just ignore the part where, you know, mere mortal humans need to use it sometimes). I ended up writing my masters thesis a decade ago about the version control wars, and I (somewhat grudgingly) came away in awe at Torvalds’ technical and conceptual clarity on this.
> The reason git is confusing is that its underlying model doesn’t resemble our intuitive conceptual model of how it ought to work.
No. The reason git is confusing is that the high-level commands have very little thought put into them, they are indeed “a collection of hacky tools to manage a DAG of objects”.
That the underlying model shines through so much is a consequence of the porcelain being half-assed and not designed. The porcelain started as a bunch of scripts to automate common tasks. The creators and users of those scripts knew exactly what they wanted done, they just wanted it done more conveniently. Thus the porcelain was developed and grouped in terms of the low level operations it facilitated.
If you mean the plumbing part, I recalled it from memory. I don't have anything from Linus to back this up. But have a look at this from the Pro-Git book [1]:
> But because Git was initially a toolkit for a version control system rather than a full user-friendly VCS, it has a number of subcommands that do low-level work and were designed to be chained together UNIX-style or called from scripts.
Note that its author (schacon) is also the author of the article and is replying in this discussion thread.
I also remember reading somewhere that this design was the reason for the complexity in the porcelain. Will update if I find a reference.
Boy, I can't find this either (but also, the kernel mailing list is _really_ difficult to search). I really remember Linus saying something like "it's not a real SCM, but maybe someone could build one on top of it someday" or something like that, but I cannot figure out how to find that.
You _can_ see, though, that in his first README, he refers to what he's building as not a "real SCM":
Here is what I found based on your lead ("real SCM", from 17 Apr 2005):
> That was really what I always personally saw "git" as, just the plumbing beneath the surface. For example, something like arch, which is based on "patches and tar-balls" (I think darcs is similar in that respect), could use git as a _hell_ of a better "history of tar-balls".
So, I found the git-pasky project in the _very_ early days (like a couple days after Linus's first git commits) and iirc, it was an attempt to build an SCM-like thing over the plumbing that Linus was working on:
I wouldn't say it's very bold at all. I don't have any links, but if you've been using git for the past decade, you would have heard something along these lines. "A toolkit for building VCS's" is one thing I remember reading. There was little in the way of polish when it came to porcelain commands when people started using it. I think many people who don't use it still think it's this way.
this describes all of unix. as soon as scripts were allowed to use commands, those commands could never be changed. lest we have a nerd riot on our hands
Ha, replacement? You can't even get them to fix bugs. If you fix a bug in a unix command you'll break every script in existence and bring the world down. It's idiotic.
The user's a file! The internet's a file! Keyboard is a file! What are checkboxes? This is a volunteer project! You can't expect us to include UI in the OS! We'll just bikeshed forever so sorry, write your own, lol.
You clearly have no idea what you are chattering about. The saying "Don't break userspace" is for the kernel. It has nothing to do with userspace programs potentially affecting other userspace programs.
Please don't post personal attacks to HN, and please follow the site guidelines in general, including the one about not calling names, and also the one about not fulminating:
I found and fixed a bug in Debian’s vixie-cron where if the system time clock changed without restarting crond, it wouldn’t modify its runs until the next DST event.
This was well-received without complaint or concern for breaking people’s [insane] workflows that may be relying on that behavior.
This describes all of programming. They are called dependencies and they tend to be versioned. Breaking changes affect literally every aspect of software development. Software that isn’t maintained will no longer function at some point in the future.
> as soon as scripts were allowed to use commands, those commands could never be changed
That's not a script thing, that's an API surface thing, and even then only applies to backwards-incompatible changes. You can change the arguments to git or chmod just as easily as printf() or fork()
As I said here already, the difference is that scripts are interpreted, rarely if ever check what version they're running on before they attempt to do something, and the authors of the scripts have been explicitly encouraged to memorize a heapload of letter permutations and throw a thermonuclear systemd-sized fit if something changes.
I understand your sentiment but git is really not all that hard. And knowing a few things that go beyond bog-standard checkout/commit/push, especially history-rewriting activities, will greatly improve quality of commit-history - which might not be of much use for you but might help other engineers working on your project to make easier sense of what's going on.
And on another note, git is probably one of the longer-lasting constants in our industry. Technologies develop and change all the time, but for git, it looks like it's here to stay for a while, and it's probably one of the tools we interact with most in day-to-day dev-work. Might be worth having a bit of a look at :)
Isn’t that where most interest starts? A computer really is a tool. I know for me, it was an unfortunate discovery at the very start of my interest in computing that to do the things I wanted I had to deal with all these tedious bits of programming.
Even today I’d like to skip most of the underlying tedious bits although I understand knowledge and willingness to deal with much of those underlying tedious bits are what keep money flowing into my account regularly. That’s about the only saving grace of it. There are so many ideas I’d love to explore but the unfortunate fact is there’s a lot of work to develop or even glue together what one needs to test out, not to mention associated infrastructure costs these days. Even useful prototypes take quite an endeavor.
My feeling is that the git interface is a leaky abstraction. I also don't want to learn git tricks, but unfortunately I learned more about it than I wanted to.
> do not want to learn git tricks. I just wanna use it as simple as possible.
Simplicity is in the eye of the beholder. A single trick can save you a whole lot of work. Take for example interactive rebase, which lets you rework your local branches by merging and reordering local commits. If you had to do everything by hand, you would certainly have to work a lot more.
I had the same experience for a long time and then I took a bit of time to have a deeper look behind the curtain and I have to say, once you grasp the data-model of git itself (a branch is a pointer to a commit, a commit is a pointer with metadata to a tree, a tree is...), many of the commands start to make sense all of a sudden, or at the very least "stop looking dangerous".
As it's one of those rare tools that's probably meant to stay for quite some time and we interact with quite frequently, it was time well spent for me, and it turns out it's really not as hard as the scary-looking commands imply.
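For anyone who wants to poke at that data model directly, here is a small sketch using plumbing commands in a throwaway repo (the file name and identity are made up):

```shell
# Demo identity so commits work in a clean environment (made-up values).
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)
cd "$repo"
git init -q
echo hello > readme.txt
git add readme.txt
git commit -q -m 'first commit'

# A branch is a name that resolves to a commit ID.
branch=$(git symbolic-ref --short HEAD)
commit=$(git rev-parse "refs/heads/$branch")
echo "$branch -> $commit"

# The commit object holds metadata plus a pointer to a tree.
git cat-file -t "$commit"
git cat-file -p "$commit"

# The tree lists the blobs (and subtrees) that make up the snapshot.
tree=$(git rev-parse "$commit^{tree}")
git cat-file -t "$tree"
git cat-file -p "$tree"
```

`git cat-file -p` pretty-prints any object, so the same two commands can be used to walk from a branch all the way down to file contents.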
These generic statements can be made about basically any technology (MS MFC anybody? (L)DAP? IBM WebSphere Studio J2EE abominations?) if you are smart enough or have enough time to dig around. They don't help the discussion at all (and plenty of folks complain about git all the time), since one can't avoid being branded as lazy/stupid if one isn't grokking this uber-important yet trivial tool like me (TM).
But then there is Mercurial. I used it a decade and a half ago, and it contained literally everything good about the distributed model I could ever wish for, with maybe 50% of Git's complexity. Yet cargo-culting gonna cargo-cult: if Linus uses it, so must we, since we are not subpar, and the rest be damned.
Yes, sure, it's a tool that's here to stay, and it can eventually be learned well. But its design is far, very far from the most important software design principle (KISS).
> But its design is far, very far from the most important software design principle (KISS)
My suggestion was more to look at the underlying data-model, which really isn't that much harder to grasp than what your average undergrad datastructure course teaches. Git really does solve a rather complex problem in a quite elegant way - it just so happens that the packaging around it (the cli) is indeed a bit more controversial.
I can only speak for myself, but once I started to look at git less like "a tool" and more from the perspective of data structures and algorithms (which are inherently agnostic to how they're implemented), it started to make sense to me rather quickly (a matter of hours actually, which is nothing compared to the countless hours I'd already wasted trying to find the right cheat-sheet incantation before).
As long as you remember that the reflog exists (and it hasn’t run gc, but usually you immediately know when you’ve messed up), you’ll be fine. It’s exceedingly hard to break your repo beyond repair without trying to do so.
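A minimal sketch of that kind of recovery, in a throwaway repo with a deliberately "lost" commit (file name and identity are made up):

```shell
# Demo identity so commits work in a clean environment (made-up values).
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)
cd "$repo"
git init -q
echo one > f.txt
git add f.txt
git commit -q -m 'first'
echo two >> f.txt
git commit -q -am 'second'
lost=$(git rev-parse HEAD)

# The mistake: throw away the latest commit.
git reset -q --hard HEAD~1

# The reflog still records where HEAD was before the reset.
git reflog -n 2

# HEAD@{1} is the previous position of HEAD, i.e. the "lost" commit.
git reset -q --hard 'HEAD@{1}'
recovered=$(git rev-parse HEAD)
echo "recovered $recovered"
```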
It's unfortunate that the weight of ecosystem and tooling (and the 800-pound Microsoft-owned GitHub gorilla) has effectively locked the profession into using git. I don't hate it, I'm just keenly aware that a better approach is possible.
I wish someone with deep pockets would hook the pijul team up with the money and talent they need to make pijul a full-featured alternative with first-class hosting tools. The way it models change is principled and based on solid theory, and I'm convinced that a markedly better tool than git could be built on that foundation.
This is a complete non-sequitur. Whether I use a point-and-click interface or a CLI has nothing to do with the fact that I have to use a git-based workflow and can't just copy files to the server as a deployment.
Totally agree. However, then coworkers who don't understand even the simple git commands mess up their branches (somehow), and... then my git tricks save the day (unfortunately).
I don't totally disagree. I love Git and I find all these things very cool, but I know it's overhead a lot of people don't want. The post is on the blog of the new GUI that I'm trying to build to make the cool things that Git can do much faster and more straightforward, so maybe check it out if the CLI isn't your favorite thing.
Beyond a junior engineer, I’d expect an engineer to know more than the basics if they’ve been using git for their entire career so far.
Git is the power saw for software engineers. You don’t want someone who can’t keep all their fingers and toes anywhere near your code.
Not knowing git, when you’ve been interacting with it for years, is a red flag for me. I’m not expecting people to know the difference between rebase and rebase --onto, but they should at least know about the reflog and how to unfuck themselves.
Learnt something new about core.fsmonitor. Thanks.
On the subject of large monorepos, I wish "git clone" had a resume option.
I had this issue back in the 2000s when trying to clone the kernel repo on a low-bandwidth connection. I was able to get the source only after asking for help on a list, and someone was kind enough to host the entire repo as a compressed tar on their personal site.
I still have this problem occasionally while trying to clone a large repo on a corporate VPN that can disconnect momentarily for any reason (mainly ISP level). Imagine trying to clone the Windows repo (300GB) and then losing the wifi connection for a short time after downloading 95%.
It is wild that both git and docker, two of the major bandwidth-intensive tools of the modern development stack, don't have proper support (afaik) for resuming their downloads.
I suppose you could do this by shallow cloning and then expanding it multiple times. But yes, the fetch/push protocols really expect smaller repos or really good inet connections and servers.
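Roughly what that could look like, sketched against a local throwaway "remote" so it runs offline; for a real repo the `file://` URL would be the actual remote URL, and the depth increments are arbitrary. This doesn't give a true resume, it just splits the download into smaller fetches.

```shell
# Demo identity so commits work in a clean environment (made-up values).
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Stand-in for the big remote repo: five commits of linear history.
src=$(mktemp -d)
cd "$src"
git init -q
for n in 1 2 3 4 5; do
  git commit -q --allow-empty -m "commit $n"
done

# Step 1: shallow clone with only the newest commit.
dst=$(mktemp -d)
git clone -q --depth 1 "file://$src" "$dst/clone"
cd "$dst/clone"
d1=$(git rev-list --count HEAD)

# Step 2: extend the history by two more commits.
git fetch -q --deepen=2
d2=$(git rev-list --count HEAD)

# Step 3: fetch everything that is still missing.
git fetch -q --unshallow
d3=$(git rev-list --count HEAD)

echo "commits available after each step: $d1 $d2 $d3"
```

If the connection drops during one of the fetches, only that step has to be repeated rather than the whole clone.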
I read (and upvote) anything git related by Scott Chacon. He was instrumental in me forming my initial understanding of the git model/flow more than 10 years ago, and I continue to understand things better by consuming the content he puts out. Thanks Scott!
I'd like to see anyone else solve the challenge of many people contributing code towards different releases, different features, hotfixes, tagging releases, going back to find bugs, with an "easier" interface.
It's like people who want a low-level language that hides all the complexity of the system: the two are mutually exclusive. I'm happy with git, it's not that hard to learn, and some people need to just grow some (metaphorical) balls and learn git.
That's why I'm a huge shill for gitkraken. It's a paid product so I'm a little hesitant sometimes but I've used them all and nothing compares to the power it unleashes. It completely lifts the curtain on the black box that many developers experience in the terminal and puts the graph front and center. It exposes rebasing operations in an effortless and intuitive visual way that makes git fun. As a result, I feel really proficient and I'm not scared of git at all. I can fix just about anything and paint the picture I want to see by carefully composing commits rather than being at the mercy of the CLI. I still see CLI proficiency as a valuable skill but it's so painful sometimes to watch seasoned 10 yr developers try to solve the most basic problems or completely wreck the history in a project because they're taught you can't be a real engineer if you don't use the git CLI exclusively. Lately I've resorted to arguing "use the CLI but you should at least be looking at the graph in another window throughout the day - which you can do for free in vs code, jetbrains, or even the CLI"
For example: anytime one of my teammates merges a PR, I see it and I rebase my branch right away. As a result my branch is always up to date and based on main, so I never run into merge hell or drop those awful "fix conflicts" commits in the history.
I never really understood why the majority of developers insist on using the git CLI, when modern UI clients like GitKraken [0] are perfectly usable and very helpful. :shrug:
Knowledge of the CLI transfers to writing scripts. Knowledge learned from using Git in scripts transfers to day-to-day use.
Also, if I SSH into my Raspberry Pi that I'm using as server, I don't want to feel useless just because I'm forced to use a CLI.
I'm not entirely against using a GUI, it's just that at this point I'm more efficient using the CLI, and I don't want to spend effort searching for a GUI that is:
* Good-looking.
* Is native, not an outdated vendored copy of a web browser.
* Doesn't have telemetry, or at least it's disabled by default.
* Is fully open source; not open-core or proprietary.
* I can reasonably expect that it won't disappear 5 years in the future.
* That it doesn't make things more confusing. Like for example Visual Studio[1] having a button that very ominously says "Accept merge", when it really means "Mark conflict as resolved". If an IDE wants to use its own cute way of labelling things, good; I can accept more friendly terms that make it more approachable for a wider audience. But at least don't make things confusing for people that already expect certain words to mean certain things.
* That I can trust that it won't "helpfully" do fancy stuff, like having a button saying "Commit changes" that "helpfully" also pushes to remote. I don't know if any GUI does this, but my trust is low.
[1]: I had to use it at a previous company because it was the only realistic way to work with their codebase.
Shrug back at ya. I find the cli perfectly usable. I use an editor plugin as well but if I'm already on the command line I use it there. Having to switch to a different program just to make a commit kills the desire to commit often.
I do as little with git as possible unless I'm facing some very specific issues, so for me at least it seems overkill to use a GUI for essentially just push, pull, and checkout.
Looks neat, but I tend to get way too distracted by graphical interfaces. I assume it's really a question of personal preference. CLIs are faster to use, but have a bigger learning curve. (we will probably not solve that debate here, but I do wonder sometimes, whether to recommend the CLI or not)
Most of my git usage on the CLI is nothing fancy, just a few commands, but I keep a text file for some tips/tricks I don't use regularly.
Thanks, I knew about -committerdate but not that you can set it as default sort, super useful. A few notes...
1. git columns gets real confusing if you have more data than fits the screen and you need to scroll. Numbers would help...
2. git maintenance sounds great but since I do a lot of rebases and stuff, I am worried: does this lose loose objects faster than gc would? I see gc is disabled but it's not clear.
3. Regarding git blame a little known but super useful script is https://github.com/gnddev/git-blameall . (I mean, it's so little known I myself needed to port it to Python 3 and I am no Python developer by any stretch.)
> git maintenance sounds great but since I do a lot of rebases and stuff, I am worried: does this lose loose objects faster than gc would? I see gc is disabled but it's not clear.
“gc” is disabled for the scheduled maintenance. It’s enabled as a task when running “maintenance run” explicitly.
It would not collect loose objects faster than gc would, because it just runs gc.
Is there a way to do the reverse sort? A simple solution to branches going off screen seems to be to have the latest branch last. That's what I do (although with a custom script).
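As far as I know, dropping the leading `-` from the sort key gives ascending order, which puts the most recently committed branch last. A small sketch in a throwaway repo, with made-up branch names and commit dates:

```shell
# Demo identity so commits work in a clean environment (made-up values).
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)
cd "$repo"
git init -q

# Three branches whose tip commits have different committer dates.
GIT_COMMITTER_DATE='2019-01-01 12:00:00 +0000' \
  git commit -q --allow-empty -m 'base'
git checkout -q -b old
GIT_COMMITTER_DATE='2020-01-01 12:00:00 +0000' \
  git commit -q --allow-empty -m 'older work'
git checkout -q -b new
GIT_COMMITTER_DATE='2021-01-01 12:00:00 +0000' \
  git commit -q --allow-empty -m 'newer work'

# Descending (newest first), as with the -committerdate default.
git branch --sort=-committerdate --format='%(refname:short)'
first_desc=$(git branch --sort=-committerdate --format='%(refname:short)' | head -n 1)

# Ascending (newest last): same key without the leading minus.
git branch --sort=committerdate --format='%(refname:short)'
last_asc=$(git branch --sort=committerdate --format='%(refname:short)' | tail -n 1)
```

To make ascending order the default, the same key can go into the `branch.sort` config setting.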
I stopped pretending as if I know what I am doing and instead use visual Git tools, such as SmartGit or the one that comes with IntelliJ. Being a Git "command-line hero" is for show offs.
Porcelain can be just infuriatingly confusing. For example, "Yours and Theirs" can mean the opposite in different contexts. The whole user interface has no common style or theme - it needs a new "visual" layer in order to not drive one up the wall.
Same. Having the diff all the time is nice, and a visual check at a glance of what is about to happen is very nice, without the need to run a bunch of extra commands.
I know enough about the app, and git in general, to get my job done. On the rare occasion I need more, I can look it up. I think I’ve only had to do that once or twice in all the years I’ve been using it.
Same here. Personally for everyday tasks I always use a visual Git tool, specifically Tortoise Git.
For complex tasks, like fixing someone else's mess (or my own), I always start with a visual tool to look at the history and the commits, also look at the reflog (again, in a visual tool, it's much faster for me), understand what the mess is and if I can find anything to salvage, look at some diffs.
Then if it's just a commit I need to return to, I do a reset --hard. If I need to combine stuff from several commits, then I usually use the commandline.
These are not mutually exclusive. I share your sentiment in that I only use visual tools for diffs and conflicts and stuff, but I’ve gained a lot from learning about commit objects, reflogs, what a rebase does in the background, interactive rebases, hard/soft resets, etc
I’m a bit surprised to see this new article given Chacon’s recent popular comment here.[1] Although I guess I shouldn’t be since I noticed in his bio last time that he was working on something called “Git Butler”.
I'm pretty comfortable with Git for all my everyday use. There's one place where I lose my bearings: it's the 'History Simplification' section of the 'git log' man page. I'd love it if someone could "simplify" that section. I could have done it, but I ought to understand it first.
Tips and Tricks are sometimes the way for developers to throw jabs at each other on the 'workflow' battles that every dev team faces.
The bells and whistles are all there to run everything from a lemonade stand to a Walmart Super-Center. Hence the overwhelming complexity.
Just lately I started running 'git init' in all the new folders I create on my development box. Heck, it's good to see what's going on everywhere you work, not only in the designated git repos. But going back to the well-known complexity of the git APIs, I recall the song that goes: "It takes a man to suffer ignorance and smile"