I wrote my self a note about this last year to demystify the internal data structure [0], if you want to check it out. But the TLDR straight to rabbit holes; is that commits are a directed acyclic graph (DAG) [1]. And the file tree snapshot is a fully persistent trie [2]. And the commit DAG points to different snapshots. Branches are name shortcuts to commits.
GM Daniel Naroditsky's channel[1] is the best instructional I have seen. His speedrun series is very good. He takes the time to stop and explain his thought process while playing the game. And then after few games he does a deeper analysis and answer any questions. The moment that sold me was when he showed how they use chess engine to do preparations at their rating level.
On a side note, I have been working on a bughouse[2] chess web app in my spare time[3]. It's not ready for HN traffic yet though. Bughouse is not getting a Queen's Gambit anytime soon. But I would love to see more steam behind online bughouse as well. It is my favorite variation. The need for team work and communication makes it very lively.
Thanks for sharing. I too agree on this: "... Git is complex is, in my opinion, a misconception... But maybe what makes Git the most confusing is the extreme simplicity and power of its core model. The combination of core simplicity and powerful applications often makes thing really hard to grasp..."
If I may do a self plug, I had recently written a note on "Build yourself a DVCS (just like Git)"[0]. The note is an effort on discussing reasoning for design decisions of the Git internals, while conceptually building a Git step by step.
While this is nice, I think it should be emphasised that the blob-tree-commit-ref data structure of git is not essential to a DVCS. One of the disadvantages of everything being git is that everyone can only think in terms of git. This makes things like Pijul's patch system, Mercurial's revlogs, or Fossil's sqlite-based data structures more obscure than they should be. People not knowing about them and considering their relative merits has resulted in a bit of a stagnation in the VCS domain.
What makes it hard is that it’s taught wrong. All this pull/checkout/commit/push whereas for me it took a long time to discover that fetch/rebase/show-branch/reset/checkout —amend, and especially the interactive -p variants, are the core tools that really make it a pleasure to use. They give you flexibility and let you write and rewrite your story, whereas the commands you’re introduced with provide no control to the user. It’s remarkable the number of users who think you can’t rewrite a Git branch.
The rebase command is only safe if you never share a branch. Most people use a distributed revision control system to work with others and if you do work with others then rebase is dangerous and should not be used.
Almost every rebase user I've spoken with has no idea what the danger is despite it being clearly discussed in the manual page for rebase and despite rebase being listed as dangerous every time it is mentioned in any manual page.
For the sake of new users everywhere, please stop recommending rebase.
This is overly broad dissuasion. `rebase` should not be used if the branch is already shared and built upon by others, but is completely harmless to use on a feature branch.
In fact, it produces cleaner feature branches for review. Tracking the trunk branch with merges into your feature branch makes for a lot of noisy commits and difficult history to read through when the time comes to diagnose a bug. Rebasing, on the other hand, allows you to neatly put all the changes from your branch (and only those changes) into one or a few neatly-packaged commits.
Not rebasing does not affect reviewing a branch in the least, unless your diff software is seriously broken.
Comparing a branch to trunk shoudl only shows the actual difference. That you merged trunk multiple times shoudl have zero bearing.
The only way it could ever confuse anyone is if they review every commit and somehow fail to pass over merge commits.
The single most aggravating thing in git are its self-appointed super-users who /almost always/ properly use its power until one day they don't. Then they make life miserable for everyone else while we all "just wait, I'm fixing it".
Feature branches really should be rebased to have clean self-contained logical commits prior to merging, and this goes beyond just making it easier to review (although that is an important point). It's good git hygiene!
No individual commit should break the build; one reason is to keep git-bisect working well for future users bug-hunting, without getting stopped because someone didn't keep the commits on their dev branch clean prior to merging (N.B., a maintainer should also reject such PRs). And keeping commits clean usually means needing to rebase occasionally to organize the commits.
And each commit should be reviewed individually, in addition to the whole of the branch / PR.
Not to mention that each commit should be logically laid out, with well-defined changes and well-written commit messages. This usually means needing to rebase a branch when developing non-trivial features or bug fixes, to fold in review feedback.
But as mentioned elsewhere, generally on feature / dev branches, the expectation is that the commits are unstable, subject to change, and should not be built upon (without prior coordination, at least).
Master and stable release branches, on the other hand, should never change or be rebased.
Absolutely. Personally, I like to spam commits while I'm developing, just to make sure I keep track of what I've done while I'm iterating, but none of this is relevant to the final merge. Any individual commit on my feature branch may not even compile, but when that feature is applied to master, I want it to be a clean atomic commit that can easily be reverted or applied to other branches.
1. I didn't say this affects branch diffing, but rather trawling through history on a single branch.
2. "self-appointed super-users [...] [who break everything]" is a strawman and borderline ad-hominem. If you follow the guidelines I put forth, there won't be any issues collaborating with others.
Also, as a general note, it's actually very difficult to completely destroy information that's been committed at some point. If you're really running into issues with this, don't let fear direct you away from enjoying the greatest features of git. Experiment! Keep trying. Read a good git book (https://git-scm.com/book/en/v2). And learn to use the reflog. Everything you've committed is backed up for a long time even if you've removed all named references to those commits.
> I didn't say this affects branch diffing, but rather trawling through history on a single branch.
Exactly. What's the point of all the "oops, a typo" or "applying code review remarks, part III" commits. Just rewrite. This is the workflow you get eg. with Gerrit.
I strongly disagree. Git rebase is a very useful command, but of course you need to know what you are doing. Git becomes very weird if you try to avoid its core concepts... as far as I'm concerned, use rebase, get yourself cut (or better yet, learn about it before using it) and try not to repeat mistakes.
That said, git porcelain is simply awful. It is inconsistent and full of dangers - there is no way I would dare try new commands without reading up on them, because the names are often misleading. Sometimes I really wish GitHub / GitLab used Mercurial as their foundation. I think the world of programming would be much easier....
This is exactly what I'm talking about! Thank you!
There is /nothing/ "dangerous" about rebasing. You just don't rebase branches that are publicly shared without coordinating with the other users, so for some cases (like "master" of an open source project) you don't rebase.
But for your internal workflow, rebase is a KEY TOOL. It's how you write your story of commits. You can't just perfectly nail your commit history the first time you code, unless you are a genius. And what if you're working on a feature, but then you want to commit a certain series chunk of changes to master, so that other features can use that change. Rebase is how you do anything like this. It's core to using and enjoying the beauty of git.
That's true, but it's an obscure feature, not a convenient "undo" button; to the average insecure git user you might as well tell them that their work has been eaten by a dragon and they can recruit another dragon to get it back.
Reflog may not be known by the average git user, but I think it is pretty convenient - just call `git reflog`, copy the hash before the rebase, and checkout/reset to that hash. Doesn't get much easier than that.
I rebase pushed branches all the time if I’m the only one who is working on them, even for repos with multiple team members working on them. In almost a decade I can’t think of a single time this has caused problems.
Yes, don’t rebase branches with multiple authors doing parallel work. But I can’t think of many times I’ve even had to work on a feature branch with multiple authors.
In a gerrit flow, you have to use it quite often, but only on the detached micro-branches that gerrit forces you to use. And the UI provides a nice convenient rebase button.
In a more traditional "trunk" flow, where you're pretending it's svn, you probably want "pull.rebase=true", otherwise you generate a lot of entirely spurious merge commits. This really confused me the first time I used git, long ago.
The "dangerous" case is, as you say, using it to rewrite history that has already been pushed (heresy!). Generally the system will warn you that this requires "--force", at which point you need to stop and think about what you've done.
(It took me a long time to overcome my feeling that history rewrites were inherently wrong - defeating the point of a VCS in some way. I've adapted to it somewhat with the "neatening things up before submitting" view, which took a while to learn. In a traditional VCS there's only one view of the code and everyone shares it.)
In a lot of workflows I've used, we _always_ rebase against <parent branch> before pushing a PR, the virtue being that the history is cleaner. I definitely don't think it deserves the huge "NEVER USE" sticker you're slapping on it. I use it multiple times a day.
[0] https://sransara.com/notes/2019/build-yourself-a-git/ [1] https://en.wikipedia.org/wiki/Directed_acyclic_graph [2] https://en.wikipedia.org/wiki/Persistent_data_structure