> So to relieve some of that tension and ease up the final merge you’re heading towards, you decide to perform a control merge now and then: a merge of master into your own branch, so that without polluting master you can see what conflicts are lurking, and figure out whether they’re hard to fix.
It is indeed useful, and just so you won’t have to fix these later, you would be tempted to leave that control merge in the tree once you’re done with it, instead of rolling it back with, say, a git reset --hard ORIG_HEAD and keep your graph pristine.
If you use git "linear history" style and will do a rebase at the end anyway, why not do "control rebases" immediately, instead of control merges?
At least in my experience, rebasing a branch that has substantially diverged can get painful quickly, whereas doing many small rebases to keep up with master usually works without many problems.
I've long lived by the "rebase early, rebase often" mantra. Makes this sort of thing mostly a non-issue (tho rerere addresses orthogonal problems, like making the same fixes multiple times in a stack of commits). Besides, you should be coding (and testing) against as close as possible to the final state, as otherwise you discover problematic changes later, and have to re-do more.
Personally I slightly prefer merge-y history, as it's more accurate about history... but tooling tends to be a fair bit simpler when it's linear, so I usually stick to linear.
> why not do “control rebases” immediately, instead of control merges?
In this context the “control” in “control merge” is referring to not checking in the result. So if you don’t push the rebase, it would probably have exactly the same effect as a control merge... you’d save the conflict resolutions in the rerere database for later.
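For anyone who wants to try the cycle described above, here is a minimal sketch of a control merge with rerere turned on. The throwaway repository, the file name, the `feature` branch, and the commit messages are all made up for illustration; only `rerere.enabled`, the merge of master into the branch, and `git reset --hard ORIG_HEAD` come from the discussion.

```shell
set -eu
# Throwaway repository; every name below is hypothetical.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "dev@example.com"
git config user.name "Example Dev"
git config rerere.enabled true          # record and replay conflict resolutions
main=$(git symbolic-ref --short HEAD)   # default branch name (master or main)

echo "base" > file.txt
git add file.txt
git commit -q -m "base"

git checkout -q -b feature
echo "feature change" > file.txt
git commit -q -am "feature change"

git checkout -q "$main"
echo "main change" > file.txt
git commit -q -am "main change"

git checkout -q feature
before=$(git rev-parse HEAD)

# Control merge: both sides edited the same line, so this conflicts.
git merge "$main" >/dev/null 2>&1 || true
echo "resolved" > file.txt              # resolve by hand
git add file.txt
git commit -q --no-edit                 # rerere records the resolution here

# Throw the control merge away; the recorded resolution stays in .git/rr-cache.
git reset -q --hard ORIG_HEAD
after_reset=$(git rev-parse HEAD)

# The real merge later: still reported as a conflict, but rerere has
# already rewritten file.txt with the recorded resolution.
git merge "$main" >/dev/null 2>&1 || true
replayed=$(cat file.txt)
```

Without `rerere.autoupdate`, the replayed file is left unstaged, so you still get a chance to review it before `git add` and `git commit`.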
If you’re going to push the merge/rebase, then it depends on whether you’re working with other people in your branch. If you are, especially if it’s more than a few people, rebasing could inflict serious merge conflicts on everyone with any pending un-pushed changes. Rebase should be avoided in this case.
Once I read "this is ugly and pollutes your history graph" as a justification for any git feature, I'm immediately turned off.
What really is wrong with having a graph that actually reflects the real history of changes as they were made at the time?
Why not keep things simple and improve git-log so that anything you find ugly or that looks like pollution can be hidden while viewing the history of a repo? In this case, for example, would it not be far more ergonomic to add a "--hide-control-merges" flag to git-log?
This rerere feature described looks like a lot of the foot-guns git provides.
I know I'm in a minority here - I've given up arguing with colleagues that it's possible to have a branch&merge work-flow without constant re-basing. Allowing the history graph to be manipulated and changed seems to have become part of the most popular git work-flows even though it's a recipe for serious pain in a distributed VCS.
(This isn't a criticism of the article btw - more of the git design/philosophy.)
Edit here at the top to clarify: I’m talking about rebasing your own commits before pushing. It sounded like the parent comment was talking about blanket use of rebase. I don’t advocate rebasing other people’s pushed commits, other than a few very exceptional circumstances.
> What really is wrong with having a graph that actually reflects the real history of changes as they were made at the time?
You’re inflicting irrelevant noise and cognitive load on yourself and other people. Noise in the history can also cause all kinds of trouble with merges and with bisect and other git features. A clean history is easier to work with.
Given that Linus advocates rebasing to clean the repo history, I’m curious where the whole “we must preserve the real history” idea comes from, along with the dogmatic belief that messy history is some sort of “correct” git design/philosophy.
There is no “real” history. Commit order is arbitrary, especially between orthogonal changes by different people. Anything you modify later using rebase can be (and is) done before rebase. There is no sacred set of events to preserve, and it has real and practical ramifications to let the repo get messy when you’re on a large team.
Why not embrace the idea of preserving the semantic history, the intent, as cleanly and clearly as possible, rather than focus on preserving arbitrary noise that doesn’t mean anything to you, let alone others?
> I’ve given up arguing with colleagues that it’s possible to have a branch & merge work-flow without constant re-basing.
It is possible, but it’s not desirable.
> Allowing the history graph to be manipulated and changed seems to have become part of the most popular git work-flows even though it’s a recipe for serious pain in a distributed VCS.
Would you elaborate on what pain you’re talking about? Rebasing is something that happens before push to master, if you’re using it properly, and that’s what the article here is talking about. If anyone’s rebasing the remote’s branches, that is bad, but that is not common practice.
Aside from that, you’re talking about git design and failing to acknowledge that rebase and manipulating the graph history is the design of git. Updating local history is a good thing, and it was designed that way intentionally.
I completely agree that rebase and manipulating the graph history is part of the design of git. But also you can use git effectively without manipulating the history (in a non-incremental way). And you don't lose anything in terms of branching and merging - in fact you gain in terms of safety and simplicity in a distributed system.
The argument I guess is about what you consider the cost of inflicting "irrelevant noise and cognitive load" on other users. I believe it's not significant and is mostly a function of the deficiencies of the history browsing tooling. On the other hand, I believe that the cost of making rewriting history part of the workflow is significant in terms of the load it places on users to learn features like rerere, for example.
If I've understood you correctly then I think neither of us can be right or wrong - it depends what each of us considers a greater "cognitive load" or cost which is going to be based on our personal experiences and preferences rather than something empirical.
It’s still not clear what you’re talking about exactly, will you clarify? Perhaps some concrete examples of the safety and simplicity gains you’re referring to?
Are you saying, and arguing with your team, that individuals should always branch and merge, even when they want to check in single commits, or small numbers of commits?
I don’t consider what happens before push to be “history” at all. Do you? I do consider what happens after push to be history, and for pushed commits, I agree, people should not rebase them.
Branching was designed for the isolation and safety of multi-person teams working on features that take non-trivial amounts of time. Arguing over what individuals should do when not working in teams in a branch is possibly a bit of bike-shedding. I don’t know if that’s what we’re doing here, but on the other hand there are some clues: rebasing is something people do to their own private branch before push, but not normally to other people’s commits after they’ve pushed.
I do think it’s noisy and unnecessary to have two commits for every single-commit change in master. A lot of teams specifically disallow that practice and require direct checkins to master for single commit changes, just to prevent the noise.
If you’re advocating never using rebase, and never rewriting your own history before you push, then I think that is a dogmatic approach that misunderstands the goals and tools in git and fundamentally mistakes git’s philosophy.
If you’re saying that people are rebasing your changes after you pushed them, then I agree with you completely.
I’m not calling anyone wrong. But this isn’t a disagreement over which way has cognitive load. You asked what was wrong with noisy history, and I answered. Your example of having to learn rerere only applies to the team lead, the person doing the merges, and it only comes up once in a while. Not everyone needs to learn rerere, and nobody needs to use it all the time. My example, noisy history, applies to everyone on the team at all times. On my teams, I prefer that people learn git well, and keep their histories clean.
Bit late to reply but regarding the advantages of not allowing rebase operations in a distributed system: if you can only change the version tree by adding nodes to leaves and by adding attributes to nodes (e.g. for merge arrows, tags, etc.), then combining distributed changed versions of a version tree is relatively trivial and users need never fear pushing and pulling.
Or more formally, there is a partial order on version trees (inclusion) and if all changes to version tree are monotonic with respect to this partial order, then it will always be possible to combine distributed versions without conflict.
An analogy might be that I prefer storing raw facts in databases, where practical and performing aggregations/filtering/etc. as queries during retrieval rather than storing filtered and aggregated data.
Typical rebase operations like squashing commits or removing merge history destroy information. Rather than decide up front what information is interesting, why not keep it and provide better tooling for filtering and aggregating commit information?
And what I'm suggesting here isn't radical - other VCSs offer branching and merging without requiring operations similar to rebase. Git itself supports such a workflow and many tutorials introduce git with a simple workflow that does not involve rebasing. I've used such systems in the past and because they had decent version browsing tooling, there was very little "noisy history" overhead. If you wanted a linear view of a branch like "master", you could view the history that way; if you wanted to drill into the branches, sub-branches and merges involved in a particular "master" version, you could do that also.
So yes, you are talking about individual branching & merging in order to commit a single change. And you are talking about never using the rebase command. Why? What are the tangible benefits to never using rebase on your own private commits? You didn’t give me any concrete examples. This seems exceedingly academic and it also seems like you’re either not understanding normal git workflow, or are trying to work in an ad for Pijul or something?
If you use git like I’ve proposed - which is how most people actually use it - by calling commits “history” only after they’ve been shared with someone else, e.g. pushed, then you get all the same guarantees that you are proposing, and the workflow is one of adding only leaves.
Rebase is something you usually do locally before adding nodes to the shared tree, and really has no bearing on published history, nor does it affect published history or change the order or monotonicity of shared events, because you never rebase already published commits. It is not only possible to use rebase in a non-destructive way, it is the most common workflow, and you’re arguing against that part in favor of something that offers zero advantages.
It’s not possible to never have a merge conflict. It is possible to do better than git, but if two people change the same file in the same location, you have a problem, no matter how many monotonic partial orderings you have.
> other VCSs offer branching and merging without requiring operations similar to rebase
Git does not require rebase for branching and merging, and you already know this because you’re advocating single commits in master from individuals get branched and then merged instead of rebased locally. Lots of tutorials don’t mention rebase because it’s irrelevant in the context, and because rebase isn’t required.
Be specific: what systems exactly are you talking about using in the past that had no rebasing and better tooling? There’s no discussion here without concrete examples.
To get the best of both worlds: You could use normal merges while working on a branch, and then when it's done do one rebase (which also removes merge commits) before merging back into master.
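A rough sketch of that "merge while working, rebase once at the end" flow, in a throwaway repository. All file names, branch names, and commit messages here are hypothetical. It relies on the default behavior of `git rebase` (without `--rebase-merges`), which replays only the non-merge commits. The commits in this sketch touch separate files, so nothing conflicts; on a real branch the final rebase can hit the same conflicts the merges already resolved.

```shell
set -eu
# Throwaway repository; every name below is hypothetical.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "dev@example.com"
git config user.name "Example Dev"
main=$(git symbolic-ref --short HEAD)   # default branch name (master or main)

echo "base" > base.txt
git add base.txt
git commit -q -m "base"

git checkout -q -b feature
echo "one" > feature1.txt
git add feature1.txt
git commit -q -m "feature: part one"

git checkout -q "$main"
echo "upstream" > upstream.txt
git add upstream.txt
git commit -q -m "main moves on"

# Ordinary merge of main into the branch while work is in progress.
git checkout -q feature
git merge -q --no-edit "$main"
echo "two" > feature2.txt
git add feature2.txt
git commit -q -m "feature: part two"
merges_before=$(git rev-list --merges --count "$main"..feature)

# One rebase at the end: the merge commit is dropped, the two
# feature commits are replayed on top of main.
git rebase -q "$main"
merges_after=$(git rev-list --merges --count "$main"..feature)
commits_after=$(git rev-list --count "$main"..feature)

# Main can now simply fast-forward onto the linearized branch.
git checkout -q "$main"
git merge -q --ff-only feature
```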
I don't see how rebasing often and keeping a linear history is in any way a worse workflow than letting merge bubbles happen. To the contrary, I've changed my preferences from the latter to the former because it turned out to be more readable and to make it easier to find commits that introduced bugs or contain badly done merges.
If there's more than one person working on the code base, rebasing is only safe if you restrict pushing/pulling in some way. Properly interleaving pushes and rebases is just more mental load and another way to shoot yourself in the foot with git.
And I don't see the point just to get prettier git-log output - I've never had an issue finding commits or diffing across more than one commit.
But like I said, I've given up arguing about it and have reluctantly embraced my inner rebaser. A small part of me still feels it's a regression.
If your team is rebasing already public history, then you’re right. Is that what you’re talking about? History should be cleaned up locally before pushing elsewhere. Once pushed, history shouldn’t be rewritten except in emergencies, and even then it should be discussed and communicated and everyone should agree a rewrite is better than more pushes to fix the problem. A case where a rewrite is preferable is when someone accidentally checks security keys into git, for example.
Rebasing before pushing should be embraced. Are you sure that’s not the majority of what’s happening?
How so? And which part? I think it’s difficult to rebase already pushed history accidentally, because that requires a force push. You have to opt-in and take action to do it, so it’s not easy to screw up unintentionally.
Well you can say having to force is a solid barrier but you'll see plenty of comments suggesting that as well as plenty of comments stemming from having had to clean it up.
Sure, but that’s completely different from your hyperbolic analogy of looking down the barrel of an unloaded gun. Force push has a solid barrier, it is not default behavior, it gives you warnings, the manual gives warnings, and if you read anything online you’ll find many warnings against. If people still do it not knowing what they’re doing, yes it can definitely cause problems, but you can’t call it an accident or easy to screw up.
It's ultimately a tooling problem - Git doesn't have a good way of "tagging" feature branches to be ignored. Its concept of branches is limited - there are only places where there's more than one parent commit, and which is which is unclear. If you handle it badly, you end up with an unreadable history, littered with 'f' commits and junk, rather than having a nice linear history you can run regressions on.
I don't think this is a problem with Git, per-se - I think it's a problem with the tooling we use to interpret it, and the tooling we use to commit to it. But there's a lot of "valid but invalid" ways you can use it, and writing good tooling for different flows is hard, so nobody's invested in it a bunch.
There aren't any problems. If anything, it's a more honest reflection of history, whereas rebase will erase past current state and remove your ability to wind backwards to the way the world actually was.
If you view a neat ahistorical history as a primary artifact of development, then you need rebase.
Nothing wrong with the first graph. Avoiding history noise is a minor concern. Perhaps a more practical reason to use control merges without pushing them into the branch is to keep the branch completely stable, allow everyone in the branch to test their own work without being affected by churn in master. Then the integration testing can all happen right before merging back into master.
That’s a choice though, it’s equally valid and sometimes desirable to merge from master frequently, and keep the integration testing going, at the expense of some bumps for people in the branch.
The .git/cache is a submodule, so others can easily rebase also. And a lot of helper scripts rebase and pull --rebase all work branches automatically. All current branches are constantly rebased to master, others default to pull --rebase (aliased to lb). With the shared .git/cache there have been no conflicts for years.
I'm also doing this for several forks automatically, like adding patches and CI smokes to popular repos like clisp, libffi, openssl, pcre2, coreutils and many more. These are rebased hourly, conflicts appear maybe once a year.
Actually that's something I don't get (but I'm not a git pro by any means). The article doesn't seem to be about the pain of rebasing, but rather the pain of regular merging polluting the history. Regularly rebasing the branch on master doesn't pollute anything, keeps the branch in sync with master, and makes for a happy final commit.
This is a good point. The author wanted to talk about rerere, and control merges are a great motivator for that. If he rebased the branch, then he wouldn’t be able to write about git rerere. :)
That said, rebasing a branch several times before merging back to master is something you can easily do if you’re the only person in the branch. When there are multiple people pushing to the branch, and when the branch changes are large, rebasing becomes impractical and even dangerous. You don’t want to rebase a branch while others are working in it, because it would require a force push.
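To make the force-push point concrete, here is a small sketch using a local bare repository as a stand-in for the shared remote. The paths, branch names, and commits are hypothetical. `--force-with-lease` is not mentioned in this thread; it is used here as an assumption about what a more careful force push looks like, since it refuses to overwrite the remote branch if it has moved since you last saw it.

```shell
set -eu
# A bare repository standing in for the shared remote; all names are hypothetical.
work=$(mktemp -d)
git init -q --bare "$work/remote.git"
git init -q "$work/local"
cd "$work/local"
git config user.email "dev@example.com"
git config user.name "Example Dev"
main=$(git symbolic-ref --short HEAD)   # default branch name (master or main)
git remote add origin "$work/remote.git"

echo "base" > base.txt
git add base.txt
git commit -q -m "base"
git push -q origin "$main"

# Publish a feature branch.
git checkout -q -b feature
echo "feature" > feature.txt
git add feature.txt
git commit -q -m "feature work"
git push -q origin feature

# Main moves on, then the already-pushed branch gets rebased.
git checkout -q "$main"
echo "upstream" > upstream.txt
git add upstream.txt
git commit -q -m "main moves on"
git checkout -q feature
git rebase -q "$main"

# A plain push is refused: the rewritten branch is not a
# fast-forward of what the remote has.
if git push -q origin feature 2>/dev/null; then
  plain_push=accepted
else
  plain_push=rejected
fi

# Overwriting the remote branch takes an explicit opt-in.
git push -q --force-with-lease origin feature
```

Anyone else who had the old `feature` checked out now has commits that no longer exist on the remote, which is the pain being described.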
I’d say rebase the branch when you can, which is usually if it’s a private branch and you’re the only person who made changes. Merge into the branch as you go if the branch is large and has multiple people. Commit the merges if you want people in the branch to do integration testing along the way. Use control merges and don’t commit them if you want to keep the branch stable and buffer the people in the branch from changes in master.
When you rebase a lot you might have to do the same merge operation multiple times, unless you change the settings as shown in the article. The trick here saves the merge resolution locally so you can automatically merge again if you have to. I don't think this gets pushed, though, so anyone else (or you, in a different repo) on that branch would also need to repeat the merge operation.
If you save the merge commit, the merge itself is in the history so there's nothing to repeat.
This was a nice read; git rerere is one of the git features that has been a tad mysterious for me, I thought it was on by default.
I’m pretty sure I’ve used it, probably because I read somewhere, just like this article says, that it should be turned on by default. What I didn’t know and wondered while reading is how to clear a rerere, and it looks like you can “git rerere [clear|forget]”. https://git-scm.com/docs/git-rerere I have resolved a conflict and then discovered later that I screwed it up, so needed to do it again.
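In case it helps someone in the same spot, a sketch of redoing a botched resolution with `git rerere forget`. The repository, file, and branch names are invented for the example, and the use of `git checkout --merge` to bring the conflict markers back is my assumption about a convenient way to restart the resolution, not something taken from the article.

```shell
set -eu
# Throwaway repository; every name below is hypothetical.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "dev@example.com"
git config user.name "Example Dev"
git config rerere.enabled true
main=$(git symbolic-ref --short HEAD)   # default branch name (master or main)

echo "base" > file.txt
git add file.txt
git commit -q -m "base"
git checkout -q -b feature
echo "feature change" > file.txt
git commit -q -am "feature change"
git checkout -q "$main"
echo "main change" > file.txt
git commit -q -am "main change"
git checkout -q feature

# First merge: record a resolution that later turns out to be wrong.
git merge "$main" >/dev/null 2>&1 || true
echo "botched resolution" > file.txt
git add file.txt
git commit -q --no-edit
git reset -q --hard HEAD^               # undo the merge commit

# Second merge: rerere replays the botched resolution.
git merge "$main" >/dev/null 2>&1 || true
replayed=$(cat file.txt)

# Drop the recorded resolution for this path, then bring the
# conflict markers back so the conflict can be resolved again.
git rerere forget file.txt
git checkout --merge -- file.txt
markers=$(grep -c '^<<<<<<<' file.txt || true)

# Resolve properly this time; the commit records the new resolution.
echo "correct resolution" > file.txt
git add file.txt
git commit -q --no-edit
```

Note that `git rerere forget` works on the conflict currently in the index, so it has to be run while the merge is still in progress.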
You are incorrect. This is only the case when the remote copy of the branch you intend to over-write with the force push has already been modified and became out of sync from your local copy of that branch prior to the rebase.