Understand where author is coming from - but doesn't squash-n-merge (newish github feature) solve the issue of needing to rebase and the issue of having too many merge commits?
Squash-n-merge has nice property of removing unnecessary local information that probably doesn't matter at a meta level (commits are nice when reviewing PR, doesn't matter much later)
(squash-n-merge isn't new on github, unless you are not talking about the same thing I'm thinking about)
Yes, squash-n-merge is often needed in github's PR workflow because no one needs those un-bisect-able fixup commits in the final merged master/main branch, and also they make the diff between different states of the PR more readable, but it comes with its own problems.
Main problem is the commit message. As the contributor (the one sending out the PR for the maintainer to review), you have no control over what the final commit message in the merged single commit is. The maintainer doing the merge decides that for you, and by default github generates that message by combining all the commit message titles (the first line of the commit messages) of all the commits in that branch, and that's almost never a good choice for the final commit message.
Another problem is the email in the final commit. When the maintainer uses squash-n-merge, github uses the default email on file for your github account, regardless of which email(s) you configured git to use and associated with the individual commits inside the PR.
As a result, squash-n-merge is more suitable for contributors less familiar with open source contribution, for example people who have not yet realized the value of a good, concise commit message, or who don't use different email addresses for different projects. For advanced contributors, it's no wonder they would prefer force-push with rebase-merge when contributing on github, because rebase-merge makes sure the exact state of their final commit is preserved, including the commit message, the email address associated with it, and the gpg signature if they use one. But github's rebase-merge strategy has its own issues, as described by the author and more.
That comes with all the problems of force-pushing and rebasing, except the one about losing history during code review.
For example this still has a commit message issue, just on the maintainer's side: As the maintainer if you are going to use rebase to merge this PR, that means you need to accept whatever commit message the contributor wrote as-is. Are you happy with that? If not, you can't even leave inline comments on that, and it's usually pretty hard to communicate and give feedback on how you want the commit message to be.
> un-bisect-able fixup commits in the final merged master/main branch
If you require PRs to create merge commits, you get the nice world where git bisect with --first-parent (passed to `git bisect start`) bisects at the PR level. You don't have to worry about the individual commits inside the PR/below the PR level when bisecting, but you still have that commit history "as-is" for deep archeological dives when you need it.
(And you can use --first-parent to clean up git log and git blame too.)
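A minimal sketch of that `--first-parent` view, assuming a throwaway repository in a temp directory. The branch names `pr-A` and `pr-B`, the file names, and the commit messages are made up for illustration:

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main   # call the first branch "main" on any Git version
git config user.name "Example Dev"      # throwaway identity for this demo repo
git config user.email "dev@example.com"
git commit -q --allow-empty -m "Initial commit"

# Simulate two PRs, each with a real commit plus a fixup-style commit,
# merged with --no-ff so every PR gets its own merge commit on main.
for pr in A B; do
    git checkout -q -b "pr-$pr" main
    echo "change $pr" > "file-$pr.txt"
    git add "file-$pr.txt"
    git commit -q -m "PR $pr: implement change"
    echo "typo fix $pr" >> "file-$pr.txt"
    git commit -q -a -m "PR $pr: fix typo"
    git checkout -q main
    git merge -q --no-ff -m "Merge PR $pr" "pr-$pr"
done

all_commits=$(git rev-list --count main)                      # every commit in the graph
mainline_commits=$(git rev-list --count --first-parent main)  # initial commit + one merge per PR

# PR-level history only; drop --first-parent to see the commits inside each PR.
git log --first-parent --oneline main
```

The same flag can be given to `git bisect start` (Git 2.29 or newer) so that a bisect only stops on the merge commits.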
And those commits rarely provide useful information because they're of the variety where people fix syntax errors, add missing files, remove changes they didn't mean to commit, etc.
There's plenty of information in all those types of commits even if you personally don't find that information "useful". I've had to do the sort of archeology digs to figure out "what syntax errors did a build tool miss", "why is this type of file often missed to be added, and how often do we miss it", "what was still TODO in this feature effort that got removed at the last minute", etc. All of which needs information from those sorts of "low level" commits.
In the instance where a file was missed and added in a later commit, then running git blame would show the sha1 referencing a commit that has a title that says something like "Added missing file". That's not going to tell me anything about why that file was added.
Instead, if you had a commit that explained what the file was for or if some of the lines in that file were added by a commit that explained the change and why it was made, then that would be useful history.
Many times, investigations start with running git blame on a file you plan to make changes to. The usefulness of commit messages associated with each line in a change and whether the diff associated with the commit shows a logical change rather than a fix for a syntax error is the difference between an investigation that leads to results versus one that leads to a dead end.
I already mentioned `git blame --first-parent` just a few comments up! You get the sha1 referencing a commit that has a title like "Merged PR #327". You can dig down deeper than that --first-parent level if need be, but you have the power of the git graph to show or hide details depending on whether you need them.
Sometimes (often!) you want clean history in the upstream but also patches separated by bugs they fix, features they add -- issue/ticket numbers, whatever. And you may want regression tests to come before bug fixes, that way you can see the regression test failing, then the test passing after applying the bug fix. Different upstreams are likely to have different rules.
So squash-and-merge is a bad one-size-fits-all. Rebase is a much much better approach: you keep the history as submitted and you lose the useless merge commit. There's no "unnecessary local information" if the submitter did the work of cleaning up their history before submitting. That means doing interactive rebases locally to squash/fixup/edit(and-possibly-split)/reword/drop/reorder their commits -- this is something every developer should know how to do.
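One non-interactive way to do part of that cleanup is sketched below. It assumes a throwaway repository, uses `git commit --fixup` together with `git rebase -i --autosquash`, and sets `GIT_SEQUENCE_EDITOR=true` so the generated todo list is accepted without opening an editor. The file names and commit messages are hypothetical:

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main   # call the first branch "main" on any Git version
git config user.name "Example Dev"      # throwaway identity for this demo repo
git config user.email "dev@example.com"
git commit -q --allow-empty -m "Initial commit"

git checkout -q -b my-feature-branch
echo "parser v1" > parser.txt
git add parser.txt
git commit -q -m "Add parser"
echo "docs" > README.txt
git add README.txt
git commit -q -m "Document parser"

# A late correction that logically belongs to "Add parser" (HEAD~1 at this point).
echo "parser v2" > parser.txt
git commit -q -a --fixup=HEAD~1

# --autosquash moves the "fixup!" commit next to its target and folds it in;
# the editor overrides keep the rebase from prompting.
GIT_SEQUENCE_EDITOR=true GIT_EDITOR=true git rebase -q -i --autosquash main

commit_count=$(git rev-list --count main..HEAD)   # commits left on the branch
folded=$(git show HEAD~1:parser.txt)              # content of parser.txt in "Add parser"
git log --oneline main..HEAD
```

Running a plain `git rebase -i main` instead opens the todo list, where the squash, reword, drop, and reorder operations mentioned above can be chosen by hand.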
Squashing commits into one mega-commit isn't great for future investigations of the commit history (code review, bisects etc). It is much better to create separate logical commits, rebase them and pull in the result, either as a branch fast-forward merge, or with a merge commit where appropriate.
If I've correctly understood what you mean, I've wanted this for some time now. A way to preserve history while adding a single, linear integration of changes.
Sure, you can use git to do this, but the git killer will have it as an expected capability.
I also think that octopus merges are basically always a disaster because they can't be meaningfully reviewed and put your repo into an unknown state. Maybe there's some way to get the advantages of merge commits (preserve all history!) without the disadvantages (jumble all history!).
Yup, and you can use --first-parent with git bisect, git log, and git blame to interact at the "macro-level" of those merge commits by default, and dive in to the fuller graph only as necessary.
a year from now, are you actually going to want to test each individual change in a pull request, or are you going to want to test it as an entire unit?
I agree that for code review you want smaller units, but my experience has been that 1-2 years later, you no longer care about the individual units and instead you want the entire patch/PR all together.
I'm pretty sure you want reasonable meaningful commits. On tiny projects it may not matter, but on larger projects it's definitely a huge benefit, because chances are you'll have to investigate a bug in that code, re-learn why it was done this way, etc. And maybe bisect the git history to find which exact commit caused the issue.
Which is why larger changes are often split into smaller patches that may be applied and tested incrementally. If you just merge the whole pull request as one huge patch / in one merge commit, you just lost most of that.
I definitely will want to do that, especially when bisecting a random bug that was introduced with one of the changes in that PR. The smaller the unit of change the better, as long as they are logically separate changes.
I think it's more about: in a year from now will you understand the purpose of a change to some code you're debugging? If the commit says "merged PR 2234", answer is probably not.
• Setting `merge.ff=no` in git config to force merge commits by default.
• Creating a series of logical commits on `my-feature-branch`.
• Merging `my-feature-branch` into `main` with a bona fide merge commit.
• Using `git branch -d my-feature-branch` (NOT capital `-D`) to delete the feature branch safely and without worry, since `-d` only deletes the branch if the commits are present on HEAD.
• Using `git log --oneline --graph` to see a clean representation of the actual history.
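The steps in the list above can be sketched end to end in a throwaway repository. The file names and commit messages are hypothetical; `my-feature-branch` and `merge.ff=no` come from the list:

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main   # call the first branch "main" on any Git version
git config user.name "Example Dev"      # throwaway identity for this demo repo
git config user.email "dev@example.com"
git config merge.ff no                  # repo-local here; always create a merge commit
git commit -q --allow-empty -m "Initial commit"

# A series of logical commits on the feature branch.
git checkout -q -b my-feature-branch
echo "one" > a.txt
git add a.txt
git commit -q -m "Add a"
echo "two" > b.txt
git add b.txt
git commit -q -m "Add b"
feature_tip=$(git rev-parse my-feature-branch)

# A fast-forward would be possible, but merge.ff=no forces a real merge commit.
git checkout -q main
git merge -q -m "Merge my-feature-branch" my-feature-branch
second_parent=$(git rev-parse HEAD^2)   # the merged-in side of the merge commit

# Lowercase -d succeeds only because the branch's commits are reachable from HEAD.
git branch -q -d my-feature-branch
if git rev-parse -q --verify refs/heads/my-feature-branch >/dev/null; then
    branch_exists=yes
else
    branch_exists=no
fi

git log --oneline --graph
```

The graph output shows the two feature commits as a side branch joined to `main` by the merge commit, even though the branch name itself is gone.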
A pull is just a fetch followed by a merge. So to solve this problem, just fetch instead of pull!
Then do `git merge --ff-only` and if it doesn't work, do the rebase or whatever else to resolve the conflict.
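A sketch of that fetch-then-`--ff-only` habit, assuming two throwaway repositories: `upstream` stands in for the shared remote and `local` for a developer's clone. The helper names `add_commit` and `sync_main` and the file names are invented for this example:

```shell
set -eu
work=$(mktemp -d)
cd "$work"
# Throwaway identity for both demo repositories.
export GIT_AUTHOR_NAME="Example Dev" GIT_AUTHOR_EMAIL="dev@example.com"
export GIT_COMMITTER_NAME="Example Dev" GIT_COMMITTER_EMAIL="dev@example.com"

git init -q upstream
git -C upstream symbolic-ref HEAD refs/heads/main   # name the first branch "main"
git -C upstream commit -q --allow-empty -m "Initial commit"
git clone -q upstream local

# add_commit <repo> <file>: commit a one-line file in <repo>.
add_commit() {
    echo "$2" > "$1/$2"
    git -C "$1" add "$2"
    git -C "$1" commit -q -m "Add $2"
}

# Fetch, try a fast-forward, and fall back to a rebase only when histories diverged.
sync_main() {
    git -C local fetch -q origin
    if git -C local merge -q --ff-only origin/main >/dev/null 2>&1; then
        echo "fast-forwarded"
    else
        git -C local rebase -q origin/main >/dev/null
        echo "rebased"
    fi
}

add_commit upstream u1.txt   # only upstream moved: a fast-forward is enough
first=$(sync_main)

add_commit upstream u2.txt   # both sides moved: --ff-only refuses
add_commit local l1.txt
second=$(sync_main)

merge_commits=$(git -C local rev-list --count --merges main)
git -C local log --oneline --graph main
```

Falling back to a rebase automatically is a choice made for this sketch; stopping after the failed `--ff-only` and deciding by hand matches the "or whatever else" in the comment above.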
I did this long before I set `merge.ff=no`. I hate it when pull creates crappy graphs — it's something I try to help all my colleagues to avoid. I often wish that `git pull` didn't exist.
Why is `git pull` a "danger" if you always use `git fetch`? The configuration setting for merge.ff only affects the local machine. It doesn't generally impact other developers.
(Unless you're doing something like setting the system gitconfig on a shared dev box, and setting merge.ff to anything other than the default would be really heavy handed in such an environment.)
Well, there are different views of "appropriate". When you do team development and everything is done via feature branches, it's nice to have merge commits so that the integrity of each feature development effort is preserved via a merged branch in the history. If everything is flattened, it's harder to see where the branches (standing in for development initiatives) begin and end.
You can't always get fast-forward merges anyway. Long-lived branches with merge conflicts are undesirable but unavoidable in the long run. At least some of the time, you're going to have merge commits even when your "appropriateness" test says there shouldn't be one.
A good clean fast-forward merge of a single-commit PR is fine. But I've also worked at multiple jobs where every merge to the production branch created a merge commit and that's also fine. It adds a bit of complexity to the history graph, but it's not meaningless complexity.
If your commit history is majority single-commit PRs then having additional merge-commits everywhere would be noisy, so in that case it would be too much. I don't tend to work on actively developed projects that match that pattern, though. Most feature development involves multiple-commit branches.
The point of using a "proper" merge commit would be to avoid amending/rebasing the original commit and allow the original commit to live as-developed in the final branch.
The only thing changing in the original commit is to include a reference to the PR number in the commit message. There would be no change to the tree referenced by the commit.
It's an entirely different commit at that point. If work has already started in another branch based on the original commit (for whatever reason), it can cause merge problems down the road. Again, you are likely going to suggest that you can just rebase this other branch on top of the modified commit, but that's still sweeping possible merge commits under the rug. And just because that rebase is usually automatic (the tree references should be the same) doesn't mean it is always automatic or free of dangerous repercussions, including training junior devs to rebase often and giving them plenty of ammo for avoidable footguns.
I'm talking about a single commit PR in Github or Gitlab. If it's based on the latest version of the base branch, then amending it to include the PR number would allow Github to generate a link to the PR page associated with that commit. That would make the merge commit superfluous at that point.
So something like:
git commit --amend
and editing the commit message. This doesn't introduce any further change to the tree associated with the commit.
But because the commit has different metadata after amending, it now has a different SHA and is a different commit.
For illustration, a minor inconvenience of amending the commit is that `git branch -d my-feature-branch` no longer succeeds for the original branch, because it looks for the actual commit SHA, not the tree.
You may not care about the effects of changing the commit, but those effects are real and other people care.
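The effect described above can be reproduced in a throwaway repository. In this sketch the commit is fast-forwarded onto `main` and then amended there; `#NNN` is a placeholder for a PR number, and the file name and messages are hypothetical:

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main   # call the first branch "main" on any Git version
git config user.name "Example Dev"      # throwaway identity for this demo repo
git config user.email "dev@example.com"
git commit -q --allow-empty -m "Initial commit"

git checkout -q -b my-feature-branch
echo "feature" > feature.txt
git add feature.txt
git commit -q -m "Add feature"

# Bring the commit onto main, then change only its message.
git checkout -q main
git merge -q --ff-only my-feature-branch
git commit -q --amend -m "Add feature (#NNN)"

old_sha=$(git rev-parse my-feature-branch)
new_sha=$(git rev-parse HEAD)
old_tree=$(git rev-parse 'my-feature-branch^{tree}')
new_tree=$(git rev-parse 'HEAD^{tree}')

# -d checks reachability of the original commit from HEAD, not tree equality,
# so it refuses even though the content is identical.
if git branch -d my-feature-branch 2>/dev/null; then
    deleted=yes
else
    deleted=no
fi

echo "commit: $old_sha -> $new_sha"
echo "tree:   $old_tree -> $new_tree"
echo "branch deleted with -d: $deleted"
```

The two tree IDs match while the commit IDs differ, which is why `-d` reports the branch as not fully merged.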
Assuming you were the one who amended the commit before pushing it up to the remote, there's no reason that you would not be able to delete the branch because your local working copy has already updated contents of .git/refs/heads/my-feature-branch.
Those who have cloned the repo for testing can simply run `git checkout my-feature-branch; git fetch origin; git reset --hard @{u}` to get their local repo in sync with the remote.
So there's no reason that amending the commit will affect anyone until they branch off of the repo to do their own work. But that's nothing that a rebase can't fix.
Yes, of course there are workarounds; no matter what scenario you or I come up with, the other will be able to propose a different way of doing things. I chose a deliberately trivial example because I was illustrating a fundamental aspect of Git's design, not trying to stump you. But we're talking past each other.
But one of my comments (†) is the great-grandparent of your first comment on the subthread? (∆) And the concept of preserving commits precisely is fundamental to my comment two generations above that, the one about "nirvana" (‡) ?
Perhaps we would benefit from an `hn log` function which displays the linear parentage history for comments? (It would be easier to design than `git log` because every comment has exactly one parent, there are no `hn merge` comments.)
Or in your working copy has my authorship info been lost? That can happen if a committer uses plain old `patch -p1` to apply a diff from the mailing list rather than `hn am`. :D
> Or in your working copy has my authorship info been lost?
Well, none of the text that you originally wrote in the comment you're referencing was preserved in the working copy. And unless it's quoted, so that one could search for when it was introduced by running `git log -S"a line from your comment"`, no one is going to search for it specifically. IOW, the thread moved on :).