Understand where author is coming from - but doesn't squash-n-merge (newish gith...

fishywang · on March 30, 2021

(squash-n-merge isn't new on GitHub, unless you're talking about something different from what I have in mind)

Yes, squash-n-merge is often needed in GitHub's PR workflow: no one needs those un-bisect-able fixup commits in the final merged master/main branch, even though pushing fixup commits (rather than force-pushing) is what keeps the diff between successive states of the PR readable during review. But squash-n-merge comes with its own problems.

The main problem is the commit message. As the contributor (the one sending the PR for the maintainer to review), you have no control over the commit message of the final merged commit. The maintainer doing the merge decides that for you, and by default GitHub generates the message by combining the titles (the first lines) of all the commit messages in the branch, which is almost never a good final commit message.

Another problem is the email on the final commit. When the maintainer uses squash-n-merge, GitHub uses the default email on file for your GitHub account, regardless of which email(s) you configured git to use and associated with the individual commits in the PR.
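A minimal sketch of the per-project email setup mentioned above, in a throwaway repo; the name and address are placeholders. A repo-local `user.email` overrides the global one, and whatever address is configured gets recorded in each commit you push:

```shell
#!/bin/sh
set -e
# Throwaway repo so nothing touches your real projects (git init -b needs Git 2.28+).
cd "$(mktemp -d)"
git init -q -b main

# Repo-local identity: overrides the global ~/.gitconfig values for this project only.
git config user.name "Example Dev"
git config user.email "dev@project.example.org"

# For whole directories of projects, ~/.gitconfig can instead use a conditional include:
#   [includeIf "gitdir:~/work/"]
#       path = ~/.gitconfig-work

echo hello > file.txt
git add file.txt
git commit -q -m "Initial commit"

# The configured address is stored in the commit object itself.
author_email=$(git log -1 --format=%ae)
echo "$author_email"
```

Per the comment, that recorded address is what GitHub's squash-n-merge ignores in favor of your account's default email.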

As a result, squash-n-merge is better suited to contributors less familiar with open source contribution: for example, people who haven't yet realized the value of a good, concise commit message, and people who don't use different email addresses for different projects. For advanced contributors, it's no wonder they prefer force-pushing with rebase-merge when contributing on GitHub, because rebase-merge makes sure the exact state of their final commit is preserved, including the commit message, the associated email address, and their GPG signature if they use one. But GitHub's rebase-merge strategy has its own issues, as the author described, and more.

Ar-Curunir · on March 30, 2021

You can work around all of this as a contributor by squashing on your own end before the final merge.
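A minimal sketch of squashing on your own end, in a throwaway repo; `main` and `my-feature-branch` are placeholder names. `git reset --soft` back to the merge base keeps every change staged, so one new commit (with a message you write) replaces the whole series:

```shell
#!/bin/sh
set -e
cd "$(mktemp -d)"
# Placeholder identity via env vars, so no git config is touched.
export GIT_AUTHOR_NAME="Example Dev" GIT_AUTHOR_EMAIL="dev@example.com"
export GIT_COMMITTER_NAME="Example Dev" GIT_COMMITTER_EMAIL="dev@example.com"
git init -q -b main
echo base > file.txt && git add file.txt && git commit -q -m "Initial commit"

# A PR branch with one real change plus two fixup-style commits.
git checkout -q -b my-feature-branch
echo feature >> file.txt && git commit -q -am "Add feature"
echo typo-fix >> file.txt && git commit -q -am "Fix typo"
echo missing > other.txt && git add other.txt && git commit -q -m "Add missing file"

# Move the branch back to where it forked from main, keeping all changes staged,
# then record them as a single commit.
git reset -q --soft "$(git merge-base main HEAD)"
git commit -q -m "Add feature"

squashed_count=$(git rev-list --count main..HEAD)
echo "$squashed_count"
```

In a real PR you would then update the remote branch with `git push --force-with-lease`, which, unlike plain `--force`, refuses to overwrite the remote branch if it has moved since you last fetched. That force-push is exactly the trade-off raised in the reply.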

fishywang · on March 30, 2021

That comes with all the problems of force-push and rebase, except the one about losing history during code review.

For example, this still has a commit message issue, just on the maintainer's side: as the maintainer, if you are going to use rebase to merge this PR, you have to accept whatever commit message the contributor wrote as-is. Are you happy with that? If not, you can't even leave inline comments on it, and it's usually pretty hard to communicate feedback on how you want the commit message to read.

WorldMaker · on March 30, 2021

> un-bisect-able fixup commits in the final merged master/main branch

If you require PRs to create merge commits, you get the nice world where `git bisect start --first-parent` bisects at the PR level: you don't have to worry about the individual commits inside the PR (below the PR level) when bisecting, but you still have that commit history "as-is" for deep archeological dives when you need it.

(And you can use --first-parent to clean up git log and git praise too.)
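A sketch of the PR-level view in a throwaway repo with two hypothetical `--no-ff` merges (branch names, file names and the "bug" are all made up; `git bisect start --first-parent` needs Git 2.29 or later). With `--first-parent`, `git log`, `git bisect` and `git blame` all stop at the merge commits, so they point at "Merge PR #1" rather than the inner commit that added the bug:

```shell
#!/bin/sh
set -e
cd "$(mktemp -d)"
export GIT_AUTHOR_NAME="Example Dev" GIT_AUTHOR_EMAIL="dev@example.com"
export GIT_COMMITTER_NAME="Example Dev" GIT_COMMITTER_EMAIL="dev@example.com"
git init -q -b main
echo ok > app.txt && git add app.txt && git commit -q -m "Initial commit"
good=$(git rev-parse HEAD)

# Hypothetical PR #1: two commits, the second one introduces the "bug".
git checkout -q -b pr-1
echo a > a.txt && git add a.txt && git commit -q -m "Add a"
echo bug > bug.txt && git add bug.txt && git commit -q -m "Oops, add bug"
git checkout -q main
git merge -q --no-ff -m "Merge PR #1" pr-1
merge1=$(git rev-parse HEAD)

# Hypothetical PR #2: one harmless commit.
git checkout -q -b pr-2
echo b > b.txt && git add b.txt && git commit -q -m "Add b"
git checkout -q main
git merge -q --no-ff -m "Merge PR #2" pr-2

# History at the PR level vs. the full graph.
git log --first-parent --oneline
pr_level=$(git rev-list --first-parent --count HEAD)  # initial commit + 2 merges
full=$(git rev-list --count HEAD)                      # every commit in the graph

# Bisect at the PR level; a revision is "bad" if bug.txt exists.
git bisect start --first-parent HEAD "$good" >/dev/null
git bisect run sh -c 'test ! -e bug.txt' >/dev/null
first_bad=$(git rev-parse refs/bisect/bad)
git bisect reset >/dev/null 2>&1

# Blame at the PR level attributes the line to the merge, not the inner commit.
blamed=$(git blame --first-parent --porcelain -- bug.txt | head -n 1 | cut -d ' ' -f 1)
```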

u801e · on March 31, 2021

And those commits rarely provide useful information because they're of the variety where people fix syntax errors, add missing files, remove changes they didn't mean to commit, etc.

WorldMaker · on March 31, 2021

There's plenty of information in all those types of commits, even if you personally don't find it "useful". I've had to do exactly that sort of archeological dig to figure out "what syntax errors did a build tool miss?", "why does this type of file so often get left out, and how often do we miss it?", "what was still TODO in this feature effort that got removed at the last minute?", etc. All of that needs information from those sorts of "low-level" commits.

u801e · on March 31, 2021

When a file was missed and added in a later commit, running git blame shows the sha1 of a commit with a title like "Added missing file". That's not going to tell me anything about why that file was added.

Instead, if you had a commit that explained what the file was for, or if some of the lines in that file were added by a commit that explained the change and why it was made, that would be useful history.

Many times, investigations start with running git blame on a file you plan to change. Whether the commit messages attached to each line are useful, and whether each commit's diff shows a logical change rather than a syntax-error fix, is the difference between an investigation that leads to results and one that leads to a dead end.

WorldMaker · on March 31, 2021

I already mentioned `git blame --first-parent` just a few comments up! You get the sha1 of a commit with a title like "Merged PR #327". You can dig deeper than the --first-parent level if need be, but you have the power of the git graph to show or hide details when you do or don't need them.

u801e · on March 31, 2021

Does the --first-parent flag handle the case where the line was changed as part of a conflict resolution in the merge commit itself?

cryptonector · on March 30, 2021

Sometimes (often!) you want clean history upstream but also patches separated by the bugs they fix and the features they add: issue/ticket numbers, whatever. And you may want regression tests to come before bug fixes, so that you can see the regression test fail, then pass once the bug fix is applied. Different upstreams are likely to have different rules.

So squash-and-merge is a bad one-size-fits-all. Rebase is a much, much better approach: you keep the history as submitted and you lose the useless merge commit. There's no "unnecessary local information" if the submitter did the work of cleaning up their history before submitting. That means doing interactive rebases locally to squash/fixup/edit (and possibly split)/reword/drop/reorder their commits; this is something every developer should know how to do.
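A sketch of that local cleanup in a throwaway repo, assuming the upstream branch is `main` (branch names and file contents are placeholders). A review correction is recorded with `git commit --fixup`, then `git rebase -i --autosquash` folds it into the commit it fixes, leaving the regression test ahead of the fix. Normally you would edit the todo list by hand; setting `GIT_SEQUENCE_EDITOR=:` simply accepts it as generated so the script runs unattended:

```shell
#!/bin/sh
set -e
cd "$(mktemp -d)"
export GIT_AUTHOR_NAME="Example Dev" GIT_AUTHOR_EMAIL="dev@example.com"
export GIT_COMMITTER_NAME="Example Dev" GIT_COMMITTER_EMAIL="dev@example.com"
git init -q -b main
echo "v1" > app.txt && git add app.txt && git commit -q -m "Initial commit"

# Regression test first, then the fix.
git checkout -q -b fix-branch
echo "test: expects v2" > test.txt && git add test.txt && git commit -q -m "Add regression test for bug"
echo "v2-with-typo" > app.txt && git commit -q -am "Fix bug"
fix_sha=$(git rev-parse HEAD)

# Review feedback: record the correction as a fixup of "Fix bug".
echo "v2" > app.txt && git commit -q -a --fixup="$fix_sha"

# Autosquash moves the "fixup! Fix bug" commit under "Fix bug" and marks it fixup.
GIT_SEQUENCE_EDITOR=: git rebase -q -i --autosquash main

git log --reverse --format=%s main..HEAD
commit_count=$(git rev-list --count main..HEAD)
```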

pabs3 · on March 30, 2021

Squashing commits into one mega-commit isn't great for future investigations of the commit history (code review, bisects, etc.). It is much better to create separate logical commits, rebase them, and pull in the result, either as a fast-forward merge of the branch or with a merge commit where appropriate.

earthboundkid · on March 30, 2021

When someone invents the git killer, it will have a feature called “subcommits” that will be blindingly obvious in hindsight.

oftenwrong · on March 30, 2021

If I've correctly understood what you mean, I've wanted this for some time now. A way to preserve history while adding a single, linear integration of changes.

renewiltord · on March 30, 2021

You get this by forcing merge commits for every non-single-commit change.

earthboundkid · on March 30, 2021

Sure, you can use git to do this, but the git killer will have it as an expected capability.

I also think that octopus merges are basically always a disaster because they can't be meaningfully reviewed and put your repo into an unknown state. Maybe there's some way to get the advantages of merge commits (preserve all history!) without the disadvantages (jumble all history!).

urxvtcd · on March 30, 2021

Not sure why you are being downvoted. You actually can use merge commits this way, by viewing the diff produced by

    git diff $merge_commit^...$merge_commit^2

when you're interested in the whole change-set introduced by a branch, or by looking at the individual commits when you're interested in, well, individual commits.
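A sketch of both views in a throwaway repo (branch and file names are placeholders). The three-dot `git diff` compares the merge base of the merge's two parents with the second parent, so work done on `main` in the meantime doesn't show up; `git log` over `merge^..merge^2` lists the individual commits that came in through the branch:

```shell
#!/bin/sh
set -e
cd "$(mktemp -d)"
export GIT_AUTHOR_NAME="Example Dev" GIT_AUTHOR_EMAIL="dev@example.com"
export GIT_COMMITTER_NAME="Example Dev" GIT_COMMITTER_EMAIL="dev@example.com"
git init -q -b main
echo base > README && git add README && git commit -q -m "Initial commit"

git checkout -q -b feature
echo one > feature1.txt && git add feature1.txt && git commit -q -m "Feature part 1"
echo two > feature2.txt && git add feature2.txt && git commit -q -m "Feature part 2"

# Meanwhile main moves on, so the merge is a real two-parent merge.
git checkout -q main
echo other > unrelated.txt && git add unrelated.txt && git commit -q -m "Unrelated work on main"
git merge -q --no-ff -m "Merge branch 'feature'" feature
merge_commit=$(git rev-parse HEAD)

# Whole change-set of the branch (merge base of both parents -> second parent).
changed=$(git diff --name-only "$merge_commit^...$merge_commit^2")
echo "$changed"

# The individual commits that came in through the branch.
git log --oneline "$merge_commit^..$merge_commit^2"
branch_commits=$(git rev-list --count "$merge_commit^..$merge_commit^2")
```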

WorldMaker · on March 30, 2021

Yup, and you can pass --first-parent to git bisect, git log, and git praise to work at the "macro level" of those merge commits by default, and dive into the fuller graph only as necessary.

properdine · on March 30, 2021

A year from now, are you actually going to want to test each individual change in a pull request, or are you going to want to test it as an entire unit?

I agree that for code review you want smaller units, but my experience has been that 1-2 years later you no longer care about the individual units and instead want the entire patch/PR all together.

pgaddict · on March 30, 2021

I'm pretty sure you want reasonable, meaningful commits. On tiny projects it may not matter, but on larger projects it's definitely a huge benefit, because chances are you'll have to investigate a bug in that code, re-learn why it was done this way, etc., and maybe bisect the git history to find exactly which commit caused the issue.

Which is why larger changes are often split into smaller patches that can be applied and tested incrementally. If you just merge the whole pull request as one huge patch / one merge commit, you've lost most of that.

pabs3 · on March 30, 2021

I definitely will want to do that, especially when bisecting a random bug that was introduced with one of the changes in that PR. The smaller the unit of change the better, as long as they are logically separate changes.

dboreham · on March 30, 2021

I think it's more about: in a year from now will you understand the purpose of a change to some code you're debugging? If the commit says "merged PR 2234", answer is probably not.

rectang · on March 30, 2021

Nirvana is:

• Setting `merge.ff=no` in git config to force merge commits by default.

• Creating a series of logical commits on `my-feature-branch`.

• Merging `my-feature-branch` into `main` with a bona fide merge commit.

• Using `git branch -d my-feature-branch` (NOT capital `-D`) to delete the feature branch safely and without worry, since `-d` only deletes the branch if its commits are already merged into HEAD (or into its upstream branch, if one is set).

• Using `git log --oneline --graph` to see a clean representation of the actual history.
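The steps above, sketched in a throwaway repo. The comment suggests setting `merge.ff` in your own git config (e.g. globally); here it is set per-repo so the sketch leaves your real config alone (`false` is the documented value, and `no` parses as the same boolean). The branch name is a placeholder:

```shell
#!/bin/sh
set -e
cd "$(mktemp -d)"
export GIT_AUTHOR_NAME="Example Dev" GIT_AUTHOR_EMAIL="dev@example.com"
export GIT_COMMITTER_NAME="Example Dev" GIT_COMMITTER_EMAIL="dev@example.com"
git init -q -b main
git config merge.ff false
echo base > README && git add README && git commit -q -m "Initial commit"

# A series of logical commits on a feature branch.
git checkout -q -b my-feature-branch
echo step1 > part1.txt && git add part1.txt && git commit -q -m "Add part 1"
echo step2 > part2.txt && git add part2.txt && git commit -q -m "Add part 2"

# A fast-forward would be possible, but merge.ff=false forces a real merge commit.
git checkout -q main
git merge -q -m "Merge my-feature-branch" my-feature-branch

# Safe delete: -d only succeeds because the branch's commits are reachable from HEAD.
git branch -d my-feature-branch >/dev/null

git log --oneline --graph
```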

masklinn · on March 30, 2021

> • Setting `merge.ff=no` in git config to force merge commits by default.

I'd rather have `merge.ff = only` so git never creates a merge commit out from under me. It's a big issue because of `git pull`; that thing should not exist.

Most git tools are wholly unable to deal with really merge-heavy graphs, too.

rectang · on March 30, 2021

A pull is just a fetch followed by a merge. So to solve this problem, just fetch instead of pull!

Then do `git merge --ff-only` and if it doesn't work, do the rebase or whatever else to resolve the conflict.

I did this long before I set `merge.ff=no`. I hate it when pull creates crappy graphs — it's something I try to help all my colleagues to avoid. I often wish that `git pull` didn't exist.
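A sketch of that fetch-then-fast-forward routine, using a local `upstream` repo as a stand-in for the shared remote (all names are placeholders). With `merge.ff=only`, an accidental non-fast-forward `git merge` becomes an error instead of a merge commit, and a diverged branch is rebased instead:

```shell
#!/bin/sh
set -e
cd "$(mktemp -d)"
export GIT_AUTHOR_NAME="Example Dev" GIT_AUTHOR_EMAIL="dev@example.com"
export GIT_COMMITTER_NAME="Example Dev" GIT_COMMITTER_EMAIL="dev@example.com"

# "upstream" stands in for the shared remote; "me" is your clone.
git init -q -b main upstream
echo base > upstream/file.txt
git -C upstream add file.txt
git -C upstream commit -q -m "Initial commit"
git clone -q upstream me
cd me
git config merge.ff only

# Case 1: someone else pushed. Fetch, then fast-forward only.
echo more >> ../upstream/file.txt
git -C ../upstream commit -q -am "Upstream change 1"
git fetch -q
git merge -q --ff-only "@{u}"

# Case 2: local and upstream have diverged. ff-only refuses, so rebase instead.
echo local > local.txt && git add local.txt && git commit -q -m "Local work"
echo even-more >> ../upstream/file.txt
git -C ../upstream commit -q -am "Upstream change 2"
git fetch -q
if ! git merge -q --ff-only "@{u}" 2>/dev/null; then
  git rebase -q "@{u}"
fi

merge_commits=$(git rev-list --merges --count HEAD)
git log --oneline --graph
```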

masklinn · on March 30, 2021

> A pull is just a fetch followed by a merge. So to solve this problem, just fetch instead of pull!

Of course, that’s what I do. But “git pull” is still a danger, and configuring merge.ff=only protects against that danger.

rectang · on March 30, 2021

Why is `git pull` a "danger" if you always use `git fetch`? The configuration setting for merge.ff only affects the local machine. It doesn't generally impact other developers.

(Unless you're doing something like setting the system gitconfig on a shared dev box, and setting merge.ff to anything other than the default would be really heavy-handed in such an environment.)

pabs3 · on March 30, 2021

I would only use merge commits when it is appropriate, like a commit series porting usage of a dependency from an old version to a new one.

rectang · on March 30, 2021

Well, there are different views of "appropriate". When you do team development and everything is done via feature branches, it's nice to have merge commits so that the integrity of each feature development effort is preserved as a merged branch in the history. If everything is flattened, it's harder to see where the branches (standing in for development initiatives) begin and end.

You can't always get fast-forward merges anyway. Long-lived branches with merge conflicts are undesirable but unavoidable in the long run. At least some of the time, you're going to have merge commits even when your "appropriateness" test says there shouldn't be one.

pabs3 · on March 30, 2021

Do you at least agree that merge commits for single-commit PRs aren't "appropriate"?

rectang · on March 30, 2021

I don't feel strongly about the issue.

A good clean fast-forward merge of a single-commit PR is fine. But I've also worked at multiple jobs where every merge to the production branch created a merge commit and that's also fine. It adds a bit of complexity to the history graph, but it's not meaningless complexity.

If your commit history is majority single-commit PRs then having additional merge-commits everywhere would be noisy, so in that case it would be too much. I don't tend to work on actively developed projects that match that pattern, though. Most feature development involves multiple-commit branches.

WorldMaker · on March 30, 2021

Merge commits for single-commit PRs helpfully record which PR # was merged if you need to review/audit the PR sometime later, if nothing else.

u801e · on March 31, 2021

The original commit could be amended to include that information.

WorldMaker · on March 31, 2021

The point of using a "proper" merge commit would be to avoid amending/rebasing the original commit and allow the original commit to live as-developed in the final branch.

u801e · on March 31, 2021

The only thing changing in the original commit is to include a reference to the PR number in the commit message. There would be no change to the tree referenced by the commit.

WorldMaker · on March 31, 2021

It's an entirely different commit at that point. If work has already started in another branch based on the original commit (for whatever reason), it can cause merge problems down the road. Again, you'll likely suggest just rebasing that other branch on top of the modified commit, but that still sweeps possible merge commits under the rug. And just because that rebase is usually automatic (the tree references should be the same) doesn't mean it always is, or that it has no dangerous repercussions, including training junior devs to rebase often and giving them plenty of ammo for avoidable footguns.

u801e · on March 31, 2021

I'm talking about a single-commit PR in GitHub or GitLab. If it's based on the latest version of the base branch, then amending it to include the PR number would let GitHub generate a link to the PR page associated with that commit. That would make the merge commit superfluous.

So something like:

    git commit --amend

and editing the commit message. This doesn't introduce any further change to the tree associated with the commit.

rectang · on March 31, 2021

But because the commit has different metadata after amending, it now has a different SHA and is a different commit.

For illustration, a minor inconvenience of amending the commit is that `git branch -d my-feature-branch` no longer succeeds for the original branch, because it looks for the actual commit SHA, not the tree.

You may not care about the effects of changing the commit, but those effects are real and other people care.
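A sketch of that effect in a throwaway repo: amending only the message keeps the tree identical but produces a new commit, so `git branch -d` on the original branch refuses to delete it. The branch name and the PR number are placeholders:

```shell
#!/bin/sh
set -e
cd "$(mktemp -d)"
export GIT_AUTHOR_NAME="Example Dev" GIT_AUTHOR_EMAIL="dev@example.com"
export GIT_COMMITTER_NAME="Example Dev" GIT_COMMITTER_EMAIL="dev@example.com"
git init -q -b main
echo base > README && git add README && git commit -q -m "Initial commit"

git checkout -q -b my-feature-branch
echo feature > feature.txt && git add feature.txt && git commit -q -m "Add feature"
orig_sha=$(git rev-parse HEAD)
orig_tree=$(git rev-parse "HEAD^{tree}")

# Land the commit on main, then amend only its message (placeholder PR number).
git checkout -q main
git merge -q --ff-only my-feature-branch
git commit -q --amend -m "Add feature (PR #123)"
new_sha=$(git rev-parse HEAD)
new_tree=$(git rev-parse "HEAD^{tree}")

# Same tree, different commit: the branch's original commit is not reachable
# from HEAD, so -d refuses.
if git branch -d my-feature-branch 2>/dev/null; then deleted=yes; else deleted=no; fi
echo "tree: $orig_tree -> $new_tree, commit: $orig_sha -> $new_sha, deleted: $deleted"
```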

u801e · on March 31, 2021

Assuming you were the one who amended the commit before pushing it to the remote, there's no reason you wouldn't be able to delete the branch, because your local repository already has the updated contents of .git/refs/heads/my-feature-branch.

For those who have cloned the repo for testing, they can simply run git checkout my-feature-branch; git fetch origin; git reset --hard @{u} to get their local repo in sync with the remote.

So there's no reason that amending the commit will affect anyone until they branch off of the repo to do their own work. But that's nothing that a rebase can't fix.

rectang · on April 1, 2021

Yes, of course there are workarounds; no matter what scenario you or I come up with, the other will be able to propose a different way of doing things. I chose a deliberately trivial example because I was illustrating a fundamental aspect of Git's design, not trying to stump you. But we're talking past each other.

u801e · on April 1, 2021

I don't think we're talking past each other because you weren't involved in this subthread until your comment about using git branch -d.

rectang · on April 1, 2021

But one of my comments (†) is the great-grandparent of your first comment on the subthread? (∆) And the concept of preserving commits precisely is fundamental to my comment two generations above that, the one about "nirvana" (‡) ?

    [article]
     properdine
      pabs3
       rectang (‡)
        pabs3
         rectang (†)
          pabs3
           WorldMaker
            u801e (∆)

Perhaps we would benefit from an `hn log` function which displays the linear parentage history for comments? (It would be easier to design than `git log` because every comment has exactly one parent; there are no `hn merge` comments.)

Or in your working copy has my authorship info been lost? That can happen if a committer uses plain old `patch -p1` to apply a diff from the mailing list rather than `hn am`. :D

u801e · on April 2, 2021

> Or in your working copy has my authorship info been lost?

Well, none of the text you originally wrote in the comment you're referencing was preserved in the working copy. Unless it's quoted, in which case one could find when it was introduced by running git log -S"a line from your comment", no one is going to search for it specifically. IOW, the thread moved on :).