Squashing commits into one mega-commit isn't great for future investigations of ...

earthboundkid · on March 30, 2021

When someone invents the git killer, it will have a feature called “subcommits” that will be blindingly obvious in hindsight.

oftenwrong · on March 30, 2021

If I've correctly understood what you mean, I've wanted this for some time now. A way to preserve history while adding a single, linear integration of changes.

renewiltord · on March 30, 2021

You get this by forcing merge commits for every non-single-commit change.

earthboundkid · on March 30, 2021

Sure, you can use git to do this, but the git killer will have it as an expected capability.

I also think that octopus merges are basically always a disaster because they can't be meaningfully reviewed and put your repo into an unknown state. Maybe there's some way to get the advantages of merge commits (preserve all history!) without the disadvantages (jumble all history!).

urxvtcd · on March 30, 2021

Not sure why you are being downvoted. You actually can use merge commits this way, by viewing the diff produced by

    git diff $merge_commit^...$merge_commit^2

when interested in the whole change-set introduced by a branch, or by looking at individual commits when interested in, well, individual commits.

WorldMaker · on March 30, 2021

Yup, and you can use --first-parent to git bisect, git log, git praise to interact at the "macro-level" of those merge commits by default, and dive in to the fuller graph only as necessary.
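
A minimal sketch of both suggestions, assuming a throwaway repository; the branch, file names, and commit messages are made up for illustration. It builds a two-commit branch, merges it with a real merge commit, and compares the "macro" view with the full graph.

```shell
# Throwaway repository; every name below is hypothetical.
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example Dev"
git config user.email "dev@example.com"

echo base > base.txt
git add base.txt
git commit -q -m "base"

# A feature branch made of two logical commits.
git checkout -q -b my-feature-branch
echo one > one.txt
git add one.txt
git commit -q -m "feature: step one"
echo two > two.txt
git add two.txt
git commit -q -m "feature: step two"

# Integrate with a real merge commit even though a fast-forward is possible.
git checkout -q main
git merge -q --no-ff -m "Merge my-feature-branch" my-feature-branch
merge_commit=$(git rev-parse HEAD)

# Whole change-set of the branch: merge base of the two parents vs. the branch tip.
git diff --name-only "$merge_commit^...$merge_commit^2"

# Macro-level history: follows first parents only (the merge, then "base").
git log --first-parent --oneline

# Full graph, for when the individual commits matter.
git log --oneline --graph
```

In this sketch the first-parent log has two entries while the full log has four, and the three-dot diff lists only the files the branch added.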

properdine · on March 30, 2021

A year from now, are you actually going to want to test each individual change in a pull request, or are you going to want to test it as an entire unit?

I agree that for code review you want smaller units, but my experience has been that 1-2 years later, you no longer care about the individual units and instead you want the entire patch/PR all together.

pgaddict · on March 30, 2021

I'm pretty sure you want reasonable meaningful commits. On tiny projects it may not matter, but on larger projects it's definitely a huge benefit, because chances are you'll have to investigate a bug in that code, re-learn why it was done this way, etc. And maybe bisect the git history to find which exact commit caused the issue.

Which is why larger changes are often split into smaller patches that may be applied and tested incrementally. If you just merge the whole pull request as one huge patch / in one merge commit, you just lost most of that.

pabs3 · on March 30, 2021

I definitely will want to do that, especially when bisecting a random bug that was introduced with one of the changes in that PR. The smaller the unit of change the better, as long as they are logically separate changes.

dboreham · on March 30, 2021

I think it's more about: a year from now, will you understand the purpose of a change to some code you're debugging? If the commit says "merged PR 2234", the answer is probably not.

rectang · on March 30, 2021

Nirvana is:

• Setting `merge.ff=no` in git config to force merge commits by default.

• Creating a series of logical commits on `my-feature-branch`.

• Merging `my-feature-branch` into `main` with a bona fide merge commit.

• Using `git branch -d my-feature-branch` (NOT capital `-D`) to delete the feature branch safely and without worry, since `-d` only deletes the branch if its commits are already merged into its upstream branch, or into HEAD when no upstream is set.

• Using `git log --oneline --graph` to see a clean representation of the actual history.
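
The list above could be sketched roughly as follows, assuming a scratch repository where the file names and commit messages are invented. Note that `merge.ff` is set with the value `false` here, which git treats the same as `no`.

```shell
# Scratch repository; all names are hypothetical.
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example Dev"
git config user.email "dev@example.com"

# Force merge commits by default (repository-local here, not --global).
git config merge.ff false

echo base > base.txt
git add base.txt
git commit -q -m "base"

# A series of logical commits on the feature branch.
git checkout -q -b my-feature-branch
echo parser > parser.txt
git add parser.txt
git commit -q -m "Add parser"
echo tests > tests.txt
git add tests.txt
git commit -q -m "Add parser tests"

# Plain `git merge` now creates a merge commit even though it could fast-forward.
git checkout -q main
git merge -q --no-edit my-feature-branch

# Safe delete: succeeds only because the branch is fully merged into HEAD.
git branch -d my-feature-branch

git log --oneline --graph
```

Setting the option per repository, as here, keeps the experiment from touching a global or system gitconfig.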

masklinn · on March 30, 2021

> • Setting `merge.ff=no` in git config to force merge commits by default.

I'd rather `merge.ff = only` so git never creates a merge commit from under me. It's a big issue because of `git pull`, that thing should not exist.

Most git tools are wholly unable to deal with really merge-heavy graphs, too.

rectang · on March 30, 2021

A pull is just a fetch followed by a merge. So to solve this problem, just fetch instead of pull!

Then do `git merge --ff-only` and if it doesn't work, do the rebase or whatever else to resolve the conflict.

I did this long before I set `merge.ff=no`. I hate it when pull creates crappy graphs — it's something I try to help all my colleagues to avoid. I often wish that `git pull` didn't exist.
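
A rough sketch of that fetch-then-`--ff-only` habit, assuming two scratch repositories named `upstream` and `local` (both names invented) whose histories have diverged. The rebase fallback is just one of the "whatever else" options.

```shell
# Two scratch repositories; directory names and commit messages are hypothetical.
work=$(mktemp -d) && cd "$work" || exit 1
git init -q upstream
(
  cd upstream || exit 1
  git symbolic-ref HEAD refs/heads/main
  git config user.name "Upstream Dev"
  git config user.email "upstream@example.com"
  echo base > base.txt
  git add base.txt
  git commit -q -m "base"
)

git clone -q upstream local
cd local || exit 1
git config user.name "Local Dev"
git config user.email "local@example.com"

# Local work and upstream work happen independently, so the histories diverge.
echo mine > mine.txt
git add mine.txt
git commit -q -m "local work"
(
  cd ../upstream || exit 1
  echo theirs > theirs.txt
  git add theirs.txt
  git commit -q -m "upstream work"
)

# Fetch instead of pull, then allow only a fast-forward.
git fetch -q origin
if git merge --ff-only origin/main; then
  echo "fast-forwarded"
else
  echo "histories diverged; rebasing instead of creating a merge commit"
  git rebase -q origin/main
fi

git log --oneline --graph
```

Here the fast-forward is refused, the rebase replays "local work" on top of "upstream work", and the resulting history is linear with no merge commit.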

masklinn · on March 30, 2021

> A pull is just a fetch followed by a merge. So to solve this problem, just fetch instead of pull!

Of course, that’s what I do. But “git pull” is still a danger, and configuring merge.ff=only protects against that danger.

rectang · on March 30, 2021

Why is `git pull` a "danger" if you always use `git fetch`? The configuration setting for merge.ff only affects the local machine. It doesn't generally impact other developers.

(Unless you're doing something like setting the system gitconfig on a shared dev box, and setting merge.ff to anything other than the default would be really heavy-handed in such an environment.)

pabs3 · on March 30, 2021

I would only use merge commits when it is appropriate, like a commit series porting usage of a dependency from an old version to a new one.

rectang · on March 30, 2021

Well, there are different views of "appropriate". When you do team development and everything is done via feature branches, it's nice to have merge commits so that the integrity of each feature development effort is preserved via a merged branch in the history. If everything is flattened, it's harder to see where the branches (standing in for development initiatives) begin and end.

You can't always get fast-forward merges anyway. Long-lived branches with merge conflicts are undesirable but unavoidable in the long run. At least some of the time, you're going to have merge commits even when your "appropriateness" test says there shouldn't be one.

pabs3 · on March 30, 2021

Do you at least agree that merge commits for single-commit PRs aren't "appropriate"?

rectang · on March 30, 2021

I don't feel strongly about the issue.

A good clean fast-forward merge of a single-commit PR is fine. But I've also worked at multiple jobs where every merge to the production branch created a merge commit and that's also fine. It adds a bit of complexity to the history graph, but it's not meaningless complexity.

If your commit history is majority single-commit PRs then having additional merge-commits everywhere would be noisy, so in that case it would be too much. I don't tend to work on actively developed projects that match that pattern, though. Most feature development involves multiple-commit branches.

WorldMaker · on March 30, 2021

Merge commits for single-commit PRs helpfully record which PR # was merged if you need to review/audit the PR sometime later, if nothing else.

u801e · on March 31, 2021

The original commit could be amended to include that information.

WorldMaker · on March 31, 2021

The point of using a "proper" merge commit would be to avoid amending/rebasing the original commit and allow the original commit to live as-developed in the final branch.

u801e · on March 31, 2021

The only thing changing in the original commit is to include a reference to the PR number in the commit message. There would be no change to the tree referenced by the commit.

WorldMaker · on March 31, 2021

It's an entirely different commit at that point. If work has already started in another branch based on the original commit (for whatever reason), it can cause merge problems down the road. Again, you are likely going to suggest that you can just rebase this other branch on top of the modified commit, but that's still sweeping possible merge commits under the rug. And again, just because that rebase is usually automatic (the tree references should be the same) doesn't mean it is always automatic or free of dangerous repercussions, including training junior devs to rebase often and giving them plenty of ammo for avoidable footguns.

u801e · on March 31, 2021

I'm talking about a single commit PR in Github or Gitlab. If it's based on the latest version of the base branch, then amending it to include the PR number would allow Github to generate a link to the PR page associated with that commit. That would make the merge commit superfluous at that point.

So something like:

    git commit --amend

and editing the commit message. This doesn't introduce any further change to the tree associated with the commit.

rectang · on March 31, 2021

But because the commit has different metadata after amending, it now has a different SHA and is a different commit.

For illustration, a minor inconvenience of amending the commit is that `git branch -d my-feature-branch` no longer succeeds for the original branch, because it looks for the actual commit SHA, not the tree.

You may not care about the effects of changing the commit, but those effects are real and other people care.
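
To make that concrete, here is a small sketch in a scratch repository. The branch contents and the `(#123)` PR reference are hypothetical placeholders. It shows that an amended message leaves the tree untouched but yields a new commit, which `git branch -d` then treats as unmerged.

```shell
# Scratch repository; file names, messages, and the PR number are hypothetical.
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example Dev"
git config user.email "dev@example.com"

echo base > base.txt
git add base.txt
git commit -q -m "base"

# A single-commit feature branch.
git checkout -q -b my-feature-branch
echo feature > feature.txt
git add feature.txt
git commit -q -m "Add feature"
original=$(git rev-parse HEAD)

# Fast-forward main, then amend only the message on main.
git checkout -q main
git merge -q --ff-only my-feature-branch
git commit -q --amend -m "Add feature (#123)"
amended=$(git rev-parse HEAD)

# Same tree object, different commit object.
git rev-parse "$original^{tree}" "$amended^{tree}"
echo "original commit: $original"
echo "amended commit:  $amended"

# The branch still points at the original commit, which main no longer contains.
git branch -d my-feature-branch || echo "refused: branch is not fully merged"
```

The two tree IDs printed are identical while the commit IDs differ, and the branch survives the `-d` attempt.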

u801e · on March 31, 2021

Assuming you were the one who amended the commit before pushing it up to the remote, there's no reason that you would not be able to delete the branch, because your local working copy already has the updated contents of .git/refs/heads/my-feature-branch.

For those who have cloned the repo for testing, they can simply run git checkout my-feature-branch; git fetch origin; git reset --hard @{u} to get their local repo in sync with the remote.

So there's no reason that amending the commit will affect anyone until they branch off of the repo to do their own work. But that's nothing that a rebase can't fix.

rectang · on April 1, 2021

Yes, of course there are workarounds; no matter what scenario you or I come up with, the other will be able to propose a different way of doing things. I chose a deliberately trivial example because I was illustrating a fundamental aspect of Git's design, not trying to stump you. But we're talking past each other.

u801e · on April 1, 2021

I don't think we're talking past each other because you weren't involved in this subthread until your comment about using git branch -d.

rectang · on April 1, 2021

But one of my comments (†) is the great-grandparent of your first comment on the subthread? (∆) And the concept of preserving commits precisely is fundamental to my comment two generations above that, the one about "nirvana" (‡) ?

    [article]
     properdine
      pabs3
       rectang (‡)
        pabs3
         rectang (†)
          pabs3
           WorldMaker
            u801e (∆)

Perhaps we would benefit from an `hn log` function which displays the linear parentage history for comments? (It would be easier to design than `git log` because every comment has exactly one parent; there are no `hn merge` comments.)

Or in your working copy has my authorship info been lost? That can happen if a committer uses plain old `patch -p1` to apply a diff from the mailing list rather than `hn am`. :D

u801e · on April 2, 2021

> Or in your working copy has my authorship info been lost?

Well, none of the text that you originally wrote in the comment you're referencing was preserved on the working copy. And, unless it's quoted, and one could search for when it was introduced by running git log -S"a line from your comment", no one is going to search for it specifically. IOW, the thread moved on :).
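
For anyone unfamiliar with the `-S` ("pickaxe") search mentioned above, a tiny sketch in a scratch repository with made-up file contents and commit messages:

```shell
# Scratch repository; file contents and commit messages are hypothetical.
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example Dev"
git config user.email "dev@example.com"

echo "first draft" > thread.txt
git add thread.txt
git commit -q -m "start thread"

echo "a line from your comment" >> thread.txt
git commit -q -am "quote the comment"

echo "unrelated reply" >> thread.txt
git commit -q -am "thread moves on"

# -S lists commits that change how many times the string occurs,
# which here is only the commit that introduced it.
git log -S"a line from your comment" --format=%s
```

Only "quote the comment" is listed, because the later commit leaves the number of occurrences unchanged.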