Things I wish Git had: Commit groups (blog.danieljanus.pl)
289 points by nathell on July 3, 2021 | 193 comments



I agree with the author's sentiment. The way this problem is typically framed is as a dichotomy between preserving a "true" history of what really happened on the micro/commit scale VS presenting a "clean" history that makes the story of the change easy to follow on the macro/PR scale.

This is a false dichotomy. I'm greedy, I want BOTH. Give me story mode when I'm just browsing the repo, but offer me the option to switch into commit-by-commit mode when I want more detail.


Git does provide this feature :)

For a view that keeps the commits from each branch together instead of interleaving them by commit timestamp (i.e. groups all commits from the same PR together in the history):

  git log --topo-order

For only viewing merge commits (as if you were using a squash-and-rebase strategy, but without having to rewrite history at merge time):

  git log --merges

This assumes you are using branches as groups of commits (kind of the point of a branch) and merging with --no-ff rather than squashing and rebasing. --no-ff is the default for the Merge button on GitHub (https://docs.github.com/en/github/administering-a-repository...) and is necessary in this scheme, because fast-forward "merges" create no merge commit and would mess up the merge history. With that in place, `git log` shows you the "true" history, `git log --topo-order` shows commits grouped by the branch they came in on instead of interleaved by timestamp, and `git log --merges` shows you the zoomed-out "clean" history of PR merges.
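
A minimal sketch of those views, assuming a throwaway repo with two made-up feature branches (`login` and `search`, plus the file names, are just for illustration):

```shell
set -eu

# Throwaway repository; branch and file names are hypothetical.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

echo base > base.txt
git add base.txt
git commit -q -m "initial commit"

# Two "PRs", each with messy intermediate commits, merged with --no-ff.
for feature in login search; do
  git checkout -q -b "$feature" main
  echo "draft" > "$feature.txt"
  git add "$feature.txt"
  git commit -q -m "$feature: wip"
  echo "done" > "$feature.txt"
  git commit -q -am "$feature: working now"
  git checkout -q main
  git merge -q --no-ff -m "Merge $feature feature" "$feature"
done

# Zoomed-out view: one line per merged branch.
clean_history=$(git log --merges --format=%s)
# Detailed view: every commit, with each branch's commits kept together.
full_history=$(git log --topo-order --format=%s)

printf 'Clean history:\n%s\n\nFull history:\n%s\n' "$clean_history" "$full_history"
```

Nothing is rewritten here: the "wip" commits are still in the repo, they are just filtered out of the `--merges` view.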

Many people don't know about these features because Github doesn't have an option to view the log that way: only git does. TBH I wish Github had offered a --topo-order and --merges selector to their commit log view before offering a myriad of PR merging/rebasing options: git provides a wealth of solutions to viewing commit history without forcing you into destroying parts of it with squash-and-rebase.

Edit: IMO, part of this issue is because of git's bad default log view. The --topo-order view is a much more useful view for reading the history of the repo (both for knowledge building and for debugging) than the default chronological-by-commit-timestamp view; typically you don't care what date a commit was authored on, as much as you care when it entered the main branch. I can't think of a single time I've actually wanted the default log view, and yet it's the default, and thus it's what Github shows.


Yes! Thank you for that. I knew about some of the flags discussed on this topic but not --topo-order.

For my new project I've already decided some months ago to use merges with --no-ff when merging in work by other people, along with a kernel style "everyone gets their own repo" model.

A lot of the discussion around Git workflow is basically noise created by the weak and hardly ever improved GitHub UI. True. They prefer building AI to building better log views on their core product. But, even much more advanced Git clients like the one in IntelliJ don't properly expose all the different ways of looking at git logs, and git has so many flags and options that it's nearly impossible to know them all. Even if one day you spend the time to read the user guide from cover to cover, newer versions will add new features and you won't see them. If the GUIs were better, I suspect all these discussions would quickly dry up.


I've often thought this too: you shouldn't have to rewrite, i.e. lie about, what happened. Git history ideally should be completely immutable, but there should be a view that tidies up what happened for those who like to see individual features/bugs/hotfixes all listed in history in a nice way.

I don't like the name though; "commit groups" seems odd, and I'd prefer "feature view" or something else. The reason is that "commit groups" sounds to me like groups of people who are allowed to commit.


The ugly truth though, is that the process of programming can be shamefully bad. Often times the in-progress branch commits are the opposite of the platonic ideal commit message, for common human reasons. By 'common human reasons' I mean programming is messy, and it's often unclear why something does or does not work until after something functional is arrived at.

The important thing is that your tools, specifically git, work for you, and not the other way around, so 'bad' commit messages like 'before', followed by 'after' are totally fine while working in the branch as long as it gets cleaned up during merge. However, they are the antithesis of useful months or years down the line while playing code historian. (Hilariously, the answer to the question "what idiot wrote this code", is sometimes the reader!)

So is it a lie that sometimes after lunch on a Friday, the act of programming is often a series of off-by-one, off-by-two, off-by-one-the-other-direction compile-test-edit-commit loops? And that we'd prefer to be thought of as a genius that wrote some fundamental-to-the-company code 5-10 years later, with beautiful commit messages that live up to some platonic ideal, rather than "that one dumbass"?

It's a lie the same way that people who wear makeup are 'lying'. It's true under a very specific, weird framing, but it doesn't really agree with reality.

There are notable high profile exceptions like publicly viewable patch series against the linux kernel, but you're deluded if you think those aren't edited before being released for human consumption.


This is exactly right.

It encourages people to Commit Whenever They Want To, secure in the knowledge that people will never see my -- I mean their -- feeble intermediate attempts at working code. Committing frequently is good. It means that reflog will often have interesting stuff in it, bisecting your feature branch might have a decent chance of finding obscure bugs discovered during development, etc. etc.

It's a godsend once you truly Get It that all that ugly intermediate nonsense can be removed before merging. I suspect that people who advocate against the "rebase+rewrite" philosophy do Not Get It and work in a way where even their branch-local commits are pretty meticulous and tested, etc. Nothing against those people, but they can still work fine in an environment with rebase+rewrite being the default.

I also have literally NEVER heard a good argument for keeping the full commit history (with WIP commits, etc.). There is nothing to be learned from it unless you're specifically investigating people's commit habits.

ETA: Some IDEs have a Local History thing where you can basically see all versions of a local file (snapshot every 60s or whatever). Do you want that in your repository history? I don't think so.


> I also have literally NEVER heard a good argument for keeping the full commit history (with WIP commits, etc.).

Maybe their company ranks employees by number of commits?


That is still not a good argument.

I do feel for these people though. Depending on their company, they could try to fight it and change it (which can be possible depending on things like company size, whether this is being introduced or is well established, your influence level with the deciders, etc.) or simply RUN and never look back.


I think the person you replied to might be jesting :)...

... but, yes, if you find yourself in that type of situation and powerless to change it[0], move on.

[0] "Maybe give it a couple of tries. If that doesn't change anything, give up. No reason to be a damn fool about it." (Paraphrasing Churchill, I think? Anyway, not claiming originality.)


> There are notable high profile exceptions like publicly viewable patch series against the linux kernel, but you're deluded if you think those aren't edited before being released for human consumption.

As they should be, and tons of them have certainly gone through draft versions with "garbage" commits on temporary git branches on their authors' computers. The thing is, that kind of draft version, from before you even want to communicate with other humans (or at least a large number of them), is rarely useful for long-term understanding of the history, so it is extremely useful to have a somewhat cleaned-up history in the project. That you do some cleanup just before submitting a series by email is just a detail. And even then, the rewriting is arguably even more pronounced than in most other projects, because your first version of a complex change is likely to be rejected because the maintainers want improvements in some aspects, so you submit a second (then maybe a third, etc.) series, and most of the time you rewrite the history in each one (rather than just adding a new patch on top).

So I don't think this is even an exception, on the contrary! Git history should not change on branches that are widely used, like Linus's branch for the kernel, or the maintainers' branches, etc. For work in progress used by a single dev or a tightly synchronized group of two or three people, maybe even skipping some processes (e.g. code reviews) at first, I'd rather not see that in the "clean" history of the project, because 99% of the time it has very little value and extremely high noise.


> So is it a lie that sometimes after lunch on a Friday, the act of programming is often a series of off-by-one, off-by-two, off-by-one-the-other-direction compile-test-edit-commit loops?

No, exactly the opposite: it's a lie to pretend that you can write perfect code the first time, and it does no-one any favours in the long run: not yourself, and certainly not those who come to learn from you in the future. And it's not normal or expected the way makeup can sometimes be; junior programmers will be genuinely deceived and this will cause real harm.

Keep the history, warts and all. Best case someone might learn something from it. Worst case you're no worse off.


The “keep everything” approach conflicts with the “have clean commits” approach. If we keep everything, instead of “Build A; Build B; Build C” we wind up with “Build some of A, B, and C;” * 8.


If you're never rewriting history then branching and merging become cheap, so it's easy to do each of A, B and C on its own branch.


But again, like the parent comment said, if you’re never rewriting history, that means you’re stuck with how the code was actually developed, which frequently is not that clean. Many people don’t write three separate features most of the time, they write three features together and at the same time.


If the real history is that you developed all three features in an interleaved way, isn't that more useful (e.g. for bisection) than a fictional history that you haven't actually tested? Most likely your cherry-picking/rebasing won't be perfect, so you'll still have parts of A mixed in with B and C and you'll have things like one commit depending on changes from a future commit. The history of how the code was actually developed might be "messy" but it's more likely to at least compile (because presumably you were compiling it from time to time while you were developing).


> isn't that more useful (e.g. for bisection)

Why would my messy history be useful for bisection? The places I committed, it may not have even fully compiled except for the very last commit. In that case, to separate the code in a useful way (such as 3 commits, one for each feature, each of which compiles on its own), you'd have to do a bit more work and create new commits, which again means either rendering the original commits pointless or disregarding them.


Surely your test-edit cycle involves at least some compiling. Maybe not every commit will compile, but most changes that compile will have a commit. At the very least a "real" commit has a much higher chance of compiling than an "artificial" one that you constructed retrospectively.

If you really do make most of your commits not compile then I can sort of sympathise with squash-merging, but if you merge then worst case it's a one-liner to bisect while only looking at "mainline" history (i.e. only the merges to master, the equivalent of what you'd get if you'd squash-merged), whereas if you squash-merge then there's no way to bisect back through the original history.


Compiling vs. not-compiling is only the tip of the iceberg; there’s a lot of other aspects of my development that don’t make sense until the very end. Arguably, the whole feature is basically useless until it’s finished; why would I keep working on it in any meaningful way after it is complete and running? If that happens, chances are that the ticket wasn’t atomic enough. There are exceptions of course, such as substantial rewrites for bug fixes, but those are by nature not the norm. As a result, my commit messages are only for me, which saves time in development. Commit messages “wip” and “working now” mean something to me, but definitely hold no value to whoever is doing a git blame in the future, which is another benefit of squashing.


If it compiles then I can use it in an automated bisect, which is the main thing VCS history is useful for IME. I'm a big believer in "refactor mercilessly" and "make the change easy, then make the easy change", so while obviously the final feature will not be working in the intermediate commits, the work will touch on other code areas and there's always the possibility that this will introduce a subtle bug that slips past the current test suite, and if that should happen then I want to be able to bisect down to the smallest possible diff before I start trying to understand it manually. I also find that a small commit with a useless message is actually a much more useful blame result than a big commit, even if the big commit contains a detailed explanation of the overall change.


> I also find that a small commit with a useless message is actually a much more useful blame result than a big commit, even if the big commit contains a detailed explanation of the overall change.

That's pretty interesting. I know with me, that is definitely not true, because 90% of all commits would just be the message `wip` which makes Git Blame incredibly hard to use.


What are you trying to get out of the blame? I do sometimes git tag --contains to find the overall feature that the blame-output commit was part of, but most of the time the most useful thing is just to see the diff for that commit or frankly even just the list of files it touches.


Much of the time it’s asking what the motivation behind a line of code is, such as why we take some crazy convoluted approach to what seems like it should be a simple task. Editor plugins such as Git Lens display the blame output so it is much more convenient if that information is in the commit rather than in an associated tag.


Git history should not be kept immutable.

Git release branch history should be kept immutable, because this is a way to see how things were in past, for troubleshooting, for ensuring that the code under source control is actually the code you deployed, and / or the code your downstream depends on, etc.

On your feature branch, you can do anything, as long as you can later cleanly merge with the main branch from which releases are cut.

I'd say it's a completely normal practice to rebase, split, and fixup your commits on the feature branch, in order to present a clean picture to the reviewers, to easily show that a new test catches the error in the old broken code, and the new fix actually makes that test pass, etc. Nobody cares about what happens on your feature branch but you. Its commit history is not holy, it's a tool like other tools. Somebody depends on oncoming progress of my feature branch while developing their own? Well, `pull --rebase` regularly, same as you do with the main branch.

Squash that history during the merge to the main branch. Do not delete the feature branch, uncheck that checkbox in Github repo settings. Clean history representing completed features: check. Detailed explanation of development in code: check.

On the topic of the article: to my mind, "commit groups" can be sufficiently well implemented as branches, or as tags and ranges of commits between tags. For some very complicated cases, they can be implemented as actual text tags inserted into commit messages.


The whole point of using a DVCS is to be able to publish and pull from each other's branches. If your feature branches are private, or you have to check with someone before pulling from their branch (which you do if they're in the habit of rebasing), then you're missing out on most of git's value.


This is an absolutely ridiculous strawman. Just because not every branch is a shared branch doesn’t mean you aren’t getting “DVCS value”.


I've been a programmer for long enough to have used SVN seriously. It really wasn't so different - it honestly did feel much the same as when I've worked in places that used a rebase-heavy git workflow.


This both applies and does not apply to the case at hand.

Let's call it differently: dabble branch and share branch. It's the share branch where you interact with others. You are not bound to have exactly one release branch, and often you don't, when you backport stuff to older releases. But this is a branch you keep in order because you share it with others.

Your dabble branch is your playground. You can do weird things, make stupid mistakes, fix them, etc. You do not share that branch with others much, except to let them see its current state. They do not depend on it, and do not expect it to be nice.

When your portion of work is done, and you (maybe several of you) want to share it with other collaborators, not involved in the process of your dabbling, but interested in the result of it, you may choose to clean it up. You can reorder commits into logical spans, and meld them. You can split a commit that does two unrelated things, and describe each separately. You get rid of all the noise (if you produced any), and form a nicer picture for your collaborators to review and understand. You do it because you care about their time and sanity.

Then you merge the result of your dabbling into the share branch, squashing commits into one. This keeps the history of the shared branch(es) observable. If anybody wants to step back, they have your original dabble branch, which you now abandon and create a new one.

Dabble branches should be short-lived, a couple of days. You can have many long-lived share branches for features that take long to develop, etc. Share branch history usually does not need cleaning up, so there's usually no point in rewriting it. That makes it possible to merge it periodically with other share branches, if any.


You should hopefully be talking to each other at least every couple of days - the real advantage of having visibility of each other's branches comes when you use them to share work on a much smaller timescale.

As for caring about your reviewers' time and sanity, rewriting commits that they may already have seen is the opposite of that IMO. Any decent review tool will let you review a single combined diff for the whole branch, and that's what a reviewer who hasn't been following your progress will use. Meanwhile if a reviewer did happen to look at your branch yesterday, taking away their ability to view just the changes since then is doing them no changes. (This is especially true when it comes to applying changes from review feedback - if I requested a couple of small fixes then I want to review a commit where you made those small fixes, I don't want to have to re-review the whole PR because you rebased)


*doing them no favours. Sorry for the writing mistake.


I want to be able to preserve two parallel commit histories: one where the commits are ordered by time, and another where the commits are ordered by 'story'. Git could cryptographically verify that the end-states of the two histories are identical, and allow me to alter the storied history at will (shifting hunks between commits, splitting/combining commits, reordering commits etc), where during merge both histories are preserved.

I don't really follow your naming critique though. "commit groups" seems like a fine name, they are groups of commits. What you describe I would call "committer groups".


What is your use case? How do you develop so that your story does not belong to one (or more) feature branches where it can be held, unmixed with other stories? Can your story continue through multiple merges to the main / release / whatever branch(es)?

I'm asking totally unironically; every company's flow may be different, and for good reasons which I'm oblivious about. So I'd gladly read if you had time to explain.


Something like --date-order to view commits chronologically vs --topo-order to view them topologically sorted?


I used to think this way. Now I think that we as programmers already suffer from too much information.

I spend considerable time these days making my commits as clear and readable as possible, and between rebasing, squashing and amending messages, it's quite likely that there might be dozens of intermediate commit IDs for every single commit ID that enters the repo.


It's pretty easy though. Leave all the commits. Put meaningful messages in your merge commits, and your clean macro history is just merge commits and the messy one is the rest.


Merge commits prevent having clean history, because they lie to you. If you "git show" a merge commit, it hides almost all changes because they've been automatically merged, but it doesn't follow that they were merged correctly. This can cause things like gotofail.


You can show the full changes by one of these:

> git show --first-parent COMMIT # perspective of target branch = branch you were on when merging

> git show -m COMMIT # perspective of both branches

Lengthy explanation and more ways to do it: https://stackoverflow.com/questions/40986518/git-show-of-a-m...

Also, the following works in my experience and is easier to memorize because the command literally says what you want to know:

> git diff COMMIT_BEFORE_MERGE..MERGE_COMMIT


I think the scenario you've described assumes that the branches have diverged -- that each branch has a commit not on the other. But if you always rebase your branch onto `develop` before merging, your branch definitively has no conflicts. If you then use `--no-ff` to merge, it will create a merge commit anyway.

This article shows what I mean: https://euroquis.nl/blabla/2019/08/09/git-alligator.html . (The only thing missing is the rebase before merge, since the example image shows a case where a merge could have hidden some changes.)


If you do that, then the diff in git show for the merge commit will always be empty. But I think it's a solvable problem: there just needs to be a better/easier way to say "show the diff between one side of the merge (usually the first/left) and the result of the merge" when showing a merge commit. Or maybe there is an option I'm not familiar with.

That said the rebase followed by --no-ff merge is my preferred approach as well.


If [commit] is the merge commit for the feature, then this should work:

    git show [commit]^1..[commit]

The ^1 incantation means "first parent", so in the feature branch style I'm describing, this would show the changes on all of the commits for the feature merged at [commit]. You can use `git diff` instead of `show` in the same way, if you just want the cumulative diff.

It seems to work on the repository I've been contributing to, as it shows each of the commits that were part of the feature branch. The merge commit itself is empty, as desired, avoiding the up-thread concern of unreviewed auto-merged code.
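
Here is a small sketch of the same idea in a throwaway repo, assuming a single made-up `feature` branch merged with `--no-ff` (I use `git log` and `git diff --name-only` here so the output stays short):

```shell
set -eu

# Throwaway repository; branch and file names are hypothetical.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

echo base > base.txt
git add base.txt
git commit -q -m "initial commit"

git checkout -q -b feature main
echo a > a.txt
git add a.txt
git commit -q -m "feature: add a"
echo b > b.txt
git add b.txt
git commit -q -m "feature: add b"

git checkout -q main
git merge -q --no-ff -m "Merge feature" feature
merge=$(git rev-parse HEAD)

# Commits reachable from the merge but not from its first parent:
# the merge itself plus everything the branch brought in.
commits_in_merge=$(git log --format=%s "$merge^1..$merge")
# Cumulative change of the whole feature, relative to main before the merge.
files_changed=$(git diff --name-only "$merge^1" "$merge")

printf '%s\n---\n%s\n' "$commits_in_merge" "$files_changed"
```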


I want a clean merge history I can run git bisect against. The utility of git bisect is much diminished when the commit history is full of broken commits. It works best when all the commits successfully run the test suite at the time, which is perhaps not the highest standard of "clean commit" we could ask for but my impression is that it puts it fairly high vs. the "commit everything" criterion.


> I want a clean merge history I can run git bisect against.

"git bisect" supports "--first-parent" too.


It doesn't matter what git supports if I'm bisecting into history where the tests don't run cleanly on every commit. Now I don't know whether the test case I'm bisecting with is broken because it's revealing the bug or if it's broken because the commit is broken. git bisect is literally mathematically useless if you get even a single --good or --bad wrong.


All of these arguments are of the flavor “git supports --flag-1%-of-users-know so that works.” But there’s a nontrivial cost to teaching all of your users a non-default workflow, which you have to pay for every new person that comes along.


Doesn’t this require that you get everything right before that merge commit?

I often do this, but then have some integration related change that’s required after.


The change after would just be a new branch and then a new merge. The first merge adds the feature; the second merge is the bug fix.


A pragmatic middle ground is learning fixup and autosquash and making the story mode before merging the code.

Your story mode == commit mode provided you can divide at a good enough granularity. This is generally not a problem though.
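
For anyone who hasn't used fixup and autosquash, a rough sketch, assuming a throwaway repo and made-up file names. `GIT_SEQUENCE_EDITOR=true` is only there so the interactive rebase accepts the generated todo list without opening an editor:

```shell
set -eu

# Throwaway repository; branch and file names are hypothetical.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

echo base > base.txt
git add base.txt
git commit -q -m "initial commit"

git checkout -q -b feature main
echo "feature A" > a.txt
git add a.txt
git commit -q -m "Add feature A"
target=$(git rev-parse HEAD)

echo "feature B" > b.txt
git add b.txt
git commit -q -m "Add feature B"

# Later we notice a bug in A; record the fix as a fixup of the original commit.
echo "feature A, off-by-one fixed" > a.txt
git commit -q -a --fixup="$target"
before=$(git rev-list --count main..feature)

# Autosquash reorders the fixup next to its target and melds them together.
GIT_SEQUENCE_EDITOR=true git rebase -q -i --autosquash main
after=$(git rev-list --count main..feature)

echo "commits before: $before, after: $after"
git log --format=%s main..feature
```

The branch ends up with one commit per feature, and the fix lives inside "Add feature A" as if it had been right the first time.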


I use the semi linear history function of gitlab: it's like rebase & merge, but doesn't fast forward (ie it always creates a merge commit)

This way, to see the history feature by feature I do `git log --merges`, and to see the individual commits I do `git log --no-merges`.


I use merge commits as the “clean” history and non-merge commits as the “true” history. Are there problems that this approach doesn’t solve? My only gripe is that apps like GitHub don’t give me the option to display commits how I want, but that’s a problem with inflexible tooling in general.


Couldn't you get this with squash and merge, along with not deleting merged branches? The main branch would have the story, and you could go into the actual squashed branch to get the actual commit history?


That creates a mess. It's not clear which remote branches are still active, and can be difficult to link the branches back to the merges historically for hunting bugs.

The best way I've seen by far: Prepare a fast-forward merge, then merge it with --no-ff. You end up with a linear history of commits grouped by the merge commits, can see either view in git log using --first-parent or not, and bisect can find the actual commit when needed.
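
Roughly, as a sketch in a throwaway repo (the branch and file names are made up, and the rebase is only needed because main moved in the meantime):

```shell
set -eu

# Throwaway repository; branch and file names are hypothetical.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

echo base > base.txt
git add base.txt
git commit -q -m "initial commit"

git checkout -q -b feature main
echo one > f1.txt
git add f1.txt
git commit -q -m "feature: part 1"
echo two > f2.txt
git add f2.txt
git commit -q -m "feature: part 2"

# Meanwhile, main moves on.
git checkout -q main
echo fix > hotfix.txt
git add hotfix.txt
git commit -q -m "hotfix on main"

# Prepare a fast-forward: replay the feature on top of the current main.
git rebase -q main feature
git checkout -q main
# Force a merge commit even though a fast-forward would be possible.
git merge -q --no-ff -m "Merge feature" feature

# Mainline view: one entry per merge, plus direct commits to main.
mainline=$(git log --first-parent --format=%s)
printf '%s\n\n' "$mainline"
# Full view: the feature's commits hang off the merge as a side bump.
git log --graph --oneline
```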


This might be a really nice trick. Use the technically meaningless "merge" commits to delineate the 'commit groups'.

It limits how commits are grouped a bit. But that limitation might be good to prevent feature creep.


Does this have some kind of distinct result from rebasing the branch before merging does? I'm not thinking of anything that would be different, so I'm not sure if using the more obscure command (`git merge --no-commit` vs `git rebase`) is for a specific reason.


Did you mean to respond to someone else? I never mentioned --no-commit, I'm talking about rebasing if needed - set up a fast-forward merge, which may or may not need a rebase.

This top-level comment shows what the commit history looks like doing what I said: https://news.ycombinator.com/item?id=27723435

I also didn't just say rebase because that could also mean squashing commits manually and isn't what I'm talking about.


No, I've just never heard rebasing a branch described as "setting up a fast-forward merge." It implies doing something very different to me from a straightforward rebase, though with this elaboration I can see how you could describe it that way.


I interpret the author's proposed model as: "story mode" is

    git log --grouped

and "commit-by-commit" mode is

    git log

Are you proposing something different?


In my organization we kind of do this. We use one merge commit per pull request, without squashing. However, we also rebase before merging, which results in a commit graph that looks like a cactus:

      o-o-o   o-o   o-o-o
     /     \ /   \ /     \
    o-------o-----o-------o-->


I also prefer this method - rebase and force a merge commit. It’s pretty much the grouping functionality that the author wants. You can tell a short story or break a merge into a couple parts, and bisect + revert work. You can also link each merge with a merge/pull request and a ticket. Also, an occasional quick fix commit to master works fine and makes sense. Sometimes a merge without a rebase makes sense as long as it doesn’t make the graph too confusing. Different workflows work for different teams, but I like this one.


The "semi-linear merge" (as Azure DevOps calls it) is our preferred merge approach as well, though reasonable cases can be made for alternative merges depending on circumstance, e.g., an old version hotfix merge into master is of course done as a regular merge.

In addition, if people are feeling charitable, the branch will be cleaned up prior to merge with an interactive rebase to squash out "Wip" commits and hopefully leave a nice clear set of self-contained commits that provide a logically separated view of the work that went into the change.


> In addition, if people are feeling charitable, the branch will be cleaned up prior to merge with an interactive rebase

If you're not doing that, might it not be better to squash? The point of not squashing is to preserve valuable granular commits. If you have WIP commits, those don't have much value.

I suppose it depends on how clean the feature branch is. Personally, I have horrible commits, knowing I will clean them up later.


Interactive rebase allows you to optionally squash a commit down (or edit or otherwise manage them). The idea being that you can keep the important commits that express the purpose of the branch while cleaning up those that are not important for that purpose.


This is the best option in my opinion. You can also "optimize" by doing the forced merge only for branches that have 2+ commits. That way, if most of your branches have a single commit you won't have many useless merges. The branches with multiple commits will still be visually grouped.


GitLab offers this in the web UI. Even Azure DevOps does. GitHub apparently doesn’t, though.


GitHub is weird. Even their rebase workflow works differently than git by creating a new hash :/


A rebase always creates a new hash, because the commit hash is a hash over the contents of the commit. These contents include, among others, the parent commit(s) and the time of committing (which in case of a rebase will be different than the time of authoring).


I think that's not what the GP means.

"git rebase" doesn't create a new hash if there's no change as a result of the rebase. But GitHub's PR rebase button always creates a new hash even if there is no reason to. (Its PR merge button does not do this; it will merge a one-commit PR without creating a new commit).

To add to the inconsistencies, GitHub doesn't sign the new commit when using the rebase button so it doesn't show up with the green "verified" icon - even if there was no need for a new commit anyway. Yet when using the merge button it does the opposite - if it doesn't need a new commit, your signed PR commit is merged to the main branch and shows as verified, and if it does need a new commit GitHub signs it (if the PR commit was signed) so it still says "verified" (even though it's really GitHub's key that was used, not the author's).

For this reason, when I merge PRs I avoid the GitHub UI, and use "git rebase -S" locally followed by "git push". This does what the PR rebase button should do.


FYI: "git rebase -S" signs the commits.


Sure but that's not really helpful, the only thing it does is avoid breaking history visualisation tools which tend to deal very badly with "wide" histories.

For instance, one of the biggest annoyances with git is that it's a pain in the ass to find the merge of a commit into the mainline (aka the next child with more than one ancestor… probably), which can make it difficult to go back from a commit to a PR unless it was a single-commit pull request.


> Sure but that's not really helpful, the only thing it does

It encodes the necessary information in the commit graph, without introducing a completely new concept (commit groups). It’s true that Git doesn’t give you the tooling to get that information out of the box, though.


> It encodes the necessary information in the commit graph

A bog-standard merge already does that.


https://github.com/mhagger/git-when-merged

edit: also https://stackoverflow.com/questions/8475448/find-merge-commi..., and also github shows this (but only for github PRs, not other merges)
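For reference, one way to do this with plain git is to intersect two `rev-list` walks: the commits on main's first-parent chain, and the commits that descend from the target. This is only a sketch in a throwaway repo with invented commit names, and it assumes merges into `main` always have `main` as first parent:

```shell
set -eu
repo=$(mktemp -d); trap 'rm -rf "$repo"' EXIT; cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/main
git config user.name demo
git config user.email demo@example.com
git config commit.gpgsign false

git commit -q --allow-empty -m "A"
git checkout -q -b feature
git commit -q --allow-empty -m "F1"
target=$(git rev-parse HEAD)          # the commit we want to trace
git commit -q --allow-empty -m "F2"
git checkout -q main
git merge -q --no-ff -m "Merge feature" feature
expected=$(git rev-parse HEAD)
git commit -q --allow-empty -m "B"

# Descendants of $target that main can reach...
git rev-list --ancestry-path "$target..main" > descendants.txt
# ...filtered down to main's first-parent chain; the oldest match is the merge.
found=$(git rev-list --first-parent "$target..main" |
  grep -F -x -f descendants.txt | tail -n 1)
git log -1 --oneline "$found"
```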


For visual history this seems like a nice approach. In a large organization, can it be done atomically or is there a risk someone else merges before you during the rebase?

But I believe this also suffers from the problem described in OP where you can't tell if HEAD^ (or any parent) is from master or from feature branch.


> In a large organization, can it be done atomically or is there a risk someone else merges before you during the rebase?

It can be done atomically by having a tool perform the rebase and merge, and never merging anything by hand.

That’s necessary to implement the “not rocket science” rule anyway.


There's no argument for any other method, really.


You can do this today in standard git with https://www.davidchudzicki.com/posts/first-parent

Summary: every feature is merged with 'main' as the first parent. Then, whenever interacting with history, you tell git you only want it to consider --first-parent
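A minimal sketch of what that looks like in a throwaway repo (the feature names and commit messages are invented):

```shell
set -eu
repo=$(mktemp -d); trap 'rm -rf "$repo"' EXIT; cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/main
git config user.name demo
git config user.email demo@example.com
git config commit.gpgsign false

git commit -q --allow-empty -m "initial commit"
for name in login search; do
  git checkout -q -b "$name" main
  git commit -q --allow-empty -m "$name: wip"
  git commit -q --allow-empty -m "$name: fix tests"
  git checkout -q main
  # main is checked out, so main becomes the first parent of the merge
  git merge -q --no-ff -m "Add $name feature" "$name"
done

summary=$(git log --first-parent --format=%s main)
summary_count=$(git rev-list --count --first-parent main)
all_count=$(git rev-list --count main)
echo "$summary"
echo "$summary_count of $all_count commits shown"
```

With `--first-parent` the log only lists the initial commit and the two merges; the wip commits are still there when you drop the flag.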


This. There is no need to introduce a new concept like "commit groups" to permeate through everything. Just use merges and display them nicely.

The problem is getting all the visualization tools on the same page.

Bazaar got this right. Their official UI displays a linear history with the ability to expand any merge commit to show the side branch.

https://commons.wikimedia.org/wiki/File:Bazaar_Explorer_-_Lo...


Thanks, I did not know that about Bazaar. I wish that were the norm in Git-land.

But I can see why people might still desire more explicit constructs to handle history organization. The expectation that the commit graph should be both logical and historical causes a sort of cognitive dissonance, which these sorts of conventions don't fully resolve.


Merges are truly horrible, though.

They're a commit with two parents, one of which has no real meaning anymore because it doesn't exist anywhere else. They break a lot of tools because of this.


The first parent is (generally) main.

The second parent is the last commit of a series of commits starting at a commit, sometime in the past on the main branch, followed by all the new commits that represent the feature that is being merged.

Merges are horrible, if you don't have a mental model of git that aligns exactly to the above.


> one of which has no real meaning anymore because it doesn't exist anywhere else.

What do you mean? That's the feature branch, what's the issue?


In your .gitconfig

    [alias]
        l = log --graph --abbrev-commit --date=relative
I'm not sure about collapsing history though.


The author makes the argument that 'first parent' being the original branch is a mere convention.

And he's right. If someone does the merge while checked out on the feature branch, then commits it _as_ the master branch, then the first-parent concept breaks.
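To make that concrete, a small sketch in a throwaway repo (commit names invented) where the merge is done on the feature branch and main is then moved onto it:

```shell
set -eu
repo=$(mktemp -d); trap 'rm -rf "$repo"' EXIT; cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/main
git config user.name demo
git config user.email demo@example.com
git config commit.gpgsign false

git commit -q --allow-empty -m "A (main)"
git checkout -q -b feature
git commit -q --allow-empty -m "F (feature)"
git checkout -q main
git commit -q --allow-empty -m "B (main)"
main_tip=$(git rev-parse main)
feature_tip=$(git rev-parse feature)

# The "wrong way round": merge main into feature, then fast-forward main to it.
git checkout -q feature
git merge -q -m "Merge main into feature" main
git checkout -q main
git merge -q --ff-only feature

first_parent=$(git rev-parse main^1)
second_parent=$(git rev-parse main^2)
git log --first-parent --format=%s main
```

The first parent of the merge is now the old feature tip, so the `--first-parent` log walks through F and skips B, even though B was the commit that used to be on main.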


But that problem is easy to fix with 0 disadvantages: Don’t do that.

Don’t do it for the same reason that you have a convention of useful commit messages.


I masquerade as proficient in this field and I'm pretty sure I've done that just by accident with git a few times. Assuming any developer on your team is much better than monkeys on typewriters when it comes to git will lead to disappointment.


I've probably done it a bunch of times on my side projects too, but in any environment where code review is enforced you should probably be enforcing using the tooling to do the merge too, and not giving individual developers permission to push to the main branch.


Make a precommit hook that enforces this. Done.


This only enforces it where you have the hook installed. And it cannot be pre-installed on people who have newly cloned the repo.

Which makes it nearly useless. Not totally - the people who know about it benefit - but it's extremely far from safe. The only way to really enforce this is to add this kind of thing to your "main" remote repo that everyone pushes to.
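As a sketch of what such a server-side check might look like: Git's `update` hook is called with the ref name, the old tip and the new tip, so it could refuse any push to main where the old tip is no longer on the new tip's first-parent chain. The `keeps_first_parent_chain` function is hypothetical, and a real hook would also need to handle branch creation and deletion (all-zero ids). The demo exercises the check locally in a throwaway repo:

```shell
set -eu
repo=$(mktemp -d); trap 'rm -rf "$repo"' EXIT; cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/main
git config user.name demo
git config user.email demo@example.com
git config commit.gpgsign false

# Hypothetical check: is <old-sha> reachable from <new-sha> via first parents only?
keeps_first_parent_chain() {   # usage: <old-sha> <new-sha>
  git rev-list --first-parent "$2" | grep -q -x "$1"
}

git commit -q --allow-empty -m "A"
git checkout -q -b feature
git commit -q --allow-empty -m "F"
git checkout -q main
git commit -q --allow-empty -m "B"
old=$(git rev-parse main)

# Two candidate merge commits with the same content but opposite parent order.
good=$(git commit-tree -p main -p feature -m "merge feature into main" "main^{tree}")
bad=$(git commit-tree -p feature -p main -m "merge main into feature" "main^{tree}")

verdict() {
  if keeps_first_parent_chain "$old" "$1"; then echo accept; else echo reject; fi
}
good_verdict=$(verdict "$good")
bad_verdict=$(verdict "$bad")
echo "main-first merge: $good_verdict, feature-first merge: $bad_verdict"
```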


"Don't do that" is not a good solution to errors which are hard to spot and easy to make!


The other part is - merging is software development, write a commit message better than "Merge branch 'foo' into main"

e.g. Add new feature foo to main menu

...


We need Github and other Git commit visualization tools to offer a "first parent" option!


Yeah, git bisect just needs --first-parent, and then the solution would be obvious.


git bisect already supports --first-parent: https://git-scm.com/docs/git-bisect


Oh great! That's new since last I tried.


Emphatic agreement, this is something I've wanted for a while. For the purpose of history management/traversal (reverting, bisecting, or just understanding) you want certain code changes to be atomic, implying that they should all be understood/tested/reverted together. But for the purpose of code review you want something more granular and less monolithic, so that a single reviewer can more easily understand a large change, or so that review can be more easily divvied out among certain code owners. Currently getting all of these properties requires decomposing the commits during the review phase and then squashing before merge, but this is a bad practice because now no reviewer has actually signed off on the code that's getting merged and you're just hoping that nobody's slipping any last-minute or malicious changes in during the squash.


Agreed; however, since this is HN, I'd like to suggest that it should be totally doable to improve on the squashing workflow using tooling. Programmatically it's easy to verify that the signed-off commits induce the same diff as the squashed commit; they're either equal or not. Then, if the interface where signoffs are registered (e.g. GitHub PR) enforces that all PRs are signed off while allowing a squashed, un-signed-off commit if it has the same diff as a signed-off commit, then you can squash at will. Optionally you could also require that the signers additionally sign off the final commit, under the tool-provided guarantee that the diff is identical.
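A minimal sketch of such a check, under the assumption that the squashed commit is made directly on top of an ancestor of the reviewed branch tip (i.e. main has not moved since the branch was last rebased). In that case "same diff" reduces to "same tree". The `same_result` function and all names are invented for the demo:

```shell
set -eu
repo=$(mktemp -d); trap 'rm -rf "$repo"' EXIT; cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/main
git config user.name demo
git config user.email demo@example.com
git config commit.gpgsign false

# Hypothetical check: the squashed commit sits on an ancestor of the reviewed
# tip and has exactly the same tree, so nothing changed during the squash.
same_result() {   # usage: <reviewed-tip> <squashed-commit>
  git merge-base --is-ancestor "$2^" "$1" &&
    [ "$(git rev-parse "$1^{tree}")" = "$(git rev-parse "$2^{tree}")" ]
}

echo v1 > app.txt
git add app.txt
git commit -q -m "initial"
git checkout -q -b feature
echo v2 > app.txt
git commit -q -am "reviewed: change app"
echo docs > README
git add README
git commit -q -m "reviewed: add docs"
reviewed=$(git rev-parse feature)

git checkout -q main
git merge -q --squash feature
git commit -q -m "Add feature (squashed)"
honest=$(git rev-parse HEAD)

echo sneaky >> app.txt            # a last-minute change slipped in after review
git commit -q -a --amend --no-edit
tampered=$(git rev-parse HEAD)

check() { if same_result "$reviewed" "$1"; then echo identical; else echo different; fi; }
honest_verdict=$(check "$honest")
tampered_verdict=$(check "$tampered")
echo "honest squash: $honest_verdict, tampered squash: $tampered_verdict"
```

If main has diverged from the branch base, the trees legitimately differ and you would need something like a patch-id comparison instead.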


The author identifies the problem, which I think is a fundamental failure of the git model: commits aren't associated with branches. But they propose a different solution. I wonder why they didn't decide to attach a branch identifier to each commit. This would solve the problem as I see it, as you could truly view a branch history; typically, for example, you'd be asking for logs of the "master" branch history. The topological view can help, but the lack of any notion of a branch as anything but a pointer to one commit means you effectively do lose history unless you go for a burdensome tagging scheme.


> This would solve the problem as I see it as you could truly view a branch history; typically for example you'd be asking for logs of the "master" branch history.

Which "master"? Since git is fully distributed, there's no central repository which contains the true "master" branch. It's perfectly valid to have two independent lines of development, both naming its current branch "master" (in separate clones of the repository), and later merge one of them into the other. As another example, consider the branch named "for-linus"; take a look at the Linux kernel git history, and see how many independent branches all named "for-linus" are merged on each release.


Every remote could have its own list of "revs that were on branch". Gitlab, or whatever is your server of choice, would then automatically record this on merges. Most code review tools already have this information stored in their databases outside of git.

It would need to be transferred on the side and not in the commits themselves, so as not to rewrite commits in case of merges from another remote. Similar to tags, but instead of only pointing to a single commit it could be a list; let's call them labels. Git log would then add an option like --labeled-by=origin/merged-on-master


But commits change branches. None of the commits started on the "master" branch, they started on some developer's branch (which might also be called "master" in a different repo, but is still separate).


I'm not sure I understand your point?

Say for example, I'm looking at a freshly cloned repo. There's a first commit and most-recent commit on master - I can identify them with git-log. The problem is that I cannot view the path of commits between these two if I'm only interested in the commits made when the current branch was master (which is generally the case unless I want to drill into a feature branch).

Disallowing merges makes the problem go away but that removes a lot of options in terms of work-flow.


> I'm only interested in the commits made when the current branch was master

When I'm working none of the commits are made when the current branch was master. They are made on branches, where commits are finalised, tested and signed, and then master is fast-forwarded to match the branch tip commit (or something intermediate if I'm satisfied with that). Conflicts are detected by the fast-forward and reconciled locally in the branch, re-tested and signed off again.

It's rare that the current branch is master, and no commits are created while on master.

So what would you like to see?


I don't follow - If you are on a freshly cloned repo, none of the commits were made when the current branch was master, they were made on another user's git repo before your repo existed.


Would `git log --merges` solve this? Assuming you use a merge-based workflow this would show only merge commits without any of the details of each individual commit from the merged branch.


[flagged]


Please don't break the site guidelines by going on about downvoting like that. It just invites more downvotes—this time, correct ones, however unfair they were in the first place.

It's true that bad downvotes occur, but it's also true that fair-minded users give corrective upvotes to reverse them: https://hn.algolia.com/?dateRange=all&page=0&prefix=true&sor.... This would have been a perfect case for that, if you hadn't spoiled the post by venting about downvotes. It's true that they can be annoying sometimes, but that's not a good reason to make the thread worse.


Most western communities do not rely on dark patterns. Reddit is probably the only site besides HN that has dark patterns when it comes to voting. Downvotes do not promote what you think they do. Go look at the stats if you have them.

Edit: I would also downvote your garbage input if it were possible.


This is possible by doing this:

  git checkout feature
  git rebase main
  git checkout main
  git merge --no-ff feature


I came to the comments to post the same thing. --no-ff is under appreciated.

I like that this poster is at least thinking about what he wants the history to look like, but you’ll never get it good with a broad rule. It really takes the exercise of rewriting every change you make from a stream of consciousness set of WIP commits into a logical series of incremental patches with clear commit messages. And doing that every time you want to push anything (especially when it is “just” a WIP or feature branch). After a month’s practice your sense of taste will kick in and tell you if a --no-ff merge seems appropriate for the current chunk of work.


This is what I also do. I rebase all my branches and merge them with a no-fast-forward commit, which does not introduce any changes itself, but can be used to document the overall changes of a specific feature.

The best part about this workflow is that history remains linear; it is very easy to track the history of changes (since there is never a "branch" with changes on both sides) while at the same time you keep the ability to visualize where the start/end points for a given feature were.

It also works with nested branches! You simply create a new branch2 from your branch1, and then merge --no-ff branch2 to branch1.


You also get to write a nice detailed commit message for the merge commit explaining the purpose of the branch being merged, which may not be apparent from the individual commits. Without the merge commit I’m not sure where this would go.


right, and then git log --first-parent


On Github the workflow we use for this is rebase in the command line and then press the "Create a merge commit" button in the web UI.


Isn’t that what GitHub does with the rebase strategy? Because when the rebase can’t be made cleanly, GitHub still asks for conflict resolution.


No. If you rebase you get no merge commit by default.


Ah yes. My bad. You explicitly want the empty merge commit.


I was going to post exactly the same. Bonus point you can make it work like that in GitLab (but not GitHub)


I too wish for this.

Some people are really great at writing commit messages; most are not. But even for the ones who write the best commit messages, there is often a lot of discussion inside the actual MR which never makes it into git.

GitLab and GitHub write a merge commit that can be used to get back to that discussion, but a git blame or bisect doesn't take you to the merge commit, which means you have to spend a fair amount of effort to get to the merge commit.

Treating merges as a group would be amazing.

Something not addressed in the article is how you deal with groups of groups of groups of merges, i.e. topic, feature, dev, master branches.

It would be somewhat difficult to make the tooling graceful at ungrouping different levels of groups.


--first-parent works for both bisect and blame.
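A quick sketch of the blame half in a throwaway repo (file and commit names invented): with `--first-parent`, a line that was added on the feature branch is attributed to the merge commit instead of the wip commit.

```shell
set -eu
repo=$(mktemp -d); trap 'rm -rf "$repo"' EXIT; cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/main
git config user.name demo
git config user.email demo@example.com
git config commit.gpgsign false

printf 'line one\n' > notes.txt
git add notes.txt
git commit -q -m "initial"
git checkout -q -b feature
printf 'line two\n' >> notes.txt
git commit -q -am "wip: add line two"
wip=$(git rev-parse HEAD)
git checkout -q main
git merge -q --no-ff -m "Merge feature: document line two" feature
merge=$(git rev-parse HEAD)

# The first field of the first porcelain line is the blamed commit.
plain=$(git blame --porcelain -L 2,2 notes.txt | head -n 1 | cut -d ' ' -f 1)
grouped=$(git blame --first-parent --porcelain -L 2,2 notes.txt | head -n 1 | cut -d ' ' -f 1)
echo "plain blame:        $plain"
echo "first-parent blame: $grouped"
```

For bisect, recent Git versions accept the same flag at `git bisect start --first-parent`, which keeps the search on the mainline merges.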


I’m one of the people who strongly advocates the squash merge. My PRs are also single commits most of the time, with the PR body as the commit message. That is a hidden gem of GitHub: if you open a PR with a single commit, GitHub will use the commit message as the description.

The reason why I prefer squash commits is not only the linear history but also the fact that I’m simply not interested in the sausage making process. If a PR becomes so big that it would need multiple commits, then I request smaller patches. But I also see that this heavily depends on the team size and general setup. I have used squash PRs ever since they were introduced in GitHub. Before that, I manually rebased the feature branches to have a clean merge.


I’ve worked on multiple codebases where nobody on the team had been there when the sausage was first made. The squash commits are some of the worst commits for finding out why a weird design decision was made because all the things that came out in code review or would have had bugfix commit messages in their local branch are all squashed together in one commit. CRs help a lot but frankly I do want to know how the sausage was made when I’m looking through the git history or else I would just look at the code directly. Especially when I’m investigating some bug in prod.


I don’t quite understand what you mean. As I wrote, I write my PR message as the commit message. That is one thing that is also very important for me: write in the commit message why something changed. If the PR gets so damn big that it is too much to explain everything, then it’s quite frankly too big. From my experience the follow-up commits for fixes or code review changes boil down to messages like “fix stupid bug”, “adjustments after code review”, etc., which are also not helpful when looking for the why. But I honestly understand what you mean. I’m just saying that limiting the number of commits and the size of changes in them has helped me over the years. I also don’t do it when I start off from a blank page. Only when the project enters the stage where PRs plus checks help mitigate potential bugs etc.


> The reason why I prefer squash commits is not only the linear history but also the fact that I’m simply not interested in the sausage making process.

I see this a lot and in my opinion squash on merge is just a very poor version of using rebase to put your commits into a proper state before merging.

A commit should be one logical set of changes. It's great if you can get a piece of work done in one change, but often larger work requires several changes.

Beginners seem to treat pull requests as places to pile commits until you get something reasonable at the end (the sausage making process). A better way is to curate your work and clean up the individual commits as changes are requested in review. Then you have a coherent history with commit messages that might tell you something useful.


There is another angle I forgot to mention. That is that I also want each commit to compile (depends on the project obviously), be tested, and not fail tests. That is super hard to achieve when only testing the last commit of a PR through a push. I mainly want that to help when bisecting. That’s what I achieve by smaller PRs. But again, I personally prefer this and lay out my projects and tasks in a way that this actually works for me. I’m not dogmatic and see the reason for different strategies.


> that I’m simply not interested in the sausage making process

Sure, no one really cares about the sausage making process, but tracking down regressions is so much easier with a proper history. Bisect is your friend.


Sincere question: does a “squash merge” implicitly include a rebase as well when the merge target has diverged?


It's equivalent to rebasing, and squashing into a single commit.

It's also equivalent to a merge commit, and then only keeping the diff from the merging in branch.

So it's sort of a halfway house, and sort of the worst of both worlds.


Yes, it has its ups and downs. But it also helps me and my team to enforce smaller patches since they land as one anyways. And other teams in my company open a PR, develop on it for the next 2 weeks, and merge it in full. There is so much garbage in all these commits that no one ever wants to get back to. Good luck bisecting one of these histories.


How independent are those commits? If I'm developing a feature that has distinct but dependent changes, then the rebased/squash pattern doesn't work as well. Submitting in a single PR means that those distinct changes all get squashed away. Submitting in a series of PRs keeps the history, but means that the commits on main have different ids, and so I need to keep using `git rebase -i` to remove the corresponding commit from my dev branch whenever one gets merged in.


GitHub is pretty smart nowadays. If you propose a series of PRs on top of each other, GitHub will change the base of the second PR when its base gets merged, and so on. There is sometimes the need to rebase, but I generally have no problem with it. Mind that I don’t propose this workflow for teams that also have non-tech members like artists, because these workflows bring too much friction. But I keep my changes as independent as possible. All running with tests to cover the added/changed code.


It means using `git merge --squash` which performs a normal merge but then instead of adding a merge commit, it just makes a single regular commit with the changes.


Azure DevOps has this concept, it’s called “semi-linear merge” where it will rebase your PR on top of the branch being merged into, but then create a 2 parent merge commit with the PR comments and text to merge the content, letting you easily reconstruct what changes were made in one PR, while also preserving commit history and keeping the history clean overall.


This is what GitHub used to do, before they changed it to the squash model.


My company previously had a beautiful (but unconventional and much-misunderstood) solution to this issue. To merge a feature branch into master, we would _first_ create a squash commit consisting of all the changes with the commit text containing the review message and a link to code review. Immediately after, we would _also_ add a merge commit that referenced the original feature branch, but resulted in no changes to master.

This meant that log/blame would default to showing the high-level summary of each review, but the git commit graph still has the full gory history if anyone needs to look at it. More importantly, you could safely start developing on top of someone else's unreviewed changes and git would correctly be able to track who made what changes, in a way that breaks badly with a standard squash or rebase workflow.

To view the "simplified" linear history, you run "git log --first-parent --no-merges", which unfortunately doesn't have a config option to set as default.
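For anyone curious, here is one way such a workflow might be reproduced with stock git. This is a guess at the mechanics, not the original tooling: it assumes the no-change merge is made with the `ours` strategy, and all names are invented. It runs in a throwaway repo:

```shell
set -eu
repo=$(mktemp -d); trap 'rm -rf "$repo"' EXIT; cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/main
git config user.name demo
git config user.email demo@example.com
git config commit.gpgsign false

echo base > app.txt
git add app.txt
git commit -q -m "initial"
git checkout -q -b feature
echo wip >> app.txt
git commit -q -am "wip"
echo done >> app.txt
git commit -q -am "fix typo"

git checkout -q main
# Step 1: one squash commit carrying the review summary.
git merge -q --squash feature
git commit -q -m "Add feature X"
squash_tree=$(git rev-parse "HEAD^{tree}")
# Step 2: a merge that records feature as second parent but changes nothing.
git merge -q -s ours -m "Record history of feature" feature
merge_tree=$(git rev-parse "HEAD^{tree}")

summary=$(git log --first-parent --no-merges --format=%s main)
total=$(git rev-list --count main)
echo "$summary"
```

The simplified log shows only the squash commits, while the wip commits stay reachable through the merge's second parent.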

Unfortunately, we lost this workflow when we moved over to GitHub, and now git blame is surfacing everyone's broken wip commits because we've stopped adding the high-level summary squash commits.


I see lots of comments here on how important it is to preserve "history of what really happened", arguing that cleaning up the history "is a lie the same way that people who wear makeup are 'lying'" and that "someone might learn something from it".

I feel that it's quite over the top. Nobody goes to the commit history to "learn programming" (or at least you should not) and if you did, it would be so confusing to see things added in a way and changed back 2 commits later because it didn't work.

The code is just half the story, it needs context, it doesn't represent the train of thought behind the changes, that's only in your head. If you want to document what didn't work write a document where you explain the different approaches you took, why they didn't work and what you did at the end. That really is super useful.

Just keep the history clean, if you ever need to revert a change you will be grateful you didn't add many changes to 1 commit or didn't change the same code in 5 different commits throughout the PR. Keep the code working at each commit, if you ever need to `git bisect` you will be grateful you did.


The most important benefit of preserving the history is being able to go to an exact state of the program in the past. Pinpointing the commit that introduced a regression is often by far the quickest method to localize the problem. And it is impossible when the history is rewritten, and consists of snapshots that were never even run.

This benefit is much more important than having a visually pleasing history. And you can always get a visually pleasing history by adding a few flags to git log.


> The most important benefit of preserving the history is being able to go to an exact state of the program in the past.

No one's talking about ever getting rid of any commit ID that got checked into `main` or `master`.

We're talking about getting rid of commit IDs that only ever existed on one machine and only during development.

From my reflog, I see this commit ID: `21b9f2e HEAD@{28}: commit: Fix and instrument nstray URL problem`

No one needs to know about my typo - ever.


I had a similar thought a little while ago (I called them nested commits): https://twitter.com/tolmasky/status/1212452048618131456?s=21

In my version, the nesting can be infinite of course (I guess the author here would call these "groups of groups" -- but that might be complicated with the flat approach of a group being a range of commits).

But basically, I want the UI of an entire "set of commits", but then a disclosure triangle to be able to see the "true history" if it's interesting to me. It is simply the case that sometimes one is more useful than the other, and other times the reverse is true. There simply isn't a one-size-fits-all solution. As far as most people are concerned, you only ever want to, for example, unroll the entire set if a test fails. Or, you want to cherry-pick the entire set to another branch. They serve as one logical unit. But if you for example care about the decision-making process that led to that final code change, you can see it by revealing the "inner history".


That history that looks tangled and awful would look a lot better if the commits were sorted by `--topo-order` instead of `--date-order`. That sort “groups” commits that are in a single line of history.


In theory, you could implement commit groups already in one of two ways: objects in a special ref that are just lists of commit hashes (tree-style) or a branch where every commit has, in addition to the last HEAD of the branch, all the other commits in the group as parents.



So groups would be just named commit ranges? I'm not sure I see the appeal if it's already possible.

The article also seems to imply that there's one "right" way to merge branches and that teams should stick to one approach. I disagree. I use all 3 approaches whenever it makes sense: merge commits for large PRs with more than 1 commit where you want to preserve the history of the changes, squash+merge when the history is messy (usually after code review) but the change itself should be atomic, and rebase when the history is clean, though I mostly reserve rebasing for smaller single-commit PRs where a merge commit would just add clutter.

I also disagree that Git is this flawless piece of software we can't improve upon. It regularly fails to do fairly trivial merges automatically, forcing me to manually fix conflicts, or use `rerere`. Ideally my code versioning tool would understand language semantics and be as maintenance-free as possible. Git is nowhere near this and requires quite a lot of familiarity and hand holding to work as the user intended. The amount of time and effort spent understanding and using it properly is difficult to quantify, and it's still a tall hurdle for new developers.


Emphatically agree.

The git cli UX is terrible, cryptic, and forces one to think way too much. The number of GUI tools that wrap the git cli should tell us all something.

I love Mercurial. It:
- manipulates the identical data structure (the DAG)
- interops just fine with git (thanks hg-git plugin!)
- manages to use common sense verb names, by default, for all its CLI functions
- includes no unnecessary concepts/models (read: git's index)


> [Mercurial]...includes no unnecessary concepts/models (read: git's index)

As someone who stubbornly uses mercurial for all his code, git's index is the single greatest feature that git has over mercurial (not counting the Magit interface). The index allows me to _incrementally_ build up a commit. I can go back and fix things, add and remove things from the index, and only when I'm ready, commit. In mercurial, I have to hold so much more state in my head about what is ready to commit, what needs to be fixed, and what is code that needs to be reverted.

And `hg commit -i` is nice, but is not a replacement for the index. Git's index allows me to add hunks, then stop, go to lunch, review the status, remove some bits, add more, then commit.

I'm told that Mercurial's queues feature would support this, but I find it incredibly non-ergonomic, and I can't quite figure out how to use it as an index replacement.


Agreed...the git command API is confusing, with too many options to perform a task but with different side effects that are not obvious.


> So it tells you that these two parents have been merged together, but it doesn’t tell you which one used to be main. You might guess 8, because it’s the leftmost one, but you don’t know for sure. (Remember, branches in Git are just pointers to commits.) The only way (that I know of) to be sure is to use the reflog, but that is ephemeral: Git occasionally prunes old entries from reflogs.

This feels extremely pedantic to me. There are certainly workflows that produce this confusion, but the most common ones definitely don't.

For the most part, in especially most github-based flows, the 'left-most' (aka `git log --first-parent`) history of the main branch is precisely the history of the main branch, and the "commit groups" are the divergent "right" parents.

Can someone do something that temporarily breaks this? Sure. People `git pull`ing with divergent changes at least used to litter projects' histories with this kind of nonsense. But it's not terribly likely to make it into your main history these days if your upstream repo has a 'protected' main branch, which is so normalized at this point it ought to be considered the default state of affairs.

It seems like maybe the thing the OP really wants is just for the branch name at commit time to be stored as metadata in the commit. That would maybe help with pulling out intentions while looking at history.

Also, Mercurial had a kind of 'hard branch' feature that also might have resembled what's desired here, but as far as I can tell most users of mercurial found it more frustrating than helpful and used plugins that provided looser kinds of branching.


I disagree with how difficult merge commits are. It's not something to filter out in your view - it's a representation of an actual change made by combining two different versions of a document/code/etc... They are valuable indicators in history.

Furthermore, services like Github add valuable comments like the PR number that was merged.


In principle, I agree with the author. Although I don't think git needs group commits to achieve this functionality. My preferred workflow for the past couple of years has been to interactively rebase and squash/fixup! the commits so that each commit represents a functioning state of the code, which more or less achieves the same thing as what the author wants.

However, this approach only works if all the developers on a project buy in to this philosophy.

As much as I dislike squash and merge, it's better than the alternative of trawling through a git history with dozens of "WIP" and "Fix tests" commits and janky rebases/merges.


I use squash merge and make sure each PR is small enough to justify being just one commit. Big things which are too big to fit in a single PR can be tracked in other ways, such as mentioning a Github issue number or project management issue ID in the commits.

Individual commits on a PR/feature branch are useful to check what's changed since the last review (I silently despise developers who force-push their rebased commits when fixing things, making it basically impossible to re-review).

I just wish it was easier to detect locally whether my development branch has been squash-merged so I could script the deletion of them.
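One trick that seems to work for this is to build a throwaway commit containing the branch's final tree on top of its merge base, then ask `git cherry` whether main already has an equivalent patch. A sketch in a temporary repo follows; `is_squash_merged` and the branch names are made up, and it assumes the squash landed without extra edits to the same hunks:

```shell
set -eu
repo=$(mktemp -d); trap 'rm -rf "$repo"' EXIT; cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/main
git config user.name demo
git config user.email demo@example.com
git config commit.gpgsign false

# Hypothetical helper: succeeds if <branch> looks squash-merged into main.
is_squash_merged() {   # usage: <branch>
  base=$(git merge-base main "$1")
  # Probe commit: the branch's final tree as a single commit on the merge base.
  probe=$(git commit-tree -p "$base" -m probe "$1^{tree}")
  # `git cherry` prints "- <sha>" when main has a commit with the same patch.
  case $(git cherry main "$probe") in
    "-"*) return 0 ;;
    *) return 1 ;;
  esac
}

echo base > app.txt
git add app.txt
git commit -q -m "initial"
git checkout -q -b merged-feature
echo one >> app.txt
git commit -q -am "step one"
echo two >> app.txt
git commit -q -am "step two"
git checkout -q -b pending-feature main
echo other > other.txt
git add other.txt
git commit -q -m "unrelated work"
git checkout -q main
git merge -q --squash merged-feature
git commit -q -m "Add feature (squashed)"

status_of() { if is_squash_merged "$1"; then echo squash-merged; else echo pending; fi; }
for b in merged-feature pending-feature; do
  echo "$b: $(status_of "$b")"
done
```

Since git itself does not consider such a branch merged, an actual cleanup script would have to delete it with `git branch -D`.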


Teams I've worked with lately use squash merge with the PR summary set to the issue number and issue summary from the issue tracking system. So, there are links to the issue and the PR from both systems if we need to track down why something was done. Seems to work well for us.

I personally wouldn't want to see 15 commits to the main branch from one PR regardless of how descriptive the commit messages may be.


I think most of the author's pains could be implicitly solved by using tags on merge.

That way you maintain a linear history, you get clear marks of when each pr was merged and you can see what happened in between.

The obvious downside here is that you'd need to add yet another step in your process and come up with a meaningful system for this.


Is it not standard practice to first merge 'master' into 'feature' before you merge 'feature' into 'master'? If 'master' has changes that are not in 'feature' then 'feature' is out of date, testing probably needs to be redone. It also means any merge conflicts are resolved in the 'feature' branch. If 'feature' is a long lived branch then it should merge 'master' on every release anyway.

If your weakest git user cannot revert easily then you're in trouble. Enjoy being on call 24/7 otherwise. Reverting is more important than committing.

I genuinely don't understand why people care about git history so much. I've never needed to look at it in 8 years of using git.


Git history plays a critical role in code forensics, particularly in large code bases (or places where you may have a large number of potential authors). For example: 10 commits just went to production from 3 different teams on one service. Something breaks a few hours later; was it one of the 10? Was it a corner case in something much older? Or was it someone's feature flag going live?

If you deal with a service that has been maintained for a few years, this is also an excellent way to figure out what was done, why, and when. Or figure out if a well-meaning rebase accidentally clobbered a critical piece of ancient logic. Or determine who the heck owns something when you realize it's time to split up a larger service.

The list goes on. Note git history plays directly into git blame too; it can be an excellent tool in the right circumstances.


For stuff like "this line doesn't make sense next to this other line. Why was it added?" Sometimes a line was deleted between them, sometimes it was a bad merge, sometimes the rest of the commit tells you "oh, it was added to make the Foobar work."

I have allocated some of the valuable left-hand-only keyboard shortcuts in my IDE to searching Git history. It tells you the "why" when the "what" doesn't make sense, and the "who" when git blame shows some reformatter.
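
On the command line, the closest equivalent I know of is the pickaxe search, `git log -S`, which lists the commits that changed how many times a string occurs. A small sketch in a throwaway repo; the file `app.conf`, its contents and the commit messages are invented for the example.

```shell
# Throwaway repository so the sketch can be run end to end.
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example"
git config user.email "example@example.com"

# Three commits; only the second one introduces the puzzling line.
printf 'alpha\n' > app.conf
git add app.conf && git commit -q -m "Add config"
printf 'alpha\nretries=3\n' > app.conf
git commit -q -am "Make the Foobar work"
printf 'alpha\nretries=3\nbeta\n' > app.conf
git commit -q -am "Unrelated tweak"

# Pickaxe: which commits added or removed an occurrence of this string?
why=$(git log -S'retries=3' --format='%s' -- app.conf)
echo "$why"

# Blame gives the "who" for the same line (line 2 of the file here).
git blame -L 2,2 app.conf
```

The commit found by `-S` can then be opened with `git show` to see what else changed alongside the line.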


While GitHub is not Git, it offers this by having the concept of a pull request/review that can be merged as a single commit but kept as multiple commits in the review history. What if git incorporated this concept as a core feature?


As pointed out in other comments, git does allow this by using `git log --graph` and 'merge --no-ff'.

However, it is also necessary to use git "empty" commits as the first commit of each "group" to prevent confusing output from 'git log --graph' when nesting and sub-nesting groups of commits.

Here's an example of doing this -- https://github.com/maratbn/test_nested_sub_groups_of_commits...
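
As I understand the scheme, the mechanics look roughly like the sketch below: an empty marker commit opens the group, and a `--no-ff` merge closes it. The branch name `group-a`, the files and the messages are placeholders, and this is not taken from the linked repository.

```shell
# Throwaway repository so the sketch can be run end to end.
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example"
git config user.email "example@example.com"

git commit -q --allow-empty -m "Initial commit"

# Open the group with an empty marker commit, then do the real work.
git checkout -q -b group-a
git commit -q --allow-empty -m "Begin group A"
echo a > a.txt && git add a.txt && git commit -q -m "Add a"
echo b > b.txt && git add b.txt && git commit -q -m "Add b"

# Close the group with a merge commit that is never fast-forwarded.
git checkout -q main
git merge -q --no-ff -m "Merge group A" group-a

git log --graph --oneline

# Commits reachable from the merged side only: the marker plus the work.
group_size=$(git rev-list --count HEAD^1..HEAD^2)
echo "commits in group: $group_size"
```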


I started using the convention of prefixing commit messages with the ticket they're addressing, which I think is actually pretty helpful. Makes it way easier to do interactive rebases too.


In my personal repos I follow the convention of prefixing each commit with add/mod/del/fix/sty: to indicate whether it adds, modifies or deletes part of the API, whether it fixes or changes the private interface, or whether it merely changes stylistic elements (i.e. no changes to functionality). This helps me quickly understand how each commit affects the whole.


Yes, this would definitely be a way to shut up the "squash merge all the nuances of all the commits for this feature into a single commit called 'implement feature XYZ'" crowd.


This reminds me of Pijul and Darcs, which are patch based VCS's, although I'm not sure that's quite what the author of this article has in mind.


The author might enjoy `git rebase -i main`, which allows reordering, renaming, combining or pretty much everything in your own branch before you rebase and merge it to the main branch.

That way even if a commit message is not clear or you added an improvement to an earlier commit, you can reduce the clutter a lot before sending out for code review, while still having individual commits for different parts of the code change
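
A non-interactive flavour of the same cleanup, as a rough sketch: record the late improvement with `git commit --fixup`, then let `--autosquash` fold it into the earlier commit. Setting `GIT_SEQUENCE_EDITOR=true` just accepts the generated todo list so the example runs unattended; in a real session the editor would open. File names and messages are invented.

```shell
# Throwaway repository so the sketch can be run end to end.
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example"
git config user.email "example@example.com"

echo base > file.txt && git add file.txt && git commit -q -m "Base"

git checkout -q -b feature
echo parser > parser.txt && git add parser.txt && git commit -q -m "Add parser"
echo tests > tests.txt && git add tests.txt && git commit -q -m "Add tests"

# A late improvement that really belongs in "Add parser" (HEAD~1 right now).
echo "parser v2" > parser.txt
git commit -q -a --fixup=HEAD~1

# Reorder and fold the fixup commit into its target, accepting the todo list as is.
GIT_SEQUENCE_EDITOR=true git rebase -q -i --autosquash main

git log --oneline main..feature
commit_count=$(git rev-list --count main..feature)
parser_content=$(git show HEAD~1:parser.txt)
```

The branch ends up with the two meaningful commits, and the improvement lives inside the commit it logically belongs to.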


Doing that will kind of work, but you'll lose the context of the group after merging/fast-forwarding onto the main branch.


If you always create a merge commit, you can see the group as you traverse the history via the two parent commits. One will walk the group and the other will go directly to the state before the group.
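
Concretely, with `^1` and `^2` naming the two parents of the merge, something like this sketch walks the group (branch name and messages are placeholders):

```shell
# Throwaway repository so the sketch can be run end to end.
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example"
git config user.email "example@example.com"

git commit -q --allow-empty -m "Initial commit"
before=$(git rev-parse HEAD)

git checkout -q -b feature
git commit -q --allow-empty -m "Step 1"
git commit -q --allow-empty -m "Step 2"

git checkout -q main
git merge -q --no-ff -m "Merge feature" feature

# First parent: the state of main before the group was merged.
first_parent=$(git rev-parse HEAD^1)

# Reachable from the second parent but not the first: the group itself.
group=$(git log --format=%s HEAD^1..HEAD^2)
group_size=$(git rev-list --count HEAD^1..HEAD^2)
echo "$group"
```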


I feel like, as with many things, this is something that git "has", but that interfaces/uis for it just handle poorly.

Maybe "poorly" is unfair as this would be relatively complex to implement in a way that has good simple intuitive ux. But from the perspective of anyone using vcs daily it does seem an obvious want, so I'm surprised it hasn't been done well yet.


yeah, if this was git merge --feature foo (equivalent to git rebase main foo ; git switch main; git merge --no-ff foo) it might be nice, but then again, making custom git scripts is pretty easy.
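
A rough sketch of such a script as a shell function. The name `merge_feature` is made up, there is no `--feature` flag in git itself, and `git checkout` stands in for `git switch` so it also runs on older versions.

```shell
# Throwaway repository so the sketch can be run end to end.
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example"
git config user.email "example@example.com"

echo base > file.txt && git add file.txt && git commit -q -m "Base"
git checkout -q -b foo
echo foo > foo.txt && git add foo.txt && git commit -q -m "Add foo"
git checkout -q main
echo more > main.txt && git add main.txt && git commit -q -m "Main moves on"
old_main=$(git rev-parse main)

# Hypothetical helper: rebase <branch> onto main, then record it as one group
# with a merge commit that is never fast-forwarded.
merge_feature() {
  git rebase -q main "$1" &&
  git checkout -q main &&
  git merge -q --no-ff --no-edit "$1"
}

merge_feature foo
git log --graph --oneline
```

Saved as an executable named `git-merge-feature` somewhere on `PATH`, the same three commands would be callable as `git merge-feature foo`.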


I was disappointed when I learned that git doesn't remember which branch a commit was made in.

Because:

Each feature = one branch

Each little change = one commit

It's annoying that they throw out half of that info.


git forgets nothing. A branch is just a reference to a commit, that reference goes nowhere unless you delete it


Not GP.

I may be wrong here, but I would argue that git does forget something: it forgets where the branch reference used to point when a new commit is made on that branch. If a commit has more than one parent, that means some information is lost because it has to guess which parent the branch reference came from.


That info is in the reflog, but it will be GCd eventually.


Afaik, only merge commits have two parents. Then the source branch would still point to the head of that branch. Furthermore, without having checked, aren't merge commits deterministic in the order of references? I wouldn't be surprised if this information isn't trivially available in a porcelain command, but I would be more surprised if it's lost.


If a branch is just a reference, then how that reference changes over time will be lost. As astrange says, that info will be in the reflog, but the reflog is subject to GC. Once a GC happens, that info is forgotten.
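
To make that concrete, here is a small sketch of reading a branch's past positions from the reflog. Keep in mind the reflog is local to one clone, and by default entries expire after 90 days (`gc.reflogExpire`).

```shell
# Throwaway repository so the sketch can be run end to end.
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example"
git config user.email "example@example.com"

git commit -q --allow-empty -m "First"
git commit -q --allow-empty -m "Second"
git commit -q --allow-empty -m "Third"

# Every position the branch reference has held, newest first.
git reflog show main

# Where main pointed one update ago.
previous=$(git rev-parse 'main@{1}')

# One reflog entry per update of the reference.
entries=$(git reflog show main | wc -l | tr -d ' ')
```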


I'm not sure I appreciate the point the author is trying to make because it sounds like what they want is a merge commit


> Under the hood, all the commit really says is:

> Merge: 8 6

> So it tells you that these two parents have been merged together, but it doesn’t tell you which one used to be main. You might guess 8, because it’s the leftmost one, but you don’t know for sure.

Why don't I know for sure it's gotta be the one on the left?


> Why don't I know for sure it's gotta be the one on the left?

Because of fast-forward merges.

It will be the one on the left if you, as expected, were on the master branch and merged the feature branch ("git checkout master; git merge feature"). But if you were on the feature branch, merged the master branch into the feature branch (usually, this is done to resolve conflicts), and then went back to the master branch and merged the resulting feature branch into it ("git checkout feature; git merge master; git checkout master; git merge feature"), it will be a fast-forward merge: no new commit will be created, and master will point to the merge commit which was originally on the feature branch, which is going on the opposite direction as you would expect.

The solution, as others have already mentioned here, is to do all merges to master as non-fast-forward ("git merge --no-ff feature"); that gives a consistent order to all merge commits on the master branch, and the end effect is the most similar to the "grouping" feature OP wants (if everything on master are these non-fast-forward merges, the difference between one merge and the preceding one is that "group").
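
Both cases can be reproduced in a throwaway repo. The sketch below first fast-forwards master onto a merge made on the feature branch, then does a `--no-ff` merge of a second branch; the branch name `feature2` and the messages are invented for the example.

```shell
# Throwaway repository so the sketch can be run end to end.
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example"
git config user.email "example@example.com"

git commit -q --allow-empty -m "Initial commit"
git checkout -q -b feature
git commit -q --allow-empty -m "Feature work"
feature_tip=$(git rev-parse HEAD)
git checkout -q master
git commit -q --allow-empty -m "Master work"

# Case 1: merge master into feature, then fast-forward master to the result.
git checkout -q feature
git merge -q -m "Merge master into feature" master
git checkout -q master
git merge -q --ff-only feature
# The first parent of the merge is the old feature tip, not the old master tip.
ff_first_parent=$(git rev-parse HEAD^1)

# Case 2: a non-fast-forward merge made while on master.
git checkout -q -b feature2
git commit -q --allow-empty -m "More feature work"
git checkout -q master
master_before=$(git rev-parse HEAD)
git merge -q --no-ff -m "Merge feature2" feature2
# Now the first parent is where master was just before the merge.
noff_first_parent=$(git rev-parse HEAD^1)

git log --graph --oneline
```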


> which is going on the opposite direction as you would expect.

It looks like a bug which should be fixed. Git should create a merge commit when it sees that "the direction" changes.


Thanks.

- It's still ok to "git pull --ff-only", right?

- Is it possible to enforce this at the CI level?


Yes to both, though I don't know how to enforce it. (A push hook would do.)
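
One way such a check might look, as a sketch only: require that every commit on the first-parent chain of the pushed range is a merge commit. The function name `only_merges` is made up, and a real hook or CI job would have to get the old and new commit IDs from its own environment (a `pre-receive` hook reads `<old> <new> <ref>` lines on stdin).

```shell
# Throwaway repository so the sketch can be run end to end.
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example"
git config user.email "example@example.com"

git commit -q --allow-empty -m "Initial commit"
old=$(git rev-parse HEAD)

git checkout -q -b feature
git commit -q --allow-empty -m "Feature work"
git checkout -q main
git merge -q --no-ff -m "Merge feature" feature
good=$(git rev-parse HEAD)

git commit -q --allow-empty -m "Direct commit on main"
bad=$(git rev-parse HEAD)

# Hypothetical check: succeeds only if every first-parent commit
# in <old>..<new> is a merge commit.
only_merges() {
  [ -z "$(git rev-list --first-parent --no-merges "$1..$2")" ]
}

only_merges "$old" "$good" && echo "old..good: accepted"
only_merges "$good" "$bad" || echo "good..bad: rejected"
```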


Any workflow that alters history is dangerous. Just merge things normally, and don't get fancy.

If you want to futz with stuff and squash commits and rewrite commit messages because you didn't write good ones on a feature branch, fine, whatever makes you happy.

But just merge to master, don't rebase.


The downside if you don't rebase your branch is that you end up with a merge commit with two parents and independent commits on both sides, which makes bisecting more of a challenge than if you had a linear, rebased history.


If you use the default merge messages, can't you tell which was the branch that got merged? The second parent is the branch that got merged and its name will appear in the commit message on the merge commit. This is probably something that you could infer most of the time.


--log ?


What command is that flag intended for?


git merge


Git will provide you with a default message for git merge without specifying any flags. This message includes the branch being merged. I don't think there is support in git to view history with the branches being marked explicitly. However, you can get something similar with:

    git log --pretty --oneline --graph --topo-order
That should at least group commits roughly by branch. Did that answer what you were asking?
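
For the `--log` flag mentioned upthread, a small sketch of what ends up in the merge commit; the branch name and messages are placeholders, and the exact subject line can vary between git versions and branch names.

```shell
# Throwaway repository so the sketch can be run end to end.
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example"
git config user.email "example@example.com"

git commit -q --allow-empty -m "Initial commit"
git checkout -q -b feature
git commit -q --allow-empty -m "Add parser"
git commit -q --allow-empty -m "Add tests"
git checkout -q main

# --log appends one-line summaries of the merged commits to the default message.
git merge -q --no-ff --no-edit --log feature

subject=$(git log -1 --format=%s)
body=$(git log -1 --format=%b)
echo "$subject"
echo "$body"

# Zoomed-out view: only the merges that landed on this branch.
git log --merges --first-parent --format='%h %s'
```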


It's not what the author was talking about, but the title reminded me of Perforce changelists. I wish git had something similar. Basically, it's a way to organize and group unrelated changes until you're ready to stash/shelve/commit them.


Another way is to have a concept of the “start” of a branch. If when you create a branch it remembers where it started from then the commits on the branch after that point form a natural commit group. If you need finer commit groups you can have more branches.


The idea of commit groups makes me think of Linux kernel style patch series and https://github.com/git-series/git-series


I will admit I am always afraid to lose my changes if I squash! Maybe I should try making a new temporary branch before squashing to give myself peace of mind.
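
A minimal sketch of that safety net: the backup branch is just a second name for the unsquashed commits, so the squash cannot lose them. The name `feature-backup`, the file and the messages are placeholders.

```shell
# Throwaway repository so the sketch can be run end to end.
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example"
git config user.email "example@example.com"

echo base > file.txt && git add file.txt && git commit -q -m "Base"
git checkout -q -b feature
echo one >> file.txt && git commit -q -am "WIP"
echo two >> file.txt && git commit -q -am "Fix tests"

# Safety net: a second reference to the unsquashed commits.
git branch feature-backup

# Squash everything since the fork point into a single commit.
git reset -q --soft "$(git merge-base main feature)"
git commit -q -m "Add feature"

squashed=$(git rev-list --count main..feature)
kept=$(git rev-list --count main..feature-backup)
echo "feature: $squashed commit(s), backup: $kept commit(s)"

# Same content on both branches; only the history differs.
git diff --quiet feature feature-backup && echo "trees match"
```

If anything goes wrong, `git reset --hard feature-backup` while on `feature` puts the original commits back.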


I used to have that fear also, but after playing with reflog a bit and seeing how you can get back to almost anything, I’m more comfortable.


git merge --squash on the command line will do this. It automatically dumps all your beloved handcrafted commit messages into a single squashed commit.
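
Roughly like this, as a sketch: `git merge --squash` stages the combined change and prepares a message listing the squashed commits, and the follow-up `git commit` records it. `--no-edit` is used here only so the example runs without opening an editor; file names and messages are invented.

```shell
# Throwaway repository so the sketch can be run end to end.
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example"
git config user.email "example@example.com"

echo base > file.txt && git add file.txt && git commit -q -m "Base"
git checkout -q -b feature
echo one >> file.txt && git commit -q -am "Add parser"
echo two >> file.txt && git commit -q -am "Add tests"
git checkout -q main

# Stage the combined change without creating a commit yet.
git merge -q --squash feature

# Commit it, accepting the prepared message as is.
git commit -q --no-edit

message=$(git log -1 --format=%B)
parent_count=$(git log -1 --format=%P | wc -w | tr -d ' ')
echo "$message"
```

The result is an ordinary single-parent commit, so nothing in the history links it back to the feature branch.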


I wish git had a function to dump a checksum that I could call on with the command 'git sum'

Please, devs


Isn’t that just the commit hash of the head?

Or do you mean a hash of the hashes of all your heads?
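
If the hash of the current commit is enough, an alias gets you a `git sum` today. This is only a sketch under that assumption; `sum` is not a built-in git command, and the alias here is set in the throwaway repo's local config.

```shell
# Throwaway repository so the sketch can be run end to end.
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example"
git config user.email "example@example.com"

git commit -q --allow-empty -m "Initial commit"

# Hypothetical alias: "git sum" prints the commit hash of HEAD.
git config alias.sum 'rev-parse HEAD'

checksum=$(git sum)
echo "$checksum"
```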


I didn't get the last bit: why can't the author write meaningful commit messages with the merge strategy?


Wait isn't a "commit group" just a branch?


A branch is just a pointer, not a group.


Stop trying to hide things!



