I recognize most of the tidying patterns listed in the article and the associated Twitter feed. My problem is the unnecessary noise in version control. If I’m trying to see when some chunk of code was changed using blame, having a bunch of small edits can make it a lot more difficult.
Still, I tend to do this kind of work on days when I’m not feeling great. I work on a large, reasonably old codebase (28 years old) so tidying busywork sometimes leads me to someplace interesting.
If the noise is behind merges, you can also use `git blame --first-parent` and it should show the big merge commit with the comprehensive explanatory message, rather than the small commit in the feature branch. You can use `git show -m` to show the diff of those merges.
git blame does include --ignore-rev and --ignore-revs-file, so maybe if people updated such a file when making small edits, it would make your life easier.
We’re actually still using Subversion for our main codebase and mostly happy with it. Being able to use --ignore-revs-file is a reason we might want to switch some day.
It can be hard to review or blame small disordered commits. But you can rebase the commits to group and squash them for review. Then you may choose to squash the entire PR when you rebase main onto it.
This has pros and cons. Let's explore them after an example.
Scenario: Big & Noisy PR
We have a PR with 13 commits :
3 fix commits, 5 refactors, 2 doc updates, a CI modification, 1 feature commit, 1 test update
The commits are in random order, and some of the commits are revisions to earlier commits. It's hard to understand the narrative of the commits and review things accurately.
Solution Part 1: Tidy the PR
1. Reorder the commits by type and relevance:
3 fix -> CI ->5 refactor -> 2 doc -> test -> feature.
2. Squash logically similar commits:
fix -> CI -> 2 refactor -> doc -> test -> feature.
A squashed commit should have a bullet list detailing each changed module and scope of change:
"""
Fix(package a): Fix a, b, c
This patch fixes:
* (module b): fix [...]
* (module c): fix [...]
* (module c): fix [...]
"""
3. Review and CI the PR
4. Add commits needed to complete the PR
The tidied PR was easy to understand: We fixed some pre-existing issues, beefed up the ci, then set the stage for the feature. The feature itself was clear and simple.
If you do just this, your history will be much cleaner.
Now I'm going to recommend something controversial:
Solution Part 2: Squash-rebase the PR onto main
Yup. Take all that work and mash it together. The final commit message should look clean and detailed:
"""
Big Shiny Feature
This patch implements [...]
Feat:
* (module d): Implement big shiny feature
Doc:
* (readme): update feature list
* (userguide): add tutorial for feature
CI:
* (workflow a): modify [...]
Fixes:
* (module b): fix [...]
* (module c): fix [...]
* (module c): fix [...]
Refactor:
* (module a): rename [...]
* (module b): delete unused [...]
[...]
"""
Benefits:
- We drop 7x fewer commits onto Main
- Project history is more legible
- Commit messages are detailed and useful
- Bisecting takes log 7 = 2.8 fewer steps
Risks:
- File diffs can be illegible if feature work intersects with refactor or fix work
- There are 7x more defects per commit
- It is harder to uncover root cause if we bisect
Conclussion
Tidying PRs before review is a no brainer - it greatly improves our review and history.
Squashing PRs onto main loses some information, but can make history easier to navigate. Since we're disciplined and detailed in our commit messages, this is often much less of a footgun than it might seem. Each commit is now a logical and self-contained unit.
Still, I tend to do this kind of work on days when I’m not feeling great. I work on a large, reasonably old codebase (28 years old) so tidying busywork sometimes leads me to someplace interesting.