*We also found that bug-fixing commits by authors tended to be larger in size an...

withinboredom · on Aug 12, 2021

I think that largely depends on the team and the code. I've worked in legacy code bases where fixing a bug can cause other bugs to show up. A single PR must address multiple issues to keep the software working.

account42 · on Aug 12, 2021

In such cases I try to split those additional fixes and apply them before your main bug fix that requires them. But yeah, sometimes that isn't possible.

bonoboTP · on Aug 12, 2021

A PR can have multiple commits.

Milner08 · on Aug 12, 2021

Sure but if the thing you're fixing requires fixing other parts how do you get to 'one commit fixes one thing'. Its simply not possible sometimes. Unless you hack in half baked fixed for the sake of a 'commit to fix X' then have to change that in the next commit to fix Y, etc. That's just a time wasting exercise.

withinboredom · on Aug 12, 2021

I took it to mean a PR level commit (aka squash/merge commit in git). I work in SVN and git everyday so commit and PR are basically the same unless talking about git specifically.

tialaramex · on Aug 12, 2021

These are single-statement bug fixes in Java code so it likely would not make sense to try to cut them into multiple commits.

Consider this line (my Java is a little rusty, but let's hope not too rusty as I start a new job next week in Java) in method get_answer of the class Query...

  return fastCache.answerOrEmpty(newQuery).replace(EMPTY,defaultValue);

Perhaps the author, upon reading the bug report concludes that the problem is they've calculated a new query but in this scenario it's the similar old query they ought to check in the cache instead and they write:

  return fastCache.answerOrEmpty(query).replace(EMPTY,defaultValue);

However a colleague who never really understood Query.get_answer() and reads the same bug report fixes the reported site of the problem, changing from

  answer = Query.get_answer("Some complicated query");

to...

  answer = Query.get_answer("Slightly different query");

Again this is a one line change, and perhaps it cures the symptom, bug closed, but the author was correct, and "Slightly different query" just happens to avoid the case where newQuery is different enough from query for the error to cause trouble.

This type of change that "fixes several issues" is fixing a correctness bug and I don't really believe you that you'd refuse to accept the first one line change because it "fixes several issues". Correctness bugs are going to do that.

CRConrad · on Aug 12, 2021

> These are single-statement bug fixes in Java code so it likely would not make sense to try to cut them into multiple commits.

But if the "fix" itself were always only a single statement then it couldn't ever be larger in some cases, and then there wouldn't be have any differences in what people submit to describe. I suppose it comes down to whether one interprets the article's "...looks at single-statement bug fixes in Java..." as "[single-statement] [bug fixes]" or "[single-statement bug] [fixes]". It feels to me like you're reading it as the former, while I think it has to be the latter.

So people could well set out to fix a bug (that the authors somehow know was) caused by a single statement, and when the code is someone else's, that's all they touch; but when it's their own, they also, "while I'm at it", fix a bunch of other stuff they notice while looking for the bug. Because they're familiar with the code, so have the confidence, and feel responsible for it, so they have the desire ("Wow, this is ugly, can't leave this in").

So all those changes together could well be several regular working-branch commits. (Maybe then rolled up into a single merge/rebase commit for PR, Idunno, it wasn't quite clear to me either.)

cratermoon · on Aug 12, 2021

I think it's important to keep in mind the difference between "I changed this one thing to fix problem A, and it happened to also fix B and C" vs "I changed this to fix A, that to fix M, and another thing to fix X".

CRConrad · on Aug 12, 2021

And I think the former happens exceedingly rarely. :-) Most cases where it does happen are probably due to M and X just being other aspects or consequences of A.

cratermoon · on Aug 13, 2021

For sure, but it's not clear that the cited paper's sampling of changes and bugs fixed reflects that. More research is needed.

dsego · on Aug 12, 2021

No need to be overly pedantic (I know this is HN but still). In this context a bug-fixing commit could mean patch, diff, revision, PR etc.

stinos · on Aug 12, 2021

I wasn't really trying to be pedantic, Im genuinely curious about what gets meant here. E.g. as you say, do they mean PR or literally one single commit? And in the latter case, is this something localized to the projects they sampled or is this a more general practice?

jlokier · on Aug 12, 2021

Sometimes, a collection of related changes has to go in together. E.g. one fix may uncover another latent bug, so that tests no longer pass, or an API change may be the right solution, but that has consequent effects, and should be done in phases.

Where I'm working now, it's the opposite to your place: It's almost a no-go to use separate commits to fix separate issues in a change that is reviewed as a whole. We have been asked to use fewer, larger commits. In practice, what I see is commits with a description (if you're lucky) that describes the main change, but there are other changes mixed in which aren't mentioned in any description.

I prefer to use separate commits to fix individual issues even when making a bugfix PR that is "larger in size and scope". That way, people can see each part and I can present the fix most clearly, and I have to figure out the best order in which each fix makes sense. I usually do this even with my own private work that nobody else sees, as there is some satisfaction and clarity in dividing up related issues. I usually construct the logical set of changes after I've figured out what they are and satisfied myself they work well together, through use of "git add -p", "git rebase -i" and per-commit testing in a "curate your Git history for others" approach.

This comes from the Linux approach, where a proposed patch series must be, in principle, a sequence of commits designed for review, that could be cherry-picked or have some of them left out.

The unit of PR should be larger than the unit of commit, and commits should each be one reasonably self-contained logical change that could in principle be cherry-picked in isolation, or individually rejected.

But lots of places have a policy which requires PRs to be "squash merged", so all the changes are one commit in the end. The individual commit history is actively destroyed. There is no point putting in the effort to lay out carefully curated commits for a PR, if most of the history will be destroyed soon after, especially if reviews don't go deep.

I've been told this should be done because it "helps with rolling back changes" but I think that's incorrect for my style of commits (because they are curated into functioning logical changes already), and it's a crude approach to rollback. However, it makes sense for those people's PRs which are chronological (commits like "try X", "try Y", "aha, fixed it"), and where part way through the PR, the project doesn't even build.

When I roll back changes that have been merged, it's rare to want to roll back an entire group of changes. By the time a problem is found with a merged PR, it's probably too late to undo it all anyway as other things will have been built on top. That's when I'm most likely to find the individual commits of the PR most useful.