We also found that bug-fixing commits by authors tended to be larger in size and scope, and address multiple issues
Too bad I cannot see examples of what this actually entails - for us one commit which fixes several issues is a complete no-go: that should be separate commits. So now I wonder whether the paper really means single commits which address multiple things, or groups of commits?
I think that largely depends on the team and the code. I've worked in legacy code bases where fixing a bug can cause other bugs to show up. A single PR must address multiple issues to keep the software working.
In such cases I try to split those additional fixes and apply them before your main bug fix that requires them. But yeah, sometimes that isn't possible.
Sure but if the thing you're fixing requires fixing other parts how do you get to 'one commit fixes one thing'. Its simply not possible sometimes. Unless you hack in half baked fixed for the sake of a 'commit to fix X' then have to change that in the next commit to fix Y, etc. That's just a time wasting exercise.
I took it to mean a PR level commit (aka squash/merge commit in git). I work in SVN and git everyday so commit and PR are basically the same unless talking about git specifically.
These are single-statement bug fixes in Java code so it likely would not make sense to try to cut them into multiple commits.
Consider this line (my Java is a little rusty, but let's hope not too rusty as I start a new job next week in Java) in method get_answer of the class Query...
Perhaps the author, upon reading the bug report concludes that the problem is they've calculated a new query but in this scenario it's the similar old query they ought to check in the cache instead and they write:
answer = Query.get_answer("Slightly different query");
Again this is a one line change, and perhaps it cures the symptom, bug closed, but the author was correct, and "Slightly different query" just happens to avoid the case where newQuery is different enough from query for the error to cause trouble.
This type of change that "fixes several issues" is fixing a correctness bug and I don't really believe you that you'd refuse to accept the first one line change because it "fixes several issues". Correctness bugs are going to do that.
> These are single-statement bug fixes in Java code so it likely would not make sense to try to cut them into multiple commits.
But if the "fix" itself were always only a single statement then it couldn't ever be larger in some cases, and then there wouldn't be have any differences in what people submit to describe. I suppose it comes down to whether one interprets the article's "...looks at single-statement bug fixes in Java..." as "[single-statement] [bug fixes]" or "[single-statement bug] [fixes]". It feels to me like you're reading it as the former, while I think it has to be the latter.
So people could well set out to fix a bug (that the authors somehow know was) caused by a single statement, and when the code is someone else's, that's all they touch; but when it's their own, they also, "while I'm at it", fix a bunch of other stuff they notice while looking for the bug. Because they're familiar with the code, so have the confidence, and feel responsible for it, so they have the desire ("Wow, this is ugly, can't leave this in").
So all those changes together could well be several regular working-branch commits. (Maybe then rolled up into a single merge/rebase commit for PR, Idunno, it wasn't quite clear to me either.)
I think it's important to keep in mind the difference between "I changed this one thing to fix problem A, and it happened to also fix B and C" vs "I changed this to fix A, that to fix M, and another thing to fix X".
And I think the former happens exceedingly rarely. :-) Most cases where it does happen are probably due to M and X just being other aspects or consequences of A.
I wasn't really trying to be pedantic, Im genuinely curious about what gets meant here. E.g. as you say, do they mean PR or literally one single commit? And in the latter case, is this something localized to the projects they sampled or is this a more general practice?
Sometimes, a collection of related changes has to go in together. E.g. one fix may uncover another latent bug, so that tests no longer pass, or an API change may be the right solution, but that has consequent effects, and should be done in phases.
Where I'm working now, it's the opposite to your place: It's almost a no-go to use separate commits to fix separate issues in a change that is reviewed as a whole. We have been asked to use fewer, larger commits. In practice, what I see is commits with a description (if you're lucky) that describes the main change, but there are other changes mixed in which aren't mentioned in any description.
I prefer to use separate commits to fix individual issues even when making a bugfix PR that is "larger in size and scope". That way, people can see each part and I can present the fix most clearly, and I have to figure out the best order in which each fix makes sense. I usually do this even with my own private work that nobody else sees, as there is some satisfaction and clarity in dividing up related issues. I usually construct the logical set of changes after I've figured out what they are and satisfied myself they work well together, through use of "git add -p", "git rebase -i" and per-commit testing in a "curate your Git history for others" approach.
This comes from the Linux approach, where a proposed patch series must be, in principle, a sequence of commits designed for review, that could be cherry-picked or have some of them left out.
The unit of PR should be larger than the unit of commit, and commits should each be one reasonably self-contained logical change that could in principle be cherry-picked in isolation, or individually rejected.
But lots of places have a policy which requires PRs to be "squash merged", so all the changes are one commit in the end. The individual commit history is actively destroyed. There is no point putting in the effort to lay out carefully curated commits for a PR, if most of the history will be destroyed soon after, especially if reviews don't go deep.
I've been told this should be done because it "helps with rolling back changes" but I think that's incorrect for my style of commits (because they are curated into functioning logical changes already), and it's a crude approach to rollback. However, it makes sense for those people's PRs which are chronological (commits like "try X", "try Y", "aha, fixed it"), and where part way through the PR, the project doesn't even build.
When I roll back changes that have been merged, it's rare to want to roll back an entire group of changes. By the time a problem is found with a merged PR, it's probably too late to undo it all anyway as other things will have been built on top. That's when I'm most likely to find the individual commits of the PR most useful.
Too bad I cannot see examples of what this actually entails - for us one commit which fixes several issues is a complete no-go: that should be separate commits. So now I wonder whether the paper really means single commits which address multiple things, or groups of commits?