Version control: best practices

VBprogrammer · on June 5, 2014

> Good, descriptive commit messages

All of the examples given in the image going that heading I'd say are mediocre at best.

The code changes should tell you what has changed, if I'm looking back through your commit history what I'm usually trying figure out is why you've changed something.

A ticket number is often the single most useful thing, ideally followed why a very brief what you've changed and as much information on why you've changed it as you think would be useful.

VBprogrammer · on June 5, 2014

As an example of what I'd like to read:

  #2313 Change expected value to a float field.

  django-haystack stores decimal fields as strings internally.
  This means that when we order by expected value small
  values (e.g '9e-08') are sorted before larger values.

babuskov · on June 5, 2014

I prefer to keep such comments in the code, not in the commit message. It is highly unlikely that anyone coming to do some changes to that part of the code would go and read a year-old commit message that he isn't even aware exists.

why-el · on June 5, 2014

There is a difference tough, a very subtle one. A commit body should describe the rationale of the change, i.e. any decisions that had to be made or trade-offs in light of developer discussions and so on. Basically a description of the issue being solved. These things are rarely needed in the code itself, as the code will most likely feature new things and the old things are history.

Some commits even feature implementation details, and those I agree with you sometimes it makes sense to include them in the code. However most implementation details replace old ones, and this is transition should be explained in a commit since this is the perfect place for it.

Here is an random example from the linux kernel, which is guaranteed to do what I described: http://goo.gl/QdKNnJ

VBprogrammer · on June 5, 2014

The example I gave (happened to be what I was working on) would be quite pointless as a comment. Since the code could easily have been written using the float datatype.

pretzel · on June 5, 2014

Try using git blame. It tells you what change caused a line to happen. I often use it to find out exactly what issue caused a seemingly crazy bit of code to happen. As the code goes away, so does the commit message - not necessarily so with a comment.

tomswartz07 · on June 5, 2014

While that's true; `git pickaxe` will also do what you describe. The added bonus is that one could use pickaxe to follow a specific line's changes way farther back than a normal comment.

http://jfire.io/blog/2012/03/07/code-archaeology-with-git/

u801e · on June 5, 2014

You can run git blame to get the information pertaining to that particular line of code. That's a really good way to document what a particular piece of code is doing and why at the time it was written.

ukd1 · on June 5, 2014

This is a great example - it's details, explains the issue and also has the bug id (presumably) in it.

Smudge · on June 5, 2014

You're right -- the diff answers the what, so being able to answer the why concisely is what makes a good commit message.

Ticket numbers are important context, but unless it is a one-off commit you can instead put that info in the merge commit (as if saying, all of these commits I'm merging were made as part of fixing Issue #17).

Some people like to use squash extensively for that reason, but then you don't have the same granularity when it comes to reverting individual bits of that commit without reverting the entire bugfix. (This assumes that every commit leaves the code in a working state with passing tests).

erichurkman · on June 5, 2014

Issue numbers in each commit can also help when you end up cherry-picking commits from one branch to another.

Smudge · on June 5, 2014

I guess in that scenario my preference would be to amend the cherry-picked commit's message to include why it was cherry-picked.

The alternative in your case (leaving it as-is, with an issue number) implies that the entire commit addresses that particular issue number, which may not be true. (It may be part of a series of commits for that issue)

reledi · on June 5, 2014

This creates a lot of noise on GitHub though. I prefer to put the issue number in the branch name and in the PR description. But for one-off commits, it goes in the commit message.

VBprogrammer · on June 5, 2014

Yeah, I prefer the issue number in each commit as it makes the commit log more readable to me. But do a decent job of everything else and that is a very small nit-pick.

jipiboily · on June 5, 2014

I agree it useful, but it's more useful overtime than anything, on the short term, issue numbers are not really useful. You are right, those examples were not our best ones ;)

smathieu · on June 5, 2014

I think the point was mostly that 'lol' or 'meh' is a completely useless comment. It's true that there's a lot of room for improvements in the given example. Still, they do provide some context to people who actually understand the application they are working on.

Does anyone know of good open source project that uses Git messages extremely well that we could use as an example?

KajMagnus · on June 5, 2014

Yes, Git itself :-) git://git.kernel.org/pub/scm/git/git.git and https://github.com/git/git

Git's commit log is probably the most detailed and informative log I've ever seen. And it's a bit fun to have a glance at Linus Torvald's old initial commits from 2005 :-)

(You might want to check the 'pu' branch rather than the 'master' branch. And there're lots of 'Merge branch...' commits that aren't super interesting.)

VBprogrammer · on June 5, 2014

Yes, they are clearly vastly superior to 'lol', 'meh' or worse 'fix'. Still, I think my point stands that they could be better.

jipiboily · on June 5, 2014

Yes, totally! We picked some examples quickly just to demonstrate that people can do better "lol", "fix" or "meh". There is still room for improvements and we have better ones, too.

tomswartz07 · on June 5, 2014

Not to toot my own horn, but I have a few minor projects that I try to meticulously document with the git commit messages.

https://github.com/tomswartz07/linux-configs/commit/3684d3ad... is an example of one.

tomswartz07 · on June 5, 2014

One of my biggest pet peeves is poorly formatted commit messages.

I have one coworker who continually fails to put the empty line between the first line and the body of the commit message. So many of our git logs are littered with 100+ character long messages.

The other commenters hit the nail on the head with the fomatting. Short, precise, and properly formatted.

babuskov · on June 5, 2014

We had an interesting policy in the company where I worked before and I use it for my own projects: only one line is ever allowed - explaining WHAT was changed.

- to explain HOW it works: comment in the code, so whoever comes to change the code later knows what should be touched and what should be left alone

- to explain WHY it was changed, use the bug tracker and just supply the #ID of bug/feature request in the comment

Having single-line comments makes it much easier to browse through history and find the thing you're looking for.

tomswartz07 · on June 5, 2014

I totally agree- That sounds like a wonderful policy to have.

Unfortunately, the powers-that-be aren't too keen on enforcing something like that in our current projects.

alwillis · on June 6, 2014

This is probably more on point--"A Note About Git Commit Messages” http://tbaggery.com/2008/04/19/a-note-about-git-commit-messa...

reledi · on June 5, 2014

Adding to GitHub Pull Requests, I recommend opening them early. This allows reviews to happen early and often, instead of reviewing one giant chunk of changes at the end which may get rejected because it has too many problems.

Plan out the tasks in your PR to communicate what still needs to be done -- I like to use GitHub Flavored Markdown task lists.

Here's a fish script I use when creating a new branch. It opens a PR before I even write a single line of code.

    # Usage: git-new-branch 120-fix-foo-issue
    function git-new-branch --description 'Creates a new branch and opens a Pull Request for it'
      git checkout -b $argv[1]
      git commit --allow-empty -m 'Empty commit to open PR'
      git push origin HEAD
      git pull-request  # Opens your default text editor for PR title. Uses hub.
    end

smathieu · on June 5, 2014

Another good thing about this is that if you have CI setup to work with Github, you can also run all of your test suite quickly and asynchronously for each commit.

j_s · on June 5, 2014

You can also convert issues (which often are created before writing code, as the reason for the changes) to pull requests:

http://stackoverflow.com/questions/4528869/how-do-you-attach...

reledi · on June 5, 2014

I think having an issue attached to your PR is much better than converting your issue to a PR. Here's a good discussion about why it's been deprecated: https://github.com/github/hub/commit/4f70dd126f46dec14fc341c...

smathieu · on June 5, 2014

Although many of these tips might seem trivial and obvious, I've often seem developer who self-identified as 'senior' not follow these practices.

Git and Github are communication channels and you should think about effectively communicating with your audience when you are using them. Very much in the same way that you are when you are writing emails.

The other interesting point about Git is that, very often, you are your own audience when you're revisiting the history of particular months after you've written the original code. You should definitely start thinking about your workflow a little more.

fathom108 · on June 5, 2014

I used to use feature branches more heavily but these days Branch By Abstraction tends to be my first choice.

http://continuousdelivery.com/2011/05/make-large-scale-chang...

dmmalam · on June 5, 2014

Looks very sensible. We tend to not have a long running dev branch, and just cut features branches off master, similar to the 'how github uses github to build github' presentation [1].

Is anyone making squashing part of their workflow? We seem to prefer just plain merges with rebase.

[1] http://zachholman.com/talk/how-github-uses-github-to-build-g...

Smudge · on June 5, 2014

Instead of squashing everything into a single master commit for each bugfix (which I've seen teams do), I use a merge commit to signify a set of commits that comprise a single feature or fix. (`git merge --no-ff`)

I make use of squashing when there is a clear group of commits that all change the same few lines of code or are conceptually part of the same incremental change. But then each incremental change (from working state to newer working state) I leave as a separate commit.

jipiboily · on June 5, 2014

We don't squash and we just use plain merges. It gives the real version of what happened. This is what I personally prefer.

rdrdss23 · on June 5, 2014

I'd be very hesitant with "commit often". Each commit should represent a finished bit of a bigger feature.. Each commit should compile (...generally. With exceptions). Commits that fix spelling, or change some spacing issues end up cluttering the history and making it hard to see what's being actually done. When someone looks over your code they should see things like:

commit1: he/she added some methods

commit2: refactored some code

commit3: he/she plugged in the methods created into the refactored code

etc.

The branch as a whole should be a finished feature. Keep the features small-ish. Try not to have a branch that living on it's own for a long time. You can merge the master branch into it, however you'll slowly be out of sync with your team b/c they don't really know what you're up to.

The only thing I absolutely detest in Git, is that is no quick method to change history on local changes. Say I add a method and make a commit - then I do a bunch of other work and 10 commits later I realize that the method I had written needed to be const. Instead of be able to edit that commit and add the const labels, I end up having to make another commit that no one cares about and no one really should have to look at. It just makes the history incredibly cluttered and make it a lot harder to find the commits I'm interested in.

The workaround is to make a new branch and cherry pick changes - but it's a huge PITA and easy to mess up

Aside:

Does anyone know what diff tool is best for working in C++? The diffs I commit (using kdiff) can sometimes be mind-boggling hideous b/c the diff program parsed things in some strange way.

icebraining · on June 6, 2014

The only thing I absolutely detest in Git, is that is no quick method to change history on local changes. Say I add a method and make a commit - then I do a bunch of other work and 10 commits later I realize that the method I had written needed to be const. Instead of be able to edit that commit and add the const labels, I end up having to make another commit that no one cares about and no one really should have to look at. It just makes the history incredibly cluttered and make it a lot harder to find the commits I'm interested in.

I think you can use git rebase --interactive to reorder the new commit and squash it with the older: http://stackoverflow.com/questions/3921708/how-do-i-squash-t...

Of course, that requires manual labor (brrr), but that's nothing that a small script can't automate. I'd do it myself if we used git where I work.

rdrdss23 · on June 6, 2014

Thanks for the pointer. In truth, I use a GUI (gitExtentions). I like it a lot more than the command line. Hopefully this is implemented somewhere. I'll look into it!

xutopia · on June 5, 2014

I do contracts for many startups and I am surprised how many don't follow a good practice.

Eclyps · on June 5, 2014

The branching model that I use with my team projects used to be like this - master was always production code, development was always code ready for QA, and features would be branched off and development. It worked great for larger projects that had releases that were weeks apart, but a lot of our projects are very small and may have several minor changes go to production in a short amount of time.

We decided to kill our development branch and just create feature branches off of master. If multiple devs are working on the same project scheduled for the same release, we will create a release branch and they can branch features off of that.

An open pull request into master is how we initiate the QA process, and all remediations are discussed in there. When the pull request is merged and closed, we all get an email so we know to update our local code. It's been working really well so far for both our small and large projects.

jipiboily · on June 5, 2014

Just an FYI, with this model, we still do multiple deploys each and every day :)

There might be a point where this won't work for us anymore, it's always time to change.

Also, this is not because a branching model works for a team that it will for all teams and projects.

Like any in-house, process the branching model will evolve with the team and it's requirements to be efficient.

Eclyps · on June 5, 2014

We ran into a couple issues (none of them significant)

1. When someone merges their branch into development for QA, it's going to be there (with or without issues) for anyone else who merges for QA after that. If the first merge has bigger issues than expected, it's a bit of a pain to roll it back and re-merge the later branch. Without the development branch as a base for our QA, we just QA things one feature/release branch at a time. This lets us easily send back branches that aren't ready for production and move on to a different branch that is before any merging takes place.

2. We weren't sure how to handle pull requests with the development branch always being there. We used to set development as our "default" branch in github, so any pull requests would be automatically merged into there. That got a bit confusing, though, as most of us were so used to merged pull requests representing a change that has already been reviewed and accepted as production-ready. And as for the merge to master, should we submit another pull request? It just seemed like an unnecessary extra step in a lot of situations.

But you are absolutely right. Different flows work for different teams and projects. I'm sure as our team evolves, so will our workflow.

jipiboily · on June 6, 2014

For #1, we use Fourchette (https://github.com/jipiboily/fourchette), which means we have a Heroku fork of our QA env for each GitHub PR.

For merges to master, we submit other pull requests that we call "Release: ...". We can then see if CI and Rainforest tests are passing, if all is green, then we merge! :)

ricardolopes · on June 5, 2014

Nothing very new here, but it still amazes me the sheer amount of devs that ignore these simple best practises. So it's always nice to have these reminders every now and then.

ataylor32 · on June 5, 2014

One thing that bothers me is when a commit has multiple changes that aren't related. A commit message will say "Implemented feature X", and that's true, but it's also making a small change to feature Y, which isn't mentioned in the commit message. It makes reading the diff harder because you have extra noise in there and it also makes it harder to answer the question of "Huh? Where did this small change to feature Y come from?".

jipiboily · on June 6, 2014

Commits with unrelated changes should be multiple commits.

robryk · on June 5, 2014

When reviewing a pull request I find it hard to see the changes made since my last set of comments and to find unanswered comments (especially when the comment was on a diff hunk that is now gone). When responding to a review I find it likewise hard to find comments that I haven't responded to yet. Rietveld or gerrit solves all of these issues very easily. Do you have any tips on how to deal with them in github/bitbucket pull request UI?

heldrida · on June 6, 2014

Nice article! I've got a question though, why do you guys branch from "Staging" ? Personally, I branch from "Master" or "Production" and the reason is that "Staging" may have "features" that need proper testing and the "development to production cycle" be longer and I don't want that to be included. Does this make sense ?

Can you explain me how you prevent that from happening ?

jipiboily · on June 6, 2014

We use Fourchette (https://github.com/jipiboily/fourchette), which means we have a Heroku fork of our QA env for each GitHub PR.

Also, once something is on develop/staging, it is about to be shipped. If there is any bug in there, they are they should be fixed ASAP.

As we grow, we might have to do things differently, we'll see, but this works really well for us right now. :)

ukd1 · on June 5, 2014

Good post, though I'd add that you should avoid rebase and force push. I've seen both used an add nothing but confusion / breakage.

See: http://geekblog.oneandoneis2.org/index.php/2013/04/30/please...

swang · on June 5, 2014

These kind of vague blanket statements doesn't do anybody good.

rebase and force push are _bad_ _if_ you run those commands over commits that are already committed into remote branches that are shared by others (mainly the master branch).

You definitely can (and probably should) rebase on your own branches and be fine with force pushing into your own branches (even "remotely", like on a PR'd branch). However once you are touching shared branches, you should definitely not run rebase/force push.

The basic rule is if you are on a branch and you need to use rebase to do squashes or amendments, you are okay as long as you do `git rebase -i origin/master` (assuming origin/master is updated and that master is what you branched off of). Running a normal rebase should not affect any committed changes as it takes your parent branch (origin/master) and runs your commits on your local branch over it.

jipiboily · on June 5, 2014

Good point! Force push should be avoided at all cost on origin. I think mercurial now has that as a feature!!!

Also, I prefer merge over rebase.

dsubhankar · on June 5, 2014

superb..