zerowidth positive lookahead

A few notes on my git workflow

A colleague asks [slightly edited]:

Why didn’t you squash your [feature branch merge] commit? I’m new to [project], but I found on [other project] that squashed commits made it much easier to git bisect. And since the [project] test suite is broken frequently, I assume we’ll be git bisecting a lot. Un-squashed commits tended to leave broken tests or functionality that never actually shipped.

Git, because it’s just “the stupid content tracker”, is flexible enough to support just about any development workflow you can think of. You can use pull requests only (e.g. GitHub), use a complex release cycle (nvie’s git-flow), or have a gatekeeper with deputies and a blessed repo (the Linux kernel). Because of this, your choice of workflow is a question of philosophy rather than anything the tool itself imposes.

Some argue that the role of a revision control system is to preserve history exactly as it was, without exception. Others, including many who commit to our codebase, prefer to liberally rewrite and squash commits before pushing to the master branch.

I fall somewhere in between on this spectrum. When working with features, I stand by the “branch for everything” model. The question is, when and how should these branches be applied to master?

My goal when adding code to the master branch is to preserve both a clear history in the overall repository and the incremental development of distinct features.

For small features–one or two simple commits–the simplest thing to do is squash the commits and cherry-pick or fast-forward merge them back onto master. For larger features, especially ones that take more than a few minutes of effort and span many commits, I rebase and merge.
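
As a rough sketch of the small-feature case, the following builds a throwaway repository, squashes a two-commit branch into one commit, and fast-forwards master onto it. The branch name, file name, and commit messages are made up for illustration, and squashing via git reset --soft is just one of several ways to do it (an interactive rebase or git merge --squash would work too).

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # name the first branch "master" on any git version
git config user.name "Example Dev"        # hypothetical identity for the demo repo
git config user.email "dev@example.com"

echo "before" > notes.txt
git add notes.txt
git commit -q -m "before"

# A small feature: two quick commits on a topic branch.
git checkout -q -b small-feature
echo "step 1" >> notes.txt
git commit -q -am "step 1"
echo "step 2" >> notes.txt
git commit -q -am "fixup: whoops, finish step 1"

# Squash: move the branch pointer back to master while keeping the
# combined changes staged, then record them as a single commit.
git reset -q --soft master
git commit -q -m "Add small feature"

# master has not moved, so the squashed commit can be fast-forwarded.
git checkout -q master
git merge -q --ff-only small-feature

git log --oneline
```

The result is a linear master with one commit for the whole feature and no merge commit.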

First, I rebase, because it helps keep the history generally linear. I rebase with --interactive so that I can also tidy the feature branch a little, for example by folding fixup commits (e.g. “fixup: whoops, forgot a file”) into the commits they belong to. I strive to have each commit be self-contained, including tests for that piece of the work. I’d like for each of my feature branches to have a clear flow of how I did the work, piece by piece.
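
Here is a minimal sketch of that cleanup in a throwaway repository. It assumes the forgotten file was recorded with git commit --fixup, so that --autosquash can place it next to the commit it belongs to. File names and messages are hypothetical. GIT_SEQUENCE_EDITOR=true is only there to accept the todo list unchanged so the example runs unattended; in real use I would leave it off and edit the list by hand.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # name the first branch "master" on any git version
git config user.name "Example Dev"        # hypothetical identity for the demo repo
git config user.email "dev@example.com"

echo "base" > README
git add README
git commit -q -m "before"

git checkout -q -b feature
echo "model" > model.txt
git add model.txt
git commit -q -m "step 1: add model"
echo "view" > view.txt
git add view.txt
git commit -q -m "step 2: add view"

# Whoops, step 1 forgot a file. Record it as a fixup aimed at step 1.
echo "model test" > model_test.txt
git add model_test.txt
git commit -q --fixup=HEAD~1

# Meanwhile, master moves on.
git checkout -q master
echo "other" > other.txt
git add other.txt
git commit -q -m "unrelated work on master"

# Replay the feature on top of master; --autosquash folds the fixup
# commit into "step 1" in the interactive todo list.
git checkout -q feature
GIT_SEQUENCE_EDITOR=true git rebase -q -i --autosquash master

git log --oneline master..feature
```

After the rebase the branch sits directly on top of master with two tidy commits, and the forgotten file is part of step 1.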

Secondly, I merge with --no-ff. This is also what the “merge” button in GitHub does. I avoid fast-forward merges so that distinct features remain identifiable. For example:

before merge:            with fast-forward:      with --no-ff:

o - step 2 (feature)     o - step 2 (master)     o - merged feature into master
|                        |                       |\
o - step 1               o - step 1              | o - step 2
|                        |                       | |
o - before (master)      o - before              | o - step 1
|                        |                       |/
.                        .                       o - before
.                        .                       |
.                        .                       .
.                        .                       .

When a feature branch is merged using fast-forward, the feature’s commits are flattened into master’s history and their identity as a group is lost. By forcing a merge commit, each feature remains evident as a distinct unit of work, even if nothing else was committed to master in the meantime.
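
The two outcomes in the diagram can be reproduced with a sketch like the following. The scratch branch ff-demo is hypothetical and exists only so that both kinds of merge can be compared in one throwaway repository.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # name the first branch "master" on any git version
git config user.name "Example Dev"        # hypothetical identity for the demo repo
git config user.email "dev@example.com"

echo "before" > app.txt
git add app.txt
git commit -q -m "before"

git checkout -q -b feature
echo "step 1" >> app.txt
git commit -q -am "step 1"
echo "step 2" >> app.txt
git commit -q -am "step 2"

# Fast-forward (on a scratch branch): the pointer simply moves forward
# and no merge commit is created.
git checkout -q -b ff-demo master
git merge -q --ff-only feature

# --no-ff on master: a merge commit is forced even though a
# fast-forward would have been possible.
git checkout -q master
git merge -q --no-ff -m "merged feature into master" feature

git log --graph --oneline master
# One line per feature: follow only the first parent of each merge.
git log --first-parent --oneline master
```

With --first-parent, master reads as a list of merged features, while the individual steps stay available on the second parent of each merge.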

I don’t squash these feature branches because to do so loses something important: how the feature came about. The master branch should not only show what was developed but also how. This is especially important for refactoring and refurbishment work since the end result can be so drastically different. A merged branch preserves this information.

Additionally, squashing feature branches breaks attribution. When more than one person works on a feature, it’s helpful to know that developer A worked on the CSS and developer B did the model and controller code. If it’s all squashed together, there’s no way to tell who was responsible for a particular change.
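
To illustrate the attribution point, here is a small sketch with two made-up authors (Developer A and Developer B, with hypothetical file names) working on one feature branch. After a --no-ff merge, each file can still be traced back to whoever wrote it.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # name the first branch "master" on any git version
git config user.name "Example Dev"        # hypothetical committer for the demo repo
git config user.email "dev@example.com"

echo "base" > README
git add README
git commit -q -m "before"

git checkout -q -b feature
echo "body { margin: 0 }" > style.css
git add style.css
git commit -q --author="Developer A <a@example.com>" -m "Add CSS"
echo "class Model; end" > model.rb
git add model.rb
git commit -q --author="Developer B <b@example.com>" -m "Add model and controller"

git checkout -q master
git merge -q --no-ff -m "merged feature into master" feature

# The per-commit authors survive the merge.
git log --format='%an: %s' master
css_author=$(git log -1 --format='%an' -- style.css)
model_author=$(git log -1 --format='%an' -- model.rb)
echo "style.css: $css_author, model.rb: $model_author"
```

Had the branch been squashed into one commit, both files would be attributed to whoever made the squashed commit.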

There are a few downsides to what I recommend:

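  • Reverts are more complicated. More precisely, reverting the revert is complicated: after a merge has been reverted, git still considers the branch’s commits to be part of master, so merging the branch again does not bring the changes back. The revert itself has to be reverted.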
  • You said git bisect is more difficult, but I’m not sure I buy this. If the commit before a merge is good, and the merge commit is bad, then it’s the feature branch that broke it. Also, ideally, each commit in a branch is self-contained in that the tests all pass, so it’s easier to track down what broke.
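
To make the bisect point concrete, here is a sketch of bisecting across a --no-ff merge in a throwaway repository. The “test suite” is a stand-in that I made up for the example: it fails whenever app.txt contains the word “bug”. The tag name known-good is also hypothetical.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # name the first branch "master" on any git version
git config user.name "Example Dev"        # hypothetical identity for the demo repo
git config user.email "dev@example.com"

echo "ok" > app.txt
git add app.txt
git commit -q -m "before"
git tag known-good

git checkout -q -b feature
echo "step 1" >> app.txt
git commit -q -am "step 1"
echo "bug" >> app.txt                     # this commit breaks the stand-in test
git commit -q -am "step 2"
echo "step 3" >> app.txt
git commit -q -am "step 3"

git checkout -q master
git merge -q --no-ff -m "merged feature into master" feature

# master (the merge) is bad, the tagged commit is good; let git search.
git bisect start master known-good >/dev/null
git bisect run sh -c '! grep -q bug app.txt' >/dev/null
first_bad=$(git rev-parse refs/bisect/bad)
git bisect reset >/dev/null 2>&1

git log -1 --format='first bad commit: %s' "$first_bad"
```

Because the branch was not squashed, bisect lands on the individual step that introduced the breakage rather than on one large feature commit.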

When following a process closer to GitHub’s, using pull requests and that ever-so-convenient big green “Merge pull request” button, it’s good to keep in mind that GitHub has other affordances that support the PR-only workflow. Most importantly, they have a CI system that runs the full test suite on each feature branch. We don’t have that convenience, so we have more work to do to verify things before merging (or squashing and cherry-picking, as the case may be).

As a footnote, I consider a note about git commit messages to be required reading, regardless of whatever else you do with your repositories.

Update, February 23, 2013

This article sparked several good conversations. I’ve since been convinced that rebasing to master is not necessary–in fact, it’s unhelpful when that feature branch is public–and that it’s just fine to merge the master branch into a feature branch as needed. In short, “know your tools” and put them to use.
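
For completeness, merging master into a feature branch looks roughly like this. It is a sketch in a throwaway repository with made-up file names and messages.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # name the first branch "master" on any git version
git config user.name "Example Dev"        # hypothetical identity for the demo repo
git config user.email "dev@example.com"

echo "base" > README
git add README
git commit -q -m "before"

git checkout -q -b feature
echo "feature" > feature.txt
git add feature.txt
git commit -q -m "start feature"

# master picks up a fix while the feature is in progress.
git checkout -q master
echo "fix" > fix.txt
git add fix.txt
git commit -q -m "bug fix on master"

# Bring the fix into the feature branch without rewriting its history,
# which keeps the branch safe to share publicly.
git checkout -q feature
git merge -q -m "Merge master into feature" master

git log --graph --oneline
```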

Mislav Marohnić explains when rebase is useful (namely, before pushing to a public branch) and reiterates the benefits of merge --no-ff. He also includes several other helpful tips and explanations.

I also learned that not everyone uses a graphical tool or the command line to view the commit tree. For reference, that’s git log --graph. Also try git log --graph --decorate --all to see more information. Being able to visualize a repository’s history in this way helps a great deal in understanding what’s in it. I have a couple aliases for pretty graph output in my dotfiles.
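
An alias along these lines is one way to set that up. The name lg and the exact flags are my choice for this sketch, not necessarily what is in the dotfiles, and the alias is set in a throwaway repository rather than globally.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # name the first branch "master" on any git version
git config user.name "Example Dev"        # hypothetical identity for the demo repo
git config user.email "dev@example.com"

echo "before" > app.txt
git add app.txt
git commit -q -m "before"
git checkout -q -b feature
echo "step 1" >> app.txt
git commit -q -am "step 1"
git checkout -q master
git merge -q --no-ff -m "merged feature into master" feature

# Hypothetical alias; add --global to make it available in every repository.
git config alias.lg "log --graph --decorate --oneline --all"

git lg
```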