Darwinweb

The Case for Git Rebase

April 25, 2010     

When you first sit down with git they tell you to watch out for rebase. “Git is fast. Git is great. Git gets merging right. If you screw up, git reflog has your back. But watch out for rebase; as long as you avoid rebase you’ll never get in too far over your head.”

Okay, I can see the wisdom in that.

In fact for the first year I avoided rebase almost entirely. I read Rebase Considered Harmful early on and it reaffirmed my choice. But as I came to understand git’s internals, and as new descriptions of rebase came online, the distinct feeling that I was missing something started creeping in.

Ironically, what pushed me to really grok rebase was the need to perform a surgical 3-way rebase for a series of commits that were drastically misapplied. Once I understood how to do that, rebase was laid bare to me. At its essence, rebase simply takes a series of commits as if they were individual patches and applies them at a different point in history. The confusion and opacity of rebase come in large part from the fact that the range of commits and the “base” commit are determined somewhat magically by the branch names that are passed to git-rebase.
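To take some of the magic out of that, here is a minimal sketch that names all three pieces explicitly: the new base, the old base, and the branch whose commits get replayed. It runs in a throwaway repository, and the branch name topic, the file names, and the identity are made up for illustration.

```shell
#!/bin/sh
set -eu

# Throwaway repository; every name below (files, branches, identity) is hypothetical.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example"
git config user.email "example@example.com"

echo base > base.txt
git add base.txt
git commit -q -m "base"

# Topic branch with two commits on top of the base commit.
git checkout -q -b topic
echo one > one.txt; git add one.txt; git commit -q -m "topic 1"
echo two > two.txt; git add two.txt; git commit -q -m "topic 2"

# Meanwhile master moves on.
git checkout -q master
echo more > more.txt; git add more.txt; git commit -q -m "master work"

# Spell out all three pieces: new base, old base (exclusive), branch to move.
old_base=$(git merge-base master topic)
git rebase -q --onto master "$old_base" topic

# The two topic commits now sit directly on top of master's tip.
git log --format=%s topic
```

The short form, git rebase master topic, should pick the same range here; the long form just makes visible which commits are treated as patches and where they land.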

Once I understood rebase, I started using it more, but I still held back from suggesting the use of rebase to the rest of my team; the clean history was nice but I didn’t see a compelling enough advantage to become an advocate for a rebase-based workflow.

Things have changed.

A private team workflow

Git was built to manage Linux kernel development, so it’s no surprise that discussion of rebase tends to be focused on open source workflows. In open source once you push a commit to a public repo, you don’t know who else has it, and rebasing public commits will lead to dangerous cascading effects. In a private repo it’s still a good rule of thumb not to rebase pushed commits, but with small teams you can bend the rules just a hair to optimize the history linearity.

The advice I’m proposing to my team is to:

  1. Always use git pull --rebase
  2. Rebase topic branches against master (or the current base branch) before pushing for the first time
  3. Rebase topic branches just before merging and deleting them (and let other people know the branch is officially dead so they don’t keep committing to their local copy)
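The three rules above can be sketched end to end in a scratch setup. This is only an illustration under assumptions: the bare repository origin.git, the clones alice and bob, the branch topic, and the commit helper are all hypothetical, and each commit touches its own file so nothing conflicts.

```shell
#!/bin/sh
set -eu

# Everything here (paths, branch and file names, identities) is hypothetical.
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.com"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.com"

work=$(mktemp -d)
cd "$work"

# A seed commit, a bare "origin", and two developer clones.
git init -q seed
cd seed
git symbolic-ref HEAD refs/heads/master
echo seed > seed.txt; git add seed.txt; git commit -q -m "seed"
cd ..
git clone -q --bare seed origin.git
git clone -q origin.git alice
git clone -q origin.git bob

# Small helper: commit a new file named after the message.
commit() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; }

# Bob pushes to master while Alice has unpushed local work.
cd bob;   commit bob-1; git push -q origin master; cd ..
cd alice; commit alice-1

# 1. Always pull with --rebase: local commits are replayed on top of origin.
git pull -q --rebase origin master
git push -q origin master

# 2. Rebase a topic branch against master before pushing it the first time.
git checkout -q -b topic
commit topic-1
cd ../bob
git pull -q --rebase origin master
commit bob-2
git push -q origin master
cd ../alice
git fetch -q origin
git rebase -q origin/master topic
git push -q origin topic

# 3. Rebase once more just before merging, then merge and delete the branch.
git fetch -q origin
git rebase -q origin/master topic
git checkout -q master
git pull -q --rebase origin master
git merge -q --ff-only topic
git push -q origin master
git push -q origin --delete topic
git branch -q -d topic

# Prints nothing: there are no merge commits in the resulting history.
git log --merges --format=%s master
```

Because the topic branch was rebased immediately before the merge, the merge can be a plain fast-forward, which is why --ff-only succeeds in this sketch.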

Why go through the trouble of all this rebasing? Won’t we be losing history? Well yes, rebase vs merge is always a tradeoff. For a long time I thought it was basically a wash: readability in exchange for precise history. However as I came to understand the tradeoffs things kept shifting towards rebase.

Readability

If you are following good agile practices and keeping your stories really short, ideally your topic branches are short and sweet and they all get created and merged within a day or two, right? In that case a few merge commits are not really a burden, and you can easily parse out the exact history of who did what, when. Right, but in the real world some branches end up sticking around longer for various reasons. It doesn’t take long to reach a threshold where the full history becomes unparseable by the human brain. Here is a recent example… this is with just 4 developers!

Once you reach that point you’ve lost the benefit of having full history, and all those merge commits are just useless noise. And it gets worse.

Bug Locality

All else being equal, the readability is not enough to tip me in favor of rebase. However when it comes to debugging, a linear rebased history is your friend as well.

Oftentimes the combination of topic branches results in a conflict. Merging seems simpler because you resolve all the conflicts at once. With rebase you have to fix each conflict as it occurs, commit by commit. However, each individual conflict is easier to resolve because the commit is (hopefully) more focused than the entire branch, so the resolution is done in the same context as the original commit. You think to yourself, “what was the purpose of this commit, and how should it be different given the wider changes that occurred on the base branch?”

With either workflow you have the possibility of bugs. Either because you flubbed the merge or, worse, because of some subtle interaction that you may not discover until much later. This is where bug locality comes in.

If you’re using git-bisect months after the fact, what commit will appear to have caused the bug? In the case of merge it’s going to be a huge commit combining two branches with many changes. This is fairly likely to be completely useless. However, if you have always rebased and have a perfectly linear history, you will always be able to trace it back to a single logical commit. This is the kind of thing that is hard to appreciate until you’ve actually seen git-bisect turn up useless a few times.
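As a rough sketch of the linear case, the script below builds a straight line of eight small commits in a scratch repository and lets git bisect find the one that introduced a marker line. The file app.txt, the BUG marker, and the grep-based check are stand-ins for a real test suite.

```shell
#!/bin/sh
set -eu

# Hypothetical identity and file names, used only for this illustration.
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.com"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.com"

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master

# Eight small commits in a straight line; the fifth one introduces the "bug".
for i in 1 2 3 4 5 6 7 8; do
  echo "change $i" >> app.txt
  if [ "$i" -eq 5 ]; then echo "BUG" >> app.txt; fi
  git add app.txt
  git commit -q -m "commit $i"
done

# The root commit is known to be good, the tip is known to be bad.
good=$(git rev-list --max-parents=0 HEAD)
git bisect start HEAD "$good" >/dev/null

# The check exits 0 (good) when the marker is absent and 1 (bad) when present.
git bisect run sh -c '! grep -q BUG app.txt' >/dev/null

# refs/bisect/bad points at the first bad commit once the run has finished.
culprit=$(git log -1 --format=%s refs/bisect/bad)
git bisect reset >/dev/null 2>&1
echo "first bad commit: $culprit"
```

In a history like this every candidate is one small, focused commit, so whatever bisect lands on is a change you can actually read.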

Okay, so I admit it’s not always feasible (or even worth it) to maintain a perfectly linear history. A few merge commits here and there aren’t going to hurt anybody. This is one reason I kept my rebasing to myself for a long time. However if you rebase at all then there is a third downside with merging.

Merging is Viral

If you merge all the time, you can find yourself in situations where you’d like to rebase but can’t for practical reasons. This isn’t really a weakness in git so much as the fact that rebasing is easier the closer it’s done to the actual commits. Rebasing your own commits à la git pull --rebase is more or less the same difficulty as merging (most of the time).

However if you go back to rebase a sequence that has a bunch of merge commits in it, git-rebase will not be able to make use of any conflict resolution done in those merges. This is because the individual commits are replayed one by one in temporal order, which means conflicts that were resolved in later merges have to be re-resolved piece by piece.

Consider rebasing a long-lived topic branch back to master:

On the left you have a topic branch worked on by two people who were regularly merging. On the right you see the master branch, which had its own line of development going on simultaneously. Now when it comes time to merge this branch down to master, you want to rebase and then delete it. The only problem is that as each commit is replayed, you hit every conflict that originally occurred and was resolved in those merge commits, except now these changes are potentially ancient history, and even if you were the one who originally did the merge you may not clearly remember the context of each individual commit.

If the two developers had done git pull --rebase every time, they would have resolved conflicts locally so that the later rebase to master would not have any old conflicts to resolve. In this case the conflicts were gnarly enough that rebasing was not practical. Once that happens then rebasing becomes impossible for any branch containing this sequence.

Of course eventually you expect to merge everything back to master and you get a clean slate, but the point is that little merges require bigger and bigger merges as a topic branch grows. Since you don’t necessarily know the life cycle of a topic branch when you start, keeping history clean is a smart hedge.

Conclusion

It took a couple of years of daily git use on private projects, but I’ve now come to believe that the benefits of a clean linear history outweigh the benefits of a perfect historical record. git-rebase also preserves the original author dates (only the committer dates change), so you can still infer a good deal about the original history. An original history may give me a clue about what a developer was thinking at the time, but this is not necessarily of greater benefit than knowing in a clear order what changes were applied to the codebase. In the end a more powerful git-bisect is the trump card that puts me firmly in the camp of rebase, at least for private projects.
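For the point about dates, here is a small sketch of what a rebase does to the two timestamps every commit carries. It assumes a scratch repository and a made-up 2010 date; the author date should survive the rebase while the committer date is reset to the time of the rebase.

```shell
#!/bin/sh
set -eu

# Hypothetical identity, branch, and file names, used only for illustration.
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.com"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.com"

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master

echo base > base.txt; git add base.txt; git commit -q -m "base"

# A topic commit stamped with an old date (both author and committer).
git checkout -q -b topic
echo old > old.txt; git add old.txt
GIT_AUTHOR_DATE="2010-04-01 12:00:00 +0000" \
GIT_COMMITTER_DATE="2010-04-01 12:00:00 +0000" \
  git commit -q -m "old topic work"

# Master moves on, then the topic commit is replayed on top of it.
git checkout -q master
echo new > new.txt; git add new.txt; git commit -q -m "master work"
git rebase -q master topic

# Author date is still the 2010 one; committer date is when the rebase ran.
git log -1 --date=short --format='author date:    %ad%ncommitter date: %cd' topic
```

Note that plain git log shows the author date by default, which is why a rebased history still reads in roughly the order the work was originally done.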

James Ferguson says…
December 1, 2010 at 8:53AM

Gabe,

You make some interesting points but I’d be interested to hear your take on gitguru’s approach:

http://gitguru.com/2009/02/03/rebase-v-merge-in-git/

Basically, he says rebase when bringing in changes from parent branches (and presumably origin), but merge when pushing changes back to parent branches.

Also, it’d really help to understand this whole debate if you could give an example of one of these ‘huge’ commits. Do you mean all the code to resolve merge conflicts?

James

Wade says…
December 26, 2011 at 7:11AM

I agree with James. My strategy is to rebase onto my feature branches, but merge when going from my feature back to a mainline branch. While the history is certainly not as linear, that’s not an issue since I want to see the merge commits so I can know the source for all the new code in my mainline branch.