April 16th, 2016

git commit messages

My current thoughts on commit messages.

First, we had change annotations, as descriptions of what changed:

fixed bug in display code

improved caching behavior for edge case

My first objection to these is that commits are not always past tense. In a world of CVS and Subversion they are, because reworking and recommitting things is far too much work. But this is git. Commits are not just a record of what we did; they are actual objects that we are going to talk about. They are proposals, and often they are speculative. git is an editor.

It doesn’t feel particularly natural to be more descriptive here because we’re basically adding labels to a timeline. If we do get descriptive here, it’ll be as sentence fragments awkwardly broken up into bullet lists at best, and talking more about what we did than why we did it. Let’s talk about them in the present tense:

fixes bug in display code

case where display list is null

improves caching behavior for edge case

sometimes we write the empty entry first

A step in the right direction. Those start looking like objects we are going to talk about. However, they don’t make a lot of sense without context. Commits come with only two pieces of context: their parent commit, and the tree state they refer to.

These messages assume context that later spelunking in the history will not necessarily find. “fixes bug” implies there was a bug to fix, but says little about it. We are still talking more about history than about what we changed. One has to compare the states before and after, and there’s not a lot of incentive in this format to continue and describe the bug. The context is assumed. In talking about these commits, we’d say things like “this commit deadbeef was the problem”. We don’t really refer to the commit so much as the state it brings, and even then only weakly, in the form of what’s different about that state from the previous one, not what it is.

We can describe a little more but we’re still describing what we’re doing and not the state of the world.

In a world where we may rebase commits, move them around and combine them, something a little more durable needs to happen. Let’s treat commit titles as names.

fix for bug in display code

a replacement handler for case with empty display list causing corruption
of the viewport

improvement in caching behavior for edge case

a check to skip writing empty entries in the cache, preventing the case
where empty entries would be returned instead of a cache miss.

Now the description we’ve left out starts feeling obvious. Now I want to know more about this bug, I want to know more about the fix, and I want to know about this improvement. These are nouns, and we have a lot of language for describing nouns.

These make sense even if rebased, and if we were to read the source code associated with this change, we would find that the message describes the code added and removed, not the change from some unknown previous state. We know almost everything about the contents of this commit without having to infer it from context, and discussing it as the actual code becomes much easier. Code reviews can be improved, and we can treat these commit hashes (or URLs) as objects and refer to them meaningfully later: “this improvement was very good”, or “this improvement introduced a bug”.

Now we have objects to talk about, and detail about the state that differentiates it from other states, even without being directly attached to the history. With the need for context reduced, we can now use these commit messages in new contexts without rewording them. We add some tags with some machine-readable semantics: tools like conventional-changelog-cli can generate change logs that summarize changes for a user, and semantic-release can bump version numbers in meaningful ways, depending on the changes being released. We’ve pushed that decision out to the edges of the system, where all the context for doing it right lives. The result:

fix: bug in display code

a replacement handler for case with empty display list causing corruption
of the viewport.

and

fix: improvement in caching behavior for edge case

a check to skip writing empty entries in the cache, preventing the case
where empty entries would be returned instead of a cache miss.

BREAKING CHANGE: empty cache entries are not saved so negative caching
must be handled in another layer.

And in changelog format:

v2.0.0 (2016-04-16)

fix: bug in display code 886a50c

fix: improvement in caching behavior for edge case 9bce4c5

BREAKING CHANGE

empty cache entries are not saved so negative caching must be handled in another layer.
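
To make the machine-readable part concrete, here is a minimal Python sketch of how tagged messages like the two above could drive a version bump. It is not how conventional-changelog-cli or semantic-release are actually implemented. The helper names (parse_commit, bump, next_version), the "feat" tag, the tag-to-bump mapping and the starting version 1.0.0 are all assumptions made for illustration.

```python
import re

# Hypothetical helpers for illustration; not the API of any real tool.
HEADER = re.compile(r"^(?P<type>\w+): (?P<subject>.+)$")
BREAKING = "BREAKING CHANGE:"


def parse_commit(message):
    """Split a tagged commit message into type, subject and breaking note."""
    lines = message.strip().splitlines()
    if not lines:
        return None
    match = HEADER.match(lines[0])
    if match is None:
        return None  # untagged commits are ignored by this sketch
    breaking = None
    body = "\n".join(lines[1:])
    if BREAKING in body:
        # Everything after the marker is treated as the breaking-change note.
        note = body.split(BREAKING, 1)[1]
        breaking = " ".join(note.split())
    return {
        "type": match["type"],
        "subject": match["subject"],
        "breaking": breaking,
    }


def bump(commits):
    """Assumed mapping: breaking -> major, feat -> minor, otherwise patch."""
    if any(c["breaking"] for c in commits):
        return "major"
    if any(c["type"] == "feat" for c in commits):
        return "minor"
    return "patch"


def next_version(version, level):
    """Apply a bump level to a MAJOR.MINOR.PATCH version string."""
    major, minor, patch = (int(part) for part in version.split("."))
    if level == "major":
        return f"{major + 1}.0.0"
    if level == "minor":
        return f"{major}.{minor + 1}.0"
    return f"{major}.{minor}.{patch + 1}"


messages = [
    "fix: bug in display code\n\n"
    "a replacement handler for case with empty display list causing corruption\n"
    "of the viewport.",
    "fix: improvement in caching behavior for edge case\n\n"
    "a check to skip writing empty entries in the cache, preventing the case\n"
    "where empty entries would be returned instead of a cache miss.\n\n"
    "BREAKING CHANGE: empty cache entries are not saved so negative caching\n"
    "must be handled in another layer.",
]

commits = [c for c in map(parse_commit, messages) if c]
# The previous version 1.0.0 is an assumption; the post only shows the result.
print(next_version("1.0.0", bump(commits)))  # prints 2.0.0
```

Nothing in the sketch needs to look at the diff or the surrounding history: the title and the footer carry everything the tool needs, which is the point of reducing the context a message depends on.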

This is super useful, but I think the context-reducing style of commit message is a good prerequisite for actually getting good change logs that make sense.

A side note. I think GitHub’s new squash and merge feature is going to be the perfect place for this style: individual commits are often not quite the right granularity for tagging. The style notes here apply otherwise, but I think tags are most useful on a merge-by-merge basis.

In the absence of squashing, a change to conventional-changelog that only looked at merge commits would be excellent, leaving the small state changes visible for code review, but the merges visible as external changes in the log.
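
The merge-only idea can be sketched in a few lines of Python. This is an illustration, not a patch to conventional-changelog: the function name merge_commits, the "|" separator and the sample hashes are made up. It assumes log text produced by something like git log --pretty=format:"%h|%p|%s", where %p lists the abbreviated parent hashes. git itself can do the same filtering with git log --merges.

```python
def merge_commits(log_text):
    """Keep only commits with two or more parents.

    Expects one commit per line in the form "hash|parents|subject",
    with the parent hashes separated by spaces.
    """
    merges = []
    for line in log_text.splitlines():
        if not line.strip():
            continue
        commit, parents, subject = line.split("|", 2)
        if len(parents.split()) >= 2:
            merges.append((commit, subject))
    return merges


# Made-up history: one merge, one ordinary commit, one root commit.
sample = """\
aaa1111|bbb2222 ccc3333|fix: bug in display code
ccc3333|bbb2222|a replacement handler for empty display list
bbb2222||initial state
"""

for commit, subject in merge_commits(sample):
    print(commit, subject)
```

Only the merge would reach the changelog; the smaller commit stays in the history for code review.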