A plan for better source code diffs and merging

I’ve been using Subversion for years, but a few months ago I was really starting to feel its limitations when it came to creating and merging branches easily. I’d heard that Git made this very easy indeed, and so I decided to try it.

Anyway, this isn’t yet another “I discovered Git and now I’ve achieved self-actualization” blog post, so to cut a long story short, I now use Git for everything (together with the excellent GitHub).

Even though merging is a lot better with Git than Subversion, I’ve still found myself getting into situations where it requires a lot of work to merge a branch back into another branch, and it got me thinking about better ways to do merging.

While I’m no merging expert, it seems that most merging algorithms work line by line, treating source code as nothing but a list of lines of text. That raises a question: what if the merging algorithm understood the structure of the source code it is trying to merge?

So the idea is this:

  1. Provide the merge algorithm with the grammar of the programming language, perhaps in the form of a Bison, Yacc, or JavaCC grammar file.

  2. The merge algorithm then uses this to parse the files to be diffed and/or merged into trees, and then the diff and merge are treated as operations on these trees. These operations may include creating, deleting, or moving nodes or branches, renaming nodes, etc. There has been quite a bit (pdf) of academic research on this topic, although I haven’t yet found off-the-shelf code that will do what we need. Still, it shouldn’t be terribly hard to implement.

The beauty of this approach is that the merge algorithm should be far less likely to be confused by formatting changes, and much more likely to correctly identify what has changed.
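To make the idea concrete, here is a minimal sketch in Python. It is not the tool described above: it uses Python’s built-in `ast` module as a stand-in for a parser generated from a grammar file, and the function names `structurally_equal` and `changed_functions` are made up for illustration. It only shows that comparing trees ignores layout and comments, and that a very coarse “diff” can be expressed in terms of tree nodes rather than lines.

```python
import ast


def structurally_equal(old_src: str, new_src: str) -> bool:
    """Compare two Python sources by syntax tree rather than line by line."""
    # ast.dump() leaves out line/column attributes by default, and the parser
    # discards comments and whitespace, so formatting changes do not register.
    return ast.dump(ast.parse(old_src)) == ast.dump(ast.parse(new_src))


def changed_functions(old_src: str, new_src: str):
    """Return (added, removed, modified) names of top-level functions."""
    def index(src):
        # Map each top-level function name to a dump of its subtree.
        return {
            node.name: ast.dump(node)
            for node in ast.parse(src).body
            if isinstance(node, ast.FunctionDef)
        }

    old, new = index(old_src), index(new_src)
    added = sorted(new.keys() - old.keys())
    removed = sorted(old.keys() - new.keys())
    modified = sorted(k for k in old.keys() & new.keys() if old[k] != new[k])
    return added, removed, modified


if __name__ == "__main__":
    original = "def f(x):\n    return x + 1\n"
    reformatted = "def f( x ):\n\n    return (x+1)  # reformatted\n"
    edited = "def f(x):\n    return x + 2\n\ndef g():\n    pass\n"
    print(structurally_equal(original, reformatted))
    print(changed_functions(original, edited))
```

A real implementation would need a proper tree-differencing algorithm that can detect moved and renamed nodes; this sketch only compares whole subtrees for equality.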

I can’t think of any reason that such a tool wouldn’t work in exactly the same way as existing diff/merge tools from the programmer’s perspective. The tool would automatically select the correct grammar based on the file extension, or fall back to line-based diffs if the extension is unrecognized (or the file isn’t valid for the selected grammar). Thus, it should be trivial to use this new tool with existing version control systems.
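The selection and fallback logic might look something like the sketch below. Everything in it is an assumption for illustration: the `PARSERS` registry, the `diff` function, and its return shape are hypothetical, and Python’s `ast` module again stands in for a grammar-driven parser. The line-based fallback uses the standard library’s `difflib`.

```python
import ast
import difflib
import os

# Hypothetical registry: file extension -> function that turns source text
# into a canonical, comparable form of its syntax tree.
PARSERS = {
    ".py": lambda src: ast.dump(ast.parse(src)),
}


def diff(path: str, old_src: str, new_src: str):
    """Return ("tree", changes) when a parser applies, else ("line", lines)."""
    parser = PARSERS.get(os.path.splitext(path)[1].lower())
    if parser is not None:
        try:
            old_tree, new_tree = parser(old_src), parser(new_src)
        except (SyntaxError, ValueError):
            pass  # file isn't valid for the selected grammar: fall through
        else:
            # Placeholder result; a real tool would list node-level operations.
            return ("tree", [] if old_tree == new_tree else ["structure differs"])
    # Unrecognized extension or unparseable file: ordinary line-based diff.
    lines = list(difflib.unified_diff(
        old_src.splitlines(), new_src.splitlines(), "old", "new", lineterm=""))
    return ("line", lines)
```

Because the fallback produces a normal unified diff, a wrapper like this could in principle be dropped in wherever a version control system lets you configure an external diff program.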

I’d love to have the time to implement this, although regrettably it is at the bottom of a very large pile of “some day” projects. Still, I think the idea is interesting enough, and would be immediately useful enough, that if I put it out there someone somewhere might be able to make it a reality.

Any takers? I’ve set up a Google Group for further discussion; please join if you’re interested.
