Tuesday Apr 08, 2008

Why I Don't Like Subversion

Jeff Atwood recently posted an article, "Setting up Subversion on Windows", on his Coding Horror blog. I'm not very interested in setting up Subversion on Windows, but I am interested in a statement Jeff made toward the end of the article, which sparked quite a discussion in the comments. Jeff wrote:

I find Subversion to be an excellent, modern source control system. Any minor deficiencies it has (and there are a few, to be clear) are more than made up by its ubiquity, relative simplicity, and robust community support. In the interests of equal time, however, I should mention that some influential developers -- most notably Linus Torvalds -- hate Subversion and view it as an actual evil.

Unfortunately, in his talk at Google, Linus didn't really explain why he thought Subversion is evil. He said that CVS is the "devil" and, since svn is CVS done right, it's "pointless". So I guess it's safe to say that he believes svn is evil. (He further said that anyone who disagrees with him is "stupid and ugly" but that's just his rhetorical style, if you can call it that.)

Subversion was designed to replace CVS, and I think it does that quite well. While I don't think svn is evil, I do think it has some major deficiencies and some characteristics that make it unnecessarily complex, hard to use, and error-prone. So, while I can't speak for Linus, I think I know what he means when he criticizes svn. I won't go as far as Linus and say that using svn is worse than using no version control system (VCS) at all. If I had nothing else, I'd probably use svn. Fortunately, I do have something else: I currently use Mercurial. This article isn't about Mercurial, though; it's about svn.

Here's a catalog of what I think are Subversion's main problems, and why I'm much happier using Mercurial than I was when I was using Subversion.

1) Centralized vs. Distributed

There has been a lot of discussion about centralized vs. distributed version control systems (DVCS) so I won't repeat it here. Probably the best overview of DVCS is a paper [pdf] that Ollivier Robert presented at EuroBSDCon 2005.

Bill de hÓra also wrote about the benefits of DVCS in a response to Jeff's article.

For me, the biggest advantage of DVCS is that changes can be propagated among repos without passing through a central repository, while preserving changeset history. This creates more ways for developers to collaborate.

I should also clarify that the word "distributed" in this context doesn't imply "distributed" over a network or a "distributed" development team (i.e. a geographically dispersed team). A more descriptive term is "decentralized". Subversion has a network protocol but it isn't a DVCS. A DVCS is still useful for a team that works in the same office every day.

2) Safety During Merges

In svn, if you're committing to the trunk, you're required to merge and resolve conflicts before you can commit. The problem is that your changes aren't stored anywhere, so if you screw up during the merge, you might lose your changes.

A more complicated scenario is that you and a colleague might decide to collaborate on a feature. How do you combine your work? You could get your colleague to mail you a patch file, which you'd apply to your working copy. This is essentially doing a merge operation by hand, without the support of the VCS. If the patch file doesn't apply cleanly, you now have a working copy with your changes (possibly modified) and part of your colleague's changes, and you have to merge the rejected hunks by hand. Or worse, the patch might apply cleanly but some part of your code that used to work is now broken. Now you have to figure out what changed, without the benefit of your or your colleague's original versions.

How can you deal with problems like these? Well, you could undo the patch with patch -R. Or, you could take a snapshot of your entire working copy before starting the merge. You could also snapshot your diffs with svn diff. To roll back to your original version, you could svn revert your working copy and then apply the patch file you had generated with svn diff. Or, you could check out another working copy, apply your colleague's patches, and then cherry-pick them into your working copy. This is all doable, but you have to remember to do it, and it's a lot of manual work for you to do without the support of the VCS.

It's possible to do this in svn if you're willing to create new branches for individual developers. For the first scenario, you'd use svn copy to create a branch of the particular rev of the trunk you started with, then use svn switch to switch your working copy over to this new branch. Next, you'd commit your changes there, and then merge this branch onto the trunk. For the second scenario you'd create two branches, one for each developer. Each developer would switch his or her working copy to the respective branch, then merge one branch into the other (or onto a third branch), and finally merge the result onto the trunk.
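As a rough sketch of the first scenario, the command sequence might look like the plan below. It only builds the argument lists and runs nothing. The URLs, the revision numbers, and the function name private_branch_plan are assumptions made up for illustration; branch_rev stands for the revision in which the svn copy was committed, which you'd have to look up yourself (for example with svn log --stop-on-copy).

```python
import shlex


def private_branch_plan(trunk_url, branch_url, base_rev, branch_rev):
    """Return the svn commands (as argv lists) for a private-branch workflow.

    Nothing is executed. base_rev is the trunk revision the working copy
    started from; branch_rev is the revision created by the svn copy.
    Both are supplied by the caller; this sketch does not discover them.
    """
    return [
        # Branch the exact trunk revision the working copy is based on.
        ["svn", "copy", "-r", str(base_rev), trunk_url, branch_url,
         "-m", "Create private branch"],
        # Point the working copy at the new branch.
        ["svn", "switch", branch_url],
        # Commit the local changes on the branch so they are stored safely.
        ["svn", "commit", "-m", "Save work on private branch"],
        # Move the working copy back to the trunk.
        ["svn", "switch", trunk_url],
        # Merge everything committed on the branch since it was created.
        ["svn", "merge", "-r", f"{branch_rev}:HEAD", branch_url],
        # After resolving any conflicts by hand, commit the merged result.
        ["svn", "commit", "-m", "Merge private branch onto trunk"],
    ]


if __name__ == "__main__":
    # Hypothetical repository layout and revision numbers.
    plan = private_branch_plan(
        "http://svn.example.com/repo/trunk",
        "http://svn.example.com/repo/branches/my-work",
        base_rev=1200,
        branch_rev=1234,
    )
    for argv in plan:
        print(shlex.join(argv))
```

Even in this simplified form, that's six commands and two revision numbers to get right for what ought to be a routine "save my work, then merge" operation.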

While it's certainly possible to use this technique in svn, I don't know if anybody actually does. Maybe some expert users do. Even though branching is very lightweight in the repository, it seems to carry a pretty heavy conceptual overhead. Many developers I've talked to consider branching and merging to be a big deal. They also consider svn switch to be deep voodoo. When I was using svn, I didn't use branches, and I did my merges directly in my working copy. Personally I found that this added a lot of stress to the merging process.

What you really want is to be able to commit changesets and be able to pull in new changes without fear and to pass changes around and merge them at will. At any time you should be able to look at your original changeset, or your colleague's, and use this information to assist with the merge. Furthermore, if bugs managed to creep into the merged result, you should always be able to go back into the history and look at one or the other changeset as they existed before the merge. (You can do all of these with Mercurial.)

3) History of Merges

Suppose you're working on a branch in svn, and now it's time to merge your changes back to the trunk. You have to merge the right range of revisions from your branch onto the trunk. It's fairly easy to find out the starting rev of your branch by using svn log --stop-on-copy.

If you don't specify the right revs, it's possible to miss some of the changes you made on the branch. If you fail to specify any revs at all, you'll merge your changes onto the trunk but undo changes that were made on the trunk after you branched. There's no warning when you do this, so you have to inspect the merged result carefully to ensure that it's correct.

If you're working on a branch for a long time, you might want to make sure that you don't diverge too far from the trunk. So, you merge over the changes from the trunk that have occurred since you created the branch. If your branch is long-lived you might want to do this a second time. When you do, you have to specify the range of revs to merge, starting from your previous merge instead of from the beginning of the branch. If you specify the beginning of the branch, you'll end up merging changes from the trunk that are already present, which will probably result in conflicts. Similarly, when you merge back to the trunk, you have to specify the rev range starting from when you last merged. If you don't, the merge will be wrong.

The point here is that svn requires you to keep track of revs at which you did merges and to specify them correctly in the merge command. It's also wise to inspect the merge results very carefully, since svn will silently create incorrect merges if you botch the merge command. A VCS should really keep track of what changesets you've made and which you've merged already instead of making you do this work. (Mercurial does this.)
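To make the bookkeeping concrete, here is a minimal sketch of the record you end up keeping by hand when you repeatedly merge the trunk into a long-lived branch. The class MergeLedger, the function merge_command, the URL, and the revision numbers are all hypothetical; the only real piece is the svn merge -r N:M form of the command, and nothing here actually runs svn.

```python
from typing import List, Optional, Tuple


class MergeLedger:
    """Hand-kept record of how far the trunk has been merged into a branch."""

    def __init__(self, branch_start: int) -> None:
        self.branch_start = branch_start        # revision where the branch began
        self.last_merged: Optional[int] = None  # end revision of the previous merge

    def next_range(self, head: int) -> Tuple[int, int]:
        """Return the (start, end) revisions to pass to the next merge."""
        # The first merge starts at the branch point; later merges must
        # start where the previous merge ended.
        start = self.branch_start if self.last_merged is None else self.last_merged
        if head <= start:
            raise ValueError("no new revisions to merge")
        return start, head

    def record(self, head: int) -> None:
        """Remember that everything up to `head` has now been merged."""
        self.last_merged = head


def merge_command(ledger: MergeLedger, head: int, source_url: str) -> List[str]:
    """Build (but do not run) the svn merge command for the next range."""
    start, end = ledger.next_range(head)
    return ["svn", "merge", "-r", f"{start}:{end}", source_url]


if __name__ == "__main__":
    trunk = "http://svn.example.com/repo/trunk"  # hypothetical URL
    ledger = MergeLedger(branch_start=100)
    print(merge_command(ledger, 150, trunk))  # first sync: from the branch point
    ledger.record(150)
    print(merge_command(ledger, 200, trunk))  # second sync: from the last merge
```

If you forget to call record, or lose the ledger, the next command quietly starts from the branch point again, which is exactly the kind of mistake described above.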

I believe there is an svn extension that stores this information in properties. But this isn't part of core svn; you have to add and configure this yourself. I hear that a core merge-tracking feature is in the works for a future release of svn, but it's not in any released version as of this writing.

These two issues are probably what Linus is talking about when he says that merging is more important than branching. Merging in svn requires you to do your own bookkeeping, it's possible to lose uncommitted changes unless you do extra work, and it's easy to create mismerges. It's no wonder that people consider branching to be conceptually heavyweight. Creating new branches is easy; it's merging them back in that's the problem.

4) Namespace of Branches

I think branching is a pretty hard concept to begin with, independent of which VCS you're using. Indeed, at the part where Atwood describes setting up the branches and tags directories, he says "none of this means your developers will actually understand branching and merging" and refers to his previous article on branching.

In svn, a branch is a lightweight copy of a subtree from one location in the hierarchy to another. (A tag is just a special case of a branch that isn't intended to be modified. This discussion uses "branching" to refer to both branching and tagging.) Indeed, svn has no direct support for branching: it's just copying. The svn book puts it thus:

The Key Concepts Behind Branches

There are two important lessons that you should remember from this section. First, Subversion has no internal concept of a branch -- it only knows how to make copies. When you copy a directory, the resulting directory is only a "branch" because you attach that meaning to it. You may think of the directory differently, or treat it differently, but to Subversion it's just an ordinary directory that happens to carry some extra historical information. Second, because of this copy mechanism, Subversion's branches exist as normal filesystem directories in the repository. This is different from other version control systems, where branches are typically defined by adding extra-dimensional "labels" to collections of files.

I claim that having svn branches reside in the same namespace as directories of files is actually a misfeature which adds the potential for confusion to the already complex notion of branching.

In a filesystem hierarchy, directories (folders) are used for grouping related files and subdirectories. In svn, directories are also used for branching. The fact that svn treats them all the same is of no help to the user. In fact, you must treat them differently, otherwise things will get totally screwed up.

For example, I suspect that every novice svn user makes the same mistake -- exactly once -- of checking out the root of an svn repo. As evidence of this, the top of the repo browser for Subversion itself contains the following:

NOTE: Chances are pretty good that you don't actually want to checkout this directory. If you're looking for this project's primary branch of development, navigate instead into its trunk/ subdirectory, and follow the checkout instructions there.

So, how do you know whether a directory is a group of related files or a branch? The answer is, you don't; you just have to know. (Well, you can try to find out by running svn log --stop-on-copy but you still have to make some inferences.) There is a pretty strong convention in svn of having a TTB (trunk, tags, branches) structure at some level in the hierarchy. This is a pretty clear indication that these directories are branches (copies) instead of containers of related files. But if you have a repository that doesn't use the TTB structure, or has it in an unconventional location, both people and tools can become quite confused.

For example, I used to work on the phoneME project, which has its TTB structure replicated on a per-component basis a couple levels down the hierarchy. In addition, it has "super-tag" structures named /builds and /releases, which contain copies of components' tags. When we aimed the FishEye repository monitoring tool at the phoneME repository, it buried the svn server, and it reported that there were over 200,000,000 lines of code in the repository! (There are closer to 2,000,000 lines.) The reason was, of course, that FishEye was indexing all the branches, tags, builds, and releases directories as if they were independent files instead of branches (copies). A simple configuration change fixed the problem. However, the point is that FishEye couldn't tell the difference between a directory of files and a branch; it had to be told the difference.
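Here is a minimal sketch of the kind of guesswork a tool has to do. It is not how FishEye works, and the paths are made up rather than taken from the real phoneME layout; the function classify and its extra_copy_roots parameter are hypothetical. It infers what a path is purely from the TTB naming convention, and anything outside that convention has to be configured by hand.

```python
def classify(path, extra_copy_roots=()):
    """Guess what a repository path is, using only naming conventions.

    Returns "trunk", "branch", "tag", "copy" (for directories the caller
    has declared to hold copies), or "unknown".
    """
    parts = [p for p in path.split("/") if p]
    for i, part in enumerate(parts):
        if part in extra_copy_roots:
            return "copy"
        if part == "trunk":
            return "trunk"
        # "branches" and "tags" only mean something if a name follows them.
        if part in ("branches", "tags") and i + 1 < len(parts):
            return "branch" if part == "branches" else "tag"
    return "unknown"


if __name__ == "__main__":
    print(classify("/components/cldc/trunk/src/Foo.java"))
    print(classify("/components/cldc/tags/r1/src/Foo.java"))
    # Without being told, the tool can't know this is a copy of a tag.
    print(classify("/builds/b01/cldc/src/Foo.java"))
    print(classify("/builds/b01/cldc/src/Foo.java",
                   extra_copy_roots=("builds", "releases")))
```

Nothing in the repository itself says that a builds directory holds copies; that knowledge lives only in the extra_copy_roots argument, which is to say, in somebody's head.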

The root cause of these problems is that svn is using the same mechanism -- a directory in a hierarchical namespace -- to mean two different things. This is a clear violation of the rule that similar things should appear similar and different things should appear different. Making different things use the same mechanism might seem like an elegant implementation, but it adds numerous opportunities for confusion and error.

5) Heterogeneous Branches and Working Copies

There is also a convention that branches and tags are full copies of the trunk. This way, you can switch a working copy among branches, tags, and the trunk. However, it's possible to create a branch as a copy of a subtree of the trunk (or in fact of anything else). Unless you know that this was done for a particular branch, it's possible to get into a very confusing state. For example, if branches/b1 is a copy of the trunk, you can switch and merge between the trunk and branches/b1 with no problem. But if branches/x is some arbitrary subtree, switching or merging between the trunk and branches/x will do something entirely different. If you switch and you have uncommitted changes, they won't merge into the new branch, but they won't be deleted either; they'll be stranded in a working copy that's partially on the branch and partially on the trunk.

Speaking of which, the ability to have a working copy with different subdirectories at different URLs is very strange. I'm sure some expert svn users have some use for this, but to me it seems like a lot of rope for users to hang themselves with.

6) Random Merge Issues

This one is admittedly a bit nebulous.

We once ran into a case where one developer's commit undid another developer's changes. It was not a case of somebody simply botching a three-way merge. The developer merged changes from a branch to the trunk, and very carefully specified the correct revs to merge (as described above). Yet it reversed some changes that had been made on the trunk. I rechecked his merge commands and as far as I could tell they were correct. I was even able to replicate the phenomenon on a private branch. However, none of us were ever able to figure out why it happened. There was a lot of renaming going on at the time, so it's possible that merging of renames caused the problem.

We were able to recover because the "lost" changes were still present in the history, so we generated patches from the history and applied them by hand. Still, one expects the VCS to handle these cases and not force you to do things by hand. When something like this happens it really reduces one's confidence in the system.

The history might still be visible in a publicly accessible repository. If anybody is interested in investigating this, let me know and I can track down the details.

Summary

Even though Subversion seems to be "CVS done right", I find that it has some glaring deficiencies. It also embodies some fundamental design choices that make it harder to use and understand and that increase opportunities for errors. For these reasons, I've never been happy with svn. In contrast, I've been much more comfortable and productive with a DVCS such as Mercurial.
