Git: yet another SCM system or a revolutionary model of software development?

I have recently come across the video recording of Linus Torvalds himself arguing the case for Git as the ultimate source code management system. The presentation is done in the usual trademark Linus style (which I happen to like immensely, since it seems to be the only way to wake the audience up enough to engage it in a conversation), but the issues he tackles are very pointed and go far beyond the merits of a particular SCM system into the question of what the best tool is for automating a highly decentralized, peer-reviewed software development model:
  1. Distributed vs. centralized development.
    Bottom line: centralized SCMs run against the proverbial Bazaar.
  2. Using networks of trust as the key means of conquering the complexity of modern software projects and dealing with the compartmentalization of key developers/experts.
    Bottom line: SCM should reflect how humans are wired.
  3. How to keep the pace of development activity at the highest possible level (by making it cheap and easy to experiment without caring about breaking other stuff), yet allow the easiest transition path for the changes that seem to be beneficial for the project.
    Bottom line: branch early, branch often.
  4. Developers vs. gatekeepers, and why the problems they face are fundamentally different.
    Bottom line: even the best branching is useless without merging.
I find his arguments about why CVS is the most braindead SCM ever invented, and why Subversion simply denies itself the right to exist by proclaiming that it is "CVS done right", quite convincing. Especially so when they come from a guy who has a project of ~22,000 files to maintain and does about 25 merges per day. His main pain points with CVS/Subversion hit very close to home:
  1. There is no data coherency model to write home about. That might not be a problem for most projects, but it certainly is for something as security-sensitive as an OS kernel. Basically, with CVS (and I think even Subversion) the only way you know that your data got corrupted is when it's too late.
  2. Branching is waaay too painful because of things like a global namespace for branches (read: constant collisions and things like test_12345 branches), all branches being pushed down every developer's throat, etc.
  3. Subversion patches over some of the branching problems, but fails miserably as far as merging is concerned.
  4. The tools you use are supposed to make you more productive. Period.
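The data coherency point can be made concrete with a small sketch of the idea Git relies on instead. Each history entry is identified by a hash of its own content plus its parent's hash. Flipping a single character anywhere in the past therefore changes every identifier after it. The record format (just `parent` and a message) and the function names are hypothetical simplifications. Real Git commit objects also hash a tree ID plus author and committer lines.

```python
import hashlib

def commit_id(parent_id: str, message: str) -> str:
    # Hypothetical, simplified commit record: parent hash + message.
    # Real Git commits carry more fields, but the chaining idea is the same.
    record = f"parent {parent_id}\n\n{message}".encode("utf-8")
    return hashlib.sha1(record).hexdigest()

def chain_head(messages):
    # Build a linear history and return the ID of the newest entry.
    head = "0" * 40  # stand-in for "no parent" in this sketch
    for msg in messages:
        head = commit_id(head, msg)
    return head

history = ["initial import", "fix scheduler race", "add driver"]
tampered = ["initial import", "fix scheduler rac3", "add driver"]

# One changed character deep in history yields a different head ID,
# so comparing a single 40-character hash is enough to notice the damage.
print(chain_head(history) == chain_head(tampered))  # False
```

Because the head ID covers the whole history, two people who agree on one hash also agree on everything behind it.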
Now, I must admit that as successful as he was at positioning Git as the best tool for the job, I am still not convinced on two counts:
  1. Does Git really offer a nice way of structuring complex projects like KDE and such? His suggestion of super projects (workspaces with pointers to individual Git repositories) might be an interesting one, but it certainly requires some practice and experience to be evaluated properly. [2008 Update: It seems that this is no longer an issue. Git got the infrastructure for supporting submodules, and from what is documented on their TWiki it looks like they got it right.]
  2. Is the approach Git takes, making it easier to work with projects as a whole at the expense of treating projects as collections of files, the right one? See, the problem is that I'm much more of a lone developer than a gatekeeper. So Git might be optimizing for a role I rarely find myself in (and Linus finds himself in all the time).
But regardless of these concerns, I highly recommend you watch the presentation yourself; it is well worth it. Just keep one thing in mind: just before BitKeeper (something Linus seems to have very fond memories of), Larry used to develop this little project called TeamWare here at Sun. What was TeamWare? Well, it was "a distributed source code revision control system... which BitKeeper seems to share a number of design concepts with".

I have fond memories of TeamWare. My initial feel of bk (not that I played much with it) was certainly that of a network-aware TeamWare offspring (tw pretty much assumed you used NFS to reach workspaces).

Have you looked at monotone, btw? I keep hearing it's pretty damn good, but I just don't have the time, or something to actually apply it to in a non-toy fashion.

Posted by uwe on June 20, 2007 at 09:40 PM PDT #

Unfortunately, no -- I haven't had a chance to look at Monotone. Do you know any existing projects using it as an SCM system?

Posted by Roman Shaposhnik on June 21, 2007 at 01:27 PM PDT #

I'm told that is probably the biggest project that uses monotone.

Posted by uwe on June 24, 2007 at 10:10 AM PDT #

Nice writeup. I liked that presentation as well, and after giving Git a go, I was impressed with how simple it was to make a repository publicly available. Of course I had to blog this new Git-ness. Distributed SCM is a good thing.

Posted by David Sterry on June 24, 2007 at 06:23 PM PDT #

As for monotone, gaim (erm... Pidgin) moved to monotone. It's not the largest, but probably the highest-profile project using it. Monotone itself has been self-hosted since 2003 or so (though lots of things changed underneath in the meantime).

Git can actually be considered a "monotone lite": it copied some of the concepts (content addressing, full repo copies/distributedness, tree snapshots), but not others, which makes Git quite a bit faster.
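Content addressing means an object's name is derived from its bytes. Git names a file's contents by the SHA-1 of a short header (`blob`, a space, the size in bytes, then a NUL byte) followed by the raw data. This is the same ID `git hash-object` reports. The in-memory `store` and the helpers `write_object`/`read_verified` below are hypothetical. They are a minimal sketch of how such a store notices corruption on read.

```python
import hashlib

def git_blob_id(data: bytes) -> str:
    # Git hashes "blob <size>\0" followed by the raw content.
    header = b"blob " + str(len(data)).encode("ascii") + b"\0"
    return hashlib.sha1(header + data).hexdigest()

# Hypothetical in-memory object store keyed by content hash.
store = {}

def write_object(data: bytes) -> str:
    oid = git_blob_id(data)
    store[oid] = data
    return oid

def read_verified(oid: str) -> bytes:
    # Re-hash on read: if the stored bytes were corrupted,
    # they no longer match the name they are filed under.
    data = store[oid]
    if git_blob_id(data) != oid:
        raise ValueError(f"object {oid} is corrupt")
    return data

print(git_blob_id(b""))  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
```

If `store[oid]` is overwritten with different bytes, `read_verified(oid)` raises instead of silently returning bad data.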

Among the features not ported are:

  • crypto signatures for everything: every change is signed using a monotone-specific RSA key. That slows things down, but not as much as GnuPG integration would. It also ensures that a change with your name in the author field actually comes from you (though monotone still lacks some of the mechanisms to make this truly useful)
  • proper renames: git guesstimates them. Monotone tried that for a while, then moved to full support because it's _really_ hard to do. But in Linux renames seem to be rare, so they probably simply don't need it
  • changesets: monotone tried tree snapshots in the beginning, found some things (e.g. renames) quite hard, and moved to a hybrid model (changesets are used to link revisions; manifests/snapshots still exist internally)
  • multi-mark-merge algorithm: an algorithm devised by monotone and codeville (another SCM) developers that handles lots of corner cases (repeated merge, ...) that earlier algorithms couldn't cope with
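The "guesstimate" approach to renames can be illustrated with a minimal sketch. When one path disappears and another appears in the same change, compare their contents and call the pair a rename if they are similar enough. The use of Python's `difflib.SequenceMatcher`, the 0.5 threshold and the `guess_renames` name are all assumptions for illustration. Git's own similarity scoring works differently in detail.

```python
from difflib import SequenceMatcher

def guess_renames(deleted, added, threshold=0.5):
    """Pair each deleted path with the most similar added path.

    `deleted` and `added` map path -> file contents (str).
    Returns {old_path: new_path} for pairs whose similarity ratio
    is at least `threshold`.
    """
    renames = {}
    unclaimed = dict(added)
    for old_path, old_text in deleted.items():
        best_path, best_score = None, threshold
        for new_path, new_text in unclaimed.items():
            score = SequenceMatcher(None, old_text, new_text).ratio()
            if score >= best_score:
                best_path, best_score = new_path, score
        if best_path is not None:
            renames[old_path] = best_path
            del unclaimed[best_path]  # each added file can be claimed once
    return renames

deleted = {"drivers/foo.c": "int foo(void) { return 1; }\n"}
added = {"drivers/bar.c": "int foo(void) { return 2; }\n", "README": "hello\n"}
print(guess_renames(deleted, added))  # {'drivers/foo.c': 'drivers/bar.c'}
```

The weakness the comment alludes to is visible here. A file that is moved and heavily edited in the same change falls below the threshold and shows up as an unrelated delete plus add. Explicit rename tracking avoids that guess entirely.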

Posted by Patrick Georgi on July 26, 2007 at 12:27 AM PDT #

To: Patrick Georgi

Thanks for a nice comparison! I'm now much more interested in trying out monotone.

Posted by Roman Shaposhnik on July 26, 2007 at 03:04 AM PDT #

Linus certainly is a pretty abrasive personality. He's just about funny enough to get away with it in a setting like that, but I'm not sure I'd want to work with him for any length of time. Being the inventor of Linux has basically made him the guy with the microphone, and it's very hard to have a debate on the merits with somebody who is used to being in that position. I do wonder how he'd fare in a more egalitarian corporate environment where he wouldn't necessarily be the guy with the microphone.

As for Git, it was clearly designed for the problems he has to solve as the owner, but not the code maintainer, of the Linux trunk.

Posted by Andrew on August 08, 2007 at 08:17 AM PDT #
