2008-07-23 00:04:25

Subversion — tales from a failed migration

Today our new programmer complained very loudly about that shitty CVS never showing him all the information he was looking for. He asserted that the best alternative was to use Subversion. Since he preferred not to work with CVS, we decided that instead of adding another animal to our version control zoo (currently featuring git as a legacy from older projects, CVS for revision keeping and Bazaar for projects with release versions and support periods), we would try to eliminate one.

Little did we know that this would have to be Subversion itself.

Data migration

First we migrated various CVS repositories to Subversion. The only automated way to do this was cvs2svn (apart from the apparently-defunct Tailor) so we went for it. As it turned out, cvs2svn can only convert one project at a time, not an entire repository, so we ended up with different repositories for source, docs and other things for the same project.

It also turned out that cvs2svn could not convert all of our projects, because some had files which were added, later removed and then re-added. This caused spontaneous explosions of the poor little tool, leaving our repository entirely unconverted. So we tried to migrate at least a subset of our repositories, ending up with a big bunch of partial-project repositories.

Commit mails

The first, and supposedly rather simple, hook to add was the one for commit mails. Debian offered a package for this, called svnmailer. This was basically the svn-mailer script from the Subversion site, which is Python code and has a bunch of dependencies, so we were glad we had the Debian package.

However, the script of course doesn't provide automatic activation for the various Subversion repositories cvs2svn created, so we had to add the hook and config files one by one (it seems that it also doesn't offer a sophisticated way to extract the repository name, so we couldn't use a generic mailer.conf).

After some debugging we got a mailer.conf to work which would send mail correctly, so we created a test repository and let it send some test mails via the commit hook. This turned out badly. Despite the fact that we had specified --debug in the commit hook, the script did not output anything and silently failed to send mail. Thus, we prefixed the entry in the post-commit hook with “strace -o /tmp/strace.out” (attention, symlink vulnerability!) and then suddenly it worked.
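
The symlink problem noted above comes from writing to a fixed, guessable name in /tmp. A minimal sketch of the safer pattern, in Python and purely for illustration (the function name and the file prefix are made up here and are not part of svnmailer or of our hook): let the system pick an unpredictable name and create the file exclusively.

```python
import os
import tempfile

def open_trace_file(prefix="strace-"):
    """Create a fresh, private trace file and return its path.

    mkstemp() creates the file exclusively, with a random name and
    mode 0600, so a symlink planted in advance under a guessed name
    is never followed.
    """
    fd, path = tempfile.mkstemp(prefix=prefix, suffix=".out")
    os.close(fd)  # a tool such as strace -o would reopen it by path
    return path
```

The returned path could then be handed to the tracing command instead of the hard-coded /tmp/strace.out.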

We then tried random stuff and found that not invoking sendmail directly but instead using the builtin SMTP client of the script would help, so we left it at that. A couple of hundred config files later, all repositories sent out commit mails to the relevant changes mailing lists.

However, it quickly turned out that the mails were, for some reason, double encoded to quoted-printable. Every “=” was replaced with a “=3D3D”, leaving an extraneous 3D in the output. Setting the config file parameter to “8bit” yielded singly encoded quoted-printable. Whatever, it was readable.
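
The effect is easy to reproduce outside of the mailer. A small sketch using Python's standard quopri module (this only illustrates the symptom, it does not show where in the mail path the second encoding happened):

```python
import quopri

body = b"a=b"

# Encoding once turns "=" into "=3D".
once = quopri.encodestring(body)

# Encoding the already encoded text again escapes the new "=" as well,
# which yields "=3D3D".
twice = quopri.encodestring(once)

# A mail reader decodes only once, so a stray "3D" stays visible.
decoded = quopri.decodestring(twice)
```

Here once is b"a=3Db", twice is b"a=3D3Db", and decoded is b"a=3Db" again, which is what ended up in our inboxes.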

Web interface

The next step was the web interface. Everyone else, including our unfinished in-house Open Source platform, used ViewVC, so we wanted to go for it too. However, it turned out that Debian did not ship ViewVC in the stable distribution, nor in a backport.

So we installed WebSVN, which I had only recently patched in a pkgsrc-security hackathon, and kept our fingers crossed. It looked terrible at first glance, required PHP to work and came with several charset problems (some of the umlauts on the site were ISO-8859-1 while some others were UTF-8). Additionally, it required a flat hierarchy of repositories in order to auto-discover them, so we had to put all repositories directly into /home/svn.

Then we went on to integrate the web interface into the commit mails as we were used to from CVS and Bazaar. After modifying a hundred repositories' mailer.conf files, it finally worked as well.

Permissions

Then we started to notice that the repository permissions were changed on every commit. We tried to set the SGID bit on the repositories as we were used to from CVS, but that didn't work; Subversion insisted on resetting the group ownership every time. As a workaround, we put a chgrp -R cvs $REPO into the post-commit hook, but this of course failed for everyone who was not the owner of the files, so we replaced it with chgrp -Rf cvs $REPO, which fails silently instead.
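
What the chgrp -Rf workaround amounts to can be sketched in a few lines of Python. This is an illustration only, not our actual hook; the function name is made up, and unlike the -f flag it at least returns the paths it could not change, so that they could be logged.

```python
import os

def chgrp_quiet(root, gid):
    """Recursively set the group of everything under root.

    Failures (typically files owned by somebody else) are not raised
    but collected and returned.
    """
    paths = [root]
    for dirpath, dirnames, filenames in os.walk(root):
        paths.extend(os.path.join(dirpath, n) for n in dirnames + filenames)

    failed = []
    for path in paths:
        try:
            # -1 leaves the owner untouched; symlinks are not followed.
            os.chown(path, -1, gid, follow_symlinks=False)
        except OSError:
            failed.append(path)
    return failed
```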

RT and Bugzilla integration

At this point things started to go downhill. We also had to integrate our RT and Bugzilla scripts with Subversion. These scripts were meant to attach short versions of the commit mails (without diffs but with URLs and commands used to create diffs) to tickets and bugs mentioned in the commit messages.

This, however, required us to retrieve the commit message and all changes manually, using the svn commands. A lot of wild svn commands. Before we finished this, though, another problem got in our way.

Building full revision patches, at least, was rather easy; the CVS commit ID would simply have to be replaced with the Subversion revision ID.
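
To give an idea of what collecting the commit message and the changed paths involves, here is a rough Python sketch built on the svnlook subcommands log and changed. It is not our integration script; the helper names and the shape of the returned dictionary are invented for this example.

```python
import subprocess

def svnlook(subcommand, repo, rev):
    """Run one svnlook subcommand for a revision and return its output."""
    argv = ["svnlook", subcommand, "-r", str(rev), repo]
    result = subprocess.run(argv, check=True, capture_output=True, text=True)
    return result.stdout

def parse_changed(output):
    """Split `svnlook changed` output into (status, path) pairs.

    Each line has two status columns, two spaces, then the path.
    """
    changes = []
    for line in output.splitlines():
        if len(line) > 4:
            changes.append((line[:2].strip(), line[4:]))
    return changes

def commit_summary(repo, rev):
    """Gather what a ticket comment needs: log message and changed paths."""
    return {
        "revision": rev,
        "log": svnlook("log", repo, rev).strip(),
        "changes": parse_changed(svnlook("changed", repo, rev)),
    }
```

Finding the ticket and bug numbers in the log message is a separate step and depends on the conventions used in the commit messages.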

User defined keywords

We also noticed that our merge scripts weren't working properly anymore across repositories. Of course they weren't: the custom keywords which we used for the repositories were no longer defined. They basically work like this:

  • Project A keeps its code in RepoA, using the keyword $RepoA$, which is expanded locally like $Id$. The file ciss_pci.c is at revision 1.52 and contains the keyword:
    $RepoA: src/sys/dev/pci/ciss_pci.c,v 1.52 2008-05-29 18:09:31 tonnerre Exp $
  • Project B imports the file into their code base, which is stored in RepoB with keyword $RepoB$, and customizes it slightly to work with their codebase. The file contains the keywords:
    $RepoA: src/sys/dev/pci/ciss_pci.c,v 1.52 2008-05-29 18:09:31 tonnerre Exp $
    $RepoB: src/sys/dev/pci/ciss_pci.c,v 1.2 2008-07-01 14:23:03 jmcneill Exp $
  • Project A updates their file with several patches to revision 1.58. It now contains the keywords:
    $RepoA: src/sys/dev/pci/ciss_pci.c,v 1.58 2008-07-04 22:54:41 tonnerre Exp $
  • Project B uses an apply script to automatically merge the changes between the Project A revisions 1.52 and 1.58 into their copy of the file, without having to look at the file except to clean up the merge conflicts.
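
The core of such an apply script can be sketched as follows. This is a simplified Python illustration of the idea, not our actual script; the regular expression assumes keywords of exactly the form shown in the list above, and the function names are made up.

```python
import re

# $Tag: path,v revision date time author state $
KEYWORD_RE = re.compile(
    r"\$(?P<tag>\w+): (?P<path>\S+),v (?P<rev>\d+(?:\.\d+)+) "
    r"(?P<date>\d{4}-\d{2}-\d{2}) (?P<time>\d{2}:\d{2}:\d{2}) "
    r"(?P<author>\S+) (?P<state>\S+) \$"
)

def parse_keywords(text):
    """Return a mapping from keyword tag to its expanded fields."""
    return {m.group("tag"): m.groupdict() for m in KEYWORD_RE.finditer(text)}

def merge_range(local_text, upstream_text, tag):
    """Work out which upstream revisions still have to be merged.

    The local copy remembers the upstream revision it was imported
    from; the upstream file states its current revision.
    """
    old = parse_keywords(local_text)[tag]
    new = parse_keywords(upstream_text)[tag]
    return old["path"], old["rev"], new["rev"]
```

For the files in the example above, merge_range(project_b_file, project_a_file, "RepoA") gives the path src/sys/dev/pci/ciss_pci.c together with the revisions 1.52 and 1.58, which is all that is needed to request the diff from Project A's repository.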

Unfortunately, Subversion only supports a predefined set of keywords to be expanded, only one of which expands to the Id. Thus, it is impossible to synchronize files between different repositories this way. This makes it basically impossible to use Subversion for a lot of our projects which share code. This actually means most of our projects.

Vendor branches

Subversion also has a very bizarre notion of branches. Actually, there are no branches. There are only subdirectories in the branches directory (if you want to call it that). Branches are created by copying files around directories. Relations between the branches basically don't exist.

What also doesn't exist are vendor branches. This makes it very hard to track projects coming from different vendors entirely, while applying local patches.

When using vendor branches, new versions of the original branch are always checked into the same vendor branch. Changes between the old vendor branch and the HEAD revision are created and merged into the new vendor branch, saved as the new HEAD revision and collisions are kept around to be fixed manually.

Subversion requires users to re-import the source into a new branch, diff against the branch and apply the changes from the vendor to the branch with the user changes, rather than the other way around. This is usually tedious and requires a lot more work, especially when undoing local changes which fix the same problem as the vendor did (rather than just removing the change, one has to unapply the original modification and then apply the change to it again).

More than that, Subversion doesn't allow files from different revisions to coexist; it isn't possible to have an older revision of specific files in a checkout. CVS allows this due to the fact that it tracks revisions per file. When debugging interactions between different changes, however, it may sometimes be desirable to downgrade specific files to older revisions in order to rule out a change while chasing a bug. Subversion makes this unnecessarily hard.

Conclusions

So summing it up, the work involved in migrating from CVS to Subversion was rather exorbitant due to the different behavior in various places. Subversion focusses on full-tree changes to the point where it becomes hard to use otherwise. It also pushes the principle of simplicity ad absurdum, removing various very helpful features which one would expect from a version control system.

It also appears that Subversion focusses on projects which only ever use one single tree and don't share code with other projects, where everything is reimplemented. This may be on par with the vast majority of GNU projects, but doesn't sound adequate for the BSD world, where code is shared freely between most projects. A good example of this is the Korn shell. The project doesn't have a web site or download location, but its code is shared between various source trees and keeps developing. As an example, a patch for UTF-8 support is nowadays part of “the tree” – if we can speak of one.

Under the aforementioned circumstances it seems to be fair to say that Subversion has its place in the world of tiny projects but stops working when many people are working together on code, or when code is shared publicly. SVK – another product from Best Practical – tries to work around these limitations, but various problems remain unsolved.

Subversion still has a very long way to go before it can be on par with CVS and other similar version control systems. However, considering that some of the problems lie deep within the very design of Subversion, it is highly questionable whether this will ever happen. So we can only wait and see what the future will bring. All we can say so far is that right now it was not possible to migrate our projects to Subversion, and that we went back to CVS.

2008-08-17 20:41: I received a mail from the cvs2svn author stating that his tool can deal very well with the mentioned situation. His tool at least told me this was not the case, but I will re-verify and post the exact error message here if appropriate.

2008-09-14 17:44: The output is, along with a response to many more comments, in More notes on Subversion, a new article.


Posted by Tonnerre Lombard | Filed under: programming