Friday, June 26, 2015

Migrating sources to GitHub

I've been asked multiple times why the JBossWS project sources were still hosted on a Subversion repository. I've also had to put up with complaints about time-consuming checkouts and even a bit of mockery from someone for not having migrated to Git yet...
Anyway, last week I had some quiet days, got the inspiration and started the migration... so now the JBossWS sources are finally hosted on GitHub :-)

I started by creating a jbossws organization on GitHub. Since the JBossWS project is actually a collection of multiple components, each with its own lifecycle, I decided to create a repository for each of them in the organization.
A proper migration requires importing the whole svn repository history, of course; the easiest way to achieve that is to rely on the GitHub importer. The tool worked fine for me with the smallest repositories (for instance the jbossws-api and jbossws-spi ones), even though each import took something like 2 hours (but it's nice that you can let it run in the background and be notified by email when the process is completed). Unfortunately, when I let the tool process big sections of the JBossWS Subversion repository (like the jbossws-cxf stack integration), it ended up reporting weird import errors, so I had to figure out another way to perform the import.

The alternative approach that worked is based on git-svn. The first step is to build a comprehensive author mapping file linking svn commit authors to GitHub users. I used the following bash command to generate a skeleton of that file, with one entry per svn author:

> svn log -q | awk -F '|' '/^r/ {sub("^ ", "", $2); sub(" $", "", $2); print $2" = "$2" <"$2">"}' | sort -u > authors-transform.txt
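To make the transformation easier to follow, here is a minimal sketch that runs the same awk logic on a made-up sample of `svn log -q` output instead of a real repository. The function names `authors_map` and `sample_log` and the authors `alice` and `bob` are assumptions introduced only for illustration.

```shell
# Hypothetical helper: reads `svn log -q` output on stdin and prints
# one "svnuser = svnuser <svnuser>" line per distinct author.
authors_map() {
  awk -F '|' '/^r[0-9]+ / {
    sub("^ ", "", $2); sub(" $", "", $2)   # trim the spaces around the author field
    print $2 " = " $2 " <" $2 ">"
  }' | sort -u
}

# Made-up sample of what `svn log -q` prints (authors are invented).
sample_log() {
  cat <<'EOF'
------------------------------------------------------------------------
r3 | alice | 2015-06-01 10:00:00 +0200 (Mon, 01 Jun 2015)
------------------------------------------------------------------------
r2 | bob | 2015-05-30 09:00:00 +0200 (Sat, 30 May 2015)
------------------------------------------------------------------------
r1 | alice | 2015-05-29 08:00:00 +0200 (Fri, 29 May 2015)
------------------------------------------------------------------------
EOF
}

sample_log | authors_map
```

Each generated line is only a placeholder: the right-hand side can then be edited by hand into the usual `Name <email>` form, for instance `alice = Alice Example <alice@example.org>` (again a made-up identity).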

Then I created a local Git repository from the Subversion sources with the following command:

> git svn clone --stdlayout --no-metadata -A authors-transform.txt http://anonsvn.jboss.org/repos/jbossws/stack/cxf /tmp/rep

The process still takes multiple hours and fails if it encounters a commit from an author missing in the mapping file, but once it completes you have a copy of all the sources in a local Git repository that is almost ready to push. Almost... as I still had to deal with tags, because the command above fetches them as remote branches rather than as Git tags.
I moved to the /tmp/rep directory and started by adding a remote repository link and pushing the master branch:
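Since the clone aborts on the first author without a mapping, it may be worth checking the mapping file before starting a multi-hour run. This is not something I did during the migration, just a possible sketch; the helper name `unmapped_authors` and the sample data are assumptions.

```shell
# Hypothetical helper: reads `svn log -q` output on stdin and prints every
# author that has no "author = ..." entry in the mapping file given as $1.
unmapped_authors() {
  awk -F '|' -v map="$1" '
    BEGIN {
      # Load the left-hand side of each mapping line as a known author.
      while ((getline line < map) > 0) {
        sub(/ = .*$/, "", line)
        known[line] = 1
      }
    }
    /^r[0-9]+ / {
      author = $2
      gsub(/^ +| +$/, "", author)          # trim surrounding spaces
      if (!(author in known) && !(author in seen)) {
        seen[author] = 1
        print author
      }
    }'
}

# Demo with made-up data: only alice is mapped, so bob is reported.
map_file=$(mktemp)
printf '%s\n' 'alice = Alice Example <alice@example.org>' > "$map_file"
printf '%s\n' 'r2 | bob | 2015-05-30' 'r1 | alice | 2015-05-29' |
  unmapped_authors "$map_file"
rm -f "$map_file"
```

Against a real repository the input would come from `svn log -q <repository URL>`; an empty output means every commit author has an entry.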

> git remote add origin https://github.com/jbossws/jbossws-cxf.git
> git push -u origin master

Then you'd have to manually create all the tags, but that's clearly impractical if you have hundreds of them, so I googled a bit and ended up using the following two commands, which generate the actual commands for pushing branches and tags, respectively, to the remote repository:

> printf "git push origin "; git show-ref | grep refs/remotes | grep -v '@' | grep -v remotes/tags | perl -ne 'print "refs/remotes/$1:refs/heads/$1 " if m!refs/remotes/(.*)!'; echo

> printf "git push origin "; git show-ref | grep refs/remotes/tags | grep -v '@' | perl -ne 'print "refs/remotes/tags/$1:refs/tags/$1 " if m!refs/remotes/tags/(.*)!'; echo

I only had to clean up the output of the first command a bit, as the trunk branch was clearly not to be pushed (it's the master pushed previously) and I did not want to push some stale branches.
After repeating the process above for all the JBossWS components that failed the automatic import, I finally had all the sources on GitHub.
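The trunk cleanup could also be folded into the generation step. The following is only a sketch of that idea, using grep and sed instead of the perl one-liners above; the function names and the sample refs (`feature-x`, `1.0.0`) are made up, and it assumes the same `refs/remotes/...` layout those commands rely on.

```shell
# Hypothetical helpers: both read `git show-ref` output on stdin and print
# one refspec per line, skipping the "name@revision" refs like the commands above.

# Branches: everything under refs/remotes/ except tags and trunk.
branch_refspecs() {
  awk '{ print $2 }' |
    grep '^refs/remotes/' |
    grep -v -e '@' -e '^refs/remotes/tags/' -e '^refs/remotes/trunk$' |
    sed 's!^refs/remotes/\(.*\)$!refs/remotes/\1:refs/heads/\1!'
}

# Tags: everything under refs/remotes/tags/, mapped to real Git tags.
tag_refspecs() {
  awk '{ print $2 }' |
    grep '^refs/remotes/tags/' |
    grep -v '@' |
    sed 's!^refs/remotes/tags/\(.*\)$!refs/remotes/tags/\1:refs/tags/\1!'
}

# Made-up `git show-ref` output used for the demo.
sample_refs() {
  cat <<'EOF'
1111111111111111111111111111111111111111 refs/heads/master
2222222222222222222222222222222222222222 refs/remotes/trunk
3333333333333333333333333333333333333333 refs/remotes/feature-x
4444444444444444444444444444444444444444 refs/remotes/feature-x@42
5555555555555555555555555555555555555555 refs/remotes/tags/1.0.0
6666666666666666666666666666666666666666 refs/remotes/tags/1.0.0@42
EOF
}

sample_refs | branch_refspecs
sample_refs | tag_refspecs
```

In a real clone the input would be `git show-ref` itself, and the resulting refspecs could be appended to `git push origin`. Stale branches would still need to be removed from the list by hand.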

The final steps were to disable the issue tracker and wiki on GitHub (we already use JIRA and have an equivalent to the wiki at jboss.org) and to invite the proper users to join the Owner group for the organization as well as the other specific groups that were needed.

I later spent a day updating the continuous integration build environment and the project home page to point to the new repositories... but that's not interesting enough to describe in detail here ;-)

Enjoy the new repos, fork JBossWS and feel free to submit pull requests with your patches!