Monday, April 4, 2011

Git, Github and committing to ASF svn

There has been a discussion around ASF what the best practices of git/github/ASF svn might be. I am ready to offer my own variation.

It's been some time since ASF (Apache Software Foundation) enabled git mirrors of its SVN repository. Github has been mirroring those for some time.

E.g. Mahout project has apache git url git://git.apache.org/mahout.git which in turn is mirrored in Github as git@github.apache.org:apache/mahout.git. Being a Mahout committer myself, I will use it further as an example.

Consequently, it is possible to use Github's collaboration framework to collaborate on individual project issues, accept additions from contributors, merging/branching even more etc. In other words, enable individual jira issue to have its own commit history without really exposing this history to the final project. Another benefit is to have all the power of 3- or whatever-way rebases and merges git provides instead of having a single patch which often gets out of sync with the trunk.

And of course, ability of git to be a 'distributed' system. Your own copy is always your own play yard as well with full power of ... and so on and so forth.

So.. how do we sync apache svn with git branch?

We use git's paradigm of upstream branch, of course, with svn trunk being an upstream. Sort of.

Of course, svn doesn't support upstream tracking, not directly anyway.

First, let's clone the project and set up svn 'upstream' :
git clone git://git.apache.org/mahout.git mahout-svn
git svn init --prefix=origin/ --tags=tags --trunk=trunk --branches=branches https://svn.apache.org/repos/asf/mahout
git svn rebase

I used to run 'git svn clone...' here, but as this wonderful document points out, you can save a lot of time by cloning apache git instead of svn.

Also don't forget to install the authors file from http://git.apache.org/authors.txt by running

git config svn.authorsfile ".git/authors.txt"

assuming that's the path where you put it. Otherwise, your local git commit metadata will be different from that of created in git.apache.org and as a result, that commit's hash will also be different. "svn rebase" would reconcile it though, it seems.

We could also use 'fork' capability in github to create our repository clone, i think. It would be yet even more faster. Now that i have already a cloned repository, I cannot clone yet one more since Github doesn't seem to allow to have more than 1 fork of another repository. Oh well...

So, we are now in our 'holy cow', trunk branch. We will not work here but only use it for commits. It is also recommended to configure git-svn with svn 'authors' file per that info in apache wiki above, for committers.

Now, say we want to create a branch in Github repository, git@github.com:dlyubimov/mahout-commits which we pre-created using Github tools.

git remote add github git@github.com:dlyubimov/mahout-commits
git checkout trunk
git checkout -b MAHOUT-622  
git push -u github 

1-- creates remote corresponding to github repository, in local repository.
2-- make sure we are on local branch 'trunk'
3-- creates local branch MAHOUT-622 based on current (trunk) branch and switches to it
4-- pushes (in this case, also creates) current local MAHOUT-622 branch to github/MAHOUT-622 and also sets up upstream tracking for it. This would take significantly less time if we used Github's 'fork' as mentioned above.

Now we can create some commit history for github/MAHOUT-622 using Github collaboration (pull requests, etc.) So we are reasonably satisfied with the branch and want to commit it to trunk. Here is what we do :
git checkout MAHOUT-622
git pull
git checkout trunk
git svn rebase
git merge MAHOUT-622 --squash
git commit -m "MAHOUT-622: cleaning up dependencies"
git log -p -1 
git svn dcommit

1-- switch to MAHOUT-622
2-- fast-forward to latest changes in github remote branch
3-- switch to trunk
4-- pull latest changes from svn (just in case)
5-- merge all changes from MAHOUT-622 onto trunk tree without committing. At this point, if any merge conflicts are reported, resolve them (perhaps using 'git mergetool') if necessary;
6-- format future single svn commit for MAHOUT-622 issue as a single commit
7-- optionally check the patch of what we just changed before pushing it to svn. Also, since we just merged it here, i.e. potentially added some more changes to the original patch, do all the due diligence here: check for build, tests passing, etc. whatever else committers do.
8-- push commit to svn. European users are recommended to use --no-rebase option while doing this and then explicitly rebase several seconds later.

Alternatively, in step 5 we can merge directly from remote github branch as:
git checkout trunk
git svn rebase
git fetch github
git merge github/MAHOUT-622 --squash
git commit -m "MAHOUT-622: cleaning up dependencies"
git log -p -1 
git svn dcommit

Note that in this case, instead of pulling, we need to run 'git fetch github' in order to update remote commit tree in the local reference repo.

If we wanted to publish a patch into JIRA so that others could review it, we could do it by running 'git diff trunk MAHOUT-622' or just 'git diff trunk' if we are on MAHOUT-622 already.

Yet another use case is when we want to merge upstream svn changes into the issue we are working on :

git checkout trunk
git svn rebase
git checkout MAHOUT-622
git pull 
git merge trunk
git push
1-- switch to the 'holy cow'
2-- pull latest from 'upstream' ASF svn
3-- switch to MAHOUT-622
4-- pull latest remote collaborations to local MAHOUT-622
5-- merge trunk onto issue branch. If conflicts are reported, resolve them (perhaps using 'git mergetool');
6-- push updates to the remote github issue branch. Since we set up the issue branch to track github's branch, we can just use the default form of 'git push'.



No comments:

Post a Comment