Projects/MoveToGit/UsingSvn2Git
This page documents how to go about getting a KDE module ready for the Great Git Migration of 2010.
Getting the tools
The necessary tools are hosted at http://www.gitorious.org/svn2git. To get started do:
git clone git://gitorious.org/svn2git/svn2git.git
git clone git://gitorious.org/svn2git/kde-ruleset.git
This will get you the source code to build svn2git and the KDE ruleset files as they currently exist. Build the svn2git tool before moving on to the next step.
Building svn2git
Make sure you have Qt4 installed, then run qmake && make to build the executable, which is called "svn-all-fast-export".
How rulesets work
The format for the svn2git rules is pretty simple. First and foremost you have to declare some repositories:
create repository kdelibs
end repository
This tells svn2git to create a git repository called "kdelibs", into which later rules can put commits.
The rest of the file consists of rules that match specific paths in Subversion. Each rule specifies what to do with the commits that appeared at the given path. The possible actions are ignoring them or adding them to a particular branch in a particular repository. Note: To ignore commits, simply leave out the repository and branch information.
As examples are more explanatory, the following rule would put all commits from 123453 to 456789 from the path /trunk/KDE/kdelibs into the master branch of the kdelibs repository:
match /trunk/KDE/kdelibs/
  min revision 123453
  max revision 456789
  repository kdelibs
  branch master
end match
The min and max revision parameters are useful when the same path in SVN has held code for different branches over time. KDevelop3 is an example: KDE 3.5 shipped KDevelop 3.3 up to 3.5.7, KDE 3.5.8 contained KDevelop 3.4, and KDE 3.5.9 contained KDevelop 3.5, yet all of those KDevelop versions now live under /branches/KDE/3.5/kdevelop.
The two revision parameters are not mandatory. If they are left out, all commits that went to the matching path in SVN are taken over into the specified branch.
To generate tags with git you use a special format for the branch parameter: refs/tags/<tagname>. So to put all commits from /tags/KDE/4.4.0/kdelibs into the v4.4.0 tag in the kdelibs git repository, the rule would look like this:
match /tags/KDE/4.4.0/kdelibs/
  repository kdelibs
  branch refs/tags/v4.4.0
end match
For more examples see the svn2git/samples/ directory and the rules in the kde-ruleset repository.
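Put together, the pieces described above make up a complete, if tiny, rules file. The following sketch writes one from the shell, reusing the example rules from this page. The file name kdelibs.rules is an assumption made for illustration, and the indentation inside each rule is only for readability.

```shell
#!/bin/sh
# Write a minimal rules file built from the examples on this page.
# "kdelibs.rules" is a hypothetical file name; pick whatever suits your module.
rules_file=kdelibs.rules

cat > "$rules_file" <<'EOF'
create repository kdelibs
end repository

match /trunk/KDE/kdelibs/
  min revision 123453
  max revision 456789
  repository kdelibs
  branch master
end match

match /tags/KDE/4.4.0/kdelibs/
  repository kdelibs
  branch refs/tags/v4.4.0
end match
EOF

# Show how many match rules the file now contains.
grep -c '^match ' "$rules_file"
```

A real module will need many more rules than this, one for every path its history has lived at.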
TODO: Add some information how the recurse-stuff works.
Important Details
- Every match path needs to end with a '/', otherwise the tool will crash at some point. This is a known bug. The only exceptions are rules that use the recurse action.
- The rules form an ordered list that the tool goes through while matching the changed paths for each commit. If two rules match the same path and neither has more matching criteria than the other, the rule that is written further up in the file wins. This is useful for excluding certain commits from the extraction process. If you look at the existing KDE ruleset, you'll notice that some revisions are ignored at the top.
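Both details can be illustrated with a short shell sketch. It writes a hypothetical rules file in which a rule without repository and branch ignores one revision before the general rule takes everything else, then checks that every match path ends with a '/'. The file name ordered.rules, the helper check_trailing_slash and the revision number 200000 are all made up for illustration. The check knows nothing about the recurse action, so rules using it would be reported and need a manual look.

```shell
#!/bin/sh
# Hypothetical rules file; the revision number is invented for this example.
rules_file=ordered.rules

cat > "$rules_file" <<'EOF'
create repository kdelibs
end repository

match /trunk/KDE/kdelibs/
  min revision 200000
  max revision 200000
end match

match /trunk/KDE/kdelibs/
  repository kdelibs
  branch master
end match
EOF

# Print every "match" line (with its line number) whose path does not end
# with "/". Returns 1 if at least one such line was printed, 0 otherwise.
check_trailing_slash() {
    if grep -n '^[[:space:]]*match[[:space:]]' "$1" | grep -v '/[[:space:]]*$'; then
        return 1
    fi
    return 0
}

if check_trailing_slash "$rules_file"; then
    echo "all match paths end with /"
fi
```

Because the ignoring rule comes first, revision 200000 never reaches the rule that feeds the master branch.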
Setting up your system
You will need ~60GB (is that correct?) of disk space to get started, as the process requires a copy of the KDE svn database. A script in kde-ruleset/bin/startSync will download this for you, and can also be used to update it periodically using rsync. By default the startSync script runs rsync in "dry run" mode, so before using it to actually fetch the svn database, edit the script and remove the -n from both rsync lines.
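Before starting the download it may be worth checking the available disk space. The sketch below is one way to do that with df. The helper name have_free_gb is an assumption, and the 60 in the usage line is simply the unverified figure quoted above.

```shell
#!/bin/sh
# Succeed if the filesystem holding directory $1 has at least $2 GB free.
# Uses the POSIX output format of df, where field 4 is the available space
# in 1024-byte blocks.
have_free_gb() {
    avail_kb=$(df -Pk "$1" | awk 'NR == 2 { print $4 }')
    [ "$avail_kb" -ge $(( $2 * 1024 * 1024 )) ]
}

# Usage: check the current directory against the ~60GB estimate.
if have_free_gb . 60; then
    echo "enough space for the svn database"
else
    echo "less than 60GB free here"
fi
```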
more stuff goes here ...
Step-by-Step on writing rules for a module
- Check whether there are already rules for this module in the kde-ruleset repository. If there are no rules yet, create a new rules file and add the create repository part.
- Execute svn log -v --stop-on-copy file:///path/to/kde_svn/trunk/KDE/module. This will give you the history of the given module in trunk and will stop at the first commit that copied the code from somewhere else. The verbose output will allow you to see where this copy came from.
- Write a rule for the path /trunk/KDE/module that puts all commits into the repository for module under the branch "master":
match /trunk/KDE/module/
  repository module
  branch master
end match
- If the module was copied from somewhere you'll see a "(from /some/other/path:<revision>)" note next to the path in the commit log. We can use that to follow the history back to that place with svn log -v --stop-on-copy file:///path/to/kde_svn/some/other/path@revision. The @revision is important as the original path usually doesn't exist anymore.
- Write another rule for /some/other/path that takes all commits into the master branch.
- Rinse and repeat until you've found the commit that initially added the module.
- The next step is the branches, which are usually in /branches/KDE/x.y/module. This works the same way as with the master branch, only the resulting rule is slightly different:
match /branches/KDE/4.4/module/
  repository module
  branch 4.4
end match
This time you only need to follow the branch to the point where it was copied from trunk/. There might be additional work-branches scattered over the /branches/ subversion directory.
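Following copies by hand, as in the steps above, gets tedious for modules that moved around a lot. The sketch below pulls the copy sources out of svn log -v output, which marks a copied path with a trailing "(from /path:revision)" note. The helper name copy_sources is an assumption, and the sample log lines are invented so the example runs without an svn database.

```shell
#!/bin/sh
# Read `svn log -v` output on stdin and print each copy source as
# path@revision, ready to be appended to the repository URL for the next
# `svn log -v --stop-on-copy` call.
copy_sources() {
    sed -n 's/^ *[AR] .* (from \(.*\):\([0-9][0-9]*\))$/\1@\2/p'
}

# Demonstration on made-up log lines.
printf '%s\n' \
    'Changed paths:' \
    '   A /trunk/KDE/module (from /trunk/kdenonbeta/module:98765)' \
    '   M /trunk/KDE/module/main.cpp' | copy_sources
```

With a real database the input would come from a pipe, for example svn log -v --stop-on-copy file:///path/to/kde_svn/trunk/KDE/module | copy_sources.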
- Last but not least the tags, usually to be found in tags/KDE/x.y.z/module.
- Now run the svn2git tool with your rules file:
svn-all-fast-export --identity-map kde-ruleset/account-map --rules yourrulesfile kde_svn
Where kde_svn is the svn database on your disk. This will take a while.
- Once it's done you should have a new "module" git repository in your current working directory. Now check whether it's sane (you might have to install gitk, as some distros split it out of the git core package):
gitk --all
This gives you a nice graphical view on the history of the module that you just imported.
You can now scroll through the history to check whether things have been imported correctly. Things to look out for are:
- Branches that start nowhere. That is, the oldest commit of the branch has no parent commit associated. This means that svn2git wasn't able to find out where this branch was created from. Usual causes are a missing match rule for the path where the branch originated, mismatching rules, or the need for the recurse action.
- Tags that belong to a commit with no parent. Again this means that a rule might be missing or that the recurse action is needed. This happens quite a bit for the really old tags.
- Tags that are attached to a "branch". This happens if the directory for the tag (e.g. /tags/KDE/4.4.0/kdelibs) has not only the copy-commit, but also additional commits on top of it. This happens quite often for the core modules when compile errors are found after the tagging. It's not very nice, but unfortunately not fixable.
- Other than the above special cases the history should be complete, i.e. the master branch should start at the top with the most-recent commit and end at the bottom with the oldest commit. All other branches should either start from the master branch directly or indirectly from another branch.
- If you find any problems and can't see a way to fix them easily, join the kde-git team on irc.freenode.org in the #kde-git channel or send a mail to the kde-scm-interest mailing list.
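The checks for branches and tags that start nowhere can also be done on the command line, which helps with large histories. The sketch below lists the parentless commits of a repository and the refs that contain a given commit. The helper names root_commits and refs_containing are assumptions, and the second helper relies on the --contains option of git for-each-ref, which needs a reasonably recent git.

```shell
#!/bin/sh
# List all commits without a parent that are reachable from any ref in the
# repository at path $1.
root_commits() {
    git -C "$1" rev-list --max-parents=0 --all
}

# List the branches and tags in repository $1 whose history contains
# commit $2.
refs_containing() {
    git -C "$1" for-each-ref --format='%(refname)' --contains "$2"
}
```

Running root_commits module in the directory that holds the new repository should ideally print a single commit, the one that initially added the module. Any additional commit is worth passing to refs_containing to see which branch or tag it belongs to.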