Migrating to Git

Source: http://www.uml.org.cn/pzgl/201108015.asp

If your project's code is stored in another version control system and you decide to switch to Git, the project must go through some form of migration. This section describes some of the importers included with Git for common systems, and shows how to write a custom import script.

Import

You will learn how to import data from two professional, heavyweight version control systems, Subversion and Perforce, because as far as I know their users make up the majority of those converting to Git, and Git ships with high-quality conversion tools for both.

Subversion

After reading the previous section about git svn, you should be able to easily follow its instructions to clone a repository, stop using the Subversion server, push to a new Git server, and start using that. To keep the full history, you only need to pull the data from the Subversion server once (which may take a while).

However, such an import is not perfect, and since it takes so long anyway, you may as well do it right the first time. The first problem is author information. In Subversion, each committer has a username on the host, which is recorded in the commit information; the earlier examples showed schacon in places such as blame output and git svn log. To map this information to Git author data more accurately, you need a mapping from Subversion usernames to Git authors. Create a file named users.txt that expresses the mapping in the following format:

schacon = Scott Chacon <schacon@geemail.com>
selse = Someo Nelse <selse@geemail.com>

You can get a list of the SVN authors with this command:

$ svn log --xml | grep author | sort -u | perl -pe 's/.*>(.*?)<.*/$1 = /'

This outputs the log in XML format, keeps only the lines with author information, removes duplicates, and strips out the XML tags, leaving just the usernames. (Obviously, this requires that grep, sort, and perl be installed on the host.) Redirect the output into your users.txt file, then add the corresponding Git user data after each entry.
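If perl is not at hand, the same extraction can be sketched in Ruby. This is for illustration only: `svn_authors` is a made-up helper name, and the sample XML document below is fabricated to mimic the shape of `svn log --xml` output.

```ruby
require 'set'

# Hedged sketch: pull the unique usernames out of `svn log --xml` output.
# The <author> element is part of Subversion's XML log format; everything
# else here (helper name, sample document) is invented for the demo.
def svn_authors(xml)
  xml.scan(%r{<author>(.*?)</author>}).flatten.to_set.sort
end

sample = <<~XML
  <log>
    <logentry revision="2"><author>schacon</author></logentry>
    <logentry revision="1"><author>selse</author></logentry>
    <logentry revision="0"><author>schacon</author></logentry>
  </log>
XML

# print users.txt template lines, ready for the Git author data
svn_authors(sample).each { |name| puts "#{name} = " }
```

Piping real `svn log --xml` output into such a script would produce the skeleton of the users.txt file described above.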

You can provide this file to git svn to help it map the author data more accurately. You can also pass --no-metadata to the clone or init command to keep git svn from including the Subversion-specific metadata in each commit message. With these options, the import command becomes:

$ git svn clone http://my-project.googlecode.com/svn \
      --authors-file=users.txt --no-metadata -s my_project

Now the Subversion import in your my_project directory is much cleaner than the original. The original commits looked like this:

commit 37efa680e8473b615de980fa935944215428a35a
Author: schacon <schacon@4c93b258-373f-11de-be05-5f7a86268029>
Date:   Sun May 3 00:12:22 2009 +0000

    fixed install - go to trunk

    git-svn-id: https://my-project.googlecode.com/svn/trunk@94 4c93b258-373f-11de-
    be05-5f7a86268029

Now they look like this:

commit 03a8785f44c8ea5cdb0e8834b7c8e6c469be2ff2
Author: Scott Chacon <schacon@geemail.com>
Date:   Sun May 3 00:12:22 2009 +0000

    fixed install - go to trunk

Not only does the Author field look a lot better, but the git-svn-id line is gone as well.

You also need to do a bit of post-import cleanup. At the very least, you should clean up the weird references that git svn set up. First you'll move the tags so they are actual tags rather than strange remote branches, and then you'll move the rest of the branches so they are local.

To move the tags to be proper Git tags, run:

$ cp -Rf .git/refs/remotes/tags/* .git/refs/tags/
$ rm -Rf .git/refs/remotes/tags

This takes the references that were remote branches starting with tag/ and makes them real (lightweight) tags.
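For illustration, the same ref shuffle can be sketched in Ruby; `promote_svn_tags` is a hypothetical helper operating directly on the files under the `.git` directory, and in practice the two shell commands above are all you need:

```ruby
require 'fileutils'

# Hedged Ruby sketch of the cp/rm steps above: promote git svn's
# refs/remotes/tags/* entries to real lightweight tags. `git_dir` is
# assumed to be the path to the repository's .git directory.
def promote_svn_tags(git_dir)
  src = File.join(git_dir, 'refs', 'remotes', 'tags')
  dst = File.join(git_dir, 'refs', 'tags')
  return unless Dir.exist?(src)

  FileUtils.mkdir_p(dst)
  FileUtils.cp_r(Dir.glob(File.join(src, '*')), dst)  # copy each tag ref file
  FileUtils.rm_rf(src)                                # drop the old remote refs
end
```

The same pattern, applied to `refs/remotes/*` and `refs/heads/`, would cover the branch step shown next.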

Next, move the rest of the references under refs/remotes to be local branches:

$ cp -Rf .git/refs/remotes/* .git/refs/heads/
$ rm -Rf .git/refs/remotes

Now all the old branches are real Git branches, and all the old tags are real Git tags. The last thing to do is add your new Git server as a remote and push to it. Here is an example of adding your server as a remote:

$ git remote add origin git@my-git-server:myrepository.git

To push all your branches and tags, run:

$ git push origin --all
$ git push origin --tags

All of your branches and tags should now be on your new Git server in a nice, clean import.

Perforce

The next system to import from is Perforce. A Perforce importer also ships with Git, but only in the contrib section of the source code; it is not available by default like git svn. To run it, you must get the Git source code, which you can download from git.kernel.org:

$ git clone git://git.kernel.org/pub/scm/git/git.git
$ cd git/contrib/fast-import

In this fast-import directory, there should be a Python executable script called git-p4. You must have Python and the p4 tool installed on your machine for this import to work. As an example, you'll import the Jam project from the Perforce Public Depot (the code hosting service officially provided by Perforce). To set up your client, you must export the P4PORT environment variable to point to the Perforce depot:

$ export P4PORT=public.perforce.com:1666

Run the git-p4 clone command to import the Jam project from the Perforce server, supplying the depot and project path and the path into which you want to import the project:

$ git-p4 clone //public/jam/src@all /opt/p4import
Importing from //public/jam/src@all into /opt/p4import
Reinitialized existing Git repository in /opt/p4import/.git/
Import destination: refs/remotes/p4/master
Importing revision 4409 (100%)

Now run git log in the /opt/p4import directory to see the import results:

$ git log -2
commit 1fd4ec126171790efd2db83548b85b1bbbc07dc2
Author: Perforce staff <support@perforce.com>
Date:   Thu Aug 19 10:18:45 2004 -0800

    Drop 'rc3' moniker of jam-2.5.  Folded rc2 and rc3 RELNOTES into
    the main part of the document.  Built new tar/zip balls.

    Only 16 months later.

    [git-p4: depot-paths = "//public/jam/src/": change = 4409]

commit ca8870db541a23ed867f38847eda65bf4363371d
Author: Richard Geiger <rmg@perforce.com>
Date:   Tue Apr 22 20:51:34 2003 -0800

    Update derived jamgram.c

    [git-p4: depot-paths = "//public/jam/src/": change = 3108]

Each commit has a git-p4 identifier in it. You can keep that identifier there in case you need to reference the Perforce change number later. However, if you'd like to remove the identifiers, now is the time to do so, before you start doing work on the new repository. You can use git filter-branch to remove them en masse:

$ git filter-branch --msg-filter '
        sed -e "/^\[git-p4:/d"
'
Rewrite 1fd4ec126171790efd2db83548b85b1bbbc07dc2 (123/123)
Ref 'refs/heads/master' was rewritten
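The sed expression in that filter deletes every commit-message line beginning with "[git-p4:". For illustration, here is a hedged Ruby equivalent of the same message filter; `strip_p4_id` is a made-up name, not part of the git-p4 tooling:

```ruby
# Hedged Ruby equivalent of the sed message filter above: drop any
# commit-message line that starts with the "[git-p4:" identifier.
def strip_p4_id(message)
  message.lines.reject { |line| line.start_with?('[git-p4:') }.join
end

msg = "Update derived jamgram.c\n\n[git-p4: depot-paths = \"//public/jam/src/\": change = 3108]\n"
print strip_p4_id(msg)  # prints the message without the [git-p4: ...] line
```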

Run git log, and you'll see that the SHA-1 checksums of the commits have changed, and the git-p4 strings are gone from the commit messages:

$ git log -2
commit 10a16d60cffca14d454a15c6164108f4082bc5b0
Author: Perforce staff <support@perforce.com>
Date:   Thu Aug 19 10:18:45 2004 -0800

    Drop 'rc3' moniker of jam-2.5.  Folded rc2 and rc3 RELNOTES into
    the main part of the document.  Built new tar/zip balls.

    Only 16 months later.

commit 2b6c6db311dd76c34c66ec1c40a49405e6b527b2
Author: Richard Geiger <rmg@perforce.com>
Date:   Tue Apr 22 20:51:34 2003 -0800

    Update derived jamgram.c

Now that the import is complete, you can start pushing to the new git server.

Custom Import script

If your previous system is not Subversion or Perforce, check the Internet first to see if an importer already exists; quality importers are available for CVS, ClearCase, Visual SourceSafe, even a directory of archives. If none of these tools works for you, you're using a less common system, or you otherwise need more control over the import process, you should use git fast-import. This command reads simple instructions from standard input to write specific Git data. Creating Git objects this way is much simpler than running raw Git commands or trying to write the raw objects yourself (see Chapter 9 for more information). With it, you can write an import script that reads the necessary information out of the system you're importing from and prints the appropriate instructions to standard output. You then run this script and pipe its output through git fast-import.

The following shows how to write a simple importer. Assume that you work in current, you back up your project by occasionally copying the working directory into a back_YYYY_MM_DD directory named with a timestamp, and you now want to import these into Git. Your directory structure looks like this:

$ ls /opt/import_from
back_2009_01_02
back_2009_01_04
back_2009_01_14
back_2009_02_03
current

To import data into a Git directory, first review how Git stores its data. As you may remember, Git is fundamentally a linked list of commit objects, each of which points to a snapshot of content. All you have to do here is tell fast-import what the content snapshots are, what commit data points to them, and the order they go in. The strategy is to go through the snapshots one at a time, creating a commit with the contents of each directory and linking each commit back to the previous one.

As in the Git-enforced policy example in Chapter 7, we'll write this script in Ruby, because it's what I generally work with and it tends to be easy to read. You can rewrite this example in any other language you're familiar with; it only needs to print the necessary information to standard output. At the same time, if you are on Windows, take special care not to introduce carriage returns at the ends of lines: git fast-import is very particular about wanting only line feeds (LF), not the carriage return plus line feed (CRLF) that Windows uses.

First, change into the target directory and identify every subdirectory, each of which is a snapshot to import as a commit. We'll change into each subdirectory in turn and print the commands necessary to export it. The basic main loop of the script looks like this:

last_mark = nil

# loop through all the directories
Dir.chdir(ARGV[0]) do
  Dir.glob("*").each do |dir|
    next if File.file?(dir)

    # move into the target directory
    Dir.chdir(dir) do
      last_mark = print_export(dir, last_mark)
    end
  end
end

We run print_export inside each directory; it takes the manifest and mark of the previous snapshot and returns the manifest and mark of this one, so the two can be linked properly. "Mark" is the fast-import term for a commit identifier; as you create commits, you give each one a mark that you can use to link to it from other commits. So, the first thing the print_export method does is generate a mark from the directory name:

mark = convert_dir_to_mark(dir)

You do this by creating an array of directories and using the index value as the mark, because a mark must be an integer. The method looks like this:

$marks = []
def convert_dir_to_mark(dir)
  if !$marks.include?(dir)
    $marks << dir
  end
  ($marks.index(dir) + 1).to_s
end
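As a self-contained demonstration of the mark assignment (repeating the method so the snippet runs on its own): marks are handed out in first-seen order starting at 1, and looking up a directory that has already been seen returns its original mark.

```ruby
# Demo of the mark assignment: one integer mark per directory,
# assigned in first-seen order, stable on repeat lookups.
$marks = []
def convert_dir_to_mark(dir)
  $marks << dir if !$marks.include?(dir)
  ($marks.index(dir) + 1).to_s
end

puts convert_dir_to_mark('back_2009_01_02')  # first directory gets mark "1"
puts convert_dir_to_mark('back_2009_01_04')  # next gets "2"
puts convert_dir_to_mark('back_2009_01_02')  # seen before, still "1"
```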

Now you have an integer representation of each commit, and you need a date for the commit metadata. Because the date is expressed in the directory name, you'll parse it out. The next line in your print_export file is:

date = convert_dir_to_date(dir)

where convert_dir_to_date is defined as:

def convert_dir_to_date(dir)
  if dir == 'current'
    return Time.now.to_i
  else
    dir = dir.gsub('back_', '')
    (year, month, day) = dir.split('_')
    return Time.local(year.to_i, month.to_i, day.to_i).to_i
  end
end
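A self-contained demo of the date parsing (the method repeated so the snippet runs on its own). Note the explicit .to_i conversions: passing zero-padded strings such as "08" straight into Time.local can raise, so converting the split pieces to integers first is the safer form.

```ruby
# Demo of the date parsing: "back_YYYY_MM_DD" becomes local midnight
# of that day as epoch seconds; "current" maps to the current time.
def convert_dir_to_date(dir)
  if dir == 'current'
    Time.now.to_i
  else
    dir = dir.gsub('back_', '')
    (year, month, day) = dir.split('_')
    Time.local(year.to_i, month.to_i, day.to_i).to_i
  end
end

puts convert_dir_to_date('back_2009_01_02')  # local midnight of 2009-01-02, in epoch seconds
```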

It returns an integer value for the date of each directory. The last piece of meta-information needed for each commit is the committer data, which we hardcode in a global variable:

$author = 'Scott Chacon <schacon@example.com>'

Now you're almost ready to begin printing out the commit data for your importer. The initial information states that you're defining a commit object and what branch it's on, followed by the mark you've generated, the committer information and commit message, and then the previous commit, if one exists. The code looks like this:

# print the import information
puts 'commit refs/heads/master'
puts 'mark :' + mark
puts "committer #{$author} #{date} -0700"
export_data('imported from ' + dir)
puts 'from :' + last_mark if last_mark

The time zone (-0700) is hardcoded for simplicity. If you are importing from another version control system, you must specify the time zone as an offset. The commit message must be expressed in a special format:

data (size)\n(contents)

The format consists of the word data, the size of the data to be read, a newline, and finally the data itself. Because you need the same format later to specify the file contents, you create a helper method, export_data:

def export_data(string)
  print "data #{string.size}\n#{string}"
end
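A quick self-contained demo of the data framing that fast-import expects: the word "data", the length of the payload, a newline, then the payload itself.

```ruby
# Demo of the fast-import data framing: "data <length>\n<payload>".
def export_data(string)
  print "data #{string.size}\n#{string}"
end

export_data("hello\n")  # prints "data 6" on one line, then the payload
```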

The only thing left is the content of each snapshot. This is easy, because you have each one in a directory: you can print out the deleteall command followed by the contents of each file in the directory. Git will then record each snapshot appropriately:

puts 'deleteall'
Dir.glob("**/*").each do |file|
  next if !File.file?(file)
  inline_data(file)
end

Note: Because many systems think of their revisions as changes from one commit to another, fast-import can also take commands with each commit to specify which files have been added, removed, or modified and what the new contents are. You could calculate the differences between snapshots and provide only this data, but doing so is more complex; you may as well give Git all the data and let it figure it out. If the incremental approach is better suited to your data, check the fast-import man page for details about how to provide the data in this manner.
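For illustration, a hedged sketch of that incremental approach: `snapshot_diff` is a made-up helper that represents each snapshot as a hypothetical {path => contents} hash and emits fast-import file commands only for the files that changed between two snapshots.

```ruby
# Hedged sketch of the incremental approach the text mentions: given
# two snapshots as {path => contents} hashes, emit fast-import file
# commands only for deleted, added, or modified files.
def snapshot_diff(old, new)
  cmds = []
  (old.keys - new.keys).each { |path| cmds << "D #{path}" }  # deleted files
  new.each do |path, contents|
    next if old[path] == contents                            # unchanged file
    cmds << "M 644 inline #{path}"
    cmds << "data #{contents.size}"
    cmds << contents
  end
  cmds
end

puts snapshot_diff({ 'a.txt' => "one\n", 'b.txt' => "x\n" },
                   { 'a.txt' => "two\n" })
```

A real importer would emit these lines inside each commit instead of the deleteall-plus-everything strategy used above.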

The format for listing new file contents or specifying a modified file with its new contents is as follows:

M 644 inline path/to/file
data (size)
(file contents)

Here, 644 is the file mode (if you have executable files, you need to detect this and specify 755 instead), and inline says that the file contents are listed immediately after this line. The inline_data method looks like this:

def inline_data(file, code = 'M', mode = '644')
  content = File.read(file)
  puts "#{code} #{mode} inline #{file}"
  export_data(content)
end
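To see both helpers working together, here is a self-contained demo (the two methods repeated so the snippet runs on its own) that writes a throwaway file and emits the corresponding fast-import file command for it:

```ruby
require 'tempfile'

# Demo: emit one fast-import file command (M/inline plus a data block)
# for a throwaway file, using the two helpers defined above.
def export_data(string)
  print "data #{string.size}\n#{string}"
end

def inline_data(file, code = 'M', mode = '644')
  content = File.read(file)
  puts "#{code} #{mode} inline #{file}"
  export_data(content)
end

Tempfile.create('demo') do |f|
  f.write("version two\n")
  f.flush
  inline_data(f.path)  # prints "M 644 inline <path>", then "data 12" and the contents
end
```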

We reuse the previously defined export_data, because the format is exactly the same as the one used to specify the commit message.

The final thing to do is to return the current mark so it can be passed to the next iteration:

return mark

Note: if you are running on Windows, remember to add one extra step. As mentioned before, Windows uses CRLF for newlines while git fast-import accepts only LF. To get around this problem and keep git fast-import happy, you need to tell Ruby to use LF instead of CRLF:

$stdout.binmode

If you run the script now, you'll get content that looks something like this:

$ ruby import.rb /opt/import_from
commit refs/heads/master
mark :1
committer Scott Chacon <schacon@geemail.com> 1230883200 -0700
data 29
imported from back_2009_01_02deleteall
M 644 inline file.rb
data 12
version two
commit refs/heads/master
mark :2
committer Scott Chacon <schacon@geemail.com> 1231056000 -0700
data 29
imported from back_2009_01_04from :1
deleteall
M 644 inline file.rb
data 14
version three
M 644 inline new.rb
data 16
new version one
(...)

To run the import script, pipe its output to git fast-import while inside the Git directory you want to import into. You can create a new empty directory, run git init in it as a starting point, and then run the script:

$ git init
Initialized empty Git repository in /opt/import_to/.git/
$ ruby import.rb /opt/import_from | git fast-import
git-fast-import statistics:
---------------------------------------------------------------------
Alloc'd objects:       5000
Total objects:           18 (     1 duplicates              )
      blobs  :            7 (     1 duplicates      0 deltas)
      trees  :            6 (     0 duplicates      1 deltas)
      commits:            5 (     0 duplicates      0 deltas)
      tags   :            0 (     0 duplicates      0 deltas)
Total branches:           1 (     1 loads     )
      marks:           1024 (     5 unique    )
      atoms:              3
Memory total:          2255 KiB
       pools:          2098 KiB
     objects:           156 KiB
---------------------------------------------------------------------
pack_report: getpagesize()            =       4096
pack_report: core.packedGitWindowSize =   33554432
pack_report: core.packedGitLimit      =  268435456
pack_report: pack_used_ctr            =          9
pack_report: pack_mmap_calls          =          5
pack_report: pack_open_windows        =          1 /          1
pack_report: pack_mapped              =       1356 /       1356
---------------------------------------------------------------------

As you can see, when it completes successfully, it gives you a bunch of statistics about what it accomplished. In this case, you imported 18 objects total for 5 commits into 1 branch. Now you can run git log to see the new history:

$ git log -2
commit 10bfe7d22ce15ee25b60a424c8982157ca593d41
Author: Scott Chacon <schacon@example.com>
Date:   Sun May 3 12:57:39 2009 -0700

    imported from current

commit 7e519590de754d079dd73b44d695a42c9d2df452
Author: Scott Chacon <schacon@example.com>
Date:   Tue Feb 3 01:00:00 2009 -0700

    imported from back_2009_02_03

That's it: a nice, clean Git repository. It's important to note that nothing is checked out at this point; you don't have any files in your working directory at first. To get them, you must reset the working directory to where master now is:

$ ls
$ git reset --hard master
HEAD is now at 10bfe7d imported from current
$ ls
file.rb  lib

You can do a lot more with fast-import: handle different file modes, binary data, multiple branches and merges, tags, progress indicators, and more. A number of examples of more complex scenarios are available in the contrib/fast-import directory of the Git source code; one of the better ones is the git-p4 script discussed above.
