Reposted from: http://www.ibm.com/developerworks/cn/opensource/os-cn-tourofgit/
Git advocates a distributed code repository and file snapshots, a design that challenges and overturns traditional centralized, file-difference-based version control tools such as CVS and SVN. Git brings many conveniences, such as offline commits and lightweight branching. However, some people question Git's complexity and the learning cost that comes with it, which to some extent slows developers who want to adopt Git or migrate projects to it. The author shares this feeling, and that is the starting point of this article.
Unlike the various Git usage guides, this article covers Git installation and usage but focuses on Git's design ideas, architecture, and practical features, including the Git branching model, Git tags, Git patch submission, and migration from CVS and SVN to Git.
Background
Git is an open source distributed version control system. In British English, "git" refers to a stupid or unpleasant person, which probably has something to do with the self-deprecating mood of Git's inventor, Linux godfather Linus Torvalds, at the time. Before 2002, most of the time spent maintaining the Linux kernel was wasted on tedious chores such as submitting patches and saving archives, so adopting a version control tool, BitKeeper, to manage the Linux kernel became a top priority. However, BitKeeper is commercial software, and after three years of free use the Linux community had to look for an alternative to keep hosting the Linux kernel source code. In 2005, left with no other choice, Linus Torvalds developed an open source version control tool of his own and named it Git.
Since its birth, Git has met the development needs of complex projects such as the Linux kernel by being open source, simple, fast, distributed, and efficient. Today Git is mature and widely used, and more and more projects are migrating to Git repositories. Take the Eclipse community as an example: reportedly 80% of Eclipse Foundation projects are now fully managed with Git, and CVS access has been switched to read-only. On the Eclipse Foundation website, the word "CVS" has even been crossed out in the introduction to project management, with the playful note "Ding Dong, the Witch is dead."
Moreover, the author recently received an upgrade notice from SourceForge, the world's largest open source code hosting platform. One of the author's relatively simple projects has been automatically upgraded from CVS to Git by default. Another, more complex CVS project, Toolbox for Java/JTOpen, was not upgraded automatically; SourceForge is presumably waiting for the author to finish preparations before the upgrade. The author hopes that sharing this experience of learning and using Git will benefit Git beginners, which is the purpose of this article.
Why Choose Git
Admittedly, Git costs more to learn than mainstream version control software such as CVS and SVN. An SVN user, for example, gets by with understanding files, the working directory, the repository, versions, branches, and tags. A Git user has to understand more, and more complex, concepts, including files, snapshots, the working tree, the index, local repositories, remote repositories, remotes, commits, branches, and the stash. So why are software developers still flocking to Git? What advantages does Git have over CVS and SVN?
The Internet and various Git books give their own answers about Git's advantages. The author believes that snapshot storage and the distributed design are Git's two major highlights, for the following reasons:
First, Git's self-maintained content storage file system is a major highlight. CVS and SVN use a delta (incremental) file system under the hood, as shown in Figure 1: when a file changes, the file system stores only the file difference.
Figure 1. CVS and SVN record file content differences
For the same committed change, Git's underlying file system stores a file snapshot, that is, the entire file content, and saves an index pointing to that snapshot, as shown in Figure 2. For performance reasons, if a file's content has not changed, the file system does not store the file again but simply saves a link to the previously stored file.
Figure 2. Git records the entire file snapshot
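To make the snapshot idea concrete, here is a minimal Python sketch of a content-addressed store. It is only an illustration under our own assumptions, not Git's actual storage code: the class SnapshotStore and its methods are hypothetical names, and real Git additionally compresses and packs objects.

```python
import hashlib


class SnapshotStore:
    """Toy content-addressed store: identical file contents are saved once."""

    def __init__(self):
        self.objects = {}    # content hash -> file content
        self.snapshots = []  # each snapshot maps file name -> content hash

    def commit(self, files):
        """Record a snapshot of every file; unchanged content is only linked."""
        snapshot = {}
        for name, content in files.items():
            key = hashlib.sha1(content.encode("utf-8")).hexdigest()
            self.objects.setdefault(key, content)  # store new content once
            snapshot[name] = key                   # otherwise just link to it
        self.snapshots.append(snapshot)
        return len(self.snapshots) - 1

    def read(self, version, name):
        """Return the full content of a file as of the given snapshot."""
        return self.objects[self.snapshots[version][name]]


store = SnapshotStore()
v0 = store.commit({"A": "a1", "B": "b1", "C": "c1"})
v1 = store.commit({"A": "a2", "B": "b1", "C": "c1"})  # only A changed
print(len(store.objects))                         # distinct contents stored
print(store.read(v0, "A"), store.read(v1, "A"))   # both versions stay readable
```

In this sketch the second snapshot stores only the new content of A; files B and C are recorded as links to the content that was already saved.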
Git chose this underlying storage structure mainly to make branching efficient. In fact, a Git branch is essentially a mutable pointer to a commit object, and each commit object points to a file snapshot, as shown in Figure 3.
Figure 3. Data structures for Git branches
As a result, a branch can be created almost instantly and at almost no cost. In other words, Git branches are cheap and lightweight. In CVS and SVN projects, by contrast, creating a branch often means a full copy of the source code, which is expensive and heavyweight. Large projects need to create many branches, which fits well with Git's encouragement to create and merge branches frequently.
Second, the design idea of the Git version control system is decentralization. Traditional tools such as CVS and SVN use a client/server (C/S) architecture with a single central code repository located on the server. Once the central repository becomes unavailable because of a server outage, a network failure, or a similar problem, check-in and check-out for the whole CVS or SVN system are paralyzed. Even with high availability taken into account, migrating to another central repository to continue committing code raises operation and maintenance costs.
To get rid of the dependency on a central repository, one of Git's initial design goals was distributed management. Figure 4 gives an example. Suppose we set up a project team whose developers are Alice, Bob, Clair, and David. In addition to the central repository origin (Git's default remote repository name), each member maintains a local repository. From a distributed point of view, David's repository can be seen as a remote repository for Alice, and vice versa. Git's distributed design reduces reliance on the central repository, lowers the load on it, and makes committing code more flexible.
Figure 4. Git distributed work
Another great benefit of Git's distributed design is support for working offline. The benefit is self-evident: for C/S tools such as CVS and SVN, which depend heavily on the network, having no network or VPN is like losing one's right arm, because check-in and check-out simply stop working. With Git, even on a plane or train without Wi-Fi, you can commit code as often as you like to the local repository and upload it to the remote mirror repository once the network is available again.
For more details on git, please refer to Git's official website: http://git-scm.com/.
As the saying goes, to do a good job one must first sharpen one's tools. Having understood Git's flexible snapshot storage and distributed design philosophy, we now introduce how to install Git on different operating systems. Note that this is only a rough outline of the installation methods; installation prerequisites and the diagnosis of installation problems are beyond the scope of this article.
How to install Git
In summary, there are two ways to install Git: compile and install it from source, or use a platform-specific binary package. The latter can be further divided by platform into Linux, Mac, Windows, and so on. The installation instructions follow.
1. Compiling and installing from source
Installing from the Git source at least guarantees that you get the latest version. Before installing Git, you need to install the packages it depends on, including curl, zlib, openssl, expat, libiconv, and so on. Depending on the Linux distribution, readers can choose among various package installation tools. Taking yum as an example, the installation command is as follows:
$ yum install curl-devel expat-devel gettext-devel openssl-devel zlib-devel
Next, download the latest Git source code from the official Git site http://git-scm.com/download (because of the time lag, the author cannot guarantee that the version described in this article is still the latest) and run the following commands to compile and install it.
$ tar -zxf git-1.7.6.tar.gz
$ cd git-1.7.6
$ make prefix=/usr/local all
$ sudo make prefix=/usr/local install
Finally, type the git command to verify that the installation succeeded, as shown in Figure 5. As you can see, Git has been installed successfully.
Figure 5. Installing Git from source
2. Installing on Linux
To install a precompiled Git binary package on Linux, use the package manager that the system supports. On Red Hat Linux, install with the yum command:
$ yum install git-core
On Debian-based systems such as Ubuntu, install with the apt-get command:
$ apt-get install git-core
Since this kind of installation is very simple, no screenshots are shown here.
3. Installing on Mac
There are two ways to install Git on a Mac: compiling from source and using a graphical installer. Compiling from the command line is similar to Linux and is not described again here. The graphical installer is the easier option, as shown in Figure 6. Readers can download the latest version of Git for Mac from http://code.google.com/p/git-osx-installer.
Figure 6. Installing Git on a Mac
4. Installing on Windows
Installing Git on Windows is just as easy as the Mac installation described above. Depending on users' habits, the options fall roughly into three categories:
Users who are accustomed to the command line can choose the msysgit installation package; the result is shown in Figure 7. msysgit is available at http://code.google.com/p/msysgit.
Figure 7. Installing the command-line Git tool msysgit on Windows
Users who are accustomed to the Tortoise style can choose the TortoiseGit installation package; the right-click menu after installation is shown in Figure 8. TortoiseGit is available at http://code.google.com/p/tortoisegit/.
Figure 8. Installing the right-click Git tool TortoiseGit on Windows
Users who are accustomed to the Eclipse style can install the Eclipse plugin EGit; its Git Repositories view is shown in Figure 9. The EGit update site is http://download.eclipse.org/egit/updates.
Figure 9. Installing the Eclipse Git plugin EGit on Windows
Whichever way you install it, if you are using Git for the first time you need to configure your user information, namely the user name and email address (see below), so that Git can automatically record these two pieces of information with each commit, indicating who updated and committed the code.
$ git config --global user.name "Pi Guang Ming"
$ git config --global user.email [email protected]
So far, we have covered Git's distributed model, its snapshot model, and installation on different operating system platforms. Next comes the main topic of this article: how to use Git.
How to use Git
As mentioned earlier, this section is the focus of this article. We concentrate on how Git works and on its more practical features: branches, tags, patches, and migration from CVS and SVN to Git. Basic command syntax, explanations, and anything else not covered here can be found in Git usage guides.
Create a Git project repository
Before we can actually use Git, we need to create at least one Git code repository (Git repository for short). In general, there are two ways to get a Git repository. The first is to create a new Git repository in an existing directory by importing all of its files; the second is to clone a remote Git mirror repository directly to a local repository.
For the first way, we can use the git init command to create a new Git project repository, as follows:
$ git init
After Git is initialized, a hidden directory named .git appears in the current directory, as shown in Figure 10.
Figure 10. The .git directory
We emphasize the .git directory because it is so important: for a Git repository, the .git directory holds all the data and resources of the entire Git project. Table 1 briefly explains the files and subdirectories in the .git directory. If you need more information, see the official Git website: http://git-scm.com/.
Table 1. Brief description of the .git directory
|File or subdirectory name||Brief description|
|branches||Branch information of the Git project; newer versions of Git no longer use this directory.|
|config||Configuration information of the Git project|
|description||Description of the Git project|
|HEAD||Pointer to the current branch of the Git project|
|hooks||Default "hook" scripts, triggered before or after particular events occur.|
|info||Contains an exclude file, which lists the files the Git project should ignore.|
|objects||Git data objects, including commits, trees, blobs, and tags.|
|refs||Pointers to all branches of the Git project|
For the second way, we do not need git init to initialize the repository; instead, we use git clone to clone the remote mirror directly to a local repository. Here, as an example, we download the source code of Git itself with the following git clone command:
$ git clone git://git.kernel.org/pub/scm/git/git.git
With the ls git command, we can view the contents of the cloned repository, as shown in Figure 11. Note that cloning a remote repository actually copies the data in its .git directory and then restores the original project structure from that metadata, which is the content shown in Figure 11.
Figure 11. Cloning the Git source code
Besides the git:// protocol, git clone also supports ssh://, http(s)://, and other protocols for different usage scenarios.
Git Object Model
The Git object model is the core of Git's entire design, and understanding it is the key to understanding Git as a whole. Simply put, each Git object has three parts: type, size, and content. There are four object types: commit, tree, blob, and tag. Briefly, a blob stores the content of a file; a tree describes a directory by listing the names of its entries and the blobs or subtrees they correspond to; a commit points to the root tree of a snapshot and records details such as the author and the committer; and a tag marks a specific commit.
Next, we use an example to explain the relationships between the different Git objects. Figure 12 shows a sample Ruby project. As you can see, the example is very simple and only schematic.
Figure 12. Directory hierarchy of the Ruby project
If we commit the project to a Git repository, its Git objects are related as shown in Figure 13. The three blob objects correspond to content snapshots of the three files README, mylib.rb, and yourlib.rb. The three tree objects fully describe the directory structure of the project, including the contents of the directory tree and the mapping from each file to its blob object. Each commit generates a commit object that points to the root node of the tree; the commit object also contains details such as the author and the committer.
Figure 13. Git object diagram of the Ruby project
It is easy to see that the tree objects, together with the blobs, act as content nodes (directories or files) and form a directed acyclic graph. At any time, starting from the root node associated with a commit object, you can traverse all the content of the entire project as of that commit. As mentioned earlier, a Git branch is essentially a pointer to a commit object. Merging two Git branches is therefore essentially merging two directed acyclic graphs, and the graph structure lets Git efficiently determine the common parent node of the branches. As a result, the Git object model gives developers great flexibility to create branches and to develop on their own branches.
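As a rough illustration of how a common parent node can be found in such a graph, the following Python sketch computes the closest common ancestors of two commits. The commit names and the parents mapping are hypothetical, and the functions are our own simplified stand-ins; Git's real implementation is considerably more optimized.

```python
def ancestors(parents, commit):
    """Return the set containing commit and everything reachable from it."""
    seen, stack = set(), [commit]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, []))
    return seen


def merge_bases(parents, a, b):
    """Common ancestors of a and b that are not ancestors of another common ancestor."""
    common = ancestors(parents, a) & ancestors(parents, b)
    return {
        c for c in common
        if not any(c in ancestors(parents, other) for other in common if other != c)
    }


# Hypothetical history: c1 <- c2 <- c3 (one branch) and c2 <- c4 (another branch)
parents = {"c1": [], "c2": ["c1"], "c3": ["c2"], "c4": ["c2"]}
print(merge_bases(parents, "c3", "c4"))
```

Here the two branches diverge after c2, so c2 is the commit a merge would use as its starting point.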
Although these objects differ in type, each has a unique identifier of the same length, written as a 40-character hexadecimal string. The object identifiers in Figure 13 are abbreviated; the complete identifier of the commit object is as follows:
To guarantee the uniqueness of object identifiers, Git uses the SHA-1 hash algorithm. Doing so has at least three major benefits:
To summarize the Git object model: blob objects are all the actual files in the project, including source code, image resources, XML configuration files, and so on. Note that a blob object records only the content of a file; information such as the directory, file name, and size is recorded in the tree object associated with it. Each time we commit a file, a commit object is generated and the tree objects associated with the changed file are updated.
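The way an identifier is derived from type, size, and content can be sketched in a few lines of Python. The helper name git_object_id is our own; the sketch assumes the documented format in which Git hashes a header of the form "type size" followed by a NUL byte and then the raw content.

```python
import hashlib


def git_object_id(obj_type, content):
    """SHA-1 over the header '<type> <size>', a NUL byte, then the raw content."""
    header = f"{obj_type} {len(content)}".encode("ascii") + b"\x00"
    return hashlib.sha1(header + content).hexdigest()


blob_id = git_object_id("blob", b"test content\n")
print(blob_id)        # same value that `git hash-object` reports for this content
print(len(blob_id))   # identifiers are 40 hexadecimal characters long
```

Because the file name is not part of the hashed data, two files with identical content map to the same blob, which matches the point above that names are recorded in tree objects.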
Three states of Git
After understanding the Git object model, we turn to checking files in and out. A Git repository is broadly divided into three work areas: the working directory, the staging area (stage or index), and the local repository (history). The check-in and check-out commands that move files between them are shown in Figure 14:
Figure 14. Transitions between the three states of Git (1)
A brief description of the relevant commands is as follows: git add files copies files from the working directory to the staging area; git commit saves a snapshot of the staging area to the local repository; git reset -- files copies files from the latest commit back to the staging area, undoing the last git add; and git checkout -- files copies files from the staging area back to the working directory, discarding local modifications.
As an example, Figure 15 shows how git add and git checkout copy files back and forth between the working directory and the staging area; readers can try the git commit and git reset commands on their own. First, we create a file named test with the content "Hello git" in the working directory and copy it to the staging area with the git add command. Then, in the working directory, we modify the test file by adding a line "Hello Git branch". At this point, the content in the staging area is still "Hello git", unchanged. Finally, git checkout overwrites the working copy with the test file from the staging area, that is, the local modification is discarded, and the final file content is "Hello git".
Figure 15. git checkout -- files example
In fact, copying between the working directory and the repository can also be done in one step, as shown in Figure 16.
Figure 16. Transitions between the three states of Git (2)
For files that Git already tracks, git commit -a is equivalent to git add followed by git commit: it copies the files from the working directory to the staging area and then from the staging area to the repository. git checkout HEAD -- files is the reverse process, that is, it rolls the files back to the last commit.
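The copying between the three areas can be modeled with a small Python sketch. The class ToyRepo is a hypothetical simplification of our own: each area is just a dictionary from file name to content, and error cases such as untracked files are ignored.

```python
class ToyRepo:
    """Toy model of the working directory, staging area and local repository."""

    def __init__(self):
        self.working = {}   # file name -> content being edited
        self.stage = {}     # file name -> content staged for the next commit
        self.history = []   # committed snapshots, oldest first

    def add(self, name):
        """Like git add: copy a file from the working directory to the stage."""
        self.stage[name] = self.working[name]

    def commit(self):
        """Like git commit: save a snapshot of the stage in the repository."""
        self.history.append(dict(self.stage))

    def checkout(self, name):
        """Like git checkout -- file: copy a file from the stage to the working directory."""
        self.working[name] = self.stage[name]

    def reset(self, name):
        """Like git reset -- file: copy a file from the latest commit to the stage."""
        self.stage[name] = self.history[-1][name]


repo = ToyRepo()
repo.working["test"] = "Hello git\n"
repo.add("test")
repo.working["test"] += "Hello Git branch\n"   # local edit, not staged
staged_before = repo.stage["test"]             # the stage is unaffected by the edit
repo.checkout("test")                          # discard the local edit
print(repr(repo.working["test"]))
```

The run mirrors the Figure 15 walkthrough: the staged copy stays at "Hello git" while the working copy is edited, and the checkout step restores the working copy from the stage.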
To see how files differ between the working directory, the staging area, and the local repository, you can use the git diff command, as shown in Figure 17:
Figure 17. Comparison between the three states of Git
A brief description of the git diff commands is as follows: git diff compares the working directory with the staging area; git diff --cached compares the staging area with the most recent commit; and git diff HEAD compares the working directory with the most recent commit.
As an example, Figure 18 illustrates the use of git diff. With git diff, we learn that the working directory has one more line than the staging area, "Hello Git tag"; with git diff HEAD, we learn that the working directory has two more lines than the most recently committed file in the repository, namely "Hello Git branch" and "Hello Git tag". It follows that the staging area has one more line than the repository, "Hello Git branch", which agrees with the result of git diff --cached.
Figure 18. Comparison between the three states of Git: example
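The three comparisons can be reproduced with Python's standard difflib module. This is only a sketch: the three strings stand in for the file content in the repository, the staging area, and the working directory, and the helper added_lines is our own name.

```python
import difflib


def added_lines(old, new):
    """Return the lines a unified diff reports as added when going from old to new."""
    diff = list(difflib.unified_diff(old.splitlines(), new.splitlines(), lineterm=""))
    # The first two lines of a non-empty unified diff are the ---/+++ file headers.
    return [line[1:] for line in diff[2:] if line.startswith("+")]


head = "Hello git\n"
stage = "Hello git\nHello Git branch\n"
working = "Hello git\nHello Git branch\nHello Git tag\n"

print(added_lines(stage, working))   # comparable to: git diff
print(added_lines(head, working))    # comparable to: git diff HEAD
print(added_lines(head, stage))      # comparable to: git diff --cached
```

Each call compares one pair of areas, so the three results together show the same relationship described for Figure 18.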
These are the basics of checking files in and out with Git. More detailed commands and syntax can be found in Git learning guides.
Next, we introduce Git's more advanced features.
Git Branching model
As mentioned earlier, a branch in Git is essentially a mutable pointer to a commit object. Git maintains a default branch named master, and after each commit the master pointer automatically moves forward. To create a new branch, use the git branch command:
$ git branch bugfix
This creates a new branch pointer at the current commit object, as shown in Figure 19.
Figure 19. New branch bugfix
So how does Git know which branch you are currently working on? The answer is simple: it keeps a special pointer called HEAD, which points to the current working branch. We can think of HEAD as an alias for the current branch. In this respect it is very different from the HEAD concept in CVS and SVN.
Running the git branch command only creates the new branch; it does not switch to it automatically, so in this case we are still working on the master branch. To switch to another branch, run the git checkout command.
$ git checkout bugfix
HEAD then points to the bugfix branch, as shown in Figure 20.
Figure 20. Switching to the bugfix branch
In fact, creating a branch and switching to it can be combined into a single step. To create a new branch and switch to it, run git checkout with the -b option:
$ git checkout -b bugfix
Next, we commit again:
$ vi test.rb
$ git commit -a -m 'Update copyright'
Figure 21 shows the result after the commit. Interestingly, the bugfix branch has now moved forward one step, while the master branch still points to the commit object we were on when we ran git checkout.
Figure 21. Committing a file on the bugfix branch
Let's switch back to the master branch:
$ git checkout master
The resulting structure is shown in Figure 22. This command does two things. First, it moves the HEAD pointer back to the master branch; second, it replaces the files in the working directory with the snapshot that the master branch points to. In other words, from now on, commits based on that file start from the older version. The main effect is that the changes made on the bugfix branch are temporarily set aside, isolating the master branch from the bugfix branch. In real projects we often need this: a development branch is used for major versions while a bugfix branch is responsible for fixing bugs, the two are isolated from each other, and they are merged at the end.
Figure 22. Switching to the master branch
We make some more changes and commit again:
$ vi test.rb
$ git commit -a -m 'made other changes'
Now the commit history of our project has forked, as shown in Figure 23, because we created a branch, did some work on it, and then switched back to the main branch and did some other work. We can switch back and forth between branches and merge them when the time is right.
Figure 23. Committing a file on the master branch
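The pointer movements described above can be sketched in Python. ToyHistory is a hypothetical model of our own, with made-up commit names c1, c2, and so on; it only tracks which commit each branch points to and which branch HEAD names.

```python
class ToyHistory:
    """Branches are names pointing at commits; HEAD names the current branch."""

    def __init__(self):
        self.parents = {"c1": []}          # commit id -> parent commit ids
        self.branches = {"master": "c1"}   # branch name -> commit id
        self.head = "master"               # name of the current branch
        self._next = 2

    def branch(self, name):
        """Like git branch <name>: add a pointer to the current commit."""
        self.branches[name] = self.branches[self.head]

    def checkout(self, name):
        """Like git checkout <name>: make HEAD refer to another branch."""
        if name not in self.branches:
            raise KeyError(name)
        self.head = name

    def commit(self):
        """Like git commit: create a commit and advance only the current branch."""
        new_id = f"c{self._next}"
        self._next += 1
        self.parents[new_id] = [self.branches[self.head]]
        self.branches[self.head] = new_id
        return new_id


repo = ToyHistory()
repo.branch("bugfix")
repo.checkout("bugfix")
fix = repo.commit()       # bugfix moves forward, master stays where it was
repo.checkout("master")
other = repo.commit()     # master moves forward too: the history has forked
print(repo.branches)
print(repo.parents)
```

Creating the branch copies a single pointer, which is why branching is cheap; after the two commits, both new commits share c1 as their parent.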
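The git merge command merges different branches together. Before merging, HEAD must point to the branch you want to merge into, so check out that branch first. Depending on the situation, git merge handles three scenarios: if the other branch's commit is already an ancestor of the current commit, there is nothing to do; if the current commit is an ancestor of the other branch's commit, Git simply moves the current branch pointer forward (a fast-forward merge); otherwise, Git performs a real merge and creates a new commit that has both branches as its parents.
As you can see, git merge combines two branches into a single commit with two parents, so the commit history is no longer linear. In contrast, the rebase command git rebase replays the commits of the current branch on top of another branch, which keeps the commit history linear, as shown in Figure 25.
Figure 25. Rebasing a branch
As an example, Figure 26 shows the difference between git merge and git rebase.
Figure 26. Merging a branch versus rebasing a branch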
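How a merge can be classified from the commit graph alone is sketched below in Python. The sketch assumes the usual three cases (nothing to do, fast-forward, and a real merge commit); the function names and the tiny example history are hypothetical.

```python
def is_ancestor(parents, ancestor, commit):
    """True if ancestor is reachable from commit by following parent links."""
    stack, seen = [commit], set()
    while stack:
        current = stack.pop()
        if current == ancestor:
            return True
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, []))
    return False


def merge_kind(parents, current, other):
    """Classify what merging `other` into `current` would require."""
    if is_ancestor(parents, other, current):
        return "already up to date"   # other is already contained in current
    if is_ancestor(parents, current, other):
        return "fast-forward"         # just move the branch pointer forward
    return "merge commit"             # histories diverged: two parents needed


# Hypothetical history: c1 is the parent of both c2 and c3
parents = {"c1": [], "c2": ["c1"], "c3": ["c1"]}
print(merge_kind(parents, "c1", "c2"))
print(merge_kind(parents, "c3", "c2"))
print(merge_kind(parents, "c2", "c1"))
```

Only the last case produces the non-linear history that a rebase is meant to avoid.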
Sometimes a merge does not go so smoothly. If the same part of the same file was modified on different branches, a merge conflict results and Git cannot cleanly combine the two. In that case Git merges but does not commit; it stops and waits for you to resolve the conflict manually, as follows:
$ cat test.rb
init
master update1
master update2
bugfix update1
<<<<<<< HEAD
master updated3
=======
bugfix update2
>>>>>>> bugfix
To see which files conflicted during the merge, you can use git status:
$ git status
# On branch master
# Unmerged paths:
#   (use "git add/rm <file>..." as appropriate to mark resolution)
#
#       both modified:      test.rb
#
no changes added to commit (use "git add" and/or "git commit -a")
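The conflict markers follow a simple line-based layout, so the two competing versions can be pulled apart with a short Python sketch. The helper split_conflicts is our own illustration, not a Git command, and it handles only the plain two-sided marker style shown above.

```python
def split_conflicts(text):
    """Return a list of (ours, theirs) line lists, one pair per conflict region."""
    conflicts, ours, theirs, state = [], [], [], None
    for line in text.splitlines():
        if line.startswith("<<<<<<< "):
            ours, theirs, state = [], [], "ours"    # start of the current branch's side
        elif line.startswith("=======") and state == "ours":
            state = "theirs"                        # separator between the two sides
        elif line.startswith(">>>>>>> ") and state == "theirs":
            conflicts.append((ours, theirs))        # end of the merged branch's side
            state = None
        elif state == "ours":
            ours.append(line)
        elif state == "theirs":
            theirs.append(line)
    return conflicts


sample = """init
master update1
<<<<<<< HEAD
master updated3
=======
bugfix update2
>>>>>>> bugfix
"""
print(split_conflicts(sample))
```

Resolving the conflict means editing the file so that only the desired lines remain, removing the markers, and then marking the file as resolved with git add before committing.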
Once the fix has been released, the bugfix branch has fulfilled its purpose, and we can delete it with the -d option of git branch:
$ git branch -d bugfix
Above, we introduced how to create, switch, and merge (linearly and non-linearly) Git branches, how conflicts arise, and how to delete a branch.
Git tags
Like CVS, SVN, and other version control systems, Git supports tagging. When development reaches a certain stage, we need to create a tag and release a version, such as 0.1.2 or v0.1.2.
Git uses two types of tags: lightweight and annotated. A lightweight tag is simply a reference to a specific commit object, whereas an annotated tag is an independent Git object stored in the repository. An annotated tag carries more information, including its own checksum, the tagger's name and email, the tagging date, and a tagging message. An annotated tag can also be signed and verified with GNU Privacy Guard (GPG), so we recommend annotated tags in order to keep this information.
To create a tag, run the following Git command:
$ git tag -a v0.1.2 -m "Release version 0.1.2"
To list tags, run the following Git command:
$ git tag -l
You can also use the git show command to view details such as the tag version and the commit object.
$ git show v0.1.2
The Git command to delete a tag is as follows:
$ git tag -d v0.1.2
If we have our own private key, we can also sign the tag with GPG simply by changing the earlier -a to -s, as follows:
$ git tag -s v0.1.2 -m "My signed 0.1.2 Tag"
To verify a signed tag, first obtain the corresponding public key and then use the git tag -v command, as follows:
$ git tag -v v0.1.2
Note that, by default, git push does not transfer tags to the remote repository. Tags are shared only with an explicit command, in the following form:
$ git push origin v0.1.2
To push all locally added tags at once, use the --tags option:
$ git push origin --tags
After that, other people will see these tags when they clone the shared repository or pull to synchronize their data.
Git patches
In the UNIX world, the concept of a patch is very important, and almost all ordinary contributors to large UNIX projects submit their code as patches. For the Linux kernel project, an ordinary developer first clones the code from the Git project repository, writes code, makes a patch, and finally sends it by e-mail to a maintainer of the Linux kernel.
Git offers two simple ways to generate patches. One is the standard patch generated with git diff; the other is a Git-specific patch generated with git format-patch. Here we focus on the second; the git diff approach is relatively simple and is not covered.
Suppose we have a project MyProj whose working directory originally contains a file test with the content "Hello git", committed to the master branch by default. We create a new branch bugfix for our code changes, as shown in Figure 27:
Figure 27. Creating a branch
Next, we append a line "fix" to the test file and use git format-patch to generate a patch, as shown in Figure 28. In a command such as git format-patch -M master, the trailing branch name indicates which branch the patch is compared against, while the -M option itself only enables rename detection.
Figure 28. Generating a patch
As you can see, the patch file 0001-fix.patch contains a variety of information: not only the diff but also the committer, the time, and so on. On closer inspection, you will find that it is an e-mail file that can be sent as is.
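Because the patch is laid out like an e-mail message, Python's standard email module can read it. The patch text below is a hypothetical, shortened example written in the style of git format-patch output; the hash, author, and date are made up for illustration and do not come from the article's figures.

```python
from email import message_from_string

# Hypothetical patch text in the style of git format-patch output;
# the hash, author and date are invented for this illustration.
patch_text = """From 1111111111111111111111111111111111111111 Mon Sep 17 00:00:00 2001
From: A Developer <dev@example.com>
Date: Mon, 1 Aug 2011 10:00:00 +0800
Subject: [PATCH] fix

---
 test | 1 +
 1 file changed, 1 insertion(+)

diff --git a/test b/test
--- a/test
+++ b/test
@@ -1 +1,2 @@
 Hello git
+fix
"""

message = message_from_string(patch_text)
print(message["From"])      # the patch author, carried as a mail header
print(message["Subject"])   # the commit summary, carried as the mail subject

# Lines added by the diff start with "+"; "+++" marks the file header instead.
added = [
    line[1:]
    for line in message.get_payload().splitlines()
    if line.startswith("+") and not line.startswith("+++")
]
print(added)
```

The author and subject travel as ordinary mail headers, which is how this information can be recorded in the repository when the patch is applied.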
Next, you can use git am to apply the patch, as shown in Figure 29. Compared with the original test file, the patched file has one more line, "fix".
Figure 29. Applying a patch
Comparing the two ways of generating patches, the Git-specific patches produced by git format-patch are clearly less compatible than the generic patches produced by git diff. However, a Git-specific patch contains the name of the patch developer, which is recorded in the repository when the patch is applied. For this reason, open source communities that use Git generally recommend generating patches with format-patch.
Git Remote repository Operations
As mentioned earlier, Git is a distributed version control system. For any node, the Git repositories of the other nodes can serve as remote repositories for its local repository. To see which remote repositories are currently configured, use the following command:
$ git remote
After cloning a project, you will see at least one remote repository named origin; by default, Git uses this name to identify the original repository you cloned from.
When the project reaches a stage where you want to share the current results with others, you can use the git push command to push data from the local repository to the remote repository.
$ git push origin master
To fetch data from a remote repository, use the git fetch command, which retrieves all the data that the local repository does not yet have.
$ git fetch [remote-name]
If a branch is set up to track a branch of a remote repository, you can use the git pull command to fetch the data automatically and then merge the remote branch into the current branch of the local repository. From this perspective, git pull is equivalent to git fetch followed by git merge.
$ git pull [remote-name]
The relationships between the Git remote repository commands above are shown in Figure 30. To learn more about remote repository commands, such as deleting and renaming remotes, refer to Git operation guides.
Figure 30. Git remote repository operations
CVS migration to Git
Users who want to migrate from CVS to Git can use the git cvsimport tool, provided that the related tool git-cvs or cvsps is installed.
The git-cvs package can be installed with the yum or apt-get command. With yum, the installation command is as follows:
$ yum install git-cvs
If Git was compiled and installed from source, you need to install cvsps, available at http://www.cobite.com/cvsps/:
$ tar -zxvf cvsps-2.1.tar.gz
$ cd cvsps-2.1
$ make && make install
As an example, we create a new directory jt400.cvs and import into Git the source of Toolbox for Java/JTOpen, the SourceForge-hosted CVS project mentioned at the beginning of the article. The procedure is as follows:
$ mkdir jt400.cvs
$ cd jt400.cvs
$ export CVSROOT=:pserver:[email protected]:/cvsroot/jt400
$ cvs login
$ git cvsimport -C src src
Here, -C src names the directory of the Git repository to be created, and the final src is the CVS module to be imported.
SVN migration to Git
Similarly, Git provides the git svn tools for migrating SVN projects to Git, provided that the subversion-perl package is installed.
$ yum install subversion-perl
As an example, we create a new directory photon-android.svn and import into Git the Google Code hosted SVN project photon-android. The procedure is as follows:
$ mkdir photon-android.svn
$ cd photon-android.svn
$ git svn clone http://photon-android.googlecode.com/svn/
Summary
This article has systematically introduced the distributed version control tool Git, including why to choose Git, how to install it, how it works, how to use it, and how to migrate from CVS and SVN to Git. For more comprehensive coverage of Git, see the documentation at https://github.com/progit/progit.
Open source Distributed version control tool--git Tour