Original URL: http://blog.jobbole.com/25775/
Original: "Pro Git"
Getting Started
This chapter covers what you need to know before you start using Git. We begin with some background on version control tools, then get Git running on your system, and finally configure it so that it works properly. By the end of the chapter you should understand why Git is so popular and why you may want to start using it.
1.1 About version control
What is version control, and do you really need it? Version control is a system that records changes to a file or set of files over time so that you can recall specific revisions later. The examples in this book version only text files containing software source code, but in fact you can version almost any type of file.
If you are a graphic or web designer, you may want to keep every revision of an image or page layout file (very likely a feature you would be glad to have). Using a version control system (VCS) is a wise choice. With it, you can revert a file to a previous state, or even roll the entire project back to a point in the past. You can compare changes in detail to find out who last modified something that may be causing a problem, who introduced an issue and when, and so on. Using a VCS also means that if you make a mess of things or delete files from the project, you can easily recover them. And all of this costs very little extra work.
Local version control system
Many people keep different versions by copying the entire project directory, perhaps renaming the copy with a backup timestamp to tell the versions apart. The only benefit of doing so is simplicity. The disadvantages are many: it is easy to confuse which directory you are working in, and once you overwrite or write to the wrong file, the lost data cannot be recovered.
To solve this problem, local version control systems were developed long ago. Most of them use a simple database to record the changes made to files (see Figure 1-1).
Figure 1-1. Local version control system
One of the most popular of these is RCS, which is still found on many computer systems today. Even on the popular Mac OS X, the rcs command is available once you install the Developer Tools. It works by saving and managing file patches. A patch is a text file in a specific format that records the differences in content between one revision of a file and the next. From these patches, RCS can work out the contents of any version of a file by applying them one after another.
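The patch idea can be tried out with the standard diff and patch utilities. This is only a rough sketch of the principle, not RCS itself: RCS keeps its deltas in its own file format, and the file names below are made up for illustration.

```shell
# Work in a throwaway directory (hypothetical file names).
work=$(mktemp -d)
cd "$work" || exit 1

# Two revisions of the same file.
printf 'line one\nline two\n' > v1.txt
printf 'line one\nline 2\nline three\n' > v2.txt

# Record only the change between the two revisions.
# diff exits with status 1 when the files differ, which is expected here.
diff -u v1.txt v2.txt > v1-to-v2.patch || true

# Rebuild revision 2 from revision 1 plus the patch.
cp v1.txt rebuilt.txt
patch -s rebuilt.txt < v1-to-v2.patch

# cmp -s is silent and succeeds only if the two files are identical.
if cmp -s rebuilt.txt v2.txt; then result=identical; else result=different; fi
echo "$result"
```

Storing one full revision plus a chain of small patches is what lets a patch-based system rebuild any version on demand.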
Centralized version control system
The next question is how to let developers on different systems work together. This is what centralized version control systems (CVCS) were created for. Such systems, including CVS, Subversion, and Perforce, have a single, centrally managed server that holds all revisions of all files, while collaborators connect to it through a client to check out the latest files or commit updates. For many years this has been the standard practice in version control (see Figure 1-2).
Figure 1-2. Centralized version control system
This approach brings many benefits, especially compared with the older local VCS. Everyone can see, to some extent, what the other people on the project are doing. Administrators can easily control what each developer is allowed to do, and managing a CVCS is far easier than maintaining a local database on every client.
Everything has two sides, though. The most obvious drawback is the single point of failure that the central server represents. If the server is down for an hour, nobody can commit updates or collaborate during that hour. If the central server's disk fails and there is no backup, or the backup is not recent enough, data will be lost. In the worst case, the entire history of the project is lost, apart from whatever snapshots people happen to have checked out on their clients, and even then there is no guarantee that all the data had been checked out beforehand. Local version control systems have the same problem: as long as the entire project history is kept in a single place, there is a risk of losing all of it.
Distributed version control system
This is where distributed version control systems (DVCS) come in. In such systems, including Git, Mercurial, Bazaar, and Darcs, the client does not just check out the latest snapshot of the files; it fully mirrors the original repository. As a result, if any server used for collaboration fails, it can be restored afterwards from any of the mirrored local repositories, because every checkout is in effect a full backup of the repository (see Figure 1-3).
Figure 1-3. Distributed version control system
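The "every clone is a full backup" claim can be checked with a small local experiment. The sketch below is only an illustration under simple assumptions: the "server" is just a directory on the same disk, and the names server, laptop, and notes.txt are hypothetical.

```shell
base=$(mktemp -d)
cd "$base" || exit 1

# A stand-in for the shared server repository, with two commits.
git init -q server
cd server || exit 1
git config user.name "Example User"
git config user.email "user@example.com"
echo "first" > notes.txt
git add notes.txt
git commit -q -m "first commit"
echo "second" >> notes.txt
git commit -q -am "second commit"
cd "$base" || exit 1

# A collaborator clones it; the clone carries the complete history.
git clone -q server laptop

# Simulate losing the server.
rm -rf server

# The history is still fully available from the clone.
commits_in_clone=$(git -C laptop rev-list --count HEAD)
echo "$commits_in_clone"
```

Both commits survive in the clone, so a new server could be set up again from it.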
Furthermore, many of these systems can work with several different remote repositories at once. This lets you collaborate with different groups of people in different ways within the same project. You can set up whatever workflow you need, such as a hierarchical model, which is not possible in the earlier centralized systems.
1.2 A brief history of Git
Like many great things in life, Git was born out of a period of heated controversy and major innovation. The Linux kernel is an open source project with a very large number of participants. For most of the kernel's maintenance history (1991-2002), changes were passed around as patches and archived files. In 2002, the project began using the proprietary distributed version control system BitKeeper to manage and maintain the code.
In 2005, the relationship between the commercial company that developed BitKeeper and the Linux kernel open source community broke down, and the company withdrew the right to use BitKeeper free of charge. This pushed the Linux community (and Linux creator Linus Torvalds in particular) to draw on the lessons learned and develop a version control system of their own, so that the same thing could not happen again. They set a number of goals for the new system:
* Speed
* Simple design
* Strong support for non-linear development patterns (allow thousands of parallel development branches)
* Fully distributed
* Ability to efficiently manage hyper-scale projects like the Linux kernel (speed and data volume)
Since its birth in 2005, Git has matured and become easy to use while keeping to its original goals. It is fast, well suited to managing large projects, and has a remarkably capable branching system for non-linear development (see Chapter 3) that can cope with complex project needs.
1.3 Git Basics
So what is Git, in a nutshell? This section is important: if you understand the ideas behind Git and the basic principles of how it works, using it effectively will come much more easily. When you start learning Git, try not to map its concepts onto those of other version control systems (such as Subversion and Perforce), or you may confuse what each operation actually does. Although many of the commands look similar, Git stores and handles information quite differently from other systems. Understanding these differences will help you use the tools Git provides accurately.
Direct recording of snapshots, rather than differential comparisons
The main difference between Git and other version control systems is the way they think about data. Git cares about the state of the file data as a whole, while most other systems care about the specific differences in file content. Systems of that kind (CVS, Subversion, Perforce, Bazaar, and so on) record which files were updated in each revision and which lines changed (see Figure 1-4).
Figure 1-4. Other systems record the specific differences of each file in each version
Git does not store its data as a list of differences between versions. Instead, it is more like taking a snapshot of all the files and recording it in a miniature file system. Every time you commit, Git records what all the files look like at that moment, identified by their fingerprints, and saves a reference to that snapshot. For efficiency, if a file has not changed, Git does not store it again but only links to the identical file it has already stored. Figure 1-5 shows how this works.
Figure 1-5. Git saves a snapshot of the file each time it is updated
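The reuse of unchanged files can be observed directly by comparing the object ids that two successive snapshots record for each file. The following is a minimal sketch; the repository, file names, and user details are hypothetical and exist only for the demonstration.

```shell
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

# First snapshot: two files.
echo "A1" > a.txt
echo "B1" > b.txt
git add a.txt b.txt
git commit -q -m "version 1"

# Second snapshot: only a.txt changes.
echo "A2" > a.txt
git commit -q -am "version 2"

# Object ids of each file as recorded in the two snapshots.
a_old=$(git rev-parse HEAD~1:a.txt)
a_new=$(git rev-parse HEAD:a.txt)
b_old=$(git rev-parse HEAD~1:b.txt)
b_new=$(git rev-parse HEAD:b.txt)

echo "a.txt: $a_old -> $a_new"   # two different ids: new content was stored
echo "b.txt: $b_old -> $b_new"   # the same id twice: the stored file is reused
```

The second snapshot still describes both files, but for b.txt it simply points at the object that was already in the database.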
This is an important difference between Git and other systems. It breaks with the conventions of traditional version control and rethinks how almost every aspect of it is implemented. Git is more like a small file system with a set of powerful tools built on top of it than a simple VCS. We will look at the benefits of this design in Chapter 3, when we discuss branching in Git.
Nearly all operations are performed locally
The vast majority of operations in Git need only local files and resources, with no network connection. With a CVCS, by contrast, almost every operation requires the network. Because Git keeps the complete history of the project on your local disk, most operations are fast.
For example, to browse the project history, Git does not have to fetch the data from a remote server; it simply reads it from the local database, so you can look through it at any time without waiting. If you want to see the changes between the current version of a file and the version from a month ago, Git looks up the month-old snapshot and compares it with the current file locally, instead of asking a remote server to do it or pulling an old version of the file from the server first.
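As a small illustration, the sketch below builds a repository that has no remote configured at all and still browses its history and compares versions. The file name and commit messages are invented for the example.

```shell
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

echo "first line" > notes.txt
git add notes.txt
git commit -q -m "add notes"
echo "second line" >> notes.txt
git commit -q -am "extend notes"

# No remote is configured, so nothing below can involve a server.
remotes=$(git remote)

# History summary, read from the local database.
history=$(git log --oneline)

# Difference between the previous version of the file and the current one.
changes=$(git diff HEAD~1 HEAD -- notes.txt)

echo "$history"
echo "$changes"
```

The same commands work identically on a cloned project while offline, because the clone holds the full history.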
With a CVCS, you can do almost nothing when there is no network or the VPN is down. With Git, even on a plane or a train, you can keep committing happily and upload to the remote repository once you have a network connection again. Likewise, if you go home and cannot get the VPN client working, you can still work. With many other version control systems this is impossible or very cumbersome. With Perforce, for example, you can do very little without a connection to the server (by default you cannot even run p4 edit file to start editing a file, because Perforce needs the network to tell the server that the file is being edited and by whom; manually changing the file permissions gets around this, but you still cannot commit the update afterwards). With Subversion or CVS you can edit files, but you cannot commit, because the database is on the server. This may not sound like a big deal, but in practice you may be pleasantly surprised by how much difference it makes.
Maintain data integrity at all times
Before anything is stored in Git, a checksum of its content is computed, and the result is used as the unique identifier and index of that data. In other words, it is impossible to change the contents of any file or directory without Git knowing about it. This property is built into the lowest levels of Git and is part of its design philosophy. If a file is corrupted in transit, or disk damage causes file data to go missing, Git will detect it immediately.
Git computes the checksum with the SHA-1 algorithm: it calculates a SHA-1 hash from the contents of a file or the structure of a directory and uses it as a fingerprint string. The string consists of 40 hexadecimal characters (0-9 and a-f) and looks like this:
24b9da6552252987aa493b52f8696cd6d3b00373
Git relies entirely on these fingerprint strings, so you will see such hash values often. In fact, everything stored in the Git database is indexed by the hash of its contents, not by file name.
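To see where such a fingerprint comes from, the sketch below hashes a small piece of content in two ways: with git hash-object, and by hand with sha1sum. It assumes Git's default SHA-1 object format and that sha1sum (from GNU coreutils) is installed; the sample text is arbitrary.

```shell
# Ask Git for the id it would give this content when stored as a file.
git_id=$(printf 'test content\n' | git hash-object --stdin)

# Reproduce the id by hand: Git hashes a short header (the word "blob",
# the content size in bytes, and a NUL byte) followed by the content.
# "test content" plus the newline is 13 bytes.
manual_id=$(printf 'blob 13\0test content\n' | sha1sum | cut -d' ' -f1)

echo "$git_id"
echo "$manual_id"
```

Both lines should show the same 40-character string. Since the id depends only on the bytes being stored, identical content always gets the same id, and any change to the content changes the id.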
Most operations only add data
Most common Git operations only add data to the database. This matters because irreversible operations, such as deleting data, make it hard to go back or to reproduce earlier versions. As in any VCS, you can lose or mess up changes that have not been committed yet, but once a snapshot has been committed to Git it is very hard to lose, especially if you push to another repository regularly.
This reliability gives us a lot of peace of mind during development: we can experiment freely without fear of losing data. How Git stores and recovers data internally is discussed in Chapter 9, on Git internals.
The three states of files
Pay attention now, because the next concept is very important. In Git, a file can be in one of only three states: committed, modified, or staged. Committed means the file is safely stored in the local database. Modified means the file has been changed but the change has not been committed yet. Staged means the modified file has been marked to go into the next commit.
This gives us the three areas that files move between when Git manages a project: the working directory, the staging area, and the local repository (the Git directory).
Figure 1-6. Working directory, staging area, and local warehouse
Every project has a Git directory (if the project was created with git clone, this is the .git directory inside it; with git clone --bare, the new directory itself is the Git directory), which is where Git stores the metadata and object database. This directory is very important: every time you clone a repository, what actually gets copied is the data in this directory.
The working directory is a checkout of all the files and directories of one version of the project, on which you then do your work. These files are extracted from the compressed object database in the Git directory so that they can be edited in the working directory.
The staging area is just a simple file, usually kept in the Git directory, that records what will go into the next commit. It is sometimes called the index, but the standard term is staging area.
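The difference between a regular clone and a bare clone, as described above, can be seen by creating one of each from a small local repository. This is a sketch only; the directory names source, regular, and bare.git are hypothetical.

```shell
base=$(mktemp -d)
cd "$base" || exit 1

# A small source repository to clone from.
git init -q source
git -C source config user.name "Example User"
git -C source config user.email "user@example.com"
echo "hello" > source/readme.txt
git -C source add readme.txt
git -C source commit -q -m "initial commit"

# Regular clone: working files plus a .git directory inside.
git clone -q source regular

# Bare clone: no working files; the directory itself is the Git directory.
git clone -q --bare source bare.git

ls -a regular    # lists readme.txt and .git
ls bare.git      # lists HEAD, config, objects, refs, and so on

regular_is_bare=$(git -C regular rev-parse --is-bare-repository)
bare_is_bare=$(git -C bare.git rev-parse --is-bare-repository)
echo "regular: $regular_is_bare, bare.git: $bare_is_bare"
```

In the regular clone, the contents of regular/.git correspond to what sits at the top level of bare.git.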
The basic Git workflow is as follows:
1. Modify some files in the working directory.
2. Take a snapshot of the modified file and save it to the staging area.
3. Commit the update to permanently dump the file snapshot saved in the staging area to the Git directory.
We can therefore tell the state of a file from where it is. If a particular version of the file is saved in the Git directory, it is committed. If it has been modified and placed in the staging area, it is staged. If it has been changed since it was last checked out but has not been placed in the staging area, it is modified. Chapter 2 goes into more detail and shows how to act on these states, and how to skip the staging area and commit directly.
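The workflow and the three states can be followed step by step with git status. The sketch below uses the machine-readable --porcelain format, in which the first column describes the staging area and the second the working directory; the file name and contents are made up.

```shell
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

echo "version one" > file.txt
git add file.txt
git commit -q -m "first version"
committed=$(git status --porcelain)        # empty: nothing differs from the last commit

echo "version two, edited" > file.txt
modified=$(git status --porcelain)         # " M file.txt": changed, not yet staged

git add file.txt
staged=$(git status --porcelain)           # "M  file.txt": snapshot is in the staging area

git commit -q -m "second version"
committed_again=$(git status --porcelain)  # empty again: the snapshot is committed

printf '[%s]\n' "$committed" "$modified" "$staged" "$committed_again"
```

Each command moves the file one step along the path from the working directory, through the staging area, into the Git directory.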
1.4 Installing Git
It is time to try Git, but first you have to install it. There are two main ways to do so: compile it from source code, or use a precompiled package for your platform.
Installing from source code
If you can, installing from source has real advantages; at the very least you get the latest version. Each release of Git tends to improve the user experience, so building the latest version yourself is worthwhile. Some Linux distributions ship packages that lag behind, so unless you are on a very recent distribution or use backports, installing from source may be the best option.
Git depends on code from libraries such as curl, zlib, openssl, expat, and libiconv, so you need to install these first. On a system with yum (such as Fedora) or apt-get (such as a Debian-based system), you can install them with one of the following commands:
$ yum install curl-devel expat-devel gettext-devel \
  openssl-devel zlib-devel
$ apt-get install libcurl4-gnutls-dev libexpat1-dev gettext \
  libz-dev libssl-dev
Then download the latest version of the source code from the official Git site:
http://git-scm.com/download
Then compile and install:
$ tar -zxf git-1.7.2.2.tar.gz
$ cd git-1.7.2.2
$ make prefix=/usr/local all
$ sudo make prefix=/usr/local install
Now you can use the git command. For example, you can clone the Git project repository itself with git clone, so that you can update it at any time in the future:
$ git clone git://git.kernel.org/pub/scm/git/git.git
Installing on Linux
If you want to install a precompiled Git binary package on Linux, you can use the package management tool that comes with your system. On Fedora, install with yum:
$ yum install git-core
On Debian-based systems such as Ubuntu, install with apt-get:
$ apt-get install git
Installing on Mac
There are two ways to install Git on a Mac. The easiest is to use the graphical Git installer (see Figure 1-7), which can be downloaded from:
http://code.google.com/p/git-osx-installer
Figure 1-7. Git OS X Installation Tool
The other way is to install through MacPorts (http://www.macports.org). If MacPorts is already installed, install Git with the following command:
$ sudo port install git-core +svn +doc +bash_completion +gitweb
This way you do not need to install the dependencies yourself; MacPorts takes care of them. The options listed above are generally sufficient. The +svn option is the one to include if you want to use Git with Subversion repositories, which is covered in Chapter 8. (Another way is to use Homebrew (https://github.com/mxcl/homebrew): brew install git.)
Installing on Windows
Installing Git on Windows is also easy. A project called msysGit provides an installer: download the EXE file from its Google Code page and run it:
http://code.google.com/p/msysgit
Once the installation is complete, you can use the command-line git tool (which comes with an SSH client), as well as a graphical tool for managing Git projects.
1.5 Configuration before running Git for the first time
On a new system, you generally need to set up your Git working environment first. You only have to do this once; the configuration is kept across upgrades. You can of course change an existing setting at any time with the same commands.
Git provides a tool called git config (it is actually the git-config command, but you can invoke it as git followed by the name), which is used to set or read configuration variables. These variables determine how Git behaves in every respect. They can be stored in three different places:
● /etc/gitconfig file: configuration that applies to every user on the system. If you pass the --system option to git config, it reads and writes this file.
● ~/.gitconfig file: configuration in the user's home directory that applies only to that user. If you pass the --global option to git config, it reads and writes this file.
● The configuration file in the Git directory of the current project (that is, .git/config in the working directory): configuration here applies only to the current project. Each level overrides the same settings from the level above it, so a variable in .git/config overrides the variable with the same name in /etc/gitconfig.
On Windows, Git looks for the .gitconfig file in the user's home directory, that is, the directory given by the $HOME variable, which is usually C:\Documents and Settings\$USER. Git also still looks for /etc/gitconfig, but relative to the directory in which Git was installed, which it treats as the root.
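The way a project-level setting overrides a user-level one can be tried without touching your real configuration. The sketch below points HOME (and XDG_CONFIG_HOME) at a temporary sandbox for the duration of the experiment; the names "Global Name" and "Project Name" are placeholders.

```shell
sandbox=$(mktemp -d)

# Point HOME and XDG_CONFIG_HOME at the sandbox so that the real
# ~/.gitconfig is not modified by this experiment.
export HOME="$sandbox/home"
export XDG_CONFIG_HOME="$HOME/.config"
mkdir -p "$HOME" "$sandbox/project"
cd "$sandbox/project" || exit 1
git init -q

# User level: written to $HOME/.gitconfig.
git config --global user.name "Global Name"
seen_before=$(git config user.name)

# Project level: written to .git/config; overrides the user level here.
git config user.name "Project Name"
seen_after=$(git config user.name)

# The user-level value is still stored, just shadowed in this project.
global_value=$(git config --global user.name)

echo "before: $seen_before, after: $seen_after, global: $global_value"
```

Outside this project the user-level value would still apply, since only this project's .git/config was changed.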
User Information
The first thing to configure is your user name and email address. These two settings matter because every Git commit uses them to record who made the change, and they are stored permanently in the history along with the commit:
$ git config --global user.name "John Doe"
$ git config --global user.email johndoe@example.com
With the --global option, the file that is changed is the one in your home directory, and all your projects will use this user information by default. If you want a different name or email address for a particular project, run the command in that project without the --global option; the new settings are saved in the project's .git/config file.
Text Editor
The next step is to set the default text editor. Git automatically opens an external text editor whenever it needs you to type in a message. By default it uses the default editor of your operating system, which is usually Vi or Vim. If you prefer something else, such as Emacs, you can change it:
$ git config --global core.editor emacs
Diff Tool
Another commonly used setting is the diff tool used to resolve merge conflicts. For example, to use vimdiff:
$ git config --global merge.tool vimdiff
Git accepts kdiff3, tkdiff, meld, xxdiff, emerge, vimdiff, gvimdiff, ecmerge, and opendiff as merge tools. You can also specify a tool of your own; see Chapter 7 for details.
View configuration information
To check the existing configuration, use the git config --list command:
$ git config --list
user.name=Scott Chacon
user.email=schacon@gmail.com
color.status=auto
color.branch=auto
color.interactive=auto
color.diff=auto
...
You may sometimes see the same variable name more than once, which means the values come from different configuration files (such as /etc/gitconfig and ~/.gitconfig). In that case Git uses the last one.
You can also check the value of a single variable by giving its name, like this:
$ git config user.name
Scott Chacon
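The remark that a variable can appear several times, with the last value winning, can be reproduced in a sandbox. As a sketch, the example below sets the same key at the user level and the project level and then lists every value Git finds; HOME is redirected to a temporary directory so that your own settings stay untouched, the names are placeholders, and the --show-origin option assumes a reasonably recent Git.

```shell
sandbox=$(mktemp -d)
export HOME="$sandbox/home"
export XDG_CONFIG_HOME="$HOME/.config"
mkdir -p "$HOME" "$sandbox/project"
cd "$sandbox/project" || exit 1
git init -q

# The same key at two levels.
git config --global user.name "Global Name"
git config user.name "Project Name"

# Every value found for the key, in the order the files are read.
all_values=$(git config --get-all user.name)

# The value Git actually uses.
effective=$(git config user.name)

# Show which file each entry comes from.
git config --list --show-origin | grep 'user.name'

echo "$all_values"
echo "effective: $effective"
```

The effective value matches the last entry in the list, which is the one from the project's own .git/config.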
1.6 Getting Help
There are three ways to get the manual page for any Git command and read about its usage:
$ git help <verb>
$ git <verb> --help
$ man git-<verb>
For example, to learn how the config command is used, run:
$ git help config
You can read this help at any time without an Internet connection. If it is not enough, you can ask for help on the #git or #github channel of the Freenode IRC server (irc.freenode.net). There are always plenty of people on these two channels, most of them know Git well, and they are happy to help.
1.7 Summary
At this point you should have a basic understanding of Git, including how it differs from the CVCS you may have used before. You should also have Git installed on your system and your name and email address configured. Next, we move on to the basics of using Git.
Git in Detail, Part 1: Getting Started with Git