An expert once said that the greatest benefit of version control is that you can always take back what you regret. Git is undoubtedly the leader among version control systems and is especially popular in the open-source community. What makes it different from other version control systems? Let's take a closer look.
Brief History
Like many great things in life, Git was born in a time of fierce controversy and rapid innovation. The Linux kernel is an open-source project with a very large number of contributors. For most of its early life (1991-2002), changes to the kernel were passed around as patches and archived files. In 2002 the project began using the distributed version control system BitKeeper to manage and maintain the code. In 2005 the relationship between the commercial company that developed BitKeeper and the Linux kernel community broke down, and the company revoked the community's right to use BitKeeper for free. This pushed the Linux community, and especially Linux creator Linus Torvalds, to learn from the experience and build a version control system of their own. They set several goals for the new system:
- Speed
- A simple design
- Strong support for non-linear development (thousands of parallel branches)
- Fully distributed
- Able to handle very large projects like the Linux kernel efficiently (in both speed and data volume)
Since its birth in 2005, Git has matured into a system that is easy to use while keeping those original goals. It is fast, well suited to managing large projects, and has a remarkably capable branching system for non-linear development that can handle all kinds of complex workflows.
Centralized and distributed
Git's creation had a lot to do with Linus's dislike of CVS and SVN. CVS and SVN are centralized version control systems, while Git is a distributed one. So what is the difference between centralized and distributed?
Take centralized version control first. Many people have used SVN: it has a central server that holds the repository. During development, everyone fetches the latest code from that server, does their own work, and then commits their changes back to it. Everyone commits to this one server, and the server manages all the versions. The drawback is that everything depends on the central server. If the network is slow or down, or the server itself fails, version control becomes almost impossible.
A distributed version control system is much safer than a centralized one, because every developer's computer holds a complete copy of the repository. Even on a plane or a train you can commit as often as you like, and then push your commits to the remote repository once you have a network connection again.
The main difference between Git and other version control systems is how they think about data. Most other systems (CVS, Subversion, Perforce, Bazaar, and so on) store information as a list of changes: for each version they record which files changed and which lines within them changed. Git does not store these before-and-after differences. Instead, it treats its data more like a series of snapshots of a small file system. Every time you commit, Git takes a snapshot of what all your files look like at that moment and stores a reference to that snapshot. For efficiency, if a file has not changed, Git does not store it again; it just keeps a link to the identical file it has already stored.
Nearly every operation in Git needs only local files and resources, with no network connection. Because the entire history of the project is on your local disk, most operations are very fast.
Everything in Git is checksummed before it is stored, and that checksum is then used as the unique identifier and index for the data. Git computes the checksum with the SHA-1 algorithm, producing a hash from the contents of a file or the structure of a directory. The result is a string of 40 hexadecimal characters (0-9 and a-f) that looks like this:
24b9da6552252987aa493b52f8696cd6d3b00373
Git relies on these hashes everywhere, so you will see them often. In fact, Git stores everything in its database by the hash of its contents, not by file name.
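To make the checksum idea concrete, here is a minimal sketch of how Git computes the ID of a file's contents (a "blob"). It assumes a repository using the default SHA-1 object format. The result should match what `git hash-object <file>` prints for the same file. The function name `git_blob_id` is our own. Only blobs are covered here. Directories (trees) and commits are hashed in the same spirit but with a different header and body.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Return the SHA-1 object ID Git assigns to a file's raw contents.

    Git hashes a short header ("blob <size in bytes>" followed by a NUL byte)
    together with the contents, so identical contents always get the same ID,
    no matter what the file is called.
    """
    header = f"blob {len(content)}\0".encode("ascii")
    return hashlib.sha1(header + content).hexdigest()


if __name__ == "__main__":
    print(git_blob_id(b"hello world\n"))  # 3b18e512dba79e4c8300dd08aeb37f8e728b8dad
    print(git_blob_id(b""))               # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
```

Because the name is not part of the hash, two files with the same contents map to the same object. This is how Git avoids storing unchanged data twice.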
The Three States of a File
A file managed by Git is always in one of three states:
- Committed: the data is safely stored in your local database.
- Modified: the file has been changed but the change has not been committed yet.
- Staged: a modified file has been marked to go into the next commit.
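One rough way to picture these states is to compare three copies of the same file: the one in the last commit, the one in the staging area, and the one in the working directory. The function below is a hypothetical simplification (`file_state` is not part of Git). Real `git status` can list a file as both staged and modified at once, whereas this sketch reports only the first difference it finds.

```python
def file_state(head: bytes, index: bytes, work: bytes) -> str:
    """Classify one tracked file by comparing its three copies (hypothetical helper).

    head  -- contents in the last commit
    index -- contents recorded in the staging area
    work  -- contents currently in the working directory
    """
    if work != index:
        return "modified"   # changed since it was last staged
    if index != head:
        return "staged"     # staged change that has not been committed yet
    return "committed"      # all three copies agree


if __name__ == "__main__":
    print(file_state(b"v1", b"v1", b"v1"))  # committed
    print(file_state(b"v1", b"v1", b"v2"))  # modified
    print(file_state(b"v1", b"v2", b"v2"))  # staged
```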
These three states correspond to the three areas a file moves through in a Git project: the working directory, the staging area, and the local repository (the Git directory).
Every project has a Git directory, where Git stores its metadata and object database. This is the most important part of Git: when you clone a repository from another computer, this directory is what actually gets copied.
The basic Git workflow is as follows:
- Modify some files in the working directory.
- Stage the modified files, adding snapshots of them to the staging area.
- Commit, which takes the snapshots in the staging area and stores them permanently in the Git directory.
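The workflow above can be modelled with a toy, in-memory repository. Everything here (`ToyRepo`, its fields and methods) is hypothetical and far simpler than Git's real storage. It has no tree or commit objects, no parent links, and no author data. It does illustrate two ideas from this article. First, `add` copies a file's contents into the object database and records its hash in the staging area. Second, `commit` records the whole staging area as a snapshot, so an unchanged file is never stored twice.

```python
import hashlib


class ToyRepo:
    """Hypothetical in-memory model of Git's three areas (not Git's real format)."""

    def __init__(self) -> None:
        self.working: dict[str, bytes] = {}      # working directory: path -> contents
        self.index: dict[str, str] = {}          # staging area: path -> blob ID
        self.objects: dict[str, bytes] = {}      # object database: blob ID -> contents
        self.commits: list[dict[str, str]] = []  # each commit is a full snapshot

    def add(self, path: str) -> None:
        """Stage a file: store its contents as a blob and record the ID in the index."""
        data = self.working[path]
        blob_id = hashlib.sha1(f"blob {len(data)}\0".encode() + data).hexdigest()
        self.objects.setdefault(blob_id, data)   # identical contents are stored only once
        self.index[path] = blob_id

    def commit(self) -> dict[str, str]:
        """Record the current staging area as a new snapshot of the whole project."""
        snapshot = dict(self.index)
        self.commits.append(snapshot)
        return snapshot


if __name__ == "__main__":
    repo = ToyRepo()
    repo.working["a.txt"] = b"one\n"
    repo.working["b.txt"] = b"two\n"
    repo.add("a.txt")
    repo.add("b.txt")
    repo.commit()
    repo.working["a.txt"] = b"one, edited\n"  # 1. modify a file in the working directory
    repo.add("a.txt")                         # 2. stage its new snapshot
    repo.commit()                             # 3. commit the staged snapshot
    print(len(repo.objects))  # 3: b.txt's unchanged blob is reused, not stored again
```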
Reference: http://git-scm.com/book/zh/ch1-2.html
Git Learning Notes (i)