Git Adventures (4): The Story Behind the Index and Commits

If you have read the first three articles in the Git Adventures series, you probably already know how to use the git add and git commit commands: the first saves files to the index in preparation for the next commit, and the second creates a new commit. What you may not know are the interesting details of what happens behind the scenes. Let me walk through them one by one.

The Git index is a staging area that sits between your working directory (working tree) and the project repository. With it, you can group many changes together into a single commit. When you create a commit, what gets committed is normally the content of the staging area, not the content of the working directory.

Files in a Git project fall roughly into two categories, and the second is further divided into three:

  1. Untracked files
  2. Tracked files
    1. Modified but not staged (changed but not updated, or modified)
    2. Staged and ready to commit (changes to be committed, or staged)
    3. Unmodified since the last commit (clean, or unmodified)
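These states can also be read off mechanically. The sketch below assumes git is installed and builds a throwaway repository (file names and commit identity are hypothetical); it maps the two-letter codes of `git status --porcelain` (X = index column, Y = work-tree column) onto the categories above. It is an illustration, not a complete decoder of every status code.

```shell
#!/bin/sh
# Minimal sketch: map `git status --porcelain` codes onto the file states above.
# Assumes git is installed; repository, file names and identity are hypothetical.
demo=$(mktemp -d) && cd "$demo" || exit 1
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"

echo "hello,world" > clean.txt
echo "hello,world" > edited.txt
git add clean.txt edited.txt
git commit -q -m "first commit"

echo "new" > untracked.txt                         # never added
echo "staged" > staged.txt && git add staged.txt   # new file, staged
echo "more" >> edited.txt                          # tracked, modified, not staged

# Porcelain v1 lines look like "XY path". Clean (unmodified) files are not listed at all.
classify() {
  git status --porcelain | while IFS= read -r line; do
    path=$(printf '%s\n' "$line" | cut -c4-)
    case "$line" in
      '??'*) echo "untracked: $path" ;;
      ?' '*) echo "staged: $path" ;;                  # change only in the index column
      ' '?*) echo "modified, not staged: $path" ;;    # change only in the work-tree column
      *)     echo "staged, then modified again: $path" ;;
    esac
  done
}
classify
```

Note that `clean.txt` never appears in the output: Git only reports files whose state differs from the last commit.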

That's a lot of rules. As usual, let's build a test Git project to try them out:

Let's first create an empty project:

$ rm -rf stage_proj
$ mkdir stage_proj
$ cd stage_proj
$ git init
Initialized empty Git repository in /home/test/work/test_stage_proj/.git/

Next, create a file whose content is "hello,world":

$ echo "hello,world" > readme.txt

Now let's look at the status of the working directory. readme.txt is listed as untracked (untracked file):

$ git status
# On branch master
#
# Initial commit
#
# Untracked files:
#   (use "git add <file>..." to include in what will be committed)
#
#   readme.txt
nothing added to commit but untracked files present (use "git add" to track)

Add readme.txt to the staging area:

$ git add readme.txt

Now let's take a look at the status of the current working directory:

$ git status
# On branch master
#
# Initial commit
#
# Changes to be committed:
#   (use "git rm --cached <file>..." to unstage)
#
#   new file:   readme.txt
#

The status of readme.txt has changed to staged (changes to be committed), which means we could now run git commit directly to commit the file to the local repository.

The staging area is stored in the index file (.git/index) inside the Git directory. The index is a binary file that records information about the currently staged content: the path of each staged file, the SHA-1 hash of its content, its file mode (permissions), and so on. Entries in the index are sorted by path name.
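To make the "binary file" description concrete, here is a small sketch that decodes only the 12-byte header of `.git/index`. It assumes the layout given in Git's index-format documentation (the signature `DIRC`, then a 4-byte version number and a 4-byte entry count, both big-endian) and a POSIX shell with `od` and `dd`; the repository and file names are hypothetical. The per-entry records after the header are not parsed.

```shell
#!/bin/sh
# Minimal sketch: decode the 12-byte header at the start of .git/index.
# Assumed layout: "DIRC" signature, 4-byte version, 4-byte entry count (big-endian).
demo=$(mktemp -d) && cd "$demo" || exit 1
git init -q
echo "hello,world" > readme.txt
echo "bye" > other.txt
git add readme.txt other.txt

# be32 OFFSET: read 4 bytes of .git/index at OFFSET as a big-endian unsigned integer.
be32() {
  set -- $(od -An -tu1 -j "$1" -N 4 .git/index)
  echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
}

signature=$(dd if=.git/index bs=4 count=1 2>/dev/null)
version=$(be32 4)
entries=$(be32 8)
echo "signature=$signature version=$version entries=$entries"

# Git's own view of the same entries, sorted by path name:
git ls-files --stage
```

The version is usually 2, but configuration such as `index.version` can change it, so the sketch does not hard-code it.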

I don't want to commit the file right away, though; first I want to see what is in the staging area. The git ls-files command shows it:

$ git ls-files --stage
100644 2d832d9044c698081e59c322d5a2a459da546469 0   readme.txt

If you read the "Ding Jie Niu" (dissecting Git's internals) part of the previous article, you will notice a new file in the Git directory, ".git/objects/2d/832d9044c698081e59c322d5a2a459da546469"; running "git cat-file -p 2d832d" shows that its content is "hello,world". So when git add stages a file, it not only adds an entry for it to the index file (.git/index) but first saves the file's content into the Git directory as an object.
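The object name is not arbitrary: it is the SHA-1 of a short header (`blob`, a space, the content size in bytes, a NUL byte) followed by the file content. The sketch below recomputes that name by hand and then locates the loose object file that `git add` wrote. It assumes a repository using the default SHA-1 object format and GNU coreutils `sha1sum` (on macOS, `shasum` plays the same role); the demo repository is hypothetical.

```shell
#!/bin/sh
# Minimal sketch: recompute a blob's object name and find where `git add` stored it.
# Assumes git, GNU coreutils sha1sum, and the default SHA-1 object format.
demo=$(mktemp -d) && cd "$demo" || exit 1
git init -q
echo "hello,world" > readme.txt
git add readme.txt

# Object name = SHA-1 of "blob <size>" + NUL byte + raw file content.
size=$(wc -c < readme.txt | tr -d ' ')
manual=$( { printf 'blob %s\0' "$size"; cat readme.txt; } | sha1sum | cut -d' ' -f1 )

# `git ls-files --stage` prints "<mode> <object> <stage>\t<path>".
staged=$(git ls-files --stage readme.txt | cut -d' ' -f2)
echo "manual: $manual"
echo "staged: $staged"

# Loose objects live at .git/objects/<first 2 hex digits>/<remaining 38>.
dir=$(printf '%s' "$staged" | cut -c1-2)
file=$(printf '%s' "$staged" | cut -c3-)
ls ".git/objects/$dir/$file"
git cat-file -t "$staged"   # object type
git cat-file -p "$staged"   # object content
```

In the article's run the name came out as 2d832d9044c698081e59c322d5a2a459da546469. The object file itself is zlib-compressed, which is why it is read through `git cat-file` rather than `cat`.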

If you accidentally stage unwanted files with git add, you can run "git rm --cached FILENAME" to remove them from the staging area.
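A short sketch of that rescue, in a throwaway repository with a hypothetical stray file `notes.tmp`: `--cached` removes the entry from the index but leaves the file in the working directory, where it becomes untracked again.

```shell
#!/bin/sh
# Minimal sketch: undo an accidental `git add` before the first commit.
# Assumes git is installed; notes.tmp is a hypothetical stray file.
demo=$(mktemp -d) && cd "$demo" || exit 1
git init -q
echo "hello,world" > readme.txt
echo "scratch" > notes.tmp
git add readme.txt notes.tmp     # notes.tmp was added by mistake
git ls-files                     # notes.tmp and readme.txt are both staged

git rm -q --cached notes.tmp     # drop it from the index only
git ls-files                     # only readme.txt remains staged
ls notes.tmp                     # the file is still in the working directory
git status --porcelain           # notes.tmp is reported as untracked again
```

Once the repository has a commit, `git reset HEAD -- <file>` (or `git restore --staged <file>` in Git 2.23 and later) is the usual way to unstage a change to an already tracked file; `git rm --cached` on such a file would instead stage its removal.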

Now modify readme.txt:

$ echo "hello,world2" >> readme.txt

Then check the status of the staging area again:

$ git status
# On branch master
#
# Initial commit
#
# Changes to be committed:
#   (use "git rm --cached <file>..." to unstage)
#
#   new file:   readme.txt
#
# Changed but not updated:
#   (use "git add <file>..." to update what will be committed)
#   (use "git checkout -- <file>..." to discard changes in working directory)
#
#   modified:   readme.txt
#

The output now contains an extra section: "Changed but not updated: ... modified: readme.txt". This may look odd. readme.txt was already added to the staging area, so why does Git report that it has changes that are not staged (changed but not updated)? Is Git wrong?

Git is right. Every time you run git add on a file, Git computes the SHA-1 hash of the file's content, stores the content in the Git directory as an object, and records that hash in the file's entry in the index. If the file is modified after the last git add, then when you run git status, Git notices that the content in the working tree no longer matches the hash recorded in the index. That is why readme.txt appears in two states at once: modified but not staged (changed but not updated), and staged and ready to commit (changes to be committed). If we committed now, only the content staged by the first git add would be committed.
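The two hashes can be compared directly. This sketch (throwaway repository, git assumed installed) repeats the steps above, then contrasts the blob name recorded in the index with `git hash-object`, which hashes the current file without storing it, and shows the two diffs behind the two status sections.

```shell
#!/bin/sh
# Minimal sketch: the index and the work tree now hold different versions of readme.txt.
demo=$(mktemp -d) && cd "$demo" || exit 1
git init -q
echo "hello,world" > readme.txt
git add readme.txt
echo "hello,world2" >> readme.txt

in_index=$(git ls-files --stage readme.txt | cut -d' ' -f2)  # name recorded by git add
in_tree=$(git hash-object readme.txt)                        # name of the current file (not written)
echo "index: $in_index"
echo "tree:  $in_tree"

git diff                      # work tree vs index: the unstaged "hello,world2" line
git diff --cached             # index vs HEAD (no commit yet): the staged file
git cat-file -p "$in_index"   # what a commit made right now would contain
```

`git status --porcelain` summarizes the same situation as `AM readme.txt`: added in the index, modified in the work tree.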

I'm not happy with the "hello,world2" change, so I discard it with git checkout:

$ git checkout -- readme.txt
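Note that `git checkout -- <file>` copies the file back from the index, not from a commit; at this point no commit exists yet. A minimal sketch in a throwaway repository (git assumed installed):

```shell
#!/bin/sh
# Minimal sketch: `git checkout -- <file>` restores the staged version over the work tree.
demo=$(mktemp -d) && cd "$demo" || exit 1
git init -q
echo "hello,world" > readme.txt
git add readme.txt                 # the index now holds "hello,world"
echo "hello,world2" >> readme.txt  # unwanted edit, never staged

git checkout -- readme.txt         # restore from the index (there is no commit yet)
cat readme.txt
git status --porcelain
```

On Git 2.23 and later, `git restore readme.txt` does the same thing, since it restores the working tree from the index by default.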

Now let's check the status of the working directory again:

$ git status
# On branch master
#
# Initial commit
#
# Changes to be committed:
#   (use "git rm --cached <file>..." to unstage)
#
#   new file:   readme.txt
#

Now that the project is back in the state I want, I commit it with git commit:

$ git commit -m "project init"
[master (root-commit) 6cdae57] project init
 1 files changed, 1 insertions(+), 0 deletions(-)
 create mode 100644 readme.txt

Now let's take a look at the working directory status:

$ git status
# On branch master
nothing to commit (working directory clean)

Git reports "nothing to commit (working directory clean)". A working tree whose changes have all been committed to the current HEAD is called clean; otherwise it is called dirty.
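A script can make the same clean/dirty decision. The helper `is_clean` below is hypothetical and treats a working tree as clean only when `git status --porcelain` prints nothing, so untracked files also count as dirty; adjust that if your definition differs. The repository and identity are throwaway.

```shell
#!/bin/sh
# Minimal sketch: report whether the work tree is clean or dirty.
# is_clean is a hypothetical helper: clean = no staged, unstaged or untracked changes.
is_clean() {
  [ -z "$(git status --porcelain)" ]
}

demo=$(mktemp -d) && cd "$demo" || exit 1
git init -q
git config user.name "Demo User"          # hypothetical identity for the demo commit
git config user.email "demo@example.com"
echo "hello,world" > readme.txt
git add readme.txt
git commit -q -m "project init"

if is_clean; then echo "clean"; else echo "dirty"; fi
echo "hello,world2" >> readme.txt
if is_clean; then echo "clean"; else echo "dirty"; fi
```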

SHA-1 content addressing

As one article on Git puts it, Git is "a totally new way to operate on data". Every object Git manages (blob, tree, commit, tag, ...) is named by the SHA-1 hash computed from its content. Given current mathematical knowledge, if two pieces of data have the same SHA-1 hash, we can treat them as identical. This brings several benefits:

  1. Git can tell whether two objects have the same content simply by comparing their names.
  2. Because object names are computed the same way in every repository, identical content stored in two different repositories has the same object name.
  3. Git can check whether an object's content is intact by verifying that the SHA-1 hash of the content matches the object's name.
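The sketch below checks all three points with `git hash-object`, using two unrelated throwaway repositories (the directory names `repo_a` and `repo_b` are hypothetical). It assumes both use the default SHA-1 object format.

```shell
#!/bin/sh
# Minimal sketch of the three benefits, using two unrelated throwaway repositories.
base=$(mktemp -d) || exit 1
git init -q "$base/repo_a"
git init -q "$base/repo_b"

# Benefit 2: the same content gets the same object name in both repositories.
id_a=$(echo "hello,world" | git -C "$base/repo_a" hash-object -w --stdin)
id_b=$(echo "hello,world" | git -C "$base/repo_b" hash-object -w --stdin)
echo "repo_a: $id_a"
echo "repo_b: $id_b"

# Benefit 1: comparing contents reduces to comparing names.
[ "$id_a" = "$id_b" ] && echo "same content"

# Benefit 3: re-hash the stored content and compare the result with the object's name.
check=$(git -C "$base/repo_a" cat-file blob "$id_a" | git -C "$base/repo_a" hash-object --stdin)
[ "$check" = "$id_a" ] && echo "object verified"
```

`git fsck` applies the same kind of check to every object in a repository.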

Let's verify this with an example: create a file named readme2.txt with exactly the same content as readme.txt, and commit it to the local repository:

$ echo "hello,world" > readme2.txt
$ git add readme2.txt
$ git commit -m "add new file: readme2.txt"
[master 6200c2c] add new file: readme2.txt
 1 files changed, 1 insertions(+), 0 deletions(-)
 create mode 100644 readme2.txt

The following somewhat convoluted command lists the blob objects in the current commit (HEAD):

$ git cat-file -p HEAD | head -n 1 | cut -b6-15 | xargs git cat-file -p
100644 blob 2d832d9044c698081e59c322d5a2a459da546469    readme.txt
100644 blob 2d832d9044c698081e59c322d5a2a459da546469    readme2.txt

And here are the blob objects in the previous commit (HEAD^):

$ git cat-file -p HEAD^ | head -n 1 | cut -b6-15 | xargs git cat-file -p
100644 blob 2d832d9044c698081e59c322d5a2a459da546469    readme.txt

Clearly, although the current commit contains one more file than the previous one, all of these entries point to the same blob object, "2d832d9".
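The pipeline above pulls an abbreviated tree name out of the commit's first line with `cut -b6-15`. Here is a sketch of more direct alternatives, rebuilt in a throwaway repository (identity and paths are hypothetical): `git ls-tree` lists a commit's tree, and `<rev>:<path>` resolves straight to one file's blob.

```shell
#!/bin/sh
# Minimal sketch: list the blobs of HEAD and HEAD^ without parsing the commit by hand.
demo=$(mktemp -d) && cd "$demo" || exit 1
git init -q
git config user.name "Demo User"          # hypothetical identity for the demo commits
git config user.email "demo@example.com"
echo "hello,world" > readme.txt
git add readme.txt && git commit -q -m "project init"
echo "hello,world" > readme2.txt
git add readme2.txt && git commit -q -m "add new file: readme2.txt"

git ls-tree HEAD        # readme.txt and readme2.txt
git ls-tree "HEAD^"     # readme.txt only

# <rev>:<path> resolves directly to the blob of one file in one commit.
a=$(git rev-parse HEAD:readme.txt)
b=$(git rev-parse HEAD:readme2.txt)
old=$(git rev-parse "HEAD^:readme.txt")
[ "$a" = "$b" ] && [ "$a" = "$old" ] && echo "one blob shared: $a"
```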

No deltas, just snapshots

Git differs greatly from most of the version control systems you may be familiar with, such as Subversion, CVS, and Perforce. Those traditional systems use delta storage, recording the differences between successive commits. Git, by contrast, stores the complete content (a snapshot) of every commit. Before committing, Git computes the SHA-1 hash of each piece of content as its object name and checks whether an object with that name already exists in the repository. If it does not, Git creates the corresponding object under .git/objects; if it does, the existing object is reused, which saves space.
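Object reuse can be observed by counting loose objects. This sketch assumes no `git gc` runs in between (so `git count-objects`, which counts only unpacked objects, sees everything) and uses a throwaway repository with a hypothetical identity.

```shell
#!/bin/sh
# Minimal sketch: committing content that already exists creates no new blob.
demo=$(mktemp -d) && cd "$demo" || exit 1
git init -q
git config user.name "Demo User"          # hypothetical identity for the demo commits
git config user.email "demo@example.com"

# Number of loose objects; `git count-objects` prints "N objects, M kilobytes".
loose() { git count-objects | cut -d' ' -f1; }

echo "hello,world" > readme.txt
git add readme.txt && git commit -q -m "project init"
before=$(loose)     # one blob, one tree, one commit

echo "hello,world" > readme2.txt
git add readme2.txt && git commit -q -m "add new file: readme2.txt"
after=$(loose)      # plus a new tree and a new commit; the blob is reused
echo "before=$before after=$after"
```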

Next, let's test whether Git really stores committed content as snapshots.

First append some content to readme.txt, stage it, and commit it to the local repository:

$ echo "hello,world2" >> readme.txt
$ git add readme.txt
$ git commit -m "add new content for readme.txt"
[master c26c2e7] add new content for readme.txt
 1 files changed, 1 insertions(+), 0 deletions(-)

Now let's take a look at what blob objects are included in the current version:

$ git cat-file -p HEAD | head -n 1 | cut -b6-15 | xargs git cat-file -p
100644 blob 2e4e85a61968db0c9ac294f76de70575a62822e1    readme.txt
100644 blob 2d832d9044c698081e59c322d5a2a459da546469    readme2.txt

The output shows that readme.txt now corresponds to a new blob object, "2e4e85a", whereas in the previous version it corresponded to "2d832d9". Let's check whether the contents of the two blobs are what we expect:

$ git cat-file -p 2e4e85a
hello,world
hello,world2
$ git cat-file -p 2d832d9
hello,world

Each committed version of the file is stored in full, as a snapshot.
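One more way to see the snapshots: `git cat-file -s` reports an object's size in bytes. In the sketch below (throwaway repository, hypothetical identity), the second blob's size covers the whole file, not just the appended line.

```shell
#!/bin/sh
# Minimal sketch: each version of readme.txt is a complete blob, not a diff.
demo=$(mktemp -d) && cd "$demo" || exit 1
git init -q
git config user.name "Demo User"          # hypothetical identity for the demo commits
git config user.email "demo@example.com"
echo "hello,world" > readme.txt
git add readme.txt && git commit -q -m "project init"
echo "hello,world2" >> readme.txt
git add readme.txt && git commit -q -m "add new content for readme.txt"

old=$(git rev-parse "HEAD^:readme.txt")
new=$(git rev-parse HEAD:readme.txt)
git cat-file -s "$old"          # size of "hello,world\n"
git cat-file -s "$new"          # size of both lines, not only the 13-byte addition
git show "HEAD^:readme.txt"     # the old snapshot, read directly from its own blob
```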

Summary

Git's internals differ fundamentally from those of traditional version control systems (VCS), so it is not surprising that the add operation means something different in Git than in other VCSs: git add not only puts untracked files under version control, but also stages modified files in the index.

At the same time, content addressing by SHA-1 hash and snapshot storage help make Git a very fast version control system.
