Understanding Git Through Its Principles

Last Update:2016-06-12 Source: Internet

Author: User



1. What you can do with Git

Welcome to the Coding Tech Mini-Lecture. My name is Tan He, and at coding.net I am mainly responsible for developing WebIDE and CodeInsight. Today I will talk about how Git works and how to use it.

When Git comes up, the first impression is often that it is just another version control system like SVN. In fact the two differ a great deal; at the very least, SVN cannot be put to as many uses as Git. Here are a few examples.

1.1 Build a blog

Ruan Yifeng divides people who write blogs into three stages:

They use a free blogging platform, such as CSDN or Blog Park.
They find the free platform too limiting, so they buy their own domain name and hosting and build an independent blog.
They find the independent blog too troublesome to manage and would rather keep control while letting someone else run it, so that they are only responsible for writing articles.

The third stage is what Pages services offer. Many code-hosting platforms, such as Coding and GitHub, have launched Pages services that can be used to build a personal blog without complex configuration.

With Pages, you write a post in a markup language (Markdown), push it to the server, and the newly published post appears on your blog.

There is no server to manage, which lowers the barrier to building a blog, while you keep a high degree of control over customizing it.

1.2 Writing books

Many people like to write blog posts and later collect them into a book. Matrix67's "The Fun of Thinking" and Ruan Yifeng's "How to Become Thoughtful" are examples.

Writing a book is actually not that far out of reach. Why? Because there are services like GitBook.

For users of Git and Pages, GitBook is easy to pick up because it is also built on Git and Markdown.
You can gather your Markdown blog posts into a book. GitBook takes care of the typesetting, so you are only responsible for the content. Once the content is written, you immediately get four versions of the ebook: HTML, PDF, EPUB, and MOBI.

GitBook has an Explore channel that lists all public books (you can of course also search directly).

Besides writing books, you can also translate foreign-language material together with others. For example, for the Chinese edition of "The Swift Programming Language", the English original was divided into parts that participants in an open source project claimed for translation. With everyone contributing, the translation keeps up with updates to the official documentation at a very fast pace. If you like a language or technology that lacks Chinese material, you can start a similar effort to translate it.

1.3 Talent Recruitment

Recruiting through code hosting has not yet reached a significant scale, but quite a few companies already look for developers they like on code-hosting platforms such as Coding and GitHub.

Some developers have noticed this and built dedicated sites, such as githuber.cn and github-awards.com.

Take Githuber as an example. The site mainly provides two functions. The first is a star ranking: put simply, all users are grouped by language and then sorted by the number of stars.

We can easily see the users at the top of the leaderboard and their open source projects, which to some extent reflect where the language is heading. For example, I am interested in Java, and when I looked at the top 10 I found that most of them do Android development, which shows how popular Android development is.

You can of course also see your own ranking, which gives you the same pleasure as leveling up in a game.

The second function is search: enter some filters and search for programmers!

1.4 WebIDE

Coding WebIDE is an online integrated development environment (IDE) developed by Coding. As long as your project is stored on a code-hosting platform, it can be imported into WebIDE and developed online.

WebIDE also offers a WebTerminal feature that lets users operate a Docker container remotely, install whatever packages they like, and experiment freely.

All of this looks like fun, but to make the most of it you need to know Git well. Next, let's look at how Git works.

2. Git principles

Let's think about how we would design Git if we had to design it ourselves.

A traditional design is easily divided into two parts: the working directory and the remote repository.

But for a distributed version control system with clear goals, the first thing to add is a local repository.

Then we add a buffer zone between the working directory and the local repository, called the staging area.

The staging area is added for the following reasons:

It makes partial commits possible.
State files no longer need to be created in the working directory, where they would pollute it.
The staging area records file modification times and other information, which makes file comparison more efficient.

At this point there are three important areas on the local side: the working directory, the staging area, and the local repository.

Next, let's consider how the local repository stores the historical versions of a project.

2.1 Snapshots

Suppose a project has three versions. Version 1 has two files, A and B. Modifying A into A1 produces version 2, and then changing B into B1 produces version 3.

If we saved every version of the project in full to the local repository, we would need to store at least 6 files, but there are actually only 4 distinct files: A, A1, B, and B1. To save storage space, we need a way to keep only one copy of identical files. This is where the SHA-1 algorithm comes in.

You can use a git command to compute the SHA-1 value of a file.

echo 'test content' | git hash-object --stdin
d670460b4b4aece5915caf5c68d12f560a9fe3e4

SHA-1 computes a hash value from the contents of the file. Git displays it as 40 hexadecimal characters (160 bits).

SHA-1 has two notable properties:

The hash value is computed from the contents of the file.
Identical hash values mean, in practice, identical file contents.

For the content above, we get the same result no matter how many times we run the command. The SHA-1 value can therefore serve as a unique ID for a file. It also has an additional use: verifying file integrity.
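The hash from the command above can be reproduced outside Git. The following is a minimal sketch, assuming the blob hashing scheme in which Git prepends a short header ("blob", the size, and a NUL byte) to the content before applying SHA-1; the function name git_blob_sha1 is made up for this illustration.

```python
import hashlib

def git_blob_sha1(content: bytes) -> str:
    """Return the SHA-1 that Git assigns to a blob with this content."""
    # Git hashes a small header ("blob <size>\0") followed by the raw bytes.
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()

# echo appends a newline, so the hashed content is "test content\n".
print(git_blob_sha1(b"test content\n"))
# d670460b4b4aece5915caf5c68d12f560a9fe3e4
```

Because the hash depends only on the bytes, running it again on the same content always gives the same ID.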

With the help of SHA-1, we can adjust how project versions are stored.
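As a rough illustration of the adjusted storage scheme, the sketch below keeps file contents in a dictionary keyed by their hash, so identical content is stored only once. It is a toy model: the ObjectStore class is hypothetical, and it hashes the raw content directly, whereas Git also hashes a header.

```python
import hashlib

class ObjectStore:
    """Toy content-addressed store: identical content is kept only once."""

    def __init__(self):
        self.objects = {}  # hash -> content

    def put(self, content: bytes) -> str:
        # Simplified: Git additionally hashes a "blob <size>\0" header.
        key = hashlib.sha1(content).hexdigest()
        self.objects[key] = content
        return key

store = ObjectStore()
versions = [
    {"A": b"a", "B": b"b"},    # version 1
    {"A": b"a1", "B": b"b"},   # version 2: A changed to A1
    {"A": b"a1", "B": b"b1"},  # version 3: B changed to B1
]
# Each snapshot maps a file name to the hash of its content.
snapshots = [{name: store.put(data) for name, data in v.items()} for v in versions]

print(len(store.objects))  # 4 distinct contents are stored, not 6 files
```

Each version is now just a small table of names and hashes, and unchanged files cost nothing extra.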

2.1.1 What the database stores

In fact, this is now consistent with the structure that Git actually stores. We can look at the files actually stored under .git.

As we can see, many files are stored in the objects directory. Each one uses the first two characters of its SHA-1 as the folder name and the remaining 38 characters as the file name. For now, let's call these files obj files.
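The naming rule can be written down in a few lines. This is only a sketch of the layout described above (first two characters of the SHA-1 as the folder, remaining 38 as the file name); the helper name object_path is made up here, and the rule is assumed to apply to individually stored objects only.

```python
def object_path(sha1_hex: str) -> str:
    """Map a 40-character SHA-1 to the file that holds the object under .git."""
    if len(sha1_hex) != 40:
        raise ValueError("expected a 40-character hexadecimal SHA-1")
    # First two characters name the folder, the other 38 name the file.
    return f".git/objects/{sha1_hex[:2]}/{sha1_hex[2:]}"

print(object_path("d670460b4b4aece5915caf5c68d12f560a9fe3e4"))
# .git/objects/d6/70460b4b4aece5915caf5c68d12f560a9fe3e4
```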

All the history of our commits is saved in these obj files. They are divided into four types: blob, tree, commit, and tag. Let's look at them one by one.

Blob
A, A1, B, and B1 above are all blob objects.
A blob stores the contents of a project file, but not the file's path, name, format, or other descriptive information. Every version of every file in the project is stored as a blob.
Tree
A tree represents a directory. A project is a directory containing files and subdirectories, so a tree contains blobs and subtrees, all referenced by their SHA-1 values. Starting from the top-level tree, the objects form a tree structure: leaf nodes are blobs that represent file contents, non-leaf nodes represent project directories, and the top-level tree object represents a snapshot of the current project.
Commit
A commit represents one commit. It points to a top-level tree that represents the project snapshot, has a parent field that refers to the previous commit, and carries other information such as the committer, author, and message.
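To make the three object types concrete, here is a simplified sketch of how their bytes can be assembled and hashed. It assumes Git's object encoding (a "type size" header plus NUL, tree entries as mode, name, NUL, and the 20-byte binary hash) and handles only a flat directory of regular files; the function names and the author string are invented for the example.

```python
import hashlib

def hash_object(kind: str, body: bytes) -> str:
    """Hash an object the way Git does: '<type> <size>\\0' followed by the body."""
    header = f"{kind} {len(body)}".encode("ascii") + b"\0"
    return hashlib.sha1(header + body).hexdigest()

def tree_body(entries: dict) -> bytes:
    """entries maps file name -> blob SHA-1 (hex). Flat directory, regular files only."""
    body = b""
    for name in sorted(entries):
        # Each entry: mode, space, name, NUL, then the 20-byte binary hash.
        body += b"100644 " + name.encode("utf-8") + b"\0" + bytes.fromhex(entries[name])
    return body

def commit_body(tree: str, parents: list, who: str, message: str) -> bytes:
    """Assemble the text of a commit: snapshot, parents, people, message."""
    lines = [f"tree {tree}"]
    lines += [f"parent {p}" for p in parents]
    lines += [f"author {who}", f"committer {who}", "", message]
    return ("\n".join(lines) + "\n").encode("utf-8")

blob = hash_object("blob", b"test content\n")
tree = hash_object("tree", tree_body({"test.txt": blob}))
who = "A U Thor <author@example.com> 1460971725 +0800"  # hypothetical identity
commit = hash_object("commit", commit_body(tree, [], who, "first commit"))
print(blob, tree, commit, sep="\n")
```

The blob knows nothing about its name, the tree supplies names and points at blobs, and the commit points at one top-level tree plus its parents.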

2.2 Staging Area

The staging area is a file located at .git/index.

It is a binary file, but we can use a command (for example, git ls-files --stage) to view its contents.
Here we focus on the second and fourth columns: the fourth column is the file name, and the second column refers to the file's blob. This blob holds the contents of the file at the time it was staged.

The second column is the SHA-1 hash value, which acts like a foreign key pointing to the blob that actually stores the file's contents. The third column is the file's conflict state, which is discussed later. The fourth column is the file's path name.

The typical way of working with the staging area is this: after editing one or a few files, add them to the staging area, then modify other files and add those as well, and so on. When the changes are complete, use the commit command to permanently save the contents of the staging area to the local repository.

This is in fact the process of building a project snapshot. When we commit, Git uses the information in the staging area to generate tree objects, that is, the project snapshot, and persists them to the database. So the staging area can also be described as the area used to build a project snapshot.
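The four columns described above can be pulled apart programmatically. The sketch below assumes the listing comes from git ls-files --stage, whose lines have the form mode, hash, and stage number separated by spaces, then a tab and the path; the sample line and the IndexEntry type are made up for illustration.

```python
from typing import NamedTuple

class IndexEntry(NamedTuple):
    mode: str   # column 1: file mode
    sha1: str   # column 2: hash of the blob holding the staged content
    stage: int  # column 3: conflict state (0 means no conflict)
    path: str   # column 4: path name

def parse_stage_line(line: str) -> IndexEntry:
    """Parse one line in the assumed 'mode sha1 stage<TAB>path' format."""
    meta, path = line.rstrip("\n").split("\t", 1)
    mode, sha1, stage = meta.split(" ")
    return IndexEntry(mode, sha1, int(stage), path)

# Hypothetical sample line, not taken from a real repository.
sample = "100644 d670460b4b4aece5915caf5c68d12f560a9fe3e4 0\ttest.txt"
entry = parse_stage_line(sample)
print(entry.path, "->", entry.sha1)
```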

2.3 File Status

With the working directory, the staging area, and the local repository in place, we can define the states of a file.

File states fall into two categories. One is the state obtained by comparing the staging area with the local repository, and the other is the state obtained by comparing the working directory with the staging area. The reason for the split is simple: changes in the first category are written to the local repository on commit, while changes in the second category are not. A file can have both kinds of state at the same time.

For example, a file may be shown as modified in both categories at once, but the two entries mean different things. Git uses green and red to distinguish the two modified states (green for staged changes, red for unstaged ones).
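The two categories amount to two separate comparisons. The following sketch models each area as a mapping from path to content hash and runs both comparisons; the function names, labels, and sample hashes are assumptions chosen for the illustration, not Git's actual implementation.

```python
def diff(old, new, new_label):
    """Compare two path -> hash maps and describe how `new` differs from `old`."""
    changes = {}
    for path in sorted(set(old) | set(new)):
        if path not in old:
            changes[path] = new_label
        elif path not in new:
            changes[path] = "deleted"
        elif old[path] != new[path]:
            changes[path] = "modified"
    return changes

def status(repository, staging, working):
    """Return (staged, unstaged) changes for the three areas."""
    staged = diff(repository, staging, "new file")  # written by the next commit
    unstaged = diff(staging, working, "untracked")  # not written by the next commit
    return staged, unstaged

repository = {"a.txt": "h1"}
staging = {"a.txt": "h2"}                      # a.txt was edited and added
working = {"a.txt": "h3", "notes.txt": "h9"}   # then edited again
staged, unstaged = status(repository, staging, working)
print(staged)    # {'a.txt': 'modified'}
print(unstaged)  # {'a.txt': 'modified', 'notes.txt': 'untracked'}
```

Here a.txt is modified in both categories at once, which is exactly the case where the two colors appear for the same file.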

2.4 Branches

Next, let's look at a very important concept: branches.

The purpose of branches is to let us develop in parallel. For example, we may be in the middle of developing a feature when an urgent bug needs fixing. We cannot fix the bug on top of a half-modified project, because that could introduce more bugs.

With branches, we can create a new branch, fix the bug there, and let the feature work and the bug fix proceed at the same time.

The implementation of branches is actually very simple. Let's first look at the .git/HEAD file, which records the current branch.

cat .git/HEAD
=> ref: refs/heads/master

This ref represents a branch. It is also a file, and we can look at its contents:

cat .git/refs/heads/master
=> 2b388d2c1c20998b6233ff47596b0c87ed3ed8f8

The branch stores the ID of an object, and we can view that object's contents with the git cat-file command.

... <[email protected]> 1460971725 +0800
=> committer Hehe Tan <[email protected]> 1460971725 +0800
=>
=> add branch paramter for rebase

From this output we can see that a branch points to a commit. That is also the reason why branches in Git are so lightweight.

Because a branch is just a pointer to a commit, making a new commit only moves the branch pointer forward, and creating a branch only creates a new pointer.
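The pointer idea can be modeled in a few lines. This is a toy sketch, not Git's implementation: the Repo class, its fields, and the commit IDs c1 and c2 are invented, with a dictionary standing in for the files under refs/heads and a string standing in for HEAD.

```python
class Repo:
    """Toy model: branches are entries in a dict, HEAD names the current branch."""

    def __init__(self):
        self.refs = {}        # branch name -> commit id (like refs/heads/<name>)
        self.parents = {}     # commit id -> parent commit id
        self.head = "master"  # HEAD stores a branch name, not a commit

    def commit(self, commit_id):
        self.parents[commit_id] = self.refs.get(self.head)
        self.refs[self.head] = commit_id  # only the current branch pointer moves

    def branch(self, name):
        self.refs[name] = self.refs[self.head]  # creating a branch copies one pointer

repo = Repo()
repo.commit("c1")
repo.branch("feature")
repo.head = "feature"  # switch branches by rewriting HEAD
repo.commit("c2")
print(repo.refs)  # {'master': 'c1', 'feature': 'c2'}
```

Creating the branch cost one dictionary entry, and committing on feature left master untouched.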

3. High-level commands

Git has two types of commands. One is the toolset that does the low-level work, called low-level (plumbing) commands; the other is the more user-friendly high-level (porcelain) commands. A high-level command is usually built from several low-level commands.

"High-level command" may sound impressive to people unfamiliar with the term, but it simply refers to the Git commands we use most often.

3.1 Add & Commit

Add and commit should be the most frequently used high-level commands.

touch README.md
git add README.md
git commit -m "add readme"

Here touch creates a file and stands in for our modification of the project's files. The add command saves the changes to the staging area, and commit permanently saves the contents of the staging area to the local repository.

Whenever a modified file is added to the staging area, Git computes the SHA-1 of the file's contents, converts the contents into a blob, and writes it to the database. It then uses the SHA-1 value to update the file's entry in the staging area's list. In that list, each file name corresponds to a SHA-1 value that points to the file's actual contents. On commit, Git converts the list into a project snapshot, that is, tree objects, and writes them to the database. It then builds a commit object, writes it to the database, and finally updates the branch to point to the new commit.
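The sequence of steps behind add and commit can be sketched as a tiny in-memory repository. Everything here is a simplification under stated assumptions: the MiniRepo class is hypothetical, objects are serialized as JSON rather than in Git's real format, and the tree is flat.

```python
import hashlib
import json

class MiniRepo:
    """Toy model of the add/commit pipeline; not Git's real object format."""

    def __init__(self):
        self.db = {}        # object id -> (type, payload)
        self.index = {}     # path -> blob id (the staging area list)
        self.branch = None  # commit id the current branch points to

    def _write(self, kind, payload):
        raw = json.dumps([kind, payload], sort_keys=True).encode("utf-8")
        oid = hashlib.sha1(raw).hexdigest()
        self.db[oid] = (kind, payload)
        return oid

    def add(self, path, content):
        # Store the content as a blob, then point the index entry at it.
        self.index[path] = self._write("blob", content)

    def commit(self, message):
        # Turn the index into a snapshot, wrap it in a commit, move the branch.
        tree = self._write("tree", dict(self.index))
        self.branch = self._write(
            "commit", {"tree": tree, "parent": self.branch, "message": message}
        )
        return self.branch

repo = MiniRepo()
repo.add("README.md", "")          # touch README.md; git add README.md
first = repo.commit("add readme")  # git commit -m "add readme"
print(repo.db[first])
```

After the commit, the database holds one blob, one tree, and one commit, and the branch points at the commit.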

3.2 Conflicts & Merge & Rebase

3.2.1 Conflicts

Branches in Git are very lightweight, so we create them frequently. Inevitably, a newly created branch has to be merged at some point.

Git offers two ways to integrate branches: merge and rebase. Either way, conflicts are possible, so let's first look at how conflicts arise.

Suppose the master branch contains commits 2 and 3 and the feature branch contains commits 4, 5, and 6, both branching off from commit 1. In this situation, simply moving the branch pointer does not solve the problem; a merge strategy is needed. First we need to be clear about what is merged with what. Is it 2, 3 with 4, 5, 6? When thinking about branches, we tend to picture a merge as joining two lines. That is not the case: what is really merged is 3 and 6. Because each commit contains a complete snapshot of the project, a merge is just a merge of one tree with another.

Let's think of a simple algorithm for comparing 3 and 6. We need a reference for the comparison. If we only compare 3 with 6, the same difference could be described as 3 adding a file relative to 6 or as 6 deleting a file relative to 3, which does not accurately describe the conflict. So we take the point where the two branches diverged (the merge base) as the reference.

Both sides are therefore compared against the merge base (commit 1).

First, list all files in 1, 3, and 6, then iterate over that list. Take one file from the list as an example, and call its versions in commits 1, 3, and 6 version 1, version 3, and version 6.

Versions 1, 3, and 6 have identical SHA-1 values: there is no conflict.
At least one of version 3 and version 6 has the same state as version 1 (the same SHA-1 value, or both absent): the file can be merged automatically. For example, if a file exists in 1, is left unchanged in 3, and is deleted in 6, the result follows 6.
Version 3 and version 6 both differ from version 1: this case is more complicated. Automatic merging is hard to apply, and the conflict needs to be resolved manually. Let's look at how these states are defined.

Conflict state definitions:

1 and 3: deleted_by_them
1 and 6: deleted_by_us
3 and 6: both_added
1, 3, and 6: both_modified

Take the first case. The file is present in 1 and 3: 1 means the file exists in commit 1 (the merge base), and 3 means the file was modified in commit 3 (the master branch). The absence of 6 means the file was deleted in commit 6 (the feature branch). In summary, this state is deleted_by_them.

Now the fourth case. The file is present in 1, 3, and 6: it exists in commit 1 (the merge base), was modified in commit 3 (the master branch), and was also modified in commit 6 (the feature branch). In summary, this state is both_modified (modified on both sides).

When a conflict cannot be merged automatically, Git writes these states to the staging area. Unlike our numbering, Git marks the file versions with 1, 2, and 3: 1 is the base version of the file, 2 is the version on the current branch, and 3 is the version on the branch being merged.
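The per-file decision described in this section can be sketched as a single function. It is a simplified model under assumptions: each argument is the content hash of one path in commit 1 (merge base), commit 3 (master, "ours"), and commit 6 (feature, "theirs"), None means the file is absent, and the return labels for the automatic cases are invented for the example.

```python
def classify(base, ours, theirs):
    """Decide how one path merges, given its hash in commits 1, 3, and 6."""
    if ours == theirs:
        return "no_conflict"   # both sides ended up in the same state
    if ours == base:
        return "take_theirs"   # only the feature branch changed the file
    if theirs == base:
        return "take_ours"     # only the master branch changed the file
    # Both sides differ from the base and from each other: manual resolution.
    if base is None:
        return "both_added"
    if theirs is None:
        return "deleted_by_them"
    if ours is None:
        return "deleted_by_us"
    return "both_modified"

print(classify("h1", "h1", None))   # take_theirs: unchanged in 3, deleted in 6
print(classify("h1", "h3", None))   # deleted_by_them: present in 1 and 3 only
print(classify("h1", "h3", "h6"))   # both_modified: present in 1, 3, and 6
```

Only the last four outcomes need a human; the first three are what makes most merges automatic.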

3.2.2 Merge

After resolving the conflicts, we commit the result as a new commit. This is a merge.

New commits can still be made after the merge.

As you can see, merge does not modify the branch's commit history, and it is the approach we commonly use. In some situations, however, it is inconvenient. For example, when we create a PR or MR, or send patches to a maintainer, the maintainer may hit conflicts while merging and then has to resolve them, which undoubtedly adds to other people's workload.

Rebase can solve this problem.

3.2.3 Rebase

Suppose our branching structure is as follows:

Rebase reapplies all commits after the merge base onto the target branch as patches. The target branch can then merge our branch with a fast-forward, so no conflict occurs at merge time. The commit history also becomes a single line, which is a great comfort to perfectionists.

To see what rebase actually does, we can watch the whole operation in "slow motion". Rebase provides an interactive option (-i) that lets us choose what to do with each patch.

With this interactive option, we can "step through" the rebase operation.

In testing, rebase mainly generates two files in .git/rebase-merge: git-rebase-todo and done. Their roles are clear from their names: git-rebase-todo holds the commits that rebase still has to process, and done holds the commits that are being processed or have been completed. Here, for example, git-rebase-todo holds the three commits 4, 5, and 6.

First, Git moves the SHA-1 of commit 4 into done, indicating that 4 is being processed, and then applies 4 as a patch on top of 3, forming a new commit 4'. This step may conflict; if it does, the conflict must be resolved before continuing.

Next, the SHA-1 of commit 5 is moved into done, and 5 is applied as a patch on top of 4', forming 5'.

Then the SHA-1 of commit 6 is moved into done, and 6 is applied as a patch on top of 5', forming 6'. Finally, the branch pointer is moved to the latest commit, 6'. This completes the rebase.
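The bookkeeping in the steps above can be replayed with two lists. This is only a schematic sketch: commits are plain strings, applying a patch is represented by appending a prime mark, conflicts are ignored, and the rebase function is made up for the illustration.

```python
def rebase(todo, onto):
    """Replay each commit in `todo` on top of `onto`.

    Returns (new_tip, done, created), where created lists (commit, parent) pairs.
    """
    todo = list(todo)  # plays the role of git-rebase-todo
    done = []          # plays the role of the done file
    created = []
    tip = onto
    while todo:
        commit = todo.pop(0)    # take the next entry from the todo list ...
        done.append(commit)     # ... and record it as being processed
        new_commit = commit + "'"          # applying the patch yields a new commit
        created.append((new_commit, tip))  # its parent is the previous tip
        tip = new_commit
    return tip, done, created   # the branch pointer finally moves to `tip`

tip, done, created = rebase(["4", "5", "6"], onto="3")
print(tip)      # 6'
print(created)  # [("4'", '3'), ("5'", "4'"), ("6'", "5'")]
```

The original commits 4, 5, and 6 are not touched; the branch simply ends up pointing at the new chain.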

Let's take a look at a real rebase file.

pick e0f56d9 update gitignore
pick e370289 add a

# Commands:
# p, pick = use commit
# r, reword = use commit, but edit the commit message
# e, edit = use commit, but stop for amending
# s, squash = use commit, but meld into previous commit
# f, fixup = like "squash", but discard this commit's log message
# x, exec = run command (the rest of the line) using shell
# d, drop = remove commit

The file has three columns. The first column is the action to perform; all available actions are listed in the comments. For example, pick means use the commit, reword means use the commit but edit its commit message, and edit means use the commit but stop to amend it. The others are not covered here. The second column is the abbreviated commit hash and the third is the commit message.

The done file has the same format as git-rebase-todo:

pick e0f56d9 update gitignore
pick e370289 add a

From the diagram above, we can see that one drawback of rebase is that it rewrites the branch's commit history. If the branch has already been pushed to the remote repository, the rewritten branch cannot be pushed normally and must be force-pushed with the -f (force) option.

Therefore, it is best not to rebase public branches.

3.3 Checkout, Revert, Reset

3.3.1 Checkout

Checkout should be familiar to most of us. It is used very often, either to switch branches or to switch to a specific commit.

Taking a branch switch as an example, let's see what checkout does to the working directory, the staging area, and the local repository. The state before checkout is as follows:

First, checkout locates the target commit. The snapshot in the target commit, that is, its tree object, is the version of the project we are checking out.
Checkout then generates the staging area contents from the tree, and writes our project files based on the tree and the blobs it contains. Finally, it updates HEAD to point to the new branch.

As you can see, checkout does not modify the commit history. It only extracts the contents of the corresponding version of the project.

3.3.2 Revert

If we want to restore an earlier state of the project by means of a reverse commit, revert does the job. A reverse commit removes in the new version what the old version added, and adds back in the new version what the old version deleted. This is useful when the branch has already been pushed to a remote repository.

Before Revert:

Revert does not modify the commit history either. In effect, it checks out the target project snapshot into the working directory and staging area, and then records the "rollback" as a new commit.
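The idea of a reverse commit can be illustrated at whole-file granularity. The sketch below is a deliberately coarse model: snapshots are path-to-content dictionaries, the revert function is hypothetical, and changes are undone file by file rather than line by line.

```python
def revert(current, commit, parent):
    """Return a new snapshot that undoes what `commit` changed relative to `parent`."""
    result = dict(current)
    for path in set(commit) | set(parent):
        if commit.get(path) != parent.get(path):  # this path was touched by the commit
            if path in parent:
                result[path] = parent[path]       # undo a modification or a deletion
            else:
                result.pop(path, None)            # undo an addition
    return result

parent = {"a.txt": "v1"}
commit = {"a.txt": "v2", "b.txt": "new"}                  # modified a.txt, added b.txt
current = {"a.txt": "v2", "b.txt": "new", "c.txt": "later"}

print(revert(current, commit, parent))
```

The result is recorded as a new snapshot on top of the existing history, so later work such as c.txt is kept and nothing already pushed is rewritten.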

After Revert:

3.3.3 Reset

Reset is much like revert in that it rolls the current branch back to an earlier version, but reset does modify the commit history.

Reset has three commonly used options: --soft, --mixed, and --hard. Their scope increases in that order.

Let's look at each in turn.

--soft only moves the branch pointer. It leaves the working directory and the staging area untouched, so we can commit again right away to form a new commit. This is useful for undoing a temporary commit.

Before using reset --soft:

After using reset --soft:

--mixed covers the staging area in addition to what --soft covers. In effect, the result of --mixed is just one add operation away from that of --soft.

Before using reset --mixed:

After using reset --mixed:

--hard covers the working directory in addition to what --mixed covers.

Before using reset --hard:

After using reset --hard:

The --hard option causes working directory content to be "lost".
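The widening scope of the three options can be summarized in a small model. This is a sketch under assumptions: the repository is a plain dictionary with made-up keys (snapshots, branch, index, worktree), and the reset function only mimics which areas each option touches.

```python
def reset(repo, target, mode="mixed"):
    """Return the repository state after resetting the branch to `target`."""
    if mode not in ("soft", "mixed", "hard"):
        raise ValueError(f"unknown mode: {mode}")
    snapshot = repo["snapshots"][target]
    result = dict(repo, branch=target)       # every mode moves the branch pointer
    if mode in ("mixed", "hard"):
        result["index"] = dict(snapshot)     # mixed and hard also reset the staging area
    if mode == "hard":
        result["worktree"] = dict(snapshot)  # hard also overwrites the working directory
    return result

repo = {
    "snapshots": {"c1": {"f.txt": "v1"}, "c2": {"f.txt": "v2"}},
    "branch": "c2",
    "index": {"f.txt": "v2"},
    "worktree": {"f.txt": "v2"},
}
for mode in ("soft", "mixed", "hard"):
    after = reset(repo, "c1", mode)
    print(mode, after["branch"], after["index"], after["worktree"])
```

With soft, only the branch changes; with mixed, the index follows; with hard, the working directory content v2 is replaced as well.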

When using --hard, make sure you know what you are doing, and do not use it when you are unsure. If you did use it by mistake, do not panic: Git generally does not actively delete content from the local repository, so depending on what was lost it can often be recovered (changes that were never committed or staged cannot be recovered this way). For example, right after the loss you can run git reset --hard ORIG_HEAD to restore the previous state immediately, or use the reflog command to look up where the branch pointed before.

3.4 Stash

Sometimes we do some work on one branch and modify a lot of code, and then need to switch to another branch to do something else, without wanting to commit work that is only half done. The old way to handle this was to commit the current changes with a message such as "half of work", switch to the other branch, and after finishing there switch back and use reset --soft or commit --amend.

Git provides the stash command to meet this need.

Stash saves the contents of the working directory and the staging area, and then uses reset --hard to restore the working directory and the staging area to a clean state. We can use stash apply at any time to bring the changes back.

The idea behind stash is to commit our changes to the local repository, reference that commit with a special branch-like pointer (.git/refs/stash), and check the commit out again when restoring. Let's go one step further and look at the structure of the commits that stash creates.

If we pass the --include-untracked option, Git first creates a commit for the untracked files; this commit is free-standing (dangling). Git then creates a commit for the contents of the staging area. Finally, it creates a commit for the working directory changes, whose parents are the untracked commit, the staging area commit, and the base commit.

The structure is this elaborate in order to offer more flexible options, so that we can restore content selectively. For example, when restoring a stash we can choose whether to rebuild the index (the --index option of stash apply), so that the state is exactly the same as when the stash was made.

3.5 bisect

Finally, I want to talk about a feature that has rescued me from real trouble.

Suppose a bug shows up in a project after it has been released, and troubleshooting fails to locate its source. We still have one option: find the last good version and try every commit between that version and the current one, one by one, until we find the commit that introduced the problem, and then analyze the cause of the bug.

Git has anticipated this scenario and uses the same idea, but searches by bisection. This is the bisect command.

Using this command is simple:

git bisect start
git bisect bad HEAD
git bisect good v4.1

Git picks a commit in the middle, and we test it.

Depending on the result, we mark it with git bisect good or git bisect bad, and Git automatically switches to the next commit. We repeat this step until we find the commit that first introduced the bug.

Bisection is very efficient: 2 to the power of 10 is already 1024, so around 10 tests are usually enough, and beyond that it takes only 11 or 12. What this does require is a well-optimized way of testing, so that a simple operation reproduces the bug. If reproduction is simple enough to script, it gets even easier: git bisect run ./test.sh does it all in one step.
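The search itself is an ordinary binary search over the commit list. The sketch below assumes a linear history ordered from oldest to newest, with the first commit known good and the last known bad; the bisect function and the is_bad callback are stand-ins for Git's command and for our manual test.

```python
def bisect(commits, is_bad):
    """Find the first bad commit.

    commits is ordered oldest to newest; commits[0] is known good and
    commits[-1] is known bad. Returns (first_bad_commit, number_of_tests).
    """
    good, bad = 0, len(commits) - 1
    tests = 0
    while bad - good > 1:
        mid = (good + bad) // 2  # pick a commit in the middle
        tests += 1
        if is_bad(commits[mid]):
            bad = mid            # like "git bisect bad"
        else:
            good = mid           # like "git bisect good"
    return commits[bad], tests

# 1024 commits lie between the known good commit 0 and the known bad commit 1024.
commits = list(range(1025))
print(bisect(commits, lambda c: c >= 700))  # (700, 10)
```

Ten tests are enough to single out one commit among 1024 candidates, which matches the estimate above.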

If the code at some commit cannot be run, you can use git bisect skip to skip that commit, or use git bisect visualize to look at the list of remaining commits and manually choose one to test.

Happy Coding ;)

This article is an English version of an article originally written in Chinese and published on aliyun.com, and is provided for information purposes only.
