Basic concepts and getting started with Git

Source: Internet
Author: User

Basic Concepts

In this chapter, we introduce the design ideas behind a distributed version control system and explain how it differs from a centralized one. We also show how a distributed repository works, and why creating and merging branches is no big deal in Git.

1 Distributed version control: what is so special about it

Before discussing the concept of distributed versioning, let's take a quick look at the traditional centralized version control architecture.

The typical layout of a centralized version control system (such as CVS or Subversion) is shown in Figure 1. Each developer has a working directory (the workspace) on his or her own computer that contains all of the project files. After making changes locally, the developer periodically commits them to a central server. Changes made by other developers are picked up by performing an update operation. The current and historical versions of the files (the repository) are stored on the central server. As a result, branches developed in parallel, as well as named (tagged) versions, are all managed centrally.

Figure 1 Centralized version control

In a distributed version control system (see Figure 2), there is no separation between the developer environment and the server environment. Each developer has a workspace for the files currently being worked on and a local repository (which we call a clone) that holds all versions, branches, and tags of the project. Each change a developer makes is recorded in a new version (a commit), which is first stored in the local repository. Other developers do not see the new version right away. Only the push and pull commands transfer these changes from one repository to another. Technically, all repositories have the same standing in a distributed architecture. In theory, then, we no longer need a server at all: changes made on one development machine can be passed directly to another. In practice, of course, server repositories also play an important role in Git, for example in the form of the following specialized repositories.

Figure 2 Distributed version control

Project repository (blessed repository): This repository stores the versions that are officially created and released.

Shared repository: This repository is used to exchange files among the members of a development team. In a small project, the project repository itself can fill this role. In multi-site development, however, several such dedicated repositories may be needed.

Workflow repository: A workflow repository receives only changes that have reached a particular state in the workflow, such as changes that have passed review.

Fork repository: This repository is used to isolate some work from the main development line (for example, long-running development that does not fit the normal release cycle), or to isolate experimental work that may never be included in the mainline.
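The exchange between local repositories and a shared repository can be tried out on a single machine. The following is a minimal sketch, assuming Git is installed; the directory names (shared.git, alice, bob), the file name, and the identity values are hypothetical and exist only for this illustration.

```shell
set -e
work=$(mktemp -d)                       # scratch area for the experiment
trap 'rm -rf "$work"' EXIT
cd "$work"

# Placeholder identity for the demo commits (assumption, not a real user).
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"

git init -q --bare shared.git           # plays the role of the shared repository
git clone -q shared.git alice           # each developer works in a full clone
git clone -q shared.git bob

# Alice commits locally; at this point nothing has left her repository.
cd alice
echo "hello" > readme.txt
git add readme.txt
git commit -q -m "Add readme"
branch=$(git rev-parse --abbrev-ref HEAD)
git push -q origin HEAD                 # publish the commit to the shared repository
alice_head=$(git rev-parse HEAD)

# Bob sees the new version only after he pulls it.
cd ../bob
git pull -q origin "$branch"
bob_head=$(git rev-parse HEAD)

echo "alice: $alice_head"
echo "bob:   $bob_head"
```

Both lines should show the same commit ID, since the commit object was transferred unchanged from one repository to the other.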

Let's take a look at the advantages of a distributed system over a centralized one.

High performance: Almost all operations are executed locally and require no network access.

Efficient way of working: Developers can switch quickly between different tasks by using multiple local branches.

Offline capability: Developers can commit, create branches, and tag versions without a server connection, and upload the results to the server later.

Flexible development process: Teams and companies can set up dedicated repositories for working with other departments, such as a repository for exchanging versions with testers. Publishing changes is then easy, because it takes only a single push to that repository.

Backup: Since each developer holds a copy of the repository with the full history, the likelihood of losing data through a server failure is very small.

Maintainability: Difficult refactoring work can first be tried out on a copy of the repository before it is transferred to the original repository.

2 Repository, the basis of distributed work

In fact, a repository is essentially an efficient data store made up of the following parts.

File (blob): A blob holds the content of a file, either text or binary data. The content is stored without the file name.

Directory (tree): A tree associates file names with content, and it can contain other directories.

Version (commit): Each version defines a recoverable state of the corresponding directory. Whenever we create a new version, its author, time, comment, and predecessor version are saved with it.
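These three object types can be inspected with the cat-file command. The following is a minimal sketch, assuming Git is installed; the repository lives in a temporary directory, and the file docs/hello.txt and the identity values are hypothetical.

```shell
set -e
repo=$(mktemp -d)
trap 'rm -rf "$repo"' EXIT
cd "$repo"

# Placeholder identity for the demo commit (assumption).
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"

git init -q
mkdir docs
printf 'hello world\n' > docs/hello.txt
git add docs/hello.txt
git commit -q -m "Add hello"

commit_id=$(git rev-parse HEAD)                 # the version
tree_id=$(git rev-parse 'HEAD^{tree}')          # the top-level directory of that version
blob_id=$(git rev-parse HEAD:docs/hello.txt)    # the content of one file

git cat-file -t "$commit_id"    # prints the object type: commit
git cat-file -p "$commit_id"    # tree ID, author, committer, and message
git cat-file -p "$tree_id"      # lists the entry for "docs", which is itself a tree
git cat-file -t "$blob_id"      # prints the object type: blob
git cat-file -p "$blob_id"      # prints the content; the file name lives in the tree
```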

For every object, a hash value is computed from its data and written in hexadecimal (for example, a value like 1632acb65b01c6b621d6e1105205773931bb1a41). This hash value serves as the reference to the object and as the key for retrieving its data later (see Figure 3).
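To see that the hash depends only on the content, we can ask Git to compute the ID of some data without storing it. This is a small sketch using the hash-object command; the sample strings are arbitrary, and it assumes Git's default SHA-1 object format.

```shell
set -e
# Git hashes a short header ("blob <size>" followed by a NUL byte) plus the content.
# hash-object computes the resulting ID; without -w nothing is written to a repository.
id_a=$(printf 'hello world\n' | git hash-object --stdin)
id_b=$(printf 'hello world\n' | git hash-object --stdin)    # the same content again
id_c=$(printf 'hello world!\n' | git hash-object --stdin)   # one character added

echo "$id_a"   # 3b18e512dba79e4c8300dd08aeb37f8e728b8dad
echo "$id_b"   # identical to the first ID
echo "$id_c"   # a completely different ID
```

Identical content always yields the same ID, no matter where or when it is hashed, while even a one-character change yields a different one.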

Figure 3 Object storage in the repository

In other words, the hash value of a commit object is effectively its "version number". If we hold the hash value of a commit, we can use it to check whether that version exists in a repository. If it does, we can restore it into the directory of the current workspace. If it does not, we can import (fetch) the commit, together with all of the objects it references, from another repository.
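The check-then-import sequence described above can be sketched with two throwaway repositories. This is an illustration under assumptions: Git is installed, and the repository names (source, target), the file notes.txt, and the identity values are hypothetical.

```shell
set -e
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work"

# Placeholder identity for the demo commit (assumption).
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"

git init -q source
git init -q target

cd source
echo "v1" > notes.txt
git add notes.txt
git commit -q -m "First version"
commit_id=$(git rev-parse HEAD)

cd ../target
# cat-file -e exits with status 0 only if the object exists in this repository.
if git cat-file -e "$commit_id" 2>/dev/null; then before=present; else before=missing; fi

# Import the commit and every object it references from the other repository.
git fetch -q ../source HEAD
if git cat-file -e "$commit_id" 2>/dev/null; then after=present; else after=missing; fi

# Restore that version into the workspace, addressed purely by its hash.
git checkout -q "$commit_id"
echo "before fetch: $before, after fetch: $after"
```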

Next, let's look at the advantages of using hash values and this repository structure.

High performance: Accessing data by its hash value is very fast.

Redundancy-free storage: Identical file content is stored only once.

Distributed version numbers: Because the hash value is calculated from the files, the author, and the date, versions can also be created offline without any risk of version number conflicts later on.

Efficient synchronization between repositories: When we transfer a commit from one repository to another, only the objects that do not yet exist in the target repository need to be transmitted. Thanks to the hash values, we can quickly determine whether an object already exists there.

Data integrity: Because hash values are calculated from the contents of the data, Git can check at any time whether a hash value still matches its data. This makes it possible to detect accidental changes to the data or malicious manipulation.

Automatic rename detection: A renamed file can be detected automatically, because the hash value computed from its content has not changed. For this reason, Git has no dedicated rename command, only a move command.

3 Branches are easy to create and merge

In many version control systems, creating and merging branches is regarded as an advanced operation for special cases. Git, however, was originally created for the developers of the Linux kernel, who are scattered around the world, and merging the results of many contributors was one of their biggest challenges. One of Git's design goals is therefore to make creating and merging branches as easy and safe as possible.

Figure 4 below shows how branches arise when developers work in parallel. Each point in the diagram represents a version (a commit) of the project. Because Git always versions the project as a whole, each point also stands for all of the individual files that belong to that version.

Figure 4 Branches created by developers working in parallel

As shown above, both developers start from the same version. Each then makes changes and commits them. At this point the project exists in two different versions, one in each developer's repository. In other words, two branches have been created. If one developer now wants to import the other's changes, he or she can have Git merge the versions. If the merge succeeds, Git creates a merge commit that contains the changes of both developers. Once the other developer retrieves this commit as well, both developers are back on the same version of the project.
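The scenario can be replayed locally with two clones standing in for the two developers. This is a minimal sketch, assuming Git is installed; the directory names (origin-repo, dev1, dev2), file names, and identity values are hypothetical.

```shell
set -e
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work"

# Placeholder identity for the demo commits (assumption).
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"

# The common starting version.
git init -q origin-repo
cd origin-repo
echo "shared start" > base.txt
git add base.txt
git commit -q -m "Starting point"
cd ..

git clone -q origin-repo dev1
git clone -q origin-repo dev2

# Each developer commits independently, so two branches now exist.
(cd dev1 && echo "work by dev1" > one.txt && git add one.txt && git commit -q -m "Dev1 change")
(cd dev2 && echo "work by dev2" > two.txt && git add two.txt && git commit -q -m "Dev2 change")

# Developer 1 imports developer 2's work; the histories diverged,
# so Git records a merge commit with two parents.
cd dev1
git pull -q --no-rebase --no-edit ../dev2 HEAD
parent_count=$(git cat-file -p HEAD | grep -c '^parent ')

# Developer 2 retrieves the merge commit; both are on the same version again.
cd ../dev2
git pull -q --no-rebase ../dev1 HEAD

dev1_head=$(git -C ../dev1 rev-parse HEAD)
dev2_head=$(git rev-parse HEAD)
echo "parents of merge commit: $parent_count"
```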

In the example above, the branches came about unplanned, simply because two developers were working on the same software in parallel. In Git, of course, we can also create a branch deliberately, that is, explicitly (see Figure 5). Explicit branches are typically used to coordinate the parallel development of features.

Figure 5 Explicit branching for different tasks
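An explicit branch for a single task might be created as follows. This is a sketch under assumptions: Git is installed, and the branch name feature-login, the file names, and the identity values are hypothetical.

```shell
set -e
repo=$(mktemp -d)
trap 'rm -rf "$repo"' EXIT
cd "$repo"

# Placeholder identity for the demo commits (assumption).
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"

git init -q
echo "base" > app.txt
git add app.txt
git commit -q -m "Base version"
main_branch=$(git rev-parse --abbrev-ref HEAD)   # "master" or "main", depending on configuration

# Explicitly create a branch for one task and switch to it.
git checkout -q -b feature-login
echo "login form" > login.txt
git add login.txt
git commit -q -m "Add login form"

# Back on the original branch, the feature work is not part of the workspace...
git checkout -q "$main_branch"
test ! -e login.txt

# ...until the branch is merged.
git merge -q --no-edit feature-login
git log --oneline --graph --all
```

Because the original branch received no commits of its own in the meantime, this particular merge simply moves it forward to the feature commit.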

When pulling and pushing, we can specify which branches the operation applies to. Beyond simple branch creation and merging, we can also perform the following actions on branches.

Transplanting a branch: We can transfer the commits of a branch directly to another repository.

Delivering only specific changes: We can copy one or several commits of a branch directly to another branch. This is called cherry-picking.

Cleaning up history: We can transform, reorder, and delete commits in a branch's history. This helps to create better historical documentation for the project. We call this process interactive rebasing.

4 Summary of this chapter

After reading this chapter, we hope that you are familiar with the basic concepts of Git. Even if you put the book down now (which we hope you won't), you can take part in discussions about distributed version control systems, explain why hash values are necessary and useful, and describe how branches are created and merged in Git.

Of course, you may also have the following questions. How should we use these basic concepts to manage projects? How should we coordinate multiple repositories? How many branches do we need? How should we integrate our own build servers?

For the first question, continue with the next chapter. There you'll see the commands used to create a repository, create versions, and exchange commits between repositories. For the other questions, refer to the chapters that describe the workflows in detail.

In addition, if you're a busy project manager who is still hesitant about using Git, we suggest that you read the discussion of Git's limitations in Chapter 26.

Getting Started

If you want to try Git, we can start right away. This chapter guides you through creating your first project. We'll show you the commands for committing changes, viewing the history, and exchanging versions with other developers.

1 Preparing the Git environment

First of all, we need to install Git. You can find everything you need on the Git website:

http://git-scm.com/download

Git is highly configurable software. To begin with, we use the config command to make the user known to Git. The command below sets the email address; the user name is set the same way through the user.name setting:

> git config --global user.email "hans@mustermann.de"
2 First Git project

We suggest that you set up a separate project for the Git experiments that follow. In short, start with a simple, small project. Our small sample project consists of just two text files in the first-steps directory, as shown in Figure 1.

Figure 1 Our sample project

Before we start fiddling around with this toy project, we recommend that you make a backup. It is not easy to delete or destroy anything permanently in Git, and Git usually issues a warning whenever you are about to do something "dangerous". Still, it's always good to be prepared.

1 Creating a repository

First, we need to create a repository to store the project itself and its history. To do this, we use the init command in the project directory. A project directory that has a repository is called a workspace.

> cd /projects/first-steps
> git init
Initialized empty Git repository in /projects/first-steps/.git/

The init command creates a hidden directory named .git in the project directory and creates the repository inside it. Note that this directory may not be visible in Windows Explorer or in the Mac Finder.

Figure 2 The directory where the local repository is located

2 First commit

Next, we need to add the two files foo.txt and bar.txt to the repository. In Git, a version of a project is called a commit, and creating one takes two steps. First, we use the add command to determine which files should be included in the next commit. Second, we use the commit command to transfer the changes to the repository, where the new commit is given a hash value that identifies it. Here, the hash value is 2f43cd0; yours will probably differ, because the value depends on the file content as well as on the author and the time of the commit.

> git add foo.txt bar.txt
> git commit --message "Sample project imported."
[master (root-commit) 2f43cd0] Sample project imported.
 2 files changed, 2 insertions(+), 0 deletions(-)
 create mode 100644 bar.txt
 create mode 100644 foo.txt
3 Check Status

Now, let's modify the contents of the foo.txt file, delete the bar.txt file, and add a new file called bar.html. The status command then displays all changes that have been made since the last commit. Note that the new file bar.html is listed as untracked, because we have not yet registered it with the add command.

> git status
# On branch master
# Changed but not updated:
#   (use "git add/rm <file>..." to update what will be committed)
#   (use "git checkout -- <file>..." to discard changes in working directory)
#
#       deleted:    bar.txt
#       modified:   foo.txt
#
# Untracked files:
#   (use "git add <file>..." to include in what will be committed)
#
#       bar.html
no changes added to commit (use "git add" and/or "git commit -a")

If we want to see more detail, the diff command displays every modified line. Many people find the output of diff hard to read. Fortunately, there are many tools and development environments that present the same information more clearly (see Figure 3).

Figure 3 Diff report in a graphical tool (KDiff3)

> git diff foo.txt
diff --git a/foo.txt b/foo.txt
index 1910281..090387f 100644
--- a/foo.txt
+++ b/foo.txt
@@ -1 +1 @@
-foo
\ No newline at end of file
+foo foo
\ No newline at end of file
4 Commit Changes

Next, all of the changes must first be registered for the new commit. We run the add command on the modified and new files, and the rm command on the file to be deleted.

> git add foo.txt bar.html
> git rm bar.txt
rm 'bar.txt'

Now call the status command again and we'll see that all the changes have been included in the next commit.

> git status
# On branch master
# Changes to be committed:
#   (use "git reset HEAD <file>..." to unstage)
#
#       new file:   bar.html
#       deleted:    bar.txt
#       modified:   foo.txt
#

Then commit the changes with the commit command.

> git commit --message "Some changes."
[master 7ac0f38] Some changes.
 3 files changed, 2 insertions(+), 2 deletions(-)
 create mode 100644 bar.html
 delete mode 100644 bar.txt
5 Show History

The log command can be used to display the history of the project, and all commits are sorted in descending order of time.

> git log
commit 7ac0f38f575a60940ec93c98de11966d784e9e4f
Author: Rene Preissel <rp@eToSquare.de>
Date:   Thu Dec 2 09:52:25 +0100

    Some changes.

commit 2f43cd047baadc1b52a8367b7cad2cb63bca05b7
Author: Rene Preissel <rp@eToSquare.de>
Date:   Thu Dec 2 09:44:24 +0100

    Sample project imported.
3 Git's collaboration features

We now have a workspace that holds the project files and a repository that holds the project history. In a traditional centralized version control system such as CVS or Subversion, every developer has his or her own workspace, but all developers share a common repository. In Git, each developer has a workspace with its own self-contained repository, which makes it a complete version control system that does not depend on a central server. Developers collaborate on a project by exchanging commits between their repositories. Let's do an experiment and create a new workspace in which we simulate the activities of a second developer.

3.1 Cloning a repository

Our new developer first needs his or her own copy of the repository (also known as a clone). The copy contains all of the original information and the entire history of the project. Below, we use the clone command to create a clone.
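Cloning might look like the following. This is a hedged sketch rather than the exact command from the original walkthrough: it assumes Git is installed, builds a stand-in for the sample project in a temporary directory, and uses a hypothetical target directory name (first-steps-clone) and placeholder identity values.

```shell
set -e
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work"

# Placeholder identity for the demo commit (assumption).
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"

# Stand-in for the first developer's project (hypothetical content).
git init -q first-steps
cd first-steps
echo "foo" > foo.txt
git add foo.txt
git commit -q -m "Sample project imported."
cd ..

# The second developer clones it; the clone carries the complete history.
git clone -q first-steps first-steps-clone
cd first-steps-clone
git log --oneline      # the same commits as in the original repository
git remote -v          # "origin" points back to the repository that was cloned

orig_head=$(git -C ../first-steps rev-parse HEAD)
clone_head=$(git rev-parse HEAD)
```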
