Preface
Git is a name nearly every developer knows; many of us can't get through a working day without it. Of course, if your project doesn't use distributed version control, you may never have used Git, or perhaps never even heard of it. That is beside the point, though. The point is this article: let's talk about Git and get to know it from a high-level perspective.
A short foreword
Before writing this article, I kept wondering how to position it: as a basic tutorial, or as a general introduction to how Git works? My schedule has been tight lately and I have little free time, so I decided to write a high-level introduction. As for basic tutorials, I have read plenty on the Internet that are better than anything I would write, so I won't risk misleading anyone by adding another.
With that in mind, this article mainly covers what Git has changed for us and how Git works internally. At the end, I'll share my impressions of using Git in a project.
Version Control
What is version control, and why should you care? For fellow developers the question hardly needs asking; if you really don't know, look it up on Baidu. But do you know how many kinds of version control system there are? Have you considered why your company uses Git rather than SVN, or SVN rather than Git? Let's look at the differences between them.
- Local Version Control System
Remember how we used to back up files and data? My approach was to copy the whole file or folder to another hard disk or network drive and name each copy by hand. This is purely manual work, and as we all know, manual repetitive work greatly increases the chance of errors. To solve this problem, local version control systems were developed long ago. Most of them use a simple database to record the differences between successive revisions of a file.
One of the most popular was RCS (Revision Control System), which can still be found on many computers today. Even on Mac OS X, the rcs command is available once you install the developer tools. It works by saving and managing file patches. A file patch is a text file in a specific format that records what changed in a file between one revision and the next. From the patch saved after each revision, RCS can reconstruct the content of any version by applying the patches in sequence.
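To make the patch idea concrete, here is a minimal Python sketch. It is not the RCS file format or its actual algorithm; the names make_patch, apply_patch, and checkout are hypothetical, and a "patch" here is just a list of keep/change steps built with the standard difflib module. It only illustrates the principle of storing one base version plus patches and rebuilding any version by patching in order.

```python
from difflib import SequenceMatcher


def make_patch(old, new):
    """Record only what is needed to turn the line list `old` into `new`."""
    patch = []
    matcher = SequenceMatcher(None, old, new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            # Unchanged run: remember its length, not its content.
            patch.append(("keep", i2 - i1))
        else:
            # replace / delete / insert: skip old lines, emit new ones.
            patch.append(("change", i2 - i1, new[j1:j2]))
    return patch


def apply_patch(old, patch):
    """Rebuild the next version from the previous one plus a patch."""
    out, pos = [], 0
    for step in patch:
        if step[0] == "keep":
            out.extend(old[pos:pos + step[1]])
            pos += step[1]
        else:
            _, skipped, added = step
            pos += skipped
            out.extend(added)
    return out


def checkout(base, patches, version):
    """Version 0 is the base; version n applies the first n patches."""
    lines = list(base)
    for patch in patches[:version]:
        lines = apply_patch(lines, patch)
    return lines


v1 = ["hello", "world"]
v2 = ["hello", "git", "world"]
v3 = ["hello", "git"]
patches = [make_patch(v1, v2), make_patch(v2, v3)]
print(checkout(v1, patches, 2))
```

Only v1 and the two patches are stored; v2 and v3 are recomputed on demand, which is the trade-off the paragraph above describes.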
- Centralized Version Control System
RCS made it much easier to manage and maintain the history of files and data. Soon, however, another problem appeared: developers working on different systems had no way to collaborate on the same version history. Clever people found a solution, and the centralized version control system (CVCS) was born.
Systems of this kind, such as CVS, Subversion, and Perforce, have a single centrally managed server that stores every revision of every file; collaborators connect to it through a client to fetch the latest files or submit updates. For many years this was the standard practice for version control, and centralized systems are still the most common choice in enterprise development.
This approach brings many benefits, especially compared with the older local VCSs. Everyone can see, to some extent, what others on the project are doing. Administrators can easily control each developer's permissions, and administering a CVCS is far easier than maintaining local databases on every client.
An old saying goes that fortune and misfortune depend on each other: everything has two sides, and nothing is absolutely good or bad. What we look for is a balance in which the advantages outweigh the disadvantages. But I digress. The most obvious drawback of a centralized version control system is that the central server is a single point of failure. If it goes down for an hour, nobody can submit updates during that hour, and nobody can collaborate.
If the central server's disk fails and there is no backup, or the backup is out of date, data may be lost. In the worst case, the entire change history of the project is gone, and the only hope of recovery is whatever snapshots clients happen to have checked out and kept locally. Even then, there is no guarantee that everything was checked out beforehand. Local version control systems have the same problem: whenever the entire history of a project lives in a single place, there is a risk of losing all of it.
- Distributed Version Control System
People tend to find a way around the problems they run into, and this time was no exception. The distributed version control system (DVCS) arose in response to the problems above. In such systems, for example Git, Mercurial, Bazaar, and Darcs, a client does not just check out the latest snapshot of the files; it fully mirrors the repository. If any server used for collaboration fails, it can be restored from any client's local mirror, because every checkout is in effect a complete backup of the repository.
Furthermore, many of these systems can work with several remote repositories at once, so you can collaborate with different groups of people in different ways within the same project. You can set up whatever workflow you need, such as a hierarchical workflow, which is not possible in centralized systems.
Design Ideas
So, simply put, what kind of system is Git? Let's look at the ideas behind it.
- Record snapshots directly, not differences
The main difference between Git and other version control systems is that Git cares about whether the file data as a whole has changed, while most other systems care about the specific differences in file content. Those systems (CVS, Subversion, Perforce, Bazaar, and so on) record, for each update, which files changed and which lines changed.
Git does not store the differences between versions. Instead, it takes a snapshot of the changed files and records it in a miniature file system. Each time you commit, Git checks the fingerprints of all files, snapshots them, and saves an index pointing to the snapshot. For efficiency, if a file has not changed, Git does not store it again but simply links to the previously stored snapshot.
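The snapshot idea can be sketched in a few lines of Python. This is a toy model, not Git's real object store: the class SnapshotStore and its fields are hypothetical names, and a plain SHA-1 of the content stands in for Git's fingerprint. It shows how an unchanged file costs nothing extra, because the new snapshot just points at the object that is already stored.

```python
import hashlib


class SnapshotStore:
    """Toy content-addressed store: one object per distinct file content."""

    def __init__(self):
        self.objects = {}  # fingerprint -> file content
        self.commits = []  # each commit is a snapshot: {path: fingerprint}

    def commit(self, files):
        """Snapshot every file; identical content is stored only once."""
        snapshot = {}
        for path, content in files.items():
            key = hashlib.sha1(content).hexdigest()
            # setdefault keeps the existing object if the content is unchanged.
            self.objects.setdefault(key, content)
            snapshot[path] = key
        self.commits.append(snapshot)
        return len(self.commits) - 1

    def read(self, commit_id, path):
        """Fetch a file exactly as it was in the given snapshot."""
        return self.objects[self.commits[commit_id][path]]


store = SnapshotStore()
first = store.commit({"README": b"hello\n", "main.py": b"print(1)\n"})
second = store.commit({"README": b"hello\n", "main.py": b"print(2)\n"})
print(len(store.objects))
```

Two commits of two files each produce three stored objects, not four: README did not change, so both snapshots link to the same object.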
- Almost all operations are performed locally
Most operations in Git need only local files and resources, with no network connection. With a CVCS, by contrast, almost every operation requires the network. Because Git keeps the project's entire history on the local disk, most operations are fast.
For example, to browse a project's history, Git does not need to fetch data from an external server; it reads it from the local database and shows it to you right away. If you want to see how the current version of a file differs from the version of a month ago, Git retrieves the month-old snapshot and diffs it against the current file locally, instead of asking a remote server to do it or downloading the old version first.
With a CVCS, you can do very little without a network connection or when the VPN is down. With Git, even on a plane or a train, you can happily commit, then push to a remote repository once you are back online. You can also keep working on the way home without connecting to a VPN.
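As a rough illustration of why this needs no network, the sketch below diffs two snapshots that are both already on the local machine. It assumes the old and current contents are simply available as strings; local_diff is a hypothetical helper, and the standard difflib module stands in for Git's own diff engine.

```python
import difflib


def local_diff(old_text, new_text, path="file.txt"):
    """Compare two locally stored snapshots of the same file."""
    return list(difflib.unified_diff(
        old_text.splitlines(),
        new_text.splitlines(),
        fromfile=path + " (a month ago)",
        tofile=path + " (current)",
        lineterm="",
    ))


month_old = "title\nold line\nfooter\n"
current = "title\nnew line\nfooter\n"
for line in local_diff(month_old, current):
    print(line)
```

Everything the comparison needs is read from local data, so it works the same on a plane as in the office.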
- Always maintain data integrity
Before any data is stored, Git computes a checksum of its content and uses the result as the data's unique identifier and index. In other words, you cannot modify a file or directory without Git knowing about it. This is part of Git's design philosophy and is built into the lowest level of its architecture. If a file is corrupted in transit or data is lost to disk damage, Git can detect it. Git uses the SHA-1 algorithm to compute the checksum, producing a hash of the file content or directory structure that serves as the fingerprint. The fingerprint is a string of 40 hexadecimal characters (0-9 and a-f).
Git relies on these fingerprints for everything it does, so you will see these hash values often. In fact, everything stored in the Git database is indexed by hash, not by file name.
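Here is a small Python sketch of how such a fingerprint can be computed for a file's content. One detail goes beyond what the text above says: to my knowledge Git does not hash the raw bytes alone but prefixes them with a short header ("blob", the size, and a NUL byte). The function names git_blob_id and is_intact are my own, chosen for illustration.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """SHA-1 fingerprint of file content, using Git's blob header layout."""
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


def is_intact(expected_id: str, content: bytes) -> bool:
    """Detect corruption: recompute the fingerprint and compare."""
    return git_blob_id(content) == expected_id


data = b"hello world\n"
fingerprint = git_blob_id(data)
print(fingerprint, len(fingerprint))
print(is_intact(fingerprint, data))
print(is_intact(fingerprint, b"hello w0rld\n"))
```

If the sketch is right, the value printed for a given file should match what git hash-object reports for the same bytes. Changing even one byte gives a completely different fingerprint, which is how damaged data is noticed.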
- Most operations only add data
Most common Git operations only add data to the database. Irreversible operations, such as deleting data, are what make it hard to roll back or reproduce an earlier version. As in any VCS, changes you have not yet committed can still be lost or garbled. But once a snapshot is committed in Git, it is very hard to lose, especially if you regularly push to other repositories.
For any file, Git has only three states: committed, modified, and staged. Committed means the file is safely stored in the local database; modified means the file has changed but has not yet been committed; staged means the modified file has been marked for inclusion in the next commit.
This gives us the three areas a file moves through in a Git project: the working directory, the staging area, and the local repository (the Git directory).
The basic Git workflow is as follows:
- Modify some files in the working directory.
- Take a snapshot of the modified files and add it to the staging area.
- Commit, which permanently stores the snapshots from the staging area in the Git directory.
So a file's state follows from where it is: if a particular version of the file is stored in the Git directory, it is committed; if it has been modified and added to the staging area, it is staged; if it has changed since it was checked out but has not been staged, it is modified.
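The rule above can be written down as a tiny Python model. It is a simplification of my own, not how Git is implemented: each of the three areas is reduced to one version of one file, and file_state is a hypothetical helper that compares fingerprints. (In real Git a file can be staged and modified again at the same time; the sketch reports only the first difference it finds.)

```python
import hashlib


def fingerprint(data: bytes) -> str:
    return hashlib.sha1(data).hexdigest()


def file_state(working: bytes, staged: bytes, committed: bytes) -> str:
    """Classify one file by comparing its copy in each of the three areas."""
    if fingerprint(working) != fingerprint(staged):
        return "modified"   # changed since checkout, not yet staged
    if fingerprint(staged) != fingerprint(committed):
        return "staged"     # snapshot waiting in the staging area
    return "committed"      # safely stored in the Git directory


# Walk through the basic workflow for a single file.
committed = staged = working = b"version 1\n"
history = [file_state(working, staged, committed)]

working = b"version 2\n"    # 1. modify the file in the working directory
history.append(file_state(working, staged, committed))

staged = working            # 2. add the snapshot to the staging area
history.append(file_state(working, staged, committed))

committed = staged          # 3. commit the staged snapshot
history.append(file_state(working, staged, committed))

print(history)
```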
Impressions
Because Git was developed on Linux, it is used mainly on Linux as an open-source distributed version control system. Thanks to the work of some very capable people, Windows users can use it too. The Windows version relies on emulation for compatibility and is a little less polished, but no Git functionality is missing. Unexpected problems are somewhat more likely on Windows; they can be solved, but it takes a bit more effort.
For example, Windows file permissions do not map exactly onto Linux ones, so Git may report files as changed because of permissions alone. Line endings also differ between Windows and Linux, which easily causes trouble in collaborative development. So if you use Git on Windows, add the following two settings:
git config --global core.filemode false
git config --global core.autocrlf true
The first setting tells Git to ignore changes to file permissions. The second converts LF to CRLF automatically when files are checked out, and CRLF to LF when they are checked in.
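To show what the line-ending conversion amounts to, here is a rough Python sketch of the two directions. The function names on_checkout and on_checkin are made up for this example, and real Git is more careful (as far as I know it leaves files it considers binary untouched), so treat this only as a picture of the idea.

```python
def on_checkout(data: bytes) -> bytes:
    """Repository -> working directory on Windows: LF becomes CRLF."""
    # Normalize first so existing CRLF pairs are not doubled into CR CR LF.
    return data.replace(b"\r\n", b"\n").replace(b"\n", b"\r\n")


def on_checkin(data: bytes) -> bytes:
    """Working directory -> repository: CRLF becomes LF."""
    return data.replace(b"\r\n", b"\n")


stored = b"line one\nline two\n"
on_disk = on_checkout(stored)
print(on_disk)
print(on_checkin(on_disk) == stored)
```

Because the two conversions undo each other, the repository keeps LF everywhere and Windows editors still see CRLF.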
If you develop on Windows, I recommend TortoiseGit as a GUI client. Java IDEs also have Git plug-ins: in Eclipse you can install a Git plug-in, and in IntelliJ IDEA the plug-in is installed by default, so you only need to enable and configure it in the settings.
As for how to use Git day to day, I won't cover that here; there are plenty of tutorials online. I recommend one article on using Git in Eclipse; it is quite detailed, so have a look if you need it.
Conclusion
This article focused on the big picture and the basic principles of Git, aiming to give you a basic understanding of it. It does assume you have used a version control system before. Also, there is no need to agonize over SVN versus Git. If you are learning, it is worth getting to know both. At work, follow your company's requirements; there is no need to use both, and the choice depends on the company's circumstances. If you plan to grow into an architect role, though, you should think more about the strengths and weaknesses of different technologies and tools; that is the most basic requirement.
References:
- http://git-scm.com/
- http://www.devtang.com/blog/2012/02/03/talk-about-svn-and-git/