Distributed and centralized version control tools: SVN, Git, and Mercurial

Last Update:2018-12-05 Source: Internet

Author: User

Tags git workflow mercurial


Http://blog.csdn.net/zl728/article/details/5952995

In recent years, the focus of version control tools seems to have been shifting. At first, the one and only goal was to track the code, so that we could safely return to an old version and diagnose problems. Later, the focus moved to smoothing collaboration between people. This did not replace code tracking; it was built on top of it. Now we pay more and more attention to using these tools to describe code changes, which is why history-rewriting commands are in demand. Describing code changes, of course, also builds on the first two concerns.

We can divide the use of version control tools into six levels:

0. No Version Control

There is no version control at all, or at most a shared file system that is backed up regularly. When one developer, or at most a handful of developers, share code without tools, the risks are easy to imagine:

Copies of the code may become inconsistent at any time.
Code may be lost through a developer's mistake.
It is all too easy for one developer to overwrite changes made by others.

1. Preliminary Exploration

Developers have a workspace on the network and cannot work offline.
Running a build may leave you enough time for a good meal.
Refactoring is slow, if it can be done at all.
Checking out the code may take a whole night.
Checking in is also slow.
No atomic commits.
Branching and tagging are expensive.
A personal or local branch means another checkout.
Centralized, not distributed.
Merge point tracking is very slow or unavailable.
The tool has no understanding of renames when merging.
The repository sometimes crashes, and a high ratio of experts to developers is required.

At this level the tool offers basic version control functions such as checkout, version history, and file locking. Developers usually work on the same copy of the code, and synchronization depends on the lock status of each file. Such tools have trouble scaling and supporting long-running work. Renaming resources is almost impossible. Branching and tagging require permission to operate on three copies of the code at once, and may take as long as one or two sticks of incense take to burn. An example is VSS (Visual SourceSafe).

2. Clumsy

The developer has a local copy and can work online.
Working on the local file system makes builds much faster.
Refactoring takes about as long as a cup of tea.
Checkout is already very fast.
Checkin may be slower.
There are still no atomic commits.
Branching and tagging are still expensive.
Centralized, not distributed.
Merge point tracking is very slow or unavailable.
Renamed files cannot be merged; an additional conflict-resolution step is needed before committing.
The repository sometimes crashes, and the ratio of experts to developers has improved.

Such as CVS and TFS.

3. Basic Molding

Developers have a local copy and can work offline.
Builds on the local file system are fast.
Refactoring is fast.
Checkout and checkin are both very fast.
Atomic commits are finally available.
Lightweight branching and tagging.
Basic merge operations.
A personal or local branch still requires another checkout,
because the tool is still centralized, not distributed.
Basic merge point tracking.
Renamed files cannot be merged; an additional conflict-resolution step is needed before committing.
The repository sometimes crashes, and the ratio of experts to developers is very low.

Such as Subversion.

4. Effective and Reliable

Developers have a local copy and can work offline.
Builds on the local file system are fast.
Refactoring is fast.
Checkout and checkin are both very fast.
Synchronizations and updates that have nothing to do (no-ops) are fast.
Atomic commits are available.
Lightweight branching and tagging.
Advanced branching and merge operations.
A personal or local branch still requires another checkout,
because the tool is still centralized, not distributed.
Comprehensive merge point tracking.
Merging renamed files works only through a configured branch mapping; otherwise you must fix up the files before committing.
The repository rarely crashes, and the ratio of experts to developers is very low.

Such as Perforce.

5. High Speed, Invisible, and Highly Available

Developers have a local copy and can work offline.
Builds on the local file system are fast.
Refactoring is fast.
Checkout and checkin are both very fast.
Synchronizations and updates that have nothing to do (no-ops) are fast.
Atomic commits are available.
Lightweight branching and tagging.
Advanced branching and merge operations.
Very efficient personal and local branching.
Distributed, rather than centralized.
Comprehensive merge point tracking.
Seamless merging of renamed files without any configuration.
The repository rarely crashes, and the ratio of experts to developers is almost zero.

Such as Git and Mercurial.

The evolution traced above shows the main features and advantages of distributed tools. Unlike the earlier centralized client-server systems, they take a peer-to-peer approach. A client no longer synchronizes code from a single central repository; every peer's copy is a complete repository in its own right. A distributed version control system synchronizes code by exchanging patches between peers. This leads to several important differences between distributed and centralized systems:

By default there is no canonical reference repository, only working copies.
Because no communication with a central server is needed, common operations (such as committing, viewing history, and reverting changes) are very fast. Communication is required only when pushing changes to, or pulling changes from, other peers.
Each copy of the code serves as a remote backup of the repository and its change history, which provides natural protection against data loss.
Experimental branches are encouraged: creating and destroying a branch is easy and fast.
Collaboration between peers becomes very easy.
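
The peer-to-peer exchange described above can be sketched with a toy model. This is only an illustration, not how Git or Mercurial are implemented: the Repo class, its methods, and the commit ids are all made up for the example, and real tools track a graph of commits rather than a flat dictionary.

```python
class Repo:
    """A toy repository: every peer holds a full copy of the history."""

    def __init__(self, name):
        self.name = name
        self.commits = {}  # hypothetical commit id -> commit message

    def commit(self, commit_id, message):
        # Purely local operation: no network and no central server involved.
        self.commits[commit_id] = message

    def pull_from(self, other):
        # Copy only the commits this repository does not have yet.
        missing = {cid: msg for cid, msg in other.commits.items()
                   if cid not in self.commits}
        self.commits.update(missing)
        return sorted(missing)


alice = Repo("alice")
bob = Repo("bob")
alice.commit("a1", "add parser")   # local, fast
bob.commit("b1", "fix typo")       # local, fast

bob_pulled = bob.pull_from(alice)    # communication happens only here
alice_pulled = alice.pull_from(bob)
```

After the two pulls both peers hold the same history, and either one could serve as a backup for the other.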

Our current project uses Subversion for version control, but we work with it through git-svn. Below we mainly use these two representative tools, Subversion and Git, to compare the advantages of centralized and distributed tools.

Subversion advocates a single central repository and discourages large-scale branching. In a continuous integration environment, which is the environment we work in every day, this model fits very well. This is one of the reasons why Subversion is so popular and so widely used.

Although distributed systems give you the flexibility to arrange your own workflow, most people still practice continuous integration, which means there is a main repository that everyone shares. Current version control systems have almost magical merge tools, but these merges still operate only on text, so semantic consistency still requires continuous integration. As a result, even a team that uses a distributed version control system still needs a master repository.

Even so, distributed systems offer some experiences that SVN cannot provide.

In a distributed system you have a complete copy of the repository on your local disk. Operations on the repository do not have to go over the network to a central server, so they are very fast. The improvement is especially obvious when you view logs, compare against old versions of the code, or perform other operations that need the complete repository. With a centralized system you may find this only a little slow on a LAN, but if you work on a distributed project and the repository is on another continent, it becomes a very big problem.
If you travel often and cannot always connect to the repository, a distributed system lets you work with it at any time. You can commit your work, view history, and compare versions anytime, anywhere.
Another experience may be less a tool issue than a social one. Distributed version control tools encourage fast branching and experimentation. In SVN you can also create branches, but they are visible to everyone else working on the code base. This may not be a big problem, but it does reduce people's willingness to do experimental work. A distributed system encourages you to record your work in progress: you can commit unfinished changes to your local repository, even code that does not pass tests or does not compile. You can do the same in SVN, but creating such branches in a public space is always daunting.

In certain situations SVN has advantages of its own. If you need to manage binary files (such as Word documents or PPT files) that the version control system cannot merge, you should fall back to exclusive checkout locks, which require a centralized system. In addition, SVN is easier to use: you have one repository, and all changes go to it. If you know how to create, commit, and check out, you can start using it; operations such as branching and updating become familiar naturally as you go. SVN has some very useful client software, and almost all mainstream IDEs have plug-ins that integrate with SVN, which helps a great deal.

Git adds complexity. Operations always seem to come in pairs: checkout and clone, commit and push, and so on. You need to know which commands operate locally and which talk to the server or the main repository. In fact, Git's commands and way of thinking differ from those of other version control systems. Bulter Cole once described Git as follows: "It is a magical and powerful thing that can do almost anything you want it to do, if only you know how to make it do so." Opponents of Git also complain about its lack of discoverability: it is difficult to deduce its behavior from its surface design. Git supporters answer that this is only because Git uses a different mental model from other systems, and that you need to forget what you knew about version control systems to appreciate it. In any case, Git is very attractive to people who like to study how things work internally.

In general, Git handles branching better than Mercurial, especially short-lived branches used for experiments and checkpoints. Mercurial advocates other mechanisms, for example quickly cloning a repository or using patches, but Git's branching model is simpler and easier to use. Like Git, Mercurial has trouble with large binary files, and we recommend using SVN to manage them. If you only need to manage a few binary files and it is not worth setting up a separate mechanism for them, Mercurial will be able to handle them.

In addition, Git is a perfect choice for open-source projects because it fits so well with the way the Internet works. You can create your own fork of a project, commit changes to it, and ask the project maintainers to pull your changes. With Git this is convenient and natural. Even if you do not have commit access to the project, you can host your own repository online and publish your patches; anyone who likes them, including the project maintainers, can pull them into their own repositories.

Git has an area called the "staging area". You build up your commit in this intermediate area before committing it to the repository. More importantly, you can commit only some of your modifications rather than every modified file. You can even commit just part of the changes within a single file.
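
A toy model may make the role of the staging area more concrete. This is a sketch under simplifying assumptions, not Git's actual index format: ToyRepo and its methods are invented for the example, and it stages whole files only, whereas Git can also stage part of a file (for example with git add -p).

```python
class ToyRepo:
    """Toy model: working directory -> staging area -> commit history."""

    def __init__(self):
        self.working = {}   # path -> content currently on disk
        self.staged = {}    # path -> content selected for the next commit
        self.commits = []   # list of (message, snapshot) pairs

    def edit(self, path, content):
        self.working[path] = content

    def add(self, path):
        # Copy the current content of one file into the staging area.
        self.staged[path] = self.working[path]

    def commit(self, message):
        # The new snapshot is the previous one plus whatever was staged.
        snapshot = dict(self.commits[-1][1]) if self.commits else {}
        snapshot.update(self.staged)
        self.commits.append((message, snapshot))
        self.staged = {}
        return snapshot


repo = ToyRepo()
repo.edit("parser.py", "v1")
repo.edit("notes.txt", "draft")   # modified, but deliberately not staged
repo.add("parser.py")
first = repo.commit("add parser")

# Files that were edited but left out of the commit.
pending = sorted(set(repo.working) - set(first))
```

Here only parser.py ends up in the commit, while notes.txt stays in the working directory as an uncommitted change.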

Git is very flexible and very TIMTOWTDI (There is more than one way to do it). You can use any workflow you like and Git will support it.

The main workflows are as follows:

1. SVN style

A centralized workflow, which is also a very common Git workflow. If someone else has pushed since your last fetch, Git will not let you push to the main repository until you have integrated their changes.

2. Integration manager style

In this workflow, an integration manager commits code to the "blessed" repository. Other developers clone from that repository, push their changes to their own public repositories, and ask the integration manager to pull them. This is the development model most often used by open-source projects and on GitHub.

3. Dictator and lieutenants style

Larger projects can adopt a development model similar to that of the Linux kernel. Some people (the lieutenants) are each responsible for a specific subsystem of the project and merge all changes to that subsystem. Another integrator (the dictator) pulls changes only from the lieutenants and pushes them to the "blessed" repository. Everyone clones from the blessed repository.

Once again, Git supports workflows flexibly. You can mix, match, and choose workflows according to your needs.
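
The push rule of the centralized (SVN-style) workflow can be sketched as follows. This is a simplified model, assuming history is a plain list of hypothetical commit ids; try_push is an illustrative helper, not a Git API, and real Git compares commit graphs rather than lists.

```python
def try_push(central, local):
    """Accept the push only if local history extends central history."""
    is_fast_forward = local[:len(central)] == central
    if not is_fast_forward:
        return False          # someone else pushed first: rejected
    central[:] = local        # update the shared repository in place
    return True


central = ["c1", "c2"]
alice = central + ["a1"]      # both developers start from the same state
bob = central + ["b1"]

alice_ok = try_push(central, alice)   # accepted
bob_ok = try_push(central, bob)       # rejected: central now contains a1

# Bob first integrates the new central history, then pushes again.
bob = central + ["b1"]
bob_retry = try_push(central, bob)
```

The rejected push is what forces the second developer to fetch and integrate before publishing, which is how a shared main repository stays consistent without locks.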

Let's take a look at the advantages of Git over SVN:

Git has a "clean" command. SVN badly needs one.
Git has a "bisect" command.
SVN creates a .svn directory in each folder (this is true of Subversion before 1.7), while Git creates only one .git directory.
In an SVN working copy, each file or folder may be at a different revision or come from a different branch, which can cause confusion.
Whenever you delete something you have to tell SVN about it; Git discovers and handles it on its own.
In Git, ignore patterns are easy to write: a pattern such as *.pyc applies to all subfolders, and you can of course also ignore content in one specific folder. In SVN it is difficult to apply an ignore pattern to all subfolders.
Ignore settings in Git can be "private": patterns placed in .git/info/exclude do not affect anyone else.
Git tracks content rather than files, which gives better support for merging renamed files.
A Git repository is much smaller than an SVN one.
In Git you can rewrite history before publishing it, which is a great help when preparing a patch set or fixing earlier mistakes.
Another point is specific to our own environment. We pair program every day and often switch pairs, even when a story is only half done. At that moment the code is not fully committed and exists on only one machine, whose owner may need to move on to another story. In this case git-svn is very convenient, because we do not have to commit unfinished or non-compiling code to SVN.
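
The idea behind the "bisect" command mentioned in the list above is a binary search over history. The sketch below shows only that idea; first_bad_commit, the commit names, and the is_bad check are assumptions made for the example, and it treats history as a simple linear list.

```python
def first_bad_commit(commits, is_bad):
    """Binary search for the first bad commit.

    Assumes commits[0] is good, commits[-1] is bad, and that every
    commit after the first bad one is bad as well.
    """
    lo, hi = 0, len(commits) - 1   # lo: known good, hi: known bad
    steps = 0
    while hi - lo > 1:
        mid = (lo + hi) // 2
        steps += 1                 # one build-and-test round per step
        if is_bad(commits[mid]):
            hi = mid
        else:
            lo = mid
    return commits[hi], steps


# Hypothetical history of 100 commits where the bug appeared in c37.
commits = [f"c{i}" for i in range(100)]
culprit, steps = first_bad_commit(commits, lambda c: int(c[1:]) >= 37)
```

With 100 commits the search needs at most 7 test rounds, instead of testing every commit in turn.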

It is said that Git currently does not support checking out or cloning only part of a repository, although this is under development and submodules are supported. SVN can check out just a subfolder of the repository as needed. SVN revision numbers are short and predictable, while a Git revision is identified by a 40-character hexadecimal string (a SHA-1 hash). Git has a great advantage in branch handling, but I have not used branches so far, so I do not yet have a deep understanding of this part.
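
To show where the 40 hexadecimal characters come from, here is a small sketch that computes the SHA-1 object id Git assigns to a file's content (a "blob"). The helper name git_blob_id is made up for the example; commits are hashed with the same scheme but a different header and body, and newer Git versions can optionally use SHA-256 instead.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Return the SHA-1 object id Git uses for a blob with this content."""
    # Git hashes a short header ("blob <size>" plus a NUL byte) and then the data.
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


empty_id = git_blob_id(b"")
hello_id = git_blob_id(b"hello world\n")
```

Because the id is derived from the content itself, it cannot be short and sequential like an SVN revision number, but identical content gets the same id in every clone.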

References:

Martin Fowler: VersionControlTools

Martin Fowler: MercurialSquashCommit

Why Git Is Better Than X

GitSvnComparison

Distributed Revision Control

A Maturity Model for Source Control

Tech Talk: Linus Torvalds On Git

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
