The implementation of Git client in Webide

<blockquote>Coding WebIDE is an online integrated development environment (IDE) developed by Coding.net. In WebIDE you can create a project workspace, develop and debug online, and use a fully functional terminal. Because Git has a steep learning curve, WebIDE provides a convenient GUI for it. Earlier releases implemented the basic Git client features; this update adds several advanced ones (merge, stash, rebase, reset, and tags), which makes developers working in WebIDE considerably more efficient. The following is a Coding.net engineer's account of implementing the Git features in WebIDE.</blockquote>

Version control

A version control system manages changes to documents, programs, configuration, and similar content. The idea is not hard to grasp, and even non-programmers have met it in the form of Windows System Restore or Time Machine on the Mac: at certain points these tools record the state of the system or the contents of files so that they can be restored when needed. For programmers, version control has the following benefits:

<ol>
<li>Recovery: file contents can be recovered after a file is accidentally deleted or the wrong file is changed.</li>
<li>Rollback: if a new version has a major problem, the project can be rolled back to the last correct state.</li>
<li>Collaboration: different developers start from the same version and produce different versions that can easily be merged together.</li>
</ol>

Common version control systems

Common version control systems include CVS, SVN, Mercurial, and Git. By their network requirements, these four fall into two groups: CVS and SVN in one, Mercurial and Git in the other.
<ul>
<li>CVS, SVN: use a central repository; developers must check code out from the central repository.</li>
<li>Mercurial, Git: use a local repository; developers can work locally.</li>
</ul>

The first group only works while connected to the company network. The second group keeps the repository on the local machine, so there is no need to connect to the company network; in other words, you can work offline. Distributed version control systems such as Git and Mercurial are becoming increasingly popular and are gradually replacing centralized systems such as CVS and SVN.

Why choose Git

What makes Git stand out among so many version control systems?

<ol>
<li>Local commits: you can work offline, whether at home or on the subway, without connecting to the company network.</li>
<li>Lightweight branching: Git's lightweight branches let you switch between versions of a project quickly. This matters most when you are in the middle of development and an urgent bug turns up: you can quickly switch branches and fix it.</li>
<li>Ease of conflict resolution: because branches are lightweight, Git encourages branch-based development. Merging branches back into the trunk inevitably produces conflicts, and the way Git handles them is very user friendly.</li>
<li>Strong code hosting platforms such as GitHub and Coding: both host a large amount of open source code and have very active users. Using Git puts you in touch with more excellent projects and good developers, which helps your own growth.</li>
</ol>

Examples of Git principles

A classic Git operation:

<pre class="hljs"><code>touch README.md
git add README.md
git commit -m "add readme"</code></pre>

<code class="hljs css">touch README.md</code> stands for creating or modifying a file.
<code class="hljs css">git add README.md</code> adds the changes to the file to the staging area.
<code class="hljs sql">git commit -m "add readme"</code> commits the changes to the repository.

We already know this much, but what exactly does it mean to add to the staging area and to commit to the repository?

Three different states

Git works with three areas: the working directory, the staging area, and the local repository.

<ul>
<li>Add: working directory → staging area</li>
<li>Commit: staging area → local repository</li>
</ul>

The working directory is where we normally write code. The staging area, also called the index, holds the list of files that will go into the next commit. The local repository is where Git stores the project's data; committing means permanently saving the file contents to this database.

Let us look at the local repository first. Files in the project are saved there as snapshots.

Snapshots in Git

Every version is a full snapshot of the project. Files that have not changed are not stored again; the snapshot holds a link to the previously stored file.

This raises two questions: what is the link, and how can Git quickly tell whether the content of a file has changed?
Git's answer is SHA-1.

SHA-1

<pre class="hljs"><code>echo 'test content' | git hash-object --stdin
d670460b4b4aece5915caf5c68d12f560a9fe3e4</code></pre>

Characteristics:

<ul>
<li>The hash value is computed from the contents of the file.</li>
<li>The same hash value means the same file content.</li>
<li>The hash value serves as a unique ID.</li>
</ul>

SHA-1 turns the contents of a file into a 160-bit message digest, that is, 40 hexadecimal digits. An important property of SHA-1 is that if two files have the same SHA-1 value, they are almost certain to have exactly the same content. The command above produces the same hash value no matter how many times it is run, so the hash value can be treated as the unique ID of the file.

Git computes the hash of every piece of data before storing it and afterwards refers to the data by that hash. The ID can therefore uniquely identify not only any version of a file but also any commit, that is, a snapshot of the code at any point in time.

Data that is actually stored in Git

<code class="hljs fsharp">find .git/objects -type f</code>

The data Git actually stores looks messy at first. It lives under .git/objects: the first two hexadecimal characters of the SHA-1 hash are used as the directory name and the remaining 38 characters as the file name.

All of these files fall into four types: blob, commit, tree, and tag. Following the references between objects of these types (ignoring tags) gives a relationship diagram in which each box represents an object, that is, a file under the objects directory. The string of letters and numbers above each object is its directory name plus file name, in other words its SHA-1 hash. The first line of every object has the same format and consists of two columns: the first is the type of the object and the second is the length of its content.

Next we look at each type:

<ul>
<li>Blob: stores the contents of a project file. Every version of every file in the project is stored as a blob. A blob does not include descriptive information such as the file's path, name, or format.</li>
<li>Tree: represents a directory in the project. A directory contains files and subdirectories, so a tree contains blobs and subtrees. The tree also records the names of its files, and thus their paths. Seen from the top, the whole structure is a tree: leaf nodes are blobs holding file contents, non-leaf nodes are project directories, and the top-level tree object represents the current project snapshot.</li>
<li>Commit: represents one commit. Its tree value points to a snapshot of the project, and it carries further information such as parent, committer, author, and message. Trees form a tree structure with blobs as leaves, whereas commits form a directed acyclic graph (DAG), because a commit can have one parent or two or more parents.</li>
</ul>

That covers the local repository.
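To make the ID and storage layout described above concrete, here is a minimal sketch in plain Java. It assumes that a blob's ID is the SHA-1 of a `blob <length>` header, a NUL byte, and the content, which matches the "type and length" first line mentioned above. The class and method names (`GitBlobId`, `blobId`, `loosePath`) are made up for illustration and are not part of Git or JGit.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class GitBlobId {
    /** Hashes "blob <length>\0" followed by the content, as assumed in the lead-in. */
    static String blobId(byte[] content) {
        try {
            MessageDigest sha1 = MessageDigest.getInstance("SHA-1");
            // Header: object type, a space, the content length, then a NUL byte
            sha1.update(("blob " + content.length + "\0").getBytes(StandardCharsets.UTF_8));
            sha1.update(content);
            StringBuilder hex = new StringBuilder();
            for (byte b : sha1.digest()) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-1 is not available", e);
        }
    }

    /** First 2 hex characters become the directory, the remaining 38 the file name. */
    static String loosePath(String id) {
        return ".git/objects/" + id.substring(0, 2) + "/" + id.substring(2);
    }

    public static void main(String[] args) {
        // echo appends a newline, so the hashed content is "test content\n"
        String id = blobId("test content\n".getBytes(StandardCharsets.UTF_8));
        System.out.println(id);            // d670460b4b4aece5915caf5c68d12f560a9fe3e4
        System.out.println(loosePath(id)); // .git/objects/d6/70460b4b4aece5915caf5c68d12f560a9fe3e4
    }
}
```

The sketch only computes the ID and the path; the file Git writes at that path additionally contains the zlib-compressed header and content.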
Next we look at the staging area.

Staging area

The staging area is a buffer between the working directory and the local repository. It holds the list of files that will go into the next commit. It is actually a single file, located at <code class="hljs perl">.git/index</code>. The file is binary, so its contents cannot be read directly, but they can be displayed with Git commands such as <code>git ls-files --stage</code>. The columns of that output are, in order: the file mode, the file's blob, the stage number, and the file name. The blob in the second column holds the contents of the file at the time it was staged.

A typical way of working with the staging area is this: edit one or several files and add them to the staging area, then modify other files and add those as well, and repeat. When the changes are complete, use the commit command to save the contents of the staging area permanently to the local repository. This process is in fact the process of building a snapshot of the project, so the staging area can be described as the place where the project snapshot is assembled.

Branch

Now for the concept of a branch. Picture the history as a graph in which each point is a commit. Two things can be read from such a graph:

<ul>
<li>A line that diverges from any point can be called a branch.</li>
<li>Branches can be merged.</li>
</ul>

Implementation of the branch

The current branch is stored in the .git/HEAD file.

<pre class="hljs"><code>cat .git/HEAD
=> ref: refs/heads/master</code></pre>

This ref represents a branch. It is also a file, so we can look at its contents:

<pre class="hljs"><code>cat .git/refs/heads/master
=> 2b388d2c1c20998b6233ff47596b0c87ed3ed8f8</code></pre>

The branch stores the ID of an object, and we can inspect that object with the cat-file command.

<pre class="hljs"><code><[email protected]> 1460971725 +0800
=> committer Hehe Tan <[email protected]> 1460971725 +0800
=>
=> add branch paramter for rebase</code></pre>

From this output we know that the branch points to a commit, and this is exactly why branches in Git are so lightweight. A branch is just a pointer to a commit: when we create a new commit, the branch pointer simply moves to it, and creating a branch only creates a pointer.

That is enough about Git itself. Let us turn to JGit.

JGit

JGit is a fairly robust Git implementation written in Java; EGit, the Git plugin of the Eclipse IDE, is built on it. Like Git, it offers both low-level commands and high-level commands.

The entry point for high-level commands is the Git class. Most of the commands we use in a Git client, such as add, commit, and checkout, are high-level commands. They provide a friendly interface, and a single command usually achieves the effect you want.

The entry point for low-level commands is the Repository class. Low-level commands, unlike high-level commands, operate directly on the repository.
For example, AbstractTreeIterator is used to traverse tree structures, DirCache to manipulate the staging area, RevWalk to traverse commits, ObjectInserter to create objects, and ObjectLoader to load an object. A high-level command is usually composed of several low-level commands.

Repository

Everything starts with a Repository.

<pre class="hljs"><code>import java.io.File;
import org.eclipse.jgit.lib.Repository;
import org.eclipse.jgit.storage.file.FileRepositoryBuilder;

Repository repository = new FileRepositoryBuilder()
        .setGitDir(new File("/home/tan/GitTest/.git"))
        .readEnvironment()   // pick up the relevant environment variables
        .build();</code></pre>

You only need to pass in the path of the repository; the builder reads the necessary environment variables automatically.

ObjectInserter

ObjectInserter inserts data into the Git database, that is, the objects directory. The inserted object is one of the four types mentioned earlier: blob, tree, commit, or tag.

<pre class="hljs"><code>import java.nio.charset.StandardCharsets;
import org.eclipse.jgit.lib.Constants;
import org.eclipse.jgit.lib.ObjectId;
import org.eclipse.jgit.lib.ObjectInserter;

try (ObjectInserter inserter = repository.newObjectInserter()) {
    // Write "test" as a blob and get back its ID
    ObjectId objectId = inserter.insert(Constants.OBJ_BLOB,
            "test".getBytes(StandardCharsets.UTF_8));
    inserter.flush();
}</code></pre>

The second parameter is the data to insert; it is compressed with zlib automatically.

TreeWalk

TreeWalk traverses a directory structure, which can be the working directory, the staging area, or a project snapshot in the repository.

<pre class="hljs"><code>import org.eclipse.jgit.dircache.DirCacheIterator;
import org.eclipse.jgit.treewalk.AbstractTreeIterator;
import org.eclipse.jgit.treewalk.FileTreeIterator;
import org.eclipse.jgit.treewalk.TreeWalk;

try (TreeWalk treeWalk = new TreeWalk(repository)) {
    treeWalk.setRecursive(true);
    treeWalk.addTree(new FileTreeIterator(repository));                   // tree 0: working directory
    treeWalk.addTree(new DirCacheIterator(repository.readDirCache()));   // tree 1: staging area
    while (treeWalk.next()) {
        // Each call returns null if that tree has no entry for the current path
        AbstractTreeIterator treeIterator = treeWalk.getTree(0, AbstractTreeIterator.class);
        DirCacheIterator dirCacheIterator = treeWalk.getTree(1, DirCacheIterator.class);
    }
}</code></pre>

TreeWalk traverses tree structures, and its more powerful feature is that it can traverse several trees at the same time. The idea is to merge the file lists of all the trees into one list and walk that list; for a tree that does not contain the current entry, getTree returns null.

This is in fact how git status works:

<ol>
<li>Changed: exists in both the repository and the index, with different content</li>
<li>Removed: exists in the repository but not in the index</li>
<li>Added: exists in the index but not in the repository</li>
<li>Untracked: exists neither in the repository nor in the index, only in the working directory</li>
<li>Modified: exists in both the index and the working directory, with different content</li>
<li>Missing: exists in the index but not in the working directory</li>
</ol>

RevWalk

RevWalk is used to traverse commits.

<pre class="hljs"><code>import org.eclipse.jgit.revwalk.RevCommit;
import org.eclipse.jgit.revwalk.RevWalk;
import org.eclipse.jgit.revwalk.filter.RevFilter;

// one and two are the RevCommit objects at the tips of the two branches
try (RevWalk revWalk = new RevWalk(repository)) {
    revWalk.markStart(one);
    revWalk.markStart(two);
    revWalk.setRevFilter(RevFilter.MERGE_BASE);
    RevCommit base = revWalk.next();
}</code></pre>

In this example we mark two commits as starting points and set the filter to MERGE_BASE, so the walk finds the merge base of the branches on which the two commits lie.
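To illustrate what a merge base is, the following is a toy sketch in plain Java over a hand-written commit graph. It is not how JGit implements `RevFilter.MERGE_BASE`; it simply assumes the usual definition, a common ancestor of both commits that is not itself an ancestor of another common ancestor. The class `MergeBaseSketch`, its methods, and the commit names A to D are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class MergeBaseSketch {
    /** Returns start plus every commit reachable from it through parent links. */
    static Set<String> ancestors(Map<String, List<String>> parents, String start) {
        Set<String> seen = new LinkedHashSet<>();
        Deque<String> todo = new ArrayDeque<>();
        todo.push(start);
        while (!todo.isEmpty()) {
            String commit = todo.pop();
            if (seen.add(commit)) {
                for (String parent : parents.getOrDefault(commit, List.of())) {
                    todo.push(parent);
                }
            }
        }
        return seen;
    }

    /** Common ancestors of one and two that are not ancestors of another common ancestor. */
    static Set<String> mergeBases(Map<String, List<String>> parents, String one, String two) {
        Set<String> common = ancestors(parents, one);
        common.retainAll(ancestors(parents, two));
        Set<String> bases = new TreeSet<>(common);
        for (String commit : common) {
            Set<String> older = ancestors(parents, commit);
            older.remove(commit);       // keep the commit itself as a candidate
            bases.removeAll(older);     // drop everything that lies behind it
        }
        return bases;
    }

    public static void main(String[] args) {
        // History: A <- B <- C (one branch) and B <- D (another branch)
        Map<String, List<String>> parents = Map.of(
                "A", List.of(),
                "B", List.of("A"),
                "C", List.of("B"),
                "D", List.of("B"));
        System.out.println(mergeBases(parents, "C", "D")); // [B]
    }
}
```

Commit B is where the two lines of history part, so it is the only merge base in this graph; A is also a common ancestor but is dropped because it lies behind B.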
The merge base can be thought of as the point where the two branches diverged, and it is used as the reference version when merging.

Using low-level commands

High-level commands are built from a number of low-level commands. Take add and commit, the two we use most often:

<ul>
<li>Add
<ul>
<li>Use ObjectInserter to write the file contents to objects (as a blob) and get the blob ID</li>
<li>Use DirCache to write the blob ID to the staging area</li>
</ul></li>
<li>Commit
<ul>
<li>Use DirCache to generate the tree of the index</li>
<li>Use ObjectInserter to write the tree to the repository (as a tree) and get the tree ID</li>
<li>Build a commit, write the tree ID into it, and set its parent, message, and other information</li>
<li>Use ObjectInserter to write the commit to objects (as a commit) and get the commit ID</li>
<li>Write the commit ID to the current branch so that the branch points to the latest commit</li>
</ul></li>
</ul>

High-level commands

The complex sequence above can be replaced by simple high-level commands.

<pre class="hljs"><code>// git is an org.eclipse.jgit.api.Git instance
git.add().addFilepattern("README").call();
git.commit().setMessage("add readme").call();</code></pre>

Limitations of high-level commands

High-level commands are easy to use, but the functionality they offer is limited. We take merge as an example.

Using the JGit merge API

Merging with the interface JGit provides is easy: just specify the branch you want to merge.

<pre class="hljs"><code>// branch is the Ref of the branch to merge
MergeCommand merge = git.merge();
merge.include(branch);
MergeResult result = merge.call();</code></pre>

But what happens after the merge when files conflict, and how are the conflicts resolved? Besides merge, operations such as stash and rebase can also produce conflicts.
Handling conflicting files is therefore one of the important functions of a Git client. Unfortunately, JGit does not provide a ready-made way to resolve conflicts, so we have to solve the problem ourselves.

Resolve conflicts

An ideal conflict resolution view splits each conflicting file into three columns: the local modifications, the base version, and the modifications from the branch being merged. This lets us see the conflicting content at a glance and easily accept or discard changes.

Possible approaches

<ol>
<li>Calculate the merge base. The first approach is to compute the merge base of the two branches. This gives us three commits, each of which records a snapshot of the file at the time of that commit, and we only need to read the contents of the conflicting file from each snapshot. The drawback is that we only know which branch is being merged at the moment of merging. Afterwards the only way to find out is to read .git/MERGE_HEAD, and conflicts caused by other operations such as stash and rebase do not generate that file.</li>
<li>Use the information in the staging area. When a merge is in a conflicted state, git status lists the conflicting files together with the type of each conflict, such as "both modified", "deleted by us", or "both added". How does Git obtain this information? If there are conflicting files, the staging area contains something like this:
<pre class="hljs cpp"><code>git ls-files --stage
100644 6e9f0da13f19b444ec3a9c3d6e795ad35c0554a2 1 Readme
100644 29d460866c44ad72cc08ef4983fc6ebd48053bab 2 Readme
100644 12892f544e81ef2170034392f63c7fc5e6c6ccd9 3 Readme</code></pre>
The staging area marks each entry with one of four stage numbers:
<ul>
<li>0: standard stage</li>
<li>1: base tree revision</li>
<li>2: first tree revision (usually called "ours")</li>
<li>3: second tree revision (usually called "theirs")</li>
</ul>
Next we look at how these stage numbers represent conflict states.</li>
</ol>

Status of file conflicts

Assume that we are on the master branch and the branch being merged is test, with a history in which commit 1 is the merge base, commit 2 is on master, and commit 3 is on test. Now assume that a file (Readme) conflicts during the merge. We query the staging area for the stages of the file (there can be several):

<ul>
<li>1 and 2: deleted_by_them</li>
<li>1 and 3: deleted_by_us</li>
<li>2 and 3: both_added</li>
<li>1, 2, and 3: both_modified</li>
</ul>

Take the first case: the file (Readme) has stages 1 and 2. Stage 1 means the file exists in commit 1 (the merge base), and stage 2 means the file was modified in commit 2 (the master branch). Stage 3 is absent, which means the file was deleted in commit 3 (the test branch). Taken together, the state is deleted_by_them.

Now the fourth case: the file (Readme) has stages 1, 2, and 3. Stage 1 means the file exists in commit 1 (the merge base), stage 2 means it was modified in commit 2 (the master branch), and stage 3 means it was also modified on the test branch. Taken together, the state is both_modified.

Get three versions of a conflicting file

Once the state of a conflicting file is known, its three versions can be read from the staging area. The code is as follows:

<pre class="hljs"><code>DirCache dirCache = repository.readDirCache();
// In the staging area all entries are sorted by path,
// so the different stages of one file are adjacent
int eIdx = dirCache.findEntry(path);
// nextEntry skips the entries with the same path and returns the index of the next file
int lastIdx = dirCache.nextEntry(eIdx);
// The entries in [eIdx, lastIdx) are the conflicting versions of the file
for (int i = 0; i < lastIdx - eIdx; i++) {
    DirCacheEntry entry = dirCache.getEntry(eIdx + i);
    // readBlobContent is our own helper that loads the blob with the given ID
    if (entry.getStage() == DirCacheEntry.STAGE_1)       // merge base
        readBlobContent(entry.getObjectId());
    else if (entry.getStage() == DirCacheEntry.STAGE_2)  // current branch
        readBlobContent(entry.getObjectId());
    else if (entry.getStage() == DirCacheEntry.STAGE_3)  // branch being merged
        readBlobContent(entry.getObjectId());
}</code></pre>

At this point we have a solution for merge conflicts.

Happy Coding ;)
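As a recap of the stage table in the section on file conflict status, the mapping from the set of stage numbers present for a path to a conflict type can be sketched in plain Java. The class `ConflictKind`, the method `classify`, and the `unknown` fallback are made up for illustration; only the four combinations listed in the article are taken from the text.

```java
import java.util.Set;

public class ConflictKind {
    /** Maps the stages present for one path (1 = base, 2 = ours, 3 = theirs) to a conflict type. */
    static String classify(Set<Integer> stages) {
        boolean base = stages.contains(1);
        boolean ours = stages.contains(2);
        boolean theirs = stages.contains(3);
        if (base && ours && theirs) return "both_modified";
        if (base && ours) return "deleted_by_them";   // no stage 3: gone on the merged branch
        if (base && theirs) return "deleted_by_us";   // no stage 2: gone on the current branch
        if (ours && theirs) return "both_added";      // no stage 1: absent from the merge base
        return "unknown"; // combinations the article does not cover (assumption)
    }

    public static void main(String[] args) {
        // Readme in the ls-files output above has stages 1, 2, and 3
        System.out.println(classify(Set.of(1, 2, 3))); // both_modified
        System.out.println(classify(Set.of(1, 2)));    // deleted_by_them
    }
}
```

In a real client the stage set would be collected from the entries in the [eIdx, lastIdx) range of the staging area, as in the article's last code listing.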

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
