10.7 Git Internals - Maintenance and Data Recovery

Maintenance and data recovery

Occasionally you need to do some cleanup: make a repository more compact, clean up an imported repository, or recover lost work. This section covers some of these scenarios.

Maintenance

Git occasionally runs a command called "auto gc" on its own. Most of the time, this command does nothing. However, if there are too many loose objects (objects not in a packfile) or too many packfiles, Git launches a full git gc command. "gc" stands for garbage collect, and the command does several things: it gathers up all the loose objects and places them in packfiles, it consolidates packfiles into one big packfile, and it removes objects that are not reachable from any commit and have become stale.

You can run auto gc manually as follows:

$ git gc --auto

Again, this generally does nothing. You must have around 7,000 loose objects or more than 50 packfiles for Git to fire up a real gc command. You can change these limits with the gc.auto and gc.autoPackLimit config settings, respectively.
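As a rough sketch of how those two settings can be inspected and changed, the following works in a throwaway repository. The threshold values 1000 and 20 are arbitrary numbers chosen for illustration, not recommendations.

```shell
# Throwaway repository so the settings below do not touch a real project.
repo=$(mktemp -d)
git init -q "$repo"

# Override the auto gc thresholds for this repository only.
# 1000 and 20 are arbitrary example values.
git -C "$repo" config gc.auto 1000
git -C "$repo" config gc.autoPackLimit 20

auto=$(git -C "$repo" config --get gc.auto)
limit=$(git -C "$repo" config --get gc.autoPackLimit)
echo "gc.auto=$auto gc.autoPackLimit=$limit"

# "count" in count-objects -v is the number of loose objects in the repository.
loose=$(git -C "$repo" count-objects -v | awk -F': ' '$1 == "count" { print $2 }')
echo "loose objects: $loose"
```

Comparing the loose object count against gc.auto gives a quick idea of how far a repository is from triggering a real gc.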

The other thing gc does is pack your references into a single file. Suppose your repository contains the following branches and tags:

$ find .git/refs -type f
.git/refs/heads/experiment
.git/refs/heads/master
.git/refs/tags/v1.0
.git/refs/tags/v1.1

If you run git gc, these files no longer exist in the refs directory. For the sake of efficiency, Git moves them into a file named .git/packed-refs that looks like this:

$ cat .git/packed-refs
# pack-refs with: peeled fully-peeled
cac0cab538b970a37ea1e769cbbde608743bc96d refs/heads/experiment
ab1afef80fac8e34258ff41fc1b867c702daa24b refs/heads/master
cac0cab538b970a37ea1e769cbbde608743bc96d refs/tags/v1.0
9585191f37f7b0fb9444f35a9bf50de191beadc2 refs/tags/v1.1
^1a410efbd13591db07496601ebc7a059dd55cfe9

If you update a reference, Git doesn't edit this file but instead writes a new file to refs/heads. To get the SHA-1 for a given reference, Git checks for that reference in the refs directory first and then falls back to the packed-refs file. So, if you can't find a reference in the refs directory, it's probably in your packed-refs file.
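The lookup order described above can be sketched in plain shell. The resolve_ref function below is a hypothetical helper written for illustration, not part of Git, and it deliberately ignores symbolic refs such as HEAD. The demo directory is hand-built to mimic a .git directory.

```shell
# resolve_ref GIT_DIR REFNAME
# Print the SHA-1 for a ref: loose file first, then packed-refs.
# Illustration only; real Git also handles symbolic refs and other cases.
resolve_ref() {
    git_dir=$1
    ref=$2
    if [ -f "$git_dir/$ref" ]; then
        cat "$git_dir/$ref"
    elif [ -f "$git_dir/packed-refs" ]; then
        # Comment lines (#) and peeled lines (^) never match because
        # their second field is not a ref name.
        awk -v ref="$ref" '$2 == ref { print $1; found = 1; exit }
                           END { exit !found }' "$git_dir/packed-refs"
    else
        return 1
    fi
}

# Hand-built stand-in for a .git directory.
demo=$(mktemp -d)
mkdir -p "$demo/refs/heads"
cat > "$demo/packed-refs" <<'EOF'
# pack-refs with: peeled fully-peeled
cac0cab538b970a37ea1e769cbbde608743bc96d refs/heads/experiment
ab1afef80fac8e34258ff41fc1b867c702daa24b refs/heads/master
EOF
# master was updated after packing, so a loose file shadows the packed entry.
echo 1a410efbd13591db07496601ebc7a059dd55cfe9 > "$demo/refs/heads/master"

master=$(resolve_ref "$demo" refs/heads/master)
experiment=$(resolve_ref "$demo" refs/heads/experiment)
echo "master=$master"
echo "experiment=$experiment"
```

Here master resolves to the loose file's value, while experiment is only found in packed-refs.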

Notice the last line of the file, which begins with a ^. This means the tag directly above it is an annotated tag, and that line is the commit that the annotated tag points to.

Data Recovery

At some point in your Git journey, you may accidentally lose a commit. Generally, this happens because you force-delete a branch that had work on it and it turns out you wanted the branch after all, or you hard-reset a branch and abandon commits that you still needed. Assuming this happens, how can you get your commits back?

Here's an example that hard-resets the master branch in your test repository to an older commit and then recovers the lost commits. First, let's review where your repository is at this point:

$ git log --pretty=oneline
ab1afef80fac8e34258ff41fc1b867c702daa24b modified repo a bit
484a59275031909e19aadb7c92262719cfcdf19a added repo.rb
1a410efbd13591db07496601ebc7a059dd55cfe9 third commit
cac0cab538b970a37ea1e769cbbde608743bc96d second commit
fdf4fc3344e67ab068f836878b6c4951e3b15f3d first commit

Now, we'll hard reset the master branch to the third commit:

$ git reset --hard 1a410efbd13591db07496601ebc7a059dd55cfe9
HEAD is now at 1a410ef third commit
$ git log --pretty=oneline
1a410efbd13591db07496601ebc7a059dd55cfe9 third commit
cac0cab538b970a37ea1e769cbbde608743bc96d second commit
fdf4fc3344e67ab068f836878b6c4951e3b15f3d first commit

You've effectively lost the top two commits: no branch points to them anymore. You need to find the SHA-1 of the latest commit and then add a branch that points to it. The trick is finding that latest commit SHA-1, since you probably haven't memorized it.

Often, the quickest way is to use a tool called git reflog. As you work, Git silently records what your HEAD is every time you change it. Each time you commit or change branches, the reflog is updated. The reflog is also updated by the git update-ref command, which is another reason to use that command instead of writing the SHA-1 value directly into your ref files. You can see where you've been at any time by running git reflog:

$ git reflog
1a410ef HEAD@{0}: reset: moving to 1a410ef
ab1afef HEAD@{1}: commit: modified repo.rb a bit
484a5 HEAD@{2}: commit: added repo.rb
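To illustrate the earlier point about git update-ref, here is a small sketch in a throwaway repository. The branch names demo and raw, the reflog message, and the committer identity are placeholders chosen for the example. It assumes Git's default on-disk ref storage, where reflogs live under .git/logs.

```shell
repo=$(mktemp -d)
git init -q "$repo"
# Helper that runs git in the throwaway repo with a placeholder identity.
g() { git -C "$repo" -c user.name='Example User' -c user.email='user@example.com' "$@"; }

g commit -q --allow-empty -m 'first commit'
sha=$(g rev-parse HEAD)

# Create a branch through update-ref; -m supplies the reflog message.
g update-ref -m 'created by update-ref' refs/heads/demo "$sha"
entry=$(g reflog show demo)
echo "$entry"

# Writing the SHA-1 straight into a ref file creates the branch
# but leaves no reflog entry behind.
echo "$sha" > "$repo/.git/refs/heads/raw"
if [ -e "$repo/.git/logs/refs/heads/raw" ]; then raw_logged=yes; else raw_logged=no; fi
echo "raw branch has a reflog: $raw_logged"
```

Only the branch created with update-ref gets a history that git reflog can later show.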

The git reflog output shows the two commits that you have had checked out, but not much detail about them. To see the same information in a more useful form, run git log -g, which prints the reflog in the normal log format.

$ git log -g
commit 1a410efbd13591db07496601ebc7a059dd55cfe9
Reflog: HEAD@{0} (Scott Chacon <schacon@gmail.com>)
Reflog message: updating HEAD
Author: Scott Chacon <schacon@gmail.com>
Date:   Fri May 18:22:37 2009 -0700

    third commit

commit ab1afef80fac8e34258ff41fc1b867c702daa24b
Reflog: HEAD@{1} (Scott Chacon <schacon@gmail.com>)
Reflog message: updating HEAD
Author: Scott Chacon <schacon@gmail.com>
Date:   Fri May 18:15:24 2009 -0700

    modified repo.rb a bit

It looks like the bottom commit is the one you lost, so you can recover it by creating a new branch at that commit. For example, you can start a branch named recover-branch at that commit (ab1afef):

$ git branch recover-branch ab1afef
$ git log --pretty=oneline recover-branch
ab1afef80fac8e34258ff41fc1b867c702daa24b modified repo a bit
484a59275031909e19aadb7c92262719cfcdf19a added repo.rb
1a410efbd13591db07496601ebc7a059dd55cfe9 third commit
cac0cab538b970a37ea1e769cbbde608743bc96d second commit
fdf4fc3344e67ab068f836878b6c4951e3b15f3d first commit
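The same reflog-based recovery can be scripted end to end. The following is a minimal sketch in a throwaway repository with empty commits; the committer identity is a placeholder, and HEAD@{1} is used on the assumption that the reset is the most recent reflog entry.

```shell
repo=$(mktemp -d)
git init -q "$repo"
# Helper that runs git in the throwaway repo with a placeholder identity.
g() { git -C "$repo" -c user.name='Example User' -c user.email='user@example.com' "$@"; }

g commit -q --allow-empty -m 'first commit'
g commit -q --allow-empty -m 'second commit'
lost=$(g rev-parse HEAD)

# Throw away the latest commit, as in the walkthrough above.
g reset -q --hard HEAD~1

# HEAD@{1} is where HEAD pointed just before the reset.
found=$(g rev-parse 'HEAD@{1}')

# Re-attach a branch to the lost commit.
g branch recover-branch "$found"
recovered=$(g rev-parse recover-branch)
echo "lost=$lost recovered=$recovered"
```

If other HEAD movements happened after the reset, the right entry will be further down the reflog, so it is worth reading git reflog before picking an index.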

Now you have a branch named recover-branch that points to where your master branch used to be, making the two lost commits reachable again. Next, suppose your loss was for some reason not in the reflog. You can simulate that by removing recover-branch and deleting the reflog. Now those two commits aren't reachable by anything:

$ git branch -D recover-branch
$ rm -rf .git/logs/

Because the reflog data is kept in the .git/logs/ directory, you effectively have no reflog. How can you recover that commit at this point? One way is to use the git fsck utility, which checks your database for integrity. If you run it with the --full option, it shows you all objects that aren't pointed to by another object:

$ git fsck --full
Checking object directories: 100% (256/256), done.
Checking objects: 100% (18/18), done.
dangling blob d670460b4b4aece5915caf5c68d12f560a9fe3e4
dangling commit ab1afef80fac8e34258ff41fc1b867c702daa24b
dangling tree aea790b9a58f6cf6f2804eeac9f0abbe9631e4c9
dangling blob 7108f7ecb345ee9d0084193f147cdad4d2998293
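Picking the dangling commit out of fsck output can also be scripted. This sketch uses a throwaway repository and loses a commit by force-deleting a branch named topic (a placeholder name) and then removing the reflogs. It assumes the default on-disk ref storage.

```shell
repo=$(mktemp -d)
git init -q "$repo"
# Helper that runs git in the throwaway repo with a placeholder identity.
g() { git -C "$repo" -c user.name='Example User' -c user.email='user@example.com' "$@"; }

g commit -q --allow-empty -m 'first commit'
main=$(g symbolic-ref --short HEAD)

# Do some work on a side branch, then lose it.
g checkout -q -b topic
g commit -q --allow-empty -m 'work that will be lost'
lost=$(g rev-parse HEAD)
g checkout -q "$main"
g branch -q -D topic
rm -rf "$repo/.git/logs"

# With no branch and no reflog, fsck is the remaining way to find the commit.
dangling=$(g fsck --full 2>/dev/null | awk '$1 == "dangling" && $2 == "commit" { print $3 }')
echo "dangling commit: $dangling"

# Recover it the same way as before.
g branch recover-branch "$dangling"
```

In a real repository fsck may list several dangling commits, so inspect each candidate with git show before recreating a branch.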

In the git fsck output, your missing commit appears after the string "dangling commit". You can recover it the same way as before, by adding a branch that points to that commit.

Removing Objects

There are a lot of great things about Git, but one feature that can cause issues is that git clone downloads the entire history of the project, including every version of every file. This is fine if the whole thing is source code, because Git is highly optimized to compress that data efficiently. However, if someone at any point in the history of your project added a single huge file, every clone for all time will be forced to download that large file, even if it was removed from the project in the very next commit. Because it's reachable from the history, it will always be there.

This can be a huge problem when you're converting Subversion or Perforce repositories into Git. Because you don't download the whole history in those systems, this type of addition carries few consequences there. If you did an import from another system, or otherwise find that your repository is much larger than it should be, you need to find and remove the large objects.

Warning: this technique is destructive to your commit history. It rewrites every commit object since the earliest tree you have to modify to remove the reference to a large file. If you do this immediately after an import, before anyone has started to base work on the commits, you're fine. Otherwise, you have to notify all contributors that they must rebase their work onto your new commits.

To demonstrate, you'll add a large file to your test repository, remove it in the next commit, and then find it and remove it permanently from the repository. First, add a large object to your history:

$ curl https://www.kernel.org/pub/software/scm/git/git-2.1.0.tar.gz > git.tgz
$ git add git.tgz
$ git commit -m 'Add git tarball'
[master 7b30847] Add git tarball
 1 file changed, 0 insertions(+), 0 deletions(-)
 create mode 100644 git.tgz

Oops: the project doesn't actually need this huge tarball. Better get rid of it:

$ git rm git.tgz
rm 'git.tgz'
$ git commit -m 'oops - removed large tarball'
[master dadf725] oops - removed large tarball
 1 file changed, 0 insertions(+), 0 deletions(-)
 delete mode 100644 git.tgz
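A quick way to confirm that a file removed with git rm is still reachable from history is to list every object reachable from all refs. The sketch below uses a throwaway repository and a 64 KiB file of random bytes named big.bin as a stand-in for the tarball; both the file name and the size are placeholders.

```shell
repo=$(mktemp -d)
git init -q "$repo"
# Helper that runs git in the throwaway repo with a placeholder identity.
g() { git -C "$repo" -c user.name='Example User' -c user.email='user@example.com' "$@"; }

# Stand-in for the large tarball.
head -c 65536 /dev/urandom > "$repo/big.bin"
g add big.bin
g commit -q -m 'add big file'
g rm -q big.bin
g commit -q -m 'remove big file'

# The file is gone from the current tree...
in_tree=$(g ls-files big.bin)

# ...but its blob is still reachable from the earlier commit.
in_history=$(g rev-list --objects --all | awk '$2 == "big.bin" { print $1 }')
echo "tracked now: '${in_tree}'  blob in history: $in_history"
```

As long as that blob shows up in the rev-list output, every clone still has to download it.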

Now, run gc on your database and see how much space you're using:

$ git gc
Counting objects: 17, done.
Delta compression using up to 8 threads.
Compressing objects: 100% (13/13), done.
Writing objects: 100% (17/17), done.
Total (delta 1), reused (delta 0)

You can also execute the count-objects command to quickly see how much space is occupied:

$ git count-objects -v
count: 7
size: 32
in-pack: 17
packs: 1
size-pack: 4868
prune-packable: 0
garbage: 0
size-garbage: 0

The size-pack entry is the size of your packfiles in kilobytes, so you're using almost 5MB. Before the last commit, you were using closer to 2KB. Clearly, removing the file in a later commit didn't remove it from your history. Every time anyone clones this repository, they will have to clone all 5MB just to get this tiny project, because you accidentally added a big file. Let's get rid of it.

First you have to find it. In this case, you already know what file it is. But suppose you didn't; how would you identify which file or files were taking up so much space? If you perform
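The size-pack arithmetic can be checked with a short script. To stay independent of any particular repository, this sketch parses the sample output shown above from a variable; in practice the same awk filter could read directly from git count-objects -v.

```shell
# Sample output copied from the count-objects run above.
count_output='count: 7
size: 32
in-pack: 17
packs: 1
size-pack: 4868
prune-packable: 0
garbage: 0
size-garbage: 0'

# Extract the packfile size, which Git reports in KiB.
size_pack_kib=$(printf '%s\n' "$count_output" | awk -F': ' '$1 == "size-pack" { print $2 }')

# Convert KiB to MiB with two decimal places.
size_pack_mib=$(awk -v k="$size_pack_kib" 'BEGIN { printf "%.2f", k / 1024 }')

echo "packfiles: ${size_pack_kib} KiB (~${size_pack_mib} MiB)"
```

Running the same conversion before and after a cleanup gives a simple measure of how much space was reclaimed.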
