Removing deleted large files from a Git repository


Preface
When using Git, you have probably added a large file to the repository by accident at some point. Even if you delete it afterwards, the file is still stored in the history, which makes every later clone and push/pull slower. Today I uploaded a project to GitHub and found that it rejects files above a certain size. A file over that limit existed in my local Git repository; although it had been deleted, it was still kept in the history. The following steps show how to permanently remove such large files from the repository.

Delete large files
The method is simple: find the large file object, then remove it from the history.

Commit all changes first:

$ git commit -am "commit all"

Run garbage collection on the repository:

$ git gc

Run count-objects to view the space usage. The size-pack value is the size of the packfiles in kilobytes; in this repository it was about 150 MB.

$ git count-objects -v
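Because size-pack is reported in kilobytes, a small helper can convert it to megabytes. The following is a minimal sketch; the function name pack_size_mib is hypothetical, and it only parses the text that git count-objects -v prints.

```shell
# Hypothetical helper: read `git count-objects -v` output on stdin and
# print the size-pack value converted from KiB to whole MiB.
pack_size_mib() {
  awk -F': ' '$1 == "size-pack" { printf "%d\n", $2 / 1024 }'
}

# Usage inside a repository:
#   git count-objects -v | pack_size_mib
```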


Run the low-level command git verify-pack to list the packed objects, and sort the output by object size, which is the third column.

$ git verify-pack -v .git/objects/pack/pack-8eaeb...9e.idx | sort -k 3 -n | tail -3


Note: The last line of the output is the largest object, which is the large file.
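The sort-and-tail step can be narrowed to blobs only, so that commits and trees do not appear among the results. This is a sketch under the assumption that each object line of git verify-pack -v has the form "SHA type size ..."; the function name largest_blobs is hypothetical.

```shell
# Hypothetical helper: read `git verify-pack -v` output on stdin and print
# "size sha" for the N largest blobs (N defaults to 3, like `tail -3`).
largest_blobs() {
  awk '$2 == "blob" { print $3, $1 }' | sort -n | tail -n "${1:-3}"
}

# Usage inside a repository (the .idx file name differs per repository):
#   git verify-pack -v .git/objects/pack/pack-*.idx | largest_blobs 3
```

Filtering on the type column also drops the summary lines that verify-pack prints at the end of its output.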

Run the rev-list command with the --objects option. It lists every commit SHA and blob SHA together with the corresponding file path, so you can look up the file name of the blob.

$ git rev-list --objects --all | grep 185ab8d
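A plain grep also matches lines where the SHA fragment happens to appear inside a path. The sketch below matches only at the start of the SHA column and prints just the path; the function name path_for_blob is hypothetical.

```shell
# Hypothetical helper: read `git rev-list --objects --all` output on stdin
# and print the path recorded for the object whose SHA starts with $1.
path_for_blob() {
  awk -v sha="$1" 'index($1, sha) == 1 && NF > 1 { sub(/^[^ ]+ /, ""); print }'
}

# Usage inside a repository:
#   git rev-list --objects --all | path_for_blob 185ab8d
```

Commit lines carry no path, so they are skipped by the NF > 1 check, and paths that contain spaces are printed intact.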


Next, the file has to be removed from every tree in the history. First, find the commits that touched it:

$ git log --pretty=oneline --branches -- spark-assembly-1.3.1-hadoop2.4.0.jar


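Use the filter-branch command to rewrite all commits starting from 646784 so that the file is completely removed from the Git history. Note that a range written as A.. excludes commit A itself; if 646784 is the commit that added the file, append ^ to the hash so that the range starts at its parent.

$ git filter-branch --index-filter 'git rm --cached --ignore-unmatch spark-assembly-1.3.1-hadoop2.4.0.jar' -- 646784d95f347749517a67c50c117f4bf85d0b42..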


Note: The --index-filter option is similar to --tree-filter, except that instead of passing a command that modifies the files checked out on disk, you pass one that modifies the staging area (the index). So rather than deleting the file with rm, you remove it with git rm --cached, that is, from the index and not from disk. The reason is speed: Git does not have to check out every revision to disk before running your filter, so the operation is much faster. You can do the same thing with --tree-filter if you prefer. The --ignore-unmatch option tells git rm not to report an error when the file you are trying to remove does not exist. Finally, the revision range asks filter-branch to rewrite history only from commit 646784 onward; otherwise it would start from the beginning of the history and take longer than necessary.
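To try --index-filter without touching a real project, the following minimal sketch builds a throwaway repository and rewrites it. The function name demo_index_filter, the file name big.bin, and the commit messages are all hypothetical. FILTER_BRANCH_SQUELCH_WARNING is set only to skip the warning and delay that recent Git versions show before filter-branch starts.

```shell
# Hypothetical demo: the body runs in a subshell, so the caller's working
# directory and shell options are left untouched.
demo_index_filter() (
  set -e
  export FILTER_BRANCH_SQUELCH_WARNING=1
  cd "$(mktemp -d)"
  git init -q
  git config user.email "demo@example.com"
  git config user.name "Demo"

  echo "hello" > README
  git add README
  git commit -q -m "initial"

  # A small stand-in for the large file: add it, then delete it again.
  echo "stand-in for a large binary" > big.bin
  git add big.bin
  git commit -q -m "add big file"
  git rm -q big.bin
  git commit -q -m "remove big file"

  # The blob is still reachable from history even though the file is gone.
  before=$(git rev-list --objects HEAD | grep -c 'big\.bin' || true)

  # Oldest commit that touched the file; "^.." starts the range at its parent.
  first=$(git log --format=%H -- big.bin | tail -n 1)
  git filter-branch --index-filter \
    'git rm --cached --ignore-unmatch big.bin' -- "$first^.." >/dev/null 2>&1

  after=$(git rev-list --objects HEAD | grep -c 'big\.bin' || true)
  echo "$before $after"
)
```

Calling demo_index_filter prints how many objects named big.bin are reachable from HEAD before and after the rewrite. The check uses HEAD rather than --all because the backup refs that filter-branch creates still point at the old commits.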

The history no longer contains a reference to that file. However, the reflog and the new set of refs that Git added under .git/refs/original when filter-branch ran still do, so you need to delete them and then repack the repository. Everything that points to the old commits has to be removed before repacking.

$ rm -rf .git/refs/original
$ rm -rf .git/logs/
$ git gc

View the space usage again:

$ git count-objects -v

The large object may still be present as a loose object. If you really want to delete it completely, you can run the git prune command, for example with --expire now.
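The cleanup steps above can also be expressed with Git's own commands instead of deleting files under .git by hand. This is a sketch of an alternative, not the method used in this article; the function name purge_old_refs is hypothetical, and it should only be run in a repository whose old history is no longer needed.

```shell
# Hypothetical helper: remove everything that still points at the
# pre-rewrite commits, then repack. Run it from inside the repository.
purge_old_refs() {
  # Delete every backup ref that filter-branch left under refs/original.
  git for-each-ref --format='%(refname)' refs/original |
    while read -r ref; do
      git update-ref -d "$ref"
    done
  # Expire all reflog entries so they no longer keep the old commits alive.
  git reflog expire --expire=now --all
  # Repack and drop the now-unreachable objects immediately.
  git gc --quiet --prune=now
}
```

Passing --prune=now to git gc has the same goal as the separate prune step: unreachable objects are deleted at once rather than after the default grace period.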
---------------------
Bai Yang
Source: csdn
Original: 50723783
Copyright Disclaimer: This article is an original article by the blogger. For more information, see the blog post link!
