How to Use Git to Manage Binary Large Objects

Last Update: 2017-02-10 Source: Internet

Author: User

Tags how to use git


In the first six articles in this series, we learned to use Git for version control of text files. Naturally, you might ask whether binary files can be version-controlled too. The answer is yes: Git has extensions that handle binary large objects (blobs) such as multimedia files. So today we will look at how to use Git to manage so-called binary assets.

Git does not handle large binary objects well. Keep in mind that a binary large object is not the same as a large text file. Git versions large text files without trouble, but it can do very little with opaque binary files: it can only store each version as one large black box.

Imagine you are building a complex 3D model for an exciting new first-person puzzle game. The source file is saved in a binary format and ends up at 1 GB. You commit it once, and your Git history gains a 1 GB commit. Later you change the shape of the character's hair and commit the update; because Git cannot separate the hair from the head and the rest of the model, you commit another full gigabyte. Then you change the eye color and commit again: yet another gigabyte. Three minor changes to one model produce three gigabyte-sized commits, which is a serious problem for anyone trying to keep the total size of a game's assets under control.
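To see the problem on a small scale, the sketch below commits a 1 MiB file of random bytes (a stand-in for the 1 GB model; the size and the file name model.bin are assumptions for illustration), appends a single byte, and commits again. Because each version is stored as its own complete compressed object, the repository's loose-object size roughly doubles, and git diff can only report that the file changed, not what changed.

```shell
# Sketch: a one-byte change to a binary file costs a whole new blob.
# A 1 MiB random file stands in for the 1 GB model (assumed size and name).
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .
commit() { git -c user.name=demo -c user.email=demo@example.com commit -q -m "$1"; }

# Version 1 of the "model": 1 MiB of incompressible random bytes.
dd if=/dev/urandom of=model.bin bs=1024 count=1024 2>/dev/null
git add model.bin
commit "add model"

# "Change the hair": append a single byte and commit again.
printf 'X' >> model.bin
git add model.bin
commit "tweak model"

# Each version is a separate, complete zlib-compressed loose object,
# so the loose-object size (in KiB) is roughly twice the file size.
size_kib=$(git count-objects -v | awk '/^size:/ {print $2}')
echo "loose objects: ${size_kib} KiB"

# For binary files Git reports no line counts, only that the file changed.
stat=$(git diff --numstat HEAD~1 HEAD)
echo "$stat"   # prints "-<TAB>-<TAB>model.bin"
```

Running git gc afterwards may shrink this small example, because Git can delta-compress binary objects inside packfiles. Files larger than core.bigFileThreshold (512 MiB by default), however, are stored without any attempt at delta compression, so a 1 GB asset really does cost about a gigabyte per version.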

Contrast that with a text format such as OBJ. Its commits record every update and modification just as they would for any other file, but an OBJ file is a series of plain-text lines describing the model. If you modify the model and save it back to OBJ, Git can compare the two versions line by line and record a fairly small change. The smaller the edit, the smaller the change Git records; this is the standard Git use case. Even though the file itself may be large, Git stores successive versions compactly (its packfiles keep later versions as deltas against earlier ones), so it can rebuild any state of your data without keeping a full copy of every version.
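For comparison, here is the same experiment with a text format. The hypothetical model.obj below is a three-vertex OBJ file; moving one vertex changes exactly one line, and git diff --numstat reports one line added and one removed.

```shell
# Sketch: the same experiment with a tiny, hypothetical Wavefront OBJ file.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .
commit() { git -c user.name=demo -c user.email=demo@example.com commit -q -m "$1"; }

# Three vertices and one face, one statement per line.
cat > model.obj <<'EOF'
v 0.0 0.0 0.0
v 1.0 0.0 0.0
v 0.0 1.0 0.0
f 1 2 3
EOF
git add model.obj
commit "add model"

# Move one vertex: exactly one line of the file changes.
sed 's/^v 1\.0 0\.0 0\.0$/v 1.5 0.0 0.0/' model.obj > model.tmp
mv model.tmp model.obj

changes=$(git diff --numstat)
echo "$changes"   # prints "1<TAB>1<TAB>model.obj": one line added, one removed
```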

Not everything is plain text, though, and people want to use Git for it anyway. So a solution was needed, and several already exist.

OSTree began as a GNOME project for managing operating system binaries. It does not apply here, so I will skip it.

Git Large File Storage (LFS) is an open-source project on GitHub that grew out of the git-media project. git-media and git-annex are extensions that let Git manage large files. They are two different approaches to the same problem, each with its own advantages. Neither is an official Git project, but in my experience each has its own distinct character:

git-media takes a centralized approach, with a single repository for shared assets. You tell git-media where large files should be stored (a hard disk, a server, or a cloud storage service), and every user on the project treats that location as the central master store for large files.
git-annex favors a distributed approach. Each user has a repository with a local .git/annex directory where large files are stored. These annexes are synchronized periodically, so every user can access all the assets as needed. Unless configured otherwise through annex-cost, git-annex prefers local storage first and external storage second.

I have used both git-media and git-annex in production, so here is an overview of how each one works.

Git-media
git-media is written in Ruby, so you install it as a gem (LCTT note: a gem is a packaged Ruby library). Installation instructions are available on the project's website. Everyone who wants to use git-media must install it, but because gems are cross-platform, it works on every platform.
After installing git-media, set a few Git configuration options. You only need to do this once on each machine:
$ git config --global filter.media.clean "git-media filter-clean"
$ git config --global filter.media.smudge "git-media filter-smudge"
In each repository where you want to use git-media, set an attribute that applies the filter you just created to the file types you want to classify as "media". Do not be put off by the term. "Asset" would be a better word, because "media" usually suggests audio, video, and photos, but you can just as easily classify 3D models, bakes, and textures as media.
For example:
$ echo "*.mp4 filter=media -crlf" >> .gitattributes
$ echo "*.mkv filter=media -crlf" >> .gitattributes
$ echo "*.wav filter=media -crlf" >> .gitattributes
$ echo "*.flac filter=media -crlf" >> .gitattributes
$ echo "*.kra filter=media -crlf" >> .gitattributes
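To confirm which paths these rules actually catch, you can ask Git directly with git check-attr. The sketch below writes the same five rules with a loop in a scratch repository and queries two hypothetical paths, scene.kra and notes.txt. It only reads attributes, so git-media does not need to be installed.

```shell
# Sketch: check which paths the .gitattributes rules above apply to.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .

# Same five rules as above, written with a loop.
for ext in mp4 mkv wav flac kra; do
  echo "*.$ext filter=media -crlf" >> .gitattributes
done

# Paths need not exist; check-attr only evaluates the patterns.
git check-attr filter crlf -- scene.kra notes.txt
# scene.kra: filter: media
# scene.kra: crlf: unset
# notes.txt: filter: unspecified
# notes.txt: crlf: unspecified
attrs=$(git check-attr filter -- scene.kra)
```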
When you stage files of these types, they are copied to the .git/media directory.
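Behind this is Git's filter mechanism: the clean filter runs when a file is staged, and the smudge filter runs when it is checked out. To watch that mechanism without git-media installed, the sketch below registers a throwaway filter named upper (a hypothetical name) whose clean step uppercases text and whose smudge step passes it through unchanged, so the copy stored in the index differs from the working copy. git-media's filter-clean does something more drastic, roughly keeping the real file under .git/media and handing Git a small placeholder instead.

```shell
# Sketch: how a clean/smudge filter pair behaves, using a throwaway
# filter named "upper" (hypothetical) instead of git-media.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .

# Repository-local config, so the demo does not touch ~/.gitconfig.
git config filter.upper.clean "tr a-z A-Z"   # runs on git add
git config filter.upper.smudge "cat"         # runs on checkout
echo "*.txt filter=upper" > .gitattributes

echo "hello asset" > note.txt
git add .gitattributes note.txt

staged=$(git show :note.txt)     # the blob Git actually stored
echo "index:   $staged"          # index:   HELLO ASSET
echo "on disk: $(cat note.txt)"  # on disk: hello asset
```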
Assuming a Git repository already exists on the server, the final step is to tell your repository where the "mother ship" is, that is, where media files will be stored when they are pushed for all users to share. This is set in the repository's .git/config file. Substitute your own username, host, and path:
[git-media]
  transport = scp
  autodownload = false # the default is true, which pulls assets automatically
  scpuser = seth
  scphost = example.com
  scppath = /opt/jupiter.git
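If you prefer not to edit .git/config by hand, git config can write the same keys and creates the [git-media] section for you. The values below are the example ones above; substitute your own. A scratch repository is used here only for illustration; in practice you would run the git config lines inside your project.

```shell
# Sketch: write the [git-media] settings with git config instead of an editor.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .

git config git-media.transport scp
git config git-media.autodownload false   # default is true
git config git-media.scpuser seth
git config git-media.scphost example.com
git config git-media.scppath /opt/jupiter.git

# List everything in the section to double-check.
git config --get-regexp '^git-media\.'
host=$(git config --get git-media.scphost)
```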
If your server's SSH setup is complex, for example a non-standard port or a non-default SSH key file, use ~/.ssh/config to set default options for that host.
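For instance, a host entry like the following would make scp connect to the example server on a non-standard port with a specific key, assuming git-media's scp transport invokes the system scp (which reads ~/.ssh/config). The port 2222 and the key path ~/.ssh/jupiter_ed25519 are placeholders, not values from the original setup.

```shell
# Sketch: a ~/.ssh/config stanza for the example host.
# Port 2222 and the key path are placeholders, not real values.
stanza=$(cat <<'EOF'
Host example.com
    User seth
    Port 2222
    IdentityFile ~/.ssh/jupiter_ed25519
EOF
)
printf '%s\n' "$stanza"

# To install it, append it to your SSH client configuration:
#   printf '%s\n' "$stanza" >> ~/.ssh/config
```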
Day to day, you work with git-media files just as you would with ordinary files: treat blobs like any other file and commit them the same way. The only difference in the workflow is that, at some point, you should sync your assets (or media) to the shared repository.
When you are ready to publish assets for your team, or to back them up, run:
$ git media sync
When you replace a file tracked by git-media with a new version (for example, a newly voiced audio file, a finished matte painting, or a color-graded video file), you must explicitly tell Git to update the media. This overrides git-media's default behavior of not copying a file that already exists on the remote:
$ git update-index --really-refresh
When another member of your team (or you, on another machine) clones the repository, assets are not downloaded by default unless the autodownload option is set to true in .git/config. Running git-media's sync command, git media sync, takes care of that.
Git-annex
git-annex works a little differently: it uses local storage by default, but the basic idea is the same. You can install git-annex from your distribution's software repository, or download it from the project website as needed. As with git-media, every user who works with git-annex must install it on their machine.
Its initial setup is simpler than git-media's. To create a bare repository on your server, run the following command, substituting your own path:
$ git init --bare --shared /opt/jupiter.git
Clone it to your local computer and initialize the clone as a git-annex location:
$ git clone seth@example.com:/opt/jupiter.git jupiter.clone
Cloning into 'jupiter.clone'...
warning: You appear to have cloned an empty repository.
Checking connectivity... done.
$ cd jupiter.clone
$ git annex init "seth workstation"
init seth workstation ok
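To rehearse this setup without a server, a local directory can stand in for seth@example.com:/opt/jupiter.git. The sketch below creates the bare repository and its clone in a temporary directory and runs git annex init only if git-annex is actually installed; the temporary paths are assumptions for illustration.

```shell
# Sketch: rehearse the git-annex setup locally, with a temporary
# directory standing in for seth@example.com:/opt/jupiter.git.
work=$(mktemp -d)
cd "$work" || exit 1

git init -q --bare --shared jupiter.git                      # the "server" repository
git clone -q "$work/jupiter.git" jupiter.clone 2>/dev/null   # hide the empty-repo warning
cd jupiter.clone || exit 1

# Only attempt git annex init if git-annex is installed.
if command -v git-annex >/dev/null 2>&1; then
  git annex init "seth workstation"
else
  echo "git-annex not found; skipping git annex init" >&2
fi

bare=$(git -C "$work/jupiter.git" config --get core.bare)
echo "server repository is bare: $bare"
```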
Instead of using filters to identify media or large files, you classify large files with the git annex command:
$ git annex add bigblobfile.flac
add bigblobfile.flac (checksum) ok
(Recording state in Git...)
Commit it just as you would a normal file:
$ git commit -m 'added flac source for sound fx'
Pushing, however, is different, because git-annex uses its own branch to track assets. Depending on how you manage your repository, your first push may need the -u option:
$ git push -u origin master git-annex
To seth@example.com:/opt/jupiter.git
 * [new branch]      master -> master
 * [new branch]      git-annex -> git-annex
As with git-media, a plain git push does not copy your asset data to the server; it only sends the metadata about it. To actually share the files, run the sync command:
$ git annex sync --content
When others have pushed shared assets that you need, run the same git annex sync command; it retrieves the assets that exist on the server but are not yet present on your local machine.
git-media and git-annex are both flexible enough to use local repositories instead of a server, so they are also handy for managing private local projects.
Git is a powerful and extensible system, and there is no reason to hesitate before putting it to work. Try it today!

From: http://www.php230.com/1480419001.html

Address: http://www.linuxprobe.com/large-objects.html

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
