Analysis of the principles of Git data storage

Source: Internet
Author: User
Tags: sha1

Writing background

In my spare time I have been reading about topology discovery in peer-to-peer networks, with a focus on Merkle trees. In one article (https://www.sdnlab.com/20095 ...) I came across the sentence "A common example of a Merkle dag is the git repository", so I looked into some of the principles behind Git repositories and wrote them down below, both for my own reference and for anyone else who may find them useful.

Git repository parsing

My questions at the time:

  • How does Git store data so that any given version can be accurately restored from what is stored?
  • Is Git's storage similar to Docker's storage mechanism, i.e. layered storage?
  • If Git is not layered and every commit is stored in full, doesn't the amount of data become very large?
Our approach is to start with a brand-new local Git repository, add data and commits step by step, and watch how the contents change.
  • Start by running git init to create a new local repository, then open the .git folder inside it.

    • HEAD records where the pointer to the current commit (the checked-out branch) points;
    • index is the staging-area (index) file;
    • the files in the refs folder hold the commit IDs that the different branches point to;
    • the logs folder records the history of each ref;
    • the objects folder stores the objects of the local Git repository.
  • Now that we have found where Git stores its data, what does that data look like?

    • Git stores data as key-value pairs, which means content of any type can be stored, and the corresponding value can be retrieved at any time by its key;
    • Git's object database is built from four underlying object types:

      1. Tree object: can be viewed as a directory that manages other tree objects or blob objects. It holds a list of pointers to blob objects or other tree objects and is typically used to represent the directory hierarchy of the content (files and subdirectories).
      2. Blob object: a blob is typically used to store the contents of a file. A blob object is a chunk of binary data whose key is generated with the SHA-1 algorithm, so if two files anywhere in a directory tree (or in a repository) have identical contents, they share the same blob object. The blob has no relationship to the file's path or to whether the file has been renamed.
      3. Commit object: a commit object points to a tree object and carries descriptive information that marks the state of the project at a particular point in time. It includes metadata such as the timestamp, the author, a pointer to the previous (parent) commit, and so on.
      4. Tag object: a tag object contains an object name (a SHA-1 value), an object type, a tag name, the name of the tag's creator ("tagger"), and a message, which may contain a signature.
  • What happens when we add some content and run git commit?

    • When a commit is made, the objects, logs, and refs folders all change; here we focus mainly on the objects folder.
    • Each commit saves the data again, generating a commit object, a tree object, and blob objects;
    • The objects folder stores data according to a specific rule: for each of these object types, Git computes a SHA-1 hash over the header and the content, uses the first two characters of the hash as the name of a folder under objects, and uses the remaining 38 characters as the file name. For example, for the hash 8b0c4fe1567a463214c09334b54977e0114c90fe, Git creates the folder 8b under objects and stores the object there in a file named 0c4fe1567a463214c09334b54977e0114c90fe (a Go sketch after this walkthrough illustrates the rule).
    • Now that we have learned this, how can we verify it?

      • We can use git cat-file to inspect the objects we committed.
      • I made two commits: the first adds a brand-new readme.md file containing a single line of data (### you Know); the second adds a new test.py file and appends some data to readme.md;
      • git cat-file -t shows an object's type, and git cat-file -p pretty-prints an object's contents.
  1. Use git log --pretty=oneline to view my two commits.
  2. Use git cat-file -t 172b54c8cd3eedca2fc301374286c2cb807d674f to view the type of the first commit.
  3. Use git cat-file -p 172b54c8cd3eedca2fc301374286c2cb807d674f to view the contents of the first commit.
  4. Use git cat-file -p 8b0c4fe1567a463214c09334b54977e0114c90fe to view the tree object of the first commit; you can see that the tree object contains a blob object, which is the readme.md file we added in the first commit.
  5. Use git cat-file -p 67aeba604cea61ec63d19db0706b19d846c65ba4 to view the contents of that blob object from the first commit: ### you Know.
  6. Use git cat-file -p 03543a4c19023da01b5114d7f7a614d95a1bf084 to view the contents of the second commit.
  7. Use git cat-file -p 03543a4c19023da01b5114d7f7a614d95a1bf084 to view the contents of the second commit's tree object, which includes both the modified content and the newly added content.
  8. Use git cat-file -p 03543a4c19023da01b5114d7f7a614d95a1bf084 to view the readme.md blob object of the second commit; you can see that the entire file is stored, not just the modified data.
    • Summarize the answers to the questions

      • Git's data storage is a key-value structure divided into four object types, and each commit stores files in their entirety rather than as layered diffs, so the amount of stored data can become large. To reduce it, Git compresses the data with zlib, which is why the object files look like binary gibberish when opened directly. Since we read them with the cat-file command, how can we confirm that the stored content really is zlib-compressed?
      • I wrote a simple program in Go to verify that the data is zlib-compressed: give it the path of a Git object file and it prints the restored content.
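
Before looking at that program, here is a minimal Go sketch of the hashing rule described in the walkthrough above. It is my own illustration rather than Git's source code: it builds the header Git prepends to a blob ("blob" plus the content length and a NUL byte), computes the SHA-1 of the header plus the content, and splits the 40-character digest into the two-character folder name and the 38-character file name used under .git/objects. The file content here is a made-up example.

package main

import (
    "crypto/sha1"
    "fmt"
)

func main() {
    // Hypothetical file content; any bytes work the same way.
    content := []byte("hello git\n")

    // Git hashes "<type> <size>\x00<content>"; for a file the type is "blob".
    header := fmt.Sprintf("blob %d\x00", len(content))
    sum := sha1.Sum(append([]byte(header), content...))
    key := fmt.Sprintf("%x", sum)

    // The first two hex characters name the folder under .git/objects,
    // and the remaining 38 characters name the file inside it.
    fmt.Println("object key :", key)
    fmt.Println("object path:", ".git/objects/"+key[:2]+"/"+key[2:])
}

For a file with the same content, git hash-object <file> should print the same 40-character digest, which is an easy way to check the rule.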

Go code

package main

import (
    "bytes"
    "compress/zlib"
    "fmt"
    "io"
    "io/ioutil"
    "os"
)

// DoZlibCompress compresses data with zlib.
func DoZlibCompress(src []byte) []byte {
    var in bytes.Buffer
    w := zlib.NewWriter(&in)
    w.Write(src)
    w.Close()
    return in.Bytes()
}

// DoZlibUnCompress decompresses zlib-compressed data.
func DoZlibUnCompress(compressSrc []byte) []byte {
    b := bytes.NewReader(compressSrc)
    var out bytes.Buffer
    r, err := zlib.NewReader(b)
    if err != nil {
        fmt.Print(err)
        return nil
    }
    io.Copy(&out, r)
    return out.Bytes()
}

func main() {
    args := os.Args
    if args == nil || len(args) < 2 {
        fmt.Println("Should input zlib file path.")
        return
    }
    b, err := ioutil.ReadFile(args[1])
    if err != nil {
        fmt.Print(err)
        return
    }
    fmt.Println(string(DoZlibUnCompress(b)))
}
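
Assuming the program above is saved as main.go and is run from inside the repository used in the walkthrough, pointing it at the first commit's loose object (its path follows the two-character-folder rule described earlier) should print the raw, decompressed object:

go run main.go .git/objects/17/2b54c8cd3eedca2fc301374286c2cb807d674f

The output begins with the header ("commit", the content length, and a NUL byte) followed by the same tree, author, and committer lines that git cat-file -p shows; cat-file simply strips the header.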

Summary

  • This article was written with reference to many blog posts by more experienced authors, combined with my own hands-on practice, so it seemed worth recording.
  • When you run into a problem, try to understand and solve it, then verify your understanding through hands-on practice.