Original: http://gitbook.liuhui998.com/7_5.html
I. Packfile index

First, let's look at the packfile index, which is essentially a series of bookmarks pointing to locations within the packfile.

There are two versions of the packfile index. Version 1 is the default in Git versions before 1.6, and version 2 is the default from Git 1.6 onward. Version 2 can nevertheless be read by Git 1.5.2 and later, and support for it was also backported to 1.4.4.5. Version 2 includes a CRC checksum for each object, so during repacking the compressed objects can be copied directly from pack to pack without risking undetected data corruption. Version 2 indexes also support packfiles larger than 4 GB.

In both versions, the SHA1 table stores the SHA1 values of the objects, sorted by value (so that the table can be binary-searched), and the offset table stores, for each entry in the SHA1 table, the position of that object within the packfile.

To speed up lookups, the index also contains a fanout table, which points into the OFFSET/SHA1 table in a specific way. It has 256 entries, one for each possible first byte of a SHA1 value, and each entry is a cumulative count: fanout[0] is the number of objects whose SHA1 begins with 0x00, fanout[1] is the number of objects whose SHA1 begins with 0x00 or 0x01, fanout[2] is the number whose first byte is 0x02 or lower, and so on up to fanout[254], which counts all objects whose first byte is 0xFE or lower. Because the table is sorted, the objects whose SHA1 begins with byte N occupy positions fanout[N-1] through fanout[N]-1 of the OFFSET/SHA1 table (starting at position 0 when N is 0).
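The fanout lookup can be sketched in Ruby as follows. This is a minimal in-memory illustration, not Git's implementation: it assumes the layout in which fanout[N] holds the number of objects whose first SHA1 byte is N or lower, it uses a plain sorted array in place of the on-disk SHA1 table, and the names build_fanout and find_object are hypothetical.

```ruby
# Build a 256-entry fanout table from a sorted list of hex SHA1 strings.
# Assumed layout: fanout[n] = number of objects whose first byte is <= n.
def build_fanout(sorted_shas)
  counts = Array.new(256, 0)
  sorted_shas.each { |sha| counts[sha[0, 2].to_i(16)] += 1 }
  total = 0
  counts.map { |c| total += c } # running totals
end

# Return the position of `sha` in the sorted table, or nil if it is absent.
def find_object(sorted_shas, fanout, sha)
  first = sha[0, 2].to_i(16)
  lo = first.zero? ? 0 : fanout[first - 1] # first candidate position
  hi = fanout[first]                       # one past the last candidate
  while lo < hi
    mid = (lo + hi) / 2
    case sha <=> sorted_shas[mid]
    when 0 then return mid
    when -1 then hi = mid
    else lo = mid + 1
    end
  end
  nil
end

# Made-up object names, only for illustration.
SAMPLE_SHAS = [
  "00" + "1" * 38, "00" + "f" * 38, "02" + "a" * 38,
  "fe" + "0" * 38, "ff" + "9" * 38
].sort
SAMPLE_FANOUT = build_fanout(SAMPLE_SHAS)

p SAMPLE_FANOUT[255]                                             # => 5
p find_object(SAMPLE_SHAS, SAMPLE_FANOUT, "02" + "a" * 38)       # => 2
p find_object(SAMPLE_SHAS, SAMPLE_FANOUT, "01" + "0" * 38)       # => nil
```

Only the slice selected by the first byte is ever searched, which is where the saving over a binary search of the whole table comes from.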
fanout[255] counts every object in the index, so it also gives the size of the OFFSET/SHA1 table. When looking up an object by its SHA1 value, Git therefore uses the first byte (the first two hexadecimal digits) of the SHA1 to select an interval of the OFFSET/SHA1 table from the fanout table, and then binary-searches only that interval of the SHA1 table. Compared with searching the whole table, the fanout table saves up to 8 binary-search iterations.

In version 1, the offset and the SHA1 value of each object are stored together in the same table entry. In version 2, the SHA1 values, CRC values, and offsets are stored in separate tables. Both versions end with two SHA1 checksums: one for the packfile the index refers to and one for the index file itself.

Importantly, the index file is not needed to extract an object from a packfile; its only purpose is to let objects be retrieved from the packfile quickly. The upload-pack and receive-pack programs, which implement the push and fetch protocols, use the packfile format to transfer objects but do not transfer an index, because the index can be rebuilt afterwards by scanning the packfile that was received.

II. Packfile format

The packfile format is simple: a header, a series of packed objects (each with its own header and body), and a checksum trailer. The first 4 bytes are the string 'PACK', which lets you confirm that you are at the start of a packfile. They are followed by a 4-byte packfile version number and then 4 bytes giving the number of entries in the file. You can use the following Ruby program to read the header of a packfile:
    def read_pack_header
      sig = @session.recv(4)                       # 4-byte signature, expected to be "PACK"
      ver = @session.recv(4).unpack("N")[0]        # version, 32-bit big-endian integer
      entries = @session.recv(4).unpack("N")[0]    # number of objects in the pack
      [sig, ver, entries]
    end

After the header comes a series of packed objects, each consisting of an object header and the object's contents. The packfile ends with a 20-byte SHA1 checksum of everything that precedes it in the file.

The object header is one or more bytes long and records the type of the data that follows and its size when expanded. Each header byte carries 7 bits of data; the first (most significant) bit indicates whether another header byte follows. If that bit is 1, you need to read one more byte, which still belongs to the header; otherwise the next byte is data. In the first byte, the next 3 bits specify the type of the data: 1 (001) is a commit, 2 (010) a tree, 3 (011) a blob, 4 (100) a tag, 6 (110) an ofs-delta, and 7 (111) a ref-delta. (Three bits allow 8 values; 0 (000) is 'undefined' and 5 (101) is not currently in use.)

As an example, take a two-byte header. The 3 type bits of the first byte say that the data is a commit. The remaining 4 bits of the first byte, which are the lowest bits of the size, combine with the 7 data bits of the second byte to give the number 144, so the data expands to 144 bytes.

Note that the size stored in the object header is not the length of the data that follows it but the length of that data after it is expanded. This is why the offsets in the packfile index are so useful: without them you would have to expand every object just to find where the next header starts.

For non-delta objects, the data section is simply a zlib-compressed data stream. For the two delta object types, the data section identifies the base object the delta depends on, followed by the delta data used to reconstruct the object. In a ref-delta, the first 20 bytes of the data are the SHA1 value of the base object.
An ofs-delta instead stores the offset of the base object within the same packfile. In either case, two constraints must be strictly observed: (1) the delta object and its base object must be in the same packfile, and (2) the delta object and its base object must be of the same type (tree to tree, blob to blob, and so on).
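The variable-length object header described earlier can be decoded with a few bit operations. The following Ruby sketch is an illustration under stated assumptions rather than Git's own code: the method name read_object_header and the OBJECT_TYPES table are hypothetical, the input is any IO-like object that supports readbyte, and the two sample bytes were chosen here to match the commit/144 example in the text.

```ruby
require "stringio"

# Type codes stored in the 3 type bits of the first header byte.
OBJECT_TYPES = {
  1 => :commit, 2 => :tree, 3 => :blob, 4 => :tag,
  6 => :ofs_delta, 7 => :ref_delta
}.freeze

# Read one object header from `io` and return [type, expanded_size].
def read_object_header(io)
  byte = io.readbyte
  type = (byte >> 4) & 0x07   # the 3 bits after the continuation bit
  size = byte & 0x0f          # lowest 4 bits of the size
  shift = 4
  while (byte & 0x80) != 0    # top bit set: another header byte follows
    byte = io.readbyte
    size |= (byte & 0x7f) << shift
    shift += 7
  end
  [OBJECT_TYPES.fetch(type, :unknown), size]
end

# Two-byte header: 1001_0000 (more bytes follow, type 001, size bits 0000)
# and 0000_1001 (last byte, size bits 0001001).
header = StringIO.new([0b1001_0000, 0b0000_1001].pack("C*"))
p read_object_header(header) # => [:commit, 144]
```

The returned size is the expanded length, so it cannot be used to skip over the compressed data; finding the next object still requires either the index offsets or inflating the zlib stream.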