Tar file format analysis

Source: Internet
Author: User

1. Question

During testing, a tar file A was created from the original data folder with the command "tar -cvf". The original data folder was then copied to another location, and a tar file B was created from the copy, again with "tar -cvf". Files A and B turned out to differ at intervals throughout. These differences are clearly tied to the format of the tar file, so the analysis below, based on the tar-1.28 source code, looks for their cause.

2. Conclusion

Even when the file contents are identical, the tar header of every entry also stores the file's modification time, UID, GID, user name, group name, permissions, and header type information. If any of these differ, the generated tar files differ as well. In the generated tar file, a tar header is inserted in front of every packaged folder and file. There are several header formats; the default is GNU_FORMAT, and the header has the following data structure:

struct posix_header
{                       /* byte offset */
  char name[100];       /*   0 */
  char mode[8];         /* 100 */
  char uid[8];          /* 108 */
  char gid[8];          /* 116 */
  char size[12];        /* 124 */
  char mtime[12];       /* 136 */
  char chksum[8];       /* 148 */
  char typeflag;        /* 156 */
  char linkname[100];   /* 157 */
  char magic[6];        /* 257 */
  char version[2];      /* 263 */
  char uname[32];       /* 265 */
  char gname[32];       /* 297 */
  char devmajor[8];     /* 329 */
  char devminor[8];     /* 337 */
  char prefix[155];     /* 345 */
                        /* 500 */
};
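When two archives are compared byte by byte, it helps to know which header field a differing byte belongs to. The following is a minimal sketch of such a lookup, assuming the posix_header offsets listed above; the names `field_span`, `header_fields` and `header_field_at` are hypothetical and do not come from the tar source. It only makes sense for offsets inside a header block, not inside file data.

```c
#include <assert.h>
#include <stddef.h>

#define BLOCKSIZE 512

/* One named byte range inside a 512-byte header block. */
struct field_span
{
  const char *name;
  size_t offset;
  size_t length;
};

/* Offsets and lengths copied from the struct posix_header listing. */
static const struct field_span header_fields[] = {
  { "name",       0, 100 }, { "mode",     100,   8 },
  { "uid",      108,   8 }, { "gid",      116,   8 },
  { "size",     124,  12 }, { "mtime",    136,  12 },
  { "chksum",   148,   8 }, { "typeflag", 156,   1 },
  { "linkname", 157, 100 }, { "magic",    257,   6 },
  { "version",  263,   2 }, { "uname",    265,  32 },
  { "gname",    297,  32 }, { "devmajor", 329,   8 },
  { "devminor", 337,   8 }, { "prefix",   345, 155 },
};

/* Return the name of the field that covers byte OFFSET of a header
   block, or "padding" for the unused bytes 500..511. */
const char *
header_field_at (size_t offset)
{
  assert (offset < BLOCKSIZE);
  size_t count = sizeof header_fields / sizeof header_fields[0];
  for (size_t i = 0; i < count; i++)
    {
      const struct field_span *f = &header_fields[i];
      if (offset >= f->offset && offset < f->offset + f->length)
        return f->name;
    }
  return "padding";
}
```

For example, a difference at byte 140 of a header block falls in the mtime field, which is what one would expect when the copied folder has newer modification times than the original.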

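If folder A contains several files, the tar file for folder A has the following layout: a header for A itself, then for each file a header followed by that file's data.

header (A/) | header (file 1) | data (file 1) | header (file 2) | data (file 2) | ...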
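In a tar archive each member occupies one 512-byte header block followed by its data, padded up to a whole number of 512-byte blocks, and the archive ends with two all-zero blocks. The sketch below turns that into a size estimate. It assumes that directories carry no data, that every name fits in the 100-byte name field (so no extra long-name headers are written), and that the output is padded to the record size given by the blocking factor (20 blocks is the GNU tar default). `member_blocks` and `archive_bytes` are hypothetical helper names.

```c
#include <assert.h>
#include <stddef.h>

#define BLOCKSIZE 512ULL

/* Blocks used by one member: one header block plus the data,
   rounded up to whole blocks. */
unsigned long long
member_blocks (unsigned long long data_size)
{
  return 1 + (data_size + BLOCKSIZE - 1) / BLOCKSIZE;
}

/* Estimated archive size in bytes for members with the given data
   sizes (0 for a directory). */
unsigned long long
archive_bytes (const unsigned long long *sizes, size_t count,
               unsigned int blocking_factor)
{
  assert (blocking_factor > 0);
  unsigned long long blocks = 2;   /* end-of-archive: two zero blocks */
  for (size_t i = 0; i < count; i++)
    blocks += member_blocks (sizes[i]);
  /* Pad to a whole number of records. */
  unsigned long long record = blocking_factor;
  blocks = (blocks + record - 1) / record * record;
  return blocks * BLOCKSIZE;
}
```

Under these assumptions, a folder holding files of 7209 and 3902 bytes plus a subfolder with one 12220-byte file needs 52 member blocks and 2 end blocks, which rounds up to 60 blocks, or 30720 bytes.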

3. Test and Code Analysis

Based on the tar-1.28 source code, the key steps in creating a tar file are as follows:

1. Create test folders and test files:

# ll test_dir
total
drwxr-xr-x 3 root root 4096 Sep 17 10:26 ./
drwxr-xr-x 4 root root 4096 Sep 17 10:33 ../
-rw-r--r-- 1 root root 7209 Sep 17 10:25 manpage.cp
-rw-r--r-- 1 root root 3902 Sep 17 10:26 manpage.mv
drwxr-xr-x 2 root root 4096 Sep 17 10:26 test_subdir/

[root@slave1 /root/meng-test/tar/tar-1.28/src]
# ll test_dir/test_subdir/
total
drwxr-xr-x 2 root root  4096 Sep 17 10:26 ./
drwxr-xr-x 3 root root  4096 Sep 17 10:26 ../
-rw-r--r-- 1 root root 12220 Sep 17 10:26 manpage.fdisk

2. Debug the command "tar -cvf test_dir.tar test_dir" with GDB.
Set a breakpoint at create_archive(), the function that creates the tar file.

# gdb ./tar
GNU gdb (GDB) Red Hat Enterprise Linux (7.2-50.el6)
Copyright (C) Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later

3. Inside create_archive, execution enters the while loop that writes entries to the tar file. The first entry to be written is the folder test_dir.
dump_file0 is the function responsible for packaging a single file; if the file is a folder, its contents are written to the tar file recursively.

{
    const char *name;
    while ((name = name_next (1)) != NULL)
      if (!excluded_name (name, NULL))
        dump_file (0, name, name);
}
(gdb) bt
#0  dump_file0 (st=0x7fffffffd510, name=0x66e100 "test_dir", p=0x66e100 "test_dir") at create.c:164
#1  0x000000000040d3f2 in dump_file (parent=0x0, name=0x66e100 "test_dir", fullname=0x66e100 "test_dir") at create.c:1955
#2  0x000000000040d4ad in create_archive () at create.c:1407
#3  0x000000000042460c in main (argc=<value optimized out>, argv=<value optimized out>) at tar.c:2779

4. Inside dump_file0, first look at union block. This 512-byte block is what is eventually written as the header of every packaged file and folder.
The default header type is oldgnu_header.

union block
{
  char buffer[BLOCKSIZE];
  struct posix_header header;
  struct star_header star_header;
  struct oldgnu_header oldgnu_header;
  struct sparse_header sparse_header;
  struct star_in_header star_in_header;
  struct star_ext_header star_ext_header;
};
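The point of the union is that the same 512 bytes can be filled in field by field through one of the header structs and then written out as a flat buffer. The following is a minimal sketch of that overlay, assuming a deliberately shortened header: `demo_header`, `demo_block` and `demo_fill` are hypothetical names, and only the first two fields are spelled out.

```c
#include <assert.h>
#include <string.h>

#define BLOCKSIZE 512

/* Shortened stand-in for struct posix_header (assumption: only
   name and mode are named, the other fields stay opaque). */
struct demo_header
{
  char name[100];   /*   0 */
  char mode[8];     /* 100 */
  char rest[392];   /* 108: remaining header fields */
};

union demo_block
{
  char buffer[BLOCKSIZE];       /* raw view, as written to the file */
  struct demo_header header;    /* structured view of the same bytes */
};

/* Zero the block, then set two fields through the structured view. */
void
demo_fill (union demo_block *blk, const char *name, const char *mode)
{
  assert (strlen (name) < sizeof blk->header.name);
  assert (strlen (mode) < sizeof blk->header.mode);
  memset (blk->buffer, 0, BLOCKSIZE);
  strcpy (blk->header.name, name);
  strcpy (blk->header.mode, mode);
}
```

After demo_fill, the name starts at buffer[0] and the mode at buffer[100], matching the byte offsets of the real header, and the bytes beyond the struct up to 511 remain zero padding.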

5. Because the first entry to be packaged is the folder test_dir, the code path is dump_file0 -> dump_dir -> dump_dir0.

(gdb) s
dump_dir0 (st=0x5418f0f5, name=0x66e100 "test_dir", p=0x66e100 "test_dir") at create.c:1105
1105	  bool top_level = !st->parent;
(gdb) bt
#0  dump_dir0 (st=0x5418f0f5, name=0x66e100 "test_dir", p=0x66e100 "test_dir") at create.c:1105
#1  dump_dir (st=0x5418f0f5, name=0x66e100 "test_dir", p=0x66e100 "test_dir") at create.c:1309
#2  dump_file0 (st=0x5418f0f5, name=0x66e100 "test_dir", p=0x66e100 "test_dir") at create.c:1753
#3  0x000000000040d3f2 in dump_file (parent=0x0, name=0x66e100 "test_dir", fullname=0x66e100 "test_dir") at create.c:1955
#4  0x000000000040d4ad in create_archive () at create.c:1407
#5  0x000000000042460c in main (argc=<value optimized out>, argv=<value optimized out>) at tar.c:2779
(gdb)

6. In dump_dir0, start_header is called to initialize the header of the folder test_dir. start_header and finish_header are always used as a pair.

(gdb) bt
#0  write_header_name (st=0x7fffffffd510) at create.c:721
#1  start_header (st=0x7fffffffd510) at create.c:744
#2  0x000000000040c657 in dump_dir0 (st=0x5418f0f5, name=0x66e100 "test_dir", p=0x66e100 "test_dir") at create.c:1112
#3  dump_dir (st=0x5418f0f5, name=0x66e100 "test_dir", p=0x66e100 "test_dir") at create.c:1309
#4  dump_file0 (st=0x5418f0f5, name=0x66e100 "test_dir", p=0x66e100 "test_dir") at create.c:1753
#5  0x000000000040d3f2 in dump_file (parent=0x0, name=0x66e100 "test_dir", fullname=0x66e100 "test_dir") at create.c:1955
#6  0x000000000040d4ad in create_archive () at create.c:1407
#7  0x000000000042460c in main (argc=<value optimized out>, argv=<value optimized out>) at tar.c:2779
(gdb) s
727		   < strlen (st->file_name))
(gdb)
726	  else if (NAME_FIELD_SIZE - (archive_format == OLDGNU_FORMAT)
(gdb) p archive_format
$8 = GNU_FORMAT
(gdb)

7. In start_header, the following functions are called to write data into the header:

-write_header_name --> writes the file name to the header (test_dir/)
-mode_to_chars --> writes the file permissions to the header (755)
-uid_to_chars --> writes the owner's UID to the header (0)
-gid_to_chars --> writes the owning group's GID to the header (0)
-off_to_chars --> writes the file size to the header
-time_to_chars --> writes the file's modification time to the header
-writes the magic string "ustar" to the header
-uid_to_uname --> looks up the file owner's user name, which is written to the header (root)
-gid_to_gname --> looks up the name of the group the file belongs to, which is written to the header (root)
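The numeric fields in the header are stored as zero-padded octal text followed by a NUL byte, which is why the mode 755 shows up as "0000755" in an 8-byte field. The sketch below illustrates that encoding and its inverse. It is a simplified illustration, not the code from create.c: `to_octal_field` and `from_octal_field` are hypothetical names, and the sketch ignores the extended encodings GNU tar falls back to when a value does not fit.

```c
#include <assert.h>
#include <stddef.h>

/* Write VALUE into FIELD (SIZE bytes) as SIZE-1 zero-padded octal
   digits followed by a NUL.  Returns 0 on success, -1 if VALUE needs
   more digits than the field can hold. */
int
to_octal_field (unsigned long long value, char *field, size_t size)
{
  assert (field != NULL && size >= 2);
  field[size - 1] = '\0';
  for (size_t i = size - 1; i-- > 0; )
    {
      field[i] = (char) ('0' + (value & 7));   /* lowest octal digit */
      value >>= 3;
    }
  return value == 0 ? 0 : -1;
}

/* Parse the leading octal digits of a header field back into a number. */
unsigned long long
from_octal_field (const char *field, size_t size)
{
  assert (field != NULL);
  unsigned long long value = 0;
  for (size_t i = 0; i < size && field[i] >= '0' && field[i] <= '7'; i++)
    value = value * 8 + (unsigned long long) (field[i] - '0');
  return value;
}
```

With this encoding a 3902-byte file gets the size field "00000007476", and the Unix time 1410920781 becomes the mtime field "12406170515". Two archives of identical files therefore differ in these 12 bytes of every header whenever the modification times differ.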

(gdb) p (struct star_header) *header
$34 = {name = "test_dir/", '\000' <repeats ... times>, mode = "0000755", uid = "0000000", gid = "0000000", size = '0' <repeats ... times>, mtime = "12406170515", chksum = "\000\000\000\000\000\000\000",
  typeflag = 0 '\000', linkname = '\000' <repeats ... times>, magic = "ustar ", version = " ", uname = "root", '\000' <repeats ... times>, gname = "root", '\000' <repeats ... times>,
  devmajor = "\000\000\000\000\000\000\000", devminor = "\000\000\000\000\000\000\000", prefix = '\000' <repeats ... times>, atime = '\000' <repeats ... times>, ctime = '\000' <repeats ... times>}
(gdb)

8. After the header fields have been written, finish_header is called to fill in the checksum field, and the header is finally written to the tar file.
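The checksum of a tar header is the sum of all 512 header bytes, computed as if the 8-byte chksum field itself were filled with spaces. A minimal sketch of that calculation follows, assuming the chksum offset of 148 from the header layout; `header_checksum` and `store_checksum` are hypothetical names and not the functions in create.c. The stored form used here (six octal digits, a NUL, then a space) follows the traditional layout of the field.

```c
#include <assert.h>
#include <stddef.h>

#define BLOCKSIZE      512u
#define CHKSUM_OFFSET  148u
#define CHKSUM_LENGTH    8u

/* Sum all header bytes, counting the chksum field as eight spaces. */
unsigned int
header_checksum (const unsigned char *block)
{
  assert (block != NULL);
  unsigned int sum = 0;
  for (size_t i = 0; i < BLOCKSIZE; i++)
    {
      int in_chksum = i >= CHKSUM_OFFSET
                      && i < CHKSUM_OFFSET + CHKSUM_LENGTH;
      sum += in_chksum ? (unsigned int) ' ' : block[i];
    }
  return sum;
}

/* Store the checksum as six octal digits, a NUL, and a space. */
void
store_checksum (unsigned char *block)
{
  unsigned int sum = header_checksum (block);
  unsigned char *field = block + CHKSUM_OFFSET;
  for (int i = 5; i >= 0; i--)
    {
      field[i] = (unsigned char) ('0' + (sum & 7));
      sum >>= 3;
    }
  field[6] = '\0';
  field[7] = ' ';
}
```

Because every other header byte feeds into the sum, a change in any single field, such as the mtime, also changes the chksum field, so one differing attribute shows up as at least two differing regions per header.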

9. Back in dump_dir0, the next file, manpage.mv, is processed. As with the folder above, start_header and finish_header are called to write a similar header.
Once the header is complete, and unlike for a folder, dump_regular_file is called to write the contents of manpage.mv to the tar file.

(gdb) bt
#0  dump_regular_file (fd=9, st=0x7fffffffd1c0) at create.c:1034
#1  0x000000000040ccbf in dump_file0 (st=0x5418f0e6, name=0x66f1a0 "manpage.mv", p=0x66f800 "test_dir/manpage.mv") at create.c:1769
#2  0x000000000040d3f2 in dump_file (parent=0x7fffffffd510, name=0x66f1a0 "manpage.mv", fullname=0x66f800 "test_dir/manpage.mv") at create.c:1955
#3  0x000000000040d2b8 in dump_dir0 (st=0x5418f0f5, name=0x66e100 "test_dir", p=0x66e100 "test_dir") at create.c:1216
#4  dump_dir (st=0x5418f0f5, name=0x66e100 "test_dir", p=0x66e100 "test_dir") at create.c:1309
#5  dump_file0 (st=0x5418f0f5, name=0x66e100 "test_dir", p=0x66e100 "test_dir") at create.c:1753
#6  0x000000000040d3f2 in dump_file (parent=0x0, name=0x66e100 "test_dir", fullname=0x66e100 "test_dir") at create.c:1955
#7  0x000000000040d4ad in create_archive () at create.c:1407
#8  0x000000000042460c in main (argc=<value optimized out>, argv=<value optimized out>) at tar.c:2779
(gdb) p *header
$47 = {buffer = "test_dir/manpage.mv", '\000' <repeats 81 times>, "0000644\000\060\060\060\060\060\060\060\000\060\060\060\060\060\060\060\000\060\060\060\060\060\060\060\067\064\067\066\000\061\062\064\060\066\061\067\060\064\067\066\000\000\000\000\000\000\000\000\000\060", '\000' <repeats ... times>, "ustar  \000root", '\000' <repeats 28 times>, "root", '\000' <repeats ... times>,
  header = {name = "test_dir/manpage.mv", '\000' <repeats ... times>, mode = "0000644", uid = "0000000", gid = "0000000", size = "00000007476", mtime = "12406170476", chksum = "\000\000\000\000\000\000\000", typeflag = 48 '0', linkname = '\000' <repeats ... times>, magic = "ustar ", version = " ", uname = "root", '\000' <repeats ... times>, gname = "root", '\000' <repeats ... times>,
    devmajor = "\000\000\000\000\000\000\000", devminor = "\000\000\000\000\000\000\000", prefix = '\000' <repeats 154 times>},

10. dump_regular_file then copies the contents of manpage.mv into the tar file, directly after the file's header.

11. The remaining folders and files are written in turn by repeating the steps above.
