File display size, actual size, and file holes

Source: Internet
Author: User
Contents

1 Preface
2 The problem this article explains
3 An experiment to observe the problem
    3.1 Create five files with different contents
    3.2 View the five files with ll
    3.3 View the five files with du
4 The on-disk storage format: ext3_inode
    4.1 Find the i_size and i_blocks values stored on disk
        4.1.1 Find the inode numbers of the five files
        4.1.2 Find where the five inodes are stored on disk, and their contents
    4.2 Use the offsets of i_size and i_blocks in ext3_inode to read their values
5 How are file holes stored?
    5.1 What is a file hole?
    5.2 How is a file hole stored on disk?
    5.3 View the hole file with ls
    5.4 View the hole file with du
    5.5 Dump the hexadecimal data on disk
6 Conclusion

1 Preface

We often use the ls and du commands, but they report different sizes. What is the relationship between them? For example:

    [root@syslab filetest]# du -h *
    0       file_0
    4.0K    file_1
    4.0K    file_2
    4.0K    file_4096
    8.0K    file_4097
    4.0K    hole

    [root@syslab filetest]# ll -h
    total 24K
    -rw-r--r--. 1 root 0 Jan 10 16:15 file_0
    -rw-r--r--. 1 root 1 Jan 10 16:15 file_1
    -rw-r--r--. 1 root 2 Jan 10 16:15 file_2
    -rwxr-xr-x. 1 root 4.0K Jan 10 16:15 file_4096
    -rw-r--r--. 1 root 4.1K Jan 10 16:15 file_4097
    -rw-r--r--. 1 root 25K Jan 10 hole

The du command and the ll command (ll is equivalent to ls -l) show different sizes! So how large is a file really? That is what we discuss below.

Environment: ext3, formatted with a block size of 4096 bytes.

2 The problem this article explains

File display size (ls command): the i_size field in ext3_inode. This field holds the size of the file in bytes.
For example, if we write one byte to a file, we consider the file to be 1 byte large.

Actual file size (du command): the i_blocks field in ext3_inode. The block size used by this field is fixed at 512 bytes and is not the file system block size: even if the file system was created with 4096-byte blocks, i_blocks always counts 512-byte blocks. For example, if we write one byte to a file, the file actually occupies one whole file system block.

File holes: holes save real storage space, because they do not occupy disk blocks. The display size (i_size) includes the holes (a 10 MB hole adds 10 MB to the file size), but no physical disk blocks are used to store the hole.

A detail about echo:

    echo 1 > /aa       # echo writes a line feed after the 1
    echo -n 1 > /bb    # -n: no line feed is written at the end

3 An experiment to observe the problem

3.1 Create five files with different contents

    for ((i = 1; i <= 4096; i++)); do echo -n a >> file_4096; done
    for ((i = 1; i <= 4097; i++)); do echo -n a >> file_4097; done
    for ((i = 1; i <= 1; i++)); do echo -n a >> file_1; done
    for ((i = 1; i <= 1; i++)); do echo a >> file_2; done
    touch file_0

3.2 View the five files with ll

    ls -l
    -rw-r--r--. 1 root 0 Jan 10 16:15 file_0
    -rw-r--r--. 1 root 1 Jan 10 16:15 file_1
    -rw-r--r--. 1 root 2 Jan 10 16:15 file_2
    -rw-r--r--. 1 root 4096 Jan 10 16:15 file_4096
    -rw-r--r--. 1 root 4097 Jan 10 file_4097

Now file_4096 has 4096 bytes, file_2 has 2 bytes, and so on: each of the five files has the size its name suggests.
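The experiment can also be scripted. The following is a minimal sketch, assuming a POSIX shell and GNU coreutils (`stat -c`, `head -c`). It works in a temporary directory rather than the article's /root/filetest, and `workdir` and `make_file` are names made up for this illustration. It builds the files with head and tr instead of the echo loops, then prints each file's display size next to its allocated block count.

```shell
#!/bin/sh
# Sketch: recreate the five test files and print size vs. allocated blocks.
set -eu

workdir=$(mktemp -d)   # hypothetical scratch directory

# make_file NAME COUNT: write COUNT copies of the letter "a" to NAME.
make_file() {
    head -c "$2" /dev/zero | tr '\0' 'a' > "$workdir/$1"
}

make_file file_4096 4096
make_file file_4097 4097
make_file file_1 1
echo a > "$workdir/file_2"     # "a" plus the line feed that echo adds: 2 bytes
touch "$workdir/file_0"

# %s = size in bytes (what ls shows), %b = allocated blocks,
# %B = size in bytes of each block counted by %b
for f in file_0 file_1 file_2 file_4096 file_4097; do
    stat -c '%n size=%s blocks=%b blocksize=%B' "$workdir/$f"
done
```

The size column is the same on any file system; the block counts depend on the file system and its block size, so they may differ from the ext3 figures in this article.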
3.3 View the five files with du

    [root@syslab filetest]# du -h *
    0       file_0
    4.0K    file_1
    4.0K    file_2
    4.0K    file_4096
    8.0K    file_4097
    4.0K    hole

4 The on-disk storage format: ext3_inode

    struct ext3_inode {
        __le16 i_mode;          /* File mode */
        __le16 i_uid;           /* Low 16 bits of Owner Uid */
        __le32 i_size;          /* Size in bytes */
        __le32 i_atime;         /* Access time */
        __le32 i_ctime;         /* Creation time */
        __le32 i_mtime;         /* Modification time */
        __le32 i_dtime;         /* Deletion Time */
        __le16 i_gid;           /* Low 16 bits of Group Id */
        __le16 i_links_count;   /* Links count */
        __le32 i_blocks;        /* Blocks count */
        ..
        __le32 i_block[EXT3_N_BLOCKS];  /* Pointers to blocks */
        ..
    }

This is the format in which the inode of an ext3 file is actually stored on disk. i_size is the display size of the file, that is, the size we think of the file as having. i_blocks is the number of blocks the file actually occupies. The block size here is fixed at 512 bytes; it is not the file system block size and does not change with the block size chosen at format time.

4.1 Find the i_size and i_blocks values stored on disk

4.1.1 Find the inode numbers of the five files

    [root@syslab filetest]# ll -i | sort -k 1
    273974 -rw-r--r--. 1 root 4096 Jan 10 16:15 file_4096
    273975 -rw-r--r--. 1 root 4097 Jan 10 16:15 file_4097
    273976 -rw-r--r--. 1 root 1 Jan 10 file_1
    273977 -rw-r--r--. 1 root 2 Jan 10 file_2
    273978 -rw-r--r--. 1 root 0 Jan 10 file_0

The first column is the inode number.

4.1.2 Find where the five inodes are stored on disk, and their contents

Following the ext3 on-disk layout, we use debugfs to find where the inodes are located on disk:

    debugfs -R 'stats' /dev/sda2
    ...
    Block size: 4096
    ...
    Inodes per group: 8192
    ...
    Inode size: 256
    ...
    Group 32: block bitmap at 1048576, inode bitmap at 1048592, inode table at 1048608
              23336 free blocks, 0 free inodes, 968 used directories, 0 unused inodes
              [Checksum 0x0cdd]
    Group 33: block bitmap at 1048577, inode bitmap at 1048593, inode table at 1049120
              0 free blocks, 4470 free inodes, 217 used directories, 4470 unused inodes
              [Checksum 0xd341]
    ...

Here we see Inodes per group: 8192, so we can work out which block group holds each of the five inodes. Inode numbers as shown by ls start from 1, while the inode table is an array indexed from 0, so an inode whose remainder is r sits at array index r - 1.

    273974 / 8192 = 33, remainder 3638: block group 33, array index 3637
    273975 / 8192 = 33, remainder 3639: block group 33, array index 3638
    273976 / 8192 = 33, remainder 3640: block group 33, array index 3639
    273977 / 8192 = 33, remainder 3641: block group 33, array index 3640
    273978 / 8192 = 33, remainder 3642: block group 33, array index 3641

(Strictly, both the group and the index are computed from the inode number minus 1; for these five inodes the group is 33 either way.)

From the debugfs output, the inode table of block group 33 starts at block 1049120. The next step is to locate these five inode structures on disk and read their contents, using the disk block size.
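The block group and index arithmetic can be sketched in shell as follows. This is only an illustration of the calculation, not something the file system tools require; `locate_inode` is a hypothetical helper name, and 8192 is the Inodes per group value from the debugfs output.

```shell
#!/bin/sh
# Sketch: map ext3 inode numbers to (block group, 0-based index in that group).
inodes_per_group=8192   # "Inodes per group" from the debugfs stats output

# locate_inode INO: print "group index".
# Inode numbers start at 1, so subtract 1 before dividing.
locate_inode() {
    echo "$(( ($1 - 1) / inodes_per_group )) $(( ($1 - 1) % inodes_per_group ))"
}

for ino in 273974 273975 273976 273977 273978; do
    echo "inode $ino -> group and index: $(locate_inode "$ino")"
done
```

For inode 273974 this prints group 33 and index 3637, matching the table above.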
The disk block size is 4096 bytes (specified at format time), so the inode table starts at byte 1049120*4096 (block number * block size). The inode of file_4096 therefore occupies the bytes from 1049120*4096 + (3638-1)*256 up to 1049120*4096 + 3638*256. Its starting position is

    1049120*4096 + 3637*256 = 4298126592

We read the raw data at that position to see what is stored there:

    [root@syslab filetest]# dd if=/dev/sda2 bs=1 count=256 skip=4298126592 | od -t x4 -Ax
    000000 11681a4 00001000 50ee78a8 50ee78a8
    000010 50ee78a8 00000000 00010000 00000008
    000020 00080000 00000001 0001f30a 00000004
    000030 00000000 00000000 00000001 001125c7

First, let's check that this data really is the on-disk structure of file_4096. From struct ext3_inode above, the first two bytes (__le16 i_mode) are the file mode, so:

    [root@syslab ~]# dd if=/dev/sda2 bs=1 count=256 skip=4298126592 | od -t x2 -Ax
    000000 81a4 0000 1000 0000 78a8 50ee cdbd 50ef

The first two bytes are 0x81a4, which is octal 100644, i.e. the permissions rw-, r--, r--. ll confirms it:

    -rw-r--r--. 1 root 4096 Jan 10 file_4096

Next we run chmod +x file_4096 and check again:

    ls -l
    -rwxr-xr-x. 1 root 4096 Jan 10 file_4096

The last digits of the mode should now be 755:

    [root@syslab filetest]# dd if=/dev/sda2 bs=1 count=256 skip=4298126592 | od -t x2 -Ax
    000000 81ed 0000 1000 0000 78a8 50ee cde7 50ef

The first two bytes are 0x81ed, which is octal 100755. 755!

We can also check the time the file was created (still recorded in i_atime and i_mtime, since the file has not been read or modified since): 0x50ee78a8 = 1357805736

    [root@syslab filetest]# date -d "@1357805736"
    Thu Jan 10 16:15:36 CST 2013

Alternatively, modify the file and watch the modification time change on disk.
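The arithmetic and decoding in this step can be reproduced with plain shell arithmetic. The sketch below uses only the numbers quoted above; the variable names are made up for illustration, and it assumes a shell with 64-bit arithmetic, since the offset does not fit in 32 bits.

```shell
#!/bin/sh
# Sketch: byte offset of an inode, plus decoding of two dumped fields.
block_size=4096        # file system block size
inode_size=256         # "Inode size" from the debugfs stats output
table_block=1049120    # first block of the inode table of group 33
index=3637             # 0-based index of file_4096's inode in the group

offset=$(( table_block * block_size + index * inode_size ))
echo "inode byte offset: $offset"

# i_mode is usually read in octal; the low three digits are the permissions.
mode_before=$(printf '%o' 0x81a4)
mode_after=$(printf '%o' 0x81ed)
echo "i_mode before chmod: $mode_before, after chmod: $mode_after"

# Timestamps are stored as seconds since the Unix epoch.
created=$(( 0x50ee78a8 ))
echo "timestamp in seconds: $created"
```

The offset is the value passed to skip= in the dd command, and the timestamp is the value passed to date -d "@..." above.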
4.2 Use the offsets of i_size and i_blocks in ext3_inode to read their values

We know where i_size and i_blocks sit inside struct ext3_inode: __le32 i_size is at offset 0x4 from the start of the structure and __le32 i_blocks is at offset 0x1c. Both values are 4 bytes long. So let's look at i_size and i_blocks for file_4096:

    [root@syslab filetest]# dd if=/dev/sda2 bs=1 count=256 skip=4298126592 | od -t x4 -Ax
    000000 11681ed 00001000 50ee78a8 50efcde7
    000010 50ee78a8 00000000 00010000 00000008
    000020 00080000 0001f30a 00000001 00000004
    000030 00000000 001125c7 00000000 00000001
    000040 00000000 00000000 ..

Reading at those offsets gives the two values:

    i_size   = 0x00001000 = 4096   (the size shown by ls)
    i_blocks = 0x00000008 = 8      (the actual size, in blocks)

As pointed out above, the block size for i_blocks is fixed at 512 bytes and is not the file system block size. The actual size is 8*512 = 4096 bytes, which is exactly one file system block. So file_4096 has a display size of 4096 bytes and an actual size of 4096 bytes.

Checking the remaining four files in the same way gives:

    file_4097: i_size = 0x00001001 = 4097 bytes, i_blocks = 0x00000010 = 16 (16*512 = 4096*2, i.e. it actually occupies two blocks!)
    file_1: i_size = 0x00000001 = 1 byte, i_blocks = 0x00000008 = 8 (8*512 = 4096, i.e. 1 byte also occupies one block)
    file_2: i_size = 0x00000002 = 2 bytes, i_blocks = 0x00000008 = 8 (8*512 = 4096, i.e. 2 bytes also occupy one block)
    file_0: i_size = 0x00000000 = 0 bytes, i_blocks = 0x00000000 = 0 (0 bytes occupy no data block)

Comparing these i_size and i_blocks values with the output of ls -l and du -h *, we find:

i_size matches the size shown by ls. This is the size we think of a file as having: if I write one character to a file, I consider the file to be 1 byte.

i_blocks matches the size shown by du. This is the space the file actually takes on disk: if I write one character to a file, it still occupies a whole block on disk (usually 4096 bytes).

5 How are file holes stored?

5.1 What is a file hole?

A file hole is a part of a regular file that reads as null characters but is not stored in any data block on disk. Holes are a long-standing feature of Unix files, introduced to avoid wasting disk space.

5.2 How is a file hole stored on disk?

The implementation of file holes in Ext2 is based on dynamic data block allocation: a block is actually allocated to a file only when a process needs to write data into it. On disk, the i_block array of the file's inode (the array of physical block numbers associated with the inode) stores only the logical block numbers of the blocks that have been allocated; the other elements of the array are empty.

Let's see how this is stored in practice:

    echo -n "a" | dd of=/root/filetest/hole bs=4096 seek=6

For the file /root/filetest/hole, the first 4096*6 bytes are skipped and then a single character "a" is written.

5.3 View the hole file with ls

Using the same method as above (ll -i):

    273979 -rw-r--r--. 1 root 24577 Jan 10 hole

The file is shown as 24577 bytes, but how much disk space does it actually occupy?

5.4 View the hole file with du

    du -h hole
    4.0K    hole
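The hole experiment can be repeated without root access. The following is a minimal sketch, assuming a POSIX shell and GNU coreutils `stat`; it writes to a temporary directory instead of /root/filetest, and `workdir` and `hole` are names chosen for this illustration. Whether the allocated figure stays small depends on the file system supporting holes.

```shell
#!/bin/sh
# Sketch: create a file with a hole and compare its display size with its allocation.
set -eu

workdir=$(mktemp -d)          # hypothetical scratch directory
hole="$workdir/hole"

# Skip 6 blocks of 4096 bytes, then write a single "a".
printf 'a' | dd of="$hole" bs=4096 seek=6 2>/dev/null

size=$(stat -c %s "$hole")      # bytes, as shown by ls -l
blocks=$(stat -c %b "$hole")    # allocated blocks, in units of %B bytes
unit=$(stat -c %B "$hole")

echo "display size: $size bytes"
echo "allocated:    $(( blocks * unit )) bytes"
```

The display size is 6*4096 + 1 = 24577 bytes, as in the ll output above; the allocated figure is what du reports.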
5.5 Dump the hexadecimal data on disk

    i_size   = 0x00006001 = 24577
    i_blocks = 0x00000008 (8*512 = 4096 bytes, the size of one block! The hole does not occupy actual data blocks!)

6 Conclusion

The ls command shows the display size of the file, i.e. the i_size value in the on-disk inode structure, in bytes.

The du command shows the actual on-disk size of the file, i.e. i_blocks * 512 from the on-disk inode structure, in bytes.

The unit of i_blocks is fixed at 512 bytes. It is not the block size chosen when the file system was formatted and does not change with it.

File holes do not occupy real storage blocks on disk: if a file contains an M-byte hole, that hole takes up no disk blocks.

File holes are included in the size shown by ls: if a file contains an M-byte hole, ls adds those M bytes to the size.

Note: I reached these conclusions by comparing the raw data stored on disk with the output of ls and du; I did not read the source code of ls and du. If there are any errors, corrections are welcome.
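As a closing illustration, the two fields can be picked out of an od -t x4 dump by byte offset. This is a sketch only: `field_at` is a hypothetical helper, and the words are the first 32 bytes of the file_4096 dump from section 4.2, with the first word copied exactly as printed there.

```shell
#!/bin/sh
# Sketch: pick i_size and i_blocks out of "od -t x4" words by byte offset.
# The words are the first two rows of the file_4096 dump (32 bytes).
words="11681ed 00001000 50ee78a8 50efcde7 50ee78a8 00000000 00010000 00000008"

# field_at OFFSET: print the 4-byte word at byte OFFSET as a decimal number.
field_at() {
    # Words are 4 bytes each, so the word number is OFFSET / 4
    # (plus 1 because cut counts fields from 1).
    hex=$(echo "$words" | cut -d' ' -f "$(( $1 / 4 + 1 ))")
    echo "$(( 0x$hex ))"
}

i_size=$(field_at 0x4)      # offset of i_size in struct ext3_inode
i_blocks=$(field_at 0x1c)   # offset of i_blocks in struct ext3_inode

echo "display size (ls): $i_size bytes"
echo "actual size (du):  $(( i_blocks * 512 )) bytes"
```

Both lines report 4096 bytes for file_4096, which is the case where the display size and the actual size happen to agree.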