A detailed analysis of why du and df report different results
Today someone asked me why du and df report different results. I gave him a brief explanation, and then decided to write an article analyzing the principles behind it.
We often use du and df to find out how much space a directory or file system occupies, but their results are not always consistent. In most cases the difference is small, but sometimes it is very large.
For example:
    ##### df statistical result
    [root@xuexi ~]# df -hT
    Filesystem           Type   Size  Used Avail Use% Mounted on
    /dev/sda2            ext4    18G  1.7G   15G  11% /
    tmpfs                tmpfs  491M     0  491M   0% /dev/shm
    /dev/sda1            ext4   239M   68M  159M  30% /boot
    //192.168.0.124/win  cifs   381G  243G  138G  64% /mnt

    ##### du's statistical result on the root directory
    [root@xuexi ~]# du -sh / 2>/dev/null
    244G    /
df reports 1.7 GB used on "/", but du reports 244 GB. Here the du result is far greater than the df result.
Now look at the results for the /boot partition.
    [root@xuexi ~]# df -hT /boot;echo;du -sh /boot
    Filesystem     Type  Size  Used Avail Use% Mounted on
    /dev/sda1      ext4  239M   68M  159M  30% /boot

    66M     /boot
du reports 66M and df reports 68M. The difference is small, but this time the df result is greater than the du result.
1. Underlying process of file storage and deletion
The following describes the underlying mechanism of the file system. For details, see ext file system mechanism.
First, here is how a file is stored in the file system. Suppose a.txt is stored in the /tmp directory.
When a.txt is stored in /tmp:
- (1) First, find a free inode number in the inode table and assign it to a.txt, for example 2222. Mark inode 2222 as used in the inode map (imap).
- (2) Add an entry for a.txt to the data block of /tmp. This entry contains a pointer to the inode, for example "0x2222".
- (3) Then find free data blocks in the block map (bmap) and write the data of a.txt into them. Each time more space has to be written, another free data block is taken from bmap, until all the data is saved.
- (4) Set the data block pointers in the inode table record for inode 2222, so that the data blocks used by a.txt can be found.
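The outcome of steps (1) to (4) can be observed from user space. The following is a minimal sketch, assuming GNU coreutils stat (with its -c format option) and a scratch file created by mktemp that stands in for a.txt; the variable names are only illustrative.

```shell
# Create a scratch file (a stand-in for a.txt) holding 8 KiB of data.
f=$(mktemp)
head -c 8192 /dev/urandom > "$f"

# %i = inode number assigned to the file
# %b = number of blocks allocated, %B = size in bytes of each such block
# %s = apparent file size in bytes
inode=$(stat -c '%i' "$f")
alloc_bytes=$(( $(stat -c '%b' "$f") * $(stat -c '%B' "$f") ))
size=$(stat -c '%s' "$f")
echo "inode=$inode size=$size allocated=$alloc_bytes"

# The entry in the parent directory maps the file name to the same inode.
dir_entry_inode=$(ls -i "$f" | awk '{ print $1 }')
echo "directory entry points to inode $dir_entry_inode"

rm -f "$f"
```

The allocated size is usually a multiple of the file system block size, so it can be larger than the apparent size of the file.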
To delete a.txt:
- (1) Delete the data block pointers of a.txt from the inode table. From this point on, the data of a.txt can no longer be reached from outside. The file still exists, but it is a "corrupted" file because nothing points to its data blocks.
- (2) Mark inode 2222 as unused in imap. The inode is thereby released and can be reused by later files.
- (3) Remove the a.txt entry from the data block of the parent directory /tmp. Once the entry is removed, the file can no longer be seen or found.
- (4) Mark the blocks occupied by a.txt as unused in bmap. Once marked as unused, these data blocks can be overwritten and reused by later files.
Now consider what happens if a process is still using the file when it is deleted. The file can no longer be seen or found from outside, so the deletion has reached step (3). However, the process is still using the file's data and can still find it, because it obtained the data blocks occupied by the file when it opened the file. Although the file has been deleted, its data blocks are not marked as unused in bmap.
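This situation is easy to reproduce on a small scale. The sketch below assumes a POSIX shell and a scratch file from mktemp; file descriptor 3 plays the role of the process that is still using the file.

```shell
# Create a scratch file and keep it open on file descriptor 3.
f=$(mktemp)
echo "still here" > "$f"
exec 3< "$f"

# Remove the directory entry; the name is now gone from the file system.
rm -f "$f"
if [ -e "$f" ]; then visible=yes; else visible=no; fi

# The open descriptor can still read the file's data.
content=$(cat <&3)
echo "visible=$visible content=$content"

# Closing the descriptor drops the last reference to the file.
exec 3<&-
```

Only after the last descriptor is closed can the file system release the data blocks.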
2. Principles of du statistics
du calls stat on every file it visits (including those in subdirectories) and adds up the space each one occupies. It is slow because stat has to be called once for every file involved.
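To make the idea concrete, here is a rough sketch that adds up per-file stat results by hand and prints du's own total next to it. It assumes GNU coreutils (stat -c, du --block-size) and a scratch directory with made-up file names. It is not how du is implemented; for instance, du takes care not to count hard-linked files twice, which this sketch ignores.

```shell
# Scratch directory with two files (names are illustrative).
d=$(mktemp -d)
head -c 1048576 /dev/urandom > "$d/one.bin"
mkdir "$d/sub"
head -c 524288 /dev/urandom > "$d/sub/two.bin"

# Sum the allocated bytes (%b blocks of %B bytes each) of every file
# and directory in the tree, one stat result per line.
stat_total=$(find "$d" -exec stat -c '%b %B' {} + |
    awk '{ sum += $1 * $2 } END { print sum + 0 }')

# du's own total for the same tree, in bytes.
du_total=$(du -s --block-size=1 "$d" | awk '{ print $1 }')

echo "stat sum: $stat_total bytes, du: $du_total bytes"
rm -rf "$d"
```

On a typical local file system the two figures should agree, since both are derived from the same per-file block counts.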
1. If other file systems are mounted inside the directory being measured, their files are counted as well.
For example, "du -sh /" counts the files of all partitions, including mounted ones. The du result is 244 GB, far larger than the df result, because a partition is mounted on the /mnt directory.
    ##### df statistical result
    [root@xuexi ~]# df -hT
    Filesystem           Type   Size  Used Avail Use% Mounted on
    /dev/sda2            ext4    18G  1.7G   15G  11% /
    tmpfs                tmpfs  491M     0  491M   0% /dev/shm
    /dev/sda1            ext4   239M   68M  159M  30% /boot
    //192.168.0.124/win  cifs   381G  243G  138G  64% /mnt

    ##### du's statistical result on the root directory
    [root@xuexi ~]# du -sh / 2>/dev/null
    244G    /
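If you want du to stay on a single file system, so that its figure is comparable with df's, GNU du provides the -x (--one-file-system) option. A small sketch; the function name du_one_fs is hypothetical and not part of any tool.

```shell
# du_one_fs DIR: total space used under DIR, skipping directories that
# live on a different file system (for example a share mounted on /mnt).
du_one_fs() {
    du -shx -- "$1" 2>/dev/null
}

# Example: du_one_fs /
```

On the system shown above, this would leave out /boot, /dev/shm and /mnt when measuring "/".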
2. If a file has been deleted, du cannot count it even if a process still references it, because stat can no longer find the file.
3. du can total the size of an arbitrary set of files, even across partitions, because stat can find and count all of them.
For example:
Measure the total size of all .img files on the system.
    [root@xuexi ~]# find / -type f -name "*.img" -print0 | xargs -0 du -csh
    19M     /boot/initramfs-2.6.32-504.el6.x86_64.img
    13M     /mnt/linux tool/cirros-0.3.4-x86_64-disk.img
    31M     total
The two .img files are on different partitions.
3. Principles of df statistics
df reads the superblock of each partition to get the number of free and used data blocks, and calculates the free and used space from them. This is why df is extremely fast (the superblock takes only 1024 bytes).
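In practice df does not parse the disk itself. It asks the kernel for these counters through the statfs/statvfs system call, and GNU stat -f prints the same counters. The sketch below, which assumes GNU coreutils and uses illustrative variable names, recomputes the used space from them.

```shell
# %S = fundamental block size, %b = total data blocks,
# %f = free blocks, %a = free blocks available to unprivileged users
read -r bsize total free avail <<EOF
$(stat -f -c '%S %b %f %a' /)
EOF

used_bytes=$(( (total - free) * bsize ))
avail_bytes=$(( avail * bsize ))
echo "used=$used_bytes bytes, available=$avail_bytes bytes"

# For comparison, df's own figures in 1-byte units.
df -P -B1 /
```

No file is visited here, which is why the answer comes back immediately regardless of how many files the file system holds.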
1. When another partition is mounted inside a file system, df does not count that partition as part of it.
This is easy to understand, because df reads each partition's own superblock. Even if partition 1 is mounted on a directory of partition 0, df reads only the superblock of partition 0 when it reports on partition 0.
For example, in the following output the space of /mnt and /boot is not included in "/".
    [root@xuexi ~]# df -hT
    Filesystem           Type   Size  Used Avail Use% Mounted on
    /dev/sda2            ext4    18G  1.7G   15G  11% /
    tmpfs                tmpfs  491M     0  491M   0% /dev/shm
    /dev/sda1            ext4   239M   68M  159M  30% /boot
    //192.168.0.124/win  cifs   381G  243G  138G  64% /mnt
2. Because df always reads the superblock, running df on a file inside a file system automatically reports the statistics of that whole file system.
    [root@xuexi ~]# df -hT /etc/fstab
    Filesystem     Type  Size  Used Avail Use% Mounted on
    /dev/sda2      ext4   18G  1.7G   15G  11% /
3. df counts files that have been deleted but are still referenced by a process.
Normally, deleting a file immediately releases the related pointers and marks the corresponding bits in imap and bmap as unused. As soon as bmap changes, the file system knows which data blocks in each block group are free and which are used, and this information is updated in the partition's superblock. df can therefore report up-to-date space information immediately.
However, if a process is still referencing a file when it is deleted, then according to the earlier analysis bmap does not mark the file's data blocks as unused, so the block usage in the superblock is not updated. Because df calculates free and used space from the block counts in the superblock, it counts the deleted file as used space.
For example, create a large file in the "/" directory, then measure the space used by the root directory with both du and df.
    [root@xuexi ~]# dd if=/dev/zero of=/my.iso bs=1M count=1000
    [root@xuexi ~]# df -hT /
    Filesystem     Type  Size  Used Avail Use% Mounted on
    /dev/sda2      ext4   18G  2.7G   14G  17% /
    [root@xuexi ~]# du -sh --exclude="/mnt" / 2>/dev/null
    2.7G    /
At GB-level precision, the two results are equal.
Now have a process reference this file, delete the file, and run du and df again.
    [root@xuexi ~]# tail -f /my.iso &
    [root@xuexi ~]# rm -rf /my.iso
    [root@xuexi ~]# ls /my.iso
    ls: cannot access /my.iso: No such file or directory
    [root@xuexi ~]# du -sh --exclude="/mnt" / 2>/dev/null
    1.8G    /
    [root@xuexi ~]# df -hT /
    Filesystem     Type  Size  Used Avail Use% Mounted on
    /dev/sda2      ext4   18G  2.7G   14G  17% /
The my.iso file can no longer be found, so du cannot count it. df still counts its size, because the data blocks occupied by my.iso have not been marked as unused.
After the tail process is killed, df reports the same space as du again.
    [root@xuexi ~]# jobs
    [1]+  Running                 tail -f /my.iso &
    [root@xuexi ~]# kill %1
    [root@xuexi ~]# df -hT /
    Filesystem     Type  Size  Used Avail Use% Mounted on
    /dev/sda2      ext4   18G  1.7G   15G  11% /
If you do not know which files in the file system have been deleted but are still referenced by a process, you can list them with lsof. It also shows the file size, so you can see which file is still holding on to space.
For example, here is the lsof output from before the tail process was killed. It shows that the tail process is holding /my.iso and that the file size is 1048576000 bytes.
    [root@xuexi ~]# lsof | grep deleted
    php-fpm   12597   root    txt   REG   8,2      4058416   931143 /usr/sbin/php-fpm (deleted)
    php-fpm   12657   nobody  txt   REG   8,2      4058416   931143 /usr/sbin/php-fpm (deleted)
    php-fpm   12707   nobody  txt   REG   8,2      4058416   931143 /usr/sbin/php-fpm (deleted)
    php-fpm   12708   nobody  txt   REG   8,2      4058416   931143 /usr/sbin/php-fpm (deleted)
    tail      14437   root    3r    REG   8,2   1048576000     7171 /my.iso (deleted)
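Where lsof is not installed, similar information can be gathered from /proc on Linux, because the symbolic links under /proc/PID/fd end in " (deleted)" once their target has been unlinked. The following is only a sketch under that assumption; the function name list_deleted_open_files is hypothetical, and GNU stat is assumed for the size column. Unlike lsof, it does not show memory-mapped files such as the php-fpm "txt" entries above.

```shell
# list_deleted_open_files: print "PID FD SIZE TARGET" for every open file
# descriptor whose target has been unlinked. Descriptors of processes
# owned by other users are only readable by root.
list_deleted_open_files() {
    for fd in /proc/[0-9]*/fd/*; do
        target=$(readlink "$fd" 2>/dev/null) || continue
        case $target in
            *' (deleted)')
                pid=${fd#/proc/}
                pid=${pid%%/*}
                # Following the link still reaches the open file, so its size is readable.
                size=$(stat -L -c '%s' "$fd" 2>/dev/null) || size='?'
                printf '%s %s %s %s\n' "$pid" "${fd##*/}" "$size" "$target"
                ;;
        esac
    done
}

list_deleted_open_files
```

Sorting the output by the third column is a quick way to find the process that is holding on to the most space.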
After the above analysis, the differences between the results of du and df should no longer be a mystery.
If you reprint this article, please cite the source: http://www.cnblogs.com/f-ck-need-u/p/8659301.html