A detailed analysis of why du and df report different results



Today someone asked me why du and df report different results. I gave him a brief explanation, and then decided to write an article that analyzes the underlying principles.

We often use du and df to find out how much space a directory or file system occupies, yet their results are not consistent. In most cases the difference is small, but sometimes it is very large.

For example:

##### df statistical result
[root@xuexi ~]# df -hT
Filesystem          Type   Size  Used Avail Use% Mounted on
/dev/sda2           ext4    18G  1.7G   15G  11% /
tmpfs               tmpfs  491M     0  491M   0% /dev/shm
/dev/sda1           ext4   239M   68M  159M  30% /boot
//192.168.0.124/win cifs   381G  243G  138G  64% /mnt
##### du statistical result for the root directory
[root@xuexi ~]# du -sh / 2>/dev/null
244G    /

In the df output, the space used by "/" is 1.7G, but du reports 244G. Here the du result is greater than the df result.

Now look at the results for the /boot partition.

[root@xuexi ~]# df -hT /boot;echo;du -sh /boot
Filesystem     Type  Size  Used Avail Use% Mounted on
/dev/sda1      ext4  239M   68M  159M  30% /boot

66M     /boot

du reports 66M and df reports 68M. The difference is small, but this time the df result is greater than the du result.

1. Underlying process of file storage and deletion

The following describes the underlying mechanism of the file system only briefly. For details, see the article on the ext file system mechanism.

First, how a file is stored in the file system. Suppose a.txt is stored in the /tmp directory.

When a.txt is stored in /tmp:

  • (1). First, find a free inode number in the inode table and assign it to a.txt, for example 2222. Mark inode 2222 as used in the inode map (imap).
  • (2). Add a record for a.txt to the data block of /tmp. The record contains a pointer to the inode, for example "0x2222".
  • (3). Then find free data blocks in the block map (bmap) and start writing the data of a.txt to them. Each time a chunk of data is written, another free data block is taken from bmap, until all the data has been stored.
  • (4). Set the data block pointers in the inode table record for inode 2222, so that the data blocks used by a.txt can be found through the inode.

To delete a.txt:

  • (1). In the inode table, delete the pointers to the data blocks of a.txt. Once they are removed, the data of a.txt can no longer be located from outside. The file itself still exists, but it is a "corrupted" file, because nothing points to its data blocks any more.
  • (2). Mark inode 2222 as unused in imap. The inode is thereby released and can be reused by later files.
  • (3). Delete the record for a.txt from the data block of the parent directory /tmp. Once the record is gone, the file can no longer be seen or found.
  • (4). In bmap, mark the data blocks occupied by a.txt as unused. After that, these data blocks can be overwritten and reused by later files.

Consider the case where a process is still using the file when it is deleted. What happens? The file can no longer be seen or found from outside, so the deletion has reached step (3). However, the process is still using the data of the file and can still find it, because it obtained the data blocks occupied by the file when it opened the file. So although the file has been deleted, its data blocks are not marked as unused in bmap.
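This situation can be reproduced from a shell. The following is a minimal sketch, assuming a Linux system with mktemp available; the function name demo_deleted_but_open and the choice of file descriptor 3 are illustrative and do not come from the original article.

```shell
# Sketch: a file that is deleted while still open stays readable through
# the open descriptor. The function name is hypothetical.
demo_deleted_but_open() {
    local f
    f=$(mktemp) || return 1
    echo "still here" > "$f"

    exec 3< "$f"          # open the file for reading on descriptor 3
    rm -f -- "$f"         # remove the directory entry (the name is gone)

    if [ -e "$f" ]; then
        echo "unexpected: path still exists"
    else
        echo "path is gone"
    fi

    cat <&3               # the data is still reachable through descriptor 3
    exec 3<&-             # closing the descriptor lets the blocks be freed
}
```

Calling demo_deleted_but_open prints "path is gone" followed by "still here": the name has disappeared, but the data remains until the descriptor is closed.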

2. Principles of du statistics

du uses stat (the system call, not the command-line tool of the same name) on every file it visits, including those in subdirectories, and adds up the space each one occupies. It is slow because a stat call is needed for each file involved.

1. If other file systems are mounted inside the directory being counted, the files on those file systems are counted as well.

For example, "du -sh /" counts the files of all partitions, including mounted ones. Here du reports 244G, far more than df reports for "/", because another file system is mounted on the /mnt directory.
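The relationship between du and stat can be checked by hand. The sketch below assumes GNU find, xargs, stat, and an awk implementation; stat_sum_bytes is a hypothetical helper name. It adds up the allocated blocks that stat reports for every entry under a directory. When the tree contains no hard links (du counts a multiply linked file only once), the total should match what du reports in bytes.

```shell
# Sum the disk usage of everything under a directory using stat only.
# %b = number of allocated blocks, %B = size in bytes of each such block.
stat_sum_bytes() {
    find "$1" -print0 |
        xargs -0 stat -c '%b %B' |
        awk '{ total += $1 * $2 } END { printf "%.0f\n", total }'
}
```

For comparison, run stat_sum_bytes /boot and then du -s --block-size=1 /boot.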

##### df statistical result
[root@xuexi ~]# df -hT
Filesystem          Type   Size  Used Avail Use% Mounted on
/dev/sda2           ext4    18G  1.7G   15G  11% /
tmpfs               tmpfs  491M     0  491M   0% /dev/shm
/dev/sda1           ext4   239M   68M  159M  30% /boot
//192.168.0.124/win cifs   381G  243G  138G  64% /mnt
##### du statistical result for the root directory
[root@xuexi ~]# du -sh / 2>/dev/null
244G    /
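If the goal is to compare du with df for a single file system, GNU du can be told not to cross mount points with -x (--one-file-system). A minimal sketch, assuming GNU du; the helper name du_one_fs is made up for illustration.

```shell
# Report the disk usage of a directory in bytes without descending into
# other file systems mounted below it (-x is short for --one-file-system).
du_one_fs() {
    du -xs --block-size=1 -- "$1" 2>/dev/null | cut -f1
}
```

On the machine shown above, du_one_fs / would skip /mnt, /boot, and /dev/shm, so its result is comparable to the df line for "/".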

2. If a file has been deleted, du cannot count it, even if a process still references it, because stat cannot find the file.

3. du can total the size of an arbitrary set of files, even across partitions, because stat can find and count each of them.

For example:

Count the total size of all img files on the system.

[root@xuexi ~]# find / -type f -name "*.img" -print0 | xargs -0 du -csh
19M     /boot/initramfs-2.6.32-504.el6.x86_64.img
13M     /mnt/linux tool/cirros-0.3.4-x86_64-disk.img
31M     total

The two img files are on different partitions.
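One caveat with the xargs form: when the file list is very long, xargs may split it across several du invocations, each of which prints its own "total" line. GNU du can read the NUL-separated list itself through --files0-from, which avoids that. A sketch, assuming GNU find and GNU du; img_total is a hypothetical helper name.

```shell
# Print the grand total (human-readable) for all *.img files under a directory.
# du reads the NUL-separated file names from standard input ("-").
img_total() {
    find "$1" -type f -name '*.img' -print0 |
        du -ch --files0-from=- |
        tail -n 1
}
```

Running img_total / as root would correspond to the "total" line of the command above.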

3. df statistical principles

df reads the superblock of each partition to obtain the number of free and used data blocks, and calculates the free and used space from them. This makes df extremely fast (the superblock occupies only 1024 bytes).

1. When another partition is mounted inside a file system, df does not count that partition as part of it.

This is easy to understand: df reads the superblock of each partition separately. Even if partition 1 is mounted on a directory of partition 0, df reads only the superblock of partition 0 when it reports on partition 0.

For example, /mnt and /boot below are not included in the figures for "/".
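The file-system-level counters that df works from can be inspected with stat -f, which queries the file system rather than an individual file. The sketch below assumes GNU stat and an awk implementation; fs_used_bytes is an illustrative name. It computes used space as (total blocks - free blocks) × block size, which should be close to the figure behind df's Used column.

```shell
# Compute the used bytes of the file system containing a path from the
# file-system-level counters, without looking at any individual file.
# %S = fundamental block size, %b = total data blocks, %f = free blocks.
fs_used_bytes() {
    stat -f -c '%S %b %f' -- "$1" |
        awk '{ printf "%.0f\n", $1 * ($2 - $3) }'
}
```

For comparison, run fs_used_bytes / and then df --block-size=1 /. On a live system the two numbers may differ slightly because usage changes between the two calls.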

[root@xuexi ~]# df -hT
Filesystem          Type   Size  Used Avail Use% Mounted on
/dev/sda2           ext4    18G  1.7G   15G  11% /
tmpfs               tmpfs  491M     0  491M   0% /dev/shm
/dev/sda1           ext4   239M   68M  159M  30% /boot
//192.168.0.124/win cifs   381G  243G  138G  64% /mnt

2. Because df always works from the superblock, asking df about a single file makes it report on the file system that contains the file instead.

[root@xuexi ~]# df -hT /etc/fstab
Filesystem     Type  Size  Used Avail Use% Mounted on
/dev/sda2      ext4   18G  1.7G   15G  11% /

3. df will count the files that have been deleted but still referenced by the process.

Under normal circumstances, deleting a file immediately releases the related pointers and marks the corresponding bits in imap and bmap as unused. As soon as bmap changes, the file system knows which data blocks in each block group are free and which are used, and this information is updated in the superblock of the partition. Therefore df can report up-to-date space information immediately.

However, if a process is still referencing a file when it is deleted, then according to the earlier analysis bmap does not mark the file's data blocks as unused, and the block usage in the superblock is not updated. Because df calculates free and used space from the block counts in the superblock, it counts the deleted file as used space.

For example, create a large file in the "/" directory, then use du and df to count the space used by the root file system.

[root@xuexi ~]# dd if=/dev/zero of=/my.iso bs=1M count=1000
[root@xuexi ~]# df -hT /
Filesystem     Type  Size  Used Avail Use% Mounted on
/dev/sda2      ext4   18G  2.7G   14G  17% /
[root@xuexi ~]# du -sh --exclude="/mnt" / 2>/dev/null
2.7G    /

At GB-level precision, the two results are equal.

Now let a process reference this file, delete the file, and run du and df again.

[root@xuexi ~]# tail -f /my.iso &
[root@xuexi ~]# rm -rf /my.iso
[root@xuexi ~]# ls /my.iso
ls: cannot access /my.iso: No such file or directory
[root@xuexi ~]# du -sh --exclude="/mnt" / 2>/dev/null
1.8G    /
[root@xuexi ~]# df -hT /
Filesystem     Type  Size  Used Avail Use% Mounted on
/dev/sda2      ext4   18G  2.7G   14G  17% /

The file my.iso can no longer be found, so du cannot count it. df still counts its size, because the data blocks occupied by my.iso have not been marked as unused.

Stop the tail process and run df again. The result is now back to normal, in line with du.

[root@xuexi ~]# jobs
[1]+  Running                 tail -f /my.iso &
[root@xuexi ~]# kill %1
[root@xuexi ~]# df -hT /
Filesystem     Type  Size  Used Avail Use% Mounted on
/dev/sda2      ext4   18G  1.7G   15G  11% /

If you do not know which files in the file system have been deleted but are still referenced by a process, you can use lsof to list them. It also shows the file size, so you can see which file is still holding on to space.

For example, run lsof before the tail process is killed. The output shows that the tail process holds /my.iso and that the file size is 1048576000 bytes.

[root@xuexi ~]# lsof | grep deleted
php-fpm   12597      root  txt     REG   8,2    4058416   931143 /usr/sbin/php-fpm (deleted)
php-fpm   12657    nobody  txt     REG   8,2    4058416   931143 /usr/sbin/php-fpm (deleted)
php-fpm   12707    nobody  txt     REG   8,2    4058416   931143 /usr/sbin/php-fpm (deleted)
php-fpm   12708    nobody  txt     REG   8,2    4058416   931143 /usr/sbin/php-fpm (deleted)
tail      14437      root    3r    REG   8,2 1048576000     7171 /my.iso (deleted)
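When lsof is not installed, similar information is available under /proc on Linux: every open descriptor appears as a symbolic link in /proc/PID/fd, and the link target of a deleted file ends in " (deleted)". The following is a sketch under that assumption; list_deleted_open is a hypothetical helper name, and descriptors of processes owned by other users are only visible to root.

```shell
# List open descriptors whose target file has been deleted.
# Output format: <path under /proc> -> <original file name> (deleted)
list_deleted_open() {
    local fd target
    for fd in /proc/[0-9]*/fd/*; do
        # A process may exit while we scan; skip entries that vanish.
        target=$(readlink "$fd" 2>/dev/null) || continue
        case "$target" in
            *' (deleted)') printf '%s -> %s\n' "$fd" "$target" ;;
        esac
    done
}
```

With lsof itself, lsof +L1 is another way to ask directly for open files whose link count is below 1, without filtering the full listing through grep.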

After the analysis above, the results of du and df should no longer be puzzling.

 

When reprinting, please cite the source: http://www.cnblogs.com/f-ck-need-u/p/8659301.html
