Ten Questions About the Linux File System: An In-Depth Look at How Files Are Stored

Source: Internet
Author: User
Tags inode usage

Ten Questions About the Linux File System
How well do you really know the file system? (Article source: "Ten Questions About the File System")
Nobody is a stranger to the file system. As engineers we deal with it almost every day, but how deeply do we actually understand it? Let's start with the following set of questions about the Linux file system:
1. Random reads and writes on a mechanical disk are very slow. What does the operating system do to improve random read/write performance?
2. Does creating a new empty file with touch consume disk space? If so, how much?
3. Does creating a new empty directory consume disk space? How much? Does it take more or less than a new empty file?
4. Do you know where a file's name is recorded on the disk?
5. How long can a file name be, and what limits it?
6. Can overly long file names affect system performance? Why?
7. How many files can be created in one directory?
8. If you create a file with 1 KB of content, how much disk space does it actually occupy?
9. When you ask the operating system to read 2 bytes from a file, how much does it actually read?
10. How can we improve disk I/O speed when we work with files?
If you can answer 80% of these questions without even thinking, feel free to close this article. If not, and if like me you enjoy prying into the operating system's private affairs, then come along and explore these interesting corners of the file system. I believe that understanding them will be a great help in our work.

I. Disk composition and partitioning
1. Disk Physical Structure
Let's start with the most basic thing, the physical structure of a disk. Note that this article discusses only mechanical disks; SSDs are not covered. When we humans manage anything, we like to divide it into a hierarchy and then manage it level by level. An army is divided into corps, divisions, brigades, regiments, and battalions. A company is divided into business groups, departments, centers, and teams. Likewise, to manage a disk, it is divided into surfaces, heads, tracks, cylinders, and sectors.
Disk surface: A disk is made up of a stack of platter surfaces, as shown in the figure on the left.
Head: Each head corresponds to one disk surface and is responsible for reading and writing the data on that surface.
Track: Each surface is divided into concentric circles around its center; each circle is called a track.
Cylinder: The tracks at the same position on all surfaces together form a three-dimensional shape called a cylinder.
Sector: A whole track is still too large a unit to manage, so each track is further divided into a number of sectors, as shown in the figure on the right.
One of the reasons I fell in love with Linux is that, as long as you are willing to dig, you can strip it down layer by layer and satisfy all your curiosity (if your thoughts just wandered off somewhere else, go face the wall).
You can use the fdisk command on Linux to view this physical information for the disks in the current system. The output above is from a disk on one of my own virtual machines. It shows 255 heads, which means there are 255 disk surfaces; 3,263 cylinders, which means there are 3,263 tracks on each surface; and 63 sectors/track, which means there are 63 sectors on each track. The output also gives a sector size of 512 bytes. Let's work out the size of the disk:
255 surfaces * 3,263 cylinders * 63 sectors per track * 512 bytes per sector = 26,839,088,640 bytes.
The result is 26.8 GB, which matches the total size of the disk. (It differs from the detailed figure reported by fdisk by about 4 MB. I have not worked out why; interested readers can investigate further.)
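The arithmetic above is easy to check in a few lines. This is only a sketch: the function name is made up for illustration, and the four geometry values are the ones quoted from the fdisk output in this article.

```python
# Geometry values quoted from the fdisk output discussed in the article.
HEADS = 255
CYLINDERS = 3263
SECTORS_PER_TRACK = 63
BYTES_PER_SECTOR = 512


def chs_capacity_bytes(heads, cylinders, sectors_per_track, bytes_per_sector):
    """Capacity implied by a cylinder/head/sector geometry."""
    return heads * cylinders * sectors_per_track * bytes_per_sector


capacity = chs_capacity_bytes(HEADS, CYLINDERS, SECTORS_PER_TRACK, BYTES_PER_SECTOR)
print(capacity)           # 26839088640
print(capacity / 10**9)   # about 26.8, counting in decimal gigabytes
```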
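I also looked at the disks on two other machines and found something interesting: whether the disk is large or small, the number of heads and the number of sectors per track stay the same, and only the number of cylinders grows. (This is because the geometry fdisk reports is a logical one kept for compatibility; 255 heads and 63 sectors per track are conventional values, not the drive's real physical layout.)
2. Partitioning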
Partitioning is the first step in the operating system's management of a disk, and it is a concept every computer user knows well: think of the C, D, E, and F drives under Windows.
Think about it: now that we know the physical structure of the disk, if you had to divide the whole disk into partitions such as C and D, how would you do it?
Scheme 1: Divide by surface. Of the 255 surfaces, drive C gets surfaces 0-100, drive D gets surfaces 101-200, ...
Scheme 2: Divide by cylinder. Of the 3,263 cylinders, drive C gets cylinders 0-1000, drive D gets cylinders 1001-2000, ...
Which of the two schemes would you choose? To answer, first look at what one disk I/O involves. Step one: the head moves radially to the track where the data lives; the time this takes is called the seek time. Step two: once the head is on the target track, the platter rotates until the target sector is directly under the head; this is the rotational delay. Step three: the data is read from or written to the target sector; this is the access time. That completes one disk I/O, so:
Single disk I/O time = seek time + rotational delay + access time.
For the rotational delay: servers today commonly use 10,000 rpm disks, and one full rotation takes 60*1000/10000 = 6 ms, so the rotational delay falls between 0 and 6 ms. The access time is generally short, a fraction of a millisecond. The seek time of a modern disk is roughly 3-15 ms, and it depends mainly on the distance between the head's current position and the target track.
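To put rough numbers on the formula, here is a small sketch. The function names are made up for illustration, and the 0.5 ms access time is an assumption standing in for "a fraction of a millisecond"; the rpm and seek range are the figures quoted above.

```python
def rotation_time_ms(rpm):
    """Time for one full platter rotation, in milliseconds."""
    return 60 * 1000 / rpm


def io_time_range_ms(rpm, seek_min_ms, seek_max_ms, access_ms):
    """Best and worst case for: seek time + rotational delay + access time."""
    full_rotation = rotation_time_ms(rpm)
    best = seek_min_ms + 0 + access_ms               # sector already under the head
    worst = seek_max_ms + full_rotation + access_ms  # just missed it: wait a full turn
    return best, worst


# 10,000 rpm, 3-15 ms seek; the 0.5 ms access time is an assumed value.
best, worst = io_time_range_ms(10000, 3, 15, 0.5)
print(rotation_time_ms(10000))  # 6.0
print(best, worst)              # 3.5 21.5
```

Even in this rough estimate, seek time and rotational delay dominate, which is why the partitioning question below is really a question about head movement.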
Which scheme to choose really comes down to which one performs better. Data in the same partition is often read together. With Scheme 1, the head would have to sweep back and forth across all 3,000-plus tracks, so the seek time grows substantially and disk performance degrades. With Scheme 2, for drive C the head only needs to move among tracks 0-1000, which greatly reduces the seek time. (In practice a partition does not start at cylinder 0, because the first track of the disk is used for the boot loader and the disk partition table.) Scheme 2 therefore reduces the seek-time portion of disk I/O, which is why operating systems partition according to Scheme 2 and not Scheme 1.
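The difference between the two schemes can be estimated with a very simplified model. The sketch below assumes that requests within a partition land on uniformly random cylinders and that seek cost grows with the number of cylinders crossed; both are assumptions made for illustration, not measurements, and the function name is hypothetical.

```python
def mean_seek_distance(num_cylinders):
    """Mean |a - b| when a and b are independent, uniformly random cylinder numbers.

    There are 2 * (n - d) ordered pairs exactly d cylinders apart, out of n * n pairs.
    """
    n = num_cylinders
    total = sum(2 * (n - d) * d for d in range(1, n))
    return total / (n * n)


# Scheme 1: a partition is a set of surfaces, so its data spans all 3,263 cylinders.
whole_disk = mean_seek_distance(3263)
# Scheme 2: drive C is confined to cylinders 0-1000 (1,001 cylinders).
one_partition = mean_seek_distance(1001)

print(round(whole_disk), round(one_partition))
```

Under this model the average head movement is about one third of the cylinder range, so confining a partition to roughly a third of the cylinders cuts the average seek distance to roughly a third as well.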
If you partition a disk with fdisk under Linux, you will notice the following information. It confirms that the operating system partitions according to Scheme 2.
Back to question 1: what technique does the operating system use to improve random read/write performance? It divides partitions along cylinder boundaries, which reduces the seek time spent on each disk I/O and so improves the read/write performance of the disk.

II. Directories and files
1. Intro
With the disk basics out of the way, let's get to the main topic and start discussing the Linux file system. Isn't a file system just directories and files? We are all familiar with both. But are you sure they are not familiar strangers? Let me start by creating an empty directory and an empty file and looking at the result:

We all know that the fifth column shows the amount of space occupied, so let me ask a few small questions.
(1) Why does the directory occupy 4096 bytes?
(2) Why does the empty file occupy 0 bytes?
(3) If the empty file really occupies 0 bytes, where are its file name, owner, and permissions such as -rw-rw-r-- stored?


2. I refuse to believe that empty files take up no space


To solve this riddle we need the df command. Run df -i:

The red box in the output shows information about inodes. If you are unfamiliar with the concept of an inode, for now just think of it as something the operating system quietly uses for its own bookkeeping, and which takes up space. Next I touch an empty file and then run df -i again.
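The same before-and-after check can be sketched from a program. os.statvfs reports the free inode count of the file system that holds a path, which is the figure df -i shows. Treat this as an illustration only: the helper name is made up, other processes may create or delete files between the two readings, and some file systems do not report meaningful inode counts.

```python
import os
import tempfile


def free_inodes(path):
    """Free inode count of the file system containing path."""
    return os.statvfs(path).f_ffree


with tempfile.TemporaryDirectory() as workdir:
    empty_file = os.path.join(workdir, "empty")
    before = free_inodes(workdir)
    open(empty_file, "w").close()   # same effect as: touch empty
    after = free_inodes(workdir)
    size = os.stat(empty_file).st_size

print("file size in bytes:", size)
print("free inodes before and after:", before, after)
```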

Earlier the operating system told us that a new empty file occupies 0 bytes. This experiment shows that the operating system "deceived" us: the file consumed an inode. So how big is an inode? The dumpe2fs command can show us the actual size.

In the output we can find the following line:

It tells us that each inode is 256 bytes. This value can differ from machine to machine; it is decided when the file system is formatted.


So question 2 has its answer: a new empty file does occupy disk space, 256 bytes in this case. More precisely, it consumes one inode, whose size is determined at format time.

Now let's turn to creating a new empty directory. A new empty directory consumes 4 KB of disk space. Is that all? Again we use df -i to watch the system's inode usage before and after creating the directory.

It turns out that a directory also consumes an inode. So question 3 has its answer too: a new empty directory occupies 4 KB plus one inode. On your system it is not necessarily 4 KB; it is actually one block. The block size can also be seen with dumpe2fs.


On my disk, the block size chosen at format time happens to be 4 KB.
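The block size and the size of a fresh directory can also be read from a program. A minimal sketch, with a made-up helper name: os.statvfs exposes the file system's preferred block size as f_bsize, and os.stat gives the directory size that ls -l shows. The 4096 described in this article is what an ext file system with 4 KB blocks reports; other file systems may report a quite different directory size.

```python
import os
import tempfile


def fs_block_size(path):
    """Block size reported by the file system containing path."""
    return os.statvfs(path).f_bsize


with tempfile.TemporaryDirectory() as parent:
    new_dir = os.path.join(parent, "emptydir")
    os.mkdir(new_dir)
    dir_size = os.stat(new_dir).st_size   # the size column of ls -l
    block_size = fs_block_size(new_dir)

print("block size:", block_size)
print("size of a new empty directory:", dir_size)
```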

3. The mysterious 4 KB of an empty directory

With the previous mystery solved, as an engineer I became curious about something else: what exactly is stored in the 4 KB that an empty directory occupies? It seems so mysterious.

cd into our new directory and take a look.

Now let's create two new empty files and then look at the directory's space usage again.

Nothing new shows up. Empty files occupy no blocks, so what is shown is still just the block occupied by the directory, the same size as before. Next I used a PHP script to create 100 empty files whose names are 32 bytes long.

Now the directory occupies more disk space: 3 blocks. This is the answer to question 4: file names are stored in the blocks that belong to the directory. Next I also verified that the number of file names each directory block can hold depends on the length of the names (it sounds obvious, but confirming your own guess is satisfying). I created another empty directory and, inside it, 100 empty files whose names are 32*3 bytes long. The directory then occupies the following disk space:

You might ask why the number of blocks did not triple when the file names became three times as long. The structure the Linux file system stores for each file in a directory contains other fields besides the name, so making the name three times longer does not make the structure three times larger. For details, refer to a book on the Linux kernel.
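As a rough illustration of why the growth is less than threefold, the sketch below assumes the classic ext2-style directory entry layout: an 8-byte fixed header (inode number, record length, name length, file type) followed by the name, padded to a multiple of 4 bytes. Real directories can carry extra slack, index blocks, and checksums, so this is a lower bound on entry size and not an exact accounting of the experiment above.

```python
# Fixed part of an ext2-style directory entry (assumed layout):
# inode number (4) + record length (2) + name length (1) + file type (1).
DIRENT_HEADER_BYTES = 8


def dirent_size(name_len):
    """Minimum bytes for one directory entry, rounded up to a multiple of 4."""
    return (DIRENT_HEADER_BYTES + name_len + 3) // 4 * 4


short_entry = dirent_size(32)     # 40
long_entry = dirent_size(32 * 3)  # 104
print(short_entry, long_entry, long_entry / short_entry)  # 40 104 2.6
```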


That gives us the answer to question 6. Long file names certainly can affect system performance, because they can lead to more disk I/O. Many programmers like to give files long, meaningful names so that the purpose is clear at a glance. I am not saying that is bad, but if you have a very large number of files, you should consider whether your file names are making the directory's blocks take up too much space. The space itself is a minor issue, since disks are cheap. But think about what the operating system does when it looks up a file in a directory: it may need to compare the name you supply against the stored names as strings, and if you are unlucky it has to go through the names in every block of the directory. (Of course, if your file names are not absurdly long and the number of files is not in the hundreds of thousands, this cost is not large, but it is still worth knowing about.)

As for question 5, the maximum length of a file name: Linux keeps programmers from using overly long names by imposing a limit, and a file name may not exceed 255 bytes. (On ext file systems the directory entry stores the name length in a single byte, which is where 255 comes from.)
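You can ask the system for the limit that applies to a given directory. The sketch below uses os.pathconf with the standard PC_NAME_MAX name and then checks that a name one byte over the limit is rejected; the helper name is made up for illustration, and the value is 255 on typical Linux file systems but can differ on others.

```python
import errno
import os
import tempfile


def max_filename_bytes(directory):
    """Longest file name, in bytes, allowed in the given directory."""
    return os.pathconf(directory, "PC_NAME_MAX")


with tempfile.TemporaryDirectory() as workdir:
    limit = max_filename_bytes(workdir)
    try:
        # One byte longer than the limit: the kernel should refuse this name.
        open(os.path.join(workdir, "a" * (limit + 1)), "w").close()
        too_long_rejected = False
    except OSError as exc:
        too_long_rejected = exc.errno == errno.ENAMETOOLONG

print("name limit in bytes:", limit)
print("over-long name rejected:", too_long_rejected)
```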


Also, have you ever noticed that ls is very slow in a directory that holds a huge number of files? Now you know why: the operating system has to read all the blocks of the current directory, and if there are many of them, this simple ls command may need many I/O operations to complete.

I once created 11 million empty files in one directory on my own computer. ls produced no output after a minute, so I hit Ctrl+C. Do not do this in your own projects. The operating system can cache directory data, so later calls will be much faster, but I still recommend keeping the number of files in a single directory below one million. Otherwise your program may perform poorly the first time it runs after a reboot.

Back to question 7. Do you have an answer? The maximum number of files you can create in a directory is really limited by the inode count of the partition the directory lives on: if the partition has 1 million inodes, you can create at most 1 million files. As mentioned above, though, it is best not to let a single directory exceed one million files, or you will run into performance problems.
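One common way to keep any single directory small is to spread files over subdirectories chosen by a hash of the file name. This is not something the experiments above cover, only a sketch of one possible layout: the function name, the /data root, and the choice of two levels of two hex characters are all assumptions for illustration.

```python
import hashlib
import os


def sharded_path(root, filename, levels=2, width=2):
    """Place filename under root/xx/yy/, where xx and yy come from a hash of the name."""
    digest = hashlib.sha256(filename.encode("utf-8")).hexdigest()
    parts = [digest[i * width:(i + 1) * width] for i in range(levels)]
    return os.path.join(root, *parts, filename)


# Hypothetical root directory and file name.
example = sharded_path("/data", "photo_000001.jpg")
print(example)
```

With two levels of two hex characters there are 256 * 256 = 65,536 leaf directories, so even tens of millions of files average only a few hundred per directory.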


4. File blocks


Let's do an experiment on files. I created a new empty directory and, inside it, a new file whose only content is a single space. After saving, the du command shows the following:

Of the 8 KB shown, 4 KB belongs to the directory, so you can work out that the operating system allocated 4 KB to a file containing only a single space. File blocks are actually simpler than directory blocks: a directory block holds file system structures, while a file block holds only file data. The experiment shows that the operating system allocates space with the block as the smallest unit. As long as your file data is not empty, the operating system gives you at least one block to store it in; only when you exceed 4 KB does it assign you the next block, and so on. So for question 8, a new file with 1 KB of content actually occupies 1 block (typically 4 KB) and one inode (typically 256 bytes).
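The rounding rule can be written down directly. A minimal sketch, assuming the 4 KB block size used throughout this article, a made-up function name, and ignoring file system features that can change the picture, such as sparse files:

```python
def allocated_bytes(file_size, block_size=4096):
    """Space taken by file data when it is allocated in whole blocks."""
    blocks = -(-file_size // block_size)   # ceiling division
    return blocks * block_size


for size in (0, 1, 1024, 4096, 4097):
    print(size, "->", allocated_bytes(size))
# 0 -> 0, 1 -> 4096, 1024 -> 4096, 4096 -> 4096, 4097 -> 8192
```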


When the file system issues an I/O request to the disk, it also works in units of the block size. Even if you ask the operating system to read only 2 bytes of a file, it reads a full 4 KB in one go. Disk I/O is slow, and once we have accessed those 2 bytes we are very likely to go on and access the next 2 bytes; this is the principle of locality. So the operating system simply reads more in a single pass. That is the answer to question 9.
It is like going to the supermarket: the trip itself takes a lot of time, far slower even than that wretched disk I/O. We would not go all the way to the supermarket and come back with a single apple. We would buy more things the household will need later, because buying a bunch of things takes hardly longer than buying one apple. The same reasoning applies here.
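To see which bytes actually get fetched, here is a sketch that maps a small read onto the whole blocks that contain it. It assumes a 4 KB block size, uses a made-up function name, and ignores read-ahead, which in practice may pull in even more than this.

```python
def block_aligned_range(offset, length, block_size=4096):
    """Byte range [start, end) of the whole blocks covering a read of length > 0."""
    first_block = offset // block_size
    last_block = (offset + length - 1) // block_size
    return first_block * block_size, (last_block + 1) * block_size


print(block_aligned_range(0, 2))      # (0, 4096): one block for a 2-byte read
print(block_aligned_range(10, 2))     # (0, 4096): the same block again
print(block_aligned_range(4095, 2))   # (0, 8192): the read straddles two blocks
```

The second call shows the payoff of locality: the next 2 bytes are already in the block that was fetched for the first read.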


Finally, question 10: how should we handle our files to improve I/O speed? If you know in advance how much space a new file will occupy, say 1 MB, then tell the operating system when you create the file and let it reserve that much space for you. The operating system will try to allocate contiguous blocks for it, so when you read the file the head saves a lot of seek time and I/O is noticeably faster.
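On Linux, one way to tell the operating system the size up front is posix_fallocate, which Python exposes as os.posix_fallocate. The sketch below is an illustration only: the helper name and file name are made up, whether the reserved blocks end up contiguous is up to the file system, and the fallback branch only sets the file length without reserving any blocks.

```python
import os
import tempfile


def create_preallocated(path, size_bytes):
    """Create a file and ask the OS to reserve size_bytes for it up front."""
    fd = os.open(path, os.O_CREAT | os.O_WRONLY, 0o644)
    try:
        try:
            os.posix_fallocate(fd, 0, size_bytes)   # reserve blocks now
        except (AttributeError, OSError):
            # Not available or not supported here: only set the length.
            os.ftruncate(fd, size_bytes)
    finally:
        os.close(fd)


with tempfile.TemporaryDirectory() as workdir:
    target = os.path.join(workdir, "data.bin")   # hypothetical file name
    create_preallocated(target, 1024 * 1024)     # 1 MB, as in the text
    reserved_size = os.stat(target).st_size

print(reserved_size)  # 1048576
```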

III. Closing remarks


Everything above is based on my own file system, where the block size is 4 KB, the inode size is 256 bytes, and my virtual machine has only a little over 1.4 million inodes. None of these values are fixed; you can set them to other values when you format a disk. Choose them according to the capacity of your disk and how you intend to use it.


If most of your files are larger than 4 KB, or even several MB or several GB, consider making your block size as large as possible, so that each inode has fewer block addresses to record.


If most of your files are 1 KB or smaller, then a 4 KB block is somewhat wasteful, and if your boss is extremely cost-sensitive you can consider making the block size smaller.


Also, keep an eye on the inodes of your file system. When you look at the disk space occupied by directories and files, the operating system hides the inodes. It shows only the space occupied by data, presumably to make the system easier for users to understand. But we developers are not ordinary users, and we ought to know about them, because the inode count directly determines how many files your file system can hold. Otherwise you may one day find that a production machine has plenty of disk space left while its inodes have run out, and then the only options are to reformat the disk or migrate the server. Both are painful just to think about, so avoid the situation if you possibly can.


Study question: we have all found that copying a directory that holds a great many small files is very slow, so we tend to compress the directory first and then copy it. Can you now explain why that is faster?
