The file system is a core part of any operating system, and fundamental knowledge for every programmer. The principles behind file systems on Linux and Windows also differ considerably: Windows traditionally uses FAT (File Allocation Table) based file systems, but today I mainly want to talk about the storage principles of the classic Linux local file system, with a brief analysis of its structure as a primer.
Data block management for file systems
We all know that files are stored on disk, and that disk storage is managed in units of fixed-size data blocks: a file is split into blocks, and if the last part of a file does not fill a whole block, it still occupies an entire block. So the question is: how do we track the mapping between a file and its data blocks, and how do we make lookups efficient? There are three common approaches:
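The "last partial block still costs a whole block" rule above is just ceiling division. A minimal sketch, assuming a 4 KiB block size (a common choice, not stated in the original):

```python
import math

BLOCK_SIZE = 4096  # bytes; an assumed, commonly used block size

def blocks_needed(file_size: int) -> int:
    """Number of whole blocks a file occupies on disk: a final
    partial block still consumes an entire block."""
    return math.ceil(file_size / BLOCK_SIZE)

# A 5000-byte file does not fit in one 4096-byte block,
# so it occupies two blocks (8192 bytes) on disk.
print(blocks_needed(5000))  # → 2
print(blocks_needed(4096))  # → 1
```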
1. Contiguous allocation. This is the simplest approach: find a free region on disk and carve out enough consecutive physical blocks to map the file's logical block numbers one-to-one. The advantage is random access: if you know the address of the first block, any block can be located with a simple calculation. But the shortcomings are equally obvious. First, you often do not know in advance how large a file will grow, so you cannot tell how much space to reserve. Second, repeatedly searching for large runs of contiguous free blocks is inefficient, and carving out big consecutive regions eventually leaves behind many fragments: leftover gaps of free space too small to ever be reused.
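The "random access by calculation" advantage can be sketched in a few lines; the block size and block numbers below are illustrative assumptions:

```python
# With contiguous allocation, the block holding any byte offset is
# found by pure arithmetic from the file's first physical block.
BLOCK_SIZE = 4096  # assumed block size in bytes

def physical_block(first_block: int, byte_offset: int) -> int:
    """O(1) random access: byte offset -> physical block number."""
    return first_block + byte_offset // BLOCK_SIZE

# For a file whose blocks start at physical block 100:
print(physical_block(100, 0))     # → 100 (first block)
print(physical_block(100, 8192))  # → 102 (third block)
```

No disk structure needs to be consulted at all, which is exactly why contiguous allocation gives the fastest random access of the three schemes.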
2. Linked-list allocation. This method is one step up from the previous one: it no longer requires contiguous storage space. Instead, it guarantees logical continuity with a linked-list-like structure: each data block of the file stores the physical address of the next block, and each lookup follows the chain one block at a time. The following is a simulation diagram:
There are two shortcomings. First, random access is slow: to reach data in the file's last block, you must traverse every block before it. Second, there is extra storage overhead, since each block must give up space to hold the pointer to the next one. These drawbacks led to the next approach.
3. Indexed linked-list (file allocation table) method. This method seems purpose-built to overcome the shortcomings of the previous one: if chasing pointers on disk is slow, then store the index separately from the real physical data. All of the next-block pointers are pulled out of the data blocks and gathered into a single index table, with each table entry recording the location of the file's next physical block. The whole table can be placed directly in memory, so following a chain becomes a series of in-memory lookups rather than repeatedly hitting the disk with I/O operations. A similar idea underlies indexes in databases: searching an in-memory index is far more efficient than repeatedly reading through the data on disk.
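The in-memory index table can be sketched as follows; the table size, block numbers, and chain are illustrative, not taken from any real file system:

```python
# A minimal sketch of an indexed (FAT-style) table: the per-block
# "next" pointers live in one array held in memory, so walking a
# file's whole chain requires no disk I/O at all.

EOF = -1
# fat[i] = next physical block after block i, for every block on disk
fat = [EOF] * 16
fat[2], fat[5], fat[11] = 5, 11, EOF  # one file's chain: 2 → 5 → 11

def file_blocks(first_block: int) -> list:
    """Collect a file's physical blocks by walking the in-memory table."""
    blocks, block = [], first_block
    while block != EOF:
        blocks.append(block)
        block = fat[block]
    return blocks

print(file_blocks(2))  # → [2, 5, 11]
```

The trade-off is memory: the table has one entry per block on the disk, which is part of why FAT does not scale well to very large volumes.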
How Linux files are stored
The three methods above were described to set up the following topic: Linux storage management is closest to the last of the three. In Linux, files, directories, and so on are all abstracted into one concept, the inode (index node), which contains a file's metadata together with the indexes of its data blocks. Here is a simulation diagram:
Why, then, are there level-1, level-2, and level-3 indexes? They are designed for very large files. If your file is small enough that one or two data blocks suffice, the direct data-block pointers in the inode are all you need. For larger files, the pointers are indexed again (a block full of pointers, then a block of pointers to pointer blocks, and so on), so the scheme can be extended to address a very large number of data blocks.
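A rough calculation shows how quickly each level of indirection scales the maximum file size. The numbers below (4 KiB blocks, 4-byte block addresses, 12 direct pointers) are illustrative ext2-like assumptions; real file systems vary:

```python
BLOCK = 4096          # assumed block size in bytes
PTRS = BLOCK // 4     # pointers per indirect block (4-byte addresses) → 1024
DIRECT = 12           # assumed number of direct pointers in the inode

direct = DIRECT * BLOCK    # bytes addressable via direct pointers
single = PTRS * BLOCK      # added by a single-indirect block
double = PTRS**2 * BLOCK   # added by a double-indirect block
triple = PTRS**3 * BLOCK   # added by a triple-indirect block

print(direct // 1024, "KiB direct")    # → 48 KiB direct
print(single // 2**20, "MiB single")   # → 4 MiB single
print(double // 2**30, "GiB double")   # → 4 GiB double
print(triple // 2**40, "TiB triple")   # → 4 TiB triple
```

Small files stay entirely within the cheap direct pointers, while the rarely needed triple-indirect level pushes the limit into the terabytes: that asymmetry is the point of the design.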
How Linux file systems are stored