Implementation of the Linux File System

Source: Internet
Author: User

Two earlier articles reviewed the structure of the Linux virtual file system and the process of reading and writing files in Linux. They did not describe in detail the parts that are left to specific file system types. For example, how does a read/write position in a file map to an actual disk block? And when the kernel walks a file path, how does it know whether a file YYY exists in directory XXX? Both are implemented by the specific file system.
A few days ago a colleague asked me about inodes in Linux, so I took the chance to sort out my thoughts on which functions a specific file system needs to implement and how it can implement them.

Inode
Start with the inode. Files in a file system form a tree in which each node is a file (a directory is also a file) and corresponds to an inode. The exact structure of an inode varies with the file system type, but it generally contains the following information: the file owner (uid and gid), the permission bits, the file size, timestamps (creation time, modification time, and so on), and the file content. The file content is the most noteworthy part.
Files can be divided into directories and regular files. The content of these two kinds of file means different things and may be structured differently.

For a regular file, the content is the data that the user sees. When a user program reads or writes a file through a system call, the kernel uses an index structure to find out which disk block holds the data at a given offset in the file, and then converts the operation into read/write requests for that block. When the file is small, the index information can be stored entirely in the inode. When the file is large, the index is large too and may not fit in the inode, so new blocks have to be allocated to store it. Most of the file content itself is stored outside the inode and is located through the index information in the inode.
For example, in the ext2 file system, the index structure of the file content is as follows (from ULK3):

The block numbers of the first 12 blocks of a file are stored directly in the i_block array of the inode. From the 13th block onward a single indirect index is used, followed by double and triple indirect indexes as the offset increases.
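The mapping described above can be sketched in a few lines. This is only an illustration, not kernel code: the function names `classify_block` and `block_for_offset` are made up here, and the sketch assumes 4-byte block numbers and a default block size of 1 KB.

```python
DIRECT_BLOCKS = 12   # block numbers held directly in the inode's i_block array
POINTER_SIZE = 4     # assumption: each block number is a 32-bit value


def classify_block(logical_block, block_size=1024):
    """Return (level, indexes) for a logical block number within a file.

    level is 0 for a direct block and 1, 2 or 3 for single, double or
    triple indirection. indexes lists the slot to follow at each step.
    """
    per_block = block_size // POINTER_SIZE  # pointers held by one index block
    if logical_block < DIRECT_BLOCKS:
        return 0, [logical_block]
    n = logical_block - DIRECT_BLOCKS
    if n < per_block:
        return 1, [n]
    n -= per_block
    if n < per_block ** 2:
        return 2, [n // per_block, n % per_block]
    n -= per_block ** 2
    if n < per_block ** 3:
        return 3, [n // per_block ** 2, (n // per_block) % per_block, n % per_block]
    raise ValueError("offset is beyond what triple indirection can address")


def block_for_offset(byte_offset, block_size=1024):
    """Translate a byte offset in the file into its index path."""
    return classify_block(byte_offset // block_size, block_size)
```

With 1 KB blocks each index block holds 256 block numbers, so logical block 12 is the first one reached through the single indirect index and logical block 268 is the first one that needs the double indirect index.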

For a directory, the content is generally structured information that describes the files under the directory (mainly the name and inode number of each file). This content has to be interpreted by the specific file system. The raw bytes mean nothing to the user; they become meaningful only when the file system interprets them as a list of files. When a user program looks up a file path, the kernel uses the inode of the current directory to traverse every entry in the directory (file name + inode number). Once the name of the next-level file matches, the corresponding inode number leads to the inode of that file. Depending on how large it is, this list of entries is stored either directly in the inode or in other blocks.
For example, in the ext2 file system, the directory information structure is as follows (from ULK3):

For each next-level file, the inode number, file name, and file type are recorded.
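As a rough sketch of how such a directory block can be searched, the code below packs and scans entries in the ext2 on-disk layout (32-bit inode number, 16-bit record length, 8-bit name length, 8-bit file type, then the name). The helper names `pack_entry`, `iter_entries` and `lookup` are hypothetical, and the sample block is simplified: a real ext2 block stretches its last record to the end of the block.

```python
import struct

# inode (u32), rec_len (u16), name_len (u8), file_type (u8), little-endian
HEADER = struct.Struct("<IHBB")


def pack_entry(inode, name, file_type):
    """Build one directory record, padded to a multiple of 4 bytes."""
    raw = name.encode()
    rec_len = (HEADER.size + len(raw) + 3) & ~3
    padded_name = raw.ljust(rec_len - HEADER.size, b"\0")
    return HEADER.pack(inode, rec_len, len(raw), file_type) + padded_name


def iter_entries(block):
    """Yield (name, inode, file_type) for every used record in the block."""
    pos = 0
    while pos + HEADER.size <= len(block):
        inode, rec_len, name_len, file_type = HEADER.unpack_from(block, pos)
        if rec_len < HEADER.size:
            break  # corrupt or empty tail, stop scanning
        if inode != 0:  # inode 0 marks an unused record
            start = pos + HEADER.size
            yield block[start:start + name_len].decode(), inode, file_type
        pos += rec_len


def lookup(block, wanted):
    """Return the inode number recorded for `wanted`, or None."""
    for name, inode, _ in iter_entries(block):
        if name == wanted:
            return inode
    return None


# File type 1 is a regular file and 2 is a directory in ext2.
sample = pack_entry(2, ".", 2) + pack_entry(2, "..", 2) + pack_entry(12, "YYY", 1)
```

Here `lookup(sample, "YYY")` answers the question from the introduction: it scans the entries of directory XXX and returns the inode number if YYY is present.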

A symbolic link is a special kind of file. It stands in for another file, and its content is the path of the file it represents. That path is stored in the inode or in a block referenced by the inode. Each file system type implements its own method for reading symbolic links. User programs can also read the path that a symbolic link points to by calling the readlink system call.
When a symbolic link is opened, the user program generally does not care about the path it stands for (that is, the content of the symbolic link file). The kernel hides this layer, so the user program automatically accesses the file at the path the link represents.
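Both behaviours can be observed from user space. The sketch below is only a demonstration and assumes a system where creating symbolic links in a temporary directory is allowed; the function name `symlink_demo` and the file names are made up.

```python
import os
import tempfile


def symlink_demo():
    """Create a file and a symbolic link to it, then inspect the link."""
    with tempfile.TemporaryDirectory() as d:
        target = os.path.join(d, "target.txt")
        link = os.path.join(d, "link")
        with open(target, "w") as f:
            f.write("hello")
        os.symlink(target, link)

        # readlink returns the content of the link itself: the stored path.
        stored_path = os.readlink(link)

        # open() follows the link, so this reads the target's content.
        with open(link) as f:
            content = f.read()

        return stored_path == target, content, os.path.islink(link)
```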

Having covered symbolic links, hard links deserve a mention too. A hard link does not create a new inode. Instead, the same inode number appears in several directory entries, so the file can be reached through several file paths.
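A small user-space check makes this visible. It is a sketch that assumes the temporary directory lives on a file system that supports hard links; `hard_link_demo` and the file names are hypothetical.

```python
import os
import tempfile


def hard_link_demo():
    """Create a file, add a second name for it, then remove the first name."""
    with tempfile.TemporaryDirectory() as d:
        first = os.path.join(d, "a.txt")
        second = os.path.join(d, "b.txt")
        with open(first, "w") as f:
            f.write("data")
        os.link(first, second)  # new directory entry, same inode

        st_first, st_second = os.stat(first), os.stat(second)
        same_inode = st_first.st_ino == st_second.st_ino
        link_count = st_first.st_nlink

        # Removing one name leaves the inode reachable through the other.
        os.remove(first)
        with open(second) as f:
            content = f.read()

        return same_inode, link_count, content
```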

Super_block
The preceding section describes how to locate a next-level file through a directory. But where does the first directory (that is, the root directory) of the file system come from?
Each file system instance has a super_block (super block), which is the starting point of that instance. The super_block mainly contains the following information: the inode number of the root directory, the mapping between inodes and blocks, and the block allocation information (the total number of blocks, which ones are in use, and so on).
With the mapping between inodes and blocks, the file system knows where the inode for a given inode number is stored. (This mapping can also be a convention, for example that the inode with number N is stored at block M * N.)
The root directory inode number in the super_block is used to obtain the root directory when the file system is mounted. (The root inode number can also be fixed by the specific file system. For example, a file system could declare that inode number 1 is the root directory; ext2 reserves inode 2 for it.)
In addition, the size of a file may change at any time after it is created. With the block allocation information recorded in the super_block, new blocks can be allocated when the file grows and unneeded blocks can be reclaimed when it shrinks. (This information does not have to be stored directly in the super_block. It can be reachable through the super_block, or fixed by convention. For example, the block after the super_block could be a structure that holds the block allocation information.)
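As one concrete instance of such an inode-to-block mapping, ext2 divides inodes evenly among block groups and keeps an inode table in each group. The sketch below shows only the arithmetic; the function name `locate_inode` is hypothetical, the default inode and block sizes are assumptions, and finding where a group's inode table starts (through the group descriptors) is left out.

```python
def locate_inode(ino, inodes_per_group, inode_size=128, block_size=1024):
    """Return (group, block_in_table, byte_in_block) for an inode number.

    Inode numbers start at 1, so inode N is slot N - 1 overall.
    """
    if ino < 1:
        raise ValueError("inode numbers start at 1")
    group, index = divmod(ino - 1, inodes_per_group)
    byte_in_table = index * inode_size
    return group, byte_in_table // block_size, byte_in_table % block_size
```

For example, with 2048 inodes per group, inode 2 is the second slot of the first block of group 0's inode table, and inode 2049 is the first slot of group 1's table.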

So where does the super_block itself come from? Its location is generally just a convention (for example, block 0 is the super_block).
Before a disk can be used, it has to be formatted into the layout that a specific file system supports. Formatting generates the super_block, the root directory inode, and any other necessary files on the disk, and places them where the conventions of the file system say they belong.
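To make the idea of a fixed location concrete, the sketch below imitates a tiny part of formatting and mounting for ext2, which keeps its super_block at byte offset 1024 and identifies it with the magic number 0xEF53. Only a few fields are written, the image is an in-memory byte string rather than a real disk, and `make_image` and `read_superblock` are made-up helper names.

```python
import struct

SUPERBLOCK_OFFSET = 1024   # ext2 keeps the super_block 1024 bytes into the device
SUPERBLOCK_SIZE = 1024
EXT2_MAGIC = 0xEF53


def make_image(inodes, blocks, log_block_size=0):
    """Pretend to format: write a minimal super_block at the agreed offset."""
    sb = bytearray(SUPERBLOCK_SIZE)
    struct.pack_into("<II", sb, 0, inodes, blocks)     # s_inodes_count, s_blocks_count
    struct.pack_into("<I", sb, 24, log_block_size)     # s_log_block_size
    struct.pack_into("<H", sb, 56, EXT2_MAGIC)         # s_magic
    return bytes(SUPERBLOCK_OFFSET) + bytes(sb)


def read_superblock(image):
    """Pretend to mount: read the super_block back from the agreed offset."""
    sb = image[SUPERBLOCK_OFFSET:SUPERBLOCK_OFFSET + SUPERBLOCK_SIZE]
    (magic,) = struct.unpack_from("<H", sb, 56)
    if magic != EXT2_MAGIC:
        raise ValueError("no ext2 super_block at the expected offset")
    inodes, blocks = struct.unpack_from("<II", sb, 0)
    (log_block_size,) = struct.unpack_from("<I", sb, 24)
    return {"inodes": inodes, "blocks": blocks, "block_size": 1024 << log_block_size}
```

Neither side has to search for anything: both the formatter and the reader simply agree on the offset, which is the convention the text refers to.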

Finally, the block numbers mentioned above are actually virtual block numbers. When a disk is divided into several partitions, each partition is treated as a device with its own independent block numbering. So if the super_block is stored in block 0, partition 0 and partition 1 of the same disk each have their own block 0 (corresponding to two different blocks of the physical disk), and therefore each has its own super_block.
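The translation from a partition-relative block number to a position on the whole disk is a simple offset. The sketch below uses a made-up partition layout and a hypothetical function name, purely to illustrate the point.

```python
# Hypothetical layout: (first disk block, length in blocks) for each partition.
PARTITIONS = [(2048, 100000), (102048, 50000)]


def to_disk_block(partition, block_in_partition):
    """Map a block number inside a partition to a block number on the disk."""
    start, length = PARTITIONS[partition]
    if not 0 <= block_in_partition < length:
        raise ValueError("block lies outside the partition")
    return start + block_in_partition
```

Block 0 of partition 0 and block 0 of partition 1 therefore land on different disk blocks, 2048 and 102048 in this made-up layout.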

Block Cache
As mentioned in the article on Linux file read/write analysis, there is a disk cache layer that caches file content in memory. Is there a similar cache for information other than file content (such as inodes)?

In Linux, devices are also treated as files. Reading a disk device file reads the raw data on the disk. What is in that raw data? There are super_blocks, inodes, file content, and some garbage data (which the file system generally regards as free space). Like regular files, disk device files have a cache. When the kernel needs to read a certain block on the disk (for example, a block that holds an inode), it first looks for the content in the cache of the disk device. If the content is not found there, the kernel issues a disk request and updates the cache once the data has been read.
However, reading and writing blocks that hold things like inodes is not the same as reading and writing file content. File content is contiguous, and when a file system stores it on disk it tries hard to allocate contiguous blocks. That way a single disk DMA transfer can read several blocks, which is good for efficiency. Something like an inode, on the other hand, is isolated. If block N holds an inode, block N + 1 may have nothing to do with it. Reads and writes of an inode basically touch a single block and do not need the read-ahead that the file content cache provides.
The two caches have different names. The cache of file content is called the cache, and the block cache for things like inodes is called the buffer (when you view memory usage in Linux, the buffer and cache sizes are usually reported separately).
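On Linux these two figures appear in /proc/meminfo as the Buffers and Cached fields. A minimal sketch of pulling them out is shown below; the function name `buffers_and_cached` and the sample text are made up, and the parser takes the file content as a string so that it can be tried anywhere.

```python
def buffers_and_cached(meminfo_text):
    """Return the Buffers and Cached values (in kB) from /proc/meminfo text."""
    values = {}
    for line in meminfo_text.splitlines():
        key, _, rest = line.partition(":")
        fields = rest.split()
        if key in ("Buffers", "Cached") and fields:
            values[key] = int(fields[0])
    return values


# Made-up sample in the same "Name:   value kB" shape as /proc/meminfo.
SAMPLE = """MemTotal:        8000000 kB
Buffers:            1234 kB
Cached:            56789 kB
SwapCached:            0 kB
"""
```

On a Linux machine the same function can be fed the real file, for example `buffers_and_cached(open("/proc/meminfo").read())`.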

In the disk cache, the unit of caching is the page (because the cache is memory). The size of a memory page is often several times the block size (common values: 4 KB for a page and 1 KB for a block), so a page in the disk cache has to be subdivided to reach block granularity. This is done by attaching buffer_head information to the page structure of the disk cache page (from ULK3):

Then, when a disk read/write request is issued to the generic block layer (see the article on Linux file read/write analysis), the bio is constructed from the buffer_head (whereas for reads and writes of file content, the bio is constructed from the page).
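The relationship between cached pages and blocks comes down to simple arithmetic. The sketch below models only that arithmetic, not the kernel's buffer_head structure; the function names are hypothetical and the defaults are the common sizes quoted above.

```python
def buffers_in_page(page_index, page_size=4096, block_size=1024):
    """List (block_number, offset_in_page) for each block a cached page covers."""
    per_page = page_size // block_size
    first = page_index * per_page
    return [(first + i, i * block_size) for i in range(per_page)]


def find_block(block_number, page_size=4096, block_size=1024):
    """Return (page_index, offset_in_page) of the cached copy of a block."""
    per_page = page_size // block_size
    page_index, slot = divmod(block_number, per_page)
    return page_index, slot * block_size
```

With 4 KB pages and 1 KB blocks each page covers four blocks, so block 5 sits in page 1 at offset 1024. A request for a single inode block then only needs that 1 KB segment rather than the whole page.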
