Analysis of the Linux Virtual File System

Source: Internet
Author: User

 

Virtual File System (VFS)

In my opinion, the word "virtual" has two meanings:

1. Several different file systems can be mounted into the same directory structure. VFS hides their implementation details and gives users a single, uniform interface;

2. The directory structure is not absolute: each process may see a different one. The directory structure is described by a "namespace". Different processes may have different namespaces, and different namespaces may have different directory structures (because different file systems may be mounted in them).

 

Operating on opened files

The users of VFS are processes (a user always has to start a process to access the file system). In the task_struct structure that describes a process, the files pointer points to a files_struct structure, which describes the set of files the process has opened.

The files_struct structure maintains an array of pointers to the file structures of the opened files. The array index is the handle (usually called fd) that the user program uses to operate on an opened file. files_struct also maintains an fd bitmap, used to allocate an unused fd when a file is opened.
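As a rough illustration of fd allocation, the sketch below models a files_struct as a fixed array of file pointers plus a bitmap and hands out the lowest free slot, which is also what POSIX requires of open(). The names fd_install_lowest, fd_lookup, fd_release and NR_OPEN_SKETCH, and the fixed size of 64, are hypothetical; the real kernel keeps the array and bitmap in a separate, resizable fdtable.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical, simplified model of a files_struct (not the kernel's
 * definition): fd_array[fd] points at an open-file instance and the
 * open_fds bitmap records which fds are in use. */
#define NR_OPEN_SKETCH 64
#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

struct file { const char *name; };   /* stand-in for an open-file instance */

struct files_struct {
    struct file *fd_array[NR_OPEN_SKETCH];
    unsigned long open_fds[(NR_OPEN_SKETCH + BITS_PER_WORD - 1) / BITS_PER_WORD];
};

/* Allocate the lowest unused fd, install f in it and return it (-1 if full). */
static int fd_install_lowest(struct files_struct *fs, struct file *f)
{
    for (int fd = 0; fd < NR_OPEN_SKETCH; fd++) {
        unsigned long *word = &fs->open_fds[fd / BITS_PER_WORD];
        unsigned long mask = 1UL << (fd % BITS_PER_WORD);
        if (!(*word & mask)) {
            *word |= mask;
            fs->fd_array[fd] = f;
            return fd;
        }
    }
    return -1;
}

/* fd -> file: the first step of every fd-based system call. */
static struct file *fd_lookup(const struct files_struct *fs, int fd)
{
    if (fd < 0 || fd >= NR_OPEN_SKETCH)
        return NULL;
    return fs->fd_array[fd];
}

/* close(): clear the bit so the fd can be handed out again. */
static void fd_release(struct files_struct *fs, int fd)
{
    if (fd < 0 || fd >= NR_OPEN_SKETCH)
        return;
    fs->open_fds[fd / BITS_PER_WORD] &= ~(1UL << (fd % BITS_PER_WORD));
    fs->fd_array[fd] = NULL;
}
```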

 

The file structure is an instance of an opened file. A user program operates on an opened file through its fd, and the process is simple: the fd indexes the corresponding file structure, and the operation (such as read or write) is carried out by the matching function in that structure's f_op.

Different file structures may have different f_op, because they refer to different types of files (regular files, sockets, FIFOs, and so on).

f_op is assigned when the file is opened. After that, operating on the file only requires calling the functions in f_op, without having to determine the file's type. This article does not describe how the specific f_op functions are implemented (that part is also quite complicated; see <Linux Kernel File read/write analysis>).
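The dispatch through f_op can be sketched with a table of function pointers. This is a minimal user-space model, assuming a cut-down struct file and a one-member file_operations (the kernel's read hook also takes a __user buffer and a loff_t * position); the two tables and vfs_read_sketch are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>   /* ssize_t */

struct file;

/* Hypothetical one-member operations table. */
struct file_operations {
    ssize_t (*read)(struct file *f, char *buf, size_t len);
};

/* Hypothetical open-file instance: f_op is fixed when the file is opened. */
struct file {
    const struct file_operations *f_op;
    long f_pos;
};

/* "Regular file": returns bytes from an in-memory string. */
static const char contents[] = "hello";

static ssize_t regular_read(struct file *f, char *buf, size_t len)
{
    size_t avail = sizeof contents - 1;
    if ((size_t)f->f_pos >= avail)
        return 0;
    size_t n = avail - (size_t)f->f_pos;
    if (n > len)
        n = len;
    memcpy(buf, contents + f->f_pos, n);
    f->f_pos += (long)n;
    return (ssize_t)n;
}

/* "Null device": every read reports end of file. */
static ssize_t null_read(struct file *f, char *buf, size_t len)
{
    (void)f; (void)buf; (void)len;
    return 0;
}

static const struct file_operations regular_fops = { regular_read };
static const struct file_operations null_fops = { null_read };

/* Generic layer: never looks at the file type, only calls through f_op. */
static ssize_t vfs_read_sketch(struct file *f, char *buf, size_t len)
{
    if (!f->f_op || !f->f_op->read)
        return -1;
    return f->f_op->read(f, buf, len);
}
```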

 

Not every operation on an opened file goes through a function in f_op; some involve little more than the file structure itself. For example, the file structure maintains the current position in the file (f_pos), so for an ordinary file the lseek system call essentially just moves that value.

Attributes such as f_pos and f_mode are stored in the file structure, which means they belong to one instance of an opened file. A file may be opened several times (in one or more processes), and each instance can hold different values.

For example, if two processes open the same file at the same time and read from it, the two instances (file structures) have separate f_pos values, so the two reads do not affect each other.

Sometimes multiple processes share the same opened file instances. When the clone system call creates a child process with the CLONE_FILES flag set, the parent and child share one files_struct structure and therefore all of its opened file instances. A typical example is multithreading.
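To show that f_pos belongs to each open instance rather than to the file, here is a small model in which two struct file instances share one inode but keep their own positions. It assumes a hypothetical in-memory inode and simplified read and lseek helpers (read_sketch, llseek_sketch); the real struct file reaches its inode differently and its position type is loff_t.

```c
#include <assert.h>
#include <stdio.h>    /* SEEK_SET, SEEK_CUR, SEEK_END */
#include <string.h>

/* Hypothetical model: the inode holds the data, each struct file is one
 * open instance with its own position. */
struct inode { const char *data; long i_size; };
struct file  { struct inode *inode; long f_pos; };

/* lseek-like helper: computes a new f_pos and stores it; the data is not
 * touched. Returns the new position, or -1 for a bad whence or offset. */
static long llseek_sketch(struct file *f, long off, int whence)
{
    long base;
    switch (whence) {
    case SEEK_SET: base = 0; break;
    case SEEK_CUR: base = f->f_pos; break;
    case SEEK_END: base = f->inode->i_size; break;
    default: return -1;
    }
    if (base + off < 0)
        return -1;
    f->f_pos = base + off;
    return f->f_pos;
}

/* read-like helper: copies from the shared inode, advances only this f_pos. */
static long read_sketch(struct file *f, char *buf, long len)
{
    long left = f->inode->i_size - f->f_pos;
    if (left <= 0)
        return 0;
    if (len > left)
        len = left;
    memcpy(buf, f->inode->data + f->f_pos, (size_t)len);
    f->f_pos += len;
    return len;
}
```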
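The difference CLONE_FILES makes can be sketched as a choice between sharing the parent's fd table and copying it. In this hypothetical model (copy_files_sketch and the fixed-size table are illustrative, not kernel code), the copy still points at the same open-file instances, which is why a parent and a child created without CLONE_FILES still share the offsets of files that were open at fork time.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

#define NFDS 8

struct file { int f_count; };          /* only a reference count here */

struct files_struct {
    int count;                         /* how many tasks use this table */
    struct file *fd[NFDS];
};

struct task { struct files_struct *files; };

/* share == true models CLONE_FILES: the child uses the same table.
 * Otherwise the child gets its own copy of the table, whose slots still
 * refer to the same struct file instances. Returns 0, or -1 on ENOMEM. */
static int copy_files_sketch(struct task *child, const struct task *parent, bool share)
{
    if (share) {
        parent->files->count++;
        child->files = parent->files;
        return 0;
    }
    struct files_struct *nf = malloc(sizeof *nf);
    if (!nf)
        return -1;
    memcpy(nf, parent->files, sizeof *nf);
    nf->count = 1;
    for (int i = 0; i < NFDS; i++)
        if (nf->fd[i])
            nf->fd[i]->f_count++;      /* one more table refers to this file */
    child->files = nf;
    return 0;
}
```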

 

Opening a file

Compared with operating on an opened file, opening a file is a complex process. As the preceding figure shows, operations on opened files take up only a small part of it; everything else concerns opening files.

 

To open a file, you first need its path, such as "dir0/dir1/file". The path is split into levels by '/', and each level is a file (a directory is also a file, such as dir0 and dir1).

Searching for the file path needs a starting point. If the path starts with '/', the search starts at the root directory; otherwise, it starts at the current directory.

The two possible starting points are stored in the fs_struct structure referenced by the process's task_struct. In the directory structure, each file is represented by a directory entry (dentry) structure, so a starting point is also a dentry structure.

When we run the cd command in a shell, what actually changes is the dentry that represents the current directory in the fs_struct structure.

A process can also use the chroot system call to change the dentry that represents the root directory in the fs_struct structure. The path above that dentry then becomes invisible to the process.
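A minimal sketch of how a path walk might pick its starting point, split the path into components, and handle ".." under chroot. fs_struct is reduced here to two dentry pointers and walk_start, split_path and step_dotdot are hypothetical names; the real fs_struct also records the vfsmount for root and pwd, and the kernel's walker parses components in place rather than with strtok.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins. */
struct dentry { const char *d_name; struct dentry *d_parent; };
struct fs_struct { struct dentry *root, *pwd; };

/* Absolute paths start at the process root, relative ones at pwd. */
static struct dentry *walk_start(const struct fs_struct *fs, const char *path)
{
    return path[0] == '/' ? fs->root : fs->pwd;
}

/* Split a copy of path at '/' into at most max components, skipping the
 * empty ones produced by leading or repeated slashes. The pointers in
 * out[] point into buf. Returns the number of components (0 if path does
 * not fit in buf). */
static size_t split_path(const char *path, char *buf, size_t bufsz,
                         const char *out[], size_t max)
{
    size_t n = 0;
    if (strlen(path) >= bufsz)
        return 0;
    strcpy(buf, path);
    for (char *tok = strtok(buf, "/"); tok && n < max; tok = strtok(NULL, "/"))
        out[n++] = tok;
    return n;
}

/* ".." never climbs above the process root, so after chroot() nothing
 * above the new root can be reached. */
static struct dentry *step_dotdot(const struct fs_struct *fs, struct dentry *d)
{
    return d == fs->root ? d : d->d_parent;
}
```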

 

Dentries act as the index structure for files: together they form a tree, which is the directory structure you see (we will call it the dentry tree for now).

Each dentry points to an index node (inode) structure, which is what actually describes the file. Several dentries can point to the same inode; this is how (hard) links are implemented.
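Links follow directly from the dentry-to-inode pointer: two dentries (names) simply share one inode. A minimal model with hypothetical link_sketch and unlink_sketch helpers and a bare link count; in the real kernel an inode whose link count reaches 0 is only freed once nobody still holds it open.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical cut-down structures. */
struct inode  { unsigned long i_ino; unsigned int i_nlink; };
struct dentry { const char *d_name; struct dentry *d_parent; struct inode *d_inode; };

/* Like link(2): make new_d another name for old_d's inode. */
static void link_sketch(struct dentry *new_d, const struct dentry *old_d)
{
    new_d->d_inode = old_d->d_inode;
    new_d->d_inode->i_nlink++;
}

/* Like unlink(2): remove one name; the inode survives while other names
 * remain. Returns the remaining link count. */
static unsigned int unlink_sketch(struct dentry *d)
{
    unsigned int left = --d->d_inode->i_nlink;
    d->d_inode = NULL;   /* the name is left without an inode ("negative") */
    return left;
}
```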

 

A dentry provides a set of methods (d_op), mainly used to match child nodes. Dentries are also kept in a hash table, which speeds up the search for a child node.

d_op may vary with the file system type. For example, the hash method and the name-matching method may differ (file names are case-sensitive in some file systems and case-insensitive in others).

Searching for a file path means finding the child dentry at each level of the dentry tree until the last dentry in the path is reached.
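How per-file-system d_op hooks change name matching can be sketched with a hash/compare pair. This is a hypothetical model: the real d_hash and d_compare have different signatures and the dcache is one global hash table keyed by parent and name, whereas here each directory gets a tiny table of its own. The djb2 hash is an arbitrary choice.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical cut-down dentry_operations. */
struct dentry_operations {
    unsigned long (*d_hash)(const char *name);
    bool (*d_compare)(const char *a, const char *b);
};

static unsigned long hash_exact(const char *s)
{
    unsigned long h = 5381;                     /* djb2 */
    while (*s)
        h = h * 33 + (unsigned char)*s++;
    return h;
}

/* Case-insensitive file systems must hash "README" and "readme" alike. */
static unsigned long hash_nocase(const char *s)
{
    unsigned long h = 5381;
    while (*s)
        h = h * 33 + (unsigned char)tolower((unsigned char)*s++);
    return h;
}

static bool cmp_exact(const char *a, const char *b) { return strcmp(a, b) == 0; }

static bool cmp_nocase(const char *a, const char *b)
{
    for (; *a && *b; a++, b++)
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
            return false;
    return *a == *b;                            /* both strings ended */
}

static const struct dentry_operations exact_dops  = { hash_exact, cmp_exact };
static const struct dentry_operations nocase_dops = { hash_nocase, cmp_nocase };

#define NBUCKETS 16
struct child { const char *name; struct child *next; };

static void d_add_sketch(struct child *buckets[NBUCKETS],
                         const struct dentry_operations *dops, struct child *c)
{
    unsigned long b = dops->d_hash(c->name) % NBUCKETS;
    c->next = buckets[b];
    buckets[b] = c;
}

/* Hash the name to pick a bucket, then let d_compare decide equality. */
static struct child *d_lookup_sketch(struct child *buckets[NBUCKETS],
                                     const struct dentry_operations *dops,
                                     const char *name)
{
    for (struct child *c = buckets[dops->d_hash(name) % NBUCKETS]; c; c = c->next)
        if (dops->d_compare(c->name, name))
            return c;
    return NULL;
}
```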

 

Although the dentry tree depicts the directory structure of the file system, the dentry structures do not stay in memory permanently. The whole directory structure may be very large, too large to fit in memory.

Initially, only the dentry representing the root directory and the inode it points to exist (they are created when the root file system is mounted; see below). When a file is opened, the nodes along its path do not exist yet, and the root directory's dentry cannot find the child it needs (it has no children yet). At this point, the lookup method in inode->i_op is used to find the required child inode (this is usually a method defined by the specific file system type, which searches the file system's storage medium; see linux File System implementation analysis). Once the inode is found (by then it has been loaded into memory), a dentry associated with it is created.

This shows that the inode comes first and the dentry second: the inode exists on the file system's storage medium, while the dentry is created in memory, and its existence speeds up inode lookups.

 

Since the whole directory structure cannot necessarily be loaded into memory, dentries created in memory are released when nobody is using them. The d_count field holds the dentry's reference count; when it drops to 0, the dentry is released.

Releasing a dentry does not mean destroying and reclaiming it immediately. Instead, the dentry is put on a "least recently used" (LRU) queue (associated with the corresponding super block). When the queue grows too large or the system runs short of memory, the least recently used dentries are actually freed.

This LRU queue works like a cache pool that speeds up repeated access to the same paths. When a dentry is actually freed, the reference it holds on its inode is dropped; if the inode's reference count reaches 0, the inode is released as well.
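The reference counting and LRU behaviour above can be modelled with a doubly linked list: dput_sketch parks a dentry at the head once its count reaches 0, dget_sketch takes it back, and prune_dcache_sketch frees from the tail. All names are hypothetical, and the freed and released flags stand in for actually freeing memory.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of dentry reference counting plus an unused-dentry LRU. */
struct inode  { int i_count; int released; };
struct dentry {
    int d_count;
    struct inode *d_inode;
    struct dentry *lru_prev, *lru_next;   /* valid while on the LRU */
    int freed;
};

struct lru { struct dentry *head, *tail; int len; };   /* head = most recent */

static void lru_add_head(struct lru *l, struct dentry *d)
{
    d->lru_prev = NULL;
    d->lru_next = l->head;
    if (l->head) l->head->lru_prev = d; else l->tail = d;
    l->head = d;
    l->len++;
}

static void lru_del(struct lru *l, struct dentry *d)
{
    if (d->lru_prev) d->lru_prev->lru_next = d->lru_next; else l->head = d->lru_next;
    if (d->lru_next) d->lru_next->lru_prev = d->lru_prev; else l->tail = d->lru_prev;
    d->lru_prev = d->lru_next = NULL;
    l->len--;
}

/* Really free a dentry: drop its inode reference, release the inode at 0. */
static void dentry_kill(struct dentry *d)
{
    if (d->d_inode && --d->d_inode->i_count == 0)
        d->d_inode->released = 1;
    d->d_inode = NULL;
    d->freed = 1;
}

/* The last user parks the dentry on the LRU instead of freeing it. */
static void dput_sketch(struct lru *l, struct dentry *d)
{
    if (--d->d_count == 0)
        lru_add_head(l, d);
}

/* Reusing a cached dentry takes it back off the LRU. */
static struct dentry *dget_sketch(struct lru *l, struct dentry *d)
{
    if (d->d_count++ == 0)
        lru_del(l, d);
    return d;
}

/* Shrink the LRU to at most max entries, freeing the least recently used. */
static void prune_dcache_sketch(struct lru *l, int max)
{
    while (l->len > max) {
        struct dentry *victim = l->tail;
        lru_del(l, victim);
        dentry_kill(victim);
    }
}
```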

When looking up a file path, each node along the way falls into one of three cases:

1. The corresponding dentry's reference count has not dropped to 0. It is still in the dentry tree and can be used directly;

2. The corresponding dentry is not in the dentry tree, so it is looked for in the LRU queue. Dentries on the LRU queue are kept in a hash table so they can be found quickly. Once the required dentry is found, it is taken off the LRU queue and put back into the dentry tree;

3. The corresponding dentry is not in the LRU queue either, so the inode has to be looked up on the file system's storage medium. Once the inode is found, a dentry is created for it and added to the dentry tree.
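The three cases can be sketched as a single lookup step. This assumes a hypothetical per-directory list of cached dentries in which d_count == 0 marks an entry parked on the LRU (unlinking it from the LRU itself is left out), and a function pointer standing in for inode->i_op->lookup; disk_has is a made-up directory on disk.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

enum source { FROM_TREE, FROM_LRU, FROM_DISK, NOT_FOUND };

struct dentry {
    char name[32];
    int d_count;            /* > 0: in use in the tree; 0: parked on the LRU */
    struct dentry *next;    /* chain of cached children of one directory */
};

struct dir {
    struct dentry *children;          /* cached children */
    int (*lookup)(const char *name);  /* stands in for inode->i_op->lookup */
};

/* Hypothetical on-disk directory contents. */
static int disk_has(const char *name)
{
    return strcmp(name, "file") == 0 || strcmp(name, "dir1") == 0;
}

static struct dentry *resolve_component(struct dir *dir, const char *name,
                                        enum source *how)
{
    for (struct dentry *d = dir->children; d; d = d->next) {
        if (strcmp(d->name, name) == 0) {
            /* Cases 1 and 2: cached, either in use or parked on the LRU. */
            *how = d->d_count > 0 ? FROM_TREE : FROM_LRU;
            d->d_count++;
            return d;
        }
    }
    /* Case 3: not cached, ask the file system (may read the medium). */
    if (!dir->lookup(name)) {
        *how = NOT_FOUND;
        return NULL;
    }
    struct dentry *d = calloc(1, sizeof *d);
    if (!d) {
        *how = NOT_FOUND;
        return NULL;
    }
    strncpy(d->name, name, sizeof d->name - 1);
    d->d_count = 1;
    d->next = dir->children;
    dir->children = d;
    *how = FROM_DISK;
    return d;
}
```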

 

File System mounting

VFS allows different file systems to be mounted into the same directory structure. The path where a file system is mounted is called its mount point.

For example, suppose a disk has two partitions, A and B. A is mounted at "/" as the root file system, and B is mounted under "/mnt/B" as a sub-file system of A.

To complete the mount, the directory "/mnt/B" must exist in file system A, so A has a dentry for it. After B is mounted, however, lookups through this dentry no longer lead to the inode that "/mnt/B" has in A; instead, the dentry's d_mounted field is incremented to mark it as a mount point.

If such a mount point is encountered while searching for a file path, the pointer representing the current position switches from the current dentry to the root dentry of the mounted file system. In other words, accessing the path "/mnt/B" in partition A actually accesses the path "/" in partition B.

 

Each mounted file system is described by a vfsmount structure, and the mounted file systems are also organized into a tree.

The vfsmount structure has two pointers to dentries: mnt_mountpoint points to the mount-point dentry in the parent file system (for example, "/mnt/B" in partition A), and mnt_root points to the root dentry of this file system (for example, "/" in partition B). These two pointers make it possible to switch the current position as described above.

Therefore, while searching a file path, you must keep track of both the current dentry and the current vfsmount. If the current dentry is a mount point, search the children of the current vfsmount for the one whose mount point is the current dentry, then continue from that child vfsmount's mnt_root.

Several vfsmounts may be mounted on the same dentry. In that case only one of them is selected and the others are hidden; a hidden vfsmount cannot be selected until the selected one is unmounted. This feature can be used to hide directories. For example, if /home/kouu/secret holds files you do not want others to see, you can mount a tmpfs on that directory to hide them.
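Crossing a mount point in both directions can be sketched with the two pointers described above. The structures are cut down to the fields named in this article, lookup_mnt_sketch scans a plain array where the kernel uses a hash table, and follow_mount_sketch and follow_up_sketch are hypothetical names; the loop repeats in case another file system is mounted on the new root.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical cut-down structures; field names follow the article. */
struct dentry { const char *d_name; int d_mounted; };

struct vfsmount {
    struct vfsmount *mnt_parent;      /* mount we are attached to (self for the root) */
    struct dentry   *mnt_mountpoint;  /* dentry in the parent that we cover */
    struct dentry   *mnt_root;        /* root dentry of this file system */
};

struct path { struct vfsmount *mnt; struct dentry *dentry; };

/* Find the child mount of p->mnt whose mount point is p->dentry. */
static struct vfsmount *lookup_mnt_sketch(struct vfsmount **mounts, size_t n,
                                          const struct path *p)
{
    for (size_t i = 0; i < n; i++)
        if (mounts[i]->mnt_parent == p->mnt && mounts[i]->mnt_mountpoint == p->dentry)
            return mounts[i];
    return NULL;
}

/* Going down: replace a mount point with the root of what is mounted on it. */
static void follow_mount_sketch(struct vfsmount **mounts, size_t n, struct path *p)
{
    while (p->dentry->d_mounted) {
        struct vfsmount *child = lookup_mnt_sketch(mounts, n, p);
        if (!child)
            break;
        p->mnt = child;
        p->dentry = child->mnt_root;
    }
}

/* Going up (".." at a file system root): step back to the mount point in
 * the parent. Returns 0 at the root mount, which has no parent. */
static int follow_up_sketch(struct path *p)
{
    struct vfsmount *m = p->mnt;
    if (m->mnt_parent == m)
        return 0;
    p->dentry = m->mnt_mountpoint;
    p->mnt = m->mnt_parent;
    return 1;
}
```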
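In practice the hiding trick is `mount -t tmpfs tmpfs /home/kouu/secret` run as root, and `umount /home/kouu/secret` uncovers the files again. The visibility rule can be modelled as a stack per mount point, assuming (as in Linux) that the most recent mount is the one lookups see; mount_stack, mount_on, umount_top and visible are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model: the mounts on one directory kept as a stack. Only
 * the top entry is seen by path lookup; unmounting it uncovers the next. */
#define MAX_STACK 4

struct mount_stack {
    const char *fs[MAX_STACK];   /* names of the mounted file systems */
    int depth;
};

static int mount_on(struct mount_stack *s, const char *fs)
{
    if (s->depth == MAX_STACK)
        return -1;
    s->fs[s->depth++] = fs;
    return 0;
}

static int umount_top(struct mount_stack *s)
{
    if (s->depth == 0)
        return -1;
    s->depth--;
    return 0;
}

/* What a lookup of the directory shows: the topmost mount, or the
 * directory's own contents in the parent file system if nothing is mounted. */
static const char *visible(const struct mount_stack *s, const char *own)
{
    return s->depth ? s->fs[s->depth - 1] : own;
}
```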

 

A sub-file system is always attached to a dentry in its parent file system, while the root file system is referenced by an mnt_namespace object. Different mnt_namespace objects can reference different root file systems and organize different mount trees, forming different directory structures.

Normally, a newly created process shares its parent's mnt_namespace. Since all processes descend from process 1 (init), all processes normally use the same mnt_namespace and live in the same directory structure.

However, when the clone system call creates a new process, the CLONE_NEWNS flag can be specified to give the child its own copy of the mount namespace (other kinds of namespace have their own CLONE_NEW* flags).
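The effect of CLONE_NEWNS can be sketched as "share the namespace pointer or copy it". In this hypothetical model (copy_mnt_ns_sketch and mount_in are illustrative names) a namespace is only a list of mount-point paths; the kernel instead copies the whole tree of vfsmount structures into the new namespace.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

#define MAX_MOUNTS 8

/* Hypothetical model: a namespace is a reference count plus mount points. */
struct mnt_namespace { int count; int n; const char *mounts[MAX_MOUNTS]; };
struct task { struct mnt_namespace *mnt_ns; };

/* Default: share the parent's namespace. new_ns models CLONE_NEWNS: the
 * child gets a private copy, so later mounts are not seen by the parent. */
static int copy_mnt_ns_sketch(struct task *child, const struct task *parent, bool new_ns)
{
    if (!new_ns) {
        parent->mnt_ns->count++;
        child->mnt_ns = parent->mnt_ns;
        return 0;
    }
    struct mnt_namespace *ns = malloc(sizeof *ns);
    if (!ns)
        return -1;
    memcpy(ns, parent->mnt_ns, sizeof *ns);
    ns->count = 1;
    child->mnt_ns = ns;
    return 0;
}

/* A mount is recorded only in the caller's own namespace. */
static int mount_in(struct task *t, const char *where)
{
    struct mnt_namespace *ns = t->mnt_ns;
    if (ns->n == MAX_MOUNTS)
        return -1;
    ns->mounts[ns->n++] = where;
    return 0;
}
```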

 

So far we have only talked about mounting a device. In fact, besides the device file of the storage medium, the file system type (a file_system_type structure, such as ext2, ext3 or tmpfs) must also be registered in the kernel. A mounted file system always involves two elements: a device and a type.

Registered file_system_type structures are kept in a linked list and found by their registered names (such as "ext3"). They act as interpreters for the data on the physical storage medium behind the device file.

Each file system has a super block (a super_block structure), which is read from the block device by the get_sb method of the file_system_type structure.

A file system can be mounted several times, producing several vfsmount structures that all share the same super_block. The super_block is read only the first time the file system is mounted; on later mounts it already exists and is simply referenced.

During get_sb, the inode of the file system's root directory is also loaded from the storage medium and a dentry is created for it; super_block->s_root points to that root dentry.
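Registration by name and the reuse of an already-read super block can be sketched as follows. Everything here is a hypothetical user-space model: the kernel's register_filesystem and get_fs_type play the roles of register_fs_sketch and get_fs_type_sketch, the per-device cache in demo_get_sb stands in for the list of super blocks the kernel searches, and creating s_root is omitted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

struct super_block { const char *s_dev; int s_count; };

struct file_system_type {
    const char *name;
    struct super_block *(*get_sb)(struct file_system_type *t, const char *dev);
    struct file_system_type *next;
};

static struct file_system_type *file_systems;   /* head of the registry list */

static int register_fs_sketch(struct file_system_type *t)
{
    for (struct file_system_type *p = file_systems; p; p = p->next)
        if (strcmp(p->name, t->name) == 0)
            return -1;                            /* name already registered */
    t->next = file_systems;
    file_systems = t;
    return 0;
}

static struct file_system_type *get_fs_type_sketch(const char *name)
{
    for (struct file_system_type *p = file_systems; p; p = p->next)
        if (strcmp(p->name, name) == 0)
            return p;
    return NULL;
}

/* Hypothetical get_sb: "reads" a device's super block only once. */
#define MAX_SB 8
static struct super_block *sb_cache[MAX_SB];
static int reads_from_disk;

static struct super_block *demo_get_sb(struct file_system_type *t, const char *dev)
{
    (void)t;
    for (int i = 0; i < MAX_SB; i++)
        if (sb_cache[i] && strcmp(sb_cache[i]->s_dev, dev) == 0) {
            sb_cache[i]->s_count++;               /* later mount: reuse */
            return sb_cache[i];
        }
    for (int i = 0; i < MAX_SB; i++)
        if (!sb_cache[i]) {
            struct super_block *sb = malloc(sizeof *sb);
            if (!sb)
                return NULL;
            sb->s_dev = dev;
            sb->s_count = 1;
            reads_from_disk++;                    /* first mount: read it */
            sb_cache[i] = sb;
            return sb;
        }
    return NULL;
}
```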

 

Data Structure Summary

Finally, let's summarize the data structures above and their sets of function pointers, for easy reference.

 

file_system_type

Meaning: file system type, such as ext2 and ext3

Creation: when the kernel starts or a kernel module is loaded, a file_system_type structure is registered for each file system type.

Function: get_sb, used to obtain the super block. This function is provided when the file system type is registered.

 

super_block

Meaning: super block, corresponding to a device that stores files

Creation: when the file system is mounted, it is read from the device and initialized through the corresponding file_system_type->get_sb (so part of the information in the super_block structure is stored on the device and part is initialized in memory).

Function: s_op, the super block function set, mainly operations on inodes and on the file system instance. file_system_type->get_sb assigns the function set specific to that file system type when it initializes the super block read from the device.

 

inode

Meaning: index node, corresponding to a file stored on the device

Creation: 1) when a super block is loaded, the root inode is loaded with it; 2) when a new index node is created, for example by mknod; 3) when a path lookup reads it from the device and initializes it (as with super_block, part of the information in the inode structure is stored on the device and part is initialized in memory).

Function: i_op, the inode function set, mainly operations such as creating and deleting child inodes; and i_fop, the file function set, mainly read and write operations on the inode. When an inode is created: 1) if it is a special file, a function set is assigned according to its type (block device, character device, FIFO, etc.), independent of the device and of the file system type; 2) otherwise, the file system type supplies the function sets, and directories and regular files may get different ones.

 

dentry

Meaning: directory entry; dentries form a tree used to look up file paths, and each is associated with an inode.

Creation: after an inode is found or created, a dentry is created and initialized for it.

Function: d_op, the dentry function set, mainly lookup operations on child dentries; determined by the file system type.

 

file

Meaning: an instance of an opened file

Creation: created during the open call; it corresponds to one inode.

Function: f_op, read, write and other operations. It is either 1) equal to inode->i_fop, as for regular files, block device files, etc.; or 2) set by the inode->i_fop->open function while the file is being opened. A typical case is the character device: all character devices share the same inode->i_fop, and during inode->i_fop->open the f_op registered by the corresponding device driver is found and assigned to file->f_op.
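The character-device case can be modelled as a generic open() that looks up a driver by major number and swaps file->f_op. All names here are hypothetical simplifications (the kernel's chrdev_open looks up a cdev by device number rather than indexing an array, and reports ENXIO when no driver is registered); mydrv_read just returns 42 so the dispatch can be observed.

```c
#include <assert.h>
#include <stddef.h>

struct file;
struct inode;

struct file_operations {
    int (*open)(struct inode *inode, struct file *f);
    int (*read)(struct file *f);   /* returns a marker value instead of data */
};

struct inode { unsigned int i_major; const struct file_operations *i_fop; };
struct file  { const struct file_operations *f_op; };

#define MAX_MAJOR_SKETCH 256
static const struct file_operations *chrdevs[MAX_MAJOR_SKETCH];

static int register_chrdev_sketch(unsigned int major, const struct file_operations *fops)
{
    if (major >= MAX_MAJOR_SKETCH || chrdevs[major])
        return -1;
    chrdevs[major] = fops;
    return 0;
}

/* Generic open for every char-device inode: install the driver's table. */
static int chrdev_open_sketch(struct inode *inode, struct file *f)
{
    const struct file_operations *fops =
        inode->i_major < MAX_MAJOR_SKETCH ? chrdevs[inode->i_major] : NULL;
    if (!fops)
        return -1;                 /* no driver registered for this major */
    f->f_op = fops;                /* replace the generic table */
    return fops->open ? fops->open(inode, f) : 0;
}

/* The one table shared by all character-device inodes. */
static const struct file_operations def_chr_fops_sketch = { chrdev_open_sketch, NULL };

/* Generic open: start from the inode's table, then let its open() run. */
static int vfs_open_sketch(struct inode *inode, struct file *f)
{
    f->f_op = inode->i_fop;
    return f->f_op->open ? f->f_op->open(inode, f) : 0;
}

/* A hypothetical driver's own table. */
static int mydrv_read(struct file *f) { (void)f; return 42; }
static const struct file_operations mydrv_fops = { NULL, mydrv_read };
```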

 

From kouu's home
