VFS File System Structure Analysis

This article was originally published by fireaxe and may be freely copied and reproduced under the GPL. When reprinting, please keep the document intact and credit the original author and the original link. The content may be used freely, but no guarantee is made regarding the consequences of using it.

Author: fireaxe_hq@hotmail.com blog: fireaxe.blog.chinaunix.net
VFS is a core concept of Linux. Most operations in Linux rely on VFS-related functions. This article describes VFS briefly from the user's point of view. Users need to know not only which file operation functions Linux offers, but also how VFS is structured, in order to use those functions well. For example, hard links and symbolic links are hard to use correctly without understanding the VFS structure.
This article first sets up a simple directory model, then introduces how that directory is represented in VFS, and finally summarizes how to use the various file operation functions.

To keep things simple and practical, the article relies mainly on analysis and inference. Given my limited knowledge, it will inevitably contain some errors. Please read it critically and point out mistakes freely; your criticism is what drives my progress.

1. Directory model

The following directory is used as an example.

dir is the first-level directory. It contains two subdirectories, subdir0 and subdir1, and a file, file0. subdir0 contains two files, file1 and file2. subdir1 contains one file, file3.

    dir/
        subdir0/
            file1
            file2
        subdir1/
            file3
        file0

2. Concept of VFS

VFS is the Virtual File System in Linux, also known as the Virtual Filesystem Switch. It provides an abstraction for application programmers and hides the differences between the various underlying file systems.

Different file systems, such as Ext2/3, XFS, and FAT32, have different on-disk structures. If open and the other file I/O functions had to deal with each of them directly, their implementations would differ widely. To hide this difference, Linux introduces VFS. In effect, Linux builds a file system of its own in memory, and every other file system must be presented in the VFS structure before users can access it.

3. Build VFS

Building the VFS means loading an actual file system, that is, what happens when mount is called. The following uses an ext2 file system as an example.

The Ext2 disk structure considered here is simplified and serves only to describe the basic process of building the VFS from it.

The general form of the mount command is: mount /dev/sdb1 /mnt/mysdb1

/dev/sdb1 is the device name and /mnt/mysdb1 is the mount point.

The basic structures of the VFS are the dentry structure and the inode structure.

A dentry represents a node in the directory tree, which can be a directory or a file.

An inode represents a file on the disk and corresponds one-to-one with disk files.

Inodes and dentries do not necessarily correspond one-to-one. One inode may correspond to multiple dentries (hard links).

During mount, Linux first finds the superblock of the disk partition, and then uses the disk's inode table and file data to construct its own dentries and inodes. As section 5 explains, they are built on demand rather than all at once.

Note that the VFS was modeled on Ext-style file systems, so the two are very similar (after all, Ext is a native Linux file system).

Take inodes as an example. Both Ext and VFS call their file management structure an inode, but the two are different things: the Ext inode lives on the disk, while the VFS inode lives in memory. Because the VFS inode was designed around the same attributes, building a VFS inode from an Ext inode is mostly a direct field-by-field conversion.

A file system whose on-disk layout differs more from this model is not so lucky. FAT, for example, has no on-disk inodes at all, so its driver has to do more work to present files as dentries and inodes.

4. VFS Structure

After the VFS is built, the next step is to map the directory model from section 1 onto the VFS structures.

As mentioned above, VFS mainly consists of dentries and inodes. Dentries maintain the directory structure of VFS. Each dentry represents one of the items listed by ls (every directory and every file corresponds to a dentry). An inode is a file node and corresponds one-to-one with files. In Linux a directory is also a file, so a directory's dentry also corresponds to an inode.

Figure: the structure of the section 1 directory model in VFS. In the sections below, dentry0 and inode0 belong to dir, dentry1 and inode2 belong to subdir0, and inode4 belongs to file1.

5. Dentry cache

Each file must correspond to one inode and at least one dentry. Suppose we have an 80 GB hard disk filled with tiny files. How much memory would it take to build the whole VFS for it?

Each file occupies at least one block (generally 4 KB). Assuming a dentry plus an inode need about 100 bytes, the dentries and inodes take about 1/40 of the space the files occupy. An 80 GB hard disk then requires about 2 GB of memory. I have recently started switching to 1 TB hard disks, which would need 25 GB of memory to hold the inodes and dentries. Few computers can afford that.
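The estimate above can be checked with a few lines of arithmetic. This is only a sketch: the 100 bytes per dentry and inode pair and the 4 KB block are the article's round numbers, the 80 GB disk size is inferred from the 1/40 ratio and the 2 GB result, and the function name `vfs_metadata_bytes` is made up for the example.

```python
def vfs_metadata_bytes(disk_bytes, block_bytes=4096, per_file_bytes=100):
    """Memory needed if every one-block file had a dentry and inode in RAM."""
    file_count = disk_bytes // block_bytes      # one tiny file per block
    return file_count * per_file_bytes          # assumed 100 bytes per file

GB = 1024 ** 3
TB = 1024 ** 4

for label, size in (("80 GB", 80 * GB), ("1 TB", 1 * TB)):
    needed = vfs_metadata_bytes(size)
    print(f"{label} disk: about {needed / GB:.1f} GB of dentries and inodes")
```

With a 4096-byte block the ratio is 100/4096, slightly under 1/40, so the 80 GB case comes out a little below 2 GB and the 1 TB case at 25 GB.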

To avoid this waste of resources, VFS uses a dentry cache.

When a user runs ls on a directory or opens a file, VFS creates a dentry and an inode for each directory entry and file involved, that is, it creates them on demand. It also maintains an LRU (Least Recently Used) list. When Linux decides that VFS is occupying too many resources, VFS releases the dentries and inodes that have gone unused the longest.

Note that the release only frees memory. From the Linux point of view, the dentries and inodes are an inherent part of VFS; the only difference is whether VFS has currently read them into memory. For Ext2/3 file systems, building a dentry and an inode is very simple; for other file systems it can be slower.
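The "create on demand, release the least recently used" idea can be sketched with a toy cache. This is not how the kernel implements it: the class `ToyDentryCache`, the fixed `capacity`, and the `fake_disk_read` callback are all assumptions made for illustration (the real kernel shrinks the cache under memory pressure rather than at a fixed size).

```python
from collections import OrderedDict

class ToyDentryCache:
    """Toy model: entries are created on first use and evicted LRU-first."""

    def __init__(self, capacity, load_from_disk):
        self.capacity = capacity               # hypothetical fixed limit
        self.load_from_disk = load_from_disk   # called only on a cache miss
        self.entries = OrderedDict()           # oldest entry first

    def lookup(self, path):
        if path in self.entries:
            self.entries.move_to_end(path)     # mark as most recently used
            return self.entries[path]
        entry = self.load_from_disk(path)      # "create on demand"
        self.entries[path] = entry
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)   # drop the least recently used
        return entry

disk_reads = []

def fake_disk_read(path):
    """Stand-in for rebuilding a dentry and inode from the disk."""
    disk_reads.append(path)
    return {"name": path}

cache = ToyDentryCache(capacity=2, load_from_disk=fake_disk_read)
for p in ["dir", "dir/subdir0", "dir", "dir/file0", "dir/subdir0"]:
    cache.lookup(p)
print(disk_reads)
```

The second lookup of "dir" is served from memory, while "dir/subdir0" has to be rebuilt after it was evicted to make room for "dir/file0".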

With the dentry cache in mind, we can understand why there are the two ways of locating a file described below.

6. Locate the file without dentry

Because of the dentry cache described above, VFS cannot guarantee that a dentry or inode is in memory at any given time. The following describes how to locate a file when its dentry and inode are not available.

To simplify the problem, we assume that the dentry of dir has already been found (how it is found is explained later).

First, find inode0 through dentry0, the dentry corresponding to dir. With the inode, the directory's contents can be read. A directory stores the list of its subdirectories and files, including each name and inode number. This is the information shown by the ls command; "ls -i" also displays the inode number of each entry.

> ls -i

975248 subdir0 975247 subdir1 975251 file0

Then inode2 is rebuilt from the inode number of subdir0, and dentry1, the dentry of subdir0, is rebuilt from the directory's file data (a directory is also a file) and inode2.

> ls -i

975311 file1 975312 file2

Then inode4 is rebuilt from the inode number of file1, and the dentry of file1 is rebuilt from the file data and inode4.

Finally, the file can be accessed through inode4.

Note: the inode number of a file is fixed, but the in-memory inode structure has to be rebuilt.
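The walk described in this section (read a directory, map a name to an inode number, descend) can be imitated from user space. The sketch below is an illustration, not kernel code: `list_dir`, `walk_to_inode`, and `build_example_tree` are names invented here, and the tree follows the "ls -i" listings above (subdir0 holding file1 and file2).

```python
import os
import tempfile

def list_dir(dir_path):
    """Return {name: inode number} for one directory, like `ls -i`."""
    with os.scandir(dir_path) as entries:
        return {e.name: e.inode() for e in entries}

def walk_to_inode(start_dir, relative_path):
    """Resolve a path one component at a time, reading each directory."""
    current = start_dir
    inode_number = os.stat(start_dir).st_ino
    for name in relative_path.split("/"):
        listing = list_dir(current)            # read the directory's contents
        inode_number = listing[name]           # name -> inode number
        current = os.path.join(current, name)  # descend one level
    return inode_number

def build_example_tree(root):
    """Create the example directory model under root."""
    os.makedirs(os.path.join(root, "dir", "subdir0"))
    os.makedirs(os.path.join(root, "dir", "subdir1"))
    for rel in ("dir/file0", "dir/subdir0/file1",
                "dir/subdir0/file2", "dir/subdir1/file3"):
        with open(os.path.join(root, rel), "w"):
            pass                               # empty file is enough here

with tempfile.TemporaryDirectory() as root:
    build_example_tree(root)
    print(list_dir(os.path.join(root, "dir")))
    print(walk_to_inode(os.path.join(root, "dir"), "subdir0/file1"))
```

Every step of `walk_to_inode` reads a directory again, which is the cost the dentry cache is meant to avoid.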

7. Locate the file when dentry is available

Once a dentry has been set up in the dentry cache, the next access is much more convenient.

A key member of the dentry is d_subdirs, which holds the list of dentries at the next level down and is used here to locate files quickly. (The kernel itself finds a cached child through a dentry hash table rather than by scanning this list, but the idea is the same: the lookup stays in memory.)

First, in the d_subdirs of dentry0, which represents the dir directory, find the dentry named "subdir0"; this is dentry1.

Then, in dentry1, find the dentry named "file1", which is the dentry of file1.

Finally, inode4, the inode of file1, is obtained through that dentry.

Compared with the case where no dentry is cached, the operation is much simpler.
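As a rough picture of the cached case, the lookup can be modeled as following links between in-memory objects. `ToyDentry` and `cached_lookup` are hypothetical names, the `children` dictionary merely stands in for the child list, and the inode number given to dir is made up (the other two come from the listings in section 6).

```python
class ToyDentry:
    """Toy dentry: a name, an inode number, and its cached children."""

    def __init__(self, name, inode_number):
        self.name = name
        self.inode_number = inode_number
        self.children = {}          # stands in for the list of child dentries

    def add_child(self, child):
        self.children[child.name] = child
        return child

def cached_lookup(start, relative_path):
    """Follow child links in memory; return None if a dentry is not cached."""
    current = start
    for name in relative_path.split("/"):
        current = current.children.get(name)
        if current is None:
            return None             # cache miss: fall back to reading the disk
    return current

dentry0 = ToyDentry("dir", 975200)                  # made-up inode number
dentry1 = dentry0.add_child(ToyDentry("subdir0", 975248))
dentry1.add_child(ToyDentry("file1", 975311))

found = cached_lookup(dentry0, "subdir0/file1")
print(found.inode_number)
print(cached_lookup(dentry0, "subdir1/file3"))      # not cached in this toy
```

No directory is read during `cached_lookup`; a miss is the point where the slower method of section 6 would take over.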

8. Symbolic link

The command for creating a symbolic link is: ln -s source_file link_file

A symbolic link in Linux is similar to a shortcut in Windows. Here symlink1 is a symbolic link pointing to file3. symlink1 is itself a file, so it has its own independent inode. What the symlink actually stores is the path of the source file, exactly as it was given to ln -s (relative or absolute).

Most file operations act directly on the target that the symbolic link points to. For example, open("symlink1") actually opens file3.

What happens if file3 is not there? The open function still follows the path stored in symlink1, but since file3 does not exist, it reports an error saying that the file does not exist.
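The behaviour described in this section can be observed with Python's `os` module, which wraps the corresponding system calls. The function name `symlink_demo` and the keys of the returned dictionary are invented for this sketch; the file names follow the text.

```python
import os
import tempfile

def symlink_demo(workdir):
    target = os.path.join(workdir, "file3")
    link = os.path.join(workdir, "symlink1")
    with open(target, "w") as f:
        f.write("hello")
    os.symlink("file3", link)        # the link stores the string "file3"

    result = {
        "stored_path": os.readlink(link),
        # lstat looks at the link itself, stat follows it to file3
        "link_has_own_inode": os.lstat(link).st_ino != os.stat(link).st_ino,
    }
    with open(link) as f:            # open follows the link to file3
        result["content"] = f.read()

    os.unlink(target)                # remove file3; symlink1 now dangles
    try:
        open(link).close()
        result["dangling_error"] = None
    except FileNotFoundError as exc:
        result["dangling_error"] = type(exc).__name__
    result["link_still_exists"] = os.path.lexists(link)
    return result

with tempfile.TemporaryDirectory() as d:
    print(symlink_demo(d))
```

After file3 is removed the link file is still there, but opening it fails because the stored path no longer resolves.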

9. Hard link

In addition to symbolic links, Linux also has hard links.

Creating a hard link amounts to making another dentry that points to the same inode. When we use write to change the content of file1, the content seen through hardlink1 changes too, because they are in fact the same file.

For example, hardlink1 is a hard link of file1. Both point to the same inode, inode1. inode1 has a counter that records how many dentries point to it. Deleting one of those dentries does not delete inode1 as long as others remain; inode1 is deleted only when all dentries pointing to it have been deleted.

In a sense, all dentries are hard links.
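A short experiment shows the shared inode and the link counter at work. As before this is only a sketch using Python's `os` wrappers; `hardlink_demo` and the result keys are names chosen for the example.

```python
import os
import tempfile

def hardlink_demo(workdir):
    file1 = os.path.join(workdir, "file1")
    hardlink1 = os.path.join(workdir, "hardlink1")
    with open(file1, "w") as f:
        f.write("old")
    os.link(file1, hardlink1)                 # second name, same inode

    result = {
        "same_inode": os.stat(file1).st_ino == os.stat(hardlink1).st_ino,
        "links_after_link": os.stat(file1).st_nlink,
    }
    with open(file1, "w") as f:               # rewrite through one name
        f.write("new")
    with open(hardlink1) as f:                # read through the other name
        result["seen_via_hardlink1"] = f.read()

    os.unlink(file1)                          # remove one name only
    result["links_after_unlink"] = os.stat(hardlink1).st_nlink
    with open(hardlink1) as f:
        result["still_readable"] = f.read()
    return result

with tempfile.TemporaryDirectory() as d:
    print(hardlink_demo(d))
```

The link count goes from 2 back to 1 when file1 is unlinked, and the data stays reachable through hardlink1.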

10. Process management of files

The process control block task_struct contains two members related to files: fs and files.

fs stores root and pwd, two pointers to dentries. When the user gives a path, an absolute path is resolved starting from root and a relative path is resolved starting from pwd. (The root of a process is not necessarily the root directory of the file system. For example, the root of an ftp process is usually not the file system root, so that it can only access content under the ftp directory.)

files holds the list of file objects. Each entry corresponds to an opened file. When a process opens a file, it constructs a file object and associates it with the inode through f_inode. When the file is closed, the process releases the corresponding file object. f_mode in the file object is the access mode chosen when the file was opened, and f_pos is the read/write position. When a file is opened multiple times, a new file object is created each time, and each file object has its own f_mode and f_pos.
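The role of pwd in resolving relative paths, described in this section, can be seen from user space by changing the working directory. This is a small sketch under the assumption that a temporary directory is writable; `resolve_demo` is a made-up name.

```python
import os
import tempfile

def resolve_demo(workdir):
    """Reach the same file by absolute path and by a path relative to pwd."""
    absolute = os.path.join(workdir, "file0")
    with open(absolute, "w") as f:
        f.write("x")
    saved_cwd = os.getcwd()
    try:
        os.chdir(workdir)                       # changes the process's pwd
        relative_inode = os.stat("file0").st_ino
    finally:
        os.chdir(saved_cwd)                     # restore the original pwd
    return relative_inode == os.stat(absolute).st_ino

with tempfile.TemporaryDirectory() as d:
    print(resolve_demo(d))
```

Both paths end at the same inode; only the starting point of the lookup differs.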

11. Open process

Opening a file involves a series of structural adjustments, which are described in steps below:

Start from an existing file management structure: the process has already opened two files, and now we open a new one.

Step 1: find the file

Using the methods described above, locate the file and obtain its inode.

Step 2: create a file object

Create a new file object, put it in the file object list, and point it to the inode.

Step 3: create a file descriptor

The file descriptors are the fd_array maintained in files in the process control block task_struct. Because it is an array, the space for file descriptors is allocated in advance. Here an unused file descriptor is associated with the file object. The index of that file descriptor in the array is the fd returned when the file is opened.
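That an fd is just an index into a per-process table can be seen by watching which numbers open hands out. The sketch relies on the POSIX rule that open returns the lowest-numbered unused descriptor and assumes no other thread opens files at the same time; `fd_slots_demo` is a hypothetical name.

```python
import os
import tempfile

def fd_slots_demo(workdir):
    path = os.path.join(workdir, "file0")
    with open(path, "w") as f:
        f.write("data")

    fd_a = os.open(path, os.O_RDONLY)
    fd_b = os.open(path, os.O_RDONLY)
    os.close(fd_a)                        # slot fd_a becomes free again
    fd_c = os.open(path, os.O_RDONLY)     # lowest free slot is handed out
    reused = (fd_c == fd_a)
    os.close(fd_b)
    os.close(fd_c)
    return fd_a, fd_b, fd_c, reused

with tempfile.TemporaryDirectory() as d:
    print(fd_slots_demo(d))
```

The exact numbers depend on what the process already has open, but the freed slot is the one that gets reused.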

12. Open and dup

The same file can be opened multiple times. Each open creates a new file descriptor and a new file object, which then points to the inode of the same file. If the newly opened file is the same file that fd1 refers to, the new file object and fd1's file object both point to the same inode.

Linux also provides the dup function for copying a file descriptor. dup does not create a new file object, so the new file descriptor and the original one point to the same file object. For example, if we get fd2 through dup(fd1), then fd2 and fd1 point to the same file object.

Because two open calls produce two file objects, the read/write attributes, the read/write position (f_pos), and other such information are independent. After dup is used to copy a file descriptor, there is still only one file object, so a change made to the attributes or the read/write position through one fd is seen through the other as well.
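The difference between a second open and a dup shows up directly in the file offset. The following sketch reads through one descriptor and then asks each descriptor for its current position; `open_vs_dup_demo` and the dictionary keys are names made up for the example.

```python
import os
import tempfile

def open_vs_dup_demo(workdir):
    path = os.path.join(workdir, "file0")
    with open(path, "w") as f:
        f.write("0123456789")

    fd1 = os.open(path, os.O_RDONLY)
    fd_open = os.open(path, os.O_RDONLY)   # second open: new file object
    fd_dup = os.dup(fd1)                   # dup: shares fd1's file object

    os.read(fd1, 4)                        # advances fd1's offset to 4

    offsets = {
        "fd1": os.lseek(fd1, 0, os.SEEK_CUR),
        "fd_open": os.lseek(fd_open, 0, os.SEEK_CUR),
        "fd_dup": os.lseek(fd_dup, 0, os.SEEK_CUR),
    }
    for fd in (fd1, fd_open, fd_dup):
        os.close(fd)
    return offsets

with tempfile.TemporaryDirectory() as d:
    print(open_vs_dup_demo(d))
```

Only the descriptor obtained with dup follows fd1; the one from the second open keeps its own position at 0.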

13. Effect of fork on open files

The effect of dup is similar to that of forking a child process.

Start from the file structure of an existing parent process.

After fork, the child gets its own fd table but, as with dup, no new file object is created. Therefore, when the position of fd1 moves in the parent process (for example through reading or writing), fd1 in the child process is affected too. In other words, the list of opened files is not part of the process, so it is not copied. The opened files list should be thought of as a global linked list of resources, while each process maintains a table of pointers into it, the fd table. Only the fd table is copied by fork, not the opened files list.
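The shared file object after fork can be demonstrated by letting the child read and then checking the offset in the parent. This sketch is Unix-only (it uses `os.fork`), and `fork_offset_demo` is a name invented for the example.

```python
import os
import tempfile

def fork_offset_demo(workdir):
    path = os.path.join(workdir, "file0")
    with open(path, "w") as f:
        f.write("0123456789")

    fd1 = os.open(path, os.O_RDONLY)
    pid = os.fork()
    if pid == 0:                           # child: has a copy of the fd table
        try:
            os.read(fd1, 3)                # moves the shared offset
        finally:
            os._exit(0)                    # leave without running parent code
    os.waitpid(pid, 0)                     # parent: wait for the child
    offset_in_parent = os.lseek(fd1, 0, os.SEEK_CUR)
    os.close(fd1)
    return offset_in_parent

with tempfile.TemporaryDirectory() as d:
    print(fork_offset_demo(d))
```

The parent never reads from fd1, yet its offset has moved by the three bytes the child consumed.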

14. Analysis of file operation functions

Through the above analysis, you can have a clearer understanding of the scope and usage of each function. Common file operations are listed below:

| Function name | Target object | Effect |
|  | Dentry, inode | When a file is created, a new dentry and inode are created. |
|  | File object | If the object does not exist and the O_CREAT parameter is given, the creat... |
|  | File object | Deletes a file object, but does not delete the file. |
|  |  | Reads inode content. If the target is a symbolic link, stat reads the content pointed to by the symbolic link; lstat reads the symbolic link file itself. |
|  | File object | Changes f_mode in the file object. |
|  | File object | Changes f_uid and f_gid in the file object. |
|  |  | Changes the file length. |
|  | File object | Reading a file changes f_pos in the file object. |
|  | File object, inode | Writing a file changes f_pos in the file object, changes the file content, and updates the modification time. |
|  | File object | Creates a new file object. |
|  | File object | Changes f_pos in the file object. |
|  |  | Creates a new dentry pointing to the same inode. |
|  |  | Deletes a dentry. If the inode it points to is not used by any other dentry, the inode and the disk file are deleted as well. |
|  |  | Modifies d_name in the dentry. |
|  |  | read cannot read the content of a symbolic link file; readlink must be used to read it. |
|  | Dentry, inode | Similar to creat, but the created file is a symbolic link. |

Note: disk files correspond one-to-one with inodes, so disk files are not listed separately in the table.
