Linux Virtual File System

Source: Internet
Author: User





Linux allows multiple file systems, such as ext2, ext3, and vfat, to coexist. The same file I/O system calls work on any file in Linux, regardless of the underlying file system format. Moreover, file operations can span file systems. For example, we can use the cp command to copy data from a disk formatted as vfat to a disk formatted as ext3, an operation that involves two different file systems.


"Everything is a file" is one of the basic philosophies of Unix/Linux. Not only regular files but also directories, character devices, block devices, sockets, and so on are treated as files in Unix/Linux. Although they are of different types, they are all operated on through the same interface.


The virtual file system is the key to achieving these two Linux features. The Virtual File System (VFS) is a software layer in the Linux kernel that provides file system interfaces to user space programs. It also provides an abstraction inside the kernel that allows different file systems to coexist. The file systems not only depend on VFS to coexist, but also rely on VFS to work together.

To support various actual file systems, VFS defines the basic, conceptual interfaces and data structures that all file systems support. In turn, each actual file system provides the abstract interfaces and data structures that VFS expects, and presents its own concepts, such as files and directories, in a form consistent with the VFS definitions. In other words, to be supported by Linux, an actual file system must provide an interface that complies with the VFS standard in order to work with VFS. The actual file system hides its implementation details behind the unified interface and data structures, so to the VFS layer and the rest of the kernel, all file systems look the same. Figure 3 shows the relationship between VFS and the actual file systems in the kernel.


We already know that introducing VFS into the kernel makes cross-file-system file operations possible and realizes "Everything is a file". But why does introducing VFS give us these two features? This is the topic of this article. We first briefly introduce the data structures used to describe the VFS model and summarize the relationships between them. We then pick two representative system calls, sys_open() and sys_read(), to illustrate in detail how the kernel interacts with a specific file system through VFS to achieve cross-file-system file operations and deliver on the promise that "Everything is a file".


Essentially, a file system is a special hierarchical data storage structure that contains files, directories, and related control information. To describe this structure, Linux introduces some basic concepts:

File: A series of logically complete information items. In Linux, besides regular files, objects such as directories, devices, and sockets are also treated as files. In short, "Everything is a file".

Directory: A directory is like a folder that holds related files. Because directories can contain subdirectories, they can be nested to form file paths. In Linux, directories are also treated as special files, so file operations can also be applied to directories.

Directory item: Each part of a file path is called a directory item. For example, in the path /home/source/helloworld.c, the directories /, home, and source and the file helloworld.c are each a directory item.

Index node: A data structure that stores the metadata of a file. Metadata is information about the file, as distinct from the data the file contains: the file size, owner, creation time, disk location, and so on.

Super block: A data structure that stores the control information of a file system. It describes the file system status, file system type, size, number of blocks, number of index nodes, and so on, and is stored in a specific sector of the disk.

The positional relationship between the above concepts in the disk is shown in Figure 4.


Three confusing concepts about file systems:

Create: Formatting a disk in a particular way is the process of creating a file system on it. When a file system is created, its control information is written to a specific location on the disk.

Register: Announce the file system to the kernel so that the kernel can support it. A file system is typically registered by being compiled into the kernel; it can also be registered manually by loading a module. In practice, registration instantiates the struct file_system_type data structure that represents each actual file system.

Install: This is the familiar mount operation. It attaches the file system to the directory tree of the Linux root file system so that the file system can be accessed.

VFS describes its structure information based on four main data structures and some auxiliary data structures. These data structures act like objects; each primary object contains an operation object consisting of an operation function table. These operation objects describe the operations that the kernel can perform on these primary objects.

The super block object stores the control information of an installed file system and represents that installed file system. Each time an actual file system is installed, the kernel reads control information from a specific location on the disk to fill in the super block object in memory. One installation instance corresponds to one super block object. A super block records the type of the file system it belongs to in its s_type field.

To support the source code walkthrough in the third part, some member fields of the super block structure are listed below (the same applies to the structures that follow):

struct super_block {                        /* super block data structure */
    struct list_head s_list;                /* pointer to the super block linked list */
    ......
    struct file_system_type *s_type;        /* file system type */
    struct super_operations *s_op;          /* super block methods */
    ......
    struct list_head s_instances;           /* file systems of this type */
    ......
};

struct super_operations {                   /* super block methods */
    ......
    /* Creates and initializes a new index node object under the given super block */
    struct inode *(*alloc_inode)(struct super_block *sb);
    ......
    /* Reads the index node from disk and dynamically fills in the rest of the
       corresponding index node object in memory */
    void (*read_inode)(struct inode *);
    ......
};


The index node object stores information about a file and represents an actual physical file on the storage device. When a file is accessed for the first time, the kernel assembles the corresponding index node object in memory, giving the kernel all the information it needs to operate on the file. Part of this information is stored at a specific location on disk, and the rest is filled in dynamically at load time.

struct inode {                              /* index node structure */
    ......
    struct inode_operations *i_op;          /* index node operation table */
    struct file_operations *i_fop;          /* file operation set of the file corresponding to the index node */
    struct super_block *i_sb;               /* related super block */
    ......
};

struct inode_operations {                   /* index node methods */
    ......
    /* Creates a new index node for the file corresponding to the dentry object;
       called by the open() system call */
    int (*create)(struct inode *, struct dentry *, int, struct nameidata *);

    /* Looks up, in a specific directory, the index node corresponding to the dentry object */
    struct dentry *(*lookup)(struct inode *, struct dentry *, struct nameidata *);
    ......
};


The concept of the directory item is introduced mainly to make looking up files convenient. Each part of a path, whether a directory or a regular file, is a directory item object. For example, in the path /home/source/test.c, the directories /, home, and source and the file test.c each correspond to one directory item object. Unlike the previous two objects, the directory item object has no corresponding on-disk data structure; VFS parses path components into directory item objects one by one as it traverses the path name.

struct dentry {                             /* directory item structure */
    ......
    struct inode *d_inode;                  /* related index node */
    struct dentry *d_parent;                /* directory item object of the parent directory */
    struct qstr d_name;                     /* directory item name */
    ......
    struct list_head d_subdirs;             /* subdirectories */
    ......
    struct dentry_operations *d_op;         /* directory item operation table */
    struct super_block *d_sb;               /* super block of the file */
    ......
};

struct dentry_operations {
    /* Determines whether the directory item is valid */
    int (*d_revalidate)(struct dentry *, struct nameidata *);

    /* Generates a hash value for the directory item */
    int (*d_hash)(struct dentry *, struct qstr *);
    ......
};


A file object is the in-memory representation of an opened file. It mainly establishes the correspondence between a process and a file on disk. It is created on the spot by sys_open() and destroyed by sys_close(). The relationship between file objects and physical files is a bit like that between processes and programs. Seen from user space, VFS only requires us to deal with file objects; we need not worry about super blocks, index nodes, or directory items. Because multiple processes can open and operate on the same file at the same time, one file may have several corresponding file objects. A file object only represents an opened file from a process's point of view; it points to a directory item object, which in turn points to the index node. The file objects for a file may not be unique, but the file's index node and directory item object are unique.

struct file {
    ......
    struct list_head f_list;                /* file object linked list */
    struct dentry *f_dentry;                /* related directory item object */
    struct vfsmount *f_vfsmnt;              /* related installed file system */
    struct file_operations *f_op;           /* file operation table */
    ......
};

struct file_operations {
    ......
    /* File read operation */
    ssize_t (*read)(struct file *, char __user *, size_t, loff_t *);
    ......
    /* File write operation */
    ssize_t (*write)(struct file *, const char __user *, size_t, loff_t *);
    ......
    int (*readdir)(struct file *, void *, filldir_t);
    ......
    /* File open operation */
    int (*open)(struct inode *, struct file *);
    ......
};


Different file system types are distinguished by the physical media on which the file system resides and by how data is organized on that media. The file_system_type structure describes the type information of a specific file system. Each file system type supported by Linux has exactly one file_system_type structure, regardless of whether zero or more instances of it are installed in the system.

In contrast, each time a file system is installed, a vfsmount struct is created, which corresponds to an installation point.

struct file_system_type {
    const char *name;                       /* file system name */
    struct subsystem subsys;                /* sysfs subsystem object */
    int fs_flags;                           /* file system type flags */

    /* When the file system is installed, read the super block from disk and
       assemble the super block object in memory */
    struct super_block *(*get_sb)(struct file_system_type *, int, const char *, void *);

    void (*kill_sb)(struct super_block *);  /* terminate access to the super block */
    struct module *owner;                   /* file system module */
    struct file_system_type *next;          /* next file system type in the linked list */
    struct list_head fs_supers;             /* list of super block objects of this file system type */
};

struct vfsmount {
    struct list_head mnt_hash;              /* hash list */
    struct vfsmount *mnt_parent;            /* parent file system */
    struct dentry *mnt_mountpoint;          /* directory item object of the installation point */
    struct dentry *mnt_root;                /* root directory item object of this file system */
    struct super_block *mnt_sb;             /* super block of this file system */
    struct list_head mnt_mounts;            /* list of child file systems */
    struct list_head mnt_child;             /* list of child file systems */
    atomic_t mnt_count;                     /* use count */
    int mnt_flags;                          /* installation flags */
    char *mnt_devname;                      /* device file name */
    struct list_head mnt_list;              /* descriptor list */
    struct list_head mnt_fslink;            /* file-system-specific expiration list */
    struct namespace *mnt_namespace;        /* related namespace */
};


struct files_struct {                       /* set of files opened by the process */
    atomic_t count;                         /* usage count of the structure */
    ......
    int max_fds;                            /* maximum number of file objects */
    int max_fdset;                          /* maximum number of file descriptors */
    int next_fd;                            /* next file descriptor */
    struct file **fd;                       /* array of all file objects */
    ......
};

struct fs_struct {                          /* relates the process to the file system */
    atomic_t count;                         /* usage count of the structure */
    rwlock_t lock;                          /* lock protecting the structure */
    int umask;                              /* default file access permissions */
    struct dentry *root;                    /* directory item object of the root directory */
    struct dentry *pwd;                     /* directory item object of the current working directory */
    struct dentry *altroot;                 /* directory item object of the alternative root directory */
    struct vfsmount *rootmnt;               /* installation point object of the root directory */
    struct vfsmount *pwdmnt;                /* installation point object of the current working directory */
    struct vfsmount *altrootmnt;            /* installation point object of the alternative root directory */
};


struct nameidata {
    struct dentry *dentry;                  /* address of the directory item object */
    struct vfsmount *mnt;                   /* installation point data */
    struct qstr last;                       /* last component of the path */
    unsigned int flags;                     /* lookup flags */
    int last_type;                          /* type of the last component of the path */
    unsigned depth;                         /* current nesting depth of symbolic links; cannot exceed 6 */
    char *saved_names[MAX_NESTED_LINKS + 1];  /* path names associated with nested symbolic links */
    union {
        struct open_intent open;            /* describes how the file is to be accessed */
    } intent;                               /* dedicated data */
};


The above data structure does not exist in isolation. It is through their organic connection that VFS can work normally. The following figures describe the relationship between them.

As shown in Figure 5, each file system type supported by Linux has exactly one file_system_type structure, regardless of whether zero or more instances are installed. Each time a file system is installed, one super block and one installation point come into existence. A super block points to its file system type through its s_type field. In the other direction, the fs_supers field of file_system_type heads the list of super blocks of that file system type, and the super blocks of the same type are chained together through their s_instances fields.


See Figure 6: a process uses the files_struct referenced by the files field of its task_struct to keep track of the file objects it currently has open. What we usually call a file descriptor is actually an index into the process's array of open file objects. The file object finds its dentry object through the f_dentry field, and the dentry object finds its index node through the d_inode field; this establishes the association between the file object and the actual physical file. Finally, and very importantly, the file operation function table of a file object is obtained from the i_fop field of the index node. Figure 6 is a great help in understanding the source code in the third part.



So far, this article has described how VFS works mainly from a theoretical point of view. Next we go down to the source code and examine two representative system calls, sys_open() and sys_read(), to better understand the interface mechanism that VFS provides to specific file systems. Because this article is concerned with the overall flow of file operations, we will not dwell on detailed processing when tracing the source code, and due to space limitations only the relevant code is listed. The source code in this article comes from the linux-2.6.17 kernel.

Before going deep into sys_open() and sys_read(), let's first look at the context in which sys_read() is called. Figure 7 describes the entire process from the read() call in user space to the data being read from disk. When an application calls the file I/O function read(), sys_read() is triggered. sys_read() finds the specific file system the file lives on and passes control to it; the specific file system then interacts with the physical media and reads the data from it.


The sys_open() system call opens or creates a file and, on success, returns a file descriptor for it. Figure 8 shows the main function call relationships in the implementation of sys_open().


Because sys_open() involves a large amount of code and a complex function call graph, the following gives an overall analysis of the functions and lists key code only for the key points.

A. From the function call graph of sys_open(), we can see that after some simple parameter checks, sys_open() hands over to do_sys_open():

1) First, get_unused_fd() obtains an available file descriptor. This function shows that a file descriptor is really just an index into the list of file objects opened by a process;

2) Then, do_filp_open() opens the file and returns a file object that represents the file opened by the process. The process reads and writes the physical file through this data structure;

3) Finally, fd_install() connects the file descriptor with the file object. From then on, the process reads and writes the file by operating on the file descriptor.

B. do_filp_open() opens the file and returns a file object. Before a file can be opened, it must be found:

1) open_namei() looks up the file by its path name, using the nameidata data structure to carry the path information;

2) After the lookup completes, the nameidata filled with path information is passed to nameidata_to_filp(), which produces the final file object; once this is done, the nameidata data structure is released immediately.

C. open_namei() finds the file:

1) path_lookup_open() implements the file lookup. If the file to be opened does not exist and must be created, path_lookup_create() is called instead. Both wrap the same underlying path lookup function and differ only in the parameters passed, which changes some processing details;

2) When a file is opened in create mode, that is, with the O_CREAT flag set, a new index node must be created for the new file. The core statement in vfs_create(), dir->i_op->create(dir, dentry, mode, nd), shows that it calls the method provided by the specific file system to create the index node. Note: the index node here exists only in memory; its relationship to the physical index node on disk is like that between a file in memory and the file on disk. At this point, the newly created index node does not yet mean that a physical file has been created successfully. Only when the index node is written back to disk has a physical file really been created. Consider a file that is opened in create mode, read and written, but never saved before it is closed: the in-memory index node goes from creation to disappearance, and the disk never learns that someone once intended to create a file, because the index node was never written back;

3) path_to_nameidata() fills in the nameidata data structure;

4) may_open() checks whether the file may be opened. Some files cannot be opened, for example symbolic links, or directories opened for writing. The function first checks whether nd->dentry->d_inode refers to such a file and returns an error if so. Some files cannot be opened in truncate mode; if nd->dentry->d_inode refers to such a file, the truncate flag is explicitly cleared. If the file is indeed opened in truncate mode, the information in nd->dentry->d_inode is updated.

Both path_lookup_open() and path_lookup_create() eventually call __path_lookup_intent_open() to look up the file. During the lookup, the path components are parsed into directory item objects level by level as the path is traversed. If a directory item object is already in the directory item cache, it is taken directly from the cache. If it is not in the cache, an actual disk read is performed to read the index node corresponding to the directory item from disk, and once the index node is obtained, the link between the index node and the directory item is established. This loop continues until the directory item for the target file is found and, with it, the index node; through the index node the kernel can find the corresponding super block object and thus learn the type of the file system the file lives on. Reading the index node of a directory item from disk triggers an interaction between VFS and the actual file system. According to the VFS theory described earlier, the method for reading an index node is provided by the super block. When an actual file system is installed, the super block created in memory is filled in with information from the actual file system, including the super block operation function table defined by that file system, and therefore the concrete method for reading an index node. If we follow ext3_read_inode() of the actual file system ext3, we find that one important job of this function is to set different index node operation tables and file operation tables for different file types.

void ext3_read_inode(struct inode *inode)
{
    ......
    if (S_ISREG(inode->i_mode)) {           /* regular file */
        inode->i_op = &ext3_file_inode_operations;
        inode->i_fop = &ext3_file_operations;
        ext3_set_aops(inode);
    } else if (S_ISDIR(inode->i_mode)) {    /* directory file */
        inode->i_op = &ext3_dir_inode_operations;
        inode->i_fop = &ext3_dir_operations;
    } else if (S_ISLNK(inode->i_mode)) {    /* link file */
        ......
    } else {
        /* If none of the three cases above applies, the file is a device;
           devices here also include pseudo devices such as sockets and FIFOs */
        ......
    }
    ......
}


This is a key link between VFS and the actual file system. According to the analysis in section 3.1.1, when the actual file system is called to read an index node, it assigns a different set of file operation functions to the index node depending on the file type: a regular file gets the operation functions for regular files, a device file gets the operation functions for device files, and so on. Later, when the file operation function set of the index node is assigned to the file object, subsequent operations on the file, such as read, are distinguished by file type at the point where the data is actually read, even though VFS invokes the same read() interface for every file.

static struct file *__dentry_open(struct dentry *dentry, struct vfsmount *mnt,
                                  int flags, struct file *f,
                                  int (*open)(struct inode *, struct file *))
{
    struct inode *inode;
    ......
    /* The job of the whole function is to fill in a file object */
    ......
    f->f_mapping = inode->i_mapping;
    f->f_dentry = dentry;
    f->f_vfsmnt = mnt;
    f->f_pos = 0;
    /* Assign the file operation function set of the index node to the
       operation table of the file object */
    f->f_op = fops_get(inode->i_fop);
    ......
    /* If the file defines its own open operation, execute it */
    if (!open && f->f_op)
        open = f->f_op->open;
    if (open) {
        error = open(inode, f);
        if (error)
            goto cleanup_all;
    }
    ......
    return f;
}


The sys_read() system call reads data from an opened file. On success, it returns the number of bytes read; if the end of the file has already been reached, it returns 0. Figure 9 shows the function call relationships in the implementation of sys_read().


To read a file, you must first open it. From the summary in section 3.1 we know that when a file is opened, a file object is assembled in memory, and the methods for operating on the file are already set in that file object. So when reading a file, VFS needs only some simple conversions (obtaining the file object from the file descriptor; the core idea is to return the file object that current->files->fd[fd] points to), after which the statement file->f_op->read(file, buf, count, pos) easily invokes the corresponding method of the actual file system to read the file.


At this point we can explain why files can be operated on across file systems in Linux. For example, suppose we copy the file a.txt from a vfat file system to an ext3 file system and name the copy b.txt. This involves two steps: reading a.txt and writing b.txt. Each file must be opened before it is read or written. According to the earlier analysis, VFS learns the file system format of a file when the file is opened, and on later operations on that file VFS calls the operation methods of the corresponding actual file system, which reads the data from or writes it to the disk. This is how the cross-file-system copy is ultimately achieved.

VFS treats regular files, directories, devices, and so on all as files and operates on them through the same file operation interface. A file must be opened before it is operated on, and when it is opened VFS learns the file system format it belongs to. When VFS passes control to the actual file system, the actual file system then makes the specific distinction and performs different operations for different file types. This is the root of "Everything is a file".


VFS is an abstract software layer in the Linux file system. With its support, many different actual file systems can coexist in Linux, and cross-file-system operations become possible. Using its four main data structures (super block, index node, directory item, and file object) together with some auxiliary data structures, VFS provides the same operation interface, such as open, read, write, and close, to Linux files, directories, devices, sockets, and so on. Only when control is passed to the actual file system does the actual file system distinguish between file types and perform different operations for each. Thus it is the existence of VFS that makes cross-file-system operations possible and realizes "Everything is a file" in Unix/Linux.

