Parts of this article were compiled from online sources.
File System Management under Linux
1. VFS File System Overview
Linux uses the VFS to manage file systems, and one of the design principles of Linux is that everything is a file. The file management system is therefore a core embodiment of the Linux design.
VFS stands for Virtual File System.
In general, the file system layer under Linux can be divided into three main parts: first, the file system calls in the upper layer; second, the virtual file system VFS (Virtual Filesystem Switch); and third, the actual file systems attached to the VFS, such as ext2 and jffs.
VFS is a software mechanism; it might be more precise to call it the Linux file system manager. Its data structures exist only in physical memory. So at every system initialization, Linux first constructs a VFS directory tree in memory (called the namespace in the Linux source code), which in practice means setting up the corresponding data structures in memory. The VFS directory tree is a very important concept in the Linux file system module, and readers should not confuse it with the directory tree of an actual file system. In my opinion, the main purpose of the directories in the VFS is to provide mount points for actual file systems. The VFS is of course also involved in file-level operations, but this article does not elaborate on them. From here on, unless stated otherwise, "directory tree" and "directory" refer to the directory tree and directories of the VFS. The figure shows one possible image of the directory tree in memory:
2. Registration of File Systems
The file systems here are the actual file systems that may be mounted on the directory tree. "Actual" means that operations in the VFS are ultimately carried out by them; it does not mean that they must reside on a particular storage device. For example, some Linux machines have more than ten file systems registered, including "rootfs", "proc", "ext2" and "sockfs".
2.1 Data Structures
In the Linux source code, each actual file system is represented by the following data structure:
struct file_system_type {
    const char *name;
    int fs_flags;
    struct super_block *(*read_super) (struct super_block *, void *, int);
    struct module *owner;
    struct file_system_type *next;
    struct list_head fs_supers;
};
The registration process instantiates a struct file_system_type for each actual file system and links the instances into a list in the kernel; a global variable called file_systems points to the head of this linked list.
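The registration list can be illustrated with a small user-space sketch. It is not the kernel code: struct fs_type, fs_list_head, find_fs_link() and register_fs() are hypothetical names, and the structure keeps only the name and next members. The sketch assumes that registering a name twice should be rejected.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Simplified user-space stand-in for struct file_system_type (hypothetical). */
struct fs_type {
    const char *name;
    struct fs_type *next;   /* next registered file system, or NULL */
};

/* Head of the registration list (the role played by file_systems). */
struct fs_type *fs_list_head = NULL;

/*
 * Walk the list and return the address of the link that either points to
 * the entry called `name` or is the terminating NULL link.
 */
struct fs_type **find_fs_link(const char *name)
{
    struct fs_type **p;

    for (p = &fs_list_head; *p != NULL; p = &(*p)->next)
        if (strcmp((*p)->name, name) == 0)
            break;
    return p;
}

/* Append `fs` to the list; return 0 on success, -1 if the name is taken. */
int register_fs(struct fs_type *fs)
{
    struct fs_type **p;

    assert(fs != NULL && fs->name != NULL);
    p = find_fs_link(fs->name);
    if (*p != NULL)
        return -1;          /* a file system with this name already exists */
    fs->next = NULL;
    *p = fs;                /* hook the new entry onto the tail */
    return 0;
}
```

Registering "rootfs" and then "ext2" with register_fs() leaves fs_list_head pointing at the rootfs entry, whose next member points at the ext2 entry.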
2.2 Registering the rootfs file system
Among the many actual file systems, the registration of rootfs is singled out here because this file system is so closely tied to the VFS: if ext2/ext3 is the native file system of Linux, then rootfs is the foundation on which the VFS exists. Ordinary file systems are registered through the module_init macro and the do_initcalls() function (readers can follow the process by reading the declaration of the module_init macro and the arch/i386/vmlinux.lds file), whereas rootfs is registered by the init_rootfs() initialization function. In other words, registering rootfs is an integral part of the Linux kernel initialization phase.
init_rootfs() completes the registration of rootfs by calling register_filesystem(&rootfs_fs_type), where rootfs_fs_type is defined as follows:
struct file_system_type rootfs_fs_type = {
    name:       "rootfs",
    read_super: ramfs_read_super,
    fs_flags:   FS_NOMOUNT|FS_LITTER,
    owner:      THIS_MODULE,
};
The file_systems linked list after registration looks as follows:
3. The establishment of the VFS directory tree
Since it is a tree, the root is the basis of its existence. This section explains how Linux creates the root node, the "/" directory, during the initialization phase, including the specific process of mounting the rootfs file system on the root directory "/". The code that constructs the root directory is in the init_mount_tree() function (fs/namespace.c).
First, init_mount_tree() calls do_kern_mount("rootfs", 0, "rootfs", NULL) to mount the previously registered rootfs file system. This may seem odd: according to the earlier description, a mount directory should exist before a file system is mounted on it, yet the VFS does not seem to have a root directory at this point. This is not a problem, because the call to do_kern_mount() itself creates the all-important root directory (in Linux, the data structure for a directory is struct dentry).
In this scenario, do_kern_mount() mainly does the following:
1) Call alloc_vfsmnt() to allocate a struct vfsmount (struct vfsmount *mnt) in memory and initialize some of its members.
2) Call get_sb_nodev() to allocate a super block structure (struct super_block) sb in memory, initialize some of its members, and insert its member s_instances into the doubly linked list headed by fs_supers in the rootfs file system type structure.
3) Call ramfs_read_super() through the read_super function pointer of the rootfs file system. Recall that when rootfs was registered, its read_super member was set to point to ramfs_read_super(), as shown above.
4) ramfs_read_super() calls ramfs_get_inode() to allocate an inode structure (struct inode) inode in memory and initialize some of its members, the more important of which are i_op, i_fop and i_sb:
inode->i_op = &ramfs_dir_inode_operations;
inode->i_fop = &dcache_dir_ops;
inode->i_sb = sb;
This ensures that later file operations initiated through the VFS, such as file system calls, are taken over by the corresponding function interfaces of the rootfs file system.
5) After allocating and initializing the inode, ramfs_read_super() calls d_alloc_root() to create the critical root directory (struct dentry) dentry of the VFS directory tree, and points the d_sb pointer in dentry to sb and the d_inode pointer to inode.
6) Point the mnt_sb pointer in mnt to sb, point the mnt_root and mnt_mountpoint pointers to dentry, and point the mnt_parent pointer to mnt itself.
Thus, when do_kern_mount() returns, the data structures allocated above and the rootfs file system are related as shown in the figure.
The numbers below the mnt, sb, inode and dentry blocks indicate the order in which they are allocated in memory. For reasons of space, only some members of each structure are shown; readers can trace through the source code alongside the diagram to deepen their understanding.
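The links produced by steps 1) to 6) can be mimicked in user space. The following is only a toy model under stated assumptions: the toy_* structures and functions are hypothetical, they keep just the pointer members named in the text, and plain calloc() stands in for the kernel allocators.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy versions of the four structures; only the linking members are kept. */
struct toy_sb     { int s_placeholder; };
struct toy_inode  { struct toy_sb *i_sb; };
struct toy_dentry { struct toy_sb *d_sb; struct toy_inode *d_inode; };
struct toy_mnt {
    struct toy_sb     *mnt_sb;
    struct toy_dentry *mnt_root;
    struct toy_dentry *mnt_mountpoint;
    struct toy_mnt    *mnt_parent;
};

/* Allocate mnt, sb, inode and dentry in that order and link them. */
struct toy_mnt *toy_mount_root(void)
{
    struct toy_mnt    *mnt    = calloc(1, sizeof(*mnt));     /* step 1 */
    struct toy_sb     *sb     = calloc(1, sizeof(*sb));      /* step 2 */
    struct toy_inode  *inode  = calloc(1, sizeof(*inode));   /* step 4 */
    struct toy_dentry *dentry = calloc(1, sizeof(*dentry));  /* step 5 */

    if (!mnt || !sb || !inode || !dentry) {
        free(mnt); free(sb); free(inode); free(dentry);
        return NULL;
    }
    inode->i_sb = sb;
    dentry->d_sb = sb;
    dentry->d_inode = inode;
    mnt->mnt_sb = sb;                /* step 6 */
    mnt->mnt_root = dentry;
    mnt->mnt_mountpoint = dentry;
    mnt->mnt_parent = mnt;           /* the root mount is its own parent */
    return mnt;
}

/* Release everything allocated by toy_mount_root(). */
void toy_umount_root(struct toy_mnt *mnt)
{
    assert(mnt != NULL);
    free(mnt->mnt_root->d_inode);
    free(mnt->mnt_root);
    free(mnt->mnt_sb);
    free(mnt);
}
```

After toy_mount_root() returns, the root dentry, its inode and the mount all reach the same super block, and mnt_parent points back at the mount itself, which is the property that marks the root of the mount tree in this model.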
Finally, init_mount_tree() prepares the namespace field in the process data block of the system's first process (that is, the init_task process). The main purpose is to record the mnt and dentry created in do_kern_mount() in the process data block of init_task, so that all processes later derived from init_task inherit this information. Why this is done will become clear later, when a directory is created in the VFS with sys_mkdir. The main code that sets up the namespace for a process is as follows:
namespace = kmalloc(sizeof(*namespace), GFP_KERNEL);
list_add(&mnt->mnt_list, &namespace->list);  /* mnt is returned by do_kern_mount() */
namespace->root = mnt;
init_task.namespace = namespace;
for_each_task(p) {
    get_namespace(namespace);
    p->namespace = namespace;
}
set_fs_pwd(current->fs, namespace->root, namespace->root->mnt_root);
set_fs_root(current->fs, namespace->root, namespace->root->mnt_root);
The last two lines record the mnt and dentry created in do_kern_mount() in the fs structure of the current process.
The many data structures described above serve one ultimate goal: building a VFS directory tree in memory. More precisely, init_mount_tree() sets up the root directory "/" for the VFS. Once there is a root, the tree can grow; for example, the sys_mkdir system call can create new leaf nodes in it. For this reason the system designers mounted the rootfs file system on the root directory of the tree. As for rootfs itself, if readers look back at its file_system_type structure in Figure 2, they will find that the function pointer member read_super points to ramfs_read_super. From the "ramfs" in the function name alone, readers can probably guess that the file operations of this file system act on data objects in memory, and that is indeed the case. From another perspective, since the VFS itself is a data object in memory, it is only logical that operations on it are confined to memory. The following section uses a concrete example to show how the functions provided by rootfs are used to add a new directory node to the VFS tree.
The main purpose of each directory in the VFS is to provide a mount point for file systems mounted later. Real file operations are therefore carried out through the function interfaces provided by the mounted file system.
4. The establishment of the VFS directory
To better understand the VFS, let's look at a practical example of how Linux creates a new directory "/dev" under the VFS root directory.
To create a new directory in the VFS, we first have to search the directory tree for information about the parent of the directory to be created, because a directory cannot exist without its parent. For example, to create the directory /home/ricard, the search follows the path: it starts at the root directory, finds home under the root, and then reaches the name of the new directory, ricard. The first step, then, is a directory search that locates the parent of the new directory ricard, that is, the information for the home directory.
Of course, if the search runs into an error, for example the parent directory of the new directory does not exist or the current process lacks the required permissions, the system calls the relevant error-handling routines. This article does not cover those cases.
Linux uses the system call sys_mkdir to add new nodes to the VFS directory tree. The following data structure is introduced to support the path search:
struct nameidata {
    struct dentry *dentry;
    struct vfsmount *mnt;
    struct qstr last;
    unsigned int flags;
    int last_type;
};
This data structure records relevant information during the path search and acts like a signpost. Of the first two members, dentry records information about the parent directory of the directory to be created; the mnt member is explained below. The last three members record information about the final node of the path (that is, the directory or file to be created). Now suppose sys_mkdir("/dev", 0700) is called to create the directory "/dev". We can ignore the parameter 0700 here; it simply specifies the mode of the new directory. sys_mkdir first calls path_lookup("/dev", LOOKUP_PARENT, &nd) to look up the path, where nd is a variable declared as struct nameidata nd. Because the function call chains are intricate, the following description highlights the main flow and does not strictly follow the actual call relationships.
path_lookup finds that "/dev" starts with "/", so it starts the search from the root directory of the current process, with the following code:
nd->mnt = mntget(current->fs->rootmnt);
nd->dentry = dget(current->fs->root);
Recall that the second half of init_mount_tree() recorded the newly created VFS root information in the process data block of init_task. So in this scenario, nd->mnt points to the mnt variable in Figure 3 and nd->dentry points to the dentry variable in Figure 3.
Next, path_walk is called to continue the search downward. When it finishes, the information returned in nd is nd.last.name = "dev", nd.last.len = 3 and nd.last_type = LAST_NORM; the mnt and dentry members of nd keep the values set above. This pass only records information in nd. The actual creation of the directory has not started yet; the work so far merely gathers the information needed to create the new node.
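To make the contents of nd.last concrete, here is a small user-space sketch that extracts the final component of a path as a (name, len) pair. It only models the result of a parent lookup, not the way path_walk steps through the components; struct toy_qstr and toy_last_component() are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Minimal counterpart of struct qstr: a name that is not NUL-terminated. */
struct toy_qstr {
    const char *name;
    size_t len;
};

/*
 * Find the last component of `path`, ignoring trailing slashes.
 * Returns 0 and fills `last` on success, -1 if there is no component
 * (an empty string or a path made only of slashes).
 */
int toy_last_component(const char *path, struct toy_qstr *last)
{
    size_t end, start;

    assert(path != NULL && last != NULL);
    end = strlen(path);
    while (end > 0 && path[end - 1] == '/')
        end--;                       /* drop trailing slashes */
    if (end == 0)
        return -1;
    start = end;
    while (start > 0 && path[start - 1] != '/')
        start--;                     /* back up to the previous slash */
    last->name = path + start;
    last->len = end - start;
    return 0;
}
```

For "/dev" the function reports a name starting at "dev" with length 3, matching the values quoted above for nd.last.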
Now the actual creation of the new directory node begins. It is done by the function lookup_create, which is called with two parameters: lookup_create(&nd, 1). The parameter nd is the variable mentioned above, and the parameter 1 indicates that a new directory is to be created.
Roughly, the process first allocates memory for a new struct dentry that records the information for the dev directory. This dentry is attached to the dentry of its parent directory "/" through a linked list. Next, a struct inode is allocated. The i_sb member of the inode and the d_sb member of the dentry both point to sb. So no new super block structure needs to be allocated when a directory is created within the same file system: the directories all belong to the same file system, and a file system corresponds to exactly one super block.
Thus, when sys_mkdir has successfully created the new directory "/dev" in the VFS directory tree, the relationships between the new data structures are as shown in the figure. The two darker rectangles, new_inode and new_entry, are the memory structures newly allocated in sys_mkdir(). The mnt, sb, dentry and inode structures in the figure are the same data structures as before, and the links between them are unchanged. (To avoid cluttering the figure with link curves, some links are omitted, such as those between mnt and sb or dentry; readers can find them in the earlier figure.)
It should be emphasized that, since the rootfs file system has been mounted on the VFS tree, it necessarily takes part in sys_mkdir. In fact, rootfs functions such as ramfs_mkdir and ramfs_lookup are called during the process.
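The point that a new directory gets its own dentry and inode but shares the parent's super block can be sketched in user space as follows. This is a toy model, not ramfs_mkdir: the toy_* names, the fixed 32-byte name buffer and the singly linked child list are assumptions made for illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct toy_sb    { int s_placeholder; };
struct toy_inode { struct toy_sb *i_sb; };
struct toy_dentry {
    char name[32];
    struct toy_sb     *d_sb;
    struct toy_inode  *d_inode;
    struct toy_dentry *d_parent;
    struct toy_dentry *d_child;     /* first child */
    struct toy_dentry *d_sibling;   /* next entry in the parent's child list */
};

/* Look up a direct child of `dir` by name; NULL if absent. */
struct toy_dentry *toy_lookup(struct toy_dentry *dir, const char *name)
{
    struct toy_dentry *d;

    for (d = dir->d_child; d != NULL; d = d->d_sibling)
        if (strcmp(d->name, name) == 0)
            return d;
    return NULL;
}

/* Create `name` under `parent`; NULL if it exists, is too long, or on OOM. */
struct toy_dentry *toy_mkdir(struct toy_dentry *parent, const char *name)
{
    struct toy_dentry *d;
    struct toy_inode *inode;

    assert(parent != NULL && name != NULL);
    if (strlen(name) >= sizeof(d->name) || toy_lookup(parent, name) != NULL)
        return NULL;
    d = calloc(1, sizeof(*d));
    inode = calloc(1, sizeof(*inode));
    if (d == NULL || inode == NULL) {
        free(d);
        free(inode);
        return NULL;
    }
    strcpy(d->name, name);
    inode->i_sb = parent->d_sb;     /* same file system, same super block */
    d->d_sb = parent->d_sb;
    d->d_inode = inode;
    d->d_parent = parent;
    d->d_sibling = parent->d_child; /* push onto the parent's child list */
    parent->d_child = d;
    return d;
}
```

No super block is allocated in toy_mkdir(); the new dentry and inode simply copy the parent's pointer, which mirrors the observation that one file system corresponds to one super block.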
5. Mount the file system in the VFS tree
This section describes the process of mounting a file system on one of the directories (the mount point) in the VFS directory tree.
This process can be described simply as mounting a file system (file_system_type) that resides on a device (dev_name) on a mount point (dir_name) in the VFS directory tree. What it has to solve is how to translate operations on a directory in the VFS directory tree into the corresponding operations of the actual file system mounted on it. For example, suppose the root file system on hda2 (assume its type is ext2) is mounted on the "/dev" directory created in the previous section (so "/dev" becomes the mount point). The mount should then achieve the following: running the "ls" command on the "/dev" directory of the VFS lists all directories and files in the root directory of the ext2 file system on hda2. Clearly, the key is how to translate directory operations on "/dev" in the VFS tree into the corresponding operations of the ext2 file system mounted on it. The following discussion therefore concentrates on this core question of translation. Before reading on, readers may want to think about how Linux might solve this problem. Recall that operations on a directory or file are ultimately performed by functions in the function tables pointed to by i_op and i_fop in the inode of that directory or file. So whatever the final solution looks like, one can expect that calls through i_op and i_fop of the inode for "/dev" are converted into calls through i_op and i_fop of the inode for the root directory of the ext2 file system on hda2.
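The idea of dispatching through a per-inode function table can be shown with a minimal user-space sketch. The table here has a single made-up operation that reports which file system owns the inode; the real i_op and i_fop tables contain many operations, and all toy_* names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

enum toy_fs_id { TOY_RAMFS = 1, TOY_EXT2 = 2 };

struct toy_inode;

/* Table of operations, in the spirit of the i_op / i_fop tables. */
struct toy_dir_ops {
    enum toy_fs_id (*owner)(const struct toy_inode *inode);
};

struct toy_inode {
    const struct toy_dir_ops *i_op;
};

/* One implementation per file system. */
enum toy_fs_id toy_ramfs_owner(const struct toy_inode *inode)
{
    (void)inode;
    return TOY_RAMFS;
}

enum toy_fs_id toy_ext2_owner(const struct toy_inode *inode)
{
    (void)inode;
    return TOY_EXT2;
}

const struct toy_dir_ops toy_ramfs_ops = { toy_ramfs_owner };
const struct toy_dir_ops toy_ext2_ops  = { toy_ext2_owner };

/* Generic layer: it never names a concrete file system, only the table. */
enum toy_fs_id toy_vfs_owner(const struct toy_inode *inode)
{
    assert(inode != NULL && inode->i_op != NULL);
    return inode->i_op->owner(inode);
}
```

The generic function toy_vfs_owner() behaves differently depending solely on which inode it is handed, so redirecting an operation from one file system to another amounts to handing the generic layer a different inode.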
The process is initiated by the sys_mount() system call, which is declared as follows:
asmlinkage long sys_mount(char * dev_name, char * dir_name, char * type,
                          unsigned long flags, void * data);
The parameter char *type is a string identifying the type of the file system to be mounted; for the ext2 file system it is "ext2". The parameter flags holds the mode flags for the mount; like the data parameter that follows, it is not a focus of this article.
To help readers follow the process, we use a concrete example: the ext2 file system on the second partition (hda2) of the primary hard disk is mounted on the "/dev" directory created earlier. The call to sys_mount() then looks like this:
sys_mount("hda2", "/dev", "ext2", ...);
After copying the parameters from user space to kernel space, the function calls do_mount() to start the actual mounting of the file system. Again, to keep the main flow clear, the following description does not strictly follow the details of the function calls.
do_mount() first calls path_lookup() to obtain information about the mount point, as described for directory creation. The information is ultimately recorded in a variable of type struct nameidata, which we call nd for convenience. In this example, when path_lookup() returns, nd holds the following: nd.dentry = new_entry; nd.mnt = mnt, where the variables are those shown in Figures 3 and 4.
do_mount() then calls one of the following four functions, depending on the flags parameter: do_remount(), do_loopback(), do_move_mount() or do_add_mount().
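The four-way choice can be sketched as a plain flag test. The mapping below (remount flag to do_remount(), bind flag to do_loopback(), move flag to do_move_mount(), anything else to do_add_mount()) and the order of the tests reflect my reading of the 2.4-era do_mount() and should be checked against the source; the TOY_* names are hypothetical, and the numeric values are copied from the MS_REMOUNT, MS_BIND and MS_MOVE constants of the Linux headers for illustration only.

```c
#include <assert.h>

/* Illustrative copies of MS_REMOUNT, MS_BIND and MS_MOVE. */
#define TOY_MS_REMOUNT 32UL
#define TOY_MS_BIND    4096UL
#define TOY_MS_MOVE    8192UL

enum toy_mount_action {
    TOY_DO_REMOUNT,
    TOY_DO_LOOPBACK,
    TOY_DO_MOVE_MOUNT,
    TOY_DO_ADD_MOUNT
};

/* Decide which helper a mount request with these flags would be routed to. */
enum toy_mount_action toy_pick_mount_action(unsigned long flags)
{
    if (flags & TOY_MS_REMOUNT)
        return TOY_DO_REMOUNT;
    if (flags & TOY_MS_BIND)
        return TOY_DO_LOOPBACK;
    if (flags & TOY_MS_MOVE)
        return TOY_DO_MOVE_MOUNT;
    return TOY_DO_ADD_MOUNT;        /* an ordinary new mount */
}

/* Name of the kernel helper that corresponds to an action. */
const char *toy_action_name(enum toy_mount_action action)
{
    static const char *const names[] = {
        "do_remount", "do_loopback", "do_move_mount", "do_add_mount"
    };

    assert((int)action >= 0 && (int)action <= (int)TOY_DO_ADD_MOUNT);
    return names[action];
}
```

A request with no special flags, such as the "hda2" on "/dev" example, falls through to TOY_DO_ADD_MOUNT.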
In our example, the system calls do_add_mount() to mount an actual file system on the mount point "/dev" in the VFS tree. do_add_mount() mainly does two important things: first, it obtains a new installation area block; second, it adds the new installation area block to the installation system linked list. These are done by calling do_kern_mount() and graft_tree(), respectively. Terms such as "installation area block" and "installation system linked list" may sound abstract, but do not worry: they are concepts defined by the author, and a diagram later on explains them.
What do_kern_mount() does is create a new installation area block. The details were described in the earlier section on the establishment of the VFS directory tree and are not repeated here.
What graft_tree() does is add the struct vfsmount variable returned by do_kern_mount() to the installation system linked list. graft_tree() also adds the newly allocated struct vfsmount variable to a hash table, the purpose of which we will see later.
Thus, when do_kern_mount() returns, the relationships between the new data structures, building on Figure 4, are as shown in Figure 5. The data structures inside the red circled area are what we call the installation area block; let e2_mnt be the pointer to it. The blue arrow curves form the so-called installation system linked list.
Having sorted out the relationships between the data structures formed by these calls, let us return to the question raised at the beginning of this section: after the ext2 file system is mounted on "/dev", how are operations on that directory translated into the corresponding operations on the ext2 file system? As Figure 5 shows, the call to sys_mount() does not directly alter the i_op and i_fop pointers in the inode of the "/dev" directory (the new_inode variable in the figure). The dentry for "/dev" (the new_dentry variable) also remains in the VFS directory tree and is not hidden from it. Correspondingly, the e2_entry for the root directory of the ext2 file system on hda2 does not replace new_dentry in the VFS directory tree, as I had originally imagined. So how is the translation achieved?
Please note the following code:
while (d_mountpoint(dentry) && __follow_down(&nd->mnt, &dentry));
This code is called in link_path_walk(), which is in turn called by path_lookup(). Readers who have read the file system part of the Linux source will know that path_lookup() is an important basic function in the sprawling Linux file system code. In short, it parses file path names. A file path name here is the same concept we deal with in applications: for example, when a Linux application opens or reads the file /home/windfly.cs, /home/windfly.cs is the file path name. The job of path_lookup() is to search along the path name until it finds the dentry of the directory that contains the target file, or the dentry of the target itself if the target is a directory. I do not intend to explain the function in detail in this limited space; readers only need to remember that path_lookup() returns a target directory.
The code above is so inconspicuous that it is easily overlooked on a first reading of the file system code, yet the translation from the VFS to the operations of the actual file system is done by it, and mounting file systems in the VFS would not work without it. Let us look at the code more closely. d_mountpoint(dentry) is simple: it just returns the value of the d_mounted member of dentry. The dentry here is still one in the VFS directory tree. If a file system has been mounted once on a directory in the VFS directory tree, the value is 1. A directory in the VFS can be mounted on several times; an example later illustrates this. In our example, d_mounted of the new_dentry for "/dev" is 1, so the first condition of the while loop is satisfied. What does __follow_down(&nd->mnt, &dentry) do next? Recall that at this point the dentry member of nd is new_dentry and the mnt member of nd is mnt, so we can rewrite __follow_down(&nd->mnt, &dentry) as __follow_down(&mnt, &new_dentry). The code of __follow_down() is reproduced here (with some less relevant code removed and sequence numbers added to some lines for ease of reference):
static inline int __follow_down(struct vfsmount **mnt, struct dentry **dentry)
{
    struct vfsmount *mounted;
[1] mounted = lookup_mnt(*mnt, *dentry);
    if (mounted) {
[2]     *mnt = mounted;
[3]     *dentry = mounted->mnt_root;
        return 1;
    }
    return 0;
}
lookup_mnt() in line [1] finds the pointer to the installation area block most recently mounted on a directory of the VFS directory tree; in this example it eventually returns e2_mnt from Figure 5. The search works roughly as follows. Recall that when we mounted the ext2 file system on "/dev", graft_tree() was called towards the end. That function adds the installation area block pointer e2_mnt of Figure 5 to a hash table (called mount_hashtable in the Linux 2.4.20 source code). The key of the entry is generated from the dentry (new_dentry in this example) and the mount (mnt in this example) that correspond to the mount point. So when we know that a file system has been mounted on some dentry in the VFS tree (the dentry has become a mount point) and want to find its most recent installation area block pointer, we generate a key from the same dentry and mount of that mount point and use it to index mount_hashtable. This yields the head of the linked list of installation area block pointers for that mount point. We then traverse the list; a block pointer p is the one we are looking for when it satisfies the following condition:
(p->mnt_parent == mnt && p->mnt_mountpoint == dentry)
p is then the installation area block pointer for the mount point. Once it is found, the mnt member of nd is replaced with this installation area block pointer, and the dentry member of nd is replaced with the dentry pointer in the installation area block. In our example, the e2_mnt->mnt_root member points to e2_dentry, the "/" directory of the ext2 file system. Thus, when path_lookup() has searched "/dev", the dentry member of nd is e2_dentry instead of the original new_dentry, and the mnt member has been replaced with e2_mnt. The translation has taken place without anyone noticing.
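The hash lookup described above can be modeled in user space with a small chained hash table keyed by the (parent mount, mount point dentry) pair. This is only a sketch: the toy_* names, the table size of 16 and the pointer-mixing hash are arbitrary choices for illustration and are not taken from the kernel.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_HASH_SIZE 16

struct toy_dentry { int d_mounted; };
struct toy_mnt {
    struct toy_mnt    *mnt_parent;
    struct toy_dentry *mnt_mountpoint;
    struct toy_dentry *mnt_root;
    struct toy_mnt    *hash_next;   /* chain inside one bucket */
};

/* The table playing the role of mount_hashtable. */
struct toy_mnt *toy_mount_hash[TOY_HASH_SIZE];

/* Mix the two pointers into a bucket index (an arbitrary toy hash). */
unsigned toy_hash(const struct toy_mnt *mnt, const struct toy_dentry *dentry)
{
    uintptr_t h = (uintptr_t)mnt / sizeof(void *);

    h += (uintptr_t)dentry / sizeof(void *);
    h ^= h >> 4;
    return (unsigned)(h % TOY_HASH_SIZE);
}

/* Record that `child` is mounted on `mountpoint` inside mount `parent`. */
void toy_attach(struct toy_mnt *child, struct toy_mnt *parent,
                struct toy_dentry *mountpoint)
{
    unsigned bucket;

    assert(child != NULL && parent != NULL && mountpoint != NULL);
    child->mnt_parent = parent;
    child->mnt_mountpoint = mountpoint;
    bucket = toy_hash(parent, mountpoint);
    child->hash_next = toy_mount_hash[bucket];
    toy_mount_hash[bucket] = child;
    mountpoint->d_mounted++;
}

/* Find the mount attached at (mnt, dentry), or NULL if there is none. */
struct toy_mnt *toy_lookup_mnt(struct toy_mnt *mnt, struct toy_dentry *dentry)
{
    struct toy_mnt *p;

    for (p = toy_mount_hash[toy_hash(mnt, dentry)]; p != NULL; p = p->hash_next)
        if (p->mnt_parent == mnt && p->mnt_mountpoint == dentry)
            return p;
    return NULL;
}
```

The comparison inside toy_lookup_mnt() is the same condition quoted above; the hash only narrows the search to one bucket, and the pair of pointers decides the match.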
Now consider mounting on the same mount point several times. As an example, assume that after mounting an ext2 file system on "/dev", we mount an NTFS file system on it as well. Before the mount, path_lookup() is again called on the path of the mount point. This time, because the ext2 file system is already mounted on "/dev", the information returned in nd is nd.dentry = e2_dentry, nd.mnt = e2_mnt. So for the second mount, the mount point has changed from dentry to e2_dentry. Next, as before, the system allocates an installation area block; assume the pointer to it is ntfs_mnt and the dentry in the block is ntfs_dentry. The parent pointer of ntfs_mnt points to e2_mnt, and mnt_root in ntfs_mnt points to ntfs_dentry, which represents the root directory of the NTFS file system. The system then generates a new hash key from e2_dentry and e2_mnt, uses it as an index to add ntfs_mnt to mount_hashtable, and sets the d_mounted member of e2_dentry to 1. This completes the mount.
As readers may already know, the most recent mount on a mount point hides the earlier mounts on it. The following example explains how:
After mounting the ext2 and NTFS file systems on "/dev", we call path_lookup() again to search "/dev". The function first finds the dentry and mnt for the mount point "/dev" in the VFS directory tree. It sees that the d_mounted member of the dentry is 1, so it knows that a file system is mounted on this dentry. It generates a hash value from dentry and mnt and uses it to search mount_hashtable. Given the mount process described above, it finds and returns the e2_mnt pointer, and the original dentry is replaced with e2_dentry. Look again at the code mentioned earlier: while (d_mountpoint(dentry) && __follow_down(&nd->mnt, &dentry)); After the first iteration, nd->mnt is e2_mnt and dentry is e2_dentry. Because the d_mounted member of e2_dentry is 1, the first condition of the while loop holds and __follow_down() is called again. This function was analyzed above; when it returns, nd->mnt is ntfs_mnt and dentry is ntfs_dentry. Since nothing has been mounted on ntfs_dentry, its d_mounted member is 0 and the loop ends. The path_lookup() call for "/dev" thus eventually returns the dentry of the root directory of the NTFS file system. This is why "/dev" itself and the ext2 file system mounted on it are hidden. If you run the ls command on "/dev" at this point, it returns all files and directories in the root directory of the mounted NTFS file system.
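The two-hop walk from "/dev" through ext2 to NTFS can be reproduced with a toy model. As a deliberate simplification, each dentry here stores a direct pointer to the mount stacked on it instead of a d_mounted counter plus a hash lookup; the toy_* names and the mounted_here member are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

struct toy_mnt;

struct toy_dentry {
    const char *label;
    struct toy_mnt *mounted_here;   /* mount stacked on this dentry, or NULL */
};

struct toy_mnt {
    struct toy_mnt    *mnt_parent;
    struct toy_dentry *mnt_root;
};

/* Stack `child` on top of `mountpoint`, which lives in mount `parent`. */
void toy_stack_mount(struct toy_mnt *child, struct toy_mnt *parent,
                     struct toy_dentry *mountpoint)
{
    assert(child != NULL && parent != NULL && mountpoint != NULL);
    assert(mountpoint->mounted_here == NULL);
    child->mnt_parent = parent;
    mountpoint->mounted_here = child;
}

/*
 * Counterpart of the while loop: keep descending as long as the current
 * dentry is a mount point. Returns the number of hops taken.
 */
int toy_follow_all(struct toy_mnt **mnt, struct toy_dentry **dentry)
{
    int hops = 0;

    while ((*dentry)->mounted_here != NULL) {
        *mnt = (*dentry)->mounted_here;   /* like *mnt = mounted */
        *dentry = (*mnt)->mnt_root;       /* like *dentry = mounted->mnt_root */
        hops++;
    }
    return hops;
}
```

Starting from the "/dev" dentry with ext2 and then NTFS stacked on it, toy_follow_all() takes two hops and ends on the NTFS root, so the "/dev" dentry and the ext2 root are both passed over, which is the hiding effect described above.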
6. Installing the root file system
With the foundation laid in the previous section, mounting the root file system under Linux is not hard to understand, because the principle of mounting a file system on a mount point in the VFS is always the same.
The process is roughly: first determine the source of the ext2 file system to be mounted, then determine its mount point in the VFS, and finally carry out the actual mount.
For the first question, the Linux 2.4.20 kernel contains a good deal of code. For reasons of space I will not go into the details here; just remember that its job is to work out where to find the file system to mount. Let us assume that the root file system to be mounted comes from the first partition, hda1, of the primary hard disk.
For the second question, the Linux 2.4.20 kernel mounts the ext2 file system from hda1 on the "/root" directory in the VFS directory tree. In fact, it does not matter which directory of the VFS tree (other than the VFS root directory) the ext2 file system is mounted on, as long as the mount point exists in the VFS tree; the kernel has no other use for it. If readers like, they can create a "/windows" directory in the VFS and mount the ext2 file system there as the root directory of future user processes; nothing prevents that. The crux is setting the root directory and the current working directory of the process, because only user processes care about the real file system. After all, this article itself has to be saved to a hard disk.
Under Linux, the current working directory of a process is set with the system call sys_chdir(). During initialization, after Linux has mounted the ext2 file system on hda1 on "/root", it calls sys_chdir("/root") to set the current working directory (pwd) of the current process, that is, the init_task process, to the root directory of the ext2 file system. Remember that at this point the root directory of the init_task process is still the dentry in Figure 3, that is, the root directory of the VFS tree. This is clearly unacceptable: all processes in the Linux world are derived from init_task and, without exception, inherit its root directory. If it stayed this way, a user process searching from the root directory would actually start at the root of the VFS, when it should start from the root directory of the ext2 file system. This contradiction is resolved by the following two system calls, made after mount_root() has been called:
sys_mount(".", "/", NULL, MS_MOVE, NULL);
sys_chroot(".");
Their main purpose is to switch the root directory of the init_task process to the root directory of the mounted ext2 file system. Interested readers can study the process on their own.
So in most cases user space sees only a single leaf of the VFS tree, and one that already has a file system mounted on it; the VFS tree itself remains effectively invisible to user space. I think the VFS is used mainly by the kernel to implement its own functionality, which it offers to user processes through system calls; mounting different file systems is only one of its functions.
Application-level development does not need to be concerned with the implementation details of the VFS source code; it only needs to know the file system interface functions that the VFS exposes.
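From the application side, the interface looks the same whatever file system sits underneath. The following sketch uses only standard POSIX calls (mkdtemp, mkdir, stat, rmdir); the function name demo_mkdir_roundtrip() and the scratch directory name are made up for the example, and it assumes that TMPDIR or /tmp is writable.

```c
#define _XOPEN_SOURCE 700   /* expose mkdtemp() in <stdlib.h> */
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Create a scratch directory, make a "dev" subdirectory in it with mode
 * 0700, confirm via stat() that it is a directory, then clean up.
 * Returns 0 on success and -1 on any failure.
 */
int demo_mkdir_roundtrip(void)
{
    const char *tmp = getenv("TMPDIR");
    char base[1024];
    char path[1100];
    struct stat st;
    int n, ok;

    assert(sizeof(path) > sizeof(base) + 4);    /* room for "/dev" */
    if (tmp == NULL || tmp[0] == '\0')
        tmp = "/tmp";
    n = snprintf(base, sizeof(base), "%s/vfs_demo_XXXXXX", tmp);
    if (n < 0 || (size_t)n >= sizeof(base))
        return -1;
    if (mkdtemp(base) == NULL)
        return -1;
    snprintf(path, sizeof(path), "%s/dev", base);

    ok = mkdir(path, 0700) == 0 &&
         stat(path, &st) == 0 &&
         S_ISDIR(st.st_mode);
    rmdir(path);                    /* harmless if mkdir() failed */
    if (rmdir(base) != 0)
        ok = 0;
    return ok ? 0 : -1;
}
```

Nothing in this code depends on whether the scratch directory lives on ext2, a memory-backed file system, or anything else; the kernel resolves the path and routes each call to the file system mounted there.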
Reference
http://www.ibm.com/developerworks/cn/views/linux/libraryview.jsp
Compiled from online sources. If you reproduce this article, please cite the source: http://blog.csdn.net/suool/article/details/38172057