Introduction to the Linux Virtual File System (VFS)

Source: Internet
Author: User

1. Generic file model

The Linux kernel supports many different file system types, and each file system manages files in its own way. The standard file systems in Linux are the Ext family. Developers, of course, cannot be expected to use a different file access method for each file system they work with; that would run counter to the operating system's role as an abstraction mechanism.

To support these various file systems, the Linux kernel introduces an abstraction layer between the user process (or the C standard library) and the specific file system implementation. This layer is called the "virtual file system" (VFS).

The VFS provides a unified way to manipulate files, directories, and other objects, so that user processes do not need to know the details of the underlying file system. On the other hand, the methods provided by the VFS have to be a compromise with the implementations of the specific file systems; after all, it is not easy to manage dozens of file system types in a uniform manner.

To this end, the VFS defines a generic file model that gives a unified view of the objects (files) in any file system.
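
Seen from user space, the unified view means that one code path can read a file no matter which file system holds it. The following is a minimal sketch, not kernel code; read_prefix is a hypothetical helper name introduced only for illustration.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Read up to n bytes from the start of path into buf.
 * Returns the number of bytes read, or -1 on error.
 * read_prefix is a hypothetical helper, not part of any library.
 */
ssize_t read_prefix(const char *path, char *buf, size_t n)
{
    int fd = open(path, O_RDONLY);   /* same call for every file system */
    if (fd < 0)
        return -1;

    ssize_t got = read(fd, buf, n);  /* the VFS routes this to the right file system */
    close(fd);
    return got;
}
```

The caller passes only a path; whether that path lives on an Ext file system, a ramfs, or a Squashfs image makes no difference to the code.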

Linux supports the Ext family best because the VFS abstraction layer closely resembles the Ext file systems. This improves performance when working with Ext, because almost no time is lost converting between the Ext and VFS representations.

The kernel handles files through inodes. Each file (and directory) has exactly one corresponding inode (a struct inode instance), which contains the metadata and pointers to the file data; the inode does not contain the file name. Every inode has a number that uniquely identifies it within its file system. The file name can be changed at any time, but the inode is unique to the file and exists for as long as the file exists.

For each mounted file system, the VFS creates a super block structure (a struct super_block instance) in the kernel. It represents a mounted file system and stores its control information, such as the file system type, its size, the list of all inode objects, the list of dirty inodes, and so on.
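
User space cannot read struct super_block directly, but the POSIX statvfs() call reports per-file-system values of the kind a super block holds. The sketch below is one way to collect a few of them; the fs_summary structure and fs_summarize function are hypothetical names chosen for this example.

```c
#include <sys/statvfs.h>

/* A few per-file-system values; this struct is a hypothetical helper type. */
struct fs_summary {
    unsigned long block_size;         /* f_bsize: file system block size */
    unsigned long long total_blocks;  /* f_blocks: size in f_frsize units */
    unsigned long long total_inodes;  /* f_files: number of inodes */
    unsigned long max_name_len;       /* f_namemax: longest file name */
};

/*
 * Fill *out for the file system that contains path.
 * Returns 0 on success, -1 on error.
 */
int fs_summarize(const char *path, struct fs_summary *out)
{
    struct statvfs sv;

    if (statvfs(path, &sv) != 0)
        return -1;

    out->block_size = sv.f_bsize;
    out->total_blocks = sv.f_blocks;
    out->total_inodes = sv.f_files;
    out->max_name_len = sv.f_namemax;
    return 0;
}
```

Calling fs_summarize("/", &s) and then fs_summarize("/tmp", &s) on a system where /tmp is a separate mount would typically return different values, one set per mounted file system.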

The in-memory inode and super block are backed by counterparts on the storage medium, which also holds a super block and inodes. Because file system types differ, however, the on-disk layout of the super block and the inodes varies. The role of the VFS is to read the super block and inodes of a file system through the specific device driver and then fill in the kernel's struct super_block and struct inode with that information, so that different file systems can be managed in a uniform way.

Because block devices are slow compared with memory, finding the inode associated with a file name can take a long time. Linux therefore uses the directory entry (dentry) cache to give fast access to the results of earlier lookups. After the VFS has read the data for a directory or file, it creates a dentry instance (struct dentry) to cache what it found.

The main purpose of the dentry structure is to link a file name to its associated inode. The dentry objects of a file system are kept in a hash table; dentry objects that are no longer in use are placed on an LRU list that the super block points to, and at some point the oldest objects are deleted to free memory.
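
The caching idea can be illustrated with a small user-space toy: a fixed-size name-to-inode cache that evicts the least recently used entry when it is full. This is a deliberately simplified sketch and not the kernel's implementation: it scans linearly instead of hashing, and all names (toy_dentry, toy_dcache, dcache_lookup, dcache_add) are hypothetical.

```c
#include <string.h>

#define CACHE_SLOTS 4

struct toy_dentry {
    char name[32];            /* file name */
    unsigned long ino;        /* inode number the name maps to */
    unsigned long last_used;  /* logical timestamp for LRU ordering */
    int in_use;
};

struct toy_dcache {
    struct toy_dentry slot[CACHE_SLOTS];
    unsigned long clock;      /* incremented on every access */
};

void dcache_init(struct toy_dcache *c)
{
    memset(c, 0, sizeof(*c));
}

/* Return the cached inode number for name, or 0 on a cache miss. */
unsigned long dcache_lookup(struct toy_dcache *c, const char *name)
{
    for (int i = 0; i < CACHE_SLOTS; i++) {
        struct toy_dentry *d = &c->slot[i];
        if (d->in_use && strcmp(d->name, name) == 0) {
            d->last_used = ++c->clock;  /* a hit makes the entry "young" again */
            return d->ino;
        }
    }
    return 0;
}

/* Cache name -> ino, evicting the least recently used entry if full. */
void dcache_add(struct toy_dcache *c, const char *name, unsigned long ino)
{
    struct toy_dentry *victim = &c->slot[0];

    for (int i = 0; i < CACHE_SLOTS; i++) {
        struct toy_dentry *d = &c->slot[i];
        if (!d->in_use) {               /* a free slot always wins */
            victim = d;
            break;
        }
        if (d->last_used < victim->last_used)
            victim = d;                 /* remember the oldest entry so far */
    }

    strncpy(victim->name, name, sizeof(victim->name) - 1);
    victim->name[sizeof(victim->name) - 1] = '\0';
    victim->ino = ino;
    victim->in_use = 1;
    victim->last_used = ++c->clock;
}
```

A caller would try dcache_lookup() first and fall back to the slow on-disk lookup only on a miss, adding the result with dcache_add() afterwards.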

Two more data structures deserve a brief mention:

Each file system type registered with the kernel is represented by a struct file_system_type structure, and each file system type has a linked list of all the super blocks that belong to file systems of that type.

When a file system is mounted into the kernel's directory tree, a mount point is created to manage the information about the mounted file system. The mount point is represented by a struct vfsmount structure, which is discussed further below.

The relationships among the structures above are broadly as follows:

[Figure: relationships among the VFS data structures. The lists drawn in red are global linked lists in the kernel.]

2. Mounting the file system

A user program mounts a file system with the mount system call and unmounts it with umount. The kernel must, of course, support the type of file system being mounted. A specific file system type can be registered with the kernel at boot time or when a kernel module is loaded; the registration function is register_filesystem().
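
The registration step can be pictured with a user-space toy model: a singly linked list of file system types in which a name may appear only once. This is a sketch under stated assumptions, not kernel code. The real register_filesystem() operates on struct file_system_type and takes locks; every toy_ name below is hypothetical.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

struct toy_fs_type {
    const char *name;          /* e.g. "squashfs" or "ramfs" */
    struct toy_fs_type *next;  /* next registered type */
};

static struct toy_fs_type *toy_fs_list;  /* head of the registry */

/*
 * Return the address of the link that points at the type called name,
 * or the address of the terminating NULL link if name is not registered.
 */
static struct toy_fs_type **toy_find_fs(const char *name)
{
    struct toy_fs_type **p;

    for (p = &toy_fs_list; *p; p = &(*p)->next)
        if (strcmp((*p)->name, name) == 0)
            break;
    return p;
}

/* Register fs; fails with -EBUSY if the name is already taken. */
int toy_register_filesystem(struct toy_fs_type *fs)
{
    struct toy_fs_type **p = toy_find_fs(fs->name);

    if (*p)
        return -EBUSY;
    fs->next = NULL;
    *p = fs;                   /* append at the end of the list */
    return 0;
}

/* Remove fs from the registry; fails with -EINVAL if it is not there. */
int toy_unregister_filesystem(struct toy_fs_type *fs)
{
    struct toy_fs_type **p;

    for (p = &toy_fs_list; *p; p = &(*p)->next) {
        if (*p == fs) {
            *p = fs->next;
            fs->next = NULL;
            return 0;
        }
    }
    return -EINVAL;
}
```

In this model a mount request for type "ramfs" would start with toy_find_fs("ramfs") and fail if the type had never been registered, which mirrors the requirement that the kernel support the file system type.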

The most common form of the mount command is: mount [-t fstype] something somewhere

Here something is the device or directory to be mounted, and somewhere is the location at which to mount it. The -t option gives the type of the file system being mounted. Because something refers to a known device, the type of the file system on it is already fixed, so the -t option must be set correctly for the mount to succeed.

Each mounted file system corresponds to an instance of the vfsmount structure.

Because mounting adds mount points to the kernel's file system tree, mount points have a parent-child relationship, similar to that between a parent directory and a subdirectory. For example, suppose the root file system is of type Squashfs and is mounted on the root directory "/", which creates one mount point. If a ramfs file system is then mounted on the /tmp directory, the tmp directory of the root file system gets a second mount point, and the two are parent and child. This relationship is stored in the struct vfsmount structure.
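
On a running Linux system the parent-child relationship is visible in /proc/self/mountinfo, where each line begins with the mount ID and the ID of the parent mount. The sketch below parses one such line; the mount_entry structure and parse_mountinfo_line function are hypothetical names, and only the first five fields of the line are examined.

```c
#include <stdio.h>

/* The fields of interest from one mountinfo line (hypothetical helper type). */
struct mount_entry {
    int id;                 /* field 1: mount ID */
    int parent_id;          /* field 2: ID of the parent mount */
    char mount_point[256];  /* field 5: mount point path */
};

/*
 * Parse one line in /proc/self/mountinfo format.
 * Fields 3 (major:minor) and 4 (root within the file system) are skipped.
 * Returns 0 on success, -1 if the line does not match.
 */
int parse_mountinfo_line(const char *line, struct mount_entry *out)
{
    int n = sscanf(line, "%d %d %*s %*s %255s",
                   &out->id, &out->parent_id, out->mount_point);

    return n == 3 ? 0 : -1;
}
```

Reading the file line by line with fgets() and passing each line to this function yields (id, parent_id) pairs from which the mount tree can be rebuilt: a mount is the child of the entry whose id equals its parent_id.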

In the example, the root file system is Squashfs and is mounted on the root directory "/". The /tmp directory is then created and a ramfs is mounted on it. Next, the two directories /tmp/usbdisk/volume9 and /tmp/usbdisk/volume1 are created, and the two partitions /tmp/dev/sda1 and /tmp/dev/sdb1 are mounted on them. The /tmp/dev/sda1 device contains the following files:

gccbacktrace/
    ----> gcc_backtrace.c
    ----> man_page.log
    ----> readme.txt
notes-fs.txt
smb.conf

[Figure: relationships among the related VFS data structures after the mounts are complete.]

In the kernel, the entry point of the mount system call is the sys_mount function, which copies the mount options from user space and then calls do_mount() to perform the mount. That function reads the super block and inode information through the specific file system, builds the VFS data structures, and establishes the relationships between them.
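
From user space, this kernel path is reached through the mount(2) and umount(2) wrappers declared in <sys/mount.h>. The following is a minimal sketch that mounts a ramfs instance on a directory; mount_ramfs and unmount_path are hypothetical helper names, and the calls normally succeed only for a privileged process.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/mount.h>

/*
 * Ask the kernel to mount a new ramfs instance on target.
 * Returns 0 on success or a negative errno value on failure.
 * mount_ramfs is a hypothetical helper name.
 */
int mount_ramfs(const char *target)
{
    /* ramfs has no backing device, so the source string is only a label. */
    if (mount("none", target, "ramfs", 0, NULL) != 0)
        return -errno;
    return 0;
}

/* Unmount whatever is mounted on target; same return convention. */
int unmount_path(const char *target)
{
    if (umount(target) != 0)
        return -errno;
    return 0;
}
```

An unprivileged caller would typically get -EPERM back, and a missing target directory typically yields -ENOENT.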

When another file system is mounted on a directory of the parent file system, the original contents of that directory are hidden. For example, if /tmp/samba/ is non-empty and /tmp/dev/sda1 is then mounted on /tmp/samba, the /tmp/samba/ directory shows only the files on the /tmp/dev/sda1 device until the device is unmounted; the files originally in the directory are not displayed. This is implemented through the mnt_mountpoint and mnt_root members of struct vfsmount, which hold the dentry of the mount point in the parent file system and the root dentry of the mounted (current) file system, respectively. After the current mount is removed, the dentry object of the mounted-on directory in the parent file system becomes reachable again.
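
The hiding effect can be modeled in a few lines of user-space C. This is a toy sketch, not the kernel's algorithm: the kernel marks the dentry as a mount point and finds the mount through a hash lookup, whereas here the dentry simply carries a direct pointer. The toy_ names are hypothetical; the member names mnt_parent, mnt_mountpoint and mnt_root are borrowed from struct vfsmount.

```c
#include <stddef.h>

struct toy_mount;

struct toy_dentry {
    const char *name;
    struct toy_mount *mounted;  /* non-NULL while something is mounted here */
};

struct toy_mount {
    struct toy_mount *mnt_parent;       /* mount we are attached to */
    struct toy_dentry *mnt_mountpoint;  /* directory in the parent file system */
    struct toy_dentry *mnt_root;        /* root directory of this file system */
};

/* Attach child on top of the directory mountpoint inside parent. */
void toy_attach(struct toy_mount *child, struct toy_mount *parent,
                struct toy_dentry *mountpoint)
{
    child->mnt_parent = parent;
    child->mnt_mountpoint = mountpoint;
    mountpoint->mounted = child;
}

/* Detach child; the covered directory becomes visible again. */
void toy_detach(struct toy_mount *child)
{
    child->mnt_mountpoint->mounted = NULL;
    child->mnt_mountpoint = NULL;
    child->mnt_parent = NULL;
}

/* Return the directory a path walk actually sees when it reaches d. */
struct toy_dentry *toy_follow_mount(struct toy_dentry *d)
{
    while (d->mounted)
        d = d->mounted->mnt_root;  /* step into the mounted file system */
    return d;
}
```

While the mount is attached, toy_follow_mount() on the covered directory returns the root of the mounted file system, so the original contents cannot be reached; after toy_detach() the same call returns the original directory.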

3. File system-related information in a process

    struct task_struct {
        ...
        /* filesystem information */
        struct fs_struct *fs;
        /* open file information */
        struct files_struct *files;
        /* namespaces */
        struct nsproxy *nsproxy;
        ...
    };

Here the fs member points to the file system information of the process, such as its current working directory. The files member points to the information about the files the process has opened. The nsproxy member points to the namespaces in which the process resides, including the virtual file system namespace.

The fs member contains dentry information. The files member points to a set of struct file structures, in which a struct path structure links each struct file to its vfsmount and dentry. A struct file holds the characteristics of a file as seen by the kernel, and the list of files opened by the process is stored in the task_struct->files->fd_array[] array and in the fdtable.

The task_struct structure also keeps the file descriptors (fd) of the process's open files, which is what the user process needs: once a user process has opened a file by name, the name is no longer needed and all subsequent operations use the file descriptor. In the kernel, the fget_light() function looks up the struct file object that corresponds to an integer fd. Because each process maintains its own fd table, the fd values of different processes can coincide; for example, standard input, standard output, and standard error are fd 0, 1, and 2 in every process.
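
That the name is no longer needed after open() can be shown from user space: the sketch below opens a file, removes its name, and still reads the data through the descriptor. read_after_unlink is a hypothetical helper name; note that it deliberately deletes the file it is given.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Open path, remove its name, then read up to n bytes through the
 * still-open descriptor. Returns the byte count, or -1 on error.
 * WARNING: this deletes path. Hypothetical helper for illustration.
 */
ssize_t read_after_unlink(const char *path, char *buf, size_t n)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    /* The name disappears, but the open file and its inode live on. */
    if (unlink(path) != 0) {
        close(fd);
        return -1;
    }

    ssize_t got = read(fd, buf, n);  /* only the fd is used from here on */
    close(fd);                       /* last reference: now the inode can go */
    return got;
}
```

The read succeeds because the descriptor refers to the kernel's open file object rather than to the path string.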

The f_mapping member of struct file points to the address space mapping of the inode instance associated with the file and is typically set to inode->i_mapping. Fetching file data from the physical device on every read or write would be very slow, so the kernel allocates an address space for each file, which is in effect the file's data cache. Reads and writes operate only on this cache, and the kernel has a synchronization mechanism that writes dirty pages back to the physical device. The super_block maintains a linked list of dirty inodes.
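
A user program that cannot wait for the kernel's background write-back can request it explicitly with fsync(). The sketch below writes a buffer and then asks for the file's dirty pages to be flushed; write_and_sync is a hypothetical helper name.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Write len bytes to path and flush them to the storage device.
 * Returns 0 on success, -1 on error.
 * write_and_sync is a hypothetical helper name.
 */
int write_and_sync(const char *path, const void *data, size_t len)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;

    /* write() normally only copies the data into the file's cache. */
    ssize_t put = write(fd, data, len);
    if (put < 0 || (size_t)put != len) {
        close(fd);
        return -1;
    }

    /* fsync() asks the kernel to write this file's dirty pages back now. */
    if (fsync(fd) != 0) {
        close(fd);
        return -1;
    }
    return close(fd);
}
```

Without the fsync() call the data would still reach the device eventually, at a time chosen by the kernel's write-back mechanism.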

The f_op member of struct file points to a struct file_operations instance that holds pointers to all the possible file operations, such as read, write, open, and so on.
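
The dispatch through f_op can be modeled in user space with a table of function pointers. In the sketch below two toy "file systems" supply different read implementations, and one generic entry point calls whichever the file carries. All toy_ names, mem_fops and zero_fops are hypothetical and far simpler than the kernel structures.

```c
#include <stddef.h>
#include <string.h>

struct toy_file;

/* Toy counterpart of struct file_operations with a single operation. */
struct toy_file_operations {
    long (*read)(struct toy_file *f, char *buf, size_t n);
};

struct toy_file {
    const struct toy_file_operations *f_op;  /* set when the file is opened */
    const char *data;                        /* backing data for mem_read */
    size_t pos;                              /* current read position */
};

/* Implementation 1: read from an in-memory string. */
static long mem_read(struct toy_file *f, char *buf, size_t n)
{
    size_t left = strlen(f->data) - f->pos;

    if (n > left)
        n = left;
    memcpy(buf, f->data + f->pos, n);
    f->pos += n;
    return (long)n;
}

/* Implementation 2: an endless stream of zero bytes. */
static long zero_read(struct toy_file *f, char *buf, size_t n)
{
    (void)f;
    memset(buf, 0, n);
    return (long)n;
}

const struct toy_file_operations mem_fops = { .read = mem_read };
const struct toy_file_operations zero_fops = { .read = zero_read };

/* Generic entry point: knows nothing about the implementation behind f. */
long toy_vfs_read(struct toy_file *f, char *buf, size_t n)
{
    if (!f->f_op || !f->f_op->read)
        return -1;                 /* operation not supported */
    return f->f_op->read(f, buf, n);
}
```

The caller of toy_vfs_read() never names mem_read or zero_read; choosing the implementation is done once, when f_op is set, which is the role the specific file system plays when a file is opened.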

    struct file_operations {
        struct module *owner;
        loff_t (*llseek) (struct file *, loff_t, int);
        ssize_t (*read) (struct file *, char __user *, size_t, loff_t *);
        ssize_t (*write) (struct file *, const char __user *, size_t, loff_t *);
        ssize_t (*aio_read) (struct kiocb *, const struct iovec *, unsigned long, loff_t);
        ssize_t (*aio_write) (struct kiocb *, const struct iovec *, unsigned long, loff_t);
        int (*readdir) (struct file *, void *, filldir_t);
        unsigned int (*poll) (struct file *, struct poll_table_struct *);
        int (*ioctl) (struct inode *, struct file *, unsigned int, unsigned long);
        long (*unlocked_ioctl) (struct file *, unsigned int, unsigned long);
        long (*compat_ioctl) (struct file *, unsigned int, unsigned long);
        int (*mmap) (struct file *, struct vm_area_struct *);
        int (*open) (struct inode *, struct file *);
        int (*flush) (struct file *, fl_owner_t id);
        int (*release) (struct inode *, struct file *);
        int (*fsync) (struct file *, struct dentry *, int datasync);
        ...
    };

4. Packaging the file system

After the directory tree of a file system has been created, it can be packaged with a file system-specific tool into a file system image. For example, the packaging tool for the Squashfs file system is mksquashfs. Besides packing the files, the tool generates the super block and inode information for that specific file system, producing a file system image that the kernel can interpret and mount.

Appendix: VFS-related data structures

inode:

    struct inode {
        /* global hash list */
        struct hlist_node i_hash;
        /* linked into different lists depending on the state of the inode
           (inode_unused / inode_in_use / super_block->s_dirty) */
        struct list_head i_list;
        /* node in the super_block->s_inodes list */
        struct list_head i_sb_list;
        /* list of the dentries of this inode; several dentries may point to the same file */
        struct list_head i_dentry;
        /* inode number */
        unsigned long i_ino;
        /* usage count: number of users currently accessing the inode */
        atomic_t i_count;
        /* number of hard links to the inode */
        unsigned int i_nlink;
        uid_t i_uid;
        gid_t i_gid;
        /* device number, when the inode represents a device file */
        dev_t i_rdev;
        u64 i_version;
        /* file size in bytes */
        loff_t i_size;
    #ifdef __NEED_I_SIZE_ORDERED
        seqcount_t i_size_seqcount;
    #endif
        /* last access time */
        struct timespec i_atime;
        /* time the file data was last modified */
        struct timespec i_mtime;
        /* time the inode itself was last modified */
        struct timespec i_ctime;
        /* file size in blocks */
        blkcnt_t i_blocks;
        unsigned int i_blkbits;
        unsigned short i_bytes;
        /* file type and access permissions */
        umode_t i_mode;
        spinlock_t i_lock; /* i_blocks, i_bytes, maybe i_size */
        struct mutex i_mutex;
        struct rw_semaphore i_alloc_sem;
        /* inode operations */
        const struct inode_operations *i_op;
        /* file operations */
        const struct file_operations *i_fop;
        /* super_block the inode belongs to */
        struct super_block *i_sb;
        struct file_lock *i_flock;
        /* address space mapping of the inode */
        struct address_space *i_mapping;
        struct address_space i_data;
    #ifdef CONFIG_QUOTA
        struct dquot *i_dquot[MAXQUOTAS];
    #endif
        struct list_head i_devices;
        union {
            struct pipe_inode_info *i_pipe;
            struct block_device *i_bdev;
            struct cdev *i_cdev;
        };
        __u32 i_generation;
    #ifdef CONFIG_FSNOTIFY
        __u32 i_fsnotify_mask; /* all events this inode cares about */
        struct hlist_head i_fsnotify_mark_entries; /* fsnotify mark entries */
    #endif
    #ifdef CONFIG_INOTIFY
        struct list_head inotify_watches; /* watches on this inode */
        struct mutex inotify_mutex; /* protects the watches list */
    #endif
        unsigned long i_state;
        unsigned long dirtied_when; /* jiffies of first dirtying */
        unsigned int i_flags;
        atomic_t i_writecount;
    #ifdef CONFIG_SECURITY
        void *i_security;
    #endif
    #ifdef CONFIG_FS_POSIX_ACL
        struct posix_acl *i_acl;
        struct posix_acl *i_default_acl;
    #endif
        void *i_private; /* fs or device private pointer */
    };

super_block:

    struct super_block {
        /* element of the global linked list */
        struct list_head s_list;
        /* device on which the underlying file system resides */
        dev_t s_dev;
        /* block size of the file system */
        unsigned long s_blocksize;
        /* block size of the file system (base-2 logarithm) */
        unsigned char s_blocksize_bits;
        /* set when the super block needs to be written back to disk */
        unsigned char s_dirt;
        unsigned long long s_maxbytes; /* max file size */
        /* file system type */
        struct file_system_type *s_type;
        /* super block operations */
        const struct super_operations *s_op;
        struct dquot_operations *dq_op;
        struct quotactl_ops *s_qcop;
        const struct export_operations *s_export_op;
        unsigned long s_flags;
        unsigned long s_magic;
        /* dentry of the root directory of the file system */
        struct dentry *s_root;
        struct rw_semaphore s_umount;
        struct mutex s_lock;
        int s_count;
        int s_need_sync;
        atomic_t s_active;
    #ifdef CONFIG_SECURITY
        void *s_security;
    #endif
        struct xattr_handler **s_xattr;
        /* list of all inodes managed by this super block */
        struct list_head s_inodes; /* all inodes */
        /* list of dirty inodes */
        struct list_head s_dirty; /* dirty inodes */
        struct list_head s_io; /* parked for writeback */
        struct list_head s_more_io; /* parked for more writeback */
        struct hlist_head s_anon; /* anonymous dentries for (nfs) exporting */
        /* list of file structures: all open files on this super block */
        struct list_head s_files;
        /* s_dentry_lru and s_nr_dentry_unused are protected by dcache_lock */
        /* LRU list of dentries that are no longer in use */
        struct list_head s_dentry_lru; /* unused dentry lru */
        int s_nr_dentry_unused; /* # of dentry on lru */
        struct block_device *s_bdev;
        struct mtd_info *s_mtd;
        /* node in the list of super blocks of the same file system type */
        struct list_head s_instances;
        struct quota_info s_dquot; /* diskquota specific options */
        int s_frozen;
        wait_queue_head_t s_wait_unfrozen;
        char s_id[32]; /* informational name */
        void *s_fs_info; /* filesystem private info */
        fmode_t s_mode;
        /*
         * The next field is for VFS *only*. No filesystems have any business
         * even looking at it. You had been warned.
         */
        struct mutex s_vfs_rename_mutex; /* kludge */
        /* granularity of c/m/atime in ns. Cannot be worse than a second */
        u32 s_time_gran;
        /*
         * Filesystem subtype. If non-empty the filesystem type field
         * in /proc/mounts will be "type.subtype"
         */
        char *s_subtype;
        /*
         * Saved mount options for lazy filesystems using
         * generic_show_options()
         */
        char *s_options;
    };

dentry:

    struct dentry {
        atomic_t d_count;
        unsigned int d_flags; /* protected by d_lock */
        spinlock_t d_lock; /* per dentry lock */
        /* nonzero if the dentry is a mount point */
        int d_mounted;
        /* inode the file name belongs to */
        struct inode *d_inode;
        /*
         * The next three fields are touched by __d_lookup. Place them here
         * so they all fit in a cache line.
         */
        /* global dentry hash */
        struct hlist_node d_hash; /* lookup hash list */
        /* dentry of the parent directory */
        struct dentry *d_parent; /* parent directory */
        /* file name; for example, for /tmp/a.sh the file name is a.sh */
        struct qstr d_name;
        /* node in the LRU list of unused dentries */
        struct list_head d_lru; /* LRU list */
        /*
         * d_child and d_rcu can share memory
         */
        union {
            struct list_head d_child; /* child of parent list */
            struct rcu_head d_rcu;
        } d_u;
        /* list of the child dentries of this directory */
        struct list_head d_subdirs; /* our children */
        /* links the dentries of one inode when hard links give the same file several names */
        struct list_head d_alias; /* inode alias list */
        unsigned long d_time; /* used by d_revalidate */
        const struct dentry_operations *d_op;
        /* super_block the dentry belongs to */
        struct super_block *d_sb; /* the root of the dentry tree */
        void *d_fsdata; /* fs-specific data */
        /* a file name of only a few characters is stored here to speed up access */
        unsigned char d_iname[DNAME_INLINE_LEN_MIN]; /* small names */
    };

vfsmount:

    struct vfsmount {
        /* global hash */
        struct list_head mnt_hash;
        /* mount point of the parent file system */
        struct vfsmount *mnt_parent; /* fs we are mounted on */
        /* dentry of the mount point in the parent file system */
        struct dentry *mnt_mountpoint; /* dentry of mountpoint */
        /* root dentry of the current (mounted) file system */
        struct dentry *mnt_root; /* root of the mounted tree */
        /* points to the super_block */
        struct super_block *mnt_sb; /* pointer to superblock */
        /* list of the child mounts under this mount point */
        struct list_head mnt_mounts; /* list of children, anchored here */
        /* node in the parent's list of child mounts */
        struct list_head mnt_child; /* and going through their mnt_child */
        int mnt_flags;
        /* 4 bytes hole on 64bits arches */
        /* mounted device, such as /dev/dsk/hda1 */
        const char *mnt_devname;
        /* list node in the virtual file system namespace */
        struct list_head mnt_list;
        struct list_head mnt_expire; /* link in fs-specific expiry list */
        struct list_head mnt_share; /* circular list of shared mounts */
        struct list_head mnt_slave_list; /* list of slave mounts */
        struct list_head mnt_slave; /* slave list entry */
        struct vfsmount *mnt_master; /* slave is on master->mnt_slave_list */
        /* virtual file system namespace in which the mount resides */
        struct mnt_namespace *mnt_ns; /* containing namespace */
        int mnt_id; /* mount identifier */
        int mnt_group_id; /* peer group identifier */
        /*
         * We put mnt_count & mnt_expiry_mark at the end of struct vfsmount
         * to let these frequently modified fields in a separate cache line
         * (so that reads of mnt_flags wont ping-pong on SMP machines)
         */
        atomic_t mnt_count;
        int mnt_expiry_mark; /* true if marked for expiry */
        int mnt_pinned;
        int mnt_ghosts;
    #ifdef CONFIG_SMP
        int *mnt_writers;
    #else
        int mnt_writers;
    #endif
    };
