Exploring Linux File Systems Through the Kernel's Layered Architecture

Source: Internet
Author: User

A file system organizes data and metadata on a storage device. The Linux file system interface is implemented as a layered architecture that separates the user interface layer, the file system implementations, and the drivers that operate the storage devices. Another way to look at a file system is as a protocol. Just as network protocols (such as IP) give meaning to the data streams transmitted over the Internet, a file system gives meaning to the data stored on a particular storage medium.

The Linux file system architecture is an interesting example of abstracting complexity. Through a common set of API functions, Linux supports many file systems on many storage devices. For example, the read function call reads a given number of bytes from a given file descriptor. The read function does not know the type of the file system, such as ext3 or NFS. Nor does it know the storage medium that holds the file system, such as an ATAPI disk, a SAS disk, or a SATA disk. Yet when read is called on an open file, the data is returned as expected.
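To make the point about read concrete, here is a minimal sketch of a helper that reads the first bytes of a file using only open, read, and close. The helper name read_prefix is an assumption made for this illustration. Nothing in it depends on which file system or storage device is behind the path.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Read up to `count` bytes from the start of `path` into `buf`.
 * Returns the number of bytes read, or -1 on error.
 * The code is identical whether `path` lives on ext3, NFS, or anything else. */
ssize_t read_prefix(const char *path, void *buf, size_t count)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    size_t total = 0;
    while (total < count) {
        ssize_t n = read(fd, (char *)buf + total, count - total);
        if (n < 0) {            /* read error */
            close(fd);
            return -1;
        }
        if (n == 0)             /* end of file */
            break;
        total += (size_t)n;
    }
    close(fd);
    return (ssize_t)total;
}
```

Calling read_prefix on a file under /proc and on a file on a local disk exercises two very different file systems through exactly the same code.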

Although most file system code resides in the kernel, the architecture shown in Figure 1 illustrates the relationship between user space and the main file-system-related components in the kernel.

User space contains the applications (the users of the file system) and the GNU C library (glibc), which provides the user interface for the file system calls (open, read, write, and close). The system call interface acts like a switch, routing system calls from user space to the appropriate endpoints in kernel space.
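The "switch" idea can be sketched in ordinary user-space C as a table of handlers indexed by a call number. This is only a toy model: the call numbers, handler names, and toy_dispatch function are all hypothetical, and the kernel's real dispatch path involves architecture-specific entry code that is not shown here.

```c
#include <stddef.h>

/* Hypothetical call numbers for this sketch only. */
enum { TOY_NR_DOUBLE, TOY_NR_NEGATE, TOY_NR_COUNT };

typedef long (*toy_syscall_fn)(long arg);

static long toy_double(long arg) { return arg * 2; }
static long toy_negate(long arg) { return -arg; }

/* The "switch": a table indexed by call number. */
static const toy_syscall_fn toy_syscall_table[TOY_NR_COUNT] = {
    [TOY_NR_DOUBLE] = toy_double,
    [TOY_NR_NEGATE] = toy_negate,
};

/* Route request `nr` to its handler and store the handler's result.
 * Returns 0 on success, -1 if the call number is unknown. */
int toy_dispatch(int nr, long arg, long *result)
{
    if (nr < 0 || nr >= TOY_NR_COUNT || toy_syscall_table[nr] == NULL)
        return -1;
    *result = toy_syscall_table[nr](arg);
    return 0;
}
```

The caller only names a number; which function ends up running is decided entirely by the table, which is the role the text assigns to the system call interface.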

The VFS is the primary interface to the underlying file systems. This component exports a set of interfaces and abstracts them to the individual file systems, whose behavior may vary greatly. Two caches exist for file system objects (inodes and dentries); they hold recently used file system objects.
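The way one interface can front very different file systems is usually expressed in C with a table of function pointers. The following user-space sketch assumes two made-up file systems (diskfs and netfs) and a made-up entry point toy_vfs_read; none of these names come from the kernel.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-in for the operations a file system exports. */
struct toy_fs_ops {
    const char *name;
    /* Copy up to `len` bytes of content into `buf`; return bytes copied. */
    size_t (*read)(char *buf, size_t len);
};

static size_t copy_text(const char *text, char *buf, size_t len)
{
    size_t n = strlen(text);
    if (n > len)
        n = len;
    memcpy(buf, text, n);
    return n;
}

static size_t diskfs_read(char *buf, size_t len)
{
    return copy_text("from disk", buf, len);
}

static size_t netfs_read(char *buf, size_t len)
{
    return copy_text("from network", buf, len);
}

static const struct toy_fs_ops toy_diskfs = { "diskfs", diskfs_read };
static const struct toy_fs_ops toy_netfs  = { "netfs",  netfs_read  };

/* Caller-facing entry point: it never inspects which file system it was given. */
size_t toy_vfs_read(const struct toy_fs_ops *fs, char *buf, size_t len)
{
    return fs->read(buf, len);
}
```

Adding a third file system to this sketch means writing one more read function and one more table; toy_vfs_read and its callers stay untouched.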

Each file system implementation (such as ext2 or JFS) exports a common set of interfaces to the VFS. The buffer cache buffers requests between the file systems and the block devices they operate on. For example, read and write requests to the underlying device drivers pass through the buffer cache. This lets requests be cached there, which reduces the number of accesses to the physical device and speeds up access. The buffer cache is managed as a set of least recently used (LRU) lists. Note: you can use the sync command to flush the buffer cache to the storage medium (it forces all unwritten data out to the device driver and, from there, to the storage device).
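The sync command flushes cached data for the whole system. A program that only cares about one file can request the same thing for that file with fsync. The sketch below assumes a helper named write_durably, which is not part of any library; it writes a buffer and then asks the kernel to push it out to the device.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Write `len` bytes to `path`, then ask the kernel to flush them to the device.
 * Returns 0 on success, -1 on any failure. */
int write_durably(const char *path, const void *data, size_t len)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;

    const char *p = data;
    size_t left = len;
    while (left > 0) {
        ssize_t n = write(fd, p, left);
        if (n <= 0) {           /* treat "no progress" as a failure too */
            close(fd);
            return -1;
        }
        p += n;
        left -= (size_t)n;
    }

    /* Until this call returns, the data may exist only in kernel caches. */
    if (fsync(fd) != 0) {
        close(fd);
        return -1;
    }
    return close(fd);
}
```

Without the fsync call the function would usually still appear to work, because later reads are served from the cache; the difference only shows after a crash or power loss.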

Linux views all file systems in terms of a common set of objects: the superblock, the inode, the dentry, and the file. At the root of each file system is the superblock, which describes and maintains the state of the file system. Every object managed within a file system (file or directory) is represented in Linux as an inode. The inode contains all the metadata needed to manage the object (including the operations that can be performed on it). Another set of structures, called dentries, translates between names and inodes; a directory cache keeps the most recently used dentries. The dentry also maintains the relationships between directories and files, which supports traversal of the file system. Finally, a VFS file represents an open file (it holds the state of the open file, such as the write offset).
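The split between names and inodes is visible from user space: two hard-linked names are two directory entries that resolve to one inode. The following sketch, with the assumed helper name same_inode, compares the device and inode numbers that stat reports for two paths.

```c
#include <sys/stat.h>

/* Returns 1 if both paths resolve to the same inode on the same device,
 * 0 if they do not, and -1 if either path cannot be examined. */
int same_inode(const char *a, const char *b)
{
    struct stat sa, sb;
    if (stat(a, &sa) != 0 || stat(b, &sb) != 0)
        return -1;
    return sa.st_dev == sb.st_dev && sa.st_ino == sb.st_ino;
}
```

After link("data", "alias") succeeds, same_inode("data", "alias") is expected to return 1, while two separately created files give 0. Both checks compare the device as well, because inode numbers are only unique within one file system.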

The VFS acts as the root level of the file system interface. It keeps track of the currently supported file systems as well as the file systems that are currently mounted.

When a new file system is registered, it and its related information are added to the file_systems list (see linux/include/linux/fs.h). This list defines the file systems that can be supported. Enter cat /proc/filesystems at the command line to view the list.

linux/include/linux/fs.h:

    struct file_system_type {
        const char *name;
        int fs_flags;
        int (*get_sb) (struct file_system_type *, int,
                       const char *, void *, struct vfsmount *);
        struct dentry *(*mount) (struct file_system_type *, int,
                                 const char *, void *);
        void (*kill_sb) (struct super_block *);
        struct module *owner;
        struct file_system_type *next;      /* forms the linked list */
        struct list_head fs_supers;

        struct lock_class_key s_lock_key;
        struct lock_class_key s_umount_key;
        struct lock_class_key s_vfs_rename_key;

        struct lock_class_key i_lock_key;
        struct lock_class_key i_mutex_key;
        struct lock_class_key i_mutex_dir_key;
        struct lock_class_key i_alloc_sem_key;
    };
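The next pointer in file_system_type is what chains the registered file systems together. The user-space sketch below models that idea with a cut-down structure. The toy_ names are hypothetical, and the kernel's own registration code also deals with locking and module references that are left out here.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, cut-down counterpart of file_system_type. */
struct toy_fs_type {
    const char *name;
    struct toy_fs_type *next;   /* forms the singly linked list */
};

static struct toy_fs_type *toy_file_systems;   /* list head, initially empty */

/* Walk the list; return the slot holding `name`, or the terminating NULL slot. */
static struct toy_fs_type **toy_find_slot(const char *name)
{
    struct toy_fs_type **p = &toy_file_systems;
    while (*p && strcmp((*p)->name, name) != 0)
        p = &(*p)->next;
    return p;
}

/* Append `fs` to the list. Returns 0, or -1 if the name is already taken. */
int toy_register_fs(struct toy_fs_type *fs)
{
    struct toy_fs_type **slot = toy_find_slot(fs->name);
    if (*slot)
        return -1;
    fs->next = NULL;
    *slot = fs;
    return 0;
}

/* Returns the registered entry called `name`, or NULL if there is none. */
struct toy_fs_type *toy_lookup_fs(const char *name)
{
    return *toy_find_slot(name);
}
```

Returning a pointer to the slot rather than to the entry lets one helper serve both lookup and append, since the slot that terminates the search is also the place where a new entry belongs.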

Another structure maintained in the VFS describes the mounted file systems (shown below). It records each mount point (see linux/include/linux/mount.h) and links to a superblock structure.

    struct vfsmount {
        struct list_head mnt_hash;
        struct vfsmount *mnt_parent;        /* fs we are mounted on */
        struct dentry *mnt_mountpoint;      /* dentry of mountpoint */
        struct dentry *mnt_root;            /* root of the mounted tree */
        struct super_block *mnt_sb;         /* pointer to superblock */
    #ifdef CONFIG_SMP
        struct mnt_pcp __percpu *mnt_pcp;
        atomic_t mnt_longterm;              /* how many of the refs are longterm */
    #else
        int mnt_count;
        int mnt_writers;
    #endif
        struct list_head mnt_mounts;        /* list of children, anchored here */
        struct list_head mnt_child;         /* and going through their mnt_child */
        int mnt_flags;
        /* 4 bytes hole on 64bits arches without fsnotify */
    #ifdef CONFIG_FSNOTIFY
        __u32 mnt_fsnotify_mask;
        struct hlist_head mnt_fsnotify_marks;
    #endif
        const char *mnt_devname;            /* Name of device e.g. /dev/dsk/hda1 */
        struct list_head mnt_list;
        struct list_head mnt_expire;        /* link in fs-specific expiry list */
        struct list_head mnt_share;         /* circular list of shared mounts */
        struct list_head mnt_slave_list;    /* list of slave mounts */
        struct list_head mnt_slave;         /* slave list entry */
        struct vfsmount *mnt_master;        /* slave is on master->mnt_slave_list */
        struct mnt_namespace *mnt_ns;       /* containing namespace */
        int mnt_id;                         /* mount identifier */
        int mnt_group_id;                   /* peer group identifier */
        int mnt_expiry_mark;                /* true if marked for expiry */
        int mnt_pinned;
        int mnt_ghosts;
    };

A superblock structure represents a file system. It contains the information needed to manage the file system, including the file system name (such as ext2), the size and state of the file system, a reference to the block device, and metadata (such as free lists). The superblock is usually stored on the storage medium, but it can be created on the fly if it does not exist there.

linux/include/linux/fs.h:

    struct super_block {
        struct list_head s_list;            /* Keep this first */
        dev_t s_dev;                        /* search index; _not_ kdev_t */
        unsigned char s_dirt;
        unsigned char s_blocksize_bits;
        unsigned long s_blocksize;
        loff_t s_maxbytes;                  /* Max file size */
        struct file_system_type *s_type;
        const struct super_operations *s_op;    /* superblock operations; very important */
        const struct dquot_operations *dq_op;
        const struct quotactl_ops *s_qcop;
        const struct export_operations *s_export_op;
        unsigned long s_flags;
        unsigned long s_magic;
        struct dentry *s_root;
        struct rw_semaphore s_umount;
        struct mutex s_lock;
        int s_count;
        atomic_t s_active;
    #ifdef CONFIG_SECURITY
        void *s_security;
    #endif
        const struct xattr_handler **s_xattr;

        struct list_head s_inodes;          /* all inodes */
        struct hlist_bl_head s_anon;        /* anonymous dentries for (nfs) exporting */
    #ifdef CONFIG_SMP
        struct list_head __percpu *s_files;
    #else
        struct list_head s_files;
    #endif
        /* s_dentry_lru, s_nr_dentry_unused protected by dcache.c lru locks */
        struct list_head s_dentry_lru;      /* unused dentry lru */
        int s_nr_dentry_unused;             /* # of dentry on lru */

        struct block_device *s_bdev;
        struct backing_dev_info *s_bdi;
        struct mtd_info *s_mtd;
        struct list_head s_instances;
        struct quota_info s_dquot;          /* Diskquota specific options */

        int s_frozen;
        wait_queue_head_t s_wait_unfrozen;

        char s_id[32];                      /* Informational name */

        void *s_fs_info;                    /* Filesystem private info */
        fmode_t s_mode;

        /* Granularity of c/m/atime in ns.
           Cannot be worse than a second */
        u32 s_time_gran;

        /*
         * The next field is for VFS *only*. No filesystems have any business
         * even looking at it. You had been warned.
         */
        struct mutex s_vfs_rename_mutex;    /* Kludge */

        /*
         * Filesystem subtype.  If non-empty the filesystem type field
         * in /proc/mounts will be "type.subtype"
         */
        char *s_subtype;

        /*
         * Saved mount options for lazy filesystems using
         * generic_show_options()
         */
        char __rcu *s_options;
        const struct dentry_operations *s_d_op; /* default d_op for dentries */
    };

An important element of the superblock is the definition of the superblock operations. This structure defines the set of functions used to manage inodes within the file system. For example, inodes are allocated with alloc_inode and deleted with destroy_inode; write_inode writes an inode back, and sync_fs performs file system synchronization. (Older kernels also provided read_inode, which does not appear in the listing below.) You can find the super_operations structure in ./linux/include/linux/fs.h. Each file system supplies its own inode methods, which implement the operations and present a common abstraction to the VFS layer.

    struct super_operations {
        struct inode *(*alloc_inode)(struct super_block *sb);   /* allocate an inode */
        void (*destroy_inode)(struct inode *);                  /* destroy an inode */

        void (*dirty_inode) (struct inode *);
        int (*write_inode) (struct inode *, struct writeback_control *wbc);
        int (*drop_inode) (struct inode *);
        void (*evict_inode) (struct inode *);
        void (*put_super) (struct super_block *);
        void (*write_super) (struct super_block *);
        int (*sync_fs)(struct super_block *sb, int wait);
        int (*freeze_fs) (struct super_block *);
        int (*unfreeze_fs) (struct super_block *);
        int (*statfs) (struct dentry *, struct kstatfs *);
        int (*remount_fs) (struct super_block *, int *, char *);
        void (*umount_begin) (struct super_block *);

        int (*show_options)(struct seq_file *, struct vfsmount *);
        int (*show_stats)(struct seq_file *, struct vfsmount *);
    #ifdef CONFIG_QUOTA
        ssize_t (*quota_read)(struct super_block *, int, char *, size_t, loff_t);
        ssize_t (*quota_write)(struct super_block *, int, const char *, size_t, loff_t);
    #endif
        int (*bdev_try_to_free_page)(struct super_block *, struct page *, gfp_t);
    };
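Of the operations listed, statfs is one whose effect is easy to observe from user space: it is the hook a file system fills in so that per-file-system statistics can be reported. The sketch below uses the POSIX statvfs call to collect a few of those values. The fs_summary type and the fs_summarize helper are assumptions made for this illustration.

```c
#include <sys/statvfs.h>

/* Hypothetical summary type for this sketch. */
struct fs_summary {
    unsigned long block_size;           /* preferred I/O block size in bytes */
    unsigned long long total_blocks;    /* file system size in f_frsize units */
    unsigned long name_max;             /* longest allowed file name */
};

/* Fill `out` for the file system containing `path`. Returns 0 or -1. */
int fs_summarize(const char *path, struct fs_summary *out)
{
    struct statvfs sv;
    if (statvfs(path, &sv) != 0)
        return -1;
    out->block_size = sv.f_bsize;
    out->total_blocks = sv.f_blocks;
    out->name_max = sv.f_namemax;
    return 0;
}
```

Calling fs_summarize on paths that live on different mounts typically yields different values, even though the calling code is the same, because each file system answers through its own operations table.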

An inode represents an object in the file system and has a unique identifier. Each file system provides methods for mapping a file name to a unique inode identifier and to an inode reference. The figure shows part of the inode structure along with two related structures. Note in particular inode_operations and file_operations: these structures refer to the operations that can be performed on the inode. inode_operations defines the operations that act directly on the inode, while file_operations defines the methods related to files and directories (the standard system calls).

The inode structure:

    struct inode {
        /* RCU path lookup touches following: */
        umode_t i_mode;
        uid_t i_uid;
        gid_t i_gid;
        const struct inode_operations *i_op;    /* implementation of inode operations */
        struct super_block *i_sb;

        spinlock_t i_lock;                  /* i_blocks, i_bytes, maybe i_size */
        unsigned int i_flags;
        struct mutex i_mutex;

        unsigned long i_state;
        unsigned long dirtied_when;         /* jiffies of first dirtying */

        struct hlist_node i_hash;
        struct list_head i_wb_list;         /* backing dev IO list */
        struct list_head i_lru;             /* inode LRU list */
        struct list_head i_sb_list;
        union {
            struct list_head i_dentry;
            struct rcu_head i_rcu;
        };
        unsigned long i_ino;
        atomic_t i_count;
        unsigned int i_nlink;
        dev_t i_rdev;
        unsigned int i_blkbits;
        u64 i_version;
        loff_t i_size;
    #ifdef __NEED_I_SIZE_ORDERED
        seqcount_t i_size_seqcount;
    #endif
        struct timespec i_atime;
        struct timespec i_mtime;
        struct timespec i_ctime;
        blkcnt_t i_blocks;
        unsigned short i_bytes;
        struct rw_semaphore i_alloc_sem;
        const struct file_operations *i_fop;    /* former ->i_op->default_file_ops */
        struct file_lock *i_flock;
        struct address_space *i_mapping;
        struct address_space i_data;
    #ifdef CONFIG_QUOTA
        struct dquot *i_dquot[MAXQUOTAS];
    #endif
        struct list_head i_devices;
        union {
            struct pipe_inode_info *i_pipe;
            struct block_device *i_bdev;
            struct cdev *i_cdev;
        };

        __u32 i_generation;

    #ifdef CONFIG_FSNOTIFY
        __u32 i_fsnotify_mask;              /* all events this inode cares about */
        struct hlist_head i_fsnotify_marks;
    #endif

    #ifdef CONFIG_IMA
        /* protected by i_lock */
        unsigned int i_readcount;           /* struct files open RO */
    #endif
        atomic_t i_writecount;
    #ifdef CONFIG_SECURITY
        void *i_security;
    #endif
    #ifdef CONFIG_FS_POSIX_ACL
        struct posix_acl *i_acl;
        struct posix_acl *i_default_acl;
    #endif
        void *i_private;                    /* fs or device private pointer */
    };
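Several inode fields have user-visible counterparts in the structure that stat fills in; for instance, st_ino and st_nlink report what the kernel keeps in i_ino and i_nlink. The sketch below formats a few of them into a string. The helper name describe_inode and the output format are assumptions made for this illustration.

```c
#include <stdio.h>
#include <sys/stat.h>

/* Format a one-line summary of the inode behind `path` into `out`.
 * Uses lstat, so a symbolic link is described rather than followed.
 * Returns 0 on success, -1 if the path cannot be examined. */
int describe_inode(const char *path, char *out, size_t n)
{
    struct stat st;
    if (lstat(path, &st) != 0)
        return -1;

    const char *kind = S_ISDIR(st.st_mode) ? "directory"
                     : S_ISREG(st.st_mode) ? "regular file"
                     : S_ISLNK(st.st_mode) ? "symlink"
                     : "other";
    snprintf(out, n, "ino=%llu links=%lu size=%lld kind=%s",
             (unsigned long long)st.st_ino,
             (unsigned long)st.st_nlink,
             (long long)st.st_size,
             kind);
    return 0;
}
```

Note that the file name is not part of the result. The name belongs to the dentry, not to the inode, which is why stat has nothing to say about it.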

The inode cache and the directory cache hold the most recently used inodes and dentries, respectively. Note that every inode in the inode cache has a corresponding dentry in the directory cache. The inode structure is defined in ./linux/include/linux/fs.h, and the dentry structure in ./linux/include/linux/dcache.h.

Apart from the individual file system implementations (found in ./linux/fs), the bottom of the file system layer is the buffer cache. This component keeps track of read and write requests that pass between the file system implementations and the physical devices (by way of the device drivers). For efficiency, Linux caches these requests so that not every request has to go to the physical device. The cache holds the most recently used buffers (pages), which can then be supplied quickly to the individual file systems.
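The effect of keeping recently used buffers can be shown with a very small user-space model. The sketch below is not how the kernel implements its cache; it assumes a fixed number of slots and finds the least recently used one by scanning timestamps, which keeps the example short. All toy_ names are hypothetical.

```c
#include <stddef.h>

#define TOY_CACHE_SLOTS 2   /* deliberately tiny so eviction is easy to observe */

struct toy_buffer {
    int valid;                  /* 0 until the slot has been filled */
    unsigned long block;        /* which device block the slot holds */
    unsigned long last_used;    /* logical clock value at the last access */
};

/* Must start zero-initialized, e.g. struct toy_cache c = {0}; */
struct toy_cache {
    struct toy_buffer slot[TOY_CACHE_SLOTS];
    unsigned long clock;        /* increases on every access */
    unsigned long device_reads; /* how often the "device" had to be consulted */
};

/* Access `block`. Returns 1 on a cache hit, 0 on a miss (after loading it). */
int toy_cache_access(struct toy_cache *c, unsigned long block)
{
    size_t victim = 0;
    c->clock++;

    for (size_t i = 0; i < TOY_CACHE_SLOTS; i++) {
        struct toy_buffer *b = &c->slot[i];
        if (b->valid && b->block == block) {
            b->last_used = c->clock;    /* refresh recency on a hit */
            return 1;
        }
        /* Empty slots have last_used == 0, so they are chosen first;
         * otherwise the slot with the oldest access wins. */
        if (b->last_used < c->slot[victim].last_used)
            victim = i;
    }

    c->device_reads++;          /* miss: a real cache would issue device I/O here */
    c->slot[victim].valid = 1;
    c->slot[victim].block = block;
    c->slot[victim].last_used = c->clock;
    return 0;
}
```

With two slots, the access sequence 1, 2, 1, 3 leaves blocks 1 and 3 cached: block 2 was touched least recently and is the one that gets replaced. Repeated accesses to a cached block never increase device_reads, which is the saving the paragraph above describes.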
