File system VFS data structures (super block, inode, dentry, file) (collected notes)

Source: Internet
Author: User
Tags disk usage

The Linux virtual file system (VFS) has four main objects:

1) Super block (super_block)

2) Index node (inode)

3) Directory entry (dentry)

4) File object (file)

When a process operates on a file, it reaches these objects in the following order:

Starting from task_struct, the process gets its files_struct, then uses the file descriptor (int fd) as an index into the fd array (struct file **fd) to obtain the corresponding file object, then the directory entry object (dentry), and finally the index node object (inode). The inode carries the operations that apply to the file; these come from the file system's super block and therefore depend on the specific file system.
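The chain described above can be sketched in user space. This is only a simplified model, not kernel code: the structures keep just the fields named in the text, and MAX_FDS and fd_to_inode are hypothetical names introduced for the illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified user-space stand-ins for the kernel structures.
 * Field names follow the article; everything else is illustrative. */
struct inode  { unsigned long i_ino; };
struct dentry { const char *d_name; struct inode *d_inode; };
struct file   { struct dentry *f_dentry; long long f_pos; };

#define MAX_FDS 8   /* hypothetical fixed size; the real table can grow */

struct files_struct { struct file *fd[MAX_FDS]; };
struct task_struct  { struct files_struct *files; };

/* Walk task -> files -> file -> dentry -> inode.
 * Returns NULL if the descriptor is out of range or any link is missing. */
struct inode *fd_to_inode(const struct task_struct *task, int fd)
{
    assert(task != NULL);              /* the caller must pass a task */
    if (task->files == NULL)
        return NULL;
    if (fd < 0 || fd >= MAX_FDS)
        return NULL;

    struct file *filp = task->files->fd[fd];
    if (filp == NULL || filp->f_dentry == NULL)
        return NULL;

    return filp->f_dentry->d_inode;
}
```

Two descriptors whose file objects share one dentry resolve to the same inode, while each file object keeps its own f_pos.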

First, the super block:

Super block object (super_block): stores control information for a mounted file system (file system status, file system type, block size, number of blocks, number of index nodes, dirty flag, operation methods). It represents one mounted file system. Each time an actual file system is mounted, the kernel reads control information from a specific location on disk (the on-disk super block) to populate the in-memory super block object. Mount instances and super block objects correspond one to one. The super block records the file system type it belongs to in its s_type field. Even if two file systems of the same type (file_system_type) are mounted, there are two super blocks (each with an on-disk and an in-memory copy).

Main super block methods: the method set mainly covers operations on inodes and operations on the super_block itself.

alloc_inode: Allocates and initializes an index node object

read_inode: Reads the index node from disk and populates the in-memory index node object

write_inode: Writes the given index node to disk

write_super: Writes the super block object back to disk

A super block corresponds to one file system (a mounted instance of an actual file system such as ext2, not the VFS). A file system manages the data format and operations of its files; system files have their own file system, and different disk partitions can carry different file systems. Each independent file system has one super block, which saves the file system type, size, state, and so on.

("File system" and "file system type" are not the same! One file system type can cover many file systems, that is, many super_block objects.)

Since different file systems have different super blocks, the operations on them must differ as well, which is why the super_block structure contains abstract operation tables (for example struct super_operations). Its fields are:

s_list: Links all super blocks into one list. This struct list_head is a familiar structure; it holds only the prev and next pointers that connect the nodes.

The kernel is careful about how it links structures (the network protocol stack does the same). All super_block objects are chained together through a small embedded structure rather than through super_block itself, because super_block is large and linking it directly would be inefficient. The list node holds just

struct list_head *prev;

struct list_head *next;

and this node links the s_list fields of the super blocks. After reaching an s_list node while traversing, the kernel can read the enclosing super_block directly. Because s_list is the first field, the address of the node is also the address of the super_block, which makes this quick and convenient. For a list_head embedded elsewhere in a structure, the kernel's list_entry() macro recovers the enclosing structure.

s_dev: Block device identifier of the specific file system. For example, for /dev/hda1 the device identifier is 0x301

s_blocksize: Data block size of the file system, in bytes

s_blocksize_bits: Number of bits needed to express the block size above, that is, log2 of s_blocksize; for example, 512 bytes gives 9 bits

s_dirt: Dirty flag, indicates whether the super block has been modified

s_maxbytes: Maximum allowable file size (in bytes)

struct file_system_type *s_type: File system type (which type the current file system is, for example ext2 or FAT32)

Distinguish "file system" from "file system type"! One file system type can cover many file systems, that is, many super_block objects; see s_instances below.

struct super_operations *s_op: Points to the set of super block operation functions of the specific file system

struct dquot_operations *dq_op: Points to the set of quota operation functions of the specific file system

struct quotactl_ops *s_qcop: Methods for configuring disk quotas, handling requests from user space

s_flags: Mount flags

s_magic: Magic number that distinguishes this file system from others

s_root: Directory entry for the mount directory of the specific file system

s_umount: Synchronizes reads and writes of the super block

s_lock: Lock flag; when it is set, other processes cannot operate on the super block

s_count: Usage count of the super block

s_active: Reference count

s_dirty: List of modified index nodes. A file system has many inodes; when the content of some of them is modified, they are recorded here first and written back to disk later.

s_locked_inodes: List of index nodes waiting to be synchronized

s_files: List of all open files of this file system; these files are tied to actual processes

s_bdev: Points to the block device the file system is mounted from

u: Union holding the super block information that belongs to the specific file system

s_instances: All super blocks of the same file system type are linked together through this field.

s_dquot: Disk quota options
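The way s_list and s_instances chain super blocks through an embedded list node can be sketched as follows. This is a user-space approximation under stated assumptions: the container_of macro here is a plain offsetof-based version, the super_block is cut down to three fields, and list_init, list_add_tail, and total_blocksize are helper names chosen for the example rather than kernel functions with these exact signatures.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal doubly linked list node, modelled on the kernel's struct list_head. */
struct list_head { struct list_head *next, *prev; };

/* Recover the enclosing structure from a pointer to one of its members. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Cut-down super block: only the fields this sketch needs. */
struct super_block {
    struct list_head s_list;       /* links every super block in the system */
    unsigned long    s_blocksize;
    struct list_head s_instances;  /* links super blocks of one file system type */
};

void list_init(struct list_head *head)
{
    head->next = head;
    head->prev = head;
}

/* Insert node at the tail of the circular list headed by head. */
void list_add_tail(struct list_head *node, struct list_head *head)
{
    assert(node != NULL && head != NULL);
    node->prev = head->prev;
    node->next = head;
    head->prev->next = node;
    head->prev = node;
}

/* Sum s_blocksize over every super block linked through s_instances. */
unsigned long total_blocksize(struct list_head *head)
{
    unsigned long total = 0;
    for (struct list_head *p = head->next; p != head; p = p->next) {
        struct super_block *sb = container_of(p, struct super_block, s_instances);
        total += sb->s_blocksize;
    }
    return total;
}
```

For s_list, which sits at offset 0, container_of returns the same address as the node itself; for s_instances the macro subtracts the member's offset, so the same traversal code works for a list node placed anywhere in the structure.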

Second, the index node (inode):

Index node object (inode): stores information about a file or directory (this information and the file content itself are two different things). It includes file size, owner, creation time, disk location, file operation methods, dirty flag, and so on. It represents a real file and is saved on disk. When a file is first accessed, the kernel assembles the corresponding index node object in memory, which gives the kernel everything it needs to operate on the file.

The main methods of the index node cover creating an inode, deleting a directory, creating a symbolic link, and so on.

int create(struct inode *dir, struct dentry *dentry, int mode): Called by the creat or open system call to create a new index node for the dentry object.

struct dentry *lookup(struct inode *dir, struct dentry *dentry): Finds the index node that corresponds to the given directory entry within the directory dir.

mkdir(dir, dentry, mode): Called by the mkdir system call to create a new directory

What the inode saves is information about the actual data, called "metadata" (the description of the file's attributes). For example: file size, device identifier, user identifier, user group identifier, file mode, extended attributes, timestamps of when the file was read or modified, number of links, pointers to the disk blocks that store the content, file type, and so on.

(Note the split: metadata + the data itself)

Also note: there are two kinds of inode, the VFS inode and the inode of the specific file system. The former lives in memory and the latter on disk. Each time a file is used, the on-disk inode is read in to populate the in-memory inode; only then is the disk file's inode actually in use.

Note how inodes are created: each inode is typically 128 or 256 bytes. The total number of inodes is fixed when the file system is formatted (modern operating systems can change it dynamically), usually one inode per 2 KB. Files smaller than 2 KB are rare in a typical file system, so with one inode reserved per 2 KB the inodes are generally not used up. A file system therefore starts with a default number of inodes, which can later change according to actual needs.

Note the inode number: the inode number is unique within a file system and identifies a file. Internally Linux accesses files by inode number; the file name exists only for the user's convenience. When we open a file, the system first finds the inode number that corresponds to the file name, then uses the inode number to get the inode information, and finally uses the inode to find the file's data blocks, after which the file data can be processed.

Relationship between an inode and a file: when a file is created, an inode is assigned to it. One inode corresponds to exactly one actual file, and one file has exactly one inode. The maximum number of inodes is therefore the maximum number of files.

Explain some of the fields:

i_hash: Links the inode into the inode hash table, described below

i_list: Links inodes to one another in the inode lists, described below

i_dentry: Head of the list of directory entries. Note that one inode can correspond to multiple dentry objects, because an actual file may have links under other names, each with its own dentry. This list links all dentry objects associated with the inode.

i_dirty_buffers and i_dirty_data_buffers: Dirty data buffers

i_ino: Index node number, unique for each inode

i_count: Reference count

i_dev: Device number of the device that holds the inode

i_mode: File type and access permissions

i_nlink: Number of names linked to this inode (number of hard links)

i_uid: User ID of the file owner

i_gid: Group ID of the file

i_rdev: Actual device identifier

Note the difference between i_dev and i_rdev: for an ordinary file stored on a disk, i_dev is the number of the disk that holds the file; for a special file such as the disk itself (all devices are handled as files), i_rdev is the device number of that disk.

i_size: Size, in bytes, of the file represented by the inode

i_atime: Time the file was last accessed

i_mtime: Time the file was last modified

i_ctime: Time the inode was last changed

i_blkbits: Block size, as a number of bits

i_blksize: Block size, in bytes

i_blocks: Number of blocks the file occupies

i_version: Version number

i_bytes: Number of bytes used in the file's last block

i_sem: Semaphore for synchronizing operations on the inode

i_alloc_sem: Protects I/O operations on the inode from being interrupted by other operations

i_zombie: Semaphore for zombie inodes

i_op: Index node operations

i_fop: File operations

i_sb: Pointer to the super block of the file system the inode belongs to

i_wait: Wait queue of the index node

i_flock: List of file locks

Note the following: address_space does not represent an address space; it describes pages in the page cache. One file corresponds to one address_space, and an address_space plus an offset identifies a page in the page cache.

i_mapping: Indicates which address_space handles page requests

i_data: The pages read and written through this inode

i_dquot: Disk quotas of the inode

About disk quotas: in a multi-user environment, limiting each user's disk usage is necessary and keeps usage fair.

There are two kinds of disk quota, block limits and inode limits. Within one particular file system the quota mechanism is the same for all files, so the quota operation functions are kept in super_block.

i_devices: Device list. A list of devices that share the same driver.

i_pipe: Points to the pipe information (used if the file is a pipe)

i_bdev: Pointer to the block device (used if the file is a block device file)

i_cdev: Pointer to the character device (used if the file is a character device file)

i_dnotify_mask: Directory notification event mask

i_dnotify: Used for directory notifications

i_state: State flags of the index node: I_NEW, I_LOCK, I_FREEING

i_flags: Mount flags of the index node

i_sock: True if the file is a socket

i_write_count: Number of processes that have the file open for writing

i_attr_flags: File creation flags

i_generation: Reserved

u: Inode information of the specific file system

Note the four structures that manage inodes:

inode_unused: Links the inodes that are not currently in use (through the i_list field)

inode_in_use: Links the inodes currently in use (through the i_list field)

s_dirty in super_block: Links all modified inodes; this field is in super_block (linked through the i_list field)

inode_hashtable: To speed up inode lookup, the in-use inodes and the dirty inodes are also placed in a hash structure, inode_hashtable.

Different inodes may have equal hash values, so inodes with the same hash value are chained together through the i_hash field.
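Chaining inodes with equal hash values can be illustrated with a toy table. The sketch assumes a singly linked chain and a modulo hash purely for brevity; HASH_BUCKETS, hash_ino, inode_hash_insert, and inode_hash_find are hypothetical names, and the hash is not the kernel's own function.

```c
#include <assert.h>
#include <stddef.h>

#define HASH_BUCKETS 8   /* hypothetical size chosen for the example */

struct inode {
    unsigned long i_ino;
    struct inode *i_hash;   /* next inode in the same bucket (simplified chain) */
};

struct inode *inode_hashtable[HASH_BUCKETS];

/* Toy hash for illustration only; not the kernel's hash function. */
unsigned long hash_ino(unsigned long ino)
{
    return ino % HASH_BUCKETS;
}

/* Push the inode onto the front of its bucket's chain. */
void inode_hash_insert(struct inode *inode)
{
    assert(inode != NULL);
    unsigned long bucket = hash_ino(inode->i_ino);
    inode->i_hash = inode_hashtable[bucket];
    inode_hashtable[bucket] = inode;
}

/* Walk one bucket's chain looking for the inode number. */
struct inode *inode_hash_find(unsigned long ino)
{
    for (struct inode *p = inode_hashtable[hash_ino(ino)]; p != NULL; p = p->i_hash)
        if (p->i_ino == ino)
            return p;
    return NULL;
}
```

With eight buckets, inode numbers 3 and 11 collide and end up on the same chain, yet each can still be found because the lookup compares i_ino along the chain.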

Third, the directory entry (dentry):

Directory entry object (dentry): represents a directory entry (it includes the index node of the corresponding object, the list of subdirectories, the parent directory entry object, the links to sibling directory entries, the usage count, and the cache flags). It is a component of a path (note: each component of a path is also represented by an index node object). The object exists in memory only. Directory entries were introduced mainly to make finding files convenient. Each component of a path, whether a directory or an ordinary file, is a directory entry object. For example, in the path /home/source/test.c, the directories /, home, and source and the file test.c each correspond to one directory entry object. Unlike the super block and the index node, the directory entry object has no corresponding on-disk data structure. While traversing a path name, the VFS resolves it into directory entry objects one component at a time and caches them to speed up later lookups. (Each directory entry object corresponds to an index node object, so it also represents a file, which can be a regular file, a directory file, and so on.)

A directory entry object has three states: used, unused, and negative

Used: a used directory entry corresponds to a valid index node (d_inode points to that index node) and has one or more users (d_count is positive). A directory entry in this state is being used by the VFS and points to a valid index node, so it cannot be freed.

Unused: an unused directory entry corresponds to a valid index node (d_inode points to that index node), but the VFS is not currently using it (d_count is 0). The directory entry object still points to a valid object and stays in the cache in case it is needed again. Because it is not destroyed prematurely, it does not have to be recreated later, which makes path lookup faster. If memory has to be reclaimed, however, unused directory entries can be destroyed.

Negative: has no valid index node, because the index node has been deleted or the path is incorrect. The directory entry is still kept so that later lookups of the same path can be resolved quickly.

All three kinds of directory entries are kept in the directory entry cache, which is organized as a hash table. In addition, if a directory entry is cached and in use, the corresponding index node is cached as well.
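The three states follow directly from d_inode and d_count, which a short sketch makes explicit. The structures are cut down to those two fields, and the enum and the classify_dentry function are hypothetical names used only for this illustration.

```c
#include <assert.h>
#include <stddef.h>

struct inode { unsigned long i_ino; };

struct dentry {
    int           d_count;   /* number of current users */
    struct inode *d_inode;   /* NULL for a negative dentry */
};

enum dentry_state { DENTRY_USED, DENTRY_UNUSED, DENTRY_NEGATIVE };

/* Derive the state from the two fields the text describes. */
enum dentry_state classify_dentry(const struct dentry *d)
{
    assert(d != NULL && d->d_count >= 0);
    if (d->d_inode == NULL)
        return DENTRY_NEGATIVE;          /* no valid index node behind it */
    return d->d_count > 0 ? DENTRY_USED  /* valid inode, at least one user */
                          : DENTRY_UNUSED; /* valid inode, kept only as cache */
}
```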

A directory entry describes a logical attribute of a file. It exists only in memory, in the directory entry cache, and has no counterpart on the actual disk; it is designed to improve lookup performance. Note that both directories and the final file are directory entries, and all directory entries together form one large directory tree. For example, when opening the file /home/xxx/yyy.txt, each of /, home, xxx, and yyy.txt is a directory entry. During the lookup, the VFS walks these entries level by level, finds the inode for each directory entry, and follows them to the final file.

Note: a directory is also a file (so it has a corresponding inode). Opening a directory is essentially opening the directory file.
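To make the component-by-component walk concrete, the following sketch splits an absolute path into the names that would each get a directory entry. It only models the splitting step, not the kernel's path lookup; split_path, MAX_COMPONENTS, and MAX_NAME are hypothetical names introduced for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_COMPONENTS 16   /* hypothetical limits for the example */
#define MAX_NAME 64

/* Split an absolute path into the names a lookup would resolve one by one.
 * out[0] is always "/" (the root directory entry). Returns the number of
 * components, or -1 if the path is not absolute or does not fit. */
int split_path(const char *path, char out[][MAX_NAME], int max_out)
{
    assert(path != NULL && out != NULL);
    if (path[0] != '/' || max_out < 1)
        return -1;

    int n = 0;
    strcpy(out[n++], "/");

    const char *p = path;
    while (*p != '\0') {
        while (*p == '/')               /* skip separators, including repeats */
            p++;
        if (*p == '\0')
            break;
        size_t len = strcspn(p, "/");   /* length of the next component */
        if (n >= max_out || len >= MAX_NAME)
            return -1;
        memcpy(out[n], p, len);
        out[n][len] = '\0';
        n++;
        p += len;
    }
    return n;
}
```

For /home/xxx/yyy.txt this yields the four names from the example above: /, home, xxx, and yyy.txt.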

Explain some of the fields:

d_count: Reference count

d_flags: Directory entry cache flags, DCACHE_UNUSED, DCACHE_REFERENCED, and so on

d_inode: The inode associated with this directory entry

d_parent: Directory entry of the parent directory

d_hash: The kernel manages dentry objects with dentry_hashtable, an array of list_head lists. After a dentry is created, it is linked through d_hash into the list for its hash value.

d_lru: List of unused directory entries, ordered by how recently they were used

d_child: Links this directory entry into the d_subdirs list of its parent directory

d_subdirs: Head of the list of all children of this directory

d_alias: A valid dentry must be associated with an inode, but one inode can correspond to multiple dentry objects, because one file can have links under other names. Through this field the dentry is linked into the i_dentry list of its inode (as mentioned in the inode section).

d_mounted: Number of file systems mounted on this directory! Note that different file systems can be mounted on one directory.

d_name: Name of the directory entry

d_time: Time used for revalidation. Note that the dentry is valid if the revalidation succeeds; otherwise it is invalid.

d_op: Directory entry operations

d_sb: Super block of the file system this directory entry belongs to

d_vfs_flags: Some flags

d_fsdata: File system private data

d_iname: Storage for short file names

Some explanations: a valid dentry structure must have an inode structure, because a directory entry represents either a file or a directory, and a directory is in fact a file too. So as long as the dentry is valid, its d_inode pointer must point to an inode structure. An inode, however, can correspond to multiple dentry objects, as already mentioned twice above.

Note: the whole structure is really a tree.

Fourth, the file object:

File object: the in-memory representation of an open file (it includes the corresponding directory entry object, usage count, access mode, current offset, operation methods, and so on). It mainly establishes the correspondence between a process and a file on disk. It is created by sys_open() and destroyed by sys_close(). The relationship between a file object and a physical file is somewhat like that between a process and a program. Seen from user space, we appear to deal only with file objects and need not care about super blocks, index nodes, or directory entries. Because multiple processes can open and operate on the same file at the same time, one file may have several file objects. A file object represents an open file only from the point of view of a process; it points to the directory entry object, which in turn points to the index node. The file object for a file may not be unique, but the index node and directory entry object it points to are unique. The file object exists in memory only.

File operation methods:

llseek: Updates the offset

read, write, open, mmap, aio_read, fsync

files_struct: The collection of file objects a process has open. A call to open returns a file descriptor (int) that maps one to one to a file object (the file descriptor is the index into the struct file **fd array in files_struct). The structure is referenced from task_struct.

Note that a file object describes a file a process has opened. Because one file can be opened by multiple processes, one file can have multiple file objects. But because the file itself is unique, its inode is unique, and so is its directory entry.

A process actually manipulates files through file descriptors. Note that each open file has a number (f_pos) that marks the next byte position to read or write, called the file position. In general, after opening a file, open the bit

Explain some of the fields:

f_list: Links all open files into a list! Note that all open files of one file system are linked through this field into the s_files list in super_block.

f_dentry: The dentry associated with this file

f_vfsmnt: Mount point of the file system that contains this file

f_op: File operations. When a process opens the file, f_op is initialized from the i_fop file operations of the file's inode.

f_count: Reference count

f_flags: Flags specified when opening the file

f_mode: Access mode of the file

f_pos: Current offset, relative to the beginning of the file

f_reada, f_ramax, f_raend, f_ralen, f_rawin (unsigned long): Read-ahead flag, maximum number of pages to read ahead, file position after the last read-ahead, number of bytes read ahead, and number of pages read ahead

f_owner: Records a process ID and the signal to send to that process when an event occurs on the file

f_uid: User ID

f_gid: Group ID

f_error: Error code of write operations

f_version: Version number, incremented whenever f_pos changes

private_data: Private data (used by file systems and drivers)

Some important fields in more detail:

First, f_flags, f_mode, and f_pos hold the control information for this process's current operations on the file. This matters because a file can be opened by several processes at once and each process operates on it independently, so each file object needs its own copy of these three fields.

Second, the reference count f_count: when a process closes a file descriptor, the file is not necessarily closed; f_count is just decremented, and only when f_count reaches 0 is the file really closed. Operations such as dup and fork increase f_count; the details come later.

Third, f_op is also very important! It is the structure that holds all the file's operations. For example, a read by the user eventually calls the read operation in file_operations, and the file_operations structure is not necessarily the same for different file systems. One important operation in it is release. When the user calls close, the kernel decrements f_count; the release function is called only once the reference count has dropped to 0, at which point the file is really closed. This matches the point above: a user's close just decrements f_count.
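The interplay of f_count and the release operation can be modelled in a few lines. This is a user-space sketch rather than the kernel's implementation: file_get, file_put, and demo_release are hypothetical names, and the released field exists only so the effect can be observed.

```c
#include <assert.h>
#include <stddef.h>

struct file;

struct file_operations {
    void (*release)(struct file *filp);   /* called once, when the last user is gone */
};

struct file {
    int f_count;
    const struct file_operations *f_op;
    int released;   /* hypothetical flag so the sketch can be observed */
};

/* dup() and fork() take an extra reference on an already open file. */
void file_get(struct file *filp)
{
    assert(filp != NULL && filp->f_count > 0);
    filp->f_count++;
}

/* close() drops one reference; release runs only when the count reaches zero.
 * Returns the remaining reference count. */
int file_put(struct file *filp)
{
    assert(filp != NULL && filp->f_count > 0);
    if (--filp->f_count == 0 && filp->f_op != NULL && filp->f_op->release != NULL)
        filp->f_op->release(filp);
    return filp->f_count;
}

/* Example release method: just record that it ran. */
void demo_release(struct file *filp)
{
    filp->released++;
}
```

With one extra reference taken, the first file_put leaves the file open and only the second one triggers release.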

Note: "in use" and "unused" file objects are each managed with a doubly linked list.

Note that the file structure above covers a single file only. A process (user) can work with multiple files at the same time, so another structure is needed to manage all of them!

That structure is the per-process open file table, files_struct.

Explain some of the fields:

count: Reference count

file_lock: Lock protecting the following fields

max_fds: Current maximum number of file objects

max_fdset: Current maximum number of file descriptors

next_fd: Largest assigned file descriptor + 1

fd: Pointer to the array of file object pointers. It normally points to the last field, fd_array; when the number of files exceeds NR_OPEN_DEFAULT, a new array is allocated and fd points to it.

close_on_exec: File descriptors to close when exec() is executed

open_fds: Pointer to the set of open file descriptors

close_on_exec_init: Initial set of file descriptors to close when exec() is executed

open_fds_init: Initial set of open file descriptors

fd_array: Initial array of file object pointers
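How fd starts out pointing at the embedded fd_array and moves to a larger array once it fills up can be sketched as follows. The model is deliberately simplified: it finds a free slot by linear scan instead of tracking next_fd and descriptor sets, NR_OPEN_DEFAULT is set to a tiny value so the growth is easy to trigger, and files_init, files_expand, and fd_install_lowest are hypothetical helper names.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define NR_OPEN_DEFAULT 4   /* deliberately tiny for the example */

struct file { long long f_pos; };

struct files_struct {
    int           max_fds;                    /* capacity of the fd array */
    struct file **fd;                         /* points at fd_array until it overflows */
    struct file  *fd_array[NR_OPEN_DEFAULT];  /* embedded initial array */
};

void files_init(struct files_struct *files)
{
    memset(files, 0, sizeof(*files));
    files->max_fds = NR_OPEN_DEFAULT;
    files->fd = files->fd_array;
}

/* Double the table; the first expansion moves off the embedded fd_array. */
static int files_expand(struct files_struct *files)
{
    int new_max = files->max_fds * 2;
    struct file **new_fd = calloc((size_t)new_max, sizeof(*new_fd));
    if (new_fd == NULL)
        return -1;
    memcpy(new_fd, files->fd, (size_t)files->max_fds * sizeof(*new_fd));
    if (files->fd != files->fd_array)
        free(files->fd);
    files->fd = new_fd;
    files->max_fds = new_max;
    return 0;
}

/* Install filp in the lowest free slot and return its descriptor, or -1. */
int fd_install_lowest(struct files_struct *files, struct file *filp)
{
    assert(files != NULL && filp != NULL);
    for (int fd = 0; fd < files->max_fds; fd++) {
        if (files->fd[fd] == NULL) {
            files->fd[fd] = filp;
            return fd;
        }
    }
    int fd = files->max_fds;   /* table full: first new slot after growing */
    if (files_expand(files) != 0)
        return -1;
    files->fd[fd] = filp;
    return fd;
}
```

A caller that outgrows the embedded array should free files->fd when the table is no longer needed, since after the first expansion it points at heap memory.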

Note that file and files_struct record information about the files a process has open. Information about the process's own file system context is kept in the fs_struct structure.

Explain some of the fields:

count: Reference count

lock: Protection lock

umask: Default file access permissions when opening a file

root: Root directory of the process

pwd: Current working directory of the process

altroot: User-set alternative root directory

Note: at run time these three directories are not necessarily in the same file system. For example, the root directory of a process is usually in the ext file system mounted at "/", the current working directory may be in a file system mounted at /etc, and the alternative root directory can be in yet another file system.

rootmnt, pwdmnt, altrootmnt: The mount points corresponding to the three directories above.

As shown, a process learns which file objects it currently has open through the files field (a files_struct) in task_struct, and what we usually call a file descriptor is really an index into the process's array of open file objects. The file object finds its dentry object through the f_dentry field, and the dentry object finds its index node through the d_inode field. Through the index node the super block information is available, and with it the methods that finally operate on the file. This chain is followed when a file is opened, and it ties the file object to the actual physical file. Finally, and just as important, the file object's table of file operation functions comes from the i_fop field of the index node, and i_fop is in turn set up through struct super_operations *s_op.
