System calls
The main job of the operating system is to manage hardware resources and provide a good environment for application developers, but the hardware resources of a computer system are limited. So that every process can run safely, the processor provides two modes: "user mode" and "kernel mode". Security-sensitive operations, such as I/O operations and modifying the contents of base registers, are restricted to kernel mode. The interface that connects user mode and kernel mode is called a system call.
Application code runs in user mode. When an application needs to perform an operation that requires kernel mode, it first sends the request to the operating system. After the operating system receives the request, it executes the system call interface, which switches the processor into kernel mode. When the processor has finished handling the system call, the operating system switches the processor back to user mode and the user code continues to execute.
The virtual address space of a process can be divided into two parts: kernel space and user space. Kernel space holds the kernel's code and data, while user space holds the user program's code and data. Both kernel space and user space are virtual address spaces that are mapped onto physical addresses.
Operating on a file from an application is a typical system call procedure.
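As a minimal user-space sketch (not taken from the original article), the following C program reads a file purely through the open, read, write, and close system calls; each call traps from user mode into kernel mode and returns once the kernel has finished. The path /tmp/hello.txt is only a placeholder.

```c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    char buf[4096];

    /* open() traps into the kernel; the returned descriptor indexes
     * the process's file descriptor table. */
    int fd = open("/tmp/hello.txt", O_RDONLY);   /* placeholder path */
    if (fd < 0) {
        perror("open");
        return 1;
    }

    /* read() is another user-to-kernel transition; the kernel copies
     * data into buf and returns to user mode when done. */
    ssize_t n = read(fd, buf, sizeof(buf));
    if (n > 0)
        write(STDOUT_FILENO, buf, (size_t)n);    /* echo what was read */

    close(fd);
    return 0;
}
```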
Virtual file system
An operating system can support a variety of different underlying file systems (such as NTFS, FAT, Ext3, and Ext4). In order to give the kernel and user processes a unified view of the file system, Linux adds an abstraction layer between user processes and the underlying file systems: the Virtual File System (VFS). All file operations issued by processes go through the VFS, and the VFS adapts to the various underlying file systems to carry out the actual file operations.
In plain terms, the VFS is the interface layer and adaptation layer of a common file system: on the one hand it provides user processes with a uniform set of interfaces for accessing files, directories, and other objects; on the other hand it adapts to the different underlying file systems:
Main modules of the virtual file system
1. Super block (super_block) module: stores all the metadata of a file system and acts as that file system's information repository, providing information to the other modules. A super block therefore represents a file system, and any modification of the file system's metadata modifies the super block. The super block object is resident in memory and is cached.
2. Directory entry (dentry) module: manages the directory entries of a path. For example, for the path /home/foo/hello.txt, the directory entries are home, foo, and hello.txt. A directory entry block stores information such as the inode numbers and file names of all the files in that directory. Internally it is a tree structure: to locate a file, the operating system starts at the root and resolves each directory in the path level by level until the file is found.
3. Inode module: manages a specific file and is the unique identifier of that file; each file corresponds to one inode. Through the inode it is easy to find the location of the file in the disk sectors. The inode module can also link to the address_space module to find out whether its file data has already been cached.
4. Open file list module: contains all the files that have been opened by the kernel. An open file object is created in the kernel by an open system call and is also called a file handle. The open file list module contains a linked list; each list item is a struct file, and the information in this struct represents the various state parameters of an open file.
5. file_operations module: maintains a data structure that is a collection of function pointers, covering all the system call functions that can be used on the file, such as open, read, write, mmap, and so on. Each open file (an entry in the open file list module) is linked to the file_operations module, so that any operation on an open file can be carried out through these system call functions.
6. address_space module: represents the physical pages of a file that have been cached in the page cache. It is the bridge between the page cache and the file system on the external device. If the file system can be understood as the data source, then address_space is what ties the memory system and the file system together. We will come back to it later in the article.
The interaction and logical relationships between the modules are as follows:
As can be seen from the diagram:
1. Each module maintains an x_op pointer to its corresponding operations object x_operations.
2. The super block maintains an s_files pointer to the open file list module, i.e. the linked list of all files opened in the kernel, which is shared by all processes.
3. The directory entry module and the inode module each maintain an x_sb pointer to the super block, from which the metadata of the whole file system can be obtained.
4. The directory entry object and the inode object maintain pointers to each other, so each can reach the other's data.
5. Each file structure instance in the open file list maintains an f_dentry pointer to its corresponding directory entry, so that the corresponding inode information can be found through the directory entry.
6. Each file structure instance in the open file list maintains an f_op pointer to file_operations, the set of all functions that can operate on the file.
7. The inode not only holds pointers to the other modules; importantly, it can point to the address_space module and thereby reach its own file's in-memory cache information.
8. address_space maintains a tree structure pointing to all the cached physical page structures (struct page), and also maintains a host pointer back to the inode in order to obtain the file's metadata. A simplified sketch of these relationships follows.
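The sketch below summarizes the pointer relationships just listed in heavily simplified C. These are not the real kernel definitions (those live in headers such as <linux/fs.h> and carry many more fields plus locking); only the fields named above are shown, and placeholder types stand in for everything else.

```c
/* Placeholder operation tables: each is a collection of function pointers. */
struct super_operations;
struct inode_operations;
struct dentry_operations;
struct file_operations;

struct address_space;
struct inode;

struct super_block {
    struct super_operations *s_op;      /* item 1: this file system's ops   */
    struct file             *s_files;   /* item 2: list of all open files   */
};

struct dentry {
    struct dentry_operations *d_op;     /* item 1                           */
    struct super_block       *d_sb;     /* item 3: back to the super block  */
    struct inode             *d_inode;  /* item 4: directory entry -> inode */
};

struct inode {
    struct inode_operations *i_op;      /* item 1                           */
    struct super_block      *i_sb;      /* item 3                           */
    struct dentry           *i_dentry;  /* item 4: inode -> directory entry */
    struct address_space    *i_mapping; /* item 7: the file's cached pages  */
};

struct address_space {
    void         *page_tree;  /* item 8: tree of cached struct page objects */
    struct inode *host;       /* item 8: back-pointer to the file's inode   */
};

struct file {
    struct dentry          *f_dentry;   /* item 5: file -> dentry -> inode  */
    struct file_operations *f_op;       /* item 6: open/read/write/mmap ops */
};
```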
Process and virtual file system interaction
1. The kernel uses task_struct to represent the descriptor of a single process; it contains all the information needed to maintain a process. The task_struct structure maintains a files pointer (note that this is different from the pointers that refer to open file list entries) to a struct files_struct, and files_struct contains the file descriptor table and information about the open file objects.
2. The file descriptor table in files_struct is in fact a list of pointers of type file (the same kind of pointer that refers to open file list entries), and it supports dynamic expansion. Each pointer refers to one of the open files in the virtual file system's open file list module.
3. The file structure can, on the one hand, be linked through f_dentry to the directory entry module and the inode module to obtain all the information related to the file; on the other hand, it links to the file_operations submodule, which contains all the usable system call functions, and this is what ultimately carries out operations on the file. Thus, starting from the process, we reach the process's file descriptor table, from there the corresponding file structure in the open file list, and then call its system call functions to perform various operations on the file, as sketched below.
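A rough sketch of that path, with simplified stand-ins for task_struct and files_struct (the real kernel types are far richer and protected by locks and RCU); the helper fd_to_file is invented for illustration:

```c
#include <stddef.h>

struct file;                     /* an open file object (open file list entry) */

struct files_struct {
    struct file **fd_array;      /* file descriptor table; grows dynamically   */
    int           max_fds;
};

struct task_struct {
    struct files_struct *files;  /* per-process open-file information          */
};

/* A file descriptor is simply an index into the process's descriptor table;
 * each slot points at an entry of the kernel's open file list. */
struct file *fd_to_file(struct task_struct *task, int fd)
{
    struct files_struct *fs = task->files;

    if (fd < 0 || fd >= fs->max_fds)
        return NULL;             /* invalid descriptor */
    return fs->fd_array[fd];
}
```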
Process vs File List vs Inode
1. Multiple processes can point to the same open file object (the same open file list entry); for example, a parent process and a child process can share a file object.
2. A process can open the same file multiple times, producing different file descriptors, each of which points to a different open file list entry. But because it is the same file, the inode is unique, so all of these file list entries point to the same inode. File sharing (sharing the same disk file) is achieved in this way, as the small illustration below shows.
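A user-space illustration of point 2: two independent open() calls on the same file produce two open file objects with separate offsets, while dup() makes two descriptors share one open file object (and therefore one offset); the inode behind all of them is the same. The path is a placeholder and the file is assumed to exist with at least a few bytes of data.

```c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    /* Two independent opens: two entries in the open file list,
     * two independent offsets, but one and the same inode. */
    int a = open("/tmp/data.txt", O_RDONLY);   /* placeholder path */
    int b = open("/tmp/data.txt", O_RDONLY);
    if (a < 0 || b < 0) {
        perror("open");
        return 1;
    }

    /* dup() copies only the descriptor: c and a share one open file
     * object, so they also share one file offset. */
    int c = dup(a);

    char buf[4];
    if (read(a, buf, sizeof(buf)) != sizeof(buf))   /* advances a/c's offset */
        perror("read");

    printf("offset seen via a/c: %ld\n", (long)lseek(c, 0, SEEK_CUR)); /* 4 */
    printf("offset seen via b  : %ld\n", (long)lseek(b, 0, SEEK_CUR)); /* 0 */

    close(a);
    close(b);
    close(c);
    return 0;
}
```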
I/O buffers
Concept
Similar in principle to a cache, during I/O the disk is read much more slowly than memory. Therefore, to speed up data processing, data read from disk needs to be cached in memory. The data cached in memory is the buffer cache, hereafter referred to simply as the "buffer".
Specifically, a buffer is an area used to transfer data between devices that are not synchronized with each other or that have different priority levels. On the one hand, a buffer reduces how long processes have to wait for one another, so that while a slow device is reading data the operation of a fast device is not interrupted. On the other hand, it can protect the hard disk or reduce the number of network transmissions.
Buffer and Cache
Buffer and cache are two different concepts. A cache is a high-speed buffer between the CPU and memory, whereas a buffer is an I/O cache between memory and the hard disk. Put simply, a cache speeds up "reads" while a buffer smooths "writes": the former solves the reading problem by saving data that has been read from the disk, and the latter solves the writing problem by holding data that is about to be written to the disk.
Buffer Cache and Page cache
Both the buffer cache and the page cache exist to provide high-speed access when devices and memory interact. The buffer cache can be called the block buffer and the page cache can be called the page buffer. Before Linux supported the virtual memory mechanism there was no concept of a page, so the buffer cache was organized in blocks, matching the device. After Linux adopted virtual memory to manage memory, the page became the smallest unit of virtual memory management, and memory began to be cached with a page-based mechanism. Since Linux 2.6 the kernel has merged the two caches: pages and blocks can be mapped to each other, the page cache faces virtual memory, and the buffer cache (the block I/O cache) faces block devices. It is important to emphasize that, to a process, the page cache and the block cache together form a storage system; the process does not need to care about how the underlying device is read or written.
The biggest difference between the buffer cache and the page cache is the granularity of caching. The buffer cache is oriented toward the blocks of the file system, while the kernel's memory management component uses a higher-level abstraction than the file system's block, the page, which gives higher performance. Therefore, the caching component that interacts with memory management uses the page cache.
Page Cache
The page cache is file-oriented and memory-oriented. In plain terms, it sits between memory and the file as a buffer: file I/O operations actually interact only with the page cache rather than directly with the disk. The page cache can be used in any file-based scenario, such as a network file system. The page cache maps a file to pages through a series of data structures, such as the inode, address_space, and struct page:
1. The struct page structure identifies a physical memory page; with page + offset, the page frame can be associated with a specific location in a file. struct page also includes the following important fields:
(1) flags: records whether the page is dirty, whether it is being written back, and so on;
(2) mapping: points to the address space (address_space) the page belongs to, indicating that this page is a page-cache page associated with a file's address space;
(3) index: records the page offset of this page within the file;
2. The file system's inode actually maintains the block numbers of all the blocks of this file. From a byte offset within the file, the file system block number containing that offset, and from it the disk sector number, can be computed quickly. Likewise, the index of the page containing a given offset is obtained by dividing the file offset by the page size (the remainder is the offset within the page), as illustrated in the small example after this list.
3. The page cache component abstracts the address space (address_space) as the intermediate bridge between the file system and the page cache. Through pointers, address_space can easily reach the file's inode and struct page information, so each component can conveniently translate a position in a file along the chain: file byte offset --> page offset --> file system block number --> disk sector number.
4. The page cache is actually organized as a radix tree that maps a file's contents to physical memory struct page instances. One file inode corresponds to one address_space, and one address_space corresponds to one page-cache radix tree. The relationships between them are as follows:
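The offset arithmetic implied by points 1 to 3 can be shown with a tiny self-contained example; the page and block sizes below are illustrative constants, whereas a real file system takes the block size from the super block.

```c
#include <stdio.h>

#define PAGE_SIZE  4096UL   /* typical page size; architecture dependent        */
#define BLOCK_SIZE 1024UL   /* example block size; really read from super block */

int main(void)
{
    unsigned long file_offset = 10000;  /* a byte offset within some file */

    /* struct page's index field: which page-cache page of the file holds
     * this byte, and where the byte sits inside that page. */
    unsigned long page_index     = file_offset / PAGE_SIZE;
    unsigned long offset_in_page = file_offset % PAGE_SIZE;

    /* The inode's block map then translates the file block number into an
     * on-disk block and, from there, a disk sector. */
    unsigned long file_block = file_offset / BLOCK_SIZE;

    printf("page index %lu, offset in page %lu, file block %lu\n",
           page_index, offset_in_page, file_block);
    return 0;
}
```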
Address Space
Below we summarize everything that has been discussed about address_space. address_space is a key abstraction in the Linux kernel: it serves as the intermediate adapter between the file system and the page cache, indicating which physical pages of a file have been cached in the page cache. It is therefore the bridge between the page cache and the file system on the external device. If the file system can be understood as the data source, then address_space is what ties the memory system and the file system together.
As can be seen from the diagram, address_space is linked to both the page-cache radix tree and the inode, so through pointers address_space can easily reach the file's inode and page information. So how does the page cache implement its buffering function through address_space? Let us walk through the complete file read and write process.
Basic process of file reading and writing
Read the file
1. The process calls a library function to issue a read request to the kernel;
2. The kernel locates the corresponding entry in the virtual file system's open file list by checking the process's file descriptor;
3. The read() system call function available for this file is called; read() follows the file structure's link to the directory entry module and, using the file's path, finds the file's inode in the directory entry module;
4. In the inode, the page to be read is calculated from the offset into the file contents;
5. Through the inode, the address_space corresponding to the file is found;
6. In address_space, the file's page-cache radix tree is searched for the corresponding page-cache node:
(1) If the page cache hits, the file contents are returned directly;
(2) If the page cache misses, a page fault is raised: a new page-cache page is created, the disk address of that page of the file is found through the inode, the corresponding page is read from disk to populate the cache page, and step 6 is repeated to look up the page cache again;
7. The file contents have been read successfully.
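The lookup-or-fill logic of step 6 can be sketched with a toy, self-contained stand-in for a single file's page cache; the structure and function names here are invented for illustration and are not the kernel's real radix-tree API.

```c
#include <stdio.h>
#include <string.h>

#define PAGE_SIZE 4096
#define MAX_PAGES 16          /* toy capacity; the real radix tree is unbounded */

/* Toy stand-in for one file's page cache (one address_space). */
struct toy_cache {
    int  present[MAX_PAGES];
    char data[MAX_PAGES][PAGE_SIZE];
};

/* Stand-in for step 6(2): read the page's blocks from "disk". */
static void read_page_from_disk(struct toy_cache *c, unsigned long index)
{
    memset(c->data[index], 'A' + (int)index, PAGE_SIZE);
    c->present[index] = 1;
}

/* Step 6: return the cached page for a byte offset, filling it on a miss. */
static char *get_cached_page(struct toy_cache *c, unsigned long file_offset)
{
    unsigned long index = file_offset / PAGE_SIZE;   /* page index in file */

    if (index >= MAX_PAGES)
        return NULL;                      /* out of range for this toy cache */
    if (!c->present[index])               /* miss: create and fill the page  */
        read_page_from_disk(c, index);
    return c->data[index];                /* hit: serve the data from memory */
}

int main(void)
{
    static struct toy_cache cache;        /* zero-initialized: no pages cached */
    char *page = get_cached_page(&cache, 5000);       /* byte 5000 -> page 1 */

    if (page != NULL)
        printf("first byte of cached page: %c\n", page[0]);
    return 0;
}
```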
Write a file
The first five steps are the same as for reading a file; then, in address_space, the page cache is searched for the page in question:
6. If the page cache hits, the modified file content is written directly into the page-cache page, and the write is complete. At this point the modification exists only in the page cache and has not yet been written back to the disk file.
7. If the page cache misses, a page fault is raised: a page-cache page is created, the disk address of that page of the file is found through the inode, and the corresponding page is read from disk to populate the cache page. The page cache then hits, and step 6 is carried out.
8. A page in the page cache is marked dirty once it has been modified. Dirty pages need to be written back to the file's blocks on disk. There are two ways for dirty pages to be written back to disk:
(1) Manually call the sync() or fsync() system call to write the dirty pages back;
(2) The pdflush process periodically writes dirty pages back to disk.
Note also that dirty pages cannot be swapped out of memory. While a dirty page is being written back, its writeback flag is set and the page is locked; other write requests are blocked until the lock is released.
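As a user-space illustration of option (1), the following sketch writes data and then calls fsync() to force the dirty page-cache pages of that file to disk; without the fsync (or a later sync, or the periodic flusher), the data would remain only in the page cache for a while. The output path is a placeholder.

```c
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
    const char *msg = "hello, page cache\n";

    int fd = open("/tmp/out.txt", O_WRONLY | O_CREAT | O_TRUNC, 0644); /* placeholder path */
    if (fd < 0) {
        perror("open");
        return 1;
    }

    /* write() only copies the data into page-cache pages and marks them
     * dirty; it normally returns before anything reaches the disk. */
    if (write(fd, msg, strlen(msg)) < 0)
        perror("write");

    /* fsync() blocks until this file's dirty pages have been written back. */
    if (fsync(fd) < 0)
        perror("fsync");

    close(fd);
    return 0;
}
```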