This article was reproduced from: http://blog.csdn.net/kidd_3/article/details/6909097
--------------------------------Split Line Start--------------------------------
Our analysis is based on the 2.6.32 kernel and its successors.
On Linux, our data always lives on storage, either in a file system (e.g. ext3) or on a raw device. When we use this data, we access it through the file abstraction: the operating system delivers the data we ask for, and we never have to deal with the block device directly.
From this, we can clearly see:
The I/O subsystem is a deeply layered system: a data request travels from user space all the way down to the disk, passing through a complex data flow along the way.
We need to understand how I/O really works so that we do not misuse it and end up with design problems; this matters especially for driver developers and for designers of I/O-intensive systems. (http://blog.yufeng.info/archives/751)
The IBM developerWorks article on the anatomy of the read system call lays this out clearly.
Processing of the read system call is divided into two parts: user-space processing and kernel-space processing. The user-space part simply traps into the kernel (via the int 0x80 software interrupt on x86), whose service routine then dispatches to sys_read, and the kernel-space processing begins from there.
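To make that boundary concrete, here is a minimal user-space sketch (an illustrative example, not taken from the original article); the read() library call below is what ultimately traps into the kernel and ends up in sys_read:

```c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    char buf[4096];
    int fd = open("/etc/hostname", O_RDONLY);   /* any readable file will do */
    if (fd < 0) {
        perror("open");
        return 1;
    }

    /* This call crosses into the kernel (int 0x80 / sysenter / syscall,
     * depending on the architecture) and is handled by sys_read. */
    ssize_t n = read(fd, buf, sizeof(buf));
    if (n < 0)
        perror("read");
    else
        printf("read %zd bytes\n", n);

    close(fd);
    return 0;
}
```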
Within the kernel, as described below, the read request passes through the VFS, the concrete file system (ext2, for example), the page cache layer, the generic block layer, the I/O scheduling layer, the device driver layer, and finally the device layer. The VFS mainly shields the differences between the underlying file systems and provides a unified interface; it is this layer that makes it possible to abstract a device as a file. The concrete file system defines its own block size, operation set, and so on. The cache layer is introduced to improve I/O efficiency: it caches part of the data held on disk, and when a request arrives, if the data is already in the cache and up to date, it is handed straight to the user program without touching the underlying disk. The main task of the generic block layer is to receive disk requests from the upper layers and ultimately issue the I/O requests (bios). The I/O scheduling layer tries to merge and sort the bios from the generic block layer according to the configured scheduling algorithm, and then calls back the request-handling function provided by the driver layer to process the concrete I/O requests. The driver layer holds the drivers for the specific physical devices; it takes the I/O requests from the layer above and, according to the information given in each request, has the device controller send commands to the block device to move the data. The device layer is the physical device itself.
VFS Layer:
The kernel function sys_read is the entry point of the read system call at this layer.
Based on the file descriptor fd, it fetches the corresponding file object from the current process descriptor and calls vfs_read to perform the read.
vfs_read then calls the read function associated with that particular file, file->f_op->read, to carry out the actual read operation.
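A condensed sketch of this dispatch, modeled on the 2.6.32-era fs/read_write.c (most checks, locking, and error handling are trimmed, so treat it as an outline rather than the literal source):

```c
/* Condensed sketch of the VFS read dispatch (2.6.32-era);
 * permission checks and error handling are largely omitted. */
ssize_t vfs_read(struct file *file, char __user *buf, size_t count, loff_t *pos)
{
    ssize_t ret;

    if (!(file->f_mode & FMODE_READ))
        return -EBADF;

    /* Hand off to the file system's read method; for ext2 this is
     * the generic do_sync_read() helper. */
    if (file->f_op->read)
        ret = file->f_op->read(file, buf, count, pos);
    else
        ret = do_sync_read(file, buf, count, pos);

    return ret;
}
```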
The VFS then hands control over to the ext2 file system. (ext2 is used here as the example for the walkthrough.)
Processing of the Ext2 file system layer
From the ext2_file_operations structure we know that the function above eventually calls into do_sync_read, a generic read helper shared across the kernel. So do_sync_read is the real entry point of the ext2 layer.
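The relevant part of that structure in fs/ext2/file.c looks roughly like this (abbreviated; the exact set of fields varies a little between 2.6.x releases):

```c
const struct file_operations ext2_file_operations = {
    .llseek    = generic_file_llseek,
    .read      = do_sync_read,            /* entry used via file->f_op->read */
    .write     = do_sync_write,
    .aio_read  = generic_file_aio_read,   /* called back by do_sync_read */
    .aio_write = generic_file_aio_write,
    .mmap      = generic_file_mmap,
    .open      = generic_file_open,
    /* ... remaining operations omitted ... */
};
```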
The entry function of this layer, do_sync_read, calls generic_file_aio_read, which determines the access mode of the read request. If it is direct I/O (filp->f_flags has the O_DIRECT flag set, so the request does not go through the cache), the direct-I/O routine generic_file_direct_io is called; otherwise, for the page-cache path, do_generic_file_read is called. That function checks whether the requested page is already in the page cache: if it is, the data is copied directly to user space. If not, page_cache_sync_readahead is called to perform read-ahead (after checking whether read-ahead is allowed), and it in turn calls mpage_readpages. If the page still cannot be found (read-ahead may not be permitted, or may have failed for other reasons), the code falls through to readpage, i.e. mpage_readpage, and reads the data from disk.
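From user space, which of the two paths gets taken is decided when the file is opened. The sketch below (an illustrative example, not from the original article) requests the direct-I/O path; note that O_DIRECT transfers need aligned buffers:

```c
#define _GNU_SOURCE            /* exposes O_DIRECT */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(void)
{
    void *buf;
    /* O_DIRECT I/O must be aligned (typically to 512 bytes or the
     * device's logical block size), hence posix_memalign(). */
    if (posix_memalign(&buf, 4096, 4096) != 0)
        return 1;

    int fd = open("testfile", O_RDONLY | O_DIRECT);  /* bypass the page cache */
    if (fd < 0) {
        perror("open(O_DIRECT)");
        return 1;
    }

    ssize_t n = read(fd, buf, 4096);   /* served by the direct-I/O path */
    printf("read %zd bytes without going through the page cache\n", n);

    close(fd);
    free(buf);
    return 0;
}
```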
In mpage_readpages (which reads multiple pages at a time), contiguous disk blocks are placed into the same bio, and submission of that bio is deferred; only when a discontiguous block is encountered is the bio submitted, after which processing continues by constructing another bio.
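The coalescing logic can be sketched roughly as follows (a pseudo-C paraphrase of fs/mpage.c, not the literal kernel source; helpers such as blocks_are_contiguous() are placeholders, while bio_alloc, bio_add_page, and submit_bio are real block-layer interfaces):

```c
/* Rough paraphrase of the per-page loop in fs/mpage.c; the helper
 * names below are placeholders, not real kernel functions. */
for_each_page_to_read(page) {
    map_page_to_disk_blocks(page);                 /* where does this page live on disk? */

    if (bio && !blocks_are_contiguous(bio, page)) {
        submit_bio(READ, bio);                     /* flush the bio built so far */
        bio = NULL;
    }
    if (!bio)
        bio = bio_alloc(GFP_KERNEL, nr_pages_left);

    bio_add_page(bio, page, PAGE_SIZE, 0);         /* keep batching contiguous pages */
}
if (bio)
    submit_bio(READ, bio);                         /* submit the final batch */
```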
Page cache structure of the file
Figure 5 shows the page cache structure of a file. The file is divided into page-sized blocks of data, which are organized into a multi-way tree (called a radix tree). All leaf nodes of the tree are page frame descriptors (struct page), each representing one page used to cache the file. The leftmost page of the leaf level holds the first 4,096 bytes of the file (assuming a page size of 4,096 bytes), the next page holds the second 4,096 bytes, and so on. All intermediate nodes of the tree are organizational nodes that indicate on which page the data at a given address resides. The tree can have from 0 to 6 levels, supporting file sizes from 0 bytes up to 4 TB. The pointer to the tree's root node can be obtained from the address_space object associated with the file (that object is embedded in the inode object associated with the file). (See Resources for more information on the structure of the page cache.)
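As a concrete illustration, byte offset pos of a file lives in page index pos >> PAGE_CACHE_SHIFT (i.e. pos / 4096 with 4 KB pages), and the page cache layer looks that index up in the radix tree along these lines (a minimal sketch using 2.6.32-era helpers; mapping and pos are assumed to be in scope):

```c
/* Minimal page-cache lookup sketch (2.6.32-era interfaces). */
pgoff_t index = pos >> PAGE_CACHE_SHIFT;             /* byte offset -> page index */
struct page *page = find_get_page(mapping, index);   /* radix-tree lookup */

if (page && PageUptodate(page)) {
    /* Cache hit: the data can be copied straight to user space. */
} else {
    /* Cache miss: the page has to be read from disk (readahead / readpage). */
}
```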
The mpage mechanism is how the page cache layer handles this work.
Generic Block Layer
At the end of the page cache layer's processing, mpage_bio_submit is executed, and as a result the generic_make_request function is called. This is the entry function of the generic block layer.
It passes the bio down to the I/O scheduling layer for processing.
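A bio handed to generic_make_request carries everything the lower layers need to know about the transfer. A minimal sketch of filling one in (2.6.32-era interfaces; bdev, sector, page, and my_end_io are assumed or hypothetical names, not values from the original article):

```c
/* Minimal sketch of building and submitting one bio
 * (2.6.32-era block-layer interfaces). */
struct bio *bio = bio_alloc(GFP_NOIO, 1);

bio->bi_bdev   = bdev;                   /* target block device */
bio->bi_sector = sector;                 /* starting sector on that device */
bio->bi_end_io = my_end_io;              /* hypothetical completion callback */
bio_add_page(bio, page, PAGE_SIZE, 0);   /* the page the data should land in */

submit_bio(READ, bio);                   /* thin wrapper around generic_make_request() */
```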
I/O Scheduling Layer
This layer merges and sorts bios to improve I/O efficiency, and then invokes the callback function registered by the device driver layer, request_fn, to hand processing over to the device driver layer.
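A block driver supplies that callback when it sets up its request queue. A minimal sketch using the 2.6.32-era API (mydisk_request and mydisk_lock are hypothetical driver symbols):

```c
#include <linux/blkdev.h>
#include <linux/module.h>
#include <linux/spinlock.h>

/* Hypothetical driver symbols used only for illustration. */
static struct request_queue *mydisk_queue;
static spinlock_t mydisk_lock;

static void mydisk_request(struct request_queue *q);   /* the request_fn */

static int __init mydisk_init(void)
{
    spin_lock_init(&mydisk_lock);
    /* The I/O scheduler will later call mydisk_request() to drain the
     * queue of merged and sorted requests. */
    mydisk_queue = blk_init_queue(mydisk_request, &mydisk_lock);
    return mydisk_queue ? 0 : -ENOMEM;
}
```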
Device Driver Layer
The request function processes each bio in the request queue in turn, sending commands to the disk controller based on the information carried in the bio. When processing is complete, it calls the completion function (bio_endio) to notify the upper layers that the request has finished.
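A skeletal request function illustrating this flow, using the 2.6.32-era helpers blk_fetch_request() and __blk_end_request_all() (mydisk_transfer is a hypothetical helper standing in for the code that would actually program the disk controller):

```c
#include <linux/blkdev.h>

/* Skeletal request_fn (2.6.32-era API); mydisk_transfer() is a
 * hypothetical helper that would program the disk controller. */
static void mydisk_request(struct request_queue *q)
{
    struct request *req;

    while ((req = blk_fetch_request(q)) != NULL) {
        if (req->cmd_type != REQ_TYPE_FS) {        /* skip non-filesystem requests */
            __blk_end_request_all(req, -EIO);
            continue;
        }

        /* Issue the command described by the request (start sector,
         * length, direction) to the hardware. */
        mydisk_transfer(req);

        /* Completion: this ends every bio in the request, which in turn
         * invokes each bio's bi_end_io callback to notify the upper layers. */
        __blk_end_request_all(req, 0);
    }
}
```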
--------------------------------Split Line End-----------------------------------
Reference Links:
Linux IO subsystem and file system read and write process