While learning how block devices work, what I care about most is the data flow of a block device: when an application calls read or write, how does the data flow through the kernel, how is it processed, and how does it finally reach the physical device? The following analyzes the data flow of a block device with caching enabled.
1. A user-space program opens the specified block device with open(), enters the kernel through the system call mechanism, and executes blkdev_open(), which is the open method registered in the block device's file system operations (file_operations). blkdev_open() calls bd_acquire(), which converts the file system inode into the block device's bdev; the conversion is implemented with a hash lookup. Once the bdev of the specific block device has been obtained, do_open() is called to finish opening the device. do_open() calls the open method registered by the block device driver, as follows: gendisk->fops->open(bdev->bd_inode, file).
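The inode-to-bdev conversion in step 1 can be pictured with a small user-space model. This is only a sketch in Python, not kernel code: the classes `Inode`, `BlockDevice` and `Gendisk`, the dictionary `bdev_table`, and the driver function `sd_open` are hypothetical stand-ins, and the real kernel structures carry far more state.

```python
# Toy user-space model of step 1; not kernel code.
# All class and variable names are hypothetical stand-ins.

class Gendisk:
    """Holds the driver's open method, like gendisk->fops->open."""
    def __init__(self, name, open_fn):
        self.name = name
        self.open_fn = open_fn


class BlockDevice:
    def __init__(self, dev, disk):
        self.dev = dev          # (major, minor) device number
        self.disk = disk


class Inode:
    def __init__(self, rdev):
        self.rdev = rdev        # device number stored in the device node
        self.bdev = None        # cached bdev, filled in on first open


bdev_table = {}                 # device number -> BlockDevice (the "hash")


def bd_acquire(inode):
    """Convert an inode to its bdev: cached pointer first, else a hash lookup."""
    if inode.bdev is None:
        inode.bdev = bdev_table[inode.rdev]
    return inode.bdev


def do_open(bdev):
    """Call the open method that the driver registered on the disk."""
    return bdev.disk.open_fn(bdev)


def blkdev_open(inode):
    return do_open(bd_acquire(inode))


def sd_open(bdev):              # hypothetical driver open method
    return "opened %s" % bdev.disk.name


sda = BlockDevice((8, 0), Gendisk("sda", sd_open))
bdev_table[sda.dev] = sda
node = Inode((8, 0))            # models the /dev/sda device node
print(blkdev_open(node))        # opened sda
```

The dictionary plays the role of the hash table; caching the result on the inode means later opens of the same node skip the lookup.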
2. The user program reads and writes the device through the read and write functions, and the file system calls the corresponding methods, usually these two: generic_file_read and blkdev_file_write. Reads and writes use a variety of strategies; the read path is analyzed first.
3. When user space calls read, the kernel executes generic_file_read. If the file is not in direct I/O mode, it goes straight to do_generic_file_read->do_generic_mapping_read(). do_generic_mapping_read() (in filemap.c) first checks whether the data hits the cache. On a hit, the data is returned to user space directly; otherwise a real read request is issued through address_space->a_ops->readpage. The readpage function constructs a buffer_head, sets the bh callback to end_buffer_async_read, and calls submit_bh to issue the request. submit_bh builds a bio from the buffer_head, sets the bio's callback to end_bio_bh_io_sync, and finally sends the bio to the specified block device through submit_bio.
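The hit-or-miss logic of step 3 can be sketched as a toy page cache. This is a user-space Python illustration under simplifying assumptions, not the kernel implementation: `ToyDisk`, `AddressSpace` and the 4096-byte `PAGE_SIZE` are hypothetical, and the asynchronous buffer_head and bio callbacks are collapsed into one synchronous call.

```python
# Toy model of the buffered read path; not kernel code.
PAGE_SIZE = 4096


class ToyDisk:
    def __init__(self, data):
        self.data = bytes(data)
        self.reads = 0                      # requests that reached the device

    def submit_read(self, index):
        self.reads += 1
        start = index * PAGE_SIZE
        return self.data[start:start + PAGE_SIZE]


class AddressSpace:
    def __init__(self, disk):
        self.disk = disk
        self.pages = {}                     # page index -> cached page contents

    def readpage(self, index):
        """Cache miss: issue a real read and keep the result in the cache."""
        self.pages[index] = self.disk.submit_read(index)
        return self.pages[index]


def do_generic_mapping_read(mapping, pos, count):
    """Copy `count` bytes starting at byte offset `pos`, page by page."""
    out = bytearray()
    while count > 0:
        index, offset = divmod(pos, PAGE_SIZE)
        page = mapping.pages.get(index)     # look in the cache first
        if page is None:
            page = mapping.readpage(index)  # miss: go to the device
        chunk = page[offset:offset + count]
        if not chunk:                       # past the end of the device
            break
        out += chunk
        pos += len(chunk)
        count -= len(chunk)
    return bytes(out)


disk = ToyDisk(bytes(range(256)) * 64)      # 16384 bytes = 4 pages
mapping = AddressSpace(disk)
first = do_generic_mapping_read(mapping, 4090, 10)   # spans pages 0 and 1
again = do_generic_mapping_read(mapping, 4090, 10)   # served from the cache
print(disk.reads)                           # 2
```

The second read touches the same two pages and never reaches the device, which is the whole point of checking the cache before calling readpage.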
4. When user space calls write, the kernel executes blkdev_file_write. If it is not a direct I/O operation, the buffered write path is taken and generic_file_buffered_write is called directly. A buffered write puts the data into the cache and performs cache replacement where needed; replacement has to operate on the actual block device, and address_space->a_ops provides the methods for doing so. Once the data is in the cache, write can return; most of the subsequent asynchronous write-back is left to the pdflush daemon (part of it is done during replacement).
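The write-back behaviour described in step 4 can be sketched in the same spirit. The model below is a user-space Python toy, not kernel code: `WriteBackCache`, its oldest-first replacement policy, and the explicit `flush()` call standing in for pdflush are all assumptions made for illustration.

```python
# Toy model of buffered writes with deferred write-back; not kernel code.

class ToyDisk:
    def __init__(self):
        self.blocks = {}                 # page index -> data on "disk"
        self.writes = 0

    def submit_write(self, index, data):
        self.writes += 1
        self.blocks[index] = data


class WriteBackCache:
    def __init__(self, disk, capacity):
        self.disk = disk
        self.capacity = capacity         # maximum number of cached pages
        self.pages = {}                  # page index -> data (insertion ordered)
        self.dirty = set()

    def buffered_write(self, index, data):
        """Write into the cache only; evict the oldest page if it is full."""
        if index not in self.pages and len(self.pages) >= self.capacity:
            victim = next(iter(self.pages))
            self._writepage(victim)      # replacement may touch the device
            del self.pages[victim]
        self.pages[index] = data
        self.dirty.add(index)

    def _writepage(self, index):
        if index in self.dirty:
            self.disk.submit_write(index, self.pages[index])
            self.dirty.discard(index)

    def flush(self):
        """Stands in for the periodic pdflush pass over dirty pages."""
        for index in sorted(self.dirty):
            self._writepage(index)


disk = ToyDisk()
cache = WriteBackCache(disk, capacity=2)
cache.buffered_write(0, b"a")
cache.buffered_write(1, b"b")
print(disk.writes)      # 0: both writes returned without touching the disk
cache.buffered_write(2, b"c")
print(disk.writes)      # 1: page 0 was written out during replacement
cache.flush()
print(disk.writes)      # 3: the remaining dirty pages were flushed
```

The three printed counters correspond to the three cases in the text: a write that only fills the cache, a write that forces replacement, and the later asynchronous flush.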
5. At this point it is clear how user data gets into the kernel. The interface closest to the user is file_operations, which is defined per device type (because Linux treats every device as a file, each device type defines its own file operations; for example, character devices use def_chr_fops, block devices use def_blk_fops, and network devices use bad_sock_fops). The underlying operations differ between device types, but file_operations hides those differences, which is why Linux can present every device as a file. This raises another question: where, then, do the differences between devices show up? At the file system layer, an interface is defined through which the file system accesses the device, address_space_operations, and the file system reaches the specific device through it. Character devices do not need to implement address_space_operations, because a character device's interface is the same as the file system's: when a character device is opened, the file_operations that the inode points to is replaced by the file_operations that the cdev points to. A user-space read or write on a character device therefore calls the cdev's file_operations methods directly.
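The dispatch idea in step 5 can be illustrated with a small table of operations. This is a hedged Python sketch rather than kernel code: `FileOperations`, `File`, `cdev_map`, `tty_fops` and the two read functions are hypothetical names, and only the names def_blk_fops and def_chr_fops come from the text.

```python
# Toy model of file_operations dispatch; not kernel code.

class FileOperations:
    def __init__(self, name, open_fn=None, read_fn=None):
        self.name = name
        self.open_fn = open_fn
        self.read_fn = read_fn


def blk_read(file, count):
    return "block read of %d bytes via the page cache" % count


def tty_read(file, count):
    return "tty driver read of %d bytes" % count


tty_fops = FileOperations("tty_fops", read_fn=tty_read)   # hypothetical driver fops
cdev_map = {(4, 0): tty_fops}            # device number -> the cdev's fops


def chrdev_open(file):
    """Swap in the driver's own file_operations, as a character open does."""
    file.f_op = cdev_map[file.rdev]


def_blk_fops = FileOperations("def_blk_fops", read_fn=blk_read)
def_chr_fops = FileOperations("def_chr_fops", open_fn=chrdev_open)


class File:
    def __init__(self, rdev, default_fops):
        self.rdev = rdev
        self.f_op = default_fops
        if self.f_op.open_fn is not None:
            self.f_op.open_fn(self)


def vfs_read(file, count):
    """The caller never needs to know which device type is behind the file."""
    return file.f_op.read_fn(file, count)


blk = File((8, 0), def_blk_fops)
chr_file = File((4, 0), def_chr_fops)
print(vfs_read(blk, 512))       # block read of 512 bytes via the page cache
print(vfs_read(chr_file, 16))   # tty driver read of 16 bytes
print(chr_file.f_op.name)       # tty_fops
```

After the open, the character file no longer points at def_chr_fops at all, so its reads go straight to the driver, while the block file keeps the generic operations and relies on a lower interface to reach the device.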
6. As of step 4, a read that misses the cache issues a block device read request through readpage in address_space_operations, and a write issues a block device request when the cache is replaced or when pdflush wakes up. Issuing a block device request works the same way in both cases: first a bio structure is built as required, containing the read/write address, length, target device, completion callback, and so on. Once the bio is built, the request is forwarded to the specific block device through a single call to submit_bio. This shows how simple the block device interface is: the interface method is submit_bio (with generic_make_request underneath it), and the data structure is struct bio.
7. submit_bio forwards the bio through generic_make_request. generic_make_request is a loop that interacts with a block device through the q->make_request_fn function registered for each device. If the block device being accessed has a request queue, the system's __make_request function is registered as q->make_request_fn; otherwise the block device registers a private method. Because such a device has no request queue, the private method does not process the request itself; it forwards the bio by modifying the bio (for example, retargeting it at another device). A private make_request method therefore often returns 1, telling generic_make_request to keep forwarding the bio. generic_make_request can run in two contexts: the user's process context, or the kernel thread context in which pdflush runs.
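The forwarding loop of step 7 can be sketched as follows. This is a user-space Python toy, not kernel code: the stacked device `vol0`, the helper `make_linear_remap`, and the 2048-sector offset are hypothetical, chosen only to show a private method that returns 1 and a queued device that returns 0.

```python
# Toy model of the forwarding loop in generic_make_request; not kernel code.

class Bio:
    def __init__(self, dev, sector, nsectors):
        self.dev = dev
        self.sector = sector
        self.nsectors = nsectors


class Queue:
    def __init__(self, make_request_fn):
        self.make_request_fn = make_request_fn
        self.pending = []                      # bios accepted by this queue


def queued_make_request(q, bio):
    """Device with a request queue: accept the bio and stop forwarding."""
    q.pending.append(bio)
    return 0


def make_linear_remap(target, offset):
    """Queue-less stacked device (hypothetical): redirect the bio to `target`."""
    def remap(q, bio):
        bio.dev = target
        bio.sector += offset
        return 1                               # ask the caller to forward again
    return remap


queues = {}                                    # device name -> Queue


def generic_make_request(bio):
    """Keep calling make_request_fn until some device accepts the bio."""
    hops = []
    while True:
        q = queues[bio.dev]
        hops.append(bio.dev)
        if q.make_request_fn(q, bio) == 0:
            return hops


queues["sda"] = Queue(queued_make_request)
queues["vol0"] = Queue(make_linear_remap("sda", 2048))
bio = Bio("vol0", 8, 8)
print(generic_make_request(bio))   # ['vol0', 'sda']
print(bio.dev, bio.sector)         # sda 2056
```

The return value is the only signal the loop needs: a non-zero result means the bio now names a different device, and zero means it has found its queue.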
8. Through repeated forwarding by generic_make_request, the request is eventually bound to the queue of some block device; assume the final block device is a SCSI disk (/dev/sda). When the request is forwarded to sda, generic_make_request calls __make_request, the block device request handler provided by Linux. This function implements an extremely important operation: what is usually called I/O scheduling happens here. It tries to merge the forwarded bio into an existing request; if the merge is possible, the new bio is attached to that request, and if not, a new request is allocated and the bio is added to it. Once this is done, the bio forwarded by generic_make_request has reached a way station in the kernel, the request, and has found a temporary home. No operation on the physical device has been started yet. Before __make_request returns, it checks the sync flag in the bio. If the flag is set, the bio is time-critical and must not linger in the kernel, so __generic_unplug_device is called to trigger the next stage. If the flag is not set, the request stays in the queue for a while, until the queue's unplug timer fires and triggers the next stage. __make_request returns 0, telling generic_make_request that no further forwarding is needed, and the bio's forwarding ends.
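The merge-then-unplug behaviour of step 8 can be sketched with a toy request queue. This is an illustrative Python model, not the kernel's elevator: `RequestQueue`, its linear scan for merge candidates, and the tuple passed to the hypothetical `request_fn` hook are simplifications, and the unplug timer is replaced by an explicit `unplug()` call.

```python
# Toy model of bio merging and queue plugging in __make_request; not kernel code.

class Request:
    def __init__(self, sector, nsectors):
        self.sector = sector
        self.nsectors = nsectors
        self.bios = 1                          # number of bios attached

    @property
    def end(self):
        return self.sector + self.nsectors


class RequestQueue:
    def __init__(self, request_fn):
        self.requests = []
        self.request_fn = request_fn           # driver hook, run on unplug
        self.plugged = False

    def make_request(self, sector, nsectors, sync=False):
        for rq in self.requests:
            if rq.end == sector:               # back merge: bio follows the request
                rq.nsectors += nsectors
                rq.bios += 1
                break
            if sector + nsectors == rq.sector: # front merge: bio precedes it
                rq.sector = sector
                rq.nsectors += nsectors
                rq.bios += 1
                break
        else:
            self.requests.append(Request(sector, nsectors))
            self.plugged = True                # hold requests until unplug
        if sync:
            self.unplug()                      # sync bios must not wait in the queue
        return 0                               # 0: no further forwarding needed

    def unplug(self):
        """Called directly for sync bios, or later when the unplug timer fires."""
        self.plugged = False
        pending, self.requests = self.requests, []
        for rq in pending:
            self.request_fn(rq)


dispatched = []
q = RequestQueue(lambda rq: dispatched.append((rq.sector, rq.nsectors, rq.bios)))
q.make_request(0, 8)
q.make_request(8, 8)                 # merged behind the first request
q.make_request(100, 8)               # not contiguous: new request
print(len(q.requests))               # 2
q.make_request(92, 8, sync=True)     # front merge, then immediate unplug
print(dispatched)                    # [(0, 16, 2), (92, 16, 2)]
```

Four bios reach the driver as two larger requests, and nothing is dispatched until either a sync bio arrives or the queue is unplugged.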
9. So far, the bio sent down by the file system (pdflush or address_space_operations) has been merged into the request queue. If it is a sync bio, __generic_unplug_device is called directly; otherwise q->unplug_fn is executed later in the softirq context of the unplug timer. What happens to the request next depends on the specific physical device, so how are the differences between physical devices reflected in a standard block device? They are reflected in the queue's methods: different physical devices register different queue methods. For example, sda is a SCSI device, and the SCSI mid-layer registers scsi_request_fn as the queue's request_fn method. The queue's handler q->request_fn is called from q->unplug_fn (the generic_unplug_device function). At this point the block device layer is connected to the SCSI bus driver, and the interface between them is request_fn (here, scsi_request_fn).
10. With step 9 understood, the rest of the process concerns the SCSI bus specifically. scsi_request_fn scans the request queue and takes a request from it with elv_next_request. Inside elv_next_request, the q->prep_rq_fn function registered by the SCSI bus layer (the SCSI layer registers scsi_prep_fn) converts the request into a SCSI command that the SCSI driver understands. After obtaining a request, scsi_request_fn calls scsi_dispatch_cmd to send the SCSI command to a specific SCSI host. This raises a question: which SCSI host is the command sent to? The answer lies in q->queuedata: when the request queue was allocated for the sda device, the relationship between the sda block device and the underlying SCSI device (scsi_device) was already established, and that relationship is maintained through the request queue.
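The conversion and routing in step 10 can be made concrete with a toy example. The Python sketch below is not the SCSI mid-layer: it assumes every request becomes a 10-byte READ(10) or WRITE(10) command whose logical block address equals the request's starting sector, and `prep_rq`, `ScsiHost`, `ScsiDevice` and `RequestQueue` are hypothetical stand-ins. Only the opcodes and the CDB layout follow the SCSI block command set.

```python
# Toy model of request-to-command conversion and dispatch; not kernel code.
import struct

READ_10, WRITE_10 = 0x28, 0x2A              # standard SCSI opcodes


def prep_rq(sector, nsectors, write=False):
    """Build a 10-byte READ(10)/WRITE(10) CDB for the request."""
    opcode = WRITE_10 if write else READ_10
    # opcode, flags, 32-bit LBA, group number, 16-bit length, control
    return struct.pack(">BBIBHB", opcode, 0, sector, 0, nsectors, 0)


class ScsiHost:
    def __init__(self, name):
        self.name = name
        self.commands = []

    def queuecommand(self, cdb):
        self.commands.append(cdb)


class ScsiDevice:
    def __init__(self, host):
        self.host = host


class RequestQueue:
    def __init__(self, queuedata):
        self.queuedata = queuedata           # ties the queue to its SCSI device
        self.requests = []                   # (sector, nsectors, write) tuples


def scsi_request_fn(q):
    """Drain the queue, sending each prepared command to the owning host."""
    sdev = q.queuedata
    while q.requests:
        sector, nsectors, write = q.requests.pop(0)
        sdev.host.queuecommand(prep_rq(sector, nsectors, write))


host = ScsiHost("host0")
q = RequestQueue(ScsiDevice(host))
q.requests.append((2056, 16, False))
scsi_request_fn(q)
print(host.commands[0].hex())                # 28000000080800001000
```

The queue never names a host directly; it only carries queuedata, and the host is found by following that pointer, which mirrors the answer given in the text.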
11. In scsi_dispatch_cmd, the SCSI command is sent to the SCSI host through the host's interface method queuecommand. Typically, the host's queuecommand method places the received command on a queue that it maintains itself and then starts a DMA transfer to send the command's data to the specific disk. When the DMA completes, the DMA controller interrupts the CPU to report that the transfer is finished, and the interrupt handler marks the bottom half for execution. After the DMA interrupt service routine returns, a softirq is raised and runs the bottom half of the SCSI interrupt.
12. The bottom half of the SCSI interrupt calls the SCSI command's completion callback, which is usually scsi_done. scsi_done calls blk_complete_request to end the request. Each request maintains a chain of bios, so while the request is being ended, the callback of each bio in the request is invoked to end that bio. The bio was in turn generated from a file system buffer_head, so ending the bio invokes bio->bi_end_io (registered as end_bio_bh_io_sync), which calls back into the buffer_head's completion handler. At this point the series of callbacks triggered by the interrupt is finished. The callback chain can be summarized as: scsi_done->end_request->end_bio->end_bufferhead.
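The callback chain summarized in step 12 can be traced with a toy model. This Python sketch is not kernel code: the classes are hypothetical, the methods `bi_end_io` and `b_end_io` only play the roles of end_bio_bh_io_sync and end_buffer_async_read, and everything runs synchronously instead of from a softirq.

```python
# Toy model of the completion callback chain; not kernel code.

log = []


class BufferHead:
    def __init__(self, name):
        self.name = name
        self.uptodate = False

    def b_end_io(self, ok):                  # role of end_buffer_async_read
        self.uptodate = ok
        log.append("end_bufferhead(%s)" % self.name)


class Bio:
    def __init__(self, bh):
        self.private = bh                    # the buffer_head behind this bio

    def bi_end_io(self, ok):                 # role of end_bio_bh_io_sync
        log.append("end_bio")
        self.private.b_end_io(ok)


class Request:
    def __init__(self, bios):
        self.bios = bios                     # every request keeps a chain of bios


def end_request(rq, ok):
    log.append("end_request")
    for bio in rq.bios:
        bio.bi_end_io(ok)


def scsi_done(rq, ok=True):
    """Runs in the bottom half once the host reports the command finished."""
    log.append("scsi_done")
    end_request(rq, ok)


bhs = [BufferHead("bh0"), BufferHead("bh1")]
scsi_done(Request([Bio(bh) for bh in bhs]))
print(" -> ".join(log))
# scsi_done -> end_request -> end_bio -> end_bufferhead(bh0) -> end_bio -> end_bufferhead(bh1)
```

One completed command fans out into one callback per bio and then one per buffer_head, which is how a single interrupt ends up marking several cached buffers as up to date.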
13. Once the callbacks have finished, the read or write operation initiated by the file system is complete.
Linux block device read and write process