Analysis of the Linux SCSI IO callback path

Source: Internet
Author: User

The SCSI subsystem in the Linux kernel consists of the SCSI upper layer, the middle layer, and the low-level driver modules [1]. It manages SCSI resources and processes the IO requests that other subsystems, such as file systems, submit to it. Understanding how the SCSI subsystem processes IO is therefore important for understanding the subsystem as a whole, and it also helps in understanding IO processing in the rest of the Linux kernel. This article describes the IO processing mechanism of the SCSI subsystem in three parts: the submission of SCSI device access requests, the processing of those requests by the SCSI subsystem, and SCSI subsystem error handling.

Submission of SCSI device access requests

The submission of a SCSI device access request takes two steps: user space submits the access request to the generic block layer, and the generic block layer submits a block access request to the SCSI subsystem.

User space submits access requests to the generic block layer

In Linux user space, there are three ways to submit an access request for a SCSI device to the generic block layer:

    • Through the access interface provided by a file system. This is how file reads and writes reach a Linux file system built on a SCSI device;
    • Through raw device access. The most common user of this method is the dd command. The main difference from the file system interface is that raw access addresses the SCSI device directly and linearly, without going through file system addressing;
    • Through SCSI pass-through. This is the access path offered by the Linux sg (SCSI generic) driver, with which the user can send CDB [2] commands directly to a SCSI device. Through this interface users can perform SCSI management operations such as SES management.

Figure 1 shows how the Linux kernel handles the three kinds of request submission.

Figure 1. How the Linux kernel handles the three types of access requests

Requests submitted through the file system or a raw device are turned into block IO requests (bio) by the low-level block device access layer (ll_rw_block()) and submitted to the generic block layer [3]. Access requests submitted through the sg interface call an interface provided by the SCSI middle layer, which submits the request directly to the generic block layer for processing.

The generic block layer submits block access requests to the SCSI subsystem

Why go through the generic block layer? First, the generic block layer optimizes requests according to the characteristics of disk access. Second, it provides request scheduling. Third, its extensible structure makes it relatively easy to integrate the block drivers of various devices.

When a request is submitted to the generic block layer, the generic block layer must prepare a block access request and dispatch it to the SCSI middle layer. A block access request describes the block region to access, the access method, and the associated bio requests; in the kernel it is represented by struct request. Each block device has a block access request queue that records the requests the device still has to process, and a newly generated block access request is added to the queue of its device. IO processing in the SCSI subsystem therefore amounts to processing the block access requests on this queue.

The generic block layer provides two ways to dispatch the block access request queue: direct dispatch, and dispatch through the Linux kernel work queue mechanism. Either way, the handler of the block access request queue is called. During initialization, a SCSI device registers the queue handlers defined by the SCSI subsystem with the generic block layer; Listing 1 [4] shows this process. As a result, when the generic block layer processes the block access request queue of a SCSI device, it calls handler functions defined by the SCSI middle layer, and in this way hands the processing of block access requests over to the SCSI subsystem.

Listing 1. Processing functions

    struct request_queue *scsi_alloc_queue(struct scsi_device *sdev)
    {
        ...
        q = blk_init_queue(scsi_request_fn, NULL);    /* ask the generic block layer to allocate a request queue */
        ...
        blk_queue_prep_rq(q, scsi_prep_fn);           /* prepare a SCSI request */
        blk_queue_max_hw_segments(q, shost->sg_tablesize);    /* define the SG table size */
        ...
        blk_queue_softirq_done(q, scsi_softirq_done);
    }
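The registration pattern in Listing 1 can be illustrated outside the kernel with plain function pointers. The following is a minimal user-space sketch, not kernel code: `toy_queue`, `toy_request`, and every `toy_*` function are hypothetical names made up for illustration, and the queue is a fixed-size array rather than the kernel's elevator-managed list.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_QUEUE_DEPTH 8

/* Hypothetical stand-in for struct request. */
struct toy_request {
    int sector;
    int prepared;   /* set by the prepare callback */
    int handled;    /* set by the queue handler */
};

/* Hypothetical stand-in for struct request_queue. */
struct toy_queue {
    void (*request_fn)(struct toy_queue *q);    /* queue handler */
    int (*prep_fn)(struct toy_request *rq);     /* per-request preparation */
    struct toy_request *pending[TOY_QUEUE_DEPTH];
    size_t count;
};

/* "Middle layer" callbacks, registered once when the device is set up. */
static int toy_prep_fn(struct toy_request *rq)
{
    rq->prepared = 1;
    return 0;
}

static void toy_request_fn(struct toy_queue *q)
{
    /* Drain the queue: prepare each request, then mark it handled. */
    for (size_t i = 0; i < q->count; i++) {
        struct toy_request *rq = q->pending[i];
        if (q->prep_fn != NULL && q->prep_fn(rq) == 0)
            rq->handled = 1;
    }
    q->count = 0;
}

/* Plays the role of scsi_alloc_queue(): wire the handlers into the queue. */
static void toy_init_queue(struct toy_queue *q)
{
    assert(q != NULL);
    q->request_fn = toy_request_fn;
    q->prep_fn = toy_prep_fn;
    q->count = 0;
}

/* "Block layer" side: it only knows the registered function pointers. */
static int toy_submit(struct toy_queue *q, struct toy_request *rq)
{
    if (q->count == TOY_QUEUE_DEPTH)
        return -1;    /* queue full */
    q->pending[q->count++] = rq;
    return 0;
}

static void toy_run_queue(struct toy_queue *q)
{
    q->request_fn(q);
}
```

The point of the design is that the code that submits and runs the queue never names the handlers; it only calls through the pointers that were registered at initialization.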

SCSI subsystem handles block access requests

When the generic block layer calls the request queue handler of the SCSI subsystem, the SCSI middle layer generates a SCSI command (struct scsi_cmnd) from the contents of the block access request, initializes it, and submits it to the SCSI target.

SCSI command initialization and submission

A SCSI command records the command descriptor block (CDB), the sense data buffer (sense buffer), the IO timeout, the callback functions and other information the SCSI subsystem needs to process the command, and other SCSI-related information. Listing 2 shows the main fields of this structure.

Listing 2. Main structure

    struct scsi_cmnd {
        ...
        void (*done)(struct scsi_cmnd *);    /* Mid-level done function */
        ...
        int retries;                         /* number of retries */
        int timeout_per_command;             /* timeout */
        ...
        enum dma_data_direction sc_data_direction;    /* data transfer direction */
        ...
        unsigned char cmnd[MAX_COMMAND_SIZE];         /* CDB */
        void *request_buffer;                /* Actual requested buffer */
        struct request *request;             /* The command we are working on */
        ...
        unsigned char sense_buffer[SCSI_SENSE_BUFFERSIZE];
                                             /* Obtained by REQUEST SENSE when
                                              * CHECK CONDITION is received on original
                                              * command (auto-sense) */

        /* Low-level done function - can be used by the low-level driver
         * to point to a completion function. */
        void (*scsi_done)(struct scsi_cmnd *);
        ...
    };

Initialization first takes a block access request from the request queue of the block device according to the elevator scheduling algorithm, and then sets the direction, length, and address of the data transfer in the SCSI command from the information in that request. Next, it sets up the CDB, the callback function of the SCSI middle layer, and so on.
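To make the CDB setup concrete, here is a small sketch of how the start address and length of a block request could be encoded into a 10-byte read or write CDB. The byte layout (operation code in byte 0, big-endian logical block address in bytes 2 to 5, big-endian transfer length in bytes 7 and 8) follows the READ(10) and WRITE(10) commands of the SCSI block command set; the function name `toy_build_rw10` and the `TOY_*` macros are hypothetical and are not the kernel's own helpers.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define TOY_READ_10  0x28    /* READ(10) operation code */
#define TOY_WRITE_10 0x2a    /* WRITE(10) operation code */

/* Fill a 10-byte CDB; the LBA and the transfer length are big-endian. */
static void toy_build_rw10(uint8_t *cdb, int is_write,
                           uint32_t lba, uint16_t nblocks)
{
    assert(cdb != NULL);
    memset(cdb, 0, 10);
    cdb[0] = is_write ? TOY_WRITE_10 : TOY_READ_10;
    cdb[2] = (uint8_t)(lba >> 24);       /* logical block address, MSB first */
    cdb[3] = (uint8_t)(lba >> 16);
    cdb[4] = (uint8_t)(lba >> 8);
    cdb[5] = (uint8_t)lba;
    cdb[7] = (uint8_t)(nblocks >> 8);    /* transfer length in blocks */
    cdb[8] = (uint8_t)nblocks;
}
```

A caller would pass a buffer of at least 10 bytes, for example `uint8_t cdb[10]; toy_build_rw10(cdb, 0, 2048, 8);` for a read of 8 blocks starting at block 2048.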

After initialization is complete, the SCSI middle layer submits the SCSI command to the SCSI low-level driver by calling the queuecommand function [5] defined in the scsi_host_template structure. queuecommand is the SCSI command queue handler, and its implementation is supplied by the low-level driver. When the middle layer calls queuecommand, it is therefore calling the handler defined by the low-level driver, which hands the SCSI command to the vendor-specific low-level driver for processing. This closely mirrors the way the generic block layer calls the handlers of the SCSI middle layer to process block requests, and it reflects the good extensibility of the Linux kernel code.

Once the low-level driver accepts the request, it starts processing the SCSI command. This processing is closely tied to the hardware, so the code is generally written by each vendor. The basic procedure is: take a SCSI command from the queue maintained by the low-level driver, wrap it in the vendor-defined request format, and submit it to the SCSI target by DMA or some other means; the target processes the request and returns the result to the low-level driver.

Processing of SCSI command execution results

When the low-level driver receives the command result returned by the SCSI target, the SCSI subsystem processes it in two callback stages. In the first callback, the low-level driver calls the callback function defined by the SCSI middle layer and hands the result to the middle layer. In the second callback, the middle layer, having finished its own processing, calls the callback function defined by the SCSI upper layer, which ends IO processing in the SCSI subsystem.

First callback:

When the SCSI middle layer calls queuecommand to submit a SCSI command to the low-level driver, it also passes a pointer to its callback function. After the low-level driver receives the command result returned by the SCSI target, it calls this callback, which raises the BLOCK_SOFTIRQ soft interrupt in which the first callback is processed. In this callback, the SCSI middle layer first decides from the result reported by the low-level driver whether the request was processed successfully. Success here does not mean that no error occurred; it means that the returned information tells the middle layer unambiguously that it has nothing further to do for this command. For a successfully processed command, the middle layer therefore calls the second callback function and enters the second callback stage. Listing 3 shows the soft interrupt handler defined by the SCSI middle layer.

Listing 3. The processing function of the soft interrupt

    static void scsi_softirq_done(struct request *rq)
    {
        ...
        disposition = scsi_decide_disposition(cmd);
        ...
        switch (disposition) {
        case SUCCESS:
            scsi_finish_command(cmd);    /* enter the second callback process */
            break;
        case NEEDS_RETRY:
            scsi_retry_command(cmd);
            break;
        case ADD_TO_MLQUEUE:
            scsi_queue_insert(cmd, SCSI_MLQUEUE_DEVICE_BUSY);
            break;
        default:
            if (!scsi_eh_scmd_add(cmd, 0))
                scsi_finish_command(cmd);
        }
    }

Second callback:

Different SCSI upper-layer modules define their own second callback function. The sd module, for example, sets its second callback, sd_rw_intr, in the sd_init_command function; sd_rw_intr further processes the result of the SCSI command according to the needs of the sd module. Listing 4 shows the code with which the sd module registers the second callback. Although each SCSI upper-layer module can define its own second callback function, all of these callbacks ultimately finish the SCSI subsystem's processing of the block access request.

Listing 4. The sd module registering the second callback

    static int sd_init_command(struct scsi_cmnd *SCpnt)
    {
        ...
        SCpnt->done = sd_rw_intr;
        return 1;
    }
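The two callback stages described in this section can be traced with a small user-space model. This is only a sketch under simplifying assumptions: `toy_cmnd` and all `toy_*` functions are hypothetical, a single command is in flight at a time, and a failed command simply stops after the first callback instead of being retried or passed to error handling. The two-argument form of `toy_queuecommand` mirrors the idea that the middle layer hands its callback to the low-level driver together with the command.

```c
#include <assert.h>
#include <stddef.h>

struct toy_cmnd {
    int result;                              /* status set by the low-level driver */
    int trace[2];                            /* order in which the callbacks ran */
    int ntrace;
    void (*done)(struct toy_cmnd *);         /* second callback (upper layer) */
    void (*scsi_done)(struct toy_cmnd *);    /* first callback (middle layer) */
};

/* Upper layer, the role sd_rw_intr plays: finish the block request. */
static void toy_upper_done(struct toy_cmnd *cmd)
{
    cmd->trace[cmd->ntrace++] = 2;
}

/* Middle layer: first callback; on success hand over to the upper layer. */
static void toy_mid_done(struct toy_cmnd *cmd)
{
    cmd->trace[cmd->ntrace++] = 1;
    if (cmd->result == 0)
        cmd->done(cmd);
}

/* Low-level driver state: the one command currently in flight. */
static struct toy_cmnd *toy_inflight;

/* Low-level driver: only stores the command and the callback, then returns. */
static int toy_queuecommand(struct toy_cmnd *cmd,
                            void (*done)(struct toy_cmnd *))
{
    assert(cmd != NULL && done != NULL);
    cmd->scsi_done = done;
    toy_inflight = cmd;
    return 0;
}

/* Low-level driver: called when the target reports completion. */
static void toy_target_completed(int status)
{
    struct toy_cmnd *cmd = toy_inflight;

    assert(cmd != NULL);
    toy_inflight = NULL;
    cmd->result = status;
    cmd->scsi_done(cmd);    /* first callback */
}

/* Middle layer: initialize the command and submit it. */
static int toy_dispatch(struct toy_cmnd *cmd)
{
    cmd->ntrace = 0;
    cmd->done = toy_upper_done;
    return toy_queuecommand(cmd, toy_mid_done);
}
```

After `toy_dispatch(&cmd)` nothing has completed yet; only a later call to `toy_target_completed(0)` runs the middle-layer callback and then the upper-layer callback, in that order.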

Error handling for SCSI subsystems

Error handling in the low-level driver is implemented by each vendor and is not discussed here. Error handling in the SCSI subsystem itself is done mainly by the SCSI middle layer. During the first callback, the low-level driver returns the result of the SCSI command and the SCSI status information it obtained to the middle layer. The middle layer first evaluates the command result reported by the low-level driver; if that does not yield a definite conclusion, it goes on to evaluate the SCSI status, sense data, and so on returned by the low-level driver. For a command judged successful, the middle layer proceeds directly to the second callback; a command that needs a retry is put back on the request queue of the block device and processed again. This procedure can be called the basic method by which the SCSI middle layer judges the result of a SCSI command.
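The order of evaluation described above (driver result first, then SCSI status, then sense data) can be sketched as a small decision function. The status byte and sense key values below are the standard SCSI ones, but the policy is a deliberately simplified assumption for illustration, not the kernel's actual rules in scsi_decide_disposition; `toy_result`, `toy_decide`, and the `TOY_*` names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

enum toy_disposition {
    TOY_SUCCESS,
    TOY_NEEDS_RETRY,
    TOY_ADD_TO_MLQUEUE,
    TOY_FAILED              /* hand the command to the error handler */
};

/* SCSI status bytes and sense keys used in this sketch. */
#define TOY_STATUS_GOOD            0x00
#define TOY_STATUS_CHECK_CONDITION 0x02
#define TOY_STATUS_BUSY            0x08
#define TOY_SENSE_NO_SENSE         0x0
#define TOY_SENSE_RECOVERED_ERROR  0x1
#define TOY_SENSE_UNIT_ATTENTION   0x6

struct toy_result {
    int host_ok;        /* 1 if the low-level driver reported no transport error */
    int status;         /* SCSI status byte returned by the target */
    int sense_valid;    /* 1 if sense data was obtained */
    int sense_key;
};

static enum toy_disposition toy_decide(const struct toy_result *r)
{
    assert(r != NULL);

    /* Step 1: the result reported by the low-level driver. */
    if (!r->host_ok)
        return TOY_FAILED;

    /* Step 2: the SCSI status byte. */
    switch (r->status) {
    case TOY_STATUS_GOOD:
        return TOY_SUCCESS;
    case TOY_STATUS_BUSY:
        return TOY_ADD_TO_MLQUEUE;    /* device busy: queue it again later */
    case TOY_STATUS_CHECK_CONDITION:
        break;                        /* step 3 below */
    default:
        return TOY_FAILED;
    }

    /* Step 3: the sense data; without it no verdict is possible. */
    if (!r->sense_valid)
        return TOY_FAILED;
    switch (r->sense_key) {
    case TOY_SENSE_NO_SENSE:
    case TOY_SENSE_RECOVERED_ERROR:
        return TOY_SUCCESS;
    case TOY_SENSE_UNIT_ATTENTION:
        return TOY_NEEDS_RETRY;
    default:
        return TOY_FAILED;
    }
}
```

In this model `TOY_FAILED` stands for every case in which the middle layer cannot settle the command by itself, which is where the dedicated error handling described next takes over.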

It is not always that simple, however. For some errors, such as sense data errors or timeouts, the cause cannot be determined clearly. To deal with this, the SCSI subsystem in the Linux kernel introduces a thread dedicated to error handling, and SCSI commands whose error cause cannot be determined are handed to this thread. The thread works with two queues: the error handling queue (eh_work_q) and the error handling completion queue (done_q). The error handling queue records the SCSI commands that need error handling, and the completion queue records the commands that error handling has finished with. Listing 5 shows how the thread processes the commands recorded on the error handling queue.

Listing 5. Process of error handling

    static void scsi_unjam_host(struct Scsi_Host *shost)
    {
        ...
        if (!scsi_eh_get_sense(&eh_work_q, &eh_done_q))              /* get sense data */
            if (!scsi_eh_abort_cmds(&eh_work_q, &eh_done_q))         /* abort commands */
                scsi_eh_ready_devs(shost, &eh_work_q, &eh_done_q);   /* reset */
        scsi_eh_flush_done_q(&eh_done_q);    /* complete the error IO on done_q */
        ...
    }

The entire process can be summarized into four stages:

  • Sense data query stage

    Querying the sense data provides a basis for deciding how to handle the SCSI command, and the command is then judged with the basic method described above. If the verdict is success or retry, the command is moved from the error handling queue to the error handling completion queue. If the verdict is failure, the command stays on the error handling queue and error handling enters the abort stage.

  • Abort stage

    In this stage, the SCSI commands on the error handling queue are actively aborted. An aborted command is added to the error handling completion queue. If commands remain unprocessed on the error handling queue when the abort stage ends, error handling moves on to the START STOP UNIT stage.

  • START STOP UNIT stage

    In this stage, the START STOP UNIT [6] command is sent to the SCSI devices associated with the commands on the error handling queue in an attempt to recover those devices. If commands remain on the error handling queue after this stage, error handling moves on to the reset stage.

  • Reset stage

    The reset stage has three levels: device reset, bus reset, and host reset. First, the SCSI devices associated with the commands on the error handling queue are reset. If a device is in a normal state after the reset, the error commands associated with it are moved from the error handling queue to the error handling completion queue. If device reset cannot take care of all error commands, error handling proceeds to bus reset, which resets the buses associated with the commands on the error handling queue. If bus reset still leaves commands on the queue, error handling proceeds to host reset, which resets the hosts associated with those commands. Host reset may fail as well. In that case the SCSI devices associated with the remaining error commands can no longer be used: they are marked as unavailable, and the associated error commands are added to the error handling completion queue.
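The escalation through these stages can be modeled as a loop that tries ever stronger recovery steps until the error handling queue is empty. The sketch below is a simplification under stated assumptions: each command carries a made-up field `cured_by` naming the first stage that would recover it, whereas the real error handler learns the outcome only by actually issuing aborts and resets. All `toy_*` names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

enum toy_stage {
    TOY_GET_SENSE,
    TOY_ABORT,
    TOY_START_UNIT,
    TOY_DEVICE_RESET,
    TOY_BUS_RESET,
    TOY_HOST_RESET,
    TOY_NSTAGES         /* also used as "no stage recovers this command" */
};

struct toy_eh_cmd {
    int cured_by;       /* first stage that recovers the command */
    int done_stage;     /* stage that moved it to the done queue, -1 while pending */
    int offline;        /* 1 if its device had to be marked unavailable */
};

/* Run the stages in order; return how many commands ended up offline. */
static size_t toy_unjam(struct toy_eh_cmd *cmds, size_t n)
{
    size_t pending = n;
    size_t offlined = 0;

    assert(cmds != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        cmds[i].done_stage = -1;
        cmds[i].offline = 0;
    }

    /* Escalate only while commands remain on the work queue. */
    for (int stage = 0; stage < TOY_NSTAGES && pending > 0; stage++) {
        for (size_t i = 0; i < n; i++) {
            if (cmds[i].done_stage < 0 && cmds[i].cured_by <= stage) {
                cmds[i].done_stage = stage;    /* move to the done queue */
                pending--;
            }
        }
    }

    /* Whatever is left: mark the device unavailable and finish the command. */
    for (size_t i = 0; i < n; i++) {
        if (cmds[i].done_stage < 0) {
            cmds[i].offline = 1;
            cmds[i].done_stage = TOY_NSTAGES;
            offlined++;
        }
    }
    return offlined;
}
```

Every command ends up on the done queue one way or another; the only difference is which stage put it there and whether its device is still usable.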

Each command on the error handling completion queue is re-added to the block access request queue and processed again if the device state is normal and the number of retries is below the allowed maximum. Otherwise, the second callback is invoked directly to finish the SCSI subsystem's processing of the block access request. This completes SCSI command error handling in the SCSI subsystem.
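The decision applied to each command on the completion queue is small enough to state as code. This is a sketch of the rule as described in the text, with hypothetical names (`toy_done_cmd`, `toy_flush_one`); the retry counter and the allowed maximum are modeled on the `retries` field shown in Listing 2.

```c
#include <assert.h>
#include <stddef.h>

enum toy_flush_action {
    TOY_REQUEUE,    /* back onto the block access request queue */
    TOY_FINISH      /* go straight to the second callback */
};

struct toy_done_cmd {
    int device_online;    /* 1 if the device state is normal */
    int retries;          /* retries used so far */
    int allowed;          /* maximum number of retries */
};

static enum toy_flush_action toy_flush_one(struct toy_done_cmd *cmd)
{
    assert(cmd != NULL);
    if (cmd->device_online && cmd->retries < cmd->allowed) {
        cmd->retries++;    /* account for the retry about to happen */
        return TOY_REQUEUE;
    }
    return TOY_FINISH;
}
```

Counting the retry at the moment of requeueing guarantees that a command cannot circulate between the request queue and the error handler forever.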

The SCSI disk device driver submits requests to the SCSI host through the SCSI middle level. The SCSI disk driver is a block device driver, so the unplug_fn function of the block device layer is called to process the request queue of the SCSI disk. Processing that queue calls the scsi_request_fn function registered by the SCSI middle level, which does the actual work. scsi_request_fn takes a request from the IO dispatch queue, converts it into a SCSI command, and finally calls scsi_dispatch_cmd directly to submit the SCSI command to the SCSI host. The interface between the SCSI host and the SCSI middle level is the queuecommand function; every SCSI host driver registers its own queuecommand method with the middle level. Because queuecommand runs in a context that must not sleep, it cannot perform complex operations; the usual approach is to place the received SCSI command on a processing queue maintained by the SCSI host. If the host is a real SCSI host rather than a virtual one, the SCSI host layer can then transfer the SCSI command to the SCSI disk by DMA.

The steps above complete the submission of one IO request. For devices such as disks, submission has to take into account both the characteristics of the storage medium and the access pattern of the application, so an IO scheduling policy is applied to make reads and writes to the SCSI disk better suited to the medium. A more advanced IO management policy can of course also be implemented above the SCSI disk layer. Submitting an IO request can be regarded as the first half of the whole IO process; the second half is the IO completion callback. The following analyzes how the Linux IO callback path is implemented.

When an IO operation completes, the SCSI disk notifies the SCSI host driver with an interrupt. When the SCSI host interrupt fires, the CPU runs the host's interrupt service routine. A real SCSI host usually takes the form of a PCI device, so interrupt sharing has to be considered: the interrupt service routine must first check whether the interrupt is its own, and then carry out the specific interrupt work according to the status register of the SCSI host. For a read or write IO request, the DMA-complete interrupt is raised once the data has been transferred by DMA to the SCSI disk, and the DMA transfer may use scatter-gather DMA. No memory copy of the data is involved: throughout a read or write, the data stays in the pages of the bio. On a write, the data is transferred by DMA directly from those pages to the disk; on a read, it is transferred by DMA directly into the pages of the bio. This makes the mechanism efficient.

When the host determines that a request has completed, it calls the callback function of the SCSI middle level, namely scsi_done, which was handed to the SCSI host layer in the queuecommand call. scsi_done directly calls blk_complete_request, which raises the SCSI soft interrupt through raise_softirq_irqoff(BLOCK_SOFTIRQ). Everything up to this point runs in the top half of the SCSI host interrupt. The top half must not run for long, or interrupt events may be lost. Once the soft interrupt has been raised, the top half can exit, and the CPU is then handed to the pending SCSI soft interrupt service. Note that the soft interrupt service still runs in interrupt context, not in a schedulable context.
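The split between a short top half and a deferred bottom half can be sketched as follows. This is a user-space model with hypothetical names (`toy_rq`, `toy_complete_request`, `toy_done_softirq`); the flag `toy_softirq_pending` stands in for a raised soft interrupt, and the list is kept as a simple stack, so this sketch does not preserve completion order.

```c
#include <assert.h>
#include <stddef.h>

struct toy_rq {
    struct toy_rq *next;
    int completed;
};

static struct toy_rq *toy_done_list;    /* requests waiting for the bottom half */
static int toy_softirq_pending;         /* stands in for a raised BLOCK_SOFTIRQ */

/* Top half: runs in the "interrupt handler" and must stay short.
 * It only records the finished request and requests the bottom half. */
static void toy_complete_request(struct toy_rq *rq)
{
    assert(rq != NULL);
    rq->next = toy_done_list;
    toy_done_list = rq;
    toy_softirq_pending = 1;
}

/* Bottom half: drains the list and does the real completion work.
 * Returns the number of requests it completed. */
static int toy_done_softirq(void)
{
    int handled = 0;
    struct toy_rq *rq = toy_done_list;

    toy_done_list = NULL;
    toy_softirq_pending = 0;
    while (rq != NULL) {
        struct toy_rq *next = rq->next;

        rq->next = NULL;
        rq->completed = 1;    /* the per-request completion handler would run here */
        handled++;
        rq = next;
    }
    return handled;
}
```

Several completions may accumulate before the bottom half runs, which is why the top half can return quickly no matter how expensive the completion work is.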


The soft interrupt handler is blk_done_softirq. Because the interrupt was raised by a SCSI command, it calls the scsi_softirq_done function registered on the request queue beforehand, which carries out the bottom-half processing of the SCSI soft interrupt. This function checks whether the SCSI command executed correctly. If the command failed, it can be requeued and retried; once the retries reach a certain limit, the failed SCSI command is handed to the SCSI error handling kernel thread for a final verdict. If the command succeeded, scsi_finish_command is called to end the SCSI command. scsi_finish_command calls scsi_io_completion to end the block-level IO request; that function calls scsi_end_request, which calls blk_end_request and finally blk_end_io. blk_end_io ends every bio on the request, calling bio_endio for each, and releases the request resources once all bios of the request have ended. At this point a bio request handled by the SCSI disk has been fully processed by the top half and the bottom half of the interrupt.

Note that all IO callbacks run in interrupt context, so sleeping must be avoided when writing an IO callback function. Memory allocation may sleep, and so may the use of semaphores; sleeping here can crash the system.
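The last step of this chain, ending every bio of a request and then releasing the request, can be sketched like this. The types and functions (`toy_bio`, `toy_blk_request`, `toy_end_request`, and so on) are hypothetical user-space stand-ins that only imitate the roles of blk_end_io and bio_endio; note that the per-bio callback does nothing that could block, in line with the no-sleep rule above.

```c
#include <assert.h>
#include <stddef.h>

struct toy_bio {
    struct toy_bio *next;
    int ended;                                        /* set by the end_io callback */
    int error;
    void (*end_io)(struct toy_bio *bio, int error);   /* owner's completion callback */
};

struct toy_blk_request {
    struct toy_bio *bio;    /* chain of bios carried by this request */
    int released;
};

/* Example completion callback; it only records the result and never blocks. */
static void toy_bio_done(struct toy_bio *bio, int error)
{
    bio->ended = 1;
    bio->error = error;
}

/* Plays the role of bio_endio(): run the owner's callback, if any. */
static void toy_bio_endio(struct toy_bio *bio, int error)
{
    if (bio->end_io != NULL)
        bio->end_io(bio, error);
}

/* Plays the role of blk_end_io(): end every bio, then release the request.
 * Returns the number of bios that were ended. */
static int toy_end_request(struct toy_blk_request *rq, int error)
{
    int n = 0;

    assert(rq != NULL);
    while (rq->bio != NULL) {
        struct toy_bio *bio = rq->bio;

        rq->bio = bio->next;    /* unlink before calling back */
        bio->next = NULL;
        toy_bio_endio(bio, error);
        n++;
    }
    rq->released = 1;
    return n;
}
```

Unlinking each bio before its callback runs means the callback may safely reuse or free the bio without corrupting the walk over the remaining chain.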

From the analysis above, the normal IO callback path of a SCSI disk runs through the following functions: scsi_done -> blk_done_softirq -> scsi_softirq_done -> scsi_finish_command -> scsi_io_completion -> scsi_end_request -> blk_end_io -> bio_endio.

The SCSI middle level has changed significantly in recent Linux versions. This analysis is based on the newer Linux 2.6.28, and its IO callback path differs somewhat from that of the earlier 2.6.18.


This article has analyzed the IO processing mechanism of the SCSI subsystem, in the hope that it helps in understanding the SCSI subsystem and block device drivers.


    • [1] Linux SCSI subsystem anatomy.
    • [2] SCSI Primary Commands - 4 (SPC-4).
    • [3] "Linux Kernel Analysis and Programming", Nujili, Electronic Industry Press.
    • [4] (GPL License Version 2).
    • [5] "The Linux Kernel", David A. Rusling.
    • [6] "SCSI-3 Block Commands (SBC)", Information Technology Industry Council.