Disk DMA Process Analysis

Source: Internet
Author: User
When an application issues a write system call to write data to disk, the request first calls the underlying write function, which places the data in the page cache in memory. The page cache is an in-memory cache of the disk and is the main disk cache used by the Linux kernel. In most cases the kernel goes through the page cache when reading from or writing to a disk (a small number of applications, such as database software, bypass it).

Before writing a page of data to a block device, the kernel first checks whether the corresponding page is already in the cache. If it is not, the kernel adds a new entry to the cache and fills it with the data to be written to disk. The I/O transfer does not start immediately. Instead, the disk update is delayed by several seconds, which gives the process a chance to further modify the data to be written (that is, the kernel performs a delayed write).
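The delayed write described above can be observed from user space. The following is a minimal sketch, assuming a POSIX system: write() normally returns once the data has reached the page cache, and fsync() asks the kernel to push the dirty pages to the device before returning. The helper name write_and_sync is hypothetical and used only for illustration.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/*
 * Write `len` bytes to `path` and force them out of the page cache.
 * Returns the number of bytes written, or -1 on error.
 * Hypothetical helper for illustration only.
 */
static ssize_t write_and_sync(const char *path, const void *buf, size_t len)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;

    /* write() normally returns once the data is in the page cache. */
    ssize_t written = write(fd, buf, len);

    /* fsync() blocks until the dirty pages have been handed to the device. */
    if (written < 0 || fsync(fd) < 0) {
        close(fd);
        return -1;
    }

    return close(fd) == 0 ? written : -1;
}
```

Without the fsync() call, the data would stay in the page cache until the kernel's periodic writeback picks it up.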

When the kernel decides to transfer a set of blocks to or from a block I/O device, whether on behalf of a file system, the virtual memory subsystem, or a system call, it assembles a bio structure to describe the operation. This structure is handed to the block I/O code, which merges it into an existing request structure or, if necessary, creates a new one. The bio structure contains all the information the driver needs to execute the request, without reference to the user-space process that initiated it.

In the kernel, the basic container for block device I/O is the bio struct, defined in <linux/bio.h>. This struct represents a block I/O operation that is in flight (active) as a list of segments. A segment is a small contiguous memory buffer. The advantage is that a single buffer does not have to be contiguous: because the buffer is described by segments, the bio struct lets the kernel perform the I/O even if the buffer is scattered across multiple locations in memory. This is called scatter/gather I/O.

The bio is the main data structure of the generic block layer. It describes both the disk location and the memory location, and is the link between the upper kernel layers (VFS) and the lower-level drivers.

struct bio {
	sector_t bi_sector;                /* first 512-byte sector to transfer: the disk location */
	struct bio *bi_next;               /* request linked list */
	struct block_device *bi_bdev;      /* associated block device */
	unsigned long bi_flags;            /* status and command flags */
	unsigned long bi_rw;               /* read/write */
	unsigned short bi_vcnt;            /* number of bio_vecs */
	unsigned short bi_idx;             /* current index into bi_io_vec */
	unsigned short bi_phys_segments;   /* number of segments after merging */
	unsigned short bi_hw_segments;     /* number of segments after remapping */
	unsigned int bi_size;              /* I/O count */
	unsigned int bi_hw_front_size;     /* size of the first merged segment */
	unsigned int bi_hw_back_size;      /* size of the last merged segment */
	unsigned int bi_max_vecs;          /* maximum number of bio_vecs */
	struct bio_vec *bi_io_vec;         /* bio_vec array: the memory location */
	bio_end_io_t *bi_end_io;           /* I/O completion method */
	atomic_t bi_cnt;                   /* usage count */
	void *bi_private;                  /* owner-private data */
	bio_destructor_t *bi_destructor;   /* destructor method */
};
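To make the relationship between a bio and its segments concrete, here is a user-space sketch, not kernel code. The types model_bio and model_bio_vec are simplified stand-ins invented for this example. The sketch sums the segment lengths (the value bi_size would hold) and gathers the scattered segments into one buffer. Real DMA hardware performs the gather without a CPU copy; the memcpy here only makes the ordering visible.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Simplified stand-in for the kernel's bio_vec: one contiguous memory segment. */
struct model_bio_vec {
    const unsigned char *base;  /* start of the backing "page" */
    unsigned int len;           /* bytes in this segment */
    unsigned int offset;        /* offset of the data within the page */
};

/* Simplified stand-in for struct bio: a disk location plus a segment array. */
struct model_bio {
    unsigned long long sector;          /* first 512-byte sector on disk */
    unsigned short vcnt;                /* number of segments */
    const struct model_bio_vec *io_vec; /* the segments themselves */
};

/* Total payload in bytes, i.e. what bi_size would hold. */
static unsigned int model_bio_size(const struct model_bio *bio)
{
    unsigned int total = 0;
    for (unsigned short i = 0; i < bio->vcnt; i++)
        total += bio->io_vec[i].len;
    return total;
}

/* Copy the scattered segments, in order, into one contiguous buffer. */
static size_t model_bio_gather(const struct model_bio *bio,
                               unsigned char *dst, size_t dst_len)
{
    size_t copied = 0;
    for (unsigned short i = 0; i < bio->vcnt; i++) {
        const struct model_bio_vec *bv = &bio->io_vec[i];
        if (copied + bv->len > dst_len)
            break;                  /* destination too small, stop early */
        memcpy(dst + copied, bv->base + bv->offset, bv->len);
        copied += bv->len;
    }
    return copied;
}
```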

 

The file system needs to write data to the hard disk, and that data sits in the page cache. How does this process relate to DMA?

 

DMA disk write process overview:

If the hard disk supports DMA and DMA is enabled in the operating system, every disk read or write involves a DMA operation. Even though the file system's I/O requests are not contiguous on disk and the physical memory pages holding the data are not contiguous either, the operating system gathers these discontiguous memory pages and starts a single DMA operation (starting DMA is costly, because a series of registers must be programmed), so the data is transferred in one pass and therefore efficiently. The kernel keeps a physical region descriptor table (PRDT). To transfer data, each physical page and the length of the data within it must be entered into the PRDT. The PRDT structure is as follows:

 

Figure 1 Description:

Each PRDT entry is 8 bytes. Bytes 0-3 hold the physical memory address of the region. Bytes 4-5 hold the size of the region in bytes, up to 64 KB. The most significant bit of the last byte marks the end of the PRDT.
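The entry layout described above can be written down as a C struct. The following is a sketch under stated assumptions: a little-endian host, the bus-master IDE convention that a count of 0 encodes 64 KB, and hypothetical names (prd_entry, mem_region, prd_fill) that do not come from the kernel. Additional hardware constraints, such as regions not crossing a 64 KB boundary, are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PRD_EOT        0x8000u   /* top bit of the last 16-bit word: end of table */
#define PRD_MAX_BYTES  0x10000u  /* 64 KB; encoded as 0 in the count field */

/* One 8-byte PRD entry, laid out as described above. */
struct prd_entry {
    uint32_t base;    /* bytes 0-3: physical address of the region */
    uint16_t count;   /* bytes 4-5: byte count, 0 meaning 64 KB */
    uint16_t flags;   /* bytes 6-7: bit 15 marks the last entry */
};

/* One memory region to transfer (hypothetical input type). */
struct mem_region {
    uint32_t phys_addr;
    uint32_t len;     /* 1 .. 65536 bytes */
};

/*
 * Fill `table` from `regions`. Returns the number of entries written,
 * or 0 if the input does not fit or a region has an invalid length.
 */
static size_t prd_fill(struct prd_entry *table, size_t table_len,
                       const struct mem_region *regions, size_t n)
{
    if (n == 0 || n > table_len)
        return 0;

    for (size_t i = 0; i < n; i++) {
        if (regions[i].len == 0 || regions[i].len > PRD_MAX_BYTES)
            return 0;
        table[i].base  = regions[i].phys_addr;
        table[i].count = (uint16_t)(regions[i].len & 0xFFFFu); /* 64 KB -> 0 */
        table[i].flags = 0;
    }

    /* Only the final entry carries the end-of-table bit. */
    table[n - 1].flags |= PRD_EOT;
    return n;
}
```

Because the controller walks the table until it sees the end-of-table bit, the number of entries does not need to be stored anywhere else.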

The scsi_init_io function in the SCSI layer wraps the bio and maps it to the scatterlist struct used for DMA. Each scatterlist entry corresponds to one item in the PRDT (in this kernel, dma_desc_array plays the role of the PRDT) and points to one memory block. The remaining work is to program the DMA registers and start the transfer. This part of the code is analyzed in detail below.

 

The path from the write system call to the kernel-mode handler functions is as follows:

After a series of processing steps in the write system call handler, if data must be written to disk, execution eventually goes through the following path:

scsi_scan_target (SCSI scan function) --> __scsi_scan_target --> scsi_sequential_lun_scan --> queue --> scsi_alloc_sdev --> scsi_alloc_queue (SCSI queue allocation). From here the path splits: one branch sets up DMA and sends the command to the DMA controller (Path 1), and the other is the initialization path (Path 2).

Path 1: scsi_request_fn --> scsi_dispatch_cmd --> scsi_log_send --> (.queuecommand = register) register --> __ata_scsi_queuecmd --> ata_scsi_translate --> ata_qc_issue --> (bfin_bmdma_setup: set the DMA registers / bfin_bmdma_start: start DMA)

Path 2: scsi_prep_fn --> scsi_setup_blk_pc_cmnd --> scsi_init_io --> scsi_init_sgtable --> blk_rq_map_sg (the request struct passed to this function wraps the bio struct).
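The last step of Path 2 turns the segments of a request into scatterlist entries. The following user-space sketch illustrates the general idea of merging physically adjacent segments so that fewer descriptors are needed. It is a simplified model and not the kernel's blk_rq_map_sg implementation; model_sg, model_map_sg, and the max_seg limit are names assumed for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified scatterlist entry: a DMA address and a length. */
struct model_sg {
    uint64_t addr;
    uint32_t len;
};

/*
 * Map `n` input segments into `out`, merging neighbours that are
 * physically adjacent and whose combined length stays within `max_seg`.
 * Returns the number of entries used, or 0 if `out` is too small.
 */
static size_t model_map_sg(const struct model_sg *in, size_t n,
                           struct model_sg *out, size_t out_len,
                           uint32_t max_seg)
{
    size_t used = 0;

    for (size_t i = 0; i < n; i++) {
        if (used > 0) {
            struct model_sg *last = &out[used - 1];
            int adjacent = last->addr + last->len == in[i].addr;
            int fits = (uint64_t)last->len + in[i].len <= max_seg;
            if (adjacent && fits) {
                last->len += in[i].len;   /* extend the previous entry */
                continue;
            }
        }
        if (used == out_len)
            return 0;
        out[used++] = in[i];
    }
    return used;
}
```

In this model, two adjacent 4 KB pages become a single 8 KB entry, while a page elsewhere in memory gets an entry of its own.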

The following mainly analyzes the bfin_bmdma_setup and bfin_bmdma_start functions, that is, the DMA operation process:

(1) The software prepares a PRD table in memory. Each entry is 8 bytes and is aligned to a 4-byte boundary.

(2) The software sets the start address of the PRD table, sets the transfer direction through the read/write control bit, and clears the interrupt bit and the error bit in the status register.

(3) The software sends the DMA command to the disk device.

(4) The software writes 1 to the start bit of the bus master IDE command register for the corresponding channel to enable the bus master.

(5) The DMA controller transfers data to or from memory in response to DMA requests from the IDE device.

(6) When the transfer ends, the IDE device raises an interrupt.

(7) After receiving the interrupt, the software resets the start/stop bit of the command register, and then reads the controller status and the drive status to determine whether the transfer succeeded.
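The register handling in steps (2), (4), and (7) can be sketched against an in-memory model of one channel's bus-master registers. The bit positions follow the conventional bus-master IDE layout (start/stop in bit 0 and direction in bit 3 of the command register; active, error, and interrupt in bits 0-2 of the status register). The struct bm_regs and the bm_* functions are hypothetical, and step (3), issuing the ATA command to the drive, is omitted. The Blackfin driver analyzed in this article programs its own DMA controller rather than these registers.

```c
#include <assert.h>
#include <stdint.h>

/* Bus-master IDE register bits (per-channel command and status registers). */
#define BM_CMD_START   0x01u  /* start/stop bus master */
#define BM_CMD_WRITE   0x08u  /* 1 = bus master writes to memory (disk read) */
#define BM_STAT_ACTIVE 0x01u
#define BM_STAT_ERROR  0x02u  /* write 1 to clear on real hardware */
#define BM_STAT_INTR   0x04u  /* write 1 to clear on real hardware */

/* In-memory stand-in for one channel's registers (hypothetical model). */
struct bm_regs {
    uint8_t  command;
    uint8_t  status;
    uint32_t prd_addr;
};

/* Step (2): program the table address and direction, clear stale status. */
static void bm_setup(struct bm_regs *r, uint32_t prd_phys, int disk_read)
{
    r->prd_addr = prd_phys;
    r->command  = disk_read ? BM_CMD_WRITE : 0;
    /* Model the effect of clearing the interrupt and error bits. */
    r->status  &= (uint8_t)~(BM_STAT_INTR | BM_STAT_ERROR);
}

/* Step (4): set the start bit; the engine becomes active. */
static void bm_start(struct bm_regs *r)
{
    r->command |= BM_CMD_START;
    r->status  |= BM_STAT_ACTIVE;   /* hardware would set this itself */
}

/* Steps (5)-(6), simulated: the transfer finishes and an interrupt is raised. */
static void bm_simulate_completion(struct bm_regs *r, int failed)
{
    r->status &= (uint8_t)~BM_STAT_ACTIVE;
    r->status |= BM_STAT_INTR;
    if (failed)
        r->status |= BM_STAT_ERROR;
}

/* Step (7): stop the engine, then inspect status. Returns 0 on success. */
static int bm_stop(struct bm_regs *r)
{
    r->command &= (uint8_t)~BM_CMD_START;
    int ok = (r->status & BM_STAT_INTR) && !(r->status & BM_STAT_ERROR);
    r->status &= (uint8_t)~(BM_STAT_INTR | BM_STAT_ERROR); /* acknowledge */
    return ok ? 0 : -1;
}
```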

The code is as follows:

/**
 * bfin_bmdma_setup - Set up IDE DMA transaction
 * @qc: Info associated with this ATA transaction.
 *
 * Note: Original code is ata_bmdma_setup().
 */
static void bfin_bmdma_setup(struct ata_queued_cmd *qc)
{
	struct ata_port *ap = qc->ap;
	/*
	 * dma_desc_array wraps the scatterlist entries: the array records the
	 * memory location of every scatterlist entry. Corresponds to step (2)
	 * above.
	 */
	struct dma_desc_array *dma_desc_cpu = (struct dma_desc_array *)ap->bmdma_prd;
	void __iomem *base = (void __iomem *)ap->ioaddr.ctl_addr;
	/* DMA configuration */
	unsigned short config = DMAFLOW_ARRAY | NDSIZE_5 | RESTART | WDSIZE_16 | DMAEN;
	struct scatterlist *sg;
	unsigned int si;
	unsigned int channel;
	unsigned int dir;
	unsigned int size = 0;

	dev_dbg(qc->ap->dev, "in atapi dma setup\n");

	/* Program the ATA_CTRL register with dir */
	/* This selects the ATA channel and direction; it is separate from the DMA control register. */
	if (qc->tf.flags & ATA_TFLAG_WRITE) {
		channel = CH_ATAPI_TX;
		dir = DMA_TO_DEVICE;
	} else {
		channel = CH_ATAPI_RX;
		dir = DMA_FROM_DEVICE;
		config |= WNR;
	}

	dma_map_sg(ap->dev, qc->sg, qc->n_elem, dir);

	/*
	 * Fill the ATAPI DMA controller: fill in one descriptor per sg entry.
	 * Each sg entry points to one memory block to be transferred.
	 * Corresponds to step (1) above.
	 */
	for_each_sg(qc->sg, sg, qc->n_elem, si) {
		dma_desc_cpu[si].start_addr = sg_dma_address(sg);
		dma_desc_cpu[si].cfg = config;
		dma_desc_cpu[si].x_count = sg_dma_len(sg) >> 1;
		dma_desc_cpu[si].x_modify = 2;
		size += sg_dma_len(sg);
	}

	/* Set the last descriptor to stop mode */
	dma_desc_cpu[qc->n_elem - 1].cfg &= ~(DMAFLOW | NDSIZE);

	flush_dcache_range((unsigned int)dma_desc_cpu,
		(unsigned int)dma_desc_cpu +
			qc->n_elem * sizeof(struct dma_desc_array));

	/* Enable ATA DMA operation */
	/* The descriptor array is fetched from the fixed memory location given by bmdma_prd_dma. */
	set_dma_curr_desc_addr(channel, (unsigned long *)ap->bmdma_prd_dma);
	/* Initialize the DMA operation */
	set_dma_x_count(channel, 0);
	set_dma_x_modify(channel, 0);
	set_dma_config(channel, config);

	SSYNC();

	/* Send ATA DMA command */
	/*
	 * Note that although the DMA command is sent here, the actual DMA
	 * transfer has not started yet. This function only programs the
	 * various ATA device registers and returns once they are set.
	 */
	bfin_exec_command(ap, &qc->tf);

	/*
	 * Determine the DMA direction from the I/O direction chosen during
	 * initialization. Corresponds to step (2) above.
	 */
	if (qc->tf.flags & ATA_TFLAG_WRITE) {
		/* Set ATA DMA write direction */
		ATAPI_SET_CONTROL(base, (ATAPI_GET_CONTROL(base)
			| XFER_DIR));
	} else {
		/* Set ATA DMA read direction */
		ATAPI_SET_CONTROL(base, (ATAPI_GET_CONTROL(base)
			& ~XFER_DIR));
	}

	/* Reset all transfer count */
	ATAPI_SET_CONTROL(base, ATAPI_GET_CONTROL(base) | TFRCNT_RST);

	/* Set ATAPI state machine control in terminate sequence */
	ATAPI_SET_CONTROL(base, ATAPI_GET_CONTROL(base) | END_ON_TERM);

	/* Set transfer length to the total size of sg buffers */
	ATAPI_SET_XFER_LEN(base, size >> 1);
}

/**
 * bfin_bmdma_start - Start an IDE DMA transaction
 * @qc: Info associated with this ATA transaction.
 *
 * Note: Original code is ata_bmdma_start().
 */
static void bfin_bmdma_start(struct ata_queued_cmd *qc)
{
	struct ata_port *ap = qc->ap;
	void __iomem *base = (void __iomem *)ap->ioaddr.ctl_addr;

	dev_dbg(qc->ap->dev, "in atapi dma start\n");

	if (!(ap->udma_mask || ap->mwdma_mask))
		return;

	/* Start ATAPI transfer */
	if (ap->udma_mask)
		ATAPI_SET_CONTROL(base, ATAPI_GET_CONTROL(base)
			| ULTRA_START);
	else
		ATAPI_SET_CONTROL(base, ATAPI_GET_CONTROL(base)
			| MULTI_START);
}
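The descriptor-fill loop in bfin_bmdma_setup converts byte lengths into counts of 16-bit words (the shift right by one) and clears the flow bits on the last descriptor so the engine stops there. The following user-space model is a sketch of that arithmetic only. The struct layout, the MODEL_* flag values, and the function name are placeholders assumed for illustration and are not the Blackfin definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Placeholder flag values, chosen for illustration only. */
#define MODEL_DMAEN        0x0001u
#define MODEL_FLOW_ARRAY   0x4000u  /* "fetch the next descriptor" mode */
#define MODEL_FLOW_MASK    0x7000u

/* Simplified descriptor with the four fields the driver fills in. */
struct model_desc {
    uint32_t start_addr;
    uint16_t cfg;
    uint16_t x_count;   /* number of 16-bit words */
    int16_t  x_modify;  /* address step in bytes per word */
};

/* One mapped segment (hypothetical input type). */
struct model_seg {
    uint32_t dma_addr;
    uint32_t len;       /* bytes, assumed even */
};

/*
 * Fill one descriptor per segment and return the total transfer length
 * in 16-bit words (the value the driver passes to ATAPI_SET_XFER_LEN).
 */
static uint32_t model_fill_descs(struct model_desc *d,
                                 const struct model_seg *seg, size_t n)
{
    uint32_t bytes = 0;

    if (n == 0)
        return 0;

    for (size_t i = 0; i < n; i++) {
        d[i].start_addr = seg[i].dma_addr;
        d[i].cfg        = MODEL_FLOW_ARRAY | MODEL_DMAEN;
        d[i].x_count    = (uint16_t)(seg[i].len >> 1); /* bytes -> words */
        d[i].x_modify   = 2;
        bytes += seg[i].len;
    }

    /* Last descriptor: clear the flow bits so the engine stops after it. */
    d[n - 1].cfg &= (uint16_t)~MODEL_FLOW_MASK;

    return bytes >> 1;
}
```

With a 4096-byte segment and a 512-byte segment, the model yields word counts of 2048 and 256 and a total transfer length of 2304 words.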
