Virtio-blk Analysis
Like virtio-net, the virtio-blk driver uses the Virtio mechanism to give the Guest a high-performance block device I/O path. This article looks at the implementation of virtio-blk.
Block devices in Linux
Before introducing virtio-blk, let's look at the overall architecture of block devices in the Linux kernel.
Basic Concepts
Linux has three main types of devices:
1. Character devices: devices that perform sequential I/O one byte at a time;
2. Block devices: devices that accept and return data in units of blocks. The buffers for I/O requests can be accessed randomly, so the device must be able to move back and forth between different positions on the media. The smallest addressable unit of a block device is the sector. The sector size is generally a power of 2, and the most common size is 512 bytes;
3. Network devices: devices that provide network data communication services.
The topic here is block devices, which work with two basic units:
1. Sectors: the basic unit in which block device hardware processes data. Generally, one sector is 512 bytes.
2. Blocks: the basic unit in which the Linux kernel or a file system processes data. Generally, one block consists of one or more sectors.
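The relationship between blocks and sectors can be sketched in a few lines of C. The names TOY_SECTOR_SIZE, toy_sectors_per_block, and toy_block_to_sector are hypothetical, and the 512-byte sector is only the common value mentioned above, not something every device guarantees.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed sector size for illustration; 512 bytes is the common value. */
#define TOY_SECTOR_SIZE 512u

/* Number of sectors that make up one block of block_size bytes. */
uint32_t toy_sectors_per_block(uint32_t block_size)
{
    /* A block must be a whole number of sectors. */
    assert(block_size >= TOY_SECTOR_SIZE);
    assert(block_size % TOY_SECTOR_SIZE == 0);
    return block_size / TOY_SECTOR_SIZE;
}

/* First sector covered by block number block_nr. */
uint64_t toy_block_to_sector(uint64_t block_nr, uint32_t block_size)
{
    return block_nr * toy_sectors_per_block(block_size);
}
```

With a 4096-byte block, for example, each block spans 8 sectors, so block 3 starts at sector 24.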
Overall Architecture
Notes:
1. Generic block layer: maintains the mapping between an I/O request from the upper-layer file system and the underlying physical disk. In the generic block layer, one bio struct usually corresponds to one I/O request.
2. Driver: the driver issues input or output (I/O) requests to the block device, and each one is described by the request structure. Because some disk devices serve requests very slowly, the kernel provides a queue mechanism that places these I/O requests on a queue (the request queue), which the driver accesses through the request_queue struct.
3. I/O scheduler layer: before requests are submitted to the block device, the kernel merges and sorts them to improve access efficiency. The I/O scheduler subsystem is then responsible for submitting the I/O requests. The scheduler allocates disk resources among all pending block I/O requests in the system. It manages the request queues of block devices and determines the order of requests in each queue and when they are sent to the device.
4. For each independent disk device or partition, Linux provides a gendisk data structure for accessing the underlying physical disk. gendisk contains a pointer to a structure of hardware operations, block_device_operations.
When multiple requests are submitted to a block device, execution efficiency depends on the order of the requests. Efficiency is highest when all requests go in the same direction (for example, all writes). Before the kernel calls the block device driver routine to process requests, it first collects and sorts the I/O requests, and then merges requests that operate on consecutive sectors. The algorithm that sorts I/O requests is called the elevator algorithm, and it runs in the I/O scheduler layer. The kernel provides several elevator algorithms, including:
1. noop (a simple FIFO that does only basic merging);
2. anticipatory (delays I/O requests and optimizes the sorting of adjacent regions);
3. deadline (addresses the drawbacks of anticipatory and reduces latency);
4. cfq (shares I/O bandwidth evenly; a fair queuing mechanism).
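The sort-then-merge idea behind the elevator can be illustrated with a small user-space sketch. This is not the code of any kernel scheduler: struct toy_req and toy_elevator_sort_merge are hypothetical names, and the sketch assumes requests never overlap.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, simplified request: a run of consecutive sectors. */
struct toy_req {
    uint64_t start;   /* first sector */
    uint32_t nr;      /* number of sectors */
    int is_write;     /* data direction */
};

/* qsort comparator: ascending start sector. */
static int toy_req_cmp(const void *a, const void *b)
{
    const struct toy_req *x = a, *y = b;

    return (x->start > y->start) - (x->start < y->start);
}

/*
 * Sort reqs by start sector, then merge neighbours that have the same
 * direction and touch back to back. Returns the number of requests left.
 */
size_t toy_elevator_sort_merge(struct toy_req *reqs, size_t n)
{
    size_t out = 0;

    assert(reqs != NULL || n == 0);
    if (n == 0)
        return 0;

    qsort(reqs, n, sizeof(*reqs), toy_req_cmp);

    for (size_t i = 1; i < n; i++) {
        struct toy_req *last = &reqs[out];

        if (reqs[i].is_write == last->is_write &&
            last->start + last->nr == reqs[i].start)
            last->nr += reqs[i].nr;      /* merge into the previous one */
        else
            reqs[++out] = reqs[i];       /* keep as a separate request */
    }
    return out + 1;
}
```

Three writes to sectors 16..23, 0..7, and 8..15 collapse into one 24-sector write, so the device sees one request instead of three.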
Data Structure
1. Block device object structure block_device
The kernel uses a block_device instance to represent a block device object, such as an entire hard disk or a specific partition. If the structure represents a partition, its member bd_part points to the partition structure of the device. If the structure represents a device, its member bd_disk points to the generic hard disk structure gendisk of the device.
When a block device file is opened, the kernel creates a block_device instance. The device driver also creates a gendisk instance, allocates the request queue, and registers the block_device structure.
2. Generic hard disk structure gendisk
The struct gendisk represents a generic hard disk object. It stores information about a hard disk, including the request queue, the partition linked list, and the set of block device operation functions. The block device driver allocates the gendisk instance, loads the partition table, allocates the request queue, and fills in the other fields of the structure.
A block driver that supports partitions must include the relevant header file and declare a struct gendisk. The kernel also maintains a global linked list, gendisk_head, of these instances. The list is maintained through the add_gendisk, del_gendisk, and get_gendisk functions.
3. Request structure request
A struct request represents a pending I/O request. Each request is described by one struct request instance and stored in the request queue linked list, which is sorted by the elevator algorithm. Each request contains one or more bio instances.
4. Request queue structure request_queue
Each block device has a request queue, and each request queue performs I/O scheduling independently. The request queue is a doubly linked list of request structure instances. The list and the information about the whole queue are described by the structure request_queue, which is called the request queue object structure, or simply the request queue structure. It stores information about pending requests and the information required to manage the request queue (such as the elevator algorithm). The structure member request_fn is the request processing function supplied by the device driver.
5. bio structure
Generally, one bio corresponds to one I/O request, and the I/O scheduling algorithm can merge contiguous bios into one request. Therefore, one request can contain multiple bios.
The bio is the main data structure of the generic block layer. It describes both the disk location and the memory location, and is the link between the upper-layer kernel VFS and the lower-layer driver.
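The containment relationship between a request and its bios can be sketched with heavily simplified stand-ins. struct toy_bio and struct toy_request are hypothetical and omit almost every field of the real kernel structures. The sketch only shows a bio carrying both a disk location and a memory location, and a request chaining contiguous bios.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a bio. */
struct toy_bio {
    uint64_t sector;        /* disk location: first sector */
    uint32_t nr_sectors;    /* length on disk */
    void *buf;              /* memory location of the data */
    struct toy_bio *next;   /* next bio in the same request */
};

/* Hypothetical stand-in for a request: a chain of contiguous bios. */
struct toy_request {
    struct toy_bio *bio;      /* first bio */
    struct toy_bio *biotail;  /* last bio */
};

/* Append bio to rq only if it continues where the request ends. */
int toy_request_add_bio(struct toy_request *rq, struct toy_bio *bio)
{
    assert(rq != NULL && bio != NULL);

    bio->next = NULL;
    if (rq->bio == NULL) {
        rq->bio = rq->biotail = bio;
        return 1;
    }
    if (rq->biotail->sector + rq->biotail->nr_sectors != bio->sector)
        return 0;   /* not contiguous: it needs its own request */

    rq->biotail->next = bio;
    rq->biotail = bio;
    return 1;
}

/* Total number of sectors covered by all bios of the request. */
uint32_t toy_request_sectors(const struct toy_request *rq)
{
    uint32_t total = 0;

    assert(rq != NULL);
    for (const struct toy_bio *b = rq->bio; b != NULL; b = b->next)
        total += b->nr_sectors;
    return total;
}
```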
Summary
The I/O operations of block devices differ significantly from those of character devices, which is why data structures such as request_queue, request, and bio are introduced. Block device I/O is always carried out through requests, whereas character device I/O accesses the device directly. To improve performance, block device I/O operations are queued and merged.
The driver's task is to process requests. Queuing and merging the requests is handled by the I/O scheduling algorithm. Therefore, the core of a block device driver is the request processing function or the "make request" function.
Virtio-blk
Initialization
The relevant code is located at: drivers/block/virtio_blk.c
static int __init init(void)
{
	int error;

	virtblk_wq = alloc_workqueue("virtio-blk", 0, 0);
	if (!virtblk_wq)
		return -ENOMEM;

	major = register_blkdev(0, "virtblk");
	if (major < 0) {
		error = major;
		goto out_destroy_workqueue;
	}

	error = register_virtio_driver(&virtio_blk);
	if (error)
		goto out_unregister_blkdev;
	return 0;

out_unregister_blkdev:
	unregister_blkdev(major, "virtblk");
out_destroy_workqueue:
	destroy_workqueue(virtblk_wq);
	return error;
}
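The init function above unwinds with goto labels: each failure jumps to a label that undoes only the steps already completed, in reverse order. A user-space sketch of the same pattern follows. The three flags in struct toy_state are hypothetical stand-ins for the workqueue, the block major number, and the driver registration, and fail_at exists only to simulate a failure.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flags standing in for the three acquired resources. */
struct toy_state {
    bool workqueue;
    bool blkdev;
    bool driver;
};

/* fail_at selects which step fails (1..3); 0 means every step succeeds. */
int toy_init(struct toy_state *s, int fail_at)
{
    int error;

    assert(s != NULL);

    if (fail_at == 1)
        return -1;              /* nothing acquired yet, nothing to undo */
    s->workqueue = true;

    if (fail_at == 2) {
        error = -2;
        goto out_destroy_workqueue;
    }
    s->blkdev = true;

    if (fail_at == 3) {
        error = -3;
        goto out_unregister_blkdev;
    }
    s->driver = true;
    return 0;

out_unregister_blkdev:
    s->blkdev = false;          /* undo step 2 */
out_destroy_workqueue:
    s->workqueue = false;       /* undo step 1 */
    return error;
}
```

Because the labels fall through, a failure in the last step releases both earlier resources, while a failure in the second step releases only the first.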
Use register_blkdev() to register a block device with the block device layer, and register_virtio_driver() to register the virtio_blk driver with the virtio layer. As the earlier virtio analysis showed, the virtio device layer is a PCI device interface layer, so virtio-blk is built on the PCI interface.
When Qemu is started with a virtio-blk device for the Guest, the probe function registered in the virtio_blk structure is called during startup to initialize the device. Specifically, virtblk_probe() does the following:
Allocate a struct virtio_blk to represent the virtio-blk device.
vdev->priv = vblk = kmalloc(sizeof(*vblk), GFP_KERNEL);
Allocate the queue. Unlike the virtio-net device, virtio-blk uses only one queue.
init_vq(vblk);
Allocate a gendisk structure to represent the virtio-blk physical disk.
vblk->disk = alloc_disk(1 << PART_BITS);
Allocate the request_queue structure and attach it to the gendisk structure of virtio-blk.
q = vblk->disk->queue = blk_mq_init_queue(&virtio_mq_reg, vblk);
The functions that handle requests are all in virtio_mq_ops, which the virtio_mq_reg structure refers to:
static struct blk_mq_ops virtio_mq_ops = {
	.queue_rq	= virtio_queue_rq,
	.map_queue	= blk_mq_map_queue,
	.alloc_hctx	= blk_mq_alloc_single_hw_queue,
	.free_hctx	= blk_mq_free_single_hw_queue,
	.complete	= virtblk_request_done,
};
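The ops table above is a structure of function pointers: the generic layer calls through the pointers and never needs to know which driver is behind them. A minimal user-space sketch of that dispatch idea follows. struct toy_mq_ops, struct toy_rq, and toy_submit are hypothetical names and are not part of the blk-mq API. Completion is also immediate here, which is not how a real device behaves.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical request with two flags to observe what was called. */
struct toy_rq {
    int queued;
    int done;
};

/* Hypothetical ops table, loosely modelled on the idea of blk_mq_ops. */
struct toy_mq_ops {
    int  (*queue_rq)(struct toy_rq *rq);
    void (*complete)(struct toy_rq *rq);
};

/* Driver side: the callbacks behind the table. */
static int toy_queue_rq(struct toy_rq *rq)
{
    rq->queued = 1;
    return 0;
}

static void toy_complete(struct toy_rq *rq)
{
    rq->done = 1;
}

const struct toy_mq_ops toy_ops = {
    .queue_rq = toy_queue_rq,
    .complete = toy_complete,
};

/* Generic side: knows only the ops table, not the driver. */
int toy_submit(const struct toy_mq_ops *ops, struct toy_rq *rq)
{
    int ret;

    assert(ops != NULL && ops->queue_rq != NULL && rq != NULL);

    ret = ops->queue_rq(rq);
    if (ret == 0 && ops->complete != NULL)
        ops->complete(rq);      /* completion is immediate in this toy */
    return ret;
}
```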
Initialize the request storage area vbr, which still describes its data in scatter-list form.
blk_mq_init_commands(q, virtblk_init_vbr, vblk);
The basic container for block I/O operations in the kernel is the bio struct. It represents block I/O operations that are in flight (active) as a list of segments. A segment is a small chunk of memory that is contiguous. The advantage is that a single buffer does not have to be contiguous: because the buffer is described in segments, the bio struct lets the kernel perform an I/O operation even when the buffer is scattered across multiple locations in memory. This is called scatter-gather I/O.
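The segment idea can be shown with a short user-space sketch that gathers several separate memory segments into one contiguous destination. struct toy_seg and toy_gather are hypothetical names. The kernel's bio and scatterlist structures are considerably richer than this.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical segment: one contiguous piece of memory. */
struct toy_seg {
    const void *addr;
    size_t len;
};

/*
 * Copy all segments, in order, into dst. Returns the number of bytes
 * copied, or 0 if dst is too small (dst may then be partly written).
 */
size_t toy_gather(void *dst, size_t dst_len,
                  const struct toy_seg *segs, size_t nsegs)
{
    size_t off = 0;

    assert(dst != NULL);
    assert(segs != NULL || nsegs == 0);

    for (size_t i = 0; i < nsegs; i++) {
        if (segs[i].len > dst_len - off)
            return 0;
        memcpy((char *)dst + off, segs[i].addr, segs[i].len);
        off += segs[i].len;
    }
    return off;
}
```

The caller describes a buffer as a list of (address, length) pairs, so the pieces can live anywhere in memory while the device still sees one logical transfer.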
Assign a name to the virtio-blk disk.
virtblk_name_format("vd", index, vblk->disk->disk_name, DISK_NAME_LEN);
The virtio_blk disk appears as "/dev/vda", which differs from the "/dev/hda" or "/dev/sda" names of IDE and SATA hard disks.
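A name such as "vda" can be derived from the disk index with a base-26 letter suffix. The following user-space sketch shows one way to do it. toy_name_format is a hypothetical function written for this article, and it is only assumed to behave like virtblk_name_format: index 0 gives "a", 25 gives "z", 26 gives "aa".

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Write prefix followed by a letter suffix for index into buf.
 * Returns 0 on success, -1 if buf is too small.
 */
int toy_name_format(const char *prefix, int index, char *buf, size_t buflen)
{
    const int base = 'z' - 'a' + 1;
    size_t plen;
    char *begin, *end, *p;

    assert(prefix != NULL && buf != NULL && index >= 0);

    plen = strlen(prefix);
    if (buflen < plen + 2)
        return -1;      /* no room for prefix, one letter, and NUL */

    begin = buf + plen;
    end = buf + buflen;
    p = end - 1;
    *p = '\0';

    /* Build the suffix backwards from the end of the buffer. */
    do {
        if (p == begin)
            return -1;  /* suffix would overwrite the prefix area */
        *--p = (char)('a' + index % base);
        index = index / base - 1;
    } while (index >= 0);

    memmove(begin, p, (size_t)(end - p));   /* suffix plus NUL */
    memcpy(buf, prefix, plen);
    return 0;
}
```

The "- 1" after each division is what makes the sequence continue from "z" to "aa" instead of "ba", because the suffix has no digit that plays the role of zero.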
Fill in the remaining disk information and register the virtio-blk disk with the generic block device layer.
vblk->disk->major = major;
vblk->disk->first_minor = index_to_minor(index);
vblk->disk->private_data = vblk;
vblk->disk->fops = &virtblk_fops;
vblk->disk->driverfs_dev = &vdev->dev;
vblk->index = index;
add_disk(vblk->disk);
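The first_minor assignment maps a disk index to a range of minor numbers. Since alloc_disk() was given 1 << PART_BITS minors, each disk owns that many consecutive minor numbers, one for the whole disk and the rest for partitions. The sketch below assumes PART_BITS is 4, which is an assumption because the value is not shown in this article. The toy_ names are hypothetical.

```c
#include <assert.h>

/* Assumption: 4 bits of the minor number are reserved for partitions,
 * so each disk owns 1 << 4 = 16 consecutive minors. */
#define TOY_PART_BITS 4

/* First minor number belonging to disk number index. */
int toy_index_to_minor(int index)
{
    assert(index >= 0);
    return index << TOY_PART_BITS;
}

/* Disk number that a given minor number belongs to. */
int toy_minor_to_index(int minor)
{
    assert(minor >= 0);
    return minor >> TOY_PART_BITS;
}
```

Under that assumption the first disk uses minors 0 to 15 and the second disk starts at minor 16.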
Data Processing
Front-end ---> back-end
The request_queue of the gendisk structure in the virtio_blk structure receives bio requests from the block layer. Following the default processing path of the request_queue, the bio requests are converted to requests in the I/O scheduler layer and then enter the request_queue. Finally, virtio_queue_rq() is called to convert each request into a vbr structure.
virtio_queue_rq() <---- queue_rq member registered in the request_queue structure
---> blk_rq_map_sg() <---- fill the vbr scatter-list
---> __virtblk_add_req()
---> virtqueue_add_sgs() <---- add to the vring
---> virtqueue_kick() <---- notify the back-end
Finally, Qemu takes over the processing.
Back-end ---> front-end
After processing the request, Qemu places the completed request in the vring and raises an interrupt for the queue. The queue's interrupt handler, vring_interrupt, calls the queue's callback function virtblk_done:
virtblk_done() ---> blk_mq_complete_request()
Finally, the complete function registered on the request_queue, virtblk_request_done(), handles the request, and blk_mq_end_io reports completion of the I/O to the block device layer.
Request lifecycle illustration
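The round trip of a request between the guest driver and Qemu can be sketched as a toy model in user-space C. Everything here is hypothetical: struct toy_ring is a plain circular buffer rather than a real vring (which has separate descriptor, available, and used rings), and toy_queue_rq, toy_device_process, and toy_done only mimic the roles of the queueing step, the Qemu side, and the completion callback.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_RING_SIZE 4u

/* Hypothetical per-request bookkeeping, loosely inspired by vbr. */
struct toy_vbr {
    int id;
    int status;      /* set by the device side */
    int completed;   /* set by the driver's completion callback */
};

/* Hypothetical circular buffer of request pointers. */
struct toy_ring {
    struct toy_vbr *slot[TOY_RING_SIZE];
    unsigned int head;   /* next slot to fill */
    unsigned int tail;   /* next slot to consume */
};

struct toy_vq {
    struct toy_ring avail;   /* driver -> device */
    struct toy_ring used;    /* device -> driver */
};

static int toy_ring_add(struct toy_ring *r, struct toy_vbr *vbr)
{
    if (r->head - r->tail == TOY_RING_SIZE)
        return -1;           /* ring is full */
    r->slot[r->head++ % TOY_RING_SIZE] = vbr;
    return 0;
}

static struct toy_vbr *toy_ring_get(struct toy_ring *r)
{
    if (r->head == r->tail)
        return NULL;         /* ring is empty */
    return r->slot[r->tail++ % TOY_RING_SIZE];
}

/* Guest driver: make a request available to the device. */
int toy_queue_rq(struct toy_vq *vq, struct toy_vbr *vbr)
{
    assert(vq != NULL && vbr != NULL);
    return toy_ring_add(&vq->avail, vbr);
}

/* Device side (Qemu's role): handle queued requests while there is
 * room to report them back. */
void toy_device_process(struct toy_vq *vq)
{
    struct toy_vbr *vbr;

    assert(vq != NULL);
    while (vq->used.head - vq->used.tail < TOY_RING_SIZE &&
           (vbr = toy_ring_get(&vq->avail)) != NULL) {
        vbr->status = 0;     /* pretend the I/O succeeded */
        toy_ring_add(&vq->used, vbr);
    }
}

/* Guest driver: completion callback; returns how many requests finished. */
int toy_done(struct toy_vq *vq)
{
    struct toy_vbr *vbr;
    int n = 0;

    assert(vq != NULL);
    while ((vbr = toy_ring_get(&vq->used)) != NULL) {
        vbr->completed = 1;
        n++;
    }
    return n;
}
```

In this model a request is complete only after it has passed through both rings: queued by the driver, processed by the device, and then reaped by the driver's callback.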