[Repost] bio and the block device driver

Source: Internet
Author: User

Original Address: BIO and block device driver
Devices in the system that allow random access to fixed-size chunks of data are called block devices, and those chunks are called blocks. Block devices are normally used by mounting a file system on them, which is also the usual way their contents are accessed. Because access is random, a request can jump from one position on the device to any other, so the device must be able to seek back and forth across the media. Character devices, by contrast, are accessed only as a sequential stream. This is why the kernel does not need a specialized subsystem to manage character devices, whereas block devices are managed by a dedicated subsystem.

The smallest addressable unit of a block device is the sector. Sector sizes are powers of two, and 512 bytes is the most common. The sector size is a physical property of the device and the basic unit of all block devices: a device cannot address or operate on anything smaller than a sector, although many block devices can transfer several sectors at once.

A block is an abstraction used by the file system: a file system can be accessed only in units of blocks. Although the physical disk is addressed at the sector level, all disk operations performed by the kernel are done in blocks. Because the sector is the smallest addressable unit of the device, a block cannot be smaller than a sector and must be a multiple of the sector size. The kernel also requires the block size to be a power of two and no larger than one page. In short, the block size must be a power-of-two multiple of the sector size and must not exceed the page size, so common sizes are 512 bytes, 1 KB and 4 KB.
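
The block size rules above can be written down as a small check. This is only a userspace sketch, not kernel code: the helper names block_size_is_valid and is_power_of_two are made up for illustration, and the sector and page sizes are passed in as plain parameters instead of being queried from a device.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* True when n is a nonzero power of two. */
static bool is_power_of_two(size_t n)
{
    return n != 0 && (n & (n - 1)) == 0;
}

/*
 * Hypothetical helper: applies the constraints from the text.
 * A block must be a power of two, a multiple of the sector size
 * (so never smaller than a sector), and no larger than a page.
 */
static bool block_size_is_valid(size_t block_size, size_t sector_size,
                                size_t page_size)
{
    assert(sector_size != 0 && page_size != 0);
    return is_power_of_two(block_size)
        && block_size >= sector_size
        && block_size % sector_size == 0
        && block_size <= page_size;
}
```

With 512-byte sectors and 4096-byte pages, the sizes 512, 1024 and 4096 pass the check, while 256 (smaller than a sector), 1536 (not a power of two) and 8192 (larger than a page) do not.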

In kernels before Linux 2.5, a block that is brought into memory is stored in a buffer. Each buffer corresponds to exactly one block and serves as the in-memory representation of that disk block. Because the kernel needs some control information in order to work with the data, each buffer has an associated descriptor. The descriptor is a buffer_head structure, also known as the buffer head.

struct buffer_head {
    unsigned long b_state;              // buffer state flags
    struct buffer_head *b_this_page;    // list of the buffers in the page
    struct page *b_page;                // page that holds this buffer
    sector_t b_blocknr;                 // logical block number
    size_t b_size;                      // block size, in bytes
    char *b_data;                       // pointer to the buffer's data within the page
    struct block_device *b_bdev;        // associated block device
    bh_end_io_t *b_end_io;              // I/O completion method
    void *b_private;                    // data for the completion method
    struct list_head b_assoc_buffers;   // list of associated mappings
    struct address_space *b_assoc_map;  // mapping this buffer is associated with
    atomic_t b_count;                   // buffer usage count
};


The b_state field holds the state of the buffer. The valid flags are listed in the bh_state_bits enumeration:

enum bh_state_bits {
    BH_Uptodate,     // the buffer contains valid data
    BH_Dirty,        // the buffer is dirty (its contents are newer than the block on disk, so it must be written back)
    BH_Lock,         // the buffer is undergoing disk I/O and is locked to prevent concurrent access
    BH_Req,          // the buffer is involved in an I/O request
    BH_Mapped,       // the buffer is a valid buffer mapped to an on-disk block
    BH_New,          // the buffer was just mapped by get_block() and has not been accessed yet
    BH_Async_Read,   // the buffer is undergoing asynchronous read I/O through end_buffer_async_read()
    BH_Async_Write,  // the buffer is undergoing asynchronous write I/O through end_buffer_async_write()
    BH_Delay,        // the buffer does not yet have an associated on-disk block
    BH_Boundary,     // the buffer is at the boundary of a contiguous run of blocks: the next block is not contiguous
};

The enumeration also ends with BH_PrivateStart, which is not a state flag itself: it marks the first bit that other code may use. A driver can safely define its own state flags starting at that bit, because they are then guaranteed not to conflict with the bits used by the block I/O layer.
The b_count field is the usage count of the buffer. Two inline functions increment and decrement it:
get_bh(struct buffer_head *bh)  -->  atomic_inc(&bh->b_count)
put_bh(struct buffer_head *bh)  -->  atomic_dec(&bh->b_count)
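
The state bits and the usage count can be tried out in userspace with a small model. This is a sketch under assumptions, not the kernel implementation: every model_ name is hypothetical, only three flags are modeled, C11 atomics stand in for atomic_t, and the flag helpers here are plain (non-atomic) bit operations.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Userspace stand-ins; the names only mirror the kernel's. */
enum model_bh_state_bits { MODEL_BH_Uptodate, MODEL_BH_Dirty, MODEL_BH_Lock };

struct model_buffer_head {
    unsigned long b_state;  /* one bit per flag */
    atomic_int b_count;     /* usage count */
};

static void model_bh_init(struct model_buffer_head *bh)
{
    bh->b_state = 0;
    atomic_init(&bh->b_count, 0);
}

/* Each enumerator is a bit position inside b_state. */
static void model_set_flag(struct model_buffer_head *bh,
                           enum model_bh_state_bits bit)
{
    bh->b_state |= 1UL << bit;
}

static void model_clear_flag(struct model_buffer_head *bh,
                             enum model_bh_state_bits bit)
{
    bh->b_state &= ~(1UL << bit);
}

static bool model_test_flag(const struct model_buffer_head *bh,
                            enum model_bh_state_bits bit)
{
    return ((bh->b_state >> bit) & 1UL) != 0;
}

/* Counterparts of get_bh() and put_bh(): adjust the usage count. */
static void model_get_bh(struct model_buffer_head *bh)
{
    atomic_fetch_add(&bh->b_count, 1);
}

static void model_put_bh(struct model_buffer_head *bh)
{
    assert(atomic_load(&bh->b_count) > 0);  /* a put must match an earlier get */
    atomic_fetch_sub(&bh->b_count, 1);
}
```

The point of the sketch is that the enumeration values are bit positions, not masks, and that every get must be balanced by a put before the buffer can be released.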

A block device driver provides access to devices that transfer randomly accessible data in fixed-size blocks. Efficient block drivers are critical for performance, and not only for explicit reads and writes in user applications. Modern operating systems use virtual memory and move unneeded data out to secondary storage such as disks. Block drivers are the conduit between core memory and that storage, so they can be regarded as part of the virtual memory subsystem.

A block is a fixed-size chunk of data whose size is determined by the kernel. Blocks are usually 4,096 bytes, but the value can vary with the architecture and the file system in use. A sector, by contrast, is a chunk whose size is determined by the underlying hardware. The kernel expects to deal with devices that use 512-byte sectors. If a device uses a different size, the kernel adapts and avoids generating I/O requests that the hardware cannot handle. Still, whenever the kernel hands the driver a sector number, it is counting in 512-byte sectors, so a driver for hardware with a different sector size must scale the kernel's sector numbers accordingly. This part of the kernel is again built from a number of data structures and their methods, and the data structures are described first.
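
Before turning to those structures, the sector scaling mentioned above can be sketched in a few lines. This is a userspace illustration only: kernel_to_hw_sector and kernel_sector_to_byte_offset are hypothetical helper names, and the sketch assumes the hardware sector size is a multiple of 512 bytes.

```c
#include <assert.h>
#include <stdint.h>

#define KERNEL_SECTOR_SIZE 512u  /* the kernel always counts in 512-byte sectors */

/*
 * Hypothetical helper: converts a kernel sector number into the index of
 * the hardware sector that contains it.
 */
static uint64_t kernel_to_hw_sector(uint64_t kernel_sector,
                                    unsigned int hw_sector_size)
{
    assert(hw_sector_size >= KERNEL_SECTOR_SIZE);
    assert(hw_sector_size % KERNEL_SECTOR_SIZE == 0);
    return kernel_sector / (hw_sector_size / KERNEL_SECTOR_SIZE);
}

/* Byte offset on the device that a kernel sector number refers to. */
static uint64_t kernel_sector_to_byte_offset(uint64_t kernel_sector)
{
    return kernel_sector * KERNEL_SECTOR_SIZE;
}
```

For a device with 2048-byte hardware sectors, kernel sector 16 falls in hardware sector 4, because four kernel sectors fit in each hardware sector.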

The kernel uses the gendisk structure to represent an individual disk device. It also uses gendisk to represent partitions. Many of the members of this structure must be initialized by the driver.

struct gendisk {
    int major;        // major device number
    int first_minor;  // first minor device number
    int minors;
    /* These members describe the device numbers used by the disk. A drive must use at least one minor number. If the drive is to be partitionable (and most should be), one minor number must also be allocated for each possible partition. A common value for minors is 16, which allows for the "whole disk" device and 15 partitions. Some disk drivers use 64 minor numbers for each device. */

    char disk_name[32];  // name of the disk device; it appears in /proc/partitions and sysfs

    struct hd_struct **part;               /* [indexed by minor] */
    struct block_device_operations *fops;  // set of device operations
    struct request_queue *queue;           // structure the kernel uses to manage I/O requests for this device
    void *private_data;                    // block drivers may use this member as a pointer to their own internal data

    sector_t capacity;
    /* Capacity of the drive, in 512-byte sectors. The sector_t type can be 64 bits wide. Drivers should not set this member directly; instead, pass the number of sectors to set_capacity. */

    int flags;
    /* A (rarely used) set of flags describing the state of the drive. If the device has removable media, set GENHD_FL_REMOVABLE. CD-ROM drives can set GENHD_FL_CD. If, for some reason, the partition information should not appear in /proc/partitions, set GENHD_FL_SUPPRESS_PARTITION_INFO. */

    struct device *driverfs_dev;  // FIXME: remove
    struct device dev;
    struct kobject *holder_dir;
    struct kobject *slave_dir;
    struct timer_rand_state *random;
    int policy;
    atomic_t sync_io;  /* RAID */
    unsigned long stamp;
    int in_flight;
#ifdef CONFIG_SMP
    struct disk_stats *dkstats;
#else
    struct disk_stats dkstats;
#endif
    struct work_struct async_notify;
};

This structure is allocated dynamically, and the kernel has to perform some special processing to initialize it. A driver therefore cannot allocate the structure on its own; it must call alloc_disk instead.
struct gendisk *alloc_disk(int minors);  // allocates the structure dynamically; minors is the number of minor numbers the disk uses, and the minors member must not be changed afterwards
void del_gendisk(struct gendisk *gd);  // removes the disk; a gendisk is a reference-counted structure (it contains a kobject)
void add_disk(struct gendisk *gd);  // makes the disk available to the system; once this function is called, the device is live and its methods can be called at any time. Do not call add_disk until the driver is completely initialized and ready to respond to requests on that disk.
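
The minor number layout described for the minors member can be illustrated with simple arithmetic. This is a hedged userspace sketch: struct minor_location, minor_for and locate_minor are hypothetical names, and the layout assumes that every disk owns a consecutive range of minors_per_disk minor numbers, with offset 0 standing for the whole disk.

```c
#include <assert.h>

/* Hypothetical result type: which disk, and which partition on it. */
struct minor_location {
    int disk;       /* index of the disk */
    int partition;  /* 0 means the whole-disk device, 1..n a partition */
};

/* Minor number of a given partition on a given disk. */
static int minor_for(int disk, int partition, int minors_per_disk)
{
    assert(minors_per_disk > 0);
    assert(disk >= 0 && partition >= 0 && partition < minors_per_disk);
    return disk * minors_per_disk + partition;
}

/* Inverse mapping: split a minor number back into disk and partition. */
static struct minor_location locate_minor(int minor, int minors_per_disk)
{
    assert(minors_per_disk > 0 && minor >= 0);
    struct minor_location loc = { minor / minors_per_disk,
                                  minor % minors_per_disk };
    return loc;
}
```

With 16 minors per disk, the second disk starts at minor 16, and minor 35 refers to partition 3 of the third disk.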

When the kernel, acting for a file system, the virtual memory subsystem, or a system call, decides that a set of blocks must be transferred to or from a block I/O device, it puts together a bio structure to describe the operation. That structure is handed to the block I/O code, which merges it into an existing request structure or, if necessary, creates a new one. The bio structure contains everything a block driver needs to carry out the request, without reference to the user-space process that initiated it.

The basic container for block I/O in the kernel is the bio structure. It represents block I/O operations that are in flight (active) as a list of segments. A segment is a chunk of a buffer that is contiguous in memory, so the buffer as a whole does not have to be contiguous. By describing a buffer in segments, the bio structure lets the kernel perform a block I/O operation even when a single buffer is scattered across several locations in memory. This kind of vector I/O is called scatter-gather I/O.
The bio is the main data structure of the generic block layer. It describes both the location on disk and the location in memory, and it links the upper layers of the kernel, such as the VFS, to the driver below.

struct bio {
    sector_t bi_sector;               // associated sector on disk
    struct bio *bi_next;              // list of requests
    struct block_device *bi_bdev;     // associated block device
    unsigned long bi_flags;           // status and command flags
    unsigned long bi_rw;              // read or write?
    unsigned short bi_vcnt;           // number of bio_vec entries in bi_io_vec
    unsigned short bi_idx;            // current index into bi_io_vec
    unsigned short bi_phys_segments;  // number of segments after coalescing
    unsigned short bi_hw_segments;    // number of segments after remapping
    unsigned int bi_size;             // I/O count
    unsigned int bi_hw_front_size;    // size of the first mergeable segment
    unsigned int bi_hw_back_size;     // size of the last mergeable segment
    unsigned int bi_max_vecs;         // maximum number of bio_vecs
    struct bio_vec *bi_io_vec;        // bio_vec array: the locations in memory
    bio_end_io_t *bi_end_io;          // I/O completion method
    atomic_t bi_cnt;                  // usage count
    void *bi_private;                 // data private to the owner
    bio_destructor_t *bi_destructor;  // destructor method
};


The main purpose of this structure is to represent an in-flight I/O operation, so most of its fields hold related bookkeeping information. The most important fields are bi_io_vec, bi_vcnt and bi_idx.
These three are related as follows: bio --> bi_io_vec, bi_idx (a base address plus an index, which locates a specific bio_vec) --> page (the page is then found through that bio_vec).
The bi_io_vec field points to an array of bio_vec structures that together hold all the segments of a particular I/O operation. Each bio_vec is a vector of the form <page, offset, len> that describes one segment: the physical page in which the segment resides, the offset of the segment within that page, and the length of the segment starting from that offset. The full bi_io_vec array represents the complete buffer.

struct bio_vec {
    struct page *bv_page;    // pointer to the physical page in which this buffer resides
    unsigned int bv_len;     // length of this buffer, in bytes
    unsigned int bv_offset;  // byte offset within the page at which the buffer resides
};

The bi_vcnt field gives the number of bio_vec entries in the array that bi_io_vec points to. While the I/O operation is being carried out, bi_idx holds the current index into that array. In short, each block I/O request is represented by a bio. Each request consists of one or more blocks, which are stored in an array of bio_vec structures. These structures describe the actual position of each segment in a physical page and are laid out as a vector: bi_io_vec points to the first segment of the I/O operation, the remaining segments follow in order, and there are bi_vcnt segments in total. As the block I/O layer executes the request, it keeps updating bi_idx so that it always points to the current segment. This is just array indexing, one of the first concepts taught in introductory C.
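
The way bi_io_vec, bi_vcnt and bi_idx work together can be shown with a userspace model. The sketch below is an assumption-laden illustration, not kernel code: all model_ names are hypothetical, a page is modeled as a plain 4096-byte array, and model_bio_gather is an invented helper that copies the segments into one flat buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MODEL_PAGE_SIZE 4096u

struct model_page { unsigned char bytes[MODEL_PAGE_SIZE]; };

struct model_bio_vec {
    struct model_page *bv_page;  /* page holding the segment */
    unsigned int bv_len;         /* segment length in bytes */
    unsigned int bv_offset;      /* offset of the segment in the page */
};

struct model_bio {
    unsigned short bi_vcnt;           /* number of segments */
    unsigned short bi_idx;            /* index of the current segment */
    struct model_bio_vec *bi_io_vec;  /* the segment array */
};

/*
 * Copies every remaining segment, in order, into dst and advances bi_idx
 * past each segment it finishes. Stops early if dst is too small.
 * Returns the number of bytes copied.
 */
static size_t model_bio_gather(struct model_bio *bio, unsigned char *dst,
                               size_t dst_len)
{
    size_t copied = 0;

    while (bio->bi_idx < bio->bi_vcnt) {
        const struct model_bio_vec *bv = &bio->bi_io_vec[bio->bi_idx];

        assert(bv->bv_offset + bv->bv_len <= MODEL_PAGE_SIZE);
        if (bv->bv_len > dst_len - copied)
            break;
        memcpy(dst + copied, bv->bv_page->bytes + bv->bv_offset, bv->bv_len);
        copied += bv->bv_len;
        bio->bi_idx++;
    }
    return copied;
}
```

Two segments that live in different pages end up back to back in the destination buffer, which is the "gather" half of scatter-gather I/O.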

A block device keeps its pending block I/O requests in a request queue, represented by the request_queue structure, which contains a doubly linked list of requests together with associated control information. Higher-level kernel code such as a file system adds requests to the queue. The block device driver for the queue takes the request at the head of the queue and submits it to the corresponding block device. Each item in the queue's request list is a single request, represented by the request structure.

A request in the queue is represented by the request structure. A single request may operate on several consecutive disk blocks, so each request can be made up of more than one bio structure, and each bio structure can in turn describe several segments. Some of the more commonly used fields of request are shown below.

struct request {
    struct list_head queuelist;  // links this request into the request queue
    struct request_queue *q;
    /* The following members track the sectors that the hardware has completed. hard_sector is the first sector that has not yet been transferred, hard_nr_sectors is the total number of sectors still to be transferred, and hard_cur_sectors is the number of sectors remaining in the current bio. These members are meant for use inside the block subsystem only; drivers should not use them. */
    sector_t hard_sector;
    unsigned long hard_nr_sectors;
    unsigned int hard_cur_sectors;
    struct bio *bio;  // linked list of the bio structures for this request; do not access this member directly, use rq_for_each_bio (described later) instead
    unsigned short nr_phys_segments;  // number of distinct segments the request occupies in physical memory after adjacent pages have been merged
    char *buffer;  // simply the result of calling bio_data on the current bio
};

So how are these key structures related? The request_queue is the request queue; through it you find the individual request structures, which are linked together on the queue. Each request contains one or more bio structures, each bio leads to its pages through its bio_vec array, and the data is finally read from or written to physical memory through those pages. That is the basic relationship.
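
The chain from queue to request to bio can be walked in a small userspace model. This is a sketch with made-up types: the chain_ structures and functions are hypothetical and keep only the link fields and bi_size, and singly linked lists replace the kernel's list_head.

```c
#include <assert.h>
#include <stddef.h>

#define KERNEL_SECTOR_SIZE 512u

struct chain_bio {
    unsigned int bi_size;       /* bytes described by this bio */
    struct chain_bio *bi_next;  /* next bio in the same request */
};

struct chain_request {
    struct chain_bio *bio;       /* first bio of this request */
    struct chain_request *next;  /* next request in the queue */
};

struct chain_queue {
    struct chain_request *head;  /* first pending request */
};

/* Sectors covered by one request: the sum over all of its bios. */
static unsigned long chain_request_sectors(const struct chain_request *req)
{
    unsigned long nsect = 0;

    for (const struct chain_bio *bio = req->bio; bio != NULL; bio = bio->bi_next)
        nsect += bio->bi_size / KERNEL_SECTOR_SIZE;
    return nsect;
}

/* Sectors pending in the whole queue: the sum over all requests. */
static unsigned long chain_queue_sectors(const struct chain_queue *q)
{
    unsigned long nsect = 0;

    for (const struct chain_request *req = q->head; req != NULL; req = req->next)
        nsect += chain_request_sectors(req);
    return nsect;
}
```

The inner loop has the same shape as the per-request sector count that a driver computes when it walks the bios of a request.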

Block driver steps and example:

The first step for most block drivers is to register themselves with the kernel. The function for this task is register_blkdev:
int register_blkdev(unsigned int major, const char *name);
The arguments are the major number the device wants to use and the associated name (which the kernel displays in /proc/devices). If major is passed as 0, the kernel allocates a new major number and returns it to the caller.
The corresponding function for unregistering is int unregister_blkdev(unsigned int major, const char *name); its arguments must match those passed to register_blkdev.
In the 2.6 kernel, the set of tasks performed by register_blkdev has shrunk over time. The call now only allocates a dynamic major number if one is requested and creates an entry in /proc/devices.

The structure that describes the virtual device is shown below. Apart from timer_list, all the member types have been described above:
struct sbull_dev {
    int size;                     // device size, in bytes (it is set to nsectors * hardsect_size during initialization)
    u8 *data;                     // data array
    short users;                  // number of users
    short media_change;           // media change flag
    spinlock_t lock;              // for mutual exclusion
    struct request_queue *queue;  // device request queue
    struct gendisk *gd;           // gendisk structure
    struct timer_list timer;      // simulates media changes
};

static struct sbull_dev *devices = NULL;   // the device structures
memset(dev, 0, sizeof(struct sbull_dev));  // clear the device structure
dev->size = nsectors * hardsect_size;      // device size in bytes: 1024 * 512
dev->data = vmalloc(dev->size);            // allocate the memory that backs the device

switch (request_mode) {
case RM_NOQUEUE:
    dev->queue = blk_alloc_queue(GFP_KERNEL);
    if (dev->queue == NULL)
        goto out_vfree;
    blk_queue_make_request(dev->queue, sbull_make_request);
    break;
case RM_FULL:
    dev->queue = blk_init_queue(sbull_full_request, &dev->lock);
    if (dev->queue == NULL)
        goto out_vfree;
    break;
default:
    printk(KERN_NOTICE "Bad request mode %d, using simple\n", request_mode);
    /* fall through */
case RM_SIMPLE:
    dev->queue = blk_init_queue(sbull_request, &dev->lock);
    if (dev->queue == NULL)
        goto out_vfree;
    break;
}
A block device request function written using the bio structure:
static void sbull_full_request(request_queue_t *q)
{
    struct request *req;
    int sectors_xferred;
    struct sbull_dev *dev = q->queuedata;

    while ((req = elv_next_request(q)) != NULL) {  // get the next request in the queue
        if (!blk_fs_request(req)) {
            printk(KERN_NOTICE "Skip non-fs request\n");
            end_request(req, 0);  // used together with elv_next_request to complete the request (here, as failed)
            continue;
        }
        sectors_xferred = sbull_xfer_request(dev, req);  // returns the number of sectors transferred
        if (!end_that_request_first(req, 1, sectors_xferred)) {  // reports that the driver has transferred this many sectors, continuing from where the previous call ended; returns 0 once the whole request is done
            blkdev_dequeue_request(req);  // removes the request from the queue; must be called once end_that_request_first reports that everything has been transferred
            end_that_request_last(req);   // notifies whoever is waiting for the request to complete and recycles the request structure
        }
    }
}
static int sbull_xfer_request(struct sbull_dev *dev, struct request *req)
{
    struct bio *bio;
    int nsect = 0;

    rq_for_each_bio(bio, req) {  // a control structure implemented as a macro; it visits each bio in the request
        sbull_xfer_bio(dev, bio);
        nsect += bio->bi_size / KERNEL_SECTOR_SIZE;  // #define KERNEL_SECTOR_SIZE 512
    }
    return nsect;
}
static int sbull_xfer_bio(struct sbull_dev *dev, struct bio *bio)
{
    int i;
    struct bio_vec *bvec;
    sector_t sector = bio->bi_sector;

    bio_for_each_segment(bvec, bio, i) {  // pseudo-control structure that walks the segments making up the bio
        char *buffer = __bio_kmap_atomic(bio, i, KM_USER0);  // low-level function that directly maps the buffer of the bio_vec with index i
        // bio_cur_sectors gives the number of sectors in the current segment of the bio;
        // bio_data_dir gives the direction of the transfer described by the bio
        sbull_transfer(dev, sector, bio_cur_sectors(bio), buffer, bio_data_dir(bio) == WRITE);  // performs the actual transfer for this simple RAM-based device
        sector += bio_cur_sectors(bio);
        __bio_kunmap_atomic(bio, KM_USER0);
    }
    return 0;
}
static void sbull_transfer(struct sbull_dev *dev, unsigned long sector, unsigned long nsect, char *buffer, int write)
{
    unsigned long offset = sector * KERNEL_SECTOR_SIZE;
    unsigned long nbytes = nsect * KERNEL_SECTOR_SIZE;

    if (write)
        memcpy(dev->data + offset, buffer, nbytes);
    else
        memcpy(buffer, dev->data + offset, nbytes);
}
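
The transfer routine is simple enough to try outside the kernel. The following userspace sketch mirrors its logic under stated assumptions: struct ram_disk and the ram_disk_ functions are hypothetical names, calloc replaces vmalloc, and a bounds check is added here that the listing above does not show. Overflow of the sector arithmetic for huge sector numbers is not handled.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define KERNEL_SECTOR_SIZE 512u

/* Hypothetical userspace stand-in for the RAM-backed device. */
struct ram_disk {
    unsigned long size;   /* device size in bytes */
    unsigned char *data;  /* backing memory */
};

static int ram_disk_init(struct ram_disk *d, unsigned long nsectors)
{
    d->size = nsectors * KERNEL_SECTOR_SIZE;
    d->data = calloc(1, d->size);
    return d->data != NULL ? 0 : -1;
}

static void ram_disk_free(struct ram_disk *d)
{
    free(d->data);
    d->data = NULL;
    d->size = 0;
}

/*
 * Copies nsect sectors between buffer and the device, starting at sector.
 * Returns 0 on success and -1 if the range lies beyond the end of the device.
 */
static int ram_disk_transfer(struct ram_disk *d, unsigned long sector,
                             unsigned long nsect, unsigned char *buffer,
                             int write)
{
    unsigned long offset = sector * KERNEL_SECTOR_SIZE;
    unsigned long nbytes = nsect * KERNEL_SECTOR_SIZE;

    assert(d->data != NULL && buffer != NULL);
    if (offset > d->size || nbytes > d->size - offset)
        return -1;
    if (write)
        memcpy(d->data + offset, buffer, nbytes);
    else
        memcpy(buffer, d->data + offset, nbytes);
    return 0;
}
```

Writing a sector and reading it back returns the same bytes, and a request that reaches past the last sector is refused instead of overrunning the backing memory.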
register_blkdev can be used to obtain a major number, but it does not make any disk drives available to the system. A separate registration interface must be used to manage individual drives. It is built around struct block_device_operations:
struct block_device_operations {
    int (*open)(struct inode *, struct file *);     // called when the device is opened
    int (*release)(struct inode *, struct file *);  // called when the device is closed
    int (*ioctl)(struct inode *, struct file *, unsigned, unsigned long);  // implements the ioctl system call; most block driver ioctl methods are fairly short
    long (*unlocked_ioctl)(struct file *, unsigned, unsigned long);
    long (*compat_ioctl)(struct file *, unsigned, unsigned long);
    int (*direct_access)(struct block_device *, sector_t, void **, unsigned long *);
    int (*media_changed)(struct gendisk *);
    /* Called by the kernel to check whether the user has changed the media in the drive; returns a nonzero value if so. This method applies only to drives that support removable media (and that can report a "media changed" flag to the driver); it can be omitted otherwise. */
    int (*revalidate_disk)(struct gendisk *);
    /* Called in response to a media change; it gives the driver a chance to do whatever is needed to make the new media ready for use. The function returns an int, but the kernel ignores the value. */
    int (*getgeo)(struct block_device *, struct hd_geometry *);
    struct module *owner;  // pointer to the module that owns this structure; it should usually be initialized to THIS_MODULE
};
Initialization continues:
dev->gd = alloc_disk(SBULL_MINORS);           // dynamically allocate the gendisk structure (it represents one disk device)
dev->gd->major = sbull_major;                 // set the major device number
dev->gd->first_minor = which * SBULL_MINORS;  // first minor number of this device; each device owns SBULL_MINORS minor numbers
dev->gd->fops = &sbull_ops;                   // block device operations
dev->gd->queue = dev->queue;
dev->gd->private_data = dev;
snprintf(dev->gd->disk_name, 32, "sbull%c", which + 'a');
set_capacity(dev->gd, nsectors * (hardsect_size / KERNEL_SECTOR_SIZE));
// KERNEL_SECTOR_SIZE converts from the actual hardware sector size to the kernel's 512-byte sectors, because the capacity is expressed in 512-byte sectors
add_disk(dev->gd);                            // ends the setup process
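
The two computed values in this setup code, the capacity and the disk name, can be checked in isolation. This is a userspace sketch only: capacity_in_kernel_sectors and format_disk_name are hypothetical helpers, the 32-byte name length is taken from the disk_name member shown earlier, and the hardware sector size is assumed to be a multiple of 512.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define KERNEL_SECTOR_SIZE 512u
#define MODEL_DISK_NAME_LEN 32  /* matches char disk_name[32] */

/*
 * Capacity in 512-byte kernel sectors for a device made of nsectors
 * hardware sectors of hardsect_size bytes each.
 */
static unsigned long long capacity_in_kernel_sectors(unsigned long nsectors,
                                                     unsigned int hardsect_size)
{
    assert(hardsect_size >= KERNEL_SECTOR_SIZE);
    assert(hardsect_size % KERNEL_SECTOR_SIZE == 0);
    return (unsigned long long)nsectors * (hardsect_size / KERNEL_SECTOR_SIZE);
}

/* Builds the name "sbulla", "sbullb", ... for device index which. */
static void format_disk_name(char name[MODEL_DISK_NAME_LEN], int which)
{
    assert(which >= 0 && which < 26);
    snprintf(name, MODEL_DISK_NAME_LEN, "sbull%c", which + 'a');
}
```

A device with 1024 hardware sectors of 512 bytes reports a capacity of 1024, while the same number of 2048-byte sectors reports 4096, since the kernel always counts in 512-byte units.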
See the sbull example in LDD3 for the rest of the code.