Bio and Block Device Drivers

Source: Internet
Author: User

Devices that can randomly access fixed-size chunks of data are called block devices, and the chunks are called blocks. Block devices are usually accessed by mounting a file system on the block device file. Random access means the device can seek from one position on the medium to another, forward or backward. Character devices, by contrast, are accessed as a sequential stream, so the kernel does not need a dedicated subsystem to manage them, whereas block devices do need one.

The smallest addressable unit of a block device is the sector. Sector sizes are powers of two, and the most common size is 512 bytes. The sector size is a physical property of the device, and the sector is the basic unit of all block devices: a device cannot address or operate on anything smaller than a sector, although many block devices can transfer multiple sectors at a time.

The block is an abstraction of the file system: a file system can be accessed only in multiples of a block. Although physical disks are addressed at the sector level, the kernel performs all disk operations in terms of blocks. Because the sector is the smallest addressable unit of the device, the block size cannot be smaller than the sector size and must be a multiple of it. The kernel also requires the block size to be a power of two and no larger than one page. Common block sizes are therefore 512 bytes, 1 KB, and 4 KB.
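The block size constraints can be expressed as a small check. The following is a minimal userspace sketch, not kernel code: the function names are hypothetical, and the 512-byte sector and 4096-byte page are assumed values chosen for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed values for illustration: 512-byte sectors, 4096-byte pages. */
#define SECTOR_SIZE     512u
#define PAGE_SIZE_BYTES 4096u

/* True when n is a power of two (exactly one bit set). */
bool is_power_of_two(size_t n)
{
    return n != 0 && (n & (n - 1)) == 0;
}

/* Checks the constraints on a block size: a power of two, a multiple
 * of the sector size, and no larger than one page. */
bool block_size_is_valid(size_t block_size)
{
    return is_power_of_two(block_size)
        && block_size >= SECTOR_SIZE
        && block_size % SECTOR_SIZE == 0
        && block_size <= PAGE_SIZE_BYTES;
}

/* Number of sectors in one block; the size must already be valid. */
unsigned int sectors_per_block(size_t block_size)
{
    assert(block_size_is_valid(block_size));
    return (unsigned int)(block_size / SECTOR_SIZE);
}
```

With these assumed sizes, 512, 1024, and 4096 pass the check, while 256 (smaller than a sector), 1536 (not a power of two), and 8192 (larger than a page) do not.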

Before Linux 2.5, a block transferred into memory was stored in a buffer. Each buffer corresponds to exactly one block and serves as the in-memory representation of that disk block. Because the kernel needs some control information when processing the data, each buffer has an associated descriptor. The descriptor is a buffer_head struct, also known as a buffer head.

struct buffer_head
{
    unsigned long b_state;               // buffer state flags
    struct buffer_head *b_this_page;     // list of buffers in the page
    struct page *b_page;                 // page that stores the buffer
    sector_t b_blocknr;                  // logical block number
    size_t b_size;                       // block size
    char *b_data;                        // pointer to the data within the page
    struct block_device *b_bdev;         // block device
    bh_end_io_t *b_end_io;               // I/O completion method
    void *b_private;                     // data for the completion method
    struct list_head b_assoc_buffers;    // list of associated mappings
    struct address_space *b_assoc_map;   // mapping this buffer is associated with
    atomic_t b_count;                    // buffer usage count
};

The b_state field holds the buffer state. The valid flags are listed in the bh_state_bits enumeration, defined in <linux/buffer_head.h>:

enum bh_state_bits {
    BH_Uptodate,       // the buffer contains valid data
    BH_Dirty,          // the buffer is dirty: its contents are newer than the block on disk and must be written back
    BH_Lock,           // the buffer is in use by an I/O operation and locked against concurrent access
    BH_Req,            // the buffer is involved in an I/O request
    BH_Uptodate_Lock,
    BH_Mapped,         // the buffer is a valid buffer mapped to a disk block
    BH_New,            // the buffer was newly mapped through get_block() and has not been accessed yet
    BH_Async_Read,     // the buffer is in use by an asynchronous read through end_buffer_async_read()
    BH_Async_Write,    // the buffer is in use by an asynchronous write through end_buffer_async_write()
    BH_Delay,          // the buffer has not yet been associated with a disk block
    BH_Boundary,       // the buffer is at the boundary of a run of contiguous blocks: the next block is not contiguous
    BH_Write_EIO,
    BH_Ordered,
    BH_Eopnotsupp,
    BH_Unwritten,
    BH_PrivateStart,   // first bit available for other code
};

BH_PrivateStart is not a state flag itself. It marks the first bit that other code may use. A driver can safely define its own state flags starting at this bit, because they do not conflict with the bits reserved by the block I/O layer.
The b_count field is the buffer's usage count. Two functions increment and decrement it:
get_bh(struct buffer_head *bh)  -->  atomic_inc(&bh->b_count)
put_bh(struct buffer_head *bh)  -->  atomic_dec(&bh->b_count)
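The state bits and the usage count can be modeled in userspace. The sketch below is an illustration under stated assumptions, not the kernel implementation: struct demo_buffer, the DEMO_* flags, and the demo_* helpers are hypothetical names, and C11 atomics stand in for the kernel's atomic_t.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Simplified stand-ins for a few state bits (bit positions, not masks). */
enum demo_state_bits { DEMO_UPTODATE, DEMO_DIRTY, DEMO_LOCK };

/* Hypothetical miniature of a buffer head: only state and count. */
struct demo_buffer {
    unsigned long state;   /* one bit per flag, in the role of b_state */
    atomic_int    count;   /* usage count, in the role of b_count */
};

/* Plain (non-atomic) bit helpers, kept simple for brevity;
 * the kernel uses atomic bit operations for buffer flags. */
void demo_set_bit(struct demo_buffer *b, int bit)   { b->state |=  (1UL << bit); }
void demo_clear_bit(struct demo_buffer *b, int bit) { b->state &= ~(1UL << bit); }
bool demo_test_bit(const struct demo_buffer *b, int bit)
{
    return (b->state >> bit) & 1UL;
}

/* Counterparts of get_bh() and put_bh(): atomically adjust the count. */
void demo_get(struct demo_buffer *b)
{
    atomic_fetch_add(&b->count, 1);
}

void demo_put(struct demo_buffer *b)
{
    assert(atomic_load(&b->count) > 0);   /* dropping below zero is a bug */
    atomic_fetch_sub(&b->count, 1);
}
```

A caller would take a reference with demo_get() before touching the buffer and release it with demo_put() afterwards, so the buffer cannot be freed while it is in use.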

A block device driver provides access to a device that transfers randomly accessible data in fixed-size blocks. Efficient block drivers are critical for performance, and not only for the explicit reads and writes of user applications. Modern operating systems use virtual memory to move unneeded data out to secondary storage such as disks. Block drivers are the conduit between core memory and secondary storage, so they can be considered part of the virtual memory subsystem.

A block is a fixed-size chunk of data whose size is determined by the kernel. Blocks are usually 4096 bytes, but the value can vary with the architecture and the file system in use. A sector, in contrast, is a small block whose size is determined by the underlying hardware. The kernel expects to deal with devices that use 512-byte sectors. If a device uses a different size, the kernel adapts so that it does not generate I/O requests that the hardware cannot handle. Whenever the kernel gives the driver a sector number, that number is expressed in 512-byte sectors. A driver for hardware with a different sector size must scale the kernel's sector numbers accordingly.

Like the rest of the kernel, the block layer is built from a number of data structures and their methods. The relevant data structures come first:
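The sector scaling described above comes down to a division. The following is a minimal userspace sketch: the function names are hypothetical, and the hardware sector size is an assumed device property passed in by the caller.

```c
#include <assert.h>
#include <stdint.h>

#define KERNEL_SECTOR_SIZE 512u   /* the kernel's fixed sector unit */

/*
 * Convert a kernel sector number (512-byte units) to a hardware
 * sector number. hw_sector_size is an assumed device property and
 * must be a multiple of 512 for this simple scaling to apply.
 */
uint64_t kernel_to_hw_sector(uint64_t kernel_sector, unsigned int hw_sector_size)
{
    assert(hw_sector_size >= KERNEL_SECTOR_SIZE);
    assert(hw_sector_size % KERNEL_SECTOR_SIZE == 0);
    return kernel_sector / (hw_sector_size / KERNEL_SECTOR_SIZE);
}

/* Byte offset on the device of a kernel sector number. */
uint64_t kernel_sector_to_bytes(uint64_t kernel_sector)
{
    return kernel_sector * KERNEL_SECTOR_SIZE;
}
```

For a device with 2048-byte hardware sectors, kernel sector 8 lies at byte offset 4096, which is hardware sector 2.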

The kernel uses the gendisk structure to represent an individual disk device. The kernel also uses gendisk structures to represent partitions. Many members of this structure must be initialized by the driver. The structure is defined in <linux/genhd.h>.


struct gendisk {
    int major;          // major device number
    int first_minor;    // first minor device number
    int minors;
    /* These members describe the device numbers used by the disk. A drive must use at least one minor number. If the drive is to be partitionable (and most should be), allocate one minor number for each possible partition as well. A common value for minors is 16, which allows for the "full disk" device and 15 partitions. Some disk drivers use 64 minor numbers for each device. */

    char disk_name[32];
    // The name of the disk device. It appears in /proc/partitions and sysfs.

    struct hd_struct **part;    /* [indexed by minor] */
    struct block_device_operations *fops;   // the set of device operations
    struct request_queue *queue;            // the structure the kernel uses to manage I/O requests for the device
    void *private_data;                     // block drivers may use this member as a pointer to their own internal data

    sector_t capacity;
    // The capacity of the drive in 512-byte sectors. The sector_t type can be 64 bits wide. Drivers should not set this member directly; pass the number of sectors to set_capacity() instead.

    int flags;
    // A rarely used set of flags describing the state of the drive. If the device has removable media, set GENHD_FL_REMOVABLE. CD-ROM drives can set GENHD_FL_CD. If for some reason the partition information should not appear in /proc/partitions, set GENHD_FL_SUPPRESS_PARTITIONS_INFO.

    struct device *driverfs_dev;    // FIXME: remove
    struct device dev;
    struct kobject *holder_dir;
    struct kobject *slave_dir;
    struct timer_rand_state *random;
    int policy;
    atomic_t sync_io;               /* RAID */
    unsigned long stamp;
    int in_flight;
#ifdef CONFIG_SMP
    struct disk_stats *dkstats;
#else
    struct disk_stats dkstats;
#endif
    struct work_struct async_policy;
};
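The relationship between first_minor, minors, and partition numbers can be illustrated with a little arithmetic. This is a hedged userspace sketch: the helper names are hypothetical, and it assumes each disk reserves 16 consecutive minor numbers, as in the common case described in the comment on minors.

```c
#include <assert.h>

/* Assumed minors per disk: the whole disk plus 15 partitions. */
#define DEMO_MINORS 16

/* First minor number for the disk with the given index. */
int demo_first_minor(int disk_index)
{
    assert(disk_index >= 0);
    return disk_index * DEMO_MINORS;
}

/*
 * Minor number of a partition on a disk. Partition 0 stands for the
 * whole disk. Returns -1 when the partition number does not fit in
 * the minors reserved for the disk.
 */
int demo_partition_minor(int disk_index, int partition)
{
    if (partition < 0 || partition >= DEMO_MINORS)
        return -1;
    return demo_first_minor(disk_index) + partition;
}
```

Under this assumption the second disk starts at minor 16, and its third partition uses minor 19.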

A gendisk is dynamically allocated and needs special kernel processing to be initialized, so a driver cannot allocate the structure on its own. Instead, it must call:
struct gendisk *alloc_disk(int minors);   // minors is the number of minor numbers this disk uses. The minors member cannot be changed afterwards.
void del_gendisk(struct gendisk *gd);     // Releases the disk. A gendisk is a reference-counted structure that contains a kobject.
void add_disk(struct gendisk *gd);        // Makes the disk available to the system. As soon as this function is called, the disk is live and its methods can be called at any time. Do not call it until the driver is completely initialized and ready to respond to requests on the disk.

When the kernel, in the form of a file system, the virtual memory subsystem, or a system call, decides that a set of blocks must be transferred to or from a block I/O device, it puts together a bio structure to describe the operation. The structure is handed to the block I/O code, which merges it into an existing request structure or creates a new one if needed. The bio structure contains everything the driver needs to carry out the request, without reference to the user-space process that initiated it.

The basic container for block I/O in the kernel is the bio struct, defined in <linux/bio.h>. It represents block I/O operations that are in flight (active) as a list of segments. A segment is a chunk of a buffer that is contiguous in memory. The advantage is that a single buffer need not be contiguous: because the buffer is described in segments, the bio struct lets the kernel perform I/O on a buffer even when it is scattered across multiple locations in memory. This is called scatter-gather I/O.
The bio is the main data structure of the generic block layer. It describes both the disk location and the memory location, and it links the upper kernel layers such as the VFS to the drivers below.

struct bio
{
    sector_t bi_sector;                 // first (512-byte) sector to be transferred for this bio: the disk location
    struct bio *bi_next;                // list of requests
    struct block_device *bi_bdev;       // associated block device
    unsigned long bi_flags;             // status and command flags
    unsigned long bi_rw;                // read or write
    unsigned short bi_vcnt;             // number of bio_vecs
    unsigned short bi_idx;              // current index into bi_io_vec
    unsigned short bi_phys_segments;    // number of segments after merging
    unsigned short bi_hw_segments;      // number of segments after remapping
    unsigned int bi_size;               // I/O count
    unsigned int bi_hw_front_size;      // size of the first merged segment
    unsigned int bi_hw_back_size;       // size of the last merged segment
    unsigned int bi_max_vecs;           // maximum number of bio_vecs
    struct bio_vec *bi_io_vec;          // bio_vec array: the memory location
    bio_end_io_t *bi_end_io;            // I/O completion method
    atomic_t bi_cnt;                    // usage count
    void *bi_private;                   // owner-private data
    bio_destructor_t *bi_destructor;    // destructor method
};

The main purpose of this struct is to represent an in-flight I/O operation, so most of its fields hold bookkeeping information. The most important fields are bi_io_vec, bi_vcnt, and bi_idx.
These three fields are related as follows: bio --> bi_io_vec, bi_idx (like a base address plus an offset, which locates a specific bio_vec) --> page (the bio_vec then leads to the page).
bi_io_vec points to an array of bio_vec structs that contains all the segments needed for a particular I/O operation. Each bio_vec is a vector of the form <page, offset, len> that describes one segment: the physical page on which the segment lies, the offset of the segment within that page, and the length of the segment starting from that offset. The whole bi_io_vec array describes the complete buffer.

struct bio_vec {
    struct page *bv_page;      // physical page on which the buffer resides
    unsigned int bv_len;       // length of the buffer in bytes
    unsigned int bv_offset;    // offset in bytes within the page where the buffer resides
};

The bi_vcnt field gives the number of vectors in the bio_vec array that bi_io_vec points to. While the I/O operation is being carried out, bi_idx holds the current index into that array. A block request is represented by a bio. Each request contains one or more blocks, which are stored in the array of bio_vec structs. These structs describe the actual location of each segment within a physical page. As with any vector, bi_io_vec points to the first segment of the I/O operation and the other segments follow in order, bi_vcnt segments in total. As the block I/O layer executes the request and consumes the segments one by one, bi_idx is updated so that it always points to the current segment. This is ordinary C array indexing.
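The way bi_io_vec, bi_vcnt, and bi_idx work together can be shown with a userspace model. The sketch below is an illustration only: struct demo_vec and struct demo_bio are hypothetical stand-ins for bio_vec and bio, the page is an opaque pointer, and the helper functions are not kernel APIs.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-in for bio_vec; page is an opaque pointer here. */
struct demo_vec {
    void        *page;     /* page holding this segment */
    unsigned int len;      /* segment length in bytes */
    unsigned int offset;   /* offset of the segment within the page */
};

/* Userspace stand-in for the three key bio fields. */
struct demo_bio {
    struct demo_vec *io_vec;   /* in the role of bi_io_vec */
    unsigned short   vcnt;     /* in the role of bi_vcnt */
    unsigned short   idx;      /* in the role of bi_idx */
};

/* Bytes still to transfer: sum of the segments from idx to the end. */
unsigned int demo_bio_remaining(const struct demo_bio *bio)
{
    unsigned int total = 0;
    for (unsigned short i = bio->idx; i < bio->vcnt; i++)
        total += bio->io_vec[i].len;
    return total;
}

/* Mark the current segment as done and return the next one,
 * or NULL when no segments are left. */
struct demo_vec *demo_bio_advance(struct demo_bio *bio)
{
    assert(bio->idx <= bio->vcnt);
    if (bio->idx < bio->vcnt)
        bio->idx++;
    return bio->idx < bio->vcnt ? &bio->io_vec[bio->idx] : NULL;
}
```

With three segments of 4096, 1024, and 512 bytes, the model reports 5632 bytes remaining at the start and 1536 bytes after the first segment has been consumed.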

A block device keeps its pending block requests in a request queue, which is represented by the request_queue struct and defined in <linux/blkdev.h>. The queue contains a doubly linked list of requests and related control information. Higher-level kernel code such as file systems adds requests to the queue. As long as the queue is not empty, the block device driver that owns the queue takes requests from its head and submits them to the corresponding block device. Each item in the request list is a single request, represented by the request structure.

The request structure is also defined in <linux/blkdev.h>. A request may operate on multiple consecutive disk blocks, so each request can consist of multiple bio structures, and each bio structure can describe multiple segments. The commonly used request fields are:

struct request {
    struct list_head queuelist;     // links the request into the request queue
    struct request_queue *q;

    // The next three members track the sectors that the driver has yet to complete. The first sector that has not been transferred is stored in hard_sector, the total number of sectors yet to be transferred is in hard_nr_sectors, and the number of sectors remaining in the current bio is in hard_cur_sectors. These members are intended for use only within the block subsystem; drivers should not use them.
    sector_t hard_sector;
    unsigned long hard_nr_sectors;
    unsigned int hard_cur_sectors;

    struct bio *bio;                    // linked list of the bio structures for this request. Do not access this member directly; use rq_for_each_bio instead.
    unsigned short nr_phys_segments;    // number of distinct segments the request occupies in physical memory after adjacent pages have been merged
    char *buffer;                       // simply the result of calling bio_data on the current bio
};

How are these key structures related? A request_queue holds a list of requests. Each request contains one or more bio structures, each bio leads through its bio_vec array to the corresponding pages, and the pages give access to the data in physical memory. That is the basic relationship.
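The chain from queue to request to bio to segment can be traced with a userspace model. This is a simplified sketch under stated assumptions: all mini_* types and functions are hypothetical, and singly linked lists replace the kernel's doubly linked list_head for brevity.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical miniatures of the structures discussed above. */
struct mini_seg     { unsigned int len; };   /* one segment */
struct mini_bio     { struct mini_bio *next; struct mini_seg *segs; unsigned short vcnt; };
struct mini_request { struct mini_request *next; struct mini_bio *bio; };
struct mini_queue   { struct mini_request *head; };

/* Total bytes described by every request currently in the queue:
 * walk queue -> request -> bio -> segment. */
unsigned long mini_queue_bytes(const struct mini_queue *q)
{
    unsigned long total = 0;
    assert(q != NULL);
    for (const struct mini_request *rq = q->head; rq; rq = rq->next)
        for (const struct mini_bio *bio = rq->bio; bio; bio = bio->next)
            for (unsigned short i = 0; i < bio->vcnt; i++)
                total += bio->segs[i].len;
    return total;
}

/* Detach and return the request at the head of the queue,
 * or NULL if the queue is empty. */
struct mini_request *mini_queue_pop(struct mini_queue *q)
{
    struct mini_request *rq = q->head;
    if (rq) {
        q->head = rq->next;
        rq->next = NULL;
    }
    return rq;
}
```

A driver-like loop in this model would call mini_queue_pop() until it returns NULL, processing the bios of each request in turn.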
