QoS mechanisms in Linux dm-ioband (3)


This article explains the internal mechanism of dm-ioband.

The dm-ioband principle is simple: an ioband device is divided into many groups, each with its own weight and thresholds, and dm-ioband applies QoS control to the I/O requests of these groups. Control is token-based: tokens are allocated to the groups in proportion to their weights. Both request-based and sector-based token accounting are supported.
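The weighted-token idea can be modeled in a few lines of userspace C. This is a simplified illustration, not dm-ioband's actual code; the struct and function names here are invented:

```c
#include <assert.h>

/* Simplified model of weighted token allocation: on each refresh
 * ("epoch"), the device hands out a bucket of tokens split in
 * proportion to the group weights. */
struct grp { int weight; int tokens; };

/* Distribute `bucket` tokens among `n` groups by weight (integer
 * arithmetic truncates; a real implementation may round or carry). */
static void refill(struct grp *g, int n, int bucket)
{
    int total = 0;
    for (int i = 0; i < n; i++)
        total += g[i].weight;
    for (int i = 0; i < n; i++)
        g[i].tokens += bucket * g[i].weight / total;
}

/* A request is admitted only while its group still owns a token. */
static int try_issue(struct grp *g)
{
    if (g->tokens <= 0)
        return 0;        /* group must wait for the next refresh */
    g->tokens--;
    return 1;
}
```

With weights 40 and 10 and a bucket of 100 tokens, the first group receives four times as many tokens per epoch, so it can issue four times as many requests before blocking.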

Dm-ioband involves several important data structures:

struct ioband_device: represents an ioband block device under /dev/mapper/. It carries several ioband_groups, including at least the default group.

struct ioband_group: represents a group attached to the ioband device; each group has its own weight, policy, and so on. An ioband_group holds two bio lists, c_prio_bios and c_blocked_bios; the former holds the higher-priority struct bios.

ioband_device->g_issued[BLK_RW_ASYNC] and ioband_device->g_issued[BLK_RW_SYNC] count the bios issued on the whole device, split into asynchronous and synchronous requests (g_blocked holds the corresponding blocked counts).

ioband_group->c_issued[BLK_RW_SYNC] is the per-group counterpart, counting a single group's bios (c_blocked counts the group's blocked ones).
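To make the relationship concrete, here is a stripped-down sketch of the two structures and the counters discussed above (the real dm-ioband structures have many more members; this is illustrative only):

```c
#include <assert.h>

enum { BLK_RW_ASYNC = 0, BLK_RW_SYNC = 1 };

/* Simplified mirror of the two core structures; only the counters
 * discussed in the article are shown. */
struct ioband_group_sketch {
    int c_issued[2];                     /* per-group issued bios, async/sync */
    int c_blocked[2];                    /* per-group blocked bios */
    struct ioband_group_sketch *c_next;  /* stand-in for the rbtree links */
};

struct ioband_device_sketch {
    int g_issued[2];                     /* device-wide issued bios, async/sync */
    int g_blocked[2];                    /* device-wide blocked bios */
    struct ioband_group_sketch *groups;  /* at least the default group */
};
```

The device-level counters are the sums over all attached groups, which is why both levels need their own sync/async pair.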

static void suspend_ioband_device (struct ioband_device*, unsigned long, int): first calls set_device_suspended to set the DEV_SUSPENDED flag, and set_group_down / set_group_need_up to set the IOG_GOING_DOWN and IOG_NEED_UP flags. It then uses wake_up_all to wake every process waiting on ioband_device->g_waitq and ioband_group->c_waitq. For bios that are already mapped, it calls queue_delayed_work + flush_workqueue to push them through the work queue, and finally calls wait_event_lock_irq to wait until all bio requests on the ioband device have been flushed. By the way, the wait_event_lock_irq here works very much like a pthread condition variable.
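The analogy to pthread conditions is apt: both patterns re-check a predicate under a lock and sleep until woken. A minimal userspace analogue of the "wait until all bios are flushed" step (illustrative only; the names are invented):

```c
#include <assert.h>
#include <pthread.h>

/* Userspace analogue of wait_event_lock_irq: the waiter sleeps on a
 * condition variable until the "all bios flushed" predicate holds. */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t cond = PTHREAD_COND_INITIALIZER;
static int bios_in_flight = 2;

static void *flush_worker(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&lock);
    bios_in_flight = 0;            /* pretend the workqueue flushed them */
    pthread_cond_broadcast(&cond); /* the wake_up_all counterpart */
    pthread_mutex_unlock(&lock);
    return 0;
}

/* Returns 1 once every in-flight bio has been flushed. */
static int wait_for_flush(void)
{
    pthread_t t;
    pthread_create(&t, 0, flush_worker, 0);
    pthread_mutex_lock(&lock);
    while (bios_in_flight > 0)          /* predicate re-checked on wakeup */
        pthread_cond_wait(&cond, &lock);
    pthread_mutex_unlock(&lock);
    pthread_join(t, 0);
    return 1;
}
```

The `while` loop around the wait is the key shared idiom: spurious wakeups (or a wake_up_all that races ahead of the sleeper) are harmless because the predicate is always re-checked with the lock held.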

static void resume_ioband_device (struct ioband_device* dp): clears all DEV_SUSPENDED, IOG_GOING_DOWN and IOG_NEED_UP flags and wakes everything waiting on ioband_device->g_waitq_suspend. This g_waitq_suspend appears only in ioband_map, because once the ioband_device is suspended, all incoming bios hang there.

static void ioband_group_stop_all (struct ioband_group* head, int suspend): sets the IOG_SUSPENDED and IOG_GOING_DOWN flags for every group, and flushes all queued bio work through g_ioband_wq.

static void ioband_group_resume_all (struct ioband_group* head): clears the flags set above.


In the device-mapper architecture, ioband is a target type just like linear, striped and snapshot; its struct target_type is defined as follows:

static struct target_type ioband_target = {
	.name            = "ioband",
	.module          = THIS_MODULE,
	.version         = {1, 14, 0},
	.ctr             = ioband_ctr,
	.dtr             = ioband_dtr,
	.map             = ioband_map,
	.end_io          = ioband_end_io,
	.presuspend      = ioband_presuspend,
	.resume          = ioband_resume,
	.status          = ioband_status,
	.message         = ioband_message,
	.merge           = ioband_merge,
	.iterate_devices = ioband_iterate_devices,
};


static int ioband_ctr(struct dm_target *ti, unsigned argc, char **argv)

ioband_ctr first calls alloc_ioband_device to create the ioband_device. alloc_ioband_device calls create_workqueue("kioband") to create the workqueue_struct member g_ioband_wq, initializes a series of ioband_device member variables, and finally returns a pointer to the newly created and initialized ioband_device structure.


static void ioband_dtr(struct dm_target *ti)

Calls ioband_group_stop_all to stop all group requests on the ioband device (setting the IOG_GOING_DOWN and IOG_SUSPENDED flags), calls cancel_delayed_work_sync to cancel the previously scheduled delayed work_struct, then calls ioband_group_destroy_all to destroy every group on the device. Here you can see that the groups on an ioband device are stored in a red-black tree, not the B-tree used elsewhere in device mapper.


static int ioband_map(struct dm_target *ti, struct bio *bio, union map_info *map_context)

Note that ioband_map distinguishes synchronous from asynchronous requests: fields such as g_issued[2], g_blocked[2] and g_waitq[2] in ioband_device, and c_waitq[2] in ioband_group, keep synchronous and asynchronous requests under separate control.

The ioband_group is obtained from dm_target->private, and the ioband_device from ioband_group->c_banddev. The steps are as follows:

1. If the ioband_device is in the suspended state, call wait_event_lock_irq to wait for it to resume.

2. Call ioband_group_get to find the ioband_group corresponding to the bio.

3. The prevent_burst_bios function is interesting. If the current task is a kernel thread (is_urgent_bio seems simplistically implemented for now; the author thinks a future bio structure should carry a flag to mark a bio as urgent), call device_should_block to decide whether the whole device should block, based on the io_limit parameter: if a synchronous request exceeds io_limit, all synchronous requests on the device are blocked, and asynchronous requests are handled the same way. If the current task is not a kernel thread, call group_should_block to decide whether the group should block; different policies judge this differently: the weight-based policy eventually calls is_queue_full, while the bandwidth-based policy calls range_bw_queue_full. Both functions are studied further below.

4. If should_pushback_bio returns true, the bio is put back on the queue and DM_MAPIO_REQUEUE is returned. Then check ioband_group->c_blocked[2] to see whether the request is blocked.

5. Call room_for_bio_sync to check whether io_limit is full; if it is not, the bio can be submitted, otherwise hold_bio suspends the bio. The core of hold_bio is the call to ioband_device->g_hold_bio, a function pointer that points to ioband_hold_bio, which simply puts the bio on the ioband_group->c_blocked_bios queue. (The author thinks there should be two c_blocked_bios lists, to separate synchronous and asynchronous requests.)

6. If the bio may be submitted, call ioband_device->g_can_submit. g_can_submit judges differently depending on the policy: the weight-based policy invokes is_token_left, the bandwidth-based policy invokes has_right_to_issue (both are studied further below). If g_can_submit returns false, the bio still cannot be submitted and again goes to hold_bio. In that case queue_delayed_work schedules the work queue ioband_device->g_ioband_wq to run after 1 jiffy, and that work item will invoke ioband_device->g_conductor->work.func(struct work_struct *).

7. If the bio is OK to submit, call prepare_to_issue(struct ioband_group*, struct bio*), which first increments the ioband_device->g_issued counter and then calls ioband_device->g_prepare_bio, another policy-specific hook: the weight policy consumes tokens via prepare_token, the bandwidth policy via range_bw_prepare_token, and the weight-iosize policy charges tokens in proportion to the I/O size.
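The admission decision above boils down to a device-level limit check, a policy check, and token accounting. A userspace sketch of that flow (a simplified model with invented names, not the real ioband_map):

```c
#include <assert.h>

/* Simplified admission logic mirroring ioband_map's checks:
 * device-level io_limit first, then the weight policy's token test. */
struct dev_state { int issued; int io_limit; int tokens; };

enum verdict { HOLD, SUBMIT };

/* device_should_block analogue: too many in-flight requests. */
static int device_full(const struct dev_state *d)
{
    return d->issued >= d->io_limit;
}

/* g_can_submit analogue for the weight policy: a token must remain. */
static int can_submit(const struct dev_state *d)
{
    return d->tokens > 0;
}

/* prepare_to_issue analogue: account for the accepted request. */
static enum verdict map_bio(struct dev_state *d)
{
    if (device_full(d) || !can_submit(d))
        return HOLD;     /* hold_bio: park it on c_blocked_bios */
    d->issued++;         /* g_issued++ */
    d->tokens--;         /* prepare_token consumes one token */
    return SUBMIT;
}
```

A held request is not dropped: in the real code it waits on the blocked list until ioband_conduct releases it in a later pass.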

static int ioband_end_io(struct dm_target *ti, struct bio *bio, int error, union map_info *map_context)

Calls should_pushback_bio to check whether the ioband_group has been suspended; if so, returns DM_ENDIO_REQUEUE to put the bio back on the queue. Otherwise, if there are blocked bios, it kicks the work queue ioband_device->g_ioband_wq; and if the ioband_device is suspending, it wakes everything waiting on g_waitq_flush.


static void ioband_conduct(struct work_struct *work)

ioband_conduct is the function invoked by the kernel work queue for delayed processing. The work_struct pointer passed in points at ioband_device->g_conductor.work, from which the enclosing struct ioband_device can be recovered. The steps are as follows:

1. First call release_urgent_bios to move all of ioband_device->g_urgent_bios onto the issue_list.

2. If the ioband_device has blocked bio requests, select an ioband_group according to policy; the chosen group must have blocked bios while its io_limit is not yet full.

3. Call release_bios on that group. release_bios calls release_prio_bios and release_norm_bios in turn to move the blocked bios onto the issue_list. release_prio_bios operates on the bios in ioband_group->c_prio_bios (if the current group cannot submit, for example because its tokens are exhausted, it returns R_BLOCK directly); for each bio it calls make_issue_list, placing it on the issue_list or the pushback_list, and when the group's c_blocked reaches 0 it clears the group's blocking flags IOG_BIO_BLOCKED_SYNC / IOG_BIO_BLOCKED_ASYNC and wakes the tasks waiting on ioband_group->c_waitq[2]; finally it calls prepare_to_issue. release_norm_bios operates on the bios in ioband_group->c_blocked_bios, whose count is nr_blocked_group(ioband_group*) - ioband_group->c_prio_blocked; the rest of its code is exactly the same as release_prio_bios, so no more need be said.

4. If release_bios returns R_YIELD, the group has used up all its tokens and must give up its chance to submit; queue_delayed_work is called again so the blocked bios of the device are handled in a later pass.

5. Before resubmitting, clear the blocking flags DEV_BIO_BLOCKED_SYNC / DEV_BIO_BLOCKED_ASYNC on the ioband_device and wake all the code waiting on wait_queue_head_t ioband_device->g_waitq[2].

6. If at this point the ioband_device still has blocked bios and, after the code above, the issue_list is still empty, essentially every group has exhausted its tokens; rejoin the work queue and wait for the next execution.

7. Finally, for every bio on the issue_list, call the usual generic_make_request to hand it to the underlying block device; for every bio on the pushback_list, call ioband_end_io to end the bio request (mostly returning an EIO error).
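The conduct pass can be modeled as a loop that drains each group's blocked queue while tokens last. A toy model (invented names; the real code walks bio lists and handles priorities, yielding and requeueing):

```c
#include <assert.h>

/* Simplified ioband_conduct pass: release blocked requests from
 * groups that still own tokens; the rest stay blocked until the
 * next token refresh. */
struct gstate { int tokens; int blocked; };

/* Returns the number of requests released for submission. */
static int conduct(struct gstate *grp, int n)
{
    int issued = 0;
    for (int i = 0; i < n; i++) {
        while (grp[i].blocked > 0 && grp[i].tokens > 0) {
            grp[i].blocked--;
            grp[i].tokens--;   /* prepare_to_issue per released bio */
            issued++;          /* this bio goes to generic_make_request */
        }
        /* tokens exhausted with work left over: the real code
         * requeues the delayed work (R_YIELD) and retries later */
    }
    return issued;
}
```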


----------------------------------------------------------------------------------------------------------------------


The following studies the policies in dm-ioband. In ioband_ctr, policy_init is called to initialize the specified policy. At present ioband provides these policies: default, weight, weight-iosize, range-bw.

Weight policy: a strategy that distributes bios based on weight. Its methods are analyzed below.

dp->g_group_ctr / dp->g_group_dtr = policy_weight_ctr / policy_weight_dtr: create/destroy a weight-based group.

dp->g_set_param = policy_weight_param: calls set_weight, init_token_bucket, etc. to set the weight value. From the implementation of set_weight we can see that ioband_groups are organized as an rbtree: if ioband_group->c_parent == NULL, this is the default group or the root group of a new group type, so the ioband_device parameters g_root_groups, g_token_bucket and g_io_limit are used for initialization; otherwise this ioband_group is a child of another ioband_group (a configuration rarely seen) and is therefore configured from that ioband_group's parameters.

dp->g_should_block = is_queue_full: decides whether the queue is full by checking against ioband_group->c_limit.
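The check itself is a one-liner; a sketch under the article's naming (simplified, not the real function):

```c
#include <assert.h>

/* is_queue_full analogue: a group starts blocking new requests once
 * its blocked count reaches its c_limit. */
static int queue_full(int c_blocked, int c_limit)
{
    return c_blocked >= c_limit;
}
```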

dp->g_restart_bios = make_global_epoch: when the ioband_groups on this ioband_device have exhausted their tokens, this function is called to distribute a new round of tokens.

dp->g_can_submit = is_token_left: checks whether any tokens remain (by computing iopriority). It first looks at ioband_group->c_token; second, it checks whether this ioband_group's epoch lags behind the epoch of the whole ioband_device (the epoch increases by one with each token refresh). If so, all the tokens added during the missed epochs can be credited (nr_epoch * ioband_group->c_token_initial), iopriority is recalculated and returned. PS: we can also see here that the priority of an I/O request is related both to the number of tokens remaining in the group and to the group's initial token count, which has the advantage of not starving the I/O requests of groups with fewer tokens.
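The epoch catch-up can be sketched directly (a simplified model using the article's field names; the real is_token_left also computes iopriority):

```c
#include <assert.h>

/* is_token_left analogue: if the group's epoch lags the device
 * epoch, credit the missed refreshes before checking for tokens. */
struct tg { int c_token; int c_token_initial; int epoch; };

static int token_left(struct tg *g, int device_epoch)
{
    if (g->epoch < device_epoch) {
        int nr_epoch = device_epoch - g->epoch;
        g->c_token += nr_epoch * g->c_token_initial;  /* catch up */
        g->epoch = device_epoch;
    }
    return g->c_token > 0;
}
```

A group that sat idle through two refreshes wakes up with two epochs' worth of tokens, which is exactly what keeps low-weight groups from starving.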

dp->g_prepare_bio = prepare_token: under the weight policy, each I/O request consumes one token. prepare_token calls consume_token, which updates the g_dominant and g_expired bookkeeping, decrements ioband_group->c_token and increments ioband_group->c_consumed, each by 1. There is also a g_yield_mark here, used to yield I/O gracefully; it is not covered further here, so for the details please see the source code.
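The token accounting in consume_token reduces, for the weight policy, to a paired decrement/increment (a toy model using the article's field names):

```c
#include <assert.h>

/* consume_token analogue for the weight policy: one token per bio,
 * mirrored into a consumed counter for later statistics. */
struct wg { int c_token; int c_consumed; };

static void consume_token(struct wg *g)
{
    g->c_token--;
    g->c_consumed++;
}
```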



