A Legendary Life of IO (6)


IO Scheduler

When an I/O arrives at the scheduler, it finds itself treated very differently from its peers. Some I/Os are privileged and released quickly; others wait a very long time to be handled, and on some systems they even starve to death. Faced with this, an I/O is naturally unhappy: why is someone else sent on to the next stage of the journey right away, while I have to waste time at the scheduler? This is an age of competition. Read requests are the favored ones and get resources quickly; write requests are born poor and can only wait, though the system at least promises not to let them starve. That promise is the familiar deadline policy. In a harsher world there is no such deadline; what counts is family ties. Adjacent I/Os are merged and handled together quickly, everyone else simply waits, and starvation can follow. That is the noop strategy, which is essentially the original Linus elevator. In a more civilized society everyone is treated fairly, at least from the perspective of applications, each of which enjoys the same share of I/O bandwidth. From the perspective of an individual I/O there is still no absolute fairness, but as long as fairness is guaranteed across the families (the processes), society as a whole is more harmonious. Of course, if some application households behave badly, they can be punished by having their share of the I/O bandwidth reduced. This is the familiar CFQ policy. At the I/O scheduler layer many policies are possible, and different systems can define different ones to better aggregate I/O and to control QoS for different applications.

In Linux you can register your own scheduling algorithm; if you do not, one of the three schedulers mentioned above is used. Among them, deadline was developed on the basis of the Linus elevator: it schedules read and write requests differently and also guards against I/O starvation, whereas the most traditional scheduler cannot avoid starvation. The CFQ scheduler takes fairness between applications into account and delivers the best performance in many cases. A design comparison of the three schedulers will be elaborated in a later chapter.
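For reference, registration goes through the elevator_type structure and elv_register(). Below is a minimal sketch in the shape of the single-queue (pre-blk-mq) elevator interface; the mysched_* callbacks are hypothetical placeholders that a real module would implement, and the exact callback set and signatures differ between kernel versions:

    #include <linux/blkdev.h>
    #include <linux/elevator.h>
    #include <linux/module.h>

    static struct elevator_type elevator_mysched = {
        .ops = {
            /* decide whether/where a bio can merge into a request */
            .elevator_merge_fn    = mysched_merge,
            /* move requests from internal queues to the dispatch queue */
            .elevator_dispatch_fn = mysched_dispatch,
            /* accept a new request into the scheduler's own structures */
            .elevator_add_req_fn  = mysched_add_request,
            .elevator_init_fn     = mysched_init_queue,
            .elevator_exit_fn     = mysched_exit_queue,
        },
        .elevator_name  = "mysched",
        .elevator_owner = THIS_MODULE,
    };

    static int __init mysched_init(void)
    {
        /* make the scheduler selectable, e.g. via the queue's sysfs attribute */
        elv_register(&elevator_mysched);
        return 0;
    }

    static void __exit mysched_exit(void)
    {
        elv_unregister(&elevator_mysched);
    }

    module_init(mysched_init);
    module_exit(mysched_exit);

The in-kernel noop, deadline, and CFQ schedulers all register themselves through this same interface; they differ only in which callbacks they provide and in the data structures behind them.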

When an I/O request is forwarded through generic_make_request, if the device being accessed is a block device with a request queue, the system calls the blk_queue_bio function to schedule and merge the bio. The blk_queue_bio function is as follows:

void blk_queue_bio(struct request_queue *q, struct bio *bio)
{
    const bool sync = !!(bio->bi_rw & REQ_SYNC);
    struct blk_plug *plug;
    int el_ret, rw_flags, where = ELEVATOR_INSERT_SORT;
    struct request *req;
    unsigned int request_count = 0;

    /*
     * low level driver can indicate that it wants pages above a
     * certain limit bounced to low memory (ie for highmem, or even
     * ISA dma in theory)
     */
    blk_queue_bounce(q, &bio);

    if (bio->bi_rw & (REQ_FLUSH | REQ_FUA)) {
        spin_lock_irq(q->queue_lock);
        where = ELEVATOR_INSERT_FLUSH;
        goto get_rq;
    }

    /*
     * Check if we can merge with the plugged list before grabbing
     * any locks.
     */
    /* try to merge the bio into the current plugged request list */
    if (attempt_plug_merge(q, bio, &request_count))
        return;

    spin_lock_irq(q->queue_lock);

    /* elv_merge is the core function: find a request that the bio can
     * be merged into, either at its front or at its back */
    el_ret = elv_merge(q, &req, bio);
    if (el_ret == ELEVATOR_BACK_MERGE) {
        /* perform the backward merge operation */
        if (bio_attempt_back_merge(q, req, bio)) {
            if (!attempt_back_merge(q, req))
                elv_merged_request(q, req, el_ret);
            goto out_unlock;
        }
    } else if (el_ret == ELEVATOR_FRONT_MERGE) {
        /* perform the forward merge operation */
        if (bio_attempt_front_merge(q, req, bio)) {
            if (!attempt_front_merge(q, req))
                elv_merged_request(q, req, el_ret);
            goto out_unlock;
        }
    }

    /* no existing request could be found to merge with */
get_rq:
    /*
     * This sync check and mask will be re-done in init_request_from_bio(),
     * but we need to set it earlier to expose the sync flag to the
     * rq allocator and io schedulers.
     */
    rw_flags = bio_data_dir(bio);
    if (sync)
        rw_flags |= REQ_SYNC;

    /*
     * Grab a free request. This might sleep but can not fail.
     * Returns with the queue unlocked.
     */
    /* get an empty request */
    req = get_request_wait(q, rw_flags, bio);
    if (unlikely(!req)) {
        bio_endio(bio, -ENODEV);    /* @q is dead */
        goto out_unlock;
    }

    /*
     * After dropping the lock and possibly sleeping here, our request
     * may now be mergeable after it had proven unmergeable (above).
     * We don't worry about that case for efficiency. It won't happen
     * often, and the elevators are able to handle it.
     */
    /* use the bio to initialize the request */
    init_request_from_bio(req, bio);

    if (test_bit(QUEUE_FLAG_SAME_COMP, &q->queue_flags))
        req->cpu = raw_smp_processor_id();

    plug = current->plug;
    if (plug) {
        /*
         * If this is the first request added after a plug, fire
         * of a plug trace. If others have been added before, check
         * if we have multiple devices in this plug. If so, make a
         * note to sort the list before dispatch.
         */
        if (list_empty(&plug->list))
            trace_block_plug(q);
        else {
            if (!plug->should_sort) {
                struct request *__rq;

                __rq = list_entry_rq(plug->list.prev);
                if (__rq->q != q)
                    plug->should_sort = 1;
            }
            if (request_count >= BLK_MAX_REQUEST_COUNT) {
                /* when the number of requests reaches the upper queue
                 * limit, perform the unplug operation */
                blk_flush_plug_list(plug, false);
                trace_block_plug(q);
            }
        }
        /* add the request to the plug list */
        list_add_tail(&req->queuelist, &plug->list);
        drive_stat_acct(req, 1);
    } else {
        /*
         * In the new kernel, if the user does not call start_unplug,
         * the request is not merged in the IO scheduler: as soon as it
         * is added to the request queue, the unplug operation is
         * executed immediately. Personally I think this is somewhat
         * inappropriate and not as good as the previous periodic
         * scheduling mechanism. For the ext3 file system, the
         * start_unplug operation is performed first when flushing the
         * page cache, so both request and bio merging still take place.
         */
        spin_lock_irq(q->queue_lock);
        /* add the request to the scheduler */
        add_acct_request(q, req, where);
        /* call the underlying function to execute the unplug operation */
        __blk_run_queue(q);
out_unlock:
        spin_unlock_irq(q->queue_lock);
    }
}
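The plugging behavior discussed in the comment above is driven by the caller. As a rough illustration, a submitter can batch bios under a plug so that blk_queue_bio puts them on the per-task plug list, where they can be merged and sorted before the queue is run. blk_start_plug/blk_finish_plug and submit_bio are the real interfaces of that kernel generation; the bios passed in here are placeholders, and submit_bio's signature differs in newer kernels:

    #include <linux/bio.h>
    #include <linux/blkdev.h>

    /* minimal sketch: submit several bios under one plug */
    static void submit_batch(struct bio *bio1, struct bio *bio2)
    {
        struct blk_plug plug;

        blk_start_plug(&plug);     /* current->plug now points at this plug     */
        submit_bio(WRITE, bio1);   /* bios queue up on plug->list through        */
        submit_bio(WRITE, bio2);   /*   blk_queue_bio() and can merge there      */
        blk_finish_plug(&plug);    /* unplug: flush the plug list to the queue   */
    }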

The blk_queue_bio function mainly performs the following three tasks:

1) Attempt to merge the bio backward into an existing request.

2) Attempt to merge the bio forward into an existing request.

3) If the bio cannot be merged, create a new request for it and schedule that request.

The most critical function in bio merging is elv_merge. Its job is to decide whether a bio can be merged backward or forward into an existing request. For all schedulers the backward-merge logic is the same: the system maintains a hash table of requests, keyed by request end address, and the starting sector of the bio is used to address into it. The principle of the hash table is simple: the end addresses of all requests are classified into intervals, and a hash function maps those intervals to buckets. The hash function is:

    hash_long(ELV_HASH_BLOCK(sec), elv_hash_shift)
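For context, the helpers around this hash in block/elevator.c of kernels from this generation look roughly as follows (exact values can differ between versions): ELV_HASH_BLOCK groups end sectors into 8-sector intervals, hash_long then spreads each interval over 1 << elv_hash_shift buckets, and rq_hash_key computes the end address by which a request is hashed.

    static const int elv_hash_shift = 6;

    #define ELV_HASH_BLOCK(sec)   ((sec) >> 3)
    #define ELV_HASH_FN(sec)      \
            (hash_long(ELV_HASH_BLOCK((sec)), elv_hash_shift))
    #define ELV_HASH_ENTRIES      (1 << elv_hash_shift)
    /* a request is hashed by its end address: start sector + size in sectors */
    #define rq_hash_key(rq)       (blk_rq_pos(rq) + blk_rq_sectors(rq))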

Once the bucket covering this interval has been located through the hash function, the matching request is found by traversal. The function that implements this process is as follows:

static struct request *elv_rqhash_find(struct request_queue *q, sector_t offset)
{
    struct elevator_queue *e = q->elevator;
    /* use the hash function to find the bucket covering this interval */
    struct hlist_head *hash_list = &e->hash[ELV_HASH_FN(offset)];
    struct hlist_node *entry, *next;
    struct request *rq;

    /* traverse all requests within this address interval */
    hlist_for_each_entry_safe(rq, entry, next, hash_list, hash) {
        BUG_ON(!ELV_ON_HASH(rq));

        if (unlikely(!rq_mergeable(rq))) {
            __elv_rqhash_del(rq);
            continue;
        }

        /* if the end address matches, this is the request we need */
        if (rq_hash_key(rq) == offset)
            return rq;
    }

    return NULL;
}

Because requests are maintained in this hash, note that after a request is merged it must be repositioned in the hash table. This is because the request's end address has changed and may now fall outside its original hash interval.
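The relocation itself is a small operation in block/elevator.c. The sketch below follows its shape in kernels of this generation (details may differ slightly between versions): remove the request from its old bucket, then rehash it by its new end address.

    static void __elv_rqhash_del(struct request *rq)
    {
        hlist_del_init(&rq->hash);
    }

    static void elv_rqhash_add(struct request_queue *q, struct request *rq)
    {
        struct elevator_queue *e = q->elevator;

        BUG_ON(ELV_ON_HASH(rq));
        /* insert the request into the bucket selected by its (new) end address */
        hlist_add_head(&rq->hash, &e->hash[ELV_HASH_FN(rq_hash_key(rq))]);
    }

    void elv_rqhash_reposition(struct request_queue *q, struct request *rq)
    {
        __elv_rqhash_del(rq);
        elv_rqhash_add(q, rq);
    }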

If the backward merge fails, the scheduler tries a forward merge. Not all schedulers support forward merging; a scheduler that does must register an elevator_merge_fn function to implement it. The deadline algorithm, for example, uses a red/black tree to find forward-merge candidates. If the forward merge cannot be completed either, the scheduler considers the merge to have failed: a new request must be allocated, initialized from the bio, and then added to the request queue for scheduling.
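As an example of a scheduler that opts into forward merging, the deadline scheduler's elevator_merge_fn looks up the bio's end sector in its per-direction red/black tree of requests sorted by start sector. The following is a lightly simplified sketch in the spirit of deadline_merge from that era; field and helper names follow the old single-queue deadline scheduler and may differ in other kernel versions:

    static int
    deadline_merge(struct request_queue *q, struct request **req, struct bio *bio)
    {
        struct deadline_data *dd = q->elevator->elevator_data;
        struct request *__rq;

        /* only try a front merge if the feature is enabled */
        if (dd->front_merges) {
            /* a request that starts exactly where this bio ends
             * can absorb the bio at its front */
            sector_t sector = bio->bi_sector + bio_sectors(bio);

            __rq = elv_rb_find(&dd->sort_list[bio_data_dir(bio)], sector);
            if (__rq && elv_rq_merge_ok(__rq, bio)) {
                *req = __rq;
                return ELEVATOR_FRONT_MERGE;
            }
        }

        return ELEVATOR_NO_MERGE;
    }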

In summary, when an I/O arrives at the block device layer via generic_make_request, the main processing function, blk_queue_bio, is responsible for merging I/O. Because different schedulers merge and classify I/O differently, the scheduler-specific algorithms are implemented through registered functions; blk_queue_bio is only the upper-layer entry point that calls the scheduler's merge methods and, when merging fails, initializes a new request and prepares it for scheduling.

<To be continued>

This article is from the "Storage path" blog. For more information, contact the author!
