Adventures in the Linux Kernel: MD Source Code Explained, Part 14: RAID5 Reads Spanning Chunks


When reprinting, please credit the source: Http://blog.csdn.net/liumangxiong

If a read does not fall within a single chunk, it spans at least two chunks, so the data has to be read from each chunk separately and the combined result returned to the upper layer. Next we will see how a complete bio read request is split into multiple sub-requests to the disks, and how the results that come back from the disks are combined and returned to the upper layer.
4097     logical_sector = bi->bi_sector & ~((sector_t)STRIPE_SECTORS-1);
4098     last_sector = bi->bi_sector + (bi->bi_size>>9);
4099     bi->bi_next = NULL;
4100     bi->bi_phys_segments = 1;     /* over-loaded to count active stripes */

First the start of the request is computed. Because the smallest unit in which MD issues data requests to disk is STRIPE_SECTORS, the request is aligned to it here. The start of the request is logical_sector and the end is last_sector. Line 4100 reuses bi_phys_segments to count the number of stripes issued; it is set to 1 up front so that the bio is not released prematurely.
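The alignment arithmetic on lines 4097 and 4098 can be tried out in user space. The following is a minimal sketch, not kernel code: it assumes STRIPE_SECTORS is 8 (a 4 KiB unit of 512-byte sectors), and the helper names align_down, end_sector and count_stripe_units are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 4 KiB stripe unit, i.e. 8 sectors of 512 bytes. */
#define STRIPE_SECTORS 8u

typedef uint64_t sector_t;

/* Round a start sector down to a stripe-unit boundary (cf. line 4097). */
static sector_t align_down(sector_t start)
{
	return start & ~((sector_t)STRIPE_SECTORS - 1);
}

/* First sector past the request; size is given in bytes (cf. line 4098). */
static sector_t end_sector(sector_t start, uint32_t size_bytes)
{
	return start + (size_bytes >> 9);
}

/* Hypothetical helper: number of stripe units the split loop would visit. */
static unsigned count_stripe_units(sector_t start, uint32_t size_bytes)
{
	sector_t logical = align_down(start);
	sector_t last = end_sector(start, size_bytes);
	unsigned n = 0;

	/* The mask trick only works when the unit is a power of two. */
	assert((STRIPE_SECTORS & (STRIPE_SECTORS - 1)) == 0);

	for (; logical < last; logical += STRIPE_SECTORS)
		n++;
	return n;
}
```

Under these assumptions, a 4 KiB read starting at sector 5 covers sectors 5 through 12, so count_stripe_units(5, 4096) visits two units, the ones starting at sectors 0 and 8.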

4102     for (; logical_sector < last_sector; logical_sector += STRIPE_SECTORS) {
4103          DEFINE_WAIT(w);
4104          int previous;
4105
4106     retry:
4107          previous = 0;
4108          prepare_to_wait(&conf->wait_for_overlap, &w, TASK_UNINTERRUPTIBLE);
4134
4135          new_sector = raid5_compute_sector(conf, logical_sector,
4136                                previous,
4137                                &dd_idx, NULL);
4138          pr_debug("raid456: make_request, sector %llu logical %llu\n",
4139               (unsigned long long)new_sector,
4140               (unsigned long long)logical_sector);
4141
4142          sh = get_active_stripe(conf, new_sector, previous,
4143                           (bi->bi_rw&RWA_MASK), 0);

In this loop the request is split into multiple stripes, and a command is issued for each one.

Stripes must be handled under mutual exclusion: no two threads may operate on the same stripe at the same time.

For example, if the resync thread were synchronizing a stripe while raid5d was writing to the same stripe, the result would be unpredictable.

Line 4103 declares the wait-queue entry used for mutual exclusion on stripe access. Line 4108 adds it to the wait queue. Line 4135 computes the physical sector offset on disk from the array's logical sector, along with the corresponding data disk index and parity disk index. Line 4142 obtains a stripe for that physical sector offset.
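To make the logical-to-physical mapping more concrete, here is a rough user-space sketch. It is not raid5_compute_sector itself: it assumes a left-symmetric RAID5 layout only, and the geometry struct and map_sector function are hypothetical names introduced for illustration.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t sector_t;

/* Hypothetical, simplified array geometry (not the kernel's r5conf). */
struct geometry {
	unsigned raid_disks;        /* total member disks */
	unsigned sectors_per_chunk; /* chunk size in 512-byte sectors */
};

/*
 * Map an array-logical sector to a per-disk sector, assuming a
 * left-symmetric RAID5 layout. Stores the data disk index in *dd_idx and
 * the parity disk index in *pd_idx, and returns the sector offset on the
 * member disks.
 */
static sector_t map_sector(const struct geometry *g, sector_t logical,
			   unsigned *dd_idx, unsigned *pd_idx)
{
	unsigned data_disks = g->raid_disks - 1;
	sector_t chunk_number, stripe;
	unsigned chunk_offset, dd;

	assert(g->raid_disks >= 2 && g->sectors_per_chunk > 0);

	chunk_offset = (unsigned)(logical % g->sectors_per_chunk);
	chunk_number = logical / g->sectors_per_chunk;
	stripe = chunk_number / data_disks;
	dd = (unsigned)(chunk_number % data_disks);

	/* Parity moves back by one disk for each successive stripe. */
	*pd_idx = data_disks - (unsigned)(stripe % g->raid_disks);
	/* Data chunks start on the disk after parity and wrap around. */
	*dd_idx = (*pd_idx + 1 + dd) % g->raid_disks;

	return stripe * g->sectors_per_chunk + chunk_offset;
}
```

With three disks and 8-sector chunks, logical sector 27 lies in chunk 3 at offset 3, which under this layout lands on disk 0 at sector 11 with parity on disk 1.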

4144          if (sh) {
....
4186               if (test_bit(STRIPE_EXPANDING, &sh->state) ||
4187                   !add_stripe_bio(sh, bi, dd_idx, rw)) {
4188                    /* Stripe is busy expanding or
4189                     * add failed due to overlap.  Flush everything
4190                     * and wait a while
4191                     */
4192                    md_wakeup_thread(mddev->thread);
4193                    release_stripe(sh);
4194                    schedule();
4195                    goto retry;
4196               }
4197               finish_wait(&conf->wait_for_overlap, &w);
4198               set_bit(STRIPE_HANDLE, &sh->state);
4199               clear_bit(STRIPE_DELAYED, &sh->state);
4200               if ((bi->bi_rw & REQ_SYNC) &&
4201                   !test_and_set_bit(STRIPE_PREREAD_ACTIVE, &sh->state))
4202                    atomic_inc(&conf->preread_active_stripes);
4203               release_stripe_plug(mddev, sh);
4204          } else {
4205               /* cannot get stripe for read-ahead, just give-up */
4206               clear_bit(BIO_UPTODATE, &bi->bi_flags);
4207               finish_wait(&conf->wait_for_overlap, &w);
4208               break;
4209          }
4210     }

The first time I read this code I was in too much of a hurry to see where the key point was, like someone who grew up in a noisy city and, dazzled by its surface, has no idea what kind of life their heart really wants. Only when I truly calmed down and looked again did I find the most important statement, on line 4187: the add_stripe_bio function. From here on the stripe is no longer lonely, because it now holds a bio. It is ready to join the stripe-processing flow, and the stripe's eventful journey begins.

After line 4198 and the release_stripe_plug call on line 4203, the new stripe formally joins the processing queue (conf->handle_list).

People spend the first half of life looking for entrances and the second half looking for exits. Here the read stripe has found its entrance, so where is the exit? Anyone who has read LDD will know the answer: for block device drivers that do not use the default request queue, the make_request function is the entrance and bio_endio is the exit. From here on we head towards that exit. After release_stripe_plug, the first function the stripe enters is handle_stripe. handle_stripe calls analyse_stripe, which sets to_read:

3245          if (test_bit(R5_Wantfill, &dev->flags))
3246               s->to_fill++;
3247          else if (dev->toread)
3248               s->to_read++;

Back in the handle_stripe function:
3472     if (s.to_read || s.non_overwrite
3473         || (conf->level == 6 && s.to_write && s.failed)
3474         || (s.syncing && (s.uptodate + s.compute < disks))
3475         || s.replacing
3476         || s.expanding)
3477          handle_stripe_fill(sh, &s, disks);

to_read triggers handle_stripe_fill, whose job is to set the flags marking what needs to be read:
2696               set_bit(R5_LOCKED, &dev->flags);
2697               set_bit(R5_Wantread, &dev->flags);
2698               s->locked++;

Next, ops_run_io sends the read request to the disk. The completion callback for the read request is raid5_end_read_request:
1745     if (uptodate) {
1746          set_bit(R5_UPTODATE, &sh->dev[i].flags);
....
1824     rdev_dec_pending(rdev, conf->mddev);
1825     clear_bit(R5_LOCKED, &sh->dev[i].flags);
1826     set_bit(STRIPE_HANDLE, &sh->state);
1827     release_stripe(sh);

This function does two things. One is to set the R5_UPTODATE flag; the other is to call release_stripe again, which sends the stripe back to handle_stripe for processing. With R5_UPTODATE set, we enter analyse_stripe once more:
3231          if (test_bit(R5_UPTODATE, &dev->flags) && dev->toread &&
3232              !test_bit(STRIPE_BIOFILL_RUN, &sh->state))
3233               set_bit(R5_Wantfill, &dev->flags);
3234
3235          /* now count some things */
3236          if (test_bit(R5_LOCKED, &dev->flags))
3237               s->locked++;
3238          if (test_bit(R5_UPTODATE, &dev->flags))
3239               s->uptodate++;
3240          if (test_bit(R5_Wantcompute, &dev->flags)) {
3241               s->compute++;
3242               BUG_ON(s->compute > 2);
3243          }
3244
3245          if (test_bit(R5_Wantfill, &dev->flags))
3246               s->to_fill++;

Line 3233 sets the R5_Wantfill flag and line 3246 increments to_fill. Then we are back in handle_stripe again:
3426     if (s.to_fill && !test_bit(STRIPE_BIOFILL_RUN, &sh->state)) {
3427          set_bit(STRIPE_OP_BIOFILL, &s.ops_request);
3428          set_bit(STRIPE_BIOFILL_RUN, &sh->state);
3429     }

STRIPE_OP_BIOFILL is now set in s.ops_request. Whenever s.ops_request is set, you should immediately recall that the function handling this field is raid_run_ops, with the actual work done in __raid_run_ops:
1378     if (test_bit(STRIPE_OP_BIOFILL, &ops_request)) {
1379          ops_run_biofill(sh);
1380          overlap_clear++;
1381     }
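The flag transitions traced so far (to_read, then R5_LOCKED and R5_Wantread, then R5_UPTODATE, then R5_Wantfill) can be summarized with a toy model. This is only a sketch of the sequence described in the text: the F_* constants, struct toy_dev and the toy_* functions are hypothetical stand-ins, and the model ignores locking, multiple devices and the stripe-level state bits.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the per-device flags named in the text. */
enum {
	F_LOCKED   = 1u << 0, /* R5_LOCKED: I/O in flight */
	F_WANTREAD = 1u << 1, /* R5_Wantread: a read should be issued */
	F_UPTODATE = 1u << 2, /* R5_UPTODATE: the page holds valid data */
	F_WANTFILL = 1u << 3  /* R5_Wantfill: copy the page to the waiting bio */
};

struct toy_dev {
	unsigned flags;
	bool toread;          /* a read request is waiting on this device */
};

/* One handle_stripe-like pass. Returns true if a biofill is requested. */
static bool toy_handle(struct toy_dev *dev)
{
	/* analyse step: data is valid and a reader is waiting */
	if ((dev->flags & F_UPTODATE) && dev->toread)
		dev->flags |= F_WANTFILL;

	if (dev->flags & F_WANTFILL)
		return true;                         /* to_fill path */

	if (dev->toread && !(dev->flags & F_LOCKED))
		dev->flags |= F_LOCKED | F_WANTREAD; /* to_read path */
	return false;
}

/*
 * Read completion: mark the data valid and unlock the device.
 * Simplification: Wantread is cleared here rather than at issue time.
 */
static void toy_end_read(struct toy_dev *dev)
{
	assert(dev->flags & F_LOCKED);
	dev->flags &= ~(unsigned)(F_LOCKED | F_WANTREAD);
	dev->flags |= F_UPTODATE;
}

/* Biofill completion: the waiting request has been served. */
static void toy_complete_biofill(struct toy_dev *dev)
{
	dev->flags &= ~(unsigned)F_WANTFILL;
	dev->toread = false;
}
```

Starting from a device with toread set and no flags, the first toy_handle call schedules the read, toy_end_read marks the page up to date, and the second toy_handle call requests the fill, mirroring the two passes through handle_stripe.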

The corresponding handler function is ops_run_biofill:
812 static void ops_run_biofill(struct stripe_head *sh)
813 {
814      struct dma_async_tx_descriptor *tx = NULL;
815      struct async_submit_ctl submit;
816      int i;
817
818      pr_debug("%s: stripe %llu\n", __func__,
819           (unsigned long long)sh->sector);
820
821      for (i = sh->disks; i--; ) {
822           struct r5dev *dev = &sh->dev[i];
823           if (test_bit(R5_Wantfill, &dev->flags)) {
824                struct bio *rbi;
825                spin_lock_irq(&sh->stripe_lock);
826                dev->read = rbi = dev->toread;
827                dev->toread = NULL;
828                spin_unlock_irq(&sh->stripe_lock);
829                while (rbi && rbi->bi_sector <
830                     dev->sector + STRIPE_SECTORS) {
831                     tx = async_copy_data(0, rbi, dev->page,
832                          dev->sector, tx);
833                     rbi = r5_next_bio(rbi, dev->sector);
834                }
835           }
836      }
837
838      atomic_inc(&sh->count);
839      init_async_submit(&submit, ASYNC_TX_ACK, tx, ops_complete_biofill, sh, NULL);
840      async_trigger_callback(&submit);
841 }
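The copy performed in the loop on lines 829 to 834 boils down to finding where a request overlaps one cache page and copying just that range. The sketch below illustrates the idea synchronously with memcpy. It is not async_copy_data: struct toy_bio flattens a bio into a single contiguous buffer, toy_fill is a made-up name, and STRIPE_SECTORS is assumed to be 8.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SECTOR_SIZE    512u
#define STRIPE_SECTORS 8u   /* assumption: 4 KiB cache page */

typedef uint64_t sector_t;

/* Hypothetical, flattened stand-in for a bio: one contiguous buffer. */
struct toy_bio {
	sector_t start;      /* first sector requested */
	unsigned nsectors;   /* request length in sectors */
	unsigned char *buf;  /* nsectors * SECTOR_SIZE bytes */
};

/*
 * Copy the part of a cache page (covering sectors
 * [page_sector, page_sector + STRIPE_SECTORS)) that overlaps the request
 * into the request buffer. Returns the number of bytes copied.
 */
static size_t toy_fill(struct toy_bio *bio, const unsigned char *page,
		       sector_t page_sector)
{
	sector_t page_end = page_sector + STRIPE_SECTORS;
	sector_t bio_end = bio->start + bio->nsectors;
	sector_t lo = bio->start > page_sector ? bio->start : page_sector;
	sector_t hi = bio_end < page_end ? bio_end : page_end;
	size_t len;

	if (lo >= hi)
		return 0;   /* the request does not touch this page */

	len = (size_t)(hi - lo) * SECTOR_SIZE;
	assert(len <= STRIPE_SECTORS * SECTOR_SIZE);
	memcpy(bio->buf + (size_t)(lo - bio->start) * SECTOR_SIZE,
	       page + (size_t)(lo - page_sector) * SECTOR_SIZE, len);
	return len;
}
```

For a request covering sectors 5 through 12, calling toy_fill once with the page at sector 0 and once with the page at sector 8 copies 3 and 5 sectors respectively, which together fill the whole request buffer.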

At last we see the truth, and I cannot help marveling at how the code is wrapped in layer after layer, like a magical birthday present that you unwrap one layer at a time, or like winding through one old alley after another before finally finding the little tavern. But whatever the case, the code holds nothing back from you. It is sincere.

The more complex the code, the more charm and grace it has, and only by working your way into its inner world do you get to experience that. When you truly feel it you will be astonished, and the thrill of that conquest is hard to forget. Once you have conquered code of this singular style, what you pursue is art: no longer confined to the physical but reaching for a spiritual height, like the European architects who designed a cathedral and then spent more than 600 years building the Gothic Cologne Cathedral.

By then you and I will be long gone, but that spirit is the state you and I should always strive for.

On line 823, we have just finished reading from disk, and here the data that was read is copied from the cache page dev->page into the bio; at this point dev->toread is also moved to dev->read. The code first builds the DMA descriptor; lines 839 and 840 submit the request to the DMA engine, and once the request completes, the callback passed in on line 839, ops_complete_biofill, is invoked:

769 static void ops_complete_biofill(void *stripe_head_ref)
770 {
771      struct stripe_head *sh = stripe_head_ref;
772      struct bio *return_bi = NULL;
773      int i;
774
775      pr_debug("%s: stripe %llu\n", __func__,
776           (unsigned long long)sh->sector);
777
778      /* clear completed biofills */
779      for (i = sh->disks; i--; ) {
780           struct r5dev *dev = &sh->dev[i];
781
782           /* acknowledge completion of a biofill operation */
783           /* and check if we need to reply to a read request,
784            * new R5_Wantfill requests are held off until
785            * !STRIPE_BIOFILL_RUN
786            */
787           if (test_and_clear_bit(R5_Wantfill, &dev->flags)) {
788                struct bio *rbi, *rbi2;
789
790                BUG_ON(!dev->read);
791                rbi = dev->read;
792                dev->read = NULL;
793                while (rbi && rbi->bi_sector <
794                     dev->sector + STRIPE_SECTORS) {
795                     rbi2 = r5_next_bio(rbi, dev->sector);
796                     if (!raid5_dec_bi_active_stripes(rbi)) {
797                          rbi->bi_next = return_bi;
798                          return_bi = rbi;
799                     }
800                     rbi = rbi2;
801                }
802           }
803      }
804      clear_bit(STRIPE_BIOFILL_RUN, &sh->state);
805
806      return_io(return_bi);
807
808      set_bit(STRIPE_HANDLE, &sh->state);
809      release_stripe(sh);
810 }
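Line 796 drops one count on the bio for each stripe that finishes, and the counter was initialized to 1 back on line 4100. The effect of that initial value can be shown with a small model. This is a sketch under stated assumptions, not the kernel's implementation: it assumes each stripe takes one reference when the bio is attached and that the submitter drops the initial reference after its split loop; struct toy_bio and the toy_* functions are hypothetical, and the real code uses atomic operations.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a bio with an overloaded stripe counter. */
struct toy_bio {
	int active_stripes;  /* plays the role of bi_phys_segments */
	bool completed;      /* set where the kernel would call bio_endio */
};

/* Submission starts with a count of 1 held by the submitter itself. */
static void toy_init(struct toy_bio *bio)
{
	bio->active_stripes = 1;
	bio->completed = false;
}

/* Assumption: each stripe the request is attached to takes a reference. */
static void toy_attach_stripe(struct toy_bio *bio)
{
	assert(!bio->completed);
	bio->active_stripes++;
}

/* Drop one reference; whoever drops the last one completes the request. */
static void toy_put(struct toy_bio *bio)
{
	assert(bio->active_stripes > 0);
	if (--bio->active_stripes == 0)
		bio->completed = true;
}
```

If the first stripe finishes before the second one has even been attached, the counter falls back to 1 rather than 0, so the request is not completed early; only the submitter's final toy_put, or the last stripe's if it comes later, brings it to zero.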

If you have trained a sharp eye that takes in ten lines at a glance, you will have spotted return_io on line 806. Yes, this is the exit I mentioned earlier:
177 static void return_io(struct bio *return_bi)
178 {
179      struct bio *bi = return_bi;
180      while (bi) {
181
182           return_bi = bi->bi_next;
183           bi->bi_next = NULL;
184           bi->bi_size = 0;
185           bio_endio(bi, 0);
186           bi = return_bi;
187      }
188 }

At last we see bio_endio. Time for a drink to celebrate.

Done celebrating? Here are two questions to think about: 1) Why is return_bi a list chained through bi_next rather than a single bio? 2) Since return_io has already finished, why do lines 808 and 809 add the stripe to the processing list again?

Copyright notice: This is an original article by the blogger and may not be reproduced without the blogger's consent.
