Linux Kernel Adventures: Reading the MD Source Code, Part 14: RAID5 Non-Chunk-Aligned Reads


When reprinting, please credit the source: http://blog.csdn.net/liumangxiong
When a read is not confined to a single chunk, it involves at least two chunks: the data must be read from each chunk separately, and the combined result is then returned to the upper layer. Below we look at how a complete bio read request is split into multiple sub-requests to the disks, and how the results coming back from the disks are reassembled and returned to the upper layer.
4097     logical_sector = bi->bi_sector & ~((sector_t)STRIPE_SECTORS-1);
4098     last_sector = bi->bi_sector + (bi->bi_size>>9);
4099     bi->bi_next = NULL;
4100     bi->bi_phys_segments = 1;     /* over-loaded to count active stripes */

The start of the request is computed first. Because the smallest unit in which MD issues data requests to the disks is STRIPE_SECTORS, the request is aligned down to that boundary. The aligned start position is logical_sector and the end position is last_sector. Line 4100 reuses bi_phys_segments to count the stripes issued for this bio; it is initialized to 1 so that the bio cannot be completed by accident while stripes are still being attached.
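The alignment arithmetic can be tried outside the kernel. The following is a minimal sketch, assuming 4 KiB pages and 512-byte sectors (so a stripe unit of 8 sectors); the helper names stripe_align_down and stripes_touched are hypothetical and do not exist in raid5.c.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t sector_t;

/* Assumption: 4 KiB pages and 512-byte sectors, so one stripe unit is 8 sectors. */
#define STRIPE_SECTORS 8u

/* Round a start sector down to the stripe unit that contains it (hypothetical helper). */
static sector_t stripe_align_down(sector_t sector)
{
	return sector & ~((sector_t)STRIPE_SECTORS - 1);
}

/* Count the stripe units touched by a request of size_bytes starting at start. */
static unsigned int stripes_touched(sector_t start, uint32_t size_bytes)
{
	sector_t logical_sector = stripe_align_down(start);
	sector_t last_sector = start + (size_bytes >> 9);
	unsigned int n = 0;

	/* Same loop shape as the make_request loop: one iteration per stripe unit. */
	for (; logical_sector < last_sector; logical_sector += STRIPE_SECTORS)
		n++;
	return n;
}
```

A 4 KiB read that starts at sector 13 is aligned down to sector 8 and touches two stripe units, whereas the same read starting at sector 0 touches only one.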
4102     for (;logical_sector < last_sector; logical_sector += STRIPE_SECTORS) {
4103          DEFINE_WAIT(w);
4104          int previous;
4105
4106     retry:
4107          previous = 0;
4108          prepare_to_wait(&conf->wait_for_overlap, &w, TASK_UNINTERRUPTIBLE);
....
4134
4135          new_sector = raid5_compute_sector(conf, logical_sector,
4136                                previous,
4137                                &dd_idx, NULL);
4138          pr_debug("raid456: make_request, sector %llu logical %llu\n",
4139               (unsigned long long)new_sector,
4140               (unsigned long long)logical_sector);
4141
4142          sh = get_active_stripe(conf, new_sector, previous,
4143                           (bi->bi_rw&RWA_MASK), 0);

In this loop the request is split into stripes, and a command is issued for each one. Stripes must also be handled under mutual exclusion: no two threads may operate on the same stripe at the same time. If, for example, the resync thread were synchronizing a stripe while raid5d was writing it, the result would be unpredictable.
Line 4103: the wait-queue entry used for mutual exclusion on stripe access.
Line 4108: join the wait queue.
Line 4135: from the array's logical sector, compute the physical offset sector on the disk, along with the corresponding data disk index and parity disk index.
Line 4142: get a stripe for that physical offset sector.
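To make the mapping on line 4135 more concrete, here is a simplified sketch of the logical-to-physical calculation for one layout only, RAID5 left-symmetric. It is not the kernel's raid5_compute_sector, which handles many layouts, RAID6, and reshape; struct geometry and map_sector are hypothetical names introduced for illustration.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t sector_t;

/* Hypothetical, simplified geometry; the kernel keeps this in its own conf structure. */
struct geometry {
	unsigned int raid_disks;        /* total member disks, including parity */
	unsigned int sectors_per_chunk; /* chunk size in 512-byte sectors */
};

/*
 * Map an array-logical sector to the sector on the member disk, and report
 * the data disk index and parity disk index (left-symmetric layout assumed).
 */
static sector_t map_sector(const struct geometry *g, sector_t r_sector,
			   unsigned int *dd_idx, unsigned int *pd_idx)
{
	unsigned int data_disks = g->raid_disks - 1;
	sector_t chunk_number = r_sector / g->sectors_per_chunk;
	unsigned int chunk_offset = (unsigned int)(r_sector % g->sectors_per_chunk);
	sector_t stripe = chunk_number / data_disks;

	*dd_idx = (unsigned int)(chunk_number % data_disks);
	/* Parity starts on the last disk and moves back one disk per stripe. */
	*pd_idx = data_disks - (unsigned int)(stripe % g->raid_disks);
	/* Data chunks start right after the parity disk and wrap around. */
	*dd_idx = (*pd_idx + 1 + *dd_idx) % g->raid_disks;

	return stripe * g->sectors_per_chunk + chunk_offset;
}
```

With four disks and 128-sector chunks, logical sector 389 falls in chunk 3, which is the first data chunk of stripe 1; under these assumptions it maps to sector 133 on disk 3, with parity on disk 2.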
4144          if (sh) {
....
4186               if (test_bit(STRIPE_EXPANDING, &sh->state) ||
4187                   !add_stripe_bio(sh, bi, dd_idx, rw)) {
4188                    /* Stripe is busy expanding or
4189                     * add failed due to overlap.  Flush everything
4190                     * and wait a while
4191                     */
4192                    md_wakeup_thread(mddev->thread);
4193                    release_stripe(sh);
4194                    schedule();
4195                    goto retry;
4196               }
4197               finish_wait(&conf->wait_for_overlap, &w);
4198               set_bit(STRIPE_HANDLE, &sh->state);
4199               clear_bit(STRIPE_DELAYED, &sh->state);
4200               if ((bi->bi_rw & REQ_SYNC) &&
4201                   !test_and_set_bit(STRIPE_PREREAD_ACTIVE, &sh->state))
4202                    atomic_inc(&conf->preread_active_stripes);
4203               release_stripe_plug(mddev, sh);
4204          } else {
4205               /* cannot get stripe for read-ahead, just give-up */
4206               clear_bit(BIO_UPTODATE, &bi->bi_flags);
4207               finish_wait(&conf->wait_for_overlap, &w);
4208               break;
4209          }
4210     }

The first time I read this code I was in too much of a hurry and could not find its focus, like someone raised in a noisy city who is so dazzled by its surface that he has no idea what his heart really wants from life. Only when I calmed down and looked again did I find the most important statement, on line 4187: the add_stripe_bio function. From then on the stripe is no longer lonely. Now that it holds a bio, it is entitled to enter the stripe-handling flow, and a stripe's grand journey toward the exit begins. After line 4198 and the release_stripe_plug call on line 4203, the new stripe is formally added to the processing queue (conf->handle_list).

People spend the first half of their lives looking for an entrance and the second half looking for an exit. Here the read stripe has found its entrance, so where is the exit? Anyone who has read LDD knows the answer: for a block device driver that does not use the default request queue, the make_request function is the entrance and bio_endio is the exit. The next step is to head for that exit. After release_stripe_plug, the first function entered is handle_stripe, which calls analyse_stripe, and that function sets to_read:
3245          if (test_bit(R5_Wantfill, &dev->flags))
3246               s->to_fill++;
3247          else if (dev->toread)
3248               s->to_read++;

Back in the handle_stripe function:
3472     if (s.to_read || s.non_overwrite
3473         || (conf->level == 6 && s.to_write && s.failed)
3474         || (s.syncing && (s.uptodate + s.compute < disks))
3475         || s.replacing
3476         || s.expanding)
3477          handle_stripe_fill(sh, &s, disks);

to_read triggers handle_stripe_fill, whose job is to set the flags on the devices that need to be read:
2696               set_bit(R5_LOCKED, &dev->flags);
2697               set_bit(R5_Wantread, &dev->flags);
2698               s->locked++;

Next, ops_run_io sends the read request to the disk. The completion callback for the read request is raid5_end_read_request:
1745     if (uptodate) {
1746          set_bit(R5_UPTODATE, &sh->dev[i].flags);
....
1824     rdev_dec_pending(rdev, conf->mddev);
1825     clear_bit(R5_LOCKED, &sh->dev[i].flags);
1826     set_bit(STRIPE_HANDLE, &sh->state);
1827     release_stripe(sh);

This function does two things: it sets the R5_UPTODATE flag, and it calls release_stripe again to hand the stripe back to handle_stripe. We enter analyse_stripe once more, this time carrying the R5_UPTODATE flag:
3231          if (test_bit(R5_UPTODATE, &dev->flags) && dev->toread &&
3232              !test_bit(STRIPE_BIOFILL_RUN, &sh->state))
3233               set_bit(R5_Wantfill, &dev->flags);
3234
3235          /* now count some things */
3236          if (test_bit(R5_LOCKED, &dev->flags))
3237               s->locked++;
3238          if (test_bit(R5_UPTODATE, &dev->flags))
3239               s->uptodate++;
3240          if (test_bit(R5_Wantcompute, &dev->flags)) {
3241               s->compute++;
3242               BUG_ON(s->compute > 2);
3243          }
3244
3245          if (test_bit(R5_Wantfill, &dev->flags))
3246               s->to_fill++;

Line 3233 sets the R5_Wantfill flag and line 3246 increments to_fill. Back in handle_stripe once more:
3426     if (s.to_fill && !test_bit(STRIPE_BIOFILL_RUN, &sh->state)) {
3427          set_bit(STRIPE_OP_BIOFILL, &s.ops_request);
3428          set_bit(STRIPE_BIOFILL_RUN, &sh->state);
3429     }
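The stripe passes through the handler several times before the data reaches the bio, and the flags decide what each pass does. The toy model below is a rough sketch of that progression for a single device under a successful read; struct toy_dev, toy_handle and the other names are hypothetical, and the real handle_stripe deals with many more states (writes, failures, parity computation, resync).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, heavily reduced model of one device slot in a stripe. */
struct toy_dev {
	bool toread;   /* a bio is waiting to read this block */
	bool locked;   /* models R5_LOCKED: disk I/O in flight */
	bool wantread; /* models R5_Wantread */
	bool uptodate; /* models R5_UPTODATE: cache page holds valid data */
	bool wantfill; /* models R5_Wantfill */
};

enum toy_action { TOY_IDLE, TOY_ISSUE_READ, TOY_WAIT_IO, TOY_BIOFILL };

/* One pass of the handler: decide the next step for this device. */
static enum toy_action toy_handle(struct toy_dev *d)
{
	if (d->locked)
		return TOY_WAIT_IO;        /* nothing to do until the read completes */
	if (d->uptodate && d->toread) {    /* second pass: data is in the cache */
		d->wantfill = true;
		return TOY_BIOFILL;
	}
	if (d->toread) {                   /* first pass: schedule the disk read */
		d->locked = true;
		d->wantread = true;
		return TOY_ISSUE_READ;
	}
	return TOY_IDLE;
}

/* Models a successful read completion (simplified: also clears wantread here). */
static void toy_read_done(struct toy_dev *d)
{
	d->wantread = false;
	d->uptodate = true;
	d->locked = false;
}

/* Models the copy step that follows: the cached data is handed to the bio. */
static void toy_biofill_done(struct toy_dev *d)
{
	d->toread = false;
	d->wantfill = false;
}
```

Driving the model with toy_handle, toy_read_done, toy_handle, toy_biofill_done reproduces the sequence described in the text: issue the read, mark the page up to date, request the fill, and finally go idle.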

s.ops_request now has STRIPE_OP_BIOFILL set. Whenever a bit is set in s.ops_request, you should know at once that the function that processes this field is raid_run_ops, with the real work done in __raid_run_ops:
1378     if (test_bit(STRIPE_OP_BIOFILL, &ops_request)) {
1379          ops_run_biofill(sh);
1380          overlap_clear++;
1381     }

The corresponding handler function is ops_run_biofill:
 812 static void ops_run_biofill(struct stripe_head *sh)
 813 {
 814      struct dma_async_tx_descriptor *tx = NULL;
 815      struct async_submit_ctl submit;
 816      int i;
 817
 818      pr_debug("%s: stripe %llu\n", __func__,
 819           (unsigned long long)sh->sector);
 820
 821      for (i = sh->disks; i--; ) {
 822           struct r5dev *dev = &sh->dev[i];
 823           if (test_bit(R5_Wantfill, &dev->flags)) {
 824                struct bio *rbi;
 825                spin_lock_irq(&sh->stripe_lock);
 826                dev->read = rbi = dev->toread;
 827                dev->toread = NULL;
 828                spin_unlock_irq(&sh->stripe_lock);
 829                while (rbi && rbi->bi_sector <
 830                     dev->sector + STRIPE_SECTORS) {
 831                     tx = async_copy_data(0, rbi, dev->page,
 832                          dev->sector, tx);
 833                     rbi = r5_next_bio(rbi, dev->sector);
 834                }
 835           }
 836      }
 837
 838      atomic_inc(&sh->count);
 839      init_async_submit(&submit, ASYNC_TX_ACK, tx, ops_complete_biofill, sh, NULL);
 840      async_trigger_callback(&submit);
 841 }
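The copy on lines 829 to 834 only moves the part of the cached page that overlaps each bio. The sketch below illustrates that overlap calculation with plain memcpy. It assumes a flat request buffer in place of a real bio (which is a list of segments) and an 8-sector stripe unit; copy_page_to_request is a hypothetical name, not the kernel's async_copy_data.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef uint64_t sector_t;

#define STRIPE_SECTORS 8u /* assumption: 4 KiB page / 512-byte sectors */
#define SECTOR_SIZE 512u

/*
 * Copy the part of one cached stripe page that overlaps a read request into
 * the request's buffer. Returns the number of bytes copied.
 * Hypothetical flat-buffer stand-in for a bio.
 */
static size_t copy_page_to_request(const uint8_t *page, sector_t dev_sector,
				   uint8_t *req_buf, sector_t req_sector,
				   uint32_t req_sectors)
{
	sector_t start = req_sector > dev_sector ? req_sector : dev_sector;
	sector_t dev_end = dev_sector + STRIPE_SECTORS;
	sector_t req_end = req_sector + req_sectors;
	sector_t end = req_end < dev_end ? req_end : dev_end;
	size_t len;

	if (start >= end)
		return 0; /* the request does not overlap this page */

	len = (size_t)(end - start) * SECTOR_SIZE;
	/* Source offset is relative to the page, destination to the request. */
	memcpy(req_buf + (start - req_sector) * SECTOR_SIZE,
	       page + (start - dev_sector) * SECTOR_SIZE, len);
	return len;
}
```

For a page caching sectors 8 to 15 and a request for sectors 12 to 19, only sectors 12 to 15 overlap, so 2048 bytes are copied to the start of the request buffer; the remainder of the request would come from the next stripe's page.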

Finally the truth is revealed. I cannot help marveling that the code is wrapped in layer upon layer, like a mysterious birthday present whose wrapping you peel off one sheet at a time, or like threading through one old alley after another to find a little liquor shop. Whatever else, code holds nothing back from you; it is sincere. And the more complex the code, the more charm and grace it has, provided you know how to walk into its heart and understand it. When you do, you will be astonished, and the thrill of that conquest stays with you for a long time. After conquering one style of code after another, your pursuit is no longer confined to the material but rises to the spiritual, like the European architects who designed a cathedral and then took more than 600 years to build it, the Gothic Cologne Cathedral. That is called art. By then you and I will be long gone, but that spirit is the realm we keep striving for.

Line 823: the disk read has just finished, so the data now sitting in the stripe cache page (dev->page) is copied into the bio's buffers (the 0 passed as the first argument of async_copy_data selects the page-to-bio direction), and at the same time dev->toread is moved to dev->read. The loop first builds the DMA descriptor; lines 839 and 840 submit the request to the DMA engine, and when the request completes, the callback passed in on line 839, ops_complete_biofill, is invoked:
 769 static void ops_complete_biofill(void *stripe_head_ref)
 770 {
 771      struct stripe_head *sh = stripe_head_ref;
 772      struct bio *return_bi = NULL;
 773      int i;
 774
 775      pr_debug("%s: stripe %llu\n", __func__,
 776           (unsigned long long)sh->sector);
 777
 778      /* clear completed biofills */
 779      for (i = sh->disks; i--; ) {
 780           struct r5dev *dev = &sh->dev[i];
 781
 782           /* acknowledge completion of a biofill operation */
 783           /* and check if we need to reply to a read request,
 784            * new R5_Wantfill requests are held off until
 785            * !STRIPE_BIOFILL_RUN
 786            */
 787           if (test_and_clear_bit(R5_Wantfill, &dev->flags)) {
 788                struct bio *rbi, *rbi2;
 789
 790                BUG_ON(!dev->read);
 791                rbi = dev->read;
 792                dev->read = NULL;
 793                while (rbi && rbi->bi_sector <
 794                     dev->sector + STRIPE_SECTORS) {
 795                     rbi2 = r5_next_bio(rbi, dev->sector);
 796                     if (!raid5_dec_bi_active_stripes(rbi)) {
 797                          rbi->bi_next = return_bi;
 798                          return_bi = rbi;
 799                     }
 800                     rbi = rbi2;
 801                }
 802           }
 803      }
 804      clear_bit(STRIPE_BIOFILL_RUN, &sh->state);
 805
 806      return_io(return_bi);
 807
 808      set_bit(STRIPE_HANDLE, &sh->state);
 809      release_stripe(sh);
 810 }
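Line 796 completes a bio only when its active-stripe count drops to zero, which is the counterpart of initializing bi_phys_segments to 1 on line 4100. The following sketch shows why the initial 1 matters, using C11 atomics. struct counted_bio and its helpers are hypothetical, and the assumption that the submitter drops its own reference after the loop is mine; that part of make_request is not shown in the excerpts above.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a bio whose counter tracks active stripes. */
struct counted_bio {
	atomic_int active_stripes;
	bool completed;
};

static void counted_bio_init(struct counted_bio *b)
{
	atomic_init(&b->active_stripes, 1); /* the submitter's own reference */
	b->completed = false;
}

/* Called each time the bio is attached to another stripe. */
static void counted_bio_get(struct counted_bio *b)
{
	atomic_fetch_add(&b->active_stripes, 1);
}

/* Drop one reference; whoever drops the last one completes the bio. */
static void counted_bio_put(struct counted_bio *b)
{
	/* atomic_fetch_sub returns the value before the subtraction. */
	if (atomic_fetch_sub(&b->active_stripes, 1) == 1)
		b->completed = true;
}
```

Without the initial reference, a first stripe that finishes before the second one is attached would bring the counter to zero and complete the bio too early. With it, the count cannot reach zero until the submitter has finished splitting the request.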

If you have trained a sharp eye that takes in ten lines at a glance, you will already have spotted return_io on line 806. Yes, this is the exit I mentioned earlier:
 177 static void return_io(struct bio *return_bi)
 178 {
 179      struct bio *bi = return_bi;
 180      while (bi) {
 181
 182           return_bi = bi->bi_next;
 183           bi->bi_next = NULL;
 184           bi->bi_size = 0;
 185           bio_endio(bi, 0);
 186           bi = return_bi;
 187      }
 188 }
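The pattern in ops_complete_biofill and return_io is a singly linked list threaded through the bio's own next pointer: finished bios are pushed onto the head of the list, and the list is then drained in one pass. Here is a small stand-alone sketch of that pattern; struct listed_bio, push_done and return_all are hypothetical names, and setting the ended field stands in for the bio_endio call.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical miniature bio: only the link field and a completion marker. */
struct listed_bio {
	struct listed_bio *next;
	int ended;
};

/* Push a finished bio onto the head of the return list. */
static struct listed_bio *push_done(struct listed_bio *list, struct listed_bio *b)
{
	b->next = list;
	return b;
}

/* Walk the list, unlinking each bio before completing it; returns the count. */
static int return_all(struct listed_bio *list)
{
	int n = 0;

	while (list) {
		struct listed_bio *next = list->next; /* save before unlinking */

		list->next = NULL;
		list->ended = 1; /* stands in for bio_endio() */
		list = next;
		n++;
	}
	return n;
}
```

Saving the next pointer before completing an element matters because, once a bio has been ended, its owner may reuse or free it.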

At last we see bio_endio. Time to celebrate with a drink. Had enough of the party? Here are two questions to think about:
1) Why is return_bi not a single bio, but a list chained through bi_next?
2) Since return_io has already finished, why do lines 808 and 809 put the stripe back on the processing list?