Analysis of read/write processes in the MD module (4)

Source: Internet
Author: User
MD handles read errors and write errors differently. Write errors are relatively simple to handle. Read errors are more troublesome, because MD has to reconstruct the data that could not be read and write it back to the disk. First, let's look at how write errors are handled.

1. An error occurred while writing data.

If a write fails, the BIO_UPTODATE bit of the bio is not set when the completion callback raid5_end_write_request() runs. The callback then calls md_error(), which marks the rdev Faulty, clears its In_sync flag, and increments degraded. md_error() also wakes up the raid5 daemon; if a spare disk is available, recovery is started (this process will be discussed later). Finally, the STRIPE_HANDLE bit of the stripe is set so that the stripe continues to be processed.

In handle_stripe5, the number of failed disks in the stripe is counted in the variable failed. RAID 5 tolerates one failed disk; if failed > 1, the array has failed. handle_stripe5 handles this case directly: if (failed > 1 && to_read + to_write + written) holds, every request queued on the stripe is completed with an error.

If only one disk has failed and there is a partial-block (non-full) write to the failed rdev, the data on the other disks must be read first. Why? To compute parity for the write, the buffer for the failed disk must hold the correct contents of the whole block. A partial write supplies only part of that block, so the original contents must be known first; then the new data is copied over the corresponding part of the buffer, and the buffer is correct. Because the failed disk cannot be read, its original data has to be reconstructed from the data on the other disks, and therefore those disks must be read first.
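The reasoning above relies on the RAID 5 property that any one block in a stripe equals the XOR of all the others (data and parity). The sketch below shows the two steps for a partial write to the failed device: rebuild the old block, then overlay only the bytes the request covers. The block size `BLK` and the function names are hypothetical; this is a model of the idea, not the kernel's compute_block().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define BLK 8  /* hypothetical tiny block size for the sketch */

/* Rebuild the block of device 'missing' as the XOR of every other block
 * (data and parity) in the stripe. */
void reconstruct_block(unsigned char blocks[][BLK], int disks, int missing)
{
    assert(missing >= 0 && missing < disks);
    memset(blocks[missing], 0, BLK);
    for (int i = 0; i < disks; i++) {
        if (i == missing)
            continue;
        for (size_t b = 0; b < BLK; b++)
            blocks[missing][b] ^= blocks[i][b];
    }
}

/* Partial write to the failed device: rebuild its old contents first,
 * then overlay only the bytes covered by the request. */
void partial_write_failed(unsigned char blocks[][BLK], int disks, int failed,
                          size_t off, const unsigned char *data, size_t len)
{
    assert(off + len <= BLK);
    reconstruct_block(blocks, disks, failed);
    memcpy(blocks[failed] + off, data, len);
}
```

Skipping the reconstruction step would leave the untouched bytes of the buffer undefined, and the parity computed from it would be wrong.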

If the write to the failed disk is a full-block write, there is no need to pre-read data from the other, non-failed disks. Processing then moves on to deciding between read-modify-write (rmw) and reconstruct-write (rcw). rmw is not practical with a failed disk: rmw reads the old data from the disks that have write requests, but the old data of the failed disk could only be obtained by pre-reading all the other disks. handle_stripe5 therefore counts a block that cannot be read as 2 * disks instead of 1 when computing rmw, and applies the same 2 * disks penalty when computing rcw. The purpose is to make rcw the chosen method for writing the data.
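The cost comparison can be sketched as follows. This is a simplified model under stated assumptions: `struct wdev`, `count_rmw_rcw()`, and `prefer_rcw()` are hypothetical, the boolean fields only loosely mirror the per-device R5_* flags, and ties are resolved in favor of rcw.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-device state, loosely mirroring the r5dev flags. */
struct wdev {
    bool towrite;    /* a write is queued for this block */
    bool overwrite;  /* the write covers the whole block */
    bool uptodate;   /* buffer already holds valid data */
    bool locked;     /* I/O already in flight */
    bool insync;     /* device can be read */
};

/* Count the reads each strategy would need; a block that cannot be
 * read costs 2 * disks, which effectively rules that strategy out. */
void count_rmw_rcw(const struct wdev *d, int disks, int pd_idx, int *rmw, int *rcw)
{
    assert(disks > 1 && pd_idx >= 0 && pd_idx < disks);
    *rmw = *rcw = 0;
    for (int i = 0; i < disks; i++) {
        bool need = !d[i].locked && !d[i].uptodate;
        /* rmw needs the old data of written blocks plus the old parity */
        if ((d[i].towrite || i == pd_idx) && need)
            *rmw += d[i].insync ? 1 : 2 * disks;
        /* rcw needs every data block that is not fully overwritten */
        if (!d[i].overwrite && i != pd_idx && need)
            *rcw += d[i].insync ? 1 : 2 * disks;
    }
}

/* Ties go to reconstruct-write in this sketch. */
bool prefer_rcw(int rmw, int rcw)
{
    return rcw <= rmw;
}
```

For a 4-disk stripe with parity on disk 3 and a full-block write to failed disk 0, the counts come out as rmw = 9 and rcw = 2, so rcw is chosen. If disk 0 is healthy and the write is partial, rmw = 2 and rcw = 3, and rmw wins.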

After that, for a partial-block write, the failed disk's data is computed by compute_block(); for a full-block write, the remaining data is read as the rcw method requires. Once the data is ready, handle_stripe5 checks:

    if (locked == 0 && (rcw == 0 || rmw == 0) &&
        !test_bit(STRIPE_BIT_DELAY, &sh->state))

If this condition holds, compute_parity5() computes the parity and the data is written to the disks. One may ask: how is the data for the failed disk written? At the end of handle_stripe5, the following check is made for each device:

    rcu_read_lock();
    rdev = rcu_dereference(conf->disks[i].rdev);
    if (rdev && test_bit(Faulty, &rdev->flags))
        rdev = NULL;
    if (rdev)
        atomic_inc(&rdev->nr_pending);
    rcu_read_unlock();

This RCU lock is important, and interested readers may want to study it further. The code sets the rdev pointer according to the device's status: if the device is Faulty, rdev becomes NULL and no request is issued to that physical disk. The failed disk's new data is therefore never written to the disk itself; it survives only through the parity computed from it. This completes the handling of write errors. Next, let's look at what happens when a read error occurs.
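Before turning to reads, the answer to the question above can be checked with a small user-space model: parity is computed over every buffer in the stripe (including the failed disk's), writes are issued only to non-faulty devices, and the failed disk's data can still be recovered from the survivors. `compute_parity_rcw()`, `issue_writes()`, the `faulty` array, and `BLK` are hypothetical; skipping a faulty device stands in for rdev being set to NULL.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define BLK 4   /* hypothetical tiny block size */

/* Reconstruct-write parity: XOR of every data block in the stripe buffers. */
void compute_parity_rcw(unsigned char buf[][BLK], int disks, int pd_idx)
{
    assert(pd_idx >= 0 && pd_idx < disks);
    for (size_t b = 0; b < BLK; b++) {
        unsigned char p = 0;
        for (int i = 0; i < disks; i++)
            if (i != pd_idx)
                p ^= buf[i][b];
        buf[pd_idx][b] = p;
    }
}

/* Copy stripe buffers to the "disks", skipping faulty members.
 * Returns the number of writes issued. */
int issue_writes(unsigned char disk[][BLK], unsigned char buf[][BLK],
                 const bool *faulty, int disks)
{
    int issued = 0;
    for (int i = 0; i < disks; i++) {
        if (faulty[i])
            continue;           /* no I/O goes to the failed member */
        for (size_t b = 0; b < BLK; b++)
            disk[i][b] = buf[i][b];
        issued++;
    }
    return issued;
}
```

In a 3-disk stripe where disk 1 is faulty, only two writes are issued, yet XOR-ing disk 0 with the parity on disk 2 yields exactly the data that was meant for disk 1.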

2. An error occurred while reading data.

As with a write error, a read error first shows up in raid5_end_read_request(), but in this case the request may be retried. Some checks are made before retrying. For example, if the array is already degraded, there is no redundancy left to rebuild the data, so no retry is attempted and the array is effectively broken. Likewise, no retry is attempted if too many read errors have occurred on the device. When no retry is attempted, md_error() is called. Otherwise, the R5_ReadError flag is set on the stripe's device and the stripe is handled again.

In handle_stripe5, the device with the read error is treated as the failed disk. If there is a read request on it, the data on the other disks must still be read so that the failed disk's data can be computed. Once the data on the other disks has been read, compute_block() is called to compute the failed disk's data.
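A minimal sketch of this step, assuming the block is rebuilt only after every other block in the stripe is up to date. `try_compute_block()`, the `uptodate` array, and `BLK` are hypothetical names for illustration; the XOR rebuild is the standard RAID 5 reconstruction.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define BLK 4   /* hypothetical tiny block size */

/* Rebuild block 'target' once every other block in the stripe is up to
 * date. Returns false if more reads are still needed. */
bool try_compute_block(unsigned char buf[][BLK], bool *uptodate, int disks, int target)
{
    assert(target >= 0 && target < disks);
    for (int i = 0; i < disks; i++)
        if (i != target && !uptodate[i])
            return false;           /* wait for the remaining reads */
    for (size_t b = 0; b < BLK; b++) {
        unsigned char x = 0;
        for (int i = 0; i < disks; i++)
            if (i != target)
                x ^= buf[i][b];
        buf[target][b] = x;
    }
    uptodate[target] = true;        /* the rebuilt block can now be returned */
    return true;
}
```

The first call returns false while any other block is still being read; the stripe is handled again when those reads complete, and the next call rebuilds the block.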

Once the block has been rebuilt, handle_stripe5 checks whether it can be written back:

    if (failed == 1 && !conf->mddev->ro &&
        test_bit(R5_ReadError, &sh->dev[failed_num].flags)
        && !test_bit(R5_LOCKED, &sh->dev[failed_num].flags)
        && test_bit(R5_UPTODATE, &sh->dev[failed_num].flags)
        )

If this condition holds, the R5_ReWrite flag is set on the failed device and the reconstructed data is written back to the disk.

If the rewrite succeeds, the data has been repaired on the disk. Otherwise, the failure is handled in the same way as a failed write request, as described in section 1.
