RAID is an effective means of protecting stored data. Creating a RAID always involves a very long system initialization process. Why is this operation needed during RAID initialization, and what impact does it have on SSDs? From the point of view of technology development, Old Wu walks through and analyzes the RAID initialization process together with everyone.
The basic organizational structure of a traditional RAID is as follows:
[Figure 1.jpg: basic organizational structure of a traditional RAID]
Every disk that joins the RAID group is cut, by LBA address, into a series of slices called stripe units. The stripe units at the same LBA address on different disks are organized into a stripe. All the data in a stripe is encoded; RAID6, for example, generates two coded chunks, P and Q, which allows two disks to fail at the same time.
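The slicing described above can be sketched in a few lines of Python. This is only an illustration under assumptions: the function names, the stripe unit size of 128 sectors, and the 4-disk group are hypothetical values chosen for the example, not something taken from a particular RAID implementation.

```python
def locate(lba, unit_sectors):
    """Map a per-disk LBA to (stripe index, offset inside the stripe unit)."""
    return lba // unit_sectors, lba % unit_sectors


def stripe_members(stripe_index, unit_sectors, num_disks):
    """List (disk, first LBA, last LBA) for every stripe unit of one stripe.

    All members share the same LBA range; only the disk number differs.
    """
    first = stripe_index * unit_sectors
    last = first + unit_sectors - 1
    return [(disk, first, last) for disk in range(num_disks)]


# Hypothetical geometry: 128-sector stripe units, 4 disks in the group.
print(locate(300, 128))
print(stripe_members(2, 128, 4))
```

LBA 300 on any disk falls into stripe 2, and stripe 2 is made up of the same LBA range on every disk in the group.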
So, in a RAID system, all the data in a stripe must satisfy the rules of the coding algorithm. That is, the encoded data generated from the stripe's data according to those rules must be identical to the encoded data stored in the stripe. When this condition holds, the data in the stripe is said to be consistent. When a disk fails, the lost data block can then be recovered from the encoded data stored in the stripe.
If the data in a stripe is inconsistent, that is, the encoding computed from the stripe's data differs from the encoded data stored in the stripe, then once a disk fails, the lost data block cannot be recovered correctly from the stored encoded data. Therefore, a stripe with inconsistent data causes data correctness problems in the event of a failure.
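A small Python sketch can make the consistency condition concrete. It assumes single XOR parity (the P chunk only) instead of the full RAID6 P and Q coding, and the helper names and sample bytes are made up for the illustration.

```python
from functools import reduce


def xor_blocks(blocks):
    """Bytewise XOR of a list of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def is_consistent(data_blocks, stored_parity):
    """A stripe is consistent when the recomputed parity equals the stored one."""
    return xor_blocks(data_blocks) == stored_parity


def recover(surviving_blocks, stored_parity):
    """Rebuild the single missing data block from the survivors and the parity."""
    return xor_blocks(list(surviving_blocks) + [stored_parity])


data = [b"\x01\x02", b"\x10\x20", b"\x0f\xf0"]

# Consistent stripe: parity was computed from the data, so recovery works.
good_parity = xor_blocks(data)
print(is_consistent(data, good_parity))
print(recover([data[0], data[2]], good_parity) == data[1])

# Inconsistent stripe: leftover parity unrelated to the data.
stale_parity = b"\x00\x00"
print(is_consistent(data, stale_parity))
print(recover([data[0], data[2]], stale_parity) == data[1])
```

With the stale parity, the rebuild still returns a block, but it is not the block that was lost. This silent corruption is the risk an inconsistent stripe carries.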
When you create a RAID system, the disks in the RAID group may be new disks or disks that have already been used, and the data on them is not guaranteed to be all zeros. In this case, the stripes built from these disks cannot be assumed to meet the data consistency requirement. That is, the encoded data computed from each stripe's data according to the coding rules does not match the encoded data stored in the stripe. Such inconsistent stripes introduce a significant risk to the correctness of the RAID data.
For this reason, when creating a RAID, it is necessary to initialize all the stripes in the system to ensure that the data in every stripe is consistent. Stripe initialization is usually done in one of two ways:
1, Initialize all the stripes in the RAID system by writing zeros everywhere. When the data is all zeros, its parity data is also all zeros, so all-zero data guarantees the consistency of the stripe.
2, Verify every stripe and update the parity data in the stripe so that the stripe data becomes consistent.
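The two approaches can be compared with a minimal Python sketch. It is a toy model under assumptions: stripes are in-memory dictionaries, the code is single XOR parity, and the names init_by_zeroing and init_by_resync are hypothetical rather than taken from any real RAID implementation.

```python
from functools import reduce


def xor_blocks(blocks):
    """Bytewise XOR of a list of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def init_by_zeroing(stripes):
    """Method 1: overwrite every data unit and the parity unit with zeros."""
    for stripe in stripes:
        size = len(stripe["parity"])
        stripe["data"] = [bytes(size) for _ in stripe["data"]]
        stripe["parity"] = bytes(size)


def init_by_resync(stripes):
    """Method 2: keep the data, rewrite parity only where it does not match.

    Returns the number of parity units that had to be rewritten.
    """
    rewritten = 0
    for stripe in stripes:
        expected = xor_blocks(stripe["data"])
        if stripe["parity"] != expected:
            stripe["parity"] = expected
            rewritten += 1
    return rewritten


def all_consistent(stripes):
    return all(xor_blocks(s["data"]) == s["parity"] for s in stripes)


def leftover_stripes():
    """Two stripes holding arbitrary leftover bytes, as on reused disks."""
    return [
        {"data": [b"\xaa\xbb", b"\x01\x02"], "parity": b"\x00\x00"},
        {"data": [b"\x11\x22", b"\x33\x44"], "parity": b"\xff\xff"},
    ]


zeroed = leftover_stripes()
init_by_zeroing(zeroed)
print(all_consistent(zeroed))

resynced = leftover_stripes()
print(init_by_resync(resynced), all_consistent(resynced))
```

Both methods end with consistent stripes. In this model, zeroing writes every unit on every disk, while the resync reads every unit and writes only the parity units that were wrong.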
Once a RAID system has been initialized, the data in all stripes is consistent, as shown in the figure:
[Figure 2.jpg: stripes in a RAID system after initialization, all consistent]
RAID system initialization is a very lengthy process, mainly because every stripe in the system has to be initialized. Initialization also has to be balanced against the performance of front-end user IO, so it is usually a background process that runs for a long time and affects the performance of front-end applications.
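A rough estimate shows why the process takes so long. The Python sketch below assumes that all member disks are initialized in parallel, so the duration is governed by the capacity of a single disk, and that the disk gives a fixed share of its bandwidth to initialization. The 8 TB capacity, the 100 MB/s rate, and the 25% share are hypothetical numbers picked for the arithmetic, not measurements.

```python
def init_hours(disk_bytes, disk_bytes_per_sec, init_share=1.0):
    """Hours needed to touch every sector of one member disk.

    init_share is the fraction of disk bandwidth left for initialization
    after front-end user IO has taken its part (1.0 means an idle system).
    """
    return disk_bytes / (disk_bytes_per_sec * init_share) / 3600


# Hypothetical disk: 8 TB capacity, 100 MB/s sustained throughput.
print(round(init_hours(8e12, 100e6), 1))        # idle system
print(round(init_hours(8e12, 100e6, 0.25), 1))  # 25% of bandwidth for init
```

Under these assumptions an idle system needs roughly a day, and throttling initialization to protect front-end IO stretches that to several days.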
For SSDs, RAID system initialization introduces further problems. Whether initialization writes zeros or verifies and updates the parity data, it has to write data to the SSDs, and this causes unnecessary writes and write amplification. Before any user data has been written, initialization has already built up the data mapping table inside the SSD. This reduces the service life and performance of the SSD. Therefore, a RAID system designed specifically for SSDs needs to optimize the system initialization process, and traditional RAID does not take this particular characteristic of SSDs into account. Consequently, a traditional RAID cannot simply be deployed directly on SSDs without affecting their service life and performance.
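The effect on the SSD mapping table can be illustrated with a toy Python model. It rests on strong simplifying assumptions: the drive maps one logical page to one physical page, appends every write to fresh flash, and does not special-case zero-filled writes. The ToySSD class and the page count are hypothetical and do not describe any real drive's firmware.

```python
class ToySSD:
    """A deliberately simplified flash translation layer (assumed model)."""

    def __init__(self, num_pages):
        self.num_pages = num_pages
        self.mapping = {}        # logical page -> physical page
        self.pages_written = 0   # total flash pages programmed so far

    def write(self, logical_page):
        # Every write, zero-filled or not, consumes a fresh physical page.
        self.mapping[logical_page] = self.pages_written
        self.pages_written += 1

    def mapped_fraction(self):
        return len(self.mapping) / self.num_pages


def zero_initialize(ssd):
    """RAID-style initialization: write zeros to every logical page."""
    for page in range(ssd.num_pages):
        ssd.write(page)


ssd = ToySSD(num_pages=1000)
zero_initialize(ssd)
print(ssd.mapped_fraction(), ssd.pages_written)

# The user now fills the drive once; flash has been programmed twice over.
for page in range(ssd.num_pages):
    ssd.write(page)
print(ssd.pages_written)
```

In this model the drive is fully mapped, and a full drive's worth of flash has been programmed, before the first byte of user data arrives.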
A RAID system uses data stripes to protect data, but stripe-based protection also introduces a series of problems, and system initialization is a typical stripe consistency problem. A good RAID data protection system solves this problem at design time. For example, EMC's Data Domain RAID has no system initialization process. Of course, this requires coordination with the file system, and a lot of optimization has gone into how the RAID stripe data is distributed.
This article is from the "Save the Way" blog. Please be sure to keep this source: http://alanwu.blog.51cto.com/3652632/1683079
Do you know the RAID initialization process?