Analysis of the Calculation Method for Ceph Reliability


Before starting, I would like to thank UnitedStack engineer Zhu Rongze for his great help and careful advice on this post. This article gives a more explicit analysis and elaboration of the Ceph reliability calculation method that UnitedStack presented at the Paris summit (https://www.ustack.com/blog/build-block-storage-service/). It is offered for readers interested in this topic to discuss and study; if anything in it is inappropriate, expert corrections are welcome.

When is data lost? Put another way, this topic is about storage reliability: at its most basic, a reliable storage system does not lose data, i.e., data never becomes unrecoverable. So, to analyze the reliability of Ceph, we just need to figure out exactly under what circumstances our data is lost and can no longer be recovered; from that we can build our computational model.

Start with a simple Ceph environment: 3 OSD nodes, one physical hard disk per OSD node, and a replica count of 3. Setting aside the influence of the monitors on cluster operation, it is obvious that when the physical disks behind all three OSDs are damaged, the data cannot be recovered. The reliability of this cluster is therefore directly tied to the reliability of the disks themselves.

Now assume a larger Ceph environment: 30 OSD nodes in 3 racks, 10 OSD nodes per rack, each OSD node still backed by one physical disk, and a replica count of 3. Through the CRUSH map, the replicas are distributed evenly across the three racks, so that no two replicas of the same object ever land in the same rack. When is data lost in this setup? Data is lost when one disk is damaged in each of the three racks and those three disks happen to hold all the replicas of the same object.

From this analysis, the reliability of Ceph can be calculated from the number of OSDs (N), the number of replicas (R), the number of OSDs per service node (S), and the annual failure rate of a disk (AFR). Here we use UnitedStack's parameters; the concrete values are as follows:


HDD annual failure rate (AFR)

Based on Wikipedia's definition (http://en.wikipedia.org/wiki/Annualized_failure_rate), AFR is calculated as follows:

AFR = 1 − e^(−8766 / MTBF)

where 8766 is the number of hours in a year (365.25 × 24) and MTBF is the mean time between failures in hours.

For example, consider a Seagate enterprise-class hard drive whose datasheet quotes an MTBF of 1,200,000 hours. Its AFR works out to about 0.73%:

AFR = 1 − e^(−8766 / 1,200,000) ≈ 0.73%
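As a quick sanity check, the AFR formula can be evaluated in a short Python sketch (the MTBF value is the Seagate figure quoted above; the function name is mine, not from the original post):

```python
import math

def afr(mtbf_hours: float) -> float:
    """Annualized failure rate: probability a drive fails within one year,
    assuming exponentially distributed lifetimes. 8766 = 365.25 days * 24 h."""
    return 1 - math.exp(-8766 / mtbf_hours)

print(f"{afr(1_200_000):.2%}")  # -> 0.73%
```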


However, according to Google's measurements, in large-scale cluster environments the AFR is often not as optimistic as the disk manufacturers claim. Their statistics show how AFR changes in real environments:

(Figure from Google's disk failure study: observed AFR by drive age.)

From those figures we can see that the AFR observed in practice varies with drive age, ranging from roughly 1.7% to 8% per year. This article therefore uses an AFR of 1.7%.

The AFR gives the probability that a disk is damaged within one year; from it we can try to calculate the probability of a disk failure over an arbitrary period. According to the relevant research, the number of disk failures within a given time span follows a Poisson distribution (if, like me, you returned this knowledge to your teacher after graduation, see http://en.wikipedia.org/wiki/Poisson_distribution). The formula is:

P(k failures in time t) = ((λt)^k / k!) · e^(−λt)
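For reference, here is a minimal Python version of the Poisson probability mass function (this is the standard formula, not code from the original post; the 1.7% AFR is the value adopted above):

```python
import math

def poisson_pmf(k: int, mu: float) -> float:
    """P(exactly k failures) when the expected number of failures is mu = lambda * t."""
    return mu ** k * math.exp(-mu) / math.factorial(k)

# With an expected 0.017 failures per disk-year (AFR = 1.7%):
p_zero = poisson_pmf(0, 0.017)    # probability a single disk survives the year
p_one_or_more = 1 - p_zero        # probability it fails at least once
```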
When I first saw this formula I was at a loss: how do I determine the expectation λ? According to the relevant research data, the failure expectation of a single disk is expressed in FIT (Failures In Time), the expected number of failures per billion (10^9) device-hours, which corresponds to the failure rate λ: λ = FIT / 10^9 per hour. The FIT is derived as follows:
Here AF (Acceleration Factor) is obtained by multiplying the test time by the Arrhenius equation. I confess I am also learning this as I go: the Arrhenius equation relates the rate constant of a chemical reaction to temperature, and applies to elementary and non-elementary reactions, and even to some heterogeneous reactions. What matters is that the failure-rate calculation is essentially modeling how environmental factors drive the physical changes that lead to failure. According to the relevant research, the final FIT is calculated as:

FIT = (number of failures × 10^9) / (number of tested devices × test hours × AF)
With these parameters we can formally calculate the probability that three disks in the Ceph cluster, sitting on three different racks, are damaged within the same window.

Probability that any OSD fails within a year, P1(any)

The probability that some OSD fails is not easy to compute directly, but the probability that no OSD fails is: it is simply the Poisson probability of zero failures over the year. Subtracting it from one gives P1(any):

P1(any) = 1 − P(0 failures in one year) = 1 − e^(−N·λ·T), with T = 1 year
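Concretely, P1(any) can be sketched like this (N = 72 OSDs and AFR = 1.7% are illustrative assumptions, matching the 24-OSDs-per-rack layout used in the copy-set calculation):

```python
import math

HOURS_PER_YEAR = 24 * 365

def p1_any(n_osds: int, afr: float) -> float:
    """P(at least one of n_osds fails within a year) = 1 - P(zero failures),
    modeling failures as a Poisson process with aggregate rate n * lambda."""
    lam = afr / HOURS_PER_YEAR               # per-disk, per-hour failure rate
    return 1 - math.exp(-n_osds * lam * HOURS_PER_YEAR)

p1 = p1_any(72, 0.017)                       # ~0.71 for 72 OSDs at AFR 1.7%
```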
Probability of a second failure within the recovery time, P2(any)

We know that when Ceph discovers a failed OSD node it automatically marks the node out (after roughly 10 minutes), and its self-healing mechanism then rebalances the data, re-replicating the failed node's data onto the other OSD nodes. Suppose the failed disk was 75% full; that is the amount of data that must be re-synchronized. The data is rebalanced only within this rack, at the node's write speed in MB/s, so the recovery time is:

T_recovery = (data to re-replicate) / (node write speed)
Note: since there are three OSDs per node, each physical machine must have node bandwidth several times the single-disk recovery speed. This model also ignores the extra volume of metadata, request data, IP headers, and other overhead. With the recovery time in hand, we can calculate the probability that a second node fails within it:

P2(any) = 1 − e^(−(N−1)·λ·T_recovery)
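A sketch of the recovery-time step (the post's exact disk size and write speed did not survive translation, so a 1 TB disk that is 75% full, a 100 MB/s recovery rate, and 71 surviving OSDs are all assumed values for illustration):

```python
import math

def recovery_hours(data_gb: float, write_mb_s: float) -> float:
    """Hours needed to re-replicate a failed disk's data at a given write speed."""
    return data_gb * 1024 / write_mb_s / 3600

def p_fail_within(n_disks: int, afr: float, hours: float) -> float:
    """P(at least one of n_disks fails within the given window), Poisson model."""
    lam = afr / (24 * 365)
    return 1 - math.exp(-n_disks * lam * hours)

t_rec = recovery_hours(0.75 * 1024, 100)   # ~2.18 hours for 768 GB at 100 MB/s
p2 = p_fail_within(71, 0.017, t_rec)       # second failure during recovery
```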
Probability of a third failure within the recovery time, P3(any)

P3(any) is calculated the same way:

P3(any) = 1 − e^(−(N−2)·λ·T_recovery)
Probability that any R OSDs fail within a year

Multiplying the probabilities above gives the probability that an arbitrary set of R (here, three) OSDs fails within one year:

P = P1(any) × P2(any) × P3(any)
Copy sets (M)

In this model, the failure of any R OSDs does not necessarily mean the complete loss of some object's replicas, because the R damaged OSDs do not necessarily hold all the replicas of the same object; such a failure does not always make data unrecoverable. This is why the concept of copy sets is introduced. Simply put, a copy set is a set of OSDs that together hold all the replicas of some object (see the reference links for the precise definition and calculation). In the scenario here, a copy set takes one OSD from each of the three racks, and with 24 OSDs per rack, M = 24 × 24 × 24. With two replicas, M would instead be 24 × 24 + 24 × 24 + 24 × 24, one term for each pair of racks.
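The copy-set count for this rack layout can be computed directly (3 racks, replicas never sharing a rack, as in the text; the function is my own naming):

```python
from math import comb

def copy_sets(osds_per_rack: int, racks: int, replicas: int) -> int:
    """Number of distinct replica placements when each replica sits in a
    different rack: choose which racks are used, then one OSD per chosen rack."""
    return comb(racks, replicas) * osds_per_rack ** replicas

m3 = copy_sets(24, 3, 3)   # 24*24*24 = 13824
m2 = copy_sets(24, 3, 2)   # 24*24 + 24*24 + 24*24 = 1728
```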
Reliability of Ceph

Putting this together, the final algorithm for Ceph reliability multiplies the failure probabilities above by the fraction of R-OSD combinations that form a copy set, and takes the complement:

Reliability = 1 − P1(any) × P2(any) × P3(any) × M / C(N, R)
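The whole calculation can be sketched end to end (all parameter values here are illustrative assumptions, not the post's exact inputs; the formula for M assumes 3 racks with one replica per rack):

```python
import math

HOURS_PER_YEAR = 24 * 365

def ceph_reliability(n: int, r: int, afr: float, rec_hours: float,
                     osds_per_rack: int) -> float:
    """Reliability = 1 - P1 * P2 * P3 * M / C(N, R), per the method above."""
    lam = afr / HOURS_PER_YEAR
    p1 = 1 - math.exp(-n * lam * HOURS_PER_YEAR)       # any failure in a year
    p2 = 1 - math.exp(-(n - 1) * lam * rec_hours)      # 2nd failure in recovery
    p3 = 1 - math.exp(-(n - 2) * lam * rec_hours)      # 3rd failure in recovery
    m = osds_per_rack ** r                             # copy sets across 3 racks
    return 1 - p1 * p2 * p3 * m / math.comb(n, r)

# Assumed: 72 OSDs (24 per rack), 3 replicas, AFR 1.7%, ~2.2 h recovery.
rel = ceph_reliability(72, 3, 0.017, 2.2, 24)
nines = -math.log10(1 - rel)    # number of leading 9s in the reliability
```

Note that the result is very sensitive to the assumed recovery time and AFR; moving either by a small factor changes the number of nines, which is why different write-ups of this method land on slightly different figures.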
It can be seen that with three replicas, the reliability of Ceph comes out to approximately nine 9s; because of differences in the assumed recovery time and AFR, the result deviates slightly from UnitedStack's published figure.

Reference links

Annualized Failure Rate

Poisson distribution

Calculating Reliability using FIT & MTTF: Arrhenius HTOL Model

Google's Disk Failure Experience

Failure Trends in a Large Disk Drive Population

Copysets: Reducing the Frequency of Data Loss in Cloud Storage



