Problems with Ceph CRUSH

Source: Internet
Author: User
Tags: emit

I have had to reread the material on Ceph CRUSH over and over, so the relevant chapters of the Ceph source code analysis book are summarized below:

4.2.1 Hierarchical Cluster Map
Example 4-1 Cluster map definition
The hierarchical cluster map defines the static topology of the OSD cluster as a hierarchy. This hierarchy is what lets the CRUSH algorithm be rack-aware when it selects OSDs: by defining rules, replicas can be placed in different racks or different machine rooms, which protects the data against the failure of any single one of them.
Some basic concepts of the hierarchical cluster map are as follows:
· Device: The most basic storage device, that is, an OSD. One OSD corresponds to one disk storage device.
· Bucket: A container for devices. A bucket can recursively contain multiple devices or buckets of lower-level types.
· Bucket type: Buckets can have many types. For example, a host represents a node and can contain multiple devices, and a rack contains multiple hosts. By default Ceph has six levels: root, datacenter, room, row, rack, and host. Users can also define new types themselves.
Each device has its own weight, which is related to its storage capacity. The weight of a bucket is the sum of the weights of its child buckets (or devices).

  # bucket types
  # types
  type 0 osd
  type 1 host
  type 2 chassis
  type 3 rack
  type 4 row
  type 5 pdu
  type 6 pod
  type 7 room
  type 8 datacenter
  type 9 region
  type 10 root

The following examples illustrate the use of buckets:

  host test1 {                    # type host, named test1
      id -2                       # bucket ID, generally negative
      # weight 3.000              # weight; defaults to the sum of the child items' weights
      alg straw                   # random selection algorithm used by this bucket
      hash 0                      # hash function used by the selection algorithm; 0 means rjenkins1
      item osd.1 weight 1.000     # item 1: osd.1 and its weight
      item osd.2 weight 1.000
      item osd.3 weight 1.000
  }

  host test2 {
      id -3
      # weight 3.000
      alg straw
      hash 0
      item osd.3 weight 1.000
      item osd.4 weight 1.000
      item osd.5 weight 1.000
  }

  root default {                  # bucket of type root, named default
      id -1                       # ID number
      # weight 6.000
      alg straw                   # random selection algorithm
      hash 0                      # rjenkins1
      item test1 weight 3.000
      item test2 weight 3.000
  }
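The rule that a bucket's weight is the sum of its children's weights can be checked with a small sketch. This is only an illustration in Python, not Ceph code: the `BUCKETS` and `DEVICE_WEIGHT` dictionaries and the `bucket_weight` helper are hypothetical names, and the contents simply mirror the listing above.

```python
# Toy model of the listing above (hypothetical names, not a Ceph API).
# Each bucket lists its children; each device has a weight of 1.000.
BUCKETS = {
    "test1": ["osd.1", "osd.2", "osd.3"],
    "test2": ["osd.3", "osd.4", "osd.5"],
    "default": ["test1", "test2"],
}
DEVICE_WEIGHT = {f"osd.{i}": 1.0 for i in range(1, 6)}


def bucket_weight(name):
    """Return a device's own weight, or the summed weight of a bucket's children."""
    if name in DEVICE_WEIGHT:
        return DEVICE_WEIGHT[name]
    return sum(bucket_weight(child) for child in BUCKETS[name])


print(bucket_weight("test1"))    # host bucket: three devices of weight 1.0
print(bucket_weight("default"))  # root bucket: sum of the two host weights
```

Under these assumptions the two hosts come out at 3.0 each and the root at 6.0, matching the commented-out weight lines in the definitions.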

4.2.2 Placement Rules
The cluster map reflects the physical topology of the storage system hierarchy. Placement rules determine how the replicas of a PG's objects are chosen; through them, the user can control how replicas are distributed across the cluster. The format is defined as follows:

  take(a)
  choose
      choose firstn {num} type {bucket-type}
      chooseleaf firstn {num} type {bucket-type}
          If {num} == 0, choose pool-num-replicas buckets (all available).
          If {num} > 0 && < pool-num-replicas, choose that many buckets.
          If {num} < 0, it means pool-num-replicas - |{num}|.
  emit

The execution process for placement rules is as follows:
1) The take operation selects a bucket, usually a bucket of type root.
2) The choose operation has different variants, and its input is the output of the previous step:
a) choose firstn selects, depth first, num sub-buckets of type bucket-type.
b) chooseleaf first selects num buckets of type bucket-type, then recurses down to the leaf nodes and selects one OSD device under each:
• If num is 0, num is set to the number of replicas of the pool.
• If num is greater than 0 and less than the number of replicas of the pool, then num buckets are selected.
• If num is less than 0, the number selected is the number of replicas of the pool minus the absolute value of num.
3) emit outputs the result.
The operation chooseleaf firstn {num} type {bucket-type} is equivalent to two operations:
a) choose firstn {num} type {bucket-type}
b) choose firstn 1 type osd
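The three cases for num can be written as a small function. This is a sketch of the rule as stated above, not the Ceph implementation; `resolve_num` and `pool_replicas` are hypothetical names. The text does not say what happens when num is at least the pool's replica count, so the sketch simply returns num unchanged in that case.

```python
def resolve_num(num, pool_replicas):
    """Translate the {num} argument of choose/chooseleaf into a bucket count."""
    if num == 0:
        # 0 means "as many as the pool has replicas".
        return pool_replicas
    if num < 0:
        # Negative means "pool replicas minus the absolute value of num".
        return pool_replicas - abs(num)
    # Positive num is used as given (the case num >= pool_replicas is an
    # assumption here; the text above does not cover it).
    return num


for num in (0, 2, -1):
    print(num, "->", resolve_num(num, pool_replicas=3))
```

For a pool with 3 replicas, `firstn 0` resolves to 3 buckets, `firstn 2` to 2, and `firstn -1` to 2.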

Example 4-2
Placement rules: three replicas are distributed across three cabinets.
The cluster map is shown in Figure 4-2: the top layer is a root bucket, with four buckets of type row under the root. Each row contains four cabinets, and each cabinet contains several OSD devices. (In the figure each cabinet holds 4 hosts, each with several OSD devices, but this CRUSH map has no host-level bucket; instead, all OSD devices on the 4 hosts are defined directly under a single cabinet.)

  rule replicated_ruleset {
      ruleset 0                            # ruleset ID
      type replicated                      # type: replicated or erasure
      min_size 1                           # minimum number of replicas
      max_size 10                          # maximum number of replicas
      step take root                       # select the root bucket as input to the next step
      step choose firstn 1 type row        # select one row, so all replicas are in the same row
      step choose firstn 3 type cabinet    # select three cabinets, one replica per cabinet
      step choose firstn 1 type osd        # select one OSD in each of the three cabinets
      step emit
  }

Given the definition above and the cluster map of Figure 4-2, the selection algorithm executes as follows:
1) Select the root bucket as input to the next step.
2) From the root bucket, select one sub-bucket of type row. The selection algorithm is the one set in the root bucket's definition, generally straw.
3) From the row output by the previous step, select three cabinets. The selection algorithm is the one defined in the row.
4) From each of the three cabinets output by the previous step, select one OSD, and output the result.
With this ruleset, three OSD devices are selected, distributed across three cabinets in a single row.
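The four steps can be traced with a toy simulation. This is a sketch under loud assumptions: the map built by `build_map` is hypothetical (4 rows, 4 cabinets per row, 3 OSDs per cabinet, made-up names), and `pick` ranks children by a SHA-256 hash of the PG id and child name as a simple stand-in for the bucket's selection algorithm. It is not Ceph's straw algorithm and ignores weights.

```python
import hashlib


def build_map(rows=4, cabinets=4, osds=3):
    """Build a hypothetical root -> row -> cabinet -> OSD hierarchy."""
    children = {"root": []}
    osd_id = 0
    for r in range(rows):
        row = f"row{r}"
        children["root"].append(row)
        children[row] = []
        for c in range(cabinets):
            cab = f"{row}-cab{c}"
            children[row].append(cab)
            children[cab] = []
            for _ in range(osds):
                children[cab].append(f"osd.{osd_id}")
                osd_id += 1
    return children


def pick(children, bucket, n, pg):
    """Pick n distinct children of a bucket, ranked by a hash of (pg, child).

    Stand-in for the bucket's selection algorithm; NOT Ceph's straw.
    """
    def score(child):
        return hashlib.sha256(f"{pg}:{child}".encode()).hexdigest()
    return sorted(children[bucket], key=score)[:n]


def run_rule(children, pg):
    selected = ["root"]            # step take root
    # step choose firstn 1 type row / 3 type cabinet / 1 type osd
    for n in (1, 3, 1):
        selected = [c for b in selected for c in pick(children, b, n, pg)]
    return selected                # step emit


cluster = build_map()
osds = run_rule(cluster, pg=42)
print(osds)
```

Each choose step takes the previous step's output as its input, so the result is always three OSDs that sit in three different cabinets of one row. Because every level of this toy map holds only one type of child, the type argument of each step is implicit.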

Example 4-3
Placement rules: the primary replica is placed on an SSD, and the other replicas are placed on HDDs.

The cluster map is shown in Figure 4-3. It defines two buckets of type root. One is named ssd: its OSDs use SSD disks as storage media, and it has two hosts whose devices are all SSD disks. The other is named hdd: its OSDs use HDD disks as storage media, and it has two hosts whose devices are all HDD disks.

  rule ssd-primary {
      ruleset 5
      type replicated
      min_size 5
      max_size 10
      step take ssd                          # select the ssd root bucket as input
      step chooseleaf firstn 1 type host     # select one host and recursively select a leaf OSD
      step emit                              # output the result
      step take hdd                          # select the hdd root bucket as input
      step chooseleaf firstn -1 type host    # select (replica count minus one) hosts and recursively select a leaf OSD in each
      step emit                              # output the result
  }

Given the cluster map shown in Figure 4-3, the ruleset above executes as follows:
1) The first take operation selects ssd as the root bucket.
2) Within the ssd root, select one host; then, with that host as input, recurse to the leaf nodes and select one OSD device.
3) Output the selected device, which is an SSD device.
4) Select hdd as the root input.
5) Select 2 hosts (the replica count minus one, with the default of 3 replicas), recursively select one OSD device under each, and so end up with two HDD devices.
6) Output the final result.
The final output is 3 devices: one SSD disk and two HDD disks. With these rules, the primary replica of the PG is placed on an SSD OSD, and the other replicas are placed on HDD disks.
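The same walk-through can be sketched in code. Everything here is an assumption made for illustration: the `CHILDREN` map (two roots, two hosts each, two OSDs per host, made-up names), the hash-ranking `pick` helper used in place of Ceph's real bucket selection algorithm, and a pool of 3 replicas. The first entry of the emitted list is treated as the primary.

```python
import hashlib

# Hypothetical cluster map: root -> hosts -> OSDs (names are made up).
CHILDREN = {
    "ssd": ["ssd-host1", "ssd-host2"],
    "hdd": ["hdd-host1", "hdd-host2"],
    "ssd-host1": ["osd.0", "osd.1"],
    "ssd-host2": ["osd.2", "osd.3"],
    "hdd-host1": ["osd.4", "osd.5"],
    "hdd-host2": ["osd.6", "osd.7"],
}


def pick(bucket, n, pg):
    """Pick n distinct children by ranking a hash of (pg, child).

    Stand-in for the bucket's selection algorithm; NOT Ceph's straw.
    """
    def score(child):
        return hashlib.sha256(f"{pg}:{child}".encode()).hexdigest()
    return sorted(CHILDREN[bucket], key=score)[:n]


def chooseleaf(root, num, pg, pool_replicas):
    """chooseleaf firstn {num} type host: pick hosts, then one OSD under each."""
    n = num if num > 0 else pool_replicas + num  # num <= 0: pool size minus |num|
    return [pick(host, 1, pg)[0] for host in pick(root, n, pg)]


def ssd_primary(pg, pool_replicas=3):
    # step take ssd; step chooseleaf firstn 1 type host; step emit
    result = chooseleaf("ssd", 1, pg, pool_replicas)
    # step take hdd; step chooseleaf firstn -1 type host; step emit
    result += chooseleaf("hdd", -1, pg, pool_replicas)
    return result


acting = ssd_primary(pg=7)
print(acting)
```

Whatever PG id is used, the list starts with one OSD from the ssd root and continues with two OSDs on different hosts of the hdd root.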
