CRUSH Algorithm
1. The purpose of CRUSH
To optimize data allocation, efficiently reorganize data, flexibly constrain where object replicas are placed, and maximize data safety when hardware fails.
2. Process
In the Ceph architecture, the Ceph client reads and writes RADOS objects stored on OSDs directly, so Ceph must walk the complete chain (pool, object) → (pool, PG) → OSD set → OSD/disk so that the client knows exactly where the target data object is located.
When data is written, the file is cut into objects; each object is first mapped to a PG, and the PG is then mapped to an OSD set. Each pool has more than one PG, and each object finds its PG by hashing its name and taking the result modulo the pool's PG count. The PG is then mapped to a set of OSDs, where the number of OSDs is determined by the pool's replica count.
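As a rough sketch of just this object-to-PG step (not Ceph's actual rjenkins hash or stable-mod logic; the pool id, object name, and pg_num below are made-up values):

import hashlib

def object_to_pg(pool_id: int, object_name: str, pg_num: int) -> str:
    """Toy sketch: hash the object name and take it modulo the pool's PG count.
    Ceph itself uses an rjenkins-based hash and a stable-mod variant."""
    h = int.from_bytes(hashlib.md5(object_name.encode()).digest()[:4], "little")
    pg_seed = h % pg_num               # which PG inside the pool the object lands in
    return f"{pool_id}.{pg_seed:x}"    # PG ids are conventionally "<pool>.<seed-in-hex>"

print(object_to_pg(1, "rbd_data.1234.0000000000000000", pg_num=128))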
The mapping from PG to OSD is performed by the CRUSH algorithm. CRUSH is a pseudo-random process: it selects an OSD set from all OSDs as if at random, but for the same PG every selection returns the same result, i.e. the mapped OSD set is fixed.
The OSDMap manages all of the OSDs in the current Ceph cluster, and it defines the scope within which the CRUSH algorithm selects an OSD combination. Two factors affect the result of CRUSH: the structure of the OSDMap and the CRUSH rule.
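This "pseudo-random but repeatable" behaviour can be illustrated with a toy stand-in for CRUSH; this is not the real bucket algorithm, and the flat OSD list and replica count are assumptions, but it shows why the same PG always maps to the same OSD set:

import hashlib
import random

def toy_crush(pg_id: str, osds: list[int], replicas: int) -> list[int]:
    """Deterministic 'pseudo-random' pick: the PG id seeds the RNG,
    so the same PG always yields the same OSD set."""
    seed = int.from_bytes(hashlib.sha1(pg_id.encode()).digest()[:8], "big")
    rng = random.Random(seed)
    return rng.sample(osds, replicas)

osds = list(range(12))                 # assumed flat cluster of 12 OSDs
print(toy_crush("1.2f", osds, 3))      # some 3-OSD set
print(toy_crush("1.2f", osds, 3))      # identical result on every call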
3. CRUSH Rule Introduction
A CRUSH rule covers 3 main points:
A. From which node in the OSDMap hierarchy the search starts;
B. Which node type is used as the fault isolation domain;
C. Which search mode is used to locate the replicas (breadth first or depth first).
# ruleset
rule replicated_ruleset {                    # name of the ruleset; you can reference it when creating a pool
    ruleset 0                                # id of the ruleset; sequential numbering is fine
    type replicated                          # pool type is replicated (erasure is the other mode)
    min_size 1                               # the minimum replica count a pool may specify is 1
    max_size 10                              # the maximum replica count a pool may specify is 10
    step take default                        # entry point from which a PG searches for replica locations
    step chooseleaf firstn 0 type host       # choose leaf nodes (depth first), with host as the isolation domain
    step emit                                # end of the rule
}
PG-to-OSD selection process: the rule first states which node of the OSDMap hierarchy the search starts from; the entry point here is the root node named default. The isolation domain is the host level, which means the same host cannot contribute two of the chosen nodes. Selection walks from default down to 3 hosts: at each node the next child is picked according to that node's bucket type, each chosen child keeps selecting according to its own type until a host is reached, and then one OSD under that host is selected.
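To make the walk and the host isolation domain concrete, here is a simplified sketch under assumed names (a two-level default → host → OSD map and a plain hash in place of CRUSH's weighted straw2 selection):

import hashlib

# assumed toy hierarchy: root "default" -> hosts -> OSD ids
CRUSH_MAP = {
    "host1": [0, 1, 2],
    "host2": [3, 4, 5],
    "host3": [6, 7, 8],
    "host4": [9, 10, 11],
}

def score(*parts) -> int:
    """Stand-in for CRUSH's bucket hash: deterministic score for (pg, item)."""
    key = "/".join(map(str, parts)).encode()
    return int.from_bytes(hashlib.sha1(key).digest()[:8], "big")

def select_osds(pg_id: str, replicas: int) -> list[int]:
    # step take default; step chooseleaf firstn <replicas> type host
    hosts = sorted(CRUSH_MAP, key=lambda host: score(pg_id, host))[:replicas]  # distinct hosts only
    # descend from each chosen host to a single leaf (OSD) beneath it
    return [max(CRUSH_MAP[host], key=lambda osd: score(pg_id, host, osd)) for host in hosts]

print(select_osds("1.2f", 3))   # three OSDs, each on a different host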
PG and PGP
Through the CRUSH algorithm, Ceph maps objects onto PGs: a PG is a logical grouping of objects that acts as the middle layer between objects and OSDs, and each PG is replicated to multiple OSDs according to the replica count of the pool it belongs to. The purpose of the PG is to group related objects logically so that they can be managed and moved as a unit, which improves efficiency.
PG (pg_num) is the number of placement groups a storage pool uses to hold its objects, roughly the number of "directories" objects are hashed into; PGP (pgp_num) is the number of PG-to-OSD distribution combinations available to the pool's PGs.
Summary in three points
1. PGP determines the placement of PGs.
2. The value of PGP should be kept equal to PG; when the value of PG is increased, PGP must also be increased to keep the two the same.
3. When a pool's PG count increases, Ceph does not start rebalancing right away; only after the PGP value is also increased do the PGs begin migrating to other OSDs and rebalancing starts.
Conclusion
When PG is increased, Ceph does not randomly move part of the data from the original PGs into new PGs; instead it splits existing PGs to produce the new ones. In the example of 6 original PGs, only 2 split and the other 4 keep their objects unchanged; this approach effectively reduces the performance problems caused by large data migrations. (The PG-to-OSD mapping does not change at this point.)
When PGP is changed, Ceph begins the real data rebalancing. (Adjusting PGP does not redistribute objects within a PG, but it does change the distribution of the PGs themselves, adjusting the PG-to-OSD mapping of the new PGs so that data ends up evenly distributed at the OSD level.)
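The split-then-rebalance behaviour can be sketched with the simplified placement model below (plain modulo and a toy hash instead of Ceph's stable-mod and rjenkins; the OSD list and pool values are assumptions): a child PG created by a split keeps its parent's placement seed while pgp_num is unchanged, so it stays on the same OSDs, and only raising pgp_num gives it a new seed and triggers migration.

import hashlib
import random

OSDS = list(range(12))   # assumed cluster of 12 OSDs

def osds_for_seed(pool_id: int, placement_seed: int, replicas: int = 3) -> list[int]:
    """Toy CRUSH: the placement seed (not the raw PG id) decides the OSD set."""
    key = f"{pool_id}.{placement_seed}".encode()
    seed = int.from_bytes(hashlib.sha1(key).digest()[:8], "big")
    return random.Random(seed).sample(OSDS, replicas)

def pg_to_osds(pool_id: int, pg_seed: int, pgp_num: int) -> list[int]:
    # simplified stand-in for Ceph's raw_pg_to_pps: the PG seed is folded modulo pgp_num
    return osds_for_seed(pool_id, pg_seed % pgp_num)

# pool starts with pg_num = pgp_num = 4; PG 1.2 lives on some OSD set
print(pg_to_osds(1, 2, pgp_num=4))

# pg_num is doubled to 8: PG 1.2 splits and child PG 1.6 appears (2 + old pg_num)
# pgp_num is still 4, so 6 % 4 == 2 -> the child maps to the SAME OSDs (no migration)
print(pg_to_osds(1, 6, pgp_num=4))

# only after pgp_num is raised to 8 does the child get its own placement seed -> data moves
print(pg_to_osds(1, 6, pgp_num=8))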