In this article we look at several ways of protecting data, focusing on the common disaster recovery designs for Ceph. Ceph has three great weapons for disaster recovery: the fault domain (failure domain), RBD remote disaster recovery, and RGW remote disaster recovery.
Level 5: Ceph's disaster recovery artifact - the fault domain
Importance: five stars
Six articles have gone by in the blink of an eye. Do you still remember the ops brother from part one of this series? Staying true to the original intent, let's go back to his initial task. He has already worked through hardware selection, deployment, tuning and testing, and has finally reached the go-live stage. A production environment, however, usually requires a highly available design with no single point of failure, so that a disaster or hardware fault does not affect normal business operation. His first goal: build a Ceph storage cluster, connect it to the cloud platform, and provide a highly available data access architecture as the underlying storage. Before the official launch, the cluster architecture has to be planned and deployed, and the storage cluster has to pass high-availability tests such as server power-off and rack power-off. This is where Ceph's disaster recovery artifact, the fault domain, comes in. There are 24 servers available to build the Ceph storage cluster.
Based on the requirements of the storage management platform and the size of the cluster, the deployment needs to achieve the following:
1. Plan the physical environment as a highly available topology and complete the storage cluster deployment.
2. Manage storage resources in a unified way, reducing the difficulty of storage management and improving management efficiency.
3. Guarantee high availability of the stored data through software-defined storage, making full use of storage resources and improving business continuity.
Given the existing physical resource specifications and configuration, the storage resource pools should be planned to balance data safety and space utilization as far as possible. The 24 servers are placed in 3 racks, 8 servers per rack, and each rack is defined as one fault domain. A 3-replica storage pool is created, and the data replicas are automatically distributed across the different fault domains, that is, across different racks, to keep the data safe. This provides recovery capability at the rack, server and disk levels: a hardware failure of a disk, a server, or even an entire rack causes neither downtime nor data loss. This is achieved through Ceph's own CRUSH map and rule set design, which are described later.
Physical topology planning: each rack corresponds to one fault domain, and each rack is assigned the same number of identically configured hosts. The 24 servers are evenly distributed across the 3 racks, 8 servers per rack, as sketched in the commands below.
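As a reference, here is a hedged sketch of the ceph CLI commands that could realize this layout. The bucket, host, rule and pool names (rack1-rack3, node01, rack_rule, rbd_pool) and the PG count are assumptions for illustration; host buckets are normally created automatically when the OSDs are deployed and only need to be moved under the right rack.

    # Create one CRUSH bucket per rack and hang it under the default root
    ceph osd crush add-bucket rack1 rack
    ceph osd crush add-bucket rack2 rack
    ceph osd crush add-bucket rack3 rack
    ceph osd crush move rack1 root=default
    ceph osd crush move rack2 root=default
    ceph osd crush move rack3 root=default

    # Move each host bucket into its rack (repeat for all 24 hosts)
    ceph osd crush move node01 rack=rack1
    ceph osd crush move node09 rack=rack2
    ceph osd crush move node17 rack=rack3

    # Create a replicated rule whose failure domain is the rack
    # (newer releases can also use: ceph osd crush rule create-replicated rack_rule default rack)
    ceph osd crush rule create-simple rack_rule default rack

    # Create a 3-replica pool that uses this rule
    ceph osd pool create rbd_pool 1024 1024 replicated rack_rule
    ceph osd pool set rbd_pool size 3

With such a rule, CRUSH picks the three replicas of every object from three different racks, so losing any single rack still leaves two complete copies.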
Topology diagram
(Figure 1: cluster topology diagram - http://www.xsky.com/wordpress/wp-content/uploads/2016/11/1.png)
Note: treating one rack as one fault domain, as described here, is not the only option; plan your fault domains and CRUSH map according to your actual situation.
Technical implementation
The CRUSH map contains a list of all the storage devices (OSDs) in the cluster, a set of buckets that organize these devices into a physical hierarchy, and a set of rules that specify the replication policy for the data. Buckets have different types, and higher-level buckets aggregate lower-level buckets; a typical bucket hierarchy is OSD, Host, Rack, Row, Room, Datacenter, Root. A CRUSH map built from these bucket types forms a tree structure, as shown in the figure below. For a given bucket type, the CRUSH algorithm can place a piece of data and its replicas in different buckets of that type, forming a fault domain. Even if every device inside one fault domain fails, the data remains safe and available, avoiding a single point of failure.
(Figure 2: CRUSH bucket tree structure - http://www.xsky.com/wordpress/wp-content/uploads/2016/11/2.png)
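To make the bucket hierarchy concrete, below is a minimal, illustrative excerpt of what a decompiled CRUSH map for the 3-rack layout above could look like. All names, IDs and weights are placeholders, and the exact fields (for example the bucket algorithm or the ruleset keyword) vary between Ceph versions.

    # one host bucket per server, aggregating its OSDs
    host node01 {
        id -2
        alg straw2
        hash 0          # rjenkins1
        item osd.0 weight 1.000
        item osd.1 weight 1.000
    }

    # one rack bucket per physical rack, aggregating 8 hosts
    rack rack1 {
        id -10
        alg straw2
        hash 0          # rjenkins1
        item node01 weight 2.000
        # ... node02 through node08 ...
    }

    # the root aggregates the three racks
    root default {
        id -1
        alg straw2
        hash 0          # rjenkins1
        item rack1 weight 16.000
        item rack2 weight 16.000
        item rack3 weight 16.000
    }

    # rule: take the root, then choose leaves under different racks
    rule rack_rule {
        ruleset 1
        type replicated
        min_size 1
        max_size 10
        step take default
        step chooseleaf firstn 0 type rack
        step emit
    }

The key line is "step chooseleaf firstn 0 type rack": it tells CRUSH to pick each replica from a leaf (OSD) under a different rack bucket, which is exactly what makes the rack the fault domain.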
If the buckets are mapped properly onto the cluster's physical architecture, the CRUSH map can also be used to locate physical problems in the cluster. For example, if the OSD corresponding to a failed hard drive is known, its physical location can easily be read from the CRUSH map so that the drive can be replaced quickly. Likewise, if the CRUSH map shows that all OSDs under one host are down, the problem is most likely the host's power or network, not the OSDs themselves.
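For example (a sketch; the OSD id 12 is arbitrary), the following commands expose the physical location information carried by the CRUSH map:

    # Print the whole CRUSH tree, including the up/down status of every OSD
    ceph osd tree

    # Report where a specific OSD lives (host, rack, ...) and its network address
    ceph osd find 12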
Besides a type, every bucket also has a weight. A weight is defined for the lowest-level buckets, such as the OSDs, and, following the tree structure, the weight of a higher-level bucket is the sum of the weights of all buckets beneath it. The weight of an OSD determines the proportion of data stored on it: with a weight of 0 it stores no data, and an OSD with weight 1 stores only half as much data as one with weight 2. Weights are usually used to express the real capacity of the physical disk behind an OSD, but they can also be lowered to reduce the load on an OSD.
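A hedged sketch of the weight-related commands (osd.7 and the values are arbitrary examples):

    # The CRUSH weight is usually set to the disk capacity in TB, e.g. 2.0 for a 2 TB disk
    ceph osd crush reweight osd.7 2.0

    # Setting the CRUSH weight to 0 gradually drains all data off the OSD
    ceph osd crush reweight osd.7 0

    # "ceph osd reweight" (0.0-1.0) is a separate, temporary override,
    # handy for relieving an overloaded or nearly full OSD
    ceph osd reweight 7 0.8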
For the specific CRUSH map operations, see the official documentation: http://docs.ceph.com/docs/master/rados/operations/crush-map/?highlight=crushmap
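For reference, the usual edit cycle described there looks roughly like this (file names and the rule id are placeholders):

    # Export and decompile the current CRUSH map
    ceph osd getcrushmap -o crushmap.bin
    crushtool -d crushmap.bin -o crushmap.txt

    # ... edit crushmap.txt: buckets, weights, rules ...

    # Recompile, optionally dry-run the rule, then inject the new map
    crushtool -c crushmap.txt -o crushmap.new
    crushtool -i crushmap.new --test --rule 1 --num-rep 3 --show-mappings
    ceph osd setcrushmap -i crushmap.new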
Summary
To ensure high availability of the stored data, the cluster deployment must be planned well in advance. With each rack acting as one fault domain and data replicas stored in different fault domains, a hardware failure of a disk, a server, or even an entire rack causes neither downtime nor data loss.
I hope this level gives Ceph beginners a useful reference; comments and corrections from readers are welcome. As for what comes next, please look forward to "Disaster Recovery Architecture Design".
This article is from the "Attitude decides everything" blog; please keep this source: http://sangh.blog.51cto.com/6892345/1884396
Software-Defined Storage: Evolving from Traditional Operations to Cloud Operations (Part 5)