Data is central to the operation of any system, and data loss, whether caused by people or by other factors, can have an immeasurable impact on the business. System administrators therefore rely on data backup to protect business data. The rise of cloud data centers, however, is concentrating massive amounts of data in one place. Compared with the small-capacity (under 500 GB), RTO- and RPO-sensitive scenarios common in the traditional backup industry, the challenges posed by cloud data centers are far beyond what those traditional approaches can effectively protect.
Level Five: PB-Level Data Center Disaster Recovery Design (Part 1)
Difficulty: four stars
Traditional backup solutions are usually designed around protecting a specific application: an agent calls the application's own interfaces, so data consistency can be fully guaranteed. However, traditional backup is limited in both backup speed and storage efficiency when it has to index and walk large numbers of small files, or protect the petabytes of data at the heart of a cloud data center. From a security point of view, backing up data only at the production site is also not enough to give customers effective protection. For key business systems (for example, online trading systems or medical records), data is the core of the entire system. Once hardware failure, aging storage media, human error, or unpredictable external factors cause data to be accidentally lost or damaged, the impact on the business is immeasurable. To avoid this, a full copy kept locally and offsite is needed so that data can be effectively restored, or remains available in real time.
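As a rough illustration of the agent pattern described above, the sketch below quiesces an application through its own interface, takes a snapshot while writes are paused, and then resumes it. The `appctl` CLI and the LVM volume names are hypothetical placeholders, not part of any specific product; this is a minimal sketch of the idea, assuming an LVM-backed data volume.

```python
from contextlib import contextmanager
import subprocess

@contextmanager
def quiesced_application():
    """Pause application writes via its own interface (placeholder CLI).
    Real agents call e.g. FLUSH TABLES WITH READ LOCK for MySQL or a VSS
    freeze on Windows."""
    subprocess.run(["/opt/app/bin/appctl", "freeze"], check=True)  # hypothetical
    try:
        yield
    finally:
        subprocess.run(["/opt/app/bin/appctl", "thaw"], check=True)  # hypothetical

def consistent_backup():
    with quiesced_application():
        # Snapshot while writes are paused so the copy is application-consistent.
        subprocess.run(["lvcreate", "--snapshot", "--size", "10G",
                        "--name", "app_snap", "/dev/vg0/app_data"], check=True)
    # The snapshot can now be mounted and backed up while the app runs normally.

if __name__ == "__main__":
    consistent_backup()
```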
This article gives a brief overview of the common data protection methods; the specific details are easy to find online and are not repeated here. The next article will focus on common disaster recovery architecture designs with Ceph as the underlying storage.
There are several common ways to protect data:
1. Storage-level data protection.
Traditional storage arrays generally protect data in one of two ways. The first is asynchronous replication, which uses snapshots to ship the changed data between the two sites. Its advantage is that it uses bandwidth efficiently, requires no additional software on the upper-level business platform, and puts no pressure on the business systems. The second is real-time (synchronous) replication, which uses the array's own features to split the write I/O to both sites; compared with asynchronous replication it demands more bandwidth, but both RPO and RTO are better. In practice, however, most traditional storage vendors sell this replication software separately as a profit point, and replication across storage brands, or even across different generations of the same brand, cannot deliver storage-level remote protection.
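The vendors' replication features are proprietary, but the snapshot-diff idea behind asynchronous replication can be sketched with an open tool. The sketch below uses Ceph RBD (the storage this series moves toward) purely as an example backend; the pool, image, remote host, and the assumption of an existing "base" snapshot are placeholders.

```python
import subprocess
import time

POOL, IMAGE = "rbd", "business-volume"   # placeholder pool/image names
REMOTE = "backup-site"                   # SSH-reachable host at the DR site
INTERVAL = 3600                          # one snapshot per hour -> RPO of roughly 1 hour

def replicate_once(prev_snap: str, curr_snap: str) -> None:
    """Create a new snapshot and ship only the delta since the previous one."""
    subprocess.run(["rbd", "snap", "create", f"{POOL}/{IMAGE}@{curr_snap}"], check=True)
    # export-diff writes just the changed extents; import-diff applies them remotely.
    export = subprocess.Popen(
        ["rbd", "export-diff", "--from-snap", prev_snap,
         f"{POOL}/{IMAGE}@{curr_snap}", "-"],
        stdout=subprocess.PIPE)
    subprocess.run(
        ["ssh", REMOTE, "rbd", "import-diff", "-", f"{POOL}/{IMAGE}"],
        stdin=export.stdout, check=True)
    export.wait()

if __name__ == "__main__":
    seq = 0
    prev = "base"  # assumes an initial snapshot and full copy already exist remotely
    while True:
        seq += 1
        curr = f"async-{seq}"
        replicate_once(prev, curr)
        prev = curr
        time.sleep(INTERVAL)
```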
2. Application-based data protection.
Besides storage-level replication, some vendors split the write I/O stream at the host level, using a driver-layer filter that sends each write down a separate path. This sidesteps storage-side compatibility issues and greatly reduces deployment cost. However, it requires installing a data-splitting software package on every host, which limits operating system compatibility (the products on the market support only Windows and a few Linux distributions). Host-based write splitting also puts extra pressure on host CPU and memory, and under extreme business load it can even deadlock the business platform, so from a stability and security perspective there are hidden risks.
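The write-splitting idea can be illustrated at a high level with the sketch below. It is an assumption-level illustration, not any vendor's actual driver: every application write is duplicated, one copy to the local production volume and one streamed to a hypothetical replica service.

```python
import socket

REPLICA_ADDR = ("dr-replica.example.com", 9000)   # hypothetical replica service

class SplitWriter:
    """Duplicate every write: local production volume plus a remote replica."""

    def __init__(self, local_path: str):
        self.local = open(local_path, "r+b")
        self.replica = socket.create_connection(REPLICA_ADDR)

    def write(self, offset: int, data: bytes) -> None:
        # 1. Write to the production volume.
        self.local.seek(offset)
        self.local.write(data)
        # 2. Forward the same write to the replica (offset + length header).
        header = offset.to_bytes(8, "big") + len(data).to_bytes(4, "big")
        self.replica.sendall(header + data)
        # Every write now costs extra CPU, memory copies and a network round trip
        # on the host itself, which is why heavy load can overwhelm the platform.

    def close(self) -> None:
        self.local.close()
        self.replica.close()
```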
3. Storage gateway-based data protection.
The storage active-active approach that is most popular today comes in two main solutions, IBM's SVC and EMC's VPLEX, which are not covered in detail here. The benefit of this approach is that a storage virtualization gateway pools storage from different brands and performs the data splitting itself, so data can be protected either in real time or asynchronously, according to the needs of the business. The drawback is just as obvious: as the single portal through which all data traffic passes, the gateway itself can become a bottleneck. It is also constrained by business strategy and technical roadmap: both the cost (typically in the hundreds of thousands) and the range of compatible equipment are severely limited (although every virtualization gateway vendor claims that nodes can be scaled out smoothly, the maximum so far has stayed within 16 nodes, and all nodes must be of the same product generation).
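Conceptually, the gateway maps a virtual volume onto back-end LUNs from different vendors and acknowledges a write only after both copies land, which is also why all traffic funnels through it. The sketch below is a hedged illustration of that idea; the volume map and `write_to_backend()` helper are hypothetical placeholders, not the SVC or VPLEX API.

```python
import concurrent.futures

# Hypothetical mapping of one virtual volume onto two heterogeneous arrays.
VOLUME_MAP = {
    "vvol-finance": ("array-vendorA-lun7", "array-vendorB-lun3"),
}

def write_to_backend(backend: str, offset: int, data: bytes) -> None:
    """Placeholder for the vendor-specific path down to a physical LUN."""
    print(f"wrote {len(data)} bytes at offset {offset} to {backend}")

def gateway_write(vvol: str, offset: int, data: bytes) -> None:
    primary, secondary = VOLUME_MAP[vvol]
    # Mirror the write to both arrays in parallel; the host's ack waits for both,
    # so every byte of traffic passes through the gateway nodes (the bottleneck).
    with concurrent.futures.ThreadPoolExecutor(max_workers=2) as pool:
        futures = [pool.submit(write_to_backend, b, offset, data)
                   for b in (primary, secondary)]
        for f in futures:
            f.result()  # raise if either copy failed

if __name__ == "__main__":
    gateway_write("vvol-finance", 4096, b"\x00" * 4096)
```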
4. Remote replication based on backup software.
Some backup software uses its own private protocol to synchronize the local backup data to a remote backup node. This approach makes the most efficient use of both bandwidth and the backup data itself. The drawback is equally obvious: it cannot keep the business available in real time, since data must first be restored from the backup.
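The bandwidth efficiency typically comes from shipping only the pieces the remote side does not already hold. The sketch below is a minimal, assumption-level illustration of that: the backup file is split into chunks and only new chunks are sent. The `remote_has()` and `send_chunk()` helpers and the backup path stand in for a vendor's private protocol.

```python
import hashlib
import os

CHUNK_SIZE = 4 * 1024 * 1024  # 4 MiB chunks

def chunk_hashes(path: str):
    """Yield (offset, sha256, chunk) for each fixed-size chunk of a backup file."""
    with open(path, "rb") as f:
        offset = 0
        while chunk := f.read(CHUNK_SIZE):
            yield offset, hashlib.sha256(chunk).hexdigest(), chunk
            offset += len(chunk)

def remote_has(digest: str) -> bool:
    """Placeholder: ask the remote node whether it already stores this chunk."""
    return False

def send_chunk(path: str, offset: int, chunk: bytes) -> None:
    """Placeholder: transfer one chunk over the vendor's replication protocol."""
    print(f"sending {len(chunk)} bytes of {os.path.basename(path)} @ {offset}")

def replicate_backup(path: str) -> None:
    sent = skipped = 0
    for offset, digest, chunk in chunk_hashes(path):
        if remote_has(digest):
            skipped += 1          # already present remotely: no bandwidth spent
        else:
            send_chunk(path, offset, chunk)
            sent += 1
    print(f"chunks sent: {sent}, deduplicated: {skipped}")

if __name__ == "__main__":
    replicate_backup("/backups/db-full.bak")  # hypothetical local backup file
```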
In summary, no single data protection approach fits every business scenario. System administrators still need to choose the scheme that best suits the characteristics of their own business.
This article has briefly described the common methods of data protection; the next article will cover common disaster recovery architecture designs with Ceph as the underlying storage. I hope it can serve as a reference for newcomers. To find out what comes next, please look forward to "Disaster Recovery Architecture Design".
This article is from the "Attitude decides everything" blog; please keep this source when reposting: http://sangh.blog.51cto.com/6892345/1884394
Software-Defined Storage: The Evolution from Traditional Operations to Cloud Operations (Part 5)