Replication is a technology that copies a set of data from one data source to one or more other data sources. It is mainly divided into synchronous replication and asynchronous replication:
1. Synchronous replication: each write operation must complete on both the source and the target before the next operation is processed. It loses little data, but it degrades the performance of the production system unless the target system is physically close to the production system.
2. Asynchronous replication: the next operation is processed without waiting for the data to be copied to the target system. The copied data lags behind the source data, but this kind of replication has less impact on the performance of the production system.
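The difference between the two modes can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not any product's implementation: the names Target, SyncReplicator, and AsyncReplicator are hypothetical, and in-memory lists stand in for the source and target systems.

```python
import queue
import threading


class Target:
    """Hypothetical replication target: an in-memory list of records."""

    def __init__(self):
        self.data = []

    def apply(self, record):
        self.data.append(record)


class SyncReplicator:
    """Each write completes on source and target before it returns."""

    def __init__(self, target):
        self.source = []
        self.target = target

    def write(self, record):
        self.source.append(record)
        self.target.apply(record)  # the caller waits for the target


class AsyncReplicator:
    """Writes return after the source commit; a worker ships them later."""

    def __init__(self, target):
        self.source = []
        self.target = target
        self._pending = queue.Queue()
        worker = threading.Thread(target=self._ship, daemon=True)
        worker.start()

    def write(self, record):
        self.source.append(record)
        self._pending.put(record)  # do not wait for the target

    def _ship(self):
        # Background loop: copy queued records to the target in order.
        while True:
            record = self._pending.get()
            self.target.apply(record)
            self._pending.task_done()

    def flush(self):
        """Block until every queued record has reached the target."""
        self._pending.join()
```

With SyncReplicator, write() returns only after the target holds the record. With AsyncReplicator, write() returns at once, and any record still in the queue when the source fails would be lost, which is the time difference between the copy and the source described above.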
In disaster recovery planning, the choice of data replication technology determines the final disaster recovery result, that is, the achievable RTO (recovery time objective) and RPO (recovery point objective) values. Data replication technologies can be divided into the following five types according to the system layer at which they operate:
1. Host-based data replication technology
Host-based data replication works by mirroring or copying disk volumes. Replication is performed at the volume manager layer of the host, so it places few restrictions on hardware, especially storage devices. The host systems of the production center and the backup center establish a data transmission channel over an IP network, which gives reliable and relatively efficient data transfer. Remote data replication is implemented by data management software on the host; when data in the main data center is damaged, applications or data can be restored from the backup center at any time.
Host-based data replication does not require identical storage devices on both sides, which gives it greater flexibility. The disadvantages are that replication consumes host CPU resources and therefore has some impact on host performance, and that it places higher demands on the software (many software products cannot provide point-in-time snapshots).
2. Data replication technology based on the application and middleware layer
Application-level data replication has the application perform synchronous or asynchronous writes to the databases of both the active and standby centers to keep their data consistent. The disaster recovery center can run at the same time as the production center, so it not only provides disaster recovery but can also share part of the workload. However, this technology is tied directly to the business logic of the application software, which makes it complicated to implement and maintain, and application-level data replication increases both system risk and the risk of data loss.
Because this approach is independent of the underlying operating system, database, and storage, applications can implement dual writes or multiple writes as required, replicating data between the master and multiple copies. The technique can be encapsulated at the middleware or application platform level, where it is transparent to the applications above it, or it can be implemented in the application itself.
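A dual-write layer of this kind might look like the following sketch. It is only an illustration under stated assumptions: DualWriter and FlakyStore are hypothetical names, dict-like objects stand in for the databases of the active and standby centers, and a ConnectionError stands in for an unreachable standby.

```python
class FlakyStore(dict):
    """Hypothetical standby store that rejects writes while `down` is True."""

    down = False

    def __setitem__(self, key, value):
        if self.down:
            raise ConnectionError("store unavailable")
        super().__setitem__(key, value)


class DualWriter:
    """Writes every key to the primary and to each replica store."""

    def __init__(self, primary, replicas):
        self.primary = primary
        self.replicas = list(replicas)
        self.pending = set()  # (replica_index, key) pairs still to repair

    def put(self, key, value):
        self.primary[key] = value
        for index, replica in enumerate(self.replicas):
            try:
                replica[key] = value
                self.pending.discard((index, key))
            except ConnectionError:
                # Remember the gap instead of failing the business write.
                self.pending.add((index, key))

    def repair(self):
        """Re-copy the current primary value for every missed write.

        Returns the number of writes that are still outstanding.
        """
        for index, key in list(self.pending):
            try:
                self.replicas[index][key] = self.primary[key]
                self.pending.discard((index, key))
            except ConnectionError:
                pass  # replica still unreachable; keep it pending
        return len(self.pending)
```

Even this small sketch needs failure tracking and a repair path, and it does not handle deletes or concurrent writers, which gives a sense of the extra code complexity that application-level replication brings.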
Its main advantage is that it can be customized as needed and can replicate at the application and database level. Its main disadvantage is that there are no mature middleware products on the market suitable for large-scale adoption by traditional IT companies. If replication is implemented entirely by an application platform or by the application itself, code complexity and application maintenance costs increase.
3. Data replication technology based on database
The replication technology based on database software includes physical replication and logical replication.
Logical replication uses the database's redo logs and archive logs: the logs are transferred from the site of the master to the site of the copy, and the data is replicated by re-executing SQL. Logical replication provides only asynchronous replication, so the primary and the copy are eventually consistent rather than consistent in real time.
Physical replication does not rely on SQL Apply. Instead, the redo log or archive log is persistently written at the replica site, synchronously or asynchronously, and the data at the replica site can be opened for read-only access.
Open-platform database replication technology is a structured data replication technology based on database logs. It obtains inserts, deletes, and updates by analyzing the online log or archive log of the source database and then applies these changes to the target database, keeping the two synchronized. This allows the database to run active-active or even active across multiple sites, achieving continuous business availability and disaster tolerance.
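The apply side of log-based replication can be sketched as follows. This is a simplified model, not the log format of any real database: the Change record, its lsn (log sequence number) field, and the apply_changes function are assumptions made for illustration, and a dict stands in for the target database.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class Change:
    """One hypothetical change record extracted from the source log."""

    lsn: int                     # position of the change in the log
    op: str                      # "insert", "update", or "delete"
    key: str
    value: Optional[Any] = None


def apply_changes(target: dict, log: list, last_applied: int) -> int:
    """Apply changes newer than last_applied, in log order.

    Returns the new last-applied position, so the caller can resume
    from it after an interruption.
    """
    for change in sorted(log, key=lambda c: c.lsn):
        if change.lsn <= last_applied:
            continue  # already applied; re-delivery is harmless
        if change.op in ("insert", "update"):
            target[change.key] = change.value
        elif change.op == "delete":
            target.pop(change.key, None)
        else:
            raise ValueError(f"unknown operation: {change.op}")
        last_applied = change.lsn
    return last_applied
```

Tracking the last applied position is one common design choice: it lets the target skip changes it has already seen, so the same batch of log records can be shipped again after a network failure without corrupting the copy.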
4. Data replication technology based on storage system gateway
The storage gateway sits between the server and the storage and is a dedicated storage service technology built on the SAN. It is based on storage virtualization.
Storage virtualization can be defined directly as a transparent abstraction layer of storage resources formed on top of storage devices. In other words, it is an abstraction layer between servers and storage, a logical representation of physical storage. Its main purpose is to present physical storage media as logical storage space and to consolidate scattered, complex, heterogeneous storage management into unified, simple, centralized management. Storage virtualization is the process of freeing people from the complexity of storage (read and write methods, connection methods, storage specifications or structures, and so on) and from its decentralized management.
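The idea of a logical representation of physical storage can be illustrated with a small address-mapping sketch. It assumes the simplest possible layout, in which devices are concatenated into one logical block range; real storage gateways use more elaborate mappings. The VirtualVolume class and the device names are hypothetical.

```python
from bisect import bisect_right


class VirtualVolume:
    """Presents several physical devices as one logical block range."""

    def __init__(self, devices):
        # devices: list of (device_name, size_in_blocks) pairs
        self.names = []
        self.starts = []  # first logical block served by each device
        total = 0
        for name, size in devices:
            if size <= 0:
                raise ValueError("device size must be positive")
            self.names.append(name)
            self.starts.append(total)
            total += size
        self.size = total

    def resolve(self, logical_block):
        """Map a logical block number to (device_name, physical_block)."""
        if not 0 <= logical_block < self.size:
            raise IndexError("logical block out of range")
        # Find the last device whose starting block is <= logical_block.
        index = bisect_right(self.starts, logical_block) - 1
        return self.names[index], logical_block - self.starts[index]
```

For example, a volume built from a 100-block device and a 50-block device exposes logical blocks 0 to 149, and the server addressing those blocks does not need to know which array holds each one.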
The storage gateway provides various data storage services for incoming I/O streams, offering a degree of flexibility, diversity, and heterogeneity that is difficult to achieve at the server or storage level. Storage gateways can provide remote data replication, heterogeneous storage integration, high-availability mirroring of storage devices, snapshot services, and data migration services; some can even provide precise continuous data protection and continuous data recovery for back-end storage.
Because the storage gateway offloads the replication workload from servers and arrays, it can run across a large number of server platforms and storage arrays, making it an ideal choice for disaster recovery in highly heterogeneous environments. In addition, thanks to its advantages in bandwidth optimization and fine-grained data recovery, this technology has become a mainstream disaster recovery technology.
The main point of contention for this technology is how mature its performance guarantees are. In recent years, as SANs have become widespread, heterogeneous storage devices and explosive data growth have brought management complexity, low resource utilization, and limited data service capabilities in the storage devices themselves, all of which have promoted the development and adoption of storage gateways.
5. Data replication based on storage media
Using the firmware or operating system built into the storage system, data is copied to the remote site synchronously or asynchronously over an IP network, Fibre Channel, or another transmission link, protecting production data against disaster.
The main feature of a disaster recovery solution built on storage media-based data replication is its higher requirements for network connectivity and hardware. Storage-based replication can be one-to-one, one-to-many, or many-to-one: data on one storage system can be copied to multiple remote storage systems, or data on multiple storage systems can be copied to the same remote storage system. Replication can also be bidirectional.
Storage replication is implemented as direct mirroring between disk arrays: the firmware or operating system built into the storage system copies data to the remote site synchronously or asynchronously over an IP network, Fibre Channel, or another transmission interface. Under normal circumstances, this mode requires storage system controllers of the same brand and model on both sides, and a low-latency, high-bandwidth link is also a necessary condition.
In storage array-based replication, the replication software runs on one or more storage controllers, which makes it well suited to environments with a large number of servers for the following reasons:
It is independent of the operating system; it supports Windows and Unix-based operating systems as well as mainframes (on high-end arrays); license fees are generally based on storage capacity rather than the number of connected servers; and no management work is required on the connected servers.
Because the replication work is handed over to the storage controller, excessive server performance overhead can be avoided when the local cache for asynchronous transmission is large, which makes storage array-based replication very suitable for mission-critical and high-end transaction applications.
Summary
In practice, no one type of technology is necessarily superior to another; advantage is always relative. Companies need to choose the technical route that best suits their own business scenarios. After all, the most suitable option is the best one.