The rapid development of information technology has greatly advanced human society. Online information exchange, e-commerce, office automation, automatic control, and other information technologies reduce labor intensity and make work, study, and daily life easier. As a result, data backup and disaster tolerance are receiving more and more attention. However, the industry has no uniform standard: when developing its disaster recovery strategy, each storage vendor adopts technologies that suit its own strengths and match its existing products. Because data replication is the basis of disaster tolerance, selecting an appropriate data replication architecture is very important.
Data replication is a way to distribute data: data in one system is sent over the network to one or more other systems in different geographical locations, in order to meet the needs of a growing organization, reduce the load on the master server, and improve the efficiency of data use. The process resembles the publishing of newspapers and magazines, in which information is transmitted quickly from its source to the places that receive it. For users, selecting an appropriate data replication architecture is the key to improving replication efficiency.
Currently, there are three types of data replication: storage array-based (Storage-Based), switch-based (SAN-Based), and host-based (Host-Based). The three architectures are discussed in turn below.
I. Storage array-based (Storage-Based)
A disk array combines multiple disks into an array that is used as a single disk. Data is stored in segments spread across the different disks. When data is accessed, the relevant disks in the array work together, which greatly reduces access time and improves space utilization. The different techniques a disk array uses are called RAID levels. Different levels suit different systems and applications and address data security in different ways.
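To illustrate segmented storage, the following Python sketch deals a byte string out in chunks across several in-memory "disks" and keeps one XOR parity chunk per stripe, so that a single lost disk can be rebuilt. It is a simplified model with a dedicated parity disk, not a description of how any particular controller works. The names `stripe`, `rebuild`, and `read_back` and the 4-byte chunk size are assumptions made for the example.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR a list of equal-length byte blocks together."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def stripe(data, n_data_disks, chunk=4):
    """Deal data out in chunks across the data disks, one stripe at a time.

    Returns (disks, parity): disks[i] is the list of chunks on data disk i,
    parity[r] is the XOR of the chunks in stripe r.
    """
    stripe_size = n_data_disks * chunk
    # Pad with zero bytes so the data fills a whole number of stripes.
    padded = data + b"\0" * (-len(data) % stripe_size)
    disks = [[] for _ in range(n_data_disks)]
    parity = []
    for off in range(0, len(padded), stripe_size):
        row = [padded[off + i * chunk: off + (i + 1) * chunk]
               for i in range(n_data_disks)]
        for disk, block in zip(disks, row):
            disk.append(block)
        parity.append(xor_blocks(row))
    return disks, parity


def rebuild(disks, parity, lost):
    """Recompute every chunk of one lost data disk from survivors and parity."""
    rebuilt = []
    for r, p in enumerate(parity):
        survivors = [d[r] for i, d in enumerate(disks) if i != lost]
        rebuilt.append(xor_blocks(survivors + [p]))
    return rebuilt


def read_back(disks, length):
    """Reassemble the original byte string by reading the stripes in order."""
    return b"".join(b"".join(row) for row in zip(*disks))[:length]
```

Reading a large range touches every disk at once, which is where the speed-up comes from; the parity chunk costs one extra disk of capacity in exchange for surviving a single disk failure.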
High-performance disk arrays are generally implemented in hardware: disk cache control and the disk array are combined in a controller (RAID controller) or control card to meet four requirements that users place on the disk I/O system:
(1) Increase the access speed
(2) Fault tolerance and security
(3) Effective use of disk space
(4) Balance the performance differences between CPU, memory, and disk as much as possible, to improve the overall performance of the computer.
Currently, the industry has two basic remote copy methods based on disk systems:
Synchronous PPRC remote copy: synchronous remote copy keeps a fully up-to-date copy of the data at the remote location, but the application is delayed because it must wait for each write I/O operation to complete.
Asynchronous PPRC remote copy: asynchronous remote copy has the least impact on application performance, but the data on the remote disk system lags behind the data on the local system.
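The trade-off between the two modes can be put in rough numbers. The sketch below is a simplified latency model, not a description of PPRC internals. It assumes that a synchronous write is acknowledged only after the local write, one round trip to the remote site, and the remote write, while an asynchronous write is acknowledged after the local write alone. The function names and all timings are illustrative assumptions.

```python
def round_trip_ms(distance_km, km_per_ms=200.0):
    """Propagation delay there and back.

    Assumes light in optical fibre covers roughly 200 km per millisecond
    and ignores switching and protocol overhead.
    """
    return 2 * distance_km / km_per_ms


def sync_write_latency_ms(local_ms, remote_ms, rtt_ms):
    """The host waits for the local write, the link, and the remote write."""
    return local_ms + rtt_ms + remote_ms


def async_write_latency_ms(local_ms):
    """The host waits for the local write only; the remote copy catches up later."""
    return local_ms


# Example with assumed values: sites 100 km apart, 0.5 ms per disk write.
rtt = round_trip_ms(100)
sync_ms = sync_write_latency_ms(0.5, 0.5, rtt)
async_ms = async_write_latency_ms(0.5)
print(f"synchronous: {sync_ms} ms, asynchronous: {async_ms} ms")
```

Under these assumptions the synchronous write takes four times as long as the asynchronous one, and the gap widens with distance, which is why synchronous copy is normally limited to sites that are relatively close together.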
The following describes how IBM's online storage products implement these two solutions.
Synchronous PPRC data-level disaster backup solution: IBM's PPRC provides a foundation for disaster backup. PPRC stands for Peer-to-Peer Remote Copy, a storage-based, real-time, application-independent remote data mirroring function. Synchronous PPRC is a disaster recovery solution with no data loss and full recovery.
Asynchronous PPRC data-level disaster backup solution: to improve the efficiency of the PPRC data backup solution, you can consider using IBM's FlashCopy function to implement asynchronous PPRC data backup. In asynchronous mode, PPRC can return a "write success" signal to the host even if the remote update has not yet completed. The advantage is that when the bandwidth of the data link between the master and backup data centers becomes a bottleneck, asynchronous mode does not affect the performance of the production system in the master data center.
Disadvantages:
1. Data may be lost;
2. Data consistency cannot be guaranteed if the asynchronous update fails to complete.
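Both disadvantages follow from the gap between acknowledging a write and shipping it to the remote site. The following toy simulation, written under the assumption of a simple in-order queue (it does not model PPRC or FlashCopy), shows how writes that were already acknowledged to the host can be missing from the secondary copy after a failure. The class and method names are hypothetical.

```python
from collections import deque


class AsyncReplicatedVolume:
    """Toy model: a primary copy, a secondary copy, and a queue between them."""

    def __init__(self):
        self.primary = {}
        self.secondary = {}
        self.pending = deque()  # acknowledged writes not yet applied remotely

    def write(self, block, data):
        self.primary[block] = data
        self.pending.append((block, data))
        # Acknowledged before the remote copy has been updated.
        return "write success"

    def drain(self, max_writes=None):
        """Ship queued writes to the secondary in order; return how many were sent."""
        sent = 0
        while self.pending and (max_writes is None or sent < max_writes):
            block, data = self.pending.popleft()
            self.secondary[block] = data
            sent += 1
        return sent

    def fail_primary(self):
        """Simulate losing the primary site; queued writes never arrive."""
        lost = list(self.pending)
        self.pending.clear()
        self.primary = None
        return lost


vol = AsyncReplicatedVolume()
for block, data in [(0, b"A"), (1, b"B"), (2, b"C")]:
    vol.write(block, data)
vol.drain(max_writes=2)    # the link only keeps up with two of the three writes
lost = vol.fail_primary()  # the third write was acknowledged but is now lost
print("lost writes:", lost)
print("secondary copy:", vol.secondary)
```

If the lost write belonged to a multi-block transaction whose other blocks did arrive, the secondary would also be inconsistent, not merely out of date.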
II. Switch-based (SAN-Based)
A SAN (storage area network) connects storage devices to the group of servers attached to the TCP/IP LAN through a separate network, usually a high-speed fiber network. When massive amounts of data must be accessed, the data can be transmitted between the relevant servers and the back-end storage devices over the storage area network. Simply put, a SAN is a separate network dedicated to storage.
A SAN uses Fibre Channel (FC) to share storage devices, which breaks through earlier limits on distance and capacity. Servers exchange data directly with storage devices through the storage network, which frees up valuable LAN resources.
SAN storage has six features:
(1) Data sharing for large-capacity storage devices
(2) High-speed interconnection between high-speed computers and high-speed storage devices
(3) Flexible storage device configuration
(4) Fast data backup
(5) Compatibility with earlier storage devices
(6) Improved data reliability and security
Among SAN-based data replication solutions, I would like to introduce Brocade's Tapestry DMM. It consists of Brocade's intelligent SAN application platform, the AP7420, and core software. The smallest unit that DMM replicates is the data volume, which for a storage array is a LUN; DMM can copy the LUNs on one storage array to any other storage array in the SAN. Because DMM does not work at the host layer, data replication is transparent to the host: no host control is required, no host resources are consumed, and the host is freed from the load of data replication. The operation is the same whether the host uses file systems or raw database devices.
Because the DMM hardware platform is the AP7420, the AP7420's multi-protocol support further extends DMM. DMM can replicate data not only within a single SAN but also between different SANs, using the AP7420's support for FCRS (Fibre Channel Routing Service). In the future, the FCIP protocol will make cross-WAN and remote data replication possible.
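To make volume-level replication concrete, here is a minimal sketch that copies one LUN to another block by block, with no knowledge of the file system or database stored on it. It models a LUN as an in-memory `bytearray`. The function `copy_lun`, the 512-byte block size, and the skip-unchanged-blocks shortcut are assumptions for illustration and are not taken from the DMM product.

```python
BLOCK_SIZE = 512  # assumed block size for the example


def copy_lun(source, target, block_size=BLOCK_SIZE):
    """Copy source to target block by block.

    Blocks that already match are skipped, so a repeated run only rewrites
    what changed. Returns the number of blocks actually rewritten.
    """
    if len(source) != len(target):
        raise ValueError("source and target LUNs must be the same size")
    rewritten = 0
    for off in range(0, len(source), block_size):
        block = source[off:off + block_size]
        if target[off:off + block_size] != block:
            target[off:off + block_size] = block
            rewritten += 1
    return rewritten


source_lun = bytearray(b"x" * 2048)   # four blocks of "host data"
target_lun = bytearray(2048)          # empty LUN on another array
first_pass = copy_lun(source_lun, target_lun)

source_lun[600] = ord("y")            # the host later changes one byte
second_pass = copy_lun(source_lun, target_lun)
print(first_pass, second_pass)
```

Because the copy sees only numbered blocks, the same routine serves a file system and a raw database device alike, which is the sense in which this style of replication is transparent to the host.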
III. Host-based (Host-Based)
The host-based approach to data replication and disaster tolerance works at the volume manager layer of the host and achieves disaster tolerance through disk volume mirroring or replication. The two sides do not need to use the same storage devices, which gives great flexibility. However, replication occupies some of the host's CPU resources and so has a certain impact on host performance. As a result, this approach scales poorly and its real-world performance is not very good. Host-based methods may also affect system stability and security, because they may inadvertently gain unauthorized access to protected data.
HP OpenView SM is host-based software that uses LAN and WAN links for remote data replication. SM runs on Windows 2000/NT.
With SM, data at remote offices can be replicated to a central storage center over an IP network, asynchronously, in one-to-many or many-to-one configurations. SM supports automatic failover, file-level and byte-level replication, system recovery, scheduled replication, and so on, to achieve high efficiency. It supports Microsoft clusters and includes a bandwidth planning tool and bandwidth allocation management, so that users can easily control the share of bandwidth that file replication consumes and keep applications running. SM can copy the data that any application writes to disk, producing multiple copies. Users can use any IP network, which keeps operating costs down. HP OpenView SM targets applications such as file services, print services, email services, and Web services.
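Planning the bandwidth share for replication comes down to simple arithmetic. As a rough sketch, the following estimates how long a replication job takes when it is limited to a fraction of a WAN link. The function name, the link speed, and the 25% share are made-up example values, not SM settings, and protocol overhead is ignored.

```python
def replication_time_seconds(data_bytes, link_bits_per_second, share):
    """Time to send data_bytes using only `share` (0 < share <= 1) of the link."""
    if not 0 < share <= 1:
        raise ValueError("share must be greater than 0 and at most 1")
    usable_bits_per_second = link_bits_per_second * share
    return data_bytes * 8 / usable_bits_per_second


# Assumed example: 1 GB of changed data, a 10 Mbit/s WAN link,
# and replication capped at 25% of the link.
seconds = replication_time_seconds(10**9, 10_000_000, 0.25)
print(f"{seconds:.0f} s (about {seconds / 60:.0f} minutes)")
```

A calculation like this shows whether a nightly replication window is realistic before the remaining share of the link is committed to applications.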
Comparison and summary of the three architectures
Host replication technology
A host-based implementation consumes the host's CPU, and because TCP/IP transfers data less efficiently than Fibre Channel, system performance is significantly affected. At the same time, this mode supports only specific operating systems, so when the customer has other application systems, another solution is required. This makes management complex and requires a large investment in software. Host-based replication also cannot meet a project's technical requirement of "backing up data at a certain time." In addition, the existing TCP/IP network may suffer serious packet loss and poor performance.
However, host-based replication is often used in small storage environments. Its cost is low: even when the software license fee, hardware cost, and service fee are included, the price is only a fraction of that of a high-end storage-based replication system. Its customers are therefore mostly users who are just starting out with storage or small enterprises with limited budgets, and it is mostly used within a single department.
Disk replication technology
This technology is completely transparent to the host operating system. When new operating platforms are added in the future, replication can be extended to them without any further investment in replication software. Management is therefore relatively simple, the user's investment is protected to the greatest extent, and resources are fully used. Storage-based replication generally uses ATM or Fibre Channel for the remote link and can perform synchronous as well as asynchronous replication, which ensures data consistency.
However, because the replication function is provided by the storage hardware vendor, compatibility is limited. The user must use devices from the same manufacturer, which narrows the choice and tends to raise the cost. The requirements for line bandwidth are also high. For small and medium-sized enterprises with sufficient budgets and storage environments that are not complex, storage-based technology is an appropriate choice.
Switch replication technology
As a new technology, SAN-based replication is widely used in enterprise-level applications, both for its selling points and for its design architecture. Vendors keep launching excellent SAN storage products, Fibre Channel (FC) technology has matured and gradually entered the 4 Gbit/s era, and replication across different platforms is supported. These benefits point to a bright future for SAN storage. Its price has always been high, but it offers high data replication performance and high reliability. If the required data transfer rate is high and the budget is not small, SAN-based replication remains the better choice.