Recovery
As application availability and disaster recovery capabilities become increasingly valued, more and more companies are adopting a dual-site strategy. Multi-site application availability technologies such as the Geographically Dispersed Parallel Sysplex (GDPS) are also maturing and can improve both application availability and disaster recovery capability.
All of the solutions described below assume a broadband interconnection between the primary site and the recovery site through Enterprise Systems Connection (ESCON):
Backup and Recovery
Of all IT resources, data is the most important, and also the most volatile and complex. Other resources such as processing power, vendor-supplied software, DASD and other storage devices, and buildings are ultimately replaceable, but most data is irreplaceable, and it is the data that matters most to business activity. Here we discuss the IBM disaster recovery products that can be used to manage different types of data and to set up different data backup options. This section focuses on the features relevant to disaster recovery; it does not cover all of the available features of these products.
Remote Copy
IBM's remote copy capability enables rapid and efficient disaster recovery when an application site goes down. It maintains a real-time mirrored copy of the data at the remote site and ensures that data is written to the remote site in exactly the same order as at the primary site. The solution automatically tracks the data on DASD volumes under the control of the remote copy mechanism. This tracking takes place independently of the applications that use the data, so no separate remote copy function is needed for each application.
Application performance protection, data currency options, and data independence are all part of IBM's remote copy design. Remote copy comes in two forms:
Peer-to-Peer Remote Copy (PPRC)
Extended Remote Copy (XRC)
Both PPRC and XRC protect data by maintaining a real-time copy between DASD volumes. Remote copy goes beyond dual copy in the narrow sense because it allows the secondary volume to be located at a distant site. The primary purpose of dual copy is to protect data against device failure, whereas remote copy has broader uses.
Peer-to-Peer Remote Copy (PPRC)
PPRC provides disaster recovery that maximizes data currency. PPRC is well suited to your needs if:
Avoiding data loss is your top priority
The distance between your production site and the recovery site is no more than 103 km
Your workloads and requirements can tolerate the performance penalty of synchronous copying (see the sketch below)
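To make the trade-off concrete, here is a minimal Python sketch of the synchronous mirroring idea behind PPRC. The class and method names are illustrative, not a real PPRC interface: a write completes only after both the primary and the secondary volume are updated, which is what keeps the copy current and what costs performance. The critical flag anticipates the "critical" volume attribute described in the next paragraph.

```python
# Conceptual model of synchronous remote copy (PPRC-style).
# All names are illustrative; this is not an actual PPRC interface.

class Volume:
    def __init__(self, name):
        self.name = name
        self.tracks = {}                    # track number -> data

    def write(self, track, data):
        self.tracks[track] = data


class SynchronousPair:
    """A primary/secondary volume pair kept in lockstep."""

    def __init__(self, primary, secondary, critical=False):
        self.primary = primary
        self.secondary = secondary
        self.critical = critical            # models the "critical" attribute
        self.in_sync = True

    def write(self, track, data):
        # The application's write completes only after BOTH sites are
        # updated -- this remote round trip is the synchronous penalty.
        try:
            self.secondary.write(track, data)       # remote site first
        except OSError:
            self.in_sync = False
            if self.critical:
                # A critical volume rejects the primary update too, so
                # no update can ever exist at only one site.
                raise
        self.primary.write(track, data)


pair = SynchronousPair(Volume("PRI001"), Volume("SEC001"), critical=True)
pair.write(42, b"payroll record")
assert pair.primary.tracks == pair.secondary.tracks     # always current
```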
PPRC provides two options that help preserve the currency and integrity of data. The first is to mark a volume as critical, ensuring that when the secondary volume cannot be updated, the corresponding primary update is also rejected, whether the volume pair is in sync or out of sync, even when a disaster occurs. This feature is provided through an APAR and takes effect at write time.
The second option is the recently improved system Error Recovery Procedure (ERP). When a problem occurs, the ERP logs the error before returning control to the host, and a copy of those records is kept at the remote site, so that information such as which volumes are in sync and which are not remains available even during a disaster. This capability is also provided through an APAR.
Extended Remote Copy (XRC)
Extended Remote Copy (XRC) is an asynchronous copy function that has minimal impact on the performance of most applications during normal operation. XRC maintains a copy of your production data at the remote recovery site.
XRC is better suited to you if:
Application performance during normal operation is your top priority
You can accept a small delay in the currency of the secondary site's data
If the distance between your production site and the recovery site exceeds 103 km, or if "line" problems occur, you can use CNT CopyXpress or other channel extenders to carry your XRC solution over telecommunications lines. Because XRC requires the OS/390 System Data Mover (SDM), it works only in OS/390 environments.
To ensure data integrity, the data mover is designed so that updates are applied at the remote site in the same order as at the primary site (sketched below). This is particularly important when the remote copy data is spread across several storage controllers.
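The following minimal Python sketch illustrates this asynchronous, order-preserving scheme. The names are illustrative, not the actual System Data Mover interface: writes complete as soon as the primary is updated, and a data mover later applies the timestamped updates at the remote site in timestamp order, even when they originate from several storage controllers.

```python
# Conceptual model of asynchronous remote copy (XRC-style).
# Illustrative names only; not the real System Data Mover interface.
import heapq
import itertools


class AsyncMirror:
    def __init__(self):
        self.primary = {}
        self.secondary = {}
        self._queue = []                  # (timestamp, seq, track, data)
        self._seq = itertools.count()     # tie-breaker for equal timestamps

    def write(self, timestamp, track, data):
        # The application sees the write complete as soon as the primary
        # is updated -- there is no synchronous remote round trip.
        self.primary[track] = data
        heapq.heappush(self._queue, (timestamp, next(self._seq), track, data))

    def run_data_mover(self):
        # The data mover applies updates strictly in timestamp order, so
        # the secondary always reflects a consistent point in time, even
        # when the updates come from several storage controllers.
        while self._queue:
            _, _, track, data = heapq.heappop(self._queue)
            self.secondary[track] = data


mirror = AsyncMirror()
mirror.write(timestamp=100, track=7, data=b"old balance")
mirror.write(timestamp=101, track=7, data=b"new balance")
mirror.run_data_mover()
assert mirror.secondary[7] == b"new balance"    # applied in order
```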
Both solutions automatically track the data on DASD volumes under the remote copy mechanism. Tracking is independent of the applications that use the data, so no separate remote copy function is needed for each application. Once remote copy is established on a volume, it runs transparently: when data is written to the primary DASD, it is copied to the remote DASD without any user intervention.
PPRC and XRC differ in several ways: the impact on DASD I/O performance, the currency of the data at the moment of disaster, the use of system resources, the operating distance, and operational control.
The ESS supports hardware-assisted copy features for two purposes: mirroring operations for disaster recovery solutions, and copy functions that provide instant copies of data. The StorWatch ESS Specialist Copy Services Web browser interface provides a way to establish and manage PPRC in any environment. It gives users an organized way to manage operations through control panels and graphically defined tasks.
Concurrent Copy
Concurrent copy is a feature provided by the improved DFSMS/MVS together with the IBM 3990 Model 3 and Model 6 storage controls. The RAMAC Virtual Array (RVA) and the Enterprise Storage Server (ESS), the follow-on products to the 3990 series, naturally support it as well.
Concurrent copy allows a "point-in-time" copy to be taken of data that is being updated at the same time. Updates to the database must be stopped only while the copy request is being made (a pause measured in seconds). Once the request is accepted, updates can resume, and concurrent copy creates a copy of the data as it was when the request was issued. This greatly reduces the time the database is unavailable while backup copies are made. In many cases the savings are measured in hours, and concurrent copy adds considerable flexibility to the scheduling of online and batch work in an OS/390 environment.
Before concurrent copy existed, it was often necessary to weigh physical dumps against logical dumps. During a dump, the data is unavailable to other applications. Physical dumps run faster but must be restored to a similar device. Because a dump is taken every night while restores happen only rarely, physical dumps generally reduce downtime.
Concurrent copy changes the rules. For a concurrent copy dump, the data is unavailable only until the concurrent copy request is accepted; during the actual dump, the data is available. With concurrent copy, logical dumps offer the same data availability as physical dumps; without concurrent copy, physical dumps offer better data availability than logical dumps.
DFSMSdss also provides the concurrent copy function, which is invoked by the CONCURRENT parameter in a DFSMSdss control statement. DFSMSdss can be invoked as an ordinary job step, or it can be called from a program through the DFSMSdss API. Most of the concurrent copy work is performed not by DFSMSdss but by the System Data Mover (SDM), a component of DFSMS/MVS.
After environment initialization completes, the copy begins and data updates resume. If data that has not yet been copied needs to be updated, the original data is first copied to a sidefile in the IBM DASD controller cache, and only then does the update proceed. To minimize the cache footprint, data is moved from the cache sidefile to a sidefile in an MVS data space. While copying data, DFSMSdss always checks the sidefile before the disk, so the backup contains no updates made after the copy request was accepted. The sketch below illustrates the idea.
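A minimal Python sketch may make the sidefile mechanism clearer. The names are illustrative, not the actual DFSMSdss/SDM implementation: before a track that has not yet been dumped is overwritten, its original contents go to the sidefile, and the dump always prefers the sidefile, so the backup reflects the data exactly as it was when the request was accepted.

```python
# Conceptual model of concurrent copy's sidefile (illustrative names;
# not the actual DFSMSdss/SDM implementation).

class ConcurrentCopySession:
    def __init__(self, volume):
        self.volume = volume                # track number -> data
        self.sidefile = {}                  # original data of updated tracks
        self.dumped = set()                 # tracks already backed up

    def update(self, track, data):
        # Copy-on-write: before overwriting a track that has not been
        # dumped yet, preserve its original contents in the sidefile.
        if track in self.volume and track not in self.dumped \
                and track not in self.sidefile:
            self.sidefile[track] = self.volume[track]
        self.volume[track] = data           # the application keeps running

    def dump(self):
        # The dump checks the sidefile before the disk, so it never
        # sees updates made after the copy request was accepted.
        backup = {}
        for track, data in self.volume.items():
            backup[track] = self.sidefile.get(track, data)
            self.dumped.add(track)
        return backup


session = ConcurrentCopySession({1: b"v1", 2: b"v2"})
session.update(1, b"v1-changed")            # an update arrives mid-backup
assert session.dump() == {1: b"v1", 2: b"v2"}   # point-in-time image
```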
Concurrent copy in the ESS works the same way as in the IBM 3990-6. It is initiated by the CONCURRENT keyword in DFSMSdss, or by an application that uses DFSMSdss as its copy program and calls it internally.
FlashCopy
The FlashCopy feature of the ESS lets a computing center create a copy of a logical volume or data set within a few seconds. Because establishing the FlashCopy takes only seconds, your applications need to be interrupted only very briefly, after which they continue to run. These characteristics let the computing center schedule data set backups at will and provide rapid recovery of data in the event of a disaster.
FlashCopy can only be used between disk volumes, and the target volume must reside in the same logical subsystem as the source volume. When a copy operation is established, a relationship is created between the source and target volumes. Once this relationship exists, the copy of the volume is accessible, and a background task copies all tracks from the source volume to the target volume. If the FlashCopy is established through the StorWatch ESS Specialist Copy Services, you can use the NOCOPY option to suppress this background copy task, which is useful when you need the copy only for a short time.
FlashCopy can be started by the OS/390 copy program DFSMSdss and, for volumes or LUNs configured in the ESS, through the StorWatch ESS Specialist Copy Services Web interface. FlashCopy can also be combined with other hardware-assisted features such as PPRC, letting you create a FlashCopy of a PPRC secondary volume in seconds. A conceptual sketch of the FlashCopy relationship follows.
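The following minimal Python sketch models the FlashCopy relationship described above. The names are illustrative, not the ESS interface: the target is usable immediately, reads of tracks not yet copied are redirected to the point-in-time image, and a background task, which the NOCOPY option suppresses, gradually copies the remaining tracks.

```python
# Conceptual model of a FlashCopy relationship (illustrative names;
# not the actual ESS interface).

class FlashCopyRelation:
    """Point-in-time copy: target usable at once, data copied lazily."""

    def __init__(self, source, nocopy=False):
        self.source = source          # live source volume (track -> data)
        self.target = {}              # tracks physically copied so far
        self.pending = set(source)    # tracks still to copy (a bitmap)
        self.nocopy = nocopy

    def write_source(self, track, data):
        # Copy-on-write: preserve the original track on the target
        # before the source overwrites it.
        if track in self.pending:
            self.target[track] = self.source[track]
            self.pending.discard(track)
        self.source[track] = data

    def read_target(self, track):
        # Tracks not yet copied are read through to the source.
        if track in self.pending:
            return self.source[track]
        return self.target[track]

    def background_copy_step(self):
        # The background task copies one pending track per call;
        # with NOCOPY it is suppressed entirely.
        if self.nocopy or not self.pending:
            return False
        track = self.pending.pop()
        self.target[track] = self.source[track]
        return True


source = {1: b"a", 2: b"b"}
flash = FlashCopyRelation(source)
flash.write_source(1, b"a-changed")      # the source keeps being updated
assert flash.read_target(1) == b"a"      # target still shows the copy instant
while flash.background_copy_step():
    pass                                 # relationship ends when all is copied
```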
Business Data Recovery
Because the bulk of business-related data is managed by one or more database management systems (DBMS), this section describes the recovery process at the primary site and highlights the differences in disaster recovery.
Traditionally, database recovery restores the database from a safely stored point-in-time backup (image copy) and then uses safely stored DBMS log data to roll the database forward to any desired point. For DBMS databases, "live log roll-forward" and "live remote update" solutions are also quite feasible.
If the computing center uses log data for forward recovery, the log data, together with the necessary recovery control information, must be stored securely off site. If the DBMS uses dual logging, the secondary log can be allocated on remotely attached DASD. This may be a slightly more expensive solution, but it eliminates the risk of losing log data. Alternatively, you can mirror the log remotely in real time on disk, for example with IBM's remote copy feature. The sketch below illustrates the forward-recovery idea.
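As a minimal Python sketch of forward recovery (illustrative names, not any particular DBMS's interface): recovery restores the most recent image copy and then replays the log records that follow it, which is why the log and its control information must survive the disaster.

```python
# Conceptual model of DBMS forward recovery from an image copy plus log
# (illustrative names; not a real DBMS interface).

def forward_recover(image_copy, log, copy_lsn, target_lsn=None):
    """Restore the image copy, then replay log records written after it.

    image_copy -- dict of page -> data, taken at log sequence number copy_lsn
    log        -- list of (lsn, page, data) records, in LSN order
    """
    database = dict(image_copy)           # 1. restore the point-in-time backup
    for lsn, page, data in log:           # 2. roll forward through the log
        if lsn <= copy_lsn:
            continue                      # already reflected in the image copy
        if target_lsn is not None and lsn > target_lsn:
            break                         # recover "at will" to a chosen point
        database[page] = data
    return database


image = {"acct": b"balance=100"}          # image copy taken at LSN 10
log = [(10, "acct", b"balance=100"),
       (11, "acct", b"balance=150"),
       (12, "acct", b"balance=90")]
assert forward_recover(image, log, copy_lsn=10)["acct"] == b"balance=90"
assert forward_recover(image, log, copy_lsn=10,
                       target_lsn=11)["acct"] == b"balance=150"
```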
Enterprise Systems Connection (ESCON)
ESCON greatly improves the connectivity between processors and I/O devices and among multiple processors. With ESCON, data can be transmitted at 18.6 MB per second. The maximum transmission distance depends on the type of fibre-optic cable, the interconnection components, and the control units used. Most of these distance specifications are not hard limits: exceeding them merely degrades performance, although beyond a certain point the system will stop working.
Using the ESCON Extended Distance Feature (XDF) makes channel connections adequate for many disaster recovery solutions. A high-bandwidth CTC connection is now possible between a primary site and a secondary site 60 km apart, which can be used to back up data directly from processor to processor. Both DASD and tape devices can be located up to 43 km from the primary site, allowing simple and efficient off-site copies of critical data. This means that critical data can be backed up quickly and securely, in addition to the traditional manual transport of backup data to a secure site.