Dataguard is an important solution officially recommended by Oracle for the MongoDB architecture. Currently, RAC + Local federation uard + Remote Federation uard has become a common HA architecture in the industry. Switchover and failover are the most common operation scenarios for RAC and replica uard.
Failover is an unplanned switchover action. Generally, the Primary database Primary has a hardware and software fault. If you cannot continue to provide data access services, You Need To forcibly disconnect the Primary and use Standby as the data access point. In 11g Active Data Guard, Standby is usually used as the Read Only With Apply State, providing a Read-Only Data access source, simulating the architecture of Read/write splitting. After Failover, the Primary database actually exits the Oracle HA architecture and becomes a free object. Standby becomes the new Primary after switching. This process is role switching.
Switchover does not cause data loss. Standby can accept and apply all Redo Log data. Failover is hard to say. Depending on the Protection Mode, whether a transaction is committed on the master database depends on whether the transaction is accepted on standby and whether the log data on the application is applied. Therefore, data may be lost during Failover. As a DBA, we need to consider how to protect data and reduce data loss (Gap) as much as possible when the site of the Primary site permits (because the fault of the Primary is unknown ).
This article mainly demonstrates the Failover process.
References:
Important configuration parameters of Oracle Data Guard
Configure Oracle 11g Data Guard based on the same host
Explore Oracle 11g elastic uard
Oracle Data Guard (RAC + DG) archive deletion policies and scripts
Role conversion for Oracle Data Guard
FAL gap in Oracle Data Guard logs
Oracle 11g Data Guard Error 16143 Heartbeat failed to connect to standby
1. experiment environment description
We still use ora11g and ora11gsy pairing nodes. Primary is ora11g, Standby is ora11gsy, and both versions are 11.2.0.4.
Start ora11gsy and standby.
[Oracle @ SimpleLinux ~] $ Export ORACLE_SID = ora11gsy
[Oracle @ SimpleLinux ~] $ Sqlplus/nolog
SQL * Plus: Release 11.2.0.4.0 Production on Mon Apr 21 21:27:28 2014
Copyright (c) 1982,201 3, Oracle. All rights reserved.
SQL> conn/as sysdba
Connected to an idle instance.
SQL> startup
ORACLE instance started.
Total System Global Area 372449280 bytes
Fixed Size 1364732 bytes
Variable Size 331353348 bytes
Database Buffers 33554432 bytes
Redo Buffers 6176768 bytes
Database mounted.
Database opened.
Start the apply process.
-- After the Standby end is started, the default value is Read Only.
SQL> select open_mode from v $ database;
OPEN_MODE
--------------------
READ ONLY
SQL> alter database recover managed standby database using current logfile disconnect from session;
Database altered.
SQL> select open_mode from v $ database;
OPEN_MODE
--------------------
READ ONLY WITH APPLY
Then start the Primary end.
[Oracle @ SimpleLinux ~] $ Env | grep ORACLE_SID
ORACLE_SID = ora11g
[Oracle @ SimpleLinux ~] $ Sqlplus/nolog
SQL * Plus: Release 11.2.0.4.0 Production on Tue Apr 22 15:26:29 2014
Copyright (c) 1982,201 3, Oracle. All rights reserved.
SQL> conn/as sysdba
Connected to an idle instance.
SQL> startup
ORACLE instance started.
Total System Global Area 313860096 bytes
Fixed Size 1364340 bytes
Variable Size 272633484 bytes
Database Buffers 33554432 bytes
Redo Buffers 6307840 bytes
Database mounted.
Database opened.
2. Failover Experiment
We manually simulate a Primary crash and disable it directly.
SQL> shutdown abort
ORACLE instance shut down.
In the real environment, Primary's faults are diverse and the phenomena are also diverse. The most thorough reason is that the Primary site directly loses contact and cannot be accessed. There are not many such cases, but they can also happen. Such as disk (non-redundant) damage, power failure, and natural disasters. The simplest case may be that the listener must be restarted or the instance must be stopped.
The diversity of faults means that there are various recovery opportunities. In the 11g architecture, Oracle considers that, although the Oracle database cannot be opened, it can be started to the mount state.
The importance of the Mount status lies in the fact that the control_file control file can be read at this stage, and the location and information of the archived logs and online logs can be read. This means that data recovery is most likely to be performed to avoid data loss.
In the 11g version, the manual log flush feature is introduced to make up for the problem of log data not being transferred.
SQL> startup mount
ORACLE instance started.
Total System Global Area 313860096 bytes
Fixed Size 1364340 bytes
Variable Size 272633484 bytes
Database Buffers 33554432 bytes
Redo Buffers 6307840 bytes
Database mounted.
Refresh logs:
SQL> alter system flush redo to 'ora11gsy ';
System altered.
In this case, the alert log displays information and transmits the log.
Tue Apr 22 15:31:00 2014
Resetting standby activation ID 4239920854 (0xfcb80ed6)
Tue Apr 22 15:31:00 2014
Archived Log entry 14 added for thread 1 sequence 27 ID 0xfcb80ed6 dest 1:
Media Recovery Waiting for thread 1 sequence 28
Tue Apr 22 15:31:00 2014
Standby switchover readiness check: Checking whether recoveryapplied all redo ..
Physical Standby applied all the redo from the primary.
Check the log gap. You can view the view v $ archive_gap.
SQL> select thread #, low_sequence #, high_sequence # from v $ archive_gap;
No rows selected
If no obvious gap is found, the failover will not suffer data loss. On the standby side, you must disable apply and end the application.
SQL> alter database recover managed standby database cancel;
Database altered.
SQL> alter database recover managed standby database finish;
Database altered
SQL> select open_mode, switchover_status from v $ database;
OPEN_MODE SWITCHOVER_STATUS
----------------------------------------
READ ONLY TO PRIMARY
Note: This process will not be executed frequently, and there is no way to automatically flush redo in a version like 10 Gb. The solution is to copy the logs to the Standby end from the Primary directory and manually load the logs.
For more details, please continue to read the highlights on the next page: