Oracle 11g備庫無法開啟ADG的原因分析
今天碰到一個有些奇怪的問題,但是奇怪的現象背後都是有本質的因果。
下午在做一個環境的檢查時,發現備庫是在mount階段,這可是一個11gR2的庫,沒有ADG實在是太浪費了,對於這種情況感覺太不應該了。
所以嘗試啟動至open階段,發現狀態一直是read only,在ADG中應該是READ ONLY WITH APPLY才對啊。
使用dg broker設定為READ-ONLY,備庫的資料庫日誌如下:
Standby Database: stestdb3, Enabled Physical Standby (0x02010000)
08/14/2014 16:03:28
version check on database stestdb3 detected stale metadata,
requesting update from primary database
Creating process RSM0
12/29/2015 16:28:11
Command EDIT DATABASE stestdb3 SET STATE = READ-ONLY completed
Read-Only state no longer supported
12/29/2015 16:29:10
似乎也看不出來什麼端倪。使用dg broker查看一下。發現報了下面的錯誤。
DGMGRL> show configuration;
Configuration - testdb
Protection Mode: MaxPerformance
Databases:
testdbbak93 - Primary database
stestdb3 - Physical standby database
Error: ORA-16766: Redo Apply is stopped
Fast-Start Failover: DISABLED
Configuration Status:
ERROR
查看dg broker的日誌如下:
Data Guard Broker initializing...
Data Guard Broker initialization complete
Tue Dec 29 16:47:15 2015
SMON: enabling cache recovery
No Resource Manager plan active
Physical standby database opened for read only access.
Completed: alter database open
Tue Dec 29 16:47:16 2015
idle dispatcher 'D000' terminated, pid = (18, 1)
Tue Dec 29 16:51:40 2015
Primary database is in MAXIMUM PERFORMANCE mode
RFS[3]: Assigned to RFS process 3596
RFS[3]: Selected log 7 for thread 1 sequence 72606 dbid -1549369665 branch 746558785
Tue Dec 29 16:51:41 2015
RFS[4]: Assigned to RFS process 3590
RFS[4]: Selected log 8 for thread 1 sequence 72605 dbid -1549369665 branch 746558785
Tue Dec 29 16:51:42 2015
Archived Log entry 69432 added for thread 1 sequence 72605 ID 0xa829ec3b dest 2:
從上面的情況可以很明顯看到,確實MRP沒有開始工作,只有RFS在接收歸檔。
然後使用dg broker把備庫設定為ONLINE狀態,再次查看dg broker的檢查,發現檢查就沒有問題了。
DGMGRL> show configuration;
Configuration - testdb
Protection Mode: MaxPerformance
Databases:
testdbbak93 - Primary database
stestdb3 - Physical standby database
Fast-Start Failover: DISABLED
Configuration Status:
SUCCESS
總體感覺這不是一個11g的庫。
然後再次嘗試,手工啟動到open階段,然後可以看到備庫還是READ ONLY,重啟之後問題依然存在。
對於這個問題,最好的方式也還是查看日誌,這個備庫是一年前重啟的了,慶幸的是資料庫日誌依然存在。從當時的啟動情況來看,也沒有其它的錯誤。
但是我注意到了compatible這個參數,因為在11g的庫中還是比較顯眼的。所以這個參數引起了我的好奇。
結果帶著疑問在MOS一查,果然有幾篇相關的文章,看來又碰上一個遺留問題,而且有一個相關的BUG描述。
ACTIVE DATAGUARD (ADG) NOT POSSIBLE WITH COMPATIBLE < 11.1.0.0.0 (Doc ID 1363396.1)
BUG:13032521 - ADG PHYSICAL STANDBY GOES TO MOUNT STATE INSTEAD OF READ ONLY WITH APPLY
問題基本定位後,主備庫中查看這個參數都是10.2.0.5.0
SQL> show parameter compa
NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
compatible string 10.2.0.5.0
那麼按照bug描述的WA,是設定備庫的compatible為11.1.0.7以上,這個參數的修改需要重啟執行個體,所以還是比較影響的,主庫目前是沒法重啟了。
SQL> alter system set compatible='11.2.0.3.0';
alter system set compatible='11.2.0.3.0'
*
ERROR at line 1:
ORA-02095: specified initialization parameter cannot be modified
現在備庫設定一番,先看看行不行。
SQL> alter system set compatible='11.2.0.3.0' scope=spfile;
System altered.
重啟時,可以看到備庫的資料庫日誌有下面這麼一段輸出。
Tue Dec 29 17:25:26 2015
Spfile /U01/app/Oracle/product/11.2.3/db_1/dbs/spfiletestdb.ora is in old pre-11 format and compatible >= 11.0.0; converting to new H.A.R.D. compliant format.
Completed: alter database mount
但是再次設定為ONLINE,查看資料庫狀態依舊是MOUNT
SQL> select open_mode from v$database;
OPEN_MODE
--------------------
READ ONLY
看來備庫修改還不行,主庫也得修改一致。
不過查看資料庫日誌可以看到下面的這麼一段內容,發現MRP啟動失敗。
ALTER DATABASE RECOVER MANAGED STANDBY DATABASE THROUGH ALL SWITCHOVER DISCONNECT USING CURRENT LOGFILE
Attempt to start background Managed Standby Recovery process (testdb)
Tue Dec 29 17:57:03 2015
MRP0 started with pid=29, OS id=17740
MRP0: Background Managed Standby Recovery process started (testdb)
started logmerger process
Tue Dec 29 17:57:08 2015
Managed Standby Recovery starting Real Time Apply
Parallel Media Recovery started with 16 slaves
Waiting for all non-current ORLs to be archived...
All non-current ORLs have been archived.
Media Recovery Log /U01/app/oracle/fra/StestDB3/archivelog/2015_12_29/o1_mf_1_72606_c84n0xml_.arc
Completed: ALTER DATABASE RECOVER MANAGED STANDBY DATABASE THROUGH ALL SWITCHOVER DISCONNECT USING CURRENT LOGFILE
Errors with log /U01/app/oracle/fra/StestDB3/archivelog/2015_12_29/o1_mf_1_72606_c84n0xml_.arc
MRP0: Background Media Recovery terminated with error 38800
Errors in file /U01/app/oracle/diag/rdbms/stestdb3/testdb/trace/testdb_pr00_17745.trc:
ORA-38800: Cannot start Redo Apply on the open physical standby database
Managed Standby Recovery not using Real Time Apply
Recovery interrupted!
MRP0: Background Media Recovery process shutdown (testdb)
看來這個參數變化影響確實不小,備庫先恢複正常狀態再說,等協調主庫重啟再處理了,所以開始恢複參數原有的設定。把compatible設定為10.2.0.5.0
?但是重啟的時候就開始報錯了。
SQL> alter database mount;
alter database mount
*
ERROR at line 1:
ORA-00201: control file version 11.2.0.3.0 incompatible with ORACLE version
10.2.0.5.0
ORA-00202: control file: '/U01/app/oracle/oradata/testdb/control01.ctl'
這個問題看似還有餘地,在主庫產生備庫控制檔案,傳輸過去,mount就沒有問題了
主庫:
SQL> alter database create standby controlfile as '/tmp/std1.ctl';
Database altered.
?備庫:
SQL> alter database mount standby database;
Database altered.
但是這個時候查看備庫的資料庫日誌,發現問題貌似變麻煩了。檔案頭部已經修改,已經不同步了。
ALTER DATABASE RECOVER managed standby database disconnect from session
Attempt to start background Managed Standby Recovery process (testdb)
Tue Dec 29 18:28:13 2015
MRP0 started with pid=30, OS id=24283
MRP0: Background Managed Standby Recovery process started (testdb)
started logmerger process
Tue Dec 29 18:28:18 2015
Managed Standby Recovery not using Real Time Apply
Read of datafile '/U01/app/oracle/oradata/testdb/system01.dbf' (fno 1) header failed with ORA-01130
Rereading datafile 1 header failed with ORA-01130
MRP0: Background Media Recovery terminated with error 1110
Errors in file /U01/app/oracle/diag/rdbms/stestdb3/testdb/trace/testdb_pr00_24288.trc:
ORA-01110: data file 1: '/U01/app/oracle/oradata/testdb/system01.dbf'
ORA-01122: database file 1 failed verification check
ORA-01110: data file 1: '/U01/app/oracle/oradata/testdb/system01.dbf'
ORA-01130: database file version 11.2.0.3.0 incompatible with ORACLE version 10.2.0.5.0
Slave exiting with ORA-1110 exception
Errors in file /U01/app/oracle/diag/rdbms/stestdb3/testdb/trace/testdb_pr00_24288.trc:
ORA-01110: data file 1: '/U01/app/oracle/oradata/testdb/system01.dbf'
ORA-01122: database file 1 failed verification check
ORA-01110: data file 1: '/U01/app/oracle/oradata/testdb/system01.dbf'
ORA-01130: database file version 11.2.0.3.0 incompatible with ORACLE version 10.2.0.5.0
Recovery Slave PR00 previously exited with exception 1110
MRP0: Background Media Recovery process shutdown (testdb)
Completed: ALTER DATABASE RECOVER managed standby database disconnect from session
對應的trace檔案如下:
*** 2015-12-29 18:28:18.495 4320 krsh.c
Managed Standby Recovery not using Real Time Apply
Read of datafile '/U01/app/oracle/oradata/testdb/system01.dbf' (fno 1) header failed with ORA-01130
Rereading datafile 1 header failed with ORA-01130
V10 STYLE FILE HEADER:
Compatibility Vsn = 186647296=0xb200300
Db ID=2745597631=0xa3a67ebf, Db Name='testDB'
Activation ID=0=0x0
Control Seq=1=0x1, File size=147200=0x23f00
File Number=1, Blksiz=8192, File Type=3 DATA
Tablespace #0 - SYSTEM rel_fn:1
對於這種情況,其實恢複備庫11g的控制檔案,重啟主庫 應該就可以解決了,但是重啟主庫還需要協調時間,找維護視窗,所以不是一蹴而就的事情,那麼這個期間容災是重中之重,一旦主庫出了問題,���響還是不小,所以最後的無奈之舉就是重建備庫。
當然搭建備庫還是可以採用11g的active方式。
rman target sys@xxxxx auxiliary sys@xxxx nocatalog
RMAN> duplicate target database for standby from active database nofilenamecheck;
?然後就沒有然後了,就是備庫搭建成功了,看著白忙活一場,心中像打翻了五味瓶。