Data Guard中MRP無法啟動的問題分析和解決
自己手頭有一套dataguard環境,因為也有些日子沒有用了,結果突然心血來潮準備啟動起來學習一下,突然發現在敲了命令 recover managed standby database disconnect from session之後,命令運行正常,但是後台卻報了ora錯誤。
Sat Jun 27 23:16:39 2015
Recovery Slave PR00 previously exited with exception 1157
Errors in file /u02/dg11g/diag/rdbms/dg11g/DG11G/trace/DG11G_mrp0_6514.trc:
ORA-01157: cannot identify/lock data file 7 - see DBWR trace file
ORA-01110: data file 7: '/u02/dg11g/oradata/DG11G/test_new01.dbf'
MRP0: Background Media Recovery process shutdown (DG11G)
Sat Jun 27 23:16:39 2015
Completed: ALTER DATABASE RECOVER managed standby database disconnect from session
RFS[162]: Opened log for thread 1 sequence 171 dbid 1028247664 branch 880742847
RFS[161]: Opened log for thread 1 sequence 173 dbid 1028247664 branch 880742847
RFS[160]: Opened log for thread 1 sequence 172 dbid 1028247664 branch 880742847
通過上面的日誌我們可以看到,MRP進程是在做資料恢複的時候報了ora錯誤ora-01157
但是RFS還是沒有問題,RFS主要是從主庫來傳輸歸檔檔案的,可以看到能夠正常從主庫中傳輸歸檔日誌,sequence#號為171,173,172的歸檔日誌都傳輸到了備庫。
本來這個問題沒有引起多大的關注,想可能是哪些歸檔檔案沒有用到導致的,但是發現MRP壓根用不了。所以儘管歸檔傳輸完成了,但是資料變更還是應用不到備庫。
查看v$archive_gap沒有任何記錄,說明沒有歸檔日誌apply的時候出現問題。
我們來看看這個ora問題的一些明細資訊,提示是在7號資料檔案的地方報了ora-01157錯誤。
Errors in file /u02/dg11g/diag/rdbms/dg11g/DG11G/trace/DG11G_mrp0_6514.trc:
ORA-01157: cannot identify/lock data file 7 - see DBWR trace file
ORA-01110: data file 7: '/u02/dg11g/oradata/DG11G/test_new01.dbf'
從官方對於這個問題的描述來看,似乎是資料檔案出了問題。
$ oerr ora 01157
01157, 00000, "cannot identify/lock data file %s - see DBWR trace file"
// *Cause: The background process was either unable to find one of the data
// files or failed to lock it because the file was already in use.
// The database will prohibit access to this file but other files will
// be unaffected. However the first instance to open the database will
// need to access all online data files. Accompanying error from the
// operating system describes why the file could not be identified.
// *Action: Have operating system make file available to database. Then either
// open the database or do ALTER SYSTEM CHECK DATAFILES.
因為這個環境被折騰了不知道多少遍,反覆切換,反覆測試,我都不記得是哪些特殊的操作導致了這個問題了。所以這個問題還得從頭來分析。
首先查看了一下/u02/dg11g/oradata/DG11G/test_new01.dbf 這個檔案,發現在檔案系統中竟然不存在。
但是在資料字典資訊中卻存在,使用的sql語句為,可以返回對應的記錄來。
select name,file# from v$datafile where file#=7;
從這個情況來看,可能是在備庫端誤刪除了這個資料檔案造成的。對於刪除的資料檔案我們怎麼來評估呢,首先得查看主庫,查看主庫中的檔案情況,但是在主庫中這個資料檔案和資料表空間壓根不存在。
這樣一來這個問題就有些棘手了。
如果能夠修複MRP的問題,看似這個問題就引刃而解,如果修複不了,可能這個dataguard就不可用了,可能得考慮重建一個物理備庫了。
對此我們採取保守態度,帶著一絲嘗試看看備庫能不能啟動到open read only狀態。
但是這三個操作的結果讓我有些迷茫了。
open不了,說可能需要恢複,恢複的檔案竟然是system01.dbf,嘗試recover until cancel也未果。
idle> alter database open read only;
alter database open read only
*
ERROR at line 1:
ORA-10458: standby database requires recovery
ORA-01196: file 1 is inconsistent due to a failed media recovery session
ORA-01110: data file 1: '/u02/dg11g/oradata/DG11G/system01.dbf'
idle> recover database until cancel;
ORA-00283: recovery session canceled due to errors
ORA-01610: recovery using the BACKUP CONTROLFILE option must be done
idle> alter database open read only;
alter database open read only
*
ERROR at line 1:
ORA-10458: standby database requires recovery
ORA-01196: file 1 is inconsistent due to a failed media recovery session
ORA-01110: data file 1: '/u02/dg11g/oradata/DG11G/system01.dbf'
對於這個問題,如果有一個sql語句能夠一針見血的解決問題就好了,自己在反覆嘗試之後發現還是有的,問題的解決思路就是先解決ORA-01157問題,然後dataguard中的MRP問題就能引刃而解。
對於ora-01157這個問題中的資料檔案在主庫中不存在,但是在備庫的資料字典中存在,我們可以直接在備庫中把資料字典中的問題先解決了。
idle> alter database datafile '/u02/dg11g/oradata/DG11G/test_new01.dbf' offline drop;
Database altered.
然後dataguard的日誌中就出現而來轉機,在後台會去校正這個檔案的問題,只是拋出了一個警告。Warning: Datafile 7 (/u02/ora11g/oradata/TEST11G/test_new01.dbf) is offline during full database recovery and will not be recovered
然後MRP就正常啟動了。後台開始使用歸檔檔案做資料恢複了。
alter database datafile '/u02/dg11g/oradata/DG11G/test_new01.dbf' offline drop
Completed: alter database datafile '/u02/dg11g/oradata/DG11G/test_new01.dbf' offline drop
Sat Jun 27 23:24:08 2015
ALTER DATABASE RECOVER managed standby database disconnect from session
Attempt to start background Managed Standby Recovery process (DG11G)
Sat Jun 27 23:24:08 2015
MRP0 started with pid=25, OS id=8431
MRP0: Background Managed Standby Recovery process started (DG11G)
started logmerger process
Sat Jun 27 23:24:13 2015
Managed Standby Recovery not using Real Time Apply
Parallel Media Recovery started with 2 slaves
Warning: Datafile 7 (/u02/ora11g/oradata/TEST11G/test_new01.dbf) is offline during full database recovery and will not be recovered
Waiting for all non-current ORLs to be archived...
All non-current ORLs have been archived.
Media Recovery Log /u02/dg11g/flash_recovery_area/DG11G/archivelog/1_121_880742847.dbf
Completed: ALTER DATABASE RECOVER managed standby database disconnect from session
Media Recovery Log /u02/dg11g/flash_recovery_area/DG11G/archivelog/1_122_880742847.dbf
Sat Jun 27 23:24:31 2015
Media Recovery Log /u02/dg11g/flash_recovery_area/DG11G/archivelog/1_123_880742847.dbf
Recovery deleting file #7:'/u02/dg11g/oradata/DG11G/test_new01.dbf' from controlfile.
Deleted file /u02/dg11g/oradata/DG11G/test_new01.dbf
Recovery dropped tablespace 'TEST_NEW'
Recovery created file /u02/dg11g/oradata/DG11G/test_new01.dbf
Successfully added datafile 7 to media recovery
Datafile #7: '/u02/dg11g/oradata/DG11G/test_new01.dbf'
Recovery deleting file #7:'/u02/dg11g/oradata/DG11G/test_new01.dbf' from controlfile.
Deleted file /u02/dg11g/oradata/DG11G/test_new01.dbf
Recovery dropped tablespace 'TEST_NEW'
Recovery deleting file #7:'/u02/dg11g/oradata/DG11G/test_new01.dbf' from controlfile.
Deleted file /u02/dg11g/oradata/DG11G/test_new01.dbf
Recovery dropped tablespace 'TEST_NEW'
Recovery deleting file #7:'/u02/dg11g/oradata/DG11G/test_new01.dbf' from controlfile.
Deleted file /u02/dg11g/oradata/DG11G/test_new01.dbf
Recovery dropped tablespace 'TEST_NEW'
Media Recovery Log /u02/dg11g/switchover/DG11G/archivelog/1_124_880742847.dbf
Media Recovery Log /u02/dg11g/switchover/DG11G/archivelog/1_125_880742847.dbf
Recovery deleting file #7:'/u02/dg11g/oradata/DG11G/test_new01.dbf' from controlfile.
Deleted file /u02/dg11g/oradata/DG11G/test_new01.dbf
Recovery dropped tablespace 'TEST_NEW'
Media Recovery Log /u02/dg11g/flash_recovery_area/DG11G/archivelog/1_126_880742847.dbf
Sat Jun 27 23:24:49 2015
Media Recovery Log /u02/dg11g/flash_recovery_area/DG11G/archivelog/1_127_880742847.dbf
Sat Jun 27 23:25:01 2015
Media Recovery Log /u02/dg11g/flash_recovery_area/DG11G/archivelog/1_128_880742847.dbf
Sat Jun 27 23:25:17 2015
Media Recovery Log /u02/dg11g/flash_recovery_area/DG11G/archivelog/1_129_880742847.dbf
Sat Jun 27 23:25:29 2015
比較有意思的是查看日誌可以看到,資料檔案被反覆建立刪除了很多次。最後還是以drop終止。
然後就開始使用一大堆的歸檔檔案做資料恢複了。
Sat Jun 27 23:28:30 2015
Media Recovery Log /u02/dg11g/flash_recovery_area/DG11G/archivelog/1_172_880742847.dbf
Media Recovery Log /u02/dg11g/flash_recovery_area/DG11G/archivelog/1_173_880742847.dbf
Media Recovery Log /u02/dg11g/flash_recovery_area/DG11G/archivelog/1_174_880742847.dbf
Media Recovery Log /u02/dg11g/flash_recovery_area/DG11G/archivelog/1_175_880742847.dbf
Media Recovery Log /u02/dg11g/flash_recovery_area/DG11G/archivelog/1_176_880742847.dbf
Sat Jun 27 23:28:40 2015
Media Recovery Log /u02/dg11g/flash_recovery_area/DG11G/archivelog/1_177_880742847.dbf
Media Recovery Log /u02/dg11g/flash_recovery_area/DG11G/archivelog/1_178_880742847.dbf
Media Recovery Log /u02/dg11g/flash_recovery_area/DG11G/archivelog/1_179_880742847.dbf
Media Recovery Log /u02/dg11g/flash_recovery_area/DG11G/archivelog/1_180_880742847.dbf
Media Recovery Log /u02/dg11g/flash_recovery_area/DG11G/archivelog/1_181_880742847.dbf
Media Recovery Log /u02/dg11g/flash_recovery_area/DG11G/archivelog/1_182_880742847.dbf
Sat Jun 27 23:28:52 2015
Media Recovery Log /u02/dg11g/flash_recovery_area/DG11G/archivelog/1_183_880742847.dbf
Media Recovery Log /u02/dg11g/flash_recovery_area/DG11G/archivelog/1_184_880742847.dbf
在主庫中查看,redo的序號185,備庫中的序號是184。
sys@TEST11G> select sequence#,status from v$log;
SEQUENCE# STATUS
---------- ----------------
184 INACTIVE
185 CURRENT
183 INACTIVE
在備庫中查看後台進程的情況,可以看到MRP已經記錄在冊了。
idle> select process,status,sequence# from v$managed_standby;
PROCESS STATUS SEQUENCE#
--------- ------------ ----------
ARCH CONNECTED 0
ARCH CONNECTED 0
ARCH CONNECTED 0
ARCH CONNECTED 0
MRP0 WAIT_FOR_LOG 186
--------------------------------------分割線 --------------------------------------
Oracle Data Guard 重要配置參數
基於同一主機配置 Oracle 11g Data Guard
探索Oracle之11g DataGuard
Oracle Data Guard (RAC+DG) 歸檔刪除策略及指令碼
Oracle Data Guard 的角色轉換
Oracle Data Guard的日誌FAL gap問題
Oracle 11g Data Guard Error 16143 Heartbeat failed to connect to standby 處理方法
--------------------------------------分割線 --------------------------------------