Data Guard中MRP無法啟動的問題分析和解決

來源:互聯網
上載者:User

Data Guard中MRP無法啟動的問題分析和解決

自己手頭有一套dataguard環境,因為也有些日子沒有用了,結果突然心血來潮準備啟動起來學習一下,突然發現在敲了命令 recover managed standby database disconnect from session之後,命令運行正常,但是後台卻報了ora錯誤。
Sat Jun 27 23:16:39 2015
Recovery Slave PR00 previously exited with exception 1157
 Errors in file /u02/dg11g/diag/rdbms/dg11g/DG11G/trace/DG11G_mrp0_6514.trc:
 ORA-01157: cannot identify/lock data file 7 - see DBWR trace file
 ORA-01110: data file 7: '/u02/dg11g/oradata/DG11G/test_new01.dbf'
MRP0: Background Media Recovery process shutdown (DG11G)
 Sat Jun 27 23:16:39 2015
 Completed: ALTER DATABASE RECOVER  managed standby database disconnect from session
 RFS[162]: Opened log for thread 1 sequence 171 dbid 1028247664 branch 880742847
 RFS[161]: Opened log for thread 1 sequence 173 dbid 1028247664 branch 880742847
 RFS[160]: Opened log for thread 1 sequence 172 dbid 1028247664 branch 880742847
通過上面的日誌我們可以看到,MRP進程是在做資料恢複的時候報了ora錯誤ora-01157
但是RFS還是沒有問題,RFS主要是從主庫來傳輸歸檔檔案的,可以看到能夠正常從主庫中傳輸歸檔日誌,sequence#號為171,173,172的歸檔日誌都傳輸到了備庫。

 本來這個問題沒有引起多大的關注,想可能是哪些歸檔檔案沒有用到導致的,但是發現MRP壓根用不了。所以儘管歸檔傳輸完成了,但是資料變更還是應用不到備庫。
 查看v$archive_gap沒有任何記錄,說明沒有歸檔日誌apply的時候出現問題。

 我們來看看這個ora問題的一些明細資訊,提示是在7號資料檔案的地方報了ora-01157錯誤。
Errors in file /u02/dg11g/diag/rdbms/dg11g/DG11G/trace/DG11G_mrp0_6514.trc:
ORA-01157: cannot identify/lock data file 7 - see DBWR trace file
ORA-01110: data file 7: '/u02/dg11g/oradata/DG11G/test_new01.dbf'
從官方對於這個問題的描述來看,似乎是資料檔案出了問題。
$ oerr ora 01157
 01157, 00000, "cannot identify/lock data file %s - see DBWR trace file"
 // *Cause:  The background process was either unable to find one of the data
 //        files or failed to lock it because the file was already in use.
 //        The database will prohibit access to this file but other files will
 //        be unaffected. However the first instance to open the database will
 //        need to access all online data files. Accompanying error from the
 //        operating system describes why the file could not be identified.
 // *Action: Have operating system make file available to database. Then either
 //        open the database or do ALTER SYSTEM CHECK DATAFILES.
因為這個環境被折騰了不知道多少遍,反覆切換,反覆測試,我都不記得是哪些特殊的操作導致了這個問題了。所以這個問題還得從頭來分析。
 首先查看了一下/u02/dg11g/oradata/DG11G/test_new01.dbf 這個檔案,發現在檔案系統中竟然不存在。
 但是在資料字典資訊中卻存在,使用的sql語句為,可以返回對應的記錄來。
select name,file# from v$datafile where file#=7;

從這個情況來看,可能是在備庫端誤刪除了這個資料檔案造成的。對於刪除的資料檔案我們怎麼來評估呢,首先得查看主庫,查看主庫中的檔案情況,但是在主庫中這個資料檔案和資料表空間壓根不存在。
 這樣一來這個問題就有些棘手了。
如果能夠修複MRP的問題,看似這個問題就引刃而解,如果修複不了,可能這個dataguard就不可用了,可能得考慮重建一個物理備庫了。
 對此我們採取保守態度,帶著一絲嘗試看看備庫能不能啟動到open read only狀態。
 但是這三個操作的結果讓我有些迷茫了。
open不了,說可能需要恢複,恢複的檔案竟然是system01.dbf,嘗試recover until cancel也未果。
idle> alter database open read only;
 alter database open read only
 *
 ERROR at line 1:
 ORA-10458: standby database requires recovery
 ORA-01196: file 1 is inconsistent due to a failed media recovery session
 ORA-01110: data file 1: '/u02/dg11g/oradata/DG11G/system01.dbf'

 idle> recover database until cancel;
 ORA-00283: recovery session canceled due to errors
 ORA-01610: recovery using the BACKUP CONTROLFILE option must be done

 idle> alter database open read only;
 alter database open read only
 *
 ERROR at line 1:
 ORA-10458: standby database requires recovery
 ORA-01196: file 1 is inconsistent due to a failed media recovery session
 ORA-01110: data file 1: '/u02/dg11g/oradata/DG11G/system01.dbf'

 

對於這個問題,如果有一個sql語句能夠一針見血的解決問題就好了,自己在反覆嘗試之後發現還是有的,問題的解決思路就是先解決ORA-01157問題,然後dataguard中的MRP問題就能引刃而解。
 對於ora-01157這個問題中的資料檔案在主庫中不存在,但是在備庫的資料字典中存在,我們可以直接在備庫中把資料字典中的問題先解決了。
idle> alter database datafile '/u02/dg11g/oradata/DG11G/test_new01.dbf' offline drop;
 Database altered.
然後dataguard的日誌中就出現而來轉機,在後台會去校正這個檔案的問題,只是拋出了一個警告。Warning: Datafile 7 (/u02/ora11g/oradata/TEST11G/test_new01.dbf) is offline during full database recovery and will not be recovered
然後MRP就正常啟動了。後台開始使用歸檔檔案做資料恢複了。

alter database datafile '/u02/dg11g/oradata/DG11G/test_new01.dbf' offline drop
 Completed: alter database datafile '/u02/dg11g/oradata/DG11G/test_new01.dbf' offline drop
 Sat Jun 27 23:24:08 2015
 ALTER DATABASE RECOVER  managed standby database disconnect from session
 Attempt to start background Managed Standby Recovery process (DG11G)
 Sat Jun 27 23:24:08 2015
 MRP0 started with pid=25, OS id=8431
 MRP0: Background Managed Standby Recovery process started (DG11G)
  started logmerger process
 Sat Jun 27 23:24:13 2015
 Managed Standby Recovery not using Real Time Apply
 Parallel Media Recovery started with 2 slaves
 Warning: Datafile 7 (/u02/ora11g/oradata/TEST11G/test_new01.dbf) is offline during full database recovery and will not be recovered
 Waiting for all non-current ORLs to be archived...
 All non-current ORLs have been archived.
Media Recovery Log /u02/dg11g/flash_recovery_area/DG11G/archivelog/1_121_880742847.dbf
 Completed: ALTER DATABASE RECOVER  managed standby database disconnect from session
 Media Recovery Log /u02/dg11g/flash_recovery_area/DG11G/archivelog/1_122_880742847.dbf
 Sat Jun 27 23:24:31 2015
 Media Recovery Log /u02/dg11g/flash_recovery_area/DG11G/archivelog/1_123_880742847.dbf
 Recovery deleting file #7:'/u02/dg11g/oradata/DG11G/test_new01.dbf' from controlfile.
 Deleted file /u02/dg11g/oradata/DG11G/test_new01.dbf
 Recovery dropped tablespace 'TEST_NEW'
 Recovery created file /u02/dg11g/oradata/DG11G/test_new01.dbf
 Successfully added datafile 7 to media recovery
 Datafile #7: '/u02/dg11g/oradata/DG11G/test_new01.dbf'
 Recovery deleting file #7:'/u02/dg11g/oradata/DG11G/test_new01.dbf' from controlfile.
 Deleted file /u02/dg11g/oradata/DG11G/test_new01.dbf
 Recovery dropped tablespace 'TEST_NEW'
 Recovery deleting file #7:'/u02/dg11g/oradata/DG11G/test_new01.dbf' from controlfile.
 Deleted file /u02/dg11g/oradata/DG11G/test_new01.dbf
 Recovery dropped tablespace 'TEST_NEW'
 Recovery deleting file #7:'/u02/dg11g/oradata/DG11G/test_new01.dbf' from controlfile.
 Deleted file /u02/dg11g/oradata/DG11G/test_new01.dbf
 Recovery dropped tablespace 'TEST_NEW'
Media Recovery Log /u02/dg11g/switchover/DG11G/archivelog/1_124_880742847.dbf
Media Recovery Log /u02/dg11g/switchover/DG11G/archivelog/1_125_880742847.dbf
 Recovery deleting file #7:'/u02/dg11g/oradata/DG11G/test_new01.dbf' from controlfile.
 Deleted file /u02/dg11g/oradata/DG11G/test_new01.dbf
 Recovery dropped tablespace 'TEST_NEW'
Media Recovery Log /u02/dg11g/flash_recovery_area/DG11G/archivelog/1_126_880742847.dbf
Sat Jun 27 23:24:49 2015
 Media Recovery Log /u02/dg11g/flash_recovery_area/DG11G/archivelog/1_127_880742847.dbf
 Sat Jun 27 23:25:01 2015
 Media Recovery Log /u02/dg11g/flash_recovery_area/DG11G/archivelog/1_128_880742847.dbf
 Sat Jun 27 23:25:17 2015
 Media Recovery Log /u02/dg11g/flash_recovery_area/DG11G/archivelog/1_129_880742847.dbf
 Sat Jun 27 23:25:29 2015

比較有意思的是查看日誌可以看到,資料檔案被反覆建立刪除了很多次。最後還是以drop終止。
 然後就開始使用一大堆的歸檔檔案做資料恢複了。

Sat Jun 27 23:28:30 2015
 Media Recovery Log /u02/dg11g/flash_recovery_area/DG11G/archivelog/1_172_880742847.dbf
 Media Recovery Log /u02/dg11g/flash_recovery_area/DG11G/archivelog/1_173_880742847.dbf
 Media Recovery Log /u02/dg11g/flash_recovery_area/DG11G/archivelog/1_174_880742847.dbf
 Media Recovery Log /u02/dg11g/flash_recovery_area/DG11G/archivelog/1_175_880742847.dbf
 Media Recovery Log /u02/dg11g/flash_recovery_area/DG11G/archivelog/1_176_880742847.dbf
 Sat Jun 27 23:28:40 2015
 Media Recovery Log /u02/dg11g/flash_recovery_area/DG11G/archivelog/1_177_880742847.dbf
 Media Recovery Log /u02/dg11g/flash_recovery_area/DG11G/archivelog/1_178_880742847.dbf
 Media Recovery Log /u02/dg11g/flash_recovery_area/DG11G/archivelog/1_179_880742847.dbf
 Media Recovery Log /u02/dg11g/flash_recovery_area/DG11G/archivelog/1_180_880742847.dbf
 Media Recovery Log /u02/dg11g/flash_recovery_area/DG11G/archivelog/1_181_880742847.dbf
 Media Recovery Log /u02/dg11g/flash_recovery_area/DG11G/archivelog/1_182_880742847.dbf
 Sat Jun 27 23:28:52 2015
 Media Recovery Log /u02/dg11g/flash_recovery_area/DG11G/archivelog/1_183_880742847.dbf
 Media Recovery Log /u02/dg11g/flash_recovery_area/DG11G/archivelog/1_184_880742847.dbf

在主庫中查看,redo的序號185,備庫中的序號是184。
sys@TEST11G> select sequence#,status from v$log;
  SEQUENCE# STATUS
 ---------- ----------------
        184 INACTIVE
        185 CURRENT
        183 INACTIVE


在備庫中查看後台進程的情況,可以看到MRP已經記錄在冊了。
idle> select process,status,sequence# from v$managed_standby;
 PROCESS  STATUS        SEQUENCE#
 --------- ------------ ----------
 ARCH      CONNECTED            0
 ARCH      CONNECTED            0
 ARCH      CONNECTED            0
 ARCH      CONNECTED            0
MRP0      WAIT_FOR_LOG        186

--------------------------------------分割線 --------------------------------------

Oracle Data Guard 重要配置參數

基於同一主機配置 Oracle 11g Data Guard

探索Oracle之11g DataGuard

Oracle Data Guard (RAC+DG) 歸檔刪除策略及指令碼

Oracle Data Guard 的角色轉換

Oracle Data Guard的日誌FAL gap問題

Oracle 11g Data Guard Error 16143 Heartbeat failed to connect to standby 處理方法

--------------------------------------分割線 --------------------------------------

相關文章

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.