誤刪重做記錄檔組導致啟動資料庫報錯ORA-03113
聽不止一個人說起,學技術就要不斷地折騰、搞破壞,比如說備份恢複,你就可以嘗試刪檔案,不管是記錄檔、臨時檔案、資料檔案還是system檔案。刪了之後,重啟資料庫肯定報錯,有的甚至當時資料庫就掛掉,這樣你就可以學著恢複,一破一立之間,很多常規的備份恢複手段也就算是領教了。我今天就嘗試著把虛擬機器上的一個重做記錄檔組刪除。
1.環境準備
我們在Oracle11g中進行測試,資料庫處於非歸檔狀態。
SQL>
SQL> select * from v$version;
BANNER
--------------------------------------------------------------------------------
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - Production
PL/SQL Release 11.2.0.3.0 - Production
CORE 11.2.0.3.0 Production
TNS for Linux: Version 11.2.0.3.0 - Production
NLSRTL Version 11.2.0.3.0 - Production
SQL>
SQL> archive log list;
Database log mode No Archive Mode
Automatic archival Disabled
Archive destination /u01/app/oracle/product/11.2.0/dbhome_1/dbs/arch
Oldest online log sequence 9
Current log sequence 11
SQL>
2.刪除一個重做記錄檔組,重啟資料庫報錯
首先,通過查詢v$log視圖來擷取資料庫重做記錄檔組的狀態。
SQL> select GROUP#,MEMBERS,ARCHIVED,STATUS from v$log;
GROUP# MEMBERS ARC STATUS
---------- ---------- --- ----------------
1 1 NO INACTIVE
2 1 NO CURRENT
3 1 NO INACTIVE
SQL>
然後,通過ls命令查看資料檔案,刪除第一個重做記錄檔組(該檔案組只有一個日誌成員)。
[oracle@ hoegh HOEGH]$ ls
control01.ctl redo01.log sysaux01.dbf undotbs01.dbf
control02.ctl redo02.log system01.dbf users01.dbf
example01.dbf redo03.log temp01.dbf
[oracle@hoegh HOEGH]$
[oracle@hoegh HOEGH]$
[oracle@hoegh HOEGH]$ rm redo01.log
[oracle@hoegh HOEGH]$ ls
control01.ctl control02.ctl example01.dbf redo02.log redo03.log sysaux01.dbf system01.dbf temp01.dbf undotbs01.dbf users01.dbf
重啟資料庫,資料庫報錯。
SQL>
SQL> shu immediate
Database closed.
Database dismounted.
ORACLE instance shut down.
SQL> startup
ORACLE instance started.
Total System Global Area 941600768 bytes
Fixed Size 1348860 bytes
Variable Size 515902212 bytes
Database Buffers 419430400 bytes
Redo Buffers 4919296 bytes
Database mounted.
ORA-03113: end-of-file on communication channel
Process ID: 5196
Session ID: 125 Serial number: 5
SQL>
SQL> select status from v$instance;
ERROR:
ORA-03114: not connected to ORACLE
SQL>
3.查看警示記錄檔,定位問題
ORA-03113報錯是一個非常經典的報錯,報錯原因多種多樣,從報錯資訊中並看不出是什麼原因導致的報錯,我們可以到警示記錄檔中查看有價值的線索。
[oracle@enmoedu1 trace]$ tail -40 alert_HOEGH.log
Wed Jul 08 21:59:30 2015
MMON started with pid=15, OS id=5443
Wed Jul 08 21:59:30 2015
MMNL started with pid=16, OS id=5445
starting up 1 dispatcher(s) for network address \'(ADDRESS=(PARTIAL=YES)(PROTOCOL=TCP))\'...
starting up 1 shared server(s) ...
ORACLE_BASE from environment = /u01/app/oracle
Wed Jul 08 21:59:39 2015
alter database mount
Wed Jul 08 21:59:43 2015
Successful mount of redo thread 1, with mount id 2105928075
Database mounted in Exclusive Mode
Lost write protection disabled
Completed: alter database mount
Wed Jul 08 22:11:45 2015
Time drift detected. Please check VKTM trace file for more details.
Wed Jul 08 22:11:59 2015
alter database open
Wed Jul 08 22:11:59 2015
Errors in file /u01/app/oracle/diag/rdbms/HOEGH/HOEGH/trace/HOEGH_lgwr_5435.trc:
ORA-00313: open failed for members of log group 1 of thread 1
ORA-00312: online log 1 thread 1: \'/u01/app/oracle/oradata/HOEGH/redo01.log\'
ORA-27037: unable to obtain file status
Linux Error: 2: No such file or directory
Additional information: 3
Errors in file /u01/app/oracle/diag/rdbms/HOEGH/HOEGH/trace/HOEGH_lgwr_5435.trc:
ORA-00313: open failed for members of log group 1 of thread 1
ORA-00312: online log 1 thread 1: \'/u01/app/oracle/oradata/HOEGH/redo01.log\'
ORA-27037: unable to obtain file status
Linux Error: 2: No such file or directory
Additional information: 3
Errors in file /u01/app/oracle/diag/rdbms/HOEGH/HOEGH/trace/HOEGH_ora_5451.trc:
ORA-00313: open failed for members of log group 1 of thread
ORA-00312: online log 1 thread 1: \'/u01/app/oracle/oradata/HOEGH/redo01.log\'
USER (ospid: 5451): terminating the instance due to error 313
Wed Jul 08 22:12:00 2015
System state dump requested by (instance=1, osid=5451), summary=[abnormal instance termination].
System State dumped to trace file /u01/app/oracle/diag/rdbms/HOEGH/HOEGH/trace/HOEGH_diag_5425.trc
Dumping diagnostic data in directory=[cdmp_20150708221200], requested by (instance=1, osid=5451), summary=[abnormal instance termination].
Instance terminated by USER, pid = 5451
其中,黃色標註部分為關鍵資訊,我們知道“/u01/app/oracle/oradata/HOEGH/redo01.log”這個檔案找不到了。
4.啟動資料庫到mount狀態,重建重做記錄檔組
從警示日誌可以看出,第一組重做記錄檔組丟了,我們可以通過sql語句“alter database clear logfile group 1;”重建記錄檔組;確認記錄檔建立成功後,將資料庫切換到open狀態。
SQL> startup nomount
ORACLE instance started.
Total System Global Area 941600768 bytes
Fixed Size 1348860 bytes
Variable Size 515902212 bytes
Database Buffers 419430400 bytes
Redo Buffers 4919296 bytes
SQL> alter database mount;
Database altered.
SQL> select group#,sequence#,archived,status from v$log;
GROUP# SEQUENCE# ARC STATUS
---------- ---------- --- ----------------
1 7 NO INACTIVE
3 6 NO INACTIVE
2 8 NO CURRENT
SQL>
SQL>
SQL>
SQL>
SQL> alter database clear logfile group 1;
Database altered.
SQL> select group#,sequence#,archived,status from v$log;
GROUP# SEQUENCE# ARC STATUS
---------- ---------- --- ----------------
1 0 NO UNUSED
3 6 NO INACTIVE
2 8 NO CURRENT
啟動資料庫到open狀態
SQL>
SQL> alter database open;
Database altered.
SQL>
SQL> select group#,sequence#,archived,status from v$log;
GROUP# SEQUENCE# ARC STATUS
---------- ---------- --- ----------------
1 0 NO UNUSED
2 8 NO CURRENT
3 6 NO INACTIVE
此時我們再次查看檔案清單,結果如下。
[oracle@hoegh HOEGH]$ ls
control01.ctl control02.ctl example01.dbf redo01.log redo02.log redo03.log sysaux01.dbf system01.dbf temp01.dbf undotbs01.dbf users01.dbf
[oracle@hoegh HOEGH]$
5.手動切換重做記錄檔組
為了確保建立的記錄檔組可用,我們可以手動切換記錄檔組,改變建立記錄檔組的狀態(由UNUSED改為其他)。
SQL>
SQL> alter system switch logfile;
System altered.
SQL> select group#,sequence#,archived,status from v$log;
GROUP# SEQUENCE# ARC STATUS
---------- ---------- --- ----------------
1 9 NO CURRENT
2 8 NO ACTIVE
3 6 NO INACTIVE
SQL> alter system switch logfile;
System altered.
SQL> select group#,sequence#,archived,status from v$log;
GROUP# SEQUENCE# ARC STATUS
---------- ---------- --- ----------------
1 9 NO ACTIVE
2 8 NO ACTIVE
3 10 NO CURRENT
SQL> alter system switch logfile;
System altered.
SQL> select group#,sequence#,archived,status from v$log;
GROUP# SEQUENCE# ARC STATUS
---------- ---------- --- ----------------
1 9 NO INACTIVE
2 11 NO CURRENT
3 10 NO INACTIVE
SQL>
其中,
current:表示該日誌組為當前日誌組,oracle正在使用該日誌組;
active:當current redo組發生日誌切換時,狀態會改變為active,在這個狀態下,如果資料庫為歸檔模式,archive進程會歸檔active日誌組;如果發生資料庫crash,該日誌組也是執行個體恢複必需的日誌組;
inactive:當active日誌組歸檔完畢並且oracle判斷不需要進行執行個體恢複時,會將其狀態修改為inactive,等待下一輪的使用;所以當日誌組為inactive的時候,如果資料庫為歸檔模式.那麼日誌肯定是歸檔完成了。