ORA-19571 encountered during RMAN backup
An RMAN backup failed with ORA-19571, which aborted the backup job. The full error stack was:
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of backup plus archivelog command at 07/10/2015 16:18:43
RMAN-03009: failure of backup command on ORA_DISK_1 channel at 07/10/2015 16:18:24
ORA-19571: archived-log recid 85421 stamp 650564644 not found in control file
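The recid and stamp in the message identify the missing record. As a minimal Python sketch, they can be pulled out of an RMAN log and turned into a v$archived_log lookup. The helper names are hypothetical, and the regular expression assumes the exact message wording shown above.

```python
import re

# Assumes the ORA-19571 wording shown in the stack above.
ORA_19571 = re.compile(
    r"ORA-19571: archived-log recid (?P<recid>\d+) stamp (?P<stamp>\d+) not found in control file"
)


def parse_ora_19571(text):
    """Return (recid, stamp) from an RMAN error stack, or None if ORA-19571 is absent."""
    m = ORA_19571.search(text)
    if m is None:
        return None
    return int(m.group("recid")), int(m.group("stamp"))


def lookup_sql(recid, stamp):
    """Build a v$archived_log query for the record RMAN could not find."""
    return (
        "select recid, stamp, sequence#, archived, status, completion_time "
        "from v$archived_log "
        f"where recid = {recid} and stamp = {stamp};"
    )


stack = "ORA-19571: archived-log recid 85421 stamp 650564644 not found in control file"
print(parse_ora_19571(stack))  # (85421, 650564644)
print(lookup_sql(*parse_ora_19571(stack)))
```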
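The error message is clear: the backup failed because the record for one archived log could not be found in the control file, which looks as if the record had been overwritten. Control file records fall into two categories: circular-reuse records and non-circular-reuse records (such as datafile and online redo log entries). When every slot in a circular-reuse section is in use, the oldest record can be overwritten, but only once it is older than control_file_record_keep_time; otherwise the section is expanded. So the first step is to check this parameter's setting.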
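The reuse rule can be illustrated with a toy model. This is a sketch under simplifying assumptions, not Oracle's implementation: a section has a fixed number of slots, the oldest record is overwritten only when it is at least keep_days old, and otherwise the section simply doubles (the real growth step is chosen by Oracle). The class and its names are hypothetical.

```python
from datetime import datetime, timedelta


class CircularSection:
    """Toy model of a circular-reuse control file section (hypothetical, simplified)."""

    def __init__(self, slots, keep_days):
        self.slots = slots                      # current section capacity
        self.keep = timedelta(days=keep_days)   # stands in for control_file_record_keep_time
        self.records = []                       # (recid, created_at), oldest first
        self.next_recid = 1
        self.expansions = []                    # (old_slots, new_slots) for each growth

    def add(self, now):
        """Add a record at time `now`, reusing or expanding as the rule dictates."""
        if len(self.records) == self.slots:
            oldest_created = self.records[0][1]
            if now - oldest_created >= self.keep:
                self.records.pop(0)             # oldest is past keep time: overwrite it
            else:
                old = self.slots
                self.slots *= 2                 # assumed growth step, for illustration only
                self.expansions.append((old, self.slots))
        self.records.append((self.next_recid, now))
        self.next_recid += 1


t0 = datetime(2015, 7, 1)

# keep time 15 days, one record per hour: the section must grow.
busy = CircularSection(slots=4, keep_days=15)
for h in range(5):
    busy.add(t0 + timedelta(hours=h))
print(busy.slots, busy.expansions)   # 8 [(4, 8)]

# keep time 2 days, one record per day: old records get overwritten.
quiet = CircularSection(slots=4, keep_days=2)
for d in range(10):
    quiet.add(t0 + timedelta(days=d))
print([recid for recid, _ in quiet.records])   # [7, 8, 9, 10]
```

With a 15-day keep time, a record that is only hours or days old forces the section to expand rather than be overwritten.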
SQL> show parameter control_file
NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
control_file_record_keep_time integer 15
The parameter is set to 15 days. Next, check when the archived log named in the error was generated:
SQL> select recid,SEQUENCE#,ARCHIVED,STATUS,COMPLETION_TIME from v$archived_log where recid = 125609;
RECID SEQUENCE# ARC S COMPLETION_TIME
---------- ---------- --- - -------------------
125609 885421 YES A 2015-07-08 02:05:59
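The retention check is plain date arithmetic: compare the log's COMPLETION_TIME with the failure time in the RMAN stack (07/10/2015 16:18:43). A short Python sketch; the function name is hypothetical and it encodes only the "older than keep time" rule.

```python
from datetime import datetime, timedelta

KEEP_DAYS = 15  # control_file_record_keep_time from the parameter check


def eligible_for_reuse(completion_time, now, keep_days=KEEP_DAYS):
    """True once a circular-reuse record is at least keep_days old."""
    return now - completion_time >= timedelta(days=keep_days)


completed = datetime(2015, 7, 8, 2, 5, 59)     # COMPLETION_TIME of the archived log
failed_at = datetime(2015, 7, 10, 16, 18, 43)  # time of RMAN-03002
age = failed_at - completed
print(age)                                       # 2 days, 14:12:44
print(eligible_for_reuse(completed, failed_at))  # False
```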
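The output shows that this archived log was generated only two days earlier, well within the 15-day retention, so its record should not have aged out of the control file. To get the backup done in time and avoid affecting the next day's business, the archived logs were first registered in the control file manually: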
RMAN>catalog start with '/arch1/';
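On one of the nodes this command reported that no files were found to catalog, so each archived log was registered individually with the following commands: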
run{
catalog archivelog '/arch1/xxx_1_47849_801075830.dbf';
catalog archivelog '/arch1/xxx_1_47850_801075830.dbf';
... (remaining entries omitted)
catalog archivelog '/arch1/xxx_1_47854_801075830.dbf';
catalog archivelog '/arch1/xxx_1_47855_801075830.dbf';
}
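When many logs need registering, the per-file commands can be generated rather than typed. A sketch assuming file names follow the <prefix>_<thread>_<sequence>_<suffix>.dbf pattern seen above and that every sequence in the chosen range actually exists on disk; the function name is hypothetical.

```python
def catalog_script(directory, prefix, thread, first_seq, last_seq, suffix):
    """Return an RMAN run{} block with one CATALOG ARCHIVELOG line per sequence."""
    lines = ["run{"]
    for seq in range(first_seq, last_seq + 1):
        # Assumed naming pattern: <prefix>_<thread>_<sequence>_<suffix>.dbf
        path = f"{directory.rstrip('/')}/{prefix}_{thread}_{seq}_{suffix}.dbf"
        lines.append(f"catalog archivelog '{path}';")
    lines.append("}")
    return "\n".join(lines)


script = catalog_script("/arch1/", "xxx", 1, 47849, 47855, "801075830")
print(script)
```

The printed block can be saved to a file and run with rman target / cmdfile=<file>.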
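After the manual registration the backup succeeded. The remaining question is why the archived log information was not correctly kept in the control file, which the following analysis addresses.
First, the database alert log was checked; the following warnings appeared before the backup job failed: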
WARNING: You are creating/reusing datafile /dev/rlvims_control1.
WARNING: Oracle recommends creating new datafiles on devices with zero offset. The command "/usr/sbin/mklv -y LVname -T O -w n -s n -r n VGname NumPPs" can be used. Please contact Oracle customer support for more details.
WARNING: You are creating/reusing datafile /dev/rlvims_control1.
WARNING: Oracle recommends creating new datafiles on devices with zero offset. The command "/usr/sbin/mklv -y LVname -T O -w n -s n -r n VGname NumPPs" can be used. Please contact Oracle customer support for more details.
WARNING: You are creating/reusing datafile /dev/rlvims_control2.
WARNING: Oracle recommends creating new datafiles on devices with zero offset. The command "/usr/sbin/mklv -y LVname -T O -w n -s n -r n VGname NumPPs" can be used. Please contact Oracle customer support for more details.
WARNING: You are creating/reusing datafile /dev/rlvims_control2.
WARNING: Oracle recommends creating new datafiles on devices with zero offset. The command "/usr/sbin/mklv -y LVname -T O -w n -s n -r n VGname NumPPs" can be used. Please contact Oracle customer support for more details.
WARNING: You are creating/reusing datafile /dev/rlvims_control3.
WARNING: Oracle recommends creating new datafiles on devices with zero offset. The command "/usr/sbin/mklv -y LVname -T O -w n -s n -r n VGname NumPPs" can be used. Please contact Oracle customer support for more details.
WARNING: You are creating/reusing datafile /dev/rlvims_control3.
WARNING: Oracle recommends creating new datafiles on devices with zero offset. The command "/usr/sbin/mklv -y LVname -T O -w n -s n -r n VGname NumPPs" can be used. Please contact Oracle customer support for more details.
Expanded controlfile section 28 from 564 to 1128 records
Requested to grow by 564 records; added 3 blocks of records
This warning is raised on AIX when database files are not placed on zero-offset raw logical volumes. To verify it:
xxx1>dbfsize /dev/rlvims_control1
Database file: /dev/rlvims_control1
Database file type: raw device
Database file size: 1866 16384 byte blocks
xxx1>dbfsize /dev/rlvims_control2
Database file: /dev/rlvims_control2
Database file type: raw device
Database file size: 1866 16384 byte blocks
xxx1>dbfsize /dev/rlvims_control3
Database file: /dev/rlvims_control3
Database file type: raw device
Database file size: 1866 16384 byte blocks
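This confirms that the devices holding the control files do have an offset; a device without an offset would be reported as "Database file type: raw device without 4K starting offset". On AIX, the volume group type determines the LV layout: an original volume group reserves the first 4 KB of each logical volume for the logical volume control block (LVCB), so data is written at a 4 KB offset; a big volume group can avoid the offset if a parameter is specified when the raw device is created; a scalable volume group has no offset. Therefore, on systems that support scalable volume groups, scalable volume groups should always be used.
Next, confirm the volume group type from the lsvg output for vgims: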
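Checking several devices by eye is error-prone, so the dbfsize output can be parsed instead. A Python sketch that assumes the three-line output format shown above and, following the rule just described, treats a plain "raw device" type as having the 4 KB offset; the function name is hypothetical.

```python
import re


def parse_dbfsize(output):
    """Return (file_type, blocks, block_size, has_4k_offset) from dbfsize output."""
    ftype = re.search(r"Database file type:\s*(.+)", output).group(1).strip()
    size = re.search(r"Database file size:\s*(\d+)\s+(\d+)\s+byte blocks", output)
    blocks, block_size = int(size.group(1)), int(size.group(2))
    # Assumption from the text: only zero-offset devices add
    # "without 4K starting offset" to the file type.
    has_offset = ftype.startswith("raw device") and "without 4K starting offset" not in ftype
    return ftype, blocks, block_size, has_offset


sample = """Database file: /dev/rlvims_control1
Database file type: raw device
Database file size: 1866 16384 byte blocks"""

ftype, blocks, block_size, has_offset = parse_dbfsize(sample)
print(has_offset, blocks * block_size)   # True 30572544
```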
VOLUME GROUP: vgims VG IDENTIFIER: 00f7614c00004c000000013b4a54f3b1
VG STATE: active PP SIZE: 256 megabyte(s)
VG PERMISSION: read/write TOTAL PPs: 3388 (867328 megabytes)
MAX LVs: 512 FREE PPs: 488 (124928 megabytes)
LVs: 68 USED PPs: 2900 (742400 megabytes)
OPEN LVs: 66 QUORUM: 7 (Enabled)
TOTAL PVs: 12 VG DESCRIPTORS: 12
STALE PVs: 0 STALE PPs: 0
ACTIVE PVs: 12 AUTO ON: no
Concurrent: Enhanced-Capable Auto-Concurrent: Disabled
VG Mode: Concurrent
Node ID: 1 Active Nodes: 2
MAX PPs per VG: 130048
MAX PPs per PV: 1016 MAX PVs: 128
LTG size (Dynamic): 256 kilobyte(s) AUTO SYNC: no
HOT SPARE: no BB POLICY: relocatable
PV RESTRICTION: none INFINITE RETRY: no
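The MAX LVs: 512 and MAX PVs: 128 fields identify vgims as a big volume group, so eliminating the 4 KB offset requires specifying -T O when creating each LV. Unfortunately, the DBA who installed this database did not specify it, so the control files ended up on devices with a 4 KB offset.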
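The VG type can also be guessed from the lsvg fields. This is a heuristic sketch, assuming the documented per-type limits (original: 256 LVs and 32 PVs, big: 512 LVs and 128 PVs, scalable: up to 1024 PVs); VGs created with a non-default factor or LV limit may not match, and the function and table names are hypothetical.

```python
import re


def vg_type_from_lsvg(text):
    """Guess original/big/scalable from the MAX LVs and MAX PVs fields (heuristic)."""
    max_lvs = int(re.search(r"MAX LVs:\s*(\d+)", text).group(1))
    max_pvs = int(re.search(r"MAX PVs:\s*(\d+)", text).group(1))
    if max_pvs == 1024:
        return "scalable"   # assumed: scalable VGs allow 1024 PVs
    if max_lvs == 512:
        return "big"        # assumed: big VGs allow 512 LVs / 128 PVs
    return "original"       # assumed: original VGs allow 256 LVs / 32 PVs


# Offset behaviour per VG type, as described earlier in the text.
OFFSET_ADVICE = {
    "original": "4 KB LVCB offset at the start of each LV",
    "big": "offset present unless the LV is created with mklv -T O",
    "scalable": "no offset",
}

lsvg = """MAX LVs: 512 FREE PPs: 488 (124928 megabytes)
MAX PPs per PV: 1016 MAX PVs: 128"""
kind = vg_type_from_lsvg(lsvg)
print(kind, "->", OFFSET_ADVICE[kind])
```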
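Storing database files on devices with an offset is very dangerous. When the database block size exceeds 4 KB (control files typically use 16 KB blocks), a database block may be split across an LV stripe boundary. If the operating system then crashes or reboots, the block can be left only partially written (a fractured block), corrupting the file.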
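The splitting can be shown with simple offset arithmetic. A sketch assuming 16 KB control file blocks laid out contiguously after a 4096-byte LVCB, and a 256 KB boundary chosen only for illustration (it matches the LTG size in the lsvg output; the real split points depend on the storage's stripe or I/O unit). The function name is hypothetical.

```python
def straddling_blocks(n_blocks, block_size, start_offset, boundary):
    """Return indices of blocks whose byte range crosses a multiple of `boundary`.

    Block i occupies [start_offset + i*block_size, start_offset + (i+1)*block_size).
    """
    result = []
    for i in range(n_blocks):
        start = start_offset + i * block_size
        end = start + block_size - 1  # last byte of the block
        if start // boundary != end // boundary:
            result.append(i)
    return result


BLOCK = 16 * 1024      # control file block size
BOUNDARY = 256 * 1024  # assumed boundary, for illustration only
with_lvcb = straddling_blocks(1866, BLOCK, 4096, BOUNDARY)
zero_offset = straddling_blocks(1866, BLOCK, 0, BOUNDARY)
print(len(with_lvcb), with_lvcb[:3])  # 116 [15, 31, 47]
print(len(zero_offset))               # 0
```

With the 4 KB shift, every sixteenth block straddles a boundary; with a zero offset, none does.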
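The warning by itself is not fatal, however, and the database is otherwise running normally, so it should not cause records to disappear from the control file. Because the database uses raw devices, another suspicion was that the control file had grown beyond the size of its LV. Check the control file size: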
SQL> select CREATION_TIME,CHECKPOINT_TIME,FILESIZE/1024/1024 from v$backup_controlfile_details;
CREATION_TIME CHECKPOINT_TIME FILESIZE/1024/1024
------------------- ------------------- ------------------
2012-12-03 17:03:51 2015-06-30 07:52:47 20.390625
2012-12-03 17:03:51 2015-06-30 09:40:21 20.390625
2012-12-03 17:03:51 2015-06-30 12:21:55 20.390625
2012-12-03 17:03:51 2015-06-30 14:07:27 20.390625
2012-12-03 17:03:51 2015-07-01 12:21:36 20.390625
2012-12-03 17:03:51 2015-07-01 14:11:47 20.390625
2012-12-03 17:03:51 2015-07-02 12:22:19 20.390625
2012-12-03 17:03:51 2015-07-02 14:13:46 20.390625
2012-12-03 17:03:51 2015-07-03 12:26:32 20.390625
2012-12-03 17:03:51 2015-07-03 17:30:12 20.390625
2012-12-03 17:03:51 2015-07-04 12:22:04 20.390625
2012-12-03 17:03:51 2015-07-05 12:22:45 20.390625
2012-12-03 17:03:51 2015-07-06 12:21:33 20.390625
2012-12-03 17:03:51 2015-07-07 07:45:44 20.390625
2012-12-03 17:03:51 2015-07-07 09:35:59 20.390625
2012-12-03 17:03:51 2015-07-08 07:46:13 20.390625
2012-12-03 17:03:51 2015-07-08 09:58:12 20.390625
2012-12-03 17:03:51 2015-07-08 12:21:47 20.390625
2012-12-03 17:03:51 2015-07-08 16:21:07 20.390625
2012-12-03 17:03:51 2015-07-09 12:21:48 20.390625
2012-12-03 17:03:51 2015-07-10 12:21:58 20.390625
2012-12-03 17:03:51 2015-07-11 12:22:24 20.390625
2012-12-03 17:03:51 2015-07-12 12:22:41 20.390625
2012-12-03 17:03:51 2015-07-13 12:21:53 20.390625
2012-12-03 17:03:51 2015-07-14 12:22:02 20.390625
2012-12-03 17:03:51 2015-07-15 05:00:43 20.390625
2012-12-03 17:03:51 2015-07-15 11:37:27 27.265625
2012-12-03 17:03:51 2015-07-16 06:23:44 29.171875
2012-12-03 17:03:51 2015-07-16 10:37:40 29.171875
2012-12-03 17:03:51 2015-07-16 12:21:54 29.171875
The control file is under 30 MB while the LV holding it is 1 GB, so this hypothesis does not hold.
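The same comparison in blocks: convert the FILESIZE values above into 16 KB control file blocks and compare against the LV. Treating "1G" as 1 GiB is an assumption.

```python
MIB = 1024 * 1024
BLOCK = 16 * 1024        # control file block size reported by dbfsize
LV_SIZE = 1024 * MIB     # assumption: the "1G" LV is 1 GiB

# FILESIZE/1024/1024 values from v$backup_controlfile_details
sizes_mib = [20.390625, 27.265625, 29.171875]
blocks = [round(mib * MIB / BLOCK) for mib in sizes_mib]
print(blocks)                                    # [1305, 1745, 1867]
print(f"{max(sizes_mib) * MIB / LV_SIZE:.1%}")   # 2.8%
```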
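Further analysis of the alert log showed that before each backup failure there were several consecutive ALTER SYSTEM ARCHIVE LOG entries, followed by the warnings above, and every occurrence of the warnings was accompanied by an "Expanded controlfile section" message. The suspicion is therefore that the backup causes the control file to grow, which triggers the warnings, possibly with fractured blocks at the storage level; the control file information is then lost, or the backup process cannot read it correctly, raising ORA-19571.
As a temporary workaround, the archived-log registration step can be added to the backup script, or the archived logs can be registered manually after the error occurs and the backup rerun.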
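To check this correlation across a longer alert log, the expansion messages can be paired with the reuse warnings that precede them. A heuristic Python sketch (hypothetical names) that assumes the message formats shown in the alert log excerpt above and simply attributes all warnings since the previous expansion to the next one.

```python
import re

EXPAND = re.compile(r"Expanded controlfile section (\d+) from (\d+) to (\d+) records")
WARN = re.compile(r"WARNING: You are creating/reusing datafile (\S+)\.")


def expansion_events(lines):
    """Pair each controlfile expansion with the devices warned about since the last one."""
    events, pending = [], set()
    for line in lines:
        warn = WARN.search(line)
        if warn:
            pending.add(warn.group(1))
            continue
        exp = EXPAND.search(line)
        if exp:
            section, old, new = map(int, exp.groups())
            events.append({"section": section, "from": old, "to": new,
                           "devices": sorted(pending)})
            pending = set()
    return events


alert = """WARNING: You are creating/reusing datafile /dev/rlvims_control1.
WARNING: You are creating/reusing datafile /dev/rlvims_control2.
WARNING: You are creating/reusing datafile /dev/rlvims_control3.
Expanded controlfile section 28 from 564 to 1128 records
Requested to grow by 564 records; added 3 blocks of records"""

for event in expansion_events(alert.splitlines()):
    print(event)
```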
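For the first workaround, the catalog step can run ahead of the backup in the same RMAN script. A sketch that composes such a script: CATALOG START WITH ... NOPROMPT is an RMAN option that skips the confirmation prompt, while the default backup command is a hypothetical placeholder for whatever the site's script actually runs. On a node where CATALOG START WITH finds nothing, per-file CATALOG ARCHIVELOG commands would be needed instead.

```python
def backup_script(arch_dir, backup_cmd="backup database plus archivelog;"):
    """Compose an RMAN run block that catalogs archived logs before backing up.

    backup_cmd is a hypothetical placeholder; substitute the site's real command.
    """
    return "\n".join([
        "run{",
        f"catalog start with '{arch_dir}' noprompt;",  # re-register logs the control file lost
        backup_cmd,
        "}",
    ])


print(backup_script("/arch1/"))
```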