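An old friend reached out about a customer's database problem: the ASM disk group could not be mounted, and the error said two disks were missing. He asked whether it could be recovered. Because the system sits on an internal network, I pieced the analysis together from the fragments of information he sent over, roughly as follows.
ASM alert log errors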
ERROR: diskgroup DGROUP1 was not mounted
Fri Aug 12 16:03:12 EAT 2016
SQL> alter diskgroup DGROUP1 mount
Fri Aug 12 16:03:12 EAT 2016
NOTE: cache registered group DGROUP1 number=1 incarn=0xf6781b5c
Fri Aug 12 16:03:12 EAT 2016
NOTE: Hbeat: instance first (grp 1)
Fri Aug 12 16:03:16 EAT 2016
NOTE: start heartbeating (grp 1)
Fri Aug 12 16:03:16 EAT 2016
NOTE: cache dismounting group 1/0xF6781B5C (DGROUP1)
NOTE: dbwr not being msg'd to dismount
ERROR: diskgroup DGROUP1 was not mounted
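Mounting the ASM disk group from a foreground session fails with ORA-15042
This shows clearly that the ASM disk group cannot be mounted because ASM disks 15 and 16 are missing. The best way to recover ASM is to find those two disks. Analyzing the disks that were still present with kfed, we found that ASM disk 14 corresponds to disk160 and ASM disk 17 corresponds to disk163. My first instinct was that disk161 and disk162 were the problem disks, but the data center checked the hardware and found no alerts.
OS-level analysis
(Output unrelated to the conclusion is omitted.)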
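The gap between ASM disk 14 and ASM disk 17 can be spotted mechanically once the disk numbers of the visible disks are known. The following is a minimal sketch, not the procedure used in this case: `missing_disk_numbers` is a hypothetical helper, and the commented kfed pipeline assumes that the `kfdhdb.dsknum` line of `kfed read` output carries the disk number in its second field. It only finds gaps between the lowest and highest number seen, so a missing disk at either end would go unnoticed.

```shell
# Hypothetical helper: read ASM disk numbers (one per line) on stdin
# and print every number missing between the lowest and highest one.
missing_disk_numbers() {
    sort -n | awk 'NR > 1 { for (n = prev + 1; n < $1; n++) print n } { prev = $1 }'
}

# On the real host the numbers could be collected with something like
# (assumption: kfed is in PATH and the devices are readable):
#   for d in /dev/rdisk/disk*; do
#       kfed read "$d" | awk '/kfdhdb.dsknum/ { print $2 }'
#   done | missing_disk_numbers

# Demo with the two disk numbers reported in this case.
printf '14\n17\n' | missing_disk_numbers
```

With the two numbers from this case the demo prints 15 and 16, matching the disks named in the mount error.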
ls -l /dev/rdisk
crw-rw---- 1 oracle dba 13 0x000070 Jan 1 2016 disk160
crw-rw---- 1 oracle dba 13 0x000073 Jan 1 2016 disk163
ls -l /dev/disk
brw-r----- 1 bin sys 1 0x000070 Jan 13 2015 disk160
brw-r----- 1 bin sys 1 0x000071 Jan 13 2015 disk161
brw-r----- 1 bin sys 1 0x000072 Jan 13 2015 disk162
brw-r----- 1 bin sys 1 0x000073 Jan 13 2015 disk163
Here we see that on HP-UX all four disks exist under /dev/disk, but disk161 and disk162 are missing from /dev/rdisk. We continue the analysis with ioscan.
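The comparison of the two directories can be scripted instead of done by eye. This is a small sketch under assumptions: `missing_raw_devices` is a hypothetical function name, and the demo builds throwaway directories with empty files rather than touching the real /dev tree.

```shell
# Hypothetical helper: list entries that exist in the block-device
# directory but have no same-named entry in the raw-device directory.
missing_raw_devices() {
    block_dir=${1:-/dev/disk}
    raw_dir=${2:-/dev/rdisk}
    for path in "$block_dir"/*; do
        [ -e "$path" ] || continue          # skip the unexpanded glob of an empty directory
        name=${path##*/}
        [ -e "$raw_dir/$name" ] || printf '%s\n' "$name"
    done
}

# Demo on scratch directories that mimic the listings in this case.
demo=${TMPDIR:-/tmp}/dsf_demo.$$
mkdir -p "$demo/disk" "$demo/rdisk"
for n in 160 161 162 163; do : > "$demo/disk/disk$n"; done
for n in 160 163; do : > "$demo/rdisk/disk$n"; done
missing_raw_devices "$demo/disk" "$demo/rdisk"
rm -rf "$demo"
```

On the affected host, calling `missing_raw_devices` without arguments would compare /dev/disk against /dev/rdisk directly.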
ioscan -fNnkC disk
disk 160 64000/0xfa00/0x70 esdisk CLAIMED DEVICE HP OPEN-V
/dev/disk/disk160 /dev/rdisk/disk160
disk 161 64000/0xfa00/0x71 esdisk CLAIMED DEVICE HP OPEN-V
/dev/disk/disk161
disk 162 64000/0xfa00/0x72 esdisk CLAIMED DEVICE HP OPEN-V
/dev/disk/disk162
disk 163 64000/0xfa00/0x73 esdisk CLAIMED DEVICE HP OPEN-V
/dev/disk/disk163 /dev/rdisk/disk163
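At this point we can be fairly sure that the device files under /dev/rdisk have gone missing. Going one step further: since an rdisk entry is the aggregated device name (one name covering all paths to the LUN), we check whether the per-path device names from before aggregation are still intact.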
ioscan -m dsf
/dev/rdisk/disk160 /dev/rdsk/c29t12d4
/dev/rdsk/c28t12d4
/dev/rdisk/disk163 /dev/rdsk/c29t12d7
/dev/rdsk/c28t12d7
ls -l /dev/rdsk
crw-r----- 1 bin sys 188 0x1dc000 Apr 22 2014 c29t12d0
crw-r----- 1 bin sys 188 0x1dc100 Apr 22 2014 c29t12d1
crw-r----- 1 bin sys 188 0x1dc300 Jan 13 2015 c29t12d3
crw-r----- 1 bin sys 188 0x1dc400 Jan 13 2015 c29t12d4
crw-r----- 1 bin sys 188 0x1dc500 Jan 13 2015 c29t12d5
crw-r----- 1 bin sys 188 0x1dc600 Jan 13 2015 c29t12d6
crw-r----- 1 bin sys 188 0x1dc700 Jan 13 2015 c29t12d7
crw-r----- 1 bin sys 188 0x1cc100 Apr 22 2014 c28t12d1
crw-r----- 1 bin sys 188 0x1cc300 Jan 13 2015 c28t12d3
crw-r----- 1 bin sys 188 0x1cc400 Jan 13 2015 c28t12d4
crw-r----- 1 bin sys 188 0x1cc500 Jan 13 2015 c28t12d5
crw-r----- 1 bin sys 188 0x1cc600 Jan 13 2015 c28t12d6
crw-r----- 1 bin sys 188 0x1cc700 Jan 13 2015 c28t12d7
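From this we can reasonably conclude that /dev/rdsk/c28t12d5, /dev/rdsk/c28t12d6, /dev/rdsk/c29t12d5 and /dev/rdsk/c29t12d6 are the pre-aggregation device names behind /dev/rdisk/disk161 and disk162. In other words, only the character device files under /dev/rdisk are affected; everything else is normal.
Repairing the problem with system commands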
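The same inference can be expressed as a check: any per-path device in /dev/rdsk that no /dev/rdisk entry maps to is a candidate for the missing disks. The sketch below assumes the `ioscan -m dsf` output has been saved to a file; `unmapped_legacy_dsfs` is a hypothetical helper, and the sample mapping is the abridged excerpt shown above, so on a real system it should be fed the complete output.

```shell
# Hypothetical helper: print the device names that do not appear as a
# /dev/rdsk/... target anywhere in the saved mapping.
# $1: file holding `ioscan -m dsf` output; remaining args: names from /dev/rdsk
unmapped_legacy_dsfs() {
    map_file=$1
    shift
    for name in "$@"; do
        grep -E "/dev/rdsk/${name}([[:space:]]|\$)" "$map_file" >/dev/null ||
            printf '%s\n' "$name"
    done
}

# Demo with the mapping excerpt and the d4..d7 devices from this case.
map=${TMPDIR:-/tmp}/dsf_map.$$
cat > "$map" <<'EOF'
/dev/rdisk/disk160 /dev/rdsk/c29t12d4
/dev/rdsk/c28t12d4
/dev/rdisk/disk163 /dev/rdsk/c29t12d7
/dev/rdsk/c28t12d7
EOF
unmapped_legacy_dsfs "$map" c29t12d4 c29t12d5 c29t12d6 c29t12d7 \
                            c28t12d4 c28t12d5 c28t12d6 c28t12d7
rm -f "$map"
```

For this input the unmapped names are the d5 and d6 devices on both controllers, which is the same set identified above.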
insf -e -H 64000/0xfa00/0x71
insf -e -H 64000/0xfa00/0x72
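[Image: hp-asm-disk]
/dev/rdisk/disk161 and /dev/rdisk/disk162 are now visible again, so at first glance the device files are back to normal at the OS level. Next, fix the permissions and ownership of the disks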
chmod 660 /dev/rdisk/disk161
chmod 660 /dev/rdisk/disk162
chown oracle:dba /dev/rdisk/disk161
chown oracle:dba /dev/rdisk/disk162
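Because these commands run as root against production devices, it can help to generate them first and review the list before executing anything. The sketch below only prints the commands used in this case; `print_repair_commands` is a hypothetical helper, and the oracle:dba default is taken from the ownership of the healthy disks shown earlier.

```shell
# Hypothetical helper: print (do not execute) the repair commands for one LUN.
# $1: hardware path from ioscan, $2: device name under /dev/rdisk, $3: owner:group
print_repair_commands() {
    hw_path=$1
    dsf=$2
    owner=${3:-oracle:dba}
    printf 'insf -e -H %s\n' "$hw_path"
    printf 'chmod 660 /dev/rdisk/%s\n' "$dsf"
    printf 'chown %s /dev/rdisk/%s\n' "$owner" "$dsf"
}

# The two disks repaired in this case.
print_repair_commands 64000/0xfa00/0x71 disk161
print_repair_commands 64000/0xfa00/0x72 disk162
```

Comparing the result of `ls -l /dev/rdisk` for the repaired disks against disk160 and disk163 afterwards is a quick way to confirm that mode and ownership match.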
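Start ASM normally, mount the disk group, and open the database
[Image: asm-mount]
This recovery was mainly a matter of diagnosing and fixing the problem at the operating-system level, which brought the database back completely with zero data loss. Similar recovery cases exist.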