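After two rounds of testing, RAC feels quite fragile.
1. Unplugged the public network cable on RAC1 and stood next to RAC2 to watch what happened. The VIP failed over to RAC2 quickly, and users could keep working.
2. One minute later, RAC2 rebooted by itself. The apparent cause was that the shared disk could not be mounted. A colleague was configuring the SAN at that moment, so I could not tell whether the shared disk itself was really at fault.
3. Decided to run a harsher test: pulled the power on both DB servers, plugged them back in, and booted them. CRS would not start.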
[root@racdb02 install]# crs_stat -t
CRS-0184: Cannot communicate with the CRS daemon.
4. Searched the web and tried many suggested fixes; none worked, and Metalink had nothing suitable either.
5. Ran root102.sh on each of the two nodes
[oracle@racdb01 ~]$ /u01/oracle/product/10.2/crs1/install/root102.sh
6. Rebooted both nodes
7. [root@racdb01 oracle]# crs_stat -t
Name Type Target State Host
------------------------------------------------------------
ora.rac.db application ONLINE UNKNOWN racdb02
ora....s1.inst application ONLINE UNKNOWN racdb01
ora....s2.inst application ONLINE UNKNOWN racdb02
ora....esdb.cs application ONLINE OFFLINE
ora....es1.srv application ONLINE OFFLINE
ora....es2.srv application ONLINE OFFLINE
ora....01.lsnr application ONLINE ONLINE racdb01
ora....b01.gsd application ONLINE ONLINE racdb01
ora....b01.ons application ONLINE ONLINE racdb01
ora....b01.vip application ONLINE ONLINE racdb01
ora....02.lsnr application ONLINE ONLINE racdb02
ora....b02.gsd application ONLINE UNKNOWN racdb02
ora....b02.ons application ONLINE UNKNOWN racdb02
ora....b02.vip application ONLINE ONLINE racdb02
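8. Tried to remove the instance: failed. Tried to remove the service racdb: failed. Tried to remove the listener: failed.
9. crs_start -all
The services that could not start before still would not start.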
10. Uninstalled CRS, reinstalled it, and upgraded to 10.2.0.4
11. All CRS services started normally
12. srvctl add database -d rac -o /u01/oracle/product/10.2/db1/
13. srvctl add instance -d rac -i rac1 -n racdb01
srvctl add instance -d rac -i rac2 -n racdb02
14. Rebooted both nodes
15. rac2 started normally, but the rac1 instance would not start
[oracle@racdb01 ~]$ crs_stat -t
Name Type Target State Host
------------------------------------------------------------
ora.rac.db application ONLINE ONLINE racdb02
ora....s1.inst application OFFLINE OFFLINE
ora....s2.inst application ONLINE ONLINE racdb02
ora....01.lsnr application ONLINE ONLINE racdb01
ora....b01.gsd application ONLINE ONLINE racdb01
ora....b01.ons application ONLINE ONLINE racdb01
ora....b01.vip application ONLINE ONLINE racdb01
ora....02.lsnr application ONLINE ONLINE racdb02
ora....b02.gsd application ONLINE ONLINE racdb02
ora....b02.ons application ONLINE ONLINE racdb02
ora....b02.vip application ONLINE ONLINE racdb02
16. srvctl remove instance -d rac -i rac1
17. srvctl add instance -d rac -i rac1 -n racdb01
18. Tried to start the rac1 instance
[oracle@racdb01 ~]$ srvctl start instance -d rac -i rac1 -o mount;
PRKP-1001 : Error starting instance rac1 on node racdb01
CRS-1028: Dependency analysis failed because of:
CRS-0223: Resource 'ora.rac.rac1.inst' has placement error.
[oracle@racdb01 ~]$ crs_start ora.rac.rac1.inst
Attempting to start `ora.rac.rac1.inst` on member `racdb01`
`ora.rac.rac1.inst` on member `racdb01` has experienced an unrecoverable failure.
Human intervention required to resume its availability.
CRS-0215: Could not start resource 'ora.rac.rac1.inst'.
[oracle@racdb01 admin]$ crs_start ora.rac.rac1.inst
CRS-1028: Dependency analysis failed because of:
'Resource in UNKNOWN state: ora.rac.rac1.inst'
CRS-0223: Resource 'ora.rac.rac1.inst' has placement error.
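19. Kept searching online and struggled for half a day, still with no result.
20. Found a Metalink article saying the problem could lie in tnsnames.ora
21. Checked my tnsnames.ora and found that the racdb service I had originally set up (for transparent application failover) was still listed there, even though that service had not been configured again after reinstalling CRS. Deleted the entry.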
22. [oracle@racdb01 admin]$ crs_start ora.rac.rac1.inst
Attempting to start `ora.rac.rac1.inst` on member `racdb01`
Start of `ora.rac.rac1.inst` on member `racdb01` succeeded.
23. [oracle@racdb01 admin]$ crs_stat -t
Name Type Target State Host
------------------------------------------------------------
ora.rac.db application ONLINE ONLINE racdb02
ora....s1.inst application ONLINE ONLINE racdb01
ora....s2.inst application ONLINE ONLINE racdb02
ora....01.lsnr application ONLINE ONLINE racdb01
ora....b01.gsd application ONLINE ONLINE racdb01
ora....b01.ons application ONLINE ONLINE racdb01
ora....b01.vip application ONLINE ONLINE racdb01
ora....02.lsnr application ONLINE ONLINE racdb02
ora....b02.gsd application ONLINE ONLINE racdb02
ora....b02.ons application ONLINE ONLINE racdb02
ora....b02.vip application ONLINE ONLINE racdb02
It finally started. Applause! That took two days of struggling.
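
Two checks came up again and again in the steps above: spotting which resources in the crs_stat -t output are not ONLINE, and finding a leftover service entry in tnsnames.ora. The helpers below are a rough sketch of how they could be scripted, not something taken from Metalink or from the original session. The function names not_online and stale_service_lines are made up for illustration, and the awk filter assumes the 10.2 column layout shown in this post (Name, Type, Target, State, Host, with Type always "application").

```shell
#!/bin/sh

# Read `crs_stat -t` style output on stdin and print the name and state
# of every resource whose State column is not ONLINE.
# Assumption: columns are "Name Type Target State Host" as in this post;
# the header and separator lines are skipped because their second field
# is not "application".
not_online() {
    awk 'NF >= 4 && $2 == "application" && $4 != "ONLINE" { print $1, $4 }'
}

# Print (with line numbers) the lines of a tnsnames.ora file that still
# reference a given service name.
#   $1 = service name to look for, e.g. racdb
#   $2 = path to the tnsnames.ora file to inspect
# The match is case-insensitive and is a prefix match, so "racdb" would
# also report an entry for "racdb2"; review the hits by hand.
stale_service_lines() {
    grep -n -i "SERVICE_NAME *= *$1" "$2"
}
```

On a node this might be used as crs_stat -t | not_online, and as stale_service_lines racdb followed by the path to that node's tnsnames.ora. Both helpers only report; removing a stale entry is still a manual edit, as in step 21.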