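Today I ran into a strange problem. A colleague from the Nafu (納服) team called again to say that our data pump process was not running, so data could not be synchronized to the target.
In the past, whenever the data pump failed to start, the cause was simple: it always reported WARNING OGG-01223 TCP/IP error 146 (Connection refused).
That error means one of two things: either the network is unreachable, or the Manager process on the target is not running.
After checking together with the Nafu colleague, we found that the target-side Manager process was running normally and the network was reachable. I still didn't believe it, so I logged in to the target machine myself, and it was indeed as he said.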
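The two usual causes of TCP/IP error 146 can be told apart with a plain TCP connect to the target Manager port. The Python sketch below is only an illustration: the function name `manager_reachable` is hypothetical, and it assumes the Manager listens on port 7809 (the PORT value in the target's ps output in this case). Note that a successful connect only proves Manager is listening; it says nothing about whether Manager can actually start a Collector, which is exactly the trap in this case.

```python
import socket


def manager_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout.

    A refused connection usually means Manager is not listening on that port;
    a timeout or resolution failure usually points at the network path.
    Both are raised as OSError subclasses, so they are handled together here.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For example, `manager_reachable("target-host", 7809)` (with `target-host` as a placeholder for your target) returning True rules out both usual causes, which is the situation we were in.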
I tried restarting the local data pump process, but its lag only kept growing, and both the process report and ggserr.log kept printing the error
WARNING OGG-01223 Cannot find executable file './server'.
GGSCI (bjyschxzg1) 4> view report PZJ_NF1
2013-06-19 00:01:30 WARNING OGG-01223 Cannot find executable file './server'.
2013-06-19 00:01:30 WARNING OGG-01223 Cannot find executable file './server'.
2013-06-19 00:01:30 WARNING OGG-01223 Cannot find executable file './server'.
2013-06-19 00:01:30 WARNING OGG-01223 Cannot find executable file './server'.
2013-06-19 00:01:30 WARNING OGG-01223 Cannot find executable file './server'.
2013-06-19 00:01:30 WARNING OGG-01223 Cannot find executable file './server'.
bjyschxzg1:/home/oracle/ggs$tail -f ggserr.log
2013-06-19 00:15:47 WARNING OGG-01223 Oracle GoldenGate Capture for Oracle, pzj_nf6.prm: Cannot find executable file './server'.
2013-06-19 00:15:47 WARNING OGG-01223 Oracle GoldenGate Capture for Oracle, pzj_nf6.prm: Cannot find executable file './server'.
2013-06-19 00:15:47 WARNING OGG-01223 Oracle GoldenGate Capture for Oracle, pzj_nf6.prm: Cannot find executable file './server'.
2013-06-19 00:15:47 WARNING OGG-01223 Oracle GoldenGate Capture for Oracle, pzj_nf6.prm: Cannot find executable file './server'.
2013-06-19 00:15:47 WARNING OGG-01223 Oracle GoldenGate Capture for Oracle, pzj_nf6.prm: Cannot find executable file './server'.
2013-06-19 00:15:48 WARNING OGG-01223 Oracle GoldenGate Capture for Oracle, pzj_nf6.prm: Cannot find executable file './server'.
2013-06-19 00:15:48 WARNING OGG-01223 Oracle GoldenGate Capture for Oracle, pzj_nf6.prm: Cannot find executable file './server'.
2013-06-19 00:15:48 WARNING OGG-01223 Oracle GoldenGate Capture for Oracle, pzj_nf6.prm: Cannot find executable file './server'.
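When the log is flooded like this, tallying which OGG-01223 variant is repeating tells you which side to look at. A minimal Python sketch, assuming ggserr.log lines look like the ones above; the function name `classify_ogg_01223` and the cause labels are my own, and they only cover the two variants seen in this post.

```python
import re
from collections import Counter

# Fragment of the message text -> likely cause, based only on this case.
# This is not an exhaustive catalogue of OGG-01223 variants.
CAUSES = {
    "Connection refused": "target Manager down or network unreachable",
    "Cannot find executable file './server'": "target cannot start a Collector (check server binary)",
}

LINE_RE = re.compile(r"WARNING OGG-01223\b(.*)")


def classify_ogg_01223(lines):
    """Count OGG-01223 warnings in ggserr.log-style lines by likely cause."""
    counts = Counter()
    for line in lines:
        m = LINE_RE.search(line)
        if not m:
            continue  # not an OGG-01223 warning
        detail = m.group(1)
        for fragment, cause in CAUSES.items():
            if fragment in detail:
                counts[cause] += 1
                break
        else:
            counts["other OGG-01223"] += 1
    return counts
```

Usage: `with open("ggserr.log") as f: print(classify_ogg_01223(f))`. In our case every hit would land in the Collector bucket rather than the Connection refused one.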
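In GoldenGate, the Data Pump process is only responsible for shipping the captured change data (which it reads from the local trail) to the target. It does not write that data into trail files on the target's local disk; that work is done by the Collector process, which the Manager (mgr) process on the target starts automatically.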
The official documentation describes the Collector process as follows:
The Collector process operates on the target system to receive incoming data and write it to the trail.
Dynamic Collector
Typically, Oracle GoldenGate users do not interact with the Collector process. It is started
dynamically by the Manager process. This is known as a dynamic collector.
Static Collector
You can run a static Collector manually by running the SERVER program at the command
line with the following syntax and input parameters as shown:
server <parameter> [<parameter>] [...]
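Our environment uses dynamic Collectors throughout, so under normal conditions the target mgr starts a Collector by running the server binary in the GoldenGate instance's home directory whenever a source data pump connects.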
The server binary in the GG home:
bjyscsjqz:/home/oracle/ggs$ls -lt server
-rwxr-x---. 1 oracle oinstall 13757119 Aug 24 2012 server
A Collector process started by mgr from the server binary in the GG home:
localhost.localdomain:/home/oracle$ps -ef | grep goldengate | grep -v grep | grep server
oracle 11035 10883 0 11:00 ? 00:00:02 ./server -w 300 -p 7815-8000 -m 7809 -k -l /goldengate/ggs/ggserr.log
When I logged in to the Nafu database host on the target side, I found that the GG home there had no server binary:
bjyscnfdbnfzc01:/home/oracle/ggs$ls -lt server
ls: cannot access server: No such file or directory
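Moreover, although the mgr process was running, it had not actually started any Collector process. This is why the data pump on the source kept reporting the error and hung.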
bjyscnfdbnfzc01:/home/oracle/ggs$ps -ef | grep goldengate
oracle 29790 28264 34 15:34 ? 00:09:27 ./mgr PARAMFILE /oracle/oradata4/goldengate/dirprm/mgr.prm REPORTFILE /oracle/oradata4/goldengate/dirrpt/MGR.rpt PROCESSID MGR PORT 7809
oracle 29794 29790 0 15:34 ? 00:00:01 /oracle/oradata4/goldengate/extract PARAMFILE /oracle/oradata4/goldengate/dirprm/extzj_mh.prm REPORTFILE /oracle/oradata4/goldengate/dirrpt/EXTZJ_MH.rpt PROCESSID EXTZJ_MH USESUBDIRS
oracle 29803 29790 0 15:34 ? 00:00:01 /oracle/oradata4/goldengate/extract PARAMFILE /oracle/oradata4/goldengate/dirprm/pmpzj_mh.prm REPORTFILE /oracle/oradata4/goldengate/dirrpt/PMPZJ_MH.rpt PROCESSID PMPZJ_MH USESUBDIRS
oracle 29807 29790 0 15:34 ? 00:00:08 /oracle/oradata4/goldengate/replicat PARAMFILE /oracle/oradata4/goldengate/dirprm/rzj_nf1.prm REPORTFILE /oracle/oradata4/goldengate/dirrpt/RZJ_NF1.rpt PROCESSID RZJ_NF1 USESUBDIRS
oracle 29812 29790 0 15:34 ? 00:00:03 /oracle/oradata4/goldengate/replicat PARAMFILE /oracle/oradata4/goldengate/dirprm/rzj_nf6.prm REPORTFILE /oracle/oradata4/goldengate/dirrpt/RZJ_NF6.rpt PROCESSID RZJ_NF6 USESUBDIRS
oracle 31564 31146 0 16:01 pts/2 00:00:00 grep goldengate
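Spotting the missing Collector by eye is easy to get wrong when many GoldenGate processes are running. Below is a small Python sketch that picks out lines whose command is a `server` executable. It assumes the standard 8-column Linux `ps -ef` layout (UID PID PPID C STIME TTY TIME CMD) with a single-token STIME, and the function names are hypothetical. Keep in mind that a dynamic Collector only exists while a pump is connected, so an empty result is only suspicious when a source data pump is supposed to be sending.

```python
import subprocess


def collector_lines(ps_output: str):
    """Return `ps -ef` lines whose command is a GoldenGate Collector (server)."""
    found = []
    for line in ps_output.splitlines():
        # Split into at most 8 fields; the 8th is the full command line.
        fields = line.split(None, 7)
        if len(fields) < 8:
            continue
        prog = fields[7].split()[0]
        if prog == "server" or prog.endswith("/server"):
            found.append(line)
    return found


def running_collectors():
    """Run `ps -ef` on this host and return the Collector lines it shows."""
    out = subprocess.run(["ps", "-ef"], capture_output=True, text=True, check=True).stdout
    return collector_lines(out)
```

Fed the target listing above, `collector_lines` returns an empty list, while the healthy host's `./server -w 300 ...` line is matched.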
MOS note [ID 1550203.1] describes the cause of this problem as follows:
Cause
message could be caused by
- "server" binary in TARGET $GG_HOME is missing or has the incorrect permissions
- GoldenGate manager in TARGET environment is unable to launch new server collector processes (hung process)
Solution
1. Check "server" binary is located in TARGET $GG_HOME and with correct permissions like:
[ogg@gglnx1 gg]$ ls -lrt server
-rwxr-x---. 1 ogg ogg 13619841 Apr 3 23:32 server
2. Stop/start GoldenGate manager on TARGET environment
3. Start remote Data Pump Extract(s).
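Step 1 of the note can be scripted on the target. A Python sketch, assuming Manager runs as the owner of the server file (as in both listings above, where the owner has rwx), so it only checks that the file exists, that it is a regular file, and that the owner-execute bit is set; `check_server_binary` is a hypothetical helper name.

```python
import os
import stat


def check_server_binary(gg_home: str) -> list:
    """Return a list of problems with <gg_home>/server; an empty list means it looks OK.

    Only the owner-execute bit is checked, on the assumption that Manager
    runs as the OS user that owns the file.
    """
    path = os.path.join(gg_home, "server")
    if not os.path.exists(path):
        return [f"{path} is missing"]
    if not os.path.isfile(path):
        return [f"{path} is not a regular file"]
    st = os.stat(path)
    if not st.st_mode & stat.S_IXUSR:
        return [f"{path} lacks owner execute permission (mode {stat.filemode(st.st_mode)})"]
    return []
```

For example, `check_server_binary("/home/oracle/ggs")` run against a GG home like the broken one above would report the file as missing; after restoring the binary, restart Manager and the data pump as in steps 2 and 3.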
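If you repost this article, please credit the author and include a link to the original; otherwise legal action may be taken: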
作者:xiangsir
原文連結:http://blog.csdn.net/xiangsir/article/details/9122057
QQ:444367417
MSN:xiangsir@hotmail.com