For performance problems or some of the more bizarre problems, there are many points you can start to analyze.
Ready to write a series about using Ash/dba_hist_active_sess_history, with the Oradebug, with the Linux command strace,pstack or use and so on and so on tools ~ ~ to induce some ideas, That is, when all the results of the analysis so far have no clue, then how to do in-depth troubleshooting.
For example, when a SQL execution is slow, look at the SQL history wait event, look at the SQL execution plan, but if there is nothing particularly obvious waiting, or the execution plan seems to be OK, feel that the SQL should be performed properly, why do it so slow? Then you can use the next oradebug to analyze the call stack of the process, or the state dump of the session to see something, to analyze the correct results and root cause, Of course, the history of the session of the waiting time Ash is also worth to delve into, the implementation plan is a very big topic, so this series is mainly for their own work in the future encounter similar problems, or no ideas, you can have some clues ~ ~
==============================================
First write a use pstack and Strace Analysis Goldengate issue as series of the first article ~
Here is the database does not have any problem, Goldengate also see what is wrong, feel not to know the light of a bling bling to solve the problem ~
----------------------------------START--------------------------------------------------------
Receive an alert:
Delayed the 3hour+ and went to Ggsci to see
Ggsci (ora-bi-p-5.va2.b2c.nike.com) 2>!
Info R4_ot
Replicat R4_ot last Started 2016-11-02 05:23 Status RUNNING
INTEGRATED
Checkpoint Lag 00:00:00 (updated 00:00:04 ago)
Process ID 27164
Log Read Checkpoint file/gg/app/ggadmin/product/12.1.2.1.0/dirdat/replicate/pdot/r4000020
2016-11-02 05:15:15.156360 RBA 535238535
(Lag R4_ot output also shows no lag)
There is no lag, but the information in the database said there is lag, it will not be the process hang, first go to the database to see the status of the Apply process
Select Apply_name,null delay_min from dba_apply where status<> ' ENABLED '
Union
Select Capture_name,null from dba_capture where status<> ' ENABLED '
Union
Select Ac.apply_name, round ((sysdate-applied_message_create_time) *1440)
From Dba_apply_progress ap,gv$streams_apply_coordinator AC
where Ac.apply_name=ap.apply_name
;
Apply_name Delay_min
------------------------------ ----------
Ogg$r4_ot 247
So look at Ggserr.log, view report and so did not see what come, because there is no error!!!
What to do at this time, the database said that the apply process has been lag for three hours, but goldengate inside did not see any error log,
Of course, I have tried several times the info R4_ot,rba has not changed, perhaps the source of no business in running so r4_ot RBA not change is also possible ~
Suddenly thought to analyze the status of the process, is not what wait.
Advanced to the database inside did not see Ggadmin user wait event what, but no ah, so now the state is Ogg also did not error, also show no lag, in the database also no corresponding transaction in running, then from where to start?
~~~~~~~~~~~~~~~~~~~~~~~~~
And then I thought about doing a pstack to track the status of the Ogg R4_ot spid.
[Email protected]:D Cbiprd5:/gg/app/ggadmin/product/12.1.2.1.0/dirdat/replicate/pdot$pstack 27164
Thread 5 (thread 0x7f716ae20700 (LWP 27165)):
#0 0x00000038c700b68c in [email protected] @GLIBC_2.3.2 () from/lib64/libpthread.so.0
#1 0x0000000000692e91 in ggs::gglib::multithreading::event::wait (unsigned int) ()
#2 0x00007f717105a69c in Ggs::gglib::gglog::loggingtime::timethread (void*) () from/gg/app/ggadmin/product/ 12.1.2.1.0/libgglog.so
#3 0x00000000006908e4 in Ggs::gglib::multithreading::thread::runthread (ggs::gglib::multithreading::thread:: threadargs*) ()
#4 0x00000038c7007aa1 in Start_thread () from/lib64/libpthread.so.0
#5 0x00000038c6ce893d in Clone () from/lib64/libc.so.6
Thread 4 (thread 0x7f716a61f700 (LWP 27166)):
#0 0x00000038c700b68c in [email protected] @GLIBC_2.3.2 () from/lib64/libpthread.so.0
#1 0x0000000000692e91 in ggs::gglib::multithreading::event::wait (unsigned int) ()
#2 0x00007f71710c0f16 in Ggs::gglib::gglog::logbufferimpl::P ublisherthread (void*) () from/gg/app/ggadmin/product/ 12.1.2.1.0/libgglog.so
#3 0x00000000006908e4 in Ggs::gglib::multithreading::thread::runthread (ggs::gglib::multithreading::thread:: threadargs*) ()
#4 0x00000038c7007aa1 in Start_thread () from/lib64/libpthread.so.0
#5 0x00000038c6ce893d in Clone () from/lib64/libc.so.6
Thread 3 (thread 0x7f71694e3700 (LWP 27167)):
#0 0x00000038c700ba5e in [email protected] @GLIBC_2.3.2 () from/lib64/libpthread.so.0
#1 0x0000000000692e4d in ggs::gglib::multithreading::event::wait (unsigned int) ()
#2 0x00007f71710bcbaf in Ggs::gglib::gglog::D omconfiguratorimpl::watchthread (void*) () from/gg/app/ggadmin/product/ 12.1.2.1.0/libgglog.so
#3 0x00000000006908e4 in Ggs::gglib::multithreading::thread::runthread (ggs::gglib::multithreading::thread:: threadargs*) ()
#4 0x00000038c7007aa1 in Start_thread () from/lib64/libpthread.so.0
#5 0x00000038c6ce893d in Clone () from/lib64/libc.so.6
Thread 2 (thread 0x7f7163fff700 (LWP 27168)):
#0 0x00000038c700ba5e in [email protected] @GLIBC_2.3.2 () from/lib64/libpthread.so.0
#1 0x0000000000692e4d in ggs::gglib::multithreading::event::wait (unsigned int) ()
#2 0x00007f71710a3e1a in Cmessagefactoryimpl::repetitionthread (void*) () from/gg/app/ggadmin/product/12.1.2.1.0/ Libgglog.so
#3 0x00000000006908e4 in Ggs::gglib::multithreading::thread::runthread (ggs::gglib::multithreading::thread:: threadargs*) ()
#4 0x00000038c7007aa1 in Start_thread () from/lib64/libpthread.so.0
#5 0x00000038c6ce893d in Clone () from/lib64/libc.so.6
Thread 1 (thread 0x7f716ae23a20 (LWP 27164)):
#0 0x00000038c700f00d in Nanosleep () from/lib64/libpthread.so.0
#1 0x0000000000638330 in Ggdelaymsecs (long) ()
#2 0x000000000057110e in Wait_for_more (short) ()
#3 0x00000000005a539d in Process_replicat_loop () ()
#4 0x00000000005bd51a in Replicat_main (int, char**) ()
#5 0x00000000006905af in Ggs::gglib::multithreading::mainthread::execmain () ()
#6 0x00000000006908e4 in Ggs::gglib::multithreading::thread::runthread (ggs::gglib::multithreading::thread:: threadargs*) ()
#7 0x00000000006909eb in Ggs::gglib::multithreading::mainthread::run (int, char**) ()
#8 0x00000000005bc1af in Main ()
Obviously you can see that the process is wait!!!!
Make a strace and look deeper.
Process 27164 attached
Restart_syscall (< resuming interrupted call ...>) = 0
Futex (0x281ca64, Futex_wake_op_private, 1, 1, 0x281ca60, {futex_op_set, 0, FUTEX_OP_CMP_GT, 1}) = 1
Futex (0x281ca38, futex_wake_private, 1) = 0
Stat ("/etc/localtime", {st_mode=s_ifreg|0644, st_size=118, ...}) = 0
Stat ("/etc/localtime", {st_mode=s_ifreg|0644, st_size=118, ...}) = 0
Stat ("/etc/localtime", {st_mode=s_ifreg|0644, st_size=118, ...}) = 0
Select (7, [6], NULL, NULL, {0, 0}) = 0 (Timeout)
Lseek (534773760, seek_set) = 534773760
Read ("Details</b></p><li>flybeam struc" ..., 1048576) = 477287
Read (14, "", 1048576) = 0
Lseek (0, Seek_set) = 0
Read ("F\0\7f0\0\3\2430\0\0\10gg\r\ntl\n\r1\0\0\2\0\0042\0\0\4" ..., 1048576) = 1048576
Lseek (1048576, seek_set) = 1048576
Lseek (534773760, seek_set) = 534773760
Read ("Details</b></p><li>flybeam struc" ..., 1048576) = 477287
Stat ("/gg/app/ggadmin/product/12.1.2.1.0/dirdat/replicate/pdot/r4000021", 0x7ffcac69c610) = 1 ENOENT (No such file or Directory
Nanosleep ({1, 0}, NULL) = 0
-------------------------------------------
See stat ("/gg/app/ggadmin/product/12.1.2.1.0/dirdat/replicate/pdot/r4000021", 0x7ffcac69c610) =-1 ENOENT (No Such file or directory)
At a glance, the next trail R4_ot to read is not found.
See what Trail is
Ggsci (ora-bi-p-5.va2.b2c.nike.com) 2> info r4_ot
Replicat R4_ot last Started 2016-11-02 08:43 Status RUNNING
INTEGRATED
Checkpoint Lag 00:00:00 (updated 00:00:02 ago)
Process ID 7212
Log Read Checkpoint file/gg/app/ggadmin/product/12.1.2.1.0/dirdat/replicate/pdot/r4000020
2016-11-02 08:51:11.956186 RBA 535238535
Ggsci (ora-bi-p-5.va2.b2c.nike.com) 3> sh ls-lrt/gg/app/ggadmin/product/12.1.2.1.0/dirdat/replicate/pdot/r4*
-rw-r-----. 1 ggadmin oinstall 1023999938 Oct 16:41/gg/app/ggadmin/product/12.1.2.1.0/dirdat/replicate/pdot/r4000018
-rw-r-----. 1 ggadmin oinstall 1023999769 Nov 1 18:53/gg/app/ggadmin/product/12.1.2.1.0/dirdat/replicate/pdot/r4000019
-rw-r-----. 1 ggadmin oinstall 535251047 Nov 2 03:57/gg/app/ggadmin/product/12.1.2.1.0/dirdat/replicate/pdot/r4000020
-rw-r-----. 1 ggadmin oinstall 204917668 Nov 2 05:23/gg/app/ggadmin/product/12.1.2.1.0/dirdat/replicate/pdot/r4000346
-rw-r-----. 1 ggadmin oinstall 1264 Nov 2 05:23/gg/app/ggadmin/product/12.1.2.1.0/dirdat/replicate/pdot/r4000347
-rw-r-----. 1 ggadmin oinstall 59060543 Nov 2 08:49/gg/app/ggadmin/product/12.1.2.1.0/dirdat/replicate/pdot/r4000348
It is obvious that the serial number seqno is not continuous, why this situation (first not considered, first repaired)
So alter R4_ot extseqno 0346 Extrba 0
Of course, after the Ogg alter there have been some problems, but also successfully resolved, here will skip the talk.
The main thing is actually to share the thought process:
Received the alarm-"Ogg did not see the problem-" the database of Ogg user is not running R4_ot related affairs, but is Apply_process lag-"through strace/pstack smooth troubleshooting to the problem cause, thereby solving the problem.
Keep thinking~~
=============ended============================
Oradebug/strace/pstack and other Analysis database performance Problem series one