OGG extract process checkpoint time back to 00:00:00 troubleshooting, checkpoint
1. Fault
The Extract process (SEXTR01) status is running, but the Lag at Chkpt has reached more than five hours, and the time has been increasing. No new logs are extracted. The status information is as follows:
GGSCI (calladgdb) 21> info all Program Status Group Lag at Chkpt Time Since Chkpt MANAGER RUNNING Extract running dpeywgl 00:00:00 00:00:04 Extract running SEXTR0105:14:5800:00:03 |
2. Fault Analysis 2.1 debug the Process status information
GGSCI (calladgdb) 23> info sextr01, showch debug EXTRACT SEXTR01 Last Started Status RUNNING Checkpoint Lag 05:21:26 (updated 00:00:04 ago) Log Read Checkpoint Oracle Redo Logs 18:05:16 Thread 1, Seqno 5554, RBA 611373536 SCN 3126.136656462 (13426204423758) Log Read Checkpoint Oracle Redo Logs 00:00:00 Thread 2, Seqno 4016, RBA 1024 SCN44527.110828288 (191242119617280) Current Checkpoint Detail: Read Checkpoint #1 ...... Read Checkpoint #2 Oracle Threaded Redo Log Startup Checkpoint (starting position in the data source ): Thread #: 2 Sequence #: 4015 RBA: 640984080 Timestamp: 18:03:51. 000000 SCN: 3126.136602577 (13426204369873) Redo File: + DGROUP1/calldb/onlinelog/group_15.267.799671_1 Recovery Checkpoint(Position of oldest unprocessed transaction in the data source ): Thread #: 2 Sequence #: 4015 RBA: 640984080 Timestamp: 18:03:51. 000000 SCN: 3126.136602577 (13426204369873) Redo File: + DGROUP1/calldb/onlinelog/group_15.267.799671_1 Current Checkpoint (position of last record read in the data source ): Thread #: 2 Sequence #: 4016 RBA: 1024 Timestamp: 1988-01-01 00:00:00. 000000 SCN: 44527.110828288 (191242119617280) Redo File: + DGROUP1/calldb/onlinelog/group_16.266.799671_3 ...... |
An incredible problem is found here:
Thread #: The timestamp of the current checkpoint of 2 is actuallyThe decimal number of 1988-01-0100:00:00. 000000 and SCN is 191242119617280, which is far greater than the SCN of the current database and has one more digit.
Time regresses. Instead, the SCN is surging, and the time is so personalized. This question is quite interesting.
2.2 view the report of the extract Process
View report sextr01 is used to view the process report, prompting that there is an ERROR. thread #2 4016 archive log cannot be opened. (Because I did not copy the information, the current status can only be described in this way), but the actual physical file of archivelog No. 4016 exists, and the permissions are normal.
2.3 stop the process and showch its status.
GGSCI (calladgdb) 24> stop sextr01 GGSCI (calladgdb) 25>Info SEXTR01, showch EXTRACT SEXTR01 Last Started Status STOPPED Checkpoint Lag 06:56:55 (updated 00:00:58 ago) Log Read Checkpoint Oracle Redo Logs 18:05:16 Thread 1, Seqno 5554, RBA 611373536 SCN 3126.136656462 (13426204423758) Log Read Checkpoint Oracle Redo Logs 00:00:00 Thread 2, Seqno 4016, RBA 1024 SCN 44527.110828288 (191242119617280) Current Checkpoint Detail: Read Checkpoint #1 ...... Read Checkpoint #2 Oracle Threaded Redo Log Startup Checkpoint (starting position in the data source ): Thread #: 2 Sequence #: 4015 RBA: 640984080 Timestamp: 18:03:51. 000000 SCN: 3126.136602577 (13426204369873) Redo File: + DGROUP1/calldb/onlinelog/group_15.267.799671_1 Recovery Checkpoint(Position of oldest unprocessed transaction in the data source ): Thread #: 2 Sequence #: 4015 RBA: 640984080 Timestamp: 18:03:51. 000000 SCN: 3126.136602577 (13426204369873) Redo File: + DGROUP1/calldb/onlinelog/group_15.267.799671_1 Current Checkpoint (position of last record read in the data source ): Thread #: 2 Sequence #: 4016 RBA: 1024 Timestamp: 1988-01-01 00:00:00. 000000 SCN: 44527.110828288 (191242119617280) Redo File: + DGROUP1/calldb/onlinelog/group_16.266.799671_3 ...... |
3. analyze the cause
The checkpoint time of thread #2 of the process is 00:00:00. 000000, it is definitely not normal. Combined with the error information, the 4016 archiving log file cannot be found, and the sequence #: 4016 of the current checkpoint thread #2 of the process, while RBA is 1024.
Since RBA has been running 1024, archivelog cannot be found. If the archive log file No. 4016 has not been extracted, RBA should be 0.
Based on this analysis, we decided to manually restore the chkeckpoint of the extract process to obtain the recover chkeckpoint, and adjust next extseqno to RBA No. 0 of 4016 to see if it can be solved.
4. Solution 4.1: Adjust netxt sequno to 4016, and RBA to 0.
GGSCI (calladgdb) 2> alter SEXTR01, thread 2, extseqno 4016, extrba 0
4.2 adjust ioextseqno and ioextrba to the rechvercheckpoint status of the process
GGSCI (calladgdb) 3> alter SEXTR01, thread 2, ioextseqno 4015, ioextrba 640984080
4.3 verify the Adjustment Result
GGSCI (calladgdb) 4> info sextr01, showch
EXTRACT SEXTR01 Initialized 2015-01-26 0:48 Status STOPPED Checkpoint Lag 07:43:41 (updated 00:00:51 ago) Log Read Checkpoint Oracle Redo Logs 18:05:16 Thread 1, Seqno 5554, RBA 611373536 SCN 3126.136656462 (13426204423758) Log Read Checkpoint Oracle Redo Logs 00:00:00 Thread 2, Seqno 4016, RBA 0 SCN 0.0 (0) Current Checkpoint Detail: Read Checkpoint #1 ...... Read Checkpoint #2 Oracle Threaded Redo Log Startup Checkpoint (starting position in the data source ): Thread #: 2 Sequence #: 4016 RBA: 0 Timestamp: 1988-01-01 00:00:00. 000000 SCN: Not available Redo File: + DGROUP1/calldb/onlinelog/group_16.266.799671_3 Recovery Checkpoint (position of oldest unprocessed transaction in the data source ): Thread #: 2 Sequence #: 4015 RBA: 640984080 Timestamp: SCN: Not available Redo File: Current Checkpoint (position of last record read in the data source ): Thread #: 2 Sequence #: 4016 RBA: 0 Timestamp: 1988-01-01 00:00:00. 000000 SCN: Not available Redo File: + DGROUP1/calldb/onlinelog/group_16.266.799671_3 ...... |
The sequence # Of Start checkpoint and current chkeckpoint is changed to 4016, And the RBA number is changed to 0.
4.4 start the extract process sextr01 and perform brreset operations
GGSCI (calladgdb) 6>StartSEXTR01, brreset
GGSCI (calladgdb) 6> start SEXTR01, brreset Sending START request to MANAGER... EXTRACT SEXTR01 starting GGSCI (calladgdb) 7> info all Program Status Group Lag at Chkpt Time Since Chkpt MANAGER RUNNING Extract running dpeywgl 00:00:00 00:00:01 Extract starting SEXTR01 07:43:41 00:03:47 |
4.5 Display Process status information several times to verify the effect of delay reduction
GGSCI (calladgdb) 10> info all Program Status Group Lag at Chkpt Time Since Chkpt MANAGER RUNNING Extract running dpeywgl 04:46:20 00:00:01 Extract running SEXTR0104:23:1300:00:01 GGSCI (calladgdb) 11> info all Program Status Group Lag at Chkpt Time Since Chkpt MANAGER RUNNING Extract running dpeywgl 01:53:22 00:00:09 Extract running SEXTR0101:27:2500:00:08 GGSCI (calladgdb) 12> info all Program Status Group Lag at Chkpt Time Since Chkpt MANAGER RUNNING Extract running dpeywgl 00:00:00 00:00:01 Extract running SEXTR0100:00:0200:00:00 |
View the Process status through info all several times and see that the lag at chkpt Time drops rapidly. Solve the problem.
Author: LI Junjie (Network Name: Step-by-Step), engaged in "system architecture, operating system, storage device, database, middleware, application" six levels of systematic performance optimization work
Join the system performance optimization professional group to discuss performance optimization technologies. GROUP: 258187244