Ora-00494/aiowait timeout cause instance crashed. (SYSTEM dump analysis case)

Source: Internet
Author: User
Tags dba session id psql oracle database

The error is as follows:

Wed Oct 21 03:06:22 2015
Thread 1 advanced to log sequence 254206 (LGWR switch)
Current log# 9 seq# 254206 mem# 0:/boss_sysdata/oradata/dtvboss/redo09.log
Wed Oct 21 03:17:10 2015
Warning:aiowait timed out 1 times
Wed Oct 21 03:22:11 2015
Errors in FILE/ORACLE/ADMIN/DTVBOSS/BDUMP/DTVBOSS_ARC0_404.TRC:
Ora-00494:enqueue [CF] held for too long (more than 900 seconds) by ' Inst 1, osid 367 '
Wed Oct 21 03:22:17 2015
System state dumped to trace FILE/ORACLE/ADMIN/DTVBOSS/BDUMP/DTVBOSS_ARC0_404.TRC
Wed Oct 21 03:22:50 2015
Errors in FILE/ORACLE/ADMIN/DTVBOSS/BDUMP/DTVBOSS_LGWR_365.TRC:
Ora-00494:enqueue [CF] held for too long (more than 900 seconds) by ' Inst 1, osid 367 '
Wed Oct 21 03:22:55 2015
Killing Enqueue blocker (pid=367) on resource cf-00000000-00000000
By killing session 1648.1
Wed Oct 21 03:22:55 2015
Errors in FILE/ORACLE/ADMIN/DTVBOSS/BDUMP/DTVBOSS_LGWR_365.TRC:
Ora-07445:exception Encountered:core Dump [Ksuklms () +672] [SIGSEGV] [Address not mapped to object] [0x000000062] [] []
Wed Oct 21 03:23:18 2015
Errors in FILE/ORACLE/ADMIN/DTVBOSS/BDUMP/DTVBOSS_PMON_343.TRC:
ORA-00470:LGWR process terminated with error
Wed Oct 21 03:23:18 2015
Pmon:terminating instance due to error 470
Wed Oct 21 03:23:18 2015
ORA-470 encountered when generating server alert SMG-3503
Wed Oct 21 03:23:18 2015
Errors in FILE/ORACLE/ADMIN/DTVBOSS/BDUMP/DTVBOSS_J006_22335.TRC:
Ora-00604:error occurred at recursive SQL level 1
ORA-00470:LGWR process terminated with error
Wed Oct 21 03:23:22 2015
Termination issued to instance processes. Waiting for the processes to exit
Wed Oct 21 03:23:28 2015
Instance termination failed to kill one or more processes
Wed Oct 21 03:23:58 2015
Instance terminated by Pmon, PID = 343
Wed Oct 21 03:24:06 2015
Shutting down instance (abort)
Wed Oct 21 03:25:18 2015
Warning:inbound Connection timed out (ORA-3136)
Wed Oct 21 03:25:18 2015
Warning:inbound Connection timed out (ORA-3136)
Wed Oct 21 03:25:18 2015


1. Analysis of causes of downtime:

Alert Log Analysis:
Wed Oct 21 03:06:22 2015
Thread 1 advanced to log sequence 254206 (LGWR switch)
Current log# 9 seq# 254206 mem# 0:/boss_sysdata/oradata/dtvboss/redo09.log
Wed Oct 21 03:17:10 2015
Warning:aiowait timed out 1 times– the time period system has an alarm for AIO waiting, which is controlled by the _iowait_timeouts hidden parameters, which will cause the system to go down after the value set by this parameter, which does not exceed the system setting value. However, it was suggested that the system was having problems with IO scheduling.
Wed Oct 21 03:22:11 2015
Errors in FILE/ORACLE/ADMIN/DTVBOSS/BDUMP/DTVBOSS_ARC0_404.TRC:
Ora-00494:enqueue [CF] held for too long (more than 900 seconds) by ' Inst 1, osid 367′–22 minute 17 sec system throws wrong, process 367 holds controlfile Enq Ueue lock time is timed by another process (ARC0), the Enqueue lock hold time is controlled by a suppressed parameter _controlfile_enqueue_timeout and defaults to 900 seconds
Wed Oct 21 03:22:17 2015
System state dumped to trace FILE/ORACLE/ADMIN/DTVBOSS/BDUMP/DTVBOSS_ARC0_404.TRC
Wed Oct 21 03:22:50 2015
Errors in FILE/ORACLE/ADMIN/DTVBOSS/BDUMP/DTVBOSS_LGWR_365.TRC:
Ora-00494:enqueue [CF] held for too long (more than 900 seconds) by ' Inst 1, osid 367′–22 minutes 50 seconds, Process 367 holds controlfile Enqueu E lock time is LGWR by other processes (the wait timeout)
Wed Oct 21 03:22:55 2015
Killing Enqueue blocker (pid=367) on resource CF-00000000-00000000–LGWR process found exception killing the process of holding controlfile equeue lock 367
By killing session 1648.1
Wed Oct 21 03:22:55 2015
Errors in FILE/ORACLE/ADMIN/DTVBOSS/BDUMP/DTVBOSS_LGWR_365.TRC:
Ora-07445:exception Encountered:core Dump [Ksuklms () +672] [SIGSEGV] [Address not mapped to object] [0x000000062] [] []
Wed Oct 21 03:23:18 2015
Errors in file/oracle/admin/dtvboss/bdump/dtvboss_pmon_343.trc:– following a series of error downtime start
ORA-00470:LGWR process terminated with error
Wed Oct 21 03:23:18 2015
Pmon:terminating instance due to error 470
Wed Oct 21 03:23:18 2015
ORA-470 encountered when generating server alert SMG-3503
Wed Oct 21 03:23:18 2015
Errors in FILE/ORACLE/ADMIN/DTVBOSS/BDUMP/DTVBOSS_J006_22335.TRC:
Ora-00604:error occurred at recursive SQL level 1
ORA-00470:LGWR process terminated with error
Wed Oct 21 03:23:22 2015
Termination issued to instance processes. Waiting for the processes to exit
Wed Oct 21 03:23:28 2015
Instance termination failed to kill one or more processes

Trace file analysis (trace too long extract key parts):
SERVICE NAME: (sys$background) 2015-10-21 03:22:11.264
Session ID: (1646.3) 2015-10-21 03:22:11.264
2015-10-21 03:22:11.264
Unable to get Enqueue on resource cf-00000000-00000000 (GES mode req=3 held=6)
Possible local blocker ospid=367 sid=1648 sser=1 time_held=37 secs (GES mode req=6 held=4)
Dump local blocker:initiate state dump for KILL blocker
Possible owner[17.367] on resource cf-00000000-00000000
Dumping process Info of pid[17.367] requested by pid[23.404]
Ora-00494:enqueue [CF] held for too long (more than 900 seconds) by ' Inst 1, Osid 367′

——————————————————————————-
ENQUEUE [CF] HELD for TOO LONG

Enqueue Holder: ' Inst 1, Osid 367′

Process ' Inst 1, Osid 367′is holding a enqueue for maximum time.
The process would be terminated.

Oracle Support Services triaging information:to Find the Root-cause
At the call stack of process ' Inst 1, Osid 367′located below. Ask the
Developer that owns the the "Non-service layer in" stack to investigate.
Common Service layers are enqueues (ksq), latches (KSL), library cache
Pins and Locks (KGL), and Row cache locks (KQR).

Dumping Process 17.367 Info:
2015-10-21 03:22:11.300
Dumping diagnostic information for Ospid 367:
OS PID = 367
loadavg:0.44 0.62 1.82
Swap Info:free_mem = 1435.35M RSV = 59461.24M
alloc = 55941.14M Avail = 55124.91 Swap_free = 58645.01M
F S UID PID PPID C PRI NI ADDR SZ wchan stime TTY time CMD
0 S Oracle 367 1 0 39 20? 4116160? Mar 30? 498:14 ora_ckpt_dtvboss-the entire system at that time free memory and swap still leave room, enough to use the system, so the lack of memory reasons can be excluded. This prompts process 367 for the Oracle CKPT process to confirm that the process holding Controlfile Enqueue lock was the Oracle CKPT process

Processes 23:– the process for the ARC0 archive process, trace information is displayed on the waiting Controlfile enqueue queue lock. Wait for event as Enq:cf–contention
—————————————-
So:8a9004f88, Type:2, owner:0, flag:init/-/-/0x00
(process) Oracle pid=23, calls CUR/TOP:8A95C6650/8A95C6650, flag: (2) SYSTEM
int error:0, call error:0, Sess error:0, txn error 0
(post info) Last Post received:0 0 24
Last Post received-location:ksasnd
Last process to post me:b66002018 1 6
Last Post sent:0 0 24
Last Post sent-location:ksasnd
Last process posted by me:b66005770 1 2
(latch info) wait_event=0 bits=0
Process Group:default, Pseudo proc:5eb3388f0
O/S info:user:oracle, Term:unknown, ospid:404
OSD PID Info:unix Process pid:404, IMAGE:ORACLE@HZBOSS-ORA1 (ARC0)
Dump of memory from 0X00000005F033B1F0 to 0x00000005f033b3f8
5f033b1f0 0000000C 00000000 00000008 af739c70 [... s.p]., ...
5f033b200 00000006 000313a7 00000008 a95d1620 [...] ...... ]
The 5f033b210 00000005 000313a7 00000008 af739c70 [... s.p]..
The 5f033b220 00000006 000313a7 00000008 af739c70 [... s.p]..
5f033b230 00000006 000313a7 00000008 a95d1fb8 [...].... ...]
The 5f033b240 00000005 000313a7 00000008 af739c70 [... s.p]..
The 5f033b250 00000006 000313a7 00000008 af739c70 [... s.p]..
5f033b260 00000006 000313a7 00000008 a95d1fb8 [...].... ...]
The 5f033b270 00000005 000313a7 00000008 af739c70 [... s.p]..
The 5f033b280 00000006 000313a7 00000008 af739c70 [... s.p]..
5f033b290 00000006 000313a7 00000008 a95d1fb8 [...].... ...]
The 5f033b2a0 00000005 000313a7 00000008 af739c70 [... s.p]..
5f033b2b0 00000006 000313a7 00000008 a95d17e8 [...].... ...]
The 5f033b2c0 00000005 000313a7 00000008 af739c70 [... s.p]..
The 5f033b2d0 00000006 000313a7 00000008 af739c70 [... s.p]..
5f033b2e0 00000006 000313a7 00000008 a95d1880 [...].... ...]
The 5f033b2f0 00000005 000313a7 00000008 af739c70 [... s.p]..
The 5f033b300 00000006 000313a7 00000008 af739c70 [... s.p]..
5f033b310 00000006 000313a7 00000008 a95d16b8 [...].... ...]
The 5f033b320 00000005 000313a7 00000008 af739c70 [... s.p]..
The 5f033b330 00000006 000313a7 00000008 af739c70 [... s.p]..
5f033b340 00000006 000313a7 00000008 a95d1880 [...].... ...]
The 5f033b350 00000005 000313a7 00000008 af739c70 [... s.p]..
The 5f033b360 00000006 000313a7 00000008 af739c70 [... s.p]..
5f033b370 00000006 000313a7 00000008 a95d16b8 [...].... ...]
The 5f033b380 00000005 000313a7 00000008 af739c70 [... s.p]..
The 5f033b390 00000006 000313a7 00000008 af739c70 [... s.p]..
5f033b3a0 00000006 000313a7 00000008 a95d17e8 [...].... ...]
The 5f033b3b0 00000005 000313a7 00000008 af739c70 [... s.p]..
The 5f033b3c0 00000006 000313a7 00000008 af739c70 [... s.p]..
5f033b3d0 00000006 000313a7 00000008 a95d17e8 [...].... ...]
The 5f033b3e0 00000005 000313a7 00000008 af739c70 [... s.p]..
5f033b3f0 00000006 000313a7 [...]
(FOB) flags=2 fib=8af713780 incno=0 pending I/O cnt=0
Fname=/boss_sysdata/oradata/dtvboss/control03.ctl
fno=2 lblksz=16384 fsiz=1308
(FOB) flags=2 fib=8af7133e0 incno=0 pending I/O cnt=0
Fname=/boss_sysdata/oradata/dtvboss/control02.ctl
Fno=1 lblksz=16384 fsiz=1308
(FOB) flags=2 fib=8af713040 incno=0 pending I/O cnt=0
Fname=/boss_sysdata/oradata/dtvboss/control01.ctl
Fno=0 lblksz=16384 fsiz=1308
—————————————-
so:8a9778500, Type:11, Owner:8a9004f88, flag:init/-/-/0x00
(broadcast handle) flag: (2) ACTIVE subscriber, Owner:8a9004f88,
event:21, last Message event:21,
Last message waited event:21, messages read:0
Channel: (5EB52CC20) scumnt Mount Lock
Scope:1, event:22, last Mesage event:20,
PUBLISHERS/SUBSCRIBERS:0/20,
Messages Published:1
—————————————-
so:8a95b7328, Type:4, Owner:8a9004f88, flag:init/-/-/0x00
(session) sid:1646 trans:0, CREATOR:8A9004F88, flag: (Wuyi) usr/-bsy/-/-/-/-/-
did:0001-0017-0000000d, Short-term did:0000-0000-00000000
Txn branch:0
oct:0, prv:0, sql:0, psql:0, User:0/sys
Service Name:sys$background
Last wait for ' enq:cf–contention ' blocking sess=0x0 seq=43322 wait_time=2929723 seconds since wait started=36
name|mode=43460004, 0=0, operation=0
Dumping session Wait History
For ' enq:cf–contention ' count=1 wait_time=2929723
name|mode=43460004, 0=0, operation=0
For ' enq:cf–contention ' count=1 wait_time=2929725
name|mode=43460004, 0=0, operation=0
For ' enq:cf–contention ' count=1 wait_time=2929726
name|mode=43460004, 0=0, operation=0
For ' enq:cf–contention ' count=1 wait_time=2929726
name|mode=43460004, 0=0, operation=0
For ' enq:cf–contention ' count=1 wait_time=2929725
name|mode=43460004, 0=0, operation=0
For ' enq:cf–contention ' count=1 wait_time=2929723
name|mode=43460004, 0=0, operation=0
For ' enq:cf–contention ' count=1 wait_time=2929725
name|mode=43460004, 0=0, operation=0
For ' enq:cf–contention ' count=1 wait_time=2929726
name|mode=43460004, 0=0, operation=0
For ' enq:cf–contention ' count=1 wait_time=2929724
name|mode=43460004, 0=0, operation=0
For ' enq:cf–contention ' count=1 wait_time=2929750
name|mode=43460004, 0=0, operation=0

So:8a95d1fb8, Type:5, owner:8a95c6650, flag:init/-/-/0x00
(Enqueue) cf-00000000-00000000 did:0001-0017-0000000d
lv:00 The 4a for each of the res_flag:0x2.
res:0x5eb40f608, lock_flag:0x0
own:0x8a95b7328, sess:0x8a95b7328, prv:0x8a95d1fe8– here is a description arc0 the queue lock resource for 0x5eb40f608. And in the ckpt of process 17th, 0x5eb40f608 resources are seen to be held by CKPT.

Handles in process (CKPT processes):

So:8a95b9df8, Type:4, owner:8a90037d0, flag:init/-/-/0x00
(session) sid:1648 trans:0, CREATOR:8A90037D0, flag: (Wuyi) usr/-bsy/-/-/-/-/-
did:0001-0011-00000005, Short-term did:0000-0000-00000000
Txn branch:0
oct:0, prv:0, sql:0, psql:0, User:0/sys
Service Name:sys$background
Waiting for ' direct path write ' blocking sess=0x0 seq=39369 wait_time=0 seconds wait since
File number=72, dba=1, Block cnt=1
Dumping session Wait History
For ' Direct path write ' Count=1 wait_time=0
File number=73, dba=1, Block cnt=1
For ' Direct path write ' Count=1 wait_time=0
File number=74, dba=1, Block cnt=1
For ' Direct path write ' Count=1 wait_time=0
File number=75, dba=1, Block cnt=1
For ' Direct path write ' Count=1 wait_time=0
File number=76, dba=1, Block cnt=1
For ' Direct path write ' Count=1 wait_time=309
File number=77, dba=1, Block cnt=1
For ' direct path read ' Count=1 wait_time=1
File number=8b, dba=1, Block cnt=1
For ' direct path read ' Count=1 wait_time=0
File number=8a, dba=1, Block cnt=1
For ' direct path read ' Count=1 wait_time=0
File number=89, dba=1, Block cnt=1
For ' direct path read ' Count=1 wait_time=1
File number=88, dba=1, Block cnt=1
For ' direct path read ' Count=1 wait_time=1
File number=87, dba=1, Block cnt=1
Temporary Object counter:0
—————————————-
SO:8A95D1CC0, Type:5, Owner:8a95c5df8, flag:init/-/-/0x00
(Enqueue) cf-00000000-00000000 did:0001-0011-00000005
lv:00 The 4a for each of the res_flag:0x2.
res:0x5eb40f608, MODE:SSX, lock_flag:0x0 —————-Ckpt held the resource in a SSX manner
Own:0x8a95b9df8, Sess:0x8a95b9df8, proc:0x8a90037d0, prv:0x5eb40f618

Process 10:LGWR Processes
—————————————-
so:b66002018, Type:2, owner:0, flag:init/-/-/0x00
(process) Oracle pid=10, calls CUR/TOP:B665C6DE0/B665C6DE0, flag: (6) SYSTEM
int error:0, call error:0, Sess error:0, txn error 0
(post info) Last Post received:0 0 24
Last Post received-location:ksasnd
Last process to post me:8a9000860 1 6
Last Post sent:109 0 4
Last Post SENT-LOCATION:KSLPSR
Last process posted by me:8a902ef90 105 0
(latch info) wait_event=0 bits=0
Process Group:default, Pseudo proc:5eb3388f0
O/S info:user:oracle, Term:unknown, ospid:365
OSD PID Info:unix Process pid:365, IMAGE:ORACLE@HZBOSS-ORA1 (LGWR)

so:b665bf398, Type:4, owner:b66002018, flag:init/-/-/0x00
(session) sid:824 trans:0, creator:b66002018, flag: (Wuyi) usr/-bsy/-/-/-/-/-
did:0001-000a-00000006, Short-term did:0000-0000-00000000
Txn branch:0
oct:0, prv:0, sql:0, psql:0, User:0/sys
Service Name:sys$background
Waiting for ' enq:cf–contention ' blocking Sess=0x8a95b9df8 seq=61266 wait_time=0 seconds since wait started=0 0x8a95b9df 8 The handle holder is the CKPT process for process 17
name|mode=43460005, 0=0, operation=0
Dumping session Wait History
For ' enq:cf–contention ' count=1 wait_time=2929719
name|mode=43460005, 0=0, operation=0
For ' enq:cf–contention ' count=1 wait_time=2929716
name|mode=43460005, 0=0, operation=0
For ' enq:cf–contention ' count=1 wait_time=2929716
name|mode=43460005, 0=0, operation=0
For ' enq:cf–contention ' count=1 wait_time=2929718
name|mode=43460005, 0=0, operation=0
For ' enq:cf–contention ' count=1 wait_time=2929719
name|mode=43460005, 0=0, operation=0
For ' enq:cf–contention ' count=1 wait_time=2929719

—————————————-
So:b665c6de0, Type:3, owner:b66002018, flag:init/-/-/0x00
(call) Sess:cur b665bf398, rec 0, usr b665bf398; depth:0
—————————————-
So:b665cce90, Type:5, Owner:b665c6de0, flag:init/-/-/0x00
(Enqueue) cf-00000000-00000000 did:0001-000a-00000006
lv:00 The 4a for each of the res_flag:0x2.
res:0x5eb40f608, REQ:SSX, lock_flag:0x0 ———— previously explained that the 5eb40f608 resource queue lock has been held by the CKPT process, while LGWR is applying to hold it in SSX mode.
own:0xb665bf398, sess:0xb665bf398, proc:0xb66002018, prv:0x5eb40f628

The

can be concluded that because the ARC0 process to write log needs to wait for the control file CF Lock, LGWR estimated to write log or switch is also waiting for the control file CF lock, and ckpt process holding controlfile enqueue lock time too long, Arch0 waits more than 900 seconds to cause an Oracle database system to judge an exception.
At this point, the Oracle database system notifies the LGWR process that the process kills the Controlfile Enqueue lock, which is equivalent to the LGWR process killing the ckpt process, and the CKPT process is an Oracle key process that causes a ripple effect, causing downtime.

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.