oracle| Data | database
Platform: SunOS 5.8 generic_108528-23 sun4u SPARC sunw,ultra-enterprise
Database: 8.1.5.0.0
Symptom: Slow response, application request cannot return
Log in to the database and find that the Redo journal group is active except current
Oracle:/oracle/oracle8>sqlplus "/As SYSDBA"
Sql*plus:release 8.1.5.0.0-production on Thu June 23 18:56:06 2005
(c) Copyright 1999 Oracle Corporation. All rights reserved.
Connected to:
Oracle8i Enterprise Edition Release 8.1.5.0.0-production
With the partitioning and Java options
Pl/sql Release 8.1.5.0.0-production
Sql> select * from V$log;
group# thread# sequence# BYTES members ARC STATUS first_change# First_tim
---------- ---------- ---------- ---------- ---------- --- ---------------- ------------- ---------
1 1 520403 31457280 1 NO ACTIVE 1.3861E+10 23-jun-05
2 1 520404 31457280 1 NO ACTIVE 1.3861E+10 23-jun-05
3 1 520405 31457280 1 NO ACTIVE 1.3861E+10 23-jun-05
4 1 520406 31457280 1 NO current 1.3861E+10 23-jun-05
5 1 520398 31457280 1 NO ACTIVE 1.3860E+10 23-jun-05
6 1 520399 31457280 1 NO ACTIVE 1.3860E+10 23-jun-05
7 1 520400 104857600 1 NO ACTIVE 1.3860E+10 23-jun-05
8 1 520401 104857600 1 NO ACTIVE 1.3860E+10 23-jun-05
9 1 520402 104857600 1 NO ACTIVE 1.3861E+10 23-jun-05
9 rows selected.
Sql>/
group# thread# sequence# BYTES members ARC STATUS first_change# First_tim
---------- ---------- ---------- ---------- ---------- --- ---------------- ------------- ---------
1 1 520403 31457280 1 NO ACTIVE 1.3861E+10 23-jun-05
2 1 520404 31457280 1 NO ACTIVE 1.3861E+10 23-jun-05
3 1 520405 31457280 1 NO ACTIVE 1.3861E+10 23-jun-05
4 1 520406 31457280 1 NO current 1.3861E+10 23-jun-05
5 1 520398 31457280 1 NO ACTIVE 1.3860E+10 23-jun-05
6 1 520399 31457280 1 NO ACTIVE 1.3860E+10 23-jun-05
7 1 520400 104857600 1 NO ACTIVE 1.3860E+10 23-jun-05
8 1 520401 104857600 1 NO ACTIVE 1.3860E+10 23-jun-05
9 1 520402 104857600 1 NO ACTIVE 1.3861E+10 23-jun-05
9 rows selected.
If the log is in an active state, then obviously DBWR write is no longer able to keep up with the checkpoint triggered by log switch.
Next let's check the DBWR's busy level:
Sql>!
Oracle:/oracle/oracle8>ps-ef|grep Ora_
Oracle 2273 1 0 Mar 31? 57:40 ORA_SMON_HYSMS02
Oracle 2266 1 0 Mar 31? 811:42 ORA_DBW0_HYSMS02
Oracle 2264 1 Mar 31? 16999:57 ORA_PMON_HYSMS02
Oracle 2268 1 0 Mar 31? 1649:07 ORA_LGWR_HYSMS02
Oracle 2279 1 0 Mar 31? 8:09 ORA_SNP1_HYSMS02
Oracle 2281 1 0 Mar 31? 4:22 ORA_SNP2_HYSMS02
Oracle 2285 1 0 Mar 31? 9:40 ORA_SNP4_HYSMS02
Oracle 2271 1 0 Mar 31? 15:57 ORA_CKPT_HYSMS02
Oracle 2283 1 0 Mar 31? 5:37 ORA_SNP3_HYSMS02
Oracle 2277 1 0 Mar 31? 5:58 ORA_SNP0_HYSMS02
Oracle 2289 1 0 Mar 31? 0:00 ORA_D000_HYSMS02
Oracle 2287 1 0 Mar 31? 0:00 ORA_S000_HYSMS02
Oracle 2275 1 0 Mar 31? 0:04 ORA_RECO_HYSMS02
Oracle 21023 21012 0 18:52:59 pts/65 0:00 grep ora_
The DBWR process number is 2266.
Use the top command to observe:
Oracle:/oracle/oracle8>top
Last pid:21145; Load averages:3.38, 3.45, 3.67 18:53:38
725 processes:711 sleeping, 1 running, zombie, 3 on CPU
CPU states:35.2% Idle, 40.1% user, 9.4% kernel, 15.4% iowait, 0.0% swap
memory:3072m Real, 286M free, 3120M swap with use, 1146M swap free
PID USERNAME THR PRI Nice SIZE RES the state time CPU COMMAND
11855 SMSPF 1 0 1355M 1321M cpu/0 19:32 16.52% Oracle
2264 Oracle 1 0 0 1358M 1316M run 283.3H 16.36% Oracle
11280 Oracle 1 0 1356M 1321M sleep 79.8H 0.77% Oracle
6957 SMSPF 63M 14M sleep 107.7H 0.76% java
17393 SMSPF 1 0 1356M 1322M cpu/1 833:05 0.58% Oracle
29299 SMSPF 5 0 8688K 5088K sleep 18.5H 0.38% fee_ftp_get
21043 Oracle 1 0 3264K 2056K cpu/9 0:01 0.31% Top
20919 SMSPF 63M 17M sleep 247:02 0.29% java
25124 SMSPF 1 0 16M 4688K sleep 0:35 0.25% Smif_status_rec
8086 SMSPF 5 0 21M 13M sleep 41.1H 0.24% fee_file_in
16009 Root 1 0 4920K 3160K sleep 0:03 0.21% sshd2
25126 SMSPF 1 0 1355M 1321M sleep 0:26 0.20% Oracle
2266 Oracle 1 0 1357M 1317M sleep 811:42 0.18% Oracle
11628 SMSPF 7 0 3440K 2088K sleep 0:39 0.16% Sgip_client_ltz
26257 SMSPF 0 447M 178M sleep 533:04 0.15% java
We note that the No. 2266 process consumes only 0.18% of the CPU, apparently not busy, so the bottleneck is likely to be on Io.
Use the Iostat tool to check IO status.
GQGAI:/HOME/GQGAI>IOSTAT-XN 3
Extended Device Statistics
r/s w/s kr/s kw/s wait actv wsvc_t asvc_t%w%b Device
......
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c0t6d0
1.8 38.4 32.4 281.0 0.0 0.7 0.0 16.4 0 c0t10d0
1.8 38.4 32.4 281.0 0.0 0.5 0.0 13.5 0 c0t11d0
24.8 61.3 1432.4 880.1 0.0 0.5 0.0 5.4 0 c1t1d0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 9.1 0 0 hurraysms02:vold (pid238)
Extended Device Statistics
r/s w/s kr/s kw/s wait actv wsvc_t asvc_t%w%b Device
........
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c0t6d0
0.3 8.3 0.3 47.0 0.0 0.1 0.0 9.2 0 8 c0t10d0
0.0 8.3 0.0 47.0 0.0 0.1 0.0 8.0 0 7 c0t11d0
11.7 65.3 197.2 522.2 0.0 1.6 0 0.0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 hurraysms02:vold (pid238)
Extended Device Statistics
r/s w/s kr/s kw/s wait actv wsvc_t asvc_t%w%b Device
........
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c0t6d0
0.3 13.7 2.7 68.2 0.0 0.2 0.0 10.9 0 c0t10d0
0.0 13.7 0.0 68.2 0.0 0.1 0.0 9.6 0 c0t11d0
11.3 65.3 90.7 522.7 0.0 1.5 0.0 19.5 0 c1t1d0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 hurraysms02:vold (pid238)
Extended Device Statistics
r/s w/s kr/s kw/s wait actv wsvc_t asvc_t%w%b Device
........
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c0t6d0
0.0 8.0 0.0 42.7 0.0 0.1 0.0 9.3 0 7 c0t10d0
0.0 8.0 0.0 42.7 0.0 0.1 0.0 9.1 0 7 c0t11d0
11.0 65.7 978.7 525.3 0.0 1.4 0.0 17.7 0 c1t1d0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 hurraysms02:vold (pid238)
Extended Device Statistics
r/s w/s kr/s kw/s wait actv wsvc_t asvc_t%w%b Device
........
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c0t6d0
0.3 87.7 2.7 433.7 0.0 2.2 0.0 24.9 0 c0t10d0
0.0 88.3 0.0 436.5 0.0 1.8 0.0 19.9 0 Bayi c0t11d0
89.0 54.0 725.4 432.0 0.0 2.1 0 0.0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 hurraysms02:vold (pid238)
We note that the main volume of the storage database c1t1d0 is always in 99~100, while the speed of writing is only about 500k/s, which is extremely slow.
(%b percent of time the "disk is busy" (Transactions in progress)
KW/S kilobytes written per second)
According to our common sense T3 disk array usually by char write speed can reach 10m/s or so, previously tested some TPCC indicators, you can refer to: The use of Bonnie to test system IO speed.
and normal database random write is usually around 1~2m, apparently at this time the disk has been in an abnormal state, after confirmed that the hard drive has been damaged, RAID5 group damaged a hard disk.
After the replacement, the system gradually returned to normal.
Posted by Eygle at June 26, 2005