Author: eygle
Source: http://www.eygle.com/blog
Date: June 26,200 5
Link: http://www.eygle.com/archives/2005/06/oracleeaoieiear.html
«Use of new features of Oracle9i flashback query to recover and accidentally delete data | blog homepage | popular Oracle database performance optimization book»
Platform: SunOS 5.8 Generic_108528-23 sun4u iSCSI sunw, ultra-Enterprise
Database: 8.1.5.0.0
Symptom: the response is slow and application requests cannot be returned.
Log on to the database and find that the redo log group is in the active status except the current status.
oracle:/oracle/oracle8>sqlplus "/ as sysdba"SQL*Plus: Release 8.1.5.0.0 - Production on Thu Jun 23 18:56:06 2005(c) Copyright 1999 Oracle Corporation. All rights reserved.Connected to:Oracle8i Enterprise Edition Release 8.1.5.0.0 - ProductionWith the Partitioning and Java optionsPL/SQL Release 8.1.5.0.0 - ProductionSQL> select * from v$log; GROUP# THREAD# SEQUENCE# BYTES MEMBERS ARC STATUS FIRST_CHANGE# FIRST_TIM---------- ---------- ---------- ---------- ---------- --- ---------------- ------------- --------- 1 1 520403 31457280 1 NO ACTIVE 1.3861E+10 23-JUN-05 2 1 520404 31457280 1 NO ACTIVE 1.3861E+10 23-JUN-05 3 1 520405 31457280 1 NO ACTIVE 1.3861E+10 23-JUN-05 4 1 520406 31457280 1 NO CURRENT 1.3861E+10 23-JUN-05 5 1 520398 31457280 1 NO ACTIVE 1.3860E+10 23-JUN-05 6 1 520399 31457280 1 NO ACTIVE 1.3860E+10 23-JUN-05 7 1 520400 104857600 1 NO ACTIVE 1.3860E+10 23-JUN-05 8 1 520401 104857600 1 NO ACTIVE 1.3860E+10 23-JUN-05 9 1 520402 104857600 1 NO ACTIVE 1.3861E+10 23-JUN-059 rows selected.SQL> / GROUP# THREAD# SEQUENCE# BYTES MEMBERS ARC STATUS FIRST_CHANGE# FIRST_TIM---------- ---------- ---------- ---------- ---------- --- ---------------- ------------- --------- 1 1 520403 31457280 1 NO ACTIVE 1.3861E+10 23-JUN-05 2 1 520404 31457280 1 NO ACTIVE 1.3861E+10 23-JUN-05 3 1 520405 31457280 1 NO ACTIVE 1.3861E+10 23-JUN-05 4 1 520406 31457280 1 NO CURRENT 1.3861E+10 23-JUN-05 5 1 520398 31457280 1 NO ACTIVE 1.3860E+10 23-JUN-05 6 1 520399 31457280 1 NO ACTIVE 1.3860E+10 23-JUN-05 7 1 520400 104857600 1 NO ACTIVE 1.3860E+10 23-JUN-05 8 1 520401 104857600 1 NO ACTIVE 1.3860E+10 23-JUN-05 9 1 520402 104857600 1 NO ACTIVE 1.3861E+10 23-JUN-059 rows selected. |
If the logs are in the active status, it is clear that dbwr writes cannot keep up with the checkpoint triggered by log switch. Next, let's check the busy degree of dbwr:
SQL> !oracle:/oracle/oracle8>ps -ef|grep ora_ oracle 2273 1 0 Mar 31 ? 57:40 ora_smon_hysms02 oracle 2266 1 0 Mar 31 ? 811:42 ora_dbw0_hysms02 oracle 2264 1 16 Mar 31 ? 16999:57 ora_pmon_hysms02 oracle 2268 1 0 Mar 31 ? 1649:07 ora_lgwr_hysms02 oracle 2279 1 0 Mar 31 ? 8:09 ora_snp1_hysms02 oracle 2281 1 0 Mar 31 ? 4:22 ora_snp2_hysms02 oracle 2285 1 0 Mar 31 ? 9:40 ora_snp4_hysms02 oracle 2271 1 0 Mar 31 ? 15:57 ora_ckpt_hysms02 oracle 2283 1 0 Mar 31 ? 5:37 ora_snp3_hysms02 oracle 2277 1 0 Mar 31 ? 5:58 ora_snp0_hysms02 oracle 2289 1 0 Mar 31 ? 0:00 ora_d000_hysms02 oracle 2287 1 0 Mar 31 ? 0:00 ora_s000_hysms02 oracle 2275 1 0 Mar 31 ? 0:04 ora_reco_hysms02 oracle 21023 21012 0 18:52:59 pts/65 0:00 grep ora_ |
The process Number of dbwr is 2266.
Use the TOP command to observe:
oracle:/oracle/oracle8>toplast pid: 21145; load averages: 3.38, 3.45, 3.67 18:53:38725 processes: 711 sleeping, 1 running, 10 zombie, 3 on cpuCPU states: 35.2% idle, 40.1% user, 9.4% kernel, 15.4% iowait, 0.0% swapMemory: 3072M real, 286M free, 3120M swap in use, 1146M swap free PID USERNAME THR PRI NICE SIZE RES STATE TIME CPU COMMAND 11855 smspf 1 59 0 1355M 1321M cpu/0 19:32 16.52% oracle 2264 oracle 1 0 0 1358M 1316M run 283.3H 16.36% oracle 11280 oracle 1 13 0 1356M 1321M sleep 79.8H 0.77% oracle 6957 smspf 15 29 10 63M 14M sleep 107.7H 0.76% java 17393 smspf 1 30 0 1356M 1322M cpu/1 833:05 0.58% oracle 29299 smspf 5 58 0 8688K 5088K sleep 18.5H 0.38% fee_ftp_get 21043 oracle 1 43 0 3264K 2056K cpu/9 0:01 0.31% top 20919 smspf 17 29 10 63M 17M sleep 247:02 0.29% java 25124 smspf 1 58 0 16M 4688K sleep 0:35 0.25% smif_status_rec 8086 smspf 5 23 0 21M 13M sleep 41.1H 0.24% fee_file_in 16009 root 1 35 0 4920K 3160K sleep 0:03 0.21% sshd2 25126 smspf 1 58 0 1355M 1321M sleep 0:26 0.20% oracle 2266 oracle 1 60 0 1357M 1317M sleep 811:42 0.18% oracle 11628 smspf 7 59 0 3440K 2088K sleep 0:39 0.16% sgip_client_ltz 26257 smspf 82 59 0 447M 178M sleep 533:04 0.15% java |
We noticed that the CPU consumed by the process No. 2266 is only 0.18%, which is obviously not busy, so the bottleneck is probably on Io.
Use iostat to check the IO status.
gqgai:/home/gqgai>iostat -xn 3 extended device statistics r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device...... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c0t6d0 1.8 38.4 32.4 281.0 0.0 0.7 0.0 16.4 0 29 c0t10d0 1.8 38.4 32.4 281.0 0.0 0.5 0.0 13.5 0 27 c0t11d0 24.8 61.3 1432.4 880.1 0.0 0.5 0.0 5.4 0 26 c1t1d0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 9.1 0 0 hurraysms02:vold(pid238) extended device statistics r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device........ 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c0t6d0 0.3 8.3 0.3 47.0 0.0 0.1 0.0 9.2 0 8 c0t10d0 0.0 8.3 0.0 47.0 0.0 0.1 0.0 8.0 0 7 c0t11d0 11.7 65.3 197.2 522.2 0.0 1.6 0.0 20.5 0 100 c1t1d0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 hurraysms02:vold(pid238) extended device statistics r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device........ 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c0t6d0 0.3 13.7 2.7 68.2 0.0 0.2 0.0 10.9 0 12 c0t10d0 0.0 13.7 0.0 68.2 0.0 0.1 0.0 9.6 0 11 c0t11d0 11.3 65.3 90.7 522.7 0.0 1.5 0.0 19.5 0 99 c1t1d0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 hurraysms02:vold(pid238) extended device statistics r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device........ 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c0t6d0 0.0 8.0 0.0 42.7 0.0 0.1 0.0 9.3 0 7 c0t10d0 0.0 8.0 0.0 42.7 0.0 0.1 0.0 9.1 0 7 c0t11d0 11.0 65.7 978.7 525.3 0.0 1.4 0.0 17.7 0 99 c1t1d0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 hurraysms02:vold(pid238) extended device statistics r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device........ 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c0t6d0 0.3 87.7 2.7 433.7 0.0 2.2 0.0 24.9 0 90 c0t10d0 0.0 88.3 0.0 436.5 0.0 1.8 0.0 19.9 0 81 c0t11d0 89.0 54.0 725.4 432.0 0.0 2.1 0.0 14.8 0 100 c1t1d0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 hurraysms02:vold(pid238) |
We noticed that the main volume of c1t1d0 stored in the database is always at 99 ~ 100, but the write speed is only about 500 kb/s, which is extremely slow.
(% B percent of time the disk is busy (transactions in progress)
KW/s kilobytes written per second)
According to our knowledge, the write speed of the T3 disk array by char can reach about 10 Mb/s. Some tpcc indicators have been tested before. For details, refer to: Use Bonnie to test system Io speed.
Under normal circumstances, the random write speed of databases is usually between 1 and 1 ~ About 2 M. It is clear that the disk is in an abnormal state. It is confirmed that the hard disk is damaged, and a hard disk is damaged in the RAID 5 group.
After replacement, the system gradually becomes normal.
Posted by eygle at June 26,200 5