Problem and Solution of business response affected by a monitoring system process

Source: Internet
Author: User


During the 11th period, the execution of a program in the Process of service response affected by a monitoring system process was particularly slow. It is suspected that the oracle database has a problem and needs to be processed as soon as possible, based on the information provided by the business, the relevant business is initiated from the wwzg3 host. The business mainly uses the account book notification table www_BOOK_CHG_NOTIFY and text message table WWW. www_busi_sms. Provides an SQL statement select * from WWW. www_book_scheme_fee_0104 a, WWW. www_book_scheme_rec_0104 B, WWW. crm_user cwhere. so_nbr = B. so_nbr and. wwwt_id = c. wwwt_id and. after_month <= to_char (sysdate, 'yyymmm ') and c. phone_id = B. phone_id and B. sts = 0 and. sts = 0; www.2cto.com 1. log on to the Database Host and find that the host has sufficient resources, the system resources do not have a bottleneck, and the CPU usage is only 40% SQL>! Sar-u 3 5 HP-UX wbdb1 B .11.31 U ia64 10/04/12 12:24:01% usr % sys % wio % idle12: 24: 04 27 4 8 6012: 24: 07 30 3 7 6012: 24: 10 28 2 8 6212: 24: 13 30 4 10 5612: 24: 16 30 3 10 57 Average 29 3 9 592, the SQL Execution speed provided by the development is extremely fast, the only SQL Execution Plan with some exceptions related to the business table is as follows, which is basically a full table scan, however, the database does not find that the statement is running www.2cto.com. Therefore, [SQL] PLAN_TABLE_OUTPUT --------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | TempSpc | Cost (% CPU) | Pstart | Pstop | percent | 0 | select statement | 1 | 430 | 87768 (1) | 1 | TABLE wwwESS by local index rowid | CRM_USER | 1 | 20 | 21 (0) | 2 | nested loops | 1 | 430 | 87768 (1) | 3 | hash join | 385 | 154K | 2832K | 79678 (1) | 4 | Hash join | 10999 | 2696K | 343M | 75513 (1) | 5 | merge join cartesian | 5634K | 279M | 20348 (1) | 6 | TABLE wwwESS FULL | www_BOOK_SCHEME_COND | 2189 | 32835 | 39 (0) | 7 | buffer sort | 2574 | 95238 | 20309 (1) | 8 | TABLE wwwESS FULL | www_BOOK_SCHEME_SAP_MAP | 2574 | 95238 | 9 (0) | 9 | TABLE wwwESS FULL | www_BOOK_SCHEME_REC_0104 | 2306K | 437M | 14986 (2) | 10 | TABLE wwwESS FULL | www_BOOK_SCHEME_FEE_0104 | 80636 | 12M | 3373 (2) | 11 | partition range all | 1 | 20 (0) | 1 | 10 | 12 | index range scan | IDX_USER_PHONE | 1 | 20 (0) | 1 | 10 | www.2cto.com 3. Check the top 10 SQL statements in the database, no abnormal SQL statements from wwzg3 host were found, basically all of them were omitted from wbzc4 awr 4. Based on the above inference, there was no problem with the database, so I turned to check whether the host running the relevant process was abnormal, sure enough. No problem. In the database, the resource usage on the wwzg3 host is too high, and the system usage is 100% CPU usage is 100%, which is consumed by % sys, root @ wwzg3 [/] # sar-u 3 5 HP-UX wwzg3 B .11.31 U ia64 10/04/12 13:47:38% usr % sys % wio % idle13: 47: 41 30 70 0 47:44 30 70 0 47:47 30 69 0 47:50 31 69 0 47:53 30 69 0 0 0 Average 30 69 0 0 0 www.2cto.com 5,262 idcp. THE sh process is running root @ wwzg3 [/] # ps-ef | grep idcp. sh | wc-l262 root 20282 10168 1 13:55:30 pts/17 grep 7675root @ wwzg3 [/] # ps-ef | grep I Dcp. sh root 10315 10146 0 13:55:40? 20:40:00/usr/bin/sh/home/patrol/IDCPCM/main/idcp. sh root 9848 9803 253? 25:36/usr/bin/sh/home/patrol/IDCPCM/main/idcp. sh root 29251 29179 251 19:10:00? 30: 05/usr/bin/sh/home/patrol/IDCPCM/main/idcp. sh root 19055 3432 0 16:40:00? 15:50:00 sh-c/home/patrol/IDCPCM/main/idcp. sh 1>/dev/null 2>/dev/null root 13603 3432 0? 18:20:00 sh-c/home/patrol/IDCPCM/main/idcp. sh 1>/dev/null 2>/dev/null root 13384 3432 0? 15:20:00 sh-c/home/patrol/IDCPCM/main/idcp. sh 1>/dev/null 2>/dev/null root 26541 3432 0? 21:10:00 sh-c/home/patrol/IDCPCM/main/idcp. sh 1>/dev/null 2>/dev/null root 29340 29289 252? 24:36/usr/bin/sh/home/patrol/IDCPCM/main/idcp. sh root 22296 22236 251 18:00:00? 34: 44/usr/bin/sh/home/patrol/IDCPCM/main/idcp. sh root 10677 10653 252 21:00:00? 21:20:00/usr/bin/sh/home/patrol/IDCPCM/main/idcp. sh root 20122 3432 0? 07:00:00 sh-c/home/patrol/IDCPCM/main/idcp. sh 1>/dev/null 2>/dev/null root 25174 25034 250? 03:20:00/usr/bin/sh/home/patrol/IDCPCM/main/idcp. sh root 6585 6499 251? 13:55:39/usr/bin/sh/home/patrol/IDCPCM/main/idcp. sh root 10206 10177 0? 21:40:00/usr/bin/sh/home/patrol/IDCPCM/main/idcp. sh root 25419 25344 253? 17:00:01/usr/bin/sh/home/patrol/IDCPCM/main/idcp. sh root 27135 3432 0? 13:55:40 sh-c/home/patrol/IDCPCM/main/idcp. sh 1>/dev/null 2>/dev/null root 10347 944 0? 13:55:39/usr/bin/sh/home/patrol/IDCPCM/main/idcp. sh root 10167 10166 0? 18:30:00/usr/bin/sh/home/patrol/IDCPCM/main/idcp. sh root 25569 3432 0? 13:55:40 sh-c/home/patrol/IDCPCM/main/idcp. sh 1>/dev/null 2>/dev/null root 10256 2025 0? 20:50:00/usr/bin/sh/home/patrol/IDCPCM/main/idcp. sh root 29111 29082 252? 24:08/usr/bin/sh/home/patrol/IDCPCM/main/idcp. sh6: the process starts a scheduled task every 10 minutes. However, from the process above, the processes that started running yesterday have not been completed yet, unknown exception cause # BOMC30 2 ***/home/patrol/CM/up_bin/get_cm.sh 1>/dev/null 2> & 10, 10, 20, 30, 40, 50 */home/patrol/IDCPCM/main/idcp. sh 1>/dev/null 2>/dev/null 7. kill related abnormal processes. To prevent the problem from happening again, manually comment out the automatic running of the job, system recovered to normal # BOMC30 2 ***/home/patrol/CM/up_bin/get_cm.sh 1>/dev/null 2> & 1, 40, 50 */home/patro L/IDCPCM/main/idcp. sh 1>/dev/null 2>/dev/null www.2cto.com the monitoring script automatically runs normally, and it is difficult to identify the cause of the exception. Therefore, to prevent similar problems, the script program is modified. If the previous scheduling of the judgment statement is not completed, the script is forcibly exited and re-executed, and added the log record IDCPNUM = 'ps-ef | grep idcp. sh | grep-v grep | wc-l | awk '{print $1} ''if [$ IDCPNUM-ge 1]; thenps-ef | grep idcp. sh | awk '{system ("kill-9" $2)} 'sleep 10cd $ TMDRrm PM *. txtfi Summary: the slow response to the database may not be a database problem at all. To make a comprehensive analysis and judgment, this problem is actually caused by the busy host of the client accessing the database.

Related Article

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.