Optimization case: DB performance problems caused by lack of overall planning

Source: Internet
Author: User
In the past few days, a customer's core database has been optimized. After the SQL statements with high resource consumption are optimized, the total number of physical reads and logical reads is reduced. After the customer feedback is optimized, the performance is improved, but there are still performance problems during business peak hours in some workdays. We use the combination of poor-performance business peak hours (I .e. problematic periods) and business with normal performance

In the past few days, a customer's core database has been optimized. After the SQL statements with high resource consumption are optimized, the total number of physical reads and logical reads is reduced. After the customer feedback is optimized, the performance is improved, but there are still performance problems during business peak hours in some workdays. We use the combination of poor-performance business peak hours (I .e. problematic periods) and business with normal performance

In the past few days, a customer's core database has been optimized. After the SQL statements with high resource consumption are optimized, the total number of physical reads and logical reads is reduced. After the customer feedback is optimized, the performance is improved, but there are still performance problems during business peak hours in some workdays.
We compared the performance data of poor service peak periods (that is, the problem period) with the normal service peak periods (that is, the baseline period), and found some problems:

The baseline Time period is from AM to AM on January 15, 626.2. The TPS (transaction volume per second) is 46 T/s, and the total DB Time for this period is (mins)

The problem period is from AM to AM on February 20,. the TPS is: 47 T/s (only 1 T/s more than the baseline period ), the total DB Time for this period is 2361.4 (mins)

It is also a one-hour sampling period. The total DB Time of the problem segment is nearly four times that of the baseline. By comparing the performance view of the two, we find that the single IO latency in the problem period is very high, as follows:
Event Waits Time (s) Avg wait (MS) % DB time Wait Class
Database CPU 2,082 55.42
Db file sequential read 62,140 774 12 20.61 User I/O
Direct path read 177,440 575 3 15.31 User I/O
Log file sync 17,486 145 8 3.86 Commit
Gc cr block 2-way 98,519 30 0 0.80 Cluster

In the baseline period, the read latency of a single sequence is 12 ms, the direct read latency is 3 ms, and the write latency of a single redolog is 8 ms,

Event Waits Time (s) Avg wait (MS) % DB time Wait Class
Direct path read 180,200 4,643 26 32.77 User I/O
Db file sequential read 55,483 2,286 41 16.13 User I/O
Database CPU 1,917 13.53
Gc buffer busy acquire 5,513 1,474 267 Cluster
Log file sync 17,541 1,298 74 9.16 Commit

In the problem period, the read latency of a single sequence is 41 ms, the direct read latency is 26 ms, And the redolog write latency is 74 ms.
(The recommended normal delay for a single IO in the Oracle document should be 0-20 ms; otherwise, the hardware needs to be upgraded ),
That is, when the business volume remains unchanged compared with the baseline period, the IO efficiency decreases significantly during the problem period, it is suspected that other business systems in the same RAID group on the storage layer may have a large number of IO operations during the problem period,
This results in a large IO latency for the system we are optimizing. Confirm with the customer's storage staff that this is indeed the case, the storage staff did not make a reasonable plan for the storage in combination with the database, just from the capacity management for their work convenience, divided and allocated LUN. This leads to performance problems. I think this problem exists in many enterprises. Poor cross-Department communication leads to no overall planning. In the end, DB will pay for the problem.

Therefore, we recommend that you improve your storage:
1. Isolate such key systems from other systems at the storage level to avoid affecting IO;
2. Upgrade storage when budget is available.

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.