Optimization case: DB performance problems caused by lack of overall planning

Last Update:2018-06-05 Source: Internet

Author: User

Developer on Alibaba Coud: Build your first app with APIs, SDKs, and tutorials on the Alibaba Cloud. Read more ＞

In the past few days, a customer's core database has been optimized. After the SQL statements with high resource consumption are optimized, the total number of physical reads and logical reads is reduced. After the customer feedback is optimized, the performance is improved, but there are still performance problems during business peak hours in some workdays.
We compared the performance data of poor service peak periods (that is, the problem period) with the normal service peak periods (that is, the baseline period), and found some problems:

The baseline Time period is from AM to AM on January 15, 626.2. The TPS (transaction volume per second) is 46 T/s, and the total DB Time for this period is (mins)

The problem period is from AM to AM on February 20,. the TPS is: 47 T/s (only 1 T/s more than the baseline period ), the total DB Time for this period is 2361.4 (mins)

It is also a one-hour sampling period. The total DB Time of the problem segment is nearly four times that of the baseline. By comparing the performance view of the two, we find that the single IO latency in the problem period is very high, as follows:
Event Waits Time (s) Avg wait (MS) % DB time Wait Class
Database CPU 2,082 55.42
Db file sequential read 62,140 774 12 20.61 User I/O
Direct path read 177,440 575 3 15.31 User I/O
Log file sync 17,486 145 8 3.86 Commit
Gc cr block 2-way 98,519 30 0 0.80 Cluster

In the baseline period, the read latency of a single sequence is 12 ms, the direct read latency is 3 ms, and the write latency of a single redolog is 8 ms,

Event Waits Time (s) Avg wait (MS) % DB time Wait Class
Direct path read 180,200 4,643 26 32.77 User I/O
Db file sequential read 55,483 2,286 41 16.13 User I/O
Database CPU 1,917 13.53
Gc buffer busy acquire 5,513 1,474 267 Cluster
Log file sync 17,541 1,298 74 9.16 Commit

In the problem period, the read latency of a single sequence is 41 ms, the direct read latency is 26 ms, And the redolog write latency is 74 ms.
(The recommended normal delay for a single IO in the Oracle document should be 0-20 ms; otherwise, the hardware needs to be upgraded ),
That is, when the business volume remains unchanged compared with the baseline period, the IO efficiency decreases significantly during the problem period, it is suspected that other business systems in the same RAID group on the storage layer may have a large number of IO operations during the problem period,
This results in a large IO latency for the system we are optimizing. Confirm with the customer's storage staff that this is indeed the case, the storage staff did not make a reasonable plan for the storage in combination with the database, just from the capacity management for their work convenience, divided and allocated LUN. This leads to performance problems. I think this problem exists in many enterprises. Poor cross-Department communication leads to no overall planning. In the end, DB will pay for the problem.

Therefore, we recommend that you improve your storage:
1. Isolate such key systems from other systems at the storage level to avoid affecting IO;
2. Upgrade storage when budget is available.

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

Get Started for Free

Sales Support

1 on 1 presale consultation

Chat Contact Sales
After-Sales Support

24/7 Technical Support 6 Free Tickets per Quarter Faster Response

Open a Ticket
Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.

Learn More

Optimization case: DB performance problems caused by lack of overall planning

Contact Us

What's Trending

Top 10 Tags

Top 10 Keywords

A Free Trial That Lets You Build Big!

Sales Support

After-Sales Support

Optimization case: DB performance problems caused by lack of overall planning

Contact Us

What's Trending

Top 10 Tags

Top 10 Keywords

Trending Topic

A Free Trial That Lets You Build Big!

Sales Support

After-Sales Support