How to Use AWR reports to diagnose database performance problems

Last Update:2014-10-11 Source: Internet

Author: User

Tags high cpu usage

Developer on Alibaba Coud: Build your first app with APIs, SDKs, and tutorials on the Alibaba Cloud. Read more ＞

How to Use AWR reports to diagnose database performance problems

How to proactively avoid problems and collect diagnostic information

Some problems are unpredictable, but most other problems can be avoided if some signs are discovered early. At the same time, if the problem does occur, it is very important to collect information when the problem occurs. Collect information about how to proactively avoid problems and diagnose problems.
The AWR report is a useful diagnostic tool for the overall performance of the database.
In general, when performance problems are detected, we will collect AWR reports covering the time period in which the problem occurred-but it is best to collect AWR reports covering only one hour-if the time is too long, the AWR report does not reflect the problem. An AWR report for a period of time without performance issues should also be collected as a reference to compare AWR reports for a period of time in question. The two AWR reports have the same time period. For example, they are both half an hour or one hour.

Generate and view Oracle AWR reports

Install Oracle 11gR2 (x64) in CentOS 6.4)

Steps for installing Oracle 11gR2 in vmwarevm

Install Oracle 11g XE R2 In Debian

Steps for generating an Oracle AWR report

Interpretation

When dealing with performance problems, we are most concerned about what the database is waiting.
When a process cannot be operated for some reason, it needs to wait. The most time-consuming wait event is what we need to pay attention to most because we can get the greatest benefit if we reduce it.
The "Top 5 Timed Events" section in the AWR report provides this information, allowing us to focus only on the main issues.
•
Top 5 Timed Events
As mentioned above, "Top 5 Timed Events" is the most important part in the AWR report. It indicates the most time-consuming wait event for database sessions, as shown below:

Top 5 Timed Events Avg % Total

~~~~~~~~~~~~~~~~~~ Wait Call

Event Waits Time (s) (MS) Time Wait Class

---------------------------------------------------------------------------

Db file scattered read 10,152,564 81,327 8 29.6 User I/O

Db file sequential read 10,327,231 75,878 7 27.6 User I/O

CPU time 56,207 20.5

Read by other session 4,397,330 33,455 8 12.2 User I/O

PX Deq Credit: send blkd 31,398 26,576 846 9.7 Other

-------------------------------------------------------------

The Top 5 Events section contains some Events related information. It records the total number of waits encountered during this period, the total waiting time, and the average waiting time; this part is sorted by the percentage of each Event in the total call time.

Based on the differences in the Top 5 Events section, next we need to check other sections of the AWR report to verify the detected problems or perform quantitative analysis. The waiting event needs to be evaluated based on the duration of the reporting period and the number of concurrent users in the data base at that time. For example, 10 million wait events within 10 minutes are more problematic than 10 million in 10 hours; the 10 million waiting events caused by 10 users are more problematic than the same waiting events caused by 10,000 users.

As in the above example, nearly 60% of the time is waiting for IO-related events.

• The "db file scattered read" event generally indicates that multiple reads are being performed due to full table scan or index fast full scan.
• The "db file sequential read" event is generally caused by a single read (such as a read index) operation that cannot be performed on multiple reads)

Other 20% of the time is spent on using or waiting for CPU time. High CPU usage is often caused by poor performance SQL statements (or these SQL statements may perform the same operation with fewer resources). For such SQL statements, too many IO operations are also a symptom. We will discuss the CPU usage later.

Based on the above, we will investigate whether there is a problem with this wait event. If there is a problem, solve it; if it is normal, check the next wait event.

There are usually two main reasons for excessive IO-related WAITS:

• The database has performed too many read operations.
• Each IO read operation is slow.
The information displayed in the Top 5 Events section will help us check:

• Whether the database has performed a large number of read operations:
The figure above shows that during this period, both types of read operations are greater than 10 million. Whether these operations are too many depends on the reported time being 1 hour or 1 minute. We can check the elapsed time in the AWR report. If there are too many read operations, we need to check the SQL Statistics information in the AWR report, because the read operations are initiated by SQL statements.
• Whether each IO read operation is slow:
The figure above shows that the average wait time of the two types of read operations is less than 8 ms during this period.
Whether 8 ms is fast or slow depends on the underlying hardware device. Generally, less than 20 ms can be considered acceptable.

We can also get more detailed information in the "Tablespace IO Stats" section of the AWR report.

Tablespace IO Stats DB/Inst: VMWREP/VMWREP Snaps: 1-15

-> Ordered by IOs (Reads + Writes) desc

Tablespace

------------------------------

Av Buffer Av Buf

Reads/s Rd (MS) Blks/Rd Writes/s Waits Wt (MS)

----------------------------------------------------------------------

TS_TX_DATA

14,246,367 283 7.6 4.6 145,263,880 2,883 3,844,161

USER

204,834 4 10.7 1.0 17,849,021 354 15,249

UNDOTS1

19,725 0 3.0 1.0 10,064,086 200 1,964

AE _TS

4,287,567 85 5.4 6.7 932 0 465,793

TEMP

2,022,883 40 0.0 5.8 878,049 17 0 0.0

UNDOTS3

1,310,493 26 4.6 1.0 941,675 19 43 0.0

TS_TX_IDX

1,884,478 37 7.3 1.0 23,695 0 73,703 8.3

> SYSAUX

346,094 7 5.6 3.9 112,744 2 0 0.0

SYSTEM

101,771 2 7.9 3.5 25,098 0 653 2.7

For example, we care about Av Rd (MS) metrics. If it is higher than 20 ms and there are many read operations at the same time, we may have to start investigating the potential IO problems from the OS perspective.

For more details, please continue to read the highlights on the next page:

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

Get Started for Free

Sales Support

1 on 1 presale consultation

Chat Contact Sales
After-Sales Support

24/7 Technical Support 6 Free Tickets per Quarter Faster Response

Open a Ticket
Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.

Learn More