Oracle performance: I/O optimization (from a gold-medal DBA's story)


I/O Optimization
With massive data volumes, more than 80% of database performance problems are related to I/O, so I/O optimization is an important task that runs through the whole process of managing a massive database. I/O optimization touches a wide range of areas and is one of the major parts of Oracle database optimization; in a massive database environment it is the most important tuning task, because most large databases have I/O performance problems to some degree.
Because database I/O performance involves so many areas, I/O performance optimization is also the most complex work in Oracle database optimization. The basic steps of I/O performance optimization are:
Collect system data to locate the I/O performance problem;
Analyze the data and develop a solution;
Adjust the system to solve the problem;
Evaluate the optimization results.
To optimize the I/O performance of an Oracle database, you must follow the basic principles of database optimization. First, the purpose of database optimization is to improve the overall processing capability of the system and the response time of transactions; all optimization work should serve this purpose. There are two ways to improve database performance: reduce processing time, and reduce wait events. Reducing processing time requires that applications be written efficiently and sensibly; reducing wait time requires that system resources be allocated reasonably so that no bottleneck appears. The system resources used by a database include CPU, memory, storage, and network, and database optimization aims to allocate these resources rationally so that none of them becomes a bottleneck.
According to these principles, I/O performance optimization of an Oracle database cannot look at I/O in isolation: overall performance (including I/O performance) is improved by reorganizing system resources, so other resources must be taken into account when tuning I/O. If I/O capacity is increased unilaterally, a bottleneck on some other resource may still drag down overall system performance. In particular, on systems with limited resources, the key to optimization is to make full use of every resource without overloading any of them.
Determining whether the system's main problem really is an I/O problem, and then identifying the root cause of that I/O problem, is the key to solving Oracle I/O performance problems. Poor I/O performance can have many causes, so the first step of optimization is to determine the key performance bottleneck.

The problems affecting Oracle database I/O performance cover a wide range of topics. Based on the author's years of experience in Oracle database management, the following are the main problems affecting database I/O performance:
Storage performance bottlenecks: too few controllers, too little cache, unreasonable cache settings, and insufficient I/O channel capacity.
Disk performance bottlenecks: too few disks, or disks with low rotational speed.
An unreasonable RAID level is used.
When RAID is used, I/O hotspots exist and multiple hot files share the same disks.
Incorrect asynchronous I/O configuration.
Unreasonable database buffer cache settings and a low buffer hit ratio (a sample check appears after this list).
The PGA is set too small (for Oracle 9i, when automatic PGA management is used, the PGA target is too small), resulting in a large number of temporary tablespace operations.
Redo logs have performance bottlenecks.
The redo log buffer is set unreasonably.
Hotspot data.
Serious tablespace fragmentation.
Unreasonable storage parameters for tables and indexes.
Serious row migration.
SQL statements that scan large tables.
Poor execution plans chosen for SQL statements.
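
As a quick illustration of how one of the items above can be checked from inside the database, a rough buffer cache hit ratio can be computed from v$sysstat. This query is not part of the original article; it is a minimal sketch using the classic hit-ratio formula, and the values are cumulative since instance startup, so interpret them with care:

-- Rough buffer cache hit ratio; a persistently low value suggests an
-- undersized or badly used buffer cache (one of the problems listed above).
SELECT ROUND(1 - phy.value / GREATEST(cur.value + con.value, 1), 4) AS buffer_hit_ratio
  FROM v$sysstat phy, v$sysstat cur, v$sysstat con
 WHERE phy.name = 'physical reads'
   AND cur.name = 'db block gets'
   AND con.name = 'consistent gets';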

When the system runs into an I/O problem, the biggest challenge for the database administrator is to find the root cause of the problem as quickly as possible. I/O problems can be very complicated, and adjustments made before the root cause has been found often fail to reach the final optimization goal. Note that I/O problems are often related to heavy SQL statements: if a system suddenly develops an I/O performance problem, the first step is to rule out causes other than the I/O subsystem itself, such as a sudden burst of heavy SQL (one way to check for that is sketched below).
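
Since heavy SQL is such a common trigger, one way to look for it is to list the statements with the most physical reads from v$sqlarea. This query is not from the original text; it is a sketch using the standard dynamic performance view, with figures cumulative since instance startup:

-- Top 10 SQL statements by physical disk reads.
SELECT *
  FROM (SELECT sql_text,
               disk_reads,
               executions,
               ROUND(disk_reads / GREATEST(executions, 1)) AS reads_per_exec
          FROM v$sqlarea
         ORDER BY disk_reads DESC)
 WHERE ROWNUM <= 10;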
Determining whether an I/O performance bottleneck exists, and locating the device on which it occurs, is the first step toward solving an I/O performance problem. Operating system monitoring tools can be used to watch I/O in real time. The first method is the vmstat tool: check the value of column b. If it is relatively high, many processes are waiting for I/O and there may be an I/O problem:

$ vmstat 1 10
procs           memory
 r  b  w       avm    free
2 12 0 14572705 92752
2 12 0 14572705 93548
2 12 0 14572705 92910
2 12 0 14572705 93467
2 12 0 14572705 93546
2 12 0 14572705 93864
2 12 0 14572705 94557
2 12 0 14572705 93952
2 12 0 14572705 94017
2 12 0 14572705 93047

If the command above shows that b is relatively large, there may be I/O waits, and the sar or iostat command can be used for further confirmation. If sar monitoring shows that %wio is relatively high (for example above 40; this is an empirical value that can be adjusted for different systems), then an I/O performance bottleneck can be confirmed, as shown below:

$ sar 1 10
HP-UX cuyn16 B.11.11 U 9000/800
15:01:44    %usr    %sys    %wio   %idle
15:01:45 16 3 57 24
15:01:45 15 2 59 19
15:01:45 21 3 57 16
15:01:45 20 2 63 14
15:01:45 17 2 67 12
15:01:45 12 1 75 7
15:01:45 16 2 75 5
15:01:45 10 1 84 1
15:01:45 18 2 79 6

If the %wio value stays above 40, I/O waits are serious. You can then use the sar -d command to monitor which I/O devices have performance problems. If a device's %busy exceeds 90%, the device is saturated; if a device's avserv is large, or much higher than that of the other devices, that device has a performance problem. For example:

15:02:01    device     %busy   avque   r+w/s   blks/s   avwait   avserv
Average c0t0d0 2.00 0.50 6 27 3.62 6.03
Average c3t8d0 1.10 0.50 4 16 3.23 5.69
Average c55t0d5 99.40 0.50 18 73 5.41 54.50
Average c55t1d0 4.20 0.50 5 15 5.39 8.49
Average c55t1d1 79.52 0.76 24 810 9.09
Average c55t11d0 68.33 0.52 23 2909 5.60
Average c55t11d2 31.07 1.14 25 1630 10.95
Average c55t11d3 16.98 0.51 22 3075 5.24 13.39
Average c55t11d5 71.83 2.59 26 1643 42.18
Average c55t11d6 76.12 0.50 23 3012 5.58
Average c55t12d0 30.57 1.02 26 1637 10.86
Average c55t12d1 21.48 0.50 20 2826 5.12
Average c55t12d3 80.72 2.74 29 1880 42.78 84.38
Average c55t12d4 70.03 0.52 23 2887 5.83
Average c55t14d1 100.00 54.57 104 6630 1315.98
Average c55t13d1 77.72 0.55 19 297 5.79 80.19

From the above data we can see that some devices (such as c55t14d1) have very long waits, with avwait + avserv adding up to a large value, which indicates that the device has a performance bottleneck. The avwait + avserv of most of the other disks is still normal, which indicates that there is no across-the-board performance bottleneck in the storage system; the bottleneck is mainly concentrated on the hotspot disk group.
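
The same per-device check can also be done with iostat, which the text mentions as an alternative to sar. The flags below assume a Linux sysstat-style iostat rather than the HP-UX version shown above:

$ iostat -x 1 10
# On Linux, watch %util (a device close to 100% is saturated) and await
# (average I/O wait time in ms); on HP-UX, plain "iostat 1 10" reports
# bps/sps/msps per disk instead.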

Oracle's statspack tool can also be used to check for system I/O problems. If system performance is poor, and the db file sequential read wait event is among the top wait events and accounts for a high proportion of the total wait time, the system may have an I/O performance problem. The file I/O statistics can then be checked to further identify the devices with performance problems: look at the average read response time of each file. If a file's average read time exceeds 20 ms, the file system it belongs to may have a performance problem. Note that 20 ms is a relative figure and may differ in different application environments; by comparing the database figures with operating-system I/O monitoring over time, a database administrator should be able to quickly judge what a normal average read response time looks like for the system he or she manages.
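
For reference, a statspack report of the kind discussed here is typically produced by taking two snapshots around the problem period and then running the report script. A minimal sketch, assuming statspack is already installed under the PERFSTAT schema:

-- Take a snapshot before and after the period to be analyzed, then generate
-- the report; spreport.sql prompts for the two snapshot IDs and a file name.
SQL> EXECUTE statspack.snap;
SQL> -- ... wait while the workload runs ...
SQL> EXECUTE statspack.snap;
SQL> @?/rdbms/admin/spreport.sql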

Pay attention to a few issues when checking the average read response time. The first is the amount of I/O within the report interval: if a file saw little I/O during the interval, its average response time may not be representative, and other files stored on the same device can be checked for confirmation. In another case, if the average amount of data per read is large (many blocks are read each time), a somewhat higher average read response time is also normal. The following data is taken from the statspack report of a database with I/O problems:
Top 5 Timed Events
--------------------------------------------------------------------------
Event                                     Waits    Time (s)  % Total Ela Time
--------------------------------------------------------------------------
db file sequential read                 661,802      45,404             60.79
SQL*Net more data from dblink         3,180,371       7,894             10.57
CPU time                                              5,077              6.80
db file scattered read                   56,959       3,846              5.15
buffer busy waits                        42,376       2,541              3.40

It can be seen that db file sequential read is the top wait event, accounting for more than 60% of the total wait time, with an average of about 69 ms per wait (45,404 s over 661,802 waits). This is a typical example of an I/O performance bottleneck. The file I/O section of the statspack report can then be used to check which files have the problem:
Tablespace / Filename                            Reads  Reads/s  Av Rd(ms)  Blks/Rd   Writes
Index_space_other /dev/vg10xp/rls_2g_vol05       9,171        2       52.2      1.0    7,911
                  /dev/vg10xp/rls_2g_vol106      8,016        2       22.8      1.0    8,292
                  /dev/vg10xp/rls_2g_vol107      7,567        2        9.8      1.0    8,058
                  /dev/vg10xp/rls_2g_vol108      5,456        1       46.7      1.0    6,180
                  /dev/vg10xp/rls_2g_vol109      5,925        2       57.3      1.0    6,265
                  /dev/vg10xp/rls_2g_vol110     10,462        3      147.7      1.0    6,867
                  /dev/vg10xp/rls_2g_vol111      6,071        2      130.0      1.0    5,638
                  /dev/vg10xp/rls_2g_vol112     15,624        4      226.9      1.0   15,736
                  /dev/vg10xp/rls_2g_vol112     14,421        4      206.2      1.0   17,136
                  /dev/vg10xp/rls_2g_vol112     13,677        4      229.9      1.0   11,828
...

It can be seen that the average read time of some files on /dev/vg10xp exceeds 200 ms, which causes serious performance problems. Verification showed that the /dev/vg10xp logical volumes reside on c55t14d1.
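
The same per-file average read time can also be pulled directly from the dynamic performance views rather than from a statspack report. A rough sketch, not from the original article (readtim is reported in centiseconds and requires timed_statistics to be enabled):

-- Average read time per datafile; readtim is in centiseconds, so *10 gives ms.
SELECT d.name,
       f.phyrds,
       ROUND(f.readtim * 10 / GREATEST(f.phyrds, 1), 1) AS avg_read_ms
  FROM v$filestat f, v$datafile d
 WHERE f.file# = d.file#
 ORDER BY avg_read_ms DESC;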

Analysis of I/O problems in statspack reports

I/O performance analysis is an important part of statspack analysis, and it can be approached from two angles. The first is the database wait events, such as the following data:

                                                                  Avg
                                        Total Wait      wait    Waits
Event                           Waits     Timeouts  Time (s)     (ms)    /txn
db file sequential read    13,409,809            0   424,347       29    93.6
SQL*Net more data to client 1,187,633            0     8,024        7     7.7
buffer busy waits             212,482            0     5,888       28     1.4
db file scattered read        143,778            0     3,154       22     0.9
From the above, db file sequential read (29 ms) and db file scattered read (22 ms) have high average wait times, indicating that I/O has an obvious performance problem. The normal range of these values differs from system to system, and baseline data can be built up through long-term accumulation; as a rule of thumb, however, the figures deserve attention above 10 ms, and the I/O subsystem is probably in trouble above 20 ms. In this example the system's I/O performance clearly has problems. To confirm this, you can further check the file I/O details:

File IO Stats  DB: hscp  Instance: hcsp  Snaps: 21-33
-> ordered by Tablespace, File

Tablespace               Filename
------------------------ ---------------------------------------------------
                        Av       Av     Av                      Buffer Av Buf
         Reads  Reads/s Rd(ms) Blks/Rd      Writes Writes/s      Waits Wt(ms)
-------------- -------- ------ ------- ----------- -------- ---------- ------
Haier_test_data/dev/vgrac/rhaier_test_data02
132 0 163.6 2.5 65 0 0
Hcsp_ai_data/dev/vgrac/rhcsp_ai_data_01
1,363 0 19.1 1.0 1,349 0 0
Hcsp_ai_index/dev/vgrac/rhcsp_ai_index_01
4,649 1 17.5 1.0 3,614 0 2 0.0
Hcsp_base_data/dev/vgrac/rhcsp_base_data01
329,857 38 14.3 2.8 77,415 33,510 11.2
Hcsp_base_indx/dev/vgrac/rhscp_base_index01
72,577 8 14.7 1.0 419 0 111 3.2
Hcsp_comm_data/dev/vgrac/rhcsp_comm_data01
7,789 1 112.1 2.7 692 0 3,884

From the above data we can see that the average read response time of most of the files is close to or above 20 ms, so we can basically conclude that there is a problem with the system I/O.
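
Outside of a statspack report, the same average wait figures can be sanity-checked at any time from v$system_event. A small sketch, assuming timed_statistics is enabled (time_waited is in centiseconds):

-- Average wait per I/O-related event since instance startup, in ms.
SELECT event,
       total_waits,
       ROUND(time_waited * 10 / GREATEST(total_waits, 1), 1) AS avg_wait_ms
  FROM v$system_event
 WHERE event IN ('db file sequential read', 'db file scattered read')
 ORDER BY avg_wait_ms DESC;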
