Cause Analysis of slow database operation

Last Update:2018-12-05 Source: Internet

Author: User

Tags database issues high cpu usage

Developer on Alibaba Coud: Build your first app with APIs, SDKs, and tutorials on the Alibaba Cloud. Read more ＞

There are three common cases of slow Database
1. Slow down
2. Sudden slowdown
3. Slow down from time to time
In the first case, a long-term monitoring mechanism should be established to "gradually slow down. For example, when writing a shell script every day (usually 9 ~ 10 etc.) Periodically collects OS, network, and DB information. Reports are generated every week to analyze the collected information. The accumulation of such data can determine the optimization decisions in the future, and it can be the important data that the DBA persuaded the manager to adopt their own decisions. The value of DBA is embodied in the weekly report.
In the second case, "sudden slowdown" is also the easiest solution. From the business point of view, it is different from the previous use of dB, and then further judgment is made. Hardware/network faults usually cause a sudden drop in dB performance.
Step 1: Check the database/OS/Network System Log to eliminate Hardware/network problems.
Step 2: Check the wait events of the database and determine the possible problems based on the wait events. If there are no waiting events, you can eliminate database issues. If there is a waiting time, you can find the root cause of these events based on different waiting events.
For example, latch free and other wait events related to SQL parse, OS performance is high CPU usage
DB file scattered read and other wait times related to SQL disk read. The OS performance is that iostat can see an increase in disk read/write volumes.
Step 3: View OS information, such as CPU/IO/memory.
A. CPU usage
The CPU usage is not inversely proportional to the database performance. high CPU usage does not indicate slow database performance. generally, a database with a good optimization and a large business volume has a high CPU usage and is evenly distributed to each process. in turn, the CPU usage is high, which does not mean that the database performance is good. It is necessary to judge whether the CPU usage is reasonable based on the database wait events.
If the CPU usage of a process is high, it must be a problem with the process. if it is not an oracle process, the application can check whether the program has vulnerabilities such as endless loops. if it is an oracle process, you can look up the Oracle Data dictionary by using the PID to see the Initiating program of the process, the SQL statements being executed, and the waiting event. then, different solutions are used in different situations.
B. Io
Eliminate the hardware I/O problem and the database suddenly slows down. Generally, it is caused by one or more SQL statements.
If I/O is frequent, You can optimize the top SQL of disk reads. Of course, this is also the most stupid and most effective way to solve Io problems.
OS and storage configurations are also an important reason for Io impact.
For example, the most common issue of asynchronous IO in HP-Unix is that Oracle does not use AIO if DBA group has no mlock permission. however, if the admin of the operating system and the database are not well matched, this configuration is easily missed.
C. Memory
In the second case, the relationship with memory is relatively small. As long as the SGA configuration is reasonable and does not change, as long as it is not the application memory leak, it will not suddenly slow down.
In the third case, "slow down from time to time" is the most difficult solution. there are also various causes of problems on site. The most important thing is that when there is a slow phenomenon, the most information is captured as quickly as possible for analysis. write the shell script for data capturing, and press the Enter key in time when the phenomenon occurs.
Example
Database suddenly slows down
Background: after a new application is launched, the database suddenly slows down.
Step 1: investigate new applications
According to developers, new applications access newly created tables with a small amount of data and no complex SQL queries.
Query v $ sqlarea in the order of disk_reads/buffer_gets/executions. The top SQL statements do not contain the SQL statements of new applications. This eliminates the performance issues caused by access to new application databases.
Step 2: view the database log/OS log
Database Log can see a lot of ORA-7445 errors, as well as a lot of dump files. analyze the dump file (for a long time, you can refer to the dump file, but the details cannot be described .), it is found that the dump file generated when the new application accesses the remote DB through dblink. The application developer says it cannot be modified, and Oracle does not have the corresponding patch solution.
No error message in OS log
Step 3: Check the statspack report
From wait events, the top event is a buffer busy waits, And the DB file parallel write is equal to the IO-related wait event.
According to the statistical information of buffer busy waits, It is waiting for data block.
Some physical reads and other information are less abnormal than before.
I/O reads/writes of tablespace is not abnormal, but wait is obviously increased.
I/O problems are initially identified.
Step 4: View OS Information
1. Top Command (output as lab data, for format reference only)
Load averages: 0.05, 0.10, 0.09 10:18:32
307 processes: 304 sleeping, 1 zombie, 1 stopped, 1 on CPU
CPU states: 96.0% idle, 0.3% user, 2.6% kernel, 1.1% iowait, 0.0% swap
Memory: 4096 M real, 2660 m free, 1396 M swap in use, 3013 M swap free
PID username thr pri nice size res state time CPU command
11928 a21562 1 0 0 3008 K 2496 k cpu/1 0: 02 1.12% top
14965 mpgj76 4 59 0 10 m 3696 K sleep 0.18% view_server
At that time, the field data showed that the iowait value was much larger than before, and there were no abnormal processes.
2. Sar-D (output as laboratory data, for format reference only)
SunOS sc19 5.7 Generic_106541-42 sun4u 03/20/08
00:00:00 device % busy avque R + w/s blks/s avwait avserv
Sd410 17 0.4 50 1628 0.1
Sd410, a 0 0.0 0 0 0.0 0.0
Sd410, B 0 0.0 0 0 0.0 0.0
Sd410, C 0 0.0 0 0 0.0 0.0
Sd410, G 17 0.4 50 1628 0.1
At that time, the field data showed that avwait, avque, and blks/s devices storing data files were too large.
Step 5: view database wait events
If the performance of a database with a large business volume is poor, there will usually be a large number of wait events, and hundreds of wait events are very common. I usually perform group based on the event.
Select count (*), event from V $ session_wait where event not in ('smon timer', 'pmon timer', 'rdbms IPC message ', 'SQL * Net message from client') group by event order by 1 DESC;
The most wait event displayed in the output result is buffer busy waits.
Further analysis to find out the reason for waiting
Select count (*), P1, P2, P3 from V $ session_wait where event = 'buffer busy waits 'group by P1, P2, P3;
In the buffer busy waits wait event
P1 = file #
P2 = block #
P3 = ID (this ID corresponds to the reason for waiting)
According to P1, P2, and P3 group, buffer busy Waits is used to identify the objects in which the wait is concentrated.
Metalink describes the buffer busy waits wait event as follows:
"If P3 shows that the" buffer busy wait "is waiting for a block read to complete then the blocking session is likely to be waiting on an IO wait (eg: "DB file sequential read" or "DB file scattered read" for the same file # And block #."
The output results show that the waiting time is distributed across multiple different objects. The reason for the waiting is "waiting for a block read to complete", and the problem of Io is further analyzed.
If buffer busy Waits is waiting to be concentrated on an object, it indicates that there is a hot block. You can rebuild the object to add freelist, and add freelist group to the RAC environment.
You can use the following SQL statement to find the specific object.
Select owner, segment_name, segment_type from dba_extents where file_id = P1 and P2 between block_id and block_id + blocks;
P1 and P2 are the specific values detected by V $ session_wait.
Step 6: identify the cause and find the solution steps
Analysis:
1. The disk Io traffic increased by 2. The disk I/O is waiting to increase by 3. I/O traffic of DB does not increase by 4. I/O wait for an increase in dB
I/O access disks other than databases can be launched from 1, 2, 3, 4. Check disk configuration. The VG only stores database data files and database system files. Exclude data files and generate I/O database system files. Generally, database system files do not generate I/O, and only log and dump files are available for I/O reads and writes.
Conclusion: The ora-7445 generates a large number of core dump files to block Io
Solution:
1, eliminate the ora-7445. (application does not change the situation, can not solve)
2. Point the dump directory to another VG.
3. Allow Oracle to write as few core dump files as possible
Background_core_dump = partial
Shadow_core_dump = partial

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

Get Started for Free

Sales Support

1 on 1 presale consultation

Chat Contact Sales
After-Sales Support

24/7 Technical Support 6 Free Tickets per Quarter Faster Response

Open a Ticket
Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.

Learn More