Oracle Database Performance Model

Source: Internet
Author: User

I have been thinking about the following question recently:How to Create a performance model for a database? As a DBA, we are faced with a huge challenge: how to ensure that the database performance can meet the needs of rapidly changing applications, and how to continuously increase the data volume and access volume, ensure that the application response time and database load are at a reasonable level. We may often face the following problems: What is the response time for an SQL statement to be executed 100 times per second? What is the impact on the database after an application is released? Therefore, evaluating the impact of applications on databases, optimizing applications and predicting risks, and ensuring Database Availability and stability are truly valuable for application DBAs.

The response time is centered on:

If you want to select a performance indicator to evaluate the advantages and disadvantages of the system, there is no doubt it should beResponse time. Response time is the first element of customer experience. All optimization efforts should be made to reduce the response time. The same is true for database systems. The ultimate goal of optimizing the system and SQL is to reduce the response time and process more requests per unit of time.

Database time model:

The response time is generally divided into Service time and Wait time. The Service time refers to the CPU usage time of the process, including the front-end process) and Backgroud process. Generally, we only focus on the CPU time occupied by the foreground process. There are many types of wait time. The most common ones are IO wait and concurrent wait. IO wait includes sequential read, scattered read, and log file sync. the concurrent wait mainly includes latch and enqueue. SQL execute elapsed time refers to the response time of SQL Execution by the user process, including CPU time and wait time.

The following is the time model of the Oracle database:

In the Oracle system, we can use the AWR or Statspack report to view the database time information:

Statistic Name Time (s) % Of DB Time
SQL execute elapsed time 3,062.17 91.52
DB CPU 2,842.08 84.95
Parse time elapsed 25.87 0.77
PL/SQL execution elapsed time 11.75 0.35
Sequence load elapsed time 7.55 0.23
Hard parse elapsed time 5.06 0.15
Connection management call elapsed time 3.13 0.09
Hard parse (sharing criteria) elapsed time 0.04 0.00
Repeated bind elapsed time 0.01 0.00
PL/SQL compilation elapsed time 0.00 0.00
DB time 3,345.74  
Background elapsed time 204.91  
Background cpu time 72.30  

DB time is the total time consumed by the entire database user process. It is the sum of the time from the first item to the tenth item, from SQL execute elapsed time to PL/SQL compilation elapsed time ), however, we will find that the sum of these ten times is greater than DB Time, because some Time information overlaps, for example, SQL execute elapsed time includes a large part of the DB cpu time. The background elapsed time and background cpu time are the time and cpu time consumed by the Oracle background process.

Database Response Time Analysis:

The response time of the database system is determined by four factors: CPU, IO, memory, and network.,CPU and I/O are the most important factors.In comparison, memory and network are much simpler, because the latency of memory access is usually less than 100 ns for a tuning System, 1 MS = 1000000 ns) is almost negligible compared to CPU and IO. Network latency is usually a constant. For example, in a data center, the network latency is generally less than 3 ms. If multiple data centers exist, network latency may exceed 20 ms. Therefore, for a distributed system, network latency must be considered.

Here, we do not consider the distributed system, and ignore the memory access latency, focus on the analysis of CPU and IO. Let's look at the AWR segment of the following database:

In this system, the db cpu accounts for 87.21% Of the total DB time, the User I/O accounts for 9.12% Of the total DB time, And the commit-related IO waits accounts for 2.35%, mainly log file sync ), CPU and IO occupy 96.68% Of the total DB time. Because db cpu accounts for a large proportion, this database system is of the CPU intensive type. Here, the db cpu is mainly used to execute SQL service time.

Let's look at the AWR segment of another database:

We can see that Commit and User I/O account for 81.46% of DB time, while DB CPU accounts for only 13.82%. Therefore, this database system is of the IO instensive type.

Physical read

Physical read refers to the process in which Oracle does not find the corresponding block in the buffer cache and needs to read the corresponding block from the IO subsystem. The corresponding IO is called Physical IO, the number of physical reads indicates the number of blocks read by physical IO. Generally, the IO subsystem is a slow disk, so the physical IO has a great impact on the overall response time. If a large number of physical IO occurs, the response time of the entire system will become very poor. The system's IO subsystem may be a file system, bare device, or ASM, and the underlying hardware may be SAN storage, NAS storage, or common SAS disks. In order to increase the response time, the cache layer is usually added between the physical disk and Oracle. For Oracle, physical IO is not necessarily a real access to the disk, but may be a file system cache, storage cache and so on.

Regardless of the IO subsystem, Oracle only cares about the physical IO response time. Through the AWR report, we can see the physical IO Response Time:

The average response time of db file sequential read is 3 ms, while that of db file scattered read is 4 ms, the average response time of logfile file sync is 3 ms. The Wait class of the first two is User I/O, which indicates the response time of the read operation of the User process. The wait class of logfile sync is Commit, the response time of the lgwr process to write redo. Because the user commit must complete the log file sync operation, it will directly affect the response time of the user process write operation.

We have an experience with the response time of physical IO. For Sequential read and Scattered read10 msThe response is normal, and the response latency of IO subsystem is too large if the value is greater than 10 ms. Therefore, when we measure the performance of the storage system, only IO with a response time of less than 10 ms is considered effective. There is an interesting phenomenon here, that is, the response time of sequential read and scattered read is almost the same, that is, the random IO reads 8 K data and the continuous IO reads K data, the response time difference is very small, this is caused by the mechanical characteristics of the disk. The delay time = seek time +

For the response time of log file sync, because the user commit must complete log file sync, the write operation response time of the entire system depends on the response time, in addition, from the perspective of the entire database system, log file sync is almost serial, so this response time has a great impact on write operations. Our experience must be ensured in5 msBelow, if the write operation of the entire system exceeds 5 ms, it will be seriously affected.

Logical read

Logical read is the process in which Oracle reads blocks from the buffer cache. The corresponding IO is called Logical IO. The number of Logical reads represents the number of blocks read by Logical IO..Because Oracle must first read the block into the buffer cache (except direct path read), the number of logical reads includes the number of physical reads. For an SQL statement, the number of logical reads is the benchmark for measuring its performance, rather than physical reads. Although the response latency of physical IO is much higher than that of logical IO, the number of physical reads changes frequently with the execution times, causing the block to be cached in the buffer cache ). This is also true for a system,Logical read should be the core of the database Performance Evaluation Model,We need to establish the correspondence between logical read and response time.

The response time of each logical read is a huge challenge. Because many actions are hidden behind each logical read, including physical read, wait events, CPU time, and so on. I have analyzed AWR reports of many databases and hope to establish a simplified model based on experience. We assume that if a database is fully tuned, the wait time except CPU time and IO should be as small as possible to be less than DB time 10% ). On this premise, we only care about the impact of CPU time and IO, and divide the system into three categories: CPU-intensive, IO-intensive, and hybrid:

1. IO-intensive

User I/O 85%
Db cpu 5%
Response time per logical read: 0.1-0.5 ms

2. CPU-intensive

Db cpu 85%
User I/O 10%
The response time for each logical read is less than 0.01 ms

3. Hybrid

User I/O 60%
Db cpu 20%
Response time per logical read: 0.05-0.1 ms

The above data is an empirical value calculated based on AWR reports of many typical databases. The formula is simple: DB time/logical READ = response time of each logical read. The differences between hardware and OS are not considered, so this value is not very accurate, but we can still find some rules: as the proportion of IO increases from 10% to 85%, the response time also ranges from 0.01ms to 0.5 ms.

Predict system bottlenecks

For databases, IO sub-systems have a great impact on performance. Under certain IO pressure, the response latency must be controlled within a reasonable range of 10 ms and 5 ms ). Because the IOPS that each disk can withstand is basically determined, for example, a 15 k sas disk can provide 150 IOPS with a response latency of no more than 10 ms, without the impact of cache, The IOPS of the entire storage subsystem is relatively easy to calculate. Before the system goes online, we can perform a lot of tests to establish a model of storage IOPS and response latency. In this way, we can predict the risk of inflection point in performance and determine the scale-up in advance. In the AWR report, we can obtain the number and response time of physical IO per second, which facilitates performance monitoring and trend warning.

It is relatively simple to evaluate the CPU capacity bottleneck. in Oracle, the CPU time calculation is the sum of the time consumed by each CPU. If there are 16 (CORE) CPUs, in theory, 3600x16 = 57600 s CPU time can be provided in an hour. If the CPU time does not exceed 57600 s, we can think that the system will not queue on the CPU and there will be no CPU bottleneck. However, in addition to CPU usage, the operating system also needs to use CPU resources to manage memory and process scheduling. The sys part of the CPU usage we see on the OS is the CPU resources occupied by the system, so we should consider retaining at least 10-20% of the CPU resources for the OS.

Impact of concurrent access on databases

Oracle is a Disk-based database. The design is based on the fact that most of the data is stored in external storage, and only a small part of the data is cached in the buffer. It is different from the KV cache such as Memcache, it is also different from In-memory databases such as timesten. Therefore, even if all data can be cached in the buffer, a large number of latch waits may occur during high-concurrency access. The most common case is the cache buffer chain. When a large number of concurrent accesses to the same piece of data, it is very likely that latch of the cache buffer chain will compete for use, which is also known as the "hot spot ".

Note that latch waits in Oracle are divided into two parts: spin and sleep. spin consumes cpu time, while sleep consumes the waiting time. Therefore, a large number of latch waits not only produce a large amount of waiting time, but also consume a large amount of CPU time.

Oracle is a database designed for concurrent operations. A large number of concurrent read/write requests may cause additional performance consumption. For example, to read a part of frequently modified data, Oracle uses undo information to construct a large number of CR blocks to ensure consistent read, and generates a large number of logical reads, this will consume additional CPU and response time.

Storage may also have hot issues and requires full optimization of the storage system in the early stage. The common means is to use RAID technology to distribute data on different disks, prevents "Hotspot" disks. Oracle ASM provides a Rebalance function that allows DBAs to redistribute stored data to eliminate hot spots.

In short, Oracle is a database system that provides a large number of concurrent reads and writes, but in many places, Oracle cannot adopt some serial control measures, such as latch, enqueue, and mutex, what we need to do is to minimize the impact of these serial control on the overall performance of the database.

Database Optimization Principles

Oracle optimization principles based on response time: Minimize Wait time (Wait time) and improve Service time). This is also the basic principle based on the Oracle wait event analysis method: try to eliminate the impact of various wait events on the system, so as to improve system performance and response time.

If the wait time of the database system besides CPU and IO exceeds 5% of the DB time, there may be some performance problems, and the DBA needs to analyze the wait event, optimize the system or application.

-EOF-

Note: Why do you want to write such a topic? Because I recently discussed with a colleague about the automatic SQL review on the machine, I wanted to create a simple model to develop an SQL review tool, by using tools and pre-built models, developers can determine whether the SQL has any performance risks. Previously, when we were doing SQL optimization, we only focused on whether the SQL itself was optimized and what the logical reads were. However, few people establish the relationship between logical read and response time. I am trying to answer this question.

Capacity planning and risk prediction are actually very meaningful propositions, but we are often limited to some specific technical details, ignoring the grasp of the entire system capacity. In fact, this is also very difficult. Maybe so far, I have not achieved the degree of building a "model", but I am trying to link these factors and provide some useful experience for everyone, I think it makes sense.

In this article, I mentioned a few meaningful experience values, which I calculated based on the information in many database AWR instances. Although it is not guaranteed to be completely accurate, I think it is basically reliable. We recommend that each DBA find this information from AWR and determine the type, bottleneck, and performance risks of the database. When faced with such problems as "Whether the hardware can meet performance requirements", "whether the system needs to be resized next year", and "whether the application will affect the system, we can use these experience values to give a judgment.

This proposition is only a periodic result, and I will continue to think about it. If you are interested, welcome to discuss this topic with me.

This article is reproduced from the Hello DBA blog. Original article: Oracle database performance model

Related Article

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.