A text teaches you database performance tuning (attached to a large hospital real case)

Source: Internet
Author: User
Tags memory usage cpu usage

Objective

An engineer at Microsoft once had a very figurative metaphor for performance tuning: Peeling onions. I also very much agree, let us to a layer of a layer of the outside its mysterious veil.

Six major factors

The following are the most common diagrams we use to analyze database performance issues for our customers.

After reading this diagram, you have a basic concept of performance tuning. In general, we will analyze it in the following order:

Hardware capabilities

System size

Database internal factors

Software Environment

These 4 order can be adjusted or exchanged, but for the performance optimization of the system must start from the global . never go deep into an optimization of a SQL statement , because you may spend a lot of

Time. A SQL is optimized from 20s to 1s, but the entire system's card is still slow to exist.

The Last is

Business model and architecture

Code Design

Practical cases

No nonsense, open the whole, directly on the dry.

Date: January 2018 One day

Event: A hospital customer 4 o'clock in the afternoon suddenly appear large area of card slow. The whole system has serious problems, information center phone explosion, hospital engineers unprepared.

Fortunately we have a ' camera ' in the database, and the following is a look at what is sent from the surveillance video. And then fix it.

Hardware capability CPU

CPU usage is below 20% in the time period of the problem, normal.

Memory

The memory usage is normal, as shown in the image below.

Page life cycle

Available memory

Io

Io queue average is very low, about 15.48 has a instantaneous high, you can note that this period of time there is no batch write.

Overall, hardware resources are sufficient.

System size

When the problem occurs, the batch request per second is not an upward trend, but it has declined. This is because the system congestion, waiting, affecting the system's throughput.

Database internal factors wait

Slow statement

From the trend graph of sessions and slow statements, it is possible to see that the time of the problem and the customer's description are exactly the same, and we can conclude that the accident is indeed slow in the database.

What causes the slow

Check the time period of the statement in the run, you can find about 15.58 in the afternoon, the database began to appear more and more cmemthread waiting.

Up to 1900 pages and 16.08 minutes, there were up to 100 concurrent simultaneous occurrences of cmemthread waiting

What is Cmemthread waiting

Microsoft's official Description: Occurs when a task is waiting for a thread-safe memory object. The wait time may increase when multiple tasks attempt to allocate memory from the same memory object to cause contention.

This description is very obscure, and it is still completely unknown how the wait type is going to be, and how to deal with such problems.

In fact, the official description is a memory contention problem, but in fact the key to this problem is the contention of multiple tasks, which is actually the problem of concurrent execution.

Scene
    1. Occurs when a database compilation or recompilation is inserted into the plan cache when ad hoc execution plans are plans
    2. In a NUMA architecture, memory objects are partitioned by node

There are three types of memory objects (Global,per Numa node,per CPUs). SQL Server will allow fragmentation of memory objects so that only threads on the same node or CPU have the same underlying cmemobj, reducing thread interaction from other nodes or CPUs, improving performance and scalability. Reduce concurrent contention for memory

SELECTtype, pages_in_bytes, Case when(0x20 =Creation_options& 0x20) Then 'Global PMO. Cannot is partitioned by Cpu/numa Node. TF 8048 not applicable.' when(0x40 =Creation_options& 0x40) Then 'partitioned by CPU. TF 8048 not applicable.' when(0x80 =Creation_options& 0x80) Then 'partitioned by Node. Use TF 8048 to further partition by CPU'ELSE 'UNKNOWN'END fromsys.dm_os_memory_objectsOrder  byPages_in_bytesdesc

If you find that the memory overhead of partitioned by Node is in front, you can use trace FLAG 8048来 to reduce the cmemthread wait.

As you can see, the customer's partitioned by Node is relatively lean back, ranked at 14 bits.

3. Patches

This type of scenario is the most common. If a large number of Cmemthread waits are found in the system, prioritize whether the database is already installed with the latest patches

R2:FIX:SQL Server R2 performs poorly when most threads wait for the CMEMTHREAD wait type if the threads use TA ble variables or temp tables to load or update rowsets

2012, 2014 When you perform many special queries in SQL Server 2012 or SQL Server CMEMTHREAD wait

Hardware and Software Environment

The current version of the database is 11.0.5556.0 and the previously mentioned patch, the installed version is: 11.0.5623.0

What the code design is the statement that produced the wait

is similar to the following statement, at the highest time, with more than 100 concurrent.

SELECT     *  into  from where 1 = 2

Features are as follows:

1. Statement simple cost is less than 5 does not produce parallel

2. The SELECT INTO #temptable的形式 is used

As the above analysis, cmemthread wait is a concurrency problem, not a memory problem. When other scenarios do not work, we can reduce cmemthread wait by adjusting the wording of such statements.

Business model and architecture

At present, the system is the state of single-machine operation, which is actually very rare. There is a small number of OLAP and OLTP business mixes. Later, we will provide customers with a solution to plan read-write separation or load balancing. In

Solution installs the latest patches

At a minimum, you need to install fix for the pending issue that was previously issued. The recommendation is to install the latest SP4 patch directly to date.

modifying parameters

Optimize for ad hoc workloads changed from 0 to 1. Reduces the memory consumed by ad hoc queries for scenarios where an ad hoc execution plan is inserted into the plan cache by plans.

Increase the number of tempdb data files

The SELECT * into #temptable generates a lot of latch contention, preventing a large number of pagelatch latch contention after cmemthread wait for elimination. I've been through a lot of cases and solved a previous congestion,

There is a new wait behind, resulting in worse performance. Please remember that optimization is a long-term, gradual process.

Where to migrate the tempdb data files

The current part of the tempdb file is placed in S, generally divided on the D-disk. It is recommended that you migrate to the S-disk (above the storage) to increase the responsiveness of tempdb. If possible, using SSDs to maximize the performance of tempdb would be a good choice.

Code to optimize the program

The modified code is usually placed on the last side, because there are more cases involved. In the case of the previous means 80%, all can solve the problem. For the rest of the 20%, we need to check the logic in the program to see what these statements are all business generating. What conditions would trigger such a business. Statements similar to the following use stored procedures, or parameterized methods, to reduce the number of compilations and recompilation. In addition, this type of statement creates temporary tables concurrently, possibly by adjusting the settings of tempdb, speeding up the execution of such statements, and reducing the concurrent number of such statements at the same time.

Optimization effect

After several optimization methods, the next day began, no more cmemthread waiting.

Wait

Slow statement

Summarize

Through this document you should have fully learned the idea of tuning database performance. He told us how to do one step at a time to troubleshoot problems, just like peeling onions and peeling them off a layer.

Reference


Microsoft Official blog on this kind of wait principle and how to debug how it works:cmemthread and debugging them

SQL Server 2016 further optimizes the problem here, with detailed reference to SQL 2016–it Just Runs faster:dynamic Memory Object (CMemThread) partitioning

A text teaches you database performance tuning (attached to a large hospital real case)

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.