Database Combat Case ————— Remember the problem of tempdb explosion

Source: Internet
Author: User

Objective

A lot of times the database's tempdb, log files, and so on, may cause disk space to be full, if the daily configuration is not in place, often lead to database failure, the business is forced to interrupt.

This file explosion is difficult to troubleshoot, some experienced operators may be unable to troubleshoot specific reasons, resulting in problems can not be completely resolved.

Scenario Description

The customer system is relatively stable, using 5 machines to do the AlwaysOn high Availability group, fully realize the read and write separation. The disk has also been planned, and the main library daily operations tempdb requires less than 20G, so the disk where tempdb resides is only configured with 100 g of space.

This is a case where the customer suddenly receives a monitoring alarm that shows that tempdb is running out of disk space and the available space is continuously reduced until exhausted.

Compared to the drama, this customer has just done a patrol database in the morning, stable, nothing unusual.

Then I initially decided that this must be a special operation or the application configuration error caused by the problem.

In-depth indicators analysis file look at the problem

The tempdb explosion is bound to accompany the growth of the file, first we look at the growth of the tempdb file.

  

It can be seen that the allocation space of tempdb starts to surge at 14 O ' 50, and attentive friends will find that this is 1 g to 6 g growth, because the customer's tempdb is configured with 16 data files:

  

  

Note: Why configure so many tempdb files see: Expert Diagnostics Optimization Series------------------to cool tempdb

What caused the growth

Causes tempdb to burst cause many statements use temporary tables, statement ordering, CHECKDB, and so on, but these can be reflected in the statement. So let's analyze the statement below.

Note: Many friends who have used the SQL expert cloud platform or tools may not be aware of some of the details, in fact, many details (such as the above tempdb file growth trend and the following statement allocation space) are designed to solve some difficult problems.

  

The first two indicators in the statement user-allocated space (MB) and the internal object allocation space (MB) refer to the use of tempdb.

Note: User allocation space may be more temporary table use, internal object allocation space may be a sort or hash join operations (other use of the consumption can be see the previous link to the article)

  

The larger the allocation space, the more the statement consumes the tempdb resource.

There are two ways to find out exactly what the operation is causing the tempdb to burst, you can look directly at the point-in-time statement, or you can find it in the high to low order of tempdb resource consumption!

To get a thorough look at the problem with tempdb, here's the second way.

So let's analyze the situation where the statement is consumed by the tempdb resource:

  Step 1: First we sort by the user object allocation space:

After troubleshooting, the user space allocation is relatively high in the CDC job, the server does run the CDC job of these libraries. Some of the other operations of the user allocation of space is relatively small, so this is not the cause of the problem!

 Step 2: Then we sort by the internal object allocation space:

 

It is found that the most consuming space is the operation of CHECKDB, but the point of time is not corresponding, so this is not the cause of the problem.

Continue troubleshooting:

In this statement, where the consumption is ranked in the third, we find that waiting for resource fgcb_add_remove (which can be easily understood as a large number of file auto-growth, here is 16 tempdb files), and the use of the internal object space is also very high, And we also found that there are multiple sessions that perform such high-consumption operations at the same time.

Continue in Depth:

 

The appearance of the performance counter is also consistent with the previous signs.

  

Troubleshooting conclusions

Comprehensive indicators of the phenomenon, you can analyze the system at 14:57 P.M. after the left to perform tempdb high-consumption operations, the statement itself is a nearly thousands of rows involving a large number of table connection sorting operations such as complex stored procedures, the tempdb caused by the problem of the main responsibility for the explosion, What's more, it can be seen from the identity of the session that this is not a single execution of a statement, but rather a large concurrent operation at a particular time.

Communicate with the relevant business people, found that this is a group of similar reports of the large consumption operation, because of the function of the adjustment, and the program personnel one error operation and wrong point to the main library in the cluster, and caused the problem.

--------------Blog Address-----------------------------------------------------------------------------

Original address: http://www.cnblogs.com/double-K/

If there is reprint please keep the original address!

----------------------------------------------------------------------------------------------------

Summarize

The result of the problem is often simple and relatively easy to solve, but the comprehensive indicators in-depth analysis of the problem is worth and every technical personnel to discuss, which is why a long case to analyze a small point of reason.

  

Rethinking: The server in this case the architecture design is relatively complete, has done a read and write separation can easily put such a large operation to the secondary server, and can do load balancing, then if your stand-alone server also has similar such a report operation, what will you do?

----------------------------------------------------------------------------------------------------

Note: This article is original, welcome reprint, please in the article page obvious location give this article link!
If you think this article is not bad please click on the lower right corner of the recommendation , thank you very much!

  

Database Combat Case ————— Remember the problem of tempdb explosion

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.