Talking about an annoying deadlock tracking experience

Source: Internet
Author: User
Tags microsoft sql server 2005

Recently, when I first moved to a new company, I suddenly experienced a large volume of interruptions in the production database at night and lasted for nearly an hour. In the event of an accident, I was not at the scene, I missed the opportunity to directly obtain information. After that, my boss asked me to trace the cause, so the difficult troubleshooting process began.

At first, I thought it was caused by an exception in the running of a database JOB or a large deadlock caused by the spam statement written in the program. So I took the collected trace information for local analysis, according to the collected trace information, the database encountered a lock at 19:49:28, and the system cancel it, which was three consecutive times. After that, most of the database connections were Abort.

 

The preliminary estimate should be a deadlock. The first thought was due to the database update statement, so I checked whether there was a JOB running at the corresponding time in the Agent, and the results did not match, then, analyze whether the trace file contains long Update, Insert, or Delete statements that have been run for this period of time. After a long period of time, the trace file does not find the long Update, Insert, or Delete statements ..., investigation long query, still not, crazy sweat...

Trace File Analysis and analysis can not locate the specific statement Trace file only captures records that run for more than 2 seconds or read more than 10000), it seems that the problem is not that simple; it is impossible to find the murderer based on the Trace file information, so I checked both the Windows Log and the database error log, and did not find any exceptions... I have not found any information, and I am worried that my job will not be guaranteed)

I also asked some cool people if I wanted to think about it. It seems that there is little hope to find out the problem through the existing materials on hand. I can only find a new path; since it must be caused by a deadlock, it indicates that there must be inconsistent access or competition for resources in the database, so we can start with the deadlock and first clear the error log file of the current database, then open the 1204 and 1222 tracking signs and wait for the fish to hook up.

 
 
  1. DBCC errorlog 
  2. DBCC TRACEON (1204, 1222, -1); 
  3. DBCC tracestatus 

After collecting data for a few days, prepare to collect the network, copy ERROR. LOG from the server to the local machine, open it with UE, read the line carefully, and find the following information:

 
 
  1. Deadlock encountered .... Printing deadlock information 

It was a deadlock. I finally found the real culprit. I used UE to search for it. I don't know. I was shocked by a check and a bunch of deadlocks...

The deadlock information is as follows:

View the page information:

 
 
  1. DBCC TRACEON (3604)   
  2. DBCC PAGE('XXXX',1,22664690,3) WITH TABLERESULTS  
  3. DBCC TRACEOFF (3604) 

The page information is normal.

Based on the information, the Input Buf information, in turn, queries the Trace file, the Input Buffer can only store 255 bytes, the information is not displayed completely, and can only be searched in turn ):

 
 
  1. select top 10 * from fn_trace_gettable('F:\perfmon\Trace_xxxx_20110406A.trc',default)   
  2. where TextData like '%@SMESSAGE varchar(8000),@SINMATERECID varchar(8000),@SREFRECID varchar(8000)%' 

 

 
 
  1. select top 10 * from fn_trace_gettable('F:\perfmon\Trace_xxxx_20110406A.trc',default)   
  2. where TextData like '%@P0_ varchar(7),@P1_ datetime,@P2_ datetime,@P3_ varchar(7),@P4_ datetime%' 

 

It is found that the deadlock occurs on the Rec_Main table. One is modification, and the other is query. The data volume of this table is more than 700 million, and the other is sweaty .....

 

But it is strange that the information displayed by the deadlock is some simple modification of a single record) and the query operation will not cause so many deadlocks. Continue to the first step...

After tracing, we find that these two operations frequently record traces for one day in the database ):

 
 
  1. select COUNT(0) from fn_trace_gettable('F:\perfmon\Trace_xxxx_20110406A.trc',default)   
  2. where TextData like '%@P0_ varchar(7),@P1_ datetime,@P2_ datetime,@P3_ varchar(7),@P4_ datetime%' 

 

 
 
  1. select COUNT(0) from fn_trace_gettable('F:\perfmon\Trace_xxxx_20110406A.trc',default)   
  2. where TextData like '%@SMESSAGE varchar(8000),@SINMATERECID varchar(18),@SREFRECID varchar(8000)%' 

The number of deadlocks in a day is about 20, and there is only one record at the Update, but why is there such a frequent deadlock? Besides the large data volume of the table
In addition to super abnormal queries, this is the main cause.) Is there any other reason?

Finally, I couldn't find the problem, and reflected the problem to Microsoft. After several days of communication with Microsoft, the final result was that the database Bug was vomiting, so I was lucky ), microsoft replied as follows:

Problem Analysis:

==========

Your database version is: Microsoft SQL Server 2005-9.00.4207.00 (Intel IA-64) Enterprise Edition, SP3 CU1

I checked the deadlock information you provided found that, as you said, your deadlock problem is caused by SQL2005 Bug, for details can refer to the document:

Solution:

================

You can install SP3 CU6 separately or install SP4 directly to solve this problem.

SP3 CU6: http://support.microsoft.com/kb/974648>

SP4: http://www.microsoft.com/downloads/en/details.aspx? FamilyID = b953e84f-9307-405e-bceb-47bd345baece>

Your SQL Server is of the IA64 version, so select the correct installation file

------- Supplement ---

Although Microsoft has provided a possible reason, it does not know that it is not reliable), but based on the current system situation, the boss does not recommend that you fix the patch before regular maintenance ), so it was not difficult for me to submit an improvement plan with the above tracking information. So I sent a mail to the boss:

Based on this information, we can know:

The deadlock mainly occurs on the Rec_main table, which has more than 7 million data records;

It is mainly caused by a deadlock between an Update statement and a Select statement. These two statements run frequently and the query statement is very long as an attachment. Such queries cannot use indexes );

The main problem lies in the complicated query and table data volume, as well as the bug mentioned by Microsoft );

Currently, such deadlocks often occur in our databases.

To handle this situation, we recommend the following:

Whether the table information can be separated to reduce the data volume;

Can developers improve this query statement;

Query whether withnolock can be added), and the Update statement can be added with withrowlock );

For other partition tables ).

Finally, we moved some of the data in the table, modified the query statement, and added withnolock.) The problem was solved. For the Microsoft solution, see the next decomposition)

Related Article

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.