Some developers often reflect the following situation: I have a statement or a job that ran for half an hour yesterday. How can I run for two hours today?
Is there a problem with the database?
The abnormal running time of database statements is actually a complicated situation. Because the data is constantly changing, a good statement may be executed tomorrow.
This scenario cannot be completely retraced within the estimated time. Even if there is backup data at the time, there is no way to know and create the server pressure at that time.
But now we don't want to investigate the cause of yesterday's statement running time exception, but to find out the cause of the current statement running exception. The current situation is still in progress,
Therefore, we can troubleshoot the problem based on the current situation of the statement;
In fact, there are many issues to consider:
1. Check whether the index is normal (whether the index is damaged or whether an index has been deleted );
2. Check whether the statistical information is out of date;
3. Whether the statement execution plan is offset (related to indexes, statistics, and data volume );
4. Check whether the statement has a bug;
5. Whether or not a blocking occurs;
6. Whether the system resources encounter bottlenecks;
.........
It is difficult for us to solve these problems if we consider all these situations. Generally, we need to solve this problem in a quick way. The following methods are mainly used for troubleshooting: 5 and 6.
Because these two aspects are the most common.
Let's simulate the troubleshooting process:
1. Create test tables and data
USE [master]GO/****** Object: Table [dbo].[a] Script Date: 01/17/2012 16:46:34 ******/SET ANSI_NULLS ONGOSET QUOTED_IDENTIFIER ONGOSET ANSI_PADDING ONGOCREATE TABLE [dbo].[a]( [id] [int] IDENTITY(1,1) NOT NULL, [name] [varchar](100) NULL) ON [PRIMARY]GOSET ANSI_PADDING OFFGOinsert into a values('aa'),('bb'),('cc')
2. Manufacturing congestion: open two sessions and run the following statements respectively.
--Session 1use mastergobegin tranupdate A set name='abc' where id=2--rollback
--Session 2select * from a
Because the update Statement of session1 cannot be submitted, the process of session2 will be blocked at this time.
3. Analysis and troubleshooting:
First, we need to query whether the database is blocked at this time:
--Blockedselect * from sys.sysprocesses with(nolock) where blocked<>0
We can see the blocking record, 53 blocks 56, and the blocked resource is dbid 1 file 1 page 307;
Next, we need to know the blocking and blocked statements in two ways:
A. DBCC inputbuffer
B. SYS. dm_exec_ SQL _text
Method 1 compared with Method 2:
Advantage: method 1 can display the Statement of the inactive session. method 2 can only check the active SESSION (through sp_who2 active can show whether the session is active );
Disadvantage: method 1 can only be queried one by one, and method 2 can be queried together;
Method 1:
--No1:dbcc inputbuffer(53)godbcc inputbuffer(56)
Method 2:
-- NO2: Select S. session_id, R. blocking_session_id, S. host_name, S. login_name, databasename = db_name (R. database_id), R. command, R. status, current_execute_ SQL = substring (T. text, R. statement_start_offset/2 + 1, case when statement_end_offset =-1 then Len (T. text) else (R. statement_end_offset-statement_start_offset)/2 + 1 end), S. program_name, S. status, S. cpu_time, memory_usage_kb = S. memory_usage * 8, S. reads, S. writes, S. transaction_isolation_level, C. connect_time, C. last_read, C. last_write, c.net _ transport, C. client_net_address, C. client_tcp_port, C. local_tcp_port, R. start_time, R. wait_time, R. wait_type, R. last_wait_type, R. wait_resource, R. open_transaction_count, R. transaction_id from sys. dm_exec_sessions s left join sys. dm_exec_connections C on S. session_id = C. session_id left join sys. dm_exec_requests R on S. session_id = R. session_id and C. connection_id = R. connection_id outer apply sys. dm_exec_ SQL _text (R. SQL _handle) twhere S. is_user_process = 1 -- if this condition is not restricted, query all processes (system and user processes) and S. session_id in (53,56)
We can see that one or two statements of the method can be found, and method 2 can only find one statement;
Here, we can determine that the reason for slow statement execution is blocked. Let's check the reason for blocking. You can view the following statement:
select request_session_id,resource_type,db_name(resource_database_id) as DBName,resource_description,request_mode,request_type,request_status from sys.dm_tran_locks where request_session_id in(56,53)order by request_session_id
As you can see, 56 is in the wait status, and it is waiting to get a share lock at, but it is occupied by an exclusive lock at (grant stands
Resources have been obtained and are running), So 56 must wait until the exclusive lock on 53 is released before it can continue to run; so we turn to investigate the reason why 53 exclusive lock is not released; it may be
53. Other resources are occupied by other processes and are waiting for other processes to release the lock. It may also be because the update statement updates too much data and takes a long time.
The lock can be released in a timely manner. There is also our current situation where no transaction has been submitted (which can be seen directly in the Statement). Blocking troubleshooting methods are similar.
What if the statement is not blocked by other statements? The reason we need to further find out is wait. We already have wait-related queries. Let's check them.
More specific information:
-- wait & lockselect lo.request_session_id as [Session],DB_NAME(lo.resource_database_id) as Dbname,lo.resource_type as [Type],lo.resource_description,lo.request_mode,lo.request_owner_type,lo.request_status,case when lo.resource_type='OBJECT' then OBJECT_NAME(lo.resource_associated_entity_id) when lo.resource_associated_entity_id IS NULL OR lo.resource_associated_entity_id=0 then NULL else OBJECT_NAME(p.object_id) end as Associated_Entity,wt.blocking_session_id,wt.resource_descriptionfrom sys.dm_tran_locks lo with(nolock)left join sys.partitions p with(nolock)on lo.resource_associated_entity_id=p.partition_idleft join sys.dm_os_waiting_tasks wt with(nolock)on lo.lock_owner_address=wt.resource_addresswhere lo.request_session_id>50and lo.request_session_id=56 order by [Session] ,[TYPE]
As you can see above, 56 encountered a wait when obtaining the shared resource. Of course, the wait here is still blocked by 53, but there are multiple reasons to wait. Let's check
Current wait information:
--current wait infoselect wait_type,COUNT(0) as num_waiting_tasks,SUM(wait_duration_ms) as total_wait_time_ms from sys.dm_os_waiting_tasks with(nolock)where session_id>50group by wait_typeorder by wait_type
Here we can see the lock wait (wait_type), there are many resource types of waiting, the value focuses on:
Memory: cmemthread, resource_semaphore
Cmemthread:
Description and cause: indicates a problem with the plan cache (a large number of plans are added or removed );
Solution: Use parameterized queries or set mandatory database parameterization)
Resource_semaphore:
Description and cause: memory-intensive queries cannot obtain the requested memory; other processes consume too much memory;
Solution: Add appropriate indexes or increase memory for the database
IO: io_completion, async_io_completion, writelog, pageiolatch _*
CPU: cxpacket, sos_scheduler_yield
Cxpacket:
Description and reason: Parallel Processing wait type, parallel synchronization wait;
Solution: you can modify the parallelism value (or disable it;
Sos_scheduler_yield:
Description and reason: After the task is executed to the end of the time segment, the dispatcher can run the task for other tasks;
Solution: CPU with better processing capability
Network: async_network_io, db1__send
Async_network_io: The NIC bandwidth is saturated or the client cannot remove the result in time;
Db1__send: the network bandwidth is insufficient to support the volume of image transactions or the volume of image databases exceeds the limit;
Lock blocking: LCK _*
We can count the 20 most waiting types in our database:
--total wait infoselect top 20 wait_type,SUM(waiting_tasks_count) waiting_tasks_count,SUM(wait_time_ms)as total_wait_time_ms,SUM(signal_wait_time_ms) as total_signal_wait_time_msfrom sys.dm_os_wait_stats with(nolock)where wait_type not in --system wait type('LAZYWRITER_SLEEP','REQUEST_FOR_DEADLOCK_SEARCH','SQLTRACE_BUFFER_FLUSH', 'XE_TIMER_EVENT','FT_IFTS_SCHEDULER_IDLE_WAIT','LOGMGR_QUEUE','CHECKPOINT_QUEUE', 'SLEEP_TASK','BROKER_IO_FLUSH','BROKER_TASK_STOP','BROKER_TO_FLUSH','BROKER_EVENTHANDLER')group by wait_typeorder by total_wait_time_ms desc
Through this, we can see what aspects of DB wait are mainly concentrated. If it takes a long time to wait for CPU, Io, memory, lock, etc., it means we
The database needs to be optimized in some aspects.