Original: SQL Server performance Tuning (i)--judging system resource bottleneck from waiting state
View the status of all SQL Server tasks at that time (sleeping, runnable, or running) through the DMV
2005, 2008 provides the following three view Tudon detailed query:
DMV |
Use |
Sys.dm_exec_requests |
Returns information about each request that is executed in SQL Server, including the current wait state |
Sys.dm_exec_sessions |
The corresponding row is returned for each authenticated session. This diagram is a server-wide view. This view can first be traced to the server load |
Sys.dm_exec_connections |
Returns information about the connection established with the instance of SQL Server and details of each connection |
Sys.sysprocesses is for backwards compatibility, so it is recommended to use the above 3 DMV.
There is also a DMV:sys.dm_os_wait_stats that can return the number of waits and wait times for all waiting states since SQL Server started. is a cumulative value.
1. Lck_xx Type:
If SQL Server often has a blocking occurrence, you will often see a wait state that begins with "Lck_":
Wait status |
Description |
Lck_m_bu |
Waiting to get Bulk update lock (BU) |
Lck_m_is |
Waiting to get intent shared lock (IS) |
Lck_m_iu |
Waiting to get Intent update lock (IU) |
Lck_m_ix |
Wait for Intent exclusive lock (IX) |
Lck_m_rin_nl |
Waits for a null lock on the current key value and an insert range lock between the current cut and the previous key |
Lck_m_rin_s |
Waits for a shared lock on the current key value and an insert range lock between the current key and the previous key |
Lck_m_rin_u |
Waits for an update lock on the current key value and an insert range lock between the current key and the previous key |
Lck_m_rin_x |
Waits for an exclusive lock on the current key value and an insert range lock between the current key and the previous key |
lck_m_rs_s |
Waits for a shared lock on the current key value and a shared range between the current key and the previous key |
Lck_m_rs_u |
Waits for an update lock on the current key value and a shared range lock between the current key and the previous key |
lck_m_rx_s |
Waits for a shared lock on the current key value and an exclusive range lock between the current key and the previous key |
lck_m_rx_s |
Waits for a shared lock on the current key value and an exclusive range lock between the current key and the previous key |
Lck_m_rx_u |
Waits to get an update lock on the current key value and an exclusive range lock between the current key and the previous key |
Lck_m_rx_x |
Waits for an exclusive lock on the current key value and an exclusive range lock between the current key and the previous key |
lck_m_s |
Waiting to acquire a shared lock |
Lck_m_sch_m |
Wait for schema modification lock |
lck_m_sch_s |
Waiting to get schema share lock |
Lck_m_siu |
Wait for shared Intent update lock |
Lck_m_six |
Waiting to get shared intent exclusive lock |
Lck_m_u |
Wait for update lock |
Lck_m_uix |
Wait for update Intent exclusive lock |
Lck_m_x |
Waiting for an exclusive lock |
2, pageiolatch_x and Writelog:
In the data page of the cache pool, SQL Server locks the pages of memory in order to synchronize multiuser concurrency. The difference is that the addition is latch (lightweight lock), not lock.
If it happens Pageiolatch type of Wait, SQL Server must be waiting for a I/O the completion of the action. If such a wait occurs frequently, the disk speed does not meet the requirements and has become a bottleneck for SQL Server.
pageiolatch_x is the most common of the two major categories: Pageiolatch_sh and PAGEIOLATCH_EX,pageiolatch_sh: Often occurs when a user is trying to access a data page, while SQL Server reads the page from disk to memory. Indicates that the memory is not large enough to trigger SQL Server to do a lot of reading the page, causing a disk read bottleneck. There is a bottleneck in this. Disk is just a byproduct of memory pressure.
PAGEIOLATCH_EX : often occurs when users make changes to the data page. When SQL Server writes back to the disk. means that the speed of writing can't keep up. This is not directly related to memory.
Writelog : Another wait state related to disk, waiting for write logging, means that the write speed is obviously not keeping up.
3, Pagelatch_x:sqlserver in order to solve when inserting data, to the physical layer of the insertion conflict, so the introduction of another type of page Latch:pagelatch, when a task to modify the page, it must first apply for an ex latch. You can modify the contents of the page only by getting this. Since the modification of the data page is done in memory, the time should be very short and negligible. and Pagelatch only in the modification process only appears, so the life cycle should be very short, if there is, the explanation: 1, SQL Server does not have obvious memory and disk bottlenecks. 2. The application sends a large number of concurrent statements to modify the same table. and the design and user business logic makes these changes are concentrated on the same page, or a few pages, become hot page, usually on the OLTP system appears more. 3, this bottleneck can not be solved by improving the hardware configuration, only by modifying the table design or business logic, so that the changes scattered, improve concurrency.
Mitigation Methods for Hot page:
(1), for a data column to build a clustered index, rather than the identity of the field, the same time the insertion has the opportunity to spread to different pages.
(2) If you must build a clustered index on the identity field, it is recommended that you build several partitions on one of the other columns.
4. Pagelatch on tempdb:
The database not only adds latch when the data page is modified, but also adds latch on the system page of the data file, such as Sgam, PFS, and GAM page changes. Sometimes it can become a system bottleneck.
When creating a new table requires allocating space, SQL Server modifies the SGAM, PFS, and GAM pages, and the assigned pages are marked as used, so the pages are modified. In tempdb, however, this operation is concurrent and repeated. The hot performance of the data page is mitigated by adjusting the table design. Workaround for this:
1. Create the tempdb file with the same number of CPUs, and the same size, so that you can evenly distribute the pressure.
2, strictly prevent the tempdb space exhaustion. Prevent automatic growth when one of the files grows, destroying the average distribution.
3. You can use Sp_helpfile to view file information.
5, other resources waiting:
1, latch_x:
(1) A previous task has an access out-of-bounds exception, and SQL Server forces the task to terminate, but does not completely release the resources it has requested. Make it an orphan. The resources behind it are blocked. Just open the SQL Server log file (errorlog) to see if there has been an access violation problem, but generally cannot be resolved from the user level, only restart the server to resolve.
(2), at the same time, other resource bottlenecks, such as memory, thread invocation, disk, and so on, while latch wait just a derivative wait.
(3), when a data file space is exhausted, do automatically grow, the same point in time can only have one user task can do file autogrow action, other tasks must wait.
(4), in some special cases, it is possible that SQL Server itself did not handle the concurrency synchronization, the use of a more optimized algorithm, so that users are more likely to encounter the wait, some patches have fixed such problems.
The general wait is derived from other problems, first of all to check whether SQL Server is running healthily. Whether any exceptions have occurred. Whether there are other resource bottlenecks.
2, Async_network_io (network_io:2000):
This wait state occurs when SQL Server has the data ready, but the network does not have enough sending speed to keep up, so SQL Server data is not stored anywhere.
(1) This situation is generally not a database problem, tuning the database configuration will not be a big help.
(2) The bottleneck of the network layer is of course a probable cause: Is it really necessary to consider the need to return so much data?
(3) application-side performance issues can also lead to async_network_io waiting in SQL Server. If you see this type of wait, check the health of your application and check if your application needs to request such a large result set from SQL Server.
3. Memory-related wait states:
When a user task requests memory for a temporary application, some special wait states appear:
Coemthread/sos_reservedmemblocklist/resource_semaphore_query_complie
If you see these statuses on the DMV, verify that SQL Server has a memory bottleneck.
4, Sqltrace_x:
For busy SQL Server, it can have a negative effect on turning on the SQL Trace. If such a wait occurs, you should stop collecting SQL Trace immediately unless you have to.
6. Last bottleneck: Many tasks are in runnable state:
If this is the case, it proves that many tasks can be run but not running.
sys.dm_exec_requests/ sys.sysprocesses Status column, reflects the current state of all the tasks, if you see a lot of status is runnable, that should be serious, the normal SQL Server even if very busy, and should not often see runnable, even the state of running should not be a lot.
If you don't have a warning like 17883/17884, there are a number of runnable tasks that can occur for two reasons:
(1), SQL Server CPU utilization is close to 100%, there is really not enough CPU to handle the user's concurrent tasks in time. You should optimize the statements or applications that consume the most CPU resources, or add CPU
(2), SQL Server CPU usage is not high, less than 50%. At this time check sys.dm_exec_requests task_state column, will find a lot of runnable state. Because SQL Server has a more lightweight synchronization resource in addition to lock and latch: Spin lock (Spin lock). Spin: Some synchronization resources that do not occur for a long time, SQL Server chooses to have the thread wait a little bit on the CPU, rather than having the CPU resources.
You can use DBCC SQLPERF (spinlockstats) to view it.
On 2005 of 64-bit SQL Server, when memory is abundant, many execution plans are cached, and colleagues cache many execution plan security contexts. In memory clerk, using Tokenandpermuserstore, it is easy for concurrent users to encounter a spin lock called a mutex when it is relatively large. can refer to: http://suppot.microsoft.com/kb/927396. This problem is only easy to occur when the security context caches too much, so it is safe to execute the following statements on a regular basis and have no bad effect on the overall performance of the system:
DBCC Freesystemcache (Tokenandpermuserstore)
SQL Server can also be started with-t4618 and-t4610, allowing SQL Server to use a different cache management mechanism.
It is said that 2008 has been improved and is not prone to spin locks.
7, Summary:
What period of user request:
1, the client sends the request instruction to SQL Server, passes through the network layer, SQL Server receives.
In this step, if the instruction is longer, or more, it will affect the speed that SQL Server accepts.
2, SQL Server syntax, semantic check, compile, generate a new execution plan, or find the cached plan reuse: This step consumes more resources:
L CPU: Do check, compile, build plan all need to calculate, this step consumes CPU resources more, especially when the instruction is complex.
L Memory: For very long in clauses or composed of tens of thousands of, hundreds of thousands of statements, to spend very large memory, mainly using stolen memory, for the 32-bit system is very tense. These wait conditions typically occur: Cmemthread/sos_reservedmemblocklist/resource_semaphore_query_compile, or 701 error.
Schema lock on the table (schema Lock): To prevent modifications to the schema at compile time. If concurrency is high, it can cause blocking.
• Search in memory when SQL Server confirms that the execution plan for the thread is available. Spin locks may be generated.
3. Operation Instruction:
When you wait for the execution plan, you go into the run phase and use the most resources. There are a lot of things to do in this step:
(1), SQL Server first for the command to run the application memory.
If you need to execute a lot of instructions at the same time, you may encounter difficulties in memory and usually meet the following: Resource_semaphore_ the start of the wait state.
(2), if the data to be accessed is found not in memory.
To tell the data from disk read to memory, if you find that the memory does not have enough free pages to hold all the data, but also do the memory and paging action, free enough space to put the data. Usually the simple wait state is: pageiolatch_x.
(3), according to the execution plan, scan or seek the memory of the data page, the execution needs to process the records to find out. This step requires a variety of locks to be applied to enable transaction isolation. Usually causes blocking, which begins with Lck_.
(4), instructions may also have to do some connection or calculation work (SUM, max, sort, etc.)
This step mainly uses the CPU.
(5), depending on the content of the instruction, the execution plan, and the amount of data, SQL Server may also create objects in tempdb that hold temporary tables, table variables, help with joins, sort, and so on.
Tempdb bottlenecks can occur at this time.
(6) If the instruction needs to modify the data record, SQL Server modifies the contents of the page in the memory buffer.
Because the object is in memory, disk writes are not triggered, but because the same page is modified, it is easy to cause a pagelatch_x wait state.
(7), if the instruction has data modification, before committing the transaction, SQL Server must write the corresponding log records in sequence to the log file. If the instantaneous log volume is too large, a writelog wait state will appear.
(8), return the result set to the client: When the result is obtained, SQL Server puts the result set in the output cache, and the client takes all the result sets. instruction is not over. If the dataset is too large, it can cause too much network interaction. Easy to appear at this time: Async_network_io wait state.
The above action should be in the Sqlos first to get a worker/thread, and then also queued Scheduler, run on the CPU.
L SQL Server All the workers are busy with their own things, will wait, you can see the waiting state is 0x46 (Umsthread). And the value of sys.dm_os_schedulers.work_queue_count is not equal to 0.
L succeeded in getting the worker, but in scheduler and waiting for the other workers, when the state is runnable, and sys.dm_os_schedulers.runnable_tasks_count>1.
L Get Scheduler, enter running state, if CPU is consumed very much, the phenomenon of high CPU usage will occur.
L Experiencing performance problems, checking out the sys.dm_exec_requests type of DMV is helpful in finding a problem.
SQL Server Performance Tuning (i)--judging system resource bottleneck from waiting state