SQL Server Tuning Series Basics (Parallel Operators Summary)
Objective
In the last three articles we covered how to read query plans, along with optimization techniques for the commonly used join operators and the Union operators.
In this article we analyze SQL Server's parallel operations. Multi-core machines are everywhere today, and SQL Server adjusts its query plans to match expanded hardware resources, making full use of them to maximize performance.
Enough preamble; let's get straight into the topic of this article.
Technical preparation
As in the previous articles, SQL Server 2008 R2 is used here, querying Microsoft's concise sample database (Northwind).
1. Parallel operators
Not every T-SQL statement we write has the same optimal execution plan. Forming the optimal plan requires evaluation from many angles, mostly based on the statistics SQL Server itself maintains: the optimizer generates multiple candidate execution plans, evaluates them, and then chooses the best one to execute.
While SQL Server evaluates plans against the statistics of the database contents, it also depends on the hardware it is currently running on. Sometimes the scheme it considers optimal cannot be supported by the current hardware resources (memory limits, CPU limits, IO bottlenecks, and so on), so the chosen execution plan also depends on the underlying hardware.
When SQL Server finds that a data set to be processed is large and consumes significant resources, and the machine has multiple CPUs, it attempts a parallel approach: it splits the data set into several parts and processes them on several threads at the same time to improve overall efficiency.
In SQL Server, you can restrict the number of CPUs available to SQL Server with the following settings.
By default SQL Server automatically uses all available CPUs. But in certain scenarios, such as a high-concurrency production environment, you may want to prevent SQL Server from monopolizing all the CPUs, so SQL Server provides a configuration interface for this.
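As a sketch of that configuration interface (the CPU count here is illustrative, not a recommendation): the classic way to restrict which CPUs SQL Server may use is the affinity mask option, a bitmask over the machine's CPUs.

```sql
-- Sketch: restrict SQL Server to CPUs 0-3 via the affinity mask
-- (bitmask 15 = 0x0F = the first four CPUs).
-- 'show advanced options' must be enabled first.
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'affinity mask', 15;
RECONFIGURE;
```

Setting the mask back to 0 restores automatic CPU assignment.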
There is also a system parameter, the well-known MAXDOP parameter. Changing this configuration controls the maximum degree of parallelism per operator (note: per operator, not for the whole plan). Let's look at the parameter.
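As a hedged sketch of where this parameter lives (the value 4 below is illustrative), it can be set server-wide through sp_configure, or overridden for a single statement with the MAXDOP query hint:

```sql
-- Server-wide default: cap the degree of parallelism per operator at 4
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'max degree of parallelism', 4;
RECONFIGURE;

-- Per-query override via a hint (takes precedence over the server setting)
SELECT [KEY], [DATA]
FROM BigTable          -- the test table created later in this article
OPTION (MAXDOP 2);
```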
This setting specifies the maximum degree of parallelism for each operator. So when we look at the DMV sys.dm_os_tasks, which shows the system's running tasks, we may well see a thread count greater than the degree of parallelism. The reason is that two operators can repartition the data, with each operator being assigned its own set of threads.
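A quick way to count tasks per session from that DMV (a sketch; run it in a separate session while a parallel query executes):

```sql
-- Count concurrent tasks per user session. During a parallel query the
-- owning session can show more tasks than MAXDOP, because each exchange
-- operator repartitions the rows onto its own set of worker threads.
SELECT session_id,
       COUNT(*) AS task_count
FROM sys.dm_os_tasks
WHERE session_id IS NOT NULL
GROUP BY session_id
ORDER BY task_count DESC;
```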
Unless you have a special case, the default setting is recommended.
Let's go through a set of examples to understand parallel operation.
Parallel operation improves performance and has several advantages:
- It does not depend on a fixed number of threads; threads are automatically added or removed at run time to reach an optimal performance level while keeping the system's normal throughput.
- It adapts to skew and balances load. If one thread runs slower than the others, its share of scans or work items is automatically reduced while the faster threads take on correspondingly more, so the total execution time decreases smoothly instead of one slow thread dragging down overall performance.
Let's go through an example in detail.
Parallel plans generally apply to tables with larger data volumes; for small tables serial execution is the most efficient. So here we create a large test table and insert some test data: 250000 rows, over 6500 pages in total. The script is as follows.
```sql
-- Create a new table with a primary key, forming a clustered index
CREATE TABLE BigTable
(
    [KEY] INT,
    DATA  INT,
    PAD   CHAR(200),   -- assumed width; the value was garbled in the original
    CONSTRAINT [PK1] PRIMARY KEY ([KEY])
)
GO

-- Bulk insert 250000 rows of test data
SET NOCOUNT ON
DECLARE @i INT
BEGIN TRAN
SET @i = 0
WHILE @i < 250000
BEGIN
    INSERT BigTable VALUES (@i, @i, NULL)
    SET @i = @i + 1
    IF @i % 1000 = 0   -- assumed batch size; the value was garbled in the original
    BEGIN
        COMMIT TRAN
        BEGIN TRAN
    END
END
COMMIT TRAN
GO
```
Let's execute a simple query script.
SELECT [KEY], [DATA] FROM BigTable
For this query script with no filter condition there is no need for a parallel scan, because fetching the data with a serial scan is faster than a parallel one, so a serial clustered index scan is used here. Let's add a filter and look again.
SELECT [KEY], [DATA] FROM BigTable WHERE DATA < 1000
For this T-SQL statement with a filter, parallel operation is used, and the clustered index scan itself runs in parallel. Because my machine has 4 logical CPUs (actually 2 physical CPUs with 4 threads), 4 threads scan the table in parallel, each thread scanning a portion of the data, which is then gathered together.
Thread 0 here is the dispatch thread, responsible for scheduling all the other threads, so it performs no scan itself; threads 1 through 4 carry out the scan of these 1000 rows. Of course, with so little data some threads are assigned 0 rows, but there are 4 scan threads in total, so these 4 threads scan the table in parallel.
The result above is perhaps too simple, and some threads were not assigned a full workload, so let's try a slightly more complex statement.
SELECT MIN([DATA]) FROM BigTable
The execution plan is quite simple. Analyzing from right to left, the execution goes:
Parallel clustered index scan on 4 threads --> each thread computes the minimum of its own rows --> the 4 per-thread minimums are collected --> a final stream aggregate takes the minimum of those 4 values --> the result is output.
First, 4 threads run, with one stream aggregate per thread, each obtaining the minimum value for its own rows.
Next, the "degree of parallelism" (gather streams) operator collects the four per-thread minimum values into a single stream of 4 rows.
Then comes another stream aggregate, which takes the minimum of these 4 rows and outputs it. Stream aggregation was covered in a previous article.
The above is a standard multi-threaded parallel operation flow.
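The flow above is logically equivalent to computing a partial minimum per thread and then a global minimum of those partials. Purely as an illustration (the optimizer does this internally; the modulo split below merely stands in for the row ranges each thread happens to scan):

```sql
-- Illustration only: per-"thread" partial MINs, then a final MIN,
-- mirroring scan -> per-thread stream aggregate -> gather streams -> stream aggregate
SELECT MIN(partial_min) AS global_min
FROM (
    SELECT MIN(DATA) AS partial_min
    FROM BigTable
    GROUP BY [KEY] % 4   -- stand-in for the 4-way parallel row split
) AS per_thread;
```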
In the process above, because the data is scanned with a parallel clustered index scan, the 4 threads share the work roughly evenly, meaning each thread scans about the same amount of data. Next we will put one thread into a busy state and see whether SQL Server dynamically shifts that thread's share of the work onto the other, less busy threads.
First we add a larger data set; the script is as follows.
SELECT [KEY], [DATA], [PAD] INTO BigTable2 FROM BigTable
Now let's write an expensive query to keep one thread occupied, and here we force it to run on only one thread.
SELECT MIN (b1.[ key]+1
Running the above code to completion would take on the order of five minutes on my machine's configuration, and since we also force serial execution, you can imagine the speed.
While it runs, we execute the earlier minimum-value statement and look at its execution plan.
SELECT MIN([DATA]) FROM BigTable
In the execution plan, we look at the per-thread row counts of the clustered index scan.
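Without the screenshots, the same per-thread numbers can be read from the actual execution plan XML: each parallel operator carries one RunTimeCountersPerThread entry per thread, including its ActualRows count. A minimal way to capture the plan:

```sql
-- Return the actual execution plan (as XML) alongside the results;
-- inside the Clustered Index Scan node, look for the
-- <RunTimeCountersPerThread Thread="n" ActualRows="m" ... /> entries.
SET STATISTICS XML ON;
SELECT MIN([DATA]) FROM BigTable;
SET STATISTICS XML OFF;
```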
As you can see, thread 1's row count has dropped to roughly a quarter of its share, and the number of rows scanned increases steadily from thread 1 to thread 4.
The busy statement above clearly specified MAXDOP 1, which in theory should only tie up one thread, so why are the other threads affected? The reason is simple: my machine has only two physical cores, and the so-called 4 threads are just hyper-threading rather than 4 true cores, so the threads affect one another.
Next let's look at an example of a parallel join operation, and see how a parallel nested loops join leverages the hardware resources.
SELECT B1.[KEY], B1.DATA, B2.DATA
FROM BigTable B1
JOIN BigTable2 B2 ON B1.[KEY] = B2.[KEY]
WHERE B1.DATA < 100
In the statement above there is a clustered index on the KEY column of BigTable, while the DATA column used in the filter has no index, so this is bound to be a clustered index scan.
Let's look at the execution plan.
Let's analyze the process step by step, combining the execution plan with the text for a more accurate picture, reading from right to left.
The first step fetches the data from the full table via a clustered index scan. Since this is a parallel clustered index scan, let's look at the number of parallel threads and the rows each one scans.
Four threads perform the scan; here thread 3 happens to get the 100 qualifying rows.
The 100 rows are then repartitioned, so that each thread is assigned an average of 25 rows.
So the qualifying rows are distributed across the 4 threads, 25 rows each, and then handed to the nested loops join. Because the statement above also needs rows from BigTable2, a nested loops join is chosen, which fetches the matching rows from BigTable2 in turn.
For the nested loops join operator, you can refer to my second article.
We know that the outer side drives the number of loops, which means there are 4 threads executing the nested loop in parallel. With 25 rows per thread, the inner table is executed 4*25=100 times.
After the nested loops join produces its results, the next step is the parallelism (gather streams) operator, which collects the results from each thread and then outputs them.
The above is the execution flow of a parallel nested loops join, making full use of the hardware resources of the four cores.
Reference documents
- Microsoft Books Online: logical and physical operator reference
- The book series "Inside Microsoft SQL Server 2005"
Conclusion
That's it for this article. It is short and easy to follow, and it mainly introduced the parallel operators in query plans. In follow-up articles we will cover more of SQL Server's parallel operations, and then analyze how to optimize these operators in the INSERT, UPDATE, and DELETE statements we write every day. If you are interested, follow along. SQL Server performance tuning is a very broad topic, and we will analyze it piece by piece in later articles.
If you have any questions, you can leave a comment or send a private message. I look forward to studying SQL Server in depth with those of you who are interested. Let's learn and progress together.
At the end of the article, here are links to the previous articles:
SQL Server Tuning Series Basics
SQL Server Tuning Series Basics (Summary of common operators)
SQL Server Tuning Series Basics (Union operator summary)
If you read this blog and feel you gained something, please don't skimp on your "Recommend" click.