SQL Server Tuning Series Basics (Parallel Operations Summary)


Objective

In the last three articles we described how to view query plans, as well as optimization techniques for some of the commonly used join operators and Union operators.

In this article we analyze SQL Server's parallel operations. With multi-core machines now commonplace, SQL Server adjusts its query plans to adapt to the expanded hardware resources, making full use of them to maximize performance.

Without further ado, let's get into the topic of this article.

Technical preparation

As in the previous articles, SQL Server 2008 R2 is used, along with Microsoft's compact sample database, Northwind.

1. Parallel operators

Not every T-SQL statement we write gets the same optimal execution plan; forming the optimal plan requires evaluation from many angles. SQL Server relies mostly on the statistics it maintains, generates several candidate execution plans, evaluates them, and then chooses the best one to execute.

While SQL Server evaluates candidate plans based on statistics about the data, it also takes the currently available hardware into account. A plan it would otherwise consider optimal may not be supported by the current hardware because of memory limits, CPU limits, I/O bottlenecks, and so on, so the chosen execution plan also depends on the underlying hardware.

When SQL Server finds that a data set to be processed is large and resource-intensive, and multiple CPUs are available, it attempts a parallel approach: the data set is split into several parts and processed by several threads at the same time to improve overall efficiency.

In SQL Server, the number of CPUs available to the instance can be configured as follows.
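A minimal T-SQL sketch of that setting, assuming the classic affinity mask option of SQL Server 2008 R2 (the value 15 is only an example, meaning the first 4 CPUs):

-- Sketch: restrict SQL Server to the first 4 CPUs via the affinity mask option.
-- 0 (the default) lets SQL Server schedule on all CPUs automatically.
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'affinity mask', 15;   -- binary 1111 = CPUs 0 through 3
RECONFIGURE;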

By default, SQL Server chooses the number of CPUs automatically. Of course, there are scenarios, for example a highly concurrent production environment, where you want to prevent SQL Server from monopolizing all of the CPUs, which is why this configuration interface is provided.

There is also a system parameter, the well-known MAXDOP setting, and changing this system parameter controls the degree of parallelism of each operator (remember: each operator, not the plan as a whole). Let's look at the parameter.
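As a minimal sketch (the value 4 and the table name are only placeholders), the setting can be changed instance-wide with sp_configure, or overridden for a single statement with a query hint:

-- Instance-wide: max degree of parallelism (0 = let SQL Server decide)
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'max degree of parallelism', 4;
RECONFIGURE;

-- Per statement: the hint overrides the instance-wide setting for this query only
SELECT COUNT(*) FROM SomeLargeTable OPTION (MAXDOP 4);   -- SomeLargeTable is a placeholder name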

This setting specifies the maximum degree of parallelism for each operator. That is why, when we look at the DMV sys.dm_os_tasks, which shows the system's tasks, we may well see more threads than the degree of parallelism; the thread count can exceed the degree of parallelism because two operators have repartitioned the data and assigned it to different sets of threads.
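For example, while a parallel query runs in another session, a query like the following (the session_id 55 is just a placeholder) lists that request's tasks; exec_context_id 0 is the coordinating thread, and the remaining rows are the extra worker threads, whose count can exceed MAXDOP when operators repartition the data:

-- One row per task (thread) that the request is currently using
SELECT session_id, request_id, exec_context_id, scheduler_id, task_state
FROM sys.dm_os_tasks
WHERE session_id = 55        -- placeholder: the spid of the session running the parallel query
ORDER BY exec_context_id;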

Unless there is a special reason, it is best to keep the default setting.

Let's go through a set of examples to understand parallel operations.

Parallel operation improves performance in several ways:

    • It does not depend on a fixed number of threads; threads are added or removed automatically at run time to reach an optimal level of performance while keeping the system's overall throughput normal.
    • It adapts to data skew and balances the load. If one thread runs slower than the others, the number of rows it scans or tasks it runs is automatically reduced, while the faster threads take on correspondingly more work, so total execution time decreases smoothly instead of one thread dragging down overall performance.

Let's go through an example in detail.

Parallel plans are generally applied to tables with large data volumes; for small tables serial execution is the most efficient. So here we create a new, large test table and insert some test data: 250,000 rows, giving a table of more than 6,500 pages. The script is as follows.

--Create a new table with a primary key, which forms a clustered index
CREATE TABLE BigTable (
    [KEY] INT,
    DATA  INT,
    PAD   CHAR(200),   -- the pad width is garbled in the source; 200 is assumed here, which matches the stated ~6,500 pages for 250,000 rows
    CONSTRAINT [PK1] PRIMARY KEY ([KEY])
)
GO

--Bulk insert 250,000 rows of test data
SET NOCOUNT ON
DECLARE @i INT
BEGIN TRAN
SET @i = 0
WHILE @i < 250000
BEGIN
    INSERT BigTable VALUES (@i, @i, NULL)
    SET @i = @i + 1
    IF @i % 1000 = 0
    BEGIN
        COMMIT TRAN
        BEGIN TRAN
    END
END
COMMIT TRAN
GO

Let's execute a simple query script.

SELECT [KEY], [DATA] FROM BigTable

For this query, which has no filter condition, there is no need for a parallel scan: fetching all the data with a serial scan is faster than a parallel one, so a plain serial clustered index scan is used. Let's add a filter and look again.

SELECT [KEY], [DATA] FROM BigTable WHERE DATA < 1000

For this T-SQL statement with a filter, a parallel plan is used, and the clustered index scan is also parallel. Because my computer has 4 logical CPUs (actually 2 physical cores with 4 threads), 4 threads scan the table in parallel, each thread scanning a portion of the data, and the results are then gathered.

There are 4 worker threads in total, plus thread 0, the coordinator thread, which schedules the other threads and does not perform any scanning itself; threads 1 through 4 perform the scan that returns these 1,000 rows. Of course, with so little data, some threads are assigned no rows at all, but there are still 4 scans in total, so the 4 threads scan the table in parallel.
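If you want to see these per-thread row counts outside of the SSMS tooltips, one option (a sketch, not part of the original walkthrough) is to capture the actual plan as XML, which records a per-thread row count for the scan:

-- Capture the actual execution plan as XML; the Clustered Index Scan node
-- contains one <RunTimeCountersPerThread> element per thread, including thread 0.
SET STATISTICS XML ON;
SELECT [KEY], [DATA] FROM BigTable WHERE DATA < 1000;
SET STATISTICS XML OFF;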

The result above is perhaps too simple, and some threads were not given a full share of work, so let's find a slightly more complex statement.

SELECT MIN([DATA]) FROM BigTable

This execution plan is quite simple. Analyzing it from right to left, the execution goes:

Parallel clustered index scan with 4 threads --> each thread runs a Stream Aggregate that produces its own minimum (4 partial minimums) --> the parallelism (Gather Streams) operator collects those 4 values --> a final Stream Aggregate takes the minimum of the 4 values --> the result is output.

So each of the 4 threads has its own Stream Aggregate, which produces the minimum value for that thread.

Next, the "degree of parallelism" (Gather Streams) operator that follows gathers the four per-thread minimum values into a single stream.

After that comes another Stream Aggregate, which takes the minimum of these 4 rows and outputs it; we described stream aggregation in the previous article.

The above is a standard multi-threaded parallel operation process.
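Logically, the plan is a local/global aggregation. A conceptual T-SQL illustration of the same idea (this is not what the engine executes; the modulo split merely stands in for however the scan distributes rows across the 4 threads):

-- Conceptual only: each "thread" computes a partial MIN, then a global MIN is taken over the partials.
WITH PerThread AS (
    SELECT [KEY] % 4   AS thread_no,     -- stand-in for the per-thread row distribution
           MIN([DATA]) AS partial_min    -- the per-thread Stream Aggregate
    FROM BigTable
    GROUP BY [KEY] % 4
)
SELECT MIN(partial_min) AS global_min    -- the final Stream Aggregate after Gather Streams
FROM PerThread;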

In the process above, because a parallel clustered index scan reads the data, the 4 threads share the work essentially evenly, that is, each thread scans roughly the same number of rows. Next, let's put one thread into a busy state and see whether SQL Server dynamically shifts its share of the work onto the other, less busy threads.

To do that, we first build a larger-scale copy of the data; the script is as follows.

SELECT [KEY], [DATA], [PAD] INTO Bigtable2 FROM BigTable
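A quick way to confirm the size of the new copy (not part of the original script) is sp_spaceused:

-- Reports row count and reserved/data/index space for the new table
EXEC sp_spaceused 'Bigtable2';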

Next we write a query that does a large amount of work so that it keeps one thread occupied, and we force it to run on a single thread.
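The exact statement is not reproduced here; a sketch of the kind of query intended, a deliberately expensive self-join over Bigtable2 pinned to one thread with a MAXDOP 1 hint (the predicate is arbitrary and may need adjusting to run long enough on your hardware):

-- Deliberately heavy, serial-only work so that it keeps one scheduler busy for minutes
SELECT COUNT(*)
FROM Bigtable2 T1
JOIN Bigtable2 T2 ON T1.[DATA] % 100 = T2.[DATA] % 100
OPTION (MAXDOP 1);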

For the query above to produce results, on a machine configured like mine it will probably take no less than five minutes, and since we also forced it to run serially, you can imagine the speed.
While it is running, we execute the earlier statement that gets the minimum value and look at its execution plan.

SELECT MIN([DATA]) FROM BigTable

In this execution plan, look at the rows scanned by each thread of the clustered index scan.

As you can see, thread 1 is scanning nearly four times fewer rows, and the number of rows scanned increases steadily from thread 1 to thread 4.

The busy statement explicitly specified MAXDOP 1, which in theory should occupy only one thread, so why are all of these threads affected? The reason is simple: my machine has only two physical cores, and the "4 CPUs" are just hyper-threaded logical processors, not 4 real cores, so the threads affect one another.

Now let's look at an example of a parallel join, and see how a parallel nested loops join uses these resources.

SELECT B1.[KEY], B1.DATA, B2.DATA
FROM BigTable B1
JOIN Bigtable2 B2 ON B1.[KEY] = B2.[KEY]
WHERE B1.DATA < 100

In the statement above, BigTable has a clustered index on the [KEY] column, but the filter column DATA has no index, so this will definitely be a clustered index scan.

Look at the execution plan.

Let's analyze the process step by step, following the order of execution; combining the text with the graphical plan makes the analysis more accurate.

The first step is to read the whole table through a clustered index scan; because it is a parallel clustered index scan here, let's look at the number of parallel threads and the rows each one scans.

Four threads scan, and thread 3 happens to return all 100 qualifying rows.

The 100 rows are then redistributed (Repartition Streams), so that each thread ends up with an average of 25 rows.

At this point the rows we need are spread across the 4 threads, 25 rows each, and the next step hands them to the nested loops join: because the statement also needs rows from Bigtable2, a nested loops join is chosen, and Bigtable2 is read for each outer row in turn.

For the nested loops join operator, you can refer to my second article.

We know that the outer side drives the number of loop iterations, which means 4 threads execute the nested loops in parallel; with 25 outer rows per thread, the inner table is executed

4*25=100 times.
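One way to confirm that count (a sketch, not from the original walkthrough) is the text showplan with run-time statistics; the Executes column of the operator that reads Bigtable2 should show roughly the 100 inner-side executions worked out above:

-- SET STATISTICS PROFILE adds Rows and Executes columns to the plan output;
-- the Executes value on the inner-side access of Bigtable2 reflects the 4 * 25 = 100 runs.
SET STATISTICS PROFILE ON;
SELECT B1.[KEY], B1.[DATA], B2.[DATA]
FROM BigTable B1
JOIN Bigtable2 B2 ON B1.[KEY] = B2.[KEY]
WHERE B1.[DATA] < 100;
SET STATISTICS PROFILE OFF;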

Then, after the nested loops produce their results, a parallelism (Gather Streams) operator collects the results from each thread and the rows are output.

The above is the execution flow of a parallel nested loops join, which makes full use of the machine's four logical cores.

Reference documents

    • Microsoft Books Online: logical and physical operator reference
    • The Inside Microsoft SQL Server 2005 (SQL Server 2005 Technical Insider) book series

Conclusion

That's it for this article. It is short and easy to understand. It mainly introduced the parallel operators in query plans; follow-up articles will cover more of SQL Server's parallel operations, and after that we will analyze how the operators involved in our everyday inserts, updates, and deletes can be optimized. If you are interested, stay tuned. SQL Server performance tuning is a very broad subject, and we will keep analyzing it in the articles that follow.

If you have any questions, you can leave a comment or send me a private message. I look forward to studying SQL Server in depth with those of you who are interested; let's learn and improve together.

At the end of this article are links to the previous articles in the series.

SQL Server Tuning Series Basics

SQL Server Tuning Series Basics (Summary of common operators)

SQL Server Tuning Series Basics (Union operator summary)

SQL Server Tuning Series Basics (Parallel Operations Summary)
