SQL Server Parallel Operations: A Learning Summary Tutorial


Technical preparation

As in the previous articles, the examples are based on SQL Server 2008 R2 and use Microsoft's simpler sample database (Northwind) for analysis.

I. Parallel operators

Not every T-SQL statement we write ends up with the same optimal execution plan. Forming the optimal plan requires evaluation from many angles, most of them based on the statistics that SQL Server itself maintains; the candidate execution plans are then compared and the best execution method is selected.

When SQL Server evaluates those statistics against the data, it also has to consider the hardware resources currently available. Sometimes the plan it considers optimal cannot be supported by the current hardware, for example because of memory limits, CPU limits, or I/O bottlenecks, so the performance of an execution plan also depends on the underlying hardware.

When SQL Server finds that the data set to be processed is large and will consume a lot of resources, and the machine has multiple CPUs, it tries a parallel approach: it splits the data set across several threads that are processed simultaneously to improve overall efficiency.

In SQL Server, you can set the number of CPUs available to the server as follows:
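The original screenshot (presumably the Processors page of Server Properties in SSMS) is not reproduced here. As a rough T-SQL equivalent, a minimal sketch using the sp_configure 'affinity mask' option; the bitmask value is only an illustration:

-- View the current processor affinity (0 = let SQL Server use all CPUs)
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'affinity mask';

-- Example only: bind SQL Server to CPUs 0 and 1 (bitmask 3); adjust for your machine
-- EXEC sp_configure 'affinity mask', 3;
-- RECONFIGURE;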

 

 

By default, SQL Server selects the number of CPUs automatically. The option exists for situations such as a high-concurrency production environment, where you may want to prevent SQL Server from monopolizing all of the CPUs.

Another relevant setting is the well-known MAXDOP parameter ('max degree of parallelism'), which can also be changed at the system level. It controls the maximum number of parallel threads used by each operator (note: per operator, not for the statement as a whole). Let's look at this parameter.
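The screenshot is omitted; the same server-wide setting can be read or changed with sp_configure, as in this minimal sketch (0 means SQL Server decides automatically; the value 4 is only an example):

-- View and (optionally) change the server-wide max degree of parallelism
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'max degree of parallelism';

-- Example only: cap every operator at 4 parallel threads
-- EXEC sp_configure 'max degree of parallelism', 4;
-- RECONFIGURE;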

 

 

Because the degree of parallelism limits each individual operator, the total number of threads used by a statement can exceed it. When we check system tasks with the DMV sys.dm_os_tasks, we may therefore see more threads than the degree of parallelism, because exchange operators repartition the data and hand it to different sets of threads.
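For reference, a minimal sketch of such a check against sys.dm_os_tasks; it simply counts the tasks currently associated with each session:

-- Count tasks (roughly, worker threads with work assigned) per session
SELECT session_id, COUNT(*) AS task_count
FROM sys.dm_os_tasks
WHERE session_id IS NOT NULL
GROUP BY session_id
ORDER BY task_count DESC;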

Unless otherwise specified, we recommend that you use the default settings.

Let's take a grouping example to understand parallel operations.

 

 

Parallel execution improves performance in the following ways:

It does not depend on a fixed number of threads; threads are added or removed automatically at runtime to reach an optimal level of performance while keeping the system's normal throughput.

It adapts to skew and balances the load. For example, if one thread runs more slowly than the others, the amount of work that thread has to scan or execute automatically decreases while the faster threads take on more, so the total execution time shrinks smoothly instead of one thread dragging down the overall performance.

The following is an example.

Parallel plans are generally used on tables with large data volumes; small tables are handled most efficiently in serial. So here we create a large test table and insert some test data: 250,000 rows, giving a table of more than 6,500 pages. The script is as follows:

 

-- Create a table with a primary key, which also creates the clustered index
CREATE TABLE BigTable
(
    [KEY] INT,
    DATA INT,
    PAD CHAR(200),
    CONSTRAINT [PK1] PRIMARY KEY ([KEY])
)
GO

-- Insert 250,000 rows of test data, committing in batches of 1,000
SET NOCOUNT ON
DECLARE @i INT
BEGIN TRAN
SET @i = 0
WHILE @i < 250000
BEGIN
    INSERT BigTable VALUES (@i, @i, NULL)
    SET @i = @i + 1
    IF @i % 1000 = 0
    BEGIN
        COMMIT TRAN
        BEGIN TRAN
    END
END
COMMIT TRAN
GO
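To verify the size mentioned above (more than 6,500 pages), a hedged sketch that reads the page count from sys.dm_db_index_physical_stats, assuming the table was created in the current database:

-- Page count of BigTable's clustered index (LIMITED mode is enough for page_count)
SELECT index_id, page_count
FROM sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID('BigTable'), NULL, NULL, 'LIMITED');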

 

Let's execute a simple query script.

SELECT [KEY], [DATA]
FROM BigTable

 

 

A query like this, with no filter condition, does not need a parallel scan: a serial scan returns the data faster than coordinating a parallel one, so a serial clustered index scan is used here. Let's add a filter condition.

SELECT [KEY], [DATA]
FROM BigTable
WHERE DATA < 1000

 

 

For this T-SQL statement with a filter condition, SQL Server decisively chooses parallelism, and the clustered index is scanned in parallel. Because my machine has four logical CPUs (two physical cores with Hyper-Threading, four threads), four threads scan the table in parallel, each thread scanning part of the data, and the results are then combined.

 

 

Four worker threads are used here, plus thread 0, which is the coordinator thread responsible for scheduling the others and therefore does no scanning; threads 1 through 4 perform the scan that returns these 1,000 rows. With such a small amount of data, some threads are assigned 0 rows, but the number of executions is still four: the four threads scan the table in parallel.
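The per-thread row counts discussed here come from the actual execution plan (the screenshots are omitted). A sketch of one way to capture them yourself: with SET STATISTICS XML ON, the returned showplan XML should list runtime counters per thread under each operator's runtime information.

SET STATISTICS XML ON;
GO
SELECT [KEY], [DATA]
FROM BigTable
WHERE DATA < 1000;
GO
SET STATISTICS XML OFF;
GO
-- In the returned plan XML, each operator's RunTimeInformation element contains
-- one entry per thread with that thread's actual row count.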

The result above is fairly simple, and some threads were not given any work at all. Let's look for a slightly more complex statement.

SELECT MIN([DATA])
FROM BigTable

 

 

This execution plan is quite simple. Reading it from right to left, it executes as follows:

4 parallel clustered index scans --> each of the 4 threads computes the minimum for its own data --> the 4 partial minimums are gathered together --> a stream aggregate takes the minimum of the 4 values --> the result is output.

 

 

So there are four threads, and each thread computes the minimum for its own portion of the data with a stream aggregate.

 

 

Then the four minimum values are brought together into a single stream by the next "parallelism" operator.

 

 

The next step is a stream aggregate, which takes the minimum of those four rows and outputs it. Stream aggregation was introduced in a previous article.

 

 

The above is a standard multi-thread parallel operation process.

In the process above, the parallel clustered index scan spread the work almost evenly: each thread scanned roughly the same amount of data. Next we will keep one thread busy and see whether SQL Server dynamically shifts the work to the other, less busy threads.

Let's add another large table. The script is as follows:

SELECT [KEY], [DATA], [PAD]
INTO BigTable2
FROM BigTable

Now let's write a long-running query to keep one thread busy, and we force it to run on a single thread:

SELECT MIN(B1.[KEY] + B2.[KEY])
FROM BigTable B1 CROSS JOIN BigTable2 B2
OPTION (MAXDOP 1)

If you let this query run to completion, expect it to take around five minutes on an ordinary machine, and we have also forced it to run serially, so you can imagine the speed.

While it runs, we execute the earlier MIN statement again and view its execution plan.

SELECT MIN([DATA])
FROM BigTable

In the execution plan, we can look at the per-thread row counts of the clustered index scan.

 

 

As you can see, thread 1 now scans noticeably fewer rows, and the amount of data scanned increases from thread 1 to thread 4.

The busy statement explicitly specified MAXDOP 1, so in theory it should affect only one thread. Why are all of these threads affected? The reason is simple: my machine has only two physical cores, and the four logical CPUs come from Hyper-Threading, so they are not four truly independent cores and the threads affect one another.
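A quick way to check what the machine actually reports, a sketch against sys.dm_os_sys_info:

SELECT cpu_count,         -- number of logical CPUs
       hyperthread_ratio, -- logical CPUs per physical package
       scheduler_count    -- schedulers SQL Server is using
FROM sys.dm_os_sys_info;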

Let's look at an example of a parallel join and see how a parallel nested loops join uses resources.

SELECT B1.[KEY], B1.DATA, B2.DATA
FROM BigTable B1 JOIN BigTable2 B2
ON B1.[KEY] = B2.[KEY]
WHERE B1.DATA < 100

In this statement, the KEY column of BigTable has a clustered index, but there is no index on the DATA column used in the filter, so the clustered index must be scanned to find the matching DATA values.

View execution plan

 

 

Let's analyze the process step by step; the text-based execution plan is the most precise way to follow it, reading from right to left.
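One way to obtain a text-based plan for this statement, a sketch using SET SHOWPLAN_TEXT (it only compiles the statement and prints the plan; it does not execute it):

SET SHOWPLAN_TEXT ON;
GO
SELECT B1.[KEY], B1.DATA, B2.DATA
FROM BigTable B1 JOIN BigTable2 B2
    ON B1.[KEY] = B2.[KEY]
WHERE B1.DATA < 100;
GO
SET SHOWPLAN_TEXT OFF;
GO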

 

 

The first step obtains the data with a full clustered index scan. Because the scan runs in parallel, we can see the per-thread row counts.

 

 

Four threads perform the scan; here thread 3 happens to return the 100 matching rows.

Then the 100 rows are repartitioned across the threads, and each thread is given an even share of 25 rows.

 

 

At this point the rows we want have been divided among four threads for joint execution, 25 rows per thread. The next step hands them to a nested loops join: because the statement also needs rows from BigTable2, a nested loops join is chosen, which looks into BigTable2 for each outer row to fetch its data.

For nested loop join operators, refer to my second article.

 

 

This is the number of executions of the outer loop: four threads run the nested loops in parallel, and each thread has 25 outer rows, so the inner table is executed 4 * 25 = 100 times.

 

 

Once the nested loops have produced their results, the next step gathers the results of each thread through a parallelism operator and then outputs them.

 

 

The above is the execution flow of a parallel nested loops join, and it makes full use of the quad-core hardware.

References

Logical and physical operators, in Microsoft SQL Server Books Online

Reference book: the Inside Microsoft SQL Server 2005 series

Conclusion

That's it for this article; it is a little short and easy to follow. There is more to say about parallel operations, and it will be covered in subsequent articles. This article mainly introduced the parallelism operators in the query plan; in the next article we will add more about parallel operations in SQL Server and then analyze how to optimize the insert, update, and delete statements we write every day. If you are interested, stay tuned. SQL Server performance tuning covers a wide range of topics, which will be analyzed in later articles.

If you have any questions, leave a comment or send a private message. You are welcome to join us in digging deeper into SQL Server so we can learn and improve together.

 

Tutorial 2

Technical preparation

As in the previous articles, the examples are based on SQL Server 2008 R2 and use Microsoft's simpler sample database (Northwind) for analysis.

Content

Before this article starts, let's review the parallel operation introduced in the previous one, starting with the parallel statement shown at its end:

SELECT B1.[KEY], B1.DATA, B2.DATA
FROM BigTable B1 JOIN BigTable2 B2
ON B1.[KEY] = B2.[KEY]
WHERE B1.DATA < 100

 

 

Above is the detailed execution plan, which executes from right to left. One area of the plan is interesting: after the clustered index scan, the rows go through a Repartition Streams operator, which re-allocates the 100 rows obtained to the various parallel threads.

In fact, the 100 rows produced by the index scan could be fed straight into the nested loops join. They are repartitioned so that the nested loops can be executed in parallel and the hardware resources are used to the fullest.

But this introduces a drawback: after the nested loops finish, the results have to be brought back together, which is what the following Gather Streams operator does.

Let's take a look at the execution plan of this statement without parallel execution.

SELECT B1.[KEY], B1.DATA, B2.DATA
FROM BigTable B1 JOIN BigTable2 B2
ON B1.[KEY] = B2.[KEY]
WHERE B1.DATA < 100
OPTION (MAXDOP 1)

 

 

This is a genuine serial execution plan.

Compare it with the parallel plan above and you can see how fully SQL Server exploits the hardware when it builds a parallel plan. Pretty neat, isn't it?

If you still don't feel the appeal of SQL Server's parallel execution plans, let's take another example and look at the following statement:

 

 

 

SELECT BIG_TOP.[KEY], BIG_TOP.DATA, B2.DATA
FROM
(
    SELECT TOP 100 B.[KEY], B.DATA
    FROM BigTable B
    ORDER BY DATA
) BIG_TOP,
BigTable2 B2
WHERE BIG_TOP.[KEY] = B2.[KEY]

 

First, let's analyze the statement. The derived table adds TOP 100 ... ORDER BY DATA, and this combination is very interesting.

We know the statement means: sort by the DATA column, then take the first 100 rows.

1. Sort by DATA? If each thread only sorts its own portion, don't the results have to be sorted again once they are put together?

2. How can the first 100 rows be obtained with multiple threads? If four threads each scan and return 25 rows, is the result still correct?

3. Our goal is to run a parallel nested loops join between the 100 rows above and the other table, so that resources are fully used. How can that be achieved?

These questions show how capable SQL Server is: it generates a powerful execution plan for us.

 

 

The execution plan above solves the three problems just described. Let's analyze the solutions in order.

The first question is about parallel sorting.

The data is first obtained from the table by a parallel clustered index scan.

 

 

Then each thread sorts its own data in parallel and keeps only its leading rows. Since the overall target is just the first 100 rows, each thread ends up retaining more rows than will finally be needed; in other words, each thread sorts only its own data.

The leading rows from each thread are then exchanged in round-robin fashion, each stream carrying part of the data.

 

 

The second question is about parallel acquisition of the first 100 rows of data.

We know that to obtain the first 100 rows overall, the data from each thread has to be brought together and compared. This step is required, so here SQL Server gathers the data back into a single stream.

 

 

The third problem: the next step is to join these 100 rows to the other table to produce the result, and a nested loops join is used. To make full use of resources and improve performance, SQL Server distributes the 100 rows evenly across the threads again, using a Distribute Streams operator to split the work.

 

 

After this step, the system's hardware resources are fully utilized again. The nested loops join then produces the results, which are gathered once more and output.

 

 

So the overall flow SQL Server goes through is: split (parallel scan) --> gather (to get the TOP 100) --> split (for the parallel nested loops) --> gather (to merge the results).

In short, when it runs a statement SQL Server uses all kinds of splits and gathers, chosen after evaluating the alternatives, to make full use of the hardware and achieve the best performance. This is the essence of SQL Server parallel operations.

Of course, everything has advantages and disadvantages. Let's use this statement to compare serial and parallel execution in SQL Server.

The following is a serial execution plan:

 

SELECT BIG_TOP.[KEY], BIG_TOP.DATA, B2.DATA
FROM
(
    SELECT TOP 100 B.[KEY], B.DATA
    FROM BigTable B
    ORDER BY DATA
) BIG_TOP,
BigTable2 B2
WHERE BIG_TOP.[KEY] = B2.[KEY]
OPTION (MAXDOP 1)

 

 

 

The serial execution plan is simple and clean: no complicated splits, no gathering, no parallelism.

Let's compare the two, starting with the compile-related values of the T-SQL statement:
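The values below are read off the plan properties in the screenshots (not reproduced here). If you want to pull them yourself, a hedged sketch that reads the CompileCPU, CompileMemory, and CompileTime attributes from the cached plan XML; the LIKE filter is only an illustrative way to find this particular statement, and the unit suffixes in the aliases are the commonly documented ones:

WITH XMLNAMESPACES (DEFAULT 'http://schemas.microsoft.com/sqlserver/2004/07/showplan')
SELECT
    qp.query_plan.value('(//QueryPlan/@CompileCPU)[1]',    'int') AS compile_cpu_ms,
    qp.query_plan.value('(//QueryPlan/@CompileMemory)[1]', 'int') AS compile_memory_kb,
    qp.query_plan.value('(//QueryPlan/@CompileTime)[1]',   'int') AS compile_time_ms,
    st.text
FROM sys.dm_exec_cached_plans AS cp
CROSS APPLY sys.dm_exec_query_plan(cp.plan_handle) AS qp
CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) AS st
WHERE st.text LIKE '%BIG_TOP%';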

 

 

The former is the serial plan, the latter the parallel plan.

Compile CPU: serial 2, parallel 10.

Compile memory: serial 184, parallel 208.

Compile time: serial 2, parallel 81.

So the disadvantages of a parallel plan are: 1. higher CPU cost to compile; 2. more memory consumed during compilation; 3. longer compilation time.

Let's take a look at the advantages of parallelism:

In the figure above, the serial plan's memory grant is 1024 while the parallel plan's is 448.

Advantage: parallel execution consumes less memory
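If you want to observe the memory grants directly while the two variants run, a minimal sketch against sys.dm_exec_query_memory_grants, run from a second session while the query is executing:

SELECT session_id,
       dop,                 -- degree of parallelism of the running request
       requested_memory_kb,
       granted_memory_kb
FROM sys.dm_exec_query_memory_grants;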

Of course, there is also a more important advantage: faster execution!

 

 

With parallel execution, the elapsed time drops from 218 ms to 187 ms. The data volume is small and my machine is not powerful, so the improvement is not dramatic.
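The timings above come from the screenshots (omitted). A simple way to reproduce the comparison yourself is SET STATISTICS TIME; the two statements below are the parallel and the MAXDOP 1 variants used in this article:

SET STATISTICS TIME ON;

-- Parallel version
SELECT BIG_TOP.[KEY], BIG_TOP.DATA, B2.DATA
FROM
(
    SELECT TOP 100 B.[KEY], B.DATA
    FROM BigTable B
    ORDER BY DATA
) BIG_TOP,
BigTable2 B2
WHERE BIG_TOP.[KEY] = B2.[KEY];

-- Forced serial version
SELECT BIG_TOP.[KEY], BIG_TOP.DATA, B2.DATA
FROM
(
    SELECT TOP 100 B.[KEY], B.DATA
    FROM BigTable B
    ORDER BY DATA
) BIG_TOP,
BigTable2 B2
WHERE BIG_TOP.[KEY] = B2.[KEY]
OPTION (MAXDOP 1);

SET STATISTICS TIME OFF;
-- Compare the "elapsed time" values printed on the Messages tab for the two runs.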

During parallel execution there is another operator that you will frequently encounter: the bitmap operator, which we also introduce here.

For example:

SELECT B1.[KEY], B1.DATA, B2.[KEY]
FROM BigTable B1 JOIN BigTable2 B2
ON B1.DATA = B2.DATA
WHERE B1.[KEY] < 10000

Here we join the two large tables on DATA and keep only the rows whose KEY is less than 10,000.

 

 

The execution plan above introduces a bitmap operator.

The purpose of the bitmap is simple: filter in advance. Only rows matching the KEY filter from BigTable can contribute to the result, while the other table is scanned in parallel by several threads. Because the data volume is large, the threads take different amounts of time, and a slow thread could hold up the whole plan; the bitmap lets each scanning thread discard rows that cannot possibly match before they are passed up to the hash match, so less data has to flow through the rest of the plan.

For more information about bitmap operators, see: http://msdn.microsoft.com/zh-cn/library/bb510541

Conclusion

This article is a continuation of the previous one on parallel operations; the two together cover the principles and usage of parallel operations in SQL Server. Next we will move on to index usage and dynamic indexes in SQL Server. Many people who work with databases are familiar with indexes, but it may not be clear how particular SQL Server statements actually use them; we will analyze that in the next article and look at indexing methods and optimization techniques. If you are interested, stay tuned. SQL Server performance tuning covers a wide range of topics, which will be analyzed in later articles.

If you have any questions, leave a comment or send a private message. You are welcome to join us in digging deeper into SQL Server so we can learn and improve together.
