In-depth analysis of SQL Server parallel execution principles and practices (i)

Source: Internet
Author: User
Tags: joins

In mature, leading enterprise database systems, parallel query execution is a powerful weapon: in the right scenarios it can significantly improve query response time and the user experience. SQL Server and Oracle have supported it for years; MySQL has not yet implemented it, while PostgreSQL implemented parallel scans in 2015, a sign that it too is moving toward being a more robust enterprise-class database. Parallel execution is implemented in broadly similar ways across RDBMSs. This article analyzes in detail how SQL Server executes queries in parallel, along with some practical notes.

Background knowledge

Hardware environment. Before diving into the principles of parallelism, we need some background, starting with the hardware environment. With the information age, the spread of the Internet, and the development of hardware technology, our hardware, while no longer advancing at the magical pace of Moore's law, is already quite abundant. On the storage side, 15K-RPM disks, SSDs, and PCI-E SSDs have gone some way toward relieving the disk as the "ultimate" bottleneck of a database system, and hundreds of gigabytes of memory are no longer a feature reserved for minicomputers; they are common on PC servers. On the processing side, physical limits make it hard to keep raising single-CPU performance, so system architectures have moved toward multi-core designs; 8-socket CPU servers, for example, see real use. In short, hardware resources are increasingly plentiful. The parallel queries discussed in this article are mainly about multiple CPUs cooperating to improve the response time of CPU-bound SQL queries. (Parallel scans can also improve response time, but that is not the focus here.)

SQL Server concepts: schedulers, tasks, and workers. A scheduler is SQLOS's representation of a logical CPU in the hardware environment. It is the basic unit for managing SQL Server's compute work, and it is what allocates "CPU" resources to threads (workers). A task is a unit of work in SQL Server; essentially it is a (C++) function pointer to the code that needs to run. For example, the lazy writer ultimately invokes the corresponding function (sqlmin!LazyWriter()). If a task is the work to be done, a worker is what carries it out: it binds to a Windows thread to perform the computation. So a SQL request can be understood as request = task + worker. To deepen the picture, see Figure 1-1.

Figure 1-1
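These three objects can be observed directly through the DMVs sys.dm_os_schedulers, sys.dm_os_tasks, and sys.dm_os_workers. A minimal sketch (the column selection is illustrative):

```sql
-- Map currently running tasks to their workers and schedulers
SELECT t.session_id,
       t.task_address,
       w.worker_address,
       s.scheduler_id,
       s.cpu_id
FROM sys.dm_os_tasks AS t
JOIN sys.dm_os_workers AS w
    ON t.worker_address = w.worker_address
JOIN sys.dm_os_schedulers AS s
    ON t.scheduler_id = s.scheduler_id
WHERE t.session_id IS NOT NULL;
```

Each row ties one unit of work (a task) to the worker executing it and the scheduler (logical CPU) it runs on, which is exactly the request = task + worker picture above.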

Coming back to parallel queries: a parallel query is simply multiple tasks running simultaneously on multiple schedulers to improve response time.

About SQLOS: Microsoft introduced it in SQL Server 2005, and it can fairly be called a major breakthrough. As Michael Stonebraker (the 2014 Turing Award winner) described in his early paper "Operating System Support for Database Management", a database knows more about its own workload than the OS does. Interested readers can search for the details themselves; space does not permit going deep into SQLOS here, so a simple architecture diagram will have to do. See Figure 1-2.

Figure 1-2

That covers the background; let's move on to parallel query execution.

Parallel-related concepts

Serial execution plans. Before we can understand parallel execution, we need some understanding of serial execution plans. This is actually quite simple; execution plans themselves will not be introduced here, only the elements relevant to this article.

A serial execution plan is carried out by a single thread with a single execution context.

The execution context holds the information the plan needs at run time, such as object IDs and any temporary tables the plan uses. In effect, a single CPU (core) does the work during execution; see Figure 2-1.

Figure 2-1

A parallel execution plan, as the name implies, has multiple CPUs (cores) working simultaneously to improve the response time of CPU-bound work. It has multiple threads and multiple execution contexts, and it also consumes more resources. For disk I/O, however, SQL Server considers parallelism to be of no help. Let's look at a simple example:

Note: the AdventureWorks database used in this article can be found and downloaded on the Internet.

```sql
-- Serial execution
SELECT COUNT(*)
FROM dbo.bigTransactionHistory
OPTION (MAXDOP 1);

-- Parallel execution
SELECT COUNT(*)
FROM dbo.bigTransactionHistory
OPTION (MAXDOP 2);
```

Comparing the execution plans of these two simple statements, you can see that in the estimated subtree cost (the estimated resource consumption), the CPU estimate with two CPUs is only half of the single-CPU estimate, while the I/O estimate is unchanged. See Figure 2-1-a.

Figure 2-1-a

Now let's really get into the parallel execution plan. Every parallel execution plan has a primary exchange operation (the root exchange): the leftmost Gather Streams operator in the graphical execution plan (exchanges are described later). And every parallel plan has a fixed serial part, namely everything to the left of that leftmost Gather Streams operator. In terms of the producer-consumer model described later, this part contains all the consumer-side work of the gather, and it is driven by the main thread, thread zero; its execution context is likewise context zero. See Figure 2-2.

Figure 2-2

Parallel regions, as the name implies, are everything to the right of the root exchange. A parallel region may contain multiple branches, each of which can execute at the same time (each branch has its own tasks), and a branch itself may be either parallel or serial. Note, however, that branches do not use the main thread, thread zero. You can inspect the branch-related execution plan properties with the statement below (SQL Server 2012 and later can display them); see Figure 2-3.

In Figure 2-3-2, my maximum degree of parallelism is set to 4 and there are three branches, so the number of threads used is 4 × 3 = 12; adding the main thread, thread zero, this parallel query uses 13 threads in total.

```sql
SELECT a.ProductID, COUNT_BIG(*) AS [rows]
FROM dbo.bigProduct AS a
INNER JOIN dbo.bigTransactionHistory AS b
    ON a.ProductID = b.ProductID
WHERE a.ProductID BETWEEN 1001 AND 20001  -- range values illustrative; the originals were garbled
GROUP BY a.ProductID
ORDER BY a.ProductID;
```

Figure 2-3-1

Figure 2-3-2
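As a cross-check on the thread count, the tasks serving a running request can be counted from the DMVs. A sketch (run it from a second session while the query above is executing):

```sql
-- Count the tasks (and thus worker threads) serving each active request
SELECT r.session_id,
       COUNT(t.task_address) AS task_count
FROM sys.dm_exec_requests AS r
JOIN sys.dm_os_tasks AS t
    ON r.session_id = t.session_id
GROUP BY r.session_id;
```

For the example above you would expect to see the parallel session using multiple tasks rather than one.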

For how the parallel and serial parts connect, see Figure 2-4. In a serial plan, the main thread is itself the executing thread.

Figure 2-4

In Figure 2-3, the parallel table scan feeding the aggregation is effectively equivalent to several serial table scans plus aggregation. Since we are on the subject of parallel table scans, a brief digression: in SQL Server 2005 and earlier versions, parallel scans use a "parallel page supplier", where the unit handed to each thread is a data page, and each thread scans however many rows that page contains. In SQL Server 2008 and later, a "range enumerator" is used instead: the data each thread scans is determined by the range distribution of the data. This is also why the ACCESS_METHODS_DATASET_PARENT latch wait appears in 2008 and later. You may wonder why the newer version introduces a new problem; in fact the "parallel page supplier" had plenty of problems of its own, such as data-page scan validation under the snapshot isolation level, and the newer approach is more efficient overall. One important note: parallel scans support only forward scans, not backward scans. Interested readers can verify this with the code below.

```sql
/************* Parallel scan tests: only forward scans can go parallel *************/
USE AdventureWorks2008R2;
GO
SELECT Color, COUNT(1)
FROM bigProduct
GROUP BY Color
OPTION (QUERYTRACEON 8649);      -- parallel scan: forward only
GO
CREATE NONCLUSTERED INDEX Inx_Color
ON dbo.bigProduct (Color DESC)   -- ASC can scan in parallel; DESC cannot
WITH (DROP_EXISTING = ON)
ON [PRIMARY];                    -- change the index sort order (ASC/DESC)
GO
SELECT Color, COUNT(1)
FROM bigProduct
GROUP BY Color
OPTION (QUERYTRACEON 8649);      -- parallel scan: forward only
```

Having seen serial plans, parallel plans, and multiple tasks, many readers will have questions: when is serial used, when is parallel used, and with how many tasks? SQL Server has two instance-level settings. First, the parallelism threshold, "cost threshold for parallelism": when the subtree cost (the estimated resource consumption mentioned earlier) exceeds this value, the optimizer will consider a parallel execution plan. Second, the maximum degree of parallelism, "max degree of parallelism", which limits how many CPUs each branch can use at once. A simple picture for everyone: see Figure 2-5.

Figure 2-5
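Both settings can be inspected and changed with sp_configure. A minimal sketch (the values 50 and 4 are illustrative, not recommendations):

```sql
-- Show and adjust the two instance-level parallelism settings
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'cost threshold for parallelism', 50;
EXEC sp_configure 'max degree of parallelism', 4;
RECONFIGURE;
```

Running sp_configure with no second argument shows the current and configured values, so you can check the settings before changing anything.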

Now that the basic parallel-related concepts have been introduced, let's go deeper into parallelism.


Exchange

As the name implies, an exchange is where threads trade data; these operators appear only in parallel execution. SQL Server has three exchange operators: Gather Streams, Repartition Streams, and Distribute Streams, which respectively gather, redistribute, and distribute data across threads. See Figure 3-1.

Figure 3-1

In fact, each of the operators above is really a "pair" of operators: this is the producer-consumer model mentioned earlier. The right side of the operator is the producer, which fills packets with the data it generates; the left side is the consumer, which takes data out of the packets. See Figures 3-2-1 and 3-2-2.

Figure 3-2-1

Figure 3-2-2

This explains the CXPACKET wait so commonly seen in SQL Server. Many people describe it as the gap between fast and slow threads in a parallel plan. That is true as far as it goes, but it is not precise. Analyzed in terms of the producer-consumer model:

CXPACKET = Class eXchange PACKET

For a producer: all buffer packets are full, so it cannot continue producing (filling).

For a consumer: all buffer packets are empty, so there is no data to consume.
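Accumulated CXPACKET waits can be read from the wait-statistics DMV; a simple sketch:

```sql
-- Instance-level accumulated CXPACKET waits since the last restart or stats reset
SELECT wait_type, waiting_tasks_count, wait_time_ms, signal_wait_time_ms
FROM sys.dm_os_wait_stats
WHERE wait_type = 'CXPACKET';
```

High CXPACKET numbers alone do not prove a problem; as the model above shows, some of this wait is simply the normal cost of coordinating producers and consumers.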

In the producer-consumer model, data flows from producers to consumers according to certain routing rules; SQL Server has the following five:

Broadcast: when the data volume is small, all of the producer's data is broadcast to every consumer. Friends at Alibaba should recognize this from small-table broadcast in distributed middleware.

Hash: one or more columns of each row are hashed (think of it simply as taking a modulus), and rows are placed into different buffer packets according to the hash value for the consumers to use.

Round robin: as the name implies, rows are added to each buffer packet in turn.

Demand: instead of data being pushed, the consumer pulls rows from the producer on demand. This type is used only with partitioned tables.

Range: as the name implies, rows are placed into different buffer packets based on the value ranges of certain columns. This is generally used in parallel index builds and statistics rebuilds.

Given how producers and consumers hand data along, what else matters? Right: ordering. SQL Server distinguishes merging (order-preserving) exchanges from non-merging exchanges, the former used when the data flowing through the exchange must stay ordered. As you would expect, the cost of an order-preserving merging exchange is significantly higher than that of an unordered one. See Figure 3-3.

Figure 3-3

Parallel joins

Finally, let's talk about parallel joins. SQL Server supports the three basic join algorithms: nested loops join, merge join, and hash join. We will look at their pros and cons in parallel plans, which may come in handy in future parallel tuning. I will not repeat the basic algorithms of the three join types here; readers unfamiliar with them can consult Wikipedia.

Parallel Merge Join (Figure 4-1)

The join keys of both inputs are hash-partitioned so that matching keys land on the same thread, and the threads then merge simultaneously.

A parallel merge join has almost no advantages and many drawbacks. First, it requires merging exchanges; second, adding CPUs does little for its performance (see the underlying algorithm); and a parallel merge join also increases the chance of a parallel deadlock. In real production we should therefore try to avoid parallel merge joins.

Figure 4-1

Parallel Hash Join

In the build phase, if the build input is small it is broadcast to all threads (Figure 4-2-1); otherwise both inputs are hash-partitioned on the join key and matched per thread (Figure 4-2-2).

Note that a parallel hash join uses non-merging exchanges, and its performance scales roughly linearly as CPUs are added (see the underlying algorithm). However, in complex joins, statistics problems can cause the build or probe phase to spill to disk when memory is insufficient, slowing the query down. Spills can be captured with trace flags, Extended Events, and so on.

Figure 4-2-1

Figure 4-2-2
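As one option, hash spills surface as hash warnings, which an Extended Events session can capture. A sketch (the session name is made up; verify the hash_warning event is available on your version):

```sql
-- Capture hash warnings (hash recursion/bailout, i.e. spills) instance-wide
CREATE EVENT SESSION HashSpillCapture ON SERVER
ADD EVENT sqlserver.hash_warning
ADD TARGET package0.ring_buffer;
GO
ALTER EVENT SESSION HashSpillCapture ON SERVER STATE = START;
```

Remember to stop and drop the session when you are done, since it collects events for the whole instance.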

Parallel nested loop Join

The outer input is scanned by multiple threads simultaneously, while the inner input is matched serially on each thread.

In real production, a parallel nested loops join can solve many tricky problems for us, for example reducing exchange usage in parallel plans and using less memory. But note that skewed data distribution, nested loops prefetching, and similar conditions can limit the performance gain while resource consumption still rises relative to serial execution. Moreover, the SQL Server optimizer itself "dislikes" parallel nested loops, so sometimes we need specific phrasing to get one. See Figure 4-3.

Figure 4-3
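One such phrasing, sketched on the sample tables used earlier, combines a LOOP JOIN hint with a forced parallel plan. Trace flag 8649 is undocumented and suitable for testing only, and whether the optimizer cooperates depends on your data and version:

```sql
-- Coax a parallel nested loops join; QUERYTRACEON 8649 is undocumented (test use only)
SELECT a.ProductID, COUNT_BIG(*) AS [rows]
FROM dbo.bigProduct AS a
INNER LOOP JOIN dbo.bigTransactionHistory AS b
    ON a.ProductID = b.ProductID
GROUP BY a.ProductID
OPTION (QUERYTRACEON 8649);
```

Note that the LOOP JOIN hint also fixes the join order, which is itself a trade-off worth checking in the resulting plan.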

Finally, a word about bitmap filters. SQL Server has no bitmap indexes, a frequent point of criticism, but its execution engine can still use bitmaps for filtering.

Briefly, SQL Server's bitmap filtering works as follows (see also the Wikipedia entry on Bloom filters):

Implementation: construct a bit array of length x with all bits set to 0. Each value in the set to be matched is mapped through a hash function to a position in the array, and that bit is set to 1. To test whether a value exists, hash it and check whether the corresponding bit in the bit array is 1. All of this is done internally by SQL Server. In Figures 4-4-1 and 4-4-2, I hash an existing array and then probe it for "Goku...", which is found to be in the array.

Figure 4-4-1

Figure 4-4-2
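To make the idea concrete, here is a toy single-hash sketch in T-SQL. The modulus hash and the values are illustrative only; real bitmap filters use better hash functions, and a set bit means only "possibly present":

```sql
-- Toy bitmap (Bloom-style) filter with one hash function: h(v) = v % 32
DECLARE @bitmap BIGINT = 0;

-- Build phase: set the bit for each value in the build set {10, 25, 7}
SET @bitmap = @bitmap | POWER(CAST(2 AS BIGINT), 10 % 32);
SET @bitmap = @bitmap | POWER(CAST(2 AS BIGINT), 25 % 32);
SET @bitmap = @bitmap | POWER(CAST(2 AS BIGINT),  7 % 32);

-- Probe phase: bit set => possibly present; bit clear => definitely absent
SELECT CASE WHEN @bitmap & POWER(CAST(2 AS BIGINT), 25 % 32) > 0
            THEN 'possibly present' ELSE 'definitely absent' END AS probe_25,
       CASE WHEN @bitmap & POWER(CAST(2 AS BIGINT),  3 % 32) > 0
            THEN 'possibly present' ELSE 'definitely absent' END AS probe_3;
```

The "definitely absent" answer is the valuable one: rows that fail the probe can be discarded before reaching the (more expensive) join.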

That is as much of SQL Server's parallelism internals as this installment covers; when time permits I will follow up with the related practical applications.

Read more about SQL Server internals: follow the public account InsideSqlServer and my Sina Weibo @shanksgao.
