What is parallelism?
We have heard sayings like "many hands make light work" since childhood. The core of the idea is to divide a task among many people, so that each person only needs to do a small part for the whole task to be completed. More importantly, if one extra person takes on the job of handing out the work, the time to complete the task can be reduced significantly.
Counting the jelly beans
Imagine you are facing a jar filled with jelly beans and are asked to count them. Suppose you can count about five beans per second; it would take you more than ten minutes to count the 3,027 jelly beans in the jar.
Now suppose four friends help you with this task. You could arrange the work in a variety of ways, so let's emulate the strategy that SQL Server would take. You and your four friends sit around a table with the jar in the center, and you use a spoon to scoop beans out of the jar and hand them to the others to count. Each friend also has a pen and paper to record their running count.
Once everyone has finished counting and the jar is empty, each person hands you their paper. You collect each person's count and add all the numbers together to get the total number of jelly beans. The task now finishes in about one to two minutes, a speedup of more than four times. Of course, the four people together still spend a total of about ten person-minutes or more (because of the extra work of distributing the beans and summing the counts). This task demonstrates the advantages of parallelism well, since very little additional work is needed.
Counting the jelly beans with SQL Server
Of course, SQL Server doesn't count jelly beans in a jar, so we will have it count the rows in a table instead. If the table is small, it uses the serial plan shown in Figure 1:
Figure 1: Serial counting plan
This query plan uses a single thread, just as if you counted all the jelly beans yourself. The plan itself is simple: the Stream Aggregate operator counts the rows it receives from the Index Scan operator and returns the total. Just as with the jar, if there are only a few beans, parallelism is not worthwhile: the overhead of distributing the work and combining the partial counts is high relative to the small amount of counting to be done. Only when the table is large enough does the SQL Server optimizer choose to add more threads, producing the plan shown in Figure 2:
Figure 2: Parallel counting plan
The yellow arrow icons on the three right-hand operators indicate the introduction of multithreading. Each thread is assigned a portion of the work, and the partial results are then gathered together into the final result. Just as in the manual jelly-bean exercise, the parallel plan stands a good chance of finishing faster, because several threads are counting at once.
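For reference, the query behind Figures 1 and 2 could be a plain row count like the sketch below. The table name is hypothetical; whether the optimizer chooses the serial or the parallel plan depends on the table's size and the optimizer's costing.

```sql
-- A minimal row-counting query, the SQL analogue of counting the
-- jelly beans in the jar (table name is hypothetical).
SELECT COUNT(*) AS BeanCount
FROM dbo.JellyBeans;
```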
How does parallelism work?
Imagine for a moment that SQL Server had no built-in support for parallelism. We might then try to optimize performance by splitting a query into parallel pieces by hand, running each piece on its own connection to the server.
Figure 3: Manually allocating parallelism
Each connection must run a separate query covering a distinct range of the table's rows, so that between them all the table data is counted. Fortunately, SQL Server can run each connection on a separate processing unit, and the three partial result sets come back in roughly one-third of the original time. Naturally, we also need extra time to combine the three result sets into the final answer.
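A sketch of what those three manual queries might look like, assuming a hypothetical table with a convenient integer key to split on:

```sql
-- Manual parallelism: three separate connections, each counting a
-- disjoint key range (table and column names are hypothetical).

-- Connection 1:
SELECT COUNT(*) FROM dbo.JellyBeans WHERE BeanID < 40000;

-- Connection 2:
SELECT COUNT(*) FROM dbo.JellyBeans WHERE BeanID >= 40000 AND BeanID < 80000;

-- Connection 3:
SELECT COUNT(*) FROM dbo.JellyBeans WHERE BeanID >= 80000;

-- The client must then add the three partial counts together itself.
```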
Execute multiple serial plans in parallel
Recall the parallel query plan shown in Figure 2, and assume that SQL Server allocates three additional threads to the query at run time. Conceptually, we can redraw the parallel plan to show SQL Server running three independent copies of a serial plan (this view is not entirely accurate, but it is a useful starting point).
Figure 4: Multiple serial plans
Each thread is assigned one of the three branches, and the branches finally converge at the Gather Streams operator. Note that Gather Streams is the only operator in this diagram with the yellow parallel arrow icon, so it is the only operator in the plan that interacts with multiple threads. There are two reasons why this general strategy suits SQL Server. First, all of the code needed to execute serial plans already exists and has been optimized over many years and releases. Second, the approach scales gracefully: if more threads are available, SQL Server can easily add further serial plan copies to use them.
The number of additional threads assigned to each parallel plan is called the degree of parallelism (abbreviated DOP). SQL Server chooses the DOP just before the query starts executing, and the degree of parallelism can change between executions without a plan recompile. The maximum DOP for each parallel region is determined by the number of logical processing units available to SQL Server.
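For a single query, the DOP can be capped with a query hint. MAXDOP is a real T-SQL query hint; the table name below is hypothetical:

```sql
-- Cap this query at a degree of parallelism of 4.
SELECT COUNT(*) AS BeanCount
FROM dbo.JellyBeans
OPTION (MAXDOP 4);
```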
Parallel scan and the Parallel Page Supplier
The problem with the Figure 4 view is that each Index Scan operator would count every row of the entire input set. Left uncorrected, the plan would produce wrong results and probably take more time as well. The manual parallelism example avoided this problem by using WHERE clauses to split up the rows.
SQL Server does not use the same approach, because dividing the work into fixed ranges assumes that each thread receives an equal share of the available resources and that every row needs the same amount of processing. In a simple example such as counting the rows of a table on a server with no other activity, those assumptions may hold, and the three queries might indeed finish at the same time.
Instead of assigning a fixed set of rows to each thread, SQL Server uses a storage engine feature called the Parallel Page Supplier to hand out rows to threads on demand. You will not see the Parallel Page Supplier in a query plan, because it is not part of the query processor, but we can extend the Figure 4 illustration to show where it sits and how it connects:
Figure 5: The Parallel Page Supplier
The key point here is the demand-based architecture: the supplier provides a batch of pages in response to requests from threads that need more work. In terms of the jelly-bean analogy, the Parallel Page Supplier plays the role of the spoon used to scoop beans out of the jar. The single spoon prevents two people from counting the same beans, and if one person counts more slowly, the other threads simply receive more beans to compensate.
Note that the Parallel Page Supplier does not prevent existing optimizations such as read-ahead scanning (fetching data from disk in advance). In fact, read-ahead can be more effective here than in the manual example, because there is a single underlying physical scan rather than the three separate ones we saw earlier.
The Parallel Page Supplier is not limited to index scans, either; SQL Server uses it whenever multiple threads read a data structure cooperatively. That structure may be a heap, a clustered index, or a nonclustered index, and the operation may be a scan or a seek. The latter makes sense if you think of an index seek as a partial scan: it finds the first qualifying row and then scans to the end of the qualifying range.
Execution context
Just as in the manual parallelism example, SQL Server runs a parallel query as several cooperating serial queries, but instead of creating independent connections it uses a lightweight construct called the execution context.
An execution context is derived from a part of the query plan, produced by filling in details that cannot be resolved until after the plan has been compiled and optimized. These details include references to objects that exist only at run time (such as temporary tables in the batch), as well as run-time parameter values and local variables. This is not the place to go further into the subject; a Microsoft white paper provides a detailed introduction.
SQL Server runs a parallel plan by deriving DOP execution contexts for each parallel region of the query plan, using a separate thread to run the portion of the serial plan contained in each context. To help picture the concept, Figure 6 shows the three execution contexts, each distinguished by color. Although not shown explicitly, a Parallel Page Supplier coordinates the index scans so that no rows are read twice.
Figure 6: Parallel plan execution contexts
To tie the abstract concept to something concrete, Figure 7 shows the run-time information contained in a parallel row-count query, captured with the "Actual Execution Plan" option enabled in SSMS and the plus sign on the left expanded.
Figure 7: Parallel plan row counts
Comparing the two views, one shows 3 rows processed and the other 113,443. This information comes from the Properties window, opened by clicking an operator (or connecting line) and pressing F4, or by right-clicking and choosing Properties from the pop-up menu.
The view on the right shows the number of rows handled by each thread as well as the total. Notice that two threads processed a similar number of rows (around 40,000 each), but the third thread processed only about 32,000. As discussed above, the demand-based architecture depends on timing and on the load on each processor, so even a lightly loaded machine can show this kind of imbalance.
The view on the left shows the point after the three streams have been gathered together: one row arrives per thread, each carrying a partial count, so the row count there equals the number of parallel threads of execution.
Schedulers, Workers, and Tasks
So far, this article has used the terms "thread" and "worker" loosely. Now we need to define them more precisely, as follows.
Schedulers
A scheduler represents a logical processor in SQL Server: a physical CPU, a processor core, or one of the hardware threads running on a core (hyper-threading). The main purpose of schedulers is to let SQL Server control thread scheduling precisely, rather than relying on the generic algorithms of the Windows operating system. Each scheduler ensures that only one cooperatively scheduled thread is runnable (as far as the operating system is concerned) at any given time. The important benefits are fewer context switches and fewer calls into the Windows kernel. Later parts of this series cover the internal details of task scheduling and execution.
Information about schedulers can be viewed in the DMV sys.dm_os_schedulers.
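A quick look, using only columns that actually exist in that DMV:

```sql
-- One row per scheduler; filter to those available for user work.
SELECT scheduler_id,
       cpu_id,
       status,
       current_tasks_count,
       current_workers_count
FROM sys.dm_os_schedulers
WHERE status = 'VISIBLE ONLINE';
```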
Workers and Threads
A SQL Server worker thread is an abstraction of a single operating system thread or fiber. Few systems run fiber-mode scheduling, so most documentation uses the term "worker thread" to emphasize that, for most practical purposes, a worker is a thread. A worker thread is bound to a particular scheduler. Information about worker threads can be viewed through the DMV sys.dm_os_workers.
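For example, a sketch that lists workers together with the scheduler each is bound to (both DMVs and all the columns referenced are real):

```sql
SELECT w.worker_address,
       w.state,            -- RUNNING, RUNNABLE, or SUSPENDED
       w.is_preemptive,
       s.scheduler_id
FROM sys.dm_os_workers AS w
JOIN sys.dm_os_schedulers AS s
  ON w.scheduler_address = s.scheduler_address;
```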
Tasks
You can define tasks like this:
A task represents a unit of work that is scheduled by SQL Server. A batch can map to one or more tasks. For example, a parallel query will be executed by multiple tasks.
Extending this simple definition, a task is the piece of work performed by a SQL Server worker thread. A batch containing only a serial execution plan is a single task, executed from start to finish by the thread provided by the connection. That thread must therefore also wait whenever execution blocks on some event (such as a read from disk). A thread assigned to a task cannot run any other task until the current one fully completes.
Execution context
If a task describes the work to be done, an execution context is where that work takes place. Each task runs within a single execution context, identified by the exec_context_id column in the DMV sys.dm_os_tasks (you can also see the execution context in the ecid column of the sys.sysprocesses compatibility view).
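A sketch that shows the tasks and execution contexts for one session while a parallel query is running (the session_id value is a placeholder):

```sql
SELECT task_address,
       exec_context_id,   -- 0 is the parent (coordinator) task
       scheduler_id,
       task_state
FROM sys.dm_os_tasks
WHERE session_id = 53
ORDER BY exec_context_id;
```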
The exchange operator
To recap briefly, we have seen that SQL Server executes a parallel plan by running multiple instances of a serial plan concurrently. Each serial plan instance is a separate task, running on its own thread within its own execution context. The exchange operator is the glue that connects the execution contexts of a parallel plan together, and it is where the threads' results ultimately meet. In general, a complex query plan can contain multiple serial and parallel regions connected by exchange operators.
So far, we have seen only one form of the exchange operator, called Gather Streams, but it can appear in two other forms as well:
Figure 8: Exchange logical operations
All of these forms of the exchange operator move rows between threads. Depending on its logical form, an exchange either introduces a new serial or parallel region (gathering rows from many threads into one, or distributing rows from one thread to many), or redirects rows at the interface between two parallel regions.
In addition to splitting, merging, and redirecting rows across threads, the exchange operator can also:
Use several different strategies to determine how input rows are routed to output threads.
Preserve the order of the input rows, if required.
Much of this flexibility stems from its internal design, so we will look at that first.
Inside the exchange operator
The exchange operator has two completely distinct sub-components:
Producers, which connect to threads on the input side
Consumers, which connect to threads on the output side
Figure 9 shows a magnified view of the Gather Streams exchange from Figure 6.
Figure 9: Inside the Gather Streams exchange
Each producer collects rows from its input and packs them into one or more in-memory buffers. Once a buffer is full, the producer pushes it to the consumer side. Each producer and consumer runs on the same thread as the execution context it connects to (as the matching colors suggest). The consumer side of the exchange reads rows from the buffers when asked for them by its parent operator (the red-shaded Stream Aggregate in this example).
One of the main benefits of this design is that all the complexity normally associated with multiple threads sharing data is handled internally by this one operator. The non-exchange operators elsewhere in the plan run entirely serially and need not be concerned with such issues.
The exchange operator uses its buffers to reduce overhead and to implement a basic kind of flow control (for example, preventing a fast producer from getting too far ahead of a slow consumer). The exact buffer arrangement varies with the type of exchange, whether order must be preserved, and how rows are to be matched between producers and consumers.
Routing rows
As mentioned above, an exchange operator determines which consumer a particular producer row should be routed to. This decision depends on the partitioning type specified for the exchange, and there are five options:
Type | Description
Hash | The most common type. The consumer is chosen by evaluating a hash function on one or more columns of the current row.
Round Robin | Each new row is sent to the next consumer in a fixed sequence.
Broadcast | Each row is sent to all consumers.
Demand | Each row is sent to the first consumer that asks for one. This is the only partitioning type in which rows are pulled through the exchange by the consumers.
Range | Each consumer is assigned a non-overlapping range of values. The input column on which the ranges are defined determines which rows each consumer receives.
The demand and range partitioning types are rarer than the other three and are generally seen only in query plans that operate on partitioned tables. The demand type is used, for example, in collocated partitioned joins to hand partition IDs to the next available worker thread; the range type is used, for example, when creating a partitioned index. If you want to find the partitioning type in a query plan, look here:
Figure 10: Exchange partitioning type
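As a hedged illustration, a parallel aggregation like the sketch below will often introduce a Repartition Streams exchange that uses hash partitioning on the grouping column. The table and column names are hypothetical, and the optimizer is free to choose a different plan:

```sql
SELECT CustomerID,
       COUNT(*) AS OrderCount
FROM dbo.Orders          -- hypothetical table
GROUP BY CustomerID
OPTION (MAXDOP 4);       -- cap the degree of parallelism at 4
```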
Preserving input order
An exchange operator can be configured to preserve sort order. This is useful when the rows entering the exchange are already sorted and a later operator needs them in that order (either because of an earlier explicit sort, or because they were read in sorted sequence from an index). If the exchange did not preserve order, the optimizer would have to add an extra Sort operator after it to re-establish the ordering. Common operators that require sorted input include Stream Aggregate, Segment, and Merge Join. Figure 11 shows an order-preserving repartitioning exchange feeding a sort-dependent operation:
Figure 11: Order-preserving repartitioning exchange
Note that a merging (order-preserving) exchange does not perform a sort itself; it requires its input rows to arrive already sorted. Merging exchanges are less efficient than their non-order-preserving counterparts and can be associated with certain performance problems.
Maximum degree of parallelism
Microsoft's official guidance is to follow these guidelines:
1. For a server with 8 or fewer processors, use the configuration MAXDOP = 0 to N, where N equals the number of processors.
2. For a server with NUMA configured, MAXDOP should not exceed the number of CPUs assigned to each NUMA node.
3. For a server with hyper-threading enabled, the MAXDOP value should not exceed the number of physical processors. The default value of 0 means the database engine decides for itself.
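At the server level this option is set with sp_configure. The option name is real; the value 8 is just an example, to be chosen according to the guidelines above:

```sql
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;

-- Set the instance-wide maximum degree of parallelism.
EXEC sp_configure 'max degree of parallelism', 8;
RECONFIGURE;
```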
Understanding and using parallelism in SQL Server