[Original]-Parallel Query & Bitmap

Source: Internet
Author: User
Degree of Parallelism (Degree of Parallelism)When a query is processed in parallel, SQL Server allocates multiple threads for the query, and each thread uses one CPU for operations. Degree of Parallelism is the number of threads allocated by SQL Server for parallel queries. It also shows the number of CPUs used for Parallel Processing in this parallel query. Exchange Oprators)In the execution plan of a query statement, parallel operations and serial operations are usually combined. Parallel operations require that the input data stream be divided into multiple (degree of parallelism) parts and allocated to each thread for parallel processing. Parallel queries include exchange operator, which is used to manage the execution of parallel plans. Distribute SteamsThe execution plan is displayed as Parallelism/Distribute Steams. Generally, if a serial operation is followed by a parallel operation, the parallel operation receives an input stream from the previous serial operation. Distribute Steams is an operation that distributes a single input stream to multiple output streams in parallel queries. For example, a serial table scan generates an output stream of 4000 records. If this table scan is followed by a parallel operation, a Distribute Steams operation is required between the two operations, distribute the input stream to each thread in parallel operation. If degree of parallelism is 4, SQL Server distributes the stream of this 4000 record into four roughly equivalent streams based on the keyword segment and serves as the input of the four threads respectively. After the Distribute Streams operation, records in each input stream will appear in an output stream, and the record content and format will not change. SQL Server automatically retains the relative sequence of records in the input stream in the output stream. Generally, Hashing is used to determine which output stream each input record is distributed. Gather StreamsThe execution plan is displayed as Parallelism/Gather Streams. After the parallel operation is completed, the output results are retained in multiple streams. Gather Streams collects multiple output Streams of parallel operations into a single stream. In the process of merging multiple Streams, the recorded content and format will not change. If all input Streams are sorted, the output stream of Gather Streams is also sorted. Redistribute Streams/Repartition StreamsThe execution plan is displayed as Parallelism/Repartition Streams. In a series of parallel operations, after a parallel operation, you may need to re-Combine each stream to enter the next parallel operation. For example, Merge Join and Hash Join both require two inputs. In the parallel plan of these operations, each parallel thread should have two independent inputs and execute the Merge Join and Hash Join Operations respectively. After the Parallel Merge/Hash Join operations, use the Gather Streams operation to obtain the complete Merge/Hash Join results. If there is a series of parallel plans before Merge/Hash Join, the stream owned by each parallel thread may not meet the "independent input" condition when the Merge/Hash Join is entered, in this case, you need to reorganize the streams that enter the Merge/Hash Join, so that the parallel threads can independently complete Part Of The Merge/Hash Join. Each record in the input stream will appear in one of the output Streams after the Repartition Streams operation. The record content and format will not change. If all input Streams are sorted, the output Streams of Repartition Streams are also sorted. ExampleIn this example, two tables are used: TblBuyerItem (UserID NVARCHAR (40) not null, OrgID NVARCHAR (40) not null, ItemID NVARCHAR (40) not null) clustered Index: OrgID and ItemID of the table. The number of records is 1.5 million. # Alert_asn_shipoverdue (org nvarchar (40) not null, item nvarchar (40) not null, vendor nvarchar (40) not null) This is a temporary table with no PK and no index, the number of records is 150. The executed SQL statement is as follows (Clustered Index of TblBuyerItem is not used): SELECT a. UserID
FROM TblBuyerItem
Inner join # alert_asn_shipoverdue B ON B. ORG = a. OrgID AND B. ITEM = a. ItemID
For example, if the OPTION (merge join) Execution Plan is used (if some steps are removed, click here to enlarge the execution plan ):
Part 1:

Part 2: In the execution plan, there is a small yellow circle in the lower right foot of the icon, and there are three parallel small arrows in it, it indicates that the operation is executed in parallel. The stream data stream in parallel processing is shown in. Note: The Source image is lost. Replace it with the figure found by MSDN.

In the example, when degree of parallelism is set to 3. The execution process is as follows: Execute table scan on # alert_asn_shipoverdue, and then according to the values of key column ORG and ITEM, for Parallel Merge Join operations, the # alert_asn_shipoverdue data is distributed to three streams, which are assumed to be numbered B1, B2, and B3. Each of the three threads creates a bitmap for the obtained stream and then sorts it. Next, the three threads run Clustered Index Scan on TblBuyerItem in parallel. After the operation is complete, each thread obtains an output stream, assuming the numbers are a1', A2 ', and a3 '. Currently, A1 ', A2', and A3 'cannot match B1, B2, and B3 into A1'-B1, A2 '-B2, and A3'-B3, perform Merge Join independently. Therefore, Repartition Streams is performed on a1', A2 ', and a3' based on the value of key column OrgID and ItemID, reassembles A1 ', A2', and A3 'For the next Merge Join and uses bitmap to filter records. Assuming that the restructured stream is numbered A1, A2, A3, at this time, the three threads use A1-B1, A2-B2, A3-B3 as the input respectively, execute the Merge Join operation. Before the Merge Join operation, both inputs must be sorted. Therefore, each stream has an Sort operation before it enters the Merge Join operation. Finally, the Gather Streams operation collects and merges records from the output Streams of the three threads to obtain the complete Merge Join result record set. Parallel query does not save memory and CPU resource overhead, because streaming distristri, Repartition, and Gather consumes more resources. However, Parallel query may save the query execution time. Bitmap)In the preceding example, you can see the Bitmap/Bitmap Create operation. In semi-TO-semi Join, the two tables may have a large number of mismatched records. In the preceding example, the # alert_asn_shipoverdue record is only 150, while the TblBuyerItem record is 1,500,000. If two tables are directly joined by Merge, the 1,500,000 record of TblBuyerItem must be sorted, this is a very costly operation. Of the 1,500,000 records, only a small number of records meet the matching conditions. Before the Join operation, Bitmap quickly filters data to reduce the overhead of the Join operation. Bitmap in Merge JoinThe preceding example shows that after the Parallelism/Distribute Streams operation, each thread obtains an input stream, and then each thread creates a bitmap for its own stream. Let's assume that the stream number allocated to thread 1 is B1. After a Bitmap is created, it contains a series of values of 0 and 1. The value of 1 indicates that it corresponds to this bit, and the stream has corresponding records; A bit of 0 corresponds to this bit, and no corresponding record exists in the stream. During the Parallelism/Repartition Streams operation, each record is retrieved cyclically from the input stream and the output stream to which the record should enter is determined first. Assume that a record is determined to be placed in the output stream numbered A1. This output stream A1 matches stream B1 and is allocated to thread 1, as two inputs of thread 1 Merge Join. Next, SQL Server uses this record to query the bitmap of stream B1 corresponding to output stream A1. If the corresponding bit value is 0, it indicates that this record cannot have a matching record in B1. Therefore, this record is ignored and will not be placed in output stream A1; if the corresponding bit value is 1, it indicates that this record may have a matching record in B1, and this record is placed in output stream A1, exact matching will continue in the subsequent Merge Join. It is still unclear about how to create bitmap. the conjecture should be roughly the following process: using the hash algorithm for operations, during the query optimization, the bitmap size is determined based on the number of possible unique values of the Join field ORG and ITEM in # alert_asn_shipoverdue. This process is similar to determining the number (and size) of Hash table buckes in hash Join. During execution, the bitmap size value is used to create a bitmap, and each bit is initialized as 0. Then, for the # alert_asn_shipoverdue loop, hash the combined values of each ORG and ITEM. The obtained hash value corresponds to a single bit in bitmap and sets this bit to 1. Based on the above bitmap creation process, when the Parallelism/Repartition Streams operation is performed, a record is determined to enter the output stream, and the bitmap corresponding to the stream can be found. For the value of this record OrgID and ItemID, use the same hash algorithm when bitmap is created, and use the obtained hash value to find the corresponding bit. Next, let's take a look at the effect of bitmap in the example (click to enlarge): When you scan TblBuyerItem, Row Count is 1,543,050. In the Repartition Streams operation, WHERE :( PROBE ([Bitmap1002]) = TRUE) indicates the use of bitmap, Row Count 3,138. In addition, we can see that the example SQL uses 8 CPUs for actual execution. Note: In the above example, # alert_asn_shipoverdue is very small, and the create and probe processes of bitmap are actually similar to Hash Join. In fact, when SQL Server automatically selects Join Type, it uses Hash Join (refer to [Join Type instance description] and uses the same example ). When Hash Join is used in the serial mode, it takes more than 4 seconds. The preceding example takes 2 seconds. When the preceding example does not use parallel processing (MAXDOP 1, it takes more than 1 minute and 30 seconds. In Merge Join, before the branch of outer input enters Merge Join, there must be an Sort operation before SQL Server can use bitmap. This is because the Sort operation completes the processing of the entire outer input branch, starts the execution of the inner input branch, and then enters the Merge Join operation. In this way, the bitmap created on the outer input branch can be used only when the inner input branch is executed. Without this Sort operation, SQL Server starts to process both the outer input branch and the inner input branch, so bitmap cannot be used. We can perform a verification in the preceding example. If # alert_asn_shipoverdue has a clustered index (ORG, ITEM), what will happen when the SQL statement in this example is executed? After the clustered index is created on (ORG, ITEM), the execution plan is shown (Click here to enlarge it): For the # alert_asn_shipoverdue Table Scan, it is changed to Clustered Index Scan, the resulting stream is sorted by ORG and ITEM. Currently, the stream before the Parallelism/Distribute Streams operation is sorted by ORG and ITEM. Therefore, the output Streams of Distribute Streams are also sorted by ORG and ITEM, therefore, the outer input branch does not need the Sort operation before Merge Join, so bitmap cannot be used. From the execution plan diagram, we can see that the inner input branch, after the Table Scan of TblBuyerItem, the size of the arrow remains unchanged until the Merge Join operation, which indicates that in a series of operations on this branch, the number of records is basically unchanged. It is the detailed information of Table Scan and Parallelism/Repartition Streams of TblBuyerItem (click to enlarge): In this verification, the average execution time for not using bitmap is 1 minute 10 seconds. Bitmap in Hash JoinIn general, bitmap in Hash Join is the same as that in Merge Join. When constructing a hash table on build input (outer input), it is processed in parallel. Constructing a hash table also creates bitmap for each build input stream. After the entire build phase is completed, start the probe phase. Therefore, bitmap can be used for execution in the probe phase. The subsequent operations are exactly the same as the Probe Bitmap operations in Merge Join. SQL Server only uses bitmap in Parallel Query. It is estimated that the Probe Bitmap is combined with the Repartition operation in Parallel Query, which can significantly reduce the cost. In Serial Query, the extra Probe Bitmap may not significantly improve the Query execution. OK, It's over now. the following describes the execution results of the example statements in various situations: see MSDN-Parallel Query Processing MSDN-Bitmaps in Microsoft SQL Server 2000 MSDN-Logical and Physical Operators.

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.