Preface
Whether on cnblogs or CSDN, I see many friends whose understanding of databases never breaks through a certain bottleneck. That bottleneck is often just a thin layer of window paper; once you pierce it, you see a whole new world.
Around 2004-2005 I worked on a project using SQL Server 2000. The core tables (those touched by almost every key, frequently-used function) reached 8 million rows, and some related tables I looked at long ago exceeded 30 million rows. The storage was fiber-channel disk with 100 GB of space, and data had to be transferred out and backed up weekly, or the 100 GB would fill within a week; the system has remained in very good shape for several years now. I have also heard of a friend's SQL Server 2000 database running at dozens of TB under high concurrency; I am still a long way from being able to manage that level.
When I first used SQL Server I assumed it could not match Oracle at large-data processing. Along the way I recorded many optimization posts on my local blog, but never found time to organize them into a series; this article stitches the concepts together a little and offers them as a reference. By reading the documentation to understand the concepts, then verifying and summarizing them in practice, you can deepen your understanding of the database step by step.
I only know SQL Server 2000 deeply, but that has not stopped me from doing SQL tuning and product architecture work on Oracle and MySQL, because at the level of theory and principles the major databases do not differ much. Understanding one database deeply will not make your architectural thinking worse; on the contrary, it brings you deeper insight. riccc: 2007.06.26
Notes on the Execution Plan
In SQL Server Query Analyzer, choose Show Execution Plan from the Query menu and run the SQL statement; the results window then shows three tabs: Grids, Execution Plan, and Messages. The graphical execution plan is read from right to left, which is also the order of execution. Each icon represents one operation; every operation has one or more inputs and one or more outputs. An input or output can be a physical table, an index structure, or some intermediate result set/data structure produced during execution. Hovering the mouse over an icon shows details of that operation, such as the logical and physical operation names, the estimated row count and row size, I/O cost, CPU cost, and the operation's specific expression (the Argument). Hovering over a connecting arrow shows the row count and size of the result set output by the operation at the arrow's starting side, which can generally be read as the input to the operation at the arrowhead.
Some additional notes on execution plans: 1. The information displayed is an estimate, not 100% accurate; for example, the row counts come from statistics, and the I/O and CPU costs are evaluated from those statistics while the plan is generated. 2. The execution plan is not necessarily accurate: it is affected by the accuracy of the statistics SQL Server maintains, and the environment at compile time (memory usage, CPU load, etc.) may differ from the environment at execution time.
Statistics estimation, I/O and CPU cost evaluation, and the compilation and execution of SQL statements will not be covered in more depth here. Although the execution plan is not perfectly accurate, it remains the most important basis for analyzing a SQL statement, because in most cases SQL Server really does execute the statement the way the plan shows.
Description of Join Methods
In a database, how does a SQL statement such as TableA inner join TableB or TableA left outer join TableB actually perform the JOIN? What algorithms does SQL Server use to implement the join of two tables' data?
SQL Server 2000 has three methods: nested loop, merge, and hash. Oracle uses the same three, but it chooses nested loop somewhat differently than SQL Server does, and its memory management mechanism also differs, so in Oracle execution plans nested loop appears very often while merge and hash are relatively rare; in SQL Server, the merge and hash methods are both very common.
The following uses SQL Server 2000 to illustrate the three methods, interspersed with some basic uses of the execution plan.
1. Nested loop Join
1.1 Sample SQL
select ... from TableA inner join TableB on TableA.col1 = TableB.col1 where TableA.col2 = ? and TableB.col2 = ?
TableA has no index; TableB has a primary key (clustered index) on col1.
1.2 Algorithm in pseudocode
foreach RowA in TableA where TableA.col2 = ?
{
    search RowsB from TableB where TableB.col1 = RowA.col1 and TableB.col2 = ?;
    if (RowsB.Count <= 0)
        discard RowA;
    else
        output RowA and RowsB;
}
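The pseudocode above can be sketched as runnable Python. This is purely illustrative (the sample rows, the function name, and the dict standing in for TableB's clustered index are all invented for the example), not SQL Server's actual implementation:

```python
# Illustrative nested loop join: for each qualifying outer row,
# look up matching inner rows via an index (here a plain dict
# keyed on col1, simulating TableB's clustered index).

def nested_loop_join(table_a, table_b_index, a_col2_value, b_col2_value):
    results = []
    for row_a in table_a:                      # outer loop over TableA
        if row_a["col2"] != a_col2_value:      # WHERE TableA.col2 = ?
            continue
        # "index seek" on TableB by the join key col1
        for row_b in table_b_index.get(row_a["col1"], []):
            if row_b["col2"] == b_col2_value:  # WHERE TableB.col2 = ?
                results.append((row_a, row_b))
    return results

table_a = [{"col1": 1, "col2": "x"}, {"col1": 2, "col2": "y"}]
table_b_index = {1: [{"col1": 1, "col2": "x"}],
                 2: [{"col1": 2, "col2": "x"}]}
print(nested_loop_join(table_a, table_b_index, "x", "x"))
```

Note how the cost is driven by the size of the outer loop: only rows of TableA that pass the col2 filter ever reach the inner lookup, which is exactly why a small outer input makes this method cheap.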
The join operation has two inputs. In the example above, TableA is the outer input, used for the outer loop; TableB is the inner input, used for the inner loop. The following describes how SQL Server completes this process, from the point of view of the execution plan.
1.3 For how to view the execution plan, see the beginning of this article.
1.4 Execution steps
Below is the execution plan diagram for the sample SQL. To the right of the nested loop operation, the upper input is the outer input and the lower one is the inner input. You cannot determine which is outer and which is inner from the order the tables appear in the join; it must be read from the execution plan, because SQL Server automatically chooses which table to use as the inner input.
a) Perform a table scan on TableA. The input of this operation is the TableA data, which resides on disk and is loaded into memory during the operation; the output is the set of qualifying records, which becomes the outer input for step b). During this operation, the condition TableA.col2 = ? is applied.
b) Perform the nested loop operation described in the pseudocode above: for each output record of a), perform step c).
c) Perform a Clustered Index Seek on TableB. This operation runs inside the nested loop; its input is the clustered index data of the TableB table. Using the two conditions TableB.col1 = RowA.col1 and TableB.col2 = ?, it selects the qualifying results from TableB's clustered index.
d) Construct the result set to return. From the output of the nested loop, take the fields specified in the select list and build the final output result set.
1.5 Further explanation
The example above uses a clustered index as the inner input. Now let us look at a nonclustered index, to strengthen the ability to read and analyze execution plans.
Change the primary key on TableB.col1 to nonclustered. The execution plan for the sample SQL then becomes the following:
The first three steps a), b), c) are as in 1.4. One thing to note is that step c) now performs an Index Seek, which differs from a Clustered Index Seek. The leaf nodes of a clustered index are the actual data records themselves; the leaf nodes of a nonclustered index hold a reference to the clustered index key value (if the table has a clustered index) or a reference to the actual data record's ROWID (for a table without a clustered index, called a heap table). When a Clustered Index Seek completes, the actual physical data records have already been loaded into memory; after an Index Seek, the actual records are not loaded, only the nonclustered index leaf data, which contains just the indexed fields plus the referenced clustered index key value or ROWID. In this step, SQL Server uses the indexed field values from the nonclustered index leaf data to match the associated fields of the outer input record (RowA) and decide whether the result qualifies; if it does, it saves the nonclustered index leaf data into the output data structure of the nested loop operation and creates a bookmark, indicating that the referenced data should be fetched via this bookmark when needed.
d) Perform a Bookmark Lookup. The output of the nested loop operation is an in-memory data structure. Before the final result set of the whole query can be assembled from it, the bookmark references created earlier must be resolved; the Bookmark Lookup operation fetches the actual record data using the clustered index key value or ROWID referenced by each bookmark.
e) Perform a Filter operation. Reviewing the steps so far: during the nested loop, only the nonclustered index field (TableB.col1) was used to match the outer input's associated field, and the condition TableB.col2 = ? has not yet been applied anywhere. It is applied here, filtering the output of the Bookmark Lookup.
Careful readers may have a couple of questions at this point: 1. Why is there no Filter operation for TableA.col2 = ? 2. Why did no Filter appear in 1.4? The explanations: 1. TableA is accessed by a Table Scan, which scans each actual data record directly, so the condition TableA.col2 = ? can be applied during the scan itself, avoiding an extra Filter operation. Hover the mouse over the Table Scan operation and the Argument in the popup shows that the TableA.col2 = ? condition has already been applied. 2. As mentioned above, the leaf nodes of a clustered index are the actual data records, so a Clustered Index Seek ultimately reaches the actual records; the condition TableB.col2 = ? can be applied during that process, likewise avoiding an extra Filter. That is why there is no Filter operation in 1.4.
f) Construct the result set to return, the same as step d) in 1.4.
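The Index Seek / Bookmark Lookup / Filter sequence of steps c) through e) can be sketched as follows. This is a minimal illustration with invented data and names: one dict stands in for the heap (RID to row) and another for the nonclustered index on col1:

```python
# Illustrative sketch: a nonclustered index seek returns only row
# locators (RIDs); the actual rows are fetched by a bookmark (RID)
# lookup afterwards, and the residual predicate on col2 is applied
# as a final filter, mirroring steps c), d), e) in the article.

heap = {  # RID -> full row (the table's actual data pages)
    101: {"col1": 1, "col2": "x", "data": "a"},
    102: {"col1": 1, "col2": "y", "data": "b"},
}
nc_index = {1: [101, 102]}  # nonclustered index on col1: key -> RIDs

def seek_with_bookmark_lookup(col1_value, col2_value):
    rows = []
    for rid in nc_index.get(col1_value, []):  # index seek: key + RID only
        row = heap[rid]                       # bookmark lookup into heap
        if row["col2"] == col2_value:         # filter on col2 afterwards
            rows.append(row)
    return rows

print(seek_with_bookmark_lookup(1, "x"))  # keeps only the row with col2 == "x"
```

The extra trip through `heap` per matching RID is the cost the optimizer weighs when it considers abandoning nested loop for another join method.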
1.6 Conditions for using nested loop
For any join operation, if the conditions for nested loop are met, the query optimizer evaluates the cost of the nested loop (I/O cost, CPU cost, etc.) and decides from that evaluation whether to use this join method. The conditions for using nested loop are: a) the outer input's record count is small, preferably under 1000-2000; above roughly 3000 it is hard to say, and nested loop is basically no longer chosen; b) the table serving as the inner input has an index usable by this query.
The reason is that a small outer input record count means the outer loop is small, and a usable index on the inner input means that searching for matching records inside the loop is very efficient, even if the inner input table actually contains millions of records. Under these two conditions, nested loop is highly efficient and, of the three join methods, consumes the least memory and CPU (except when the nested loop method is unreasonably forced).
Additional notes on the conditions: the outer input record count does not mean the actual number of records in the outer input table. In the sample SQL, if TableA has statistics maintained on col2 (either a col2 index or separately maintained statistics) and the condition value of TableA.col2 = ? conforms to a SARG (searchable argument), then at compile time SQL Server can use the statistics and the condition value to estimate the number of qualifying records, and at execution time only the records satisfying TableA.col2 = ? enter the outer loop. "The inner input table has a usable index" means that the fields of the inner input table used in the association with the outer input table (one or more fields) can hit an index (some or all of these fields appear as the leading fields of an index).
This does not mean SQL Server will choose nested loop 100% of the time when these conditions are met. Because the query optimizer is cost-based, SQL Server will choose a different join method if another plan evaluates as cheaper. For example, if the qualifying index on the inner input is nonclustered, SQL Server may need an extra Bookmark Lookup to fetch the actual record data; if the inner input table is very large and the index is heavily fragmented, the Bookmark Lookup cost may be very high, and SQL Server will then try to evaluate other join methods.
1.7 Forcing the nested loop method
Use the loop keyword, e.g. TableA inner loop join TableB, to force SQL Server to perform this join with the nested loop method. Or use the option clause, e.g. TableA inner join TableB option (loop join).
The nested loop algorithm has its applicable range; within it, efficiency is the highest, and outside it, efficiency is poor. Unless you are very confident, do not arbitrarily force a join method.
The following sections are not described in as much detail as the above.
2. Merge Join
The first step of a merge join is to ensure that both inputs are sorted on the associated fields. If an associated field has a usable index and the sort order is consistent, the merge join operation can be performed directly; otherwise SQL Server needs to first sort the input by the associated fields (that is, a Sort operation may appear on either input before the merge join).
Once the two inputs are sorted on the associated fields, the merge join takes one record from each input and compares them. If the association condition holds, the pair is placed in the result set; otherwise, the record with the smaller associated field value is discarded and the next record is taken from its input to continue matching, until one input is exhausted.
When performing a merge join on tables with a many-to-many association, a temporary table is usually needed. For example, if A join B uses a merge join and, for some value of the associated field, there are multiple matching records in both A and B, say A1, A2 ... An and B1, B2 ... Bn, then each of A1, A2 ... An must be matched against every one of B1, B2 ... Bn. This means the pointer must move from B1 to Bn multiple times, reading the B1 ... Bn records each time. Pre-reading the B1 ... Bn records into an in-memory temporary table is faster than reading them again from the original data pages or disk.
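The merge phase described above, including the buffered run of duplicate keys that plays the role of the temporary table, can be sketched as follows (illustrative only; both inputs must already be sorted on the join key, and all names are invented):

```python
# Illustrative merge join over two inputs pre-sorted on the join key.
# Duplicate keys on both sides (many-to-many) are handled by buffering
# the run of equal-key rows from B, as the article describes.

def merge_join(a, b, key):
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        if a[i][key] < b[j][key]:
            i += 1                     # discard the row with the smaller key
        elif a[i][key] > b[j][key]:
            j += 1
        else:
            # buffer the run of equal keys in B (the "temp table")
            k = j
            while k < len(b) and b[k][key] == a[i][key]:
                k += 1
            run = b[j:k]
            # pair every equal-key row of A with the buffered run
            while i < len(a) and a[i][key] == run[0][key]:
                out.extend((a[i], row_b) for row_b in run)
                i += 1
            j = k
    return out

a = [{"k": 1}, {"k": 2}, {"k": 2}]
b = [{"k": 2}, {"k": 2}, {"k": 3}]
print(merge_join(a, b, "k"))  # 2 rows x 2 rows = 4 pairs for key 2
```

Each input is read once, which is why the merge itself is fast; the expensive part, when it occurs, is the sort that produces these ordered inputs.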
The merge join operation itself is very fast, but the sort before it can be very time-consuming (the two operations that consume the most memory and CPU in SQL Server are large-data sorts and large-data hash operations; this refers to the Sort and hash-related operations inside the query plan, such as hash joins or hash-based distinct, not to your SQL order by keyword), especially for very large record sets, which can drive the merge join's overall query cost very high. For tables with very large data volumes, if the merge join's associated fields can use a clustered index, the merge join is one of the fastest join methods. The optimization approach is therefore to design the association relationships and index structures well at the table design level, make full use of indexes in SQL statements, minimize sorting before the merge join, and reduce Bookmark Lookup operations.
In general, if the conditions for nested loop are not met, the merge join method is evaluated. The choice of merge join mainly considers the data volume of the two inputs and whether each input's associated fields can hit an index. For example, for TableA join TableB, if the associated field can hit an index in both tables and the data volume exceeds what nested loop would be chosen for, the merge join method is considered. Of course, if the data volumes of TableA and TableB are so large that the estimated cost is too high, the merge join is abandoned and hash join is evaluated instead.
Use inner merge join or option (merge join) to force the merge join method.
3. Hash join
A hash join has two inputs: the build input (also called the outer input) and the probe input (also called the inner input). Hash joins are used not only for inner/left/right joins and the like; operations such as union and group by may also use them. In a group by, the build input and the probe input are the same record set.
As with nested loop, the build input is shown above in the execution plan and the probe input below.
A hash join is done in two phases: the build phase and the probe phase.
Build phase
This phase mainly constructs the hash table. In operations like inner/left/right join, the table's associated field is used as the hash key; in group by, the group by fields are the hash key; in union and some other duplicate-removing operations, the hash key includes all the select fields.
The build operation takes each row from the build input, applies a hash function to the row's associated field value to produce a hash value, and that hash value corresponds to a hash bucket in the hash table. If several records hash into the same bucket, they are chained together with a linked list. When the entire build input has been processed, all of its records are referenced by the hash buckets of the hash table.
Probe stage
In this phase, SQL Server takes each row from the probe input, takes its associated field value, generates a hash value using the same hash function as in the build phase, and uses that hash value to locate the corresponding hash bucket in the hash table constructed during the build phase. To resolve collisions in the hash algorithm, a hash bucket may link to other buckets; the probe operation searches the entire collision chain of buckets for matching records.
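The build and probe phases can be sketched as follows (an illustrative in-memory hash join with invented data; Python's dict plays the role of the hash table, including its collision handling):

```python
# Illustrative in-memory hash join: build a hash table over the
# smaller (build) input keyed on the join column, then probe it
# with each row of the larger (probe) input.

from collections import defaultdict

def hash_join(build_input, probe_input, key):
    # build phase: hash every build row into its bucket
    buckets = defaultdict(list)
    for row in build_input:
        buckets[row[key]].append(row)
    # probe phase: hash each probe row and scan the matching bucket
    out = []
    for row in probe_input:
        for match in buckets.get(row[key], []):
            out.append((match, row))
    return out

build = [{"k": 1, "v": "a"}, {"k": 2, "v": "b"}]
probe = [{"k": 2, "v": "c"}, {"k": 3, "v": "d"}]
print(hash_join(build, probe, "k"))  # one matching pair, for k == 2
```

Unlike merge join, no sort is needed and no index is required on either input, which is why this method survives when the associated fields cannot hit an index; the price is the memory needed to hold the build side's hash table.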
For details of hashing, see any data-structures reference. Hash algorithms are mainly used for searching large data volumes: to avoid searching and matching against all the data every time, as a merge join does, an appropriate hash function first builds hash values over the data's hash key as a kind of index; at search time, the hash value first narrows the search to a much smaller range, which is then searched for matching results, improving efficiency.
SQL Server uses the table with the smaller data volume as the build input and tries to make the hash table built from it fit entirely in memory, so that the matching of the probe phase is done completely in memory; such a hash join is called an in-memory hash join.
If the build input has so many records that its hash table cannot fit in memory, SQL Server splits the build input and the probe input into multiple partitions. Each partition includes an independent, paired build input and probe input, so that one large hash join is split into multiple independent hash joins that do not affect each other, each of which can be done in memory. SQL Server saves the partition files to disk and each time loads one partition's build input and probe input into memory to perform a hash join. This is called a Grace hash join, after the Grace hash join algorithm.
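The partitioning idea behind the Grace hash join can be sketched as follows (illustrative only; the partition count, the use of Python's built-in hash, and keeping partitions in memory rather than on disk are simplifications of the example, not SQL Server's behavior):

```python
# Illustrative Grace hash join: both inputs are partitioned by the
# same hash function, so rows with matching keys land in the same
# partition; each partition pair is then joined independently with
# an ordinary in-memory hash join.

def partition(rows, key, n):
    parts = [[] for _ in range(n)]
    for row in rows:
        parts[hash(row[key]) % n].append(row)
    return parts

def grace_hash_join(build_input, probe_input, key, n=4):
    out = []
    for b_part, p_part in zip(partition(build_input, key, n),
                              partition(probe_input, key, n)):
        table = {}
        for row in b_part:                 # in-memory build per partition
            table.setdefault(row[key], []).append(row)
        for row in p_part:                 # probe within the same partition
            out.extend((m, row) for m in table.get(row[key], []))
    return out

build = [{"k": i} for i in range(6)]
probe = [{"k": 2}, {"k": 5}, {"k": 9}]
print(grace_hash_join(build, probe, "k"))  # matches for k = 2 and k = 5
```

Because the same hash function routes both inputs, no match can ever span two partitions, which is what makes it safe to process the partitions one at a time from disk.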
Along with hash join operations over big data, there may also be standard external merge sorts, multiple merge levels, multiple partitioning steps, and multiple partitioning levels; SQL Server may also use algorithms such as recursive hash join, or other optimization methods.
Hash joins are typically used for large data volumes, for example when one table in the join reaches a certain size or cannot be loaded into memory at once; they are also used when the associated fields cannot hit an index in either joined table. In general, hash join processing is expensive and is one of the top killers of database server memory and CPU, especially when partitioning is involved (partitioning may be needed when the data volume is too large, or even with smaller volumes when concurrent load is high and the current processing thread cannot get enough memory). To finish all the partitioning steps as quickly as possible, a large number of asynchronous I/O operations are used, so a single thread can keep multiple disk drives busy, which is quite likely to block the execution of other threads.
Use inner hash join or option (hash join) to force the hash join method.
Recommendations
All three join methods have two inputs. Basic optimization principles: 1. Avoid hash joins over big data, and try to convert them into efficient merge joins or nested loop joins. The possible means include table structure design, index design, SQL optimization, and business design optimization: for example, using redundant fields, accumulating statistical-analysis results into static tables as the service runs, or adding appropriate redundant tables kept in sync with AOP or similar mechanisms. 2. Minimize the data volume of the join's two inputs. The common problems here are conditions that do not conform to SARG (there is a lot of skill hidden in this) and insufficient conditions inside subqueries (with overly complex SQL, the SQL Server query optimizer is often foolish; conditions written outside a subquery are not applied inside it, hurting the efficiency of the subquery itself and of the later join). There is also, at the design and business level, trying to limit the data volume of these two inputs.
For business design optimization, refer to an earlier post: "System analysis and design: thoughts on the solution of a join problem", which stresses the importance of business analysis and design.
Supplement (2007.06.27): about SQL Server 2005
A rough look at SQL Server 2005 shows some differences in how the execution plan is displayed, but the main parts and principles do not deviate much. For the sample SQL above, when TableB has the nonclustered index, the SQL Server 2005 execution plan diagram is as follows:
One major difference is that what SQL Server 2000 shows as a Bookmark Lookup operation is shown in 2005 as a RID Lookup operation plus a Nested Loops operation. This is easy to understand, and arguably a more reasonable way to display the plan: when you see this operation, you know the actual data is being fetched from TableB through a looping mechanism.
Another point is that the popup information shown when you hover over an execution plan icon has changed somewhat; for example, 2005 shows each operation's Output List, which this article has been calling the "output data structure". By looking at the Output List, you can better understand why the RID Lookup (Bookmark Lookup) operation exists.
Finally, 2005 can save the execution plan graphically, so it can be reopened later and analyzed in graphical form; 2000 does not have this and can only save the execution plan as text. These small features are useful for analyzing SQL performance, and the graphical interface is more intuitive for analysis. Reprinted from: http://www.folo.cn/user1/3665/archives/2007/37177.html