Physical optimization
Algebraic optimization changes the order and combination of operations in a query statement, without involving the underlying access path
There are many access schemes for a query statement, their execution efficiency is different, only the algebra optimization is not enough
Physical optimization is to select an efficient and reasonable operation algorithm or access path, to obtain an optimized query plan
Optimization of Access path selection based on heuristic rules
I. Heuristic rules for selecting actions:
For small relationships, use a full-table sequential scan, even if there is an index on the selection column
For large relationships, heuristic rules are:
For queries where the selection criteria is the main code = value
Query results are at most one tuple, you can select the main code index
A generic RDBMS will automatically establish a master code index.
For queries where the selection condition is non-primary property = value, and the selection column has an index
The number of tuples to estimate query results for
If the scale is small (<10%) you can use the Index scan method
Otherwise, use full-table sequential scanning
For selection criteria is a non-equivalent query on a property or a range query, and the selection column has an index
The number of tuples to estimate query results for
If the scale is small (<10%) you can use the Index scan method
Otherwise, use full-table sequential scanning
For conjunction selection criteria connected with and
If there are composite indexes that involve these attributes
Preferential use of combined index scanning method
If there is a generic index on some properties
You can use the index scanning method described in [example 1-C4]
Otherwise, full-table sequential scanning is used.
For disjunction selection criteria that are connected with or, a full-table sequential scan is generally used
Second, the heuristic rules of connection operation:
1. If 2 tables have been sorted by connection attribute
Select Sort-Merge method
2. If a table has an index on the connection property
Choosing an Index connection method
3. If none of the above 2 rules apply, one of the tables is small
Choose the hash Join method
. You can choose a nested loop method, and select the smaller table, which is exactly the number of blocks (b) Less, as the appearance (Outer loop table).
Reason:
Set the connection table R and s respectively occupy the number of blocks for BR and BS
The number of memory buffer blocks used by the connection operation is K
Assigning K-1 blocks to appearances
If R is outward, the number of blocks accessed by the nested loop method is:
br+ (br/(K-1)) *bs
Obviously, a table with a small number of blocks should be chosen as the appearance
I. Statistical information
Cost-Based optimization method to calculate the execution cost of various operation algorithms is closely related to the state of the database.
Cost-Based Optimization
The statistics stored in the data dictionary are required by the optimizer:
1. For each basic table
Total number of tuples in the table (N)
Tuple Length (L)
Number of blocks occupied (B)
Number of overflow blocks occupied (BO)
2. 对基表的每个列
The number of different values in the column (m)
Selection rate (f)
If the distribution of the different values is uniform, the f=1/m
If the distribution of different values is uneven, the selection rate for each value = the number of tuples with that value/n
The maximum value of the column
The minimum value of the column
Whether an index has been established on this column
Index type (b + Tree index, hash index, clustered index)
3. Index to indexes (e.g. B + Tree index)
Number of layers indexed (L)
Number of different index values
Index selection cardinality s (with S tuples with an index value)
Leaf node number of index (Y)
Ii. Examples of cost estimates
Cost estimating formula for full table scanning algorithm
If the base table size is B, the cost of the full table scan algorithm cost=b
If the selection criteria is a code = value, then the average search cost COST=B/2
- Cost estimating formula for index scanning algorithm
If the selection criteria is Code = value
If [Example 1-C2], the primary index of the table is used
For a B + tree, the number of layers is L, you need to access the B + tree from the root node to the leaf node L block, plus the base table in which the tuple is located, so cost=l+1
If the selection criteria involves non-code attributes
such as [Example 1-C3], if the B + Tree index, the selection criteria is equal comparison, S is the selection base of the index (with S tuples meet the conditions)
In the worst case, the tuple that satisfies the condition may be saved on a different block, at which point the Cost=l+s
If the comparison condition is >,>=,<,<= action
Assuming that half of the tuples meet the criteria, they will access half of the leaf nodes.
Access half of the table storage block by index COST=L+Y/2+B/2
Cost estimating formula for Nested loop join algorithm
The cost of nested loop join algorithms has been discussed in 9.4.1 cost=br+ (br/(K-1)) *bs
If you need to write the result of the connection back to disk,
cost=br+ (br/(K-1)) *bs + (Frs*nr*ns)/mrs
Where FRS is a connection selectivity (join selectivity), which represents the scale of the number of join result tuples
Mrs is the block factor that holds the result of the connection, representing the number of result tuples that can be stored in each block.
Cost estimating formulas for sorting-merging join algorithms
cost=br+bs+ (Frs*nr*ns)/mrs if the join table has been ordered according to the connection properties.
If you must sort the files
The cost of sorting in the cost function is required
The cost for sorting files that contain B blocks is approximately (2*b) + (2*B*LOG2B)
More complex queries, especially those involving joins and nesting
Do not put all the optimized tasks on the RDBMS
The optimization rules of RDBMS should be found to write SQL statements suitable for RDBMS auto-optimization
Understand the specific query plan representation, analyze the actual execution strategy of the query
Database-Physical optimization