Database-Physical optimization

Source: Internet
Author: User

Physical optimization

Algebraic optimization changes the order and combination of operations in a query statement, without involving the underlying access path
There are many access schemes for a query statement, their execution efficiency is different, only the algebra optimization is not enough
Physical optimization is to select an efficient and reasonable operation algorithm or access path, to obtain an optimized query plan

Optimization of Access path selection based on heuristic rules

I. Heuristic rules for selecting actions:

    1. For small relationships, use a full-table sequential scan, even if there is an index on the selection column
      For large relationships, heuristic rules are:

    2. For queries where the selection criteria is the main code = value
      Query results are at most one tuple, you can select the main code index
      A generic RDBMS will automatically establish a master code index.

    3. For queries where the selection condition is non-primary property = value, and the selection column has an index
      The number of tuples to estimate query results for
      If the scale is small (<10%) you can use the Index scan method
      Otherwise, use full-table sequential scanning

    4. For selection criteria is a non-equivalent query on a property or a range query, and the selection column has an index
      The number of tuples to estimate query results for
      If the scale is small (<10%) you can use the Index scan method
      Otherwise, use full-table sequential scanning

    5. For conjunction selection criteria connected with and
      If there are composite indexes that involve these attributes
      Preferential use of combined index scanning method
      If there is a generic index on some properties
      You can use the index scanning method described in [example 1-C4]
      Otherwise, full-table sequential scanning is used.

    6. For disjunction selection criteria that are connected with or, a full-table sequential scan is generally used

Second, the heuristic rules of connection operation:
1. If 2 tables have been sorted by connection attribute
Select Sort-Merge method
2. If a table has an index on the connection property
Choosing an Index connection method
3. If none of the above 2 rules apply, one of the tables is small
Choose the hash Join method

. You can choose a nested loop method, and select the smaller table, which is exactly the number of blocks (b) Less, as the appearance (Outer loop table).
Reason:
Set the connection table R and s respectively occupy the number of blocks for BR and BS
The number of memory buffer blocks used by the connection operation is K
Assigning K-1 blocks to appearances
If R is outward, the number of blocks accessed by the nested loop method is:
br+ (br/(K-1)) *bs
Obviously, a table with a small number of blocks should be chosen as the appearance

I. Statistical information
Cost-Based optimization method to calculate the execution cost of various operation algorithms is closely related to the state of the database.

Cost-Based Optimization

The statistics stored in the data dictionary are required by the optimizer:
1. For each basic table
Total number of tuples in the table (N)
Tuple Length (L)
Number of blocks occupied (B)
Number of overflow blocks occupied (BO)

2. 对基表的每个列

The number of different values in the column (m)
Selection rate (f)
If the distribution of the different values is uniform, the f=1/m
If the distribution of different values is uneven, the selection rate for each value = the number of tuples with that value/n
The maximum value of the column
The minimum value of the column
Whether an index has been established on this column
Index type (b + Tree index, hash index, clustered index)
3. Index to indexes (e.g. B + Tree index)
Number of layers indexed (L)
Number of different index values
Index selection cardinality s (with S tuples with an index value)
Leaf node number of index (Y)

Ii. Examples of cost estimates

Cost estimating formula for full table scanning algorithm

If the base table size is B, the cost of the full table scan algorithm cost=b

If the selection criteria is a code = value, then the average search cost COST=B/2

    1. Cost estimating formula for index scanning algorithm
      If the selection criteria is Code = value
      If [Example 1-C2], the primary index of the table is used
      For a B + tree, the number of layers is L, you need to access the B + tree from the root node to the leaf node L block, plus the base table in which the tuple is located, so cost=l+1
      If the selection criteria involves non-code attributes
      such as [Example 1-C3], if the B + Tree index, the selection criteria is equal comparison, S is the selection base of the index (with S tuples meet the conditions)
      In the worst case, the tuple that satisfies the condition may be saved on a different block, at which point the Cost=l+s

If the comparison condition is >,>=,<,<= action
Assuming that half of the tuples meet the criteria, they will access half of the leaf nodes.
Access half of the table storage block by index COST=L+Y/2+B/2

    1. Cost estimating formula for Nested loop join algorithm
      The cost of nested loop join algorithms has been discussed in 9.4.1 cost=br+ (br/(K-1)) *bs
      If you need to write the result of the connection back to disk,
      cost=br+ (br/(K-1)) *bs + (Frs*nr*ns)/mrs
      Where FRS is a connection selectivity (join selectivity), which represents the scale of the number of join result tuples
      Mrs is the block factor that holds the result of the connection, representing the number of result tuples that can be stored in each block.

    2. Cost estimating formulas for sorting-merging join algorithms
      cost=br+bs+ (Frs*nr*ns)/mrs if the join table has been ordered according to the connection properties.
      If you must sort the files
      The cost of sorting in the cost function is required
      The cost for sorting files that contain B blocks is approximately (2*b) + (2*B*LOG2B)

More complex queries, especially those involving joins and nesting
Do not put all the optimized tasks on the RDBMS
The optimization rules of RDBMS should be found to write SQL statements suitable for RDBMS auto-optimization
Understand the specific query plan representation, analyze the actual execution strategy of the query

Database-Physical optimization

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.