Query optimization
Query optimization occupies a central place in a relational database system.
Relational query optimization is a key factor in RDBMS performance.
Because relational expressions are stated at a high semantic level, the system can analyze the meaning of a query from the expression itself, which is what makes query optimization possible.
With query optimization, users do not have to work out how best to phrase a query for efficiency, and the system can often optimize better than a user program can, for the following reasons:
(1) The optimizer can obtain many statistics from the data dictionary, whereas a user program has difficulty obtaining this information.
(2) If the physical statistics of the database change, the system can automatically re-optimize the query and choose a suitable execution plan. In a non-relational system the program must be rewritten, which is often impractical in real applications.
(3) The optimizer can consider hundreds of different execution plans, whereas a programmer can usually consider only a few.
(4) The optimizer incorporates many sophisticated optimization techniques that are otherwise mastered only by the best programmers. Automatic optimization in effect makes these techniques available to everyone.
The RDBMS uses a cost model to estimate the execution cost of each candidate query execution strategy, then chooses the plan with the lowest cost.
Centralized database
The execution cost mainly consists of:
Number of disk blocks accessed (I/O cost)
Processor time (CPU cost)
Memory overhead of the query
I/O cost is the most important component.
Distributed database
Total cost = I/O cost + CPU cost + memory cost + communication cost
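The cost formula above can be sketched as a small function that sums the cost components of each candidate plan and picks the cheapest one. The `Plan` type, the plan names, and all the numbers are hypothetical and chosen only for illustration; a real optimizer estimates these components from data-dictionary statistics.

```python
from dataclasses import dataclass


@dataclass
class Plan:
    name: str
    io: float                    # estimated I/O cost (for example, block accesses)
    cpu: float                   # estimated CPU cost
    memory: float                # estimated memory cost
    communication: float = 0.0   # non-zero only in a distributed database


def total_cost(plan: Plan) -> float:
    # Total cost = I/O cost + CPU cost + memory cost + communication cost
    return plan.io + plan.cpu + plan.memory + plan.communication


def cheapest(plans):
    # The optimizer keeps the plan with the lowest estimated cost.
    return min(plans, key=total_cost)


# Hypothetical estimates, all in one common unit.
plans = [
    Plan("full scan then join", io=2100, cpu=50, memory=5),
    Plan("index scan then join", io=200, cpu=20, memory=5, communication=300),
]
best = cheapest(plans)
print(best.name, total_cost(best))
```

Setting `communication` to zero for every plan gives the centralized case.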
[Example 3] Find the names of the students who take course 2. Expressed in SQL:
SELECT Student.Sname FROM Student, SC WHERE Student.Sno = SC.Sno AND SC.Cno = '2';
Assume there are 1000 student records and 10000 course-selection records, of which 50 are for course 2.
The system can answer this query with several equivalent relational algebra expressions:
Q1 = π_{Sname}(σ_{Student.Sno=SC.Sno ∧ SC.Cno='2'}(Student × SC))
Q2 = π_{Sname}(σ_{SC.Cno='2'}(Student ⋈ SC))
Q3 = π_{Sname}(Student ⋈ σ_{SC.Cno='2'}(SC))
I. First case
Q1 = π_{Sname}(σ_{Student.Sno=SC.Sno ∧ SC.Cno='2'}(Student × SC))
1. Compute the generalized Cartesian product
Each tuple of Student is joined with each tuple of SC as follows:
Load as many blocks of one table (say Student) into memory as possible, and reserve one block for tuples of the other table (say SC).
Join each SC tuple in memory with each Student tuple in memory; whenever the joined tuples fill a block, write that block to an intermediate file.
Read the next block of SC and join it with the Student tuples in memory, until the whole SC table has been processed.
Then read in the next batch of Student blocks and again read SC one block at a time.
Repeat this process until the whole Student table has been processed.
2. Perform the selection
Read the joined tuples back in and keep those that satisfy the selection condition.
Assume that in-memory processing time is negligible. Reading the intermediate file takes the same time as writing it, 5×10^4 s.
Assume that only 50 tuples satisfy the condition, so they can all be kept in memory.
3. Perform the projection
Project the result of step 2 onto Sname to obtain the final result.
Total time to execute the query in the first case ≈ 105 + 2×5×10^4 ≈ 10^5 s (105 s to read the two tables in step 1, plus writing and then rereading the intermediate file).
All in-memory processing time is neglected.
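The arithmetic behind the first case can be reproduced with a short calculation. The block and speed figures below are not stated outright in the text; they are inferred from its numbers (100 blocks per table, 100 blocks read in 5 s, and the expression 10^4/10/20). The 5-block Student buffer is an assumption, chosen because it reproduces the 2100-block figure.

```python
import math

STUDENT_TUPLES = 1_000
SC_TUPLES = 10_000
STUDENT_PER_BLOCK = 10         # inferred: 1000 tuples in 100 blocks
SC_PER_BLOCK = 100             # inferred: 10000 tuples in 100 blocks
JOINED_PER_BLOCK = 10          # inferred from the expression 10^4/10/20
BLOCKS_PER_SECOND = 20         # inferred: 100 blocks take 5 s
STUDENT_BLOCKS_IN_MEMORY = 5   # assumption: yields the 2100-block figure


def blocks_read_for_join() -> int:
    """Blocks read by the block-at-a-time join strategy described above."""
    student_blocks = math.ceil(STUDENT_TUPLES / STUDENT_PER_BLOCK)   # 100
    sc_blocks = math.ceil(SC_TUPLES / SC_PER_BLOCK)                  # 100
    # SC is read once for every batch of Student blocks held in memory.
    passes = math.ceil(student_blocks / STUDENT_BLOCKS_IN_MEMORY)    # 20
    return student_blocks + passes * sc_blocks


read_seconds = blocks_read_for_join() / BLOCKS_PER_SECOND
joined_tuples = STUDENT_TUPLES * SC_TUPLES               # Cartesian product size
write_seconds = joined_tuples / JOINED_PER_BLOCK / BLOCKS_PER_SECOND
q1_seconds = read_seconds + 2 * write_seconds            # write, then reread

print(blocks_read_for_join(), read_seconds, write_seconds, q1_seconds)
```

Under these assumptions the intermediate file dominates: reading the base tables is about a thousandth of the total.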
II. Second case
Q2 = π_{Sname}(σ_{SC.Cno='2'}(Student ⋈ SC))
1. Compute the natural join
To perform the natural join, Student and SC are read with the same strategy as before, so the total number of blocks read is still 2100, which takes 105 s.
The result of the natural join is far smaller than in the first case: 10^4 tuples.
Writing these tuples takes 10^4/10/20 = 50 s, one thousandth of the time in the first case.
2. Read the intermediate file and perform the selection, which also takes 50 s.
3. Project the result of step 2 and output it.
Total time to execute the query in the second case ≈ 105 + 50 + 50 ≈ 205 s
III. Third case
Q3 = π_{Sname}(Student ⋈ σ_{SC.Cno='2'}(SC))
1. Perform the selection on SC first. Only the SC table needs to be read; accessing its 100 blocks takes 5 s. Because only 50 tuples satisfy the condition, no intermediate file is needed.
2. Read the Student table and join each Student tuple read in with the SC tuples held in memory. Again only the Student table is read, 100 blocks in total, which takes 5 s.
3. Project the join result and output it.
Total time to execute the query in the third case ≈ 5 + 5 ≈ 10 s
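The totals for the second and third cases follow from the same unit costs. The sketch below recomputes them, assuming a transfer rate of 20 blocks per second and 10 joined tuples per block; both values are inferred from the figures in the text rather than stated in it.

```python
BLOCKS_PER_SECOND = 20   # inferred: 100 blocks take 5 s
JOINED_PER_BLOCK = 10    # inferred from the expression 10^4/10/20


def seconds(blocks: float) -> float:
    return blocks / BLOCKS_PER_SECOND


# Second case: join first, then select.
q2_read = seconds(2100)                                  # read Student and SC
q2_joined_tuples = 10_000                                # natural join result
q2_write = seconds(q2_joined_tuples / JOINED_PER_BLOCK)  # write intermediate file
q2_reread = q2_write                                     # read it back to select
q2_total = q2_read + q2_write + q2_reread

# Third case: select on SC first, then join with Student.
q3_select = seconds(100)   # scan the 100 blocks of SC
q3_join = seconds(100)     # scan the 100 blocks of Student
q3_total = q3_select + q3_join

print(q2_total, q3_total, q2_total / q3_total)
```

The ratio shows how much is gained simply by performing the selection before the join.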
If there is an index on the Cno attribute of the SC table
Step 1 does not need to read all SC tuples, only the 50 tuples with Cno = '2'.
The index blocks and the SC data blocks that hold the qualifying tuples amount to about eight blocks in total at most.
If the Student table also has an index on Sno
Step 2 does not need to read all Student tuples.
Because only 50 SC records satisfy the condition, at most 50 Student records are involved.
The number of Student blocks read can therefore also be reduced substantially.
The total access time drops further, to a few seconds.
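The effect of the two indexes can be illustrated in memory. In this sketch a Python dict stands in for each index; that is a hash index, used only as a stand-in for whatever index structure the DBMS actually provides. The data values are made up to match the sizes in the example (1000 students, 10000 SC tuples, 50 of them for course 2).

```python
from collections import defaultdict

# Toy data with the sizes used in the example (values are made up).
students = [(f"S{i:04d}", f"name{i}") for i in range(1000)]          # (Sno, Sname)
sc = [(f"S{i:04d}", "2" if (j == 0 and i < 50) else str(100 + j))    # (Sno, Cno)
      for i in range(1000) for j in range(10)]

# Build the two indexes once.
sc_by_cno = defaultdict(list)
for sno, cno in sc:
    sc_by_cno[cno].append((sno, cno))
student_by_sno = {sno: sname for sno, sname in students}


def names_with_indexes(cno):
    """Step 1: index lookup on SC.Cno. Step 2: index lookup on Student.Sno."""
    matching_sc = sc_by_cno.get(cno, [])   # only the qualifying SC tuples
    return [student_by_sno[sno] for sno, _ in matching_sc]


def names_with_scans(cno):
    """Same query with full scans of both tables, for comparison."""
    wanted = [sno for sno, c in sc if c == cno]
    return [sname for sno, sname in students if sno in wanted]


result = names_with_indexes("2")
print(len(result), result[:3])
```

With the indexes, only the 50 qualifying SC tuples and the 50 matching Student tuples are touched, instead of all 11000 tuples.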
Transforming the algebraic expression Q1 into Q2 and then Q3,
that is, reordering the selection and the join so that the selection is done first and far fewer tuples take part in the join, is algebraic optimization.
In Q3
The selection on SC can be done either by a full table scan or by an index scan; a preliminary estimate shows that the index scan is better.
For the join of Student and SC, using the index on the Student table makes an index join cheaper. This is physical optimization.
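The difference between Q1 and Q3 can be made concrete on a scaled-down data set. The following sketch uses made-up data (100 students and 1000 SC tuples, 5 of them for course 2, so that the Cartesian product stays small) and evaluates both expressions, returning the result together with the size of the intermediate relation each one builds.

```python
from itertools import product

students = [(i, f"name{i}") for i in range(100)]                  # (Sno, Sname)
sc = [(i, "2" if (j == 0 and i < 5) else str(100 + j))            # (Sno, Cno)
      for i in range(100) for j in range(10)]


def q1():
    """Cartesian product first, then selection, then projection."""
    joined = list(product(students, sc))
    selected = [(s, e) for s, e in joined if s[0] == e[0] and e[1] == "2"]
    return sorted(s[1] for s, _ in selected), len(joined)


def q3():
    """Selection on SC first, then join with Student, then projection."""
    selected_sc = [e for e in sc if e[1] == "2"]
    joined = [(s, e) for e in selected_sc for s in students if s[0] == e[0]]
    return sorted(s[1] for s, _ in joined), len(selected_sc)


names1, intermediate1 = q1()
names3, intermediate3 = q3()
print(names1 == names3, intermediate1, intermediate3)
```

Both functions return the same names, but Q1 materializes every pair of tuples while Q3 carries only the selected SC tuples into the join.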
Algebraic optimization strategy
Improve query efficiency through equivalence transformations of relational algebra expressions.
Equivalence of relational algebra expressions: two expressions are equivalent if substituting the same relations for the corresponding relation variables in both yields the same result.
Two equivalent relational expressions E1 and E2 are written E1 ≡ E2.
Common equivalence transformation rules:
1. Commutativity of join and Cartesian product
Let E1 and E2 be relational algebra expressions and F a join condition. Then:
E1 × E2 ≡ E2 × E1
E1 ⋈ E2 ≡ E2 ⋈ E1
E1 ⋈_{F} E2 ≡ E2 ⋈_{F} E1
2. Associativity of join and Cartesian product
Let E1, E2, E3 be relational algebra expressions and F1, F2 join conditions. Then:
(E1 × E2) × E3 ≡ E1 × (E2 × E3)
(E1 ⋈ E2) ⋈ E3 ≡ E1 ⋈ (E2 ⋈ E3)
(E1 ⋈_{F1} E2) ⋈_{F2} E3 ≡ E1 ⋈_{F1} (E2 ⋈_{F2} E3), provided F1 involves only attributes of E1 and E2, and F2 only attributes of E2 and E3
3. Cascade of projections
π_{A1,...,An}(π_{B1,...,Bm}(E)) ≡ π_{A1,...,An}(E)
Here E is a relational algebra expression, Ai (i = 1, 2, ..., n) and Bj (j = 1, 2, ..., m) are attribute names, and {A1, A2, ..., An} is a subset of {B1, B2, ..., Bm}.
4. Cascade of selections
σ_{F1}(σ_{F2}(E)) ≡ σ_{F1∧F2}(E)
Here E is a relational algebra expression and F1, F2 are selection conditions. The rule says that selection conditions can be merged, so all of them can be checked in a single pass.
5. Commuting selection with projection
σ_{F}(π_{A1,...,An}(E)) ≡ π_{A1,...,An}(σ_{F}(E))
Here the selection condition F involves only the attributes A1, ..., An.
If F also involves attributes B1, ..., Bm that are not among A1, ..., An, the more general rule is:
π_{A1,...,An}(σ_{F}(E)) ≡ π_{A1,...,An}(σ_{F}(π_{A1,...,An,B1,...,Bm}(E)))
6. Commuting selection with Cartesian product
If F involves only attributes of E1, then:
σ_{F}(E1 × E2) ≡ σ_{F}(E1) × E2
If F = F1 ∧ F2, where F1 involves only attributes of E1 and F2 only attributes of E2, then rules 1, 4, and 6 give:
σ_{F}(E1 × E2) ≡ σ_{F1}(E1) × σ_{F2}(E2)
If F1 involves only attributes of E1 while F2 involves attributes of both E1 and E2, it still holds that:
σ_{F}(E1 × E2) ≡ σ_{F2}(σ_{F1}(E1) × E2)
This lets part of the selection be done before the Cartesian product.
7. Distribution of selection over union
If E = E1 ∪ E2 and E1, E2 have the same attribute names, then:
σ_{F}(E1 ∪ E2) ≡ σ_{F}(E1) ∪ σ_{F}(E2)
8. Distribution of selection over difference
If E1 and E2 have the same attribute names, then:
σ_{F}(E1 − E2) ≡ σ_{F}(E1) − σ_{F}(E2)
9. Distribution of selection over natural join
σ_{F}(E1 ⋈ E2) ≡ σ_{F}(E1) ⋈ σ_{F}(E2)
Here F involves only the common attributes of E1 and E2.
10. Distribution of projection over Cartesian product
Let E1 and E2 be two relational expressions, A1, ..., An attributes of E1, and B1, ..., Bm attributes of E2. Then:
π_{A1,...,An,B1,...,Bm}(E1 × E2) ≡ π_{A1,...,An}(E1) × π_{B1,...,Bm}(E2)
11. Distribution of projection over union
If E1 and E2 have the same attribute names, then:
π_{A1,...,An}(E1 ∪ E2) ≡ π_{A1,...,An}(E1) ∪ π_{A1,...,An}(E2)
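A few of these equivalences can be checked mechanically on small relations. The sketch below represents a relation as a list of dicts and defines minimal `select`, `project`, and `product` helpers; these helpers, the relations `E1` and `E2`, and the conditions `F1` and `F2` are hypothetical and exist only for this check. Agreement on one data set illustrates a rule but does not prove it.

```python
def select(pred, rel):
    return [t for t in rel if pred(t)]


def product(r1, r2):
    # Cartesian product; assumes the two relations share no attribute names.
    return [{**a, **b} for a in r1 for b in r2]


def project(attrs, rel):
    out = []
    for t in rel:
        row = {a: t[a] for a in attrs}
        if row not in out:        # relations are sets, so drop duplicates
            out.append(row)
    return out


def same(r1, r2):
    """Compare two relations as sets of tuples, ignoring order."""
    def key(rel):
        return sorted(tuple(sorted(t.items())) for t in rel)
    return key(r1) == key(r2)


E1 = [{"a": i} for i in range(4)]
E2 = [{"b": j} for j in range(3)]
F1 = lambda t: t["a"] >= 2        # involves only attributes of E1
F2 = lambda t: t["b"] < 2         # involves only attributes of E2
P = product(E1, E2)

# Rule 3: cascade of projections
rule3 = same(project(["a"], project(["a", "b"], P)), project(["a"], P))
# Rule 4: cascade of selections
rule4 = same(select(F1, select(F2, P)), select(lambda t: F1(t) and F2(t), P))
# Rule 6: selection pushed below the Cartesian product
rule6 = same(select(lambda t: F1(t) and F2(t), P),
             product(select(F1, E1), select(F2, E2)))
print(rule3, rule4, rule6)
```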
Heuristic optimization of the query tree
Typical heuristic rules:
1. Perform selections as early as possible. This is the most important and most basic of the optimization strategies.
2. Perform projections and selections at the same time (pipelining).
If several projections and selections all operate on the same relation, they can all be carried out during a single scan of that relation, which avoids scanning the relation repeatedly and avoids storing intermediate relations.
3. Combine a projection with the binary operation before or after it (pipelining).
4. Combine certain selections with the Cartesian product that precedes them into a join operation.
5. Find common subexpressions.
If the result of a recurring subexpression is not a large relation, and reading it from external storage takes much less time than computing the subexpression, it pays to compute the common subexpression once and write the result to an intermediate file.
When a query is posed against a view, the expression that defines the view is a common subexpression.
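Rule 2 can be sketched with Python generators, which pass one tuple at a time from operator to operator without storing an intermediate relation. The `scan`, `select`, and `project` functions, the `Grade` attribute, and the sample rows are hypothetical; the counter only confirms that the relation is scanned once.

```python
rows_read = 0


def scan(relation):
    """Read the stored relation once, counting the tuples fetched."""
    global rows_read
    for row in relation:
        rows_read += 1
        yield row


def select(pred, rows):
    return (r for r in rows if pred(r))


def project(attrs, rows):
    return (tuple(r[a] for a in attrs) for r in rows)


sc = [
    {"Sno": "S1", "Cno": "2", "Grade": 90},
    {"Sno": "S2", "Cno": "3", "Grade": 85},
    {"Sno": "S3", "Cno": "2", "Grade": 70},
]

# Two selections and one projection on the same relation, one scan in total.
pipeline = project(
    ["Sno", "Grade"],
    select(lambda r: r["Cno"] == "2",
           select(lambda r: r["Grade"] >= 80, scan(sc))),
)
result = list(pipeline)
print(result, rows_read)
```

Nothing is read until `list(pipeline)` pulls tuples through, so no intermediate list is ever built between the operators.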
Following these heuristic rules and applying the equivalence transformation rules of Section 9.3.1 gives an algorithm for optimizing relational expressions.
Algorithm: optimization of relational expressions
Input: the query tree of a relational expression
Output: the optimized query tree
Method:
(1) Use equivalence rule 4 to transform every expression of the form σ_{F1∧F2∧...∧Fn}(E) into σ_{F1}(σ_{F2}(...(σ_{Fn}(E))...)).
(2) For each selection, use equivalence rules 4 to 9 to move it as far toward the leaves of the tree as possible.
(3) For each projection, use the general forms of equivalence rules 3, 5, 10, and 11 to move it as far toward the leaves of the tree as possible.
Note:
Equivalence rule 3 makes some projections disappear.
Rule 5 splits a projection into two, one of which may be moved toward the leaves of the tree.
(4) Use the equivalence rules (rules 3 to 5 apply here) to merge chains of selections and projections into a single selection, a single projection, or a selection followed by a projection, so that multiple selections or projections can be executed together or completed in a single scan.
(5) Group the interior nodes of the resulting syntax tree. Each binary operation (×, ⋈, ∪, −) forms a group together with all of its direct ancestors that are σ or π operations.
If its descendants down to the leaves are all unary operations, they are also merged into the group.
However, when the binary operation is a Cartesian product (×) and it is not followed by a selection that combines with it into an equijoin, the selection cannot be placed in the same group as the binary operation; such unary operations form a group of their own.
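Steps (1) and (2) of the algorithm can be sketched on a tiny query tree. This is a minimal illustration under stated assumptions, not the full algorithm: the node classes `Relation`, `Product`, and `Select` and the functions `push` and `optimize` are hypothetical, only selections over Cartesian products are handled, and step (1) is modeled by passing the conjuncts of the selection condition as a list. Projections and grouping (steps 3 to 5) are left out.

```python
from dataclasses import dataclass


@dataclass
class Relation:
    name: str
    attrs: frozenset


@dataclass
class Product:
    left: object
    right: object


@dataclass
class Select:
    cond: str          # condition text, kept only for display
    uses: frozenset    # attributes the condition refers to
    child: object


def attrs_of(node):
    if isinstance(node, Relation):
        return node.attrs
    if isinstance(node, Product):
        return attrs_of(node.left) | attrs_of(node.right)
    return attrs_of(node.child)


def push(cond, uses, node):
    """Step (2): place one selection as close to the leaves as possible."""
    if isinstance(node, Product):
        # Rule 6: a selection on one operand moves below the product.
        if uses <= attrs_of(node.left):
            return Product(push(cond, uses, node.left), node.right)
        if uses <= attrs_of(node.right):
            return Product(node.left, push(cond, uses, node.right))
    elif isinstance(node, Select):
        # Rule 4: selections commute, so keep descending past this one.
        return Select(node.cond, node.uses, push(cond, uses, node.child))
    return Select(cond, uses, node)


def optimize(conjuncts, tree):
    """Step (1) is the split into conjuncts; each one is pushed separately."""
    for cond, uses in conjuncts:
        tree = push(cond, uses, tree)
    return tree


def show(node):
    if isinstance(node, Relation):
        return node.name
    if isinstance(node, Product):
        return f"{show(node.left)} × {show(node.right)}"
    return f"σ_{{{node.cond}}}({show(node.child)})"


student = Relation("Student", frozenset({"Student.Sno", "Student.Sname"}))
sc = Relation("SC", frozenset({"SC.Sno", "SC.Cno"}))
conjuncts = [
    ("Student.Sno=SC.Sno", frozenset({"Student.Sno", "SC.Sno"})),
    ("SC.Cno='2'", frozenset({"SC.Cno"})),
]
optimized = optimize(conjuncts, Product(student, sc))
print(show(optimized))
```

On the Student and SC query of Example 3, the condition on SC.Cno sinks to the SC leaf, while the join condition stays directly above the Cartesian product, where heuristic rule 4 would turn the pair into a join.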
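[Example 5] Query: retrieve the student number and name of every female student who takes the course named MATH. The relational algebra expression for this query is:
π_{S#,SNAME}(σ_{CNAME='MATH' ∧ SEX='F'}(C ⋈ SC ⋈ S))
Expressing the ⋈ operator in terms of π, σ, and × gives:
π_{S#,SNAME}(σ_{CNAME='MATH' ∧ SEX='F'}(π_{L}(σ_{C.C#=SC.C# ∧ SC.S#=S.S#}(C × SC × S))))
Here L is the set of all attributes of C, SC, and S, with duplicate attributes removed.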