relational database query processing steps
Four steps to query processing:
- Query analysis
- Query check
- Query optimization
- Query execution
1. Query analysis
Lexical and grammatical analysis is the main analysis.
First, the entire SQL statement is scanned, according to the data in the database data marker, identify the SQL keyword, relationship name and attribute name and other phrases, check whether there is a lexical error. After the completion of the check statement is arranged in accordance with the rules of grammar, if qualified to enter the next step, or direct error.
2. Query check
The check mainly includes semantic check and integrity check.
The basic analysis flow of SQL is similar to the compiler's analysis process, except for the extra SQL special checks.
In semantic analysis, the "word" in the lexical analysis is checked according to the database object name, relationship name, attribute name and view, procedure and function defined by the data dictionary, and the existence and validity of the word is examined. At the same time, for view operations, you need to convert the operation of the views to the operation of the base table.
The integrity check is mainly to check the user's permission ability, if the user does not have permission or insufficient permissions to refuse to execute the statement.
The SQL statement is converted to the corresponding relational algebra expression after the query check.
Instance:
Search the student number and grade of the course number C2
SQL statements:
SELECT Sno,grade from WHERE CNO='C2'
Relational algebra Expressions:
Πsno, GRADE (σcno='C2'
3. Query optimization
Each query will have execution policy and operation algorithm, query optimization is based on the database and SQL statements to choose the best way to query. There are two general query optimizations: Algebraic optimization and physical optimization.
4. Query execution
The optimizer generates a query execution plan based on the optimization strategy, and the code generator executes the plan compilation and execution and returns the results.
The flowchart for the entire process is as follows:
How relational databases are optimized
- Algebraic optimization
- Physical optimization
Algebraic optimization
After executing the query check, the SQL statement gets the query tree, which is the relational algebraic expression, and the algebraic optimization is the equivalent transformation relation algebra expression, which can improve the query efficiency.
Principles of algebraic optimization
- Make selection and projection first
- Preprocessing connection Operations
- Projection combined with binocular operation
- Choose to combine the Cartesian product
- Find common sub-expressions
1. Make selection and projection first
In the case of SQL statement query, you must make a selection operation, determine the operand, if the SQL statement also has a projection operation together, the projection operation can also be seen as a choice operation, in order to avoid repeated scanning SQL statements, projection and selection at the same time.
2. Preprocessing connection operations
Connection operations are very expensive, and you can index or sort the data in order to speed up the connection operation.
3. Projection combined with binocular operation
The projection operation combines the previous or subsequent binocular elements without the need to scan the relationship in order to remove some field two.
An operation that is applied to a relationship when the monocular operation. Binocular operation is the operation of two relationship time.
Selection, projection is the monocular operator, and the connection and Cartesian product are binocular operators.
4. Select Combine Cartesian operations
Combining the selection operation with the Cartesian operation, the connection operation (especially the equivalent connection) is much more time-saving than the Cartesian product.
5. Find common sub-expressions
It is calculated that everyone will use the result of the expression and reduce the repetition calculation.
3rd and 4 plainly is to use a single-mesh operation to combine the binocular operation, the single-mesh operation has a choice and projection, the choice is based on conditions to select the option to match the criteria, the projection is the property of the filter data. The projection can be combined with a join or a Cartesian product, and the two tables are first synthesized into a table in the filter properties. The selection only needs to be combined with the Cartesian product, since the connection is the result of the selection and does not need to be selected.
Relational algebra optimization Steps
- Constructing a query tree
- Rule Transformation Query Tree
1. Construct the query tree
Transform the keywords in the SQL statement into relational algebra operators, and the following are the corresponding relationships:
where-Select Operation-σ
select-projection Operation-∏
from-Cartesian product-χ
Then constructs the tree structure, the leaf node expresses the relation, the internal node algebra operation. The query operation is done from the bottom up.
Physical optimization
Algebraic optimization is only to change the order and combination of operations in the query statement, not involving the underlying access path, the small database can also be used, but for large databases also need data access optimization, so the development of physical optimization, select efficient and reasonable operation algorithm and access path.
Three optimization algorithms for physical optimization:
- Rule-based heuristic optimization
- Optimization based on cost estimation
- Optimization of both combinations
1. Rule-based heuristic operation
- Select action Heuristic rule
- Heuristic rules for connection operations
Heuristic rules for selecting actions:
For small relationships, use a full-table sequential scan, even if there is an index on the selection column
For large relationships, heuristic rules are:
- For queries where the selection criteria is the main code = value
Query results are at most one tuple, you can select the main code index
A generic RDBMS will automatically establish a master code index.
- For queries where the selection condition is non-primary property = value, and the selection column has an index
The number of tuples to estimate query results, if the scale is small (<10%) you can use the Index scan method
Otherwise, use full-table sequential scanning
- For selection criteria is a non-equivalent query on a property or a range query, and the selection column has an index
The number of tuples to estimate query results, if the scale is small (<10%) you can use the Index scan method
Otherwise, use full-table sequential scanning
- For conjunction selection criteria connected with and
If you have a composite index that involves these attributes, take precedence over the combined Index scan method
If you have a generic index on some properties, you can have the first four methods of scanning
Otherwise, full-table sequential scanning is used.
- For disjunction selection criteria connected with or
Generally use full-table sequential scanning
The heuristic-based optimization method is simple to implement and the optimization cost is small, which is suitable for interpreting the execution system. (links to interpreting execution and compilation executions and compilers: http://lavasoft.blog.51cto.com/62575/187229)
However, the system for compiling execution is not applicable.
Optimization based on cost estimation
The cost-based optimization method is closely related to the state of the database to calculate the execution cost of various operation algorithms.
The statistics stored in the data dictionary are required by the optimizer:
1. For each basic table:
Total number of tuples in the table (N)
Tuple Length (l)
Number of blocks occupied (B)
Number of overflow blocks occupied (BO)
2. Basic columns for each table
The number of different values in the column (m)
Selection rate (f)
If the distribution of the different values is uniform, the f=1/m
If the distribution of different values is uneven, the selection rate for each value = the number of tuples with that value/n
The maximum value of the column
The minimum value of the column
Whether an index has been established on this column
Index type (b + Tree index, hash index, clustered index)
3. Index to indexes (e.g. B + Tree index)
Number of layers indexed (L)
Number of different index values
Index selection cardinality s (with S tuples with an index value)
Leaf node number of index (Y)
Database principle-relational database query and optimization