Database principle-relational database query and optimization

Source: Internet
Author: User

relational database query processing steps

Four steps to query processing:

    1. Query analysis
    2. Query check
    3. Query optimization
    4. Query execution

1. Query analysis

Lexical and grammatical analysis is the main analysis.

First, the entire SQL statement is scanned, according to the data in the database data marker, identify the SQL keyword, relationship name and attribute name and other phrases, check whether there is a lexical error. After the completion of the check statement is arranged in accordance with the rules of grammar, if qualified to enter the next step, or direct error.

2. Query check

The check mainly includes semantic check and integrity check.

The basic analysis flow of SQL is similar to the compiler's analysis process, except for the extra SQL special checks.

In semantic analysis, the "word" in the lexical analysis is checked according to the database object name, relationship name, attribute name and view, procedure and function defined by the data dictionary, and the existence and validity of the word is examined. At the same time, for view operations, you need to convert the operation of the views to the operation of the base table.

The integrity check is mainly to check the user's permission ability, if the user does not have permission or insufficient permissions to refuse to execute the statement.

The SQL statement is converted to the corresponding relational algebra expression after the query check.

Instance:

Search the student number and grade of the course number C2

SQL statements:

SELECT Sno,grade     from  WHERE CNO='C2'

Relational algebra Expressions:

Πsno, GRADE (σcno='C2'

3. Query optimization

Each query will have execution policy and operation algorithm, query optimization is based on the database and SQL statements to choose the best way to query. There are two general query optimizations: Algebraic optimization and physical optimization.

4. Query execution

The optimizer generates a query execution plan based on the optimization strategy, and the code generator executes the plan compilation and execution and returns the results.

The flowchart for the entire process is as follows:

How relational databases are optimized
    1. Algebraic optimization
    2. Physical optimization

Algebraic optimization

After executing the query check, the SQL statement gets the query tree, which is the relational algebraic expression, and the algebraic optimization is the equivalent transformation relation algebra expression, which can improve the query efficiency.

Principles of algebraic optimization

    1. Make selection and projection first
    2. Preprocessing connection Operations
    3. Projection combined with binocular operation
    4. Choose to combine the Cartesian product
    5. Find common sub-expressions

1. Make selection and projection first

In the case of SQL statement query, you must make a selection operation, determine the operand, if the SQL statement also has a projection operation together, the projection operation can also be seen as a choice operation, in order to avoid repeated scanning SQL statements, projection and selection at the same time.

2. Preprocessing connection operations

Connection operations are very expensive, and you can index or sort the data in order to speed up the connection operation.

3. Projection combined with binocular operation

The projection operation combines the previous or subsequent binocular elements without the need to scan the relationship in order to remove some field two.

An operation that is applied to a relationship when the monocular operation. Binocular operation is the operation of two relationship time.

Selection, projection is the monocular operator, and the connection and Cartesian product are binocular operators.

4. Select Combine Cartesian operations

Combining the selection operation with the Cartesian operation, the connection operation (especially the equivalent connection) is much more time-saving than the Cartesian product.

5. Find common sub-expressions

It is calculated that everyone will use the result of the expression and reduce the repetition calculation.

3rd and 4 plainly is to use a single-mesh operation to combine the binocular operation, the single-mesh operation has a choice and projection, the choice is based on conditions to select the option to match the criteria, the projection is the property of the filter data. The projection can be combined with a join or a Cartesian product, and the two tables are first synthesized into a table in the filter properties. The selection only needs to be combined with the Cartesian product, since the connection is the result of the selection and does not need to be selected.

Relational algebra optimization Steps

    1. Constructing a query tree
    2. Rule Transformation Query Tree

1. Construct the query tree

Transform the keywords in the SQL statement into relational algebra operators, and the following are the corresponding relationships:

where-Select Operation-σ

select-projection Operation-∏

from-Cartesian product-χ

Then constructs the tree structure, the leaf node expresses the relation, the internal node algebra operation. The query operation is done from the bottom up.

Physical optimization

Algebraic optimization is only to change the order and combination of operations in the query statement, not involving the underlying access path, the small database can also be used, but for large databases also need data access optimization, so the development of physical optimization, select efficient and reasonable operation algorithm and access path.

Three optimization algorithms for physical optimization:

    1. Rule-based heuristic optimization
    2. Optimization based on cost estimation
    3. Optimization of both combinations

1. Rule-based heuristic operation

    • Select action Heuristic rule
    • Heuristic rules for connection operations

Heuristic rules for selecting actions:

For small relationships, use a full-table sequential scan, even if there is an index on the selection column
For large relationships, heuristic rules are:

    • For queries where the selection criteria is the main code = value

Query results are at most one tuple, you can select the main code index
A generic RDBMS will automatically establish a master code index.

    • For queries where the selection condition is non-primary property = value, and the selection column has an index

The number of tuples to estimate query results, if the scale is small (<10%) you can use the Index scan method
Otherwise, use full-table sequential scanning

    • For selection criteria is a non-equivalent query on a property or a range query, and the selection column has an index

The number of tuples to estimate query results, if the scale is small (<10%) you can use the Index scan method
Otherwise, use full-table sequential scanning

    • For conjunction selection criteria connected with and

If you have a composite index that involves these attributes, take precedence over the combined Index scan method
If you have a generic index on some properties, you can have the first four methods of scanning
Otherwise, full-table sequential scanning is used.

    • For disjunction selection criteria connected with or

Generally use full-table sequential scanning

The heuristic-based optimization method is simple to implement and the optimization cost is small, which is suitable for interpreting the execution system. (links to interpreting execution and compilation executions and compilers: http://lavasoft.blog.51cto.com/62575/187229)

However, the system for compiling execution is not applicable.

Optimization based on cost estimation

The cost-based optimization method is closely related to the state of the database to calculate the execution cost of various operation algorithms.

The statistics stored in the data dictionary are required by the optimizer:
1. For each basic table:
Total number of tuples in the table (N)
Tuple Length (l)
Number of blocks occupied (B)
Number of overflow blocks occupied (BO)

2. Basic columns for each table

The number of different values in the column (m) 

Selection rate (f)
If the distribution of the different values is uniform, the f=1/m
If the distribution of different values is uneven, the selection rate for each value = the number of tuples with that value/n
The maximum value of the column
The minimum value of the column
Whether an index has been established on this column
Index type (b + Tree index, hash index, clustered index)
3. Index to indexes (e.g. B + Tree index)
Number of layers indexed (L)
Number of different index values
Index selection cardinality s (with S tuples with an index value)
Leaf node number of index (Y)

Database principle-relational database query and optimization

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.