SQL Optimization: Logical Optimization: Subquery Optimization (MySQL)
1) Subquery concept: When a query statement is nested inside another query statement, the inner query is called a subquery.
A subquery can appear in the following positions:
A) Target column position: If a subquery appears in the target columns (the SELECT list), it must be a scalar subquery. Otherwise, the database may return an error similar to "subquery must return only one column".
B) FROM clause position: A subquery in the FROM clause cannot refer to other relations at the same query level, and the database may return an error similar to "subquery in FROM clause cannot refer to other relations of same query level". Therefore, correlated subqueries cannot appear in the FROM clause. Non-correlated subqueries can appear in the FROM clause, and such a subquery can be pulled up into the parent query; when multiple tables are joined, the optimizer then weighs the join cost and chooses the best plan.
C) WHERE clause position: A subquery in the WHERE clause is part of a conditional expression, and that expression consists of operators and operands. Different data types support different operators. For INT operands, for example, the operators ">", "<", "=", and "<>" place requirements on the subquery (an INT equality comparison requires the subquery to be a scalar subquery). In addition, a subquery in the WHERE clause can be the operand of a predicate such as IN, BETWEEN, or EXISTS.
D) JOIN/ON clause position: A JOIN/ON clause can be split into two parts: the JOIN part, which is similar to a FROM clause, and the ON part, which is similar to a WHERE clause. Both parts can contain subqueries, and those subqueries are processed in the same way as subqueries in the FROM and WHERE clauses, respectively.
E) GROUP BY clause position: The target columns must be consistent with the GROUP BY columns. A subquery can be written in the GROUP BY clause, but it has little practical use there.
F) ORDER BY clause position: A subquery can be written in the ORDER BY clause. However, ORDER BY applies to the result of the entire SQL statement, so a subquery there is of little use.
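A minimal sketch of the three most common positions (target list, FROM, WHERE), run with Python's built-in sqlite3 module. SQLite is only a convenient stand-in for MySQL here (an assumption: dialects differ in details such as error messages), and the tables t1(a1, b1) and t2(a2, b2) and their rows are hypothetical.

```python
import sqlite3

# Hypothetical sample data; SQLite stands in for MySQL so the sketch runs anywhere.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t1 (a1 INTEGER, b1 INTEGER);
CREATE TABLE t2 (a2 INTEGER, b2 INTEGER);
INSERT INTO t1 VALUES (1, 10), (2, 20), (3, 30);
INSERT INTO t2 VALUES (1, 100), (3, 300), (5, 500);
""")

# Target-list position: a scalar subquery (one row, one column).
target_list = conn.execute(
    "SELECT a1, (SELECT MAX(a2) FROM t2) AS max_a2 FROM t1 ORDER BY a1"
).fetchall()

# FROM position: a non-correlated derived table, joined like an ordinary table.
from_clause = conn.execute(
    "SELECT t1.a1, v.a2 FROM t1, (SELECT a2 FROM t2 WHERE a2 > 1) AS v "
    "WHERE t1.a1 = v.a2"
).fetchall()

# WHERE position: the subquery is the right operand of the IN predicate.
where_clause = conn.execute(
    "SELECT a1 FROM t1 WHERE a1 IN (SELECT a2 FROM t2) ORDER BY a1"
).fetchall()

print(target_list)   # [(1, 5), (2, 5), (3, 5)]
print(from_clause)   # [(3, 3)]
print(where_clause)  # [(1,), (3,)]
```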
2) Subquery categories
By the relationship between the subquery and the outer query:
A) Correlated subqueries.
The execution of a correlated subquery depends on attribute values of the outer parent query: the subquery takes parameters from the parent query, and whenever those parameter values change, the subquery must be re-executed with the new values. This is why optimizing correlated subqueries matters so much to the query optimizer. For example:
SELECT * FROM t1 WHERE col_1 = ANY
(SELECT col_1 FROM t2 WHERE t2.col_2 = t1.col_2);
/* The subquery references column col_2 of table t1 from the parent query */
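To make the "re-executed for each new parameter value" point concrete, here is a minimal sketch that evaluates the correlated example naively: the inner query runs once per outer row, and the result matches what the database returns. It assumes SQLite as a stand-in for MySQL; SQLite does not support "= ANY", so the equivalent IN form is used. The column values are hypothetical.

```python
import sqlite3

# Hypothetical sample data; SQLite stands in for MySQL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t1 (col_1 INTEGER, col_2 INTEGER);
CREATE TABLE t2 (col_1 INTEGER, col_2 INTEGER);
INSERT INTO t1 VALUES (1, 7), (2, 8), (3, 9);
INSERT INTO t2 VALUES (1, 7), (2, 9), (3, 9);
""")

# "col_1 = ANY (subquery)" written as the equivalent "col_1 IN (subquery)".
sql_rows = conn.execute(
    "SELECT * FROM t1 WHERE col_1 IN "
    "(SELECT col_1 FROM t2 WHERE t2.col_2 = t1.col_2) ORDER BY col_1"
).fetchall()

# Naive evaluation: the inner query is re-executed once per outer row,
# because its parameter (t1.col_2) changes from row to row.
inner_executions = 0
naive_rows = []
for col_1, col_2 in conn.execute("SELECT * FROM t1 ORDER BY col_1").fetchall():
    inner_executions += 1
    inner = {r[0] for r in conn.execute(
        "SELECT col_1 FROM t2 WHERE t2.col_2 = ?", (col_2,))}
    if col_1 in inner:
        naive_rows.append((col_1, col_2))

print(sql_rows)          # [(1, 7), (3, 9)]
print(inner_executions)  # 3
```

The per-row re-execution is exactly the cost that unnesting (described later) tries to remove.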
B) Non-correlated subqueries.
The execution of a non-correlated subquery does not depend on any attribute value of the outer parent query. Such a subquery is independent: it can be solved on its own, with its own plan, before the outer query is executed. For example:
SELECT * FROM t1 WHERE col_1 = ANY
(SELECT col_1 FROM t2 WHERE t2.col_2 = 10);
/* The subquery (on t2) references nothing from the parent query (t1) */
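Because the non-correlated subquery has no outer references, its result can be computed once and reused for every outer row. A minimal sketch, again assuming SQLite as a stand-in and using IN in place of "= ANY"; the data is hypothetical.

```python
import sqlite3

# Hypothetical sample data; SQLite stands in for MySQL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t1 (col_1 INTEGER, col_2 INTEGER);
CREATE TABLE t2 (col_1 INTEGER, col_2 INTEGER);
INSERT INTO t1 VALUES (1, 7), (2, 8), (3, 9);
INSERT INTO t2 VALUES (1, 10), (3, 10), (4, 11);
""")

sql_rows = conn.execute(
    "SELECT * FROM t1 WHERE col_1 IN "
    "(SELECT col_1 FROM t2 WHERE t2.col_2 = 10) ORDER BY col_1"
).fetchall()

# Solve the independent subquery once, up front, then filter the outer rows
# against the cached result set.
inner = {r[0] for r in conn.execute("SELECT col_1 FROM t2 WHERE t2.col_2 = 10")}
once_rows = [row for row in conn.execute("SELECT * FROM t1 ORDER BY col_1")
             if row[0] in inner]

print(sql_rows)  # [(1, 7), (3, 9)]
```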
By the predicate involved:
A) [NOT] IN/ALL/ANY/SOME subquery.
The semantics follow the predicate names, meaning "[not] in / all / any / some". The left side is an operand and the right side is a subquery. This is one of the most common subquery types.
B) [NOT] EXISTS subquery.
Semi-join semantics, meaning "[not] exists". There is no left operand; the right side is a subquery. This is also one of the most common subquery types.
C) Other subqueries.
All subqueries except the preceding two types.
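A minimal sketch contrasting the two predicate forms on hypothetical data, with SQLite standing in for MySQL: IN takes a left operand, while EXISTS has none and moves the comparison into the subquery. Both have semi-join behavior, so the duplicate value 2 in t2 does not duplicate any t1 row.

```python
import sqlite3

# Hypothetical sample data; t2 deliberately contains the value 2 twice.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t1 (a1 INTEGER);
CREATE TABLE t2 (a2 INTEGER);
INSERT INTO t1 VALUES (1), (2), (3), (4);
INSERT INTO t2 VALUES (2), (2), (4);
""")

# [NOT] IN: left operand a1, right side a subquery.
in_rows = conn.execute(
    "SELECT a1 FROM t1 WHERE a1 IN (SELECT a2 FROM t2) ORDER BY a1").fetchall()

# [NOT] EXISTS: no left operand; the comparison moves inside the subquery.
exists_rows = conn.execute(
    "SELECT a1 FROM t1 WHERE EXISTS "
    "(SELECT 1 FROM t2 WHERE t2.a2 = t1.a1) ORDER BY a1").fetchall()
not_exists_rows = conn.execute(
    "SELECT a1 FROM t1 WHERE NOT EXISTS "
    "(SELECT 1 FROM t2 WHERE t2.a2 = t1.a1) ORDER BY a1").fetchall()

print(in_rows)          # [(2,), (4,)]
print(exists_rows)      # [(2,), (4,)]
print(not_exists_rows)  # [(1,), (3,)]
```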
By the complexity of the statement:
A) SPJ subquery.
Queries composed of selection, join, and projection operations.
B) GROUP BY subquery.
An SPJ subquery with grouping and aggregation operations added.
C) Other subqueries.
A GROUP BY subquery with further clauses added, such as Top-N, LIMIT/OFFSET, set operations, or sorting.
The last two types are sometimes called non-SPJ subqueries.
By the shape of the result:
A) Scalar subquery.
The result returned by the subquery is a single simple value.
B) Single-row, single-column subquery.
The result set contains zero or one tuple with a single column. It is similar to a scalar subquery, but it may return zero tuples.
C) Multi-row, single-column subquery.
The result set contains multiple tuples, each with only one simple column.
D) Table subquery.
The result set is a table (multiple rows and multiple columns).
3) Subquery optimization methods
A) Subquery merging (Subquery Coalescing)
Under certain conditions (semantic equivalence: the two query blocks produce the same result set), multiple subqueries can be merged into one subquery. The merged result is still a subquery, which other techniques can later eliminate. Merging reduces multiple table scans and joins to a single table scan and a single join, for example:
SELECT * FROM t1 WHERE a1 <10 AND (
EXISTS (SELECT a2 FROM t2 WHERE t2.a2 <5 AND t2.b2 = 1) OR
EXISTS (SELECT a2 FROM t2 WHERE t2.a2 <5 AND t2.b2 = 2)
);
Optimization:
SELECT * FROM t1 WHERE a1 <10 AND (
EXISTS (SELECT a2 FROM t2 WHERE t2.a2 <5 AND (t2.b2 = 1 OR t2.b2 = 2))
/* The two EXISTS clauses are merged into one, and their conditions are merged as well */
);
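A minimal sketch that checks the coalescing rewrite on a few hypothetical versions of t2, assuming SQLite as a stand-in for MySQL. The helper name compare is illustrative only; it returns the results of the original and the merged query so they can be compared.

```python
import sqlite3

ORIGINAL = """
SELECT * FROM t1 WHERE a1 < 10 AND (
  EXISTS (SELECT a2 FROM t2 WHERE t2.a2 < 5 AND t2.b2 = 1) OR
  EXISTS (SELECT a2 FROM t2 WHERE t2.a2 < 5 AND t2.b2 = 2)
) ORDER BY a1"""

COALESCED = """
SELECT * FROM t1 WHERE a1 < 10 AND (
  EXISTS (SELECT a2 FROM t2 WHERE t2.a2 < 5 AND (t2.b2 = 1 OR t2.b2 = 2))
) ORDER BY a1"""


def compare(t2_rows):
    """Return (original result, coalesced result) for a hypothetical t2."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t1 (a1 INTEGER)")
    conn.execute("CREATE TABLE t2 (a2 INTEGER, b2 INTEGER)")
    conn.executemany("INSERT INTO t1 VALUES (?)", [(1,), (5,), (12,)])
    conn.executemany("INSERT INTO t2 VALUES (?, ?)", t2_rows)
    return (conn.execute(ORIGINAL).fetchall(),
            conn.execute(COALESCED).fetchall())


cases = {
    "b2 = 2 matches": [(3, 2), (7, 1)],
    "b2 = 1 matches": [(4, 1)],
    "nothing matches": [(7, 1), (4, 3)],
}
for name, rows in cases.items():
    before, after = compare(rows)
    assert before == after, name
    print(name, after)
```

Agreement on a handful of cases is evidence, not proof; the equivalence itself follows from the fact that EXISTS(A) OR EXISTS(B) holds exactly when some t2 row satisfies A's condition or B's condition.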
B) Subquery unnesting (Subquery Unnesting)
Also called subquery flattening or subquery pull-up. Certain subqueries are moved into the outer parent query and treated as join relations alongside the parent query's own tables. In essence, these subqueries are rewritten as equivalent multi-table join operations: after unnesting, the subquery no longer exists and the outer query becomes a multi-table join. The advantage is that the optimizer can make effective use of the relevant access paths, join methods, and join orders, and the number of nesting levels in the query is minimized.
Common cases include converting IN/ANY/SOME/ALL/EXISTS subqueries into a SEMI-JOIN and eliminating ordinary subqueries in the FROM clause, for example:
SELECT * FROM t1, (SELECT * FROM t2 WHERE t2.a2> 10) v_t2
WHERE t1.a1 <10 AND v_t2.a2 <20;
Optimization:
SELECT * FROM t1, t2 WHERE t1.a1 <10 AND t2.a2 <20 AND t2.a2 > 10;
/* The subquery becomes a join between tables t1 and t2, which amounts to pulling table t2 up one level out of the subquery */
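A minimal sketch confirming that the nested and the unnested forms of this example return the same rows on hypothetical data, with SQLite standing in for MySQL.

```python
import sqlite3

# Hypothetical sample data; SQLite stands in for MySQL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t1 (a1 INTEGER);
CREATE TABLE t2 (a2 INTEGER);
INSERT INTO t1 VALUES (1), (9), (15);
INSERT INTO t2 VALUES (5), (12), (18), (25);
""")

# Derived table v_t2 in the FROM clause (before unnesting).
nested = conn.execute(
    "SELECT * FROM t1, (SELECT * FROM t2 WHERE t2.a2 > 10) v_t2 "
    "WHERE t1.a1 < 10 AND v_t2.a2 < 20 ORDER BY a1, a2").fetchall()

# t2 pulled up into the outer FROM; its filter joins the outer WHERE via AND.
unnested = conn.execute(
    "SELECT * FROM t1, t2 WHERE t1.a1 < 10 AND t2.a2 < 20 AND t2.a2 > 10 "
    "ORDER BY a1, a2").fetchall()

print(unnested)  # [(1, 12), (1, 18), (9, 12), (9, 18)]
```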
Conditions for unnesting a subquery:
A) If the subquery contains aggregation, GROUP BY, or DISTINCT, it can only be solved independently and cannot be pulled up to the outer level.
B) If the subquery is a simple query (SPJ form), it can be pulled up to the outer level, which often improves query efficiency. Subquery pull-up is discussed for this form, and this form is the scope that the pull-up technique handles.
A subquery can be pulled up into the outer query only if the result after pull-up (unnesting) contains no redundant tuples. To unnest a subquery, follow these rules:
A) If the outer query's results contain no duplicates (that is, its SELECT clause includes the primary key), its subqueries can be unnested, and DISTINCT should be added to the SELECT clause of the unnested query.
B) If the SELECT clause of the outer query already has DISTINCT, the subquery can be unnested directly.
C) If the inner query's results contain no duplicate tuples, the subquery can be unnested.
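The duplicate-tuple rules can be seen on a small hypothetical example, with SQLite standing in for MySQL. t2 contains the value 10 twice, so a naive join repeats the matching t1 rows; because the outer SELECT includes the primary key id, adding DISTINCT (rule A) removes exactly those repeats.

```python
import sqlite3

# Hypothetical sample data; t2 holds the value 10 twice.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t1 (id INTEGER PRIMARY KEY, a1 INTEGER);
CREATE TABLE t2 (a2 INTEGER);
INSERT INTO t1 VALUES (1, 10), (2, 20), (3, 10);
INSERT INTO t2 VALUES (10), (10), (30);
""")

subquery = conn.execute(
    "SELECT id, a1 FROM t1 WHERE a1 IN (SELECT a2 FROM t2) ORDER BY id").fetchall()

# Naive unnesting: each matching t1 row joins with both t2 rows holding 10.
naive_join = conn.execute(
    "SELECT t1.id, t1.a1 FROM t1, t2 WHERE t1.a1 = t2.a2 ORDER BY t1.id").fetchall()

# Rule A: the SELECT list contains the primary key, so DISTINCT removes only
# the duplicates introduced by the join.
distinct_join = conn.execute(
    "SELECT DISTINCT t1.id, t1.a1 FROM t1, t2 WHERE t1.a1 = t2.a2 "
    "ORDER BY t1.id").fetchall()

# Without the key, DISTINCT would also collapse legitimate duplicates:
# the original returns 10 twice (ids 1 and 3), the rewrite only once.
sub_no_key = conn.execute(
    "SELECT a1 FROM t1 WHERE a1 IN (SELECT a2 FROM t2)").fetchall()
join_no_key = conn.execute(
    "SELECT DISTINCT t1.a1 FROM t1, t2 WHERE t1.a1 = t2.a2").fetchall()

print(subquery)       # [(1, 10), (3, 10)]
print(naive_join)     # [(1, 10), (1, 10), (3, 10), (3, 10)]
print(distinct_join)  # [(1, 10), (3, 10)]
```

The last two queries show why rule A requires the outer results to be duplicate-free before DISTINCT can safely be added.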
To unnest a subquery, follow these steps:
A) Merge the FROM clause of the subquery and the FROM clause of the outer query into a single FROM clause, and adjust the affected references accordingly.
B) Change the subquery's predicate accordingly (for example, "IN" becomes "=").
C) Combine the subquery's WHERE condition, as a whole, with the outer query's WHERE condition using AND, so that the newly generated predicate keeps the same meaning in context as the original predicate and the combined condition forms a single whole.
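A minimal sketch of the three steps applied to one hypothetical IN subquery, with SQLite standing in for MySQL. Here t2.a2 is declared UNIQUE, so the inner result has no duplicate tuples (rule C) and no DISTINCT is needed; the table and column names are illustrative only.

```python
import sqlite3

# Hypothetical schema and data; t2.a2 is UNIQUE, so rule C applies.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t1 (id INTEGER PRIMARY KEY, a1 INTEGER);
CREATE TABLE t2 (a2 INTEGER UNIQUE, b2 INTEGER);
INSERT INTO t1 VALUES (1, 10), (2, 20), (3, 30), (4, 10);
INSERT INTO t2 VALUES (10, 1), (20, 2), (30, 1);
""")

BEFORE = """
SELECT t1.id, t1.a1 FROM t1
WHERE t1.a1 IN (SELECT t2.a2 FROM t2 WHERE t2.b2 = 1)
ORDER BY t1.id"""

AFTER = """
SELECT t1.id, t1.a1
FROM t1, t2                 -- step A: one FROM clause for both tables
WHERE t1.a1 = t2.a2         -- step B: IN becomes =
  AND t2.b2 = 1             -- step C: inner WHERE condition joined with AND
ORDER BY t1.id"""

before_rows = conn.execute(BEFORE).fetchall()
after_rows = conn.execute(AFTER).fetchall()

print(after_rows)  # [(1, 10), (3, 30), (4, 10)]
```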
C) Aggregate subquery elimination (Aggregate Subquery Elimination)
Some systems support eliminating scalar aggregate subqueries. For example:
SELECT * FROM t1 WHERE t1.a1 > (SELECT AVG(t2.a2) FROM t2);
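One simple way such a subquery can be removed, sketched under assumptions: because this scalar aggregate is non-correlated, it can be evaluated once and bound into the outer query as a constant. This is an illustration of the idea only, not necessarily the technique MySQL or the book uses; SQLite stands in for MySQL, and the data is hypothetical.

```python
import sqlite3

# Hypothetical sample data; SQLite stands in for MySQL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t1 (a1 REAL);
CREATE TABLE t2 (a2 REAL);
INSERT INTO t1 VALUES (1), (4), (6), (9);
INSERT INTO t2 VALUES (2), (4), (6);
""")

with_subquery = conn.execute(
    "SELECT * FROM t1 WHERE t1.a1 > (SELECT AVG(t2.a2) FROM t2) ORDER BY a1"
).fetchall()

# Evaluate the non-correlated aggregate once, then bind its value as a
# constant so the outer query no longer contains a subquery.
(avg_a2,) = conn.execute("SELECT AVG(t2.a2) FROM t2").fetchone()
with_constant = conn.execute(
    "SELECT * FROM t1 WHERE t1.a1 > ? ORDER BY a1", (avg_a2,)).fetchall()

print(avg_a2)         # 4.0
print(with_constant)  # [(6.0,), (9.0,)]
```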
From the book "The Art of Database Query Optimizer"