1) Subquery concept: When a query is a sub-part of another query, it is called a subquery (query statements are nested in query statements).
The sub-queries appear in the following locations:
A) Destination column location: subquery If it is in the destination column, it can only be a scalar subquery, otherwise the database may return a hint like "error: subquery must return only one field".
b) FROM clause position: The related subquery appears in the FROM clause, and the database may return a hint like "a subquery in the FROM clause cannot refer to a relationship in the same query level", so the related subquery cannot appear in the FROM clause , a non-correlated subquery appears in the FROM clause, which can be queried to the parent layer, and the connection cost is considered to be the preferred factor in multi-table joins.
c)where clause position: The subquery that appears in the WHERE clause is part of a conditional expression that can be decomposed into operators and operands, and operators vary according to the different data types of the participating operations, such as the int type has ">, <, =, <> "And so on, this sub-query all have certain requirements (such as an int type of equivalent operation, requires subquery must be a scalar subquery). In addition, subqueries appear in the WHERE clause in the format, and some actions specified by the predicate, such as in, between, exists, and so on.
d)join/on clause position: The JOIN/ON clause can be split into two parts, one is the JOIN block is similar to the FROM clause, and the ON clause block is similar to the WHERE clause, both of which can appear subqueries. The subquery is handled in the same way as the FROM clause and the WHERE clause.
e)GROUPBY clause position: The destination column must be associated with the GROUPBY 1. Subqueries can be written at groupby locations, but subqueries are not useful in groupby.
f) Place by clause: The subquery can be written at the place of the search by. However, the operation is done on the entire SQL statement, the sub-query used in the line is not practical significance.
2) Sub-query classification
Look from the relationship between objects:
A) related sub-queries.
The execution of a subquery relies on some property values of the outer parent query. The subquery relies on the parameters of the parent query, and when the parameters of the parent query change, the subquery needs to be re-executed according to the new parameter values (the query optimizer has some meaning in optimizing the correlated subquery), such as:
SELECT * from t1 WHERE col_1 = any
(SELECT col_1 from t2 WHERE t2.col_2 = t1.col_2);
/* col_2 column for the T1 table of the parent query in the subquery statement */
b) Non-correlated sub-queries.
The execution of the subquery does not depend on any of the property values of the outer parent query. This way the query is independent and can be solved on its own, forming a subquery plan before the outer layer of the query solver, such as:
SELECT * from t1 WHERE col_1 = any
(SELECT col_1 from t2 WHERE t2.col_2 = 10);
The property of the parent query (T1) does not exist in the subquery statement (T2)
See from specific predicates:
a) [NOT] in/all/any/some subquery.
Semantically similar, meaning "[inverse] existence/all/any/any", the left is the operand, the right is a subquery, is one of the most common sub-query types.
b) [NOT] exists subquery.
The semi-join semantics, which means "[Reverse] exists", has no left operand, and the right side is a subquery, and is one of the most common subquery types.
c) Other sub-queries.
All sub-queries except for both of the above.
In terms of the complexity of the statement's composition:
A) SPJ sub-query.
A query consisting of selection, connection, and projection operations.
b) GroupBy sub-query.
SPJ A query that consists of a grouping, aggregation operation.
c) Other sub-queries.
GroupBy the subquery with other clauses such as Top-n, Limit/offset, collection, sorting, and so on.
The latter two seed queries are sometimes collectively referred to as non-SPJ subqueries.
From a result point of view:
A) scalar quantum query.
The result set type returned by the subquery is a simple value.
b) Single-row sub-query.
The result set type returned by the subquery is 0 or one unit group. Similar to the scalar subquery, but may return 0 tuples.
c) Multiline single-row subquery.
The result set type returned by a subquery is a multi-tuple but has only one simple column.
d) Table sub-query.
The result set type returned by a subquery is a table (multiple rows and columns).
3) Optimization method of sub-query
A) Sub-query merging (subquery coalescing)
Under certain conditions (semantic equivalence: Two query blocks produce the same result set), multiple subqueries can be combined into a subquery (merged or subquery, which can be eliminated later by other techniques). This allows multiple table scans, multiple connections to be reduced to a single table scan and a single connection, such as:
SELECT * from T1 WHERE a1<10 and (
EXISTS (SELECT A2 from T2 WHERE t2.a2<5 and T2.b2=1) OR
EXISTS (SELECT A2 from T2 WHERE t2.a2<5 and t2.b2=2)
);
Can be optimized to:
SELECT * from T1 WHERE a1<10 and (
EXISTS (SELECT A2 from T2 WHERE t2.a2<5 and(t2.b2=1 OR t2.b2=2)
/* Two esists clauses are merged into one, the conditions are also merged */
);
b) Sub-query expansion (subquery unnesting)
Also called sub-query anti-nesting, also known as sub-query pull-up. To put some sub-queries in the parent query of the outer layer, as the connection relationship with the outer parent query, the essence is to rewrite some subqueries as equivalent multi-table JOIN operation (after expansion, the subquery does not exist, the external query becomes a multi-table connection). The benefit is that the access path, the connection method, and the connection order may be used effectively, minimizing the level of query statements as much as possible.
Common In/any/some/all/exists-based scenarios such as semi-join (SEMI JOIN), normal-type subquery elimination, and so on, are in this class, such as:
SELECT * from T1, (SELECT * from T2 WHERE t2.a2 >10)v_t2
WHERE t1.a1<10 and v_t2.a2<20;
Can be optimized to:
SELECT * from T1, T2 WHERE t1.a1<10 and t2.a2<20 andt2.a2 >10;
/* The subquery becomes a JOIN operation for the T1, T2 table, which is equivalent to pulling the T2 table from the subquery. */
Criteria for sub-query expansion:
A) If the aggregation, GROUPBY, and DISTINCT clauses appear in the subquery, the subquery can only be solved separately, not above the outer layer.
b) If the subquery is just a simple format (SPJ format) query statement, you can pull up the subquery to the outer layer, which can often improve the query efficiency. Sub-query pull-up, the discussion is this format, which is also the sub-query expansion of the scope of technical processing.
The query is pulled up to the upper level query, if the result of pull (expand) can not bring extra tuples, so the sub-query expansion needs to follow the following rules:
A) If the result of the upper query is not duplicated (that is, the main code is included in the SELECT clause), you can expand its subquery. And the distinct flag should be added before the SELECT clause of the expanded query.
b) If there is a distinct flag in the SELECT statement for the upper query, the subquery can be expanded directly.
If the inner query result does not have a repeating tuple, you can expand it.
Specific steps for sub-query expansion:
A) concatenate the FROM clause of the subquery and the outer query into the same FROM clause, and modify the corresponding run parameters.
b) Modify the predicate symbol of the subquery accordingly (for example: "In" is modified to "=").
c) merges the Where condition of the subquery as a whole with the Where condition of the outer query, and joins the word with the and condition to ensure that the newly generated predicate is the same as the context of the old predicate and becomes a whole.
c) Aggregation subquery elimination (Aggregate subquery elimination)
in general, some systems support scalar-aggregate subquery elimination. such as:
SELECT * from t1 WHERE t1.a1>(select AVG (T2.A2) from T2);
A book from the Art of database query optimizer
SQL optimization--Logical optimization--sub-query optimization (MySQL)