Describes sub-query operations in MySQL and mysql queries in detail.
Continue with the following preparations:
Create a test database TestDB;
create database TestDB;
Create test tables table1 and table2;
CREATE TABLE table1 ( customer_id VARCHAR(10) NOT NULL, city VARCHAR(10) NOT NULL, PRIMARY KEY(customer_id) )ENGINE=INNODB DEFAULT CHARSET=UTF8; CREATE TABLE table2 ( order_id INT NOT NULL auto_increment, customer_id VARCHAR(10), PRIMARY KEY(order_id) )ENGINE=INNODB DEFAULT CHARSET=UTF8;
Insert test data;
INSERT INTO table1(customer_id,city) VALUES('163','hangzhou'); INSERT INTO table1(customer_id,city) VALUES('9you','shanghai'); INSERT INTO table1(customer_id,city) VALUES('tx','hangzhou'); INSERT INTO table1(customer_id,city) VALUES('baidu','hangzhou'); INSERT INTO table2(customer_id) VALUES('163'); INSERT INTO table2(customer_id) VALUES('163'); INSERT INTO table2(customer_id) VALUES('9you'); INSERT INTO table2(customer_id) VALUES('9you'); INSERT INTO table2(customer_id) VALUES('9you'); INSERT INTO table2(customer_id) VALUES('tx');
After the preparation is complete, table1 and table2 should look like the following:
mysql> select * from table1; +-------------+----------+ | customer_id | city | +-------------+----------+ | 163 | hangzhou | | 9you | shanghai | | baidu | hangzhou | | tx | hangzhou | +-------------+----------+ 4 rows in set (0.00 sec) mysql> select * from table2; +----------+-------------+ | order_id | customer_id | +----------+-------------+ | 1 | 163 | | 2 | 163 | | 3 | 9you | | 4 | 9you | | 5 | 9you | | 6 | tx | +----------+-------------+ 7 rows in set (0.00 sec)
The preparation work is almost done. Let's start with today's summary.
One problem
Now we need to query all the order numbers of all Hangzhou users. How can we write this SQL statement? First, you can write:
select table2.customer_id, table2.order_id from table2 join table1 on table1.customer_id=table2.customer_id where table1.city='hangzhou';
Can achieve the results we need. However, we can also write:
select customer_id, order_id from table2 where customer_id in (select customer_id from table1 where city='hangzhou');
Er? What is the select statement in brackets? The question is, what is the syntax and how the task can be completed, so this blog will focus on this issue.
What is a subquery?
In short, the subquery is:
As shown in, subqueries are called internal queries. External queries include internal queries. A subquery can contain any clause that can be included in a common select statement, such as distinct, group by, order by, limit, join, and union. However, an external query must be one of the following statements: select, insert, update, delete, set, or do.
We can use subqueries In the where and having clauses to take the results of subqueries as the conditions for judgment.
Subquery by comparison
A subquery returns a scalar (just a value), a row, a column, or a table. These subqueries are called scalar, row, column, and table subqueries.
When a subquery returns a scalar, we can use a comparison character in The where or having clause to directly judge the results of the subquery. For example, I want to get more mermer_id, city, and order number than the user's tx order number. How can I write this SQL statement.
Let's talk about the general steps for writing SQL:
- Understanding requirements;
- Obtain the number of mermer_id, city, and corresponding orders that are more than the number of user tx orders.
- Check the final fields to be obtained;
- Finally, you need to obtain the customer_id, city, and order number information.
- Analyze the tables involved in the field information;
- Table 1 and Table 2 are involved.
- How are these tables associated;
- The customer_id field is used to associate table 1 with table 2.
- Break down requirements and get small demands;
- Obtain the number of orders for the tx user;
- You need to obtain the number of orders of other users;
- Compare the number of orders.
- Confirm the filtering conditions for each small requirement;
- Obtain the result of each small requirement and assemble it to obtain the final result.
Finally, I will write the following SQL statement:
select table1.customer_id,city,count(order_id) from table1 join table2 on table1.customer_id=table2.customer_id where table1.customer_id <> 'tx'group by customer_id having count(order_id) > (select count(order_id) from table2 where customer_id='tx' group by customer_id);
Subqueries are used in the preceding query, and the results of external queries and subqueries are compared and judged. If a subquery returns a scalar value (only one value), you can use the following symbols for comparison and judgment in an external query: =,>, <,> =, <=, and <>; if the subquery does not return a scalar value, and the external query uses a comparison operator to compare the subquery results, an exception is thrown.
Use ANY for subqueries
The comparison operator is used for subqueries, which specifies that a subquery can return only one scalar value. However, what if a subquery returns one set?
No problem. We can use any, in, some, or all to perform conditional judgment on the returned results of the subquery. Here we will summarize the use of any for subqueries.
The any keyword must be used with the comparison operator summarized above; The any keyword means "TRUE is returned for any value in the column returned by the subquery if the comparison result is TRUE ".
For example, "10> any (11, 20, 2, 30)", because 10> 2, this result returns TRUE. If 10 is compared with any of the sets, if the value is TRUE, TRUE is returned.
For example, I want to query the id, city, and order quantity of users with more orders than tx mer_id is tx or 9you.
I can obtain the following SQL statements to meet the requirements.
select table1.customer_id,city,count(order_id)from table1 join table2on table1.customer_id=table2.customer_idwhere table1.customer_id<>'tx' and table1.customer_id<>'9you'group by customer_idhaving count(order_id) >any (select count(order_id)from table2where customer_id='tx' or customer_id='9you'group by customer_id);
The meaning of any is quite clear. literal translation is any one. If any of the conditions is met, TRUE is returned.
Use IN for subquery
Using in for subqueries, which we often encounter when writing SQL statements. In indicates whether a specified value is in this set. If so, TRUE is returned. Otherwise, FALSE is returned.
In is the alias of "= any". Where "= any" is used, we can use "in" to replace it. I will not give an example here. Let's take full advantage of your imagination.
With in, it is certainly not in; not in does not mean the same as <> any. not in and <> all mean the same thing, the following is a summary.
Use SOME for subquery
Some is the alias of any, which is rarely used. You only need to understand the meaning of any. Here we will not summarize it too much. For details, refer to the summary in the any section above.
Use ALL for subquery
All must be used with comparison operators. All indicates "TRUE is returned if the comparison result is TRUE for all values in the column returned by the subquery ".
For example, "10> all (2, 4, 5, 1)", returns TRUE because 10 is greater than all values in the set; if it is "10> all (20, 3, 2, 1, 4)", then, because 10 is less than 20, FALSE is returned.
<> The synonym for all is not in, indicating that it is not equal to all values in the set. This is easy to mix with <> any. It is good to leave more snacks at ordinary times.
Scalar Query
Subqueries can be divided into scalar queries and multi-value subqueries based on the number of returned values of subqueries. When a comparison character is used for subqueries, The subquery must be a scalar quantum query. If a multi-value subquery is used, an exception is thrown.
Multi-value subquery
The scalar query corresponds to a multi-value subquery. A multi-value subquery returns a column, a row, or a table to form a set. We generally use any, in, all, and some words to judge the results of external queries and subqueries. If any, in, all, and some are compared with scalar queries, null results are obtained.
Independent subquery
An independent subquery is a subquery that runs independently of an external query. What is dependency on external queries? First look at the following two SQL statements.
SQL statement 1: Obtain the order number of all hangzhou customers.
select order_id from table2 where customer_id in (select customer_id from table1 where city='hangzhou');
SQL statement 2: Obtain the user whose city is hangzhou and has an order.
select * from table1 where city='hangzhou' and exists (select * from table2 where table1.customer_id=table2.customer_id);
Although the two SQL statements above are not very appropriate, they are enough to illustrate the problem.
For SQL statement 1, we can copy the subquery separately and execute it separately, that is, the subquery has no relationship with the external query.
For SQL statement 2, we copy the subquery separately and cannot execute it separately. Because the subquery of SQL statement 2 depends on some fields of the external query, as a result, the subquery depends on the external query, and the correlation is generated.
For subqueries, efficiency is often taken into account. When we execute a select statement, we can add the explain keyword to view the query type, index used for query, and other information. For example:
explain select order_id from table2 where customer_id in (select customer_id from table1 where city='hangzhou');
Use an independent subquery. If the maximum number of traversal times for the set in the subquery is n and the maximum number of external queries is m, we can record it as O (m + n ). If a subquery is used, the traversal times may reach O (m + m * n ). As you can see, the efficiency will multiply. Therefore, when using subqueries, you must consider the relevance of subqueries.
For more explanations of explain, refer to here.
Related subqueries
Related subqueries are subqueries that use external query columns. That is, subqueries calculate each row of external queries. However, in MySQL, dynamic optimization is performed, which varies with the situation. Using related subqueries is the easiest place to show performance. The optimization of SQL statements is a very big topic. Only by accumulating practical experience can we better understand how to optimize SQL statements.
I can't talk about SQL Performance here. If I just read other people's articles to consider performance issues, I don't feel any way, we need a project to better understand it.
EXISTS predicates
EXISTS is a very good predicate that allows the database to efficiently check whether a specified query generates certain rows. Returns TRUE or FALSE based on whether the subquery returns rows. Unlike other predicates and logical expressions, EXISTS does not return UNKNOWN regardless of whether the input subquery returns rows. For EXISTS, UNKNOWN is FALSE. Or the above statement, the user who obtains the city hangzhou and has the order.
select * from table1 where city='hangzhou' and exists (select * from table2 where table1.customer_id=table2.customer_id);
Use explain to view the following information:
We can see that there is a related SUBQUERY (dependent subquery ). We can see that EXISTS and IN are very similar. What is the difference between them?
The main difference between IN and EXISTS lies IN the judgment of the Three-value logic. EXISTS always returns TRUE or FALSE. For IN, except for TRUE or FALSE values, it is possible to return UNKNOWN for NULL values. However, IN the filter, the UNKNOWN method is the same as FALSE. Therefore, the SQL optimizer selects the same execution plan as the IN method.
It is said that IN and EXISTS are almost the same, but you have to say not in and not exists. When the input list contains NULL values, the difference between not exists and not in is very big. When the input list contains NULL values, IN always returns TRUE and UNKNOWN, so not in will get not true and not unknown, that is, FALSE and UNKNOWN.
mysql> select 'c' NOT IN ('a', 'b', NULL)\G;
Run the above Code to check the result. You will be surprised.
Derived table
As mentioned above, a table may also be returned in the value returned by the subquery. If the virtual table returned by the subquery is used as the input of the FROM clause again, in this way, the virtual table of the subquery becomes a derived table. The syntax structure is as follows:
FROM (subquery expression) AS derived_table_alias
Because the derived table is a completely virtual table, it is neither physical nor physical.
Summary
The summary is almost the same. Of course, there are still a lot of sub-queries. It is impossible to finish the summary in an article. Here we just put some basic concepts, the common knowledge points are summarized. I have not covered the usage of subqueries in update, delete, and insert statements, which are roughly the same. If the knowledge is expanded, there will be no headers, and we still need to wait for the appropriate depth mining, but the depth should not exceed 2. How can we define this 2. Well, this article is so far. We will see it in the next article.