SQL has a very powerful and flexible way of querying, and multi-table join operations can often be replaced with subqueries, this article will cover all aspects of subqueries.
Introduction
A subquery is essentially a restricted SELECT statement that is nested into another select,update,insert,delete statement, in which only the following clauses can be used
- Select clause (required)
- FROM clause (required)
- WHERE clause (optional)
- GROUP by (optional)
- Have (optional)
- Order BY (available only when the top keyword is used)
Subqueries can also be nested in other subqueries, which can be up to 32 layers in a nested set. A subquery is also called an internal query (Inner query) or an internal selection (Inner Select), and a query statement that contains subqueries is also called an external query (outter) or an external selection (Outer Select), the concept of subqueries can be simply explained:
is a subquery that is used as a data source.
In general, a subquery can be divided into three types according to the type of data returned by the subquery, respectively:
- Returns a data table (table)
- Returns a list of values (column)
- Returns a single value (Scalar)
Below, we follow these three ways to illustrate sub-query
subqueries are used as data sources
When a subquery is used after the FROM clause of an external query, the subquery is used as a data source , even if the subquery returns only a single value (Scalar) or a column value (column), which can still be considered a special A data source , a two-dimensional data table (table). A subquery that is used as a data source is much like a view , except that the subquery only exists temporarily and is not included in the database.
For example, this statement:
SELECT P.productid, P.name, P.productnumber, m.name as Productmodelnamefrom production.product as P INNER JOIN ( SELECT Name, productmodelid from Production.ProductModel) as M on p.productmodelid = M.productmodelid
The subquery statement above ProductModel a subset of M in the table as a data source (table) and a product table for internal connections. The results are as follows:
Using as a data source is also the simplest application for subqueries. Of course, when a subquery is used as a data source, it is also divided into correlated subqueries and unrelated subqueries , which are described later in the article.
subqueries are used as selection criteria
subqueries, which are the criteria for selection, are also the most complex of sub-query applications.
Subqueries that are selection criteria are those that return only a single column (column) , which, if used as a selection criterion, can be thought of as a column of only one row , even if it returns only one value . Like what:
In the AdventureWorks:
I would like to get a total of 68 hours of sick days for employees :
SELECT [FirstName] , [MiddleName] , [LastName] from [adventureworks].[ Person]. [Contact] WHERE ContactID in (SELECT EmployeeID from [adventureworks].[ HumanResources]. [Employee] WHERE sickleavehours>68)
The results are as follows:
In the above query, the subquery that follows the IN keyword returns a column of values to use as the selection criteria for the external query .
Similarly, the not-in keyword, which is logically reversed from the In keyword, is no longer described here
However, it is important to emphasize that you should not use the in and not keyword, which can cause a lot of potential problems, and this article has a good explanation for this question: http://wiki.lessthandot.com/index.php/Subquery_typo_with_using_in. The point of this article is to never use the in and not keyword again, my point is that it is reasonable to be present, and I think it is only possible to use in and not in when inside is a fixed value, for example:
SELECT [FirstName] , [MiddleName] , [LastName] from [adventureworks].[ Person]. [Contact] WHERE ContactID in (25,33)
Only in this case, using the in and not keywords is safe, in other cases it is best to use the Exists,not exists,join keyword instead. In addition to in, the keywords used to select criteria have any and all, and the two keywords are the same as their literal meanings. and "<", ">", "=" connection used, such as the above with in the subquery:
I'd like to get an employee who has a total of sick days greater than 68 hours
The equivalent query with the ANY keyword is:
SELECT [FirstName] , [MiddleName] , [LastName] from [adventureworks].[ Person]. [Contact] WHERE ContactID =any (SELECT EmployeeID from [adventureworks].[ HumanResources]. [Employee] WHERE sickleavehours>68)
When used as the any and all keywords in a subquery, the effect is as follows
=any |
and in equivalence |
<>all |
And not in equivalence |
>any |
Greater than the smallest (>min) |
<any |
Less than the maximum (<max) |
>all |
Greater than the maximum (>max) |
<all |
Less than the smallest (<min) |
=all |
Below say |
The =all keyword is seldom used, and the effect in a subquery is equal to "=" If there is only one return value, and if there are multiple return values, the result is empty
It is important to note that SQL is a very flexible language, just as the effects of a subquery can be implemented using join (with the same effect, with different ideas), the effects of any and all can be replaced in other ways, as shown in the table above, >any and > Min is fully equivalent, such as the following two query statements are completely equivalent:
Select *from AdventureWorks.HumanResources.EmployeeWHERE sickleavehours>any (select Sickleavehours from AdventureWorks.HumanResources.Employee WHERE sickleavehours>68)
Select *from AdventureWorks.HumanResources.EmployeeWHERE sickleavehours> (select MIN (sickleavehours) from AdventureWorks.HumanResources.Employee WHERE sickleavehours>68)
correlated subqueries and EXISTS keywords
The query mentioned earlier is irrelevant subquery (uncorrelated subquery), there is a kind of important query in the subquery is the related subquery (correlated subquery), also known as a repeating subquery , for example, Or the above query, with the related subquery to write:
I would like to get a total of 68 days of sick days for employees :
SELECT [FirstName] , [MiddleName] , [LastName] from [adventureworks].[ Person]. [Contact] C WHERE EXISTS (SELECT * from [adventureworks].[ HumanResources]. [Employee] e WHERE C.contactid=e.contactid and e.sickleavehours>68)
The results are the same as the query results that use the IN keyword:
How do you differentiate between correlated subqueries and unrelated subqueries ? The simplest way is to see whether the subquery itself can be executed, such as the subquery in the example above:
(SELECT * from [AdventureWorks]. [HumanResources]. [Employee] e WHERE C.contactid=e.contactid and e.sickleavehours>68)
The execution of this sentence itself will be an error. Because this sentence refers to a table of external queries
For unrelated subqueries, the entire query is executed only once for the subquery, and then to an external query, such as:
SELECT *from AdventureWorks.HumanResources.EmployeeWHERE Sickleavehours>anysqlresult
The above unrelated subquery, the entire query process can be considered as a subquery first return sqlresult (SQL result set), and then to external query use, the entire process subquery only once
Conversely, as a correlated subquery , the number of times the subquery executes depends on the external query, and the outer query executes one row at a time, such as:
Or the above example: I would like to get an employee who has a total of sick days greater than 68 days
SELECT [FirstName] , [MiddleName] , [LastName] from [adventureworks].[ Person]. [Contact] C WHERE EXISTS (SELECT * from [adventureworks].[ HumanResources]. [Employee] e WHERE C.contactid=e.contactid and e.sickleavehours>68)----step 1:select [FirstName] , [ MiddleName] , [LastName] from [adventureworks].[ Person]. [Contact] C WHERE EXISTS (SELECT * from [adventureworks].[ HumanResources]. [Employee] e WHERE 1=e.contactid and e.sickleavehours>68)----step 2:select [FirstName] , [MiddleName] , [LastName] from [AdventureWorks]. [Person]. [Contact] C WHERE EXISTS (SELECT * from [adventureworks].[ HumanResources]. [Employee] e WHERE 2=e.contactid and e.sickleavehours>68)----step n:select [FirstName] , [MiddleName] , [LastName] from [AdventureWorks]. [Person]. [Contact] C WHERE EXISTS (SELECT * from [adventureworks].[ HumanResources]. [Employee] e WHERE N=e.contactid and e.sickleavehours>68)
As shown in the code above. The related subquery above actually executes n times (n depends on the number of rows of the outer query), each row of the outer query passes the parameters used by the corresponding row to the subquery, and returns TRUE if the subquery has a corresponding value (both the current row is selected and displayed in the result), or false if not. Then repeat the next line.
subqueries are used as computed columns
When a subquery is used as a computed column , only a single value (Scalar) is returned. Used as a computed column after the SELECT statement. Also divided into correlated sub-queries and unrelated subqueries
Examples of correlated subqueries such as: I want to get the name of each product and the total sales
Select [Name], (select COUNT (*) from AdventureWorks.Sales.SalesOrderDetail S WHERE s.productid=p.productid) As Salesamountfrom [AdventureWorks]. [Production]. [Product] P
Some of the results are as follows:
When a subquery is used as a computed column , a unique value is returned for each row of the outer query .
Similarly, the SQL subquery can use other statements to achieve the same effect, the above statement and the following statement to achieve the same effect:
SELECT P.name,count (S.productid) from [AdventureWorks]. [Production]. [Product] P left joins AdventureWorks.Sales.SalesOrderDetail SON S.productid=p.productidgroup by P.name
When a subquery is used as a computed column and is not an unrelated subquery, it is returned only once but a value, which is no longer elaborated here.
Summary
This article elaborates subqueries through three different uses of subqueries. At the same time, all sub-queries can be divided into correlated subqueries and unrelated subqueries, and the functions implemented by subqueries can be implemented using joins or other methods. But a good writer should be a rich vocabulary, not just to express what he means. Learning a variety of SQL queries is the only way to learn SQL queries.