Stairway to T-SQL: Beyond the Basics Level 3: Building a Correlated Subquery
Gregory Larsen, 2014/03/05
Original link:
http://www.sqlservercentral.com/articles/Stairway+Series/105972/
The series
This article is part of the Stairway Series: Stairway to T-SQL: Beyond the Basics
Following on from his Stairway to T-SQL DML, Gregory Larsen covers more advanced aspects of the T-SQL language, such as subqueries.
In Level 2 of this Stairway, I discussed how to use a subquery in a Transact-SQL statement. This level expands on the subquery topic by discussing a type of subquery known as a correlated subquery. I'll explore what a correlated subquery is and how it differs from a normal subquery. In addition, I'll provide a number of examples of Transact-SQL statements that go beyond the basics and use a correlated subquery to help identify the rows returned in the result set to meet complex business requirements.
What Is a Correlated Subquery?
In Level 2 of this Stairway, we learned that a normal subquery is just a SELECT statement inside another Transact-SQL statement, where the subquery returns its results independently of the outer query. A correlated subquery is a form of subquery that cannot be run independently of the outer query, because it references one or more columns from the outer query. A correlated subquery, just like a normal subquery, is sometimes referred to as an inner query. If the correlated subquery (inner query) is run on its own, without the outer query, it returns an error. Because the execution of the inner query depends on values from the outer query, it is called a correlated subquery.
A correlated subquery may be executed many times. It is run once for every candidate row selected in the outer query. The column values of each candidate row supply the values for the outer query columns referenced inside the correlated subquery on each execution. The final result of a statement that contains a correlated subquery is based on the results of each execution of the correlated subquery.
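The difference is easiest to see side by side. The following is a minimal sketch that uses two small table variables, @Orders and @OrderLines; both are hypothetical, exist only for this illustration, and are not part of any sample database.

```sql
-- Hypothetical tables and rows, for illustration only.
DECLARE @Orders TABLE (OrderID int PRIMARY KEY, CustomerID int);
DECLARE @OrderLines TABLE (OrderID int, ProductName varchar(20));

INSERT INTO @Orders VALUES (1, 100), (2, 200);
INSERT INTO @OrderLines VALUES (1, 'Bolt'), (1, 'Nut'), (1, 'Washer'), (2, 'Bolt');

-- Normal subquery: it references nothing from the outer query,
-- so the inner SELECT could be run on its own.
SELECT O.OrderID
FROM @Orders AS O
WHERE O.OrderID IN (SELECT L.OrderID
                    FROM @OrderLines AS L
                    WHERE L.ProductName = 'Nut')
ORDER BY O.OrderID;

-- Correlated subquery: O.OrderID comes from the outer query, so the
-- inner COUNT(*) is evaluated for each candidate row of @Orders.
SELECT O.OrderID,
       (SELECT COUNT(*)
        FROM @OrderLines AS L
        WHERE L.OrderID = O.OrderID) AS LineCount
FROM @Orders AS O
ORDER BY O.OrderID;
```

With these sample rows, the first query returns OrderID 1, and the second returns a LineCount of 3 for order 1 and 1 for order 2.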
Sample Data for the Correlated Subquery Examples
To demonstrate how to use a correlated subquery, I need some test data. Rather than create my own test data, all of my examples use the AdventureWorks2008R2 database. If you want to follow along and run my examples in your environment, you can download the AdventureWorks2008R2 database from here: http://msftdbprodsamples.codeplex.com/releases/view/93587
Example of a Correlated Subquery in a WHERE Clause
To demonstrate using a correlated subquery in a WHERE clause, suppose I want to identify the CustomerIDs that have purchased more than 70 items in a single order. To meet this requirement, I can run the code in Listing 1.
Listing 1: Correlated subquery in a WHERE clause
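A minimal sketch of a query of this shape follows. It assumes the AdventureWorks2008R2 tables Sales.SalesOrderHeader and Sales.SalesOrderDetail and treats each order detail line as one item. The alias OH is only an illustrative choice.

```sql
-- Assumes the AdventureWorks2008R2 sample database is the current database.
SELECT OH.CustomerID
FROM Sales.SalesOrderHeader AS OH
WHERE (SELECT COUNT(*)
       FROM Sales.SalesOrderDetail AS OD
       -- OH.SalesOrderID is supplied by the outer query for each candidate row.
       WHERE OD.SalesOrderID = OH.SalesOrderID) > 70;
```

The comparison against 70 is made once per order header row, using the detail-line count that the inner query computes for that particular SalesOrderID.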
When I run the code in Listing 1, I get the output shown in Report 1.
Report 1: Results returned when running the code in Listing 1
If you review the code in Listing 1, you will see that I constrained the WHERE clause by using a correlated subquery. The subquery is the code within the parentheses; I have extracted that correlated subquery code from Listing 1 and placed it in Listing 2.
Listing 2: The subquery code from Listing 1
If I run the code in Listing 2 on its own, I get the error shown in Report 2.
Report 2: Error when running the code in Listing 2
I get the error shown in Report 2 because the correlated subquery contains a reference to the column SalesOrderID, which is a column from the outer query. Because every correlated subquery references one or more columns from the outer query, you cannot run a correlated subquery independently of its outer query. The fact that you cannot run the subquery independently of the complete Transact-SQL statement is what distinguishes a correlated subquery from a normal subquery.
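To see the failure for yourself, pull the inner query out and run it alone. The sketch below assumes the AdventureWorks2008R2 table Sales.SalesOrderDetail and a hypothetical outer alias OH. SQL Server normally rejects the first batch with error 4104 because nothing in the batch defines OH. The GO separator is a client-side batch separator (SSMS or sqlcmd), used here so the second statement still runs.

```sql
-- Batch 1: fails, because OH is defined only by the outer query,
-- which is no longer part of the statement.
-- Typical error: Msg 4104, The multi-part identifier "OH.SalesOrderID" could not be bound.
SELECT COUNT(*)
FROM Sales.SalesOrderDetail
WHERE SalesOrderID = OH.SalesOrderID;
GO

-- Batch 2: replacing the outer reference with a value turns it into a
-- normal query that runs on its own.
DECLARE @SalesOrderID int = 43659;  -- illustrative order number; use any existing one
SELECT COUNT(*)
FROM Sales.SalesOrderDetail
WHERE SalesOrderID = @SalesOrderID;
```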
The example given here is a very simple use of a correlated subquery in a WHERE clause. Hopefully, with such a simple example, it is easy to understand the difference between a normal subquery and a correlated subquery. In practice, a correlated subquery can be considerably more complex. Also keep in mind that there may be other ways to meet your business requirements without using a correlated subquery.
As you can see, writing a correlated subquery is very similar to writing a normal subquery, except that you cannot run the correlated subquery independently of the outer query.
Example of a Correlated Subquery in a HAVING Clause
Sometimes you might want to constrain a HAVING clause by different values from the outer query. This is when you can use a correlated subquery in your HAVING clause. Suppose you have to write a query that calculates rebate amounts for those customers that purchased more than $150,000 worth of products, before taxes, in the year 2008. The code in Listing 3 calculates the rebate amount for those valued customers by using a correlated subquery in the HAVING clause.
Listing 3: Correlated subquery in a HAVING clause
When I run the code in Listing 3, I get the results shown in Report 3.
Report 3: Results from running Listing 3
The correlated subquery in Listing 3 uses the CustomerID from the GROUP BY clause of the outer query. The correlated subquery is executed once for each row returned from the GROUP BY clause. This allows the HAVING clause to calculate the total value of products sold to each CustomerID by summing the SubTotal column of every SalesOrderHeader record associated with the CustomerID from the outer query. The Transact-SQL statement in Listing 3 returns only the rows where a CustomerID has purchased more than $150,000 worth of products.
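A hedged sketch of such a statement follows. It assumes the AdventureWorks2008R2 table Sales.SalesOrderHeader, uses SubTotal as the pre-tax amount, and introduces a hypothetical @RebateRate of 10 percent, because the text does not state the rate. The aliases Outer_H and Inner_H are illustrative.

```sql
-- Assumes the AdventureWorks2008R2 sample database is the current database.
DECLARE @RebateRate decimal(4,2) = 0.10;  -- hypothetical rate, not given in the text

SELECT Outer_H.CustomerID,
       SUM(Outer_H.SubTotal)               AS TotalPurchase,
       SUM(Outer_H.SubTotal) * @RebateRate AS Rebate
FROM Sales.SalesOrderHeader AS Outer_H
WHERE Outer_H.OrderDate >= '20080101'
  AND Outer_H.OrderDate <  '20090101'
GROUP BY Outer_H.CustomerID
-- The inner query is correlated on the grouped column Outer_H.CustomerID,
-- so it is evaluated once per customer group.
HAVING (SELECT SUM(Inner_H.SubTotal)
        FROM Sales.SalesOrderHeader AS Inner_H
        WHERE Inner_H.CustomerID = Outer_H.CustomerID
          AND Inner_H.OrderDate >= '20080101'
          AND Inner_H.OrderDate <  '20090101') > 150000
ORDER BY Rebate DESC;
```

The date range is written as a half-open interval rather than with YEAR(OrderDate) so that an index on OrderDate, if one exists, remains usable.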
Example of an UPDATE Statement That Contains a Correlated Subquery
Correlated subqueries are not limited to returning a result set with a SELECT statement. You can also use them to update data in a SQL Server table. To demonstrate this, I first use the code in Listing 4 to generate some test data in a tempdb table.
Listing 4: Code to create and populate the test table
The code in Listing 4 creates a CarInventory table and then populates it with 8 rows that represent the cars currently in inventory.
The sales manager periodically uses the query in Listing 5 to view her InvoicePriceRatio.
Listing 5: InvoicePriceRatio query
When the manager runs this query, she notices that a number of similar cars with the same invoice amount have different InvoicePriceRatio values. To maximize her invoice price ratio, she asks her IT support staff to write a query that updates the StickerPrice of all her cars, so that every car with the same CarName value has the same InvoicePriceRatio. She wants the IT staff to set StickerPrice to the maximum StickerPrice for that CarName. This way, all cars with the same CarName value will have the same StickerPrice value. To perform this update of the CarInventory table, the IT staff runs the Transact-SQL statement in Listing 6, which contains a correlated subquery.
Listing 6: Correlated subquery to update CarInventory to the maximum sticker price
The code in Listing 6 uses the CarName from the outer query in the correlated subquery to identify the maximum StickerPrice for each unique CarName. The maximum StickerPrice value found by the correlated subquery is then used to update the StickerPrice value of every CarInventory record with the same CarName.
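The update described above can be sketched end to end as follows. The temporary table #CarInventory, its column types, and all four rows are invented for this illustration. Only the column names CarName, StickerPrice, and InvoicePrice follow the text.

```sql
-- Hypothetical test table and rows; the values are invented.
CREATE TABLE #CarInventory (
    ID           int IDENTITY(1,1) PRIMARY KEY,
    CarName      varchar(50)  NOT NULL,
    StickerPrice decimal(9,2) NOT NULL,
    InvoicePrice decimal(9,2) NOT NULL
);

INSERT INTO #CarInventory (CarName, StickerPrice, InvoicePrice)
VALUES ('Explorer', 46000.00, 38000.00),
       ('Explorer', 47500.00, 38000.00),
       ('Focus',    22000.00, 19000.00),
       ('Focus',    23100.00, 19000.00);

-- For each row being updated (Outer_CI), the correlated subquery finds the
-- highest StickerPrice among rows that share the same CarName.
UPDATE Outer_CI
SET StickerPrice = (SELECT MAX(Inner_CI.StickerPrice)
                    FROM #CarInventory AS Inner_CI
                    WHERE Inner_CI.CarName = Outer_CI.CarName)
FROM #CarInventory AS Outer_CI;

SELECT CarName, StickerPrice, InvoicePrice
FROM #CarInventory
ORDER BY ID;

DROP TABLE #CarInventory;
```

With these invented rows, both Explorer rows end up with a StickerPrice of 47500.00 and both Focus rows with 23100.00.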
Performance Considerations for Correlated Subqueries
There are some performance considerations you should be aware of when writing Transact-SQL statements that contain correlated subqueries. Performance is fine when the outer query contains a small number of rows, but the approach does not scale well when the outer query contains a large number of rows. This is because the correlated subquery needs to be executed for every candidate row in the outer query. Therefore, as the outer query returns more and more candidate rows, the correlated subquery has to be executed more and more times, and the Transact-SQL statement takes longer to run. If you find that a Transact-SQL statement with a correlated subquery is not meeting your performance requirements, look for an alternative, such as a query that uses an INNER JOIN or OUTER JOIN operation, or one that returns fewer candidate rows from the outer query.
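As one possible join-based alternative, the "more than 70 items in a single order" requirement can be expressed with a join and GROUP BY instead of a correlated subquery. This is a sketch that assumes the AdventureWorks2008R2 tables Sales.SalesOrderHeader and Sales.SalesOrderDetail. Whether it actually runs faster depends on the plan the optimizer chooses, so compare the execution plans on your own system.

```sql
-- Assumes the AdventureWorks2008R2 sample database is the current database.
SELECT OH.CustomerID
FROM Sales.SalesOrderHeader AS OH
INNER JOIN Sales.SalesOrderDetail AS OD
    ON OD.SalesOrderID = OH.SalesOrderID
-- Grouping by SalesOrderID keeps one output row per qualifying order,
-- so a customer with several such orders appears once for each.
GROUP BY OH.SalesOrderID, OH.CustomerID
HAVING COUNT(*) > 70;
```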
Summary
A correlated subquery is an inner query that references one or more columns from the outer query. The correlated subquery is executed once for every candidate row of the outer query. Because the correlated subquery references columns from the outer query, it cannot be run independently of the outer query. Correlated subqueries have their place, although they do not scale well from a performance standpoint when the outer query identifies a large number of candidate rows.
Questions and Answers
In this section, you can review how well you have understood the concept of correlated subqueries by answering the following questions.
Question 1:
When writing a correlated subquery, you need to have ___________________. (Fill in the blank)
A. One or more columns from the inner query that are used to constrain the results of the correlated subquery.
B. One or more columns from the inner query that are used in the select list of the correlated subquery.
C. One or more columns from the outer query that are used to constrain the results of the correlated subquery.
D. One or more columns from the outer query that are used in the select list of the correlated subquery.
Question 2:
Select all the statements that are true about a correlated subquery.
A. As the number of candidate rows increases, the performance of a Transact-SQL statement that contains a correlated subquery improves.
B. The correlated subquery is executed once for every candidate row from the outer query.
C. The correlated subquery references one or more columns from the inner query.
D. When a correlated subquery is used in a HAVING clause, the inner query is executed once for every candidate row returned by the GROUP BY clause.
Question 3:
A correlated subquery is just like a normal subquery, and a correlated subquery can be run independently of the entire Transact-SQL statement (True or False).
A. True
B. False
Answers:
Question 1:
The correct answer is C. A correlated subquery needs to use one or more columns from the outer query in the correlated subquery statement. When the correlated subquery is executed, these outer column references are replaced with the values from each candidate row.
Question 2:
The correct answers are B and D. A is incorrect because as the number of candidate rows increases, the correlated subquery is executed more times and the performance of the Transact-SQL statement gets worse. C is incorrect because a correlated subquery must reference one or more columns from the outer query, not the inner query.
Question 3:
The correct answer is B. If you try to run a correlated subquery independently of the complete Transact-SQL statement, the correlated subquery will fail.