Refer from:http://explainextended.com/2009/07/16/inner-join-vs-cross-apply/
INNER JOIN
Used construct in SQL: It joins and together, selecting only those row combinations for which A JOIN
condition is true.
This query:
SELECT *from table1join table2on table2.b = table1.a
Reads
For each row from table1
, select all rows from table2
where the value of field is equal to that b
fielda
Note that this condition can is rewritten as this:
SELECT *from table1, table2where table2.b = table1.a
, in which case it reads as following:
Make a set of all possible combinations of rows from and and of this table1
table2
set select only those rows where the Valu E of field is equal to that b
fielda
These conditions is worded differently, but they yield the same result and database systems is aware of that. Usually both these queries is optimized to use the same execution plan.
The former syntax is called ANSI syntax, and it's generally considered more readable and are recommended to use.
However, it didn ' t get into the Oracle until recently, that's why there is many hardcore Oracle Developers That is just used to the latter syntax.
Actually, it ' s a matter of taste.
To use JOIN
s (with whatever syntax), both sets is joining must be self-sufficient, I. E. The sets should not depend on all other. You can query both sets without ever knowing the contents on another set.
But for some tasks the sets is not self-sufficient. For instance, let ' s consider the following query:
We table table1
and table2
. Has table1
a column
called rowcount
.
For each row from table1
the We need to select first rowcount
rows from table2
, ordered bytable2.id
We cannot formulate a join condition here. The join condition, should it exists, would involve the row number, which is isn't present table2
in, and there is no-to Calculate a row number is only from the values of columns of any given row in table2
.
That ' s where the CROSS APPLY
can is used.
CROSS APPLY
is a Microsoft's extension to SQL, which were originally intended to being used with table-valued functions (T VF' s).
The query above would look like this:
Select *from table1cross APPLY (select TOP (table1.rowcount) *from table2order Byid) T2
For each from table1
, select first table1.rowcount
rows from table2
ordered byid
The sets here is not self-sufficient:the query uses values from table1
to define the second set, and not to with JOIN
it.
The exact contents of t2
known until the corresponding row from is table1
selected.
I previously said that there is no-to join these-sets, which is true as long as we consider the sets as is. However, we can change the second set a little so, we get a additional computed field we can later join on.
The first option to do, just count all preceding rows in a subquery:
Select *from table1 t1join (select t2o.*, (select COUNT (*) from table2 t2iwhere t2i.id <= t2o.id) as Rnfrom table2 t2o) t2on t2.rn <= t1.rowcount
The second option is to use a window function, also available in SQL Server since version 2005:
Select *from table1 t1join (select t2o.*, Row_number () over (ORDER by id) as Rnfrom table2 t2o) T2on t2.rn <= t1.rowcount
This functions returns the ordinal number a row would has being the ORDER BY
condition used in the function applied to the Whol e query.
This was essentially the same result as the subquery used in the previous query.
Now, let's create the sample tables and check all these solutions for efficiency:
SET NOCOUNT ongodrop table [20090716_cross].table1drop table [20090716_cross].table2drop SCHEMA [20090716_cross] Gocreate SCHEMA [20090716_cross]create table table1 (ID INT NOT NULL PRIMARY Key,row_count int. NOT NULL) CREATE TABLE table2 (ID INT NOT NULL PRIMARY key,value VARCHAR (a) NOT null) Gobegin transactiondeclare @cnt Intset @cnt = 1WHILE @cnt <= 100000BEGININSERTINTO [20090716_cross].table2 (ID, VA Lue) VALUES (@cnt, ' Value ' + CAST (@cnt as VARCHAR)) SET @cnt = @cnt + 1ENDINSERTINTO [20090716_cross].table1 (ID, R Ow_count) SELECT TOP 5id, id% 2 + 1FROM [20090716_cross].table2order Byidcommitgo
table2
Contains 100,000 rows with sequential id
s.
table1
Contains the following:
ID |
Row_count |
1 |
2 |
2 |
1 |
3 |
2 |
4 |
1 |
5 |
2 |
Now let's run the first query (with COUNT
):
Select *from [20090716_cross].table1 t1join (select t2o.*, (select COUNT (*) from [ 20090716_cross].table2 t2iwhere t2i.id <= t2o.id) as Rnfrom [20090716_cross].table2 t2o] T2ON t2.rn <= T1.row_countorder Byt1.id, t2.id
ID |
Row_count |
ID |
value |
RN |
1 |
2 |
1 |
Value 1 |
1 |
1 |
2 |
2 |
Value 2 |
2 |
2 |
1 |
1 |
Value 1 |
1 |
3 |
2 |
1 |
Value 1 |
1 |
3 |
2 |
2 |
Value 2 |
2 |
4 |
1 |
1 |
Value 1 |
1 |
5 |
2 |
1 |
Value 1 |
1 |
5 |
2 |
2 |
Value 2 |
2 |
8 rows fetched in 0.0000s (498.4063s) |
Table ' table1 '. Scan count 2, logical reads 200002, physical reads 0, Read-ahead reads 0, LOB logical reads 0, LOB physical reads 0, LOB R Ead-ahead reads 0. Table ' worktable '. Scan count 100000, logical reads 8389920, physical reads 0, Read-ahead reads 0, LOB logical reads 0, LOB physical reads 0, LOB Read-ahead reads 0. Table ' worktable '. Scan count 0, logical reads 0, physical reads 0, Read-ahead reads 0, LOB logical reads 0, LOB physical reads 0, LOB Read-a Head reads 0. Table ' table2 '. Scan Count 4, logical reads 1077, physical reads 0, Read-ahead reads 0, LOB logical reads 0, LOB physical reads 0, LOB Rea D-ahead reads 0. SQL Server execution times: CPU time = 947655 MS,
This query, as were expected, is very unoptimal. It runs for more than seconds.
Here's the query plan:
SELECT Sort Compute Scalar Parallelism (Gather Streams) Inner join (Nested Loops) Inner Join ( Nested Loops) Clustered Index Scan ([20090716_cross].[ Table2]) Compute Scalar Stream Aggregate Eager Spool Clustered Index Scan ([20090716_cross].[ Table2]) Clustered Index Scan ([20090716_cross].[ Table1])
For each row selected from table2
, it counts all previous rows again an again, never recording the intermediate result. The complexity of such an algorithm O(n^2)
are, that's why it takes so long.
Let ' s run he second query, which uses ROW_NUMBER()
:
Select *from [20090716_cross].table1 t1join (select t2o.*, Row_number () over (ORDER by id) as Rnfrom [20090716_cross].table2 t2o] t2on t2.rn <= t1.row_countorder byt1.id, t2.id
ID |
Row_count |
ID |
value |
RN |
1 |
2 |
1 |
Value 1 |
1 |
1 |
2 |
2 |
Value 2 |
2 |
2 |
1 |
1 |
Value 1 |
1 |
3 |
2 |
1 |
Value 1 |
1 |
3 |
2 |
2 |
Value 2 |
2 |
4 |
1 |
1 |
Value 1 |
1 |
5 |
2 |
1 |
Value 1 |
1 |
5 |
2 |
2 |
Value 2 |
2 |
8 rows fetched in 0.0006s (0.5781s) |
Table ' worktable '. Scan count 1, logical reads 214093, physical reads 0, Read-ahead reads 0, LOB logical reads 0, LOB physical reads 0, LOB R Ead-ahead reads 0. Table ' table2 '. Scan count 1, logical reads 522, physical reads 0, Read-ahead reads 0, LOB logical reads 0, LOB physical reads 0, LOB read -ahead reads 0. Table ' table1 '. Scan count 1, logical reads 2, physical reads 0, Read-ahead reads 0, LOB logical reads 0, LOB physical reads 0, LOB Read-a Head reads 0. SQL Server execution times: CPU time = 578 MS,
This was much faster, only 0.5 ms.
Let's look into the query plan:
SELECT Inner Join (Nested Loops) Clustered Index Scan ([20090716_cross].[ Table1]) Lazy Spool Sequence Project (Compute scalar) Compute scalar Segment Clustered Index Scan ([20090716_cross]. [Table2])
This was much better, since this query plan keeps the intermediate results while calculating the ROW_NUMBER
.
However, it still calculates s for all of ROW_NUMBER
100,000
rows table2
in, then puts them to a temporary index over rn
creat Ed Lazy Spool
by, and uses this index in a nested loop to range the rn
s for each row from table1
.
Calculating and indexing all ROW_NUMBER
s are quite expensive, that's why we see 214,093 Logical reads in the query stat Istics.
Finally, let ' s try a CROSS APPLY
:
Select *from [20090716_cross].table1 t1cross APPLY (SELECT TOP (t1.row_count) *from [20090716_ Cross].table2order Byid) T2order byt1.id, t2.id
ID |
Row_count |
ID |
value |
1 |
2 |
1 |
Value 1 |
1 |
2 |
2 |
Value 2 |
2 |
1 |
1 |
Value 1 |
3 |
2 |
1 |
Value 1 |
3 |
2 |
2 |
Value 2 |
4 |
1 |
1 |
Value 1 |
5 |
2 |
1 |
Value 1 |
5 |
2 |
2 |
Value 2 |
8 rows fetched in 0.0004s (0.0008s) |
Table ' table2 '. Scan Count 5, logical reads, physical reads 0, Read-ahead reads 0, LOB logical reads 0, LOB physical reads 0, LOB read- Ahead reads 0. Table ' table1 '. Scan count 1, logical reads 2, physical reads 0, Read-ahead reads 0, LOB logical reads 0, LOB physical reads 0, LOB Read-a Head reads 0. SQL Server execution times: CPU time = 0 ms,
This query is instant, as it should be.
The plan is quite simple:
SELECT Inner Join (Nested Loops) Clustered Index Scan ([20090716_cross].[ Table1]) Top Clustered Index Scan ([20090716_cross].[ Table2])
For the row from table1
, the It just takes first row_count
rows from table2
. And so fast.
Summary:
While most queries which employ CROSS APPLY
can is rewritten using an INNER JOIN
, CROSS APPLY
can yield better execution plan and better PE Rformance, since it can limit the set being joined yet before the join occurs.
INNER JOIN vs. Cross APPLY