SQL Server join method for querying records that are not in another table

Source: Internet
Author: User
Document directory
  • General Solution (low efficiency)
  • Efficiency Analysis
  • Use connection
  • Data Preparation
References

Http://hi.baidu.com/zdfgng/blog/item/dd5f88359a1cd0260b55a9ce.html

Question

How to query records that exist in Table A but do not exist in Table B. For ease of description, we assume that both table A and table B have only one field ID. The records in Table A are {1, 2, 3, 4, 5}, and the records in Table B are {2, 4 }, then we need to use an SQL query to obtain a result set such as {1, 3, 5.

General Solution (low efficiency)

The first thing that comes to mind when we see this question isNot inFor such keywords, the specific query statement is as follows:

select ta.* from ta where ta.id not in(select tb.id from tb)

The query result set of the preceding query statement is {1, 3, 5}. Run the preceding statement with navicat. The result is as follows:

Efficiency Analysis

However, after careful analysis, we can find that if Table B is very long, we need to use the fields in Table A to match every field in Table B to execute the preceding query statement, as a result, every field in Table A needs to traverse table B once, which is very inefficient.(As long as the field in Table A is not in Table B, it is necessary to traverse Table B. If the field in Table A is in Table B, as long as the field in Table B is traversed, it will exit and match the next field in table)

Use connection

Connection query is one of the most frequently used operations for SQL queries.Not inKeyword. Because we need to search for the content in Table A, we use table A to connect to table B on the left. In this way, table B will fill in null. The query statement is as follows:

select * from ta left join tb on ta.id=tb.id

The query results of the preceding query statement are as follows:

Because the IDs of fields in Table A and table B are the same, the ID field in Table B is changed to id1. After careful observation, we can find that the id1 field corresponding to the result set {1, 3, 5} is null. In this way, we can add a condition in the preceding query statement to complete the insertion of the result set in Table A but not in Table B. The query statement is as follows:

select * from ta left join tb on ta.id=tb.id where tb.id is null

Shows the query result:

However, we found that the above query results have two columns, namely the join query results of table A and Table B, but we only need the content in Table A, so we can slightly modify the above query:

select ta.* from ta left join tb on ta.id=tb.id where tb.id is null

Shows the query result:

The above are the query results we require.

Detailed description (PS: 2012-9-7) data preparation view code

Use TESTDB3--1. create Table, Heap Structure, Ta, and Big Table create table ta (id int); -- 2. create Table Tb (id int); -- 3. insert 10000 records to taset nocount on; godeclare @ I int; Set @ I = 1; while @ I <= 10000 begin insert into ta select @ I; set @ I = @ I + 1; end; go-4. insert a small amount of data into TB: insert into TB values (1); insert into TB values (111); insert into TB values (11); insert into TB values (11111111 ); insert into TB values (1222222 );

Looking at this question three months later, there are new discoveries. It was just half-known and half-understood before, after writing this blog on SQL Server join, I will basically understand the core of this question. The core is: What kind of join method should I use to query results.

The most intuitive tsql statement we can write is:

select ta.* from ta where ta.id not in(select tb.id from tb)

Then let's look at the query plan of this statement:

We can find that the nested loops join method is used, but we know that the use cases of the nested loop join method are as follows:It is suitable for joining two smaller result sets, or at least a smaller result set of outer table.The outer table above is ta, which is a large table, so it can be found that nested loop is not suitable. Note: although the preceding query statement does not contain the join field, join is still used.

If we use left join to write query statements, which join method does SQL Server choose? The test is as follows:

select ta.id from ta left join tb on ta.id=tb.id where tb.id is NULL--Hash Match

Shows the execution plan of the preceding query:

We can find that SQL Server helps us choose to useHash match. This is because in the above join, TA is a large table, and the data size gap between Ta and TB is large, and ta and TB are not indexed. From the totalsubtreecost of the execution plan, we can also see that totalsubtreecost = 0.12 using hash match, but totalsubtreecost = 1.03 using nested loop. It can be found that the hash match performance is much better than nest loop.

So what is the performance of using merge join? We recommend that SQL server use a specific connection method by using SQL hint and execute the following tsql statement:

select ta.id from ta left merge join tb on ta.id=tb.id where tb.id is NULL--Merge Join

Shows the execution plan:

We can see that:

  1. Because no index is available on the query column, the query results are not necessarily sorted, so SQL Server helps us sort the results.
  2. After sorting, the merge join operation is performed. The total query uses totalsubtreecost = 0.69, which is better than the nested loop and has poor performance than hash match.

Therefore, when answering the above question, we must describe the use of hash match instead of just giving the left join answer, SQL Server helps us analyze the optimal performance of hash match.

All query methods:

View code

select ta.* from ta where ta.id not in(select tb.id from tb)select ta.id from ta where ta.id not in(select tb.id from tb)--Nested Loopsselect ta.id from ta left loop join tb on ta.id=tb.id where tb.id is NULL--Nested Loopsselect ta.id from ta left merge join tb on ta.id=tb.id where tb.id is NULL--Merge Joinselect ta.id from ta left hash join tb on ta.id=tb.id where tb.id is NULL--Hash Match

 

 

 

Related Article

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.