SQL Server - Focus on INNER JOIN and IN performance analysis (14)

Source: Internet
Author: User

Preface

This section brings together knowledge from the earlier sections. Most tutorials and theory books tell us which construct performs well and which performs worse, but few get to the essence of the problem. Each article in this series is therefore kept short, but each one is written with care after consulting a lot of material. Short content, in-depth understanding, and always reviewing the basics.

Initial discussion of INNER JOIN and IN performance analysis

The first combined topic is a comparative analysis of INNER JOIN and IN. We start by creating tables and looking at INNER JOIN.

Create test table 1

CREATE TABLE Table1 (ID INT IDENTITY PRIMARY KEY, SomeColumn CHAR(4), Filler CHAR())

Insert test data

INSERT INTO Table1 (SomeColumn) VALUES (1), (2), (3), (4), (5)

Create test table 2 and insert data

USE TSQL2012
GO
CREATE TABLE Table2 (IntCol INT)
INSERT INTO Table2 (IntCol) VALUES (1), (2), (2), (3), (4), (5), (5)

Next we join test table 1 and test table 2 on SomeColumn and IntCol

SELECT * FROM Table1 b INNER JOIN Table2 s ON b.SomeColumn = s.IntCol

The join of the two test tables returns 7 rows, because test table 2 contains duplicate values and every one of them matches a row in test table 1. Now let's look at the IN query

SELECT * FROM Table1 WHERE SomeColumn IN (SELECT IntCol FROM Table2)

This returns 5 rows, so INNER JOIN and IN clearly differ. But if test table 2 contains no duplicate data and we do not need any of its columns, both queries return the same rows from test table 1. What is the performance difference in that case? Next we test with a large amount of data.
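The difference in row counts can be reproduced in isolation. The following is a minimal sketch, not part of the original article: #T1 and #T2 are hypothetical temporary tables holding the same values as the test tables, with SomeColumn declared as INT for simplicity (an assumption; the article uses CHAR(4)).

```sql
-- Hypothetical temp tables mirroring the article's test data.
CREATE TABLE #T1 (SomeColumn INT NOT NULL);
CREATE TABLE #T2 (IntCol INT NOT NULL);

INSERT INTO #T1 (SomeColumn) VALUES (1), (2), (3), (4), (5);
INSERT INTO #T2 (IntCol)     VALUES (1), (2), (2), (3), (4), (5), (5);

-- INNER JOIN: one output row per matching pair, so the duplicated
-- values 2 and 5 in #T2 each produce two rows (7 rows in total).
SELECT COUNT(*) AS JoinRows
FROM #T1 AS b
INNER JOIN #T2 AS s ON b.SomeColumn = s.IntCol;

-- IN: each #T1 row is returned at most once, no matter how many
-- matching rows exist in #T2 (5 rows in total).
SELECT COUNT(*) AS InRows
FROM #T1
WHERE SomeColumn IN (SELECT IntCol FROM #T2);

DROP TABLE #T1;
DROP TABLE #T2;
```

In other words, IN only asks whether a match exists, while INNER JOIN returns every matching pair.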

Create two test tables

CREATE TABLE BigTable (
    ID INT IDENTITY PRIMARY KEY,
    SomeColumn UNIQUEIDENTIFIER NOT NULL,
    Filler CHAR( )
)
CREATE TABLE SmallerTable (
    ID INT IDENTITY PRIMARY KEY,
    LookupColumn UNIQUEIDENTIFIER NOT NULL,
    SomeArbDate DATETIME DEFAULT GETDATE()
)

Insert 1 million rows into the SomeColumn column of BigTable

INSERT INTO BigTable (SomeColumn) SELECT NEWID() FROM dbo.Nums WHERE n < 1000001
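The INSERT relies on a numbers table, dbo.Nums, with an integer column n. The article does not define it, so it is assumed to already exist in the TSQL2012 database. If it does not, one possible way to build it is sketched below; the CTE names L0 to L2 and the aliases x, y, z are hypothetical, and this is only one of several common techniques.

```sql
-- Sketch: create dbo.Nums(n) holding 1..1,000,000 if it is missing.
IF OBJECT_ID(N'dbo.Nums', N'U') IS NULL
BEGIN
    CREATE TABLE dbo.Nums (n INT NOT NULL PRIMARY KEY);

    WITH
        -- 10 rows
        L0 AS (SELECT 1 AS c
               FROM (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) AS v(c)),
        -- 10 * 10 * 10 = 1,000 rows
        L1 AS (SELECT 1 AS c
               FROM L0 AS x CROSS JOIN L0 AS y CROSS JOIN L0 AS z),
        -- 1,000 * 1,000 = 1,000,000 rows
        L2 AS (SELECT 1 AS c
               FROM L1 AS x CROSS JOIN L1 AS y)
    INSERT INTO dbo.Nums (n)
    SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
    FROM L2;
END;
```

With this table in place, the filter n < 1000001 selects exactly 1,000,000 rows, each of which receives its own NEWID() value.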

Take 25% of the rows from BigTable and insert them into the LookupColumn column of SmallerTable

USE TSQL2012
GO
INSERT INTO SmallerTable (LookupColumn)
SELECT DISTINCT SomeColumn FROM BigTable TABLESAMPLE (25 PERCENT)

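We test three different situations.

(1) Comparing INNER JOIN and IN with no index

SELECT BigTable.ID, SomeColumn
FROM dbo.BigTable
WHERE SomeColumn IN (SELECT LookupColumn FROM dbo.SmallerTable)

SELECT BigTable.ID, SomeColumn
FROM dbo.BigTable
INNER JOIN dbo.SmallerTable ON dbo.BigTable.SomeColumn = dbo.SmallerTable.LookupColumn

With no index there is no difference between the two queries in either query cost or IO. Now let's add indexes.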
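The article does not show how the cost and IO figures were obtained. A common approach, sketched here as an assumption rather than the author's exact procedure, is to switch on the IO and time statistics and run both queries in one batch. In SQL Server Management Studio, enabling "Include Actual Execution Plan" also shows each statement's cost relative to the batch.

```sql
-- Report reads and CPU/elapsed time for each statement in the Messages tab.
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

-- IN form
SELECT b.ID, b.SomeColumn
FROM dbo.BigTable AS b
WHERE b.SomeColumn IN (SELECT s.LookupColumn FROM dbo.SmallerTable AS s);

-- INNER JOIN form
SELECT b.ID, b.SomeColumn
FROM dbo.BigTable AS b
INNER JOIN dbo.SmallerTable AS s ON b.SomeColumn = s.LookupColumn;

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
```

Comparing the "logical reads" figures for the two statements gives the IO comparison, and the relative percentages in the execution plan give the query cost comparison.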

(2) Comparing INNER JOIN and IN with non-unique nonclustered indexes

CREATE INDEX idx_BigTable_SomeColumn ON BigTable (SomeColumn)
CREATE INDEX idx_SmallerTable_LookupColumn ON SmallerTable (LookupColumn)

With non-unique nonclustered indexes there is a big difference in query cost: the INNER JOIN costs about twice as much as IN, while the IO is almost equal.

(3) Comparing INNER JOIN and IN with unique nonclustered indexes

CREATE UNIQUE INDEX idx_BigTable_SomeColumn ON BigTable (SomeColumn)
CREATE UNIQUE INDEX idx_SmallerTable_LookupColumn ON SmallerTable (LookupColumn)

Why do the costs become the same once the indexes are unique nonclustered indexes? This is a little puzzling. It also seems to show that IN can outperform JOIN, which overturns the usual assumption. As mentioned in the preface, most tutorials say that JOIN performs better than EXISTS and that EXISTS performs better than IN. Hands-on practice and personal verification are what count, and from them we can only draw a general conclusion: generally speaking, JOIN is better than EXISTS, and EXISTS is better than IN. That is only the general case. This series aims to explain when you should use EXISTS, when you should use JOIN, and when you should use IN, and those topics will be covered in later sections. To get back on track: with 1 million rows, the INNER JOIN costs twice as much as IN, which is completely unexpected. With this question in mind, we explore further.
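For reference, the three query forms mentioned here can be written against the test tables as follows. This is only a sketch: the EXISTS form does not appear in the original article, and the aliases b and s are chosen for illustration.

```sql
-- IN: keep BigTable rows whose value appears in SmallerTable.
SELECT b.ID, b.SomeColumn
FROM dbo.BigTable AS b
WHERE b.SomeColumn IN (SELECT s.LookupColumn FROM dbo.SmallerTable AS s);

-- EXISTS: the same existence test, written as a correlated subquery.
SELECT b.ID, b.SomeColumn
FROM dbo.BigTable AS b
WHERE EXISTS (SELECT 1
              FROM dbo.SmallerTable AS s
              WHERE s.LookupColumn = b.SomeColumn);

-- INNER JOIN: returns one row per matching pair, so it matches the two
-- queries above only when LookupColumn holds no duplicate values.
SELECT b.ID, b.SomeColumn
FROM dbo.BigTable AS b
INNER JOIN dbo.SmallerTable AS s ON s.LookupColumn = b.SomeColumn;
```

The first two return each BigTable row at most once. The join is only interchangeable with them when the joined column is free of duplicates, which is why uniqueness matters in the comparison.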

Further discussion of INNER JOIN and IN performance analysis

The 25% of rows copied from BigTable into SmallerTable above are all unique. We will now turn part of that data into duplicates. We take one value from the SomeColumn column of BigTable and set the LookupColumn column of 10,000 rows in SmallerTable to that value, as follows

UPDATE SmallerTable SET LookupColumn = '0067cb6c-64e1-46cc-b7f2-334a7dd812ff' WHERE ID >= 1 AND ID <= 10000
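Before rerunning the comparison, it can help to confirm that the update produced the expected duplicates. The check below is a small sketch that is not part of the original article. Note that an UPDATE creating duplicates fails with a duplicate-key error if a unique index on LookupColumn still exists, so this assumes the unique index from case (3) has been dropped or replaced by a non-unique one.

```sql
-- List LookupColumn values that occur more than once, most frequent first.
SELECT LookupColumn, COUNT(*) AS Occurrences
FROM dbo.SmallerTable
GROUP BY LookupColumn
HAVING COUNT(*) > 1
ORDER BY Occurrences DESC;
```

A single row with roughly 10,000 occurrences indicates that the duplicates are in place.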

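Now we run the same two queries against the data containing the 10,000 duplicates

SELECT BigTable.ID, SomeColumn
FROM dbo.BigTable
WHERE SomeColumn IN (SELECT LookupColumn FROM dbo.SmallerTable)

SELECT BigTable.ID, SomeColumn
FROM dbo.BigTable
INNER JOIN dbo.SmallerTable ON dbo.BigTable.SomeColumn = dbo.SmallerTable.LookupColumn

The result is still that IN costs roughly half as much as the INNER JOIN. Next we remove the duplicate LookupColumn values when querying SmallerTable, so the queries become: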

USE TSQL2012
GO
SELECT BigTable.ID, SomeColumn
FROM dbo.BigTable
WHERE SomeColumn IN (SELECT LookupColumn FROM dbo.SmallerTable)

SELECT BigTable.ID, SomeColumn
FROM dbo.BigTable
INNER JOIN (SELECT DISTINCT LookupColumn FROM dbo.SmallerTable) sub
    ON sub.LookupColumn = dbo.BigTable.SomeColumn

This time the result differs from the one above: the two queries now cost the same, which should make the picture clear. From the lengthy discussion above we can derive the relative cost of INNER JOIN and IN. In the initial discussion, with non-unique nonclustered indexes the INNER JOIN cost close to twice as much as IN, while with unique nonclustered indexes the costs were the same, which was a little puzzling. Continuing the discussion has brought us to the conclusion about the cost of INNER JOIN and IN.

Conclusion on INNER JOIN and IN performance: when the joined column in the INNER JOIN table contains only unique values, INNER JOIN and IN cost the same; when that column contains duplicate values, IN performs better than INNER JOIN.

Summary

In this section we analyzed the performance of INNER JOIN and IN in detail and reached a consistent conclusion. In the next section we will discuss the performance of NOT EXISTS and NOT IN. Short content, in-depth understanding. See you in the next section, good night.
