INNER JOIN vs. LEFT JOIN performance in SQL Server

Last Update:2016-03-11 Source: Internet

Author: User



I've created a SQL command that uses INNER JOIN on 9 tables, and it takes a very long time (more than five minutes). A colleague suggested changing INNER JOIN to LEFT JOIN because, contrary to what I know, LEFT JOIN supposedly performs better. After I made the change, the query became significantly faster.

I would like to know why LEFT JOIN is faster than INNER JOIN.

My SQL command looks like this: SELECT * FROM A INNER JOIN B ON ... INNER JOIN C ON ... INNER JOIN D and so on.

Update: This is a brief outline of my schema.

FROM Sidisaleshdrmly A -- has NO PK and FK
    INNER JOIN sidisalesdetmly B -- this TABLE ALSO has NO PK and FK
        ON A.COMPANYCD = B.COMPANYCD
           AND A.sprno = B.sprno
           AND a.suffixno = b.suffixno
           AND A.dnno = B.dnno
    INNER JOIN exfslipdet H -- PK = COMPANYCD, Fslipno, Fslipsuffix, fslipline
        ON A.COMPANYCD = H.COMPANYCD
           AND A.sprno = H.acctsprno
    INNER JOIN exFSLIPHDR C -- PK = COMPANYCD, Fslipno, fslipsuffix
        ON C.COMPANYCD = H.COMPANYCD
           AND C.fslipno = H.fslipno
           AND C.FSliPsuffix = H.fslipsuffix
    INNER JOIN comappingexpparty D -- NO PK and FK
        ON C.COMPANYCD = D.COMPANYCD
           AND C.COUNTRYCD = D.COUNTRYCD
    INNER JOIN coproduct e -- PK = COMPANYCD, productsalescd
        ON B.COMPANYCD = E.COMPANYCD
           AND B.PRODUCTSALESCD = E.productsalescd
    LEFT JOIN Couom i -- PK = uomid
        ON h.uomid = I.uomid
    INNER JOIN coproductoldinformation J -- PK = COMPANYCD, Bfstatus, speccd
        ON A.COMPANYCD = J.COMPANYCD
           AND B.bfstatus = j.bfstatus
           AND B.PRODUCTSALESCD = J.ProductSALESCD
    INNER JOIN CoprodUctGroup1 G1 -- PK = COMPANYCD, PRODUCTCATEGORYCD, Useddepartment, productgroup1cd
        ON e.productgroup1cd = G1.PRODUCTGROUP1CD
    INNER JOIN coProductGroup2 g2 -- PK = COMPANYCD, PRODUCTCATEGORYCD, Useddepartment, PRODUCTGROUP2CD
        ON E.PRODUCTGROUP1CD = G2.Productgroup1cd

Article address: Codego.net/150174/
------------------------------------------------------------------------------------------
1. A LEFT JOIN is not faster than an INNER JOIN. In fact, it's slower: by definition, an outer join (LEFT JOIN or RIGHT JOIN) has to do all the work of an INNER JOIN plus the extra work of null-extending the results. It would also be expected to return more rows, further increasing the total execution time simply because the result set is larger.

(Moreover, even if a LEFT JOIN were faster in a particular case because of some hard-to-imagine confluence of factors, it is not functionally equivalent to an INNER JOIN, so you can't simply replace all instances of one with the other!)

The performance problem most likely lies elsewhere, such as a candidate key or foreign key that is not indexed properly. Joining 9 tables is quite a lot, so the slowdown could literally be almost anywhere. If you post your schema, we may be able to provide more details.

Edit: Reflecting further on this, I can think of one circumstance under which a LEFT JOIN might be faster than an INNER JOIN, and that is when:

Some of the tables are very small (say, under 10 rows);
The tables do not have sufficient indexes to cover the query.

Consider this example:

CREATE TABLE #Test1
(
    ID int NOT NULL PRIMARY KEY,
    Name varchar(50) NOT NULL
)
INSERT #Test1 (ID, Name) VALUES (1, 'One')
INSERT #Test1 (ID, Name) VALUES (2, 'Two')
INSERT #Test1 (ID, Name) VALUES (3, 'Three')
INSERT #Test1 (ID, Name) VALUES (4, 'Four')
INSERT #Test1 (ID, Name) VALUES (5, 'Five')

CREATE TABLE #Test2
(
    ID int NOT NULL PRIMARY KEY,
    Name varchar(50) NOT NULL
)
INSERT #Test2 (ID, Name) VALUES (1, 'One')
INSERT #Test2 (ID, Name) VALUES (2, 'Two')
INSERT #Test2 (ID, Name) VALUES (3, 'Three')
INSERT #Test2 (ID, Name) VALUES (4, 'Four')
INSERT #Test2 (ID, Name) VALUES (5, 'Five')

-- Join on the non-indexed Name column
SELECT *
FROM #Test1 t1
INNER JOIN #Test2 t2
ON t2.Name = t1.Name

SELECT *
FROM #Test1 t1
LEFT JOIN #Test2 t2
ON t2.Name = t1.Name

DROP TABLE #Test1
DROP TABLE #Test2

If you run this and look at the execution plan, you will see that the INNER JOIN query does indeed cost more than the LEFT JOIN, because it satisfies both of the criteria above. SQL Server wants to do a hash match for the INNER JOIN but uses nested loops for the LEFT JOIN. A hash match is normally much faster, but because the number of rows is so small and there is no index to use, the hashing operation turns out to be the most expensive part of the query.

You can see the same effect by writing a program in your favorite programming language that performs a large number of lookups on a list with 5 elements versus a hash table with 5 elements. Because of the small size, the hash table version is actually slower. But increase it to 50 or 5000 elements and the list version slows to a crawl, because it's O(N) versus O(1) for the hash table.

However, change this query to join on the ID column instead of Name and you will see a completely different story. In that case, it uses nested loops for both queries, but the INNER JOIN version can replace one of the clustered index scans with a seek, which means it will literally be an order of magnitude faster with a large number of rows.

So the conclusion is more or less what I said several paragraphs above: this is almost certainly an indexing or index coverage problem, possibly combined with one or more very small tables. Those are the only circumstances under which SQL Server might choose a worse execution plan for an INNER JOIN than for a LEFT JOIN.
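The list-versus-hash-table comparison described in this answer can be tried with a small script. This is only a sketch in Python, not anything from SQL Server: the function names (build_rows, list_lookup, hash_lookup, compare) are made up for illustration, and a Python list and dict are used as loose stand-ins for a scan and a hash match. Where the crossover happens depends on the language and machine, so no timings are claimed here.

```python
import timeit


def build_rows(n):
    """Return n hypothetical (id, name) rows, e.g. (0, 'name0')."""
    return [(i, f"name{i}") for i in range(n)]


def list_lookup(rows, name):
    """Linear scan: O(N) comparisons per lookup, no setup cost."""
    for row_id, row_name in rows:
        if row_name == name:
            return row_id
    return None


def hash_lookup(index, name):
    """Hash probe: O(1) on average, but the table must be built first."""
    return index.get(name)


def compare(n, lookups=1000):
    """Time `lookups` probes against an n-row list and an n-row hash table."""
    rows = build_rows(n)
    names = [f"name{i % n}" for i in range(lookups)]

    def via_list():
        return [list_lookup(rows, name) for name in names]

    def via_hash():
        # The cost of building the hash table is included on purpose.
        index = {row_name: row_id for row_id, row_name in rows}
        return [hash_lookup(index, name) for name in names]

    assert via_list() == via_hash()  # both strategies return the same answers
    return timeit.timeit(via_list, number=5), timeit.timeit(via_hash, number=5)


if __name__ == "__main__":
    for n in (5, 50, 5000):
        list_seconds, hash_seconds = compare(n)
        print(f"n={n:>5}  list={list_seconds:.4f}s  hash={hash_seconds:.4f}s")
```

Running it for growing n shows how the linear version's time grows with the row count while the hashed version stays roughly flat apart from its build cost.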
2. There is one important scenario that can make an outer join faster than an inner join and that has not been discussed yet.

With an outer join, the optimizer is always free to drop the outer-joined table from the execution plan if the join columns are the PK of the outer table and no columns are selected from the outer table. For example: SELECT A.* FROM A LEFT OUTER JOIN B ON A.KEY=B.KEY, where B.KEY is the PK of B. Both Oracle (I believe I was using release 10) and SQL Server (I used 2008 R2) prune table B from the execution plan.

The same is not necessarily true for an inner join: SELECT A.* FROM A INNER JOIN B ON A.KEY=B.KEY may or may not require B in the execution plan, depending on what constraints exist.

If A.KEY is a nullable foreign key referencing B.KEY, then the optimizer cannot drop B from the plan, because it must confirm that a B row exists for every A row.

If A.KEY is a mandatory foreign key referencing B.KEY, then the optimizer is free to drop B from the plan, because the constraints guarantee that the row exists. However, just because the optimizer can drop the table from the plan doesn't mean it will. SQL Server 2008 R2 does NOT drop B from the plan. Oracle 10 DOES drop B from the plan. It is easy to see how the outer join will outperform the inner join on SQL Server in this case.

This is a trivial example, and not a practical stand-alone query. Why join to a table if you don't need it?

But this can be a very important consideration when designing views. Often a "do-everything" view is built that joins everything a user might need related to a central table (especially if there are naive users running ad hoc queries who do not understand the relational model). The view may include all the relevant columns from many tables, but end users might only access columns from a subset of the tables within the view. If the tables are joined with outer joins, then the optimizer can (and does) drop the unneeded tables from the plan.
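The reason the optimizer may prune B from the outer join but not always from the inner join comes down to which rows each query returns. The sketch below checks that with Python's sqlite3 module; SQLite is only a convenient stand-in here (it says nothing about SQL Server or Oracle execution plans), and the tables a and b with columns id and b_id are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE b (id INTEGER PRIMARY KEY, label TEXT NOT NULL);
    -- b_id is a nullable foreign key referencing b.id
    CREATE TABLE a (id INTEGER PRIMARY KEY, b_id INTEGER REFERENCES b(id));
    INSERT INTO b (id, label) VALUES (1, 'one'), (2, 'two');
    INSERT INTO a (id, b_id) VALUES (10, 1), (11, 2), (12, NULL);
""")

# Only columns of a are selected, and b is joined on its primary key.
left_rows = conn.execute(
    "SELECT a.* FROM a LEFT OUTER JOIN b ON a.b_id = b.id ORDER BY a.id"
).fetchall()
plain_rows = conn.execute("SELECT a.* FROM a ORDER BY a.id").fetchall()
inner_rows = conn.execute(
    "SELECT a.* FROM a INNER JOIN b ON a.b_id = b.id ORDER BY a.id"
).fetchall()

# The outer join can neither add nor remove rows of a, so b is irrelevant to the result.
print(left_rows == plain_rows)
# The inner join drops the row with the NULL key, so b cannot simply be ignored.
print(len(plain_rows) - len(inner_rows), "row(s) removed by the inner join")
```

Because joining on a unique key never duplicates rows of a, and an outer join never removes them, the result is the same with or without b; that is the property that makes pruning safe.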
It is critical to make sure that the view using outer joins gives the correct results. As Aaronaught says, you cannot blindly substitute an outer join for an inner join and expect the same results. But there are times when it can be useful for performance reasons when using views.

One last note: I have not tested the impact on performance in light of the above, but in theory it seems you should be able to safely replace an INNER JOIN with an OUTER JOIN if you also add the condition <FOREIGN_KEY> IS NOT NULL to the WHERE clause.
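The last note can be checked for results (not for performance) with a small script. This is a sketch using Python's sqlite3 module as a stand-in for SQL Server; the customer and orders tables are hypothetical, and the equivalence assumes the foreign key is actually enforced, so every non-NULL customer_id has a matching customer row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked
conn.executescript("""
    CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customer(id)  -- nullable foreign key
    );
    INSERT INTO customer (id, name) VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO orders (id, customer_id) VALUES (100, 1), (101, 2), (102, NULL);
""")

inner = conn.execute("""
    SELECT o.id, c.name
    FROM orders o
    INNER JOIN customer c ON c.id = o.customer_id
    ORDER BY o.id
""").fetchall()

left_filtered = conn.execute("""
    SELECT o.id, c.name
    FROM orders o
    LEFT JOIN customer c ON c.id = o.customer_id
    WHERE o.customer_id IS NOT NULL
    ORDER BY o.id
""").fetchall()

left_plain = conn.execute("""
    SELECT o.id, c.name
    FROM orders o
    LEFT JOIN customer c ON c.id = o.customer_id
    ORDER BY o.id
""").fetchall()

print(inner == left_filtered)  # same rows once NULL keys are filtered out
print(left_plain)              # without the filter, order 102 is null-extended
```

Without an enforced foreign key the filter is not enough, because an orphaned non-NULL customer_id would still survive the LEFT JOIN but not the INNER JOIN.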
3. I know of several cases where a left join has been faster than an inner join.

The underlying reason I can think of is this: suppose you have two tables and you join on a column that is indexed in both tables. The inner join produces the same result whether you loop over the entries in the index on table one and match them against the index on table two, or do the reverse: loop over the entries in the index on table two and match them against the index on table one.

The problem arises when you have misleading statistics. The query optimizer uses the index statistics to find the table with the fewest matching entries (based on your other criteria). Say you have two tables with 1 million rows each; in table one 10 rows match and in table two 100000 rows match. The best approach is an index scan on table one, matching 10 times in table two. The reverse is an index scan that loops over 100000 rows and tries to match 100000 times, of which only 10 succeed. So if the statistics are incorrect, the optimizer may choose the wrong table and index to loop over.

If the optimizer chooses to execute the left join in the order in which it is written, it will perform better than the inner join.

However, the optimizer may also optimize a left join sub-optimally, as a left semi join. To make it choose the one you want, you can use the FORCE ORDER hint.
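The effect of picking the wrong table to loop over can be illustrated with a toy model. The sketch below is not SQL Server's cost model: nested_loop_join and plan_and_run are hypothetical helpers, a Python set stands in for an index seek, and the "optimizer" simply picks the outer table with the smallest estimated row count. The 10 and 100000 row counts come from the scenario above; the stale estimates are invented.

```python
def nested_loop_join(outer_keys, inner_keys):
    """Loop over the outer rows and probe the inner 'index' once per row."""
    inner_index = set(inner_keys)  # stand-in for an index seek on the inner table
    probes = 0
    matches = []
    for key in outer_keys:
        probes += 1
        if key in inner_index:
            matches.append(key)
    return sorted(matches), probes


def plan_and_run(tables, estimated_rows):
    """Drive the join from the table with the smallest *estimated* row count."""
    outer = min(estimated_rows, key=estimated_rows.get)
    inner = next(name for name in tables if name != outer)
    matches, probes = nested_loop_join(tables[outer], tables[inner])
    return outer, matches, probes


# Rows that survive the other filter criteria: 10 in table one, 100000 in table two.
tables = {"one": list(range(10)), "two": list(range(100_000))}

fresh = plan_and_run(tables, {"one": 10, "two": 100_000})     # accurate statistics
stale = plan_and_run(tables, {"one": 500_000, "two": 1_000})  # misleading statistics

print("fresh stats: outer =", fresh[0], "probes =", fresh[2])
print("stale stats: outer =", stale[0], "probes =", stale[2])
```

Both runs return the same 10 matching keys; only the amount of work differs, which is the situation in which forcing the written join order can help.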
4. Try both queries (the one with the inner joins and the one with the left joins) with OPTION (FORCE ORDER) at the end and post the results. OPTION (FORCE ORDER) is a query hint that forces the optimizer to build the execution plan with the join order you provided in the query.

If the INNER JOIN starts performing as fast as the LEFT JOIN, it is because:

In a query composed entirely of INNER JOINs, the join order doesn't matter. This gives the query optimizer the freedom to order the joins as it sees fit, so the problem may lie with the optimizer.
With LEFT JOIN, that is not the case, because changing the join order changes the results of the query. The engine must follow the join order you provided in the query, which may be better than the optimized one.

I don't know whether this answers your question, but I once worked on a project that featured highly complex queries performing calculations, which completely confused the optimizer. We had cases where a FORCE ORDER reduced the execution time of a query from 5 minutes to 10 seconds.
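The claim that join order is free for inner joins but fixed for left joins can be checked on results alone. The sketch below uses Python's sqlite3 module as a stand-in (it does not exercise the SQL Server FORCE ORDER hint itself), with two hypothetical tables a and b; the rows helper is likewise made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER PRIMARY KEY);
    CREATE TABLE b (id INTEGER PRIMARY KEY);
    INSERT INTO a (id) VALUES (1), (2), (3);
    INSERT INTO b (id) VALUES (2), (3), (4);
""")


def rows(sql):
    """Return the result as a set so that row order does not matter."""
    return set(conn.execute(sql).fetchall())


inner_ab = rows("SELECT a.id, b.id FROM a INNER JOIN b ON a.id = b.id")
inner_ba = rows("SELECT a.id, b.id FROM b INNER JOIN a ON a.id = b.id")
left_ab = rows("SELECT a.id, b.id FROM a LEFT JOIN b ON a.id = b.id")
left_ba = rows("SELECT a.id, b.id FROM b LEFT JOIN a ON a.id = b.id")

print(inner_ab == inner_ba)  # swapping the tables of an inner join is harmless
print(left_ab == left_ba)    # swapping the tables of a left join changes the rows
```

Since swapping the inputs of a left join changes which side gets null-extended, the engine cannot reorder it as freely as an inner join.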
5. Your performance problems are more likely to be caused by the number of joins you are doing and by whether the columns you are joining on are indexed or not. In the worst case you could easily be doing 9 full table scans, one for each join.
6. I have done a number of comparisons between left outer and inner joins and have not been able to find a consistent difference. There are many variables. I work on a reporting database that has a lot of tables, many with a large number of fields and many changes over time (vendor versions and local workflow). It is not possible to create all the combinations of covering indexes needed to serve such a wide variety of queries and to handle historical data. I have seen inner join queries kill server performance because two large tables (millions to tens of millions of rows) were inner joined, both pulling a large number of fields, with no covering index in place.

The biggest issue, though, doesn't seem to appear in the discussion above. Perhaps your database is well designed, with triggers and well-designed transaction processing to ensure good data. Mine frequently has NULL values where they aren't expected. Yes, the table definitions could enforce NOT NULL, but that isn't an option in my environment.

So the question is: do you design your query only for speed, which is a higher priority for transaction processing that runs the same code thousands of times a minute? Or do you go for the accuracy that a left outer join provides? Remember that an inner join must find matches on both sides, so an unexpected NULL will not only remove data from the two tables but possibly entire rows of information. And it happens so nicely, with no error messages.

You can be very fast at getting 90% of the needed data and never discover that the inner joins have silently removed information. Sometimes inner joins can be faster, but I don't believe anyone should make that assumption unless they have reviewed the execution plan. Speed is important, but accuracy is more important.
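One way to see what an inner join would silently remove is to list the rows that find no match. The sketch below does this with Python's sqlite3 module as a stand-in for SQL Server; the fact and dim tables and their data are hypothetical, and there is deliberately no foreign key constraint, mirroring an environment where constraints cannot be enforced.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim (id INTEGER PRIMARY KEY, label TEXT NOT NULL);
    -- No NOT NULL and no foreign key on dim_id.
    CREATE TABLE fact (id INTEGER PRIMARY KEY, dim_id INTEGER, amount INTEGER);
    INSERT INTO dim (id, label) VALUES (1, 'one'), (2, 'two');
    INSERT INTO fact (id, dim_id, amount) VALUES
        (1, 1, 10), (2, NULL, 20), (3, 99, 30), (4, 2, 40);
""")

total = conn.execute("SELECT COUNT(*) FROM fact").fetchone()[0]
kept = conn.execute(
    "SELECT COUNT(*) FROM fact f INNER JOIN dim d ON d.id = f.dim_id"
).fetchone()[0]

# Anti-join: fact rows with a NULL or unmatched dim_id, which the inner join drops.
dropped = [
    row[0]
    for row in conn.execute("""
        SELECT f.id
        FROM fact f
        LEFT JOIN dim d ON d.id = f.dim_id
        WHERE d.id IS NULL
        ORDER BY f.id
    """)
]

print(f"{kept} of {total} fact rows survive the inner join; dropped ids: {dropped}")
```

A check like this reports the missing rows explicitly instead of relying on an error message that never comes.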


