Some small places that are easy to overlook in SQL Development (ii)

Last Update:2015-01-05 Source: Internet

Author: User

Tags joins

Developer on Alibaba Coud: Build your first app with APIs, SDKs, and tutorials on the Alibaba Cloud. Read more ＞

Original: Some small places that are easy to overlook in SQL Development (ii)

Purpose: Next: Some small places that are easy to overlook in SQL Development (a) after summing up the null usages in SQL, I will talk about table join queries .

To illustrate the problem, I created two tables, the Student Information table (student) and the class table (ClassInfo). Related field Description I create a script description with sql:

test Environment: SQL2005

CREATE TABLE [dbo]. [Student] (
[ID] [int] IDENTITY (*) Not NULL,
[sUserName] [NCHAR] (Ten) COLLATE chinese_prc_ci_as NULL,--Name
[Saddress] [varchar] (COLLATE) chinese_prc_ci_as NULL,--Address
[ClassID] [INT] NULL,--Class
[Create_date] [DateTime] NULL CONSTRAINT [df_student_create_date] DEFAULT (GETDATE ())--onboarding time
) on [PRIMARY]

Student table record: Insert 999999 rows of data. Can be said to be a moderate table.

CREATE TABLE [dbo]. [ClassInfo] (
[ClassID] [INT] IDENTITY (+) not NULL,--class ID
[Sclassname] [varchar] (COLLATE) chinese_prc_ci_as NULL,--class name
[Sinformation] [varchar] (COLLATE) chinese_prc_ci_as NULL,--class-related information
[Sdescription] [varchar] (COLLATE) chinese_prc_ci_as NULL,--class description
[Ischooling] [INT] NULL,--Tuition
CONSTRAINT [Pk_classinfo] PRIMARY KEY CLUSTERED
(
[ClassID] Asc
) with (Ignore_dup_key = OFF) on [PRIMARY]

) on [PRIMARY]

Class Table: A total of 100 lines inserted, there may not be so many classes.

Sample requirements: Query the basic information of students and the class name, we will be the first time to think about using the table association, here I listed the relevant implementation method.

First:Put the student table with a large amount of data in front.
--Big table in front
Select top A.susername,b.sclassname from Student a
INNER JOIN ClassInfo B on

A.classid=b.classid

Second:Put the class table with the smaller amount of data in front.
--small table in front
Select top A.susername,b.sclassname from ClassInfo b
Inner JOIN student A on

A.classid=b.classid

Third:Implemented in where.
--join and where
Select top A.susername,b.sclassname from ClassInfo B, student A
where A.classid=b.classid

Induction:The results of the above three-way queries are identical, but are they different in their efficiency? Here are the first two ideas on the Web:

Network point one: generally to make the database query statement performance good point to follow the principle: in the table and table connection query, large table in front, small table in the back.

Execution Plan effect one:

Network Point of view a conclusion: It can be very clear from the diagram that the three are done in the execution plan. Therefore, I do not agree with the network point of view. The table does not affect the final execution efficiency before and after. We have different opinions.
Description:
The connection statement used in the 1:where clause, in the database language, is called the implicit connection. A INNER join is called a dominant connection. There is no essential difference between where and inner joins, and the result is the same. But! With the standardization and development of the database language, the recessive connection has been eliminated gradually, and the new database language basically has abandoned the recessive connection, all adopt the explicit connection.
Categories of 2:join:
1> INNER JOIN: understood as a "valid connection",
2>left join: Understood as "have left display",
3> right Join: understood to be "show on display"
4> full join: Understood as "fully connected"

3. Join can be divided into primary and secondary table leftist is the main table on the left, supplemented on the right, opposite the right

Network View Two:is inner join the same as where in efficiency? Original address: Http://topic.csdn.net/t/20050520/13/4022440.html The following paragraph in the original text:

---------------------------Reference----------------------------------------------

4 Specifying a multi-table relationship can greatly improve speed, such as
SELECT a.x,b.y from A B WHERE a.x=b.x
SELECT a.x,b.y from A INNER JOIN B on a.x=b.x
2 sentences, but the speed difference is many, time complexity is O (2n) and O (n*n)
------------------------------------------------------------------------------
my point of view:The time complexity of the join query is not fixed, not to say that it is determined by two different ways of expression. Join in the query algorithm is divided into three different cases depending on the junction table:

First algorithm: NESTED LOOP:

Definition:Nested loops are a good choice for cases where the subset of data being connected is small. In a nested loop, the inner table is driven by the exterior, and each row returned by the surface is retrieved in the inner table to find the row that matches it, so the result set returned by the entire query cannot be too large (greater than 10,000 is not appropriate), and the smaller table of the returned subset is used as the appearance.
Example:With a class table on it, I'll create a class schedule,

CREATE TABLE [dbo]. [Course] (
[ID] [int] IDENTITY (*) Not NULL,
[Scoursename] [NCHAR] (Ten) COLLATE chinese_prc_ci_as NULL,--Course Name
[ClassID] [INT] NULL,--class ID
CONSTRAINT [Pk_ckh] PRIMARY KEY CLUSTERED
(
[ID] ASC
) with (Ignore_dup_key = OFF) on [PRIMARY]
) on [PRIMARY]
Business Requirements:Check the course status for all classes.
Select Scoursename,sclassname from ClassInfo a
INNER JOIN course B

On A.classid=b.classid

Execution Plan:

Conclusion:It is clear from the query execution plan that nested queries are used because both tables have a small amount of data and are of equal size. The query cost at this time is n*n

The second algorithm: HASH JOIN:

definition: A hash join is a common way to make a large dataset connection, using a smaller table (or data source) in two tables to create a hash in memory using the connection key, then scan the larger table and probe the hash list to find the rows that match the hash list. This approach applies to situations where smaller tables can be put entirely in memory, so the total cost is the sum of the costs of accessing two tables. However, when the table is very large and can not be completely put into memory, when the optimizer will split it into several different partitions, can not be put into the memory portion of the partition to write to the disk temporary segment, at this time to have a large temporary segment to maximize the performance of I/O.

Conclusion: It is possible to see that SQL joins the Class table and student table by using the hash join method, because the class table data is large and the class table data is small. This way the query time complexity is 2n.

Note:a hash join can easily become a nested loop, and the following query is a hash join

Select top 10000 a.id,a.susername,b.sclassname from ClassInfo b
Inner JOIN student A on

A.classid=b.classid

conversion (hash join variable nesteed loop):What if I add a sort to a later group? This is a nested query.
Select top 10000 a.id,a.susername,b.sclassname from ClassInfo b
Inner JOIN student A on
A.classid=b.classid
ORDER BY a.ID

conversion ( Nesteed Loop Change Hash Join ):The above nested query can be re-elected into a hash join

SELECT * FROM (
Select top 10000 a.id,a.susername,b.sclassname from ClassInfo b
Inner JOIN student A on
A.CLASSID=B.CLASSID) as TBL
ORDER BY ID

Third algorithm: Sort merge Connections

definition: The effect of a hash join is generally better than a sort merge connection, but if the row source has already been ordered and no more sorting is required to perform a sort merge connection, the performance of the sort merge connection is better than the hash join.

Network Point of view two conclusions:INNER JOIN and where are not different in query efficiency, just reflect the form.

Summary: We can analyze the performance of SQL by looking at the execution plan of SQL, whether the word is correct or not in the speaker, but in the practice of validating the results. I talk about the table connection of their own understanding, if there is not the right place also hope that the advice.

Note:

This article refers to:

http://blog.chinaunix.net/u1/46451/showart.php?id=415529

Some small places that are easy to overlook in SQL Development (ii)

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

Get Started for Free

Sales Support

1 on 1 presale consultation

Chat Contact Sales
After-Sales Support

24/7 Technical Support 6 Free Tickets per Quarter Faster Response

Open a Ticket
Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.

Learn More