Developing good SQL habits brings great rewards

Source: Internet
Author: User

In software development, most of us deal with databases constantly, especially in ERP development, where interaction with the database is frequent and stored procedures can run to thousands of lines. If the data volume grows large, or staff turn over, can I still guarantee the system runs smoothly for the next while? And will the next person be able to understand my stored procedures? With that in mind, I have combined the company's usual training with my own work experience to share the following; I hope it helps everyone.

To write good SQL statements, I think we first need to know how the SQL Server query analyzer executes them. Many people look at the execution plan, or use Profiler to monitor and tune a slow query or stored procedure; but if we know the logical order in which the query parser executes a statement, we can start out with confidence.

One: The logical execution order of the query

(1) FROM <left_table>

(2) ON <join_condition>

(3) <join_type> JOIN <right_table>

(4) WHERE <where_condition>

(5) GROUP BY <group_by_list>

(6) WITH {CUBE | ROLLUP}

(7) HAVING <having_condition>

(8) SELECT (9) DISTINCT (11) TOP <top_specification> <select_list>

(10) ORDER BY <order_by_list>

The parsing order of standard SQL is:

(1). The FROM clause assembles data from different data sources

(2). The WHERE clause filters records based on the specified criteria

(3). The GROUP BY clause divides the data into multiple groupings

(4). Aggregate functions perform their calculations

(5). The HAVING clause filters the groups

(6). All expressions are computed

(7). The ORDER BY clause sorts the result set

Two: The detailed execution order:

1. FROM: A Cartesian product of the first two tables in the FROM clause is performed, generating virtual table VT1.

2. ON: The ON filter is applied to VT1; only rows that satisfy <join_condition> are inserted into VT2.

3. OUTER (JOIN): If an outer join is specified, rows from the preserved table that found no match are added to VT2 as outer rows, generating VT3. If the FROM clause contains more than two tables, steps 1-3 are repeated between the result of the previous join and the next table until all tables are processed.

4. WHERE: The WHERE filter is applied to VT3; only rows that make <where_condition> true are inserted into VT4.

5. GROUP BY: The rows of VT4 are grouped by the column list in the GROUP BY clause, generating VT5.

6. CUBE | ROLLUP: Supergroups are inserted into VT5, generating VT6.

7. HAVING: The HAVING filter is applied to VT6; only groups that make <having_condition> true are inserted into VT7.

8. SELECT: The SELECT list is processed, generating VT8.

9. DISTINCT: Duplicate rows are removed from VT8, producing VT9.

10. ORDER BY: The rows of VT9 are sorted by the column list in the ORDER BY clause, generating cursor VC10.

11. TOP: The specified number or percentage of rows is selected from the beginning of VC10, generating table VT11, which is returned to the caller.
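As a sanity check, the pipeline above can be traced with a small script. This is a sketch in Python against SQLite rather than SQL Server (the table names and sample data are invented, and SQLite's LIMIT stands in for TOP), but the logical phases behave the same way:

```python
import sqlite3

# Hypothetical sample data to trace the logical phases:
# FROM/JOIN -> WHERE -> GROUP BY -> HAVING -> SELECT -> ORDER BY -> TOP/LIMIT.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (cust_id INTEGER, city TEXT);
CREATE TABLE orders (order_id INTEGER, cust_id INTEGER, amount REAL);
INSERT INTO customers VALUES (1, 'NY'), (2, 'LA'), (3, 'NY');
INSERT INTO orders VALUES (10, 1, 50), (11, 1, 70), (12, 2, 20), (13, 3, 90);
""")

# WHERE filters rows before grouping; HAVING filters groups after aggregation;
# ORDER BY and LIMIT (SQLite's stand-in for TOP) run last on the final rows.
rows = conn.execute("""
    SELECT c.city, COUNT(*) AS n, SUM(o.amount) AS total
    FROM customers c
    INNER JOIN orders o ON o.cust_id = c.cust_id   -- steps (1)-(3): join
    WHERE o.amount > 30                            -- step (4): row filter
    GROUP BY c.city                                -- step (5): grouping
    HAVING COUNT(*) >= 1                           -- step (7): group filter
    ORDER BY total DESC                            -- step (10): sort
    LIMIT 1                                        -- step (11): TOP equivalent
""").fetchall()
print(rows)  # [('NY', 3, 210.0)]
```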

Seen this way, it looks a bit like LINQ to SQL syntax, doesn't it? Once we understand SQL Server's execution order, we can build on it with a good daily habit: think about performance while implementing functionality. The database is a tool for set operations, and we should use it that way as much as possible. A set operation is essentially a batch operation: minimize client-side loops over large amounts of data and replace them with SQL statements or stored procedures.

Three: Return only the data you need

Returning data to the client requires at least extracting the data from the database, transmitting it over the network, receiving it on the client, and processing it there. Returning unneeded data adds useless work for the server, the network, and the client, and the harm is obvious. To avoid it, note the following:

A, Horizontally (columns)

(1) Instead of writing SELECT *, select only the fields you need.

(2) When joining multiple tables in an SQL statement, use table aliases and prefix each column with its alias. This reduces parsing time and avoids syntax errors caused by ambiguous column names.

    -- Given tables table1 (id, col1) and table2 (id, col2)
    SELECT a.id, a.col1, b.col2
    -- SELECT a.id, col1, col2  -- do not write it this way; it hinders future expansion
    FROM table1 a INNER JOIN table2 b ON a.id = b.id WHERE ...

B, Vertically (rows)

(1) Write the WHERE clause sensibly; do not write SQL statements without a WHERE.

(2) Use SELECT TOP N * in place of queries that have no WHERE condition.

Four: Try to do less repetitive work

A, Avoid executing the same statement multiple times; the repeated fetching of basic data in particular is rarely noticed by many programmers.

B, Reduce the number of data conversions. Some conversions may be unavoidable design issues, but reducing them is something the programmer can do.

C, Eliminate unnecessary subqueries and joined tables. A subquery is generally interpreted as an outer join in the execution plan, and a redundant joined table brings extra overhead.

D, Combine multiple UPDATEs on the same table with the same condition. For example:

    UPDATE EMPLOYEE SET fname = 'haiwer' WHERE emp_id = 'vpa30890f'
    UPDATE EMPLOYEE SET lname = 'YANG' WHERE emp_id = 'vpa30890f'

These two statements should be merged into the following one:

UPDATE EMPLOYEE SET fname = 'haiwer', lname = 'YANG' WHERE emp_id = 'vpa30890f'
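A quick sketch of the merge, using Python with SQLite and the same (hypothetical) EMPLOYEE columns; the single statement leaves the row in the same end state as the two separate UPDATEs, with one pass over the table instead of two:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE EMPLOYEE (emp_id TEXT PRIMARY KEY, fname TEXT, lname TEXT)")
conn.execute("INSERT INTO EMPLOYEE VALUES ('vpa30890f', 'old_f', 'old_l')")

# One statement locates the row by emp_id once, instead of two separate
# UPDATEs that each do their own lookup.
conn.execute("""
    UPDATE EMPLOYEE
    SET fname = 'haiwer', lname = 'YANG'
    WHERE emp_id = 'vpa30890f'
""")
row = conn.execute(
    "SELECT fname, lname FROM EMPLOYEE WHERE emp_id = 'vpa30890f'").fetchone()
print(row)  # ('haiwer', 'YANG')
```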

E, Do not split an UPDATE operation into a DELETE plus an INSERT. The result is the same, but the performance difference is very large.

Five: Note the use of temporary tables and table variables

In complex systems, temporary tables and table variables are hard to avoid. When using them, note the following:

A, If a statement is complex and has too many joins, consider breaking it into steps with temporary tables and table variables.

B, If you need to use the same part of a large table multiple times, consider staging that part of the data in a temporary table or table variable.

C, If you need to combine data from multiple tables into one result, consider using temporary tables and table variables to accumulate the data step by step.

D, In other cases, restrict the use of temporary tables and table variables.

E, On choosing between temporary tables and table variables: it is often said that table variables live in memory and are fast, so they should be preferred. In actual use, however, I have found the choice should consider:

(1) mainly the amount of data to be staged: with a large data volume, the temporary table is actually faster;

(2) the execution period and the expected execution time (how long it runs).

F, On generating a temporary table, choosing between SELECT INTO and CREATE TABLE + INSERT INTO: in general,

SELECT INTO is much faster than the CREATE TABLE + INSERT INTO method.

However, SELECT INTO locks the system tables sysobjects, sysindexes, and syscolumns in tempdb, which can easily block other processes in a multi-user concurrent environment.

So my advice is: in a concurrent system, use CREATE TABLE + INSERT INTO wherever possible, and for a one-off statement over a large amount of data, use SELECT INTO.
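For illustration, SQLite has no SELECT INTO, but CREATE TEMP TABLE ... AS SELECT plays the same staging role; this Python sketch (table and column names invented) shows the "stage once, query many times" pattern. SQL Server's tempdb locking behavior does not apply here:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE big_table (id INTEGER, region TEXT, amount REAL);
INSERT INTO big_table VALUES (1,'east',10),(2,'west',20),(3,'east',30),(4,'east',5);
""")

# SQLite's CREATE TEMP TABLE ... AS SELECT stands in for SQL Server's
# SELECT INTO #temp: stage the frequently reused slice once, then query
# the small staged table many times instead of re-filtering the big one.
conn.execute("""
    CREATE TEMP TABLE east_rows AS
    SELECT id, amount FROM big_table WHERE region = 'east'
""")
total = conn.execute("SELECT SUM(amount) FROM east_rows").fetchone()[0]
count = conn.execute("SELECT COUNT(*) FROM east_rows").fetchone()[0]
print(total, count)  # 45.0 3
```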

Six: Usage of subqueries

A subquery is a SELECT query nested inside a SELECT, INSERT, UPDATE, or DELETE statement, or inside another subquery.

Subqueries can be used wherever an expression is allowed; they make our programming flexible and can implement special functionality. In terms of performance, however,

an inappropriate subquery can often create a bottleneck. If a subquery's condition uses a field of its outer table, it is called a correlated subquery.

Correlated subqueries can be introduced with IN, NOT IN, EXISTS, and NOT EXISTS. For correlated subqueries, note the following:

(1)

A, A NOT IN or NOT EXISTS correlated subquery can be rewritten with a LEFT JOIN.

For example:

    SELECT pub_name FROM publishers
    WHERE pub_id NOT IN (SELECT pub_id FROM TITLES WHERE TYPE = 'business')

can be rewritten as:

    SELECT a.pub_name FROM publishers a
    LEFT JOIN TITLES b ON b.type = 'business' AND a.pub_id = b.pub_id
    WHERE b.pub_id IS NULL

(2)

    SELECT TITLE FROM TITLES
    WHERE NOT EXISTS
    (SELECT title_id FROM SALES
    WHERE title_id = TITLES.title_id)

can be rewritten as:

    SELECT TITLE
    FROM TITLES LEFT JOIN SALES
    ON SALES.title_id = TITLES.title_id
    WHERE SALES.title_id IS NULL
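The equivalence of the two forms can be checked with a small Python/SQLite sketch (sample data invented). One caveat worth a comment: NOT IN behaves differently when the subquery returns NULLs, which is another reason to prefer NOT EXISTS or the LEFT JOIN form:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE TITLES (title TEXT, title_id INTEGER);
CREATE TABLE SALES (title_id INTEGER);
INSERT INTO TITLES VALUES ('A', 1), ('B', 2), ('C', 3);
INSERT INTO SALES VALUES (1), (3);
""")

# NOT EXISTS form: titles with no matching sales row.
not_exists = conn.execute("""
    SELECT title FROM TITLES
    WHERE NOT EXISTS (SELECT 1 FROM SALES WHERE SALES.title_id = TITLES.title_id)
""").fetchall()

# LEFT JOIN anti-join form: unmatched rows surface as NULLs on the right side.
# (NOT IN would match these too -- unless the subquery returned a NULL,
# in which case NOT IN returns no rows at all.)
left_join = conn.execute("""
    SELECT TITLES.title
    FROM TITLES LEFT JOIN SALES ON SALES.title_id = TITLES.title_id
    WHERE SALES.title_id IS NULL
""").fetchall()

print(not_exists, left_join)  # [('B',)] [('B',)]
```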

B, If the subquery is guaranteed to return no duplicates, an IN or EXISTS correlated subquery can be replaced with an INNER JOIN. For example:

    SELECT pub_name
    FROM publishers
    WHERE pub_id IN
    (SELECT pub_id
    FROM TITLES
    WHERE TYPE = 'business')

can be rewritten as:

    SELECT a.pub_name --SELECT DISTINCT a.pub_name
    FROM publishers a INNER JOIN TITLES b
    ON b.type = 'business' AND
    a.pub_id = b.pub_id
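A sketch of rewrite B in Python/SQLite, with invented sample data; note the deliberately duplicated title, which shows why DISTINCT may be needed after the join:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE publishers (pub_id INTEGER, pub_name TEXT);
CREATE TABLE TITLES (pub_id INTEGER, type TEXT);
INSERT INTO publishers VALUES (1, 'Acme'), (2, 'Globex');
INSERT INTO TITLES VALUES (1, 'business'), (1, 'business'), (2, 'fiction');
""")

in_version = conn.execute("""
    SELECT pub_name FROM publishers
    WHERE pub_id IN (SELECT pub_id FROM TITLES WHERE type = 'business')
""").fetchall()

# The join would multiply rows when a publisher has several matching titles,
# hence DISTINCT to restore the set semantics of IN.
join_version = conn.execute("""
    SELECT DISTINCT a.pub_name
    FROM publishers a INNER JOIN TITLES b
    ON b.type = 'business' AND a.pub_id = b.pub_id
""").fetchall()

print(in_version, join_version)  # [('Acme',)] [('Acme',)]
```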

(3)

C, Replace IN correlated subqueries with EXISTS. For example:

    SELECT pub_name FROM publishers
    WHERE pub_id IN
    (SELECT pub_id FROM TITLES WHERE TYPE = 'business')

You can use the following statement instead:

    SELECT pub_name FROM publishers WHERE EXISTS
    (SELECT 1 FROM TITLES WHERE TYPE = 'business' AND
    pub_id = publishers.pub_id)

D, Do not use a COUNT(*) subquery to test whether a record exists; prefer a LEFT JOIN or EXISTS. For example, someone writes a statement like this:

    SELECT job_desc FROM JOBS
    WHERE (SELECT COUNT(*) FROM EMPLOYEE WHERE job_id = JOBS.job_id) = 0

Should be changed to:

    SELECT JOBS.job_desc FROM JOBS LEFT JOIN EMPLOYEE
    ON EMPLOYEE.job_id = JOBS.job_id
    WHERE EMPLOYEE.emp_id IS NULL

And a statement like:

    SELECT job_desc FROM JOBS
    WHERE (SELECT COUNT(*) FROM EMPLOYEE WHERE job_id = JOBS.job_id) <> 0

Should be changed to:

    SELECT job_desc FROM JOBS
    WHERE EXISTS (SELECT 1 FROM EMPLOYEE WHERE job_id = JOBS.job_id)
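The two forms of rewrite D can likewise be compared in a Python/SQLite sketch (sample data invented); EXISTS can stop at the first matching row, while COUNT(*) must count them all:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE JOBS (job_id INTEGER, job_desc TEXT);
CREATE TABLE EMPLOYEE (emp_id INTEGER, job_id INTEGER);
INSERT INTO JOBS VALUES (1, 'clerk'), (2, 'manager');
INSERT INTO EMPLOYEE VALUES (100, 1);
""")

# COUNT(*) form: counts every matching employee just to compare with zero.
count_version = conn.execute("""
    SELECT job_desc FROM JOBS
    WHERE (SELECT COUNT(*) FROM EMPLOYEE WHERE EMPLOYEE.job_id = JOBS.job_id) <> 0
""").fetchall()

# EXISTS form: same result, but evaluation can stop at the first match.
exists_version = conn.execute("""
    SELECT job_desc FROM JOBS
    WHERE EXISTS (SELECT 1 FROM EMPLOYEE WHERE EMPLOYEE.job_id = JOBS.job_id)
""").fetchall()

print(count_version, exists_version)  # [('clerk',)] [('clerk',)]
```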

Seven: Try to use indexes

After indexes are created, not every query uses them, and when they are used their efficiency can vary greatly. Unless we force a specific index in the query statement, the choice and use of an index is made automatically by the SQL Server optimizer, based on the query's conditions and the statistics of the related tables. This requires us, when writing SQL, to try to let the optimizer use the indexes. For the optimizer to use indexes efficiently, note the following when writing statements:

A, Do not perform computations on indexed fields; move the computation to the other side of the comparison instead. For example:

SELECT ID FROM T WHERE num / 2 = 100

should read:

SELECT ID FROM T WHERE num = 100 * 2

-------------------------------------------------------

SELECT ID FROM T WHERE num / 2 = num1

If num has an index, it should read:

SELECT ID FROM T WHERE num = num1 * 2

If instead num1 has the index, it should not be changed.

--------------------------------------------------------------------

I have also seen statements like this:

SELECT year, month, amount FROM balance table WHERE 100 * year + month = 2010 * 100 + 10

should read:

SELECT year, month, amount FROM balance table WHERE year = 2010 AND month = 10
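SQLite's EXPLAIN QUERY PLAN makes the effect visible. In this sketch (hypothetical table t with an index on num), the arithmetic-on-the-column form falls back to a full scan, while the rewritten form searches the index:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, num INTEGER)")
conn.execute("CREATE INDEX idx_num ON t(num)")
conn.executemany("INSERT INTO t VALUES (?, ?)", [(i, i) for i in range(1000)])

def plan(sql):
    # Concatenate the 'detail' column of the EXPLAIN QUERY PLAN rows.
    return " ".join(r[3] for r in conn.execute("EXPLAIN QUERY PLAN " + sql))

# Arithmetic on the indexed column hides the index from the optimizer...
scan = plan("SELECT id FROM t WHERE num / 2 = 100")
# ...while moving the arithmetic to the constant side keeps it usable.
search = plan("SELECT id FROM t WHERE num = 100 * 2")
print(scan)    # a full-table SCAN
print(search)  # a SEARCH using idx_num
```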

B, Do not perform format conversions on indexed fields

Examples of date fields:

WHERE CONVERT(VARCHAR(10), date_field, 120) = '2010-07-15'

should read:

WHERE date_field >= '2010-07-15' AND date_field < '2010-07-16'

Examples of IsNull conversions:

WHERE ISNULL(field, '') <> '' should be changed to: WHERE field <> ''

WHERE ISNULL(field, '') = '' should not be modified

WHERE ISNULL(field, 'F') = 't' should be changed to: WHERE field = 't'

WHERE ISNULL(field, 'F') <> 't' should not be modified
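The ISNULL rules can be verified directly. In SQLite the equivalent function is IFNULL; this Python sketch (toy data) shows why the <> '' form may be rewritten but the = '' form must not be:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (field TEXT)")
conn.executemany("INSERT INTO t VALUES (?)", [("x",), ("",), (None,)])

# IFNULL is SQLite's ISNULL. The <> '' forms agree, because NULL <> '' is
# UNKNOWN and the NULL row is filtered out either way...
a = conn.execute("SELECT field FROM t WHERE IFNULL(field, '') <> ''").fetchall()
b = conn.execute("SELECT field FROM t WHERE field <> ''").fetchall()

# ...but the = '' forms differ: the IFNULL version also keeps NULL rows,
# which is why that predicate must not be rewritten.
c = conn.execute("SELECT field FROM t WHERE IFNULL(field, '') = ''").fetchall()
d = conn.execute("SELECT field FROM t WHERE field = ''").fetchall()
print(a, b, c, d)
```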

C, Do not apply functions to indexed fields

WHERE LEFT(name, 3) = 'abc' or WHERE SUBSTRING(name, 1, 3) = 'abc'

should read: WHERE name LIKE 'abc%'

Examples of date queries:

WHERE DATEDIFF(day, date, '2010-06-30') = 0

should read: WHERE date >= '2010-06-30' AND date < '2010-07-01'

WHERE DATEDIFF(day, date, '2010-06-30') > 0

should read: WHERE date < '2010-06-30'

WHERE DATEDIFF(day, date, '2010-06-30') >= 0

should read: WHERE date < '2010-07-01'

WHERE DATEDIFF(day, date, '2010-06-30') < 0

should read: WHERE date >= '2010-07-01'

WHERE DATEDIFF(day, date, '2010-06-30') <= 0

should read: WHERE date >= '2010-06-30'
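The same idea can be demonstrated in SQLite, which has no DATEDIFF; date() applied to the column plays the role of the conversion function here, with invented sample rows. The half-open range selects exactly the same day without touching the column:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, dt TEXT)")
conn.executemany("INSERT INTO orders VALUES (?, ?)", [
    (1, "2010-06-29 23:59:00"),
    (2, "2010-06-30 08:00:00"),
    (3, "2010-06-30 23:30:00"),
    (4, "2010-07-01 00:01:00"),
])

# Applying a function to the column (date() here stands in for
# CONVERT/DATEDIFF) defeats index use...
func_version = conn.execute(
    "SELECT id FROM orders WHERE date(dt) = '2010-06-30'").fetchall()
# ...while the half-open range predicate is index-friendly and selects
# exactly the rows that fall on 2010-06-30.
range_version = conn.execute(
    "SELECT id FROM orders WHERE dt >= '2010-06-30' AND dt < '2010-07-01'").fetchall()
print(func_version, range_version)  # [(2,), (3,)] [(2,), (3,)]
```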

D, Do not concatenate multiple fields when filtering on indexed fields

For example:

WHERE fname + '.' + lname = 'Haiwei.YANG'

should read:

WHERE fname = 'Haiwei' AND lname = 'YANG'

Eight: The join conditions of a multi-table join matter greatly to index selection, so pay special attention when writing them.

A, In a multi-table join, write the join conditions in full; better to repeat a condition than to leave a gap.

B, Use clustered indexes in join conditions whenever possible

C, Note the differences among putting a condition in ON, WHERE, and HAVING

ON executes first, WHERE second, and HAVING last. ON filters out non-qualifying records before aggregation, so it reduces the data that intermediate operations must process and should be the fastest; WHERE should also be faster than HAVING, because it filters the data before any SUM is performed. ON applies when two tables are joined; for a single table, the comparison is between WHERE and HAVING.

Consider the join priority order:

INNER JOIN

LEFT JOIN (note: replace RIGHT JOIN with LEFT JOIN)

CROSS JOIN

Other areas of attention and understanding are:

A, In an IN value list, put the most frequently occurring values first and the least frequent last, to reduce the number of comparisons

B, Note the difference between UNION and UNION ALL. --When duplicate data is acceptable, UNION ALL performs better

C, Use DISTINCT carefully; do not use it when it is not necessary

D, Know the difference between TRUNCATE TABLE and DELETE

E, Reduce the number of round trips to the database

Also, when we write long stored procedures, wrap the sections with marker comments; it greatly improves readability, and even if the statements themselves are not great, at least they are tidy. C# has #region; in SQL I prefer to use:

    --start of: query the number of employees
    SQL statements
    --end of

On a production machine we generally cannot debug a program casually, yet many programs work fine on our own machines and fail only in the production system, where we cannot freely operate. What then? We can debug our stored procedures or SQL statements inside a transaction that we roll back, and troubleshoot that way.

    BEGIN TRAN
    UPDATE a SET field = ''
    ROLLBACK
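The same trick in Python with SQLite (toy table and column): the update is visible inside the transaction, and the rollback restores the data untouched:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.isolation_level = None  # manage transactions explicitly
conn.execute("CREATE TABLE a (id INTEGER, field TEXT)")
conn.execute("INSERT INTO a VALUES (1, 'original')")

# Debugging pattern: run the statement inside a transaction,
# inspect the result, then roll back so the data is unchanged.
conn.execute("BEGIN")
conn.execute("UPDATE a SET field = 'debug value' WHERE id = 1")
inside = conn.execute("SELECT field FROM a WHERE id = 1").fetchone()[0]
conn.execute("ROLLBACK")
after = conn.execute("SELECT field FROM a WHERE id = 1").fetchone()[0]
print(inside, after)  # debug value original
```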

In job stored procedures I usually add a section like the one below, so that error checks live inside the stored procedure and an error triggers a rollback. If the calling program already wraps the call in a transaction with rollback, the stored procedure should not open its own transaction, since nested transaction rollbacks reduce execution efficiency. Most of the time, though, putting the checks in the stored procedure helps us read it and debug errors.

    BEGIN TRANSACTION
    --Transaction begins
    --Check for errors
    IF (@@ERROR > 0)
    BEGIN
    --Roll back the operation
    ROLLBACK TRANSACTION
    RAISERROR ('Delete work report error', 16, 1)
    RETURN
    END
    --End the transaction
    COMMIT TRANSACTION
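A rough Python/SQLite analogue of this skeleton (function and table names invented): begin, do the work, check for an error condition, and either roll back and raise or commit:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.isolation_level = None  # explicit transaction control
conn.execute("CREATE TABLE report (id INTEGER PRIMARY KEY, body TEXT)")
conn.execute("INSERT INTO report VALUES (1, 'weekly')")

def delete_report(report_id):
    # Mirrors the stored-procedure skeleton: BEGIN, work, error check,
    # ROLLBACK + RAISERROR on failure, COMMIT on success.
    conn.execute("BEGIN")
    try:
        cur = conn.execute("DELETE FROM report WHERE id = ?", (report_id,))
        if cur.rowcount == 0:            # the "check the error" step
            raise RuntimeError("Delete work report error")
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise

delete_report(1)
remaining = conn.execute("SELECT COUNT(*) FROM report").fetchone()[0]
print(remaining)  # 0
```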

I have not blogged for a long time; projects keep coming one after another, and with staff turnover at the company the newcomers cannot yet take over many things, so overtime has become routine. I rushed to write this down hoping it helps everyone; where it falls short, I welcome guidance so we can exchange ideas and improve together.

Wherever there are mistakes, feel free to point them out; I hope we can exchange and share.

Original link: http://www.cnblogs.com/MR_ke/archive/2011/05/29/2062085.html
