It is a fortune to develop SQL statements.

Source: Internet
Author: User
Tags sql server query
A good habit of developing SQL is bitsCN.com.

For software development, most people do not need to deal with databases, especially erp Development. it is even more frequent to deal with databases, and thousands of rows of stored procedures are motionless. if there is a large amount of data, there will be a large flow of personnel, so can I ensure that the system will run smoothly in the next period? Can I still ensure that the next person can understand my stored procedures? I would like to share with you my personal experience in training at ordinary times and hope to help you.

To know SQL statements, I think it is necessary for us to know how the SQL Server query analyzer executes our SQL statements. many of us will look at the execution plan, you can also use Profiler to monitor and optimize the query statement or the cause of slow storage process. However, if we know the execution logic sequence of the query analyzer, we are confident that we can start with it, are you sure you want to start?

  I. query logic execution sequence

(1) FROM left_table

(3) join_type JOIN right_table (2) ON join_condition

(4) WHERE where_condition

(5) group by group_by_list

(6) WITH {cube | rollup}

(7) HAVING having_condition

(8) SELECT (9) DISTINCT (11) top_specification select_list

(9) order by order_by_list

The standard SQL parsing sequence is:

(1) The FROM clause assembles data FROM different data sources.

(2) The WHERE clause filters records based on specified conditions.

(3) the group by Clause divides data into multiple groups.

(4) use aggregate functions for calculation

(5) Use the HAVING clause to filter groups

(6) calculate all expressions

(7) Use order by to sort the result set

  II. execution sequence

1. FROM: run Cartesian product on the first two tables in the FROM clause to generate virtual table vt1

2. ON: apply the ON filter to the vt1 table and insert vt2 only when the join_condition is true.

3. OUTER (join): if the row not found in the outer join reserved table (preserved table) is specified as an external row, add it to vt2 to generate t3. if the from contains more than two tables, repeat the steps and steps for the result table generated by the previous join and the next table to end directly.

4. WHERE: apply the WHERE filter to vt3. vt4 is inserted only when the where_condition is set to true.

5. group by: generates a vt5 instance BY grouping the rows in vt4 BY the column list in the group by clause.

6. CUBE | ROLLUP: inserts supergroups into vt6 to generate vt6.

7. HAVING: apply HAVING filter to vt6. only groups with having_condition set to true are inserted into vt7.

8. SELECT: process the select list to generate vt8

9. DISTINCT: remove duplicate rows from vt8 to generate vt9

10. order by: sorts the rows of vt9 by the column list in the order BY clause to generate a cursor vc10

11. TOP: Select a specified number or proportion of rows from the beginning of vc10 to generate vt11 and return to the caller

As shown in the preceding figure, the syntax for using Linq to SQL is similar? If we understand the execution sequence of SQL Server, then we will further develop the good habit of daily SQL, that is, to consider the performance while implementing functions, A database is a tool that can perform set operations. we should try to use this tool. The so-called set operation is actually a batch operation, that is, to reduce the number of large data cycles on the client, instead of SQL statements or stored procedures.

  3. only the required data is returned.

To return data to the client, you must at least extract data from the database, transmit data over the network, receive data from the client, and process data from the client. if no data is returned, it will increase invalid labor on servers, networks, and clients. the harm is obvious. to avoid such incidents, you must note:

A. horizontal view

(1) do not write the SELECT * statement, but SELECT the fields you need.

(2) when connecting multiple tables in an SQL statement, use the table alias and prefix the alias on each Column. In this way, the parsing time can be reduced and the syntax errors caused by Column ambiguity can be reduced.

1

Table 1 (ID, col1) and Table 2 (ID, col2)

1

Select A. ID, A. col1, B. col2

2

-- Select A. ID, col1, col2-do not write this statement, which is not conducive to future program expansion.

3

From table1 A inner join table2 B on A. ID = B. ID Where...

B. vertical view

(1) write the WHERE clause reasonably. do not write SQL statements without WHERE.

(2) select top n * -- use this method without the WHERE condition.

  4. do as little repetitive work as possible

A. control multiple executions of the same statement, especially the multiple executions of some basic data.

B. data conversion may be designed to reduce the number of data conversions, but it can be done by programmers.

C. eliminate unnecessary subqueries and connection tables. subqueries are generally interpreted as external connections in the execution plan, resulting in additional costs for redundant connection tables.

D. Merge multiple updates for the same table with the same condition, for example

Update employee set fname = 'haiwer'

WHERE EMP_ID = 'vpa30890f' update employee set lname = 'Yang'

WHERE EMP_ID = 'vpa30890f'

These two statements should be merged into the next statement.

Update employee set fname = 'haiwer ', LNAME = 'Yang' WHERE EMP_ID = 'vpa30890f'

E. do not split the UPDATE operation into the DELETE operation + INSERT operation. Although the functions are the same, the performance difference is great.

  5. use temporary tables and table variables

In complex systems, temporary tables and table variables are difficult to avoid. for usage of temporary tables and table variables, note the following:

A. If the statements are complex and have too many connections, you can use temporary tables and table variables for step-by-step execution.

B. If you need to use the same part of the data of a large table multiple times, use temporary tables and table variables to store the data.

C. If you need to combine the data of multiple tables to form a result, you can use temporary tables and table variables to summarize the data of these tables step by step.

D. in other cases, use of temporary tables and table variables should be controlled.

E. for the selection of temporary tables and table variables, many statements are that the table variables are in the memory and are fast. The table variables should be preferred. However, in actual use, we found that:

(1) the data volume that needs to be stored in the temporary table is mainly considered. when the data volume is large, the temporary table speed is faster.

(2) execution time period and estimated execution time (how long)

F. select into and create table + insert into for temporary tables. generally:

Select into is much faster than the create table + insert into method,

However, select into locks the system tables SYSOBJECTS, SYSINDEXES, and SYSCOLUMNS of TEMPDB. in a multi-user concurrency environment, it is easy to block other processes.

Therefore, I suggest using CREATE TABLE + INSERT INTO in a concurrent system, while using SELECT INTO in a single statement with a large data volume.

  VI. subquery usage

A subquery is a SELECT query nested in SELECT, INSERT, UPDATE, DELETE statements, or other subqueries.

Subqueries can be used wherever expressions are allowed. subqueries allow flexible programming and can be used to implement some special functions. However, in terms of performance, an inappropriate subquery may cause a performance bottleneck. If the sub-query condition uses the field of the outer table, this subquery is called the related subquery.

You can use IN, not in, EXISTS, and not exists to introduce related subqueries. Note the following for related subqueries:

(1) for subqueries not in and not exists, use left join instead. For example:

SELECT PUB_NAME from publishers where PUB_ID NOTIN (SELECT PUB_ID from titles where type = 'business ')

It can be rewritten:

Select a. PUB_NAME from publishers a leftjoin titles B on B. TYPE = 'business' and a. PUB_ID = B. PUB_ID where B. PUB_ID IS NULL

For example, not exists:

SELECT TITLE FROM TITLES

WHERE NOT EXISTS

(SELECT TITLE_ID from sales where TITLE_ID = TITLES. TITLE_ID)

1

It can be rewritten:

SELECT TITLE

FROM TITLES LEFTJOIN SALES

On sales. TITLE_ID = TITLES. TITLE_ID

Where sales. TITLE_ID ISNULL

2) If the subquery is not repeated, the subquery related to IN and EXISTS can be replaced by inner join. For example:

SELECT PUB_NAME

FROM PUBLISHERS

WHERE PUB_ID IN

(SELECT PUB_ID

FROM TITLES

Where type = 'business ')

1

It can be rewritten:

Select a. PUB_NAME -- select distinct a. PUB_NAME

FROM PUBLISHERS A INNERJOIN TITLES B

On B. TYPE = 'business' AND

A. PUB_ID = B. PUB_ID

(3) IN subqueries are replaced by EXISTS, for example:

SELECT PUB_NAME FROM PUBLISHERS

WHERE PUB_ID IN

(SELECT PUB_ID from titles where type = 'business ')

You can use the following statement instead:

SELECT PUB_NAME FROM PUBLISHERS WHERE EXISTS

(SELECT1FROM titles where type = 'business' AND

PUB_ID = PUBLISHERS. PUB_ID)

4) do not use the COUNT (*) subquery to determine whether a record EXISTS. it is best to use left join or EXISTS. for example, someone writes a statement like this:

SELECT JOB_DESC FROM JOBS

WHERE (SELECTCOUNT (*) from employee where JOB_ID = JOBS. JOB_ID) = 0

It should be changed:

Select jobs. JOB_DESC FROM JOBS LEFTJOIN EMPLOYEE

On employee. JOB_ID = JOBS. JOB_ID

Where employee. EMP_ID ISNULL

SELECT JOB_DESC FROM JOBS

WHERE (select count (*) from employee where JOB_ID = JOBS. JOB_ID) 0

It should be changed:

SELECT JOB_DESC FROM JOBS

WHEREEXISTS (SELECT 1 from employee where JOB_ID = JOBS. JOB_ID)

  7. try to use indexes

After an index is created, not every query uses an index. when an index is used, the efficiency of the index varies greatly. As long as the index is not forcibly specified in the query statement, the selection and use of the index is automatically selected by the SQL Server Optimizer, the selection is based on the conditions of the query statement and the statistical information of related tables. This requires that we try to make the optimizer use indexes when writing SQL statements. To enable the optimizer to efficiently use indexes, note the following when writing statements:

(1) do not perform operations on index fields, but try to perform transformations, for example:

Select id from t where num/2 = 100

Should be changed:

Select id from t where num = 100*2

Select id from t where num/2 = NUM1

If NUM has an index, change it:

Select id from t where num = NUM1 * 2

If NUM1 has an index, it should not be changed.

(2) the following statements have been found:

SELECT year, month, amount FROM balance table WHERE100 * year + month = 2010*100 + 10

1

It should be changed:

SELECT year, month, amount FROM balance table WHERE year = 2010 AND month = 10

(3) do not convert the format of index fields

Example of a date field:

WHERECONVERT (VARCHAR (10), date field, 120) = '2017-07-15'

Should be changed

WHERE date field> = '2014-07-15 'AND date field '2014-07-16'

Example of ISNULL conversion:

WHEREISNULL (field, '')'' should be changed to: WHERE field''

WHEREISNULL (field, '') ='' should not be modified

WHEREISNULL (field, 'F') = 'T' should be changed to: WHERE field ='t'

WHEREISNULL (field, 'F') 'T' should not be modified

(4) do not convert the format of index fields

WHERELEFT (NAME, 3) = 'abc' or where substring (NAME, 1, 3) = 'abc'

Should be changed to: where name like 'ABC %'

Example of Date query:

WHEREDATEDIFF (DAY, date, '2017-06-30 ') = 0

Should be changed to: WHERE Date = '2017-06-30 'AND date '2017-07-01'

WHEREDATEDIFF (DAY, date, '2017-06-30 ') 0

Should be changed to: WHERE date '2017-06-30'

WHEREDATEDIFF (DAY, date, '2017-06-30 ') = 0

Should be changed to: WHERE date '2017-07-01'

WHEREDATEDIFF (DAY, date, '2017-06-30 ') 0

Should be changed to: WHERE Date = '2017-07-01'

WHEREDATEDIFF (DAY, date, '2017-06-30 ') = 0

Should be changed to: WHERE Date = '2017-06-30'

(5) Do not use functions for indexed fields

Where left (NAME, 3) = 'abc' or where substring (NAME, 1, 3) = 'abc'

Should be changed to: where name like 'ABC %'

Example of Date query:

WHEREDATEDIFF (DAY, date, '2017-06-30 ') = 0

Should be changed to: WHERE Date = '2017-06-30 'AND date '2017-07-01'

WHEREDATEDIFF (DAY, date, '2017-06-30 ') 0

Should be changed to: WHERE date '2017-06-30'

WHEREDATEDIFF (DAY, date, '2017-06-30 ') = 0

Should be changed to: WHERE date '2017-07-01'

WHEREDATEDIFF (DAY, date, '2017-06-30 ') 0

Should be changed to: WHERE Date = '2017-07-01'

WHEREDATEDIFF (DAY, date, '2017-06-30 ') = 0

Should be changed to: WHERE Date = '2017-06-30'

(6) do not connect index fields with multiple fields

For example:

Where fame + '.' + LNAME = 'haiwei. yang'

Should be changed:

Where fname = 'haiwei' and lname = 'Yang'

  8. the join conditions for multi-table join are of great significance to the selection of indexes. Therefore, we need to pay special attention when writing the join conditions.

A. When connecting multiple tables, the connection conditions must be fully written. you 'd better repeat them without missing them.

B. use clustered indexes as much as possible in connection conditions

C. note the differences between the ON, WHERE, and HAVING conditions.

ON is the first execution, followed by WHERE, and HAVING. because ON filters records that do not meet the conditions before statistics are made, it can reduce the data to be processed by intermediate operations, it should be said that the speed is the fastest, and the WHERE should be faster than HAVING, because it only performs SUM after filtering data, and ON is used only when two tables are joined, therefore, when a table is created, we can compare the WHERE and HAVING.

1

Consider the connection priority:

2

(1) INNER JOIN

3

(2) left join (note: right join is replaced by left join)

4

(3) CROSS JOIN

Other notes include:

A. IN the post-IN nominal value list, place the most frequent values at the beginning and the least value at the end to reduce the number of judgments.

B. pay attention to the difference between UNION and UNION ALL. -- Enable union all for repeated data

C. use DISTINCT unless necessary.

D. The difference between truncate table and DELETE.

E. reduce the number of database accesses.

In addition, we write the stored procedure. if it is long, mark it with a token, because it is quite readable. even if the statement is not written well, the statement is neat, and C # has region, what I like most about SQL is:

-- Startof query number of employees

SQL statement

-- End

We generally cannot debug programs on formal machines, but many times the program is normal on our local machine, but there is a problem in the formal system, but we cannot operate on formal machines at will, so what should we do? We can use rollback to debug our stored procedures or SQL statements to eliminate errors.

BEGINTRAN

UPDATE a SET field =''

ROLLBACK

I usually add the following section to the job stored procedure, so that the check errors can be stored in the stored procedure. if the error rollback operation is executed, but if the program already has a transaction rollback, therefore, do not write transactions in the stored procedure. this will cause the transaction to roll back and nest to reduce the execution efficiency. However, we can usually put the check in the stored procedure, which helps us to interpret the stored procedure, and troubleshooting.

BEGINTRANSACTION

  -- Start transaction rollback

-- Check for errors

IF (@ ERROR0)

BEGIN

  -- Rollback

ROLLBACKTRANSACTION

RAISERROR ('delete work report error', 16, 3)

RETURN

END

  -- End the transaction

COMMITTRANSACTION

BitsCN.com

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.