It is a fortune to develop SQL statements.

Last Update:2018-04-16 Source: Internet

Author: User

Tags sql server query

Developer on Alibaba Coud: Build your first app with APIs, SDKs, and tutorials on the Alibaba Cloud. Read more ＞

A good habit of developing SQL is bitsCN.com.

For software development, most people do not need to deal with databases, especially erp Development. it is even more frequent to deal with databases, and thousands of rows of stored procedures are motionless. if there is a large amount of data, there will be a large flow of personnel, so can I ensure that the system will run smoothly in the next period? Can I still ensure that the next person can understand my stored procedures? I would like to share with you my personal experience in training at ordinary times and hope to help you.

To know SQL statements, I think it is necessary for us to know how the SQL Server query analyzer executes our SQL statements. many of us will look at the execution plan, you can also use Profiler to monitor and optimize the query statement or the cause of slow storage process. However, if we know the execution logic sequence of the query analyzer, we are confident that we can start with it, are you sure you want to start?

　　I. query logic execution sequence

(1) FROM left_table

(3) join_type JOIN right_table (2) ON join_condition

(4) WHERE where_condition

(5) group by group_by_list

(6) WITH {cube | rollup}

(7) HAVING having_condition

(8) SELECT (9) DISTINCT (11) top_specification select_list

(9) order by order_by_list

The standard SQL parsing sequence is:

(1) The FROM clause assembles data FROM different data sources.

(2) The WHERE clause filters records based on specified conditions.

(3) the group by Clause divides data into multiple groups.

(4) use aggregate functions for calculation

(5) Use the HAVING clause to filter groups

(6) calculate all expressions

(7) Use order by to sort the result set

　　II. execution sequence

1. FROM: run Cartesian product on the first two tables in the FROM clause to generate virtual table vt1

2. ON: apply the ON filter to the vt1 table and insert vt2 only when the join_condition is true.

3. OUTER (join): if the row not found in the outer join reserved table (preserved table) is specified as an external row, add it to vt2 to generate t3. if the from contains more than two tables, repeat the steps and steps for the result table generated by the previous join and the next table to end directly.

4. WHERE: apply the WHERE filter to vt3. vt4 is inserted only when the where_condition is set to true.

5. group by: generates a vt5 instance BY grouping the rows in vt4 BY the column list in the group by clause.

6. CUBE | ROLLUP: inserts supergroups into vt6 to generate vt6.

7. HAVING: apply HAVING filter to vt6. only groups with having_condition set to true are inserted into vt7.

8. SELECT: process the select list to generate vt8

9. DISTINCT: remove duplicate rows from vt8 to generate vt9

10. order by: sorts the rows of vt9 by the column list in the order BY clause to generate a cursor vc10

11. TOP: Select a specified number or proportion of rows from the beginning of vc10 to generate vt11 and return to the caller

As shown in the preceding figure, the syntax for using Linq to SQL is similar? If we understand the execution sequence of SQL Server, then we will further develop the good habit of daily SQL, that is, to consider the performance while implementing functions, A database is a tool that can perform set operations. we should try to use this tool. The so-called set operation is actually a batch operation, that is, to reduce the number of large data cycles on the client, instead of SQL statements or stored procedures.

　　3. only the required data is returned.

To return data to the client, you must at least extract data from the database, transmit data over the network, receive data from the client, and process data from the client. if no data is returned, it will increase invalid labor on servers, networks, and clients. the harm is obvious. to avoid such incidents, you must note:

A. horizontal view

(1) do not write the SELECT * statement, but SELECT the fields you need.

(2) when connecting multiple tables in an SQL statement, use the table alias and prefix the alias on each Column. In this way, the parsing time can be reduced and the syntax errors caused by Column ambiguity can be reduced.

Table 1 (ID, col1) and Table 2 (ID, col2)

Select A. ID, A. col1, B. col2

-- Select A. ID, col1, col2-do not write this statement, which is not conducive to future program expansion.

From table1 A inner join table2 B on A. ID = B. ID Where...

B. vertical view

(1) write the WHERE clause reasonably. do not write SQL statements without WHERE.

(2) select top n * -- use this method without the WHERE condition.

　　4. do as little repetitive work as possible

A. control multiple executions of the same statement, especially the multiple executions of some basic data.

B. data conversion may be designed to reduce the number of data conversions, but it can be done by programmers.

C. eliminate unnecessary subqueries and connection tables. subqueries are generally interpreted as external connections in the execution plan, resulting in additional costs for redundant connection tables.

D. Merge multiple updates for the same table with the same condition, for example

Update employee set fname = 'haiwer'

WHERE EMP_ID = 'vpa30890f' update employee set lname = 'Yang'

WHERE EMP_ID = 'vpa30890f'

These two statements should be merged into the next statement.

Update employee set fname = 'haiwer ', LNAME = 'Yang' WHERE EMP_ID = 'vpa30890f'

E. do not split the UPDATE operation into the DELETE operation + INSERT operation. Although the functions are the same, the performance difference is great.

　　5. use temporary tables and table variables

In complex systems, temporary tables and table variables are difficult to avoid. for usage of temporary tables and table variables, note the following:

A. If the statements are complex and have too many connections, you can use temporary tables and table variables for step-by-step execution.

B. If you need to use the same part of the data of a large table multiple times, use temporary tables and table variables to store the data.

C. If you need to combine the data of multiple tables to form a result, you can use temporary tables and table variables to summarize the data of these tables step by step.

D. in other cases, use of temporary tables and table variables should be controlled.

E. for the selection of temporary tables and table variables, many statements are that the table variables are in the memory and are fast. The table variables should be preferred. However, in actual use, we found that:

(1) the data volume that needs to be stored in the temporary table is mainly considered. when the data volume is large, the temporary table speed is faster.

(2) execution time period and estimated execution time (how long)

F. select into and create table + insert into for temporary tables. generally:

Select into is much faster than the create table + insert into method,

However, select into locks the system tables SYSOBJECTS, SYSINDEXES, and SYSCOLUMNS of TEMPDB. in a multi-user concurrency environment, it is easy to block other processes.

Therefore, I suggest using CREATE TABLE + INSERT INTO in a concurrent system, while using SELECT INTO in a single statement with a large data volume.

　　VI. subquery usage

A subquery is a SELECT query nested in SELECT, INSERT, UPDATE, DELETE statements, or other subqueries.

Subqueries can be used wherever expressions are allowed. subqueries allow flexible programming and can be used to implement some special functions. However, in terms of performance, an inappropriate subquery may cause a performance bottleneck. If the sub-query condition uses the field of the outer table, this subquery is called the related subquery.

You can use IN, not in, EXISTS, and not exists to introduce related subqueries. Note the following for related subqueries:

(1) for subqueries not in and not exists, use left join instead. For example:

SELECT PUB_NAME from publishers where PUB_ID NOTIN (SELECT PUB_ID from titles where type = 'business ')

It can be rewritten:

Select a. PUB_NAME from publishers a leftjoin titles B on B. TYPE = 'business' and a. PUB_ID = B. PUB_ID where B. PUB_ID IS NULL

For example, not exists:

SELECT TITLE FROM TITLES

WHERE NOT EXISTS

(SELECT TITLE_ID from sales where TITLE_ID = TITLES. TITLE_ID)

It can be rewritten:

SELECT TITLE

FROM TITLES LEFTJOIN SALES

On sales. TITLE_ID = TITLES. TITLE_ID

Where sales. TITLE_ID ISNULL

2) If the subquery is not repeated, the subquery related to IN and EXISTS can be replaced by inner join. For example:

SELECT PUB_NAME

FROM PUBLISHERS

WHERE PUB_ID IN

(SELECT PUB_ID

FROM TITLES

Where type = 'business ')

It can be rewritten:

Select a. PUB_NAME -- select distinct a. PUB_NAME

FROM PUBLISHERS A INNERJOIN TITLES B

On B. TYPE = 'business' AND

A. PUB_ID = B. PUB_ID

(3) IN subqueries are replaced by EXISTS, for example:

SELECT PUB_NAME FROM PUBLISHERS

WHERE PUB_ID IN

(SELECT PUB_ID from titles where type = 'business ')

You can use the following statement instead:

SELECT PUB_NAME FROM PUBLISHERS WHERE EXISTS

(SELECT1FROM titles where type = 'business' AND

PUB_ID = PUBLISHERS. PUB_ID)

4) do not use the COUNT (*) subquery to determine whether a record EXISTS. it is best to use left join or EXISTS. for example, someone writes a statement like this:

SELECT JOB_DESC FROM JOBS

WHERE (SELECTCOUNT (*) from employee where JOB_ID = JOBS. JOB_ID) = 0

It should be changed:

Select jobs. JOB_DESC FROM JOBS LEFTJOIN EMPLOYEE

On employee. JOB_ID = JOBS. JOB_ID

Where employee. EMP_ID ISNULL

SELECT JOB_DESC FROM JOBS

WHERE (select count (*) from employee where JOB_ID = JOBS. JOB_ID) 0

It should be changed:

SELECT JOB_DESC FROM JOBS

WHEREEXISTS (SELECT 1 from employee where JOB_ID = JOBS. JOB_ID)

　　7. try to use indexes

After an index is created, not every query uses an index. when an index is used, the efficiency of the index varies greatly. As long as the index is not forcibly specified in the query statement, the selection and use of the index is automatically selected by the SQL Server Optimizer, the selection is based on the conditions of the query statement and the statistical information of related tables. This requires that we try to make the optimizer use indexes when writing SQL statements. To enable the optimizer to efficiently use indexes, note the following when writing statements:

(1) do not perform operations on index fields, but try to perform transformations, for example:

Select id from t where num/2 = 100

Should be changed:

Select id from t where num = 100*2

Select id from t where num/2 = NUM1

If NUM has an index, change it:

Select id from t where num = NUM1 * 2

If NUM1 has an index, it should not be changed.

(2) the following statements have been found:

SELECT year, month, amount FROM balance table WHERE100 * year + month = 2010*100 + 10

It should be changed:

SELECT year, month, amount FROM balance table WHERE year = 2010 AND month = 10

(3) do not convert the format of index fields

Example of a date field:

WHERECONVERT (VARCHAR (10), date field, 120) = '2017-07-15'

Should be changed

WHERE date field> = '2014-07-15 'AND date field '2014-07-16'

Example of ISNULL conversion:

WHEREISNULL (field, '')'' should be changed to: WHERE field''

WHEREISNULL (field, '') ='' should not be modified

WHEREISNULL (field, 'F') = 'T' should be changed to: WHERE field ='t'

WHEREISNULL (field, 'F') 'T' should not be modified

(4) do not convert the format of index fields

WHERELEFT (NAME, 3) = 'abc' or where substring (NAME, 1, 3) = 'abc'

Should be changed to: where name like 'ABC %'

Example of Date query: