Understand SQL principles and write efficient SQL statements

Last Update:2018-06-10 Source: Internet

Author: User

Tags sql server query

Developer on Alibaba Coud: Build your first app with APIs, SDKs, and tutorials on the Alibaba Cloud. Read more ＞

Most of us do not need to deal with databases when we develop software. Especially for erp development, we have to deal with databases more frequently, and thousands of rows of data are still stored. If there is a large amount of data, there will be a large flow of people, so can we ensure that the system will run smoothly in the next period of time? Can we still ensure that the next person can understand our stored procedures? Yes

To know SQL statements, I think it is necessary for us to know how the SQL Server Query analyzer executes My SQL statements. Many of me will look at the execution plan, you can also use profile to monitor and Optimize Query statements or the cause of slow stored procedures. However, if we know the execution logic sequence of the query analyzer, we are confident that we can start with it, are you sure you want to start?

Query logic execution sequence

FROM <left_table>
ON <join_condition>
<Join_type> JOIN <right_table>
WHERE <where_condition>
Group by <group_by_list>
WITH {cube | rollup}
HAVING
SELECT
DISTINCT
Order by <order_by_list>
<Top_specification> <select_list>

The standard SQL parsing sequence is:

. FROM clause assembles data FROM different data sources
The. WHERE clause filters records based on specified conditions.
The. group by clause divides data into multiple groups.
. Use Aggregate functions for Computation
. Use the HAVING clause to filter groups.
. Calculate all expressions
. Use order by to sort the result set.

Execution sequence

FROM: Run Cartesian product to generate virtual table vt1 for the first two tables in the FROM clause.
ON: Apply the ON filter to the vt1 table to insert vt2 only when the <join_condition> is true.
OUTER (join): If the outer join reserved table (preserved table) is specified) unfound rows are added as external rows to vt2 to generate t3. If the from table contains more than two tables, the result table generated for the previous join and the next table are repeat steps and steps are directly completed.
WHERE: Apply the WHERE filter to vt3 to insert vt4 only when the <where_condition> is true.
Group by: generates a vt5 instance BY grouping the rows in vt4 BY the column list in the group by clause.
CUBE | ROLLUP: insert supergroups into vt6 to generate vt6
HAVING: Apply the HAVING filter to vt6 to insert vt7 only when the
SELECT: process the select list to generate vt8
DISTINCT: Remove duplicate rows from vt8 to generate vt9
Order by: sorts the rows of vt9 by the column list in the order BY clause to generate a cursor vc10
TOP: select a specified number or proportion of rows from the beginning of vc10 to generate vt11 and return to the caller

As shown in the preceding figure, the linqtosql syntax is similar? If we understand the execution sequence of SQL Server, we will further develop the good habit of daily SQL, that is, the idea of considering performance while implementing functions. The database is a tool that can perform set operations, we should try our best to use this tool. The so-called set operation is actually batch operation, that is, to minimize the number of cyclic operations on the client, instead of using SQL statements or stored procedures.

Only the required data is returned.

To return data to the client, you must at least extract data from the database, transmit data over the network, receive data from the client, and process data from the client. If no data is returned, it will increase invalid labor on servers, networks, and clients. The harm is obvious. To avoid such incidents, you must note:

1. horizontal view:

Do not write the SELECT * statement, but SELECT the required field.
When connecting multiple tables in an SQL statement, use the table alias and prefix the alias on each Column. in this way, the parsing time can be reduced and the syntax errors caused by Column ambiguity can be reduced.

Table 1 (ID, col1) and table 2 (ID, col2)

Select. ID,. col1, B. col2 -- Select. ID, col1, col2-do not write this way, it is not conducive to future program expansion from table1 A inner join table2 B on. ID = B. ID Where...

2. vertical view:

Write the WHERE clause properly. Do not write SQL statements without WHERE.
Select top n * -- replace this with no WHERE Condition

Try to do less repetitive work

It is seldom noticed by many programmers to control multiple executions of the same statement, especially the multiple executions of some basic data.
Data conversion may be designed to reduce the number of data conversions, but it is possible for programmers to reduce the number of data conversions.
Eliminate unnecessary subqueries and connection tables. subqueries are generally interpreted as external connections in the execution plan, resulting in additional costs for redundant connection tables.
Merge multiple updates for the same table with the same condition, for example:

Update employee set fname = 'haiwer 'WHERE EMP_ID = 'vpa30890f' update employee set lname = 'yang' WHERE EMP_ID = 'vpa30890f' the two statements should be merged into the next statement UPDATE EMPLOYEE SET FNAME = 'haiwer ', LNAME = 'yang' WHERE EMP_ID = 'vpa30890f'

The UPDATE operation should not be split into DELETE operations + INSERT operations. Although the functions are the same, the performance difference is great.

Pay attention to the usage of temporary tables and table Variables

In complex systems, temporary tables and table variables are difficult to avoid. for usage of temporary tables and table variables, note the following:

If the statements are complex and have too many connections, you can use temporary tables and table variables to complete them step by step.
If you need to use the same part of the data of a large table multiple times, use temporary tables and table variables to store the data.
If you need to combine the data of multiple tables to form a result, you can use temporary tables and table variables to summarize the data of these tables step by step.
In other cases, use of temporary tables and table variables should be controlled.
Many statements about the selection of temporary tables and table variables are that the table variables are in the memory and are fast. Table variables should be preferred. However, in actual use, the primary consideration is the amount of data that needs to be placed in a temporary table. When there is a large amount of data, the temporary table speed is faster. Execution time period and estimated execution time (how long ).
Select into and create table + insert into are generated for temporary tables. Generally, select into is much faster than create table + insert, however, select into locks the system tables SYSOBJECTS, SYSINDEXES, and SYSCOLUMNS of TEMPDB. In a multi-user concurrent environment, other processes are easily blocked. Therefore, in a concurrent system, try to use create table + insert into, while select into is used in a single statement with a large data volume.

Subquery usage

A subquery is a SELECT query nested in SELECT, INSERT, UPDATE, DELETE statements, or other subqueries. Subqueries can be used wherever expressions are allowed. subqueries allow flexible programming and can be used to implement some special functions. However, in terms of performance, an inappropriate subquery may cause a performance bottleneck. If the sub-query condition uses the field of the outer table, this subquery is called the related subquery. You can use IN, not in, EXISTS, and not exists to introduce related subqueries. Note the following for related subqueries:

1. For subqueries related to not in and not exists, use left join instead.

For example:

SELECT PUB_NAME FROM PUBLISHERS WHERE PUB_ID NOT IN (SELECT PUB_ID FROM TITLES WHERE TYPE = 'BUSINESS')

It can be rewritten:

SELECT A.PUB_NAME FROM PUBLISHERS A LEFT JOIN TITLES B ON B.TYPE = 'BUSINESS' AND A.PUB_ID=B. PUB_ID WHERE B.PUB_ID IS NULL

For example:

SELECT TITLE FROM TITLES WHERE NOT EXISTS  (SELECT TITLE_ID FROM SALES WHERE TITLE_ID = TITLES.TITLE_ID)

It can be rewritten:

SELECT TITLE FROM TITLES LEFT JOIN SALES ON SALES.TITLE_ID = TITLES.TITLE_ID WHERE SALES.TITLE_ID IS NULL

2. If the subquery is not repeated, the subquery related to IN and EXISTS can be replaced by inner join. For example:

SELECT PUB_NAME FROM PUBLISHERS WHERE PUB_ID IN  (SELECT PUB_ID  FROM TITLES  WHERE TYPE = 'BUSINESS')

It can be rewritten:

SELECT A.PUB_NAME --SELECT DISTINCT A.PUB_NAME FROM PUBLISHERS A INNER JOIN TITLES B ON        B.TYPE = 'BUSINESS' AND A.PUB_ID=B. PUB_ID

3. IN subqueries are replaced by EXISTS, for example

SELECT PUB_NAME FROM PUBLISHERS WHERE PUB_ID IN (SELECT PUB_ID FROM TITLES WHERE TYPE = 'BUSINESS')

You can use the following statement instead:

SELECT PUB_NAME FROM PUBLISHERS WHERE EXISTS (SELECT 1 FROM TITLES WHERE TYPE = 'BUSINESS' AND PUB_ID= PUBLISHERS.PUB_ID)

4. Do not use the COUNT (*) subquery to determine whether a record EXISTS. It is best to use left join or EXISTS. For example, someone writes a statement like this:

SELECT JOB_DESC FROM JOBS WHERE (SELECT COUNT(*) FROM EMPLOYEE WHERE JOB_ID=JOBS.JOB_ID)=0

It should be written:

SELECT JOBS.JOB_DESC FROM JOBS LEFT JOIN EMPLOYEE  ON EMPLOYEE.JOB_ID=JOBS.JOB_ID WHERE EMPLOYEE.EMP_ID IS NULL

And

SELECT JOB_DESC FROM JOBS WHERE (SELECT COUNT(*) FROM EMPLOYEE WHERE JOB_ID=JOBS.JOB_ID)<>0

It should be written:

SELECT JOB_DESC FROM JOBS WHERE EXISTS (SELECT 1 FROM EMPLOYEE WHERE JOB_ID=JOBS.JOB_ID)

Try to use Indexes

After an index is created, not every query uses an index. When an index is used, the efficiency of the index varies greatly. As long as the index is not forcibly specified in the query statement, the selection and use of the index is automatically selected by the SQL Server optimizer, it selects the query statement conditions and related table statistics, which requires that we write SQL.

Make sure that the optimizer can use the index when using the statement. To enable the optimizer to efficiently use indexes, note the following when writing statements:

A. do not perform operations on index fields, but try to perform transformations, such

Select id from t where num/2 = 100 should be changed to: select id from t where num = 100*2 select id from t where num/2 = NUM1 if NUM has an index should be changed: select id from t where num = NUM1 * 2 if NUM1 has an index, it should not be changed. We have found such a statement: SELECT year, month, amount FROM balance table WHERE 100 * year + month = 2010*100 + 10 should be changed to: SELECT year, month, amount FROM balance table WHERE year = 2010 AND month = 10

B. Do not convert the format of index fields.

Example of a date field: where convert (VARCHAR (10), Date Field, 120) = '2017-07-15 'should be changed to the WHERE Date Field> = '2017-07-15' AND Date Field <'2017-07-16 '. Example of ISNULL conversion: where isnull (field ,'') <> ''should be changed to: WHERE field <> ''where ISNULL (field,'') = ''. where isnull (field, 'F') should not be modified ') = 'T' should be changed to: WHERE field = 'T' where isnull (field, 'F') <> 'T' should not be modified

C. Do not use functions for indexed fields

Where left (NAME, 3) = 'abc' or where substring (NAME, 1, 3) = 'abc' should be changed to: where name like 'abc % 'date query example: where datediff (DAY, date, '2017-06-30 ') = 0 should be changed to: WHERE date> = '2017-06-30' AND date <'2017-07-01 'where datediff (DAY, date, '2017-06-30 ')> 0 should be changed to: WHERE date <'2017-06-30' where datediff (DAY, date, '2017-06-30 ')> = 0 should be changed: WHERE date <'2017-07-01 'where datediff (DAY, date, '2017-06-30') <0 should be changed to: WHERE date> = '2017-07-01 'where datediff (DAY, date, '2017-06-30 ') <= 0 should be changed to: WHERE date> = '2017-06-30'

4. Do not connect multiple fields in the index.

For example, where fame + '.' + LNAME = 'haiwei. yang' should be changed to: where fname = 'haiwei' and lname = 'yang'

Multi-table join conditions

The join conditions for multi-table join are of great significance to the selection of indexes. Therefore, we need to pay special attention when writing the join conditions.

When connecting multiple tables, the connection conditions must be fully written. It is preferred to repeat the conditions without missing them.
Use clustered indexes as much as possible in connection conditions
Note the differences between some ON, WHERE, and HAVING conditions.

ON is the first execution, followed by WHERE, and HAVING. Because ON filters records that do not meet the conditions before statistics are made, it can reduce the data to be processed by intermediate operations, it should be said that the speed is the fastest, and the WHERE should be faster than HAVING, because it only performs SUM after filtering data, and ON is used only when two tables are joined, therefore, when a table is created, we can compare the WHERE and HAVING.

Consider the connection priority:

INNER JOIN
Left join (Note: right join is replaced by left join)
CROSS JOIN

Other notes include:

IN the list of post-IN denominations, place the most frequent values at the beginning and the least value at the end to reduce the number of judgments.
Note the difference between UNION and union all. -- Enable union all for repeated data
Use DISTINCT. Do not use DISTINCT unless necessary.
Difference between truncate table and DELETE
Reduce the number of database accesses

The other is that we write the stored procedure. If it is long, mark it with a token, because it is quite readable. Even if the statement is not well written, the statement is neat. C # has region, what I like most about SQL is:

-- Startof query Number of employees SQL statement -- end

We generally cannot Debug Programs on formal machines, but many times the program is normal on our local machine, but there is a problem in the formal system, but we cannot operate on formal machines at will, so what should we do? We can use rollback to debug our stored procedures or SQL statements to eliminate errors.

Begin tran update a SET field = ''ROLLBACK

I usually add the following section to the job stored procedure, so that the check errors can be stored in the stored procedure. If the error rollback operation is executed, but if the program already has a transaction rollback, therefore, do not write transactions in the stored procedure. This will cause the transaction to roll back and nest to reduce the execution efficiency. However, we can usually put the check in the stored procedure, which helps us to interpret the stored procedure, and troubleshooting.

Begin transaction -- start of transaction rollback -- check IF (@ ERROR> 0) BEGIN -- rollback transaction raiserror ('delete Work Report error', 16, 3) return end -- END TRANSACTION COMMIT TRANSACTION

This is probably because I have written so much about it. If you have any mistakes, you are welcome to share your ideas.

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

Get Started for Free

Sales Support

1 on 1 presale consultation

Chat Contact Sales
After-Sales Support

24/7 Technical Support 6 Free Tickets per Quarter Faster Response

Open a Ticket
Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.

Learn More