Developing good SQL habits brings great fortune


Most of us who develop software cannot avoid dealing with databases, and ERP developers deal with them even more frequently, with tables holding thousands upon thousands of rows. When the data volume grows and more people access the system, can I guarantee that the system will still run smoothly later on? Will the next person be able to understand my stored procedures? I would like to share the habits I have built up over time, and I hope they help you.

To write good SQL statements, I think we first need to understand how the SQL Server query analyzer executes them. Many of us look at the execution plan, and you can also use Profiler to monitor and tune slow queries or stored procedures. But if we also know the logical order in which the query analyzer processes a statement, we have a confident place to start. Shall we begin?

1. Logical query processing order

(1) FROM <left_table>

(2) ON <join_condition>

(3) <join_type> JOIN <right_table>

(4) WHERE <where_condition>

(5) GROUP BY <group_by_list>

(6) WITH {CUBE | ROLLUP}

(7) HAVING <having_condition>

(8) SELECT (9) DISTINCT (11) <top_specification> <select_list>

(10) ORDER BY <order_by_list>

The standard SQL parsing order is:

(1) The FROM clause assembles data from the different data sources.

(2) The WHERE clause filters records based on the specified conditions.

(3) The GROUP BY clause divides the data into groups.

(4) Aggregate functions perform their calculations.

(5) The HAVING clause filters the groups.

(6) All remaining expressions are calculated.

(7) ORDER BY sorts the result set.

2. Execution order, step by step:

1. FROM: A Cartesian product is performed on the first two tables in the FROM clause, generating virtual table VT1.

2. ON: The ON filter is applied to VT1; only rows for which <join_condition> is TRUE are inserted into VT2.

3. OUTER (JOIN): If an outer join is specified, rows from the preserved table for which no match was found are added to VT2 as outer rows, generating VT3. If the FROM clause contains more than two tables, steps 1 to 3 are repeated between the result of the previous join and the next table until all tables have been processed.

4. WHERE: The WHERE filter is applied to VT3; only rows for which <where_condition> is TRUE are inserted into VT4.

5. GROUP BY: The rows of VT4 are grouped by the column list in the GROUP BY clause, generating VT5.

6. CUBE | ROLLUP: Supergroups are added to VT5, generating VT6.

7. HAVING: The HAVING filter is applied to VT6; only groups for which <having_condition> is TRUE are inserted into VT7.

8. SELECT: The select list is processed, generating VT8.

9. DISTINCT: Duplicate rows are removed from VT8, generating VT9.

10. ORDER BY: The rows of VT9 are sorted by the column list in the ORDER BY clause, generating cursor VC10.

11. TOP: The specified number or percentage of rows is selected from the beginning of VC10, generating VT11, which is returned to the caller.
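As a quick illustration of the list above, here is a minimal sketch of a typical query with each clause annotated with its logical processing step (the Customers and Orders tables and their columns are hypothetical, invented only for this example):

  SELECT C.City, COUNT(*) AS OrderCount      -- step (8) SELECT ((9) DISTINCT and (11) TOP would apply here if present)
  FROM Customers C                           -- step (1) FROM
  INNER JOIN Orders O                        -- step (3) JOIN
      ON O.CustomerID = C.CustomerID         -- step (2) ON
  WHERE O.OrderDate >= '2011-01-01'          -- step (4) WHERE
  GROUP BY C.City                            -- step (5) GROUP BY
  HAVING COUNT(*) > 10                       -- step (7) HAVING
  ORDER BY OrderCount DESC                   -- step (10) ORDER BY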

Doesn't the LINQ to SQL syntax look similar? Once we understand SQL Server's execution order, we can build the daily habit of thinking about performance while implementing functionality. A database is a tool for set-based operations, and we should make full use of it. A set-based operation is essentially a batch operation: reduce the amount of row-by-row looping over large data sets on the client, and express the work in SQL statements or stored procedures instead.

3. Return only the data you need

Returning data to the client requires at least extracting it from the database, transmitting it over the network, and receiving and processing it on the client. Returning data that is never used creates pointless work for the server, the network, and the client, and the harm is obvious. To avoid it, note the following:

A. Horizontally:

(1) Do not write SELECT *; select only the fields you need.

(2) When joining multiple tables in an SQL statement, use table aliases and prefix every column with its alias. This reduces parsing time and avoids syntax errors caused by ambiguous column names.

 
 
  -- Table1(ID, col1) and Table2(ID, col2)
  SELECT A.ID, A.col1, B.col2
  -- SELECT A.ID, col1, col2   -- do not write it this way; it is not conducive to future program expansion
  FROM Table1 A INNER JOIN Table2 B ON A.ID = B.ID WHERE ...

B. Vertically:

(1) Write a reasonable WHERE clause; do not write SQL statements without a WHERE condition.

(2) Use SELECT TOP N * in place of queries that have no WHERE condition at all, as in the sketch below.
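A minimal sketch of that idea, assuming a hypothetical Orders table:

  -- Instead of pulling the whole table back to the client:
  -- SELECT * FROM Orders
  -- limit both the columns and the number of rows returned:
  SELECT TOP 100 OrderID, OrderDate, Amount
  FROM Orders
  ORDER BY OrderDate DESC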

4. Do as little repetitive work as possible

A. Avoid executing the same statement multiple times, especially repeated queries for the same basic data.

B. Reducing the number of data conversions may need to be addressed in the design, but avoiding unnecessary conversions is something programmers can do.

C. Eliminate unnecessary subqueries and joined tables. A subquery is generally interpreted as an outer join in the execution plan, and a redundant joined table brings extra cost.

D. Merge multiple UPDATE statements that touch the same table under the same condition. For example:

 
 
  UPDATE EMPLOYEE SET FNAME = 'HAIWER' WHERE EMP_ID = 'VPA30890F'
  UPDATE EMPLOYEE SET LNAME = 'YANG' WHERE EMP_ID = 'VPA30890F'

These two statements should be merged into the following single statement:

  UPDATE EMPLOYEE SET FNAME = 'HAIWER', LNAME = 'YANG' WHERE EMP_ID = 'VPA30890F'

E. Do not split an UPDATE operation into a DELETE plus an INSERT. The result is the same, but the performance difference is large, as the sketch below suggests.
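A minimal sketch of the contrast, reusing the EMPLOYEE example above (the INSERT column list is assumed only for illustration):

  -- Anti-pattern: delete the row and re-insert it just to change two columns
  DELETE FROM EMPLOYEE WHERE EMP_ID = 'VPA30890F'
  INSERT INTO EMPLOYEE (EMP_ID, FNAME, LNAME) VALUES ('VPA30890F', 'HAIWER', 'YANG')

  -- Better: update the row in place
  UPDATE EMPLOYEE SET FNAME = 'HAIWER', LNAME = 'YANG' WHERE EMP_ID = 'VPA30890F'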

5. Pay attention to the use of temporary tables and table variables

In complex systems, temporary tables and table variables are hard to avoid. When using them, note the following:

A. If a statement is complex and involves too many joins, consider using temporary tables or table variables to execute it step by step.

B. If the same subset of a large table's data is needed multiple times, consider storing that subset in a temporary table or table variable.

C. If a result needs to combine data from several tables, consider using a temporary table or table variable to accumulate the data from those tables step by step.

D. In other cases, the use of temporary tables and table variables should be restrained.

E. On choosing between temporary tables and table variables: a common claim is that table variables live in memory and are fast, so they should be preferred. In practice the choice mainly depends on:

(1) The amount of data to be stored. When the data volume is large, a temporary table is actually faster.

(2) The time period in which the statement runs and its estimated execution time.

F. For populating temporary tables, you can use SELECT INTO or CREATE TABLE + INSERT INTO. In general, SELECT INTO is much faster than CREATE TABLE + INSERT INTO. However, SELECT INTO locks tempdb's system tables sysobjects, sysindexes, and syscolumns, so in a multi-user concurrent environment it easily blocks other processes. Therefore, I suggest using CREATE TABLE + INSERT INTO in a concurrent system, and SELECT INTO for a single statement with a large data volume.
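A minimal sketch of the two approaches, assuming a hypothetical Orders table:

  -- A single large statement with little concurrency: SELECT INTO is convenient and fast
  SELECT OrderID, Amount
  INTO #BigOrders
  FROM Orders
  WHERE Amount > 10000

  -- In a concurrent system: create the temporary table first, then insert into it
  CREATE TABLE #BigOrders2 (OrderID INT, Amount DECIMAL(18, 2))
  INSERT INTO #BigOrders2 (OrderID, Amount)
  SELECT OrderID, Amount FROM Orders WHERE Amount > 10000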

6. Subquery usage

A subquery is a SELECT query nested inside a SELECT, INSERT, UPDATE, or DELETE statement, or inside another subquery. Subqueries can be used wherever an expression is allowed; they make programming flexible and can implement some special functionality. In terms of performance, however, an inappropriate subquery can become a bottleneck. If a subquery's condition references a column of the outer table, it is called a correlated subquery. Correlated subqueries can be introduced with IN, NOT IN, EXISTS, and NOT EXISTS. For correlated subqueries, note the following:

A. Correlated subqueries with NOT IN and NOT EXISTS can be replaced with a LEFT JOIN.

(1) For example:

  SELECT PUB_NAME FROM PUBLISHERS
  WHERE PUB_ID NOT IN (SELECT PUB_ID FROM TITLES WHERE TYPE = 'BUSINESS')

can be rewritten as:

  SELECT A.PUB_NAME
  FROM PUBLISHERS A LEFT JOIN TITLES B
  ON B.TYPE = 'BUSINESS' AND A.PUB_ID = B.PUB_ID
  WHERE B.PUB_ID IS NULL

(2)

 
 
  SELECT TITLE FROM TITLES
  WHERE NOT EXISTS
    (SELECT TITLE_ID FROM SALES
     WHERE TITLE_ID = TITLES.TITLE_ID)

It can be rewritten as:

 
 
  SELECT TITLE
  FROM TITLES LEFT JOIN SALES
  ON SALES.TITLE_ID = TITLES.TITLE_ID
  WHERE SALES.TITLE_ID IS NULL

B. If the subquery is guaranteed not to return duplicates, correlated subqueries with IN and EXISTS can be replaced with an INNER JOIN. For example:

 
 
  SELECT PUB_NAME
  FROM PUBLISHERS
  WHERE PUB_ID IN
  (SELECT PUB_ID
   FROM TITLES
   WHERE TYPE = 'BUSINESS')

It can be rewritten as:

 
 
  SELECT A.PUB_NAME   -- SELECT DISTINCT A.PUB_NAME
  FROM PUBLISHERS A INNER JOIN TITLES B
  ON B.TYPE = 'BUSINESS' AND
  A.PUB_ID = B.PUB_ID

C. Use EXISTS instead of IN for correlated subqueries. For example:

 
 
  SELECT PUB_NAME FROM PUBLISHERS
  WHERE PUB_ID IN
  (SELECT PUB_ID FROM TITLES WHERE TYPE = 'BUSINESS')

You can use the following statement instead:

 
 
  SELECT PUB_NAME FROM PUBLISHERS WHERE EXISTS
  (SELECT 1 FROM TITLES WHERE TYPE = 'BUSINESS' AND
   PUB_ID = PUBLISHERS.PUB_ID)

D. Do not use a COUNT(*) subquery to determine whether a record exists; it is better to use LEFT JOIN or EXISTS. For example, someone might write:

 
 
  SELECT JOB_DESC FROM JOBS
  WHERE (SELECT COUNT(*) FROM EMPLOYEE WHERE JOB_ID = JOBS.JOB_ID) = 0

It should be changed to:

 
 
  SELECT JOBS.JOB_DESC FROM JOBS LEFT JOIN EMPLOYEE
  ON EMPLOYEE.JOB_ID = JOBS.JOB_ID
  WHERE EMPLOYEE.EMP_ID IS NULL

Similarly, a query like:

  SELECT JOB_DESC FROM JOBS
  WHERE (SELECT COUNT(*) FROM EMPLOYEE WHERE JOB_ID = JOBS.JOB_ID) <> 0

It should be changed to:

 
 
  SELECT JOB_DESC FROM JOBS
  WHERE EXISTS (SELECT 1 FROM EMPLOYEE WHERE JOB_ID = JOBS.JOB_ID)

7. Use indexes as much as possible

After an index is created, not every query will use it, and when an index is used its efficiency can vary greatly. Unless an index is forced in the query statement, which index to use, and whether to use one at all, is chosen automatically by the SQL Server optimizer based on the query's conditions and the statistics of the related tables. This requires us to write SQL statements in a way that allows the optimizer to use the index. To help the optimizer use indexes efficiently, note the following when writing statements:

A. Do not perform calculations on indexed columns; transform the expression instead. For example:

Select ID from t where num/2 = 100

It should be changed to:

Select ID from t where num = 100*2

-------------------------------------------------------

Select ID from t where num/2 = num1

If num has an index, it should be changed to:

Select ID from t where num = num1 * 2

If num1 has an index, it should not be changed.

--------------------------------------------------------------------

Statements like the following have also been seen:

Select year, month, amount from balance_table where 100 * year + month = 2010 * 100 + 10

It should be changed to:

Select year, month, amount from balance_table where year = 2010 and month = 10

B. Do not perform format conversions on indexed columns.

An example with a date column:

Where convert(varchar(10), DateField, 120) = '2017-07-15'

should be changed to:

Where DateField >= '2017-07-15' and DateField < '2017-07-16'

An example with ISNULL conversion:

Where isnull(field, '') <> '' should be changed to: Where field <> ''

Where isnull(field, '') = '' should not be modified

Where isnull(field, 'F') = 'T' should be changed to: Where field = 'T'

Where isnull(field, 'F') <> 'T' should not be modified

C. Do not apply functions to indexed columns.

Where left(name, 3) = 'abc' or Where substring(name, 1, 3) = 'abc'

should be changed to: Where name like 'abc%'

An example with a date query:

Where datediff(day, date, '2017-06-30') = 0

should be changed to: Where date >= '2017-06-30' and date < '2017-07-01'

Where datediff(day, date, '2017-06-30') > 0

should be changed to: Where date < '2017-06-30'

Where datediff(day, date, '2017-06-30') >= 0

should be changed to: Where date < '2017-07-01'

Where datediff(day, date, '2017-06-30') < 0

should be changed to: Where date >= '2017-07-01'

Where datediff(day, date, '2017-06-30') <= 0

should be changed to: Where date >= '2017-06-30'

D. Do not concatenate multiple indexed columns in a condition.

For example:

Where fname + '.' + lname = 'haiwei.yang'

should be changed to:

Where fname = 'haiwei' and lname = 'yang'

8. In a multi-table join, the join conditions are of great significance to index selection, so pay special attention when writing them.

A. When joining multiple tables, write the join conditions in full; it is better to repeat a condition than to leave one out.

B. In join conditions, use clustered index columns whenever possible.

C. Note the differences among the ON, WHERE, and HAVING conditions.

ON is applied first, then WHERE, then HAVING. Because ON filters out non-matching records before any aggregation, it reduces the data that the intermediate steps have to process and should be the fastest; WHERE should be faster than HAVING, because HAVING filters only after the aggregation has been computed. ON is only meaningful when two tables are joined, so with a single table only WHERE and HAVING need to be compared. The sketch below shows where each condition takes effect.
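A minimal sketch, using hypothetical Departments and Orders tables, of where each kind of condition takes effect:

  SELECT D.DeptID, SUM(O.Amount) AS Total
  FROM Departments D
  LEFT JOIN Orders O
      ON O.DeptID = D.DeptID
     AND O.Status = 'PAID'          -- ON: applied while joining; unmatched departments are still kept
  WHERE D.Region = 'EAST'           -- WHERE: filters rows before grouping
  GROUP BY D.DeptID
  HAVING SUM(O.Amount) > 1000       -- HAVING: filters groups after aggregation

Note that for an outer join, moving a condition from ON to WHERE changes the meaning of the query, not just its speed.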

D. Consider the join priority:

(1) INNER JOIN

(2) LEFT JOIN (note: replace RIGHT JOIN with LEFT JOIN)

(3) CROSS JOIN

Other notes include:

A. In the value list after IN, put the values that occur most frequently at the front and the least frequent at the end, to reduce the number of comparisons.

B. Note the difference between UNION and UNION ALL: when duplicate rows are acceptable, use UNION ALL (see the sketch after this list).

C. Do not use DISTINCT unless it is necessary.

D. Note the difference between TRUNCATE TABLE and DELETE.

E. Reduce the number of database accesses
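As a small illustration of point B, a minimal sketch with two hypothetical archive tables:

  -- UNION removes duplicate rows, which costs an extra sort/distinct step;
  -- UNION ALL simply appends the result sets
  SELECT EMP_ID FROM EMPLOYEE_2010
  UNION ALL
  SELECT EMP_ID FROM EMPLOYEE_2011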

One more thing: when we write a stored procedure, if it is long, mark its sections with comments, because that makes it much more readable. Even if a statement is not written perfectly, at least it is tidy; C# has #region for the same purpose.

What I like to do most in SQL is:

 
 
  -- Start of: query the number of employees
  (SQL statements)
  -- End

We generally cannot debug programs on the production machines, and often a program that runs fine on our local machine has problems in the production system, yet we cannot operate on production machines at will. So what can we do? We can wrap our stored procedures or SQL statements in a transaction and roll it back, which lets us test them and track down the error.

 
 
  BEGIN TRAN
  UPDATE a SET field = ''
  ROLLBACK

I usually add a section like the following to the stored procedures that run as jobs, so that the error check lives inside the stored procedure and a rollback is executed when an error is found. But if the calling program already rolls back its own transaction, do not also open a transaction inside the stored procedure: nested transaction rollbacks reduce execution efficiency. Even so, keeping the check inside the stored procedure is usually worthwhile, because it helps us read the stored procedure and troubleshoot problems.

 
 
  BEGIN TRANSACTION
  -- Start of the transaction

  -- (the statements to be checked would go here)

  -- Check for errors
  IF (@@ERROR > 0)
  BEGIN
      -- Roll back on error
      ROLLBACK TRANSACTION
      RAISERROR('Delete work report error', 16, 3)
      RETURN
  END
  -- End the transaction
  COMMIT TRANSACTION

I haven't written a blog post in a long time; I have been working on one project after another. With the staff turnover at the company, many questions from new colleagues go unanswered, and overtime has become routine. I am writing these notes down in the hope that they help everyone; if anything is lacking, please give me your advice so we can all improve by communicating.

If something is wrong, you are welcome to share your ideas.

Link: http://www.cnblogs.com/MR_ke/archive/2011/05/29/2062085.html
