Good habits of SQL programming

Source: Internet
Author: User
Tags sql server query


| Reposted from: cnblogs

| Original article: http://www.cnblogs.com/MR_ke/archive/2011/05/29/2062085.html

 

Most of us have to deal with databases when developing software, especially in ERP development, where database access is frequent and the tables hold thousands upon thousands of rows. With large data volumes and heavy user traffic, can we guarantee that the system will still run smoothly some time from now? Can we guarantee that the next person will be able to understand our stored procedures? I would like to share some personal experience accumulated in day-to-day work, and I hope it helps you.

 

To write good SQL statements, I think it is necessary to understand how the SQL Server query processor executes them. Many of us look at the execution plan, and you can also use Profiler to monitor and find out why a query or stored procedure is slow. But if we also know the logical order in which the query processor evaluates a statement, we know where to start when tuning, so let's start there.

The following are some good habits of SQL programming:

 

1. Logical execution order of a query

 

(1) FROM <left_table>
(3) <join_type> JOIN <right_table> (2) ON <join_condition>
(4) WHERE <where_condition>
(5) GROUP BY <group_by_list>
(6) WITH {CUBE | ROLLUP}
(7) HAVING <having_condition>
(8) SELECT (9) DISTINCT (11) <top_specification> <select_list>
(10) ORDER BY <order_by_list>

 

The standard order in which SQL is logically processed is:

(1) The FROM clause assembles data from the different data sources.
(2) The WHERE clause filters records based on the specified conditions.
(3) The GROUP BY clause divides the data into groups.
(4) Aggregate functions are computed.
(5) The HAVING clause filters the groups.
(6) All remaining expressions are computed.
(7) ORDER BY sorts the result set.

 

2. Step-by-step execution order:

 

1. FROM: A Cartesian product is performed on the first two tables in the FROM clause, generating virtual table VT1.
2. ON: The ON filter is applied to VT1; only rows for which <join_condition> is true are inserted into VT2.
3. OUTER (JOIN): If an outer join is specified, rows from the preserved table that found no match are added to VT2 as outer rows, generating VT3. If the FROM clause contains more than two tables, steps 1 to 3 are repeated with the result table of the previous join and the next table, until all tables are processed.
4. WHERE: The WHERE filter is applied to VT3; only rows for which <where_condition> is true are inserted into VT4.
5. GROUP BY: The rows of VT4 are grouped by the column list in the GROUP BY clause, generating VT5.
6. CUBE | ROLLUP: Supergroups are added to VT5, generating VT6.
7. HAVING: The HAVING filter is applied to VT6; only groups for which <having_condition> is true are inserted into VT7.
8. SELECT: The select list is processed, generating VT8.
9. DISTINCT: Duplicate rows are removed from VT8, generating VT9.
10. ORDER BY: The rows of VT9 are sorted by the column list in the ORDER BY clause, generating cursor VC10.
11. TOP: The specified number or percentage of rows is selected from the beginning of VC10, generating VT11, which is returned to the caller.
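As a quick illustration, here is a hedged sketch mapping the clauses of an ordinary query onto the step numbers above (the Customers and Orders tables and their columns are hypothetical, purely for illustration):

-- Hypothetical tables: Customers(CustID, Country), Orders(OrderID, CustID, Amount)
SELECT TOP 10                       -- steps 8 (SELECT) and 11 (TOP)
       C.CustID, SUM(O.Amount) AS Total
FROM Customers C                    -- step 1 (FROM)
INNER JOIN Orders O                 -- step 3 (JOIN)
    ON O.CustID = C.CustID          -- step 2 (ON)
WHERE C.Country = 'CN'              -- step 4 (WHERE)
GROUP BY C.CustID                   -- step 5 (GROUP BY)
HAVING SUM(O.Amount) > 1000         -- step 7 (HAVING)
ORDER BY Total DESC                 -- step 10 (ORDER BY); DISTINCT (step 9) is not used here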

 

Looking at the order above, doesn't it resemble the LINQ to SQL syntax? Once we understand SQL Server's execution order, we can go a step further and build the good daily habit of considering performance while implementing functionality. The database is a tool for set operations, and we should use it to the fullest: a set operation is essentially a batch operation, so minimize the number of loops on the client and instead let SQL statements or stored procedures do the work in bulk.
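For instance, here is a hedged sketch of that idea (the Orders table, its Status and DueDate columns, and the status values are assumptions for illustration): instead of fetching rows to the client and updating them one by one in a loop, a single set-based statement handles the whole batch.

-- Set-based (preferred): one statement updates every qualifying row at once,
-- rather than a client-side loop issuing one UPDATE per OrderID
UPDATE Orders
SET Status = 'Overdue'
WHERE Status = 'Unpaid'
  AND DueDate < GETDATE();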

 

3. Return only the required data

 

Returning data to the client requires, at a minimum, extracting it from the database, transmitting it over the network, and receiving and processing it on the client. Returning data that is never needed adds useless work on the server, the network, and the client, and the harm is obvious. To avoid this, note the following:

 

A. Horizontally (which columns to return):

 

(1) Do not write SELECT *; select only the fields you need.

 

(2) When joining multiple tables in one SQL statement, use table aliases and prefix every column with its alias. This reduces parsing time and avoids syntax errors caused by ambiguous column names.

 

Given table1 (ID, col1) and table2 (ID, col2):

SELECT A.ID, A.col1, B.col2
-- SELECT A.ID, col1, col2  -- do not write it this way; it is not conducive to future program extension
FROM table1 A INNER JOIN table2 B ON A.ID = B.ID
WHERE ...

 

B. Vertically (which rows to return):

 

(1) Write the WHERE clause carefully; do not write SQL statements without a WHERE clause.

 

(2) Use SELECT TOP N * in place of a query with no WHERE condition when you only need to inspect a sample of rows, for example:
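A hedged sketch (the Orders table and its columns are assumptions) of a query that is bounded both vertically and horizontally:

-- Bounded vertically (TOP, WHERE) and horizontally (explicit column list)
SELECT TOP 100 OrderID, CustID, Amount
FROM Orders
WHERE OrderDate >= '2011-01-01';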

 

4. Do as little repetitive work as possible

 

A. Avoid executing the same statement multiple times, especially multiple executions that fetch the same basic data.

 

B. Reduce the number of data conversions; some conversions can be avoided at design time, and others can be done by the application instead of in SQL.

 

C. Eliminate unnecessary subqueries and joined tables. Subqueries are generally interpreted as outer joins in the execution plan, and redundant joined tables bring extra cost.

 

D. Merge multiple UPDATE statements on the same table with the same condition, for example:

 

UPDATE employee SET fname = 'haiwer' WHERE EMP_ID = 'vpa30890f'
UPDATE employee SET lname = 'yang' WHERE EMP_ID = 'vpa30890f'

 

These two statements should be merged into the following single statement:

 

UPDATE employee SET fname = 'haiwer', lname = 'yang' WHERE EMP_ID = 'vpa30890f'

 

E. Do not split an UPDATE into a DELETE plus an INSERT. The functional result is the same, but the performance difference is large.
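A minimal sketch of the difference, reusing the employee example above (the column list in the INSERT is assumed for illustration):

-- Avoid: splitting the update into a delete plus an insert
-- DELETE FROM employee WHERE EMP_ID = 'vpa30890f'
-- INSERT INTO employee (EMP_ID, fname, lname) VALUES ('vpa30890f', 'haiwer', 'yang')

-- Prefer: a single in-place update
UPDATE employee SET fname = 'haiwer', lname = 'yang' WHERE EMP_ID = 'vpa30890f'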

 

5. Pay attention to the usage of temporary tables and table variables

 

In complex systems, temporary tables and table variables are difficult to avoid. Regarding their usage, note the following:

 

A. If a statement is complex and involves too many joins, consider using temporary tables or table variables to execute it step by step.

 

B. If you need to use the same subset of a large table's data multiple times, consider storing that subset in a temporary table or table variable.

 

C. If you need to combine data from several tables into one result, consider using a temporary table or table variable to accumulate the data step by step.

 

D. In other cases, the use of temporary tables and table variables should be restrained.

 

E. When choosing between temporary tables and table variables, a common claim is that table variables reside in memory and are fast, so they should be preferred. In practice the choice mainly depends on the following (see the sketch after this list):

(1) The amount of data to be stored: when the data volume is large, a temporary table is faster.

(2) The period during which the statement runs and the estimated execution time.
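A rough sketch of the two options, reusing the hypothetical table1 (ID, col1) from section 3 (the column types are assumed): as a rule of thumb, the table variable suits small intermediate results, while the temporary table can be indexed and has statistics, which usually wins for larger volumes.

-- Table variable: no statistics, indexing only via constraints; fine for small row counts
DECLARE @Small TABLE (ID INT PRIMARY KEY, col1 VARCHAR(50));
INSERT INTO @Small (ID, col1)
SELECT ID, col1 FROM table1 WHERE ID < 1000;

-- Temporary table: can be indexed and has statistics; better for large volumes
CREATE TABLE #Large (ID INT PRIMARY KEY, col1 VARCHAR(50));
INSERT INTO #Large (ID, col1)
SELECT ID, col1 FROM table1;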

 

F. Regarding SELECT INTO versus CREATE TABLE + INSERT INTO for temporary tables: SELECT INTO is generally much faster than CREATE TABLE + INSERT INTO, but SELECT INTO locks tempdb's system tables SYSOBJECTS, SYSINDEXES, and SYSCOLUMNS while it runs, which easily blocks other processes in a multi-user concurrent environment. Therefore, I suggest using CREATE TABLE + INSERT INTO in a concurrent system, and SELECT INTO for a single statement over a large data volume. For example:
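A hedged sketch of both forms, again using the hypothetical table1 (column types assumed):

-- SELECT INTO: creates and fills the temp table in one step
-- (fast, but holds locks on tempdb system tables while it runs)
SELECT ID, col1
INTO #BySelectInto
FROM table1;

-- CREATE TABLE + INSERT INTO: two steps, but friendlier in a concurrent environment
CREATE TABLE #ByInsert (ID INT, col1 VARCHAR(50));
INSERT INTO #ByInsert (ID, col1)
SELECT ID, col1 FROM table1;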

 

6. Subquery usage

 

A subquery is a SELECT query nested inside a SELECT, INSERT, UPDATE, or DELETE statement, or inside another subquery.

 

Subqueries can be used wherever an expression is allowed. They make programming flexible and can implement some special functionality, but in terms of performance an inappropriate subquery can become a bottleneck. If a subquery's condition uses a field of the outer table, it is called a correlated subquery. Correlated subqueries can be introduced with IN, NOT IN, EXISTS, and NOT EXISTS. For correlated subqueries, note the following:

 

A. Correlated subqueries with NOT IN or NOT EXISTS can be replaced by a LEFT JOIN.

(1) For example:

SELECT PUB_NAME FROM PUBLISHERS
WHERE PUB_ID NOT IN
(SELECT PUB_ID FROM TITLES WHERE TYPE = 'business')

can be rewritten as:

SELECT A.PUB_NAME
FROM PUBLISHERS A LEFT JOIN TITLES B
ON B.TYPE = 'business' AND A.PUB_ID = B.PUB_ID
WHERE B.PUB_ID IS NULL

 

(2) Similarly:

 

SELECT TITLE FROM TITLES
WHERE NOT EXISTS
(SELECT TITLE_ID FROM SALES
WHERE TITLE_ID = TITLES.TITLE_ID)

 

can be rewritten as:

 

SELECT TITLE
FROM TITLES LEFT JOIN SALES
ON SALES.TITLE_ID = TITLES.TITLE_ID
WHERE SALES.TITLE_ID IS NULL

 

B. If the subquery is guaranteed not to return duplicates, correlated subqueries with IN or EXISTS can be replaced by an INNER JOIN. For example:

 

SELECT PUB_NAME
FROM PUBLISHERS
WHERE PUB_ID IN
(SELECT PUB_ID
FROM TITLES
WHERE TYPE = 'business')

 

can be rewritten as:

 

SELECT A.PUB_NAME   -- or SELECT DISTINCT A.PUB_NAME
FROM PUBLISHERS A INNER JOIN TITLES B
ON B.TYPE = 'business' AND
A.PUB_ID = B.PUB_ID

 


 

C. Use EXISTS instead of IN for correlated subqueries, for example:

 

SELECT PUB_NAME FROM PUBLISHERS
WHERE PUB_ID IN
(SELECT PUB_ID FROM TITLES WHERE TYPE = 'business')

 

You can use the following statement instead:

 

SELECT PUB_NAME FROM PUBLISHERS WHERE EXISTS
(SELECT 1 FROM TITLES WHERE TYPE = 'business' AND
PUB_ID = PUBLISHERS.PUB_ID)

 

D. Do not use a COUNT(*) subquery to determine whether a record exists; it is better to use LEFT JOIN or EXISTS. For example, someone writes a statement like this:

 

SELECT JOB_DESC FROM JOBS
WHERE (SELECT COUNT(*) FROM EMPLOYEE WHERE JOB_ID = JOBS.JOB_ID) = 0

 

It should be changed to:

 

SELECT JOBS.JOB_DESC FROM JOBS LEFT JOIN EMPLOYEE
ON EMPLOYEE.JOB_ID = JOBS.JOB_ID
WHERE EMPLOYEE.EMP_ID IS NULL

Similarly, a statement like:

SELECT JOB_DESC FROM JOBS
WHERE (SELECT COUNT(*) FROM EMPLOYEE WHERE JOB_ID = JOBS.JOB_ID) <> 0

 

should be changed to:

 

SELECT JOB_DESC FROM JOBS
WHERE EXISTS (SELECT 1 FROM EMPLOYEE WHERE JOB_ID = JOBS.JOB_ID)

 

7. Try to make use of indexes

 

After an index is created, not every query will use it, and when an index is used its efficiency can vary greatly. Unless an index is forced in the query statement, which index to use (if any) is chosen automatically by the SQL Server optimizer, based on the query's conditions and the statistics of the tables involved. This requires us to write SQL statements in a way that lets the optimizer use the indexes. To help the optimizer use indexes efficiently, note the following when writing statements:

 


 

A. Do not perform calculations on indexed fields; move the operation to the other side of the comparison instead. For example:

 

SELECT id FROM t WHERE num / 2 = 100

should be changed to:

SELECT id FROM t WHERE num = 100 * 2

 

SELECT id FROM t WHERE num / 2 = num1

If num is the indexed column, change it to:

SELECT id FROM t WHERE num = num1 * 2

If num1 is the indexed column, it should not be changed.

 

Statements like the following have also been seen:

SELECT year, month, amount FROM balance WHERE 100 * year + month = 2010 * 100 + 10

It should be changed to:

SELECT year, month, amount FROM balance WHERE year = 2010 AND month = 10

 

B. Do not convert the format of indexed fields.

 

Example with a date field:

WHERE CONVERT(VARCHAR(10), date_field, 120) = '2017-07-15'

should be changed to:

WHERE date_field >= '2017-07-15' AND date_field < '2017-07-16'

 

Examples with ISNULL conversions:

WHERE ISNULL(field, '') <> ''   should be changed to: WHERE field <> ''
WHERE ISNULL(field, '') = ''    should not be modified
WHERE ISNULL(field, 'F') = 'T'  should be changed to: WHERE field = 'T'
WHERE ISNULL(field, 'F') <> 'T' should not be modified

 


 

C. Do not apply functions to indexed fields. For example:

 

WHERE LEFT(NAME, 3) = 'abc'  or  WHERE SUBSTRING(NAME, 1, 3) = 'abc'

should be changed to: WHERE NAME LIKE 'abc%'

 

Examples with date queries:

WHERE DATEDIFF(DAY, date, '2017-06-30') = 0
should be changed to: WHERE date >= '2017-06-30' AND date < '2017-07-01'

WHERE DATEDIFF(DAY, date, '2017-06-30') > 0
should be changed to: WHERE date < '2017-06-30'

WHERE DATEDIFF(DAY, date, '2017-06-30') >= 0
should be changed to: WHERE date < '2017-07-01'

WHERE DATEDIFF(DAY, date, '2017-06-30') < 0
should be changed to: WHERE date >= '2017-07-01'

WHERE DATEDIFF(DAY, date, '2017-06-30') <= 0
should be changed to: WHERE date >= '2017-06-30'

 

D. Do not concatenate multiple indexed fields in a condition.

For example:

WHERE fname + '.' + lname = 'haiwei.yang'

should be changed to:

WHERE fname = 'haiwei' AND lname = 'yang'

 

8. Multi-table join conditions and index selection

 

A. When joining multiple tables, write the join conditions in full; it is better to repeat a condition than to omit one.

 

B. Use clustered index columns in the join conditions whenever possible.

 

C. Note the differences between the ON, WHERE, and HAVING conditions.

 

ON is applied first, then WHERE, then HAVING. Because ON filters out rows that do not satisfy the condition before any aggregation is done, it reduces the data the intermediate steps have to process and should be the fastest; WHERE should be faster than HAVING, because HAVING filters only after the aggregation (for example SUM) has been computed. ON only applies when two tables are joined, so when querying a single table we compare only WHERE and HAVING. For example:
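A hedged sketch of the three filter positions (the Customers and Orders tables are the same hypothetical ones used earlier): the ON and WHERE conditions discard rows before the grouping and SUM, while HAVING filters the aggregated groups afterwards.

SELECT C.CustID, SUM(O.Amount) AS Total
FROM Customers C
INNER JOIN Orders O
    ON O.CustID = C.CustID
   AND O.Status = 'Paid'            -- ON: applied while joining, before grouping
WHERE C.Country = 'CN'              -- WHERE: also applied before grouping
GROUP BY C.CustID
HAVING SUM(O.Amount) > 1000;        -- HAVING: applied after the SUM has been computed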

 

Also consider the join priority:

(1) INNER JOIN
(2) LEFT JOIN (note: it is better to rewrite RIGHT JOIN as LEFT JOIN)
(3) CROSS JOIN

 

9. Others

 

A. In the list of values after IN, place the most frequently occurring values first and the least frequent last, to reduce the number of comparisons.

 

B. Pay attention to the difference between UNION and UNION ALL: when duplicate rows are acceptable, use UNION ALL. For example:
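A small sketch, reusing the hypothetical table1 and table2 from section 3: UNION removes duplicate rows, which costs extra de-duplication work, while UNION ALL simply appends the two result sets.

SELECT ID FROM table1
UNION ALL                -- keeps duplicates; no de-duplication cost
SELECT ID FROM table2;

-- UNION (without ALL) would additionally remove duplicate IDs,
-- like applying DISTINCT to the combined result.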

 

C. Do not use DISTINCT unless necessary.

 

D. Understand the differences between TRUNCATE TABLE and DELETE.

 

E. Reduce the number of database accesses

 

Another point concerns writing stored procedures: if a procedure is long, mark its sections with comments, because that makes it much more readable; even if a statement is not well written, at least the layout stays tidy. C# has #region; what I like to do in SQL is:

-- Start of: query the number of employees
SQL statement
-- End

We generally cannot debug programs on the production machines, yet very often a program that behaves normally on our local machine has problems in the production system, and we cannot operate on production at will. So what can we do? We can use a transaction with ROLLBACK to debug our stored procedures or SQL statements and track down the error:

 

BEGIN TRAN
UPDATE a SET field = ''   -- 'a' and 'field' are placeholders; check the affected rows here
ROLLBACK                  -- then undo the change

I usually add a section like the following to job stored procedures, so that the error check lives inside the procedure and a rollback is performed when an error occurs. However, if the calling program already manages a transaction, do not open another transaction inside the stored procedure, because nested transaction rollbacks reduce execution efficiency. Even so, keeping the check in the stored procedure helps us read the procedure and troubleshoot it.

 

BEGIN TRANSACTION
-- Start the transaction

-- (the procedure's own statements go here)

-- Check for errors
IF (@@ERROR > 0)
BEGIN
    -- Roll back on error
    ROLLBACK TRANSACTION
    RAISERROR ('delete Work Report error', 16, 3)
    RETURN
END

-- Commit the transaction
COMMIT TRANSACTION

I haven't written a blog post for a long time; I have been working on one project after another, and with the staff turnover at the company many new colleagues' questions go unanswered, so overtime has become routine. I am writing these notes down in the hope that they help everyone; if they fall short, please offer your advice so we can all improve through the exchange.

 

If anything here is wrong, you are welcome to point it out and share your ideas.
