Common SQL Server Specification highlights



Common field type selection

1. Use varchar/nvarchar for character data

2. Use the money data type for monetary values

3. Use the numeric data type for exact numeric values

4. Use bigint for identity (self-incrementing) columns (with large data volumes, int will not fit, and converting the type later is troublesome)

5. Use datetime for time values

6. Prohibit the legacy text, ntext, and image data types

7. Prohibit the XML data type, varchar(max), and nvarchar(max)
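Taken together, the rules above suggest table definitions like the following sketch (the table and column names are illustrative, not from the original text):

```sql
-- Illustrative table following the type recommendations above.
CREATE TABLE dbo.OrderInfo
(
    OrderID    BIGINT IDENTITY(1,1) NOT NULL PRIMARY KEY, -- rule 4: bigint identity
    UserName   NVARCHAR(50)  NOT NULL,                    -- rule 1: varchar/nvarchar
    Amount     MONEY         NOT NULL,                    -- rule 2: money for currency
    Weight     NUMERIC(18,6) NOT NULL,                    -- rule 3: numeric for exact values
    CreateTime DATETIME      NOT NULL                     -- rule 5: datetime
    -- rules 6-7: no text/ntext/image, no xml/varchar(max)/nvarchar(max)
);
```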

Constraints and Indexes

  

Each table must have a primary key

    • Every table must have a primary key to enforce entity integrity

    • A table can have only one primary key (the key columns allow neither NULLs nor duplicate values)

    • Prefer a single-column primary key

  

Foreign keys are not allowed

    • Foreign keys complicate table-structure changes and data migration

    • Foreign keys hurt INSERT and UPDATE performance, because the primary/foreign key constraint must be checked on every write

    • Enforce data integrity in the application instead

  

Null property

  

In newly created tables, every field must be declared NOT NULL

  

(Why are NULLs forbidden in new tables?)
  

Allowing NULL values increases application complexity: you have to add extra logic specifically to guard against the unexpected bugs NULLs cause.
  

Because of three-valued logic, every equality ("=") comparison has to add an ISNULL check.
  

(NULL = NULL, NULL != NULL, NOT (NULL = NULL), and NOT (NULL != NULL) all evaluate to UNKNOWN, not TRUE.)

  

An example illustrates the point.

Suppose the Nulltest table contains three rows whose Name values are 'AA', 'BB', and NULL.

You want to find all rows except those where Name equals 'AA', so you casually write: SELECT * FROM Nulltest WHERE Name <> 'AA'

The result is not what you expected: the query finds only the Name = 'BB' row and misses the row where Name is NULL.

  

So how do we find all rows where Name is not 'AA'? One option is the ISNULL function:

  

SELECT * FROM Nulltest WHERE ISNULL(Name, '') <> 'AA'

  

However, ISNULL wrapped around a column is non-sargable and can become a serious performance bottleneck, so it is usually best to validate input at the application level, ensuring users supply valid (non-NULL) data before it is ever queried.
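Putting the whole Nulltest example together (the INSERT values are inferred from the surrounding text):

```sql
CREATE TABLE dbo.Nulltest (Name VARCHAR(10) NULL);
INSERT INTO dbo.Nulltest (Name) VALUES ('AA'), ('BB'), (NULL);

-- Returns only 'BB': NULL <> 'AA' evaluates to UNKNOWN, so the NULL row is filtered out.
SELECT * FROM dbo.Nulltest WHERE Name <> 'AA';

-- Returns 'BB' and the NULL row, at the cost of a non-sargable predicate.
SELECT * FROM dbo.Nulltest WHERE ISNULL(Name, '') <> 'AA';
```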

  

When adding a field to an existing table, the field must allow NULL (a NOT NULL column would force a full-table update, and the long-held locks would block other sessions). This rule exists mainly for changes to tables that already carry data.

  Index Design Guidelines

    • Create indexes on columns that appear frequently in WHERE clauses

    • Create indexes on columns that are frequently used to join tables

    • Create indexes on columns that appear frequently in ORDER BY clauses

    • Do not index small tables (ones occupying only a few pages): a full table scan can be faster than an index lookup

    • No more than 6 indexes per table

    • Do not create single-column indexes on low-selectivity columns

    • Take advantage of UNIQUE constraints

    • Keep each index to at most 5 columns (including INCLUDE columns)

  Do not create single-column indexes on low-selectivity columns

    • SQL Server requires indexed columns to be selective; if selectivity is too low, the optimizer simply ignores the index

    • Columns unsuitable for indexing: gender, 0/1 flags, true/false

    • Columns suitable for indexing: OrderID, UID, and the like

  Take advantage of unique indexes

  

A unique index guarantees to SQL Server that a column contains absolutely no duplicate values, so when the query processor finds a matching record through a unique index, it can stop immediately instead of continuing to scan the index.
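For example, a unique index on a natural lookup column (the table and column names here are hypothetical):

```sql
-- With this index, a lookup on Email can stop at the first match
-- instead of scanning for further duplicates.
CREATE UNIQUE NONCLUSTERED INDEX IX_Users_Email
    ON dbo.Users (Email);
```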

No more than 6 indexes per table

A table should carry at most 6 indexes (a rule Ctrip's DBAs settled on after trial and error).

    • Indexes speed up queries but slow down writes

    • Design a table's indexes against all of the SQL that touches the table, merging overlapping indexes wherever possible

    • In a composite index, put the most selective (best-filtering) columns first

    • Too many indexes not only lengthen compilation but can also steer the optimizer away from the best execution plan

  SQL queries

    • Prohibit complex operations in the database

    • Prohibit SELECT *

    • Prohibit functions or calculations on indexed columns

    • Prohibit cursors

    • Prohibit triggers

    • Prohibit index hints in queries

    • Variable, parameter, and join-field types must match the column type

    • Use parameterized queries

    • Limit the number of JOINs

    • Limit SQL statement length and the number of values in IN clauses

    • Avoid large transactions wherever possible

    • Turn off the affected-row-count message

    • Unless committed reads are required, SELECT statements must add NOLOCK

    • Use UNION ALL instead of UNION

    • Page or use TOP when querying large result sets

    • Limit the recursion depth of recursive queries

    • Use NOT EXISTS instead of NOT IN

    • Use temporary tables and table variables appropriately

    • Use local variables to obtain a moderate execution plan

    • Avoid the OR operator wherever possible

    • Add a transaction exception-handling mechanism

    • Use two-part naming for output columns

  Prohibit complex operations in the database

    • XML parsing

    • String-similarity comparison

    • String search (CHARINDEX)

    • Push complex operations out to the application tier

  Prohibit SELECT *

    • Listing only the needed columns reduces memory and network-bandwidth consumption

    • It gives the query optimizer a chance to satisfy the query from an index covering those columns

    • SELECT * breaks the query when the table structure changes

 

Prohibit functions or calculations on indexed columns

In a WHERE clause, if an indexed column is buried inside a function or expression, the optimizer can no longer use the index and falls back to a full table scan.

Assuming an index is built on column Col1, the following predicates cannot use the index:

ABS([Col1]) = 1

[Col1] + 1 > 9

  

For example, a query that wraps the PrintTime column in a function cannot use the PrintTime index on the o_orderprocess table; the fix is to rewrite the predicate so the bare column is compared directly.
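The original screenshots are lost; the following is a reconstruction of the kind of query the text describes, assuming PrintTime is a datetime column on o_orderprocess:

```sql
-- Cannot seek on the PrintTime index: the column is wrapped in a function.
SELECT * FROM dbo.o_orderprocess
WHERE  CONVERT(VARCHAR(10), PrintTime, 120) = '2024-01-01';

-- Sargable rewrite: compare the bare column against a range.
SELECT * FROM dbo.o_orderprocess
WHERE  PrintTime >= '2024-01-01' AND PrintTime < '2024-01-02';
```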

  Prohibit functions or calculations on indexed columns (continued)

Assuming an index is built on column Col1, the following predicates can use the index:

[Col1] = 3.14

[Col1] > 100

[Col1] BETWEEN 0 AND 99

[Col1] LIKE 'abc%'

[Col1] IN (2, 3, 5, 7)

  Index issues for LIKE queries

1. [Col1] LIKE 'abc%'   -- index seek: uses the index

2. [Col1] LIKE '%abc%'  -- index scan: cannot seek on the index

3. [Col1] LIKE '%abc'   -- index scan: cannot seek on the index

As the three examples show, avoid a leading wildcard in LIKE conditions; otherwise the query cannot seek on the index.

  Prohibit cursors

A relational database is built for set operations: the WHERE clause and the select list define a result set, and a cursor is a way to step outside set processing. In general, what a cursor implements is equivalent to a loop in the client.

A cursor materializes its result set in server memory and processes one record at a time in a loop, which consumes a great deal of database resources (especially memory and locks). (Cursors are also genuinely complicated and easy to misuse, so use them as little as possible.)

  Prohibit triggers

Triggers are opaque to the application: the application layer does not know when a trigger fires or what it does, which makes behavior hard to reason about.

  Prohibit index hints in queries

WITH (INDEX = xxx) (the usual way an index is forced in a query)

    • As data changes, the index a statement forces may no longer be optimal

    • Indexes should be transparent to the application; dropping a forced index breaks the query and complicates troubleshooting

    • A newly created, better index cannot take effect until the code is changed and redeployed

  The variable/parameter/join-field type must match the column type (a rule that is easy to overlook)

  

Type mismatches force implicit conversions that burn extra CPU; on large tables, the hidden conversion can turn an index seek into a full table scan, which is especially painful.

  


  

If the database field type is varchar, the application should specify the parameter type as AnsiString and set its length explicitly.

If the database field type is char, the application should specify the parameter type as AnsiStringFixedLength and set its length explicitly.

If the database field type is nvarchar, the application should specify the parameter type as String and set its length explicitly.

  Parameterized Queries

  

You can parameterize query SQL in the following ways:


sp_executesql
Prepared Queries
Stored procedures

  

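A minimal sp_executesql sketch (the table and parameter names are illustrative):

```sql
-- Parameterized query via sp_executesql: the plan is compiled once
-- for the parameterized text and reused across parameter values.
EXEC sp_executesql
     N'SELECT OrderID, Amount FROM dbo.OrderInfo WHERE OrderID = @id',
     N'@id BIGINT',
     @id = 42;
```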

  Limit the number of JOINs

    • A single SQL statement may join at most 5 tables

    • Too many joins can lead the query optimizer into a bad execution plan

    • Too many joins also make compiling the execution plan expensive

  Limit the number of values in the IN clause

An IN clause containing a very large number of values (thousands) can exhaust resources and fail with error 8623 or 8632; keep the number of values in an IN clause under 100.

  Try to avoid large transactions

    • Open a transaction only when you are ready to update the data, to shorten the time locks are held

    • Add exception capture and pre-processing around the transaction

    • Prohibit distributed transactions in the database

In other words, do not wait until 1,000 rows have been updated before issuing COMMIT TRAN; consider whether you can afford to hold exclusive locks on those 1,000 rows, blocking every other transaction, for that whole time.

  Turn off the affected-row-count message

Explicitly SET NOCOUNT ON in SQL statements to cancel the affected-row-count message and reduce network traffic.
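For example, at the top of a stored procedure (the procedure and table names are hypothetical):

```sql
CREATE PROCEDURE dbo.usp_TouchOrder
    @id BIGINT
AS
BEGIN
    SET NOCOUNT ON;  -- suppress "(1 row affected)" messages, saving network chatter
    UPDATE dbo.OrderInfo SET CreateTime = GETDATE() WHERE OrderID = @id;
END;
```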

 

Unless committed reads are required, SELECT statements must add NOLOCK

NOLOCK allows dirty reads: no shared locks are issued to prevent other transactions from modifying the data being read, and exclusive locks set by other transactions do not block the current transaction's reads. Allowing dirty reads buys more concurrency, but at the cost of possibly reading changes that other transactions later roll back. Your transaction may fail, users may be shown data that was never committed, or a record may be seen twice (or not at all).
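A syntax sketch, with the caveats above in mind (table and column names are illustrative):

```sql
-- Reads uncommitted data; acceptable only where approximate results are fine.
SELECT OrderID, Amount
FROM   dbo.OrderInfo WITH (NOLOCK)
WHERE  CreateTime >= '2024-01-01';
```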

 

Replace UNION with UNION ALL

UNION sorts and de-duplicates the combined result set, adding CPU, memory, and other overhead.
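For example (OrderArchive is a hypothetical second table):

```sql
-- UNION would sort and de-duplicate the combined result set; UNION ALL
-- skips that work, so prefer it when duplicates are impossible or acceptable.
SELECT OrderID FROM dbo.OrderInfo    WHERE Amount > 100
UNION ALL
SELECT OrderID FROM dbo.OrderArchive WHERE Amount > 100;
```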

  Page or use TOP when querying large result sets

Reasonably limit the number of rows returned to avoid I/O and network-bandwidth bottlenecks.

  Limit the depth of recursive queries

Use MAXRECURSION to stop an ill-formed recursive CTE from entering an infinite loop.
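A minimal sketch of the hint:

```sql
-- Generates 1..100; a runaway recursive CTE would instead be cut off
-- with error 530 once it exceeded the MAXRECURSION cap.
WITH Numbers AS
(
    SELECT 1 AS n
    UNION ALL
    SELECT n + 1 FROM Numbers WHERE n < 100
)
SELECT n FROM Numbers
OPTION (MAXRECURSION 100);
```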

  Temporary tables and table variables

  Use local variables to obtain a moderate execution plan

When a stored procedure or query touches a table with a very uneven data distribution, it is often compiled with a suboptimal or outright poor execution plan, causing high CPU and heavy I/O reads; using local variables keeps the plan from going badly wrong.

With a local variable, SQL Server does not know the variable's value at compile time, so it "guesses" a row count from the general distribution of the data in the table. The resulting plan is the same no matter what value the caller substitutes when invoking the stored procedure or statement. Such a plan is generally moderate: not necessarily the best plan, but usually not the worst one either.

If a local variable is compared with an inequality operator, the query optimizer uses a flat 30% estimate:

Estimated rows = (total rows * 30) / 100

If a local variable is compared with the equality operator, the optimizer estimates with the column's density times the table's row count:

Estimated rows = density * total rows

For example, on a 10,000-row table, an inequality predicate is estimated at 3,000 rows, and an equality predicate on a column with density 0.01 is estimated at 100 rows.

  Try to avoid the OR operator

An OR predicate usually forces a full table scan. Consider decomposing the query into multiple queries combined with UNION/UNION ALL, after confirming that each piece can use an index and that the combined result set is smaller.
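A sketch of the decomposition (table and column names are hypothetical):

```sql
-- An OR across different columns often forces a scan...
SELECT OrderID FROM dbo.OrderInfo WHERE UserName = N'alice' OR OrderID = 42;

-- ...while the decomposed form can seek on each column's index separately.
-- UNION (not UNION ALL) is used here because both branches may return the same row.
SELECT OrderID FROM dbo.OrderInfo WHERE UserName = N'alice'
UNION
SELECT OrderID FROM dbo.OrderInfo WHERE OrderID = 42;
```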

  Add a transaction exception-handling mechanism

The application must handle unexpected failures and roll back promptly.

Set the connection property SET XACT_ABORT ON.
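A minimal sketch combining both pieces of advice (the UPDATE is illustrative):

```sql
SET XACT_ABORT ON;  -- any run-time error aborts and rolls back the transaction
BEGIN TRY
    BEGIN TRAN;
    UPDATE dbo.OrderInfo SET Amount = Amount - 10 WHERE OrderID = 42;
    COMMIT TRAN;
END TRY
BEGIN CATCH
    IF @@TRANCOUNT > 0 ROLLBACK TRAN;
    THROW;  -- re-raise so the application sees the failure
END CATCH;
```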

  Output columns use two-part naming

Two-part naming format: table name.column name.

In any T-SQL statement with a join, every referenced column must state which table it belongs to; otherwise a future table-structure change can break the program with an "ambiguous column name" error.
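For example (the tables and columns are hypothetical):

```sql
-- Qualify every output column so a later schema change cannot make it ambiguous.
SELECT o.OrderID, o.Amount, u.UserName
FROM   dbo.OrderInfo o
JOIN   dbo.Users     u ON u.UserID = o.UserID;
```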

  Architecture Design

    • Read/write separation

    • Schema decoupling

    • Data life cycle

  Read/write separation

    • Plan for read/write separation from the start, even if reads and writes initially share one database, so the system can scale out quickly

    • Route real-time reads to the write database and deferred reads to the read database, according to each read's freshness requirements

    • The read/write split should automatically fall back to the write side when the read side is unavailable

  Schema decoupling

Prohibit cross-database joins.

  Data life cycle

Archive large tables periodically, splitting data out according to how frequently it is used.

Physically separate the primary database from the archive database.

  Log-type tables should be partitioned or split into multiple tables

Partition large tables: partitioning spreads the table and its indexes across multiple partitions, and partition switching swaps old and new partitions in and out quickly, making data cleanup faster and sharply reducing I/O consumption.

  Frequently written tables also require partitioning or splitting

Identity (self-incrementing) inserts contend on a latch.

A latch is SQL Server's own internal mechanism, applied and controlled by the engine itself; users cannot intervene in it. It protects the consistency of in-memory data structures and operates at the page level.
