Reprint NOTE: It is said to be a company (Ctrip) internal specifications.
Common field type selection
1. Varchar/nvarchar data types are recommended for character types
2. Money data type is recommended for monetary value
3. Scientific counting recommended numeric data type
4. Self-growth logo recommended to use the bigint data type (a large amount of data, with the type of int will not fit, then the transformation will be troublesome)
5. Time type recommended for datetime data type
6. Prohibit the use of text, ntext, image old data type
7. Prohibit the use of XML data types, varchar (max), nvarchar (max)
Constraints and indexes
Each table must have a primary key
- Each table must have a primary key to enforce entity integrity
- Single table can have only one primary key (not allowed to be empty and duplicate data)
- Use single field primary key as much as possible
Foreign keys are not allowed
- Foreign keys increase the complexity of table structure changes and data migrations
- Foreign keys have an impact on insert, update performance, and need to check for primary foreign KEY constraints
- Data integrity by Program control
Null property
Newly added table, all fields are forbidden null
(Why doesn't the new table allow Nulls?)
Allowing null values increases the complexity of the application. You have to add specific logic code to prevent the occurrence of various unexpected bugs
The three-valued logic, all queries of the equals sign ("=") must increase the isnull's judgment.
Null=null, Null!=null, not (Null=null), not (null!=null) are unknown, not true)
For example, to illustrate:
If the data inside the table is:
You want to find all the data except the NAME equals AA, and then you inadvertently use the SELECT * from Nulltest WHERE name<> ' AA '
It turns out that it's not the same as expected, in fact it only finds Name=bb and doesn't find name=null data records.
So how do we find all the data except the name equals AA, only with the IsNull function?
SELECT * from Nulltest WHERE ISNULL (name,1) <> ' AA '
However, you may not know that isnull can cause serious performance bottlenecks, so it is often best to limit user input at the application level, ensuring that users enter valid data and then query.
Old table new Add field, need to allow null (avoid full table data Update, long-term lock caused blocking) (this is mainly considering the transformation of the previous table)
Index Design Guidelines
- You should create an index on the columns that are used frequently in the WHERE clause
- You should create an index on a column that is frequently used to join tables
- You should create an index on the columns that are used frequently in the ORDER by clause
- You should not create an index on a small table (a table that uses only a few pages) because a full table scan operation may be faster than a query executed with an index
- The number of single-table indexes is not more than 6
- Do not give low-selectivity characters Jianjian single-column index
- Take advantage of UNIQUE constraints
- Index contains no more than 5 fields (including include columns)
Do not create a single-column index for low-selectivity fields
- SQL Server has requirements for the selectivity of indexed fields, and if selectivity is too low, SQL Server discards the use of
- Fields that are not suitable for index creation: Gender, 0/1, True/false
- Fields suitable for index creation: ORDERID, UID, etc.
Take advantage of unique indexes
A unique index provides SQL Server with information that ensures that a column has absolutely no duplicate values, and when Query Analyzer finds a record through a unique index, it exits immediately without continuing to look up the index
Table index number not exceeding 6
There are no more than 6 table indexes (this rule was developed by Ctrip's DBA after a trial ...). )
- Indexes speed up queries, but affect write performance
- The index of a table should be combined with all of the SQL that is associated with this table, merging as much as possible
- The principle of combining indexes is that the better the filter is, the more the field is ahead
- Too many indexes will not only increase compilation time, but also affect database selection best execution plan
SQL query
- Prohibit complex operations in the database
- Disable the use of SELECT *
- Prohibit the use of functions or calculations on indexed columns
- Disable cursor use
- Disable the use of triggers
- Prohibit specifying indexes in queries
- The variable/parameter/associated field type must be the same as the field type
- parameterized queries
- Limit number of Joins
- Limit the length of SQL statements and the number of in clauses
- Try to avoid large transaction operations
- Closes the affected row count information back
- Unless necessary SELECT statements must be added Nolock
- Replace union with UNION all
- Querying large amounts of data using page or top
- Recursive query Hierarchy Limits
- Not exists instead of in
- Temporary tables and table variables
- Use local variables to select moderation Execution plan
- Try to avoid using the OR operator
- Increase transaction exception handling mechanism
- Output columns use two-segment naming format
Prohibit complex operations in the database
- XML parsing
- Comparison of string similarity
- String Search (Charindex)
- Complex operations are done at the terminal
Disable the use of SELECT *
- Reduce memory consumption and network bandwidth
- Gives the query optimizer the opportunity to read the required columns from the index
- Error in query when table structure changes
Prohibit the use of functions or calculations on indexed columns
Prohibit the use of functions or calculations on indexed columns
In the WHERE clause, if the index is part of a function, the optimizer will no longer use the index for full table scanning
Assuming an index is built on the field Col1, the following scenarios will not work with the index:
Abs[col1]=1
[Col1]+1>9
Let me give you an example.
Queries like the above will not be able to use the Printtime index on the o_orderprocess table, so we apply the query SQL as shown below
Prohibit the use of functions or calculations on indexed columns
Assuming an index is built on the field Col1, the following scenarios will be available to the index:
[col1]=3.14
[col1]>100
[Col1] Between 0 and 99
[Col1] Like ' abc% '
[Col1] In (2,3,5,7)
Index issues for like queries
1.[col1] Like "abc%" –index seek this uses the index query
2.[col1] Like "%abc%" –index scan and this is not used for index queries
3.[col1] Like "%abc" –index scan this also does not use the index query
I think from the above and three examples, you should understand that it is best not to use a fuzzy match in front of like conditions, otherwise you will not be able to use the index query.
Disable cursor use
The relational database is suitable for collection operations, which is the set of result sets determined by the WHERE clause and the selection column, which is a way to provide a non-collection operation. In general, cursors implement functionality that is often equivalent to a loop implementation of the client.
Cursors are the result set placed in the server memory, and by looping a single processing record, the database resources (especially memory and lock resources) consumption is very large.
(plus the cursor is really complicated, it is not good to use, try to use less)
Disable the use of triggers
The trigger is opaque to the application (the application level does not know when the trigger will be triggered, nor does it occur, and it feels inexplicable ...). )
Prohibit specifying indexes in queries
With (INDEX=XXX) (in the query we specify the index is generally used with (INDEX=XXX))
- The index performance specified by the query statement may not be optimal as the data changes
- The index should be transparent to the application, such as the deletion of the specified index will result in query error, not conducive to troubleshooting
- The newly created index cannot be applied immediately and must be published code to take effect
The variable/parameter/associated field type must be the same as the field type (this is not what I was concerned about before)
Avoid the type conversion of additional CPU consumption, caused by the large table scan is particularly serious
Look at the above two pictures, I think I do not have to explain the explanation, everyone should have made it clear.
If the database field type is varchar, the best type in the application is specified as ansistring and its length is explicitly specified
If the database field type is char, the best type in the application is specified as Ansistringfixedlength and its length is explicitly specified
If the database field type is nvarchar, the best type in the application is specified as a string and its length is explicitly specified
parameterized queries
You can parameterize query SQL in the following ways:
sp_executesql
Prepared Queries
Stored procedures
Use the diagram to illustrate, haha.
Limit number of Joins
- The number of table joins for a single SQL statement cannot exceed 5
- An excessive number of joins causes the query parser to go through the wrong execution plan
- Too many joins consume a lot when compiling the execution plan
Limit the number of conditions in the IN clause
Including a very large number of values in the IN clause (thousands) may consume resources and return an error 8623 or 8632, requiring that the number of conditions in the IN clause be limited to less than 100
Try to avoid large transaction operations
- Start transactions only when data needs to be updated, reducing resource lock holding time
- Increase transaction exception capture preprocessing mechanism
- Prohibit the use of distributed transactions on the database
To illustrate with a diagram
In other words, we should not commit tran after the 1000 rows of data have been updated, and think about whether you have exclusive resources to update the 1000 rows of data to prevent other transactions from being processed.
Closes the affected row count information back
Display setting set NOCOUNT on in SQL statements, cancellation of affected row count information returned, reduced network traffic
Unless necessary SELECT statements must be added Nolock
Unless necessary, try to have all the SELECT statements added Nolock
Specifies that dirty reads are allowed. No shared locks are published to prevent other transactions from modifying the data read by the current transaction, and exclusive locks set by other transactions do not prevent the current transaction from reading the locked data. Allowing dirty reads can result in more concurrent operations, but at the cost of reading data modifications that are later rolled back by other transactions. This may cause errors in your transaction, display data that has never been submitted to the user, or cause the user to see the record two times (or not see the record at all)
Replace union with UNION all
Replace union with UNION all
Union will reorder the SQL result set, increasing CPU, memory and other consumption
Querying large amounts of data using page or top
Reasonable limit record return number, avoid IO, network bandwidth bottleneck
Recursive query hierarchy limit
Use Maxrecursion to prevent unreasonable recursive CTE from entering an infinite loop
Temporary tables and table variables
Use local variables to select moderation Execution plan
In a stored procedure or query, a table with a very uneven data distribution is accessed, which often causes the stored procedure or query to use suboptimal and even poor execution plans, causing high CPU and large IO read problems, and using local variables to prevent the execution plan from going wrong.
In the way of local variables, SQL does not know the value of this local variable at compile time, when SQL will "guess" a return value based on the general distribution of the data in the table. The resulting plan is the same regardless of the value of the variable that the user is substituting for when invoking the stored procedure or statement. Such a plan will generally be more moderate, not necessarily the best plan, but generally not the worst plan
If the local variable in the query uses the inequality operator, the Query Analyzer uses a simple 30% calculation to estimate
Estimated rows = (total rows * 30)/100
If a local variable in the query uses the equality operator, the Query Analyzer uses: The total number of accuracy * table records to estimate
Estimated rows = Density * Total Rows
Try to avoid using the OR operator
For the OR operator, a full table scan is usually used, considering that the decomposition into multiple queries is implemented with Union/union all, which confirms that the query can go to the index and return fewer result sets
Increase transaction exception handling mechanism
Application to do unexpected processing, timely do rollback.
Set connection Properties "set Xact_abort on"
Output columns use two-segment naming format
Two-segment naming format: Table name. Field Name
A tsql with a join relationship, the field must indicate which table the field belongs to, or the program compatibility error of ambiguous column name may occur after a future table structure change
Architecture Design
- Read/write separation
- Schema decoupling
- Data life cycle
Read/write separation
- At the beginning of the design consider read and write separation, even read and write the same library, facilitate rapid expansion
- Read into real-time reading and deferred reading to write and read libraries according to reading characteristics
- Read/write separations should consider automatically switching to write-side when read is not available
Schema decoupling
Prohibit cross-Library join
Data life cycle
Archiving of large-scale periodic sub-libraries based on the frequency of use of data
Physical separation of the main library/archive library
Table of log type should be partitioned or divided into tables
For large tables to be partitioned, partition operations divide tables and indexes across multiple partitions, and partition switching enables fast new and old partition substitution, faster data cleanup and significantly reduced IO resource consumption
Frequently written tables that require partitioning or sub-tables
Self-growth with latch Lock
The latch is SQL Server's own internal application and control, the user has no way to intervene, to ensure the consistency of the memory data structure, lock level is page-level lock
[Reprint] Common SQL Server Specification collection