The notes above are some basics for improving query speed. In many cases, however, you need to experiment with different statements repeatedly to find the best one. The most direct test is to time several SQL statements that produce the same result and keep the one with the shortest execution time. When the data volume is small, though, the timings are too close to compare; in that case examine the execution plan instead: paste the candidate SQL statements into the Query Analyzer, press Ctrl+L, and look at the indexes used and the number of table scans (these two have the greatest impact on performance), and check the cost percentage of each step.
You can use a wizard to generate a simple stored procedure automatically: click the Run Wizard icon on the Enterprise Manager toolbar, then choose "Database" and "Create Stored Procedure Wizard". To debug a complex stored procedure: in the Object Browser on the left side of the Query Analyzer (not visible? press F8), select the stored procedure to debug, right-click it, click Debug, and enter the parameters to run it. A floating toolbar appears with single-step execution and breakpoint controls.
1. Unreasonable index design
---- Take a table record with 620,000 rows as an example, and compare the following SQL statements under different index designs:
---- 1. A non-clustered index on date:
select count(*) from record
where date > '20140901' and date < '20140914' and amount > 2000  (25 seconds)
select date, sum(amount) from record group by date  (55 seconds)
select count(*) from record
where date > '20140901' and place in ('BJ', 'SH')  (27 seconds)
---- Analysis:
---- date has a large number of duplicate values. Under a non-clustered index, the data is physically stored on the data pages at random, so a range search must scan the whole table to find every row in the range.
---- 2. A clustered index on date:
select count(*) from record
where date > '20140901' and date < '20140914' and amount > 2000  (14 seconds)
select date, sum(amount) from record group by date  (28 seconds)
select count(*) from record
where date > '20140901' and place in ('BJ', 'SH')  (14 seconds)
---- Analysis:
---- Under the clustered index, data is physically stored on the data pages in order and duplicate values are stored together. A range search can first locate the start and end points of the range and scan only the data pages within it, avoiding a large-scale scan and improving query speed.
---- 3. A composite index on (place, date, amount):
select count(*) from record
where date > '20140901' and date < '20140914' and amount > 2000  (26 seconds)
select date, sum(amount) from record group by date  (27 seconds)
select count(*) from record
where date > '20140901' and place in ('BJ', 'SH')  (<1 second)
---- Analysis:
---- This is an unreasonable composite index because its leading column is place. The first and second SQL statements do not reference place, so they cannot use the index. The third SQL does use place, and all the columns it references are contained in the composite index, forming index coverage, so it is very fast.
---- 4. A composite index on (date, place, amount):
select count(*) from record
where date > '20140901' and date < '20140914' and amount > 2000  (<1 second)
select date, sum(amount) from record group by date  (11 seconds)
select count(*) from record
where date > '20140901' and place in ('BJ', 'SH')  (<1 second)
---- Analysis:
---- This is a reasonable composite index. With date as the leading column, every SQL statement can use the index, and the first and third statements are fully covered by the index, so performance is optimal.
---- 5. Summary:
---- The index created by default is non-clustered, but that is not always best. Reasonable index design must be based on analysis and prediction of the actual queries. Generally speaking:
---- ①. For columns with a large number of duplicate values that are frequently used in range queries (between, >, <, >=, <=) or in order by / group by, consider creating a clustered index;
---- ②. When multiple columns are frequently accessed together and each column contains duplicate values, consider creating a composite index;
---- ③. A composite index should cover the key queries where possible, and its leading column must be the most frequently used column.
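As a sketch of the guidelines above, the indexes tested in cases 2 and 4 could be created like this (index names are illustrative; the table and columns follow the record example, SQL Server syntax):

```sql
-- Clustered index on date: duplicate values are stored together,
-- so range queries (between, >, <) scan only the matching pages.
create clustered index idx_record_date on record (date)

-- Composite index with date as the leading column (case 4 above):
-- every sample query can seek on date, and statements that reference
-- only date, place and amount are covered by the index itself.
create index idx_record_date_place_amount on record (date, place, amount)
```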
2. Incomplete join conditions:
---- For example, table card has 7,896 rows with a non-clustered index on card_no, and table account has 191,122 rows with a non-clustered index on account_no. Compare the execution of two SQL statements under different join conditions:
select sum(a.amount) from account a, card b
where a.card_no = b.card_no  (20 seconds)
---- Change the SQL to:
select sum(a.amount) from account a, card b
where a.card_no = b.card_no and a.account_no = b.account_no  (<1 second)
---- Analysis:
---- Under the first join condition, the best plan is to use account as the outer table and card as the inner table, using the index on card. The number of I/Os can be estimated by the following formula:
---- 22,541 pages of the outer table account + (191,122 rows of the outer table account × 3 pages searched on the inner table card for each outer row) = 595,907 I/Os
---- Under the second join condition, the best plan is to use card as the outer table and account as the inner table, using the index on account. The number of I/Os can be estimated by:
---- 1,944 pages of the outer table card + (7,896 rows of the outer table card × 4 pages searched on the inner table account for each outer row) = 33,528 I/Os
---- As can be seen, only with a complete join condition does the optimizer choose the best plan.
---- Conclusion:
---- 1. Before a multi-table operation is executed, the query optimizer lists several possible join plans based on the join conditions and picks the one with the lowest system cost. The join conditions should take full account of the tables with indexes and the tables with many rows; the choice of inner and outer table can be determined by the formula: number of matching rows in the outer table × number of lookups per outer row in the inner table. The plan with the smallest product is the best one.
---- 2. To view the execution plan, use set showplan_all on to turn on the showplan option; you will see the join order and the indexes used. For more detailed information, execute dbcc traceon (3604, 310, 302) with the sa role.
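In SQL Server, the showplan option mentioned above is turned on as follows (a minimal sketch; trace flags 310 and 302 are undocumented optimizer internals and their output varies by version):

```sql
-- Show the estimated plan (join order and chosen indexes) instead of running the query
set showplan_all on
go
select sum(a.amount)
from account a, card b
where a.card_no = b.card_no and a.account_no = b.account_no
go
set showplan_all off
go

-- Optimizer trace output (sa role required); 3604 redirects output to the client
dbcc traceon (3604, 310, 302)
```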
3. WHERE clauses that cannot be optimized
---- 1. For example, the columns in the following WHERE clauses all have suitable indexes, yet the statements run very slowly:
select * from record where substring(card_no, 5, 4) = '5378'  (13 seconds)
select * from record where amount / 30 < 1000  (11 seconds)
select * from record where convert(char(10), date, 112) = '20140101'  (10 seconds)
---- Analysis:
---- Any operation on a column in the WHERE clause must be computed column by column at execution time, which forces a table scan instead of using the index on that column. If the expression can be evaluated at query compile time, the SQL optimizer can use the index and avoid the table scan. Rewrite the SQL statements as follows:
select * from record where card_no like '____5378%'  (<1 second)
select * from record where amount < 1000 * 30  (<1 second)
select * from record where date = '2014/01/01'  (<1 second)
---- You will find that the SQL statements get obviously faster!
---- 2. For example, table stuff has 200,000 rows and a non-clustered index on id_no. Look at the following SQL:
select count(*) from stuff where id_no in ('0', '1')  (23 seconds)
---- Analysis:
---- 'in' in a where clause is logically equivalent to 'or', so the parser converts in ('0', '1') into id_no = '0' or id_no = '1'. We would expect it to search each or branch separately and add the results, so that the index on id_no could be used; but in fact (according to showplan) it adopts the "or strategy": it extracts the rows satisfying each or branch into a worktable in tempdb, builds a unique index on it to remove duplicate rows, and finally computes the result from that temporary table. So the id_no index is not actually used, and the elapsed time is also affected by the performance of tempdb.
---- Practice shows that the more rows the table has, the worse the worktable performs: when stuff has 620,000 rows the statement takes 220 seconds! It is better to split the or branches:
select count(*) from stuff where id_no = '0'
select count(*) from stuff where id_no = '1'
---- Add the two results together. Each statement uses the index, so the execution time is only 3 seconds; even with 620,000 rows it is only 4 seconds. Or, even better, write a simple stored procedure:
create proc count_stuff
as
begin
    declare @a int
    declare @b int
    declare @c int
    declare @d char(10)
    select @a = count(*) from stuff where id_no = '0'
    select @b = count(*) from stuff where id_no = '1'
    select @c = @a + @b
    select @d = convert(char(10), @c)
    print @d
end
---- The result is computed directly, and execution is just as fast as above!
---- Conclusion:
---- As can be seen, a WHERE clause that cannot use an index leads to a table scan or extra overhead:
---- 1. Any operation on a column causes a table scan, including database functions and computed expressions. Move the computation to the right-hand side of the comparison where possible.
---- 2. in and or clauses often use worktables, which disables the index. If the branches do not produce a large number of duplicate values, consider splitting them; each split branch should be able to use an index.
---- 3. Be good at using stored procedures, which make SQL more flexible and efficient.
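If the two counts must come back as a single result set rather than through a stored procedure, the split can also be expressed with union all, which keeps one index seek per branch (a sketch using the stuff example above):

```sql
-- Each branch can seek on the id_no index; union all skips the
-- duplicate-elimination step that a plain union would add.
select sum(cnt) as total
from (select count(*) as cnt from stuff where id_no = '0'
      union all
      select count(*) as cnt from stuff where id_no = '1') as t
```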
---- The examples above show that the essence of SQL optimization is to write statements the optimizer can recognize, use indexes fully, reduce the number of I/O scans against the table, and avoid table searches as much as possible. In reality, SQL performance optimization is a complex process; the points above are only its expression at the application level. It also involves resource configuration at the database level, traffic control at the network level, and the overall design of the operating system level.
1. Use indexes reasonably
An index is an important component of a database, and many people ignore it. The fundamental purpose of an index is to improve query efficiency.
The usage principles are as follows:
Create indexes on columns that are frequently used in joins but are not declared as foreign keys; for columns seldom used in joins, the optimizer generates indexes automatically.
Create indexes on columns that are frequently sorted or grouped (that is, used in group by or order by operations).
Create indexes on columns with many distinct values that are frequently used in conditional expressions. Do not index columns with few distinct values: for example, the "gender" column of an employee table has only two values, "male" and "female", so there is no point creating an index on it; an index there would not improve query efficiency but would greatly slow down updates.
If there are multiple columns to be sorted, you can create a compound index on these columns.
When writing SQL statements, be aware that some constructs prevent the database from using indexes, such as is null, is not null, in, not in, and so on.
2. Avoid or simplify sorting
Repeated sorting of large tables should be simplified or avoided. When an index can deliver the rows in the desired order, the optimizer skips the sorting step. The following factors prevent this:
● The index does not include one or more of the columns to be sorted;
● The order of the columns in the group by or order by clause differs from that of the index;
● The sorted columns come from different tables.
To avoid unnecessary sorting, add indexes correctly and merge database tables reasonably (although this can sometimes affect table normalization, it is worthwhile for the efficiency gain). If sorting is unavoidable, try to simplify it, for example by narrowing the range of columns being sorted.
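For example, if a query always ends with order by date, place, an index whose key order matches the clause lets the optimizer skip the sort step entirely (a sketch on the record table used earlier; the index name is illustrative):

```sql
-- Key order matches the order by clause, so rows come back pre-sorted
create index idx_record_date_place on record (date, place)

-- No sort operator is needed: the optimizer can read the index in order
select date, place, amount
from record
where date >= '20140901'
order by date, place
```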
3. Eliminate sequential access to large table rows
In nested queries, sequential access to a table can be fatal to query efficiency. For example, with a sequential-access strategy, a query nested three levels deep, each level scanning 1,000 rows, reads one billion rows in total. The primary way to avoid this is to index the join columns. For example, for two tables, a student table (student ID, name, age, ...) and a course-selection table (student ID, course number, grade), if the two are to be joined, an index must be created on the join field "student ID".
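The student/course-selection example above could be indexed like this (a sketch; the English table and column names are hypothetical stand-ins for the tables in the text):

```sql
-- Index the join column on the many side so the nested-loop
-- inner lookup becomes an index seek instead of a sequential scan
create index idx_cs_student_id on course_selection (student_id)

select s.name, c.course_no, c.grade
from student s, course_selection c
where s.student_id = c.student_id
```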
Union can also be used to avoid sequential access. Although indexes exist on all the checked columns, some forms of where clause force the optimizer into sequential access. The following query forces a sequential scan of the orders table:
select * from orders where (customer_num = 104 and order_num > 1001) or order_num = 1008
Although indexes exist on customer_num and order_num, the optimizer still scans the whole table along a sequential access path, because the statement retrieves a set of separated rows. Change the statement as follows:
select * from orders where customer_num = 104 and order_num > 1001
union
select * from orders where order_num = 1008
This way the query is processed along the index path.
4. Avoid correlated subqueries
If a column's label appears both in the outer query and in the where clause of the subquery, then whenever the outer column value changes, the subquery must run again. The more nesting levels, the lower the efficiency. Avoid subqueries where possible; if a subquery is unavoidable, filter out as many rows as possible inside it.
5. Avoid difficult regular expressions
The matches and like keywords support wildcard matching, technically called regular expressions, but this kind of matching is especially time-consuming. Example: select * from customer where zipcode like "98___"
Even if an index is created on the zipcode field, sequential scanning is still used in this case. If the statement is changed to select * from customer where zipcode > "98000", the index is used for the query, which obviously improves the speed greatly.
In addition, avoid non-leading substrings. Example: select * from customer where zipcode[2,3] > "80" uses a non-leading substring in the where clause, so this statement will not use an index.
6. Use temporary tables to accelerate queries; in SQL Server 2000, table variables can also replace temporary tables.
Sorting a subset of a table into a temporary table sometimes accelerates queries. It helps avoid repeated sorting and can simplify the optimizer's work in other ways. For example:
select cust.name, rcvbles.balance, ...other columns
from cust, rcvbles
where cust.customer_id = rcvbles.customer_id
and rcvbles.balance > 0
and cust.postcode > "98000"
order by cust.name
If this query is to be executed more than once, you can find all the unpaid customers, put them in a temporary file, and sort it by customer name:
select cust.name, rcvbles.balance, ...other columns
from cust, rcvbles
where cust.customer_id = rcvbles.customer_id
and rcvbles.balance > 0
order by cust.name
into temp cust_with_balance
Then, query the temporary table in the following way:
select * from cust_with_balance where postcode > "98000"
The temporary table has fewer rows than the main table and is physically in the required order, which reduces disk I/O, so the query workload is greatly reduced.
Note: modifications to the main table are not reflected in the temporary table after it is created. When the data in the main table is modified frequently, be careful not to lose data.
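The Informix-style "into temp" syntax above maps to select ... into in SQL Server, which this article elsewhere assumes. A hedged sketch (column list abbreviated; the concrete columns shown are assumptions):

```sql
-- Materialize the unpaid customers once, pre-sorted by name
select cust.name, rcvbles.balance, cust.postcode
into #cust_with_balance
from cust, rcvbles
where cust.customer_id = rcvbles.customer_id
  and rcvbles.balance > 0
order by cust.name

-- Later queries hit the much smaller temporary table
select * from #cust_with_balance where postcode > '98000'
```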
7. Use sorting to replace non-sequential access
Non-sequential disk access is the slowest operation, reflected in the back-and-forth movement of the disk arm. SQL hides this from us, making it easy to write queries that access a large number of non-sequential pages. In some cases, the database's sorting capability can replace non-sequential access to improve the query. Even if you do not understand how an SQL statement executes, remember this principle.
1.
Any calculation on a field causes a full table scan. Therefore use:
select * from table where field = 1
rather than:
select * from table where field - 1 = 0
2.
Necessary indexes are important for improving data processing speed, so create indexes on fields that are frequently sorted or used in conditions. (Note the distinction between compound indexes and separate single-column indexes.)
3.
Be good at simplifying stored procedures and views.
4.
Copy a query whose efficiency you cannot judge into the Query Analyzer and press Ctrl+L to analyze it:
see which steps its execution requires;
the percentage of total cost each step takes;
whether the indexes you created are used, or whether a table scan occurs;
if there are other ways to write the query, analyze them too and compare to determine the best one.
Query optimization suggestions
Some queries are resource-intensive by nature; this is related to fundamental database and index issues. Such queries are not inefficient, because the query optimizer implements them in the most efficient way possible, but they do consume a lot of resources, and the set-oriented nature of Transact-SQL makes that cost unavoidable. The optimizer's intelligence cannot eliminate the inherent resource cost of these constructs; compared with simpler queries, they are inherently expensive. Although Microsoft SQL Server 2000 uses the best available access plan, it is limited by what is fundamentally possible. For example, the following kinds of queries can consume a large amount of resources:
Queries that return large result sets
Queries with highly non-unique WHERE clauses
However, some suggestions for optimizing queries and improving query performance include:
Add more memory (especially if the server runs many complex queries and several of them are executed slowly)
Run SQL Server on a computer with multiple processors. Multiple processors enable SQL Server to use parallel queries. For more information, see Parallel Query Processing.
Rewrite the query.
If a query uses a cursor, determine whether a more efficient cursor type (such as fast forward-only) could be used, or whether a plain set-based query would do the job better; set-based queries generally outperform cursor operations. A block of cursor statements is typically an outer-loop operation in which each row of the outer loop is processed by an inner statement, so consider using a group by or case statement, or a subquery, instead.
If the application uses a loop, consider moving the loop into the query. Applications often contain loops with parameterized queries, which execute many times and cost a network round trip each time between the application computer and SQL Server. Instead, build one more complex query using a temporary table: only one network round trip is needed, and the query optimizer can optimize the single query far better.
Do not use multiple aliases for a single table in the same query to simulate index intersection; it is unnecessary, because SQL Server automatically considers index intersection and can use multiple indexes on the same table in the same query.
Use query hints only when necessary. If a query uses hints carried over from an earlier version of SQL Server, test it without the hints; hints can prevent the query optimizer from choosing a better execution plan. For more information, see SELECT.
Use the query governor configuration option and setting. The query governor can prevent long-running queries from consuming system resources. By default it allows all queries to execute no matter how long they take, but it can be set to a maximum number of seconds, either for all queries on all connections or only for queries on a specific connection. Because it is based on estimated query cost rather than actual elapsed time, it has no run-time overhead: it stops a long-running query before it starts, rather than running it until it hits some predefined limit.
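The query governor setting described above is configured through sp_configure (a sketch; the cost limit is in estimated-cost units, roughly seconds on a reference machine, and the chosen value of 300 is just an example):

```sql
-- Allow changing advanced options, then refuse to start any query
-- whose estimated cost exceeds 300
sp_configure 'show advanced options', 1
reconfigure
sp_configure 'query governor cost limit', 300
reconfigure

-- Per-connection override: 0 means no limit for this session
set query_governor_cost_limit 0
```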
SQL statement optimization principles:
============================
1. Use indexes to traverse tables faster.
The index created by default is a non-clustered index, but sometimes it is not optimal. Under a non-clustered index, data is physically stored on the data pages at random. Reasonable index design should be based on analysis and prediction of the actual queries. In general: ①. For columns with many duplicate values that are frequently range-queried (between, >, <, >=, <=) or used in order by / group by, consider a clustered index; ②. When multiple columns are frequently accessed together and each contains duplicate values, consider a composite index; ③. A composite index should try to cover the key queries, and its leading column must be the most frequently used column. Indexes improve performance, but more indexes are not always better: every index added to a table must be maintained, so each update to the table requires corresponding updates to the whole index set, which can make the system less efficient.
2. Is null and is not null
Null values cannot be used by an index: any column containing null values will not be included in the index for those rows. Even in a composite index, if any one of the columns contains a null value in a row, that row is excluded from the index. In other words, if a column has null values, indexing that column does not improve performance for null-based predicates. Any statement that uses is null or is not null in the where clause prevents the optimizer from using the index.
3. In and exists
exists is often far more efficient than in; the difference corresponds to a range scan versus a full table scan. Almost all in-operator subqueries can be rewritten as subqueries using exists.
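A sketch of the rewrite, reusing the cust/rcvbles tables from earlier in this article (whether exists actually wins depends on the optimizer and the data, so compare the plans):

```sql
-- in form: the subquery's result set is effectively built first
select cust.name from cust
where cust.customer_id in (select rcvbles.customer_id
                           from rcvbles
                           where rcvbles.balance > 0)

-- exists form: probes the inner table per outer row and can stop
-- at the first match, typically via an index on customer_id
select cust.name from cust
where exists (select 1 from rcvbles
              where rcvbles.customer_id = cust.customer_id
                and rcvbles.balance > 0)
```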
4. Use as few format conversions as possible in massive queries.
5. In SQL Server 2000, if a stored procedure has only one parameter and it is of output type, you must give this parameter an initial value when calling the stored procedure; otherwise a call error occurs.
6. order by and group by
With order by and group by phrases, any kind of index helps improve select performance. Note: if the index column contains null values, the optimizer cannot optimize it.
7. Any operation on a column causes a table scan, including database functions and computed expressions. Move the computation to the right-hand side of the comparison where possible.
8. in and or clauses often use worktables, which disables the index. If the branches do not produce a large number of duplicate values, consider splitting the clause; the split clauses should be able to use indexes.
9. set showplan_all on lets you view the execution plan, and DBCC checks database data integrity. DBCC (Database Consistency Checker) is a group of SQL Server programs used to verify the integrity of a database.
10. Use cursors with caution
In cases where a cursor must be used, consider moving the qualifying data rows into a temporary table first and then defining the cursor over the temporary table; this can markedly improve performance.
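A sketch of the pattern just described: copy only the qualifying rows into a temporary table and declare the cursor over that, so the cursor walks far fewer rows and holds no locks on the base table (table and column names are illustrative, reusing the record example):

```sql
-- Stage the qualifying rows once
select card_no, amount
into #work
from record
where date >= '20140901'

-- A fast forward-only cursor over the small staged set
declare cur cursor fast_forward for
    select card_no, amount from #work

declare @card_no char(16), @amount money
open cur
fetch next from cur into @card_no, @amount
while @@fetch_status = 0
begin
    -- row-by-row processing goes here
    fetch next from cur into @card_no, @amount
end
close cur
deallocate cur
drop table #work
```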
SQL Server has several tools that let you detect, tune, and optimize SQL Server performance. In this article, I explain how to use these tools to optimize the use of database indexes. The article also covers some general knowledge about indexes.
Common knowledge about Indexes
Indexes are the biggest single factor affecting database performance. Because of the complexity of the topic, I can only cover it briefly here, though several good books are available for reference. I discuss only two kinds of SQL Server indexes: the clustered index and the nonclustered index. When considering which type of index to build, you should consider the data type and the column that stores the data, as well as the kinds of queries the database is likely to run and which of them are used most frequently.
Index type
If a column stores highly correlated data and is frequently accessed sequentially, a clustered index is best: with a clustered index, SQL Server physically sorts the data column in ascending (the default) or descending order, so the queried data can be found quickly. Likewise, when a search is restricted to a range, a clustered index on those columns is best. Because of this physical rearrangement of the data, there can be only one clustered index per table.
In contrast, if a column's data shows poor correlation, a nonclustered index can be used. Up to 249 nonclustered indexes can be created on one table, although it is hard to imagine so many being useful in practice.
When a table has a primary key, SQL Server by default automatically creates a unique clustered index on the column(s) that make up the key. Obviously, the unique index on these column(s) enforces the uniqueness of the primary key. When establishing a foreign-key relationship, creating a nonclustered index on the foreign-key column is a good approach if you plan to use it frequently. If a table has a clustered index, it uses a linked list to maintain the relationship between data pages; conversely, if the table has no clustered index, SQL Server stores the data pages in a heap.
Data Page
When an index is created, SQL Server creates data pages, which hold pointers that accelerate searches. When the index is created, its fill factor is also set; the fill factor indicates what percentage of each index data page will be filled. Over time, database updates consume the existing free space, which causes pages to split. Page splitting degrades index performance, and queries that use the index then read fragmented data. The fill factor is set only when the index is created, so it cannot be maintained dynamically.
To restore the fill factor on the data pages, we can drop the old index and re-create it with a fresh fill factor (note: this affects current database operation; use it with caution on important systems). DBCC INDEXDEFRAG and DBCC DBREINDEX are two commands for removing clustered and nonclustered index fragmentation. INDEXDEFRAG is an online operation (that is, it does not block other table activity such as queries), while DBREINDEX physically rebuilds the index. In most cases, rebuilding the index eliminates fragmentation better, but that advantage comes at the cost of blocking other activity on the table. For a large, heavily fragmented index, INDEXDEFRAG can take a long time because it runs in small transactional blocks.
Fill Factor
When you take any of the above measures, the database engine can return indexed data more efficiently. A full discussion of fill factor is beyond the scope of this article, but be mindful of which tables you intend to create indexes on with an explicit fill factor.
When executing a query, SQL Server dynamically chooses which index to use, based on statistics about the distribution of key values in each index. After everyday database activity (such as inserts, deletes, and updates to tables), these statistics may become outdated and need updating. You can run DBCC SHOWCONTIG to view the state of the statistics. When you believe the statistics are out of date, run UPDATE STATISTICS on the table so that SQL Server refreshes its information about the index.
Create a database maintenance plan
SQL Server provides a tool to simplify and automate database maintenance. The Database Maintenance Plan Wizard (DMPW) also covers index optimization. If you run this wizard, you will see which index statistics exist in the database; these statistics act as logs and are updated regularly, which reduces the workload of rebuilding indexes by hand. If you do not want to refresh index statistics automatically on a schedule, you can instead choose to reorganize data and data pages in DMPW, which drops the old indexes and re-creates them with a specified fill factor.
Speed is affected by many factors, and the larger the data size, the more obvious they become.
1. Storage
2. tempdb
3. Log Files
4. Partitioned views
5. Clustered indexes
Your table should have a clustered index. Range queries through a clustered index are the fastest: if you use between, the qualifying rows are physically contiguous. Try to reduce updates on the clustered-index columns, because updates can make the rows physically discontiguous.
6. Non-clustered indexes
Non-clustered indexes are unrelated to the physical row order. When designing a non-clustered index, make it highly selective, which improves query speed; however, these indexes slow down table updates and occupy a large amount of space. If you are willing to trade space and modification time, consider them for query speed.
7. Indexed views
If an index is created on a view, the view's result set is stored, which can markedly improve query performance; however, it also seriously degrades the performance of update statements, so it is generally used in data warehouses where the data is relatively stable.
8. Maintain Indexes
After you have created indexes, regular maintenance is very important. Use DBCC SHOWCONTIG to observe page density and scan density, use DBCC INDEXDEFRAG to defragment table or view indexes in a timely fashion, and, when necessary, use DBCC DBREINDEX to rebuild the index, which can be very effective.
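The maintenance commands named above fit together roughly like this (a sketch; the table and index names are illustrative, DBREINDEX locks the table while it runs, and the fill factor of 80 is just an example):

```sql
-- Check fragmentation: look at scan density (ideally near 100%)
-- and average page density in the output
dbcc showcontig ('record')

-- Online defragmentation: compacts leaf pages without long-held locks
dbcc indexdefrag (0, 'record', 'idx_record_date')

-- Full rebuild of all indexes on the table with a fresh fill factor
-- of 80% (offline; blocks other activity on the table)
dbcc dbreindex ('record', '', 80)

-- Refresh the distribution statistics the optimizer relies on
update statistics record
```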