[Reprint] How to make SQL run faster

Source: Internet
Author: User
I. Unreasonable index design
---- Example: the table record has 620000 rows. Consider the following SQL statements under different index designs:
---- 1. A non-clustered index on date
select count(*) from record
where date > '20140901' and date < '20140914' and amount > 2000
(25 seconds)
select date, sum(amount) from record group by date
(55 seconds)
select count(*) from record
where date > '20140901' and place in ('bj', 'sh')
(27 seconds)
---- Analysis:
---- date has a large number of duplicate values. Under a non-clustered index, the data is physically stored on the data pages at random, so a range search must perform a table scan to find every row in the range.
---- 2. A clustered index on date
select count(*) from record
where date > '20140901' and date < '20140914' and amount > 2000
(14 seconds)
select date, sum(amount) from record group by date
(28 seconds)
select count(*) from record
where date > '20140901' and place in ('bj', 'sh')
(14 seconds)
---- Analysis:
---- Under the clustered index, data is physically stored on the data pages in order, and duplicate values are grouped together. A range search can first locate the start and end points of the range and scan only the data pages within it, avoiding a large-scale scan and improving query speed.
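A minimal sketch of how the index designs in these experiments would be created (Sybase/SQL Server-style syntax; the index names are hypothetical):
create index idx_record_date on record(date)            -- experiment 1: non-clustered
drop index record.idx_record_date
create clustered index idx_record_date on record(date)  -- experiment 2: clustered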
---- 3. A composite index on place, date, and amount
select count(*) from record
where date > '20140901' and date < '20140914' and amount > 2000
(26 seconds)
select date, sum(amount) from record group by date
(27 seconds)
select count(*) from record
where date > '20140901' and place in ('bj', 'sh')
(< 1 second)
---- Analysis:
---- This is an unreasonable composite index: its leading column is place, and the first and second SQL statements do not reference place, so they do not use the index. The third SQL statement does use place, and all of its referenced columns are contained in the composite index, forming index covering, so it is very fast.
---- 4. A composite index on date, place, and amount
select count(*) from record
where date > '20140901' and date < '20140914' and amount > 2000
(< 1 second)
select date, sum(amount) from record group by date
(11 seconds)
select count(*) from record
where date > '20140901' and place in ('bj', 'sh')
(< 1 second)
---- Analysis:
---- This is a reasonable composite index. With date as the leading column, every SQL statement can use the index, and the first and third statements are also covered by the index, so the performance is optimal.
---- 5. Summary:
---- The index created by default is a non-clustered index, but it is not always the best choice. Reasonable index design must be based on analysis and prediction of the actual queries. In general:
---- ① For columns with many duplicate values that are frequently range-searched (between, >, <, >=, <=) or used in order by / group by, consider a clustered index;
---- ② When multiple columns are frequently accessed together and each column contains duplicate values, consider a composite index;
---- ③ A composite index should try to cover the key queries, and its leading column must be the most frequently used column (see the sketch below).
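A minimal sketch of the two composite indexes compared in experiments 3 and 4 (index names are hypothetical):
create index idx_record_pda on record(place, date, amount)  -- place leading: unreasonable here
create index idx_record_dpa on record(date, place, amount)  -- date leading: reasonable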
II. Incomplete join conditions
---- Example: the table card has 7896 rows with a non-clustered index on card_no, and the table account has 191122 rows with a non-clustered index on account_no. Consider how two SQL statements execute under different join conditions:
select sum(a.amount) from account a, card b
where a.card_no = b.card_no
(20 seconds)
---- Change the SQL to use the full join condition:
select sum(a.amount) from account a, card b
where a.card_no = b.card_no and a.account_no = b.account_no
(< 1 second)
---- Analysis:
---- Under the first join condition, the best plan is to use account as the outer table and card as the inner table, using the index on card. The number of I/Os can be estimated by the following formula:
---- 22541 pages of the outer table account + (191122 rows of the outer table account × 3 pages to be searched on the inner table card for each row of the outer table) = 595907 I/Os
---- Under the second join condition, the best plan is to use card as the outer table and account as the inner table, using the index on account. The number of I/Os can be estimated by the following formula:
---- 1944 pages of the outer table card + (7896 rows of the outer table card × 4 pages to be searched on the inner table account for each row of the outer table) = 33528 I/Os
---- Clearly, the best plan is chosen and executed only when the join condition is complete.
---- Conclusion:
---- 1. Before a multi-table operation is executed, the query optimizer lists several possible join plans based on the join conditions and finds the one with the lowest system cost. The join conditions must take full account of the tables that have indexes and the tables with many rows; the choice of inner and outer table can be judged by the formula: number of matching rows in the outer table × number of search passes over the inner table per outer row. The plan with the smallest product is the best one.
---- 2. To see how a statement executes, use "set showplan on" to turn on the showplan option; you will then see the join order and the indexes used. For even more detail, execute dbcc(3604, 310, 302) with the sa role.
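A minimal sketch of checking the plan of the join above (old Sybase/SQL Server-style syntax, as the text assumes; the trace-flag values are those given above):
set showplan on
select sum(a.amount) from account a, card b
where a.card_no = b.card_no and a.account_no = b.account_no
set showplan off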
III. Where clauses that cannot be optimized
---- 1. Example: the columns in the following WHERE clauses all have suitable indexes, but execution is still very slow:
select * from record
where substring(card_no, 1, 4) = '5378'
(13 seconds)
select * from record
where amount / 30 < 1000
(11 seconds)
select * from record
where convert(char(10), date, 112) = '20140101'
(10 seconds)
---- Analysis:
---- Any operation on a column in the WHERE clause is evaluated column by column at execution time, so it forces a table search and cannot use the index on that column. If the result could be obtained at query compile time, the SQL optimizer could optimize the expression and use the index, avoiding the table search. So rewrite the SQL statements as follows:
select * from record where card_no like '5378%'
(< 1 second)
select * from record where amount < 1000 * 30
(< 1 second)
select * from record where date = '2014/01/01'
(< 1 second)
---- The SQL statements are obviously much faster!
---- 2. Example: the table stuff has 200000 rows, with a non-clustered index on id_no. Consider the following SQL statement:
select count(*) from stuff where id_no in ('0', '1')
(23 seconds)
---- Analysis:
---- 'in' in the WHERE clause is logically equivalent to 'or', so the parser converts in ('0', '1') into id_no = '0' or id_no = '1'. We would expect it to search each or clause separately and add up the results, so that the index on id_no could be used; but in fact (as showplan shows) it adopts the "OR strategy": it first extracts the rows satisfying each or clause into a worktable in tempdb, then builds a unique index on it to remove duplicate rows, and finally computes the result from this temporary table. So the index on id_no is not actually used, and the elapsed time is also affected by the performance of tempdb.
---- Practice shows that the more rows the table has, the worse the worktable performs. When stuff has 620000 rows, execution takes as long as 220 seconds! It is better to split the or clauses:
select count(*) from stuff where id_no = '0'
select count(*) from stuff where id_no = '1'
---- Get the two counts and add them; the extra addition is well worth it. Because each statement uses the index, execution takes only 3 seconds, and only 4 seconds even at 620000 rows. Or, better still, write a simple stored procedure:
create proc count_stuff as
declare @a int
declare @b int
declare @c int
declare @d char(10)
begin
select @a = count(*) from stuff where id_no = '0'
select @b = count(*) from stuff where id_no = '1'
end
select @c = @a + @b
select @d = convert(char(10), @c)
print @d
---- The result is computed directly, and execution is just as fast as above!
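A trivial usage sketch:
exec count_stuff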
---- Conclusion:
---- It can be seen that a WHERE clause "cannot be optimized" when its form defeats the index, causing a table scan or extra overhead.
---- 1. Any operation on a column causes a table scan; this includes database functions and calculation expressions. Move the operation to the right-hand side of the comparison whenever possible (a sketch follows this list).
---- 2. in and or clauses often use worktables, which defeats the index. If they do not produce a large number of duplicate values, consider splitting the clauses; each split clause should be able to use an index.
---- 3. Make good use of stored procedures; they make SQL more flexible and efficient.
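A minimal sketch of rule 1, reusing the record table from the examples above (the dates are illustrative):
-- operation on the column: the index on date cannot be used
select count(*) from record where convert(char(8), date, 112) = '20140901'
-- the column stands alone on the left: the index on date can be used
select count(*) from record where date >= '20140901' and date < '20140902'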
---- From the above examples we can see that the essence of SQL optimization is to write statements the optimizer can recognize, to use indexes to the full, to reduce the number of I/Os for table scans, and to avoid table searches as much as possible. In fact, SQL performance optimization is a complex process; the above is only one aspect of it at the application level. It can also involve resource configuration at the database layer, traffic control at the network layer, and the overall design of the operating-system layer.
1. Use indexes reasonably
An index is an important data structure in a database; its fundamental purpose is to improve query efficiency. Most database products today use the ISAM index structure first proposed by IBM. Indexes should be used properly, following these principles:
● Create indexes on columns that are frequently joined but are not specified as foreign keys; fields that are seldom joined can be left to the optimizer.
● Index columns that are frequently sorted or grouped (that is, used in group by or order by operations).
● Create indexes on columns with many distinct values that are frequently used in conditional expressions; do not create indexes on columns with few distinct values. For example, the "gender" column of an employee table has only two values, "male" and "female", so there is no need to index it; an index there would not improve query efficiency but would greatly reduce update speed.
● If there are multiple columns to sort by, a compound index can be created on those columns (see the sketch after this list).
● Use system tools. For example, the Informix database provides a tbcheck tool that can check suspicious indexes. On some database servers, an index can become invalid or its read efficiency can degrade because of frequent operation; if a query that uses an index slows down inexplicably, use the tbcheck tool to check the index's integrity and repair it if necessary. In addition, after a table has undergone a large volume of updates, dropping and rebuilding its indexes can increase query speed.
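A minimal sketch of these principles, reusing tables that appear later in this article (the index names are hypothetical):
create index idx_rcvbles_custid on rcvbles(customer_id)             -- frequently joined column
create index idx_cust_name on cust(name)                            -- frequently sorted column
create index idx_orders_cust_ord on orders(customer_num, order_num) -- compound index over columns used together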
2. Avoid or simplify sorting
Repeated sorting of large tables should be simplified or avoided. When an index can deliver the output in the appropriate order, the optimizer skips the sorting step. Sorting cannot be avoided in the following cases:
● The index does not include one or more of the columns to be sorted;
● The order of the columns in the group by or order by clause differs from that of the index;
● The sort columns come from different tables.
To avoid unnecessary sorting, add indexes correctly and merge database tables where reasonable (although this may sometimes affect table normalization, it is worth it for the efficiency gained). If sorting is unavoidable, try to simplify it, for example by narrowing the range of columns being sorted.
3. Eliminate sequential access to data in large table rows
In nested queries, sequential access to a table can be fatal to query efficiency. For example, with a sequential access strategy, a query nested three levels deep that scans 1000 rows at each level scans one billion rows in total. The primary way to avoid this is to index the joined columns. For example, given a student table (student ID, name, age, ...) and a course-selection table (student ID, course number, grade), joining the two tables calls for an index on the join field "student ID".
Union can also be used to avoid sequential access. Although indexes exist on all the columns being checked, some forms of WHERE clause still force the optimizer into sequential access. The following query forces a sequential scan of the orders table:
select * from orders where (customer_num = 104 and order_num > 1001) or order_num = 1008
Although indexes exist on customer_num and order_num, the optimizer still scans the whole table via a sequential access path, because the statement retrieves a union of disjoint sets of rows. It should be changed to:
select * from orders where customer_num = 104 and order_num > 1001
union
select * from orders where order_num = 1008
This way the query is processed using the index path.
4. Avoid correlated subqueries
If a column's label appears in both the main query and the WHERE clause of a subquery, then every time the column's value changes in the main query, the subquery is very likely to be executed again. The more levels of nesting, the lower the efficiency, so subqueries should be avoided where possible. If a subquery is unavoidable, filter out as many rows as possible inside it.
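A minimal sketch of a correlated subquery and a join that avoids it, reusing the cust and rcvbles tables from this article (the "has an unpaid balance" criterion is illustrative):
-- correlated: the subquery re-runs for each candidate cust row
select name from cust c
where 0 < (select count(*) from rcvbles r
where r.customer_id = c.customer_id and r.balance > 0)
-- join form: evaluated as one set operation
select distinct c.name from cust c, rcvbles r
where c.customer_id = r.customer_id and r.balance > 0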
5. Avoid difficult regular expressions
The matches and like keywords support wildcard matching, technically called regular expressions. This kind of matching is especially time-consuming. Example:
select * from customer where zipcode like "98___"
Even if there is an index on the zipcode field, sequential scanning is used in this case. If the statement is changed to select * from customer where zipcode > "98000", the query uses the index, which obviously speeds it up.
In addition, avoid substrings that do not start at the beginning of a column. For example, the clause where zipcode[2,3] > "80" uses a non-initial substring, so the statement does not use the index.
6. Use temporary tables to accelerate queries
Sorting a subset of a table into a temporary table sometimes accelerates queries: it avoids repeated sorting operations and can also simplify the optimizer's work in other ways. For example:
select cust.name, rcvbles.balance, ... other columns
from cust, rcvbles
where cust.customer_id = rcvbles.customer_id
and rcvbles.balance > 0
and cust.postcode > "98000"
order by cust.name
If this query is to be executed more than once, all the customers with unpaid balances can be found in a temporary file, sorted by customer name:
select cust.name, rcvbles.balance, ... other columns
from cust, rcvbles
where cust.customer_id = rcvbles.customer_id
and rcvbles.balance > 0
order by cust.name
into temp cust_with_balance
Then, query the temporary table in the following way:
select * from cust_with_balance
where postcode > "98000"
The temporary table has fewer rows than the main table, and its physical order is the required order, so disk I/O is reduced and the query workload drops sharply.
Note: once the temporary table is created, modifications to the main table are no longer reflected in it. Take care not to lose data when the main table's data is modified frequently.
7. Use sorting to replace non-sequential access
Non-sequential disk access is the slowest operation, showing up as back-and-forth movement of the disk arm. SQL statements hide this situation, so when writing applications it is easy to produce a query that accesses a large number of non-sequential pages.
In some cases, the database's sorting capability can be used instead of non-sequential access to improve the query.
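A minimal sketch in the Informix style of the examples above (the columns are illustrative): instead of fetching rows one at a time in index order, pull the subset into a temp table sorted on the access key, then read it sequentially:
select customer_num, order_num from orders
where order_num > 1001
order by customer_num
into temp t_orders_sorted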
3. Optimize tempdb Performance
General suggestions for the physical placement and database options of the tempdb database:
● Allow the tempdb database to expand automatically as needed. This ensures that queries whose intermediate result sets in tempdb turn out much larger than expected are not terminated before they complete.
● Set the initial size of the tempdb files to a reasonable value, to avoid the files expanding automatically whenever more space is needed; if tempdb expands too frequently, performance suffers.
● Set the file-growth increment to a reasonable percentage, to avoid the tempdb files growing by amounts that are too small. If the growth increment is tiny compared with the amount of data being written to tempdb, tempdb may need to expand constantly, which hurts performance.
● Place tempdb on a fast I/O subsystem to ensure good performance. Stripe tempdb across multiple disks for better performance, and place it on disks other than those used by user databases. For more information, see "Expanding a Database".
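A minimal sketch of presizing tempdb per the advice above (SQL Server 2000 T-SQL; tempdev is the default logical file name, and the size and growth values are illustrative, not recommendations):
alter database tempdb
modify file (name = tempdev, size = 200MB, filegrowth = 10%)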
4. Optimize the server
Use the memory configuration options to optimize server performance
The Microsoft® SQL Server™ 2000 memory management component eliminates the need to manage SQL Server's memory by hand. At startup, SQL Server dynamically determines how much memory to allocate based on how much memory the operating system and other applications are currently using; as the load on the computer and on SQL Server changes, so does the allocated memory. For more information, see "Memory Architecture". The following server configuration options can be used to configure memory usage and affect server performance:
min server memory
max server memory
max worker threads
index create memory
min memory per query
The min server memory option ensures that SQL Server does not release memory once it has reached that value. You can set it to a specific value based on the size and activity of the SQL Server instance. If you do set it, be sure to leave enough memory for the operating system and other programs; if the operating system runs short, it will reclaim memory from SQL Server, hurting SQL Server performance. The max server memory option specifies the maximum amount of memory SQL Server can allocate while it runs. Set it to a specific value if multiple applications run alongside SQL Server and you want to guarantee them enough memory. If those other applications (such as web servers or e-mail servers) request memory only as needed, do not set max server memory: SQL Server will release memory to them as required. However, applications often grab whatever memory is available at startup and never ask for more; if an application that behaves this way runs on the same computer as SQL Server, set max server memory to a specific value so that the memory the application needs is not claimed by SQL Server.
Do not set min server memory and max server memory to the same value, which would fix the amount of memory allocated to SQL Server; dynamic memory allocation gives the best overall performance over time. For more information, see "Server Memory Options". The max worker threads option specifies the number of threads available to serve user connections. The default is 255, which may be slightly high for some configurations, depending on the number of concurrent users. Because each worker thread is allocated even when it is not in use (when there are fewer concurrent connections than allocated worker threads), memory resources that could be better used by other operations, such as the buffer cache, may go unused. Generally, this value should be set to the number of concurrent connections, but it cannot exceed 32767. Concurrent connections are not the same as user login connections: the worker thread pool of a SQL Server instance only needs to be large enough to serve the user connections that are executing batches in the instance at the same time. Adding more worker threads than necessary reduces server performance. For more information, see the "max worker threads" option.
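A minimal sketch of inspecting and setting this option with sp_configure ('max worker threads' is an advanced option in SQL Server 2000; the value shown is simply the default mentioned above):
exec sp_configure 'show advanced options', 1
reconfigure
exec sp_configure 'max worker threads', 255
reconfigure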
Note: max worker threads has no effect when SQL Server runs on Microsoft Windows® 98. The index create memory option controls how much memory the sort operations used during index creation may consume. Creating indexes on a production system is usually an infrequent task, often scheduled as a job that runs at off-peak hours, so increasing this value when indexes are created infrequently and at off-peak times can improve index creation performance. Keep min memory per query at a lower value, however, so the index creation job can still start even if not all of the requested memory is available. For more information, see the "index create memory" option.
The min memory per query option specifies the minimum amount of memory allocated for the execution of a query. When many queries run concurrently, raising min memory per query helps the performance of queries that consume a large amount of memory (such as large sorts and hash operations). Do not set it too high, though, especially on very busy systems, because a query would then have to wait until it can be guaranteed the minimum memory it requests, or until the time specified in the query wait option is exceeded. If more memory is available than the specified minimum needed to execute the query, the query may use the extra memory as long as it can use it effectively. For more information, see "min memory per query" and "query wait".
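A minimal sketch of setting the memory options discussed above (T-SQL; the values are illustrative, not recommendations, and 'show advanced options' is assumed to be on, as in the earlier sketch):
exec sp_configure 'min server memory (MB)', 512
exec sp_configure 'max server memory (MB)', 2048
exec sp_configure 'min memory per query (KB)', 1024
reconfigure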
Use the I/O configuration options to optimize server performance
The following server configuration option can be used to configure I/O usage and affect server performance:
recovery interval
The recovery interval option controls when Microsoft® SQL Server™ 2000 issues checkpoints in each database. By default, SQL Server determines the best time to perform checkpoint operations on its own. To verify that this is the appropriate setting, use the Windows NT Performance Monitor to watch disk-write activity on the database files: spikes that drive a disk to 100% utilization can hurt performance, and changing this parameter so that the checkpoint process runs less often can generally improve overall performance in that case. Keep monitoring afterwards, however, to confirm that the new value has a positive effect on performance. For more information, see the "recovery interval" option.
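A minimal sketch of raising the checkpoint interval (T-SQL; 'recovery interval (min)' is expressed in minutes, the value 5 is illustrative, and 0 returns the option to self-configuration; assumes advanced options are on, as above):
exec sp_configure 'recovery interval (min)', 5
reconfigure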
Source: http://onewww.net/blog/default.asp?id=80
