To improve the performance of a database application, the most effective place to focus is the application itself: optimizing query design and indexing strategy can improve the efficiency of database queries by a factor of hundreds, while optimization efforts in other areas rarely produce such dramatic results. This article focuses on the design and use of indexes and on writing query statements, along with other considerations in database applications, with the goal of achieving good database query performance.
I. Index design and usage strategies
Using indexes is the most effective way for a database to reduce disk I/O. Unless a table holds very little data or the query must return most of its rows, a query that uses a well-chosen index performs far better than a full table scan; sometimes simply creating a suitable index improves query performance dozens of times. Correctly designing and using indexes is therefore usually the first step in optimizing a query statement.
However, indexes have two sides. Building and maintaining them consumes extra CPU time, memory, and storage space, and every INSERT, UPDATE, and DELETE on a table must also maintain all of that table's indexes. A poorly designed index not only fails to improve performance but actively degrades it.
By following the guidelines below, we can make the most of the performance benefits that indexes bring.
| No. | Index design and usage principle | Explanatory notes |
| --- | --- | --- |
| 1 | Improve the selectivity of indexes: build them on high-cardinality columns with few duplicate values, and wherever possible enforce primary and unique keys with CREATE UNIQUE INDEX | The higher the selectivity, the more precise the lookup and the higher the efficiency; avoid indexing columns with many duplicate values (including NULLs) |
| 2 | Choose a unique, narrow, non-nullable, easily accessed, rarely changed column as the primary key | |
| 3 | Index the columns used most frequently in WHERE clauses | |
| 4 | Index the columns involved in DISTINCT, MIN, MAX, ORDER BY, and GROUP BY operations and in join conditions | Reduces lookup and sorting costs |
| 5 | Avoid creating redundant indexes; for example, if an index exists on columns (a, b, c), an index on (a, b) is redundant | Redundant indexes contribute nothing and only add overhead |
| 6 | Order the columns of a composite index by uniqueness, width, data type, and query frequency: put unique, narrow, integer, most-frequently queried columns first | Uniqueness improves selectivity, narrow columns reduce space consumption, and column data types affect sorting efficiency |
| 7 | Avoid using more than 5 columns in one index | |
| 8 | Do not create too many indexes on a table: typically no more than 3 in an online transaction processing (OLTP) environment, possibly more than 5 in a read-only online analytical processing (OLAP) query environment, and 2 to 5 in a mixed query and OLTP environment | |
| 9 | Use index-only (covering) access whenever possible, i.e. keep all the columns a query needs in the index; if a column is frequently queried but plays no part in filtering or sorting, the INCLUDE clause of CREATE INDEX can place it on the index pages without making it a key column (see the sketch after this table) | An index-only operation accesses only the index and never the data table, which can significantly reduce query time |
| 10 | Index foreign keys to improve the efficiency of UPDATE and DELETE operations on the parent table | |
| 11 | Use a clustered index when you need to batch-process a range of data or read pre-sorted data | Particularly suitable for batch operations |
| 12 | Do not create a clustered index on a frequently updated or wide column | Non-clustered indexes use the clustered key as their row locator, so changes to the clustered key degrade the update efficiency of all non-clustered indexes |
| 13 | Do not cluster on a key that receives a large volume of parallel sequential inserts; instead, cluster on another column to break up the insertion order and spread the inserts across the whole table | Bulk inserts in clustered-key order all land on the last page of the table, forming a "hot spot" with very low I/O efficiency |
| 14 | Create the clustered index first, then the non-clustered indexes, so that building the clustered index does not force a rebuild of every non-clustered index | |
| 15 | Use the DROP_EXISTING option when rebuilding a clustered index, to avoid rebuilding all the non-clustered indexes | SQL Server-specific |
| 16 | Use a full-text index when searching very large text fields, or for fuzzy LIKE '%...%' queries over large amounts of data | |
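As an illustration of principles 1 and 9, the sketch below uses SQL Server syntax; the CustInfo and orders tables and their columns are hypothetical:

```sql
-- Principle 1: enforce uniqueness with a unique index
CREATE UNIQUE INDEX ux_custinfo_custno ON CustInfo (cust_no);

-- Principle 9: a covering index for the query below; amount is carried on
-- the index pages via INCLUDE without being a key column, so the query is
-- answered from the index alone, never touching the data table
CREATE INDEX ix_orders_custno ON orders (cust_no) INCLUDE (amount);

SELECT amount FROM orders WHERE cust_no = '10000050';
```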
II. Considerations for writing SQL statements
In a database system, many factors, such as indexes, fragmentation, statistics, and locks, affect performance, and tuning them may improve the execution efficiency of query statements. A poorly written SQL statement, however, can significantly increase overhead that could otherwise be avoided.
Here are some things to keep in mind when writing SQL statements:
- Establish and follow a SQL coding standard; SQL statements should be concise, readable, and easy to maintain
- To improve query performance, limit the amount of data operated on: work on small result sets and reduce the number of rows and columns processed
  - Do not use "SELECT * FROM ..." to return all columns; retrieve only the columns actually needed, which avoids unnecessary program changes when the table structure changes and reduces the extra resources consumed (see the example after this list)
  - Do not retrieve columns whose values are already known; for example, in select Cust_no, cust_name from CustInfo where cust_no = '10000050' there is no need to select Cust_no
  - Use a highly selective WHERE clause so that the database returns only the required set of rows, rather than returning a large amount of data for the application to filter; returning a large data set is expensive
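A minimal illustration of these points, reusing the CustInfo example above:

```sql
-- Wasteful: returns every column, including the already-known key value,
-- and quietly changes behavior when the table structure changes
SELECT * FROM CustInfo WHERE cust_no = '10000050';

-- Better: only the column actually needed is retrieved
SELECT cust_name FROM CustInfo WHERE cust_no = '10000050';
```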
- Use indexes efficiently: the columns referenced in the WHERE clause determine which indexes can be used, so they deserve particular attention
  - Keep the order of the columns in the WHERE clause consistent with the order required by the index; not every database optimizer can compensate for a mismatched order, so maintain good coding habits
  - Use parameterizable (sargable) search conditions such as =, >=, <, <=, BETWEEN, IN, IS NULL, and LIKE '<literal>%'
  - Avoid non-parameterizable negative conditions, which prevent index use, such as <>, !=, !>, !<, and LIKE '%<literal>'
  - Do not perform arithmetic on a column in the WHERE clause: where AMOUNT/2 > 100 cannot use an index on AMOUNT even if one exists; rewrite it as where AMOUNT > 100 * 2 so that the index on the AMOUNT column can be used
  - Do not apply functions to columns in the WHERE clause: where substring(Lastname, 1, 1) = 'F' cannot use the index on the Lastname column, whereas where Lastname like 'F%' or where Lastname >= 'F' and Lastname < 'G' can
  - Use OR with care: in an AND condition, any single clause that can use an index improves query performance, but in an OR condition, any single clause that cannot use an index degrades it; for example, with where member_no = 1 or provider_no = 1, the lack of an index on either member_no or provider_no forces a table scan or clustered index scan
  - BETWEEN is generally much more efficient than IN/OR, so when both forms are possible, prefer BETWEEN; also consider substituting >= and <= conditions for BETWEEN, because not every database optimizer rewrites a BETWEEN clause into the >= and <= combination, and without that rewrite the index cannot be used (see the examples below)
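The sargability rules above in miniature; the orders and employees tables here are hypothetical:

```sql
-- Arithmetic on the column hides it from the index on amount
SELECT order_id FROM orders WHERE amount / 2 > 100;
-- Equivalent sargable form: the index on amount can now be used
SELECT order_id FROM orders WHERE amount > 100 * 2;

-- A function applied to the column defeats the index on lastname
SELECT emp_no FROM employees WHERE SUBSTRING(lastname, 1, 1) = 'F';
-- Equivalent range forms that can use the index
SELECT emp_no FROM employees WHERE lastname LIKE 'F%';
SELECT emp_no FROM employees WHERE lastname >= 'F' AND lastname < 'G';
```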
- Paginated display: when a large result set satisfies the condition but the interface presents it page by page, the best approach is to limit the number of records the database returns at a time, e.g. 50 records per page, using SELECT ... FETCH FIRST n ROWS ONLY (sketched below)
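A sketch of the first page in the FETCH FIRST style named above (DB2 and standard SQL; other systems use LIMIT or TOP instead):

```sql
-- Return only the first 50 qualifying rows, in a stable order
SELECT cust_no, cust_name
FROM CustInfo
ORDER BY cust_no
FETCH FIRST 50 ROWS ONLY;
```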
- Avoid unnecessary resource overhead:
  - Compare values of the same data type wherever possible, to avoid implicit data type conversions such as select '5678' + 3
  - Index the columns used in MIN, MAX, DISTINCT, ORDER BY, and GROUP BY operations to avoid extra sorting overhead
  - To check whether a qualifying record exists, use EXISTS rather than COUNT(*): EXISTS returns as soon as the first matching record is found, while COUNT(*) must traverse all matching records (see the examples after this list)
  - If the SELECT result sets participating in a UNION are mutually exclusive, or duplicate records are acceptable in the result, use UNION ALL instead of UNION to avoid the overhead of detecting and removing duplicates
  - Complete an operation in a single statement wherever possible, e.g. combine several INSERT statements into one multi-row INSERT, or use CASE expressions
  - When the data will not be updated, use SELECT ... FOR READ ONLY or FETCH ... READ ONLY to improve query performance
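A few of these points in miniature (the table names are hypothetical, and multi-row VALUES and FROM-less SELECT support vary by DBMS):

```sql
-- Existence check: EXISTS can stop at the first matching row, whereas
-- COUNT(*) would have to visit every matching row
SELECT CASE WHEN EXISTS (SELECT 1 FROM orders WHERE cust_no = '10000050')
            THEN 1 ELSE 0 END AS has_orders;

-- Known-disjoint sets: UNION ALL skips UNION's duplicate elimination
SELECT cust_no FROM active_custs
UNION ALL
SELECT cust_no FROM closed_custs;

-- One multi-row INSERT instead of three single-row statements
INSERT INTO order_items (order_id, item_no)
VALUES (1, 'A'), (1, 'B'), (1, 'C');
```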
- Reduce transaction overhead:
  - Use explicit transaction control statements to mark the start and the commit/rollback of a transaction, extending atomicity to the whole transaction; otherwise each statement is its own atomic operation, which carries a large logging overhead (a sketch follows this list)
  - Commit in a timely manner: uncommitted INSERT, UPDATE, and DELETE operations keep their data locks, so committing promptly releases locks sooner and improves concurrency. In bulk operations, committing in batches also simplifies error recovery: processing can resume from the last commit rather than starting over from scratch
  - Keep transactions as short as possible, and do not perform tasks unrelated to the database inside them, such as sending bulk mail
  - Use block operations such as INSERT INTO ... SELECT ... FROM ..., which can greatly reduce logging overhead
  - Reduce locking overhead by using the lowest isolation level that the business logic allows
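A sketch of an explicit, short transaction following these points (generic syntax: the transaction-control keywords vary by DBMS, and the account and transfer_log tables are hypothetical):

```sql
-- The transfer is atomic: all three statements commit or roll back together
BEGIN TRANSACTION;

UPDATE account SET balance = balance - 100 WHERE acct_no = 'A001';
UPDATE account SET balance = balance + 100 WHERE acct_no = 'B002';
INSERT INTO transfer_log (from_acct, to_acct, amount)
VALUES ('A001', 'B002', 100);

-- Commit promptly so that the row locks taken above are released
COMMIT;
```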
- Use SELECT ... FOR UPDATE to protect rows that a subsequent UPDATE statement will modify, which effectively avoids the most common deadlock scenario (sketched below)
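A sketch of that read-then-update pattern (FOR UPDATE syntax and locking behavior vary by DBMS; the account table is the same hypothetical one used above):

```sql
-- Take an update lock while reading, so two sessions cannot both read the
-- row under share locks and then deadlock upgrading to exclusive locks
SELECT balance FROM account WHERE acct_no = 'A001' FOR UPDATE;

UPDATE account SET balance = balance - 100 WHERE acct_no = 'A001';
```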
- Use the database's referential integrity constraints, such as NOT NULL constraints and PRIMARY KEY/FOREIGN KEY constraints; they not only ensure data integrity but also help the optimizer generate more efficient execution plans
- Adjust the order of join operations for optimal performance. Join operations proceed top-down; for example, in
  select A.col1, B.col2, C.col3 from A left join B on A.col4 = B.col5 join C on A.col4 = C.col6
  the sequence is A left-joined with B, and that result then joined with C. If joining A with C produces the smaller result set, reorder the statement so that A is joined with C first and B is left-joined afterwards, which can improve query performance (one possible rewrite is sketched below)
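Under the assumption that A joined with C yields the smaller intermediate result, the reordered statement would read:

```sql
-- Join A with C first (the smaller intermediate result), then left-join B;
-- the join conditions here reference only A.col4, so the results match
SELECT A.col1, B.col2, C.col3
FROM A
JOIN C ON A.col4 = C.col6
LEFT JOIN B ON A.col4 = B.col5;
```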
III. Other matters of note
In addition to using indexes effectively and writing efficient SQL statements, there are a few more points worth noting from the application development perspective when improving query performance:
- SQL statements are set-oriented, processing whole result sets (multiple rows and columns) at once, while a cursor processes one row at a time and is far less efficient; do not use cursors unless necessary (a set-based alternative is sketched below)
- If a cursor must be used, choose the lowest-cost cursor type and release it as soon as possible
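For instance, a row-at-a-time cursor loop that expires old orders can usually be replaced by a single set-based statement (hypothetical orders table):

```sql
-- One set-oriented statement does the work of an entire cursor loop
UPDATE orders
SET status = 'EXPIRED'
WHERE order_date < '2014-01-01' AND status = 'OPEN';
```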
- Prefer dynamic SQL: the execution plan of a static SQL statement is generated when the application is compiled, so if the underlying data later changes significantly, the original plan can become very inefficient, and the database will not adjust or re-optimize it until the application is recompiled. A dynamic SQL statement must compile an execution plan on every execution, but that plan is the optimal one based on the latest statistics, which can yield better query performance
- Use stored procedures: when work does not need to appear on the user interface but requires many interactions and large volumes of data exchange with the database, the best choice is a stored procedure, which reduces network traffic; stored procedures are also precompiled and execute quickly (a minimal sketch follows)
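A minimal sketch in SQL Server syntax (the procedure and table names are hypothetical); both statements run inside the database engine in a single call instead of two network round trips:

```sql
CREATE PROCEDURE usp_close_customer
    @cust_no CHAR(10)
AS
BEGIN
    -- Only the call and its result cross the network
    UPDATE CustInfo SET status = 'CLOSED' WHERE cust_no = @cust_no;
    DELETE FROM pending_orders WHERE cust_no = @cust_no;
END;
```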
- In a high-concurrency environment, using a database connection pool can significantly reduce the overhead of establishing and closing connections
In fact, a high-performance database application is not built overnight, nor is it a once-and-for-all achievement. Beyond a database design that fully accounts for business and data characteristics, sound physical design (hardware selection, especially the choice and design of the storage system) and logical design, and well-written applications, it also takes close cooperation with the database administrator: tracking the database as it runs, defragmenting indexes and data from time to time (REORG), updating statistics, and continuously optimizing the application are all needed to keep the database application system performing well.