Query performance tuning is a broad topic that touches many techniques, but it can generally be divided into the following levels:
1. Reduce data access. The relevant technique is creating appropriate indexes, which turn expensive operations such as full table scans and index scans into index lookups (seeks). A correct index can improve query performance by 100 to 1000 times or even more. Think of a very thick dictionary: without an index, looking something up means reading through the whole book, which is very laborious; with an index, you can go straight to the right place. This is the most important way to improve performance.
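Here is a minimal sketch of the scan-versus-seek difference. It uses Python's built-in sqlite3 module as a stand-in for a server database (an assumption: SQL Server names its plan operators differently), where EXPLAIN QUERY PLAN reports SCAN for a full table scan and SEARCH for an index lookup. The table, column, and index names are hypothetical, and the exact plan wording varies between SQLite versions.

```python
import sqlite3


def plan(conn, sql, params=()):
    """Return the 'detail' column of SQLite's EXPLAIN QUERY PLAN output."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[3] for row in rows]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE words (id INTEGER PRIMARY KEY, word TEXT, pinyin TEXT)")
conn.executemany(
    "INSERT INTO words (word, pinyin) VALUES (?, ?)",
    [(f"w{i}", f"p{i % 100}") for i in range(1000)],
)

query = "SELECT word FROM words WHERE pinyin = ?"

# No index on pinyin: SQLite must read every row (detail starts with "SCAN").
before = plan(conn, query, ("p42",))

conn.execute("CREATE INDEX idx_words_pinyin ON words (pinyin)")

# With the index: SQLite jumps straight to the matching entries (detail starts with "SEARCH").
after = plan(conn, query, ("p42",))

print("before:", before)
print("after: ", after)
```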
2. Reduce the data returned. Network bandwidth is limited, so fetching only the minimum amount of data needed helps considerably. In particular, avoid select * in SQL; select only the columns you actually need.
3. Reduce the number of round trips to the database. Network resources are limited, so frequent interaction with the database is itself a factor that constrains performance. A good approach is to use stored procedures or batched statements to cut down on round trips.
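A minimal sketch of batching, again using Python's sqlite3 module as a stand-in. SQLite has no stored procedures and runs in-process, so there is no real network round trip here; as an assumption for illustration, the sketch counts executed statements with Connection.set_trace_callback as a proxy for round trips, comparing one query per id with a single IN (...) query. The customers table is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
with conn:  # one transaction for the whole batch insert
    conn.executemany(
        "INSERT INTO customers (id, name) VALUES (?, ?)",
        [(i, f"name{i}") for i in range(1, 101)],
    )

executed = []
# The trace callback receives the SQL of every statement SQLite runs,
# so the list stands in for a count of round trips to a remote server.
conn.set_trace_callback(executed.append)


def count_selects(statements):
    return sum(1 for s in statements if s.lstrip().upper().startswith("SELECT"))


wanted = list(range(1, 51))

# Chatty: one query (one "round trip") per id.
chatty = [
    conn.execute("SELECT name FROM customers WHERE id = ?", (i,)).fetchone()[0]
    for i in wanted
]
chatty_calls = count_selects(executed)

executed.clear()

# Batched: a single query fetches all ids at once.
placeholders = ",".join("?" * len(wanted))
batched = [
    row[0]
    for row in conn.execute(
        f"SELECT name FROM customers WHERE id IN ({placeholders}) ORDER BY id", wanted
    )
]
batched_calls = count_selects(executed)

conn.set_trace_callback(None)
print(chatty_calls, batched_calls)
```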
4. Reduce CPU load. The main technique here is plan caching: use parameterized queries wherever possible, so that the database can cache the compiled query plan and reuse it for different parameter values instead of recompiling the query each time.
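The sketch below, assuming Python's sqlite3 module rather than SQL Server, illustrates why parameterization helps caching: sqlite3 keeps a per-connection cache of prepared statements (sized by the cached_statements argument of sqlite3.connect) keyed by SQL text, much as a server plan cache keys on query text. With a placeholder the text stays identical for every value; with string concatenation each value produces a new text that must be compiled separately. The users table and helper functions are hypothetical.

```python
import sqlite3

# cached_statements sizes this connection's prepared-statement cache, keyed by
# SQL text: a client-side analogue of a server's plan cache.
conn = sqlite3.connect(":memory:", cached_statements=128)
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
with conn:
    conn.executemany(
        "INSERT INTO users (id, name) VALUES (?, ?)",
        [(i, f"user{i}") for i in range(1, 21)],
    )

PARAM_SQL = "SELECT name FROM users WHERE id = ?"


def lookup_parameterized(user_id, seen):
    # The SQL text never changes, so one cached statement serves every call.
    seen.add(PARAM_SQL)
    return conn.execute(PARAM_SQL, (user_id,)).fetchone()[0]


def lookup_concatenated(user_id, seen):
    # The value is baked into the text, so every id yields a new statement to
    # compile (and without the int() guard, untrusted input would invite SQL injection).
    sql = f"SELECT name FROM users WHERE id = {int(user_id)}"
    seen.add(sql)
    return conn.execute(sql).fetchone()[0]


param_texts, concat_texts = set(), set()
results_p = [lookup_parameterized(i, param_texts) for i in range(1, 21)]
results_c = [lookup_concatenated(i, concat_texts) for i in range(1, 21)]
print(len(param_texts), len(concat_texts))  # 1 20
```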
5. Improve hardware. This is the last resort. Only when every other level has been optimized well and the bottleneck really lies in CPU, memory, or disk does upgrading the hardware make sense; otherwise, optimize the other areas first.
The performance gains from these five levels decrease in order, forming an inverted pyramid.
Let's discuss indexes in detail below.
Baidu Encyclopedia describes an index as follows: "A database index is a structure that sorts the values of one or more columns in a database table; using an index allows quick access to specific information in the table."
Indexes are divided into clustered indexes and nonclustered indexes.
A. Clustered index
A table that has a clustered index is called a clustered table; its data rows are stored in the order of the clustered index. The column or columns covered by the clustered index are called the clustered key.
Take the Xinhua Dictionary as an example: each character's entry is a data row, and the entries are ordered by pinyin and, where the pinyin is the same, by strokes (this may not be exactly right). So the columns covered by the dictionary's clustered index are pinyin and strokes.
Just as the body text can be sorted in only one order, a table can have only one clustered index, because it determines how the data rows are organized.
B. Nonclustered indexes
In dictionary terms, nonclustered indexes are the appendices that let you look up characters by pinyin, by radical, or by stroke count. They record where each character sits in the body text, but they are separate from it. Likewise, a nonclustered index is independent of the order in which the data is organized; it uses pointers to the data rows to describe where each row is located.
A table without a clustered index is called a heap. Its data rows are stored in no particular order, like a pile of books where each new book is simply placed on top (the actual implementation of a heap may differ).
The clustered index has a major impact on query performance. In a clustered table, nonclustered indexes locate rows by the clustered key; in a heap, nonclustered indexes locate rows by a row identifier (RID). This makes a large performance difference, and the former performs much better than the latter. It is therefore essential to choose a proper clustered index. A good rule is to build it on a narrow column with unique values, preferably a single column; a surrogate key works well. If the key column is too wide, sorting becomes expensive, and since every nonclustered index carries the clustered key, a wide key also enlarges every nonclustered index. If the key values are not unique, the database appends 4 bytes of extra information (a uniquifier) to duplicate values to tell them apart, adding unnecessary overhead.
Typically we specify a primary key when creating a table, and if the index type is not stated explicitly, a clustered index is created by default (provided the table does not already have one). For example, ADD CONSTRAINT Pk_tbl PRIMARY KEY (SID) creates a clustered index on the SID column. You can also state the index type explicitly: ADD CONSTRAINT Pk_tbl PRIMARY KEY NONCLUSTERED (SID) creates a primary key backed by a nonclustered index. So take care when creating primary keys, and when a primary key spans multiple columns, remember to specify the index type explicitly.
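SQLite has no CLUSTERED or NONCLUSTERED keywords, but a loose analogue can be sketched with Python's sqlite3 module (an assumption for illustration, not SQL Server behaviour): in an ordinary rowid table a TEXT primary key is enforced by a separate automatic index that points back to the row, much like a nonclustered primary key, while a WITHOUT ROWID table stores its rows in the primary-key B-tree itself, much like a clustered one. The table names are hypothetical.

```python
import sqlite3


def plan(conn, sql, params=()):
    """Return the 'detail' column of SQLite's EXPLAIN QUERY PLAN output."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params)]


conn = sqlite3.connect(":memory:")

# Rowid table: rows live in a B-tree ordered by the hidden rowid, and the TEXT
# primary key gets its own automatic index pointing back to the rowid
# (loosely comparable to PRIMARY KEY NONCLUSTERED).
conn.execute("CREATE TABLE tbl_rowid (sid TEXT PRIMARY KEY, name TEXT)")

# WITHOUT ROWID table: the rows are stored in the primary-key B-tree itself
# (loosely comparable to a clustered primary key).
conn.execute("CREATE TABLE tbl_clustered (sid TEXT PRIMARY KEY, name TEXT) WITHOUT ROWID")

q_rowid = plan(conn, "SELECT name FROM tbl_rowid WHERE sid = ?", ("s1",))
q_clustered = plan(conn, "SELECT name FROM tbl_clustered WHERE sid = ?", ("s1",))

print(q_rowid)      # lookup goes through the separate automatic index
print(q_clustered)  # lookup goes straight into the primary-key B-tree
```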
Indexes can greatly improve query and sorting performance, but every insert, delete, and update of an indexed column must also maintain the index order. On a table that changes frequently, avoid creating too many indexes, or their maintenance cost will outweigh the gains.
Index optimization is a subtle art: it requires finding the right balance.
In general, here are some suggestions for creating appropriate indexes:
1. Create indexes on tables with more than 300 rows (for smaller tables, the benefit is negligible).
2. The clustered index should cover few columns, preferably a single column with unique values.
3. For tables with many columns, many of which appear in WHERE clauses, do not create a separate index on each column; create a composite index instead. Column order in a composite index matters: put the most selective, most nearly unique columns first, which helps the query optimizer considerably. Avoid indexing columns with many repeated values, such as a true/false column.
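A minimal sketch of a composite index, using Python's sqlite3 module as a stand-in for SQL Server. It shows why column order matters: an index on (city, surname) supports lookups on city alone or on both columns, but a filter on surname alone cannot seek into it and falls back to a scan. This illustrates the leftmost-prefix rule rather than selectivity itself, and the people table is hypothetical.

```python
import sqlite3


def plan(conn, sql, params=()):
    """Return the 'detail' column of SQLite's EXPLAIN QUERY PLAN output."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params)]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE people (id INTEGER PRIMARY KEY, city TEXT, surname TEXT, age INTEGER)"
)
# One composite index instead of a separate index on each column.
conn.execute("CREATE INDEX idx_people_city_surname ON people (city, surname)")

# Both columns, or just the leading column: the index can be used for a seek.
both = plan(conn, "SELECT * FROM people WHERE city = ? AND surname = ?", ("Beijing", "Li"))
leading = plan(conn, "SELECT * FROM people WHERE city = ?", ("Beijing",))

# Only the second column: no leading-column value to seek on, so SQLite scans.
second_only = plan(conn, "SELECT * FROM people WHERE surname = ?", ("Li",))

for label, detail in (("both", both), ("leading", leading), ("second_only", second_only)):
    print(label, detail)
```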
4. If a query uses only a few columns, consider a covering index that includes all of them, so the query can be answered entirely from the index without touching the table.
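A sketch of a covering index, with Python's sqlite3 module standing in for SQL Server. SQL Server can add non-key columns with CREATE INDEX ... INCLUDE (...); SQLite has no INCLUDE clause, so here the needed columns are simply part of the index key. When the query selects only indexed columns, the plan reports a COVERING INDEX; selecting * forces a visit to the table, which is one more reason to avoid it. The orders table and index name are hypothetical.

```python
import sqlite3


def plan(conn, sql, params=()):
    """Return the 'detail' column of SQLite's EXPLAIN QUERY PLAN output."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params)]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, status TEXT, "
    "total REAL, notes TEXT)"
)
# The index holds every column the first query needs.
conn.execute("CREATE INDEX idx_orders_customer_status ON orders (customer, status)")

# Only indexed columns are selected: answered from the index alone.
covered = plan(conn, "SELECT customer, status FROM orders WHERE customer = ?", ("c1",))

# SELECT * also needs total and notes, so each match requires a table lookup.
not_covered = plan(conn, "SELECT * FROM orders WHERE customer = ?", ("c1",))

print(covered)
print(not_covered)
```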