SQL Server Database Performance Optimization: Indexes (source: http://www.blogjava.net/allen-zhe/archive/2010/07/23/326966.html)
A recent project required a period of SQL Server performance optimization. I ran into some problems and also accumulated some experience, which I summarize here to share with you. SQL Server performance optimization involves many aspects: good system and database design, high-quality SQL, appropriate table index design, and various hardware factors such as network performance, server performance, operating system performance, and even network cards and switches. This article focuses on how to improve indexes; a separate article discusses how to improve SQL statements.
First of all, it must be emphasized that water can carry a boat but can also capsize it: an index can help or hurt depending on how it is used. Establishing an "appropriate" index is the first prerequisite for query optimization.
When data is searched by the value of the index key, the index provides quick access to the data. Without an index the database can still return the results of a SELECT statement, but as the table grows, the benefit of an "appropriate" index becomes more pronounced. Indexes can improve retrieval performance, but too many indexes, or poorly chosen ones, make the system inefficient, because every index the user adds to a table gives the database more work to do. Too many indexes can even cause index fragmentation. Therefore an "appropriate" set of indexes, and the clustered index in particular, deserves careful design so that the database can deliver its best performance.
Introduction to SQL Server indexes
SQL Server provides two types of indexes: clustered indexes and nonclustered indexes.
The clustered index determines the physical order of the data in the table. A clustered index is similar to a phone book, which arranges data by last name. Because a clustered index specifies the order in which data is physically stored in a table, a table can contain only one clustered index. However, the index can contain multiple columns (a composite index), just as the phone book is organized by last name and then first name.
Clustered indexes are particularly effective for columns that are frequently searched for ranges of values. Once the clustered index has been used to find the row that contains the first value, the rows that contain the subsequent index values are guaranteed to be physically adjacent. For example, if an application frequently retrieves records within a range of dates, a clustered index can quickly find the row that contains the start date and then read all adjacent rows in the table until the end date is reached. This helps improve the performance of such queries. Similarly, if a column is often used to sort the data retrieved from a table, clustering (physically ordering) the table on that column avoids the cost of a sort every time the column is queried.
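The date-range case described above can be sketched as follows. This is only a minimal illustration: the dbo.Orders table, its columns, and the index name are assumptions made for the example and do not come from the original article.

```sql
-- Hypothetical table; all names here are assumptions for illustration only.
CREATE TABLE dbo.Orders (
    OrderID   INT           NOT NULL,
    OrderDate DATETIME      NOT NULL,
    Amount    DECIMAL(10,2) NOT NULL
);

-- A table can have only one clustered index; here it orders the rows by date.
CREATE CLUSTERED INDEX IX_Orders_OrderDate
    ON dbo.Orders (OrderDate);

-- A range query can seek to the first matching row and then read
-- the following rows in index order until the end date is reached.
SELECT OrderID, OrderDate, Amount
FROM dbo.Orders
WHERE OrderDate >= '20100701'
  AND OrderDate <  '20100801'
ORDER BY OrderDate;  -- rows already come back in clustered order
```

Because the rows are stored in OrderDate order, the ORDER BY in this sketch can usually be satisfied without a separate sort step.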
A nonclustered index is similar to an index in a textbook. The data is stored in one place, and the index is stored in another place, with the index pointing to where the data is stored. The items in the index are stored in the order of the index key values, and the information in the table is stored in a different order (this can be specified by the clustered index). If you do not create a clustered index in the table, you cannot guarantee that the rows will have any particular order.
For a more detailed introduction, refer to the introduction to indexes on MSDN: http://msdn.microsoft.com/zh-cn/library/ms189271.aspx
Using SQL Server indexes
This raises the next question: since there are two kinds of index, when should each be used? Let's take a look at the table below. Simply put, use a clustered index for columns with a small number of distinct values, for columns that are often grouped and sorted, or when a range of data must be returned; use a nonclustered index for columns with a large number of distinct values, for columns that are often grouped and sorted, or for columns that are frequently updated.
| Action description | Using clustered indexes | Using nonclustered indexes |
|---|---|---|
| Columns are often grouped and sorted | Should | Should |
| Return data in a range | Should | Should not |
| One or very few distinct values | Should not | Should not |
| A small number of distinct values | Should | Should not |
| A large number of distinct values | Should not | Should |
| Columns that are frequently updated | Should not | Should |
| Foreign key columns | Should | Should |
| Primary key columns | Should | Should |
| Frequently modified index columns | Should not | Should |
Some lessons on how to improve indexing:
1. Indexes should first serve your application's most critical or most frequently executed queries.
If a query runs only once a month, consider whether it is worth creating an index on the tables it touches. Bear in mind that for the rest of the month the database system's maintenance overhead for that index may exceed the cost of the table scan that would otherwise satisfy the query. Good steel belongs on the blade's edge: spend good indexes on the critical, frequent queries.
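One way to check whether an index earns its maintenance cost is to compare how often it is read with how often it is updated. The query below is a sketch that uses the sys.dm_db_index_usage_stats dynamic management view (available from SQL Server 2005); it is offered as an illustration and is not a method described in the original article.

```sql
-- Compare how often each index is read versus how often it is maintained.
-- The counters are reset when the SQL Server instance restarts.
SELECT OBJECT_NAME(s.object_id)                     AS table_name,
       i.name                                       AS index_name,
       s.user_seeks + s.user_scans + s.user_lookups AS reads,
       s.user_updates                               AS writes
FROM sys.dm_db_index_usage_stats AS s
JOIN sys.indexes AS i
  ON i.object_id = s.object_id
 AND i.index_id  = s.index_id
WHERE s.database_id = DB_ID()   -- current database only
ORDER BY writes DESC;
```

An index with many writes and few reads is a candidate for review, subject to how long the instance has been running.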
2. Index columns that are frequently joined but are not declared as foreign keys.
In nested queries, sequential access to a table can have a fatal effect on query efficiency. For example, with a sequential access strategy, a query nested three levels deep that reads 1000 rows at each level ends up reading 1 billion rows. The primary way to avoid this is to index the joined columns. For example, when SQL joins two tables, Tbla (ID, c1, c2, ...) and TBLB (ID, ...), you need to index the ID field on each of the two tables.
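The joining statement itself does not appear in the text, so the following is only a sketch of what indexing the join column might look like. The column types, the d1 column on TBLB, and the index names are assumptions; only the table names and the ID, c1, and c2 columns come from the article.

```sql
-- Column lists are abbreviated in the article; types and d1 are assumptions.
CREATE TABLE Tbla (ID INT NOT NULL, c1 INT, c2 INT);
CREATE TABLE TBLB (ID INT NOT NULL, d1 INT);

-- Index the join column on each table separately.
CREATE INDEX IX_Tbla_ID ON Tbla (ID);
CREATE INDEX IX_TBLB_ID ON TBLB (ID);

-- With both indexes in place the join does not have to read either table
-- sequentially for every row of the other.
SELECT a.c1, b.d1
FROM Tbla AS a
JOIN TBLB AS b
  ON a.ID = b.ID;
```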
SELECT MIN(c1) FROM Tbla WHERE c1 > 1 -- triggers a Clustered Index Seek.
SELECT MIN(c1) FROM Tbla WHERE c1 > 1 AND c2 = 2 -- triggers a Clustered Index Seek.
SELECT MIN(c1) FROM Tbla WHERE c1 > 1 AND c3 < 3 -- triggers a Clustered Index Scan.
SELECT MIN(c1) FROM Tbla WHERE c2 = 2 AND c3 < 3 -- triggers a Clustered Index Scan.
SELECT MIN(c1) FROM Tbla WHERE c1 > 1 AND c2 = 2 AND c3 < 3 -- triggers a Clustered Index Seek, and the index covers the query.
[Figure: execution plan of the Clustered Index Scan]
[Figure: execution plan of the Clustered Index Seek]
This shows how much it matters for the index to cover the query.
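As a separate illustration of index covering, the sketch below uses a nonclustered index with included columns (the INCLUDE clause, available from SQL Server 2005). The dbo.Sales table and every name in it are hypothetical; this is not the index definition behind the queries above, which the text does not show.

```sql
-- Hypothetical table and names, for illustration only.
CREATE TABLE dbo.Sales (
    SaleID     INT      NOT NULL PRIMARY KEY,  -- becomes the clustered index by default
    CustomerID INT      NOT NULL,
    SaleDate   DATETIME NOT NULL,
    Total      MONEY    NOT NULL
);

-- The key column supports the WHERE clause; INCLUDE carries the selected
-- columns, so the query below can be answered from the index alone.
CREATE NONCLUSTERED INDEX IX_Sales_CustomerID_Covering
    ON dbo.Sales (CustomerID)
    INCLUDE (SaleDate, Total);

SELECT SaleDate, Total
FROM dbo.Sales
WHERE CustomerID = 42;
```

Since every column the query needs is present in the index, no extra lookup into the table is required for the matching rows.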
6. Nonclustered indexes and exact-match lookups work hand in hand
When a column in a table has a large number of distinct values, a nonclustered index on that column can be surprisingly effective. When the database system searches for a data value, it first searches the nonclustered index, finds the location of that value in the table, and then retrieves the data directly from that location. Because the index contains entries that describe the exact location in the table of the values the query is looking for, nonclustered indexes are the best fit for exact-match queries. For example, suppose the employee table has a nonclustered index on emp_id and the query asks for everyone whose employee ID (emp_id) > 1000. SQL Server jumps directly to the index entries after emp_id = 1000, which list the page and row in the table for each matching emp_id, and then goes straight to that page and row.
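The employee example above might be set up roughly as follows. Only the employee table and the emp_id column come from the text; the emp_name column and the index name are assumptions added so that the sketch is self-contained.

```sql
-- Hypothetical definition; only the table name and emp_id come from the article.
CREATE TABLE dbo.employee (
    emp_id   INT          NOT NULL,
    emp_name NVARCHAR(50) NOT NULL
);

CREATE NONCLUSTERED INDEX IX_employee_emp_id
    ON dbo.employee (emp_id);

-- Exact match: the index entry gives the location of the row directly.
SELECT emp_id, emp_name
FROM dbo.employee
WHERE emp_id = 1000;

-- The range form used in the article; whether the optimizer chooses the
-- index here depends on how many rows match.
SELECT emp_id, emp_name
FROM dbo.employee
WHERE emp_id > 1000;
```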
7. If you are Pierce, the SQL Server execution plan is Rondo
In SQL Server 2005, Microsoft SQL Server Management Studio and the Database Engine Tuning Advisor (DTA) are very good performance-tuning assistants that you can use to tune SQL statements. Review the estimated execution plan cost, generate optimization recommendations with DTA, and adopt or refer to the index-related part of those recommendations.
Note that with an estimated execution plan you should not pay too much attention to the cost percentages it shows, which are sometimes misleading. During one optimization I found that an index scan was shown as only 25% of the cost while a key lookup was shown as 50%. Yet the key lookup left nothing to optimize: its seek predicate was id = xxx, a lookup by primary key. Closer analysis showed that the key lookup had a CPU cost of 0.00015 and an I/O cost of 0.0013, whereas the index scan had a CPU cost of 1.4xxxx and a much larger I/O cost. The optimization priority therefore belonged to the index scan.
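One way to look past the estimated percentages is to measure what a statement actually costs when it runs. The sketch below uses the SET STATISTICS IO and SET STATISTICS TIME session options; the query is only a placeholder taken from the earlier examples, and this is an illustration, not a step the original article describes.

```sql
-- Report actual logical/physical reads and CPU/elapsed time for each statement.
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

-- Placeholder: substitute the statement being tuned.
SELECT MIN(c1) FROM Tbla WHERE c1 > 1;

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
```

In Management Studio the figures appear on the Messages tab, where the reads and CPU time of each operator's table can be compared before and after an index change.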
There are many articles of this kind on the Internet, so I will not repeat them here. You can refer to an earlier article: Getting Started with SQL Server performance tuning (graphic)
Another good article worth sharing: Explore how to achieve fast data extraction and data paging in MS SQL Server databases with 10 million rows of data
SQL Server Database Performance Optimization: Indexes (reposted)