Improve SQL Server data processing efficiency using indexes (1)

Source: Internet
Author: User

Based on a good database design, using indexes effectively is the foundation for SQL Server to achieve high performance. SQL Server uses a cost-based optimization model, which queries related tables for each commit, determines whether to use or which index to use. Because most of the overhead of query execution is disk I/O, an index is used to improve the performance to avoid full table scanning, because a full table scan needs to read every data page of the table from the disk, if an index points to the data value, the query only needs to read the disk several times. Therefore, if a reasonable index is set up, the optimizer can use the index to accelerate the Data Query Process. However, indexing does not always improve the system performance. In addition, deletion, and modification operations, the existence of indexes increases the workload. Therefore, adding appropriate indexes at the right place and deleting sub-optimal indexes from unreasonable places will help optimize SQL Server applications with poor performance. Practice shows that a reasonable index design is based on the analysis and prediction of various queries. Only by correctly combining the index with the program can the best optimization solution be produced. This article analyzes and practices the performance of SQL Server indexes.

I. Use of clustered indexes

A clustered index sorts the actual data on a disk by the values of one or more specified columns. Because the index page pointer of the clustered index points to the data page, using the clustered index to search for data is almost always faster than using a non-clustered index. Each table can only create a clustered index, and creating a clustered index requires at least 120% additional space for the table to store copies of the table and the intermediate index page. The idea of building a clustered index is:

1. Most tables should have clustered indexes or use partitions to reduce competition on the last page of the table. In a highly transactional environment, blocking the last page seriously affects the system throughput.

2. In the clustered index, data is physically arranged on the data page in order, and duplicate values are also arranged together, therefore, when the queries that contain range checks (between, <, <=,>,> =) or use group by or order, once a row with the first key value in the range is found, the row with the subsequent index value is physically contiguous without further searching, avoiding large-scale scanning, this greatly improves the query speed.

3. When you create a clustered index on a table with frequent insert operations, do not create a column with a monotonous appreciation (such as IDENTITY). Otherwise, blocking conflicts may often occur.

4. Do not include frequently modified columns in the clustered index, because after the code value is modified, the data row must be moved to a new location.

5. Select the cluster index based on the where clause and connection operation type.

The optional columns of the clustered index are:

1. Primary Key column, which is used in the where clause and inserted randomly.

2. Columns accessed by range, such as pri_order> 100 and pri_order <200.

3. Columns used in group by or order.

4. columns that are not frequently modified.

5. Columns Used in connection operations.

Ii. Use of non-clustered indexes

The index created by SQL Server by default is a non-clustered index, because the non-clustered index does not re-organize the data in the table, instead, store index column values for each row and point them to the page where the data is located with a pointer. In other words, non-clustered indexes have an extra level between the index structure and the data itself. If a table does not have a clustered index, there are 250 non-clustered indexes. Each non-clustered Index provides different sorting orders for data access. When creating a non-clustered index, you must weigh the advantages and disadvantages of the index between the speed of query and the speed of modification. In addition, we need to consider these issues:

1. How much space does the index need.

2. Check whether the appropriate columns are stable.

3. How to select the index key and check whether the scan effect is better.

4. Whether there are many repeated values.

For tables with frequent updates, non-clustered indexes require more additional costs than clustered indexes and no indexes at all. For each row to be moved to a new page, the page-level rows that point to the data of each non-clustered index must also be updated, and sometimes the index page splitting is required. The process of deleting data from a page has similar overhead. In addition, the deletion process must move the data to the top of the page to ensure data continuity. Therefore, it is very careful to create a non-clustered index. Non-clustered indexes are often used in the following scenarios:

1. A column is often used in set functions (such as Sum ,....).

2. A column is often used for join, order by, and group.

3. The retrieved data cannot exceed 20% of the table's data volume.

Iii. Use of covering indexes

A covered index is a non-clustered index that contains all the information required for search. This index is faster because the index page contains the necessary data for search, you do not need to access the data page. If a non-clustered index contains result data, the query speed is faster than that of the clustered index.

However, it takes up a large amount of space to cover a large number of index items. In addition, the update operation will change the index value. Therefore, if the potential overwrite query is not commonly used or is not critical, increasing the index Overwrite will reduce the performance.

Iv. indexing selection technology

P_detail is a table that records personal details in the housing provident fund management system. There are 890000 rows. Check the running results of queries under different indexes and test the results in the C/S environment, the client is IBM PII350 (64 MB memory), the server is DEC Alpha1000A (128 MB memory), and the database is SYBASE11.0.3.

1. select count (*) from p_detail where
Op_date> '2013' and op_date <'
19991231 'and pri_surplus1> 300

2. select count (*), sum (pri_surplus1) from p_detail
Where op_date> '123' and
Pay_month between '20140901' and '20160301'


Related Article

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.