Index creation to improve query speed (1)

Source: Internet
Author: User

People tend to fall into a misunderstanding when using SQL, that is, they are too concerned about whether the obtained results are correct, while ignoring the possible performance differences between different implementation methods, this performance difference is particularly evident in large or complex database environments (such as online transaction processing OLTP or decision support system DSS. In my work practice, I found that poor SQL statements often come from inappropriate index design, unfilled connection conditions, and unoptimized where clauses. After they are properly optimized, their running speed is significantly improved! I will summarize the following three aspects:

For more intuitive explanation of the problem, the SQL running time of all instances has been tested, and the SQL running time cannot exceed 1 second is expressed as (<1 second ).

-- Test environment --

  • HOST: HP LH II
  • Clock speed: 330 MHZ
  • Memory: 128 MB
  • Operating System: Operserver5.0.4
  • Database: Sybase11.0.3
I. unreasonable index design

For example, a table with 620000 rows of record and rows with different indexes can run the following SQL statements:

1. Create a non-clustered index on date

 
 
  1. Select count (*) from record where date> '20160301' and date <'20160301' and amount> 19991201 (25 seconds)
  2. Select date, sum (amount) from record group by date (55 seconds)
  3. Select count (*) from record where date> '123' and place in ('bj ', 'sh') (27 seconds)

Analysis:

There are a large number of duplicate values on date. In non-clustered indexes, data is physically stored on the data page at random. During range search, you must perform a table scan to find all rows in this range.

2. A cluster index on date

 
 
  1. Select count (*) from record where date> '20160901' and date <'20160901' and amount> 19991201 (14 seconds)
  2. Select date, sum (amount) from record group by date (28 seconds)
  3. Select count (*) from record where date> '2013' and place in ('bj ', 'sh') (14 seconds)

Analysis:

Under the cluster index, data is physically stored on the data page in order, and duplicate values are arranged together. Therefore, you can first find the start and end points of this range during range search, in addition, only data pages are scanned within this range, which avoids large-scale scanning and improves the query speed.

3. composite indexes on place, date, and amount

 
 
  1. Select count (*) from record where date> '20160301' and date <'20160301' and amount> 19991201 (26 seconds)
  2. Select date, sum (amount) from record group by date (27 seconds)
  3. Select count (*) from record where date> '2013' and place in ('bj, 'sh') (<1 second)

Analysis:

This is an unreasonable composite index, because its leading column is place, the first and second SQL statements do not reference place, so the upper index is not used; the third SQL uses place, all referenced columns are included in the composite index, which forms an index overwrite, so it is very fast.

4. Combined indexes on date, place, and amount

 
 
  1. Select count (*) from record where date> '20160301' and date <'20160301' and amount> 19991201 (<1 second)
  2. Select date, sum (amount) from record group by date (11 seconds)
  3. Select count (*) from record where date> '2013' and place in ('bj ', 'sh') (<1 second)

Analysis:

This is a reasonable composite index. It uses date as the leading column, so that each SQL can use the index, and the index coverage is formed in the first and third SQL statements, so the performance is optimal.

5. Conclusion:

The index created by default is a non-clustered index, but sometimes it is not the best. A reasonable index design should be based on the analysis and prediction of various queries.

. Generally speaking:

2. Incomplete connection conditions:

For example, the table card has 7896 rows, there is a non-clustered index on card_no, the table account has 191122 rows, and there is a non-clustered index on account_no, explain execution of two SQL statements under different table connection conditions:

 
 
  1. Select sum (a. amount) from account a, card B where a. card_no = B. card_no (20 seconds)

Change SQL:

 
 
  1. Select sum (a. amount) from account a, card B where a. card_no = B. card_no and a. account_no = B. account_no (<1 second)

Analysis:

Under the first join condition, the optimal query scheme is to use the account as the outer table, and the card as the inner table. The I/O times of the card can be estimated by the following formula:

Outer table account page 22541 + (the first row of the outer table account * the third page corresponding to the first row of the outer table on the card of the inner table) = 191122 times I/O

Under the second join condition, the best query scheme is to use card as the outer table and account as the inner table. The number of I/O times of the account can be estimated by the following formula:

1944 page + on the outer table card (the fourth row of the outer table card * The fourth page corresponding to each row of the outer table on the inner table account) = 7896 times I/O

It can be seen that only a full set of connection conditions can be executed for the best solution.

Summary:

1. Before a multi-table operation is executed, the query optimizer will list several possible connection solutions based on the connection conditions and find the best solution with the minimum system overhead. The join conditions must fully consider the tables with indexes and tables with multiple rows. The selection of the internal and external tables can be determined by the formula: Number of matched rows in the outer table * Number of times each query is performed in the inner table, the minimum product is the best solution.

2. view the method for executing the solution-use set showplanon to open the showplan option, and you will be able to see the connection sequence and index information. For more details, you must use the sa role to execute dbcc (3604,310,302 ).


Related Article

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.