Improve query speed using SQL Indexes

Last Update:2018-12-04 Source: Internet

Author: User

Tags sybase

Developer on Alibaba Coud: Build your first app with APIs, SDKs, and tutorials on the Alibaba Cloud. Read more ＞

Improve query speed using SQL Indexes

1. Use indexes reasonably

An index is an important data structure in a database. Its fundamental goal is to improve query efficiency. Currently, most database products adopt the isam index structure first proposed by IBM.

The index should be used properly. The usage principles are as follows:

● The optimizer automatically generates an index for fields that are frequently connected but not specified as foreign keys.
● Index the columns that are frequently sorted or grouped (that is, group by or order by operations.
● Create a search for columns with different values that are frequently used in conditional expressions. Do not create an index for columns with fewer values. For example, in the "gender" column of the employee table, there are only two different values: "male" and "female", so there is no need to create an index. If an index is created, the query efficiency is not improved, but the update speed is greatly reduced.
● If there are multiple columns to be sorted, you can create a compound index on these columns ).
● Use system tools. For example, the Informix database has a tbcheck tool that can be checked on suspicious indexes. On some database servers, the index may be invalid or the reading efficiency may be reduced due to frequent operations. If an index-based Query slows down, you can use the tbcheck tool to check the index integrity, fix the issue if necessary. In addition, when a database table updates a large amount of data, deleting and re-indexing can increase the query speed.

(1) In the following two select statements:
Select * From Table1 where field1 <= 10000 and field1> = 0;
Select * From Table1 where field1> = 0 and field1 <= 10000;
If field1> = 0, the first select statement is much more efficient than the second SELECT statement, because the first condition of the second SELECT statement consumes a lot of system resources.

First principle: In the WHERE clause, place the most restrictive conditions at the beginning.

(2) In the following SELECT statement:
Select * From tab where a =... And B =... And c = ...;
If an index (a, B, c) exists, the field order in the WHERE clause should be the same as that in the index.

The second principle: the field order in the WHERE clause should be consistent with that in the index.

------------------------------------------------------------------------------
The following assumes that there is a unique index I1 on field1 and a non-unique index I2 on field2.
------------------------------------------------------------------------------
(3) Select field3, field4 from TB where field1 = 'sdf 'fast
Select * from TB where field1 = 'sdf 'slow,

Because the latter requires more rowid table access after index scanning.

(4) Select field3, field4 from TB where field1> = 'sdf 'fast
Select field3, field4 from TB where field1> 'sdf 'slow

Because the former can quickly locate the index.

(5) Select field3, field4 from TB where field2 like 'r % 'fast
Select field3, field4 from TB where field2 like '% R' slow,

Because the latter does not use indexes.

(6) use functions such:
Select field3, field4 from TB where upper (field2) = 'rmn 'No index is used.

If a table has 20 thousand records, we recommend that you do not use functions. If a table has more than 50 thousand records, you are strictly prohibited from using functions! There are no restrictions on the following 20 thousand records.

(7) null values are not stored in the index, so
Select field3, field4 from TB where field2 is [not] Null does not use an index.

(8) The inequality is as follows:
Select field3, field4 from TB where field2! = 'Tom 'does not use an index.
Similarly,
Select field3, field4 from TB where field2 not in ('M', 'P') does not use indexes.

(9) Multi-column indexes. indexes can be used only when the first column of an index in a query is used as a condition.

(10) use indexes for functions such as Max and Min.
Select max (field2) from TB. Therefore, you should add an index if you need to take Max, Min, sum, and so on for the field.

Use only one aggregate function at a time, for example:
Select "min" = min (field1), "Max" = max (field1) from TB
Worse: Select "min" = (select Min (field1) from TB), "Max" = (select max (field1) from TB)

(11) indexes with too many duplicate values will not be used by the query optimizer. Because the index is created, the index must be modified when the field value is modified. Therefore, the operation to update this field is slower than that without an index.

(12) The index value is too large (for example, creating an index on a char (40) field ), it causes a large amount of I/O overhead (or even exceeds the I/O overhead of table scanning ). Therefore, try to use an integer index. Sp_estspace can calculate the overhead of tables and indexes.

(13) for multiple-column indexes, order by must be in the same order as the index field.

(14) In Sybase, if the order by field forms a cluster index, order by is not required. The order of records is the same as that of the Cluster Index.

(15) Multi-table join (the specific query scheme must be obtained through testing)
In the WHERE clause, use the associated fields as much as possible and put the associated fields in front.
Select a. field1, B. field2 from a, B where a. field3 = B. field3
1. If there is no index on field3:
Perform a full table scan on table A and sort the results
Perform full table scan on B and sort the results
Merge results.
It is suitable for small tables or large tables.

2. There is an index on field3
In the order of table join, B is the driving table, and a is the driven table.
Perform full table scan for B
Index range scanning for
If matched, access through rowid of

(16) avoid one-to-multiple join operations. For example:
Select tb1.field3, tb1.field4, tb2.field2 from tb1, tb2 where tb1.field2 = tb2.field2 and tb1.field2 = 'bu1032' and tb2.field2 = 'aaa'
Worse:
Declare @ A varchar (80)
Select @ A = field2 from tb2 where field2 = 'aaa'
Select tb1.field3, tb1.field4, @ A from tb1 where field2 = 'aaa'

(16) subquery
Replace in/not in with exists/not exists
Comparison:
Select a. field1 from a where a. field2 in (select B. field1 from B where B. field2 = 100)
Select a. field1 from a where exists (select 1 from B where a. field2 = B. field1 and B. field2 = 100)

Select field1 from a where field1 not in (select field2 from B)
Select field1 from a where not exists (select 1 from B where B. field2 = A. field1)

(17) primary and Foreign keys are mainly used for data constraints. indexes are automatically created when a primary key is created in Sybase. Foreign keys are irrelevant to indexes. To improve performance, you must create indexes again.

(18) Non-indexing of char fields is worse than non-indexing of int fields. After the index is created, the performance is slightly worse.

(19) Use count (*) instead of Count (column_name) to avoid using count (distinct column_name ).

(20) do not use field names on the right of equal signs, for example:
Select * from TB where field1 = field3

(21) avoid using the or condition because or does not use the index.

2. Avoid using the order by and group by statements.

These two clauses occupy a large amount of temporary space (tempspace). If you must use them, you can use views and manually generate temporary tables.
If necessary, check the size of memory and tempdb.
The test proves that it is very slow to avoid using join and group by in a query!

3. Use as few subqueries as possible, especially related subqueries. This will lead to a reduction in efficiency.

The label of a column appears in both the primary query and the where clause query. It is very likely that after the column value in the primary query changes, the subquery must perform a new query. The more nested query layers, the lower the efficiency. Therefore, avoid subqueries as much as possible. If the subquery is unavoidable, filter as many rows as possible in the subquery.

4. Eliminates sequential access to data in large table rows

In nested queries, sequential access to a table may have a fatal impact on query efficiency.
For example, the sequential access policy is used to create a nested layer-3 query. IF 1000 rows are queried at each layer, 1 billion rows of data are queried.
The primary way to avoid this is to index the connected columns.
For example, two tables: Student table (student ID, name, age ......) And Course Selection form (student ID, course number, score ). If you want to connect two tables, you need to create an index on the join field "student ID.
Union can also be used to avoid sequential access. Although all check columns are indexed, some forms of where clauses force the optimizer to use sequential access.
The following query forces sequential operations on the orders table:
Select * from orders where (customer_num = 104 and order_num> 1001) or order_num = 1008
Although indexes are created on customer_num and order_num, the optimizer still uses sequential access paths to scan the entire table in the preceding statement. Because this statement is used to retrieve the set of separated rows, it should be changed to the following statement:
Select * from orders where customer_num = 104 and order_num> 1001
Union
Select * from orders where order_num = 1008
In this way, you can use the index path to process queries.

5. Avoid difficult Regular Expressions

Matches and like keywords support wildcard matching, technically called regular expressions. However, this matching is especially time-consuming. Example: Select * from customer where zipcode like "98 ___"
Even if an index is created on the zipcode field, sequential scanning is used in this case. If you change the statement to select * from customer where zipcode> "98000", the query will be executed using the index, which will obviously increase the speed.
In addition, avoid non-starting substrings. For example, if select * from customer where zipcode [2, 3]> "80" is used in the WHERE clause, non-starting substrings are used. Therefore, this statement does not use indexes.

6. Use temporary tables to accelerate queries

Sort a subset of a table and create a temporary table, which sometimes accelerates query. It helps avoid multiple sorting operations and simplifies the optimizer's work in other aspects. For example:
Select Cust. Name, rcvbles. Balance ,...... Other Columns
From Cust, rcvbles
Where Cust. customer_id = rcvlbes. customer_id
And rcvblls. Balance> 0
And Cust. postcode> 98000"
Order by Cust. Name
If this query is executed multiple times but more than once, you can find all the unpaid customers in a temporary file and sort them by customer name:
Select Cust. Name, rcvbles. Balance ,...... Other Columns
From Cust, rcvbles
Where Cust. customer_id = rcvlbes. customer_id
And rcvblls. Balance> 0
Order by Cust. Name
Into temp cust_with_balance
Then, query the temporary table in the following way:
Select * From cust_with_balance
Where postcode> 98000"
The temporary table has fewer rows than the primary table, and the physical order is the required order, which reduces disk I/O, so the query workload can be greatly reduced.
Note: after a temporary table is created, the modification to the primary table is not reflected. Do not lose data when the data in the master table is frequently modified.

7. Use sorting to replace non-sequential access

Non-sequential disk access is the slowest operation, as shown in the back-and-forth movement of the disk inventory arm. SQL statements hide this situation, making it easy for us to write a query that requires access to a large number of non-sequential pages when writing an application.

From: http://hi.baidu.com/yang0000meng/blog/item/a49ae6fa7133a72a4f4aeaf5.html

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

Get Started for Free

Sales Support

1 on 1 presale consultation

Chat Contact Sales
After-Sales Support

24/7 Technical Support 6 Free Tickets per Quarter Faster Response

Open a Ticket
Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.

Learn More