Improve query speed with SQL index

Source: Internet
Author: User
Tags sybase

1. Proper use of indexes

An index is an important data structure in a database, and its fundamental purpose is to improve query efficiency. Most database products now use the ISAM index structure first proposed by IBM.

The use of indexes is just right, and the following principles are used:

    • Indexes are established on columns that are frequently connected but not specified as foreign keys, and fields that are not frequently connected are automatically indexed by the optimizer.
    • An index on a column that is frequently sorted or grouped (that is, a group by or order by operation).
    • Build a search on columns that are often used in conditional expressions with more different values, and do not index on columns with fewer values. For example, in the employee table, the "Gender" column is only "male" and "female" two different values, so there is no need to build an index. If you build an index, it will not improve the query efficiency, but can seriously reduce the update speed.
    • If there are multiple columns to sort, you can create a composite index on those columns (compound index).
    • Use System Tools. If the Informix database has a Tbcheck tool, it can be checked on a suspicious index. On some database servers, indexes can be invalidated or slow to read because of frequent operations, and if a query that uses an index is slowly slowing down, try using the Tbcheck tool to check the integrity of the index and fix it if necessary. In addition, deleting and rebuilding an index can improve query speed when the database table updates a large amount of data.

(1) In the following two SELECT statements:

SELECT * FROM table1 WHERE field1<=10000 and field1>=0; SELECT * FROM table1 WHERE field1>=0 and field1<=10000;

The first SELECT statement is much more efficient than the second SELECT statement if the data in the data table is Field1 >=0, because the first condition of the second SELECT statement consumes a lot of system resources.

The first principle: in the WHERE clause, the most restrictive conditions should be placed at the front.

(2) In the following SELECT statement:

SELECT * from tab WHERE a= ... and b= ... and c= ...;

If indexed index (A,B,C), the order of the fields in the WHERE clause should match the order of the fields in the index.

The second principle: the order of the fields in the WHERE clause should correspond to the order of the fields in the index.

—————————————————————————— The following assumption that there is a unique index I1 on field1 and that there is a non-unique index I2 on Field2. —————————————————————————— (3) SELECT field3,field4 FROM tb WHERE field1=‘sdf‘ fast SELECT * FROM tb WHERE field1=‘sdf‘ slow [/cci]

Because the latter takes one more step rowid table access after the index scan.

(4) SELECT field3,field4 FROM tb WHERE field1>=‘sdf‘ fast and SELECT field3,field4 FROM tb WHERE field1>‘sdf‘ slow

Because the former can quickly locate the index.

(5) SELECT field3,field4 FROM tb WHERE field2 LIKE ‘R%‘ fast SELECT field3,field4 FROM tb WHERE field2 LIKE ‘%R‘ and slow,

Because the latter does not use the index.

(6) Use functions such as: SELECT field3,field4 FROM tb WHERE upper(field2)=‘RMN‘ do not use indexes.

If a table has 20,000 records, it is recommended not to use a function; If a table has more than 50,000 records, the use of functions is strictly forbidden! 20,000 records there is no limit to the following.

(7) The null value is not stored in the index, so the SELECT field3,field4 FROM tb WHERE field2 IS[NOT] NULL index is not used.

(8) Inequality if the SELECT field3,field4 FROM tb WHERE field2!=‘TOM‘ index is not used. Similarly, the SELECT field3,field4 FROM tb WHERE field2 NOT IN(‘M‘,‘P‘) index is not used.

(9) Multiple-column indexes can be used only if the first column of the index in the query is used for the condition.

(ten) Max,min and other functions, using the index. SELECT max(field2) FROM tbTherefore, if you need to take max,min,sum a field, you should index it.

Use only one aggregate function at a time, such as: SELECT “min”=min(field1), “max”=max(field1) FROM tbSELECT “min”=(SELECT min(field1) FROM tb) , “max”=(SELECT max(field1) FROM tb)

(11) An index with too many duplicate values is not used by the query optimizer. And because the index is built, the index is also modified when the field value is modified, so the operation to update the field is slower than no index.

(12) Large index values (such as indexing on one char(40) field) can cause a large amount of I/O overhead (even exceeding the I/O overhead of table scans). Therefore, use an integer index as much as possible. Sp_estspace can calculate the cost of tables and indexes.

(13) For multi-column indexes, ORDER BY the order must be the same as the field order of the index.

(14) In Sybase, if ORDER BY the field consists of a clustered index, then no need to do it ORDER BY . The order in which records are arranged is consistent with the cluster index.

(15) Multi-table junction (the specific query plan needs to be tested) The WHERE clause to use the associated field as far as possible, and to put the associated fields in front.SELECT a.field1,b.field2 FROM a,b WHERE a.field3=b.field3

    1. If there is no index on the field3: A is a full table scan, the results are sorted to B as a full table scan, the results of sorting results are merged. It's more appropriate for a small table or a huge table.
    2. Field3 index in the order of the table junction, B is the driver table, a for the drive table to B for a full table scan of a index range scan if matching, through a rowid access

(16) Avoid a one-to-many join. such as: SELECT tb1.field3,tb1.field4,tb2.field2 FROM tb1,tb2 WHERE tb1.field2=tb2.field2 AND tb1.field2=‘BU1032’ AND tb2.field2= ‘aaa’ Why not:declare @a varchar(80)     SELECT @a=field2 FROM tb2 WHERE field2=‘aaa’      SELECT tb1.field3,tb1.field4,@a FROM tb1 WHERE field2= ‘aaa’

(16) Sub-query with Exists/not exists instead of In/not in operation comparison:SELECT a.field1 FROM a WHERE a.field2 IN(SELECT b.field1 FROM b WHERE b.field2=100)     SELECT a.field1 FROM a WHERE EXISTS( SELECT 1 FROM b WHERE a.field2=b.field1 AND b.field2=100)      SELECT field1 FROM a WHERE field1 NOT IN( SELECT field2 FROM b)      SELECT field1 FROM a WHERE NOT EXISTS( SELECT 1 FROM b WHERE b.field2=a.field1)

(17) Primary and foreign keys are primarily used for data constraints, and indexes are automatically created when a primary key is created in Sybase, and foreign keys are not indexed, and performance must be re-indexed.

A field of type char is not indexed much worse than a field of type int does not build. The performance is only a little bit worse after the index is built.

(19) Use count(*) and do not use count(column_name) , avoid use count(DISTINCT column_name) .

(20) to the right of the equals sign do not use field names, such as:SELECT * FROM tb WHERE field1 = field3

(21) Avoid using an OR condition, because or does not use an index.

2. Avoid using order by and group by words.

Because using these two clauses consumes a lot of temporary space (tempspace), if you must use them, you can replace them with a view, a method of manually generating temporary tables.    If necessary, first check the memory, tempdb size. Testing proves that, in particular, it is very slow to avoid using join and group BY in a query!

3. Minimize the use of sub-queries, especially related sub-queries. Because this can lead to a decrease in efficiency.

When a column's label appears in both the main query and the query in the WHERE clause, it is likely that the subquery must be queried again once the column values in the main query have changed. The more nested levels of queries, the lower the efficiency, so you should avoid subqueries as much as possible. If the subquery is unavoidable, filter out as many rows as possible in the subquery.

4. Eliminate sequential access to large table row data

In nested queries, sequential access to a table can have a fatal effect on query efficiency.    For example, a sequential access strategy, a nested 3-tier query, if each layer query 1000 rows, then the query will query 1 billion rows of data.    The primary way to avoid this situation is to index the concatenated columns. For example, two tables: Student table (school number, name, age ...). ) and select the timetable (school number, course number, results).    If two tables are to be connected, an index will be created on the connection field of the "Learning number". You can also use a set to avoid sequential access.    Although there are indexes on all the check columns, some forms of where clauses force the optimizer to use sequential access. The following query forces a sequential operation on the Orders table: SELECT * FROM orders WHERE (customer_num=104 AND order_num>1001) OR order_num=1008 Although indexes are built on Customer_num and Order_num, the optimizer uses sequential access paths to scan the entire table in the above statement. Because this statement retrieves a collection of detached rows, it should be changed to the following statement: This enables the query to be SELECT * FROM orders WHERE customer_num=104 AND order_num>1001     UNION      SELECT * FROM orders WHERE order_num=1008 processed using the index path.

5. Regular expressions to avoid difficulties

The matches and like keywords support wildcard matching, which is technically known as a regular expression. But this is a particularly time-consuming match. For example SELECT * FROM customer WHERE zipcode LIKE “98_ _ _” , even if an index is established on the ZipCode field, sequential scanning is also used in this case.    If you change the statement to a query, the index will be SELECT * FROM customer WHERE zipcode >“98000” used to query it, and obviously it will greatly increase the speed. Also, avoid non-starting substrings. For example SELECT * FROM customer WHERE zipcode[2,3] >“80” , a non-starting substring is used in the WHERE clause, so the statement does not use an index.

6. Accelerating queries with temporal tables

Sorting a subset of tables and creating temporary tables can sometimes speed up queries. It helps to avoid multiple sorting operations, and in other ways simplifies the work of the optimizer. For example:SELECT cust.name,rcvbles.balance,……other COLUMNS     FROM cust,rcvbles      WHERE cust.customer_id = rcvlbes.customer_id      AND rcvblls.balance>0      AND cust.postcode>“98000”      ORDER BY cust.nameIf the query is to be executed more than once, all unpaid customers can be found in a temporary file and sorted by the customer's name:SELECT cust.name,rcvbles.balance,……other COLUMNS     FROM cust,rcvbles      WHERE cust.customer_id = rcvlbes.customer_id      AND rcvblls.balance>;0      ORDER BY cust.name      INTO TEMP cust_with_balanceThen query in the temporary table in the following way:SELECT * FROM cust_with_balance     WHERE postcode>“98000”The rows in the staging table are less than the rows in the primary table, and the physical order is the required order, reducing disk I/O, so the query effort can be significantly reduced. Note: Changes to the primary table are not reflected when the staging table is created. When data is frequently modified in the primary table, be careful not to lose data.

7. Replace non-sequential access with sorting

Non-sequential disk access is the slowest operation, which is represented by the movement of the disk's access arms back and forth. The SQL statement hides this situation, making it easy to write queries that require access to a large number of non-sequential pages when writing applications.

Improve query speed with SQL index

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.