How to make your SQL run faster (SQL statements run under the Sybase Database)

Source: Internet
Author: User

Many people who use SQL fall into a common trap: they care only about whether the results are correct while ignoring the performance differences between different ways of writing a query. These differences are especially pronounced in large or complex database environments, such as online transaction processing (OLTP) or decision support systems (DSS).

In my own work I have found that slow SQL statements usually come from three sources: inappropriate index design, incomplete join conditions, and where clauses that cannot be optimized.

After proper optimization, their running speed improved significantly!
I will summarize these three aspects below.
To illustrate the problem more concretely, the running time of every example SQL statement was measured; any statement that finished in under 1 second is marked (<1 second).
Test environment --
Host: HP LH II, clock speed: 330 MHz, memory: 128 MB
Operating system: OpenServer 5.0.4
Database: Sybase 11.0.3
 
 
I. Unreasonable index design
 
For example, take the table record, which has 620,000 rows, and run the following SQL statements under different index schemes:
1. A non-clustered index is created on date.
select count(*) from record
where date > '19991201' and date < '19991214' and amount > 2000 (25 seconds)

select date, sum(amount) from record
group by date (55 seconds)

select count(*) from record
where date > '19990901' and place in ('bj', 'sh') (27 seconds)
Analysis: date contains a large number of duplicate values. Under a non-clustered index, the rows are stored physically at random on the data pages, so a range search has to scan the whole table to find every row in the range.
 
2. A clustered index on date

select count(*) from record
where date > '19991201' and date < '19991214' and amount > 2000 (14 seconds)

select date, sum(amount) from record
group by date (28 seconds)

select count(*) from record
where date > '19990901' and place in ('bj', 'sh') (14 seconds)
 
Analysis: under a clustered index, the rows are stored physically in order on the data pages, and duplicate values are grouped together. A range search can therefore locate the start and end points of the range first and scan only the data pages within that range, avoiding a large-scale scan and speeding up the query.
3. A composite index on place, date, and amount
 
select count(*) from record
where date > '19991201' and date < '19991214' and amount > 2000 (26 seconds)

select date, sum(amount) from record
group by date (27 seconds)

select count(*) from record
where date > '19990901' and place in ('bj', 'sh') (<1 second)
 
Analysis: this is an unreasonable composite index. Its leading column is place, but the first and second SQL statements do not reference place, so they cannot use the index. The third SQL statement does use place, and all of the columns it references are contained in the composite index, which forms index covering, so it runs very fast.
4. A composite index on date, place, and amount
select count(*) from record
where date > '19991201' and date < '19991214' and amount > 2000 (<1 second)

select date, sum(amount) from record
group by date (11 seconds)

select count(*) from record
where date > '19990901' and place in ('bj', 'sh') (<1 second)

Analysis: this is a reasonable composite index. With date as the leading column, every SQL statement can use the index, and the first and third statements also form index covering, so performance is optimal.
 
5. Conclusion: the index created by default is non-clustered, but that is not always the best choice. Reasonable index design must be based on analysis and prediction of the actual queries.

Generally speaking:
Columns with many duplicate values that are frequently range-searched (between, >, <, >=, <=) or used in order by or group by are candidates for a clustered index;
when multiple columns are frequently accessed together, and each column contains duplicate values, consider creating a composite index;
a composite index should cover the key queries as far as possible, and its leading column must be the most frequently used column.
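As a sketch, the four index schemes tested above could be created with statements like the following (the index names are invented here for illustration, and each scheme would be created separately, since a table can have only one clustered index):

create index idx_record_date on record(date)               /* scheme 1: non-clustered index on date */
create clustered index idx_record_date_c on record(date)   /* scheme 2: clustered index on date */
create index idx_record_pda on record(place, date, amount) /* scheme 3: composite, place leading */
create index idx_record_dpa on record(date, place, amount) /* scheme 4: composite, date leading */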

 
II. Incomplete join conditions
 
For example, the table card has 7,896 rows with a non-clustered index on card_no, and the table account has 191,122 rows with a non-clustered index on account_no. Consider the execution of two SQL statements under different join conditions:
select sum(a.amount) from account a, card b
where a.card_no = b.card_no (20 seconds)

select sum(a.amount) from account a, card b
where a.card_no = b.card_no and a.account_no = b.account_no (<1 second)

Analysis: under the first join condition, the best query plan uses account as the outer table and card as the inner table, exploiting the index on card. The number of I/O operations can be estimated by the following formula:

22,541 pages of the outer table account + (191,122 rows of the outer table account x 3 pages looked up in the inner table card for each outer row) = 595,907 I/Os

Under the second join condition, the best query plan uses card as the outer table and account as the inner table, exploiting the index on account. The number of I/O operations can be estimated by the following formula:

1,944 pages of the outer table card + (7,896 rows of the outer table card x 4 pages looked up in the inner table account for each outer row) = 33,528 I/Os

It can be seen that only with a complete join condition does the truly best plan get executed.
 
Summary: 1. Before a multi-table operation is executed, the query optimizer lists several possible join plans based on the join conditions and picks the one with the lowest system overhead. The join conditions should take full account of the tables with indexes and the tables with many rows; the choice of inner and outer table can be determined by the formula: number of matching rows in the outer table x number of pages read per lookup in the inner table, and the plan with the smallest product is best.
2. To view the chosen plan, use set showplan on to turn on the showplan option; you will then see the join order and which indexes are used. For more detail, you must execute dbcc traceon(3604, 310, 302) with the sa role.
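As a sketch, these diagnostics might be enabled in an isql session like this (dbcc traceon(3604) redirects trace output to the client; the query shown is just an illustration):

set showplan on
go
dbcc traceon(3604)
go
dbcc traceon(302, 310)
go
select sum(a.amount) from account a, card b
where a.card_no = b.card_no and a.account_no = b.account_no
go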

III. Where clauses that cannot be optimized

1. The columns in the following example statements all have suitable indexes, yet the statements run very slowly:
select * from record where substring(card_no, 1, 4) = '5378' (13 seconds)
select * from record where amount/30 < 1000 (11 seconds)
select * from record where convert(char(10), date, 112) = '19991201' (10 seconds)
 
 
Analysis:
Any operation on a column in the where clause is evaluated row by row at SQL execution time; the statement therefore has to perform a table scan instead of using the index on that column.
If those results could be computed at query compile time, the SQL optimizer could optimize them, use the index, and avoid the table scan. The statements are therefore rewritten as follows:
select * from record where card_no like '5378%' (<1 second)
select * from record where amount < 1000*30 (<1 second)
select * from record where date = '1999/12/01' (<1 second)
You will find that the SQL immediately gets faster!
 
2. For example, the table stuff has 200,000 rows and there is a non-clustered index on id_no. Consider the following SQL statement:

select count(*) from stuff where id_no in ('0', '1') (23 seconds)
 
Analysis: 'in' in the where clause is logically equivalent to 'or', so the parser converts in ('0', '1') into id_no = '0' or id_no = '1' for execution.
We would expect it to search on each or clause separately and then add the results together, so that the index on id_no could be used;

but in fact (as showplan confirms) it adopts the "OR strategy": it first extracts the rows satisfying each or clause into a worktable in tempdb, builds a unique index to remove duplicate rows, and then computes the result from that temporary table. The index on id_no is therefore never used, and the completion time is also affected by the performance of tempdb.
 
Practice shows that the more rows the table has, the worse the worktable performs: when stuff has 620,000 rows, the execution time reaches 220 seconds! It is better to split the or clauses:

select count(*) from stuff where id_no = '0'
select count(*) from stuff where id_no = '1'
 
 
Getting two results and adding them is far more cost-effective: because each statement uses the index, the execution time is only 3 seconds, and with 620,000 rows it is only 4 seconds.
Or, better still, write a simple stored procedure:
 
create proc count_stuff
as
declare @a int
declare @b int
declare @c int
declare @d char(10)
begin
select @a = count(*) from stuff where id_no = '0'
select @b = count(*) from stuff where id_no = '1'
end
select @c = @a + @b
select @d = convert(char(10), @c)
print @d

This computes the result directly, and the execution is just as fast as above!
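Once created, the procedure would simply be invoked as follows (assuming it exists in the current database):

exec count_stuff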
Summary:
As these examples show, "optimized" means the where clause uses an index; "cannot be optimized" means table scans or extra overhead occur.

1. Any operation on a column causes a table scan, including database functions and calculation expressions; when writing a query, move such operations to the right-hand side of the comparison operator whenever possible.
2. in and or clauses often cause a worktable to be used, invalidating the index; if they do not produce large numbers of duplicate values, consider splitting the clause; the split clauses should be able to use an index.
 
3. Be good at using stored procedures, which make SQL more flexible and efficient.
From these examples we can see that the essence of SQL optimization is to use statements the optimizer can recognize, to make full use of indexes, to reduce the number of I/O operations against tables, and to avoid table scans as far as possible. SQL performance optimization is in fact a complex process; the points above are only its manifestation at the application level. Deeper study would also involve resource configuration at the database level, traffic control at the network level, and the overall design at the operating-system level.
