Database and Query Statement Optimization

Refer to the following to check whether your database and query statements have been optimized.

How to make your SQL run faster
----People who use SQL often fall into a misunderstanding: they focus so much on whether the results are correct that they ignore the large performance differences that can exist between different ways of writing the same query. These differences are especially evident in large or complex database environments, such as online transaction processing (OLTP) or decision support systems (DSS). In my working practice I have found that bad SQL often comes from improper index design, insufficient join conditions, and WHERE clauses that cannot be optimized. After these were properly optimized, run times improved markedly. Below I summarize each of these three aspects.
----To illustrate the problem more intuitively, any SQL statement whose tested run time was under 1 second is shown as (< 1 second).
----Test environment--
----Host: HP LH II
----Frequency: 330 MHz
----Memory: 128 MB
----Operating system: SCO OpenServer 5.0.4
----Database: Sybase 11.0.3
First, unreasonable index design
----Example: the table record has 620000 rows. Observe how the following SQL statements run under different indexes:
----1. A non-clustered index was built on date
select count(*) from record
where date > '19991201' and date < '19991214' and amount > 2000
(25 seconds)
select date, sum(amount) from record group by date
(55 seconds)
select count(*) from record
where date > '19990901' and place in ('BJ', 'SH')
(27 seconds)
----Analysis:
----date has a large number of duplicate values. Under a non-clustered index, the data is physically stored randomly across data pages, so a range lookup must perform a table scan to find all the rows in the range.
----2. A clustered index on date
select count(*) from record
where date > '19991201' and date < '19991214' and amount > 2000
(14 seconds)
select date, sum(amount) from record group by date
(28 seconds)
select count(*) from record
where date > '19990901' and place in ('BJ', 'SH')
(14 seconds)
----Analysis:
----Under the clustered index, the data is physically stored in order on the data pages and duplicate values are grouped together, so a range lookup can first find the starting point of the range and then scan only the data pages within that range, avoiding a large-scale scan and improving speed.
----3. Combined index on Place,date,amount
select count(*) from record
where date > '19991201' and date < '19991214' and amount > 2000
(26 seconds)
select date, sum(amount) from record group by date
(27 seconds)
select count(*) from record
where date > '19990901' and place in ('BJ', 'SH')
(< 1 second)
----Analysis:
----This is an unreasonable composite index because its leading column is place. The first and second SQL statements do not reference place, so they make no use of the index; the third SQL does reference place, and all the columns it references are included in the composite index, forming index coverage, so it is very fast.
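The effect of the leading column can be checked directly from the execution plan. As a minimal sketch, using SQLite's EXPLAIN QUERY PLAN as a stand-in for Sybase's showplan (the table, column, and index names are illustrative, taken from the example above), a composite index led by place is not searched by the pure date-range query but does serve the query that filters on place:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE record (date TEXT, place TEXT, amount INTEGER)")
# Composite index with place as the leading column, as in case 3 above.
conn.execute("CREATE INDEX idx_place_date_amount ON record (place, date, amount)")

def plan(sql):
    # EXPLAIN QUERY PLAN rows carry the plan text in their last column.
    return " ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

# Leading column place is not referenced: no index search, only a scan.
p1 = plan("SELECT COUNT(*) FROM record "
          "WHERE date > '19991201' AND date < '19991214' AND amount > 2000")
# place is referenced, and every referenced column is in the index:
# the plan becomes an index search on the covering index.
p2 = plan("SELECT COUNT(*) FROM record "
          "WHERE date > '19990901' AND place IN ('BJ', 'SH')")
print(p1)
print(p2)
```

The exact plan wording differs between engines, but the SCAN-versus-SEARCH distinction mirrors the table scan versus index lookup described above.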
----4. Combined index on Date,place,amount
select count(*) from record
where date > '19991201' and date < '19991214' and amount > 2000
(< 1 second)
select date, sum(amount) from record group by date
(11 seconds)
select count(*) from record
where date > '19990901' and place in ('BJ', 'SH')
(< 1 second)
----Analysis:
----This is a reasonable composite index. With date as the leading column, every SQL statement can take advantage of the index, and the first and third statements also achieve index coverage, so the performance is optimal.
----5. Summary:
----The index created by default is non-clustered, but it is not always optimal; a reasonable index design must be based on analysis and prediction of the various queries. In general:
----①. For columns with many duplicate values that frequently appear in range queries (between, >, <, >=, <=) or in order by / group by, consider building a clustered index;
----②. For multiple columns that are often accessed together, each of which contains duplicate values, consider building a composite index;
----③. A composite index should cover the critical queries as far as possible, and its leading column must be the column used most frequently.
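Under the same assumptions as before (SQLite as a stand-in for the article's Sybase environment, illustrative names), the plan confirms point ③: with date, the most frequently used column, leading the composite index, the range query is answered by an index search on a covering index rather than a scan:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE record (date TEXT, place TEXT, amount INTEGER)")
# Composite index with the most frequently used column, date, leading.
conn.execute("CREATE INDEX idx_date_place_amount ON record (date, place, amount)")

plan = " ".join(
    row[-1] for row in conn.execute(
        "EXPLAIN QUERY PLAN SELECT COUNT(*) FROM record "
        "WHERE date > '19991201' AND date < '19991214' AND amount > 2000"
    )
)
print(plan)
```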

Second, insufficient join conditions:
----Example: the table card has 7896 rows with a non-clustered index on card_no; the table account has 191122 rows with a non-clustered index on account_no. Observe the execution of the same SQL under different join conditions:

select sum(a.amount) from account a, card b
where a.card_no = b.card_no
(20 seconds)
----Change the SQL to:
select sum(a.amount) from account a, card b
where a.card_no = b.card_no and a.account_no = b.account_no
(< 1 second)
----Analysis:
----Under the first join condition, the best query plan is to use account as the outer table and card as the inner table, using the index on card. Its I/O count can be estimated by the following formula:
----pages of the outer table account (22541) + (rows of the outer table account (191122) × 3 pages needed to find the corresponding row in the inner table card) = 595,907 I/Os
----Under the second join condition, the best query plan is to use card as the outer table and account as the inner table, using the index on account. Its I/O count can be estimated by the following formula:
----pages of the outer table card (1944) + (rows of the outer table card (7896) × 4 pages needed to find the corresponding row in the inner table account) = 33,528 I/Os
----As can be seen, only with sufficient join conditions will the truly optimal plan be executed.
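Both estimates above follow one formula: read every page of the outer table once, then pay a fixed number of index-lookup pages in the inner table for each outer row. A quick sketch of the arithmetic, using the figures from the analysis above:

```python
def estimated_io(outer_pages, outer_rows, pages_per_inner_lookup):
    # One pass over the outer table, plus one indexed lookup per outer row.
    return outer_pages + outer_rows * pages_per_inner_lookup

# Join on card_no only: account is the outer table
# (22541 pages, 191122 rows), 3 pages per lookup into card.
print(estimated_io(22541, 191122, 3))
# Join on card_no and account_no: card is the outer table
# (1944 pages, 7896 rows), 4 pages per lookup into account.
print(estimated_io(1944, 7896, 4))
```

The second plan is roughly 18 times cheaper, which matches the drop from 20 seconds to under 1 second.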
----Summary:
----1. Before actual execution, the query optimizer lists several possible join plans based on the join conditions and picks the one with the least system cost. The join conditions should take into account the tables that have indexes and the tables with many rows; the choice of inner and outer table can be determined by the formula: outer-table pages + (outer-table rows × pages read per matching lookup in the inner table), where the smallest result is the best plan.
----2. To view the execution plan, use set showplan on to turn on the showplan option; you can then see the join order and which indexes are used. For more detailed information, the sa role is required to run dbcc(3604,310,302).
Third, WHERE clauses that cannot be optimized
----1. Example: the columns in the following SQL conditions all have appropriate indexes, but the execution speed is very slow:
select * from record
where substring(card_no, 1, 4) = '5378'
(13 seconds)
select * from record
where amount/30 < 1000
(11 seconds)
select * from record
where convert(char, date, 112) = '19991201'
(10 seconds)
----Analysis:
----Any operation on a column in the WHERE clause is evaluated column by column at SQL run time, so the query has to do a table scan and cannot use the index on that column. If the result of the expression is available when the query is compiled, the SQL optimizer can optimize it and use the index to avoid the table scan; so rewrite the SQL as follows:
select * from record
where card_no like '5378%'
(< 1 second)
select * from record
where amount < 1000*30
(< 1 second)
select * from record
where date = '1999/12/01'
(< 1 second)
----You will find that the SQL is obviously faster!
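The same before-and-after can be seen in an execution plan. A minimal sketch with SQLite standing in for Sybase (illustrative names; note that SQLite, as an engine-specific detail not part of the article, only applies its prefix-LIKE optimization to a plain index when LIKE is made case-sensitive):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite-specific: prefix LIKE can use an ordinary index only when
# LIKE is case-sensitive.
conn.execute("PRAGMA case_sensitive_like = ON")
conn.execute("CREATE TABLE record (card_no TEXT, amount INTEGER, date TEXT)")
conn.execute("CREATE INDEX idx_card_no ON record (card_no)")

def plan(sql):
    # EXPLAIN QUERY PLAN rows carry the plan text in their last column.
    return " ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

# A function applied to the column hides it from the optimizer: table scan.
p1 = plan("SELECT * FROM record WHERE substr(card_no, 1, 4) = '5378'")
# The equivalent prefix match leaves the column bare: index search.
p2 = plan("SELECT * FROM record WHERE card_no LIKE '5378%'")
print(p1)
print(p2)
```

The general rule carries across engines: keep the indexed column bare on one side of the comparison and move the arithmetic or string manipulation to the constant side.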
----2. Example: the table stuff has 200000 rows, with a non-clustered index on id_no. Look at the following SQL:
select count(*) from stuff where id_no in ('0', '1')
(23 seconds)
----Analysis:
----The in in the WHERE condition is logically equivalent to or, so the parser converts in ('0', '1') into id_no = '0' or id_no = '1' for execution. We expect it to look up each or clause separately and then add up the results, so that the index on id_no can be used; but in fact (as shown by showplan) it adopts the "or strategy": it first fetches the rows satisfying each or clause into a worktable in the temporary database, uses a unique index to remove duplicate rows, and finally computes the result from this temporary table. The whole process therefore never uses the index on id_no, and its completion time is also affected by the performance of the tempdb database.
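Whether in can use an index is engine- and version-specific, so the habit the article recommends, checking showplan, is the part that carries over. As a sketch with SQLite's equivalent (illustrative names; unlike the Sybase 11 "or strategy" above, SQLite does satisfy each in value from the index here, which is exactly why the plan is worth checking rather than assuming):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stuff (id_no TEXT)")
conn.execute("CREATE INDEX idx_id_no ON stuff (id_no)")

# EXPLAIN QUERY PLAN is SQLite's counterpart of set showplan on.
plan = " ".join(
    row[-1] for row in conn.execute(
        "EXPLAIN QUERY PLAN "
        "SELECT COUNT(*) FROM stuff WHERE id_no IN ('0', '1')"
    )
)
print(plan)
```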


