----people in the use of SQL often fall into a misunderstanding, that is too focused on the results are correct, and ignore the different implementation methods may exist between
Performance variance, which is particularly evident in large or complex database environments, such as online transaction processing OLTP or decision support system DSS
Explicitly In the work practice, the author finds that bad SQL often comes from improper index design, inadequate connection condition and the WHERE clause which is not optimized. In the
The speed of their operation is obviously improved after proper optimization. I will summarize these three aspects separately:
----to illustrate the problem more intuitively, the SQL run time in all instances is tested, and no more than 1 seconds is represented as (< 1 seconds).
----test environment--
----Host: HP LH II
----Frequency: 330MHZ
----Memory: 128 MB
----Operating system: Operserver5.0.4
----Database: Sybase11.0.3
First, unreasonable index design
----Example: The table record has 620000 rows, and the following SQL runs under different indexes:
----1. A non-clustered index was built on date
Select COUNT (*) from the record where date > ' 19991201 ' and date < ' 19991214 ' and amount >2000 (25 seconds)
Select Date,sum (amount) from record Group by date (55 seconds)
Select COUNT (*) from the record where date > ' 19990901 ' (' BJ ', ' SH ') (27 seconds)
----Analysis:
----date has a large number of duplicate values, the data is physically randomly placed on the data page under a non-clustered index, and a table scan must be performed to find all the rows in the range when it is searched.
----2. A clustered index on date
Select COUNT (*) from the record where date > ' 19991201 ' and date < ' 19991214 ' and amount >2000 (14 seconds)
Select Date,sum (amount) from record Group by date (28 seconds)
Select COUNT (*) from the record where date > ' 19990901 ' (' BJ ', ' SH ') (14 seconds)
----Analysis:
----under the cluster index, the data is physically in order on the data page, the duplicate values are also arranged together, so in the range lookup, you can first find the starting point of the range, and only in this range scan data pages, avoid a large range of scanning, improve the query speed.
----3. Combined index on Place,date,amount
Select COUNT (*) from the record where date > ' 19991201 ' and date < ' 19991214 ' and amount >2000 (26 seconds)
Select Date,sum (amount) from record Group by date (27 seconds)
Select COUNT (*) from the record where date > ' 19990901 ' and place in (' BJ, ' SH ') (< 1 seconds)
----Analysis:
----This is an unreasonable combination index because its leading column is place, the first and second SQL does not reference place, and therefore does not use the index; the third SQL uses place, and all the columns referenced are included in the composite index, which forms an index overlay. So its speed is very fast.
----4. Combined index on Date,place,amount
Select COUNT (*) from the record where date > ' 19991201 ' and date < ' 19991214 ' and amount >2000 (< 1 sec)
Select Date,sum (amount) from record Group by date (11 seconds)
Select COUNT (*) from the record where date > ' 19990901 ' and place in (' BJ ', ' SH ') (< 1 seconds)
----Analysis:
----This is a reasonable combination of indexes. It takes date as the leading column so that each SQL can take advantage of the index, and an index overlay is formed in the first and third SQL, thus achieving optimal performance.
----5. Summary:
----The index established by default is not a clustered index, but sometimes it is not optimal; a reasonable index design should be based on analysis and prediction of various queries
On Generally speaking:
----①. Have a large number of duplicate values, and often have scope queries
(Between, >,<,>=,< =) and ORDER BY
, GROUP by-by-occurrence column, consider establishing a clustered index;
----②. Multiple columns are frequently accessed at the same time, and each column contains duplicate values to consider establishing a composite index;
----③. Combined indexes as much as possible to make critical queries indexed, the leading columns must be the most frequently used columns.
Second, not sufficient conditions of the connection:
----Example: The card has 7896 rows, there is a nonclustered index on the card_no, the table account has 191122 rows, there is a nonclustered index on the account_no, and a preview of the execution of the two SQL under the different table join conditions:
Select SUM (a.amount) from the account A,card b where a.card_no = B.card_no (20 seconds)
----Change SQL to:
Select SUM (a.amount) from the account A,card b where a.card_no = B.card_no and A.account_no=b.account_no (< 1 seconds)
----Analysis:
----in the first connection, the best query solution is to make account as the outer table, card as the inner table, using the index card, the number of I/O can be estimated by the following formula:
----The outer table account 22541 pages + (outer table account 191122 Line * Inner table card on card on the first row of the outer table to find 3 pages) = 595,907 times I/O
----in the second connection, the best query scheme is to make the card as the outer table, account as an inner table, using the index on account, the number of I/O can be estimated by the following formula:
----on the Outer table card 1944 pages + (outer table card 7896 rows * Inside the table account for each row of the outer table to find 4 pages) = 33,528 times I/O
----can be seen, the real best solution will be implemented only if there is a sufficient connection condition.
----Summary:
----1. The query optimizer lists several sets of possible connectivity scenarios and finds the best solution for the least cost of the system, based on the join conditions, before the multiple table operations are actually executed. The join condition takes into account the table with the index, the table with many rows, and the selection of the inner and outer tables can be determined by the formula: the number of matches in the outer table * The count of each lookup in the inner-layer table, and the product minimum is the best scheme.
----2. View the method of executing a scenario--with Set Showplanon open the SHOWPLAN option, you can see the connection order, what index information to use;
For more detailed information, the SA role is required to perform DBCC (3604,310,302).
Third, not-optimized where clause
----1. For example: the columns in the following SQL conditional statement have the appropriate indexes, but the execution speed is very slow:
SELECT * FROM record where substring (card_no,1,4) = ' 5378 ' (13 seconds)
SELECT * from record where amount/30< 1000 (11 seconds)
SELECT * FROM record where convert (char, date,112) = ' 19991201 ' (10 seconds)
----Analysis:
The result of any operation of the column in the----WHERE clause is computed by column in the SQL runtime, so it has to do a table search instead of using the index above the column, and if the results are available at query compilation, you can optimize the SQL optimizer, use indexes, avoid table searches, So rewrite the SQL as
Here's how:
SELECT * from record where card_no like ' 5378% ' (< 1 seconds)
SELECT * FROM record where amount < 1000*30 (< 1 seconds)
SELECT * from record where date= ' 1999/12/01 ' (< 1 seconds)
----you'll find that SQL is noticeably faster.
----2. For example: The table stuff has 200000 rows and the Id_no index is not clustered, see the following sql:
Select COUNT (*) from stuff where id_no in (' 0 ', ' 1 ') (23 seconds)
----Analysis:
----the ' in ' of the Where condition is logically equivalent to ' or ', the parser converts in (' 0 ', ' 1 ') to Id_no = ' 0 ' or id_no= ' 1 ' to execute. We expect it to look for each or clause separately, then add the result, so that the index on the id_no can be used, but in fact (according to Showplan) it takes an "or policy", that is, the row that satisfies each or clause is first fetched into the worksheet in the temp database, A unique index is created to remove the duplicate row, and the result is computed from this temporary table. As a result, the actual process does not take advantage of the Id_no index, and the completion time is also affected by the performance of the tempdb database.
----practice shows that the more rows the table has, the worse the performance of the worksheet, when the stuff has 620000 rows, the execution time reaches 220 seconds. You might as well divide the OR clause
Open:
Select COUNT (*) from stuff where id_no= ' 0 '
Select COUNT (*) from stuff where id_no= ' 1 '
----Get two results and add it again. Because each sentence uses an index, the execution time is only 3 seconds, and the time is only 4 seconds under 620000 lines. Or, in a better way, write a simple stored procedure:
create proc Count_stuff as
DECLARE @a int
DECLARE @b int
DECLARE @c int
Declare @d char (10)
Begin
Select @a=count (*) from stuff where id_no= ' 0 '
Select @b=count (*) from stuff where id_no= ' 1 '
End
Select @c=@a+@b
Select @d=convert (char (), @c)
Print @d
----The results are calculated directly, the execution time is as fast as the above.
----Summary:
----visible, the so-called optimization where clause takes advantage of the index, the table scan or extra overhead is not optimized.
----1. Any action on the column will result in a table scan, which includes database functions, evaluation expressions, and so on, to move the action to the right of the equal sign whenever possible.
----2.in, or clauses often use worksheets to invalidate indexes; If you do not produce a large number of duplicate values, you can consider the opening of the sentence; The open clause should contain an index.
----3. To be adept at using stored procedures, it makes SQL more flexible and efficient.
----from the above examples, we can see that the essence of SQL optimization is in the premise of correct results, with the optimizer can identify the statement, full use of the index, reduce the number of I/O table scan, as far as possible to avoid the occurrence of table search. In fact, SQL performance optimization is a complex process, these are only a manifestation of the application level, in-depth research will also involve the database layer of resource allocation, network layer flow control and the overall design of the operating system layer.
The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion;
products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the
content of the page makes you feel confusing, please write us an email, we will handle the problem
within 5 days after receiving your email.
If you find any instances of plagiarism from the community, please send an email to:
info-contact@alibabacloud.com
and provide relevant evidence. A staff member will contact you within 5 working days.