A Collection of Database Optimization Issues (MSSQL)

Source: Internet
Author: User
When using SQL, people often fall into a trap: they focus on whether the result is correct, but overlook the performance differences that may exist between different ways of writing the same query. These differences are especially pronounced in large or complex database environments, such as online transaction processing (OLTP) or decision support systems (DSS).

In practice, the author has found that bad SQL often comes from improper index design, insufficient join conditions, and non-optimizable WHERE clauses.
After proper optimization, execution speed improves noticeably!
These three aspects are summarized in turn below.
To make the point more direct, the SQL execution time in every example was measured; any time not exceeding 1 second is shown as (< 1 second). ----
Test environment: Host: HP LH II ---- Clock speed: 330 MHz ---- Memory: 128 MB ----
Operating system: OpenServer 5.0.4 ----
Database: Sybase 11.0.3
I. Unreasonable index design ----
Example: the table record has 620,000 rows. Consider how the following SQL statements perform under different index schemes:
----1. A non-clustered index was built on date
select count(*) from record where date > '19991201' and date < '19991214' and amount > 2000 (25 seconds)
select date, sum(amount) from record group by date (55 seconds)
select count(*) from record where date > '19990901' and place in ('BJ', 'SH') (27 seconds)
---- Analysis:----
date has a large number of duplicate values. Under a non-clustered index, the data is stored physically at random across the data pages, so a range search requires a table scan, row by row, to find all rows within the range.
----2. A clustered index on date
select count(*) from record where date > '19991201' and date < '19991214' and amount > 2000 (14 seconds)
select date, sum(amount) from record group by date (28 seconds)
select count(*) from record where date > '19990901' and place in ('BJ', 'SH') (14 seconds)
---- Analysis: ---- Under the clustered index, the data is physically sorted on the data pages and duplicate values are grouped together. When performing a range search, the end of the range can be located directly, and only the data pages within that range need to be scanned. This avoids a large-scale scan and speeds up the query.
----3. A composite index on place, date, and amount
select count(*) from record where date > '19991201' and date < '19991214' and amount > 2000 (26 seconds)
select date, sum(amount) from record group by date (27 seconds)
select count(*) from record where date > '19990901' and place in ('BJ', 'SH') (< 1 second)
---- Analysis: ---- This is not a very reasonable composite index, because its leading column is place. The first and second SQL statements do not reference place, so they cannot use the index; the third statement does use place, and all the columns it references are contained in the composite index, forming index coverage, so it is very fast.
----4. A composite index on date, place, and amount
select count(*) from record where date > '19991201' and date < '19991214' and amount > 2000 (< 1 second)
select date, sum(amount) from record group by date (11 seconds)
select count(*) from record where date > '19990901' and place in ('BJ', 'SH') (< 1 second)
---- Analysis: ---- This is a reasonable composite index. It uses date as the leading column, so every SQL statement can take advantage of the index, and the first and third statements form index coverage. Performance is therefore optimal.
----5. Summary:----
The index created by default is a non-clustered index, which is not always optimal; a reasonable index design must be built on analysis and prediction of the various queries.
In general:
①. For columns with many duplicate values that are frequently queried by range (between, >, <, >=, <=) or used in order by and group by, consider building a clustered index;
②. When multiple columns are frequently accessed together and each column contains duplicate values, consider building a composite index;
③. A composite index should, as far as possible, provide index coverage for the key queries, and its leading column must be the most frequently used column.
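As an illustration of these guidelines, the index definitions behind scenarios 2 and 4 above might be written roughly as follows. This is only a sketch: the original article does not give the DDL, so the index names are assumed.
create clustered index idx_record_date on record (date)
-- guideline ①: date has many duplicates and is queried by range, order by, group by
create index idx_record_date_place_amount on record (date, place, amount)
-- guidelines ② and ③: date is the leading (most frequently used) column, and the index
-- contains all columns referenced by the key queries, forming index coverage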
II. Insufficient join conditions:
Example: the table card has 7,896 rows and a non-clustered index on card_no; the table account has 191,122 rows and a non-clustered index on account_no. Consider how the following two SQL statements perform under different join conditions:
select sum(a.amount) from account a, card b where a.card_no = b.card_no (20 seconds)
select sum(a.amount) from account a, card b where a.card_no = b.card_no and a.account_no = b.account_no (< 1 second)
---- Analysis: ---- Under the first join condition, the best query plan uses account as the outer table and card as the inner table, using the index on card. The number of I/Os can be estimated by the following formula:
22,541 pages read from the outer table account + (191,122 rows of the outer table account × 3 pages looked up on the inner table card for each outer row) = 595,907 I/Os
Under the second join condition, the best query plan uses card as the outer table and account as the inner table, using the index on account. The number of I/Os can be estimated by: 1,944 pages read from the outer table card + (7,896 rows of the outer table card × 4 pages looked up on the inner table account for each outer row) = 33,528 I/Os
As can be seen, only with complete join conditions is the truly optimal plan executed.
Summary:
1. Before a multi-table operation is actually executed, the query optimizer lists several possible join plans based on the join conditions and chooses the one with the smallest system cost. The join conditions should take full account of tables with indexes and tables with many rows; the choice of inner and outer table can be determined by the formula: number of matching rows in the outer table × number of lookups per outer row in the inner table, and the plan with the smallest product is the best.
2. To see the execution plan, use set showplan on to turn on the showplan option; you can then see the join order and which index is used. For more detailed information, run dbcc traceon(3604, 310, 302), which requires the sa role.
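For example, a session to inspect the plan of the card/account query above might look like the following sketch (the output format varies with the server version):
set showplan on
go
select sum(a.amount) from account a, card b
where a.card_no = b.card_no and a.account_no = b.account_no
go
-- more detailed costing output; as noted above, the sa role is required
dbcc traceon(3604, 310, 302)
go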
III. Non-optimizable WHERE clauses
1. Example: the columns in the WHERE clauses of the following SQL statements all have appropriate indexes, but the statements run very slowly:
select * from record where substring(card_no, 1, 4) = '5378' (13 seconds)
select * from record where amount/30 < 1000 (11 seconds)
select * from record where convert(char(10), date, 112) = '19991201' (10 seconds)
Analysis:
The result of any operation on a column in the WHERE clause is computed row by row at run time, so the query is forced to do a table search and cannot use the index on that column.
If those results were available when the query is compiled, the SQL optimizer could use the index and avoid the table search. So rewrite the SQL as follows:
select * from record where card_no like '5378%' (< 1 second)
select * from record where amount < 1000*30 (< 1 second)
select * from record where date = '1999/12/01' (< 1 second)
You will find that the SQL is noticeably faster!
2. Example: the table stuff has 200,000 rows and id_no has a non-clustered index. Consider the following SQL:
select count(*) from stuff where id_no in ('0', '1') (23 seconds)
---- Analysis: "in" in the WHERE clause is logically equivalent to "or", so the parser converts in ('0', '1') into id_no = '0' or id_no = '1' before execution.
We expect it to search on each OR clause separately and then add the results together, so that the index on id_no can be used;
but in practice (as showplan reveals), it adopts the "OR strategy": it fetches the rows satisfying each OR clause into a worktable in the temporary database, builds a unique index to remove duplicate rows, and finally computes the result from this temporary table. So the actual execution does not use the index on id_no, and the completion time is also affected by the performance of the tempdb database.
In practice, the more rows the table has, the worse the worktable performs: when stuff has 620,000 rows, execution time reaches 220 seconds! It is better to split the OR clauses apart:
select count(*) from stuff where id_no = '0'
select count(*) from stuff where id_no = '1'
Get the two results and add them together. Because each statement uses the index, execution takes only 3 seconds, and only 4 seconds with 620,000 rows.
Or, better still, write a simple stored procedure:
create proc count_stuff as
declare @a int
declare @b int
declare @c int
declare @d char(10)
begin
select @a = count(*) from stuff where id_no = '0'
select @b = count(*) from stuff where id_no = '1'
end
select @c = @a + @b
select @d = convert(char(10), @c)
print @d
The result is computed directly, and execution is as fast as the split version above!
----Summary: ----
As can be seen, "optimizable" means the WHERE clause can use an index, while "non-optimizable" means a table scan or extra overhead occurs.
1. Any operation on a column leads to a table scan, including database functions, calculation expressions, and so on; when querying, move such operations to the right-hand side of the equals sign whenever possible.
2. IN and OR clauses often use worktables, which invalidates the index; if they do not involve a large number of duplicate values, consider splitting the clauses apart; the split clauses should be able to use an index.
3. Make good use of stored procedures; they make SQL more flexible and efficient.
From these examples we can see that the essence of SQL optimization is, on the premise of correct results, to use statements that the optimizer can recognize, make full use of indexes, reduce the number of I/Os caused by table scans, and avoid table searches as much as possible. In fact, SQL performance optimization is a complex process; the points above are only one reflection of it at the application level. Deeper study also involves resource configuration at the database level, traffic control at the network level, and the overall design at the operating system level.
1. If developers use tables or views from another database, they should create a view in the current database to accomplish the cross-database operation, rather than using "database.dbo.table_name" directly, because sp_depends cannot show the cross-database tables or views used by a stored procedure, which makes verification inconvenient.
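A minimal sketch of this convention, with hypothetical names (otherdb, dbo.customer), might be:
-- in the current database, wrap the cross-database table in a local view
create view v_customer as
select * from otherdb.dbo.customer
go
-- stored procedures then reference v_customer, and sp_depends can report that dependency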

2. Before submitting a stored procedure, the developer must already have analyzed its query plan with set showplan on and done his or her own query optimization check.

3. To achieve high program efficiency and optimize the application, pay attention to the following points when writing stored procedures:

a) SQL usage conventions:

I. Avoid large transactions and use the HOLDLOCK clause carefully, to improve the system's concurrency.

II. Try to avoid repeatedly accessing the same table or tables, especially tables with a lot of data; consider extracting the data into a temporary table according to the conditions first, and then doing the join.

III. Avoid using cursors, because they are inefficient; if a cursor operates on more than 10,000 rows of data, the code should be rewritten. If a cursor must be used, try to avoid table-join operations inside the cursor loop.

IV. Pay attention to how the WHERE clause is written; the order of the conditional clauses must be considered. Determine the order of the conditions according to the index order and range size, keep the field order consistent with the index order as far as possible, and go from large ranges to small ones.

V. Do not perform functions, arithmetic operations, or other expression operations on the left side of "=" in the WHERE clause, or the system may not use the index correctly.

VI. Use exists instead of select count(1) to determine whether a record exists; the count function should only be used to count all the rows of a table, and count(1) is more efficient than count(*) (see the sketch after this list).

VII. Try to use ">=" and do not use ">".

VIII. Note that some OR clauses and UNION clauses can be substituted for one another.

IX. Pay attention to the data types of join columns between tables, and avoid joins between different data types.

X. Note the relationship between parameters and data types in stored procedures.

XI. Pay attention to the amount of data involved in INSERT and UPDATE operations to prevent conflicts with other applications. If the data volume exceeds 200 data pages (400 KB), the system will escalate locking, and page-level locks will be upgraded to a table-level lock.
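As a sketch of point VI above, with a hypothetical orders table and cust_id column:
declare @cust_id char(10)
select @cust_id = 'C001'
-- preferred: exists stops at the first matching row
if exists (select 1 from orders where cust_id = @cust_id)
    print 'customer has orders'
-- avoid: count(1) touches every matching row just to test existence
if (select count(1) from orders where cust_id = @cust_id) > 0
    print 'customer has orders'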

b) Index usage conventions:

I. Index creation should be considered in conjunction with the application; it is recommended that a large OLTP table have no more than 6 indexes.

II. Use indexed fields as query conditions wherever possible, especially the clustered index; if necessary, the index can be forced with index index_name.

III. Avoid table scans when querying large tables, and consider creating new indexes if necessary.

IV. When using an indexed field as a condition, if the index is a composite index, the first field of the index must be used as a condition to guarantee that the system uses the index; otherwise the index will not be used.

V. Pay attention to index maintenance: periodically rebuild indexes and recompile stored procedures (a maintenance sketch follows this list).
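For point V, routine maintenance might be scripted roughly as follows, using the record table from the earlier examples; update statistics and sp_recompile exist in both Sybase and SQL Server, while the exact index-rebuild command depends on the server version and is omitted here:
-- refresh the distribution statistics that the optimizer relies on
update statistics record
go
-- mark stored procedures referencing the table for recompilation at next execution
exec sp_recompile 'record'
go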

c) tempdb usage conventions:

I. Try to avoid using distinct, order by, group by, having, join, and compute, because these statements add to the burden on tempdb.

II. Avoid frequently creating and dropping temporary tables, to reduce the consumption of system table resources.

III. When creating a temporary table, if the amount of data inserted at one time is large, you can use select into instead of create table, to avoid logging and improve speed; if the amount of data is small, to ease the load on the system tables, it is recommended to create table first and then insert (see the sketch after this list).

IV. If a temporary table holds a large amount of data and needs an index, then the creation of the temporary table and its index should be placed in a separate child stored procedure, so that the system can make good use of the temporary table's index.

V. If temporary tables are used, be sure to explicitly delete all of them at the end of the stored procedure: first truncate table, then drop table; this avoids locking the system tables for a long time.

VI. Be careful when joining large temporary tables with other large tables for queries or modifications, to reduce the burden on the system tables, because such an operation uses the system tables of tempdb many times within one statement.
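As a sketch of points III and V, with a hypothetical temporary table #big_detail filled from the record table used earlier:
-- large insert: select into creates and fills the temporary table with minimal logging
select card_no, amount
into #big_detail
from record
where date > '19991201'
-- ... work with #big_detail here ...
-- explicit cleanup at the end of the procedure: truncate first, then drop
truncate table #big_detail
drop table #big_detail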

d) Reasonable algorithm use:
Based on the SQL optimization points described above and the SQL optimization content in the ASE tuning manual, compare different combinations of approaches to arrive at the method with the least consumption and the highest efficiency. The specific ASE tuning commands available include: set statistics io on, set statistics time on, set showplan on, and so on.
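For example, to compare two candidate forms of the stuff query from earlier, a session might be wrapped as follows (a sketch; the reported figures depend on the actual data):
set statistics io on
set statistics time on
go
-- candidate 1: in clause
select count(*) from stuff where id_no in ('0', '1')
go
-- candidate 2: split statements
select count(*) from stuff where id_no = '0'
select count(*) from stuff where id_no = '1'
go
set statistics io off
set statistics time off
go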
