SQL Optimization Problems

Last Update:2018-12-05 Source: Internet

Author: User


Until now I have mostly been at the stage of writing SQL that solves the problem and produces results. Recently my colleagues have started paying attention to slow SQL, so today I want to look into this issue more closely.

I have collected some material on it:

People often fall into a trap when writing SQL: they focus so much on whether the results are correct that they ignore the performance differences between different ways of writing the same query. These differences are especially large in big or complex database environments, such as online transaction processing (OLTP) or decision support systems (DSS). In my work I have found that poorly performing SQL statements usually come from inappropriate index design, incomplete join conditions, and WHERE clauses that cannot be optimized. Once these are properly optimized, the statements run significantly faster. I summarize the three aspects below.

To make the points concrete, the running time of every example SQL statement was measured; a running time under 1 second is written as (<1 second).

Test environment:
Host: HP LH II
Clock speed: 330 MHz
Memory: 128 MB
Operating system: OpenServer 5.0.4
Database: Sybase 11.0.3

I. Unreasonable index design
For example, take a table named record with 620,000 rows. Below are the running times of the same SQL statements under different indexes:

1. A non-clustered index on date

select count(*) from record where date > '19991201' and date < '19991214' and amount > 2000 (25 seconds)
select date, sum(amount) from record group by date (55 seconds)
select count(*) from record where date > '19990901' and place in ('BJ', 'SH') (27 seconds)

Analysis:
The date column contains a large number of duplicate values. With a non-clustered index, the rows are stored on the data pages in no particular order, so a range search has to perform a table scan to find all the rows in the range.

2. A clustered index on date

select count(*) from record where date > '19991201' and date < '19991214' and amount > 2000 (14 seconds)
select date, sum(amount) from record group by date (28 seconds)
select count(*) from record where date > '19990901' and place in ('BJ', 'SH') (14 seconds)

Analysis:
With a clustered index, the rows are physically stored on the data pages in key order, and duplicate values are grouped together. A range search can therefore first locate the start and end of the range and then scan only the data pages within it. This avoids a large-scale scan and speeds up the query.
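A toy model makes the difference concrete. The sketch below is a simplification, assuming fixed-size pages of 100 rows and a date column with many duplicates; it counts how many distinct data pages a range search must read when rows are stored in date order (as under a clustered index) versus in arbitrary order (as in a table without one). It does not model index pages or Sybase's real storage format.

```python
import random

ROWS_PER_PAGE = 100   # assumed page capacity (illustrative only)
DISTINCT_DATES = 100  # 100 distinct date values...
COPIES = 100          # ...each repeated 100 times, so many duplicates


def pages_touched(dates_in_storage_order, low, high):
    """Count distinct data pages holding at least one row with low <= date <= high."""
    pages = set()
    for position, date in enumerate(dates_in_storage_order):
        if low <= date <= high:
            pages.add(position // ROWS_PER_PAGE)
    return len(pages)


rows = [d for d in range(DISTINCT_DATES) for _ in range(COPIES)]

clustered = sorted(rows)          # physical order follows the date key
heap = rows[:]                    # physical order unrelated to the date key
random.Random(42).shuffle(heap)

print("clustered:", pages_touched(clustered, 10, 19))  # clustered: 10
print("heap:", pages_touched(heap, 10, 19))
```

With date-ordered storage the range maps to 10 contiguous pages out of 100; with random placement nearly every page holds at least one matching row.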

3. A composite index on (place, date, amount)

select count(*) from record where date > '19991201' and date < '19991214' and amount > 2000 (26 seconds)
select date, sum(amount) from record group by date (27 seconds)
select count(*) from record where date > '19990901' and place in ('BJ', 'SH') (<1 second)

Analysis:
This is an unreasonable composite index. Its leading column is place, and the first and second statements do not reference place, so they cannot use the index. The third statement does use place, and every column it references is included in the composite index, so the index covers the query and it runs very fast.

4. A composite index on (date, place, amount)
select count(*) from record where date > '19991201' and date < '19991214' and amount > 2000 (<1 second)
select date, sum(amount) from record group by date (11 seconds)
select count(*) from record where date > '19990901' and place in ('BJ', 'SH') (<1 second)

Analysis:
This is a reasonable composite index. Because date is the leading column, every statement can use the index, and the first and third statements are fully covered by it, so performance is best.

5. Conclusion:

The index created by default is a non-clustered index, but that is not always the best choice. A reasonable index design should be based on analyzing and anticipating the queries that will be run. Generally speaking:
① For columns that contain many duplicate values and are frequently used in range queries (between, >, <, >=, <=) or in order by and group by, consider a clustered index;

② When several columns are frequently accessed together and each contains duplicate values, consider a composite index;

③ A composite index should cover the key queries as far as possible, and its leading column must be the most frequently used column.
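The leading-column rule can be observed with Python's built-in sqlite3 module. SQLite is not Sybase and its optimizer differs, so treat this as a sketch under that assumption: it builds a hypothetical empty record table, creates one composite index at a time, and prints SQLite's EXPLAIN QUERY PLAN output. A plan step starting with SEARCH seeks into the index; SCAN reads the whole table or index. No timings are reproduced.

```python
import sqlite3

# Hypothetical schema modeled on the article's record table (no rows needed).
DDL = "CREATE TABLE record (date TEXT, place TEXT, amount INTEGER)"

Q1 = ("SELECT count(*) FROM record "
      "WHERE date > '19991201' AND date < '19991214' AND amount > 2000")
Q3 = ("SELECT count(*) FROM record "
      "WHERE date > '19990901' AND place IN ('BJ', 'SH')")


def plan(index_columns: str, query: str) -> str:
    """Return SQLite's plan for `query` when one composite index exists."""
    conn = sqlite3.connect(":memory:")
    conn.execute(DDL)
    conn.execute(f"CREATE INDEX idx ON record ({index_columns})")
    rows = conn.execute("EXPLAIN QUERY PLAN " + query).fetchall()
    conn.close()
    # The last column of each plan row is the human-readable step description.
    return " | ".join(row[-1] for row in rows)


for columns in ("place, date, amount", "date, place, amount"):
    print(f"index ({columns})")
    print("  Q1:", plan(columns, Q1))
    print("  Q3:", plan(columns, Q3))
```

With (place, date, amount), the date-range query falls back to scanning while the place query can seek; with (date, place, amount), both queries can seek.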

II. Incomplete join conditions
For example, the table card has 7,896 rows with a non-clustered index on card_no, and the table account has 191,122 rows with a non-clustered index on account_no. Consider how two SQL statements execute under different join conditions:

select sum(a.amount) from account a, card b where a.card_no = b.card_no (20 seconds)

Change the SQL to:
select sum(a.amount) from account a, card b where a.card_no = b.card_no and a.account_no = b.account_no (<1 second)

Analysis:
With the first join condition, the best plan uses account as the outer table and card as the inner table. The number of I/Os can be estimated with the following formula:

22,541 pages of the outer table account + (191,122 rows of account × 3 pages read in the inner table card for each outer row) = 595,907 I/Os

With the second join condition, the best plan uses card as the outer table and account as the inner table. The number of I/Os can be estimated with the following formula:

1,944 pages of the outer table card + (7,896 rows of card × 4 pages read in the inner table account for each outer row) = 33,528 I/Os

So only with the complete set of join conditions can the optimizer choose the best plan.
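The estimate above is outer-table pages plus outer-table rows times inner-table pages read per lookup. A minimal sketch of that arithmetic using the page and row counts from the example (the function name is hypothetical, and a real optimizer weighs more factors than this):

```python
def nested_loop_io(outer_pages: int, outer_rows: int, inner_pages_per_row: int) -> int:
    """Estimated I/O: read every outer page once, then probe the inner
    table's index once per outer row."""
    return outer_pages + outer_rows * inner_pages_per_row


# Join on card_no only: account is the outer table, card the inner table.
account_outer = nested_loop_io(22541, 191122, 3)
# Join on card_no and account_no: card is the outer table, account the inner.
card_outer = nested_loop_io(1944, 7896, 4)

print(account_outer)  # 595907
print(card_outer)     # 33528
print("cheaper plan:", "card outer" if card_outer < account_outer else "account outer")
```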

Summary:

1. Before a multi-table operation is executed, the query optimizer lists several possible join plans based on the join conditions and picks the one with the lowest system cost. Join conditions should take full account of which tables have indexes and which have many rows. The choice of outer and inner table follows the formula: number of matching rows in the outer table × number of pages read in the inner table per lookup; the plan with the smallest product is best.

2. To view the execution plan, run set showplan on to turn on the showplan option; you will see the join order and which indexes are used.
For more detail, run dbcc traceon(3604, 310, 302) under the sa role.

III. WHERE clauses that cannot be optimized
1. For example, the columns in the conditions of the following SQL statements all have appropriate indexes, yet the statements run very slowly:

select * from record where substring(card_no, 1, 4) = '5378' (13 seconds)
select * from record where amount/30 < 1000 (11 seconds)
select * from record where convert(char(10), date, 112) = '19991201' (10 seconds)

Analysis:
Any operation applied to a column in the WHERE clause is evaluated row by row while the SQL runs, so the optimizer has to perform a table scan instead of using the index on that column. If the condition can instead be resolved when the query is compiled, the optimizer can use the index and avoid the table scan. The statements can therefore be rewritten as follows:

select * from record where card_no like '5378%' (<1 second)
select * from record where amount < 1000*30 (<1 second)
select * from record where date = '1999/12/01' (<1 second)
(The last rewrite is equivalent only if date stores no time-of-day portion.)
You will find that the SQL runs much faster!
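The same effect can be sketched with Python's built-in sqlite3 module, assuming a hypothetical record table with indexes on card_no and amount. SQLite's planner differs from Sybase's, and its LIKE prefix optimization has extra collation requirements, so the card_no prefix test is written here as an explicit range that is equivalent to like '5378%' under binary string comparison:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table; only the indexed columns matter for the plans below.
conn.execute("CREATE TABLE record (card_no TEXT, amount INTEGER, note TEXT)")
conn.execute("CREATE INDEX idx_card_no ON record (card_no)")
conn.execute("CREATE INDEX idx_amount ON record (amount)")


def plan(where: str) -> str:
    """Return SQLite's plan for SELECT * FROM record WHERE <where>."""
    rows = conn.execute("EXPLAIN QUERY PLAN SELECT * FROM record WHERE " + where).fetchall()
    return " | ".join(row[-1] for row in rows)


# The column is wrapped in an expression: the index cannot be used.
print(plan("substr(card_no, 1, 4) = '5378'"))
print(plan("amount / 30 < 1000"))
# The bare column stands alone on one side: the index supports a range search.
print(plan("card_no >= '5378' AND card_no < '5379'"))
print(plan("amount < 1000 * 30"))
```

The first two plans start with SCAN (a full table scan); the rewritten conditions start with SEARCH on the corresponding index.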

2. For example, the stuff table has 200,000 rows and a non-clustered index on id_no. Consider the following SQL statement:

select count(*) from stuff where id_no in ('0', '1') (23 seconds)

Analysis:
The in in the where condition is logically equivalent to or, so the parser converts in ('0', '1') into id_no = '0' or id_no = '1' for execution. We would expect it to find the rows for each or clause separately, using the index on id_no, and then add up the results. In fact (according to showplan) it uses the "OR strategy": it first fetches the rows that satisfy each or clause, stores them in a worksheet in the temporary database, builds a unique index on the worksheet to remove duplicate rows, and finally computes the result from this temporary table. So the index on id_no is not actually used, and the completion time also depends on the performance of tempdb. In practice, the more rows the table has, the worse the worksheet performs: with 620,000 rows in stuff, the execution time reached 220 seconds! It is better to split the or clause:

select count(*) from stuff where id_no = '0'
select count(*) from stuff where id_no = '1'

Adding up the two results is much more efficient: because each statement uses the index, execution takes only 3 seconds, and only 4 seconds with 620,000 rows. Better still, write a simple stored procedure:
create proc count_stuff
as
begin
    declare @a int, @b int, @c int
    declare @d char(10)
    /* count each value separately so that each query can use the index on id_no */
    select @a = count(*) from stuff where id_no = '0'
    select @b = count(*) from stuff where id_no = '1'
    select @c = @a + @b
    select @d = convert(char(10), @c)
    print @d
end

This computes the result directly and runs just as fast as the split queries!
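Splitting the in list is only safe because the two branches are disjoint: no row can have id_no equal to both '0' and '1', so adding the counts cannot double count. A minimal sketch checking that equivalence with Python's sqlite3 module on hypothetical sample data (it demonstrates correctness only, not the Sybase worksheet behavior or its timings):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stuff (id_no TEXT, payload TEXT)")
conn.execute("CREATE INDEX idx_id_no ON stuff (id_no)")
# Hypothetical sample data: id_no cycles through '0'..'4'.
conn.executemany("INSERT INTO stuff VALUES (?, ?)",
                 [(str(i % 5), f"row {i}") for i in range(1000)])


def count_where(where: str, params: tuple = ()) -> int:
    sql = "SELECT count(*) FROM stuff WHERE " + where
    return conn.execute(sql, params).fetchone()[0]


combined = count_where("id_no IN ('0', '1')")
# Split form: one equality lookup per value, summed by the caller.
split = sum(count_where("id_no = ?", (value,)) for value in ("0", "1"))

print(combined, split)  # 400 400
```

For overlapping or conditions (for example on two different columns), summing the split counts would count some rows twice.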
Summary:

As these examples show, when a WHERE clause is written so that it cannot use an index and cannot be optimized, a table scan or other extra overhead results.

1. Any operation on a column, including database functions and arithmetic expressions, causes a table scan. When writing a query, move the operation to the other side of the comparison so that the column stands alone.

2. in and or clauses often use worksheets, which prevents the index from being used. If the values do not have a large number of duplicates, consider splitting the clause; each split clause should be able to use the index.

3. Make good use of stored procedures to make SQL more flexible and efficient.

From the examples above, the essence of SQL optimization is to write statements the optimizer can recognize, make full use of indexes, reduce the number of I/Os spent scanning tables, and avoid table scans wherever possible. In fact, SQL performance tuning is a complex process. What is described above is only the application layer; deeper work also involves resource configuration at the database layer, traffic control at the network layer, and the overall design of the operating system layer.

SQL Server has several tools that let you detect, tune, and optimize SQL Server performance. In this article, I describe how to use SQL Server tools to optimize the use of database indexes. The article also covers general knowledge about indexes.
General knowledge about indexes

Indexes are the biggest factor affecting database performance. Because the subject is complex, I can only touch on it briefly here, but there are several good books you can refer to. I discuss only two types of SQL Server index: clustered indexes and nonclustered indexes. When deciding what type of index to create, consider the data type and the column that stores the data. Likewise, you must consider the types of queries the database is likely to run.
Index types
If a column stores highly correlated data and is frequently accessed in sequence, a clustered index works best, because with a clustered index SQL Server physically sorts the data in ascending (the default) or descending order, so it can quickly find the rows a query asks for. Likewise, a clustered index is best for a column whose searches are restricted to a range. Note that each table can have only one clustered index, because the data can be physically ordered in only one way.
Conversely, if a column contains poorly correlated data, you can use a nonclustered index. A table can have up to 249 nonclustered indexes (999 from SQL Server 2008 onward), although it is hard to imagine needing that many in practice.
When a table has a primary key, SQL Server by default automatically creates a unique clustered index on the primary key column(s). The unique index is what guarantees that the primary key values are unique. When establishing a foreign key relationship that you plan to use frequently, creating a nonclustered index on the foreign key column is a good approach. If the table has a clustered index, it uses a linked list to maintain the relationship between data pages. If the table does not have a clustered index, SQL Server stores the data pages in a heap.
Data pages
When an index is created, SQL Server creates index pages holding the pointers used to speed up searches. A fill factor is set at the same time; it specifies how full each page of the index is made. Over time, database updates consume the remaining free space and cause pages to split. Page splits reduce index performance, and queries that use the index then read fragmented data. Because the fill factor is applied only when the index is created, it is not maintained dynamically. To restore the fill factor on the pages, drop the old index, recreate it, and set the fill factor again. (Note: this affects the operation of the running database and should be done with caution on important systems.)
DBCC INDEXDEFRAG and DBCC DBREINDEX are two commands for removing fragmentation from clustered and nonclustered indexes. INDEXDEFRAG is an online operation (it does not block other activity on the table, such as queries), whereas DBREINDEX physically rebuilds the index. In most cases rebuilding removes fragmentation more effectively, but at the cost of blocking other activity on the table. When an index is heavily fragmented, INDEXDEFRAG can take a long time, because it runs in small transactional blocks.
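A simplified model of what the fill factor controls, assuming fixed-size rows and a page capacity measured in rows (real SQL Server pages are 8 KB with headers and variable-length rows). The function name and numbers are illustrative only:

```python
import math


def index_layout(rows: int, rows_per_full_page: int, fill_factor: int):
    """Return (pages needed, free row slots per page) when an index is
    built with the given fill factor (percent of each page filled)."""
    if not 1 <= fill_factor <= 100:
        raise ValueError("fill factor must be between 1 and 100")
    rows_per_page = max(1, rows_per_full_page * fill_factor // 100)
    pages = math.ceil(rows / rows_per_page)
    free_slots = rows_per_full_page - rows_per_page
    return pages, free_slots


print(index_layout(10_000, 100, 100))  # (100, 0): an insert into a full page forces a split
print(index_layout(10_000, 100, 80))   # (125, 20): room for 20 more rows per page
```

A lower fill factor trades more pages (more I/O for scans) for fewer page splits as rows are inserted.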
Fill factor
When you take any of the measures above, the database engine can return indexed data more efficiently. The topic of fill factor is beyond the scope of this article, but it is still worth keeping in mind when you create indexes.
When executing a query, SQL Server dynamically chooses which index to use, based on the distribution statistics it keeps on each index's key. Note that after routine database activity (inserts, deletes, and updates), these statistics may be out of date and need to be updated. You can view an index's statistics with DBCC SHOW_STATISTICS (DBCC SHOWCONTIG, by contrast, reports fragmentation). When the statistics are out of date, run UPDATE STATISTICS on the table so that SQL Server refreshes its information about the index.
Create a database maintenance plan
SQL Server provides a tool that simplifies and automates database maintenance, the Database Maintenance Plan Wizard (DMPW), which also covers index optimization. If you run the wizard, you will see that the index statistics in the database are logged and updated on a regular schedule, which reduces the work of rebuilding indexes by hand. If you do not want index statistics refreshed automatically on a schedule, you can instead choose in DMPW to reorganize data and data pages, which drops the old indexes and recreates them with a specified fill factor.

Suppose you want to find a sentence in a book. You could search page by page, but that takes a lot of time. Using the book's index, you can quickly find the topic you are looking for.
The index of a table is very similar to the index at the back of a book, and it can greatly improve query speed. For a large table, a query that would take hours without an index can often finish in seconds with one. There is therefore no reason not to index a table that is queried frequently.
Note:
When memory or disk space is insufficient, you may not want to add an index to a table. Indexes require a considerable amount of extra space in SQL Server; for example, creating a clustered index needs about 1.2 times the size of the data. To see how much space a table occupies, run the system stored procedure sp_spaceused with the name of the table to be indexed.

Clustered and non-clustered indexes
Suppose you have used the book's index to find the page number of a sentence. Once you know the page number, you might leaf through the book at random until you reach that page. Random searching will eventually get you there, but there is a more efficient way.
First, open the book at about the halfway point. If the page you want is lower than the page number there, turn to the one-quarter point; otherwise, turn to the three-quarter point. Keep dividing the book into smaller parts this way until you reach the correct page. This is a very efficient way to find a page in a book.
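The halving procedure is binary search. A minimal sketch, assuming pages numbered 1 to page_count; the function name is hypothetical:

```python
def opens_needed(page_count: int, target: int) -> int:
    """Count how many times the book is opened to reach page `target`
    (pages numbered 1..page_count) by always opening the middle of the
    range that can still contain it."""
    if not 1 <= target <= page_count:
        raise ValueError("target page is not in the book")
    low, high = 1, page_count
    opens = 0
    while True:
        opens += 1
        middle = (low + high) // 2      # open the book halfway through the range
        if middle == target:
            return opens
        if target < middle:
            high = middle - 1           # the page is in the front part
        else:
            low = middle + 1            # the page is in the back part


worst = max(opens_needed(1000, page) for page in range(1, 1001))
print(worst)  # 10
```

For a 1,000-page book no page needs more than 10 openings, because each opening halves the remaining range.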
SQL Server table indexes work in a similar way. A table index consists of a set of pages that form a tree structure. By pointing to two other pages, the root page logically divides the table's records into two parts; each of those pages divides its records into smaller parts, and so on, until the leaf-level pages are reached. (In practice, each page can point to many child pages, not just two.)
There are two types of index: clustered and non-clustered. In a clustered index, the leaf pages of the index tree contain the actual data, and the records are stored in the same physical order as the index order. In a non-clustered index, the leaf-level pages point to the records in the table, and the physical order of the records is not necessarily related to the logical order.
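A toy two-level index can illustrate the difference between the two kinds of leaf level. This is a hypothetical sketch, assuming twelve integer keys, three entries per leaf page, and a single root page of separator keys; it counts page reads for one equality lookup. A real SQL Server B-tree has many more entries per page and possibly more levels:

```python
from bisect import bisect_right

# Hypothetical index over keys 1..12, three entries per leaf page.
LEAVES = [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]]
ROOT = [leaf[0] for leaf in LEAVES[1:]]  # separator keys: [4, 7, 10]

# Heap: the table's rows in an arbitrary physical order.
HEAP = [f"row {k}" for k in (7, 2, 11, 5, 9, 1, 12, 4, 8, 3, 10, 6)]
LOCATOR = {int(row.split()[1]): slot for slot, row in enumerate(HEAP)}


def lookup(key: int, clustered: bool):
    """Return (row, pages_read) for an equality lookup through the index."""
    pages_read = 1                      # read the root page
    leaf_no = bisect_right(ROOT, key)   # choose which child page to follow
    pages_read += 1                     # read that leaf page
    if key not in LEAVES[leaf_no]:
        return None, pages_read
    if clustered:
        # Clustered: the leaf level holds the data row itself.
        return f"row {key}", pages_read
    # Non-clustered: the leaf holds only a pointer, so fetch the row from the heap.
    return HEAP[LOCATOR[key]], pages_read + 1


print(lookup(5, clustered=True))   # ('row 5', 2)
print(lookup(5, clustered=False))  # ('row 5', 3)
```

Both lookups read the root and one leaf; the non-clustered lookup needs one extra read to fetch the row the leaf points to.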
A clustered index is very similar to a book's table of contents, whose order matches the actual page order. A non-clustered index is more like the index at the back of a book, whose order usually differs from the actual page order. A book may have several indexes, for example both a subject index and an author index. Similarly, a table can have several non-clustered indexes.
Generally, you use clustered indexes, but you should understand the advantages and disadvantages of both types of indexes.
Each table can have only one clustered index, because the records in a table can be stored in only one physical order. Usually you create a table's clustered index on its identity field, but you can also create a clustered index on other types of fields, such as character, numeric, and datetime fields.
Retrieving data through a clustered index is faster than through a non-clustered index, and when you need to retrieve a range of data, a clustered index is better than a non-clustered one. For example, suppose you use a table to record visitor activity on your site. If you want to retrieve login information for a certain period, you should create a clustered index on the table's datetime field.
The main limitation of clustered indexes is that each table can have only one. A table can, however, have more than one non-clustered index; in fact, you can create up to 249 non-clustered indexes per table (999 from SQL Server 2008 onward). You can also create a clustered index and non-clustered indexes on the same table.
Suppose you want to retrieve data from your site activity log not only by date but also by user name. In this case it is effective to create both kinds of index: a clustered index on the date/time field and a non-clustered index on the user name field. If you find you need more ways to search, you can add more non-clustered indexes.
Non-clustered indexes require a large amount of disk space and memory. In addition, although non-clustered indexes speed up retrieving data from a table, they slow down inserting and updating data in it: whenever you change data in a table that has a non-clustered index, the index must be updated at the same time. So think carefully before creating a non-clustered index. If you expect a table to be updated frequently, do not create too many non-clustered indexes on it. Likewise, if disk and memory space are limited, limit the number of non-clustered indexes.

Index attributes
Both types of index have two important attributes: either type can be created on several fields at once (a composite index), and either type can be specified as a unique index.
You can create a composite index, even a composite clustered index, on several fields. Suppose a table records the last names and first names of visitors to your site. If you want to retrieve rows by full name, you need an index on both the last-name field and the first-name field together. This is different from creating a separate index on each field. When you query on more than one field at the same time, create an index on those fields together; if you query on each field separately, create a separate index for each field.
Both types of indexes can be specified as unique indexes. If you create a unique index for a field, you cannot enter duplicate values for the field. An ID field automatically becomes a unique value field, but you can also create a unique index for other types of fields. Assume that you use a table to save the user password of your website. Of course, you do not want two users to have the same password. By forcing a field to become a unique value field, you can prevent this situation.
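A unique index can be tried out with Python's sqlite3 module. In this sketch the table site_user and the column login_name are hypothetical; the second insert of the same value is rejected with an IntegrityError, and the table keeps only one row:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE site_user (login_name TEXT, full_name TEXT)")
conn.execute("CREATE UNIQUE INDEX login_name_uindex ON site_user (login_name)")
conn.execute("INSERT INTO site_user VALUES ('alice', 'Alice A.')")

try:
    conn.execute("INSERT INTO site_user VALUES ('alice', 'Another Alice')")
    rejected = False
except sqlite3.IntegrityError as exc:
    rejected = True          # the unique index refused the duplicate value
    print("rejected:", exc)

count = conn.execute("SELECT count(*) FROM site_user").fetchone()[0]
print(count)  # 1
```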

Create an index using SQL
To create an index on a table, start ISQL/w from the SQL Server program group on the taskbar. In the query window, enter the following statement:

create index mycolumn_index on mytable (mycolumn)

This statement creates an index named mycolumn_index. You can give an index any name, but you should include the field name of the index in the index name, which is helpful for you to figure out the intention of creating the index in the future.
Note:
After the command executes, you receive the following message:
This command did not return data, and it did not return any rows
This means the statement succeeded.
The index mycolumn_index is created on the mycolumn field of table mytable. It is a non-clustered, non-unique index (the default attributes of an index).
If you want to change the index type, you must delete the original index and recreate it. After an index is created, you can use the following SQL statement to delete it:

drop index mytable.mycolumn_index

Note that you must include the table name in the drop index statement. In this example, the index you deleted is mycolumn_index, which is the index of mytable.
To create a clustered index, you can use the keyword clustered. Remember that a table can only have one clustered index.
Here is an example of how to create a clustered index for a table:

Create clustered index mycolumn_clust_index on mytable (mycolumn)

If the table contains duplicate rows, an error occurs when you try to create a clustered index with this statement. You can still create an index on a table with duplicate rows, however; you just need to tell SQL Server with the keyword allow_dup_row (an option of early SQL Server versions that later versions no longer support):

Create clustered index mycolumn_cindex on mytable (mycolumn) with allow_dup_row

This statement creates a clustered index that allows duplicate rows. You should try to avoid duplicate rows in a table, but if they already exist, you can use this method.
To create a unique index for a table, you can use the keyword unique. This keyword can be used for clustered indexes and non-clustered indexes. Here is an example:

create unique clustered index mycolumn_cindex on mytable (mycolumn)

This is the index creation statement that you will frequently use. Whenever possible, you should try to create a unique clustered index for a table to enhance query operations.
Finally, to create an index on several fields (a composite index), list all the field names in the index creation statement. The following example creates an index on the firstname and lastname fields:

Create index name_index on username (firstname, lastname)

In this example, a single index is created on two fields. A composite index can include up to 16 fields.

Create an index using SQL Enterprise Manager
Creating an index with SQL Enterprise Manager is much easier than with SQL statements. It lets you view the list of existing indexes and choose index options through a graphical interface.
You can create an index with SQL Enterprise Manager in two ways: in the Manage Tables window or in the Manage Indexes window.
To create a new index in the Manage Tables window, click the Advanced Options button (it looks like a table with a plus sign in front of it). This opens the Advanced Options dialog box, which has a tab labeled Primary Key.

To create a new index, select the field you want to index from the drop-down list. To index several fields, select several field names. You can also choose whether the index is clustered or non-clustered. When you save the table information, the index is created automatically, and a key icon appears next to the field name in the Manage Tables window.
You have now created a "primary index" for your table. A primary index must be created on fields that contain no null values, and it forces the field to contain unique values.
To create an index without these restrictions, use the Manage Indexes window. Choose Manage | Indexes from the menu to open it. In the Manage Indexes window, you can select a table and a specific index from the drop-down lists (see Figure 11.2). To create a new index, select New Index from the Index drop-down box, and then select the field to be indexed. Click Add to add the field to the index.

You can select many different options for your index. For example, you can choose whether the index is clustered or non-clustered. You can also specify this index as a unique index. After the index is designed, click build to create the index.

Note:
Making an index unique means that the field cannot contain duplicate values; it does not mean that this is the only index you can create.

Maintain the index of a table in SQL Server

Step 1: Check whether maintenance is needed by checking whether Scan Density is 100%.

declare @table_id int
set @table_id = object_id('table_name')
dbcc showcontig (@table_id)

Step 2: Rebuild the table's primary key index

dbcc dbreindex ('table_name', 'pk_index_name', 100)

Repeat step 1. If Scan Density is still below 100%, rebuild all the indexes on the table.

-- YANG Jing: it will not necessarily reach 100%.

dbcc dbreindex ('table_name', '', 100)
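The Scan Density figure checked in step 1 is reported by DBCC SHOWCONTIG as Best Count : Actual Count, where Best Count is the ideal number of extent changes and Actual Count is the observed number. The sketch below approximates that calculation from a list of leaf page numbers in key order, assuming 8-page extents and treating Actual Count as extent switches plus one; it is an illustration, not SQL Server's exact algorithm:

```python
import math

PAGES_PER_EXTENT = 8  # SQL Server allocates pages in 8-page extents


def scan_density(leaf_pages_in_key_order):
    """Approximate Scan Density [Best Count:Actual Count] as a percentage.

    leaf_pages_in_key_order: physical page numbers of the index's leaf
    pages, listed in logical (key) order.
    """
    extents = [p // PAGES_PER_EXTENT for p in leaf_pages_in_key_order]
    switches = sum(1 for a, b in zip(extents, extents[1:]) if a != b)
    best = math.ceil(len(extents) / PAGES_PER_EXTENT)
    actual = switches + 1
    return 100.0 * best / actual


contiguous = list(range(16))  # pages 0-15: two extents read in order
fragmented = [0, 8, 1, 9, 2, 10, 3, 11, 4, 12, 5, 13, 6, 14, 7, 15]
print(scan_density(contiguous))  # 100.0
print(scan_density(fragmented))  # 12.5
```

Reading the fragmented layout in key order jumps between the two extents on every page, which is why rebuilding the index (step 2) raises the figure back toward 100%.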

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
