After much hard work, you and your team finally launched the website. At first there were few registered users and performance was good, but as registrations grew, page loads began to slow down. Some users started emailing complaints, and things kept getting worse. To keep your users, you began investigating the cause of the slowdown.
After an intense investigation, you find that the problem lies in the database. When the application tries to read or update data, the database responds very slowly. Digging further, you find that the tables have grown enormously, some to tens of millions of rows. The test team ran tests against the production database and found that submitting an order now takes 5 minutes, whereas in pre-launch testing it took only 2-3 seconds.
Similar stories play out every day all over the world, and almost every developer runs into one at some point in their career. I have faced such situations many times myself, so I would like to share my experience in solving these problems with you.
If you find yourself in such a project, running away is not an option; you have to face it bravely. First of all, I believe your application was not written with optimal data access code. In this series of articles I will show how to write data access code well, and how to optimize the data access code you already have.
Scope
Before we start, it is worth clarifying the scope of this series. What I want to discuss is optimizing data access performance in transactional (OLTP) SQL Server databases; however, most of the techniques described here apply to other database platforms as well.
The techniques I introduce are aimed mainly at programmers. Although DBAs are also a major force in database optimization, the methods DBAs use are outside the scope of this discussion.
When a database-driven application runs slowly, in 90% of cases the cause is the data access code: either it has not been optimized or it was not written in the best way. So you need to review and optimize your data access/processing code.
I will walk through 10 steps for optimizing data access code. Let's start with the basics: indexing!
Step 1: Apply the correct index
I start with indexes because applying the correct indexes brings the quickest performance improvement to a production system. Another reason is that creating or modifying an index happens entirely in the database, requires no program changes, and the results are visible immediately.
Let's review some indexing basics. I believe you already know what an index is, but many people still don't fully understand it, so let me start with a story.
A long time ago, the great library of an ancient city held thousands of books, but the books were shelved in no particular order. Whenever someone asked for a book, the librarian had to check the shelves one by one, and each search took a great deal of time.
[This is like a data table without a primary key: when searching for data, the database engine must perform a full table scan, which is extremely inefficient.]
Worse, as the library acquired more and more books, the librarian's job became very painful. One day a clever young man, seeing the librarian's plight, suggested a solution: number each book and shelve the books by number. Then, whenever someone asked for a book by its number, the librarian could locate it quickly.
[This is just like creating a primary key on a table: a clustered index tree is built when the primary key is created, and all rows in the table are physically sorted in the file system by primary key value. When any row is queried, the database engine first walks the clustered index tree to find the corresponding data page (like finding the right bookshelf), then locates the target row on that page by its primary key value (like finding the book on the shelf).]
So the librarian numbered the books and shelved them by number. It took him a whole day, but afterwards he found that looking up a book had become much faster.
[Only one clustered index can be created for a table, just as books can only be placed according to one rule.]
But the problem wasn't completely solved, because many people couldn't remember a book's number, only its title. For those requests the librarian again had to scan the shelves one by one. It now took him only about 20 minutes, versus the 2-3 hours it had taken before the books were numbered, but compared with looking a book up by number it was still far too slow. So he went back to the clever young man for help.
[This is as if you have added a primary key ProductID to the Product table but created no other indexes. When you search by product name, the database engine has to scan the entire table, row by row.]
The clever young man told the librarian that since the books were already numbered, he only needed to create another index, a catalog that stored each book's title together with its number, sorted this time by title. If someone wanted the book "Database Management System", he could jump straight to the "D" section of the catalog, get the number, and then fetch the book by number.
So the librarian excitedly spent a few hours building a title catalog. After testing, looking up a book now took only 1 minute (30 seconds to find the number in the title catalog, and 30 seconds to fetch the book by number).
The librarian then thought further: readers might also look for books by other attributes, such as the author. So he built an author catalog in the same way. Now any book could be found within a minute by number, by title, or by author. The librarian's work became easy, and the story ends here.
At this point I believe you fully understand the true meaning of an index. Suppose we have a Products table with a clustered index (created automatically from the table's primary key), and we also create a non-clustered index on the ProductName column. When the non-clustered index is created, the database engine automatically builds an index tree for it (like the title catalog in the story): product names are stored on index pages, each page holding a range of product names together with their corresponding primary key values. When a search is done by product name, the database engine first walks the non-clustered index tree to find the primary key value, then uses that primary key to walk the clustered index tree and find the actual row.
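As a sketch (the table and index names here are illustrative), the "title catalog" of the story corresponds to a non-clustered index like this:

CREATE NONCLUSTERED INDEX IX_Products_ProductName  -- hypothetical index name
ON dbo.Products (ProductName)                      -- index on the column used for searching

A query such as SELECT * FROM dbo.Products WHERE ProductName = 'Database Management System' can then seek the non-clustered index tree instead of scanning the whole table.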
Figure 1 below shows the structure of an index tree.
Figure 1 Index Tree Structure
This is called a B+ tree (a balanced tree). The intermediate nodes contain ranges of values and guide the SQL engine toward a specific index value; the leaf nodes contain the actual index values. In a clustered index tree, the leaf nodes are the physical data pages themselves; in a non-clustered index tree, the leaf nodes contain the index value plus the clustered index key (which the database engine uses to find the corresponding row in the clustered index tree).
Usually, finding a target value in the index tree and jumping to the actual row takes very little time, so indexes generally speed up data retrieval. The following steps will help you apply indexes correctly.
Make sure that each table has a primary key.
This ensures that each table has a clustered index (so the table's physical storage on disk is ordered by the primary key). Retrieving rows by primary key, sorting on the primary key column, or filtering on a range of primary key values in the WHERE clause is therefore very fast.
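For example (table and column names are illustrative), declaring a primary key creates the clustered index automatically:

CREATE TABLE dbo.Products
(
    ProductID INT NOT NULL PRIMARY KEY,  -- creates a clustered index on ProductID by default
    ProductName NVARCHAR(100) NOT NULL
)

Unless you specify otherwise, SQL Server builds the clustered index on the primary key column, so the table's rows are physically ordered by ProductID.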
Create a non-clustered index on the following columns:
1) frequently used in search conditions;
2) used to join other tables;
3) used as foreign keys;
4) highly selective (few duplicate values);
5) used in the ORDER BY clause;
6) of XML type.
The following is an example of creating an index:
CREATE INDEX NCLIX_OrderDetails_ProductID
ON dbo.OrderDetails (ProductID)
You can also create an index on a table from the SQL Server console, as shown in Figure 2.
Figure 2 create an index using the SQL Server console
Step 2: create appropriate covering indexes
Suppose you have created an index on ProductID in the Sales table (SalesID, SalesDate, SalesPersonID, ProductID, Qty). If ProductID is a highly selective column, any SELECT query that uses the indexed column (ProductID) in its WHERE clause will run faster; without an index on this foreign key, a full table scan would occur. But there is still a way to improve query performance further.
Assume that the Sales table has 10,000 rows, and the following SQL statement selects 400 of them (4% of the total):
SELECT SalesDate, SalesPersonID FROM Sales WHERE ProductID =112
Let's take a look at how this SQL statement is executed in the SQL Execution engine:
1) The Sales table has a non-clustered index on the ProductID column, so the engine searches the non-clustered index tree for the records with ProductID = 112;
2) the index pages containing the ProductID = 112 records also contain the clustered index keys (the primary key values, i.e. SalesID);
3) for each primary key found (400 in this example), the SQL Server engine searches the clustered index tree to locate the actual row on the corresponding page;
4) from each row it finds, the engine reads the values of the SalesDate and SalesPersonID columns.
In other words, for each ProductID = 112 record the engine performs a lookup into the clustered index tree, 400 lookups in total, just to retrieve the other columns the query asks for (SalesDate, SalesPersonID).
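One way to observe the cost of these extra lookups (a sketch, assuming the Sales table above exists in your database) is to look at the logical reads the query performs:

SET STATISTICS IO ON
SELECT SalesDate, SalesPersonID FROM Sales WHERE ProductID = 112
-- The Messages tab then reports the logical reads for the query; with a plain
-- non-clustered index, the per-row lookups into the clustered index inflate
-- this number considerably compared with a fully covered query.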
If the non-clustered index pages also contained the values of the two other columns (SalesDate, SalesPersonID) alongside the clustered index key, the SQL Server engine could skip steps 3 and 4 above: it would simply seek the non-clustered index tree on ProductID and read the values of all three columns directly from the index pages, which is much faster.
Fortunately, there is a way to achieve this, and it is called a "covering index". When you create a covering index on a table column, you specify the additional column values to be stored on the index pages together with the clustered index key value (the primary key). Here is an example of creating a covering index on the ProductID column of the Sales table:
CREATE INDEX NCLIX_Sales_ProductID      -- Index name
ON dbo.Sales (ProductID)                -- Column on which index is to be created
INCLUDE (SalesDate, SalesPersonID)      -- Additional column values to include
Covering indexes should be created for the columns your SELECT queries use most frequently, but do not include too many columns in them: the included column values are stored on the index pages along with the keys, so an overly wide covering index consumes extra storage and memory and can degrade performance.
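With the covering index above in place, the earlier query can be answered entirely from the non-clustered index (an illustration; verify against the actual execution plan in your environment):

SELECT SalesDate, SalesPersonID
FROM dbo.Sales
WHERE ProductID = 112
-- The execution plan should now show a single Index Seek on
-- NCLIX_Sales_ProductID, with no Key Lookup into the clustered index.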
Using the Database Engine Tuning Advisor to create covering indexes
We know that when an SQL statement is executed, the optimizer in the SQL Server engine generates different query plans depending on factors such as:
1) Data Volume
2) statistics
3) index changes
4) parameter values in TSQL
5) server load
This means that for a given SQL statement, even with identical table and index structures, the execution plans generated on the production server and on the test server may differ. It also means that an index created on the test server may improve application performance there, while creating the same index on the production server may not: the execution plan in the test environment may use the new index while the plan in production does not (for example, a non-clustered index column that is highly selective in the test environment may not be selective in production).
So when we create an index, we need to know whether the execution plan will actually use it. How can we find out? The answer is to simulate the production load on a test server, create candidate indexes, and test them there. If an index improves performance under this simulated load, it is much more likely to improve the application's performance in production as well.
Simulating a realistic load is hard, but there are tools that can help.
Use SQL Profiler to trace the production server. Using SQL Profiler in a production environment is generally discouraged, but sometimes there is no other way to diagnose the root cause of a performance problem, and you have to resort to it.
Then replay the trace file captured by SQL Profiler on the test server through the Database Engine Tuning Advisor to generate a similar load. Most of the time the Tuning Advisor will produce index recommendations that you can apply immediately.
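As a sketch (server, database, and file names here are placeholders, and you should check the exact switches against the documentation for your SQL Server version), the Tuning Advisor can also be driven from the command line with the dta utility:

dta -S TestServer -E -D AdventureWorks -if workload.trc -of recommendations.sql

Here -S names the server, -E uses Windows authentication, -D the database to tune, -if the workload (trace) file, and -of the script file into which the Advisor writes its index recommendations.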
Step 3: defragment indexes
You may have created all the right indexes and they may all be working, yet performance is still poor. The likely cause is index fragmentation, and you need to defragment your indexes.
What is index fragmentation?
Heavy INSERT, UPDATE, and DELETE activity on a table causes index pages to split and fall out of order, producing index fragmentation. When fragmentation is severe, scanning an index takes longer and the index may even stop being useful, so data retrieval slows down.
There are two types of index fragmentation: internal fragmentation and external fragmentation.
Internal fragmentation: pages inside the index are left partly empty (for example after page splits and deletes), so the index occupies more pages than necessary and scanning it requires more reads.
External fragmentation: the logical order of the index pages no longer matches their physical order on disk, so following the index in order forces the disk to jump around instead of reading sequentially.
How do I know if index fragmentation has occurred?
Run the following SQL statement (it works on SQL Server 2005 and later; replace AdventureWorks with your own database name):
SELECT object_name(dt.object_id) AS TableName,
       si.name AS IndexName,
       dt.avg_fragmentation_in_percent AS ExternalFragmentation,
       dt.avg_page_space_used_in_percent AS InternalFragmentation
FROM
(
    SELECT object_id, index_id, avg_fragmentation_in_percent, avg_page_space_used_in_percent
    FROM sys.dm_db_index_physical_stats(db_id('AdventureWorks'), NULL, NULL, NULL, 'DETAILED')
    WHERE index_id <> 0
) AS dt
INNER JOIN sys.indexes si
    ON si.object_id = dt.object_id AND si.index_id = dt.index_id
WHERE dt.avg_fragmentation_in_percent > 10
  AND dt.avg_page_space_used_in_percent < 75
ORDER BY dt.avg_fragmentation_in_percent DESC
After execution, the index fragmentation information of the AdventureWorks database is displayed.
Figure 3 index fragmentation Information
Use the following rules to analyze the results and find out where index fragmentation has occurred:
1) an ExternalFragmentation value greater than 10 indicates external fragmentation in the corresponding index;
2) an InternalFragmentation value less than 75 indicates internal fragmentation in the corresponding index.
How to defragment indexes?
There are two ways to defragment an index:
1) reorganize the fragmented index by executing the following command:
ALTER INDEX ALL ON TableName REORGANIZE
2) rebuild the index by executing the following command:
ALTER INDEX ALL ON TableName REBUILD WITH (FILLFACTOR = 90, ONLINE = ON)
You can also replace the ALL keyword with an index name to reorganize or rebuild a single index, or use the SQL Server console to defragment indexes.
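For instance, targeting just the covering index created earlier (index and table names as in the examples above):

ALTER INDEX NCLIX_Sales_ProductID ON dbo.Sales REORGANIZE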
Figure 4 Use the SQL Server console to sort index fragments
When to reorganize and when to rebuild?
When the index's external fragmentation is between 10 and 15 and its internal fragmentation is between 60 and 75, reorganize it; in all other cases, rebuild it.
Note that rebuilding an index locks the table it belongs to, whereas reorganizing does not. So on a production system, be careful about rebuilding indexes on large tables, since the rebuild can take hours. Fortunately, since SQL Server 2005 there has been a remedy: set the ONLINE option to ON, and the table remains usable while its index is rebuilt.
Although indexes speed up queries, remember that if your database is transactional, most operations are updates, and updating data also means updating the indexes. You therefore have to balance query and update performance: creating too many indexes on the tables of an OLTP database will drag down overall performance.
My suggestion: if your database is transactional, keep each table to no more than about 5 indexes on average; if your database is a data warehouse, up to about 10 indexes per table is reasonable.
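To check where your database stands against these numbers, you can count indexes per user table (a sketch; sys.indexes exists in SQL Server 2005 and later, and index_id 0 denotes the heap, which is excluded here):

SELECT object_name(object_id) AS TableName,
       COUNT(*) AS IndexCount        -- clustered + non-clustered indexes
FROM sys.indexes
WHERE index_id > 0                   -- skip heaps
  AND objectproperty(object_id, 'IsUserTable') = 1
GROUP BY object_id
ORDER BY IndexCount DESC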