Setting filter conditions to improve indexing efficiency
Good indexes are key to SQL Server database performance, but efficient indexes are carefully designed. As we all know, a primary key is a unique identifier for a data object, and if there is no clustered index in the datasheet, the SQL Server database will create a clustered index (Clustered index) for the primary key by default in order to maintain the uniqueness of the key. Unless the user specifically specifies that the index should be created as a nonclustered index (non-clustered index).
There is no doubt that we should create clustered indexes for frequently accessed data, and of course frequently accessed fields should be analyzed and carefully selected, and the index values should be as short as possible. When you mention creating an index, people tend to think of primary keys first, but primary key data is not necessarily accessed frequently, and many times in order to ensure primary key uniqueness, primary key values are often not very short, such as we often choose the global Unique identifier (GUID) type as the primary key data type, The length of a unique identifier is typically 16 bytes, which is not the ideal clustered index option in terms of length, in which case nonclustered indexes can be created for the primary key, because the primary key value is highly efficient in the where statement to query for a particular record. Creating a nonclustered index can bring the query to a higher level of efficiency. If you selected an integral type as the primary key data type, consider generating a clustered index for the primary key.
SQL Server 2008 provides us with another index--Set filter criteria index (filtered index), a set filter condition index is a special nonclustered index, it is a specific subset of some fields. In other words, the Set filter criteria index is generated based on a subset of the selected fields. For example, in the sales performance data sheet, the branch's city data is stored in the Cities field, if we create a nonclustered index, then all the branches of the city will be included in the index. But if we use the Set Filter criteria index, we can select only a portion of the city to be indexed, such as Beijing, Shanghai and Guangzhou, the code is as follows:
CREATE nonclustered INDEX filteredcities on Sales [city] where city in (' Beijing ', ' Shanghai ', ' Guangzhou ')
The difference with regular indexes is that we use the WHERE statement to set our filter conditions. Assuming that most of the company's revenue comes from these three cities, then our database queries will frequently access sales records generated in these three cities, in which case the Set filter condition index takes up less disk space because only the City field values are indexed in Beijing, Shanghai and Guangzhou. These records are only part of the entire Sales data table.
Indexing can be used to improve the performance of the database, first of all, it is necessary to rebuild the index only when the record changes. For example, a sales record in Beijing needs to be adjusted, and the index is updated after the update operation, as is the case with other indexes. However, if the sales record in Xi ' an has changed, no matter how many records are added or deleted, we have established a filter condition index that does not require any action because only sales records in Beijing, Shanghai and Guangzhou branch offices affect the index. The other advantage of setting the filter criteria index is that it can reduce disk read and write operations, such as we want to query all sales records of Beijing branch, then use the Set filter criteria just established to reduce many unnecessary disk operations than regular nonclustered indexes.
In order to verify the performance advantages of setting the filter criteria index, we conducted a comparative test.
First, we install the Windows Server 2008 R2 and the SQL Server 2008 R2 Chinese version in the VirtualBox virtual machine, and by the way, we install a trial version that can be tested for 180 days and can be downloaded directly from Microsoft's official website. And now the trial version does not need to apply for serial number, in the installation process can be directly selected to install 180-day trial, you can directly install, the experiment and learning has brought a lot of convenience.
We created a 5 million-record sales datasheet in the database, of course, the sales amount is randomly generated, and the City field, we randomly produce 1 to 9 of these 9 different numbers, and then according to the need to replace them in different cities, in this experiment, we put Beijing, The total proportion of sales recorded in Shanghai and Guangzhou is set at 67%.
Next, we will replicate the virtual machines completely so that we can get two identical operating systems and databases that already contain the datasheet we just created, and the process can refer to VirtualBox's technical documentation. The purpose of replicating an entire virtual machine is to ensure that the hardware and operating systems have minimal impact on database performance, so that we focus on the performance of the database performance under different indexing modes.
Next, we create the complete nonclustered index of the City field in the first virtual machine, the code is as follows:
CREATE nonclustered INDEX filteredcities on Sales (city)
In the second virtual machine, we create a set filter criteria index, the code is as follows
CREATE nonclustered INDEX filteredcities on Sales [city] where city in (' Beijing ', ' Shanghai ', ' Guangzhou ')
We then calculated the total sales amount for the three cities of Beijing, Shanghai and Guangzhou in the database of two virtual machines, the following code
SELECT SUM (Value) from Sales WHERE city = ' Beijing ' or city = ' Shanghai ' or city = ' Guangzhou '
In the case of using the complete nonclustered index, we spent 27 seconds and using the Set filter condition index, we only need 14 seconds to get the results, we can see the nonclustered index in large-scale data calculation, the performance of the upgrade is very considerable, we intercepted the screen as follows, for your reference:
When choosing a filter, we need to consider what data will change over time, for example, whether the newly added records are added to the middle or end of the index. Does the index value need to be deleted when the record is deleted? The answers to these questions will affect our design of the index.
Here, we need to use fill factor (fill Factor), the fill factor is a value in percent, when the index is rebuilt, the fill factor determines the percentage of space to fill the data on each page, so that some of the remaining space as the future expansion of the index of free space, The following code shows how to set the fill factor to 80 and set the fill factor only if the advanced option is turned on:
Use databsename; go sp_configure ' show advanced options ', 1; go reconfigure; go sp _configure ' Fill factor ', 80; go
If the value of the fill factor is 100, the index page is populated. We generally consider setting the fill factor to a value between 50 and 80 to ensure that the new value is added without a page split. If you often need to add field values at the end of the index, consider setting the fill factor to a value between 90 and 100. Ideally, a minimum number of page splits and index rebuilds are guaranteed at the same time