SQL Server sets filter conditions to improve index Efficiency

Source: Internet
Author: User

BKJIA exclusive Article] low carbon index: here we use Intel Xeon X7500 processor TDP as the standard for computing power consumption TDP = 130 W/h = 2.167 W/m = 0.036 W/s) to facilitate computing and visualization ). In addition, according to data from the China Forestry Administration, a tree absorbs 5.023 kg of carbon dioxide a day and generates 0.785 kg of carbon dioxide per power.

If the execution time of the database is reduced from 27 seconds to 14 seconds after optimization, that is, the energy consumption is 47.8% less per unit time. In a day, we will reduce the energy consumption by 167kg kW, which is about 0.232 carbon dioxide emissions. Our calculation is to reduce the carbon dioxide absorption by trees a day.

BKJIA database channel recommends "database performance optimization and debugging" and "SQL Server 2008/2005 full solution" topics for you to better understand this article.

Set filter conditions to improve index Efficiency

Excellent indexes are the key to SQL Server database performance. However, efficient indexes are carefully designed. As we all know, a primary key is the unique identifier of a data object. If there is no clustered index in the data table, in order to maintain the uniqueness of the primary key,By default, the SQL Server database creates a Clustered index clustered index for the primary key, unless you specify to create an index as a Non-Clustered index Non-clustered index ).

Undoubtedly, we should create a clustered index for frequently accessed data. Of course, frequently accessed fields should be carefully analyzed and selected, and the index value should be as short as possible. When it comes to index creation, you often think of the primary key first, but the data of the primary key is not necessarily frequently accessed. In many cases, to ensure the uniqueness of the primary key, the value of the primary key is often not very short, for example, we often choose a globally unique identifier GUID) as the Data Type of the primary key. The length of a unique identifier is generally 16 bytes, this data type is not the ideal clustering index option. In this case, you can create a non-clustering index for the primary key, because it is very efficient to query specific records in the WHERE statement, creating a non-clustered index can further improve the query efficiency. If you select an integer as the Data Type of the primary key, you can consider generating a clustered index for the primary key.

SQL Server 2008 provides us with another index-set the filter Condition index Filtered index). A set filter condition index is a special non-clustered index, it is a specific subset of some fields. In other words, the set filter Condition Index is generated based on some selected fields. For example, in the sales performance data table, data in the City where the branch is located is stored in the City field. If we create a non-clustered index, will be included in the index. However, if we use a set filter condition index, we can select only a portion of the cities to be indexed, such as Beijing, Shanghai, and Guangzhou. The Code is as follows:

 
 
  1. CREATENONCLUSTEREDINDEXFilteredCitiesONSales (City)
  2. WHERECity IN ('beijing', 'shanghai', 'shanghai ')

The difference from conventional indexes is that we use the WHERE statement to set our filter conditions. Assuming that most of the company's revenue comes from these three cities, our database queries will frequently access the sales records generated in these three cities. In this case, setting the filter Condition Index will occupy less disk space, because only records with the City Field Values in Beijing, Shanghai, and Guangzhou will be indexed. These records are only part of the entire sales data table.

You can use the configured filter Condition Index to improve the database performance. First, you must re-create the index only when the index is changed to the record. For example, if a sales record in Beijing needs to be adjusted, the index will be updated after the update operation, which is the same as other indexes. However, if the sales record changes in Xi'an, no matter how many records are added or deleted, the previously created index for filtering conditions does not require any operation, this index is affected only by the sales records in Beijing, Shanghai, and Guangzhou. Another advantage of setting a filter Condition Index is that it can reduce disk read/write operations. For example, we want to query the sales records of all Beijing branches, therefore, using the configured filter Condition Index is much less necessary than the conventional non-clustered index.

In order to verify the performance advantages brought about by setting the filter condition index, we conducted a comparative test.

First, we have installed Windows Server 2008 R2 and SQL Server 2008 R2 in the VirtualBox Virtual Machine. By the way, we can try the 180-day trial version, you can download it directly on the Microsoft official website, and you do not need to apply for a serial number for the trial version. During the installation process, you can directly install the 180-day trial, this experiment and study have brought a lot of convenience.

We created a 5 million-record sales data table in the database. Of course, the sales amount is generated randomly, and the city field, we randomly generate 9 numbers from 1 to 9, then replace them with different cities as needed. In this experiment, we set the total proportion of sales records in Beijing, Shanghai and Guangzhou to 67%.

Click to view the clear chart

Next, we will perform a complete copy of the virtual machine, so that we can get two identical operating systems and databases. The database already contains the data table we just created, for detailed procedures, refer to the VirtualBox technical documentation. The purpose of copying the entire virtual machine is to minimize the impact of hardware and operating systems on database performance, so that we can focus on the performance of the database in different indexing methods.

Next, create a complete non-clustered index for the city field in the first virtual machine. The Code is as follows:

 
 
  1. CREATE NONCLUSTERED INDEX FilteredCities ON Sales(City) 

In the second virtual machine, we create a set filter condition index. The Code is as follows:

 
 
  1. CREATENONCLUSTEREDINDEXFilteredCitiesONSales (City)
  2. WHERECity IN ('beijing', 'shanghai', 'shanghai ')

Click to view the clear chart

Then, we calculate the total sales amount of the three cities, Beijing, Shanghai and Guangzhou, in the database of two virtual machines. The Code is as follows:

 
 
  1. SELECTSUM (Value)FROMSales
  2. WHERECity = 'beijing' or City = 'shanghai'

It took 27 seconds to use a complete non-clustered index, and 14 seconds to get the calculation result when we used a set filter Condition Index, it can be seen that the performance of non-clustered indexes is greatly improved in the case of large-scale data computing. The screen we captured is as follows for your reference:

Before optimization: Click to view the clear picture

After optimization: Click to view the clear picture

When selecting a filter condition, we need to consider which data will change over time. For example, is the newly added record added to the middle or the end of the index? When a record is deleted, does the index value need to be deleted? The answers to these questions will affect our index design.

Here, we need to useFill Factor), Fill Factor is a value expressed in percentageWhen re-indexing, the value of the fill factor determines the space percentage of data to be filled on each page, so that the remaining space can be reserved as the available space for expanding the index in the future, the following code shows how to set the fill factor to 80. the fill factor can be set only when the advanced option is enabled:

 
 
  1. Use DatabseName;  
  2. GO  
  3. sp_configure 'show advanced options', 1;  
  4. GO  
  5. RECONFIGURE;  
  6. GO  
  7. sp_configure 'fill factor', 80;  
  8. GO 

If the value of the fill factor is 100, the index page will be fully filled. We generally consider setting the fill factor to a value between 50 and 80 to ensure that no page splitting will occur when a new value is added. If you often need to add a field value at the end of the index, you can set the fill factor to a value between 90 and 100. The ideal state is to ensure a minimum number of page splits and index rebuilding at the same time.

Related Article

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.