ArticleDirectory
- Create highly selective Indexes
- Create multi-column Indexes
- Avoid creating indexes for small tables
- Use indexes with filter clauses
- Create an index for the order-by/group-by/distinct column to shorten the response time
- Use Outer Join for restrictions
Optimize Indexes
Creating useful indexes is one of the most important ways to improve query performance. Useful indexes can help reduce the disk I/O operations and system resources required for data search.
To create a useful index, you must know the following: data usage, query type and running frequency, and how the query processor uses the index to quickly search for data.
When selecting the index to be created, check your key queries. The performance of these queries will greatly affect the user experience. You need to create indexes dedicated to these queries. After adding an index, run the query again to check whether the query performance has been improved. If not, the index is deleted.
Like most performance optimization methods, this method also has some disadvantages. For example, when multiple indexes are used, the SELECT query may run faster. However, the speed of DML (insert, update, and delete) operations will be significantly slowed down because more indexes must be maintained for each operation. Therefore, if your query mainly contains select statements, it is very helpful to use more indexes. If your applicationProgramTo perform many DML operations, it is necessary to control the number of created indexes.
SQL Server compact edition supports display plans, which can help evaluate and optimize queries. In addition to SQL Server compact edition, SQL Server compact Edition uses the same display plan architecture as SQL Server 2005. For more information, see Microsoft display plan architecture in a http://schemas.microsoft.com/sqlserver/2004/07/showplan.
The following sections describe other information about creating useful indexes.
Create highly selective Indexes
Creating indexes for columns used in the where clause of key queries usually improves performance. However, this depends on the selection of indexes. Selectivity refers to the ratio of the number of qualified rows to the total number of rows. If the ratio is low, the index is highly selective. It can filter out most rows, greatly reducing the size of the result set. Therefore, such an index is a useful index to be created. In comparison, low-selective indexes are less useful.
Unique indexes have the highest selectivity. Only one row can be matched, which is most useful for queries that only wish to return one row. For example, the index of the unique ID column can help you quickly find specific rows.
You can evaluate the selectivity of indexes by running the sp_show_statistics stored procedure on the SQL Server compact edition table. For example, to evaluate the selectivity of two columns ("Customer ID" and "ship via"), you can run the following stored procedure:
Sp_show_statistics_steps 'Orders ', 'customer id ';
Range_hi_key range_rows eq_rows distinct_range_rows
------------------------------------------------------------
Alfki 0 7 0
Anasd 0 4 0
Anton 0 13 0
Arout 0 14 0
Bergs 0 23 0
Blaus 0 8 0
Blonp 0 14 0
Bolid 0 7 0
Bonap 0 19 0
Bottm 0 20 0
Bsbev 0 12 0
Cactu 0 6 0
CENTC 0 3 0
Chops 0 12 0
Commi 0 5 0
Consh 0 4 0
Dracd 0 9 0
Dumon 0 8 0
Eastc 0 13 0
Ernsh 0 33 0
(90 rows affected)
And
Sp_show_statistics_steps 'Orders ', 'referen2010 ';
Range_hi_key range_rows eq_rows distinct_range_rows
------------------------------------------------------------
1 0 320 0
2 0 425 0
3 0 333 0
(3 rows affected)
The results show that the "Customer ID" column has a lower degree of repetition. This means that the index selectivity is higher than that of the "ship via" column.
For more information about using these stored procedures, see sp_show_statistics (SQL Server compact edition), sp_show_statistics_steps (SQL Server compact edition), and sp_show_statistics_columns (SQL Server compact edition ).
Create multi-column Indexes
Multi-column index is a natural extension of a Single-Column index. For filtering expressions that match a specified set of key columns in advance, multi-column indexes are very useful. For example, composite indexCreate index idx_emp_name on employees ("last name" ASC, "first name" ASC)This helps calculate the following queries:
- ... Where "last name" = 'doe'
- ... Where "last name" = 'doe 'and "first name" = 'john'
- ... Where "first name" = 'john' and "last name" = 'doe'
However, it is useless for the following query:
- ... Where "first name" = 'john'
When creating multi-column queries, you should place the columns with the highest selectivity on the leftmost side of the key. In this way, when matching multiple expressions, you can make the index more selective.
Avoid creating indexes for small tables
A small table is a table with only one or several data pages. Because table scanning is usually more efficient, you should avoid creating indexes for very small tables. This reduces the overhead of loading and processing index pages. If you do not create an index for a very small table, you can avoid selecting this index by the optimizer.
SQL Server compact edition stores data on a 4 kb page. The following formula can be used to calculate the approximate number of pages, although the actual number of pages may be slightly larger due to storage engine overhead.
<Total column size (in bytes)> * <number of rows>
<Page number> = -----------------------------------------------------------------
4096
For example, assume that the architecture of a table is as follows:
Column name |
Type (size) |
Order ID |
INTEGER (4 bytes) |
Product ID |
INTEGER (4 bytes) |
Unit Price |
Money (8 bytes) |
Quantity |
Smallint (2 bytes) |
Discount |
Real (4 bytes) |
The table contains 2820 rows. According to the formula, it takes about 16 pages to store the table data:
<Page number> = (4 + 4 + 8 + 2 + 4) * 2820)/4096 = 15.15 page
Select the object to create the index
We recommend that you always create an index for the primary key. In addition, foreign key index creation is also very useful. This is because the primary key and foreign key are usually used to join the table. After creating indexes for these keys, the optimizer can use more efficient index connections.Algorithm. If you use other columns to join a table in your query, it is usually useful to create indexes for these columns for the same reason.
After you create a primary key and a foreign key constraint, SQL Server compact edition automatically creates an index for it and uses it to optimize the query. Remember to use a smaller primary key and a foreign key. In this way, the connection runs faster.
Use indexes with filter clauses
Indexes can be used to increase the computing speed of certain types of filter clauses. Although all filter clauses reduce the final result set of the query, some filter clauses can also help reduce the amount of data to be scanned.
The search parameter (Sarg) can specify the exact match, the range of values, or the connection between two or more items connected by and, thus limiting the search range. Search parameters are in one of the following formats:
- Column operator <constant or variable>
- <Constant or variable> operator Column
The Sarg operators include =,>, <,> =, <=, In, and between, and sometimes like (like 'John % 'For prefix matching '). Sarg can include multiple conditions connected by and. Sarg can also be a query that matches a specific value, for example:
- "Customer ID" = 'anton'
- 'Doe '= "last name"
Sarg can also be a query that matches a certain range of values, for example:
- "Order Date"> '2014/1/123'
- "Customer ID"> 'abcde' and "Customer ID" <'edcba'
- "Customer ID" in ('anton ', 'arout ')
Expressions that do not use the Sarg operator cannot improve performance because the SQL Server compact edition query processor must evaluate all rows to determine whether the filter clause is satisfied. Therefore, indexes are useless for expressions that do not use the Sarg operator. Non-Sarg operators include not, <>, not exists, not in, not like, and internal functions.
Use the Query Optimizer
When determining the access method of the base table, the SQL Server compact edition optimizer determines whether the index of the Sarg clause exists. If an index exists, the optimizer evaluates the index by calculating the number of rows returned. The optimizer then estimates the overhead of using indexes to search for qualified rows. If the index overhead is lower than the table scan overhead, the optimizer selects index access. If the first or pre-specified column of the index is used in Sarg and Sarg specifies the lower limit, upper limit, or both of them limit the search range, then the index may be useful.
Understand the response time and total time
The response time is the time used to query the first record returned. The total time is the time used to query and return all records. For interactive applications, the response time is very important because it determines how long the user has to wait before seeing the first returned record to determine that the query is being processed. For batch processing applications, the total time reflects the overall throughput. You must determine the performance standards of the application and query before designing the application.
For example, if the query returns 100 records, the first five records are used to fill a list. In this case, you do not care about the time used to return all 100 records. Instead, you only want to query and quickly return the first few records to fill the list.
You can perform many query operations without storing intermediate results. These operations are called pipeline operations. Pipeline operations are mapped, selected, and joined. You can return results immediately after performing these operations. Other operations (such as sort and group-by) must use all input information to return the results to the parent operation. This is what we call the specific operation. For specific reasons, queries executed through these operations usually have an initial delay. After the initial delay, such queries can usually return records quickly.
Query with response time requirements should not be specific. For example, the response time for executing order-by using an index is shorter than that for sorting. The next section describes this in detail.
Create an index for the order-by/group-by/distinct column to shorten the response time
Order-by, group-by, and distinct operations are all sort operations. The SQL Server compact edition query processor uses two methods for sorting. If records are sorted by indexes, the processor only uses indexes. Otherwise, the processor must first use a temporary worksheet to sort records. On devices with slow CPU and low memory, such pre-sorting will cause significant initial latency. Therefore, when response time is important, pre-sorting should be avoided.
To enable order-by or group-by to process a specific index, the order-by or group-by column must match the pre-specified index column, the order must be identical. For example, IndexCreate index emp_name on employees ("last name" ASC, "first name" ASC)Helps optimize the following queries:
- ... Order by/group by "last name "...
- ... Order by/group by "last name", "first name "...
However, it does not help with the following queries:
- ... Order by/group by "first name "...
- ... Order by/group by "first name", "Last Name "...
To enable the distinct operation to process multiple index columns, the ing list must match all index columns, although the order does not have to be exactly the same. The index above helps optimize the following queries:
- ... Distinct "last name", "first name "...
- ... Distinct "first name", "Last Name "...
However, it does not help with the following queries:
- ... Distinct "first name "...
- ... Distinct "last name "...
Note: |
If your query always returns a unique row, do not specify the distinct keyword because it only adds overhead. |
Rewrite the subquery to use join
If yes, you can rewrite the subquery to use join to improve performance. The benefit of creating a join operation is that you can evaluate a table in a sequence different from the order defined by the query. The benefit of using a subquery is that the subquery expression can be computed without scanning all rows in the subquery. For example, the exists subquery returns true when the first row that meets the condition is found.
Note: |
The SQL Server compact edition query processor always overwrites the in subquery to use join. This method is not required for queries that contain in subquery clauses. |
For example, to determine all orders that contain at least one item with a discount rate greater than or equal to 25%, you can use the following exists subquery:
Select "Order ID" from orders o
Where exists (select "Order ID"
From "Order details" OD
Where o. "Order ID" = OD. "Order ID"
And discount >=0.25)
You can also rewrite it using join:
Select distinct O. "Order ID" from orders o inner join "Order details"
OD on O. "Order ID" = OD. "Order ID" where discount> = 0.25
Use Outer Join for restrictions
The Processing Method for Outer Join and inner join is different: For inner join tables, the optimizer will try to rearrange the join order, but not for Outer Join tables. The External table (the left table in left Outer Join and the right table in right Outer Join) will be accessed first before accessing the internal table. This fixed join sequence may cause the execution plan to fail to reach the optimal level.
Note: |
The SQL Server compact edition query processor assumes that the inner join column contains an index (created by the user or database ). |
Use parameterized Query
If only some constants of a series of queries run by the application, you can use parameterized queries to improve the performance. For example, to return orders according to different customers, you can run the following query:
Select "Customer ID" from orders where "Order ID" =?
By compiling only one query and executing the compiled plan multiple times, parameterized query can provide better performance. In programming, you must always keep the command objects that contain the cache query plan. If the previous command object is damaged and a new command object is created, the cache plan will be damaged. This requires re-compiling the query. If multiple parameterized queries must be run alternately, you can create several command objects, each of which contains a cache execution plan for parameterized queries. In this way, you can effectively avoid re-compiling all these queries.
Query only when necessary
The SQL Server compact edition query processor is a powerful tool for querying data stored in relational databases. However, any query processor may cause a certain amount of internal overhead. Before starting a real execution plan, the query processor must compile, optimize, and generate the execution plan. This is especially true for simple queries that are quickly completed. Therefore, you can execute the query by yourself, which can greatly improve the performance. If every millisecond is important to your key components, we recommend that you consider performing simple queries on your own. For complex large-scale queries, it is best to leave this work to the query processor.
For example, suppose you want to find the customer IDs of a series of orders by order ID. You can perform this operation in two ways. The first method is to perform the following steps for each search:
- Open orders base table
- Use a specific "Order ID" to find rows
- Retrieve "Customer ID"
Alternatively, the following query is sent for each query:
Select "Customer ID" from orders where "Order ID" = <The specific order ID>
Compared with the manual solution, the query-based solution is simpler but slower, because the SQL Server compact edition query processor needs to convert the SQL statement into three identical operations that are performed manually. The three steps are then executed in sequence. The method to choose depends on the simplicity and performance of the application.