Why create an index? Because an index can greatly improve system performance. First, a unique index guarantees the uniqueness of each row of data in a table. Second, an index can greatly speed up data retrieval, which is the main reason for creating one. Third, it accelerates joins between tables, which is especially valuable for enforcing referential integrity. Fourth, when you retrieve data with grouping and sorting clauses, an index can significantly reduce the time spent grouping and sorting. Fifth, with indexes in place, the query optimizer can be used during query processing to improve system performance.
Some may ask: if indexes have so many advantages, why not create one on every column in the table? The idea has a certain logic, but it is one-sided. Despite their advantages, it is unwise to index every column, because indexes also have drawbacks. First, creating and maintaining indexes takes time, and that time grows as the volume of data grows. Second, indexes occupy physical space in addition to the data pages themselves, and a clustered index requires even more space. Third, whenever data in the table is inserted, deleted, or updated, the indexes must be maintained dynamically, which slows down data maintenance.
Indexes are created on particular columns of a table, so when creating one you should consider carefully which columns deserve an index and which do not. In general, indexes should be created on the following kinds of columns: columns that are frequently searched, to speed up those searches; columns that must be unique, to enforce uniqueness and define the arrangement of data in the table; columns used in joins, typically foreign keys, to speed up the joins; columns that are often searched by range, because the index is already sorted and a specified range is contiguous; columns that frequently need to be sorted, because the index's existing sort order can be used to shorten sort time; and columns that appear frequently in WHERE clauses, to speed up evaluation of the conditions.
By the same token, some columns should not be indexed. In general, such columns have the following characteristics. First, columns that are rarely used in queries or seldom referenced should not be indexed: since they are rarely used, an index does nothing to improve query speed, while it still slows down maintenance and consumes space. Second, columns with very few distinct values should not be indexed, such as a gender column in a personnel table; because the rows matching any one value make up a large proportion of the table, an index does not significantly speed up the search. Third, columns defined with the text, image, or bit data types should not be indexed, because their data is either very large or carries very little information. Fourth, when modification activity far outweighs retrieval activity, you should not create an index, because the two kinds of performance pull in opposite directions: adding an index improves retrieval but degrades modification, and removing an index does the reverse.
Index creation methods and features
How to create an index
There are several ways to create an index; they fall into direct and indirect methods. To create an index directly, use the CREATE INDEX statement or the Create Index Wizard. To create an index indirectly, define a PRIMARY KEY constraint or a UNIQUE constraint on the table, which causes an index to be built as a side effect. Both methods create indexes, but the indexes they create differ in their details.
Using the CREATE INDEX statement or the Create Index Wizard is the most basic way to create an index, and it is also the most flexible: the index can be customized to your exact needs. With this method you can use many options, such as specifying page fullness, sorting behavior, and statistics gathering, to optimize the index. You can also control the index type, its uniqueness, and whether it is composite; that is, you can create a clustered or a non-clustered index, and you can build it on a single column or on two or more columns.
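As a minimal sketch of the direct method (the table, column, and index names below are hypothetical, not from the text), the basic forms of CREATE INDEX look like this; later releases of SQL Server prefer the parenthesized option list WITH (FILLFACTOR = 80), but the older form shown here also works:

    -- A simple non-clustered index on one column of a hypothetical Employees table:
    CREATE INDEX IX_Employees_LastName
        ON Employees (LastName);

    -- A unique clustered index, with a page-fullness option:
    CREATE UNIQUE CLUSTERED INDEX IX_Employees_EmpID
        ON Employees (EmpID)
        WITH FILLFACTOR = 80;

    -- A composite (multi-column) non-clustered index:
    CREATE NONCLUSTERED INDEX IX_Employees_Dept_LastName
        ON Employees (DeptID, LastName);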
You can also create indexes indirectly, by defining a primary key constraint or a unique key constraint. A primary key constraint is a logical construct that maintains data integrity: it ensures that no two records in the table share the same primary key value. When you create a primary key constraint, the system automatically creates a unique clustered index. Logically, the primary key constraint is the important structure, but physically the structure that corresponds to it is the unique clustered index; in the physical implementation there is no primary key constraint as such, only the unique clustered index. Similarly, an index is created when a unique key constraint is defined, in this case a unique non-clustered index. When indexes are created through constraints, therefore, their type and characteristics are essentially fixed, and there is little room for customization.
When you define a primary key or unique key constraint on a table that already has a standard index created with the CREATE INDEX statement, the index created by the constraint replaces the previously created standard index. In other words, the index created by a primary key or unique key constraint takes precedence over one created with CREATE INDEX.
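For example (a sketch using hypothetical table and column names), the constraints in the following table definition cause SQL Server to build the corresponding indexes automatically, with no explicit CREATE INDEX statement:

    CREATE TABLE Employees (
        EmpID    INT         NOT NULL PRIMARY KEY,  -- creates a unique clustered index
        SSN      CHAR(11)    NOT NULL UNIQUE,       -- creates a unique non-clustered index
        LastName VARCHAR(40) NOT NULL
    );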
Index features
An index can have two notable characteristics: it can be unique, and it can be composite.
A unique index ensures that all values in the indexed column are unique and contain no duplicates. If uniqueness must be guaranteed, however, you should create a primary key constraint or a unique key constraint rather than a unique index directly; when such a constraint is created or added to a table, SQL Server automatically builds the unique index for you. When working with unique indexes, keep these rules in mind. When a primary key or unique key constraint is created on a table, SQL Server automatically creates a unique index. If the table already contains data when the index is created, SQL Server checks the existing data for duplicates. Likewise, whenever data is inserted with an INSERT statement or changed with an UPDATE statement, SQL Server checks for duplicates: if a duplicate value would result, SQL Server cancels the statement and returns an error message. Ensuring that every row has a unique value guarantees that every object can be identified unambiguously. You should create a unique index only on a column whose values genuinely identify the entity; for example, you cannot usefully create a unique index on the name column of a personnel table, because different people can share the same name.
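A minimal sketch of creating a unique index directly (the Personnel table and EmpNo column are hypothetical):

    -- A unique index on a column whose values identify each person:
    CREATE UNIQUE INDEX IX_Personnel_EmpNo
        ON Personnel (EmpNo);
    -- Any INSERT or UPDATE that would duplicate an existing EmpNo
    -- value is cancelled and returns an error, as described above.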
A composite index is an index created on two or more columns. When searches frequently use two or more columns together as a key, it is best to create a composite index on those columns. When creating one, consider these rules. Up to 16 columns can be combined into a single composite index, and the total length of the indexed columns cannot exceed 900 bytes, so the combined columns must not be too long. All columns in a composite index must come from the same table; you cannot build a composite index across tables. The order of the columns matters a great deal, so choose it carefully; as a rule, put the most selective (most unique) column first. An index on (col1, col2), for example, is not the same as an index on (col2, col1), because the column order differs. For the query optimizer to use a composite index, the WHERE clause of the query must reference the first column of the index. Composite indexes are very useful when a table has several key columns; they can improve query performance while reducing the number of indexes that must be created on the table.
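As a sketch of the leading-column rule, assuming a hypothetical Orders table with CustomerID and OrderDate columns:

    -- The most selective column is listed first in the composite index.
    CREATE INDEX IX_Orders_Cust_Date
        ON Orders (CustomerID, OrderDate);

    -- This query can use the index, because the WHERE clause references
    -- the first column of the composite index:
    SELECT OrderID FROM Orders
    WHERE CustomerID = 42 AND OrderDate >= '1999-01-01';

    -- This query cannot seek on the index, because the first column
    -- (CustomerID) is not referenced:
    SELECT OrderID FROM Orders
    WHERE OrderDate >= '1999-01-01';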
Index types
Indexes can be divided into two types according to whether the index order matches the physical order of the data in the table. A clustered index is one whose order is the same as the physical order of the table's data; a non-clustered index is one whose order differs from the physical order of the data.
Architecture of clustered indexes
An index has a structure similar to a tree. The bottom level of the tree is called the leaf level; the rest of the tree is the non-leaf level, and the root sits at the top of the non-leaf level. A clustered index follows this tree structure: its lowest level is the leaf level, which consists of the table's data pages themselves, while the index pages above the leaf level make up the non-leaf level. In a clustered index, data values are always stored in ascending order.
Create a clustered index on columns that are frequently searched or on columns accessed in sequential order. When creating a clustered index, consider the following factors. Each table can have only one clustered index, because the data in a table can have only one physical order, and the physical order of the rows in the table is the same as the order of the rows in the index. Create the clustered index before creating any non-clustered indexes, because a clustered index changes the physical order of the rows in the table; the rows are arranged in that order and maintained in it automatically. The uniqueness of the key values is maintained either explicitly, with the UNIQUE keyword, or implicitly, with an internal unique identifier; these identifiers are used by the system and cannot be accessed by users. The clustered index itself averages about 5% of the table's size, though its actual size varies with the size of the indexed column. During index creation, SQL Server temporarily uses disk space in the current database: building a clustered index requires about 1.2 times the size of the table, so make sure enough free space is available.
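A minimal sketch (hypothetical Orders table and column):

    -- One clustered index per table, typically on a column that is
    -- searched by range or read in sequence:
    CREATE CLUSTERED INDEX IX_Orders_OrderDate
        ON Orders (OrderDate);
    -- While the rows are being sorted, roughly 1.2 times the table's
    -- size must be free in the current database.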
When the system accesses data in a table, it first determines whether an index exists on the relevant column and whether that index is useful for the data being retrieved. If a useful index exists, the system uses it to access the records. Browsing begins at the root of the index tree: starting from the root, the search value is compared with each key value to determine whether it is greater than or equal to the key. This step repeats until a key value greater than the search value is encountered, or until the search value is greater than or equal to all key values on the index page.
Architecture of non-clustered indexes
A non-clustered index also has a tree structure, similar to that of a clustered index, but with significant differences.
In a non-clustered index, the leaf level contains only key values, not data rows; a non-clustered index describes the logical order of the rows. Non-clustered indexes come in two architectures: one is built on a table that has no clustered index, the other on a table that does have one.
If a table has no clustered index, it is called a heap. When a non-clustered index is created on top of a heap, the system uses row identifiers on the index pages to point to the records on the data pages; a row identifier stores the location of the data. A heap is maintained through Index Allocation Map (IAM) pages, which record the storage information of the extents the heap occupies. In the system table sysindexes, a pointer references the first IAM page associated with the heap, and the system uses the IAM pages to find free space where new rows can be inserted. The data pages of a heap, and the records on them, are in no particular order and are not linked together; the only connection between them is the order recorded in the IAM. When a non-clustered index is created on a heap, its leaf level contains row identifiers that point to the data pages. A row identifier specifies the location of a record row and consists of a file ID, a page number, and a row ID, and it must be unique. The order of the leaf-level pages of a non-clustered index differs from the physical order of the table's data; at the leaf level the key values are maintained in ascending order.
When a non-clustered index is created on a table that has a clustered index, the index pages use the clustering key to point into the clustered index; the clustering key carries the information needed to locate the data. In this case the leaf level of the non-clustered index contains the clustering key values that map to the rows, rather than physical row identifiers. When the system accesses data through such a non-clustered index, it first finds the clustering key in the non-clustered index and then uses the clustered index to locate the data.
Non-clustered indexes are very useful when data needs to be retrieved in several different ways. When creating a non-clustered index, keep these points in mind: by default, the index created is non-clustered; and each table can have at most 249 non-clustered indexes but only one clustered index.
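A short sketch (hypothetical Customers table and columns):

    -- Several non-clustered indexes can coexist with the single
    -- clustered index on the same table:
    CREATE NONCLUSTERED INDEX IX_Customers_City
        ON Customers (City);

    CREATE NONCLUSTERED INDEX IX_Customers_Name
        ON Customers (LastName, FirstName);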
How does the system access table data?
Generally, there are two ways to access data in a database: a table scan and an index search. In a table scan, the system positions a pointer at the first data page of the table and then scans every page the table occupies, from front to back, until all records have been examined; whenever a record satisfies the query conditions it is selected, and when the scan is finished all matching records are returned. In an index search, the system uses an index, a tree structure that stores key values and pointers to the data pages containing the records for those keys. The system follows the tree, using the keys and pointers, to locate the records that satisfy the query conditions, and finally returns all matching records.
When SQL Server accesses data in the database, it first determines whether the table has an index. If there is no index, SQL Server uses a table scan. If there is an index, the query processor uses the distribution statistics to generate an optimized execution plan for the query and decides whether to use a table scan or the index, so as to access the data as efficiently as possible.
Index options
You can specify several options when creating an index, and you can use them to tune index performance. These options include FILLFACTOR, PAD_INDEX, and SORTED_DATA_REORG.
The FILLFACTOR option can improve the performance of INSERT and UPDATE statements. When an index page becomes full, SQL Server must spend time splitting the page to make room for new rows. The FILLFACTOR option reserves a percentage of free space on each leaf-level index page in order to reduce page splitting. When creating an index on a table that already contains data, you can use FILLFACTOR to specify how full each leaf-level index page should be; the default value is 0, which is equivalent to 100. With the default, the internal index pages still leave enough free space for one or two additional rows. Do not use this option when creating an index on an empty table, where it has no practical effect. Also, the FILLFACTOR value is not maintained dynamically after the index is created, so it is only meaningful when the index is created on a table that already holds data.
The PAD_INDEX option applies the FILLFACTOR value to the internal (non-leaf) index pages as well, so that they are filled to the same degree as the leaf-level pages. Specifying PAD_INDEX on its own is meaningless if FILLFACTOR is not specified, because the value PAD_INDEX uses is taken from FILLFACTOR.
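A small sketch combining the two options (the table and index names are hypothetical; this uses the older comma-separated WITH syntax, while later releases of SQL Server prefer the parenthesized form WITH (PAD_INDEX = ON, FILLFACTOR = 80)):

    -- Leave 20% free space on the leaf pages (FILLFACTOR = 80) and
    -- fill the internal index pages to the same degree (PAD_INDEX):
    CREATE NONCLUSTERED INDEX IX_Orders_CustomerID
        ON Orders (CustomerID)
        WITH PAD_INDEX, FILLFACTOR = 80;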
The SORTED_DATA_REORG option skips the sorting step when a clustered index is created, which reduces the time needed to build it. When you create or rebuild a clustered index on a table that has become fragmented, you can use SORTED_DATA_REORG to compact the data pages; it is also used when you want to reapply the fill factor to an index. When using this option, consider the following. SQL Server checks that each key value is higher than the previous one; if not, the index cannot be created. SQL Server needs about 1.2 times the table's space to physically reorganize the data. With SORTED_DATA_REORG, index creation is faster because the sorting step is skipped and the data is physically copied from the table; when a row is deleted, the space it occupied can be reused, and the non-clustered indexes are rebuilt as well. If you want the leaf-level pages filled to a particular percentage, you can combine the FILLFACTOR option with SORTED_DATA_REORG.
Index Maintenance
To preserve system performance, indexes must be maintained after they are created, because frequent insert, delete, and update operations cause index pages to become fragmented.
You can use the DBCC SHOWCONTIG statement to display fragmentation information for a table's data and indexes. When it runs, SQL Server scans all the index pages at the leaf level to determine whether the table, or the specified index, is badly fragmented; it also determines how full the data pages and index pages are. Run DBCC SHOWCONTIG on tables that have been heavily modified, have had large amounts of data added, or are queried very slowly. When running it, keep these points in mind: SQL Server requires the table ID, or the table ID and index ID, which can be obtained from the system table sysindexes; and you should decide how often to run DBCC SHOWCONTIG, daily, weekly, or monthly, depending on how heavily the table is used.
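A brief sketch (the table and index names are hypothetical; the ID-based form shown here matches the sysindexes lookup described above, and later releases also accept the object name directly):

    -- Look up the table ID and index IDs in sysindexes:
    SELECT id, indid, name
    FROM sysindexes
    WHERE id = OBJECT_ID('Orders');

    -- Report fragmentation for that table ID and index ID
    -- (substitute the values returned by the query above):
    DBCC SHOWCONTIG (123456789, 1);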
Use the DBCC DBREINDEX statement to rebuild one or more indexes on a table. Run it when you want to rebuild indexes, including those that back a primary key or unique key constraint. DBCC DBREINDEX also reorganizes the space on the leaf-level index pages, removes fragmentation, and recomputes the index statistics. When running it, keep these points in mind: the system refills each leaf-level page to the specified fill factor; DBCC DBREINDEX can rebuild indexes created by a primary key or unique key constraint; the SORTED_DATA_REORG option makes rebuilding a clustered index faster, but if the key values are not already sorted it cannot be used; and DBCC DBREINDEX does not work on system tables. In addition, the Database Maintenance Plan Wizard can be used to automate the index rebuilding process.
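A minimal sketch (the table, index, and fill factor are hypothetical):

    -- Rebuild every index on the Orders table with a fill factor of 90;
    -- an empty string for the index name means all indexes:
    DBCC DBREINDEX ('Orders', '', 90);

    -- Rebuild a single named index only:
    DBCC DBREINDEX ('Orders', 'PK_Orders', 90);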
Statistics are samples of column data stored by SQL Server. They are usually kept for indexed columns, but they can also be created for non-indexed columns. SQL Server maintains distribution statistics on the key values of each index and uses them to judge which index would be useful for a given query; the quality of query optimization depends on how accurate these statistics are. The query optimizer uses these samples to decide between a table scan and an index. As the data in a table changes, SQL Server periodically updates the statistics automatically: index statistics are refreshed when the key values in the index have changed significantly, and the frequency of the updates depends on the amount of data in the index and how much of it has changed. For example, if a table holds 10,000 rows and 1,000 of them have been modified, the statistics probably need updating; if only 50 rows were modified, the existing statistics are kept. Besides the automatic updates, you can refresh statistics manually by executing the UPDATE STATISTICS statement or the sp_updatestats system stored procedure; UPDATE STATISTICS can refresh either all the indexes on a table or a single specified index.
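A short sketch of the manual refresh (hypothetical table and index names):

    -- Refresh statistics for every index on the Orders table:
    UPDATE STATISTICS Orders;

    -- Refresh statistics for one index only:
    UPDATE STATISTICS Orders IX_Orders_CustomerID;

    -- Refresh statistics for all tables in the current database:
    EXEC sp_updatestats;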
You can use the SHOWPLAN and STATISTICS IO statements to analyze index and query performance, and use the results to tune queries and indexes. The SHOWPLAN statements display each step the query optimizer takes, including how tables are joined and which indexes are used to access the data, so you can view the query plan of a given query. When using them, keep these points in mind: the output of SET SHOWPLAN_ALL is more detailed than that of SET SHOWPLAN_TEXT, but the application must be able to handle the larger result; and the setting applies only to the current session, so if you reconnect to SQL Server you must issue the SHOWPLAN statement again. The STATISTICS IO statement reports the number of input and output operations used to return the result set, showing the logical and physical I/O for the specified query; you can use this information to decide whether to rewrite the query or redesign the indexes.
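A brief sketch (the query is hypothetical; note that a SET SHOWPLAN statement must be the only statement in its batch):

    -- Show the estimated plan without executing the query:
    SET SHOWPLAN_TEXT ON;
    GO
    SELECT OrderID FROM Orders WHERE CustomerID = 42;
    GO
    SET SHOWPLAN_TEXT OFF;
    GO

    -- Execute the query and report logical and physical reads:
    SET STATISTICS IO ON;
    GO
    SELECT OrderID FROM Orders WHERE CustomerID = 42;
    GO
    SET STATISTICS IO OFF;
    GO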
Like the SHOWPLAN statements, optimizer hints can be used to tune query performance. An optimizer hint can provide a minor improvement to a query, but if the indexing strategy changes the hint may become useless, so hints should be used sparingly; the query optimizer is usually more efficient and more flexible than a hand-written hint. When using optimizer hints, consider these rules: you can specify an index by name, force a table scan with index_id 0, or force the clustered index with index_id 1; and because a hint overrides the query optimizer, you must revise the hint whenever the data or the environment changes.
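A sketch of index hints (hypothetical table, index, and query; this shows the older parenthesized hint syntax, while later releases of SQL Server use the WITH (INDEX(...)) form):

    -- Force a table scan (index 0) on the Orders table:
    SELECT OrderID FROM Orders (INDEX = 0)
    WHERE CustomerID = 42;

    -- Force the use of a specific named index:
    SELECT OrderID FROM Orders (INDEX = IX_Orders_CustomerID)
    WHERE CustomerID = 42;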
Index Tuning Wizard
The Index Tuning Wizard is a tool that analyzes a series of database query statements, recommends a set of indexes for the database, and optimizes the performance of the whole set of queries. You must supply the wizard with the following:
The query statements, which form the workload to be optimized
The database containing the tables on which indexes can be created to improve query performance
The tables to be used in the analysis
The constraints to respect during the analysis, such as the maximum disk space the indexes may use
The workload can come from two sources: a trace captured by SQL Server Profiler, or a file containing SQL statements. The Index Tuning Wizard always works against a defined workload; if the workload does not reflect normal operations, the indexes it recommends will not be the best-performing indexes for the real workload. The wizard calls the query analyzer to evaluate each query in the workload against every feasible combination of indexes, and then recommends the indexes that improve the performance of the whole workload. If no workload exists for the wizard to analyze, you can create one immediately with Profiler; once you have captured a trace that reflects normal database activity, the wizard can analyze the workload and recommend an index configuration that will improve database performance.
After the Index Tuning Wizard has analyzed the workload, you can view a series of reports and have the wizard create the recommended indexes immediately, create a scheduled job to build them later, or generate a file containing the SQL statements that create them.
The Index Tuning Wizard lets you select and create an ideal combination of indexes and statistics for a SQL Server database without requiring an expert understanding of the database structure, the workload, or SQL Server internals. In short, the Index Tuning Wizard can do the following:
Use the query optimizer to analyze the queries in the workload and recommend the best mix of indexes for a database with a large workload
Analyze the effect of implementing the recommendations, including index usage, the distribution of queries across tables, and the performance of the queries in the workload
Recommend ways to tune the database for a small set of problem queries
Allow you to customize the recommendations by setting advanced options such as disk space constraints, the maximum number of query statements to analyze, and the maximum number of columns per index
Profiler
Profiler can capture a continuous record of activity on the server in real time. You can choose which items and events to monitor, including Transact-SQL statements and batches, object usage, locking, security events, and errors; Profiler can filter these events so that only those you care about are shown. Recorded trace events can be replayed against the same server or a different one, re-executing the recorded commands. By handling these events in one place, and by studying specific events, it becomes much easier to monitor and debug problems in SQL Server.
Query Analyzer
Query Analyzer is a versatile tool that can do a great deal of work. In it you can interactively enter and execute any Transact-SQL statement and see the statement and its result set in the same window, execute several Transact-SQL statements at once, or run selected statements from a script file. It also provides a graphical way to analyze the execution plan of a query: it can report the data-retrieval method chosen by the query processor, let you adjust the query accordingly, and propose index suggestions that would improve performance. Such a suggestion applies only to the indexes for that one query and improves only that query's performance.
The system creates a distribution page for each index; the statistics are the distribution of the key values of one or more of the table's indexes, stored on that page. When a query is executed, the system can use this distribution information to decide which of the table's indexes to use, improving query speed and performance. The query processor generates the execution plan for a query based on these distribution statistics, and how well the plan is optimized depends on their accuracy. If the statistics closely match the physical state of the index, the query processor can generate a highly optimized plan; if they differ greatly from how the index is actually stored, the plan it generates will be poorly optimized.
The query processor extracts the distribution of the index keys from the statistics. Besides being updated manually with UPDATE STATISTICS, the distribution information can also be gathered automatically, so the query processor works with up-to-date statistics, keeps its execution plans well optimized, and reduces the maintenance burden. Of course, the plans the query processor generates have their limits: a plan improves the performance of a single query statement, and may affect the performance of the system as a whole either positively or negatively. To improve the query performance of the whole system, therefore, you should use a tool such as the Index Tuning Wizard.
Conclusion
In earlier versions of SQL Server, at most one index per table could be used in a query. In SQL Server 7.0, index handling is enhanced: SQL Server now uses index intersection and index join algorithms so that several indexes on the same table can serve one query, connected through a shared row locator. If a table has a clustered index, and therefore a clustering key, the leaf nodes of all its non-clustered indexes use that clustering key as the row locator rather than a physical record identifier. If the table has no clustered index, the non-clustered indexes continue to use physical record identifiers to point to the data pages. In both cases the row locator is very stable: when a leaf node of the clustered index splits, the non-clustered indexes need no modification because the row locators remain valid, and if the table has no clustered index, page splits do not occur at all. In earlier versions, non-clustered indexes used physical record identifiers, page number and row number, as row locators, so that when a clustered index data page split and many rows moved to a new data page, producing many new physical record identifiers, every non-clustered index had to be updated with those new identifiers, which consumed a great deal of time and resources.
The Index Tuning Wizard is a good tool for experienced users and newcomers alike. Experienced users can use it to create a baseline index configuration and then tune and customize it; new users can use it to create an optimized set of indexes quickly.