Http://hi.baidu.com/hlwang/blog/item/d8ccb67750ba6c18b051b98a.html
SQL Index
I. Index concept
An index is a structure that speeds up the retrieval of table data. A database index is similar to a book index: in a book, the index lets readers find the information they want without reading the whole book; in a database, the index lets the database program find rows in a table without scanning the entire table.
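The effect described above can be observed in a small experiment. This is only a sketch: it assumes SQLite (through Python's sqlite3 module) as a stand-in for SQL Server or Sybase, and the table book and index title_index are hypothetical names. The plan text is SQLite's own and will look different in other engines.

```python
import sqlite3


def query_plan(conn, sql):
    """Return the detail column of SQLite's EXPLAIN QUERY PLAN as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)


conn = sqlite3.connect(":memory:")
# Hypothetical table: 1000 books that are looked up by title.
conn.execute("CREATE TABLE book (id INTEGER, title TEXT)")
conn.executemany(
    "INSERT INTO book VALUES (?, ?)",
    [(i, f"title-{i}") for i in range(1000)],
)

lookup = "SELECT id FROM book WHERE title = 'title-500'"
plan_without_index = query_plan(conn, lookup)  # no index yet: the table is scanned
conn.execute("CREATE INDEX title_index ON book (title)")
plan_with_index = query_plan(conn, lookup)     # index present: direct search on title

print("without index:", plan_without_index)
print("with index:   ", plan_with_index)
```

The first plan reads the whole table, while the second one searches through title_index.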
II. Index features
1. Indexes speed up data retrieval.
2. Indexes slow down maintenance operations such as inserts, updates, and deletes.
3. Indexes are created on tables; an ordinary view cannot be indexed.
4. Indexes can be created directly or indirectly.
5. Indexes can be used in optimizer hints.
6. When the query processor executes a SQL statement, it can use only one index at a time on a table.
7. Others
III. Advantages of indexes
1. A unique index guarantees the uniqueness of each row in a table.
2. Data retrieval becomes much faster, which is the main reason for creating indexes.
3. Joins between tables become faster, which matters especially for enforcing referential integrity.
4. Queries that use grouping and sorting clauses spend much less time on grouping and sorting.
5. With indexes available, the optimizer can choose better plans during query processing, which improves system performance.
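The first advantage, uniqueness, is easy to see in practice. The following sketch assumes SQLite via Python's sqlite3 module rather than SQL Server, and the table account and index login_uindex are made-up names for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table with a unique index on the login column.
conn.execute("CREATE TABLE account (login TEXT, full_name TEXT)")
conn.execute("CREATE UNIQUE INDEX login_uindex ON account (login)")
conn.execute("INSERT INTO account VALUES ('alice', 'First user')")

try:
    # Same login again: the unique index rejects the row.
    conn.execute("INSERT INTO account VALUES ('alice', 'Second user')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM account").fetchone()[0]
print("duplicate rejected:", duplicate_rejected, "rows stored:", row_count)
```

Only the first row is stored; the second insert fails with an integrity error raised by the unique index.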
IV. Index disadvantages
1. Creating and maintaining indexes takes time, and that time grows with the volume of data.
2. Indexes occupy physical space. Besides the space used by the table itself, each index takes up a certain amount of space, and a clustered index needs even more.
3. Whenever rows are inserted, deleted, or updated, the indexes must be maintained as well, which slows down data maintenance.
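The space cost in point 2 can be measured. As a rough sketch, assuming SQLite via Python's sqlite3 module (its PRAGMA page_count reports the number of database pages), the example below counts pages before and after creating an index. The table and index names reuse mytable and mycolumn_index purely for illustration, and the row count is arbitrary.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER PRIMARY KEY, mycolumn TEXT)")
conn.executemany(
    "INSERT INTO mytable (mycolumn) VALUES (?)",
    [(f"value-{i:06d}",) for i in range(20000)],
)
conn.commit()


def page_count(conn):
    """Number of pages currently allocated in the database."""
    return conn.execute("PRAGMA page_count").fetchone()[0]


pages_before = page_count(conn)
conn.execute("CREATE INDEX mycolumn_index ON mytable (mycolumn)")
conn.commit()
pages_after = page_count(conn)

print("pages before index:", pages_before)
print("pages after index: ", pages_after)
```

The difference between the two counts is the space taken by the index alone; every additional index adds its own pages in the same way.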
V. Index classification
1. Direct and indirect index creation
Direct creation: CREATE INDEX mycolumn_index ON mytable (mycolumn)
Indirect creation: defining a primary key constraint or a unique key constraint creates an index indirectly.
2. Ordinary and unique indexes
Ordinary index: CREATE INDEX mycolumn_index ON mytable (mycolumn)
Unique index: guarantees that all values in the indexed column are unique; it can be either a clustered or a non-clustered index.
CREATE UNIQUE CLUSTERED INDEX mycolumn_cindex ON mytable (mycolumn)
3. Single-column and composite indexes
Single-column index: an index on one column, that is, a non-composite index.
Composite index: also called a multi-column index. The index creation statement names several columns, up to 16.
CREATE INDEX name_index ON username (firstname, lastname)
4. Clustered and non-clustered indexes (also called cluster and non-cluster indexes)
Clustered index: a physical index. The physical order of the rows is the same as the index order, so the data is always stored sorted by the key.
CREATE CLUSTERED INDEX mycolumn_cindex ON mytable (mycolumn) WITH ALLOW_DUP_ROW (allows a clustered index on a table that contains duplicate rows)
Non-clustered index: CREATE NONCLUSTERED INDEX mycolumn_cindex ON mytable (mycolumn)
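Indirect creation and composite indexes from the classification above can be checked with a short script. This is a sketch under the assumption that SQLite (via Python's sqlite3 module) behaves closely enough for illustration; it has no CLUSTERED keyword, so only the indirect and composite cases are shown. The columns user_code and email are hypothetical additions to the username table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# No CREATE INDEX here: the two constraints create indexes indirectly.
conn.execute(
    """
    CREATE TABLE username (
        user_code TEXT PRIMARY KEY,
        email     TEXT UNIQUE,
        firstname TEXT,
        lastname  TEXT
    )
    """
)
# PRAGMA index_list returns one row per index; column 1 is the index name.
implicit_indexes = [row[1] for row in conn.execute("PRAGMA index_list(username)")]

# Direct creation of a composite index on two columns.
conn.execute("CREATE INDEX name_index ON username (firstname, lastname)")
# PRAGMA index_info returns the indexed columns in key order; column 2 is the name.
name_index_columns = [row[2] for row in conn.execute("PRAGMA index_info(name_index)")]

print("indexes created by constraints:", implicit_indexes)
print("columns of name_index:", name_index_columns)
```

The constraints alone already produce two system-named indexes, and the composite index keeps its columns in the order they were listed.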
VI. Use of indexes
1. When a column is updated rarely, queried often, and contains many duplicate values, a clustered index is recommended.
2. When several columns are frequently accessed together and each contains duplicate values, consider a composite index.
3. Choose the leading column of a composite index carefully; otherwise the index will not help. If the leading column does not appear in the query conditions, the composite index is not used. The leading column should be the most frequently used column.
4. Before executing a multi-table operation, the query optimizer lists several candidate join plans based on the join conditions and picks the one with the lowest system overhead. Join conditions should take into account which tables have indexes and which have many rows. The choice of inner and outer table can be judged by the formula: number of matching rows in the outer table * number of lookups per row in the inner table; the plan with the smallest product is the best.
5. Any expression applied to a column in the WHERE clause is evaluated row by row when the SQL executes, so the engine has to scan the table instead of using the index on that column. If the result can be determined when the query is compiled, the optimizer can use the index and avoid the scan. For example, compare select * from record where substring(card_no, 1, 6) = '000000' with select * from record where card_no like '000000%'. Any operation on a column, including database functions and calculated expressions, causes a table scan, so move the operation to the right of the equals sign whenever possible.
6. An 'in' in the where condition is logically equivalent to 'or', so the parser converts column in ('0', '1') into column = '0' or column = '1' before execution. We would expect it to look up each or clause separately and then combine the results, which would use the index on the column. In practice it uses the "OR strategy": it first fetches the rows that satisfy each or clause, stores them in a worktable in the temporary database, builds a unique index to remove duplicate rows, and finally computes the result from that temporary table. The index on the column is therefore not used, and the completion time also depends on the performance of tempdb. In and or clauses usually go through worktables, which defeats the index. If they do not produce many duplicate values, consider splitting the clause into separate statements, each of which should use the index.
7. Make good use of stored procedures; they make SQL more flexible and efficient.
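Points 3 and 5 of the list above can be tried out directly. The sketch below assumes SQLite through Python's sqlite3 module instead of SQL Server or Sybase, so the plan wording is SQLite's. The columns branch, amount, and note and both index names are hypothetical.

```python
import sqlite3


def query_plan(conn, sql):
    """Return the detail column of SQLite's EXPLAIN QUERY PLAN as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE record (card_no TEXT, branch TEXT, amount INTEGER, note TEXT)"
)
conn.execute("CREATE INDEX record_index ON record (card_no, branch)")
conn.execute("CREATE INDEX amount_index ON record (amount)")

# Point 3: the composite index helps only when its leading column is filtered.
plan_leading = query_plan(conn, "SELECT note FROM record WHERE card_no = '000000'")
plan_no_leading = query_plan(conn, "SELECT note FROM record WHERE branch = 'north'")

# Point 5: an expression on the column hides it from the index;
# moving the arithmetic to the right-hand side makes the index usable again.
plan_computed = query_plan(conn, "SELECT note FROM record WHERE amount * 2 = 100")
plan_constant = query_plan(conn, "SELECT note FROM record WHERE amount = 100 / 2")

for label, plan in [
    ("leading column", plan_leading),
    ("non-leading column", plan_no_leading),
    ("expression on column", plan_computed),
    ("expression on constant", plan_constant),
]:
    print(f"{label:24s} {plan}")
```

In this setup the second and third queries fall back to a table scan, while the first and fourth search through an index.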
Before creating a clustered index, understand how your data will be accessed. Consider a clustered index for:
Columns that contain a large number of distinct values.
Queries that return a range of values using operators such as between, >, >=, <, and <=.
Columns that are accessed sequentially.
Queries that return large result sets.
Columns frequently accessed by queries with join or group by clauses; typically these are foreign key columns. Indexing the columns named in an order by or group by clause saves SQL Server from sorting the data, because the rows are already sorted, which improves query performance.
OLTP applications that need very fast single-row lookup, typically by primary key. Create a clustered index on the primary key.
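The range-query case from this list can be sketched as follows. The example assumes SQLite via Python's sqlite3 module, which has no CLUSTERED keyword, so an ordinary index on a hypothetical orders table stands in for the ordered structure; it only shows that a range predicate is answered by an index search rather than a full scan.

```python
import sqlite3


def query_plan(conn, sql):
    """Return the detail column of SQLite's EXPLAIN QUERY PLAN as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)


conn = sqlite3.connect(":memory:")
# Hypothetical table: order numbers 0..999 with a free-text memo.
conn.execute("CREATE TABLE orders (order_no INTEGER, memo TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(i, f"memo-{i}") for i in range(1000)],
)
conn.execute("CREATE INDEX order_no_index ON orders (order_no)")

range_sql = "SELECT memo FROM orders WHERE order_no BETWEEN 100 AND 200"
range_plan = query_plan(conn, range_sql)
matching_rows = len(conn.execute(range_sql).fetchall())

print(range_plan)
print("rows in range:", matching_rows)
```

Because index entries are kept in key order, the engine locates the first key in the range and then reads neighbouring entries until the upper bound is passed.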
Clustered indexes are not a good choice for:
Columns that change frequently
Changing the key moves the entire row, because SQL Server must keep the row data in physical order. This matters especially in high-volume transaction processing systems, where data tends to be volatile.
Wide keys
The key values of the clustered index are used by all non-clustered indexes as lookup keys, so they are stored in every leaf entry of every non-clustered index.
Ordinary indexes can be created as needed, but more indexes are not always better!
I found the following passage while searching, and it answers the question :)
If a column stores closely related data that is often accessed in sequence, a clustered index is the best choice, because SQL Server then physically sorts the rows by that column in ascending (the default) or descending order and can find the requested data quickly. Likewise, when searches are confined to a range, a clustered index on those columns works best. Because the data is physically rearranged, each table can have only one clustered index. Conversely, if the data in a column is only loosely related, use a nonclustered index. A table can have up to 249 nonclustered indexes, although I cannot imagine a practical application using that many.
When a table has a primary key, SQL Server by default automatically creates a unique clustered index on the key column(s). Clearly, a unique index on those columns is what guarantees that the primary key is unique. When you set up a foreign key relationship that you expect to use often, it is a good idea to create a nonclustered index on the foreign key column. If a table has a clustered index, its data pages are maintained as a linked list. If the table has no clustered index, SQL Server stores the data pages in a heap.
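The advice about indexing a foreign key column can be illustrated with a join. This is a sketch only, assuming SQLite via Python's sqlite3 module; the tables borrow the names titles and titleauthor, while the column au_id, the index name, and the sample rows are made up for the example.

```python
import sqlite3


def query_plan(conn, sql):
    """Return the detail column of SQLite's EXPLAIN QUERY PLAN as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE titles (title_id TEXT PRIMARY KEY, title TEXT)")
conn.execute(
    "CREATE TABLE titleauthor ("
    " au_id TEXT,"
    " title_id TEXT REFERENCES titles (title_id))"
)
# Index on the foreign key column, as recommended for frequently used relationships.
conn.execute("CREATE INDEX titleauthor_title_index ON titleauthor (title_id)")

conn.executemany(
    "INSERT INTO titles VALUES (?, ?)",
    [("t81002", "Sample title A"), ("t81003", "Sample title B")],
)
conn.executemany(
    "INSERT INTO titleauthor VALUES (?, ?)",
    [("a1", "t81002"), ("a2", "t81002"), ("a3", "t81003")],
)

join_sql = (
    "SELECT titleauthor.au_id FROM titles, titleauthor "
    "WHERE titles.title_id = titleauthor.title_id "
    "AND titles.title_id = 't81002' "
    "ORDER BY titleauthor.au_id"
)
join_plan = query_plan(conn, join_sql)
authors = [row[0] for row in conn.execute(join_sql)]

print(join_plan)
print(authors)
```

The child rows are located through the index on titleauthor.title_id, so the join does not need to scan titleauthor.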
Experience in improving query speed (SQL Server 2000/2005), 2007-12-07:
1. First, understand the requirements thoroughly and design the database and table structures flexibly and sensibly. This cannot be achieved in a day, so I will not discuss it here.
2. Check whether the tables contain blob or text columns. Values of these two data types are stored on separate data pages linked by a page chain, so slow queries on them are normal. These types have many other peculiarities as well (for example, identical records are not allowed), quite apart from the question of query speed.
3. Creating and using indexes on tables is the most important way to improve query speed. Normally the program uses indexes automatically at run time, but you can also name an index explicitly in the program when there is a special need:
select /+ using index index_name +/ into :variable from table_name
There are many rules for creating and using indexes, and I hope you will not create junk indexes. The temporary database is closely tied to indexes, because building an index involves re-sorting; if the default temporary database (usually 2 MB) is too small, an error is raised. Another issue is that using index hints inside cursors can cause problems.
I would like to add one more point. As I remember it, one of the index-related SQL optimization rules is that in a multi-table join query the table with fewer rows should be placed first (that is, on the left).
I am not sure whether this is correct, so corrections are welcome...
4. The database's memory allocation and cache usage are also very important. The database I use is Sybase, where these settings are configured with sp_configure.
5. How a SQL statement is written also has a large effect on query speed. (Of the three forms below, the third is the fastest.)
Table titles contains more rows than table titleauthor.
-- Form 1: constant filter on titleauthor only
select titles.title_id, title from titles, titleauthor
where titles.title_id = titleauthor.title_id
and titleauthor.title_id = 't81002'
-- Form 2: constant filter on titles only
select titles.title_id, title from titles, titleauthor
where titles.title_id = titleauthor.title_id
and titles.title_id = 't81002'
-- Form 3: constant filter on both tables
select titles.title_id, title from titles, titleauthor
where titles.title_id = titleauthor.title_id
and titles.title_id = 't81002'
and titleauthor.title_id = 't81002'
6. Flexible use of data sharing can produce unexpectedly good results.
Querying 800,000 rows within 20 seconds is normal.
A clustered index sorts the actual data on disk by the values of one or more specified columns. Because the index page pointers of a clustered index point directly to data pages, looking up data through a clustered index is almost always faster than through a non-clustered index. Each table can have only one clustered index, and creating it requires additional space of at least 120% of the table size to hold a copy of the table and the intermediate index pages, so the temporary database may need to be resized.
In a clustered index, the data is physically stored on the data pages in key order, and duplicate values are stored next to each other. Therefore, for queries that contain range checks (such as between or >=) or that use group by or order by, once the row with the first key value in the range has been found, the rows with the following key values are guaranteed to be physically adjacent and no further searching is needed. This avoids large scans and greatly improves query speed.
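The benefit for order by and group by can be seen in a query plan. The sketch below assumes SQLite via Python's sqlite3 module; it has no CLUSTERED keyword, so an ordinary index on a hypothetical sales table is used to show the same principle, namely that rows read in index order need no separate sort step. The phrase TEMP B-TREE is how SQLite reports an explicit sort.

```python
import sqlite3


def query_plan(conn, sql):
    """Return the detail column of SQLite's EXPLAIN QUERY PLAN as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")

order_sql = "SELECT region, amount FROM sales ORDER BY region"
group_sql = "SELECT region, SUM(amount) FROM sales GROUP BY region"

# Without an index, SQLite sorts the rows in a temporary B-tree.
order_plan_unindexed = query_plan(conn, order_sql)
group_plan_unindexed = query_plan(conn, group_sql)

conn.execute("CREATE INDEX region_index ON sales (region)")

# With the index, rows are read in key order and the sort step disappears.
order_plan_indexed = query_plan(conn, order_sql)
group_plan_indexed = query_plan(conn, group_sql)

print(order_plan_unindexed)
print(order_plan_indexed)
print(group_plan_unindexed)
print(group_plan_indexed)
```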
When you create a clustered index on a table with frequent inserts, do not build it on a monotonically increasing column (such as an identity column); otherwise blocking conflicts will occur often.
Do not include frequently modified columns in a clustered index, because once the key value changes, the data row must be moved to a new location.
Columns with only two or three distinct values, such as male and female (or yes and no), will never benefit from an index.
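One way to judge whether a column is worth indexing is to estimate its selectivity, that is, the ratio of distinct values to total rows. The following is only a sketch, assuming SQLite via Python's sqlite3 module; the person table and the selectivity helper are hypothetical, and no particular threshold is implied.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: 1000 people with a unique id and a two-valued gender column.
conn.execute("CREATE TABLE person (person_id INTEGER, gender TEXT)")
conn.executemany(
    "INSERT INTO person VALUES (?, ?)",
    [(i, "male" if i % 2 == 0 else "female") for i in range(1000)],
)


def selectivity(conn, table, column):
    """Distinct values divided by total rows (1.0 means every value is unique)."""
    distinct, total = conn.execute(
        f"SELECT COUNT(DISTINCT {column}), COUNT(*) FROM {table}"
    ).fetchone()
    return distinct / total


gender_selectivity = selectivity(conn, "person", "gender")
id_selectivity = selectivity(conn, "person", "person_id")

print("gender:   ", gender_selectivity)
print("person_id:", id_selectivity)
```

A value close to zero, as for gender here, means that each key matches a large share of the table, so an index lookup saves little over a scan.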
Indexes speed up queries but slow down data updates. You can place the table on one segment and its non-clustered indexes on another segment, with the two segments on separate physical devices, to improve performance.
Keep in mind that with a large number of indexes, inserts, updates, and deletes take longer than with no indexes.
Choose the clustered index based on the where clauses and the types of join operations. Candidate columns for the clustered index are:
● The primary key column, when it is used in where clauses and rows are inserted in random key order.
● Columns accessed by range, such as pri_order > 100 and pri_order < 200.
● Columns used in group by or order by.
● Columns that are not modified frequently.
● Columns used in join operations.
When a cursor uses select /+ using index index_name +/, keep in mind that the index re-orders the rows of the table, so the order in which the program sees the data really does change. Check whether that order meets the needs of your program.
To improve query efficiency, write the conditions in the where clause as completely as possible. For example, when a query uses two tables, in addition to the join condition between them, also compare the join column of each table with the variable.
For example:
select * from dec, mst where dec.id = mst.id and dec.id = :ls_id and mst.id = :ls_id;
Instead of:
select * from dec, mst where dec.id = mst.id and dec.id = :ls_id;
Or:
select * from dec, mst where dec.id = mst.id and mst.id = :ls_id;
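The three forms above are logically equivalent, which is why the extra condition is safe to add: it changes what the optimizer can work with, not the result. A small check, sketched with SQLite through Python's sqlite3 module (the columns qty and name and the sample rows are made up; :ls_id is passed as a named parameter):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dec (id TEXT, qty INTEGER)")
conn.execute("CREATE TABLE mst (id TEXT, name TEXT)")
conn.executemany("INSERT INTO dec VALUES (?, ?)", [("A1", 5), ("A1", 7), ("B2", 1)])
conn.executemany("INSERT INTO mst VALUES (?, ?)", [("A1", "alpha"), ("B2", "beta")])

base = (
    "SELECT dec.id, mst.name, dec.qty FROM dec, mst "
    "WHERE dec.id = mst.id AND {extra} ORDER BY dec.qty"
)
forms = {
    "both": base.format(extra="dec.id = :ls_id AND mst.id = :ls_id"),
    "dec_only": base.format(extra="dec.id = :ls_id"),
    "mst_only": base.format(extra="mst.id = :ls_id"),
}

# Run every form with the same value bound to :ls_id.
results = {
    name: conn.execute(sql, {"ls_id": "A1"}).fetchall()
    for name, sql in forms.items()
}

for name, rows in results.items():
    print(name, rows)
```

All three forms return the same rows; whether the fuller form is actually faster depends on the engine and its optimizer.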
1. By default, SQL Server creates a non-clustered index. A non-clustered index does not reorganize the data in the table; instead it stores the index column values for each row together with a pointer to the page that holds the data. In other words, a non-clustered index adds one more level between the index structure and the data itself. A table without a clustered index can have up to 250 non-clustered indexes, each providing a different sort order for accessing the data. When creating non-clustered indexes, weigh the gain in query speed against the loss in modification speed.
2. For frequently updated tables, non-clustered indexes cost more overhead than a clustered index or no index at all. For every row that moves to a new page, the leaf-level rows of each non-clustered index that point to that data must also be updated, and sometimes index pages have to be split as well.
By default, the tempdb database is placed on the master device with a size of 2 MB. The temporary database is the most heavily used database, serving sorting, temporary table creation, and reformatting, so tempdb optimization deserves special attention:
Step 1: Adjust the location of the temporary database
1. The tempdb database is placed on the master device by default. It is better to move the temporary database to a separate device.
2. Switch to the tempdb database and drop the master device from its segments.
Step 2: Bind the temporary database to a cache
Because temporary tables are created and used constantly, the temporary database makes heavy use of the data cache. Create a dedicated cache for the temporary database so that it stays resident in memory and helps spread the I/O:
1. Create a named cache
sp_cacheconfig "tempdb_cache", "10M", "mixed"
2. Restart the server.
3. Bind the temporary database to the tempdb_cache cache
sp_bindcache "tempdb_cache", tempdb
4. If large I/O is used, configure a memory pool for it.
Step 3: Optimize temporary tables
Most uses of temporary tables are simple and rarely need tuning. However, when temporary tables are accessed in complex ways, create the table and its indexes in separate batches or procedures from the statements that use them. The following techniques help the optimizer handle temporary tables:
1. Create an index on the temporary table.
2. Split complex use of a temporary table across several batches or procedures so that the optimizer has the information it needs (for example, break one complicated stored procedure into several simple stored procedures and run them in turn).
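The first technique, indexing a temporary table, can be sketched as below. The example assumes SQLite via Python's sqlite3 module rather than Sybase tempdb, and the table work_rows and index work_card_index are hypothetical. The table is created and loaded first, the index is built in a separate step, and only then is the table queried.

```python
import sqlite3


def query_plan(conn, sql):
    """Return the detail column of SQLite's EXPLAIN QUERY PLAN as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)


conn = sqlite3.connect(":memory:")

# Step 1: create and load the temporary table.
conn.execute("CREATE TEMP TABLE work_rows (card_no TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO work_rows VALUES (?, ?)",
    [(f"{i:06d}", i * 10) for i in range(500)],
)

# Step 2: build the index in its own step, after the data is in place.
conn.execute("CREATE INDEX work_card_index ON work_rows (card_no)")

# Step 3: query the temporary table; the lookup can now use the index.
lookup_sql = "SELECT amount FROM work_rows WHERE card_no = '000042'"
lookup_plan = query_plan(conn, lookup_sql)
amount = conn.execute(lookup_sql).fetchone()[0]

print(lookup_plan)
print("amount:", amount)
```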