Database indexing Principle 2

Last Update:2018-12-05 Source: Internet

Author: User

Tags sql server books

Developer on Alibaba Coud: Build your first app with APIs, SDKs, and tutorials on the Alibaba Cloud. Read more ＞

In the previous article (database indexing Principle 1), we elaborated on the basic concepts and data query principles of indexes, we know that when creating an index, we must make a balance between "speeding up the query speed" and "Reducing the modification speed". If there is a loss, this will take a long time. So how does SQLS consume resources when maintaining indexes? In what ways should indexes be managed and optimized? Here are seven answers to these questions.

I. Page splitting

Microsoft MOC teaches us that when a data page reaches 8 K capacity, if data is inserted or updated at this time, the page will be split (another name page split ):

1. With clustered indexes: clustered indexes direct inserted and updated rows to specific pages, which are determined by the clustered index keywords;

2. if there is only a heap, a new row can be inserted as long as there is space. However, if we need more space to update row data, as a result, the row is moved to the new page, and a forwarding pointer is left in the original position, pointing to the new row to be moved, if the row with the forwarding pointer is moved again, the original pointer points to a new position;

3. If a non-clustered index exists in the heap, although the insert and update operations do not split pages in the heap, the non-clustered index still splits pages.

About half of the data is retained on the old page, regardless of whether there is any index, and the other half is placed on the new page, and the new page may be allocated to any available page. Therefore, frequent page splitting has serious consequences. This will cause a large amount of data fragments to the physical table, resulting in a sharp drop in I/O efficiency. Finally, stopping SQLS and re-indexing is our only choice!

II,Fill Factor

However, at the beginning of chaos, we can avoid unpleasant appearance to some extent: when creating an index, we can specify a fill factor for this index, this allows you to retain a certain percentage of space on each leaf-level page of the index. In the future, data can be expanded to reduce page split. Fill factor is a percentage value from 0 to 100. If it is set to 100, the data page is filled. This setting is used only when no changes are made to the data (such as read-only tables. The smaller the value, the larger the free space on the data page. This reduces the need for page splitting during index growth. However, this operation requires more disk space.

The fill factor is only executed when the index is created. After the index is created, the fill factor is not maintained when data is added, deleted, or updated in the table, if you want to keep extra space on the data page, it is contrary to the intention of using the fill factor, because with the data input, SQLS must split pages on each page, to keep the idle space specified by the fill factor. Therefore, the free space of the data page can be filled only when the data in the table is greatly changed. In this case, you can easily re-create the index, re-specify the fill factor, and re-distribute the data.

Conversely, improper filling factor setting reduces the read performance of the database. The reduction is inversely proportional to the filling factor setting value. For example, when the fill factor value is 50, the read performance of the database will be twice reduced! Therefore, it makes sense to set the fill factor only when the table creates a new index based on the existing data and predicts which changes will be made to the data in the future.

3. Two mathematical questions

Assuming there is no problem with the database design, is it as analyzed in the previous article? When you build a large number of indexes, in the query work, SQLS can only process each submitted query with an index according to the "highest indicator? The answer is no!

The four indexing schemes mentioned in the previous chapter "how data is accessed" are only static, standard, and theoretical analysis and comparison. In fact, military orders will not be available, SQLS is almost "autonomous" to determine whether to use or which index to use!

What's going on?

Let's first calculate a question: if a table's record occupies 1000 bytes (1 K) on disk, we will create an index for a field of 10 bytes, the index size of the record is only 10 bytes (0.01 K ). As mentioned in the previous article, the minimum space allocation unit of SQLS is "Page". A Page occupies 8 K space on the disk. Therefore, only 8 "records" can be stored on one Page ", however, 800 "indexes" can be stored ". Now we need to retrieve records that meet a certain condition (with a Where clause) from a table with 8000 records. If there is no index, we need to traverse 8000x1000 bytes/8 K Bytes = 1000 pages to find the results. If the above index is available for the search field, we can retrieve the index blocks that meet the Search Condition in 8000 × 10 bytes/8 K Bytes = 10 pages, then find the result data block one by one based on the pointer on the index block, so that the I/O traffic will be much less.

However, sometimes using indexes is not as fast as using indexes!

Same as above. If you want to retrieve all records unconditionally (without the Where clause) without indexing, you need to access 8000x1000 bytes/8 K Bytes = 1000 pages. If you want to use indexes, first, retrieve the index, access 8000x10 bytes/8 K Bytes = 10 pages to obtain the index search results, and then retrieve the corresponding data page based on the index search results. Because it is to retrieve all the data, therefore, you need to access 8000 more lines X 1000 bytes/8 K Bytes = 1000 pages to read all the data. A total of 1010 pages are accessed, which is obviously not as fast as that without indexes.

SQLS has a complete set of data index optimization technologies. In the above cases, SQLS will automatically use the table scan method to retrieve data without using any indexes. So how does SQLS know when to use indexes and when to not use indexes? Because SQLS not only maintains data information, but also maintains data statistics!

Iv. Statistics

Open the Enterprise Manager, click the "Database" node, right-click the Northwind Database, click "properties", select the "Options" tab, and observe the "Settings" option, what did you find?

From Settings, we can see that in the database, SQLS automatically creates and updates statistics by default, which include data density and distribution information, they help SQLS determine the best query policy: Create a query plan, determine whether to use indexes, and what indexes to use.

When creating an index, SQLS creates a distribution data page to store two statistical information about the index: distribution table and density table. The query optimizer uses these statistics to estimate the Cost of using the index for queries (Cost), and then checks whether the index is useful for a specific query.

As the data in the table changes, SQLS automatically updates the statistics periodically. Sampling is performed randomly on each data page. After reading a data page from a disk, all rows on the data page are used to update statistics. The update frequency of statistical information depends on the amount of data in the field or index and the amount of data changes. For example, for a table with 10 thousand records, when the 1000 index key values change, the statistical information of the table may need to be updated, because 1000 values account for 10% of the table, this is a large proportion. For a table with 10 million records, the change of 1000 index values is negligible. Therefore, the statistics are not automatically updated.

As for the specific process that they help SQLS establish a query plan, the length is limited. This is omitted here. Please study it by yourself.

By the way, SQLS not only automatically records statistics, but also records details of other activities on the server, includes I/O statistics, CPU statistics, lock requests, T-SQL and RPC statistics, index and table scans, warnings and errors, database object creation/removal, connection/ disconnection, stored procedure operations, cursor operations, and so on. For information reading and setting, please search for the string "Profiler" in the SQLS Online help document (SQL Server Books Online.

5. manual indexing Maintenance

As mentioned above, some unsuitable indexes will affect the performance of SQLS. As the application system runs, data constantly changes, when the data changes to a certain degree, the index usage will be affected. You need to maintain the index on your own.

With the insertion, deletion, and split of data rows, some index pages may contain only a few pages of data. In addition, when the application executes a large number of I/O operations, rebuilding non-clustered indexes can maintain I/O efficiency. Re-indexing is actually re-organizing the B-tree. Which of the following situations require re-indexing:

1) Significant changes in data and usage modes;

2) The order of sorting changes;

3) A large number of insert operations or operations have been completed;

4) The number of disk reads using I/O queries is more than expected;

5) due to a large amount of data modification, insufficient use of data pages and index pages leads to overestimation of space usage;

6) dbcc checks that the index is faulty.

Vi. indexing principles

Coming to an end, let's look at the two important attributes of an index from another perspective-the unique index and composite index.

When designing a table, you can restrict the field values. For example, you can restrict the primary key or uniqueness of a field.

The primary key constraint specifies that one or more fields are not allowed to be repeated. It is used to prevent two identical records in the Table. Such fields are called primary keys. Each table can be created and only one primary key can be created, fields that constitute the primary key do not allow null values. For example, the "ID card number" field in the employee table or the "student ID, course number" field combination in the student table.

The Uniqueness constraint is similar to the primary key constraint. The difference is that the field that constitutes the uniqueness constraint allows null values.

The index is based on the primary key constraint and the uniqueness constraint. Because the field value is unique, we call this index "unique index ", if this unique index is created by a combination of two or more fields, it is also called "compound index ".

Note that a unique index is not a clustered index. If you create a unique index for a field, you cannot enter duplicate values for the field. It does not prevent you from creating a unique index for other types of fields. They can be clustered or non-clustered.

The unique index ensures that all data in the index column is unique and does not contain redundant data. If a table already has a primary key constraint or uniqueness constraint, SQLS automatically creates a unique index when creating or modifying a table. However, to ensure uniqueness, you should create a primary key constraint or a unique key constraint instead of creating a unique index. When creating a unique index, you should carefully consider these rules: When you create a primary key constraint or a unique key constraint in a table, SQLS batch automatically creates a unique index. If the table already contains data, when an index is created, SQLS checks the existing data redundancy in the table. If redundant values are found, SQLS cancels the execution of the statement and returns an error message, make sure that each row of data in the table has a unique value.

A composite index is an index created in two or more columns. When you search for two or more columns as a key value, it is best to create a composite index on these columns. When creating a composite index, consider these rules: You can combine up to 16 columns into a separate composite index. The total length of a composite index Column cannot exceed 900 bytes, that is to say, the composite Column Length cannot be too long. In composite indexes, all columns must come from the same table and cannot create Composite Columns across tables. In composite indexes, the order of columns is very important. In principle, we should first define the most unique column, for example, the index on (COL1, COL2) and on (COL2, COL1) the index on is different because the order of the two index columns is different. To enable the query optimizer to use a composite index, the WHERE clause in the query statement must refer to the first column in the composite index; composite indexes are useful when multiple key columns exist in a table. composite indexes can improve query performance and reduce the number of indexes created in a table.

To sum up, we have summarized the following index usage principles:

1) the logical primary key uses a unique group index, and uses a unique non-group index for the system key (as a stored procedure) and a non-group index for any foreign key column. Consider the size of the database space, how tables are accessed, and whether these accesses are mainly used for reading and writing.

2) Do not index memo/note fields, or index large fields (with many characters). This will occupy too much storage space for indexes.

3) do not index commonly used small tables

4) Generally, do not set too many indexes for small data tables. If they are often inserted or deleted, SQLS provides index maintenance for these insert and delete operations, which may consume more time than the scan tablespace.

VII. Finale

A query is a physical process. On the surface, SQLS runs in the East and West. In fact, most of the tasks that press the road are completed by the disk Input/Output System (I/O, A full table scan needs to read every data page of the table from the disk. If an index points to a data value, I/O reads the disk several times. However, in addition, deletion, and modification operations that occur at any time, the existence of indexes greatly increases the workload. Therefore, A reasonable index design is based on the analysis and prediction of various queries. Only by correctly combining the index with the program can the best optimization solution be produced.

In general, the idea of creating an index is:

(1) The primary key is often used as a condition for the where clause. A clustered index should be created on the primary key column of the table, especially when it is often used as a connection.

(2) create a clustered index for columns with a large number of duplicate values and frequent range queries and sorting and grouping, or columns that are frequently accessed.

(3) multiple columns are frequently accessed at the same time, and each column contains duplicate values. You can create a composite index to overwrite one or more queries, and use the columns with the most frequent queries as the leading columns, if possible, key queries may be overwritten.

(4) If you know that all values of the index key are unique, make sure that the index is defined as a unique index.

(5) When creating an index on a table that often performs insert operations, use fillfactor to reduce page splitting and increase concurrency to reduce the occurrence of deadlocks. If you create an index on a read-only table, you can set fillfactor to 100.

(6) When selecting index fields, try to select those small data fields as index keys so that each index page can accommodate as many index keys and pointers as possible, you can minimize the number of index pages that must be traversed by a query. In addition, try to use an integer as the key value, because it can provide faster access than any data type.

SQLS is a complicated system. It makes things behind indexes and queries clear and helps us better understand our system. In a word, indexes are like salt, but less tasteless and more salty.

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

Get Started for Free

Sales Support

1 on 1 presale consultation

Chat Contact Sales
After-Sales Support

24/7 Technical Support 6 Free Tickets per Quarter Faster Response

Open a Ticket
Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.

Learn More