Principles of database indexing

Source: Internet
Author: User
Tags sql server books sorted by name
What is the principle of database indexing?

The management of the MIS system for small and medium-sized enterprises is basically composed of two parts: one is the visual operation at the front end, and the other is the database management at the back end. The management and maintenance of the front-end by the network management includes ensuring smooth network links, handling MIS terminal emergencies, and managing and training operators, this is the most common and hard work done by network administrators. However, in the MIS system architecture, it is equally important to manage, maintain, and optimize databases, in reality, it seems that the network administrator has not paid enough attention to it. It seems that this is a programmer's business. In fact, if a network administrator can communicate and collaborate effectively with programmers on data table standardization, table index optimization, Capacity Design, transaction processing, and other aspects during the MIS design period, the daily front-end management work will become much easier, because in a sense, the database management system is equivalent to the operating system and plays an equally important position in the system.

This is the essential difference between database management systems such as SQL Server and database file systems such as dbasex and access. Therefore, the strength and weakness of the database management system reflects the level of network management. In my opinion, a good admin should be at least a competent DBA (Database Administrator ).

The following uses SQL Server (sqls) as an example to give a brief introduction to the "indexing principles" that are hard to understand in database management. Other database management systems such as Oracle and Sybase can be integrated.

I. Basic Structure of data tables

The purpose of creating a database is to manage a large amount of data, and the index is to improve data retrieval efficiency, database performance, and data access speed. We need to know about indexes and how they work. The key is to understand the operating principles of indexes to better manage indexes.

To understand how indexes work, we must first make a comprehensive review of the basic structure of a data table.

Sqls when a new table is created, the system allocates a continuous space in 8 K units to the disk. When the field value is written to the disk from the memory, this space is stored randomly. When an 8 K is used up, the sqls pointer will automatically allocate an 8 K space. Here, each 8 K space is called a data page, also known as a page or data page, and is allocated with a page number ranging from 0 to 7. The Guide information is recorded on the 0th page of each file, it is called a file header. Each 8 Data Pages (64 K) combine to form an extended area (extent. A heap is formed by a combination of all data pages ).

Sqls specifies that the row cannot span data pages. Therefore, the maximum data volume recorded in each row can only be 8 KB. This is the reason why the capacity of the char and varchar string types must be limited to 8 KB. The text type should be used to store data that exceeds 8 KB. In fact, text Field Values cannot be directly input or saved. It only stores a pointer and points to an extended area consisting of several 8 K text data pages, the real data is placed on these data pages.

Pages can be divided into space pages and data pages.

When the eight data pages in an extended area contain both the spatial page and the data or index page, it is called the mixed extension (mixed extent). Each table starts with a mixed extension; on the contrary, it is called the uniform extension (Uniform extent), which stores data and index information.

When a table is created, sqls allocates at least one data page to it in the hybrid expansion. As the data volume increases, sqls can allocate seven pages in the hybrid expansion, when the data exceeds 8 pages, the data page is allocated from the consistent expansion.

The space page is responsible for data space allocation and management, including the PFS page (page free space ): record whether a page has been allocated, is located in the hybrid expansion or consistent expansion, and the available space on the page. Gam page (global allocation map) and SGAM page (secodary global allocation map ): used to record the location of idle extensions or hybrid extensions that contain idle pages. Sqls uses these three types of page files to create new spaces for data tables when necessary;

The data page or index page stores data and index information. sqls uses four types of data pages to manage tables or indexes: these are Iam pages, data pages, text/image pages, and index pages.

In Windows, the physical location of each step of operations on the file is only known by the system. SQL Server follows this approach, in the process of inserting data, not only are the values of each field randomly stored in the data page, but also the location of each data page in the heap is only known by the system.

Why? As we all know, the reason why the OS can manage disks is that the file allocation table fat (File Allocation Table) is first loaded when the system is started. It manages the file system and records all operations on the file, the system can run normally. Similarly, as a management system-level SQL Server, there is also such a fat-like table, which is the index distribution image page: Iam (index allocation map ).

The existence of IAM makes it possible for sqls to physically manage data tables.

The Iam page is allocated from the hybrid expansion and records the location of the eight initial pages and the location of the expansion area. Each Iam page can manage 512,000 data pages. If the data volume is too large, sqls can also add more Iam pages, which can be located anywhere in the file. The first Iam page is called firstiam, which records the location of the subsequent Iam page.

Data Pages and text/image pages are opposite each other. The former stores non-text/image data because they both do not exceed 8 K capacity, the latter only saves text or image data that exceeds 8 KB. The index page, as its name implies, stores data information related to the index structure. Understanding the page helps us to understand how sqls maintains indexes, such as page splitting and fill factor.

Ii. Basic indexing concepts

An index is a special type of database object, which is closely related to tables.

Indexes exist for search. For example, an index is attached to the end of some books, indicating the location of the page where a keyword appears in the body, which is convenient for us to find. However, most books only have directories, and directories are not indexes, the sorting of the content in the book does not provide a real retrieval function. It can be seen that creating indexes takes up space separately. Indexes do not have to be created. They only exist for better and faster retrieval and keyword locating.

What should we do if we want to view books in the library? There are many small cabinets called index card cabinets at the front end of the Library, which are divided into several categories for us to search for books. For example, you can use the order of strokes of titles or pinyin order as the basis for searching, you can also query the desired books from the strokes or pinyin order of the author's name. There are many retrieval methods, but you can understand that, books in the library are not arranged in the order of these cards-although theoretically this can be done, in fact, all books are manually pasted with a specific number (①, they are arranged in this order. The index card does not specify the number of the first bookshelves in the book library, but only the specific number. The Administrator returns the requested book to the reader based on the number. This is an image-like example, which will be used repeatedly in the following sections.

After sqls is installed, the installer automatically creates several special system databases, such as master, model, and tempdb. The master database is the master database of sqls, it is used to store and manage system information of other system databases, user databases, and sqls. It is equivalent to the Registry in windows.

The master has a system table named sysindexes, which specifically manages indexes. Sqls must be used to query data tables without any doubt. It is one of the main roles in this article.

To view the index attributes of a table, run the following command in the query Analyzer: Select * From sysindexes where id = object_id ('tablename '); to view the size of the table's index space, run the system stored procedure command: sp_spaceused tablename. The tablename parameter is the name of the table to be indexed.

Iii. Balance Tree

If you know the page number of a keyword through the index after the book, you may find the correct page number through random lookup. But the more scientific and faster method is: first, flip the book to the location of about 1/2. If the page number to be searched is smaller than the page number on the page, flip the book forward to 1/4. Otherwise, turn the book back to 3/4, and so on, and further divide the page into smaller parts until the correct page number is displayed. This is called the "Balance Tree". In the official tutorial MOC, Microsoft has another saying: B-tree (Balance Tree.

A table Index consists of several pages that form a tree structure. Tree B is a root node. It points to two other pages and logically divides the records of a table into two parts: "branches" --- non-leaf level; rather than leaf nodes, they point to a smaller part: "leaf"-leaf level ). The root node, non-leaf node, and leaf node are all located on the index page, collectively referred to as the index node, which belongs to the category of the index page. These "branches" and "leaves" finally point to a specific page ). A leaf between a root node and a leaf node is also called a data intermediate page.

"Root" corresponds to the root field of the sysindexes table, which records the physical location (pointer) of non-leaf nodes ); the non-leaf node is located between the root node and the leaf node, and records the pointer to the leaf node. The leaf node finally points to the data page. This is the Balance Tree ".

Iv. clustered and non-clustered Indexes

Formally, indexes are classified into clustered indexes and nonclustered indexes ).

The clustered index is equivalent to the specific number on the back of the book. If a clustered index is created for a table, its index page contains the value of the index column (hereinafter referred to as the index key value ), the records in the table are sorted by the index key value. For example, if we create a clustered index on the field "name", the records in the table will be sorted by name; if the column with a clustered index is of the numerical type, records are arranged according to the value of the key value.

Non-clustered indexes are used to specify the logical sequence of data. That is to say, the data in the table is not arranged in the order specified by the index key value, but is still stored in the order when records are inserted. Its index page contains the index key value and the physical location of the row record pointed to in the data page, which is called the row Locator (RID: Row ID ). Like the index table behind the book, the order in the index table is inconsistent with the actual page number order. In addition, a book may have multiple indexes. For example, topic index and author index.

The index created by SQL Server by default is a non-clustered index. Because the non-clustered index does not reorganize the data in the table, it only stores the index key value and points it to the page where the data is located. If a table has no clustered index, 249 non-clustered indexes can be created theoretically. Each non-clustered Index provides different sorting orders for data access.

5. How data is accessed

If you can really understand the basic knowledge of the above indexes, then looking back at the indexing principle will be much simpler and easier.

(1) how to access sqls without creating any index data table:

Heap is translated into Chinese and called "heap". Its Meaning implies disorder and disorder. As we mentioned above, when data values are written into the data page, there is no specific order between each row of records, therefore, the order of rows and rows is random and unordered. Of course, the data pages in the table are unordered, and all the data pages in the table form a "Heap". You can say that, A non-indexed data table is like a library with only a bookcase but no index card cabinet. The Library is full of messy books. When the reader submits a query request to the Administrator, the Administrator goes into the library and looks for the content from the beginning to the beginning. If you are lucky, the first book on the first shelf is found. If you are not lucky, it will be found in the last book on the last shelf.

When sqls receives a query request, it first analyzes the value of a field called the index identifier (indid: Index ID) in the sysindexes table. If the value is 0, indicates that this is a data table rather than an index table, sqls will use another field in the sysindexes table -- that is, find the iam page chain of the table in the firstiam value mentioned earlier -- that is, all data page sets.

This is the way to search for data in a data table without an index. Isn't it very inefficient? For a table without indexes, sqls can only do this for a "Heap" record. What's even more boring is that even if the queried record is found in the first row, sqls still needs to scan the table from start to end. This type of query is called "traversal" and "Table scan ".

It can be seen that data tables without index creation can still run, but this method is not a big problem for small-scale tables, but the efficiency of querying massive data is too low.

(2) How does sqls access data tables with non-clustered indexes:

As mentioned above, multiple non-clustered indexes can be created with a B-tree structure. Its leaf-level nodes do not contain data pages but only index rows. If a table has only non-clustered indexes, each index row contains the non-clustered index key value and row identifier (RID), which points to the data row with the key value. Each rid consists of a file ID, a page number, and a row number on the page.

When the value of indid is between 2 and, it means that the table has a non-clustered index page. At this time, sqls calls the root field value to point to the root of the non-clustered index B tree, searches for the most similar value as the queried value, and finds the page number in the non-leaf node based on this value, then, find the Rid Of the value on the page corresponding to the leaf node, locate the page and row in the heap based on the RID, and return to the query end.

For example, if a non-clustered index is created on the lastname, when select * from member where lastname = 'ota 'is executed, the query process is: ① The indid value of sqls query is 2; ② start from the root immediately, locate the value closest to Ota "Martin" in the non-leaf node, and find its location on page 61st of the leaf level; ③ search for the ota rid only under Martin on page 61st of the leaf level page. The RID is displayed, indicates that the record named Ota in the lastname field is located in row 707th on the 4th page of the heap, and N indicates the file ID value, which is irrelevant to the Data. ④ Based on the above information, sqls immediately goes to
The record "published" is displayed on the 707 page and displayed on the front-end (client ). Depending on the data size of the table, the entire query process takes a few milliseconds to several milliseconds.

When talking about the basic concepts of indexes, we mentioned this method:

There are many index card cabinets in the front-end of the Library, which are divided into several categories, such as strokes by title or Pinyin sequence, author strokes or Pinyin sequence, but there are two differences: ① The index card records the specific location of each book-the number of books on the shelf of a cabinet-rather than the "special number"; ② There is no "special number" on the spine ". After the Administrator finds the specific location (RID) of the required books in the index Cabinet, the Administrator submits the books based on the specific location of the RID directly in the library.

Obviously, this query method is very efficient, but it consumes a lot of resources. Because the location of books in the library is changing at any time, the Administrator must spend extra energy and time to update the index at any time.

(3) how to access data tables with clustered indexes by sqls:

In a clustered index, the data page where the data is located is at the leaf level, and the index page where the index data is located is at the non-leaf level.

The query principle is similar to the preceding query for non-clustered indexes, but because the records are sorted by the index key values in the clustered index, in other words, the index key value of the clustered index is the specific data page.

This is like the books in the library are sorted by Pinyin of titles, and only corresponding index cards are created according to this sorting method, therefore, it is much simpler to query than to create a non-clustered index. Take the preceding query as an example:

Assume that a clustered index is created on the lastname field. When select * from member where lastname = 'ota 'is executed, the query process is: ① sqls queries the indid value as 1, this indicates that only clustered indexes are created in the system. ② start from the root immediately and locate the value closest to Ota "Martin" in non-leaf nodes ", and found its 120th page on the leaf page; ③ found the OTA entry under Martin on the leaf page 120th page, and this entry is already the data record itself; ④ return the record to the client.

This time, the efficiency is higher than the second method, so that it looks more beautiful, but its biggest advantage is its biggest disadvantage-because the same table can only be arranged in one order at the same time, therefore, only one clustered index can be created in any data table. Creating a clustered index requires at least 120% additional space for the source table to store copies of the source table and the intermediate index page!

Isn't the fish and the bear's paw worth both? There are some solutions.

(4) How does sqls access data tables with both clustered indexes and non-clustered indexes:

If we create a clustered index before creating a non-clustered index, then the non-clustered index can be searched using the clustered index keyword, just as in the library, there can be different types of book index cards in the front-end card cabinet, but each card contains the special number-not the specific location where the books are stored. To the greatest extent, this not only takes care of the speed of data retrieval, but also makes the daily maintenance of indexes more feasible. This is the most scientific retrieval method.

That is to say, when only non-clustered indexes are created, each leaf node specifies the row location (RID) of the record ); when both clustered indexes and non-clustered indexes exist, each leaf-level node points to the index key value of the clustered index, that is, the data record itself.

Assume that the clustered index is created on the lastname instead of the clustered index on the firstname. When select * from member where firstname = 'Mike 'is executed, the query process is: ① The indid value of sqls query is 2; ② start from the root immediately and locate the "Jose" entry closest to Mike's value in the non-clustered nodes of the firstname index; ③ check the logic position of Mike from the leaf level page under the Jose entry -- instead of the RID, It is the pointer of the clustered index; ④ According to the indicated position of this pointer, go directly to the leaf-level page in the clustered index of lastname to reach the mike data record itself; ⑤ return the record to the client.

This is exactly the same as the actual scenario we mentioned in "basic index concepts". When data is updated, sqls is only responsible for maintaining the health of clustered indexes, without having to consider non-clustered indexes. As long as we create clustered indexes on the fields of the ID class, instead, you can create non-clustered indexes on other fields that often need to be queried. This scientific and targeted method is used to create clustered indexes and non-clustered indexes on a table, we not only enjoy the flexibility and efficiency brought about by indexes, but also avoid the large amount of additional resource consumption caused by index maintenance.

Vi. Advantages and Disadvantages of Indexes

There are some inherent limitations of indexes: 1: To create an index, the system needs to occupy about 1.2 times the hard disk and memory space of the table to store the index. 2: when updating data, the system must have additional time to update indexes at the same time, to maintain data and index consistency-this is as if the library needs to have a dedicated location to place index cabinets, in addition, when the inventory book changes, someone needs to restructure the index card to keep the index consistent with the inventory.

Of course, the advantages of index creation are also obvious: in the case of massive data, if the index is properly established, this greatly improves the efficiency of sqls in performing queries, sorting results, and grouping.

Practices show that improper indexes not only do not help, but also reduce system performance. Because a large number of indexes take more time to insert, modify, and delete than no index. For example, it is inappropriate to create an index for the following fields: 1. Few or never reference fields; 2. Logical fields, such as male or female (yes or no.

To sum up, improving query efficiency is at the cost of consuming certain system resources. indexes cannot be established blindly and must be integrated, there must be a balance between "speeding up the query speed" and "Reducing the modification speed". If there is a loss, this will take a long time. This is an important indicator to test whether a DBA is excellent.

So far, we have been saying that sqls consumes system resources when maintaining indexes. What resources does sqls consume when maintaining indexes? What problems will occur? Should I optimize the field index?

______________________________________________________________________________________________

In the previous article, we elaborated on the basic concepts of indexes and the principles of data query, we know that when creating an index, we must make a balance between "speeding up the query speed" and "Reducing the modification speed". If there is a loss, this will take a long time. So how does sqls consume resources when maintaining indexes? In what ways should indexes be managed and optimized? Here are seven answers to these questions.

I. Page splitting

Microsoft MOC teaches us that when a data page reaches 8 K capacity, if data is inserted or updated at this time, the page will be split (another name page split ):

1. With clustered indexes: clustered indexes direct inserted and updated rows to specific pages, which are determined by the clustered index keywords;

2. if there is only a heap, a new row can be inserted as long as there is space. However, if we need more space to update row data, as a result, the row is moved to the new page, and a forwarding pointer is left in the original position, pointing to the new row to be moved, if the row with the forwarding pointer is moved again, the original pointer points to a new position;

3. If a non-clustered index exists in the heap, although the insert and update operations do not split pages in the heap, the non-clustered index still splits pages.

About half of the data is retained on the old page, regardless of whether there is any index, and the other half is placed on the new page, and the new page may be allocated to any available page. Therefore, frequent page splitting has serious consequences. This will cause a large amount of data fragments to the physical table, resulting in a sharp drop in I/O efficiency. Finally, stopping sqls and re-indexing is our only choice!

Ii. Fill Factor

However, at the beginning of chaos, we can avoid unpleasant appearance to some extent: when creating an index, we can specify a fill factor for this index, this allows you to retain a certain percentage of space on each leaf-level page of the index. In the future, data can be expanded to reduce page split. Fill factor is a percentage value from 0 to 100. If it is set to 100, the data page is filled. This setting is used only when no changes are made to the data (such as read-only tables. The smaller the value, the larger the free space on the data page. This reduces the need for page splitting during index growth. However, this operation requires more disk space.

The fill factor is only executed when the index is created. After the index is created, the fill factor is not maintained when data is added, deleted, or updated in the table, if you want to keep extra space on the data page, it is contrary to the intention of using the fill factor, because with the data input, sqls must split pages on each page, to keep the idle space specified by the fill factor. Therefore, the free space of the data page can be filled only when the data in the table is greatly changed. In this case, you can easily re-create the index, re-specify the fill factor, and re-distribute the data.

Conversely, improper filling factor setting reduces the read performance of the database. The reduction is inversely proportional to the filling factor setting value. For example, when the fill factor value is 50, the read performance of the database will be twice reduced! Therefore, it makes sense to set the fill factor only when the table creates a new index based on the existing data and predicts which changes will be made to the data in the future.

3. Two mathematical questions

Assuming there is no problem with the database design, is it as analyzed in the previous article? When you build a large number of indexes, in the query work, sqls can only process each submitted query with an index according to the "highest indicator? The answer is no!

The four indexing schemes mentioned in the previous chapter "how data is accessed" are only static, standard, and theoretical analysis and comparison. In fact, military orders will not be available, sqls is almost "autonomous" to determine whether to use or which index to use!

What's going on?

Let's first calculate a question: if a table's record occupies 1000 bytes (1 k) on disk, we will create an index for a field of 10 bytes, the index size of the record is only 10 bytes (0.01 K ). As mentioned in the previous article, the minimum space allocation unit of sqls is "page". A page occupies 8 K space on the disk. Therefore, only 8 "records" can be stored on one page ", however, 800 "indexes" can be stored ". Now we need to retrieve records that meet a certain condition (with a where clause) from a table with 8000 records. If there is no index, we need to traverse 8000x1000 bytes/8 K Bytes = 1000 pages to find the results. If the above index is available for the search field, we can retrieve the index blocks that meet the Search Condition in 8000 × 10 bytes/8 K Bytes = 10 pages, then find the result data block one by one based on the pointer on the index block, so that the I/O traffic will be much less.

However, sometimes using indexes is not as fast as using indexes!

Same as above. If you want to retrieve all records unconditionally (without the WHERE clause) without indexing, you need to access 8000x1000 bytes/8 K Bytes = 1000 pages. If you want to use indexes, first, retrieve the index, access 8000x10 bytes/8 K Bytes = 10 pages to obtain the index search results, and then retrieve the corresponding data page based on the index search results. Because it is to retrieve all the data, therefore, you need to access 8000 more lines X 1000 bytes/8 K Bytes = 1000 pages to read all the data. A total of 1010 pages are accessed, which is obviously not as fast as that without indexes.

Sqls has a complete set of data index optimization technologies. In the above cases, sqls will automatically use the table scan method to retrieve data without using any indexes. So how does sqls know when to use indexes and when to not use indexes? Because sqls not only maintains data information, but also maintains data statistics!

Iv. Statistics

Open the Enterprise Manager, click the "Database" node, right-click the northwind database, click "properties", select the "options" tab, and observe the "Settings" option, what did you find?

From settings, we can see that in the database, sqls automatically creates and updates statistics by default, which include data density and distribution information, they help sqls determine the best query policy: Create a query plan, determine whether to use indexes, and what indexes to use.

When creating an index, sqls creates a distribution data page to store two statistical information about the index: distribution table and density table. The query optimizer uses these statistics to estimate the cost of using the index for queries (cost), and then checks whether the index is useful for a specific query.

As the data in the table changes, sqls automatically updates the statistics periodically. Sampling is performed randomly on each data page. After reading a data page from a disk, all rows on the data page are used to update statistics. The update frequency of statistical information depends on the amount of data in the field or index and the amount of data changes. For example, for a table with 10 thousand records, when the 1000 index key values change, the statistical information of the table may need to be updated, because 1000 values account for 10% of the table, this is a large proportion. For a table with 10 million records, the change of 1000 index values is negligible. Therefore, the statistics are not automatically updated.

As for the specific process that they help sqls establish a query plan, the length is limited. This is omitted here. Please study it by yourself.

By the way, sqls not only automatically records statistics, but also records details of other activities on the server, includes I/O statistics, CPU statistics, lock requests, T-SQL and RPC statistics, index and table scans, warnings and errors, database object creation/removal, connection/ disconnection, stored procedure operations, cursor operations, and so on. For information reading and setting, please search for the string "profiler" in the sqls online help document (SQL Server books online.

5. manual indexing Maintenance

As mentioned above, some unsuitable indexes will affect the performance of sqls. As the application system runs, data constantly changes, when the data changes to a certain degree, the index usage will be affected. You need to maintain the index on your own.

With the insertion, deletion, and split of data rows, some index pages may contain only a few pages of data. In addition, when the application executes a large number of I/O operations, rebuilding non-clustered indexes can maintain I/O efficiency. Re-indexing is actually re-organizing the B-tree. Which of the following situations require re-indexing:

1) Significant changes in data and usage modes;

2) The order of sorting changes;

3) A large number of insert operations or operations have been completed;

4) The number of disk reads using I/O queries is more than expected;

5) due to a large amount of data modification, insufficient use of data pages and index pages leads to overestimation of space usage;

6) DBCC checks that the index is faulty.

Vi. indexing principles

Coming to an end, let's look at the two important attributes of an index from another perspective-the unique index and composite index.

When designing a table, you can restrict the field values. For example, you can restrict the primary key or uniqueness of a field.

The primary key constraint specifies that one or more fields are not allowed to be repeated. It is used to prevent two identical records in the Table. Such fields are called primary keys. Each table can be created and only one primary key can be created, fields that constitute the primary key do not allow null values. For example, the "ID card number" field in the employee table or the "student ID, course number" field combination in the student table.

The Uniqueness constraint is similar to the primary key constraint. The difference is that the field that constitutes the uniqueness constraint allows null values.

The index is based on the primary key constraint and the uniqueness constraint. Because the field value is unique, we call this index "unique index ", if this unique index is created by a combination of two or more fields, it is also called "compound index ".

Note that a unique index is not a clustered index. If you create a unique index for a field, you cannot enter duplicate values for the field. It does not prevent you from creating a unique index for other types of fields. They can be clustered or non-clustered.

The unique index ensures that all data in the index column is unique and does not contain redundant data. If a table already has a primary key constraint or uniqueness constraint, sqls automatically creates a unique index when creating or modifying a table. However, to ensure uniqueness, you should create a primary key constraint or a unique key constraint instead of creating a unique index. When creating a unique index, you should carefully consider these rules: When you create a primary key constraint or a unique key constraint in a table, sqls batch automatically creates a unique index. If the table already contains data, when an index is created, sqls checks the existing data redundancy in the table. If redundant values are found, sqls cancels the execution of the statement and returns an error message, make sure that each row of data in the table has a unique value.

A composite index is an index created in two or more columns. When you search for two or more columns as a key value, it is best to create a composite index on these columns. When creating a composite index, consider these rules: You can combine up to 16 columns into a separate composite index. The total length of a composite index Column cannot exceed 900 bytes, that is to say, the composite Column Length cannot be too long. In composite indexes, all columns must come from the same table and cannot create Composite Columns across tables. In composite indexes, the order of columns is very important. In principle, we should first define the most unique column, for example, the index on (col1, col2) and on (col2, col1) the index on is different because the order of the two index columns is different. To enable the query optimizer to use a composite index, the where clause in the query statement must refer to the first column in the composite index; composite indexes are useful when multiple key columns exist in a table. composite indexes can improve query performance and reduce the number of indexes created in a table.

To sum up, we have summarized the following index usage principles:

1) the logical primary key uses a unique group index, and uses a unique non-group index for the system key (as a stored procedure) and a non-group index for any foreign key column. Consider the size of the database space, how tables are accessed, and whether these accesses are mainly used for reading and writing.

2) Do not index Memo/Note fields, or index large fields (with many characters). This will occupy too much storage space for indexes.

3) do not index commonly used small tables

4) Generally, do not set too many indexes for small data tables. If they are often inserted or deleted, sqls provides index maintenance for these insert and delete operations, which may consume more time than the scan tablespace.

VII. Finale

A query is a physical process. On the surface, sqls runs in the East and West. In fact, most of the tasks that press the road are completed by the disk Input/Output System (I/O, A full table scan needs to read every data page of the table from the disk. If an index points to a data value, I/O reads the disk several times. However, in addition, deletion, and modification operations that occur at any time, the existence of indexes greatly increases the workload. Therefore, A reasonable index design is based on the analysis and prediction of various queries. Only by correctly combining the index with the program can the best optimization solution be produced.

In general, the idea of creating an index is:

(1) The primary key is often used as a condition for the WHERE clause. A clustered index should be created on the primary key column of the table, especially when it is often used as a connection.

(2) create a clustered index for columns with a large number of duplicate values and frequent range queries and sorting and grouping, or columns that are frequently accessed.

(3) multiple columns are frequently accessed at the same time, and each column contains duplicate values. You can create a composite index to overwrite one or more queries, and use the columns with the most frequent queries as the leading columns, if possible, key queries may be overwritten.

(4) If you know that all values of the index key are unique, make sure that the index is defined as a unique index.

(5) When creating an index on a table that often performs insert operations, use fillfactor to reduce page splitting and increase concurrency to reduce the occurrence of deadlocks. If you create an index on a read-only table, you can set fillfactor to 100.

(6) When selecting index fields, try to select those small data fields as index keys so that each index page can accommodate as many index keys and pointers as possible, you can minimize the number of index pages that must be traversed by a query. In addition, try to use an integer as the key value, because it can provide faster access than any data type.

Sqls is a complicated system. It makes things behind indexes and queries clear and helps us better understand our system. In a word, indexes are like salt, but less tasteless and more salty.

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.