Database index creation principles

Last Update:2018-12-05 Source: Internet

Author: User


You can use indexes to quickly access specific information in a database table. An index is a structure that sorts the values of one or more columns in a table, for example the last name (lname) column of the employee table. If you want to find a specific employee by last name, the index lets the database reach the information faster than searching every row in the table.

An index is a separate, physical database structure. It consists of the values of one or more columns in a table, together with a list of logical pointers to the data pages where those values are physically stored.

An index provides pointers to the data values stored in the specified columns of the table, and orders these pointers according to the sort order you specify. The database uses an index much as you use the index of a book: it searches the index for a specific value, and then follows the pointer to the row that contains the value.
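The lookup described above can be observed with a small sketch. It assumes SQLite through Python's standard sqlite3 module, which is not the database the article has in mind; the table layout and the index name idx_employee_lname are made up for illustration. The exact wording of the plan text varies between SQLite versions, so the sketch only looks for the index name.

```python
import sqlite3


def lookup_plan(with_index: bool) -> str:
    """Return SQLite's query plan for a lookup by last name."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical employee table, modeled on the article's lname example.
    conn.execute(
        "CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, lname TEXT, fname TEXT)"
    )
    conn.executemany(
        "INSERT INTO employee (lname, fname) VALUES (?, ?)",
        [(f"Name{i}", "X") for i in range(1000)],
    )
    if with_index:
        conn.execute("CREATE INDEX idx_employee_lname ON employee (lname)")
    rows = conn.execute(
        "EXPLAIN QUERY PLAN SELECT * FROM employee WHERE lname = ?", ("Name42",)
    ).fetchall()
    conn.close()
    # The fourth column of each plan row is the human-readable detail text.
    return " | ".join(row[3] for row in rows)


plan_without_index = lookup_plan(with_index=False)  # full table scan
plan_with_index = lookup_plan(with_index=True)      # search through the index
```

Without the index the plan reports a scan of the whole table; with it, the plan names idx_employee_lname as the access path.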

In a database diagram, you can create, edit, or delete each index type on the Indexes/Keys property page of the selected table. The index is saved in the database when you save the table it is attached to, or when you save the diagram that contains the table.

Advantages of index creation

1. Indexes greatly speed up data retrieval.

2. A unique index guarantees the uniqueness of each row of data in the table.

3. Indexes speed up joins between tables.

4. When a query uses grouping or sorting clauses, an index can significantly reduce the time spent grouping and sorting.
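As a rough illustration of point 4, the sketch below compares the plan of an ORDER BY query with and without an index on the sort column. It assumes SQLite via Python's sqlite3 module; the table and index names are hypothetical, and other databases report their plans differently.

```python
import sqlite3


def sort_plan(with_index: bool) -> str:
    """Return SQLite's query plan for sorting employees by last name."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, lname TEXT, fname TEXT)"
    )
    conn.executemany(
        "INSERT INTO employee (lname, fname) VALUES (?, ?)",
        [(f"Name{i % 300}", f"First{i}") for i in range(1000)],
    )
    if with_index:
        conn.execute("CREATE INDEX idx_employee_lname ON employee (lname)")
    rows = conn.execute(
        "EXPLAIN QUERY PLAN SELECT lname, fname FROM employee ORDER BY lname"
    ).fetchall()
    conn.close()
    return " | ".join(row[3] for row in rows)


plan_unsorted = sort_plan(with_index=False)  # needs a temporary B-tree to sort
plan_presorted = sort_plan(with_index=True)  # reads rows in index order
```

Without the index, SQLite builds a temporary B-tree to sort the result; with it, the rows are read in index order and the separate sort step disappears from the plan.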

Index disadvantages

1. Indexes occupy physical space.

2. When data in a table is added, deleted, or modified, the index must also be maintained dynamically, which slows down data maintenance.

Depending on the features of your database, you can create three kinds of indexes in the Database Designer: unique indexes, primary key indexes, and clustered indexes. For more information about the index features your database supports, see the database documentation.

Note that although the unique index helps to locate information, we recommend that you use primary keys or unique constraints to obtain the best performance results. For more information about these constraints, see primary key constraints and unique constraints.

Unique Index

A unique index is an index that does not allow any two rows to have the same index value.

When the existing data already contains duplicate key values, most databases do not allow you to save a newly created unique index with the table. The database may also reject new data that would create duplicate key values in the table. For example, if a unique index is created on the last name (lname) column of the employee table, no two employees can share the same last name.
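Both behaviors can be tried out in a minimal sketch, assuming SQLite through Python's sqlite3 module. The helper names and the index name idx_employee_lname are illustrative only, and other databases report the failure with different error types.

```python
import sqlite3


def make_employee_table(last_names):
    """Create an in-memory employee table holding the given last names."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, lname TEXT)")
    conn.executemany(
        "INSERT INTO employee (lname) VALUES (?)", [(n,) for n in last_names]
    )
    return conn


def try_execute(conn, sql, params=()):
    """Run a statement and report whether the database accepted it."""
    try:
        conn.execute(sql, params)
        return True
    except sqlite3.DatabaseError:
        # SQLite raises a DatabaseError subclass when uniqueness is violated.
        return False


create_sql = "CREATE UNIQUE INDEX idx_employee_lname ON employee (lname)"
insert_sql = "INSERT INTO employee (lname) VALUES (?)"

# Case 1: existing duplicates prevent the unique index from being created.
dirty = make_employee_table(["Smith", "Smith"])
created_on_duplicates = try_execute(dirty, create_sql)

# Case 2: once the index exists, a duplicate insert is rejected.
clean = make_employee_table(["Smith", "Jones"])
created_on_clean = try_execute(clean, create_sql)
duplicate_accepted = try_execute(clean, insert_sql, ("Smith",))
new_name_accepted = try_execute(clean, insert_sql, ("Lee",))
```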

For more information about unique indexes, see create unique indexes.

Primary Key Index

A database table often has a column, or a combination of columns, whose values uniquely identify each row in the table. This column (or combination of columns) is called the primary key of the table.

When you define a primary key for a table in the database relationship diagram, the primary key index is automatically created. The primary key index is a specific type of unique index. This index requires that each value in the primary key be unique. When a primary key index is used in a query, it also allows quick access to data. For more information about primary keys, see defining primary keys.
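The automatic creation of a primary key index can be checked with a short sketch. It assumes SQLite via Python's sqlite3 module rather than the diagram tool described above, and the table is hypothetical. One SQLite-specific detail: a column declared INTEGER PRIMARY KEY is an alias for the built-in row id and gets no separate index, so the sketch uses a TEXT key.

```python
import sqlite3


def primary_key_indexes():
    """List (index name, is_unique) for a table that only declares a primary key."""
    conn = sqlite3.connect(":memory:")
    # No CREATE INDEX statement is issued anywhere in this function.
    conn.execute("CREATE TABLE employee (emp_no TEXT PRIMARY KEY, lname TEXT)")
    rows = conn.execute("PRAGMA index_list('employee')").fetchall()
    conn.close()
    # Column 1 is the index name, column 2 is 1 when the index is unique.
    return [(row[1], bool(row[2])) for row in rows]


pk_indexes = primary_key_indexes()
```

The list contains one automatically named unique index, even though the script never created an index explicitly.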

Clustered Index

In a clustered index, the physical order of the rows in the table is the same as the logical (index) order of the key values. A table can contain only one clustered index.

If an index is not a clustered index, the physical order of the rows in the table does not match the logical order of the key values. Compared with non-clustered indexes, clustered indexes generally provide faster data access.

I. Index

1. Concept: an index is a reference tool that records where the contents of documents come from and helps readers trace them.

2. Type: there are many types, and they can be classified from different perspectives. By language, indexes can be divided into Chinese-language and foreign-language indexes; by scope, into comprehensive and topical indexes; by the period of the literature collected, into current and retrospective indexes; by entry type, into title, author, word, subject, and classification indexes.

3. Function: an index reveals document content and guides readers to the information they need.

4. Role: an index exposes the basic information in a book or publication, such as its articles and sentences, and gives readers detailed, complete, and systematic guidance to the documents they are looking for.

Iron Law 1: There is no free lunch in the world, and you have to pay for indexing.

The advantages of indexes are obvious to everyone, but few people pay attention to what indexes cost. A database administrator who fully understands the price of indexing will not create indexes everywhere.

On careful accounting, the cost of an index is considerable. First, creating and maintaining indexes takes time and effort: when designing a database, the database administrator has to investigate and coordinate which fields in each table should be indexed. Second, whenever records in an indexed table are inserted, deleted, or modified, the database must adjust the index as well. Although the database does this work automatically, it consumes server resources, and the more data the table contains, the more resources it consumes. Finally, an index is a physical object in the database, so each index occupies a certain amount of physical space. Too many indexes not only take up a large amount of space but also affect the performance of the entire database.
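The space cost can be made visible with a small sketch. It assumes SQLite via Python's sqlite3 module and a hypothetical stock_txn table; the absolute numbers depend on the database engine, page size, and data, so only the direction of the change is meaningful.

```python
import os
import sqlite3
import tempfile


def size_before_and_after_index(n_rows: int = 20000):
    """Return the database file size in bytes before and after adding one index."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.db")
        conn = sqlite3.connect(path)
        try:
            conn.execute(
                "CREATE TABLE stock_txn "
                "(id INTEGER PRIMARY KEY, product_no TEXT, txn_date TEXT)"
            )
            conn.executemany(
                "INSERT INTO stock_txn (product_no, txn_date) VALUES (?, ?)",
                (
                    (f"P{i % 500:04d}", f"2018-04-{i % 30 + 1:02d}")
                    for i in range(n_rows)
                ),
            )
            conn.commit()
            before = os.path.getsize(path)
            # The index stores its own sorted copy of product_no plus row pointers.
            conn.execute("CREATE INDEX idx_txn_product ON stock_txn (product_no)")
            conn.commit()
            after = os.path.getsize(path)
        finally:
            conn.close()
    return before, after


size_before, size_after = size_before_and_after_index()
```

Every additional index repeats this kind of growth and has to be updated on each insert, delete, or update of the indexed column.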

In other words, database administrators who want to use indexes to improve system performance have to pay a real price for it. What they need to consider is how to balance the two, that is, how to find the break-even point between the cost of an index and the benefit it returns.

Iron Law 2: do not create an index for columns that are rarely involved in queries or columns with many repeated values.

If queries never filter on a field, creating an index on that field is a waste. Take an employee information table: we may query employee information by employee ID, employee name, or place of origin, but we rarely query by ID card number, even though the ID card number is unique. Creating an index on this field will not improve query speed. On the contrary, it increases system maintenance time and occupies storage space. That is shooting yourself in the foot.

In addition, some fields in the employee information table contain many repeated values. For example, the gender field holds mainly "male" and "female", and the position field has only a limited set of values. Adding indexes to such fields does not significantly increase query speed or reduce user response time. On the contrary, because each index takes space and must be maintained, the overall performance of the database is reduced.
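One rough way to spot columns with many repeated values is to compare the number of distinct values with the number of rows. The sketch below assumes SQLite via Python's sqlite3 module; the function name selectivity and the sample table are hypothetical, and the ratio is only a heuristic, not a rule applied by any particular database.

```python
import sqlite3


def selectivity(conn, table: str, column: str) -> float:
    """Ratio of distinct values to rows: near 1.0 is unique, near 0.0 is repetitive.

    The table and column names are interpolated into the SQL text, so they
    must come from trusted code, never from user input.
    """
    distinct, total = conn.execute(
        f'SELECT COUNT(DISTINCT "{column}"), COUNT(*) FROM "{table}"'
    ).fetchone()
    return distinct / total if total else 0.0


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (emp_no INTEGER PRIMARY KEY, gender TEXT)")
conn.executemany(
    "INSERT INTO employee (gender) VALUES (?)",
    [("male" if i % 2 else "female",) for i in range(1000)],
)

gender_selectivity = selectivity(conn, "employee", "gender")  # 2 / 1000
emp_no_selectivity = selectivity(conn, "employee", "emp_no")  # 1000 / 1000
```

A column such as gender, where each value matches about half of the table, gives an index very little to narrow down.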

The second rule in database index management is that you do not need to create an index for columns that are rarely involved in queries or columns with a large number of repeated values.

Iron Law 3: it is best to create an index for columns queried by range.

In the information management system, you often need to query certain transaction records by range. For example, in the ERP system, you often need to query the sales orders and shipment status of the current month, which requires querying transaction records by date range. If you find that the stock is incorrect, you also need to check the stock in and out in a certain period of time, such as the stock transaction situation from April 1 to April 3. In this case, the query is based on the date.

Columns that need to be queried quickly or frequently within a specified range are good candidates for an index. Because the index is sorted, the values in the specified range are stored next to each other. The query can use this ordering to reduce query time and user waiting time.
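The April 1 to April 3 example can be sketched as follows, assuming SQLite via Python's sqlite3 module, a hypothetical stock_txn table, and dates stored as ISO-8601 text so that text order matches date order. The year is arbitrary.

```python
import sqlite3


def range_query_plan() -> str:
    """Return SQLite's plan for a date-range query on an indexed date column."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE stock_txn "
        "(id INTEGER PRIMARY KEY, product_no TEXT, qty INTEGER, txn_date TEXT)"
    )
    conn.executemany(
        "INSERT INTO stock_txn (product_no, qty, txn_date) VALUES (?, ?, ?)",
        [
            (f"P{i % 50:03d}", i % 7 + 1, f"2018-04-{i % 30 + 1:02d}")
            for i in range(3000)
        ],
    )
    conn.execute("CREATE INDEX idx_txn_date ON stock_txn (txn_date)")
    plan = conn.execute(
        "EXPLAIN QUERY PLAN "
        "SELECT * FROM stock_txn WHERE txn_date BETWEEN ? AND ?",
        ("2018-04-01", "2018-04-03"),
    ).fetchall()
    conn.close()
    return " | ".join(row[3] for row in plan)


date_range_plan = range_query_plan()
```

The plan shows a search on idx_txn_date bounded by the two dates, so only the entries inside the range are visited.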

However, even when a column may be queried by range, it is better not to create an index if such range queries are rare. For example, in the employee information table you may need to look up the employees who joined the company before a certain date in 2008 in order to grant them extra benefits. But the table does not hold many records, and queries like this are seldom run. Creating an index on this field does no direct harm, but the benefit of the index is clearly lower than its cost. For the database administrator, it is not worth the candle.

Furthermore, when a range query is used, it is best to limit the number of results, for example with the TOP keyword. The first request might return only the first 500 records in order. Combining TOP with a range condition can greatly improve query efficiency.
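TOP is the SQL Server spelling of this idea; SQLite, which the sketch below assumes through Python's sqlite3 module, expresses the same limit with a LIMIT clause. The table, the index name, and the helper first_page are hypothetical.

```python
import sqlite3


def first_page(conn, start: str, end: str, page_size: int = 500):
    """Return at most page_size transactions in the date range, oldest first."""
    return conn.execute(
        "SELECT id, txn_date FROM stock_txn "
        "WHERE txn_date BETWEEN ? AND ? "
        "ORDER BY txn_date, id LIMIT ?",
        (start, end, page_size),
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stock_txn (id INTEGER PRIMARY KEY, txn_date TEXT)")
# 3000 transactions spread evenly over April 1 to April 3.
conn.executemany(
    "INSERT INTO stock_txn (txn_date) VALUES (?)",
    [(f"2018-04-{i % 3 + 1:02d}",) for i in range(3000)],
)
conn.execute("CREATE INDEX idx_txn_date ON stock_txn (txn_date)")

page = first_page(conn, "2018-04-01", "2018-04-03")
```

All 3000 rows match the range, but only the first 500 in date order are returned to the caller.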

Iron Law 4: if a table has a primary key or a foreign key, create an index for it.

A primary key column should be backed by an index (many databases create this index automatically when the primary key is defined). The primary key identifies a single row in the table, and combined with an index it lets the database locate that row quickly. For example, in the employee information table we usually set the employee number as the primary key. Because the primary key requires each record to be unique, this guarantees the uniqueness of employee numbers, and because the employee number column is indexed, querying employee information by employee number is much more efficient than it would be without an index.

In addition, there are two ways to make the values of a field unique. One is the primary key index mentioned above. The other is a UNIQUE index, which uses the UNIQUE keyword to declare that the field's contents must be unique. Both methods automatically create a unique index on the specified column of the table, and there is no obvious difference between their query results. The query optimizer does not distinguish which method created a unique index, and it performs data queries the same way in both cases.

If a column in a table is defined as a foreign key, it is best to create an index for that column, because foreign keys are mainly used in joins between tables. An index on the foreign key speeds up join queries between the two tables. For example, the employee basic information table has a field for the employee's position. Because positions change frequently, what is stored there is only the position code; the details of each position are kept in a separate job information table. In this case, the employee position field is a foreign key. If an index is created on this field, joins between the two tables become significantly faster, and the more records there are, the more obvious the effect.
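The position example might look like the following sketch, which assumes SQLite via Python's sqlite3 module. The tables job and employee, their columns, and the index name idx_employee_position are all hypothetical. SQLite does not create an index for a foreign key column by itself, so the index has to be added explicitly.

```python
import sqlite3


def join_plan(index_foreign_key: bool) -> str:
    """Return SQLite's plan for joining employees to their job record."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE job (code TEXT PRIMARY KEY, title TEXT);
        CREATE TABLE employee (
            emp_no INTEGER PRIMARY KEY,
            name TEXT,
            position_code TEXT REFERENCES job (code)  -- foreign key
        );
        """
    )
    conn.executemany(
        "INSERT INTO job VALUES (?, ?)",
        [(f"J{i:02d}", f"Job {i}") for i in range(20)],
    )
    conn.executemany(
        "INSERT INTO employee (name, position_code) VALUES (?, ?)",
        [(f"Emp{i}", f"J{i % 20:02d}") for i in range(2000)],
    )
    if index_foreign_key:
        conn.execute(
            "CREATE INDEX idx_employee_position ON employee (position_code)"
        )
    plan = conn.execute(
        "EXPLAIN QUERY PLAN "
        "SELECT e.name, j.title FROM job AS j "
        "JOIN employee AS e ON e.position_code = j.code "
        "WHERE j.code = ?",
        ("J07",),
    ).fetchall()
    conn.close()
    return " | ".join(row[3] for row in plan)


plan_plain_fk = join_plan(index_foreign_key=False)
plan_indexed_fk = join_plan(index_foreign_key=True)
```

With the index in place, the employee side of the join is reached through idx_employee_position instead of a scan of the whole employee table.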

Therefore, when a table has a foreign key or a primary key, it is best to create an index for it. Indexes strengthen the role that primary keys and foreign keys play and improve database performance.

Iron Law 5: do not create indexes for some special data types.

Some fields in a table have special data types, such as text fields and image fields. If a field belongs to one of these types, it is best not to create an index for it. These fields share a common characteristic: their length is unpredictable. A value may be very long, only a few characters, or an empty string. For example, the text data type is often used for remarks in the tables of an application system. Sometimes the remarks are long, and sometimes there is no data at all. An index on a field of this type does little good. On the contrary, it increases the burden on the system.

Therefore, exercise caution when creating indexes for such special data types. In general, there is no need to index them. However, there are exceptions. For example, an ERP system may have a product information table with a product specification field whose content can be up to 5000 characters long. Only the text data type can hold that much data. If users also frequently look up products by their specification parameters, queries will be slow without an index on this field. In that case, the database administrator can reasonably sacrifice some system resources and create the index.

As the example shows, although the rules above are sound, whether to apply them in a particular case is a choice the database administrator has to make based on the actual situation of the enterprise.

Iron Law 6: create indexes to match the sets of conditions used in WHERE clauses.

When querying information, users often apply the same filter conditions again and again. For example, a sales order query is often filtered by the combination of customer and order date, and a query on the inventory transactions of a product is filtered by the product number together with the start and end dates of the transaction.

For columns that frequently appear together in WHERE clauses like these, create the index on the set of columns used in the clause. Because the index is sorted, queries on columns that need fast or frequent retrieval can use that ordering to reduce query time.
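A minimal sketch of the customer and order date example, assuming SQLite via Python's sqlite3 module and reading "the set of the WHERE clause" as a composite index on both columns. The table sales_order and the index name idx_order_customer_date are hypothetical.

```python
import sqlite3


def where_set_plan() -> str:
    """Return SQLite's plan for filtering orders by customer and date range."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sales_order "
        "(order_no INTEGER PRIMARY KEY, customer_id INTEGER, "
        "order_date TEXT, amount REAL)"
    )
    conn.executemany(
        "INSERT INTO sales_order (customer_id, order_date, amount) "
        "VALUES (?, ?, ?)",
        [(i % 100, f"2018-{i % 12 + 1:02d}-15", 10.0 * i) for i in range(5000)],
    )
    # Equality column first, range column second, matching the WHERE clause.
    conn.execute(
        "CREATE INDEX idx_order_customer_date "
        "ON sales_order (customer_id, order_date)"
    )
    plan = conn.execute(
        "EXPLAIN QUERY PLAN SELECT * FROM sales_order "
        "WHERE customer_id = ? AND order_date BETWEEN ? AND ?",
        (42, "2018-04-01", "2018-06-30"),
    ).fetchall()
    conn.close()
    return " | ".join(row[3] for row in plan)


customer_date_plan = where_set_plan()
```

Putting the equality column before the range column lets a single index serve both conditions of the query.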

In short, an index is a double-edged sword: it can improve database performance, but it can also hurt it. A database administrator must be able to create the appropriate indexes at the right time, for the right business needs, and on the right fields. The six rules above are only some basic requirements for index creation.

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
