SQL Application and Development: (ix) index to improve efficiency

Last Update:2015-05-23 Source: Internet

Author: User

Tags sorts

Developer on Alibaba Coud: Build your first app with APIs, SDKs, and tutorials on the Alibaba Cloud. Read more ＞

In the database, the index is a special object, it is a database structure that can speed up the data retrieval, it can quickly find the necessary content from the large amount of data, make the data query without retrieving the whole database. An index is an object that is based on data in a table, and unlike views, indexes need to occupy physical storage. Using the index of the database allows us to query the data more quickly.

1. Introduction

An index is a way of representing data that does not provide the order in which data is physically stored on disk. The index is established based on one or more column combinations of tables, and the physical location of the records is rearranged within the table. When an index is used, the data is provided to the user in a categorical sort order that can be controlled with the CREATE INDEX statement. In general, you can significantly improve database performance by indexing on the correct columns, especially if you are indexing on join columns between tables.

The index and the book directory function similarly. If the data in the data table is considered as the content of the book, then the index is the directory of the book. The directory of the book points to the contents of the book (by page number). Similarly, an index is a key value for a table that provides a pointer to a row in a table. The page numbers in the catalog are the direct paths to the contents of the book, and the index is the direct path to the table data, which makes it more efficient to access the data.

In a database diagram, you can create, edit, or delete each index type in an index/key property page for the selected table. When you save a table that is attached to this index or a database diagram that contains this table, the index is also saved.

1.1 Advantages

An index is a separate, physical database structure that enables database users to quickly access specific information in a database table.

Typically, you need to create an index on a table only when you frequently query the data in an indexed column. The index consumes disk space and reduces the speed at which rows are added, deleted, and updated. In most cases, however, the benefits of the data retrieval data from the index greatly outweigh its shortcomings. However, if your application frequently updates data, or if disk space is limited, it is best not to limit the indexed data.

In general, an index is a storage structure created to speed up retrieval. An index is established for a table, which consists of index pages that contain data pages. The rows in each index page contain logical pointers that allow you to retrieve data directly, which accelerates the retrieval of physical data.

Whether to create indexes on the columns in a table and what indexes to create will have a significant impact on the retrieval speed of the data. Columns that have been indexed can respond to queries almost immediately, whereas column queries that do not create indexes will need to wait a long time. Because columns that do not have an index created need to be searched line by row, the time spent in this index is directly proportional to the amount of data in the table.

Querying with an index has the following advantages:

Improve query speed : If there are no indexes in a table, the database forces a row-by-line search in the order of the table to find rows that meet the criteria, and it can be obvious that it takes a long time to access each row of the table. However, if the location information in the table to be queried determines the rows in the table, then the query time is shortened;

increase the execution speed of the connection, ORDER by, and group by: The connection, order by, and group by both require retrieval of the data and, if indexed in the table, the speed of the connection, order by, and group by execution is greatly increased;

The Query Optimization Analyzer relies on the index to work : During the execution of the query, the database automatically optimizes the query. Since the optimization of the database depends on the index, after indexing, the database will be indexed according to the index established, which index to make the index fastest;

enforce uniqueness of rows : Creating a unique index guarantees that the data in the table is not duplicated.

In summary, indexes can improve data query performance, but this is a cost to pay. Indexed data tables occupy more storage space in the database. In addition, in order to maintain the index, the command to insert, UPDATE, and delete the data takes longer. When designing and creating indexes, you should ensure that performance is increased at the expense of storage space and processing resources.

1.2 Precautions

Based on the needs of the environment, users can set up one or more indexes on the base table to provide multiple access methods and speed up data retrieval. Typically, establishing and deleting an index is done by the database administrator or the owner of the table. The system automatically chooses the appropriate index as the access path when accessing the data, and the user does not have to select the index.

The index is a double-edged sword. Although it accelerates the SELECT statement, it may also slow down the changes made to the data because the indexes in SQL must be dynamically updated at run time. Whenever the data in the table changes, all indexes in the table must change, and if this causes a reorganization of the index, a delay may occur when the index reorganization occurs.

The main factors to consider when creating an index are:

I. A table if creating a large number of indexes can affect the performance of the insert, UPDATE, and Delect statements, because when the data in the table changes, all indexes need to be adjusted appropriately. On the other hand, a large number of indexes can improve performance for queries that do not need to modify data (SELECT statements). Because the Query Analyzer has more indexes to choose from, to determine the best way to access the data at the fastest speed;

Ii. usually we query the table by primary key, so we should first consider indexing on the primary key column. In addition, frequently used columns in the connection (including foreign keys) should be considered as an index-setting option;

Iii. indexing a small table may not have an optimization effect, because Query Analyzer may take longer to traverse the index to search for data than a simple table scan.

In summary, it takes a certain amount of overhead to make an index, and maintaining the index also takes time and storage space when inserting or updating the data using INSERT or update. Therefore, it is not necessary to index all the columns in the table.

2. Type

Depending on the order of the indexes and whether the database is physically stored in the same order, the indexes can be divided into two categories: Clustered indexes (Clustered index) and nonclustered indexes (non-clustered index). Both clustered and nonclustered indexes are created using the B-TREE structure and include index pages and data pages, where index pages are used to hold indexes and pointers to the next level, and data pages are used to hold records.

Depending on the composition of the index key, the index can be divided into unique indexes and combined indexes. Unique index ensures that indexed columns do not contain duplicate values, which are indexes that use multiple columns in a table to index data. In addition, there are other index types due to the different SQL environments. For example, bitmap indexes and function indexes in Oracle.

Once the index is created, it is automatically managed and maintained by the database. For example, when you insert, UPDATE, or delete a record into a table, the database automatically makes the appropriate modifications in the index. When writing SQL query statements, tables with indexes that do not have an indexed table have no difference. An index simply provides a way to quickly access a specified record. A data table has an index that affects only the query speed of the table, without affecting any other aspect.

2.1b-tree Index

The top node of the B-tree index is called the root node, and the underlying node is called the leaf node, and the node between the root node and the leaf node is called the intermediate node (intermediate nodes). The page links in each level index are in the two-way link list. The B-TREE data structure starts with the root node, arranges the information in a balanced way, and can be divided into many layers as needed, and the B-tree index can easily provide free space for the update.

For the overall structure of the B-tree chart:

Since each leaf node has a linear order according to the index code values contained, it is possible to use the pointer pn of each node to link the leaf nodes together in the order of index code values. This order efficiently processes the files sequentially, while the other structures of the B-tree index can efficiently and randomly manipulate the files.

2.2 Clustered Index

The clustered index sorts the key values of the data rows within the table and stores the corresponding data records so that the physical order of the tables is consistent with the index order. A table can contain only one clustered index. If it is not a clustered index, the physical order of the rows in the table and the logical order of the key values do not match.

The b-tree of a clustered index is built from the bottom up, and a data page (the leaf node of the index page) contains a single record, and the index page of an intermediate node is generated by multiple data pages. The index page of the top-level root node is then synthesized by multiple intermediate node-indexed pages, which are combined to generate the topmost index page.

Data structure for the clustered index:

When you create a clustered index, the table is copied, the data in the table is sorted, and the original table is deleted. Therefore, there must be enough free space on the database to hold the copy of the data.

When creating a clustered index, you should be aware of the following:

· Most tables should have clustered indexes or use partitions to reduce contention for the end of the table, and in a highly transactional environment, blocking the last page will severely affect the throughput of the system.

· Under the clustered index, the data is physically sorted on the data page, and duplicate values are also grouped, so that when you include a scope check (between, <, <=, >, >=) or use a group by and order by query, Once a row with the first key value in the range is found, the rows with subsequent index values are guaranteed to be physically contiguous together without further searching. This avoids a wide range of scans and improves query speed.

· When you build a clustered index on a table that has frequent insert operations, do not place the index on a monotonically rising merit column (such as identity), or it will often cause a blocking conflict.

· Do not include frequently modified columns in the Poly index because the data rows must be moved to a new location after the code value has been modified.

· Select the clustered index should be based on the WHERE clause and the type of the join operation.

The candidate key columns for a clustered index are primary key columns that are used in the WHERE clause and can be randomly inserted, columns accessed by scope, columns used in group by or order by, columns that are not frequently modified, and columns used in connection operations.

2.3 Nonclustered indexes

A nonclustered index is similar to a book's index. One location of the data store where the index is stored in another location with the index pointing to the storage location of the data. The items in the index are stored in the order of the index key values, and the information in the table is stored in a different order (this can be determined by the clustered index). If you do not create a clustered index in the table, you cannot guarantee that the rows will have any particular order.

A nonclustered index has the same b-tree structure as a clustered index, but it has two significant differences from a clustered index:

· Data rows are not sorted and stored in the order of the nonclustered index keys.

· The page layer of a nonclustered index does not contain a data page, whereas a leaf node contains an index row. Each index row contains a nonclustered key value and one or more row locators that point to the data row that has the key value (or multiple rows if the index is not unique).

Nonclustered indexes have a structure that is completely independent of the data rows. The lowest row of a nonclustered index contains the key values of the nonclustered indexes, and each key-value entry has a pointer to the data row that contains the key value. Data rows are not stored in the order based on the nonclustered key.

Data structures for nonclustered indexes:

You should be aware of the following issues when working with nonclustered indexes:

· How much space the index needs to use;

· Whether the indexed columns are stable;

· How the index key is selected, and whether the scan effect is better;

· Whether there are many duplicate values.

We often use nonclustered indexes in the following situations: columns commonly used in aggregate functions often use the columns of the Join,order by and GROUP BY clauses.

2.4 Unique indexes and combined indexes

The primary key is used to identify a unique row, which defines a relationship of two tables, which is used to build the index of one or a set of columns. Depending on the composition of the index key, you can divide the index into three types: a unique index, a combined index, and an overlay index.

2.4.1 Unique Index

A unique index does not allow two rows to have the same index value. In the case of a multiple-column unique index, the index ensures that each value combination in the indexed column is unique. If duplicate key values exist in the existing data, most databases do not allow the newly created unique index to be saved with the table. The database also rejects this data when the new data duplicates the key values in the table. Defining a primary key for a table in a database diagram automatically creates a primary key index, which is a special type of unique index. The primary key index requires that each value in the primary key be unique. When a primary key index is used in a query, it also allows for fast access to the data.

In a database table, if there are more than one row in a single column that contains null values, you cannot create a unique index on that column. In a combination of columns, if more than one column contains a null value, these null values are treated as duplicate values. Therefore, you cannot create a unique index on more than one column.

2.4.2 Combined index and overlay index

As the name implies, a composite index is an index that uses more than one column to index data in a table, and it is created by connecting two or more column values.

The combined index should be used when the index of a 2~16 column can be used in a composite index, preferably when two or more columns are searched as one unit, or when many queries refer only to the columns specified in the index. All columns in a composite index must be in the same table, and the combined index value allows a maximum size of 900 bytes, which means that the total length of the fixed-size column that makes up the combined index must not exceed 900 bytes.

When all the required information is included in the index, this index is called the Overwrite index. Its key value contains all the data that satisfies the query.

2.5 Other Index types

Multiple types of indexes can be created in each version of the database system to suit the characteristics of various data tables and meet the needs of different situations. Here's a quick example:

Bitmap index

The bitmap index does not store index information in a way that is worthwhile in indexed columns. The bitmap index establishes a bitmap for each unique index column, where a single unit (Bit, 0 or 1) is used to correspond to the column values of a record. If the bit is 1, the column value corresponding to the bit is a record that contains the indexed column of the bitmap. The bit-to-column worth correspondence is done through the mapping function in the bitmap index.

This allows the bitmap index to perform the same functions as the B-tree index in a completely different internal mechanism. If the range of indexed columns includes only a few fixed values, the bitmap index is efficient and saves storage space.

3. Create

Create an index to know which indexes bring more benefits than overhead. The most basic commands for creating an index are:

<span style= "Font-size:18px;color: #000099;" ><strong>create INDEX Index_nameon table_name (column_name1,column_name2,...) </strong></span>

As in the previous cases, the syntax of the CREATE index statement differs greatly from one database system to another. However, all database systems support these basic commands.

3.1 Composite Index

A composite index refers to an index that contains more than one column, also known as a composite index. You can combine up to 16 columns in an index in SQL Server.

For example, in the database "Jewelry sales System", the city is located in Beijing, the jeweler information. Because a jeweler's city has multiple occurrences of a column value, the selectivity of the column is low, and if a single column index is created in that column, then the gas query performance is not very high, and note that each value in the jeweler Number column is a unique value. To improve the selectivity of the index, you can create a composite index that combines the city and jeweler number of the jeweler. The code is as follows:

<span style= "Font-size:18px;color: #000099;" ><strong>create INDEX  City _indexon Jeweler information (jeweler's city, jeweler number) </strong></span>

3.2 Unique index and primary key index

A unique index ensures that the column does not contain duplicate values. When you add data to a table that has a unique index, the database checks whether the newly added row has duplicate values in the column. If the newly added row has duplicate values in a unique index column with other rows that already exist, the database displays an error and the input operation is undone.

Specifying a unique index makes sense only when uniqueness is an attribute of the data itself. For example, when setting up a database for a student management system, we do not want to create a unique index for students ' names, as there may be students with the same name. The right thing to do is to create a number for each student and create a unique index on the school number field.

The method of creating a unique index is as simple as using the unique keyword in CREATE INDEX, which makes up a unique index of conformity.

For example, in the "Sales Management system" operator Information Table stipulates: Operators with the same name must set the password, to ensure the security of the system. Then you should establish a unique compliance index on the password and operator columns:

<span style= "Font-size:18px;color: #000099;" ><strong>create UNIQUE INDEX operator Safety _indexon operator information (operator name, password) </strong></span>

Now add a new operator information to the data table operator information, and the newly added operator information is the duplicate data in the table.

<span style= "Font-size:18px;color: #000099;" ><strong></strong></span><pre name= "code" class= "SQL" >insert into operator information (operator number, operator name, Password) VALUES (1009, ' Zhouhuifang ', ' 9698xie ')

executing the preceding statement should receive an error message that the user is not allowed to use the Insert command. This is because the data added by the Insert directive violates the "operator safety _index" index. Modifies the password value added by the Insert directive.

<strong><span style= "color: #000099;" >insert into operator information (operator number, operator name, password) VALUES (1009, ' Zhouhuifang ', ' 9698 ') </span></strong>

A primary key index is a special type of unique index. Primary key indexes are created automatically when you define a primary key for a table. Creating a primary key in a table requires only a column definition before using the keyword primary key. A primary key index requires that each value in the primary key be unique. When a primary key index is used in a query, it also does not allow fast access to the data.

When a primary key or uniqueness constraint is defined on a table, if a standard index created using the CREATE INDEX statement is already in the table, the index created by the PRIMARY KEY constraint or uniqueness constraint overwrites the previously created standard index. That is, the index created by the primary key index or uniqueness constraint has a higher precedence than the index created with the CREATE INDEX statement.

3.3 Clustered and non-clustered indexes

When a table is used with a clustered index, the data is stored in the same block of data, which requires only a small number of database blocks to be read, resulting in faster execution performance. When a clustered index is used, the physical arrangement of the data in the table is modified. Using clustered indexes is usually faster than nonclustered indexes.

The column used to build the clustered index is typically the primary key. In SQL Server, the index is established by default as a nonclustered index. Unless you use the keyword clustered to specify a clustered index, you can of course use the keyword nonclustered to specify a nonclustered index.

For example, in the database sales management system, create a unique clustered index on the Vendor Name column of the Vendor information table.

<span style= "Font-size:18px;color: #000099;" ><strong>use Sales Management System gocreate UNIQUE CLUSTERED INDEX Vendor Name _indexon Vendor information (vendor name) go</strong></span>

When a clustered index is established for a data table, the database sorts the indexed columns. The keyword ASC or desc ensures the ascending or descending sort direction of an indexed column. The default setting is ASC in ascending order.

4. View and delete

In SQL Server, the view and deletion of indexes can be done in both the Enterprise Manager and the SQL language.

For the view of the index, you can practice a very basic operation in SQL Server, here is not more introduction.

When an index is no longer needed, it can be removed from the database to reclaim the storage space it is currently using. These reclaimed space can be used by any object in the database.

How the index is deleted is related to how the index was used when it was created. If the index was explicitly created using the CREATE INDEX statement, you can use the DROP INDEX statement to delete it. Its syntax format is:

<span style= "Font-size:18px;color: #000099;" ><strong>drop INDEX ' Table.index | View.index ' [,... n]</strong></span>

where tables and views represent the table or view in which the index resides, index indicates the name of the index to be deleted, and the indexed name must be an identifier-compliant rule; [,...] Indicates that the preceding item can be repeated n times.

For example, the following SQL statement is used to delete the index "commodity _index" in the "commodity information" of the Database "reseller".

<strong><span style= "Font-size:18px;color: #000099;" >use Dealer Godrop INDEX product information. Commodity _index</span></strong>

If you want to delete an index, but cannot determine whether the index exists, you can also add a simple judgment statement to judge first.

For example, delete the "Customer Information sheet _index" index in the "Customer information sheet" of the Database "reseller".

<span style= "Font-size:18px;color: #000099;" ><strong>use Reseller If EXISTS (SELECT name from sysindexes)          WHERE name= ' Customer information table _index '     DROP index Customer Information Sheet-Customer information sheet _indexgo</strong></span>

5. Learning Summary

As for the learning summary of this chapter of the index, I feel that I have learned a tool which is what we want-to improve efficiency. Especially when doing the project, the data is very large and can not be indexed, easy to find and use, very happy to learn and summarize the content of this chapter, the harvest is quite good, mainly or previously said, for the knowledge of SQL, practice A, it is worth, is willing to share with you!

SQL Application and Development: (ix) index to improve efficiency

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

Get Started for Free

Sales Support

1 on 1 presale consultation

Chat Contact Sales
After-Sales Support

24/7 Technical Support 6 Free Tickets per Quarter Faster Response

Open a Ticket
Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.

Learn More