Why are database index queries fast

Source: Internet
Author: User
Tags: create index

I. Benefits of using indexes

Creating indexes can greatly improve system performance. First, a unique index guarantees the uniqueness of each row in a table. Second, an index can greatly speed up data retrieval, which is the main reason for creating one. Third, it can speed up joins between tables, which is particularly useful for enforcing referential integrity. Fourth, when a query uses grouping or sorting clauses, an index can significantly reduce the time spent grouping and sorting. Finally, with indexes in place, the query optimizer can make use of them while processing a query, which improves overall performance.

II. How indexes work

Data is stored on disk in blocks. To keep disk operations atomic, a block is read in its entirety whenever any of the data in it is accessed. Disk blocks are organized much like a linked list: each contains a data section and a pointer to the location of the next block, and they need not be stored contiguously (that is, logically adjacent blocks can be physically far apart).

Because a set of records can be sorted on only one field, a query on an unsorted field has to use a linear search, which accesses N/2 blocks on average, where N is the number of blocks the table occupies. If the field is a non-key field (that is, its values are not unique), the entire table must be searched, which means accessing all N blocks.

For a sorted field, however, a binary search can be used, which accesses only about log2 N blocks. Likewise, for a sorted non-key field, the search can stop as soon as a larger value is found, with no need to read the rest of the table. The result is a substantial improvement in performance.
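The difference between the two search strategies can be illustrated with a small simulation. This is only a sketch under simplifying assumptions: a "block" is modeled as a Python list of sorted values, a block read is counted each time a block is inspected, and the function names are hypothetical rather than part of any database API.

```python
def linear_search_blocks(blocks, target):
    """Scan blocks in order; return (found, number of blocks read)."""
    reads = 0
    for block in blocks:
        reads += 1
        if target in block:
            return True, reads
    return False, reads


def binary_search_blocks(blocks, target):
    """Binary search over blocks whose values are globally sorted.

    Returns (found, number of blocks read).
    """
    lo, hi = 0, len(blocks) - 1
    reads = 0
    while lo <= hi:
        mid = (lo + hi) // 2
        reads += 1
        block = blocks[mid]
        if target < block[0]:
            hi = mid - 1        # target lies in an earlier block
        elif target > block[-1]:
            lo = mid + 1        # target lies in a later block
        else:
            return target in block, reads
    return False, reads


# 1,024 blocks of 5 sorted values each (values 0..5119), an assumed toy table
blocks = [list(range(i * 5, i * 5 + 5)) for i in range(1024)]

# Worst case for the linear scan: the value sits in the last block
found_linear, linear_reads = linear_search_blocks(blocks, 5119)
found_binary, binary_reads = binary_search_blocks(blocks, 5119)
print(linear_reads, binary_reads)
```

With 1,024 blocks the linear scan reads every block in this worst case, while the binary search needs roughly log2 1024 = 10 reads (one more at most, depending on where the value falls).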

First, consider the schema of a sample database table:

Field name          Data type       Size on disk
id (Primary key)    Unsigned INT    4 bytes
firstName           Char(50)        50 bytes
lastName            Char(50)        50 bytes
emailAddress        Char(100)       100 bytes

Note: char is used here instead of varchar so that the size on disk can be stated exactly. The sample database contains 5 million rows and has no indexes. Below we analyze two queries against this table: one uses id (a sorted key field) and the other uses firstName (an unsorted non-key field).

Example Analysis One

In this sample database with r = 5,000,000 records, each record occupies a fixed R = 204 bytes on disk. The table is stored using the MyISAM engine, whose default block size is B = 1,024 bytes. The blocking factor of the table is therefore bfr = floor(B/R) = floor(1024/204) = 5, that is, each disk block holds 5 records. The number of blocks needed to store the whole table is N = r/bfr = 5,000,000/5 = 1,000,000.

A linear search on the id field, which is a key field (every value is unique), needs to access N/2 = 500,000 blocks on average to find the target value. Because this field is sorted, however, a binary search can be used instead, which requires only log2 1,000,000 = 19.93, rounded up to 20, block accesses. This is clearly a large improvement in performance.
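The arithmetic for the unindexed table can be reproduced with a few lines of Python. The helper name table_block_stats is hypothetical and exists only for this illustration; the inputs are the record count, record size, and block size given in the example.

```python
import math


def table_block_stats(num_records, record_bytes, block_bytes):
    """Return blocking factor, block count, and estimated block accesses."""
    bfr = block_bytes // record_bytes            # records per block, rounded down
    n_blocks = math.ceil(num_records / bfr)      # blocks needed for the table
    return {
        "bfr": bfr,
        "blocks": n_blocks,
        "linear_avg": n_blocks // 2,             # average for a key field
        "binary": math.ceil(math.log2(n_blocks)),
    }


stats = table_block_stats(5_000_000, 204, 1024)
print(stats)
# {'bfr': 5, 'blocks': 1000000, 'linear_avg': 500000, 'binary': 20}
```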

Now consider the firstName field. It is unsorted, so binary search is not possible, and its values are not unique, so the search must run from the beginning of the table to the end, accessing all N = 1,000,000 blocks. Building an index improves this situation.

An index record that contains only the indexed field and a pointer to the original record is necessarily smaller than the multi-field record it points to. The index therefore takes less disk space than the table, and fewer blocks have to be traversed to search it. The schema of an index on the firstName field looks like this:

Field name          Data type    Size on disk
firstName           Char(50)     50 bytes
(record pointer)    Special      4 bytes

Note: In MySQL, depending on the size of the table, the size of the pointer may be 2, 3, 4, or 5 bytes.

Example Analysis Two

In this sample database with r = 5,000,000 records, each index record takes R = 54 bytes on disk, and the same default block size B = 1,024 bytes applies. The blocking factor of the index is therefore bfr = floor(B/R) = floor(1024/54) = 18, and the index occupies N = r/bfr = 5,000,000/18 = 277,778 blocks (rounded up).

The firstName field can now be searched through the index. A binary search of the index requires log2 277,778 = 18.08, rounded up to 19, block accesses. One more block access is needed to fetch the actual record at the address found, for a total of 19 + 1 = 20 block accesses, far fewer than the 1,000,000 block accesses needed to search the unindexed table.
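As a quick check of these figures, the following sketch recomputes the index size and the number of block accesses with and without the index. The constant names are illustrative only; the values come from the two example analyses.

```python
import math

RECORDS = 5_000_000
BLOCK_BYTES = 1024
TABLE_RECORD_BYTES = 204          # 4 + 50 + 50 + 100
INDEX_RECORD_BYTES = 50 + 4       # firstName Char(50) plus a 4-byte record pointer

# Index: blocking factor, number of blocks, and binary-search cost
index_bfr = BLOCK_BYTES // INDEX_RECORD_BYTES
index_blocks = math.ceil(RECORDS / index_bfr)
index_probe = math.ceil(math.log2(index_blocks))
total_with_index = index_probe + 1            # plus one read for the actual record

# Unindexed table: a non-key, unsorted field forces a full scan
table_bfr = BLOCK_BYTES // TABLE_RECORD_BYTES
table_blocks = math.ceil(RECORDS / table_bfr)

print(index_bfr, index_blocks, total_with_index, table_blocks)
# 18 277778 20 1000000
```

Under these assumptions the indexed lookup touches 20 blocks where the full scan touches 1,000,000.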

III. When to build an index

Indexes are built on specific columns of a table, so when creating one you should consider carefully which columns are suited to an index and which are not. In general, indexes should be created on the following columns. On columns that are searched frequently, an index speeds up the search. On the primary key column, it enforces uniqueness and organizes the arrangement of the data in the table. On columns that are often used in joins, which are mostly foreign keys, it speeds up the join. On columns that are often searched by range, it helps because the index is sorted and the specified range is therefore contiguous. On columns that often need to be sorted, the query can take advantage of the ordering of the index to reduce sort time. On columns that are often used in the WHERE clause, it speeds up evaluation of the condition.
Likewise, some columns should not be indexed. In general, such columns have the following characteristics. First, columns that are seldom used or referenced in queries should not be indexed: because they are rarely used, an index does little to improve query speed, while it slows down maintenance and increases space requirements. Second, columns with only a few distinct values should not be indexed. For a column such as gender in a personnel table, the rows in a result set make up a large proportion of all rows in the table, so a large share of the table has to be read anyway and an index does not significantly speed up retrieval. Third, columns defined with the text, image, or bit data types should not be indexed, because their data is either very large or has very few distinct values. Fourth, an index should not be created when modification performance matters far more than retrieval performance. The two conflict: adding indexes improves retrieval but slows modification, and removing indexes does the reverse.
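One rough way to judge whether a column has "only a few data values" is to compare its number of distinct values with the number of rows. The sketch below is an illustration only: the selectivity function and the sample data are assumptions made for this example, and the ratio is a rule-of-thumb measure rather than something the database computes in this form.

```python
def selectivity(values):
    """Ratio of distinct values to total rows, between 0 and 1."""
    values = list(values)
    return len(set(values)) / len(values) if values else 0.0


# Hypothetical personnel table with 1,000 rows
genders = ["M", "F"] * 500        # only 2 distinct values
ids = list(range(1000))           # every value is unique

gender_selectivity = selectivity(genders)
id_selectivity = selectivity(ids)
print(gender_selectivity, id_selectivity)
# 0.002 1.0
```

A value close to 1 means each lookup narrows the search to very few rows, while a value close to 0, as for the gender column, means each lookup still matches a large share of the table.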

IV. Common index types and how to create them

(1) Normal index

This is the most basic index and has no restrictions. It can be created in the following ways:

Create an index

CREATE INDEX indexName ON mytable (username(length));

For CHAR and VARCHAR columns, length can be less than the actual length of the field; for BLOB and TEXT columns, length must be specified.

Modify the table structure

ALTER TABLE mytable ADD INDEX [indexName] (username(length));

Specify the index directly when creating the table

CREATE TABLE mytable (ID INT NOT NULL, username VARCHAR(16) NOT NULL, INDEX [indexName] (username(length)));

Syntax for dropping an index:

DROP INDEX indexName ON mytable;

(2) Unique index

It is similar to the normal index above, except that the values of the indexed column must be unique, although NULL values are allowed. For a composite index, the combination of column values must be unique. It can be created in the following ways:

Create an index

CREATE UNIQUE INDEX indexName ON mytable (username(length));

Modify the table structure

ALTER TABLE mytable ADD UNIQUE [indexName] (username(length));

Specify the index directly when creating the table

CREATE TABLE mytable (ID INT NOT NULL, username VARCHAR(16) NOT NULL, UNIQUE [indexName] (username(length)));

(3) Primary key index

It is a special unique index that does not allow NULL values. The primary key index is typically created at the same time as the table:

CREATE TABLE mytable (ID INT NOT NULL, username VARCHAR(16) NOT NULL, PRIMARY KEY (ID));

You can also add it with the ALTER command. Remember that a table can have only one primary key.

(4) Combined index

To compare single-column and composite indexes concretely, add more fields to the table (the length of city is garbled in the source, so 50 is an assumed value):

CREATE TABLE mytable (ID INT NOT NULL, username VARCHAR(16) NOT NULL, city VARCHAR(50) NOT NULL, age INT NOT NULL);

To get more efficiency out of MySQL, consider building a composite index, that is, building username, city, and age into one index:

ALTER TABLE mytable ADD INDEX name_city_age (username(10), city, age);

The username column was defined with a length of 16, but only 10 is used here. Names generally do not exceed 10 characters, so the shorter prefix speeds up index lookups, reduces the size of the index file, and speeds up INSERT operations.

If you instead create a separate single-column index on each of username, city, and age, so that the table has three single-column indexes, query efficiency is very different and much lower than with the composite index. Although there are three indexes, MySQL typically uses only the one single-column index that it considers the most efficient.

Creating such a composite index is in effect equivalent to creating the following three indexes:

username, city, age
username, city
username

Why is there no index for a combination such as city, age? This is a consequence of the "leftmost prefix" rule of MySQL composite indexes: put simply, only combinations that start from the leftmost column are covered. Not every query that involves these three columns will use the composite index. The following queries do use it:

SELECT * FROM mytable WHERE username='admin' AND city='Zhengzhou';

SELECT * FROM mytable WHERE username='admin';

The following queries do not use it:

SELECT * FROM mytable WHERE age=20 AND city='Zhengzhou';

SELECT * FROM mytable WHERE city='Zhengzhou';
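The leftmost prefix rule can be pictured with a simplified model in which the composite index is a list of (username, city, age, row number) tuples kept in sorted order. This is a sketch only, not how MySQL implements its indexes; the sample rows and the functions prefix_lookup and scan_lookup are hypothetical.

```python
from bisect import bisect_left

# Hypothetical table rows: (username, city, age)
rows = [
    ("admin", "Zhengzhou", 20),
    ("admin", "Beijing", 31),
    ("bob", "Zhengzhou", 20),
    ("carol", "Shanghai", 45),
]

# Index entries sorted by username, then city, then age
index = sorted((u, c, a, i) for i, (u, c, a) in enumerate(rows))


def prefix_lookup(index, prefix):
    """Binary-search the sorted index for entries that start with `prefix`."""
    prefix = tuple(prefix)
    start = bisect_left(index, prefix)       # first entry >= prefix
    matches = []
    for entry in index[start:]:
        if entry[:len(prefix)] != prefix:
            break                            # matching entries are contiguous
        matches.append(entry[-1])
    return matches


def scan_lookup(index, city):
    """No leading column given: every index entry has to be checked."""
    return [entry[-1] for entry in index if entry[1] == city]


by_user = prefix_lookup(index, ("admin",))
by_user_city = prefix_lookup(index, ("admin", "Zhengzhou"))
by_city = scan_lookup(index, "Zhengzhou")
print(by_user, by_user_city, by_city)
```

Because the entries are ordered by username first, a condition on username (optionally followed by city) maps to one contiguous run that binary search can locate. A condition on city or age alone gives no starting point in that ordering, so in this model the only option is to examine every entry.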
