MySQL database indexing principles in layman's terms

Last Update:2018-08-06 Source: Internet

Author: User

This article is reposted from: https://www.cnblogs.com/aspwebchh/p/6652855.html

Some time ago, a newly launched site at my company started responding slowly. A colleague who was responsible for the project but was not technical came to me and asked me to find a way to speed the site up, because many users were complaining. My first guess was the database. I pretended to think it over, struck a suitably deep pose, and said: "It's probably a database query problem. Let's add an index to the table." She replied: "Our site has a lot of traffic right now. Adding an index may slow down writes and affect our users." I was stunned. I felt exposed for showing off, lectured in my own field by a non-specialist, and rather embarrassed.

I am not telling this story to show off how professional my colleagues are, or to claim that our products are so good that even non-technical staff know the technical details. My point is that "database" and "database index" are the two most widely used concepts in server-side development. Using databases and indexes skillfully is a survival skill for developers, and non-technical people who work with engineers all day naturally pick some of it up and can talk about it quite convincingly.

Using an index is simple: anyone who can write a CREATE TABLE statement can write a CREATE INDEX statement, and there is hardly a server-side programmer anywhere who cannot create a table. But using an index is one thing, and understanding how indexes work well enough to use them properly is another. The two are worlds apart (I have not reached that level myself). For a large share of programmers, knowledge of indexes stops at "an index makes queries faster".

Why should a table have a primary key?
Why does adding an index make queries faster?
Why does an index make inserts, updates, and deletes slower?
When should a single index be built on two fields at once?

They may not be able to answer these questions. What is the benefit of knowing the answers? If the tables behind your application hold only 10,000 rows, it honestly makes no difference whether you understand indexes or not. But if your application works with hundreds of millions or even billions of rows, a program written without a solid understanding of indexes will simply not run. It is like putting a car engine in a truck: can the truck still haul its load?

The rest of this article works through the questions above, in the hope that it helps readers.

Many articles on the web describe an index like this: "An index is like a book's table of contents; through it you can jump straight to the content you want." That description is correct, but it is so obvious that it explains nothing. Of course finding something through the table of contents is faster than paging through the whole book, and everyone who uses an index already knows that locating data through it is faster than checking rows one by one. Why else would they create one?

To understand how indexes work you must first be clear about one data structure: the balanced tree (not a binary one), that is, the B-tree or B+ tree. Important things are worth saying three times: balanced tree, balanced tree, balanced tree. Some databases also use hash buckets as the index data structure, but mainstream RDBMSs use a balanced tree as the default index structure for tables.

We usually give a table a primary key when we create it, and in some relational databases, if you do not specify one, the database refuses to execute the CREATE TABLE statement. In fact, a table with a primary key is, strictly speaking, no longer a "table". A table without a primary key has its rows stored on disk in no particular order, laid out neatly one after another, which is close to what I think of as a "table". Once a primary key is added, the on-disk storage changes from that flat layout into a tree, the balanced tree described above. In other words, the entire table becomes an index. Yes, the whole table becomes one index, the so-called clustered index. That is why a table can have only one primary key and only one clustered index: the primary key's job is to convert the table's data from the "table" format into the "index (balanced tree)" format. (This is the general idea. In MySQL's InnoDB engine specifically, a table without an explicit primary key is still stored as a clustered index, built on the first unique NOT NULL index or on a hidden row ID.)

[Figure: structure of a table with a primary key (the clustered index).] The drawing is rough, so bear with it. Every node of the tree except the bottom level is made up of values from the primary key field, usually the ID field we designated as the primary key. The bottom level holds the actual rows of the table. Suppose we execute this SQL statement:

SELECT * FROM table_name WHERE id = 1256;

The index first locates the leaf node that holds the value 1256, and then the row with ID 1256 is read from that leaf node. How a balanced tree operates internally is not covered here, but the tree in the figure has three levels, so getting from the root node to a leaf node takes only three lookups.
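To make the root-to-leaf walk concrete, here is a minimal Python sketch of a static multi-level index. It is only an illustration, not how MySQL implements its trees: the names `build_tree`, `find_by_id`, and `FANOUT`, the fan-out of 16, the 4,096 sample rows, and the assumption of a non-empty table are all made up for the example.

```python
from bisect import bisect_right

FANOUT = 16  # hypothetical branching factor; real indexes use far larger ones


def build_tree(rows):
    """Build a static multi-level index over a non-empty list of rows.

    Leaf nodes hold the actual rows; upper levels hold only separator keys.
    """
    rows = sorted(rows, key=lambda r: r["id"])
    # Leaf level: consecutive chunks of full rows.
    level = [
        {"min_key": chunk[0]["id"], "rows": chunk}
        for chunk in (rows[i:i + FANOUT] for i in range(0, len(rows), FANOUT))
    ]
    # Stack internal levels on top until a single root remains.
    while len(level) > 1:
        level = [
            {
                "min_key": group[0]["min_key"],
                "keys": [child["min_key"] for child in group[1:]],
                "children": group,
            }
            for group in (level[i:i + FANOUT] for i in range(0, len(level), FANOUT))
        ]
    return level[0]


def find_by_id(root, target):
    """Walk from the root to a leaf; return (row or None, nodes visited)."""
    node, visited = root, 1
    while "children" in node:
        # Choose the child whose key range contains the target.
        node = node["children"][bisect_right(node["keys"], target)]
        visited += 1
    for row in node["rows"]:
        if row["id"] == target:
            return row, visited
    return None, visited


table = [{"id": i, "name": f"user{i}"} for i in range(1, 4097)]
root = build_tree(table)
row, visited = find_by_id(root, 1256)
print(row["name"], visited)  # the walk touches 3 nodes: root, internal, leaf
```

With 4,096 rows and a fan-out of 16 the tree has exactly three levels, so the lookup for ID 1256 touches three nodes instead of scanning up to 4,096 rows.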

Suppose a table holds 100 million rows and we need to find one of them. The naive approach is to compare rows one by one, which in the worst case takes 100 million comparisons. In big O notation the worst-case time complexity is O(n), which is unacceptable. Those 100 million rows obviously cannot all be read into memory at once, so without cache optimization the 100 million comparisons mean 100 million IO operations. At something like 10 ms per random disk read (an assumed figure for a spinning disk), that is on the order of a million seconds, or well over a week, to produce one result. If the table is converted into a balanced tree (a very bushy tree with a great many nodes) and we assume the tree has 10 levels, the row can be found with only 10 IO operations, an exponential speedup. In big O notation this is O(log n), where n is the total number of records, the base of the logarithm is the tree's branching factor, and the result is the number of levels. In other words, the number of lookups is the logarithm of the total number of records, taken to the base of the branching factor. As a formula:

number of lookups = log_b(n), where b is the branching factor and n is the number of records

In code this is Math.Log(100000000, 10), where 100000000 is the number of records and 10 is the branching factor (in real systems the branching factor is much larger than 10). The result, 8, is the number of lookups, so the cost drops from 100 million to a single digit. This is why an index can improve query performance so dramatically.
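The same calculation can be sketched in Python. The helper name `tree_levels` is made up for this example, and it uses integer arithmetic rather than a floating-point logarithm so that rounding cannot affect the result. It assumes an idealized, completely full tree.

```python
def tree_levels(record_count, fanout):
    """Smallest number of levels h such that fanout ** h >= record_count."""
    levels, capacity = 0, 1
    while capacity < record_count:
        capacity *= fanout
        levels += 1
    return levels


for fanout in (10, 100, 1000):
    print(fanout, tree_levels(100_000_000, fanout))
# 10 -> 8 levels, 100 -> 4 levels, 1000 -> 3 levels
```

The larger the branching factor, the shallower the tree, which is why a real index with many branches per node needs only a handful of lookups even for very large tables.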

Everything has two sides, however. An index speeds up queries but slows down writes, and the reason is simple: the balanced tree must always be kept in a correct state. Inserting, updating, and deleting rows changes the index data held in the tree's nodes and disturbs the tree's structure, so every time the data changes the DBMS has to reorganize the tree (the index) to keep it correct. That reorganization costs performance, which is why an index has side effects on every operation other than queries.
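The write cost can be illustrated with a toy Python model. This is a sketch under loud assumptions: a sorted list stands in for the balanced tree, and the class name `IndexedTable` and the counter `index_updates` are invented for the example. The point it shows is only that every insert has to update every index on the table.

```python
from bisect import insort


class IndexedTable:
    """Toy table: rows in a dict keyed by id, plus one sorted list per indexed field."""

    def __init__(self, indexed_fields):
        self.rows = {}
        self.indexes = {field: [] for field in indexed_fields}
        self.index_updates = 0  # index entries written, counted for illustration

    def insert(self, row):
        self.rows[row["id"]] = row
        for field, entries in self.indexes.items():
            # Each index must stay ordered, so every insert touches every index.
            insort(entries, (row[field], row["id"]))
            self.index_updates += 1


people = [
    {"id": 1, "user_name": "carol", "birthday": "1991-11-01"},
    {"id": 2, "user_name": "alice", "birthday": "1988-03-15"},
    {"id": 3, "user_name": "bob", "birthday": "1990-07-30"},
]

plain = IndexedTable([])
indexed = IndexedTable(["birthday", "user_name"])
for person in people:
    plain.insert(person)
    indexed.insert(person)

print(plain.index_updates, indexed.index_updates)  # 0 versus 6 index writes
```

Three inserts into the table with two indexes cause six extra ordered insertions, while the table without indexes does none.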

With the clustered index covered, let's turn to the nonclustered index, which is the ordinary index we usually talk about and use.

A nonclustered index, like a clustered index, uses a balanced tree as its data structure. The values in the nodes of the index tree come from the indexed field: if you index the name field of a user table, the index is built from the values of the name field, and the DBMS has to keep the index structure correct as the data changes. If you index several fields of a table separately, you get several independent index structures, and these nonclustered indexes have no connection to one another.

Each time you create an index on a field, the field's data is copied out and used to build the index. Adding an index to a table therefore increases the table's size and consumes additional disk space.

The difference between the two is this: through a clustered index you reach the row you are looking for directly, whereas through a nonclustered index you reach only the primary key value of the matching record, and then use that primary key value to find the row through the clustered index.

No matter how a table is queried, the data is ultimately located by primary key through the clustered index. The clustered index (the primary key) is the only path to the actual rows.
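The two-step path can be sketched in Python. Plain dictionaries stand in for the two trees here, which is an assumption made purely for brevity, and the names `clustered`, `build_secondary`, and `select_by` are hypothetical.

```python
# Clustered index stand-in: primary key -> full row.
clustered = {
    1: {"id": 1, "name": "alice"},
    2: {"id": 2, "name": "bob"},
    3: {"id": 3, "name": "alice"},
}


def build_secondary(rows_by_pk, field):
    """Nonclustered index stand-in: field value -> list of primary keys."""
    index = {}
    for pk, row in rows_by_pk.items():
        index.setdefault(row[field], []).append(pk)
    return index


def select_by(secondary, rows_by_pk, value):
    # Step 1: the nonclustered index yields only primary key values.
    primary_keys = secondary.get(value, [])
    # Step 2: each primary key is looked up in the clustered index to get the row.
    return [rows_by_pk[pk] for pk in primary_keys]


index_name = build_secondary(clustered, "name")
print(select_by(index_name, clustered, "alice"))
```

Note that the secondary structure never stores whole rows, only primary keys, so every match still costs one more lookup in the clustered structure.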

There is one exception, however, in which the required data can be returned without touching the clustered index. This less common approach is the covering index query, and it relies on what is usually called a composite or multi-field index. As noted above, when you index a field, the field's contents are copied into the index; if you specify two fields for one index, the contents of both fields are copied into it.

Consider the following SQL statements.

Create an index:

CREATE INDEX index_birthday ON user_info (birthday);

Query the user names of people born on November 1, 1991:

SELECT user_name FROM user_info WHERE birthday = '1991-11-1';

This SQL statement is executed as follows:

First, the nonclustered index index_birthday is used to find the primary key ID values of all records whose birthday equals 1991-11-1.

Next, each of those primary key ID values is looked up in the clustered index to find where the corresponding actual data (the data row) is stored.

Finally, the value of the user_name field is read from each row and returned as the result.

Now we replace the index on the birthday field with a two-field covering index:

CREATE INDEX index_birthday_and_user_name ON user_info (birthday, user_name);

The execution of the same query then becomes:

The nonclustered index index_birthday_and_user_name is used to find the leaf nodes whose birthday equals 1991-11-1. Besides the primary key ID, those leaf nodes also hold the value of the user_name field, so there is no need to use the primary key ID to locate the actual data row; the user_name value is read straight from the leaf node. Reading directly from a covering index in this way skips the last two steps of the non-covering lookup and can greatly improve query performance.
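The difference between the two execution paths can be sketched in Python. This is a simplified model with dictionaries in place of trees; the sample rows and the helper names `build_index` and `user_names_born_on` are invented for the illustration, and only the table, field, and index names come from the SQL above.

```python
# Clustered index stand-in: primary key -> full row of user_info.
USER_INFO = {
    1: {"id": 1, "user_name": "alice", "birthday": "1991-11-01"},
    2: {"id": 2, "user_name": "bob", "birthday": "1988-03-15"},
    3: {"id": 3, "user_name": "carol", "birthday": "1991-11-01"},
}


def build_index(rows, key_field, extra_fields=()):
    """Index entries always carry the primary key, plus any extra indexed fields."""
    index = {}
    for pk, row in rows.items():
        entry = {"id": pk, **{field: row[field] for field in extra_fields}}
        index.setdefault(row[key_field], []).append(entry)
    return index


def user_names_born_on(index, rows, birthday):
    """Return (user names, number of clustered-index lookups that were needed)."""
    names, row_lookups = [], 0
    for entry in index.get(birthday, []):
        if "user_name" in entry:
            # Covering index: the value is already in the index entry.
            names.append(entry["user_name"])
        else:
            # Plain index: go back to the clustered index by primary key.
            names.append(rows[entry["id"]]["user_name"])
            row_lookups += 1
    return names, row_lookups


index_birthday = build_index(USER_INFO, "birthday")
index_birthday_and_user_name = build_index(USER_INFO, "birthday", ("user_name",))

print(user_names_born_on(index_birthday, USER_INFO, "1991-11-01"))
print(user_names_born_on(index_birthday_and_user_name, USER_INFO, "1991-11-01"))
```

Both calls return the same names, but the single-field index needs one clustered lookup per match while the two-field index needs none.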

That is roughly how database indexes work. Some details in this article may be slightly off, but not in ways that affect the concepts explained.
