Index optimization that 90% of programmers can use in interviews

Source: Internet
Author: User

Preface

This article covers index optimization that 90% of programmers can use in interviews. The basic principles of indexing and the principles for creating indexes are the key points, and they are essential for interviews! Bookmark the article and study it carefully. Let's go through it in detail.

The discussion of indexes is divided into the following points:

  • Overview of indexes (what indexes are, their advantages and disadvantages)
  • Basic use of indexes (creating indexes)
  • Basic principles of indexing (interview highlight)
  • Index data structures (B-tree, hash)
  • Principles for creating indexes (top priority, required for interviews; bookmark this!)
  • How to delete millions of rows or more

I. Overview of Indexes

1) What is an index?

An index is a special file (an index on an InnoDB table is an integral part of the tablespace) that contains reference pointers to all records in the table. Put more plainly, an index is like a table of contents. Suppose someone tears the table of contents out of your Xinhua Dictionary: to find the idioms that start with a given character, you would have to page through from page 1 to page 1,000. Exhausting! Once the table of contents is returned to you, you can locate them quickly.

2) Advantages and Disadvantages of indexes:

Indexes can greatly speed up data retrieval, which is the main reason for creating them. With indexes available, the query optimizer can choose faster access paths during a query, which improves system performance. However, indexes also have disadvantages: they carry additional maintenance costs. Because index data is stored in addition to the row data, every insert, update, and delete also has to update the index. These operations require additional I/O, which slows down inserts, updates, and deletes.

II. Basic use of indexes (the practical part)

1) Create an index: (three methods)

Method 1:


Method 2: Use the ALTER TABLE command to add an index:

ALTER TABLE can create a normal index, a UNIQUE index, or a PRIMARY KEY index. The general forms are ALTER TABLE table_name ADD INDEX index_name (column_list), ALTER TABLE table_name ADD UNIQUE (column_list), and ALTER TABLE table_name ADD PRIMARY KEY (column_list).

Here table_name is the name of the table to be indexed, and column_list lists the columns to index. When several columns are indexed, they are separated by commas.

The index name index_name is optional. If you omit it, MySQL assigns a name based on the first indexed column. In addition, ALTER TABLE allows several changes to a table in a single statement, so you can create multiple indexes at the same time.

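Method 3: Use the CREATE INDEX command:

CREATE INDEX can add a normal or UNIQUE index to a table, but it cannot create a PRIMARY KEY index. The general forms are CREATE INDEX index_name ON table_name (column_list) and CREATE UNIQUE INDEX index_name ON table_name (column_list).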


III. Basic principles of indexing (kept shorter than in other articles)

Indexes are used to quickly find records with specific values. Without an index, the query has to traverse the entire table.

The principle is simple: turn unordered data into ordered data that can be searched.

1. Sort the content of the indexed columns.

2. Generate an inverted table from the sorted result.

3. Attach the chain of data addresses to each entry of the inverted table.

4. When querying, find the entry in the inverted table first, then follow its data address chain to fetch the actual rows.
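The four steps above can be sketched in a few lines of Python. This is only an illustration under simple assumptions, not how MySQL stores an index: the table is a hypothetical in-memory list, a row's position in that list stands in for its data address, and `build_index` and `lookup` are names invented for this example.

```python
from bisect import bisect_left

# Hypothetical table: the list position stands in for the row's data address.
rows = [
    {"id": 3, "name": "zhou"},
    {"id": 1, "name": "li"},
    {"id": 2, "name": "zhou"},
    {"id": 4, "name": "chen"},
]

def build_index(rows, column):
    """Steps 1-3: sort the column values and attach row addresses to each value."""
    inverted = {}
    for address, row in enumerate(rows):
        inverted.setdefault(row[column], []).append(address)
    keys = sorted(inverted)                  # ordered distinct values
    chains = [inverted[k] for k in keys]     # address chain for each value
    return keys, chains

def lookup(index, rows, value):
    """Step 4: binary-search the ordered values, then follow the address chain."""
    keys, chains = index
    pos = bisect_left(keys, value)
    if pos == len(keys) or keys[pos] != value:
        return []
    return [rows[address] for address in chains[pos]]

name_index = build_index(rows, "name")
matches = lookup(name_index, rows, "zhou")   # rows with id 3 and id 2
```

The lookup touches only a handful of index entries instead of every row, which is the whole point of keeping the values sorted.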

IV. Index data structures (B-tree, hash)

1) B-Tree Index

MySQL retrieves data through its storage engines, and by the author's estimate about 90% of users use InnoDB. By implementation, the two index types discussed here are BTREE (B-tree) and HASH. Note that explicit indexes in InnoDB are B-tree based: InnoDB uses hashing only in its internal adaptive hash index, while engines such as MEMORY support explicit HASH indexes. B-tree indexes are the most frequently used index type in MySQL, and basically all storage engines support them. When we say "index" without qualification, we normally mean a B-tree index. (It is actually implemented as a B+ tree, but because MySQL always prints BTREE when showing a table's indexes, it is referred to as a B-tree index.)


Query method:

Primary key index area (PI): stores the address of the associated row data. A query by primary key goes straight to the row.

Secondary (common) index area (SI): stores the associated id, which is then used to find the address above. Querying by primary key is therefore the fastest way to fetch data.
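The difference between the two lookups can be sketched with two dictionaries. This is a simplified model with hypothetical data, and the names `primary_index`, `secondary_index`, `find_by_id`, and `find_by_name` are made up for the illustration; a real engine uses B+ trees rather than dictionaries.

```python
# Primary key index: primary key -> full row (stands in for the row's address).
primary_index = {
    1: {"id": 1, "name": "li", "age": 30},
    2: {"id": 2, "name": "zhou", "age": 25},
    3: {"id": 3, "name": "zhou", "age": 41},
}

# Secondary index on name: indexed value -> primary keys, not rows.
secondary_index = {"li": [1], "zhou": [2, 3]}

def find_by_id(pk):
    """One lookup: the primary key index leads straight to the row."""
    return primary_index.get(pk)

def find_by_name(name):
    """Two lookups: the secondary index gives ids, the primary index gives rows."""
    return [primary_index[pk] for pk in secondary_index.get(name, [])]

ages = [row["age"] for row in find_by_name("zhou")]
```

A query through the secondary index always pays for the extra hop through the primary key, which is why the primary key lookup is the faster of the two.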

B+ tree properties:

1) A node with n subtrees contains n keywords. Such nodes do not store data; they store only indexes to the data.

2) The leaf nodes together contain all the keywords and the pointers to the records holding those keywords, and the leaf nodes are linked to one another in ascending order of keyword.

3) All non-leaf nodes can be regarded as the index part. Each of them holds only the maximum (or minimum) keyword of each of its subtrees.

4) In a B+ tree, data objects are inserted and deleted only at the leaf nodes.

5) A B+ tree has two head pointers: one points to the root of the tree, the other to the leaf node with the smallest keyword.
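A small, read-only sketch can make these properties concrete. It assumes a two-level tree built in bulk from sorted data, uses the maximum keyword of each leaf as the entry in the parent (the variant in properties 1 and 3), and leaves out insertion, deletion, and node splitting. The class and function names are invented for this example.

```python
from bisect import bisect_left

class Leaf:
    def __init__(self, keys, values):
        self.keys = keys        # sorted keywords
        self.values = values    # record pointers (plain strings here)
        self.next = None        # link to the next leaf in key order

class Internal:
    def __init__(self, keys, children):
        self.keys = keys            # max keyword of each child, no record data
        self.children = children    # n children for n keywords

def build(pairs, leaf_size=2):
    """Bulk-build a two-level tree from (key, value) pairs sorted by key."""
    leaves = []
    for i in range(0, len(pairs), leaf_size):
        chunk = pairs[i:i + leaf_size]
        leaves.append(Leaf([k for k, _ in chunk], [v for _, v in chunk]))
    for left, right in zip(leaves, leaves[1:]):
        left.next = right
    root = Internal([leaf.keys[-1] for leaf in leaves], leaves)
    return root, leaves[0]      # the two head pointers from property 5

def find_leaf(node, key):
    while isinstance(node, Internal):
        idx = min(bisect_left(node.keys, key), len(node.children) - 1)
        node = node.children[idx]
    return node

def range_scan(root, low, high):
    """Descend once, then follow the leaf links instead of descending again."""
    leaf, out = find_leaf(root, low), []
    while leaf is not None:
        for k, v in zip(leaf.keys, leaf.values):
            if k > high:
                return out
            if k >= low:
                out.append(v)
        leaf = leaf.next
    return out

pairs = [(k, f"row-{k}") for k in (1, 3, 5, 7, 9, 11)]
root, first_leaf = build(pairs)
result = range_scan(root, 4, 9)
```

Because the leaves are chained in key order, a range query needs one descent from the root and then a sequential walk, and a full ordered scan can start directly from `first_leaf`.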

2) Hash index (good technical knowledge)

Put simply, a hash index resembles a basic hash table as taught in data structures. When MySQL uses a hash index, it applies a hash algorithm (common ones include direct addressing, mid-square, folding, division-remainder, and random-number methods) to convert the column value into a fixed-length hash value, and stores the row pointer of that record at the corresponding position in the hash table. If a hash collision occurs (two different keywords have the same hash value), the entries are stored as a linked list under that hash key. Of course, this is only a simplified picture.
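As a minimal sketch of the idea, the following Python class uses the division-remainder method and chains colliding entries in a list. The class `HashIndex`, its bucket count, and the sample ids are assumptions made for this illustration and do not mirror MySQL's internal layout.

```python
class HashIndex:
    """Fixed number of buckets; each bucket is a list acting as the collision chain."""

    def __init__(self, bucket_count=8):
        self.buckets = [[] for _ in range(bucket_count)]

    def _slot(self, key):
        # Division-remainder method: hash value modulo the number of buckets.
        return hash(key) % len(self.buckets)

    def insert(self, key, row_pointer):
        self.buckets[self._slot(key)].append((key, row_pointer))

    def lookup(self, key):
        # Only one bucket is inspected; colliding keys in it are filtered out.
        return [ptr for k, ptr in self.buckets[self._slot(key)] if k == key]

index = HashIndex(bucket_count=4)
for row_pointer, user_id in enumerate([10, 14, 18, 7]):
    index.insert(user_id, row_pointer)

found = index.lookup(14)   # 10, 14 and 18 share a bucket; only 14 is returned
```

An equality lookup goes to a single bucket, but the buckets carry no ordering, so a condition such as a range on the key would have to look at every entry. That is why hash indexes suit exact-match queries only.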


P.S. If you are interested in the data structures themselves, see the [data structure] topic. I will not explain them in detail here.

V. Principles for creating indexes (top priority)

Although indexes are good, they should not be used without restraint. It is best to follow these principles:

1) Leftmost prefix matching. This is very important for composite indexes. MySQL keeps matching columns to the right until it meets a range condition (>, <, BETWEEN, LIKE), and then stops. For example, for the condition a = 1 and b = 2 and c > 3 and d = 4 with an index on (a, b, c, d), d cannot use the index. With an index on (a, b, d, c) all four columns can use it, and a, b, and d may appear in the query in any order.

2) Create indexes only on fields that are frequently used as query conditions.

3) Frequently updated fields are not suitable for indexing.

4) Columns that cannot effectively distinguish rows are not suitable for indexing (for example gender, which has at most three values: male, female, and unknown, so its selectivity is too low).

5) Prefer extending an existing index to creating a new one. For example, if the table already has an index on a and you now want an index on (a, b), you only need to modify the original index.

6) Columns with foreign keys must be indexed.

7) Do not index columns that are rarely used in queries or that contain many duplicate values.

8) Do not index columns whose data type is text, image, or bit.
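Principle 4 can be made measurable. A common rule of thumb (an assumption of this sketch, not something the article specifies) is to compare the number of distinct values with the number of rows: the closer the ratio is to 1, the better the column distinguishes rows. The function name `selectivity` and the sample columns are hypothetical.

```python
def selectivity(values):
    """Distinct values divided by total rows; 1.0 means every row is unique."""
    values = list(values)
    if not values:
        return 0.0
    return len(set(values)) / len(values)

# Hypothetical sample columns.
gender = ["male", "female", "unknown", "male", "female", "male", "female", "male"]
email = [f"user{i}@example.com" for i in range(8)]

gender_score = selectivity(gender)   # 3 distinct values over 8 rows
email_score = selectivity(email)     # 8 distinct values over 8 rows
```

On a real table the same ratio can be obtained with count(distinct column) divided by count(*). On a large table the gender ratio approaches zero, which is why such a column is a poor index candidate.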

VI. How to delete millions of rows or more (this is the practical part)

About indexes: indexes carry additional maintenance costs. Because index data is stored in addition to the row data, every insert, update, and delete also has to update the indexes. These operations require additional I/O, which slows down inserts, updates, and deletes. According to the official MySQL manual, the time needed to delete rows is proportional to the number of indexes on the table, and this matters a great deal when deleting millions of rows.

  • Therefore, when we want to delete millions of rows, we can first drop the indexes (which takes about three minutes).
  • Then delete the unwanted data (this takes less than two minutes).
  • After the deletion is complete, re-create the indexes. Because the table now holds less data, this is also quick, about 10 minutes.
  • Compared with deleting directly, this is definitely much faster. Moreover, if a direct deletion is interrupted, the whole deletion is rolled back, which is even worse.
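The drop, delete, and re-create sequence can be tried out on a small scale. The sketch below uses SQLite through Python's built-in sqlite3 module purely as a stand-in, because it runs without a server; the table `tb_log`, its indexes, and the row counts are hypothetical, and timings on a real MySQL table will differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table tb_log (id integer primary key, status text, created int)")
conn.executemany(
    "insert into tb_log (status, created) values (?, ?)",
    [("old" if i % 10 else "keep", i) for i in range(10_000)],
)
conn.execute("create index idx_status on tb_log(status)")
conn.execute("create index idx_created on tb_log(created)")

def bulk_delete(conn):
    # 1. Drop the secondary indexes so the delete does not have to maintain them.
    conn.execute("drop index idx_status")
    conn.execute("drop index idx_created")
    # 2. Delete the unwanted rows.
    deleted = conn.execute("delete from tb_log where status = 'old'").rowcount
    # 3. Rebuild the indexes over the now much smaller table.
    conn.execute("create index idx_status on tb_log(status)")
    conn.execute("create index idx_created on tb_log(created)")
    conn.commit()
    return deleted

deleted = bulk_delete(conn)
remaining = conn.execute("select count(*) from tb_log").fetchone()[0]
```

One trade-off to keep in mind: while the indexes are gone, other queries on the table lose them too, so this is best done in a maintenance window.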

Common Database index optimization statements

The following table tb_test is used as an example:

create table tb_test(id int not null, age int not null, name varchar(30) not null, addr varchar(50) not null);
create unique index idx1_tb_test on tb_test(id);
create index idx2_tb_test on tb_test(name);
create index idx3_tb_test on tb_test(addr);

Index optimization recommendations

1. Avoid calculations on the index column

For example, we want to find the age and name in the records of table tb_test whose id is greater than 100.

The correct SQL statement is:

select age,name from tb_test where id > 1*100;

The SQL statement not recommended is:

select age,name from tb_test where id/100 > 1;
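The effect can be observed in a query plan. The sketch below uses SQLite through Python's sqlite3 module only as a convenient stand-in for MySQL: the exact plan text differs between engines and versions, and on MySQL the equivalent check is to run EXPLAIN on both statements. The helper `plan` is a name invented for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table tb_test(id int not null, age int not null, "
    "name varchar(30) not null, addr varchar(50) not null)"
)
conn.execute("create unique index idx1_tb_test on tb_test(id)")

def plan(sql):
    """Return the planner's description of each step for the given query."""
    rows = conn.execute("explain query plan " + sql).fetchall()
    return [row[-1] for row in rows]   # the last column holds the detail text

# Bare column on the left: the planner can search idx1_tb_test.
bare_column = plan("select age, name from tb_test where id > 1*100")

# Expression on the column: the index no longer applies, so the table is scanned.
computed = plan("select age, name from tb_test where id/100 > 1")
```

Keeping the indexed column alone on one side of the comparison and moving the arithmetic to the constant side is what lets the planner use the index.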

2. Avoid concatenating index columns

For example, we want to find the id and age in the records of table tb_test whose name is "zhou" and whose addr is "CQ".

The correct SQL statement is:

select id,age from tb_test where name='zhou' and addr='CQ';

The SQL statement not recommended is:

select id,age from tb_test where concat(name,' ',addr) = 'zhou CQ';

3. Avoid IS NULL or IS NOT NULL on the index column

For example, we want to find the age in the records of table tb_test whose id is greater than or equal to 0.

The correct SQL statement is:

select age from tb_test where id >= 0;

The SQL statement not recommended is:

select age from tb_test where id is not null;

4. Avoid OR on index columns

For example, we want to find the age and name in the records of table tb_test whose id is 101 or 102.

The correct SQL statement (using union) is:

select age,name from tb_test where id = 101 union select age,name from tb_test where id = 102;

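The SQL statement using OR, which is not recommended here, is shown next. (Recent MySQL versions can usually handle OR conditions on a single indexed column with a range scan, so it is worth checking the actual plan with EXPLAIN.)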

select age,name from tb_test where id = 101 or id = 102;

5. Avoid a leading wildcard in LIKE on the index column

For example, we want to find the id and age in the records of table tb_test whose name starts with "zho".

The correct SQL statement is:

select id,age from tb_test where name like 'zho%';

The SQL statement not recommended is:

select id,age from tb_test where name like '%ho%';
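Why the position of the wildcard matters can be sketched with a sorted list standing in for the index on name. The names in the list and the helper functions `prefix_search` and `contains_search` are hypothetical, the prefix is assumed to be non-empty, and the count returned alongside the hits only reports how many index entries had to be inspected after the binary search.

```python
from bisect import bisect_left

# Hypothetical sorted index on the name column.
names = sorted(["chen", "li", "zhang", "zhao", "zhong", "zhou", "zhu"])

def prefix_search(sorted_names, prefix):
    """LIKE 'prefix%': the matches form one contiguous slice of the sorted index."""
    start = bisect_left(sorted_names, prefix)
    # Smallest string that sorts after every string starting with the prefix.
    upper = prefix[:-1] + chr(ord(prefix[-1]) + 1)
    end = bisect_left(sorted_names, upper)
    return sorted_names[start:end], end - start

def contains_search(sorted_names, fragment):
    """LIKE '%fragment%': ordering does not help, so every entry is inspected."""
    hits = [n for n in sorted_names if fragment in n]
    return hits, len(sorted_names)

prefix_hits, prefix_examined = prefix_search(names, "zho")
contains_hits, contains_examined = contains_search(names, "ho")
```

Both searches return the same two names here, but the prefix form inspects only the matching slice while the leading-wildcard form has to look at all seven entries.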

6. Use of composite indexes

If the index is a composite index, the first field of the index must appear in the condition for the system to use the index.

For example, the following index is created on table tb_test:

create index idx4_tb_test on tb_test(id,name,addr);

The index idx4_tb_test above is equivalent to having the indexes (id), (id, name), and (id, name, addr). It is not used when only name or only addr appears in the WHERE condition of an SQL statement. It is used only when id is part of the condition.
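The reason can be seen by modeling idx4_tb_test as a list of (id, name, addr) tuples kept in sorted order. This is a simplified picture with hypothetical data; the helper names are invented, and the `heads` list is materialized only to keep the example short.

```python
from bisect import bisect_left, bisect_right

# Hypothetical entries of idx4_tb_test, kept sorted by (id, name, addr).
entries = sorted([
    (1, "li", "BJ"),
    (1, "zhou", "CQ"),
    (2, "chen", "SH"),
    (2, "zhou", "CQ"),
    (3, "zhou", "BJ"),
])

def seek_prefix(entries, prefix):
    """Binary search works when the condition fixes a leading part of the key."""
    n = len(prefix)
    heads = [e[:n] for e in entries]   # leading columns are still in sorted order
    lo, hi = bisect_left(heads, prefix), bisect_right(heads, prefix)
    return entries[lo:hi]

def scan_by_name(entries, name):
    """Without id, matching names are scattered, so every entry is checked."""
    return [e for e in entries if e[1] == name]

by_id = seek_prefix(entries, (2,))
by_id_name = seek_prefix(entries, (2, "zhou"))
by_name = scan_by_name(entries, "zhou")
```

Entries with id = 2 sit next to each other, so they can be found by binary search. Entries with name = "zhou" are spread across the whole list, so a condition on name alone cannot take advantage of the ordering.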

Using index columns incorrectly in the SQL statements we write can make the index unusable, and the resulting full table scan greatly reduces database performance. It is therefore worth learning how to use indexes correctly.

That is all on indexes for today. The basic principles of indexes and the principles for creating them are the key points, and they are essential for interviews! Bookmark this and study it carefully.

Summary

That is all for this article. I hope it is of some value for your study or work. If you have any questions, please leave a message. Thank you for your support.
