I. Foreword
As development and testing come to an end, it is time to make some pre-release preparations for the project. One of them is adding indexes for the SQL statements written earlier. This is essential in high-concurrency, high-traffic environments, because a well-designed index can greatly improve query efficiency. So the questions are: what is an index, and how do you build a good one? This article uses the MySQL InnoDB storage engine as an example and draws on a real project to show how to build good indexes.
II. Definition of an index
MySQL's official definition is: an index is a data structure that helps MySQL retrieve data efficiently. Stripping the sentence down to its core gives the essence of an index: an index is a data structure.
We know that querying is one of the most important functions of a database. Take the following SQL statement:
SELECT * FROM test_table WHERE id = 99;
It retrieves the record whose id is 99 from the table test_table.
Everyone wants queries to run as fast as possible, so database designers optimize from the perspective of the search algorithm. The most basic search algorithm is, of course, sequential search (linear search): traverse test_table row by row and check whether each id equals 99. This O(n) algorithm clearly performs badly when the data volume is large. Fortunately, computer science offers many better search algorithms, such as binary search and binary search tree lookup. Looking a little closer, you will find that each search algorithm only works on a particular data structure: binary search requires the data to be sorted, and binary tree search only works on a binary search tree. The data itself, however, cannot be organized to satisfy every such structure at once (for example, rows cannot be physically sorted by two different columns at the same time). So, in addition to the data, a database system maintains data structures that satisfy particular search algorithms and reference (point to) the data in some way, so that advanced search algorithms can run on them. Such a data structure is an index.
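To make the idea concrete, here is a small Python sketch (an illustration, not how MySQL stores data). The hypothetical rows stay in insertion order, and a separate sorted list of (key, row position) pairs plays the role of the index, so binary search with the standard bisect module can locate a row without scanning every one. Two such structures can coexist over the same rows, which is exactly what the physical data cannot do.

```python
import bisect

# Hypothetical table: rows are kept in insertion order, not sorted by any column.
rows = [
    {"id": 42, "name": "alice"},
    {"id": 7, "name": "bob"},
    {"id": 99, "name": "carol"},
    {"id": 13, "name": "dave"},
]


def linear_search(table, column, target):
    """Sequential search: compare every row, O(n)."""
    for row in table:
        if row[column] == target:
            return row
    return None


def build_index(table, column):
    """A toy index: keys sorted by value, each pointing back to a row position."""
    entries = sorted((row[column], pos) for pos, row in enumerate(table))
    keys = [key for key, _ in entries]
    positions = [pos for _, pos in entries]
    return keys, positions


def index_search(table, index, target):
    """Binary search on the sorted keys, O(log n), then follow the pointer."""
    keys, positions = index
    i = bisect.bisect_left(keys, target)
    if i < len(keys) and keys[i] == target:
        return table[positions[i]]
    return None


# Two independent indexes over the same, unsorted rows.
id_index = build_index(rows, "id")
name_index = build_index(rows, "name")

print(linear_search(rows, "id", 99))           # {'id': 99, 'name': 'carol'}
print(index_search(rows, id_index, 99))        # {'id': 99, 'name': 'carol'}
print(index_search(rows, name_index, "dave"))  # {'id': 13, 'name': 'dave'}
```

The rows never move; only the small key/position lists are sorted, which is why one table can carry several indexes at once.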
The example above mainly illustrates the role of an index. In practice, most database systems and file systems, MySQL InnoDB included, do not use a binary tree as the index structure; they use a B-tree or its variant, the B+tree, because this structure minimizes the number of disk I/O operations during a lookup. What B-trees and B+trees are, and why they were chosen for database indexes, is left for readers to explore on their own. Here we introduce the two kinds of B+tree index in the MySQL InnoDB engine.
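A rough back-of-the-envelope calculation shows why a wide tree means fewer disk reads. Assume, as a simplification, that reading one tree node costs one disk I/O, that a binary tree is perfectly balanced (two children per node), and that a B+tree node has a fanout of 1000. The fanout figure is an illustrative assumption; the real value depends on page size and key size. The number of levels a lookup traverses is then approximately the smallest h with fanout^h >= number of keys.

```python
def levels_needed(num_keys: int, fanout: int) -> int:
    """Smallest h such that fanout**h >= num_keys (integer arithmetic, no log rounding)."""
    levels, capacity = 0, 1
    while capacity < num_keys:
        capacity *= fanout
        levels += 1
    return levels


n = 10_000_000
print(levels_needed(n, 2))     # 24  (balanced binary tree)
print(levels_needed(n, 1000))  # 3   (assumed B+tree fanout of 1000)
```

Under these assumptions, finding one of 10 million keys takes about 24 node reads in a binary tree but only 3 in the wide tree.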
III. MySQL InnoDB B+tree indexes
1. The primary key index, which is the clustered index. Its leaf nodes store not only the primary key but the entire row it identifies, so in InnoDB the primary key index is the table data itself;
2. The non-primary-key index (secondary index), whose entries use the column value as the key and the primary key value as the value, i.e. (column value, primary key).
InnoDB uses index-organized tables: all row data hangs under the leaf nodes of the primary key index. Therefore, if primary keys are not inserted in order, leaf pages will split frequently, causing a large number of I/O operations. In addition, InnoDB limits an index key prefix to 767 bytes for tables using the COMPACT or REDUNDANT row format (newer MySQL versions allow 3072 bytes for the DYNAMIC and COMPRESSED row formats); a longer column can only be indexed by a prefix of at most that length.
Every InnoDB secondary index points to the primary key index. A lookup through a secondary index cannot return the whole row by itself: it must take the primary key stored in the leaf node and then search the primary key index to fetch the entire row. Because every secondary index entry carries a copy of the primary key, the primary key should be designed to be as small as possible; otherwise all secondary indexes become very large.
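The two-step lookup can be modeled in a few lines of Python. This is a toy sketch, not InnoDB's on-disk format: a hypothetical clustered index holds full rows sorted by primary key, a hypothetical secondary index on email holds only (email, primary key) pairs sorted by email, and a lookup by email first finds the primary key and then searches the clustered index for the row.

```python
import bisect

# Hypothetical clustered index: full rows, kept sorted by primary key "id".
clustered_rows = [
    {"id": 1, "email": "carol@example.com", "city": "Paris"},
    {"id": 2, "email": "alice@example.com", "city": "Oslo"},
    {"id": 3, "email": "bob@example.com", "city": "Rome"},
    {"id": 4, "email": "dave@example.com", "city": "Lima"},
]
clustered_keys = [row["id"] for row in clustered_rows]

# Hypothetical secondary index on email: (column value, primary key) pairs,
# sorted by column value. It stores no other row data.
secondary_email = sorted((row["email"], row["id"]) for row in clustered_rows)


def lookup_by_primary_key(pk):
    """One search in the clustered index returns the whole row."""
    i = bisect.bisect_left(clustered_keys, pk)
    if i < len(clustered_keys) and clustered_keys[i] == pk:
        return clustered_rows[i]
    return None


def lookup_by_email(email):
    """Two searches: secondary index -> primary key -> clustered index."""
    # (email,) sorts just before every (email, pk) pair with the same email.
    i = bisect.bisect_left(secondary_email, (email,))
    if i < len(secondary_email) and secondary_email[i][0] == email:
        pk = secondary_email[i][1]
        return lookup_by_primary_key(pk)
    return None


print(lookup_by_email("bob@example.com"))
# {'id': 3, 'email': 'bob@example.com', 'city': 'Rome'}
```

Since the primary key is copied into every secondary entry, its width multiplies across all secondary indexes: for example, a 36-character string key instead of a 4-byte INT makes each secondary entry at least 32 bytes larger.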
IV. Principles for building indexes
Let's look at the principles to follow when building a good index, illustrated with concrete examples.
1. The leftmost prefix matching principle. This is a very important principle: MySQL keeps matching index columns from left to right until it encounters a range condition (>, <, BETWEEN, LIKE), and then stops. For example, for a = 1 AND b = 2 AND c > 3 AND d = 4, an index on (a,b,c,d) cannot use d, whereas an index on (a,b,d,c) can use all four columns, and the order of a, b and d within that index can be arranged freely.
2. Conditions using = and IN can appear in any order. For example, for a = 1 AND b = 2 AND c = 3, an index on (a,b,c) works no matter in which order the conditions are written, because the MySQL query optimizer rewrites the query into a form the index can use.
3. Choose highly selective columns to index. Selectivity is computed as count(distinct col)/count(*), the proportion of distinct values in the column; the higher the ratio, the fewer records a lookup has to scan. A unique key has a selectivity of 1, while status or gender fields may have a selectivity close to 0 on large data sets. Someone might ask whether there is an empirical threshold for this ratio. It depends on the scenario and is hard to pin down, but for columns used in joins we generally require at least 0.1, that is, on average one lookup scans 10 records.
4. Index columns must not take part in calculations; keep the column "clean". For example, from_unixtime(create_time) = '2015-08-14' cannot use the index. The reason is simple: the B+tree stores the raw field values, so to evaluate the condition MySQL would have to apply the function to every entry before comparing, which is clearly too expensive. The statement should instead be written as create_time = unix_timestamp('2015-08-14').
5. Extend existing indexes where possible instead of creating new ones. For example, if a table already has an index on a and you now need an index on (a,b), simply modify the existing index.
6. For ORDER BY or GROUP BY to be satisfied by an index, the index columns must be in the same order as the columns in the ORDER BY or GROUP BY clause, and all columns must be sorted in the same direction (all ascending or all descending). If the query joins several tables, the index can be used for sorting only when every column in the ORDER BY clause comes from the first table. Like the conditions of a query, ORDER BY and GROUP BY must also satisfy the leftmost prefix principle; otherwise MySQL performs an extra sort operation instead of sorting by the index. (There is one exception where ORDER BY or GROUP BY need not satisfy the leftmost prefix: when the leading index columns are constants. If a WHERE or JOIN clause fixes these columns to constant values, that makes up for the missing part of the prefix.)
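Principles 3 and 4 can be illustrated with a short Python sketch on made-up data. selectivity() implements the count(distinct col)/count(*) formula. The two matching functions contrast applying a function to every stored create_time value with converting the constant once and binary-searching the sorted values, which is roughly what an index lookup on a clean column amounts to. The hourly timestamps and the use of UTC are assumptions for the demo; MySQL's FROM_UNIXTIME and UNIX_TIMESTAMP use the session time zone.

```python
import bisect
from datetime import datetime, timezone


def selectivity(values):
    """count(distinct col) / count(*): 1.0 for a unique column, near 0 for low-cardinality ones."""
    return len(set(values)) / len(values) if values else 0.0


# Hypothetical columns with 1000 rows each.
user_ids = list(range(1000))  # unique key
genders = ["M", "F"] * 500    # only two distinct values
print(selectivity(user_ids))  # 1.0
print(selectivity(genders))   # 0.002

# Hypothetical sorted create_time column (Unix seconds): one row per hour,
# from 2015-08-13 00:00 UTC to 2015-08-15 23:00 UTC.
start = int(datetime(2015, 8, 13, tzinfo=timezone.utc).timestamp())
create_times = [start + hour * 3600 for hour in range(72)]
target_day = datetime(2015, 8, 14, tzinfo=timezone.utc)


def match_function_on_column(values, day):
    """Like from_unixtime(create_time) = '2015-08-14': convert every value first."""
    conversions = 0
    matches = 0
    for ts in values:
        conversions += 1
        if datetime.fromtimestamp(ts, tz=timezone.utc) == day:
            matches += 1
    return matches, conversions


def match_clean_column(sorted_values, day):
    """Like create_time = unix_timestamp('2015-08-14'): convert the constant once."""
    target = int(day.timestamp())
    lo = bisect.bisect_left(sorted_values, target)
    hi = bisect.bisect_right(sorted_values, target)
    return hi - lo, 1


print(match_function_on_column(create_times, target_day))  # (1, 72)
print(match_clean_column(create_times, target_day))        # (1, 1)
```

Both forms find the same row, but the first touches every value while the second does one conversion plus O(log n) comparisons.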
V. Examples
Statement 1:
Statement 2:
Considered separately, you might build one index for statement 1, (status, netting_batch_no, debtor_agent_member_id), and another for statement 2, (netting_batch_no, debtor_agent_member_id, transaction_currency). Considered together, a single index is enough: (netting_batch_no, debtor_agent_member_id). There is no need to put status or transaction_currency in the index, because both columns have very low selectivity.
According to principle 2, statement 1 can use this index;
according to principle 1, statement 2 can also use this index.
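Principles 1 and 2 can be captured by a simplified model of leftmost-prefix matching. This is a sketch, not MySQL's optimizer, which also weighs costs and has other access methods. Predicates map each column to "eq" (= or IN) or "range" (>, <, BETWEEN, LIKE); because they are passed as a mapping, the order in which conditions are written does not matter. The example calls assume, hypothetically, that statements 1 and 2 filter their columns with equality conditions.

```python
def usable_index_columns(index_columns, predicates):
    """Return the leading index columns a query can use under the leftmost-prefix rule.

    predicates maps column name -> "eq" (= or IN) or "range" (>, <, BETWEEN, LIKE).
    Matching walks the index from left to right, stops at the first column the
    query does not filter on, and stops right after the first range condition.
    """
    used = []
    for column in index_columns:
        kind = predicates.get(column)
        if kind is None:
            break
        used.append(column)
        if kind == "range":
            break
    return used


# Principle 1: a = 1 AND b = 2 AND c > 3 AND d = 4
query = {"a": "eq", "b": "eq", "c": "range", "d": "eq"}
print(usable_index_columns(["a", "b", "c", "d"], query))  # ['a', 'b', 'c']
print(usable_index_columns(["a", "b", "d", "c"], query))  # ['a', 'b', 'd', 'c']

# The example above, assuming both statements use equality conditions.
index = ["netting_batch_no", "debtor_agent_member_id"]
statement_1 = {"status": "eq", "netting_batch_no": "eq", "debtor_agent_member_id": "eq"}
statement_2 = {"netting_batch_no": "eq", "debtor_agent_member_id": "eq",
               "transaction_currency": "eq"}
print(usable_index_columns(index, statement_1))  # ['netting_batch_no', 'debtor_agent_member_id']
print(usable_index_columns(index, statement_2))  # ['netting_batch_no', 'debtor_agent_member_id']
```

In both statements the extra column (status or transaction_currency) is not part of the index, so it would be filtered on the rows already located through the two indexed columns.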
More indexes are not always better. Too many indexes increase memory and disk consumption and slow down INSERT, DELETE and similar operations. When building indexes, follow the principles above and consider the workload as a whole.
Copyright notice: this is an original article by the blogger and may not be reproduced without permission.
Principles of index building--taking InnoDB as an example