Mysql index and optimization, mysql INDEX OPTIMIZATION
Today, I saw some articles about mysql indexes written by others, but I had some small gains. I just started my essay record and briefly picked some important points.
Repost: http://www.cnblogs.com/tgycoder/p/5410057.html
Mysql index implementation principle
1. the MyISAM engine uses B + Tree as the index structure. The data domain of the leaf node stores the data record address. The MyISAM index method is also called "non-clustered, the reason for this is to distinguish it from the InnoDB clustered index.
2. InnoDB also uses B + Tree as the index structure. The first major difference is that InnoDB's data file itself is an index file. The first major difference is that InnoDB's data file itself is an index file. As mentioned above, the MyISAM index file is separated from the data file, and the index file only stores the data record address. In InnoDB, the table data file itself is an index structure organized by B + Tree. The leaf node data field of this Tree stores the complete data records. The key of this index is the primary key of the data table, so the InnoDB table data file itself is the primary index
The InnoDB primary index (also a data file) shows that the leaf node contains a complete data record. This index is called a clustered index. Because the data files in InnoDB need to be clustered by the primary key, InnoDB requires that the table have a primary key (MyISAM may not). If it is not explicitly specified, mySQL automatically selects a column that uniquely identifies a data record as the primary key. If this column does not exist, MySQL automatically generates an implicit field for the InnoDB table as the primary key, this field is 6 bytes in length and its type is long integer.
3. leftmost prefix and Related Optimization
Previously, I understood that the leftmost prefix would mean that the index order is inconsistent with the where condition query, so that the index cannot be used. This is incorrect.
Ps: In the leftmost prefix principle, if the where clause has or, it will still traverse the entire table.
(1) In fact, the order of the where condition does not affect the use of indexes. For example, you can add three fields to the t_user table Union Index (name, mobile, create_date)
Select * from t_user where mobile = '2013' and create_date = '2017-07-31 'and name = 'Corner ';
Theoretically, indexes are sensitive to order. However, because MySQL's query optimizer automatically adjusts the conditional order of the where clause to use suitable indexes, indexes can also be used.
(2) The first column of the index is not specified in the query condition.
If the where condition does not have the name condition, only the other two cannot use the index regardless of the order. If the where condition only has name and status, but does not have mobile, only one column of index can be used, the index of the status column is not available.
(3) Range Query
The index can be used for a range column (the leftmost prefix must be used), but the index cannot be used for the column after the range column. At the same time, the index can be used for at most one range column. Therefore, if there are two range columns in the query condition, the index cannot be fully used.
Table t_title joint index (emp_no, title, from_date)
EXPLAIN SELECT * FROM employees.titlesWHERE emp_no < '10010'AND title='Senior Engineer'AND from_date BETWEEN '1986-01-01' AND '1986-12-31';
+----+-------------+--------+-------+---------------+---------+---------+------+------+-------------+| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |+----+-------------+--------+-------+---------------+---------+---------+------+------+-------------+| 1 | SIMPLE | titles | range | PRIMARY | PRIMARY | 4 | NULL | 16 | Using where |+----+-------------+--------+-------+---------------+---------+---------+------+------+-------------+
Only the first index can be used. It is particularly important to note that MySQL is an interesting place, that is, it may not be able to distinguish between the range index and multi-value matching with the explain statement, in type, both are displayed as range. At the same time, the use of "between" does not mean that it is a range query, for example, the following query:
All indexes are used.
EXPLAIN SELECT * FROM employees.titlesWHERE emp_no BETWEEN '10001' AND '10010'AND title='Senior Engineer'AND from_date BETWEEN '1986-01-01' AND '1986-12-31';
+----+-------------+--------+-------+---------------+---------+---------+------+------+-------------+| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |+----+-------------+--------+-------+---------------+---------+---------+------+------+-------------+| 1 | SIMPLE | titles | range | PRIMARY | PRIMARY | 59 | NULL | 16 | Using where |+----+-------------+--------+-------+---------------+---------+---------+------+------+-------------+
(4) query conditions contain functions or expressions
If the query condition contains a function or expression, MySQL does not use an index for this column.
If the like wildcard % does not appear at the beginning, you can use the index. However, depending on the actual situation, you may only use one of the prefixes.
Explain select * FROM employees. titles WHERE emp_no = '000000' AND title LIKE 'Senior % '; + ---- + ------------- + -------- + ------- + ------------- + --------- + ------ + ------------- + | id | select_type | table | type | possible_keys | key | key_len | ref | rows | extra | + ---- + ------------- + -------- + ------- + --------------- + --------- + ------ + ------------- + | 1 | SIMPLE | titles | range | PRIMARY | 56 | NULL | 1 | Using where | + ---- + ------------- + -------- + ------- + --------------- + --------- + ------ + ------------- +
4. Index selectivity and prefix Index
(1) under what circumstances should a field be indexed? Today, I just saw this "Selective" concept, in addition to the small amount of table data, you do not need to create an index. Because the index file itself consumes storage space, it will increase the burden on database operations. In other cases, the index is less selective:
The so-called Index Selectivity refers to the ratio of non-repeated Index values (also called Cardinality) to the number of table records (# T): Index Selectivity = Cardinality/# T
Obviously, the value range of selectivity is (0, 1]. The higher the selectivity, the greater the value of the index, which is determined by the nature of B + Tree.
This is like a question I asked during the interview: Is it suitable for creating an index for a Gender column? (The answer is no)
For example, in the employees. titles table used above, if the title field is frequently queried separately, do you need to create an index? Let's take a look at its selectivity:
SELECT count(DISTINCT(title))/count(*) AS Selectivity FROM employees.titles;+-------------+| Selectivity |+-------------+| 0.0000 |+-------------+
The selection of the title is less than 0.0001 (the exact value is 0.00001579), so there is no need to create a separate index for it.
(2) There is an index optimization strategy related to index selectivity called prefix index, that is, replacing the entire column with the column prefix as the index key. When the prefix length is appropriate, it can make the prefix index selectively close to the full-column index, and reduce the size and maintenance overhead of the index file because the index key becomes short. The following uses the table employees. employees as an example to describe how to select and use a prefix index.
As shown in figure 12, the employees table has only one index <emp_no>. If we want to search for a person by name, we can only scan the entire table:
EXPLAIN SELECT * FROM employees.employees WHERE first_name='Eric' AND last_name='Anido';+----+-------------+-----------+------+---------------+------+---------+------+--------+-------------+| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |+----+-------------+-----------+------+---------------+------+---------+------+--------+-------------+| 1 | SIMPLE | employees | ALL | NULL | NULL | NULL | NULL | 300024 | Using where |+----+-------------+-----------+------+---------------+------+---------+------+--------+-------------+
If you frequently search for employees by name, the efficiency is obviously low, so we can consider creating indexes. There are two options: <first_name> or <first_name, last_name> to check the selectivity of the two indexes:
SELECT count(DISTINCT(first_name))/count(*) AS Selectivity FROM employees.employees;+-------------+| Selectivity |+-------------+| 0.0042 |+-------------+SELECT count(DISTINCT(concat(first_name, last_name)))/count(*) AS Selectivity FROM employees.employees;+-------------+| Selectivity |+-------------+| 0.9313 |+-------------+
<First_name> the selectivity is obviously too low. <first_name, last_name> the selectivity is good, but the length of first_name and last_name is 30. Is there a way to take both length and selectivity into account? You can use the first few characters of first_name and last_name to create an index. For example, <first_name, left (last_name, 3)> to check its selectivity:
SELECT count(DISTINCT(concat(first_name, left(last_name, 3))))/count(*) AS Selectivity FROM employees.employees;+-------------+| Selectivity |+-------------+| 0.7879 |+-------------+
The selectivity is good, but the distance from 0.9313 is still a bit, so add the last_name prefix to 4:
SELECT count(DISTINCT(concat(first_name, left(last_name, 4))))/count(*) AS Selectivity FROM employees.employees;+-------------+| Selectivity |+-------------+| 0.9007 |+-------------+
At this time, the selectivity is very good, and the index length is only 18, which is nearly half shorter than <first_name, last_name>. We will create This prefix index:
ALTER TABLE employees.employeesADD INDEX first_name_last_name4 (first_name, last_name(4));
Execute the query by name again to compare and analyze the results before the index creation:
SHOW PROFILES;+----------+------------+---------------------------------------------------------------------------------+| Query_ID | Duration | Query |+----------+------------+---------------------------------------------------------------------------------+| 87 | 0.11941700 | SELECT * FROM employees.employees WHERE first_name='Eric' AND last_name='Anido' || 90 | 0.00092400 | SELECT * FROM employees.employees WHERE first_name='Eric' AND last_name='Anido' |+----------+------------+---------------------------------------------------------------------------------+