MySQL Performance Optimization: Index Optimization

Source: Internet
Author: User

As a free and efficient database, MySQL is often the first choice. It offers secure connections, query parsing, SQL statement optimization, read/write locks (down to row-level granularity), transaction isolation and multi-version concurrency control (MVCC) for better concurrency, complete transaction logging, and powerful storage engines that query efficiently even on tables with millions of rows. With InnoDB, it can also fully recover after a crash. Despite all these advantages, it still takes some work from people to optimize it. This is a summary I put together after some reading; corrections are welcome.

Complete MySQL optimization requires deep expertise; large companies even have dedicated engineers who work on the MySQL kernel and on SQL optimization. It covers server tuning, setting a wide range of parameters, query optimization, master-slave replication, software and hardware upgrades, disaster-recovery backups, SQL programming, and more, and mastering it takes a great deal of knowledge and time. For a junior developer like me, trying to absorb all of that at once is pointless, and much of it is the DBA's job anyway. What I can do with the business code at hand is write good SQL, and SQL optimization has a great deal to do with indexes.

First, the MySQL query process, roughly: the client connects to the MySQL server over the protocol and sends a query. The server checks the query cache and, on a hit, returns the result directly. Otherwise it parses the statement and runs a series of preprocessing steps, such as checking that the statement is valid. Next comes query optimization (for example, deciding whether to use an index scan, or stopping early if a condition can never be true), which produces a query plan. The execution engine then runs the plan, fetching data through the storage engine's API, and returns the results to the client. How data is stored and retrieved depends on the storage engine. MySQL uses B-tree indexes by default, and as a general rule of thumb, at least for now, MySQL uses only one index per table in a query (the index merge optimization is an exception).

Because MySQL fetches data through the storage engine, indexing depends heavily on the engine. Different engines implement indexes differently (MyISAM, for example, supports full-text indexes), and even indexes with the same name can be organized differently internally. The most commonly used engine is of course InnoDB (MariaDB, which is fully compatible with MySQL, had XtraDB as its default engine at the time, and XtraDB is very similar to InnoDB), so this article covers InnoDB. Classified by implementation, the two index types discussed here are B-tree indexes and hash indexes. When people say "index" they almost always mean a B-tree index. InnoDB's B-tree index is actually implemented as a B+ tree, but because MySQL reports the index type as BTREE when you view a table's indexes, it is simply called a B-tree index. As for the difference between a B-tree and a B+ tree, my data-structures knowledge is weak there, so that is a gap I still need to fill.

A B-tree index stores all index values in order (ascending), which is how MySQL does it. The abstract structure of a B-tree index in MySQL is described below (adapted from High Performance MySQL).

In this structure, the keys at each level are arranged from left to right in ascending order, Key1 < Key2 < ... < Keyn. Values less than Key1, values in [Key1, Key2), and so on are each found by following the matching entry down to a leaf node, so each entry covers a range of values. Nodes on the same level can be reached directly, because they are linked by pointers (MyISAM's B-tree index does not have these links). Every search is therefore a range search: if the value exists it is found, otherwise the result is empty. An index speeds up access because it avoids scanning the whole table (though not always), and comparing the search value with the values in a node usually uses binary search, which is about as fast as it gets on sorted data.
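To make the search inside a node concrete, here is a minimal Python sketch of a two-level tree. The separator keys, the leaf contents, and the helper names `find_child`, `search_leaf` and `lookup` are hypothetical, and real InnoDB pages have their own on-disk format, so this only models the idea of one binary search per level over sorted keys.

```python
from bisect import bisect_left, bisect_right

# Hypothetical two-level tree: one root node and three leaf pages.
root_keys = [10, 20]                 # Key1 < Key2
leaves = [
    [1, 5, 8],                       # values < 10
    [10, 12, 17],                    # values in [10, 20)
    [20, 25, 31],                    # values >= 20
]


def find_child(keys, target):
    """Binary-search a node's sorted keys for the child that may hold target.

    Child 0 covers values < keys[0]; child i covers [keys[i-1], keys[i]).
    """
    return bisect_right(keys, target)


def search_leaf(leaf, target):
    """Binary-search one sorted leaf page; return the slot index or None."""
    i = bisect_left(leaf, target)
    if i < len(leaf) and leaf[i] == target:
        return i
    return None


def lookup(target):
    """Descend from the root to one leaf, then search inside that leaf."""
    leaf = leaves[find_child(root_keys, target)]
    return search_leaf(leaf, target) is not None


print(lookup(12), lookup(13))  # True False
```

Each level costs one binary search, so a lookup touches one node per level instead of every row.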

What Val points to: in InnoDB it is the table data itself, because an InnoDB table's data is itself stored in the index. This is a significant difference from MyISAM, whose index points to the address of the table data (Val holds something like 0X7DFF). For example, in an InnoDB primary key index, each leaf entry holds a primary key value together with its full row.

        

In an InnoDB index, the Val of a leaf node is the table data itself: the leaf nodes hold the rows and are linked together. Rows are not stored separately anywhere else, and the leaf nodes (the data) can be accessed directly.

In the abstract B-tree structure above, the nodes holding index values are stored in pages, and there are two issues to note:

1. Pages. Index values live in pages; roughly speaking, the smallest page can be thought of as a single node. Computer storage is typically used one block at a time: when one block is used up, the next one is used. If only 5KB is left in the last block but you need 8KB, you have to allocate a new block and put the 8KB there, and the leftover 5KB becomes what is usually called fragmentation. Over time a computer accumulates many such fragments, which is why defragmentation exists. In MySQL, a page can be understood as a block of storage space: index tree nodes are stored in pages, and each page (a logical page) has a fixed size, which for InnoDB is currently 16KB by default. When a page is full and new rows inserted into the table create new index nodes, those nodes go into a new page, and subsequent nodes continue to be placed after them in that new page.

2. Page splits. When a page fills up, a new page is started; this is called a page split. When opening a new page, InnoDB uses a fill factor: once 15/16 of a page's space is used, new entries go to the next page. Page splits can significantly increase the cost of maintaining an index, so it is generally recommended to use an auto-increment integer primary key with no business meaning, which keeps the index easy to store.

    

If the primary key is not auto-increment or not an integer, for example an MD5 string, then the next value to be inserted is not necessarily larger than the previous one and may even be smaller than every value on the current page. InnoDB has to go back to earlier pages to compare and find the right position, instead of simply appending the new row after the last one. Inserting there may push a page past the split threshold and force a page split, which can in turn cause later index pages to be split and reorganized. With little data this hardly matters, but with large amounts of data it can waste a lot of time and produce many fragments.
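The following Python sketch is a toy model of the effect described above, not InnoDB's actual algorithm. It assumes a hypothetical PAGE_CAPACITY of 4 keys per page, splits a full page in the middle of the key range 50/50 (InnoDB's real split rules, such as the 15/16 fill factor, differ), and opens a fresh page when appending past the end of the last page. It compares auto-increment-style keys with MD5 strings.

```python
import hashlib
from bisect import bisect_right, insort

PAGE_CAPACITY = 4  # hypothetical; a real 16KB page holds far more rows


def insert_all(keys, capacity=PAGE_CAPACITY):
    """Insert keys one by one into ordered pages; return (pages, splits)."""
    pages = []  # each page is a sorted list; pages are kept in key order
    splits = 0
    for key in keys:
        if not pages:
            pages.append([key])
            continue
        # Last page whose smallest key is <= key (page 0 if key is smallest).
        firsts = [page[0] for page in pages]
        i = max(bisect_right(firsts, key) - 1, 0)
        page = pages[i]
        if len(page) < capacity:
            insort(page, key)                 # room left: insert in place
        elif i == len(pages) - 1 and key > page[-1]:
            pages.append([key])               # appending at the end: new page
        else:
            insort(page, key)                 # full page mid-range: split it
            mid = len(page) // 2
            pages[i:i + 1] = [page[:mid], page[mid:]]
            splits += 1
    return pages, splits


sequential = list(range(1, 101))
md5_keys = [hashlib.md5(str(n).encode()).hexdigest() for n in range(1, 101)]

for name, keys in (("auto-increment", sequential), ("md5", md5_keys)):
    pages, splits = insert_all(keys)
    print(f"{name}: {len(pages)} pages, {splits} splits")
```

With these inputs the auto-increment keys fill 25 full pages without a single split, while the MD5 keys trigger splits; the exact MD5 counts depend on the toy rules above.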

    

Primary keys are always unique and non-null, and InnoDB automatically builds an index on the primary key. An index built on a non-primary-key column is called a secondary index. Its entries are also stored in order, but each entry carries an extra field holding the primary key value of that row, rather than a pointer to the row. A lookup through a secondary index first finds the matching entry for the column value, then uses the primary key value stored in that entry to find the row, so every such lookup actually searches two indexes. The advantage of storing the primary key value is that a page split does not require updating the secondary index: even if the row moves and its address changes, its primary key value stays the same. For example, an index on the name column stores each name together with the primary key of its row.
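A minimal Python sketch of the two-step lookup, with hypothetical sample rows. The clustered index is modeled as a list of (primary key, row) pairs sorted by primary key, and the secondary index on name as sorted (name, primary key) pairs; real InnoDB stores both as B+ trees, and sorted lists with binary search only stand in for them.

```python
from bisect import bisect_left

# Clustered (primary key) index: full rows stored in primary-key order.
clustered = [
    (1, ("July", 23, "dev")),
    (2, ("Mary", 31, "qa")),
    (3, ("Adam", 27, "dev")),
    (4, ("July", 35, "ops")),
]
clustered_keys = [pk for pk, _ in clustered]

# Secondary index on name: (name, primary key) entries sorted by name.
# It stores the primary key value, not the physical address of the row.
name_index = sorted((row[0], pk) for pk, row in clustered)


def find_by_name(name):
    """Search the secondary index, then the clustered index for each hit."""
    rows = []
    i = bisect_left(name_index, (name,))          # first lookup: by name
    while i < len(name_index) and name_index[i][0] == name:
        pk = name_index[i][1]
        j = bisect_left(clustered_keys, pk)       # second lookup: by primary key
        rows.append(clustered[j][1])
        i += 1
    return rows


print(find_by_name("July"))  # [('July', 23, 'dev'), ('July', 35, 'ops')]
```

If a page split moved a row, only `clustered` would change; `name_index` would stay valid because it refers to rows by primary key.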

      

An index on a single column is called a single-column index, and an index on multiple columns is called a composite index. Because a B-tree index is ordered, it is well suited to range queries. With a composite index you also need to pay attention to the number of columns, their order, and how each column is used in the query conditions.

Suppose we have a table like this:

CREATE TABLE staffs (
    ID int PRIMARY KEY AUTO_INCREMENT,
    -- VARCHAR lengths are assumed; they were missing from the original
    name varchar(24) NOT NULL DEFAULT '' COMMENT 'name',
    age int NOT NULL DEFAULT 0 COMMENT 'age',
    pos varchar(20) NOT NULL DEFAULT '' COMMENT 'position',
    add_time timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP COMMENT 'entry time'
  ) CHARSET utf8 COMMENT 'Employee record table';

Add a composite index of three columns

ALTER TABLE staffs add index IDX_NAP (name, age, POS);

A B-tree index can be used in full or in part in the following ways (EXPLAIN makes it easy to see which):

  1. Full Value Matching

For example, for SELECT * FROM staffs WHERE name = 'July' AND age = 25 AND pos = 'dev', the key field in EXPLAIN shows that the IDX_NAP index is used.

  2. Leftmost prefix match. A query on a composite index does not have to match every column; it can match just the leftmost columns of the index.

For example, for SELECT * FROM staffs WHERE name = 'July' AND age = 25, the key field shows the index. Note that key_len (the length of the index used by the statement) is smaller than in the previous query, meaning not all index columns are used. (You can usually work out from this length which index columns are used; I will come back to this at the end.) In fact only the name and age columns are used.

Now try SELECT * FROM staffs WHERE name = 'July'. It also uses the index, with an even smaller key_len: only the name column of the index is used.

  3. Column prefix match, that is, matching the beginning of a column's value, mainly used for fuzzy matching. For SELECT * FROM staffs WHERE name LIKE 'J%', the key field in EXPLAIN shows the index is used. However, a MySQL B-tree index cannot be used for fuzzy matches that do not start with a fixed prefix, such as SELECT * FROM staffs WHERE name LIKE '%y' OR name LIKE '%u%'. This is sometimes attributed to limitations of the storage engine API, but the basic reason is that the index is sorted by the leading characters, so a pattern that starts with a wildcard gives no starting point in the tree.
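The ordering argument can be illustrated with a short Python sketch that uses a sorted list of hypothetical names as a stand-in for the index. Python string comparison is case-sensitive and binary, unlike MySQL's default case-insensitive collations, so this models only the idea, not MySQL's exact matching rules.

```python
from bisect import bisect_left

names = sorted(["July", "Mary", "Jack", "Amy", "Judy", "Jane"])


def like_prefix(sorted_values, prefix):
    """Model of LIKE 'prefix%': binary-search to the first candidate, then
    scan forward while values still start with the prefix (a range scan)."""
    i = bisect_left(sorted_values, prefix)
    matches = []
    while i < len(sorted_values) and sorted_values[i].startswith(prefix):
        matches.append(sorted_values[i])
        i += 1
    return matches


def like_suffix(values, suffix):
    """Model of LIKE '%suffix': the sort order gives no starting point,
    so every value has to be checked (a full scan)."""
    return [v for v in values if v.endswith(suffix)]


print(like_prefix(names, "J"))  # ['Jack', 'Jane', 'Judy', 'July']
print(like_suffix(names, "y"))  # ['Amy', 'Judy', 'July', 'Mary']
```

All values sharing a prefix sit next to each other in sorted order, which is why only the prefix form can become a range scan.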

  4. Range match, such as SELECT * FROM staffs WHERE name > 'Mary'. In my tests, however, > used the index but >= did not, at least on this string column (tested on MySQL 5.5.12), while on a time column (timestamp) both worked. Without further testing I cannot say for sure why; one likely factor is that the optimizer chooses a full table scan when it estimates that a range matches too many rows.

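Out of curiosity, I also tested an index led by an integer column, IDX_CN (count, name), where count is an integer. The integer column turned out to be much less restrictive: all of the following queries use the index, even the ones that look like leading-wildcard fuzzy matches. Note, though, that none of them is a fuzzy match: > and >= compare numerically, and MySQL converts the string literal to a number ('10%' becomes 10, while 'A' and '%10%' become 0), so these are ordinary range scans on count.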

  SELECT * FROM IndexTest1 WHERE count > 'A';
  SELECT * FROM IndexTest1 WHERE count >= 'a';
  SELECT * FROM IndexTest1 WHERE count > '10%';
  SELECT * FROM IndexTest1 WHERE count >= '10%';
  SELECT * FROM IndexTest1 WHERE count > '%10%';
  SELECT * FROM IndexTest1 WHERE count >= '%10%';

  5. Exact match on one column plus a range on the next column to its right. The leading column is fixed to a value and the next column is a range, and the query uses both the name and age columns of the index (as inferred from key_len).

For example: SELECT * FROM staffs WHERE name = 'July' AND age > 25

  6. Index-only queries. The staffs index is built on (name, age, pos). If a query reads only columns that are in the index, it only needs to access the index, for example:

  SELECT name, age, pos FROM staffs WHERE name = 'July' AND age = 25 AND pos = 'dev';
  SELECT name, age FROM staffs WHERE name = 'July' AND age > 25;

The first query uses all the index columns; the second uses only the first two, and its SELECT list contains only those two columns. MySQL calls this a covering index: the index contains (covers) every field the query needs. To check whether a query uses one, look at the Extra column at the end of the EXPLAIN output: Using index means a covering index is used, and Using where means a WHERE filter is applied.
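The covering check can be expressed as a small Python sketch using the staffs index from above. The helper name `is_covering` is hypothetical; `ID` counts as covered because, as described earlier, InnoDB secondary index entries also carry the primary key value. EXPLAIN's Using index remains the authoritative answer.

```python
def is_covering(index_columns, selected, filtered, primary_key=("ID",)):
    """Return True if every column the query reads is available in the index.

    InnoDB secondary index entries also store the primary key value,
    so primary key columns count as covered as well.
    """
    available = set(index_columns) | set(primary_key)
    needed = set(selected) | set(filtered)
    return needed <= available


idx_nap = ("name", "age", "pos")
print(is_covering(idx_nap, ["name", "age"], ["name", "age"]))  # True
print(is_covering(idx_nap, ["name", "add_time"], ["name"]))    # False
```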

  7. Prefix index

A prefix index differs from both the column prefix match (a fuzzy match such as 'J%') and the leftmost prefix match (a query that uses the leftmost columns of an index in order): it indexes only the first part of a column's value. A commonly cited difference between InnoDB and MyISAM is that MyISAM supports full-text indexes and InnoDB does not (InnoDB gained FULLTEXT support in MySQL 5.6). For long string or binary data such as TEXT and BLOB columns, MyISAM indexes only the first characters, and InnoDB's prefix index works in a similar way. Some columns, usually strings, are long; indexing them in full greatly increases storage space, and indexes also have to be maintained. If you still want to index a long string column, a good approach is to index only its first part (the prefix) as a stand-in for the whole value. The question is how to make sure the prefix represents, or at least roughly represents, the column. This is where index selectivity comes in: the ratio of the number of distinct values in the column (its cardinality) to the total number of rows in the table (#T). For example, the column values (1, 2, 2, 3) have a total of 4 and 3 distinct values, so the selectivity is 3/4. Selectivity therefore lies in [1/#T, 1]. The larger the value, the more distinct values the column has and the better it suits a prefix index; a unique index (UNIQUE KEY) has a selectivity of 1.

For example, suppose column a is varchar(255) and you want a prefix index on it. Start testing with a length such as 7 and increase it one step at a time until the selectivity stops growing noticeably; at that point the prefix can stand in for the whole column. Weigh this against the fact that a longer prefix stores more data in the index, and pick a reasonable compromise. If the selectivity stays low, though, a prefix index is of little use.

  SELECT COUNT(DISTINCT LEFT(a, 7)) / COUNT(*) AS non_repeat FROM tab;

Then use the chosen prefix length, for example 9, when adding the index:

  ALTER TABLE tab ADD INDEX IDX_PN (name(9));          -- single-column prefix index
  ALTER TABLE tab ADD INDEX IDX_CPN (count, name(9));  -- composite prefix index
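The selectivity calculation and the prefix-length test can be sketched in Python. `selectivity` mirrors COUNT(DISTINCT ...) / COUNT(*) and `prefix_selectivity` mirrors COUNT(DISTINCT LEFT(a, n)) / COUNT(*). The stopping rule in `shortest_good_prefix` (stop once the prefix's selectivity is within a tolerance of the full column's) is an assumption standing in for "stops growing noticeably", and the sample codes are hypothetical.

```python
def selectivity(values):
    """Distinct values / total rows; ranges over [1/#T, 1]."""
    return len(set(values)) / len(values)


def prefix_selectivity(values, n):
    """Like COUNT(DISTINCT LEFT(a, n)) / COUNT(*): first n characters only."""
    return selectivity([v[:n] for v in values])


def shortest_good_prefix(values, tolerance=0.01):
    """Smallest n whose prefix selectivity is within tolerance of the full column's."""
    full = selectivity(values)
    longest = max(len(v) for v in values)
    for n in range(1, longest + 1):
        if full - prefix_selectivity(values, n) <= tolerance:
            return n
    return longest


print(selectivity([1, 2, 2, 3]))  # 0.75

codes = ["alpha-01", "alpha-02", "beta-01", "beta-02", "gamma-01"]
for n in (1, 7, 8):
    print(n, prefix_selectivity(codes, n))
print(shortest_good_prefix(codes))  # 8
```

Comparing against the full column's selectivity, rather than stopping at the first step where growth pauses, avoids stopping too early on a plateau such as lengths 2 to 6 here. On real data the SQL query above does the same measurement inside MySQL.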

Those are the common ways an index gets used. In the following situations the index cannot be used, or cannot be fully used; some of them are counterexamples to the cases above. Take KEY (a, b, c) as an example:

1. Skipping a column: WHERE a = 1 AND c = 3 can use at most index column a; WHERE b = 2 AND c = 3 uses none of it, because matching must start from the leftmost column.

2. A range condition on an earlier column: WHERE a = 1 AND b > 2 AND c = 3 can use at most the two index columns a and b.

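3. The order of the conditions in the WHERE clause, by contrast, does not matter: WHERE c = 3 AND b = 2 AND a = 1 can still use all three columns, because the optimizer matches conditions to index columns regardless of the order in which they are written.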

4. Using an expression on an index column, such as WHERE SUBSTR(a, 1, 3) = 'HHH', or arithmetic such as a + 1 in the condition. Expressions on index columns are taboo: MySQL simply does not recognize the index column inside them. A prefix test like the SUBSTR example can usually be rewritten as WHERE a LIKE 'HHH%', which can use the index. When the data volume is small enough not to seriously affect speed, you can also fetch the data more broadly and filter in the application; for example, query all order records and then filter for orders starting with "cp1001" in the program, rather than writing that filter in SQL;

5. Fuzzy matching: where possible, write it as WHERE a LIKE 'j%', with the fixed string on the left, so that at least column a of the index can be used. Even then the index may not be used, depending on the data type, so it is best to test.
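The leftmost-prefix rules above, together with the positive cases earlier, can be summarized in a small Python function. It is a simplified model: `equal` holds columns compared with =, `ranged` holds columns with a range condition (>, <, BETWEEN, LIKE 'x%'), and it ignores later optimizer features such as index condition pushdown (MySQL 5.6) and skip scan (MySQL 8.0.13), which can exploit later index columns in some cases.

```python
def usable_index_columns(index_columns, equal=(), ranged=()):
    """Return the leading index columns a B-tree range lookup can use.

    The order in which conditions appear in the WHERE clause does not
    matter; only which columns have = or range conditions.
    """
    used = []
    for col in index_columns:
        if col in equal:
            used.append(col)       # = on this column: keep going
        elif col in ranged:
            used.append(col)       # range on this column: stop after it
            break
        else:
            break                  # no condition: later columns are unusable
    return used


key_abc = ("a", "b", "c")
print(usable_index_columns(key_abc, equal={"a", "c"}))                # ['a']
print(usable_index_columns(key_abc, equal={"b", "c"}))                # []
print(usable_index_columns(key_abc, equal={"a", "c"}, ranged={"b"}))  # ['a', 'b']
print(usable_index_columns(key_abc, equal={"c", "b", "a"}))           # ['a', 'b', 'c']
```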

  The effect of sorting on indexes

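ORDER BY is a common clause, and sorting also follows the leftmost prefix rule. With KEY (a, b), EXPLAIN showed the index being used for the following statements (the test results were a little surprising; note that when a has a range condition, rows come out of the index ordered by a first, so the Extra column may still show Using filesort for ORDER BY b):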

  SELECT * FROM tab WHERE a > 1 ORDER BY b;
  SELECT * FROM tab WHERE a > 1 AND b > '2015-12-01 00:00:00' ORDER BY b;
  SELECT * FROM tab ORDER BY a, b;

The index cannot be used for sorting in the following cases:

1. Sorting on a non-leftmost column: SELECT * FROM tab ORDER BY b;

2. Filtering and sorting out of index-column order: SELECT * FROM tab WHERE b > '2015-12-01 00:00:00' ORDER BY a;

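3. Sorting on multiple columns in different directions: SELECT * FROM tab ORDER BY a ASC, b DESC. (MySQL 8.0 added descending indexes, which can serve this order if the index is defined with matching directions.)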
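A Python sketch of why these cases need a separate sort (filesort), using hypothetical entries of an index on (a, b). The index returns entries sorted by a and then by b within each a, so a result is already in the requested order only when that order agrees with the index order.

```python
# Entries of a hypothetical index on (a, b), kept in index order.
index_rows = sorted([(3, 1), (1, 9), (2, 7), (3, 4), (2, 5)])


def is_sorted_by(rows, key):
    """True if rows already come out in ascending order of key(row)."""
    return all(key(x) <= key(y) for x, y in zip(rows, rows[1:]))


def order_a_b(row):
    return (row[0], row[1])        # ORDER BY a, b


def order_b(row):
    return row[1]                  # ORDER BY b


def order_a_asc_b_desc(row):
    return (row[0], -row[1])       # ORDER BY a ASC, b DESC


a_gt_1 = [r for r in index_rows if r[0] > 1]    # WHERE a > 1
a_eq_2 = [r for r in index_rows if r[0] == 2]   # WHERE a = 2

print(is_sorted_by(index_rows, order_a_b))           # True
print(is_sorted_by(index_rows, order_b))             # False
print(is_sorted_by(a_gt_1, order_b))                 # False
print(is_sorted_by(a_eq_2, order_b))                 # True
print(is_sorted_by(index_rows, order_a_asc_b_desc))  # False
```

An equality condition on a leaves b in order, which is why WHERE a = constant ORDER BY b can follow the index while a range on a cannot.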

  Clustered indexes and covering indexes

As mentioned before, structurally the indexes discussed here come in only two types, B-tree and hash. A covering index is not a separate type; it describes how a query uses an index. When the columns a query needs are exactly the columns of the index it uses, MySQL can scan the index and return the data directly, without going back to the table to look up and filter rows, which greatly improves efficiency. An index used this way is called a covering index. For example, with KEY (a, b), the query SELECT a, b FROM tab WHERE a = 1 AND b > 2 is covered. The underlying reason is that a B-tree index stores the actual values of its columns.

A clustered index is not a separate index type either. As described earlier, a B-tree index can keep the data itself in the index: the leaf pages of the primary key index hold each primary key right next to its row data. Because the table data exists only once, and only one column's key values can be stored together with the rows, a table can have only one clustered index. In InnoDB it is the primary key, by default.

A clustered index keeps the table data organized by key (as in the primary key index described earlier), so inserts depend heavily on primary key order. Keys should ideally be sequential; otherwise you face frequent page splits and a lot of data movement.

  Hash index

Briefly, a hash index works like a hash table, similar to the simple implementation in a data structures course. When we use a hash index in MySQL (explicit hash indexes are available in the MEMORY engine), it computes a hash value for the indexed column (with a function similar to MD5, SHA1, or CRC32), keeps the hash values in order (ascending by default), and for each hash value records a pointer to the corresponding row in the table. This, of course, is only a simplified model.

      

For example, if we create a hash index on the name column, the hash values are kept in order, but that order has nothing to do with the order of the rows in the table; the rows are reached only through the stored pointers. A hash index can also be built on two or more columns, in which case one hash value is computed over the combined column values, and that hash value is not stored in the table data. On lookup, MySQL first computes the hash of the search values, compares it with the hash values in the index, follows the pointers of the matching entries, checks the column data (possibly along with conditions on other columns), and returns the rows. Hashes can of course collide. Unless there are many collisions, hash indexes are generally very efficient; with many collisions, maintaining the hash index becomes expensive, so hash indexes are usually used on columns with high selectivity. The structure of a hash index determines its characteristics:

1. A hash index orders only hash values, which have no relationship to the order of the table data, so it cannot be used for ORDER BY;

2. A hash index computes one hash value over all its columns, so a query must supply all of them. For a hash index on (a, b), WHERE a = 1 AND b = 2 can use it, but a condition missing either column cannot;

3. A hash index can only be used for equality comparisons (= or IN); range queries cannot use it, again because hash values do not preserve the order of the data;

4. When collisions occur, the hash index must walk through all entries with that hash value, comparing the row data each pointer leads to, until all matching rows are found.
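A toy Python model of the characteristics listed above. The class `HashIndex`, its bucket count, and the sample rows are hypothetical; CRC32 from `zlib` serves as the hash (one of the functions mentioned above), reduced modulo a small bucket count so that collisions are likely. Real MySQL hash indexes, such as those in the MEMORY engine, are implemented differently.

```python
import zlib
from collections import defaultdict


class HashIndex:
    """Toy hash index: hash of all indexed columns -> positions of rows."""

    def __init__(self, rows, columns, buckets=8):
        self.rows = rows
        self.columns = tuple(columns)
        self.buckets = buckets
        self.table = defaultdict(list)
        for pos, row in enumerate(rows):
            key = [row[c] for c in self.columns]
            self.table[self._hash(key)].append(pos)

    def _hash(self, values):
        # One hash over all indexed columns combined, so a lookup needs all of them.
        data = "\x00".join(str(v) for v in values).encode()
        return zlib.crc32(data) % self.buckets

    def lookup(self, **conditions):
        """Equality lookup only; every indexed column must be given."""
        if set(conditions) != set(self.columns):
            raise ValueError("hash index needs = on all of its columns")
        key = [conditions[c] for c in self.columns]
        positions = self.table.get(self._hash(key), [])
        # Collisions are possible, so compare the real column values of each row.
        return [self.rows[p] for p in positions
                if all(self.rows[p][c] == conditions[c] for c in self.columns)]


staff = [
    {"name": "July", "age": 25},
    {"name": "Mary", "age": 31},
    {"name": "July", "age": 40},
]
by_name = HashIndex(staff, ["name"])
print(by_name.lookup(name="July"))  # [{'name': 'July', 'age': 25}, {'name': 'July', 'age': 40}]
by_name_age = HashIndex(staff, ["name", "age"])
print(by_name_age.lookup(name="July", age=40))  # [{'name': 'July', 'age': 40}]
```

There is deliberately no range or ordered scan method: bucket order says nothing about key order, and `by_name_age.lookup(name="July")` raises because it omits `age`.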

Back to key_len

As mentioned earlier, the key_len field in EXPLAIN lets you roughly estimate which index columns are used. The length of an index column depends directly on its data type: we usually say an int is 4 bytes, a bigint is 8 bytes, and a char is 1 byte. You also have to consider the character set chosen when creating the table: under utf8, one char takes 3 bytes. But knowing this still does not mean key_len is simply the sum of the byte sizes of the index columns' data types. Is it just a matter of adding them up? Not quite, as the straightforward tests below show (based on MySQL 5.5.2).

Create the table and add an index, with typekey as an int:

  -- Test table
  CREATE TABLE keyLenTest1 (
    ID int PRIMARY KEY AUTO_INCREMENT,
    typekey int DEFAULT 0,
    add_time timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP
  ) CHARSET utf8;
  -- Add index
  ALTER TABLE keyLenTest1 ADD INDEX Idx_k (typekey);

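With an int column the index key_len is 5, that is, 1 more than the 4 bytes of an int. The extra byte is the NULL flag: typekey was declared without NOT NULL, so it is nullable.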

Char type

  -- Change to char type, 1 character
  ALTER TABLE keyLenTest1 MODIFY typekey char(1);

  -- Change to char type, 2 characters
  ALTER TABLE keyLenTest1 MODIFY typekey char(2);

char(1) gives a key_len of 4 (3 bytes for one utf8 character plus the 1-byte NULL flag), and each additional character adds 3 bytes.

VARCHAR type

  -- Change to varchar type, 1 character
  ALTER TABLE keyLenTest1 MODIFY typekey varchar(1);

  -- Change to varchar type, 2 characters
  ALTER TABLE keyLenTest1 MODIFY typekey varchar(2);

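varchar(1) gives a key_len of 6, and each additional character adds 3 bytes. The 6 bytes are 3 bytes for the character, 2 bytes for the length prefix that a variable-length column needs, and 1 byte for the NULL flag.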

So if a statement uses int, char, and varchar index columns, you should now be able to work out how key_len is calculated and which index columns are used.
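The rules measured above can be collected into a small Python calculator. It is a sketch under assumptions: it covers only the integer, char, and varchar types, uses the maximum bytes per character of each character set, adds 2 bytes for a varchar length prefix and 1 byte for a nullable column, and does not handle other types or version-specific storage formats. Summing the results over the index columns a query uses and comparing with EXPLAIN's key_len indicates which columns were used.

```python
BYTES_PER_CHAR = {"latin1": 1, "utf8": 3, "utf8mb4": 4}  # maximum bytes per character
INT_BYTES = {"tinyint": 1, "smallint": 2, "mediumint": 3, "int": 4, "bigint": 8}


def key_len(col_type, length=None, charset="utf8", nullable=True):
    """Estimate the key_len contribution of one index column."""
    if col_type in INT_BYTES:
        size = INT_BYTES[col_type]
    elif col_type in ("char", "varchar"):
        size = length * BYTES_PER_CHAR[charset]
        if col_type == "varchar":
            size += 2  # length prefix of a variable-length column
    else:
        raise ValueError(f"type not covered by this sketch: {col_type}")
    if nullable:
        size += 1  # NULL flag
    return size


# The keyLenTest1 measurements above (typekey is nullable, charset utf8):
print(key_len("int"))                                # 5
print(key_len("char", 1), key_len("char", 2))        # 4 7
print(key_len("varchar", 1), key_len("varchar", 2))  # 6 9
```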

To learn more, such as the meaning of each EXPLAIN field and further index details, there are also tools besides EXPLAIN, such as SHOW PROFILES and the slow query log (which I have not looked into closely). I recommend reading High Performance MySQL; after all, what I have written here only scratches the surface.


The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.
