Database indexing and optimization in MySQL

Source: Internet
Author: User
Tags: create index

I. The concept of indexes
An index is a way to speed up data retrieval from a table. A database index is similar to the index of a book: a book's index lets readers quickly find the information they need without reading the entire book, and a database index lets the database quickly locate rows in a table without scanning the entire table.
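The book analogy can be made concrete with a small model. The Python sketch below is a simplification, not how MySQL stores indexes. A sorted list stands in for the index, binary search stands in for walking the B-tree, and a plain loop plays the part of a full table scan. The function names are hypothetical.

```python
def linear_scan(rows, key):
    """Full-scan model: check every row in order; return (found, comparisons)."""
    comparisons = 0
    for value in rows:
        comparisons += 1
        if value == key:
            return True, comparisons
    return False, comparisons


def indexed_lookup(sorted_keys, key):
    """Index model: binary search over sorted keys; return (found, probes)."""
    lo, hi = 0, len(sorted_keys)
    probes = 0
    while lo < hi:
        mid = (lo + hi) // 2
        probes += 1
        if sorted_keys[mid] < key:
            lo = mid + 1
        else:
            hi = mid
    found = lo < len(sorted_keys) and sorted_keys[lo] == key
    return found, probes


keys = list(range(1_000_000))
print(linear_scan(keys, 999_999))     # (True, 1000000)
print(indexed_lookup(keys, 999_999))  # found after about twenty probes
```

Finding the last of a million keys takes a million comparisons by scanning but only about twenty probes by binary search. A real B-tree index gets a similar logarithmic effect while also reading few disk pages.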
II. Characteristics of indexes
1. Indexes speed up data retrieval
2. Indexes slow down inserts, updates, deletes, and other maintenance operations
3. Indexes are created on tables and cannot be created on views
4. Indexes can be created either directly or indirectly
5. Indexes can be named in optimizer hints (in MySQL, for example, with the USE INDEX or FORCE INDEX index hints)
6. When the query processor executes an SQL statement, it generally uses only one index per table at a time (MySQL's index merge optimization is an exception)
7. Other
III. Advantages of indexes


1. Choosing data types for indexed columns

MySQL supports many data types, and choosing the right type to store data has a significant impact on performance. The following general guidelines apply:

(1) Smaller data types are generally better: they usually take less space on disk, in memory, and in the CPU cache, and need fewer CPU cycles to process.
(2) Simpler data types are better: integers are cheaper to process than strings, because string comparison is more complex. In MySQL, use the built-in date and time types rather than strings to store times, and use an integer type to store IP addresses.
(3) Avoid NULL where possible: declare columns NOT NULL unless you actually need to store NULL. In MySQL, nullable columns are harder for the optimizer to handle because they complicate indexes, index statistics, and value comparisons. Consider using 0, a special value, or an empty string instead of NULL.
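Guideline (2) suggests storing IP addresses as integers. In SQL, MySQL's built-in INET_ATON() and INET_NTOA() functions perform this conversion. The Python sketch below does the same for IPv4 with the standard ipaddress module, to show why the integer form is preferable: it fits in a 4-byte INT UNSIGNED column and sorts in true address order. The helper names are hypothetical.

```python
import ipaddress


def ip_to_int(ip: str) -> int:
    """Pack a dotted-quad IPv4 address into an unsigned 32-bit integer."""
    return int(ipaddress.IPv4Address(ip))


def int_to_ip(n: int) -> str:
    """Convert an unsigned 32-bit integer back to dotted-quad notation."""
    return str(ipaddress.IPv4Address(n))


print(ip_to_int("192.168.1.10"))  # 3232235786
print(int_to_ip(3232235786))      # 192.168.1.10

addresses = ["10.0.0.9", "10.0.0.10"]
print(sorted(addresses))                 # ['10.0.0.10', '10.0.0.9'] (string order)
print(sorted(addresses, key=ip_to_int))  # ['10.0.0.9', '10.0.0.10'] (address order)
```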

1.1. Choosing identifier types
Choosing the right type for identifier columns is important. Consider not only the storage type but also how MySQL performs computations and comparisons on it. Once you have chosen a type, use the same type in every related table.
(1) Integers are usually the best choice for identifiers, because they are fast to process and can be declared AUTO_INCREMENT.

(2) Strings: avoid using strings as identifiers where possible, because they consume more space and are slower to process. In addition, randomly generated string identifiers (such as UUID values) land at random positions in the index. This causes page splits, random disk access, and fragmentation of the clustered index (for storage engines that use one, such as InnoDB).
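The page-splitting argument in (2) comes down to where new keys land in a sorted structure. The Python sketch below is a rough model, not InnoDB's actual page logic. A sorted list stands in for a clustered index. The sketch compares AUTO_INCREMENT-style integers with random 128-bit hex strings (similar in shape to UUIDs) and counts how many inserts land somewhere other than the end. The function name and data are hypothetical.

```python
import bisect
import random


def count_middle_inserts(keys):
    """Insert keys one by one into a sorted list (a stand-in for a clustered
    index) and count how many do NOT land at the end of the list."""
    index = []
    middle = 0
    for key in keys:
        pos = bisect.bisect_left(index, key)
        if pos != len(index):
            middle += 1
        index.insert(pos, key)
    return middle


rng = random.Random(42)  # fixed seed so the run is repeatable
n = 5_000
sequential_ids = list(range(1, n + 1))  # AUTO_INCREMENT-style ids
random_ids = [f"{rng.getrandbits(128):032x}" for _ in range(n)]  # UUID-like ids

print(count_middle_inserts(sequential_ids))  # 0: every insert appends
print(count_middle_inserts(random_ids))      # nearly all of the 5,000 inserts
```

Sequential keys always append, so a B-tree only fills its rightmost page. Random keys keep landing in pages that are already full, which is what forces the splits.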

2. Introduction to Indexing
For any DBMS, indexes are the most important factor in optimization. With a small amount of data, missing indexes have little impact, but as the data volume grows, performance degrades dramatically.
If an index covers multiple columns (a composite index), the column order matters a great deal: MySQL can use the index efficiently only for a leftmost prefix of its columns. For example:
Assume T1 has a composite index IT1C1C2 on (c1, c2). The query SELECT * FROM T1 WHERE c1=1 AND c2=2 can use this index, and so can SELECT * FROM T1 WHERE c1=1. However, SELECT * FROM T1 WHERE c2=2 cannot use it for lookups, because the condition does not include the leading column of the index. To search on c2 through this index, the query must also fix c1 to a value. (MySQL 8.0.13 and later can sometimes apply a skip scan in this case, but the general rule holds.)
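The leftmost-prefix rule follows from how a composite index is ordered. In the Python sketch below (a simplified model with hypothetical data), the index IT1C1C2 is a list of (c1, c2, row_id) tuples sorted by c1 and then c2. A condition on c1, or on c1 and c2, maps to one contiguous slice that two binary searches can locate. A condition on c2 alone matches entries scattered across the whole list, so every entry has to be checked.

```python
import bisect

# Hypothetical rows of T1; the "index" is sorted by c1, then c2.
rows = [(1, 2, "r1"), (2, 1, "r2"), (1, 3, "r3"), (3, 2, "r4"), (1, 2, "r5")]
index = sorted(rows)


def seek(index, prefix):
    """Equality lookup on a leftmost prefix of integer columns: all matches
    form one contiguous slice, bounded by two binary searches."""
    lo = bisect.bisect_left(index, prefix)
    upper = prefix[:-1] + (prefix[-1] + 1,)  # smallest prefix past the matches
    hi = bisect.bisect_left(index, upper)
    return [entry[-1] for entry in index[lo:hi]]


def scan_on_c2(index, c2):
    """No leading column given: the sort order does not help, so scan everything."""
    return [entry[-1] for entry in index if entry[1] == c2]


print(seek(index, (1, 2)))   # ['r1', 'r5']        WHERE c1=1 AND c2=2
print(seek(index, (1,)))     # ['r1', 'r5', 'r3']  WHERE c1=1
print(scan_on_c2(index, 2))  # ['r1', 'r5', 'r4']  WHERE c2=2 (full scan)
```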

2.1. Types of indexes
Indexes are implemented in the storage engine, not in the server layer. As a result, indexes do not necessarily work the same way in every storage engine, and not every storage engine supports every index type.
2.1.1. B-tree indexes
Suppose we have the following table:


CREATE TABLE People (
    last_name  VARCHAR(50) NOT NULL,
    first_name VARCHAR(50) NOT NULL,
    dob        DATE        NOT NULL,
    gender     ENUM('m', 'f') NOT NULL,
    KEY (last_name, first_name, dob)
);
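The key on (last_name, first_name, dob) above can serve only queries that constrain a leftmost prefix of those columns. The Python function below is a hypothetical, simplified model of that rule for equality conditions only. It ignores range conditions (a range on one column stops the usable prefix at that column), skip scans, and index-only scans.

```python
def usable_prefix(index_columns, equality_columns):
    """Return the leading index columns a query can seek on, given the set
    of columns compared with '=' in its WHERE clause (simplified model)."""
    used = []
    for column in index_columns:
        if column not in equality_columns:
            break  # a missing column ends the usable leftmost prefix
        used.append(column)
    return used


people_key = ("last_name", "first_name", "dob")
print(usable_prefix(people_key, {"last_name", "first_name", "dob"}))
# ['last_name', 'first_name', 'dob']
print(usable_prefix(people_key, {"last_name", "dob"}))
# ['last_name']  (dob is unusable because first_name is skipped)
print(usable_prefix(people_key, {"first_name"}))
# []  (no leading column, so the key cannot be used for lookups)
```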


IV. Disadvantages of indexes

1. Creating and maintaining indexes takes time, and that time grows with the amount of data
2. Indexes take up physical storage: in addition to the space used by the table's data, each index occupies space of its own, and a clustered index needs even more
3. Whenever rows are inserted, deleted, or updated, the indexes must be maintained as well, which slows down data modification

V. Classification of indexes

1. Direct and indirect index creation
Direct creation: CREATE INDEX mycolumn_index ON mytable (mycolumn)
Indirect creation: defining a PRIMARY KEY or UNIQUE constraint creates an index implicitly
2. Normal and unique indexes
Normal index: CREATE INDEX mycolumn_index ON mytable (mycolumn)
Unique index: guarantees that every value in the indexed column is unique. It can be clustered or nonclustered (the statement below is SQL Server syntax; MySQL has no CLUSTERED keyword and uses CREATE UNIQUE INDEX)
CREATE UNIQUE CLUSTERED INDEX mycolumn_cindex ON mytable (mycolumn)
3. Single-column and composite indexes
Single-column index: an index on one column, that is, a non-composite index
Composite index: also called a combined index; the index definition lists more than one column, up to 16 columns
CREATE INDEX name_index ON username (firstname, lastname)
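4. Clustered and nonclustered indexes (the CLUSTERED/NONCLUSTERED statements below are SQL Server/Sybase syntax; MySQL has no such keywords, and InnoDB clusters each table on its primary key, or on a substitute when none is defined)
Clustered index: determines the physical order of the rows, so the table's data is always stored sorted by the index key
CREATE CLUSTERED INDEX mycolumn_cindex ON mytable (mycolumn) WITH ALLOW_DUP_ROW
(ALLOW_DUP_ROW allows a clustered index on a table that contains duplicate rows)
Nonclustered index: CREATE NONCLUSTERED INDEX mycolumn_ncindex ON mytable (mycolumn)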

VI. Using indexes
1. A clustered index is recommended when a column is rarely updated, frequently queried, and contains many duplicate values
2. When several columns are frequently accessed together and each contains duplicate values, consider creating a composite index on them
3. Choose the leading column of a composite index carefully, or the index will not be effective. If the leading column does not appear in the query condition, the composite index is not used. The leading column should be the most frequently used column
4. Before a multi-table query runs, the query optimizer uses the join conditions to enumerate several possible join plans and picks the one with the lowest estimated cost. It takes into account which tables have indexes and how many rows each table has. The choice of outer and inner table can be estimated with the formula (number of matching rows in the outer table) × (cost of each lookup in the inner table); the plan with the smallest product is best.
5. Any operation applied to a column in the WHERE clause has to be computed row by row at run time, so the query must scan the table instead of using the index on that column. If the value can be computed when the query is compiled, the optimizer can use the index and avoid the table scan. For example, both SELECT * FROM record WHERE SUBSTRING(card_no, 1, 4) = '5378' and SELECT * FROM record WHERE card_no LIKE '%78%' cause a table scan. The equivalent of the first query, SELECT * FROM record WHERE card_no LIKE '5378%', can use an index on card_no. Any operation on a column, including database functions and calculated expressions, causes a table scan, so move such operations to the right-hand side of the comparison wherever possible
6. In a WHERE condition, IN is logically equivalent to OR, so the parser converts IN ('0', '1') into column = '0' OR column = '1'. One might expect each OR branch to be looked up separately through the index on column and the results combined. In fact, the optimizer uses an "OR strategy": it collects the rows satisfying each OR branch into a worksheet in the tempdb database, builds a unique index on the worksheet to remove duplicate rows, and finally computes the result from that temporary table. The index on column is therefore not used, and the completion time also depends on the performance of tempdb. Because IN and OR clauses often use worksheets and so defeat indexes, consider splitting them into separate clauses when they do not produce many duplicate values; each separate clause should be able to use an index. (The tempdb worksheet mechanism is SQL Server/Sybase behavior; MySQL usually evaluates an IN list on an indexed column as a set of range lookups on that index.)
7. Make good use of stored procedures, which can make SQL more flexible and efficient
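The outer/inner formula in item 4 can be tried out with made-up numbers. In the hypothetical Python sketch below, table A has 100 matching rows and an index on the join column, taken to cost about 3 page reads per lookup. Table B has 10,000 matching rows and no usable index, so each lookup into B is taken to scan all 10,000 rows. The statistics and the cost unit are assumptions, not MySQL's actual cost model.

```python
def join_cost(outer_matching_rows, inner_lookup_cost):
    """Estimated nested-loop join cost: one inner lookup per outer row."""
    return outer_matching_rows * inner_lookup_cost


# Hypothetical statistics (see the lead-in above).
plans = {
    "A outer, B inner": join_cost(100, 10_000),  # every lookup scans B
    "B outer, A inner": join_cost(10_000, 3),    # every lookup uses A's index
}
best = min(plans, key=plans.get)

print(plans)  # {'A outer, B inner': 1000000, 'B outer, A inner': 30000}
print(best)   # B outer, A inner
```

Under these assumptions, driving the join from B and probing the indexed table A is cheaper, even though B has more rows. This illustrates why an index on the inner table's join column carries so much weight in the formula.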
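Item 5's advice can be illustrated with the card_no example. The Python sketch below treats a sorted list of hypothetical card numbers as an index on card_no and assumes plain byte-wise string ordering (real MySQL collations are more involved). Under that model, card_no LIKE '5378%' becomes a range between '5378' and '5379' that two binary searches can find. SUBSTRING(card_no, 1, 4) = '5378', by contrast, has to be evaluated against every entry.

```python
import bisect

# Hypothetical card numbers; sorted, they stand in for an index on card_no.
card_index = sorted(["5378001", "5378999", "5379000", "4111222", "5377999", "6011000"])


def prefix_seek(index, prefix):
    """Model of card_no LIKE 'prefix%': every match lies between `prefix`
    and `prefix` with its last character incremented."""
    upper = prefix[:-1] + chr(ord(prefix[-1]) + 1)
    lo = bisect.bisect_left(index, prefix)
    hi = bisect.bisect_left(index, upper)
    return index[lo:hi]


def function_scan(index, value):
    """Model of SUBSTRING(card_no, 1, 4) = value: the expression is computed
    for every entry, so the sorted order cannot narrow the search."""
    return [card for card in index if card[:4] == value]


print(prefix_seek(card_index, "5378"))    # ['5378001', '5378999']
print(function_scan(card_index, "5378"))  # ['5378001', '5378999'] (after a full scan)
```

Both forms return the same rows, but only the prefix form can use the ordering. A leading wildcard such as '%78%' has no fixed starting point, so neither form helps there.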


Analyzing MySQL index efficiency

Method: prefix an ordinary SQL statement with EXPLAIN.
Meaning of the output columns:
1) table: the name of the table;
2) type: the join (access) type, e.g. ALL, range, ref. ALL means a full table scan and is the worst; range is better and ref better still, while const and eq_ref are better than ref;
3) possible_keys: the indexes the query could use;
4) key: the index actually used;
5) key_len: the length, in bytes, of the part of the index that is used;
6) ref: which columns or constants are compared with the index named in key (const means a constant value);
7) rows: the number of rows MySQL estimates it must examine to execute the query;
8) Extra: additional information about how MySQL resolves the query;
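Because the type column is usually the first thing to check, it can help to rank its values. The hypothetical Python helper below orders join types from best to worst, following the list in the MySQL reference manual's EXPLAIN documentation, and reports the weakest access in a plan.

```python
# Join types from best to worst, in the order the MySQL manual lists them.
JOIN_TYPES_BEST_TO_WORST = [
    "system", "const", "eq_ref", "ref", "fulltext", "ref_or_null",
    "index_merge", "unique_subquery", "index_subquery", "range", "index", "ALL",
]


def worst_access(types):
    """Return the least efficient join type among a plan's EXPLAIN rows."""
    return max(types, key=JOIN_TYPES_BEST_TO_WORST.index)


# Hypothetical 'type' values from two-table EXPLAIN outputs
print(worst_access(["ref", "ALL"]))      # ALL
print(worst_access(["const", "range"]))  # range
```

If the weakest access is ALL on a large table, that table is usually the first candidate for a new index.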
