MySQL Optimization related summary

Source: Internet
Author: User
Tags: connection pooling

2016-05-31 Database concentration camp

Optimization Order:

    • Choose an appropriate engine, table structure, and data types.

    • Build indexes and optimize SQL.

    • Add a cache layer: Redis, Memcache.

    • Master-slave or master-master replication; read/write separation.

    • MySQL's built-in partitioned tables.

    • Vertical splitting along business boundaries; distributed databases.

    • Horizontal splitting with a well-chosen sharding key.

Engine differences with the selection:

    • InnoDB uses a clustered index, and the clustered index contains the row data. Secondary indexes (composite, prefix, unique) store the primary key value, so a lookup takes two B-tree searches and reads are slightly less efficient. Covering indexes and the adaptive hash index can alleviate this to some extent (the adaptive hash index needs no user configuration: InnoDB builds a hash index over a subset of pages at run time based on the frequency and pattern of access, but it exists only in memory, is lost on shutdown or power loss, and is slowly rebuilt after a restart). InnoDB uses row locks, which are finer-grained and allow more concurrency, and reads and writes can run concurrently: depending on the isolation level, a read that encounters a lock reads a snapshot instead of blocking, so read/write concurrency is good. In addition, compared with MyISAM, InnoDB provides transactions and foreign keys.

    • MyISAM stores data and indexes separately; the index is cached in memory and stores the disk address of the data, so a lookup does not need to traverse the B-tree multiple times and read performance is good. But MyISAM uses table locks, and only reads can run concurrently with reads, so write efficiency is poor. Writes also have higher priority, so heavy writing can starve read operations.

    • In summary: if reads make up a very large share of your workload, or you need full-text indexing, you can use MyISAM (from 5.6, InnoDB also supports full-text indexing). In all other cases, barring special circumstances that call for specific analysis, InnoDB is recommended.
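As a quick illustration (the table and column names here are hypothetical), the engine is chosen per table at creation time, and the available engines can be listed:

```sql
-- List the engines this server supports and the default engine.
SHOW ENGINES;

-- Choose the engine explicitly when creating a table.
CREATE TABLE orders (
    id   INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    note VARCHAR(200)
) ENGINE = InnoDB;

-- An existing table can be converted (this rebuilds the table).
ALTER TABLE orders ENGINE = InnoDB;
```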

Selection of data types:

    • CHAR, VARCHAR: CHAR is fixed-length and VARCHAR is variable-length, so CHAR is processed much faster. CHAR suits columns whose length varies little, or scenarios with demanding query performance. For MyISAM, CHAR is recommended; for InnoDB, VARCHAR.

    • TEXT, BLOB: BLOB can store binary data, TEXT only character data. You can improve retrieval performance on large text columns with a synthetic index (an extra column storing a hash of the large text value), but this only works for exact, complete-match queries. A prefix index also helps queries, but the condition must not begin with %.

    • Floating-point vs. fixed-point: FLOAT and DOUBLE are floating-point, and values are rounded when they exceed the precision. DECIMAL and NUMERIC are fixed-point and are actually stored as strings, so accuracy is higher; exceeding the precision produces a warning or an error. Therefore precision-sensitive data such as currency should use fixed-point types, because floating-point numbers carry rounding error.

    • Date types: DATETIME stores both date and time and can represent a wider range than TIMESTAMP. If time zones are involved, use TIMESTAMP.
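A minimal sketch pulling these recommendations together (the table and columns are hypothetical):

```sql
-- Currency uses fixed-point DECIMAL; the timestamp uses TIMESTAMP since
-- time zones may matter; a short, stable-length code uses CHAR.
CREATE TABLE payment (
    id         BIGINT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    currency   CHAR(3)        NOT NULL,   -- e.g. 'USD', fixed length
    amount     DECIMAL(12, 2) NOT NULL,   -- exact, no float rounding
    memo       VARCHAR(255),              -- variable-length text
    created_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP
) ENGINE = InnoDB;
```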


Character sets:

    • Different character sets may involve implicit conversions and cause indexes to fail.
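For example (hypothetical tables), joining columns declared with different character sets can force an implicit conversion on an indexed column, which prevents index use:

```sql
-- The two name columns use different character sets; comparing them
-- forces a charset conversion, so the index on b.name may go unused.
CREATE TABLE a (name VARCHAR(50) CHARACTER SET utf8,   KEY (name));
CREATE TABLE b (name VARCHAR(50) CHARACTER SET latin1, KEY (name));

-- The join predicate triggers the implicit conversion:
SELECT * FROM a JOIN b ON a.name = b.name;
```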

Table structure:

    • Choose an appropriate primary key (selectivity, length); avoid overly wide tables, too many joins, and large fields.

    • About normal forms:

      • First normal form: every field holds a single atomic value with one meaning. {student ID, gender+age} violates it. Fix: {student ID, gender, age}.

      • Second normal form: if the primary key is composite, no non-key field may depend on only part of the key; it must depend on the whole key. In {student ID, subject, score, subject total}, score depends on the full composite key, but the subject total depends only on the subject field, so this violates the second normal form. Fix: {student ID, subject, score} and {subject, total}.

      • Third normal form: no transitive dependency (redundancy). In {student ID, class, class teacher}, the class-teacher field is highly redundant. Fix: {student ID, class} and {class, class teacher}.

      • Normal forms need not be strictly adhered to; choose according to circumstances, and appropriate violations sometimes bring more benefit. For example:

    • A QQ user table stores which icons a user has lit. If each icon gets its own field, dozens of icons make the table very wide and waste space. Storing all the icon states in one or two bitmap fields violates the first normal form but is still a good idea.

    • The second and third normal forms guarantee the table's rigor but may lead to more joins. Appropriate violations reduce joins, which saves a lot of trouble, especially with a distributed deployment.
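The second-normal-form fix above could be sketched as follows (table and column names are hypothetical):

```sql
-- Score depends on the whole key (student_id, subject) ...
CREATE TABLE score (
    student_id INT         NOT NULL,
    subject    VARCHAR(50) NOT NULL,
    score      DECIMAL(5, 2),
    PRIMARY KEY (student_id, subject)
);

-- ... while the subject-wide total depends only on subject,
-- so it moves to its own table.
CREATE TABLE subject_total (
    subject VARCHAR(50) NOT NULL PRIMARY KEY,
    total   DECIMAL(10, 2)
);
```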

Design and use of indexes:

  • Both MyISAM and InnoDB use B-tree indexes, where "B" stands for "balanced". Prefix indexes are supported, with one drawback: ORDER BY and GROUP BY cannot use them. MyISAM supports full-text indexing (from 5.6, InnoDB does too).

  • The most appropriate columns to index are those that appear in WHERE clauses and join clauses, not in the SELECT list.

  • Prefer short columns for indexes. For long character columns, use a prefix index where possible, choosing the shortest length that still gives sufficient selectivity.

  • Do not over-index, which is a big burden for insertions and updates.

  • For InnoDB, prefer an auto-increment primary key; if that is not possible, at least avoid large fields.

  • The leftmost-prefix principle is the first rule of B-tree index use: the columns in the query condition should follow the order of the columns in the composite index from left to right, with no gaps in the middle, and should use exact "=" comparisons where possible. Scanning left to right, the first condition that uses a range comparison can still use the index, but the columns after it cannot.

  • Common scenarios where indexes cannot be used:

      • A LIKE query that starts with %. Consider a full-text index instead, or exploit InnoDB's clustered index: scanning an index is much faster than scanning the table. For example, a table has primary key id and a secondary index on name, and you want to search for names like '%end%'. A direct select * from table where name like '%end%' causes a full table scan and is inefficient. Because InnoDB stores the primary key value in every secondary index, the query can be rewritten as select * from (select id from table where name like '%end%') a, table b where a.id = b.id. Since id and name together form a covering index within the secondary index, the inner query can find all matching ids from the index alone, and the final rows are then fetched by id.

      • When a type conversion, function, arithmetic operation, or character-set conversion is applied to the indexed column.

      • When the leftmost column of a composite index is not included in the condition.

      • When MySQL estimates that a table scan would be faster than using the index.

      • With OR, every column involved must be indexed, otherwise no index is used at all.
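Two of the scenarios above sketched on a hypothetical table: the leftmost-prefix rule, and a function call hiding the indexed column:

```sql
CREATE TABLE t (
    a INT, b INT, c INT, created_at DATETIME,
    KEY idx_abc (a, b, c),
    KEY idx_created (created_at)
);

-- Leftmost prefix: uses idx_abc fully (a, b, c all compared with =).
SELECT * FROM t WHERE a = 1 AND b = 2 AND c = 3;

-- The range on b still uses the index for (a, b), but c cannot.
SELECT * FROM t WHERE a = 1 AND b > 2 AND c = 3;

-- No index: the leftmost column a is missing from the condition.
SELECT * FROM t WHERE b = 2 AND c = 3;

-- A function on the column defeats idx_created ...
SELECT * FROM t WHERE DATE(created_at) = '2016-05-31';

-- ... while the equivalent range condition can use it.
SELECT * FROM t
WHERE created_at >= '2016-05-31 00:00:00'
  AND created_at <  '2016-06-01 00:00:00';
```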

Transactions and Locks:

  • InnoDB provides transactions; MyISAM does not.

  • InnoDB uses row-level locks and also supports table-level locks; MyISAM provides only table-level locks. InnoDB has better read/write concurrency, but row-level locks bring the possibility of deadlock.

  • InnoDB row-level locks are implemented by locking index entries, which means that if rows are retrieved without using an index, all records are locked, which is equivalent to a table lock. Row locks come in three types:

      • Record lock: a lock on an index entry.

      • Gap lock: a lock on the gap between index entries.

      • Next-key lock: a combination of the two.

  • Transactions:

      • ACID: atomicity, consistency, isolation, durability.

      • Concurrent transaction processing improves efficiency and resource utilization, but it also brings problems:

      • Lost update

      • Dirty read

      • Non-repeatable read

      • Phantom read

      • Isolation levels: dirty reads, non-repeatable reads, and phantom reads are all consistency problems that must be resolved by an isolation mechanism. There are two kinds of isolation mechanism: locking, and generating a consistent snapshot. The stricter the isolation, the fewer the side effects but the greater the cost. Four levels are available and can be chosen depending on the business:

      • READ UNCOMMITTED

      • READ COMMITTED: resolves dirty reads

      • REPEATABLE READ: resolves non-repeatable reads (the default level)

      • SERIALIZABLE: resolves phantom reads

      • The default level, REPEATABLE READ, still allows phantom reads. SELECT ... FOR UPDATE resolves them. E.g. select * from data where id < 100 for update; locks all records with id less than 100, and also locks the "gaps" where non-existent records would go (the lock taken is a next-key lock), so phantom reads are avoided. However, this easily causes serious lock waits; use it sparingly, and access the data to be updated with exact-equality conditions. Note that a next-key lock is also taken when a non-existent row is requested with an equality condition.

  • Deadlock Prevention:

      • Accessing tables in the same order

      • In a transaction, if you need to update a record, request a sufficient lock level (an exclusive lock) directly, rather than acquiring a weaker lock first and upgrading it.

  • Distributed transactions: supported from 5.0.3, and only by InnoDB. Uses two-phase commit, but the efficiency is not ideal.
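The second deadlock-prevention point above (requesting the exclusive lock directly) could look like this, using a hypothetical account table:

```sql
START TRANSACTION;

-- Take the exclusive lock up front rather than reading first and
-- upgrading the lock later, which invites deadlock.
SELECT balance FROM account WHERE id = 42 FOR UPDATE;

UPDATE account SET balance = balance - 10 WHERE id = 42;

COMMIT;
```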

Splitting of the table:

    • Vertical splitting: the primary key plus some of the columns go into one table, and the primary key plus the remaining columns into another. Advantage: rows are smaller, so a data page holds more rows and caches better. Disadvantages: redundancy must be managed, and getting all the data requires a join.

    • Horizontal splitting: used when the table is very large or the rows in it are naturally independent. Advantage: fewer index levels. Disadvantage: added complexity in the application.

Partition table:

    • Dividing a table into smaller parts, transparently to the application. You can store more data and improve query throughput. When a WHERE clause contains the partitioning condition, only some of the partitions need to be scanned, improving efficiency. Aggregates such as SUM and COUNT can be computed in parallel across partitions and then combined.

    • All partitions of the same table must have the same engine.

    • Partition types: RANGE, LIST (based on an enumeration), HASH, and KEY (similar to HASH). The partitioning field must be part of the primary key and every unique key; you cannot partition on other fields.
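A minimal RANGE-partitioning sketch (the table is hypothetical); note that the partitioning column is part of the primary key:

```sql
CREATE TABLE sales (
    id        INT      NOT NULL,
    sale_year SMALLINT NOT NULL,
    amount    DECIMAL(10, 2),
    PRIMARY KEY (id, sale_year)   -- must include the partition column
) ENGINE = InnoDB
PARTITION BY RANGE (sale_year) (
    PARTITION p2014 VALUES LESS THAN (2015),
    PARTITION p2015 VALUES LESS THAN (2016),
    PARTITION pmax  VALUES LESS THAN MAXVALUE
);

-- A query carrying the partitioning condition scans only p2015:
SELECT SUM(amount) FROM sales WHERE sale_year = 2015;
```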

Optimization means:

  • Use the slow query log to review slow statements that have already run; SHOW PROCESSLIST shows the threads currently running in MySQL.

  • EXPLAIN: view the execution plan for slow SQL.

      • select_type: the type of SELECT. SIMPLE (simple table), PRIMARY (main query), UNION (second or later query in a UNION), SUBQUERY (subquery).

      • table: the table name.

      • type: how the desired rows are found in the table, ordered from low to high efficiency:

      • ALL: full table scan

      • index: full index scan

      • range: index range scan

      • ref: lookup using a non-unique index or a prefix of a unique index

      • eq_ref: lookup using a unique index

      • const/system: the table has at most one matching row, found very quickly

      • NULL: no access to table or index is required

      • possible_keys: indexes that might be used

      • key: the index actually used

      • key_len: the length of the index field used

      • rows: the number of rows scanned

      • Extra: additional information

  • EXPLAIN EXTENDED followed by SHOW WARNINGS shows what rewrites the optimizer made before the SQL was actually executed.

  • SHOW PROFILES lists recent queries for the current thread. SHOW PROFILE FOR QUERY <id> (using an id from SHOW PROFILES) shows how long each step took. You can drill further into CPU, block IO, and other resource levels to see which resources the slow steps use, e.g. SHOW PROFILE CPU FOR QUERY <id>.

  • 5.6 provides the optimizer trace for tracking SQL, to understand further why the optimizer chose the final execution plan.

      • First enable the trace: SET optimizer_trace="enabled=on",end_markers_in_json=on;

      • Set the trace cache size: SET optimizer_trace_max_mem_size=1000000;

      • Execute the SQL.

      • SELECT * FROM information_schema.OPTIMIZER_TRACE; to view the trace.
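The steps above as one session (the traced statement here is hypothetical):

```sql
SET optimizer_trace = "enabled=on", end_markers_in_json = on;
SET optimizer_trace_max_mem_size = 1000000;

-- The statement whose planning we want to inspect:
SELECT * FROM test WHERE id = 1;

-- The JSON trace of the optimizer's decisions:
SELECT trace FROM information_schema.OPTIMIZER_TRACE;

SET optimizer_trace = "enabled=off";   -- turn tracing back off
```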

SQL statement Optimization:

  • INSERT: if a single client inserts multiple rows, use multi-row syntax where possible, e.g. insert into test values (3,4),(5,6), inserting them in one statement to reduce round trips.

  • ORDER BY: a B-tree index is stored in sorted order and can be exploited, so minimize extra filesorts and return ordered data directly from the index. Requirements: ORDER BY and WHERE use the same (composite) index, and the ORDER BY fields are all ascending or all descending. If this cannot be arranged and there is a lot of sorting over a lot of data, consider enlarging sort_buffer_size so sorting happens in memory as much as possible. Note that this buffer is allocated per thread: many threads mean many buffers!

  • GROUP BY: by default, GROUP BY c1, c2, ... also sorts on all of those fields. If you do not need the ordering, explicitly add ORDER BY NULL to suppress the sort and improve efficiency.

  • Nested queries: in some cases a join can be used instead.

  • OR: ensure that every column involved is indexed; MySQL processes the query by searching each field separately and performing a union of the results.

  • Paged queries: LIMIT 1000,10 sorts the first 1010 rows and keeps only the last 10, which is inefficient.

      • Using a covering index: a subquery first uses the covering index to find the primary keys that satisfy the condition, then joins back to the table by primary key to fetch the rows. E.g. select name, value from data order by name limit 1000, 10; becomes select name, value from data a inner join (select id from data order by name limit 1000, 10) b on a.id = b.id;

      • Record the last sort-column value of the previous page, then query: where name > lastvalue order by name limit 10;. This method is unsuitable when the sorted field has duplicate values, because records would be skipped.
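The approaches side by side, on a hypothetical data table with an indexed name column:

```sql
-- Offset pagination: sorts and discards 1000 rows each time.
SELECT name, value FROM data ORDER BY name LIMIT 1000, 10;

-- Covering-index rewrite: the subquery walks only the index.
SELECT name, value
FROM data a
INNER JOIN (SELECT id FROM data ORDER BY name LIMIT 1000, 10) b
        ON a.id = b.id;

-- Keyset pagination: remember the last name from the previous page.
-- Unsuitable if name has duplicates (rows would be skipped).
SELECT name, value
FROM data
WHERE name > 'last_seen_name'
ORDER BY name
LIMIT 10;
```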

  • SQL prompt:

      • USE INDEX: suggest that MySQL use the given index. E.g. select * from data use index (idx_id);

      • IGNORE INDEX: ignore an index.

      • FORCE INDEX: force the use of an index.

  • Statistics on large tables: create a temporary table, import the needed data into it, then compute the statistics there. Benefit: isolation, and you can temporarily add fields and indexes.

  • IN, EXISTS: consider rewriting IN as EXISTS (and NOT IN as NOT EXISTS).

  • Skills:

      • ORDER BY RAND() LIMIT 5; selects 5 rows at random.

Application optimization:

    • Connections are costly to establish, so use a connection pool.

    • If a table is updated infrequently, you can take advantage of the query cache; but as soon as the table has any update, the entire cache for that table is invalidated.

    • Add the cache layer.

    • Master-slave replication spreads the read/write load, but replication lag must be taken into account.

    • Distributed database cluster.

Other optimizations:

    • Periodically analyze tables: analyze table test; lets the system gather accurate statistics so SQL gets a more correct execution plan.

    • Optimize tables: optimize table test; merges space fragmentation.

    • Note: both of the above operations lock the table!

Cluster:

    • Mysql Cluster:

      • Node types: management node (one), SQL nodes (the bridge between application and data nodes), data nodes (store the data, with multiple mirror nodes to survive downtime).

    • MMM architecture:

      • A dual-master replication architecture in which only one master accepts writes and the other serves part of the reads.

    • MHA Architecture:

      • Two components: the MHA Manager (management node) and the MHA Node (deployed on each MySQL server).
