Best practices for MySQL database optimization in China Mobile

Last Update:2016-08-14 Source: Internet

Author: User

Original: 2016-08-12, Even Zhang, Dbaplus community

This article is compiled from the 69th online sharing session of the Dbaplus community.

About the instructor: Even Zhang

Data Development Engineer

Even Zhang is currently a data R&D engineer at the China Mobile Hangzhou Research and Development Center, specializing in MySQL fault diagnosis, performance tuning, and MySQL high-availability technology. Zhang previously worked as a DBA at the China Telecom Integrated Platform Development and Operation Center.

As an open source database, MySQL is prone to performance bottlenecks and often needs tuning. Optimizing a MySQL database takes both operations DBAs and developers. Tuning MySQL parameters and server configuration is mainly the DBA's job, while developers need to consider three aspects: data type optimization, index optimization, and SQL optimization. This session looks at MySQL database optimization from the developer's perspective.

Outline of this session:

An example
Data type optimization
Index optimization
SQL optimization

I. An example

Without suitable indexes, the number of rows the database needs to process is: 189444 × 1877 × 13482 ≈ 4.79 trillion

If an appropriate index is added to the join columns:

The number of rows the database needs to process is: 368006 × 1 × 3 × 1 ≈ 1.1 million

MySQL usually serves each connection with a single thread (thread_handling is one-thread-per-connection), so one SQL request can use only one CPU.

With the index, the number of rows the database needs to process drops by more than six orders of magnitude. In the first case the query may not return a result even after half an hour, whereas in the second case the result comes back within seconds. As this example shows, MySQL database optimization is very important: a single badly written SQL statement can cause a service outage.
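
The effect described above can be explored with a small sketch. The tables, columns, and index name below are hypothetical and are not the ones from the original example; the point is only to compare the `type` and `rows` columns of the two EXPLAIN outputs once the tables hold realistic data.

```sql
-- Hypothetical tables for illustration; neither join column is indexed at first.
CREATE TABLE join_customer_demo (
    id          INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    customer_no INT UNSIGNED NOT NULL,
    name        VARCHAR(50)  NOT NULL
) ENGINE=InnoDB;

CREATE TABLE join_order_demo (
    id          INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    customer_no INT UNSIGNED NOT NULL,
    amount      INT UNSIGNED NOT NULL
) ENGINE=InnoDB;

-- Before: no index on either join column, so both tables are scanned.
-- Multiplying the "rows" values of the plan gives a rough estimate of the work.
EXPLAIN
SELECT c.name, o.amount
FROM join_customer_demo AS c
JOIN join_order_demo AS o ON o.customer_no = c.customer_no;

-- Add an index on the join column of one table.
CREATE INDEX idx_order_customer_no ON join_order_demo (customer_no);

-- After: the optimizer can look up matching rows through the new index.
EXPLAIN
SELECT c.name, o.amount
FROM join_customer_demo AS c
JOIN join_order_demo AS o ON o.customer_no = c.customer_no;
```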

Developers need to know how to view the MySQL execution plan and how to use the profile tool:

Execution plan:

EXPLAIN SELECT ...
EXPLAIN EXTENDED SELECT ...

Profile tool:

SET profiling = 1;

SHOW PROFILES;

-- Lists the SQL statements most recently sent to the MySQL server

SHOW PROFILE;

-- Shows detailed step-by-step information for the most recent statement

SHOW PROFILE ALL FOR QUERY 61;

-- Shows all available information for query 61

II. Data type optimization

When selecting a data type:

STEP1: Determine the appropriate broad category, such as numeric, string, or temporal;
STEP2: Choose the specific type. Within the same category, the subtypes differ in storage length, value range, and allowed precision, and some have special behaviors and properties.

General principles:

Use small, simple, and appropriate data types;
For variable-length VARCHAR strings, allocate only the space that is really needed;
Use ENUM carefully;
Define identifier columns as integer types where possible;
Use the same data type to store similar or related values, especially for columns used in join conditions.

Core principle: analyze each case on its own merits. Some business scenarios are not suited to the general principles.

>>>>

Use small, simple, and appropriate data types:

CASE1: If you only need to store values from 0 to 200, TINYINT UNSIGNED is the better choice.
Smaller data types need less disk space, less memory and CPU cache, and fewer CPU cycles to process.
CASE2: Use INT UNSIGNED instead of VARCHAR(15) to store IPv4 addresses.
Character sets and collations make character comparisons more complex than integer comparisons.
CASE3: Use MySQL's built-in types (DATE, TIME, DATETIME, and so on) instead of strings to store dates and times.
CASE4: Store the MD5 value of a password in a CHAR column, because the MD5 value has a fixed length.
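
The four cases above can be sketched in one table definition. The table and column names are hypothetical; INET_ATON(), INET_NTOA(), and MD5() are built-in MySQL functions.

```sql
-- Hypothetical table illustrating the four cases above.
CREATE TABLE user_demo (
    user_id      INT UNSIGNED     NOT NULL AUTO_INCREMENT PRIMARY KEY,
    score        TINYINT UNSIGNED NOT NULL,  -- CASE1: 0-200 fits in one byte (range 0-255)
    login_ip     INT UNSIGNED     NOT NULL,  -- CASE2: IPv4 address stored as an integer
    created_at   DATETIME         NOT NULL,  -- CASE3: native temporal type instead of a string
    password_md5 CHAR(32)         NOT NULL   -- CASE4: an MD5 hex digest is always 32 characters
) ENGINE=InnoDB;

-- INET_ATON() converts dotted-quad text to the integer form on the way in.
INSERT INTO user_demo (score, login_ip, created_at, password_md5)
VALUES (200, INET_ATON('192.168.1.10'), NOW(), MD5('example-password'));

-- INET_NTOA() converts the integer back to dotted-quad text for display.
SELECT user_id, score, INET_NTOA(login_ip) AS login_ip, created_at
FROM user_demo;
```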

>>>>

For variable-length VARCHAR strings, allocate only the space that is really needed:

Storing 'zyhy' in a VARCHAR(4) column and in a VARCHAR(200) column takes the same amount of storage space, but the shorter VARCHAR(4) column still has an advantage:

MySQL typically allocates fixed-size blocks of memory to hold values internally, so longer columns consume more memory. This is especially bad for in-memory temporary tables and for sorts that use on-disk temporary tables.

Therefore, allocate only the space that is really needed.

>>>>

Use enum with caution

MySQL stores ENUM values very compactly, packing them into one or two bytes depending on the number of values in the list. Internally, MySQL saves each value as an integer representing its position in the list, and keeps the integer-to-string lookup table in the table's .frm file. An ENUM column is sorted by the internally stored integers, not by the defined strings.

The result of SELECT e + 0 FROM enum_test shows that MySQL internally saves each value in the list as an integer, which can take part in integer arithmetic.

The result of SELECT e FROM enum_test ORDER BY e shows that rows are sorted in the order of the list in the column definition. If you need alphabetical order, handle it another way, for example:

Define the enumeration values in the order in which they should be sorted;
Use the FIELD() function to specify the sort order in the query, although this prevents MySQL from using an index to avoid the sort.
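
A minimal sketch of the behavior described above. The table and column names follow the queries quoted in the text; the three list values are illustrative assumptions.

```sql
-- The list order (fish, apple, dog) defines the internal integers 1, 2, 3.
CREATE TABLE enum_test (
    e ENUM('fish', 'apple', 'dog') NOT NULL
) ENGINE=InnoDB;

INSERT INTO enum_test (e) VALUES ('fish'), ('dog'), ('apple');

-- A numeric context exposes the internal position of each value.
SELECT e, e + 0 AS internal_value FROM enum_test;

-- Sorts by internal position (fish, apple, dog), not alphabetically.
SELECT e FROM enum_test ORDER BY e;

-- Alphabetical order via FIELD(); this sort cannot be satisfied by an index.
SELECT e FROM enum_test ORDER BY FIELD(e, 'apple', 'dog', 'fish');
```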

Advantages and disadvantages of ENUM compared with VARCHAR:

Advantages: The data is compact. Values are stored as integers and take little space, and as a join column ENUM is much more efficient than VARCHAR;
Disadvantages: The list of strings is fixed, so adding or removing a string requires ALTER TABLE, and if the new string is not added at the end of the list, the entire table must be rebuilt. Because ENUM values are stored as integers, converting them to strings requires a lookup, which adds some overhead. In some specific cases, joining a VARCHAR column to an ENUM column can be slower than joining two VARCHAR columns.

>>>>

Define identifier columns as integer types where possible

Calculations and comparisons on integer data are very fast;
Random strings such as UUIDs are not recommended as identifier columns: their values are spread arbitrarily over a large space, which slows down INSERT and SELECT statements.

>>>>

Store similar or related values using the same data type, especially for columns used in join conditions

Mixing different data types can cause performance problems, because the join condition then incurs the cost of data type conversion.
Implicit type conversions can also cause hard-to-find errors in comparisons.

>>>>

A note on specifying the width of an integer type

MySQL lets you specify a width for integer types, such as INT(11), but for most applications this is meaningless: it does not restrict the legal range of values, it only specifies the number of characters that MySQL's interactive tools (such as the mysql command-line client) reserve for display. For storage and computation, INT(1) and INT(20) are identical.

>>>>

About real number types

MySQL supports both exact types (DECIMAL, NUMERIC) and approximate types (FLOAT, DOUBLE).
You can use DECIMAL to store integers that are too large for BIGINT.
CPUs do not support DECIMAL arithmetic directly, so the MySQL server implements high-precision DECIMAL math itself. CPUs do support native floating-point arithmetic, so floating-point operations are significantly faster.
Consider using BIGINT instead of DECIMAL: multiply each value by the power of ten that matches the number of decimal places you need. For example, for values accurate to 0.01, multiply every value by 100 and store the result in a BIGINT column. This avoids both the inexactness of floating-point storage and arithmetic and the high cost of exact DECIMAL arithmetic.
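
The scaled-integer approach can be sketched as follows, assuming amounts accurate to 0.01. The table and column names are hypothetical.

```sql
-- Hypothetical table: amounts are stored in hundredths (19.99 is stored as 1999).
CREATE TABLE payment_cents_demo (
    payment_id   INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    amount_cents BIGINT       NOT NULL
) ENGINE=InnoDB;

-- 19.99, 5.01, and 0.10 multiplied by 100 before storage.
INSERT INTO payment_cents_demo (amount_cents) VALUES (1999), (501), (10);

-- Aggregation is plain integer arithmetic; scale back only when presenting.
SELECT SUM(amount_cents)       AS total_cents,
       SUM(amount_cents) / 100 AS total_amount
FROM payment_cents_demo;
```

The application has to apply the same scale factor consistently on every read and write, which is the price of avoiding DECIMAL.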

>>>>

Definition of NULL:

NULL represents a missing, unknown value; it means "no value."

The result of any arithmetic operation involving NULL is NULL.

To test whether a value is NULL, you cannot simply use =; use IS NULL or IS NOT NULL.

Neither 0 nor the empty string is NULL.

The difference between NULL and the empty string

Inserting NULL and inserting an empty string have different meanings:

Inserting NULL: it is not known whether this person has a phone number;
Inserting '': it is known that this person has no phone number;
COUNT(table.column), MIN(), and SUM() ignore NULL values, whereas COUNT(*) counts every row, including rows that contain NULL.
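
The NULL rules above can be checked with a few statements. The table name and sample rows are hypothetical.

```sql
-- Arithmetic and equality with NULL both yield NULL.
SELECT 1 + NULL;                 -- NULL
SELECT NULL = NULL;              -- NULL, not 1
SELECT NULL IS NULL;             -- 1

-- Neither 0 nor the empty string is NULL.
SELECT 0 IS NULL, '' IS NULL;    -- 0, 0

-- Hypothetical table: phone is unknown (NULL), known to be absent (''), or present.
CREATE TABLE contact_demo (
    name  VARCHAR(20) NOT NULL,
    phone VARCHAR(20) NULL
) ENGINE=InnoDB;

INSERT INTO contact_demo (name, phone)
VALUES ('a', NULL), ('b', ''), ('c', '13990055761');

-- COUNT(*) counts all rows; COUNT(phone) skips the NULL row.
SELECT COUNT(*) AS all_rows, COUNT(phone) AS non_null_phones
FROM contact_demo;               -- 3, 2
```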

III. Index optimization

>>>>

Index types

Classified by how data is stored:

Clustered index: The data rows are actually stored in the leaf pages of the index, so rows with adjacent key values are stored compactly together.
Secondary index (nonclustered index): The leaf nodes of a secondary index contain the primary key columns of the referenced row (they do not point to the row's physical location, but hold the row's primary key value). A lookup through a secondary index therefore requires two index lookups instead of one. (For InnoDB, the adaptive hash index can reduce this repeated work.)

Classification by index data structure:

B-tree Index
Hash index
Spatial Data Index (R-TREE)
Full-Text Indexing

InnoDB primary key index structure:

In InnoDB, the table data file is itself an index structure organized as a B+Tree, and the data field of the tree's leaf nodes holds the complete data record. The key of this index is the table's primary key, so the InnoDB table data file itself is the primary index.

InnoDB non-primary-key index:

The data field of an InnoDB secondary index stores the indexed column values together with the primary key value of the corresponding record, rather than the record's address.

>>>>

Indexing strategy

Tables that are frequently joined with other tables should have indexes on the join columns;
Columns that frequently appear in WHERE clauses, especially in large tables, should be indexed;
Do not create too many indexes on tables that are modified frequently: inserts, updates, and deletes all have to maintain the indexes, so too many indexes slow down INSERT, UPDATE, and DELETE operations;
Build indexes on highly selective columns, where cardinality/rows is as close to 1 as possible. The SHOW INDEX command shows the cardinality (the estimated number of distinct values in the indexed column);
Build indexes on small columns. For large text columns or very long columns, either do not index them or create a prefix index, for example CREATE INDEX index_name ON table_name (column_name1(prefix_length), ...);
Delete useless indexes, such as duplicate indexes and unnecessary redundant indexes;
Design a sensible column order for composite indexes.

Some index-related concepts are described below.

Prefix index: An index on only the leading characters of a column, which saves index space and improves index efficiency.

Risk: The selectivity of the index is reduced.

For BLOB, TEXT, or very long VARCHAR columns, a prefix index must be used.

Otherwise MySQL reports an error:

[Err] 1170 - BLOB/TEXT column 'Blobtext' used in key specification without a key length

Choosing the prefix length is a trade-off: the prefix should be long enough to give high selectivity, but not too long.

How do you calculate the selectivity of different prefix lengths?

The query results show that once the prefix length reaches 7, increasing it further improves selectivity only slightly.
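
A sketch of such a selectivity query, in the style used in "High Performance MySQL". The table city_demo and its column are hypothetical, and the query is only meaningful once the table holds real data.

```sql
-- Hypothetical table holding the column to be prefix-indexed.
CREATE TABLE city_demo (
    city VARCHAR(50) NOT NULL
) ENGINE=InnoDB;

-- Selectivity of the full column: the upper bound a prefix can approach.
SELECT COUNT(DISTINCT city) / COUNT(*) AS full_selectivity
FROM city_demo;

-- Selectivity for several candidate prefix lengths, computed in one pass.
SELECT COUNT(DISTINCT LEFT(city, 3)) / COUNT(*) AS sel3,
       COUNT(DISTINCT LEFT(city, 4)) / COUNT(*) AS sel4,
       COUNT(DISTINCT LEFT(city, 5)) / COUNT(*) AS sel5,
       COUNT(DISTINCT LEFT(city, 6)) / COUNT(*) AS sel6,
       COUNT(DISTINCT LEFT(city, 7)) / COUNT(*) AS sel7
FROM city_demo;

-- Once a length is chosen (7 in the text), create the prefix index.
ALTER TABLE city_demo ADD KEY idx_city_prefix (city(7));
```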

Duplicate indexes: Indexes of the same type created on the same columns in the same order. (SQL excerpt from "High Performance MySQL")
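
A sketch of how duplicate indexes typically arise, following the pattern described in "High Performance MySQL". The table name is hypothetical. MySQL implements both PRIMARY KEY and UNIQUE constraints with indexes, so each of the three declarations on id builds its own index.

```sql
-- Hypothetical table: three separate indexes end up on the same column.
CREATE TABLE dup_index_demo (
    id INT NOT NULL PRIMARY KEY,  -- index 1: the primary key
    a  INT NOT NULL,
    b  INT NOT NULL,
    UNIQUE (id),                  -- index 2: unique constraint on the same column
    INDEX (id)                    -- index 3: plain index on the same column
) ENGINE=InnoDB;

-- Lists every index on the table, including the duplicates.
SHOW INDEX FROM dup_index_demo;
```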

This is equivalent to creating three duplicate indexes.

MySQL has to maintain each duplicate index separately, and the optimizer has to consider each of them when optimizing queries, so duplicate indexes hurt performance.

Redundant indexes:

CASE1: If you create index (A, B) and then create index (A), the second index is redundant, because index (A) is just a prefix of index (A, B).
CASE2: If you have index (A) and then create index (A, ID), where ID is the primary key, the new index is also redundant, because in InnoDB the primary key columns are already included in every secondary index.

When are redundant indexes needed?

When extending an existing index would make it too large and hurt the performance of other queries that use it.

For example, suppose you have an index on an integer column and now need to extend it with a long VARCHAR column. If queries on the integer column alone are noticeably faster with the original index than with the combined integer-and-VARCHAR index, consider keeping both indexes to serve the different query scenarios.

Index column order:

In a multi-column B-Tree index, the column order means that the index is sorted first by the leftmost column, then by the second column, and so on.

As a rule of thumb, place the most selective column first (leftmost) in the index.

How to determine which column is more selective: (SQL excerpt from "High Performance MySQL")

The selectivity of customer_id turns out to be higher.

The recommended index column order is therefore (customer_id, staff_id).
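
The comparison above can be sketched like this. The column names customer_id and staff_id come from the text; the table payment_demo, its other columns, and the index name are hypothetical.

```sql
-- Hypothetical table with the two candidate index columns.
CREATE TABLE payment_demo (
    payment_id  INT UNSIGNED      NOT NULL AUTO_INCREMENT PRIMARY KEY,
    customer_id SMALLINT UNSIGNED NOT NULL,
    staff_id    TINYINT UNSIGNED  NOT NULL,
    amount      DECIMAL(5,2)      NOT NULL
) ENGINE=InnoDB;

-- The column whose ratio is closer to 1 is the more selective one.
SELECT COUNT(DISTINCT staff_id)    / COUNT(*) AS staff_id_selectivity,
       COUNT(DISTINCT customer_id) / COUNT(*) AS customer_id_selectivity,
       COUNT(*)                               AS total_rows
FROM payment_demo;

-- Put the more selective column first, as the text recommends.
ALTER TABLE payment_demo ADD KEY idx_customer_staff (customer_id, staff_id);
```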

Covering index

The index contains (covers) the values of all the columns that the query needs.

Advantages:

The query only needs to read the index and does not have to access the data rows;
Indexes are stored in order of column values, so sequential index access is faster than random I/O.

Case:

When a query is covered by an index, "Using index" appears in the Extra column of the EXPLAIN output.
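
A small sketch of a covered and a non-covered query. The table inventory_demo, its columns, and the index name are hypothetical.

```sql
-- Hypothetical table with a two-column secondary index.
CREATE TABLE inventory_demo (
    inventory_id INT UNSIGNED      NOT NULL AUTO_INCREMENT PRIMARY KEY,
    film_id      SMALLINT UNSIGNED NOT NULL,
    store_id     TINYINT UNSIGNED  NOT NULL,
    notes        VARCHAR(200)      NULL,
    KEY idx_store_film (store_id, film_id)
) ENGINE=InnoDB;

-- Covered: every selected column is in idx_store_film,
-- so the Extra column is expected to show "Using index".
EXPLAIN SELECT store_id, film_id FROM inventory_demo;

-- Not covered: notes is not part of the index, so the rows themselves must be read.
EXPLAIN SELECT store_id, film_id, notes FROM inventory_demo;
```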

Scenarios where an index cannot be used

In some scenarios the index does not take effect, for example:

When a LIKE or REGEXP pattern starts with %, that is, "%***";
When a function is applied to the column;
When the column types in a join condition do not match;
When a query uses a non-leading column of a composite index without the leading column;
When != or <> (not equal) is used;
When the indexed column is not isolated.

IV. SQL optimization

Use isolated columns in the WHERE clause:

If a column in the query is not isolated, for example because it is part of an expression or is passed to a function, the index on it is not used.
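
A sketch of isolated versus non-isolated columns. The table actor_demo, its columns, and the index name are hypothetical; each pair of queries returns the same rows, but only the second form of each pair lets MySQL look up rows through the index.

```sql
-- Hypothetical table with a primary key and an indexed DATETIME column.
CREATE TABLE actor_demo (
    actor_id    SMALLINT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    last_name   VARCHAR(45)       NOT NULL,
    last_update DATETIME          NOT NULL,
    KEY idx_last_update (last_update)
) ENGINE=InnoDB;

-- Not isolated: the column is inside an expression.
EXPLAIN SELECT actor_id FROM actor_demo WHERE actor_id + 1 = 5;

-- Isolated: the arithmetic is moved to the constant side.
EXPLAIN SELECT actor_id FROM actor_demo WHERE actor_id = 4;

-- Not isolated: the column is wrapped in a function.
EXPLAIN SELECT actor_id FROM actor_demo
WHERE TO_DAYS(CURRENT_DATE) - TO_DAYS(last_update) <= 10;

-- Isolated: the same condition expressed as a range on the bare column.
EXPLAIN SELECT actor_id FROM actor_demo
WHERE last_update >= DATE_SUB(CURRENT_DATE, INTERVAL 10 DAY);
```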

Join query optimization:

Make sure the columns in the ON or USING clause are indexed. In general, you only need to index the corresponding column of the second table in the join order.
Keep the data types of the join columns consistent.

LIKE matching optimization:

If the LIKE argument is a constant string that does not begin with a wildcard character, MySQL can use an index for the LIKE comparison.

SELECT * FROM customer WHERE last_name LIKE 'ma%';

The EXPLAIN output shows that the index is used.

If the string after LIKE starts with a wildcard character, the index is not used.

SELECT * FROM customer WHERE last_name LIKE '%ma%';

The rows column of the EXPLAIN output shows 599 rows, which is the total number of rows in the customer table, so the index is not used.

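Avoid unnecessary type conversions in SQL:

SELECT * FROM charge_record WHERE phone = 13990055761;

SELECT * FROM charge_record WHERE phone = '13990055761';

If phone is a character column, the first statement compares it with a number, which forces an implicit conversion and prevents the index on phone from being used. The second statement compares a string with a string, so the index can be used.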
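
One way to check the difference on your own data is to compare the two execution plans. The table below is a hypothetical stand-in for charge_record and assumes that phone is an indexed VARCHAR column, which the original text does not state.

```sql
-- Hypothetical stand-in table; assumes phone is stored as a string and indexed.
CREATE TABLE charge_record_demo (
    id     INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    phone  VARCHAR(20)  NOT NULL,
    amount INT UNSIGNED NOT NULL,
    KEY idx_phone (phone)
) ENGINE=InnoDB;

-- Numeric literal against a string column: implicit conversion on every row.
EXPLAIN SELECT * FROM charge_record_demo WHERE phone = 13990055761;

-- String literal against a string column: idx_phone can be used for the lookup.
EXPLAIN SELECT * FROM charge_record_demo WHERE phone = '13990055761';
```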

Select specific columns instead of SELECT *:

In some cases SELECT * wastes more resources than a SELECT that lists only the columns it needs;
If some columns have types such as TEXT, listing only the needed columns reduces use of the network send buffer;
If the SQL contains an ORDER BY and the sort cannot use an index, the extra columns consume more of sort_buffer_size;
Listing specific columns makes it easier to use a covering index.

For example, a query that selects only indexed columns can use a covering index.

Subquery optimization:

Before MySQL 5.6, a subquery was mostly executed by iterating over the outer table and running the subquery once for each row returned, and the subquery result had no indexes, so subqueries were much slower than the equivalent join (solution: use a table join instead of a subquery);
From MySQL 5.6 onward, subqueries are heavily optimized: the subquery result is stored in a temporary table so that the subquery is executed only once, and the optimizer also adds indexes to the derived table generated by the subquery, which greatly improves subquery performance.

The former "absolute truth" that subqueries are much slower than joins no longer holds.

Subquery optimization avoids running the same query many times and accessing the same data repeatedly.

Sometimes a subquery may even be faster than a join.
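
A sketch of the join rewrite suggested for older versions. The tables film_demo and film_actor_demo are hypothetical; the two queries return the same rows here because (actor_id, film_id) is unique, so the join cannot produce duplicates.

```sql
-- Hypothetical tables for illustration.
CREATE TABLE film_demo (
    film_id SMALLINT UNSIGNED NOT NULL PRIMARY KEY,
    title   VARCHAR(255)      NOT NULL
) ENGINE=InnoDB;

CREATE TABLE film_actor_demo (
    actor_id SMALLINT UNSIGNED NOT NULL,
    film_id  SMALLINT UNSIGNED NOT NULL,
    PRIMARY KEY (actor_id, film_id)
) ENGINE=InnoDB;

-- Subquery form.
SELECT f.film_id, f.title
FROM film_demo AS f
WHERE f.film_id IN (
    SELECT fa.film_id FROM film_actor_demo AS fa WHERE fa.actor_id = 1
);

-- Equivalent join form.
SELECT f.film_id, f.title
FROM film_demo AS f
JOIN film_actor_demo AS fa ON fa.film_id = f.film_id
WHERE fa.actor_id = 1;
```

Running EXPLAIN on both forms against your own server version shows which one the optimizer handles better.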

>>>>

GROUP BY optimization:

Grouping by a table's identifier column is more efficient than grouping by other columns.

SELECT actor.first_name, actor.last_name, COUNT(*) FROM film_actor INNER JOIN actor USING (actor_id) GROUP BY actor.first_name, actor.last_name;

After optimization:

SELECT actor.first_name, actor.last_name, COUNT(*) FROM film_actor INNER JOIN actor USING (actor_id) GROUP BY actor.actor_id;

Because actor.actor_id is the primary key, grouping is more efficient.

In MySQL 5.x, when a GROUP BY clause is used, the result set is automatically sorted by the grouping columns, and the GROUP BY clause can take the DESC or ASC keyword directly so that the grouped result is sorted in the desired direction.

So if no particular order is required, you can add ORDER BY NULL so that MySQL skips the filesort, which improves query efficiency.
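
A short sketch of the ORDER BY NULL technique. The table cast_demo is hypothetical. The effect applies to server versions that sort GROUP BY results implicitly, which to my understanding means MySQL 5.x; comparing the Extra column of the two plans on your own version shows whether "Using filesort" disappears.

```sql
-- Hypothetical table; film_id is deliberately left unindexed.
CREATE TABLE cast_demo (
    id       INT UNSIGNED      NOT NULL AUTO_INCREMENT PRIMARY KEY,
    actor_id SMALLINT UNSIGNED NOT NULL,
    film_id  SMALLINT UNSIGNED NOT NULL
) ENGINE=InnoDB;

-- Grouping on an unindexed column: the plan may include an implicit sort.
EXPLAIN SELECT film_id, COUNT(*) AS actor_count
FROM cast_demo
GROUP BY film_id;

-- Same grouping, explicitly stating that no ordering is needed.
EXPLAIN SELECT film_id, COUNT(*) AS actor_count
FROM cast_demo
GROUP BY film_id
ORDER BY NULL;
```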

>>>>

UNION optimization:

Unless you really need to eliminate duplicate rows, always use UNION ALL. Without the ALL keyword, MySQL adds the DISTINCT option to the temporary table, which forces an expensive uniqueness check over the entire temporary table.

Because the optimizer cannot apply its usual optimization strategies to the temporary table generated by UNION, you may need to copy the WHERE, ORDER BY, and LIMIT clauses redundantly into each subquery by hand.

Case:

Copy the ORDER BY, LIMIT, and other clauses redundantly into each subquery.

This effectively reduces the number of rows that have to be sorted, which improves efficiency.
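
A sketch of the redundant-clause technique, following the pattern described in "High Performance MySQL". Both tables are hypothetical.

```sql
-- Hypothetical tables for illustration.
CREATE TABLE union_actor_demo (
    first_name VARCHAR(45) NOT NULL,
    last_name  VARCHAR(45) NOT NULL
) ENGINE=InnoDB;

CREATE TABLE union_customer_demo (
    first_name VARCHAR(45) NOT NULL,
    last_name  VARCHAR(45) NOT NULL
) ENGINE=InnoDB;

-- ORDER BY and LIMIT are repeated inside each branch, so the temporary table
-- holds at most 40 rows instead of every row of both tables.
(SELECT first_name, last_name
 FROM union_actor_demo
 ORDER BY last_name
 LIMIT 20)
UNION ALL
(SELECT first_name, last_name
 FROM union_customer_demo
 ORDER BY last_name
 LIMIT 20)
ORDER BY last_name
LIMIT 20;
```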

Bibliography: High Performance MySQL

Q&A

Q1: Is this a product? Isn't that a Cartesian product?

A1: It is a product, but not a Cartesian product. A Cartesian product multiplies the total row counts of the tables, whereas this is the product of a nested loop.

Q2: For index optimization, at how many rows does a single MySQL table need to be split, or reach its performance bottleneck?

A2: A single MySQL table with over a billion rows has almost reached its performance bottleneck. You can also use NoSQL for efficient queries: put the hot data in NoSQL to reduce the pressure on MySQL.

Q3: Several SELECT statements on our production database run for thousands or even tens of thousands of seconds, but when I connect to the database and run them myself they take just over 1 second. SHOW PROCESSLIST shows the state as "Waut to net". I have set Max_net_package to 1 GB and the server-side network is fine. How should I troubleshoot this?

A3: 1. Check whether max_allowed_packet is large enough and has actually taken effect; 2. Check whether other requests in production are blocking those SELECT statements; 3. Monitor the CPU, I/O, memory, and bandwidth of the MySQL server.

Q4: What is the mechanism behind FLUSH TABLE in MySQL (locking, or something else)? We previously had a database outage caused by an online mysqldump backup: a long-running SQL statement was executing while the mysqldump script ran, so the subsequent SQL statements were stuck waiting for the table flush and a large backlog of waiting statements built up. Apart from the explicit FLUSH TABLE command issued during the backup, is there anything that triggers an implicit FLUSH TABLE? Is there a good way to learn how FLUSH TABLE is implemented? I have checked a lot of material and found nothing.

A4: FLUSH takes a shared lock. Backups generally issue FLUSH TABLE because data consistency must be guaranteed.
