A brief discussion on MySQL optimization and design rules

Last Update:2017-06-02 Source: Internet

Author: User

Tags logical operators mysql in

Developer on Alibaba Coud: Build your first app with APIs, SDKs, and tutorials on the Alibaba Cloud. Read more ＞

When the amount of data is large, the database will be optimized for slow databases. At this time are to specific circumstances, specific business needs to optimize.

However, some steps and rules should be appropriate for all situations. Here a comprehensive online search for a simple analysis of the information.

The first optimization of your SQL and index;

Second plus cache, Memcached,redis;

After the third has done, or slow, the decision from the copy or master copy, read and write separation, can be done in the application layer, high efficiency, can also use three-party tools, third-party tools recommended 360 Atlas, the other is either inefficient, or no one to maintain;

If the above are done or slow, do not want to do slicing, MySQL comes with partition table, first try this, for your application is transparent, no need to change the code, but the SQL statement needs to be optimized for the partition table, SQL conditions to take the partition condition of the column, so that the query to locate a small number of partitions, Otherwise, all partitions will be scanned;

If the above are done, then do the vertical split, in fact, according to the coupling degree of your module, a large system is divided into a number of small systems, that is, distributed systems; the second is horizontal segmentation, for the data volume of the table, this step is the most troublesome, most can test the technical level, to choose a reasonable sharding Key, in order to have a good query efficiency, table structure to change, do a certain amount of redundancy, the application should be changed, SQL as far as possible with sharding key, the data will be targeted to the table to check, rather than scan all the table;

Design Rules

Rule 1: The MyISAM storage engine can be selected in general, and the InnoDB storage engine must be used if transaction support is required.

Note: The MyISAM storage Engine B-tree index has a big limit: the sum of the lengths of all fields participating in an index cannot exceed 1000 bytes. In addition MyISAM data and indexes are separate, while INNODB data storage is sorted by cluster (cluster) index, the primary key is the default clustered (cluster) index, so MyISAM although in general, the query performance is higher than InnoDB, However, InnoDB's query performance with the primary key condition is very high.

Rule 2: naming rules.

The database and table names should be the same as the business module names they serve
A class of tables that serve the same submodule should prefix or suffix the submodule name (or part of the word) as far as possible
The table name should contain the words corresponding to the stored data as far as possible
Field names should also be kept as much as possible relative to the actual data
The federated index name should contain as much as possible all the index key field names or abbreviations, and the order of each field name in the index name should match the index key in the index, and try to include an idx-like prefix or suffix to indicate that the period object type is an index.
Other objects, such as constraints, should include, as far as possible, the names of the owning table or other objects to indicate their relationship

Rule 3: database field type definition

Often need to calculate and sort CPU-consuming fields, should try to select more rapid fields, such as with timestamp (4 bytes, minimum 1970-01-01 00:00:00) instead of datetime (8 bytes, minimum 1001-01-01 00:00:00) , replacing floating-point and character types with integral types
Variable-length fields use varchar, do not use char
For binary multimedia data, pipelining queue data (such as logs), large text data is not placed in database fields

Rule 4: The business logic execution process must have an initial value in the table to be read. Avoid the value of the business read out as negative or infinite causes the program to fail

Rule 5: Do not need to obey the paradigm theory, moderate redundancy, let query minimize join

Rule 6: A large large segment with low frequency access splits the data table. Some large characters occupy a lot of space, access frequency is significantly less than other fields, this situation is split, frequent queries do not need to read big pieces, resulting in the waste of IO resources.

Rule 7: Large tables can consider horizontal splitting. Large tables affect query efficiency, there are many split methods according to business characteristics, such as data based on time increment, can be divided according to time. Data divided by ID can be split according to the number of id% databases.

I. Database indexing

Rule 8: the relevant index for business needs is determined by the where condition of the SQL statement constructed by the actual design, the business does not need to build an index, and does not allow more than one field in the federated index (or primary key). In particular, the field does not appear in a conditional statement at all.

Rule 9: one field or multiple fields that uniquely determine a record to establish a primary key or a unique index, you cannot uniquely identify a record and create a generic index for improved query efficiency

RuleNumber: The table used by the business, some records are few, or even only one record, in order to constrain the need to establish an index or set the primary key.

RuleOne: For values that cannot be duplicated, a field that is frequently used as a query condition should have a unique index (the primary key default unique index), and the criteria for that field in the query criteria be placed in the first position. There is no need to establish a federated index associated with the field.

RuleNumber: For a field that is queried frequently, its value is not unique, you should also consider establishing a normal index, the field condition in the query statement is placed in the first position, the same method for the Federated index Processing.

RuleNumber: When you access data through a non-unique index, you need to consider the density of records returned through that index value, in principle, the maximum density can not be greater than 0.2, if the density is too large, it is not appropriate to establish an index.

When the amount of data found through this index accounts for more than 20% of all the data in the table, the cost of establishing the index needs to be considered, and because the index scan produces random I/O, the efficiency is much lower than the sequential I/O of the full table sequential scan. This index may not be used when the database system optimizes query.

rule: Databases that require a federated index (or federated primary key) should be aware of the order of the indexes. The matching criteria in the SQL statement are also consistent with the order of the indexes.

Note: The incorrect flow of the index can also lead to serious consequences.

rules: Multiple field queries in a table are query criteria, do not contain other indexes, and field union values are not duplicated, and a unique federated index can be built on these fields, assuming the index fields (a1,a2,... an), the query condition (A1 Op val1,a2 op Val2,... am op valm) m<=n, the index can be used, and the position of the field in the query condition is consistent with the position of the field in the index.

Rule#: The founding principle of a federated index (the following is assumed to establish a federated index on the field a,b,c of the database table (A,B,C))

The fields in the federated index should be as much as possible in order to filter the data from many to fewer, i.e. the field with the largest variance should be the first field in the House
Indexing as much as possible is consistent with the condition order of the SQL statement, so that the SQL statement is as conditional as the entire index as possible, avoiding the query as part of the index (especially if the first condition is inconsistent with the first field of the index)
where A=1,where a>=12 and A<15,where a=1 and B<5, where A=1 and b=7 and c>=40 can be used for this federated index for the condition, and these statements where b=10,wher E c=221,where b>=12 and c=2 are not able to use this federated index.
When the database fields that need to be queried are all reflected in the index, the database can directly query the index to get the query information without scanning the whole table (this is called key-only), can greatly improve the query efficiency. Index can be used when A,AB,ABC is associated with a query in another table field
Indexes can be used when A,AB,ABC order by or group is not B,C,BC,AC for sequential execution
Table scans and sorting may be more efficient than using federated indexes when:

A. The table has been organized by index
B. A large proportion of all data in the data station being queried.

Rule#: Important business when accessing data tables. However, when data cannot be accessed through an index, it should be ensured that the number of sequential access records is limited and in principle no more than 10.

Two. Query statement and Application system optimization

rule: reasonably construct query statement

Insert statement, according to the test, bulk insertion of 1000 at the most efficient, more than 1000, to split, multiple times the same insert, should be combined batch. Note the length of the query statement is less than the mysqld parameter Max_allowed_packet
The performance order of various logical operators in a query condition is and,or,in, so you should avoid using in large collections in the query condition as much as possible.
Always drive large recordsets with small result sets, because in MySQL, only the nested join is a join method, which means that the MySQL join is implemented through nested loops. Reduce the number of loops in a nested loop by using the principle of small result sets to drive large recordsets to reduce the total number of IO and CPU operations
Optimize the inner loop of the nested join as much as possible.
Take only the columns you need and try not to use SELECT *
Use only the most efficient filter fields, where the filter condition in the WHERE clause is less good
Avoid complicated joins and subqueries as much as possible
MySQL in concurrency this is not too good, when the concurrency is too high, the overall performance will drop sharply, which is mainly related to MySQL internal resource contention lock control, myisam with table lock, InnoDB better use row lock.

rule: Optimization of application system

Using the cache wisely, the performance increase is an order of magnitude for the less-changed portions of active data that are cached in memory through the application layer's cache.
Reduce the number of IO times by merging the same query repeatedly.
Transaction Correlation Minimum principle

Content to know Zhuqz and Tengyun

A brief discussion on MySQL optimization and design rules

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

Get Started for Free

Sales Support

1 on 1 presale consultation

Chat Contact Sales
After-Sales Support

24/7 Technical Support 6 Free Tickets per Quarter Faster Response

Open a Ticket
Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.

Learn More