Old Leaf viewpoint: My opinion on the MySQL development norm

Source: Internet
Author: User

Most MySQL specifications can also be shared on the web, and what is shared here is what the old leaves personally think is more important, or easily overlooked, and easily confused.

1. Use InnoDB engine by default
"Lao Ye point of View" has been repeatedly appealed, InnoDB for almost 99% of the MySQL application scenario, and in the MySQL 5.7 system tables are changed to InnoDB, there is no reason to cling to myisam it.

In addition, the InnoDB table, which is frequently read and written, must be used as an explicit primary key with an integer type with an auto-increment/sequence feature.

" Reference ": [MySQL FAQ] Series-Why the InnoDB table is recommended to use the self-increment column master key.

2. Character Set selection Utf-8
"Old leaf View" If you want to save disk space, it is recommended to select Latin1. The recommended choice of utf-8 is usually for the so-called "versatility", but in fact the Utf-8 data submitted by the user can be stored in the Latin1 character set as well.

The trouble with storing utf-8 data with latin1 is that, if there is a Chinese-based search, it may not be 100% accurate (the old leaf himself simply tests the general Chinese finish search is not the problem, that is, the general Chinese comparison is not a problem).

The practice of storing utf-8 data with the latin1 character set is that the character set on the Web side (client side) is Utf-8, and the backend program is handled with Utf-8, but Character_set_client, Character_set_connection, Character_set_results, Character_set_database, character_set_server These are latin1, and the data table, the character set of the field is also latin1. Alternatively, the data table takes Latin1, and the SET NAMES LATIN1 can be executed after each connection.

" reference ": Talk about MySQL character set.

3, InnoDB table row record physical length of not more than 8KB
"Old leaf View" InnoDB's data page defaults to 16KB, based on the characteristics of B+tree, a data page needs to store at least 2 records. Therefore, when the actual storage length exceeds 8KB (especially the Text/blob column), the large column (large column) causes "Page-overflow storage", similar to "Row migration" in Oracle.

Therefore, if you must use large columns (especially the TEXT/BLOB type) and read and write frequently, it is best to split the columns into child tables and not store them with the primary table. If it's not too frequent, consider keeping it in the main table.

Of course, if the innodb_page_size option is modified to 8KB, the row record physical length is not recommended to exceed 4KB.

" Reference ": [MySQL Optimization case] Series-optimize storage efficiency for InnoDB table blob columns.

4. Whether to use partitioned tables
"Old leaf View" in some scenarios where it is obvious that you can improve performance or operational convenience after using partitioned tables, it is recommended that you use partitioned tables.

For example, the old leaves on the Zabbix database using the TOKUDB engine, and based on the time dimension of the partition table. This benefit is to ensure that the Zabbix daily application is not affected under the premise, convenient for administrators to routinely delete past data, only need to delete the corresponding partition, no need to perform a very slow delete to affect the overall performance.

" reference ": Migrating Zabbix databases to Tokudb.

5. Whether to use stored procedures, triggers
"Old leaf View" in some suitable scenarios, the use of stored procedures, triggers are also completely OK.

We used to use the storage to complete the game business logic processing, performance is not a problem, and once the requirements change, just modify the stored procedures, the cost of change is very low. We also use triggers to maintain a frequently updated table, and all changes to the table are synchronized to the other table (like materialized views in disguised implementations), and there is no performance problem.

Do not treat the MySQL stored procedures and triggers as a scourge, with good words, no problem, really encountered problems and optimization is not too late. In addition, MySQL because there is no materialized view, so the view can not use as little as possible.

6. Choose the right type
In addition to the common recommendations, there are a few other points: "Lao Ye's View".

6.1, with int unsigned storage IPV4 address, with Inet_aton (), Inet_ntoa () to convert, basically do not need to use char (15) to store.

6.2, enumeration type can use Enum,enum internal storage mechanism is the use of tinyint or smallint (not Char/varchar), performance is not bad, remember never use CHAR/VARCHAR to store enumerated data.

6.3, also a "common sense misleading" which has been spreading earlier, it is suggested to replace DateTime with timestamp. In fact, starting from 5.6, it is recommended to choose DateTime Storage datetime, because it is more usable than timestamp, the physical storage is only 1 bytes more than timestamp, the overall performance loss is not very large.

6.4. In all field definitions, the default is not a null constraint unless it must be null (but I can't think of any scenario where a null value must be stored in the database, which can be represented by zero). When you count () statistics on the field, the results are more accurate (null is not counted by count), or when you perform a WHERE column is NULL retrieval, you can also quickly return the results.

6.5, as far as possible do not directly SELECT * Read all fields, especially when there are text/blob large columns in the table. It might not have been necessary to read these columns, but the lazy writing of SELECT * resulted in the memory buffer pool being washed out by these "junk" data to really need to buffer the hot spot data.

7. About Index
In addition to the common recommendations, there are a few key points to the old leaf view:

7.1, more than 20 lengths of string columns, it is best to create a prefix index rather than an entire column index (for example: ALTER TABLE T1 ADD Index (User (20)), can effectively improve the index utilization, but its disadvantage is that the column is not used to sort the prefix index. The length of the prefix index can be based on the statistics of the field, which is generally slightly larger than the average length.

7.2. Periodically check and delete duplicate indexes with Pt-duplicate-key-checker tool. For example, index idx1 (A, B) indexes have already covered index IDX2 (a), so you can delete the IDX2 index.

7.3. When there is a multi-field federated index, the field order of the filter in the Where does not need to be consistent with the index, but it must be consistent if there are sorting or grouping.

For example, if you have a federated index idx1 (A, B, c), the following SQL can be used for full indexing :

SELECT ... WHERE B =? and C =?  and a =?; --Note that the field order in where does not match the order of the indexed fields select ... WHERE B =? and a =? and C =?; SELECT ... WHERE a =? and b in (?,?) and C =?; SELECT ... WHERE a =? and B =? ORDER by C; SELECT ... WHERE a =? and b in (?,?) ORDER by C; SELECT ... WHERE a =? ORDER by B, C; SELECT ...  ORDER by A, B, C; --can be sorted by a federated index

The following SQL can only be used for partial indexes, or can take advantage of the ICP feature :

SELECT ... WHERE B =? and a =?; --can only use (A, B) part Select ... WHERE A In (?,?) and b =?; --explain shows that only partial indexes (A, b) are used, while there are icpselect ... WHERE (a between? and?) and b =?; --explain shows that only partial indexes (A, b) are used, while there are icpselect ... WHERE a =? and b in (?,?); --explain shows that only partial indexes (A, b) are used, while there are icpselect ... WHERE a =? and (b between? and?) and C =?; --Explain displays the entire index (a, B, c) with Icpselect ... WHERE a =? and C =?; --explain shows that only part (a) of the index is used, while there are icpselect ... WHERE a =? and C >=?; --explain shows that only part (a) of the index is used, while ICPICP (index condition pushdown) is a new feature of MySQL 5.6, and its mechanism allows other parts of the index to participate in filtering, Reducing data transfer and back-table requests between the engine layer and the server layer can often significantly improve query efficiency.

The following SQL does not fully work with the index :

SELECT ... WHERE b =?; SELECT ... WHERE B =? and C =?; SELECT ... WHERE B =? and C =?; SELECT ... ORDER by B; SELECT ... ORDER by B, A;

As can be seen from a few examples above, the " common sense misleading " that previously emphasized where the conditional field order is consistent with the index order to use the index does not need to be strictly adhered to.

In addition, sometimes the query optimizer specifies that the index or execution plan may not be optimal, either by manually specifying the optimal index, or by modifying the session-level Optimizer_switch option to turn off some features that result in worse results (such as the index merge is usually a good thing, But it has also been worse when using the index merge, either forcing one of the indexes to be specified, or temporarily turning off the index merge feature.

8. Other 8
.1, even if the index-based conditional filtering, if the optimizer realizes that the total number of data that needs to be scanned is more than 30% (Oracle seems to be 20%,mysql is currently 30%, may be adjusted later), it will directly change the execution plan for the full table scan, no longer use the index.

8.2, multi-table join, to filter the most (not necessarily the smallest amount of data, but only add the where condition after the most filtering of that) table selected as the driver table. In addition, if there is a sort after join, the sort field must belong to the driver table to complete the sorting using the index on the driver table.

8.3. In most cases, the cost of sorting is usually higher, so if you see a Using filesort in the execution plan, prioritize creating a sort index.

8.4, using pt-query-digest regular analysis slow query log, and combined with Box anemometer to build slow query log analysis and optimization system.

" Reference ": [MySQL FAQ] Series-explain What information in the results should be of interest.

Note: If not specifically stated, the above specifications are recommended for MySQL 5.6 and previous versions (and mainly 5.6 versions, especially for the two locations of the ICP feature, DateTime change). Versions 5.7 and later may change, and individual specification recommendations need to be adjusted accordingly.

Old Leaf viewpoint: My opinion on the MySQL development norm

Related Article

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.