How to efficiently create an InnoDB table with MySQL


2016-05-27 Zhao Wei Database developer

A user asked us why the following CREATE TABLE statement fails to execute with the error "Row size too large ..."
Let me use this example to talk about how to build tables effectively in MySQL. The images used in this article were found on the web, not made by me; thanks to their authors and owners.

CREATE TABLE bad_table (
col1 varchar () not NULL DEFAULT ' 000001 ',
col2 varchar (+) DEFAULT NULL,
Col3 varchar (+) DEFAULT NULL,
Col4 varchar (+) DEFAULT NULL,
COL5 varchar (+) DEFAULT NULL,
COL6 varchar (6) DEFAULT NULL,
COL7 varchar () DEFAULT NULL,
Col8 varchar () DEFAULT NULL,
COL9 varchar (9) DEFAULT NULL,
COL10 varchar (3) DEFAULT NULL,
Col11 varchar () DEFAULT NULL,
col12 varchar (2) DEFAULT NULL,
Col13 varchar (5) DEFAULT NULL,
Col14 varchar (1) DEFAULT NULL,
Col15 varchar (3) DEFAULT NULL,
Col16 decimal (24,6) DEFAULT NULL,
COL17 varchar (3) DEFAULT NULL,
Col18 decimal (24,6) DEFAULT NULL,
Col19 varchar () DEFAULT NULL,
COL20 varchar () DEFAULT NULL,
Col21 varchar () DEFAULT NULL,
Col22 varchar (1) DEFAULT NULL,
Col23 varchar (3) DEFAULT NULL,
Col24 varchar (3) DEFAULT NULL,
Col25 varchar (+) DEFAULT NULL,
Col26 Int (one) DEFAULT NULL,
col27 varchar () DEFAULT NULL,
Col28 varchar (1) DEFAULT NULL,
col29 varchar (+) DEFAULT NULL,
COL30 varchar (+) DEFAULT NULL,
Col31 varchar (+) DEFAULT NULL,
Col32 varchar (+) DEFAULT NULL,
col33 decimal (24,6) DEFAULT NULL,
COL34 varchar (1) DEFAULT NULL,
Col35 varchar (+) DEFAULT NULL,
col36 varchar (+) DEFAULT NULL,
col37 varchar (DEFAULT) NULL,
Col38 varchar (+) DEFAULT NULL,
col39 varchar (+) DEFAULT NULL,
COL40 varchar (+) DEFAULT NULL,
col41 varchar (DEFAULT) NULL,
Col42 varchar (3) DEFAULT NULL,
col43 varchar () DEFAULT NULL,
col44 varchar (3) DEFAULT NULL,
Col45 varchar () DEFAULT NULL,
col46 varchar () DEFAULT NULL,
col47 varchar (8) DEFAULT NULL,
col48 varchar (8) DEFAULT NULL,
col49 varchar (+) DEFAULT NULL,
COL50 varchar (2) DEFAULT NULL,
col51 varchar (2) DEFAULT NULL,
col52 varchar (2) DEFAULT NULL,
col53 varchar (2) DEFAULT NULL,
col54 varchar (2) DEFAULT NULL,
COL55 varchar (2) DEFAULT NULL,
COL56 varchar (3) DEFAULT NULL,
col57 varchar (2) DEFAULT NULL,
col58 varchar (2) DEFAULT NULL,
col59 varchar (3) DEFAULT NULL,
COL60 varchar (4) DEFAULT NULL,
Col61 varchar (4) DEFAULT NULL,
Col62 varchar (2) DEFAULT NULL,
col63 varchar (DEFAULT) NULL,
Col64 varchar (+) DEFAULT NULL,
COL65 varchar (+) DEFAULT NULL,
col66 varchar (+) DEFAULT NULL,
col67 varchar () DEFAULT NULL,
col68 varchar (2) DEFAULT NULL,
col69 varchar (2) DEFAULT NULL,
Col70 varchar (2) DEFAULT NULL,
col71 varchar (2) DEFAULT NULL,
col72 varchar (DEFAULT) NULL,
col73 varchar (1) DEFAULT NULL,
col74 varchar (DEFAULT) NULL,
col75 varchar (DEFAULT) NULL,
col76 varchar (2) DEFAULT NULL,
col77 varchar () DEFAULT NULL,
col78 varchar (+) DEFAULT NULL,
col79 varchar (4) DEFAULT NULL,
col80 varchar (1) DEFAULT NULL,
col81 varchar (1) DEFAULT NULL,
col82 varchar (+) DEFAULT NULL,
PRIMARY KEY (col1)
) Engine=innodb DEFAULT Charset=utf8 row_format=dynamic;

This is a real-world case from a real customer. All customer information has been hidden: the table and column names have been changed, key information has been removed, and all comments have been stripped. The table is poorly defined and has many problems; this article analyzes those problems and shows how to build the table better.
At a glance, the table's visible problems include:
1. Too many columns
2. Many columns are varchar types with a maximum length of 8 or less
3. Some varchar columns have a maximum length of several hundred

First, too many columns make each row very long. Usually only some of the fields are updated frequently, yet updating them becomes more expensive: long rows mean that a page stores fewer rows, and changing even one byte of a row requires reading the entire page where it is located. Because each page read brings in fewer rows, updating multiple rows may require reading more pages and writing more pages back. The cost of recording the binlog and the transaction log also increases (optimizations may reduce this partially). The buffer hit ratio for any given row drops sharply, and the performance of both queries and updates degrades.

Also, for InnoDB, long rows mean that when a table has a lot of rows (say, tens of millions of rows), the B+ tree is taller and searches take longer. Here is a simple calculation. Assume the table has 100 million rows and innodb_page_size=4k. Because InnoDB requires at least 2 rows of data on a page, the main table's B+ tree has at most 50 million leaf node pages. Assume an internal node page stores 200 records on average (each record stores {index key, child node pointer}). Then 200^h1 >= 5*10^7 gives a tree height of h1=4 (a tree consisting of a single root node has height 0), and the main table has about 58 million pages (the 50 million leaf pages plus roughly 8 million internal pages, counting a full tree of that height). This assumes that the InnoDB B+ tree balances itself well and does not produce subtrees whose heights differ by more than 1. If the table is instead designed so that each leaf node page stores 100 rows, there are only 1 million leaf node pages, so 200^h2 >= 10^6 gives h2=3, and the main table has about 1.04 million pages altogether. Row density and buffer hit rate increase significantly, and the shorter search path of the second tree saves, on average, one page read per search.
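
The two tree heights above can be checked with a quick query. This is only a sketch under the same assumptions as the text (100 million rows, 200 records per internal node page); the derived table `layouts` and the column aliases are made up for illustration.

```sql
-- Assumptions taken from the text: 1e8 rows, fan-out of 200 per internal page.
-- rows_per_leaf = 2 is the worst case for a 4k page; 100 is the better design.
SELECT
  rows_per_leaf,
  CEIL(1e8 / rows_per_leaf)           AS leaf_pages,
  -- smallest h such that 200^h >= leaf_pages
  CEIL(LOG(200, 1e8 / rows_per_leaf)) AS tree_height
FROM (SELECT 2 AS rows_per_leaf UNION ALL SELECT 100) AS layouts;
```

With these inputs the query should report heights of 4 and 3, matching h1 and h2. Changing the constant 200 shows how the fan-out, and therefore the key length, drives the height.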

Now repeat the calculation under an extreme assumption: the primary key is so long that an internal node page can store only 10 records. Then h1 becomes 8, and the B+ tree of the main table has about 10 million internal node pages and about 60 million pages in total; h2 becomes 6, and the B+ tree of the main table has about 0.11 million internal node pages and about 1.11 million pages in total. The latter's query and update efficiency is much higher than the former's. In other words, the key of any index should not be too long; otherwise the search path of the index becomes longer and buffer efficiency (hit rate) drops. The same holds for secondary indexes, and because their index rows also store the primary key, the size of the primary key affects the efficiency of the secondary indexes as well.

Finally, InnoDB has some optimizations for the storage of index keys, such as merging duplicate fields within the same index record and not repeating identical fields in adjacent index records. These improve the results calculated above to some extent, but not fundamentally. The structure of the InnoDB B+ tree itself is not covered further in this article.


2. According to the comments in the original SQL statement, many of these short varchar fields could use a more appropriate type.
In terms of storage efficiency, each short varchar field (maximum length of at most 255 bytes) needs 1 additional byte to record its length, and during query processing inside mysqld, computing on and handling numeric fields is more efficient than string fields. These differences are fairly minor, though. More important is ease of use: appropriate types make the subsequent development work more convenient. The numeric types exist for a reason and should be used where they fit.

These fields, which are less than 4 in length, fall into these categories:
A. Fields that store boolean values, that is, a binary state (yes/no). These should use the bool type.
B. Fields that store type values, such as a type, classification, category, level, or rank. These should use the enum type. Some people worry that not all enumeration values can be anticipated in advance, but ALTER TABLE can append new values to the end of the enum list by modifying the column definition in place (no table copy is required). Enumeration values are stored in the table as numbers, using at most 2 bytes per field. In particular, queries compare and match them as numbers rather than as strings, which can improve efficiency considerably.

C. Fields with a length of less than 12
The unsuitably typed fields here are mainly used to store dates and times. DATETIME should be used so that dates and times can be calculated and queried conveniently and correctly, and indexed properly. If time zones must be supported in particular, correct calculation is very difficult without a date type.
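
The three categories above can be sketched as follows. The table `typed_example` and all of its column names and enum values are hypothetical, since the customer's real column meanings were removed from the original statement.

```sql
-- Hypothetical table: one column per category (A: bool, B: enum, C: datetime).
CREATE TABLE typed_example (
  id         INT UNSIGNED NOT NULL AUTO_INCREMENT,
  is_active  BOOL NOT NULL DEFAULT FALSE,           -- A: yes/no state
  priority   ENUM('low', 'medium', 'high') DEFAULT NULL,  -- B: category value
  created_at DATETIME DEFAULT NULL,                 -- C: date and time
  PRIMARY KEY (id)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

-- Append a new enum value at the end of the list. ALGORITHM=INPLACE makes the
-- statement fail instead of silently falling back to a table copy.
ALTER TABLE typed_example
  MODIFY COLUMN priority ENUM('low', 'medium', 'high', 'critical') DEFAULT NULL,
  ALGORITHM=INPLACE;

-- Date range queries work directly on the DATETIME column.
SELECT id
FROM typed_example
WHERE created_at >= '2016-05-01' AND created_at < '2016-06-01';
```

The ALGORITHM clause assumes a server version with online DDL support (MySQL 5.6 or later); on older versions it would have to be dropped.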

3. In the dynamic row format, varchar columns with a maximum length of several hundred are stored off-page: they are not stored on the same page as the data row but on separate pages, so reading these fields requires access to more data pages. The underlying mechanism is not covered in detail here.

In fact, in the dynamic format, this off-page storage applies to fields longer than 40 bytes. When a query needs these columns, the more such fields there are, the less efficiently the query executes. Business queries should therefore not routinely use select *, especially when the table has many such large fields (varchar, blob, text), but should select only the columns they need. This keeps InnoDB from needlessly fetching these fields; bear in mind that each of them may require multiple disk I/O operations per row, especially when their pages are used inefficiently. Not all of these fields are actually stored off-page in a given row: fields are moved off-page from longest to shortest until the row length meets the requirement (at least two rows per page). As a result, a row may still be fairly long when the varchar columns have maximum lengths above 40.
[Figure: approximate structure of InnoDB pages and rows]
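
The advice about avoiding select * can be illustrated with the example table. This is a minimal sketch; which columns a real query needs is an assumption, and col6 and col9 are picked only because their declared lengths (6 and 9) are short enough to stay in the row.

```sql
-- Fetches every column, including any long ones stored off-page:
-- SELECT * FROM bad_table WHERE col1 = '000001';

-- Fetches only the short columns the caller actually needs.
SELECT col1, col6, col9
FROM bad_table
WHERE col1 = '000001';
```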

In this case, the user hit the problem because innodb_page_size=4k: the maximum possible row length of this table is greater than the maximum row size allowed on a 4k page (slightly less than 2k). The solutions include:
1. Split the table. Based on business requirements and the definition of the ER model, it is best to put the basic, static columns in one table and the frequently updated columns in another.
2. If the business really requires all of these columns to be defined in this one table, shorten the columns whose maximum length is less than 40 bytes (under utf8 in this case) until the table can be created successfully.
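
Solution 1 might look like the sketch below. The table names `item_static` and `item_dynamic`, the columns, and the foreign key are all hypothetical; the real split depends on the customer's ER model, which is not available here.

```sql
-- Rarely changing attributes live in one table...
CREATE TABLE item_static (
  item_id    VARCHAR(10) NOT NULL,
  name       VARCHAR(100) DEFAULT NULL,
  created_at DATETIME DEFAULT NULL,
  PRIMARY KEY (item_id)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

-- ...and frequently updated ones in another, sharing the same primary key.
CREATE TABLE item_dynamic (
  item_id    VARCHAR(10) NOT NULL,
  status     ENUM('open', 'closed') DEFAULT NULL,
  amount     DECIMAL(24,6) DEFAULT NULL,
  updated_at DATETIME DEFAULT NULL,
  PRIMARY KEY (item_id),
  CONSTRAINT fk_item_dynamic_static
    FOREIGN KEY (item_id) REFERENCES item_static (item_id)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

-- Join the two halves only when a query needs columns from both.
SELECT s.name, d.status, d.amount
FROM item_static AS s
JOIN item_dynamic AS d ON d.item_id = s.item_id
WHERE s.item_id = '000001';
```

Updates to `item_dynamic` then touch short rows on densely packed pages, while the wide static data is read only when it is actually needed.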
