Optimizing TEXT and BLOB types in the InnoDB storage engine


During database optimization we often find tables that were designed with TEXT or BLOB columns. Once such a table has grown to dozens or even hundreds of gigabytes, changing the schema or optimizing it becomes very difficult.

I. Introduction

To understand the performance impact of large fields, it is necessary to know how the InnoDB storage engine handles them:

1. Some knowledge points

1.1 Before InnoDB 1.0.x, the InnoDB storage engine provided two row formats for storing row data: Compact and Redundant (the Redundant format is kept only for compatibility with earlier versions). Together, Compact and Redundant are called Antelope.

For large columns such as BLOB, TEXT, or VARCHAR(5120), InnoDB stores only the first 768 bytes in the data page, and the remaining data goes to the overflow segment (when overflow actually occurs). Keeping up to 768 bytes in-page makes it convenient to build a prefix index (see the example after the list below); everything beyond that is stored in extra pages, even if it is only a single byte. Therefore, the shorter all columns are, the better.

    • Large columns can waste a lot of space in InnoDB. For example, if a stored value is only one byte longer than what fits in the row, an entire overflow page is used to store the remaining byte, wasting most of that page. Similarly, a value that only slightly exceeds the size of 32 pages may actually need 96 pages.

    • External (off-page) storage disables the adaptive hash index, because the entire column value must be compared to verify that the correct data was found (the hash helps InnoDB locate the "guessed position" very quickly, but it must then check whether that guess is correct). Since the adaptive hash index is a pure in-memory structure that points directly to the most frequently accessed pages in the buffer pool, it cannot be used for externally stored columns.
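As a small illustration of the prefix-index point above (a minimal sketch; the table and column names are hypothetical, not from the original article), a prefix index on a TEXT column only indexes the leading characters, which is exactly the part that Compact/Redundant keep in the data page:

-- Hypothetical table with one long column
CREATE TABLE article (
    id      INT AUTO_INCREMENT PRIMARY KEY,
    title   VARCHAR(200) NOT NULL,
    content TEXT
) ENGINE=InnoDB;

-- MySQL requires an explicit prefix length for TEXT/BLOB indexes;
-- here only the first 100 characters of content are indexed.
ALTER TABLE article ADD INDEX idx_content_prefix (content(100));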


1.2 The InnoDB plugin in MySQL 5.1 introduced a new file format, Barracuda, which adds two new row formats: Compressed and Dynamic. Both formats handle BLOB columns with full off-page storage: the data page keeps only a 20-byte pointer, and everything else is stored in the overflow segment. For this reason, using BLOB, TEXT, or VARCHAR columns longer than 255 is strongly discouraged.
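A minimal sketch of creating a table that uses a Barracuda row format (assuming MySQL 5.6, where innodb_file_format and innodb_file_per_table still have to be set; the table name is hypothetical):

-- Barracuda row formats require per-table tablespaces and the Barracuda file format
SET GLOBAL innodb_file_per_table = 1;
SET GLOBAL innodb_file_format = Barracuda;

-- With ROW_FORMAT=DYNAMIC the TEXT column is stored fully off-page,
-- leaving only a 20-byte pointer in the data page
CREATE TABLE doc_dynamic (
    id   INT AUTO_INCREMENT PRIMARY KEY,
    body TEXT
) ENGINE=InnoDB ROW_FORMAT=DYNAMIC;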


1.3 The InnoDB page size defaults to 16KB. Because InnoDB tables are index-organized tables and the leaf nodes at the bottom of the B+ tree form a doubly linked list, each page must hold at least two rows of records. This means a single row stored by InnoDB cannot exceed roughly 8KB; in practice the limit is a bit smaller, because some internal InnoDB data structures must also be stored. Since version 5.6, the page size can be changed with the new innodb_page_size option; in earlier versions it could only be changed by modifying the source code and recompiling. Changing this setting is not recommended.
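For reference, the page size of a running instance can be checked directly (it is a read-only variable at runtime):

SHOW VARIABLES LIKE 'innodb_page_size';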

1.4 When new data is written, an InnoDB data page reserves 1/16 of its space. The reserved space can be used by subsequently inserted records, reducing the overhead of frequently allocating new data pages. Given how InnoDB stores data, with perfectly sequential writes the ideal fill rate of a data page is 15/16; writes are usually not perfectly sequential, however, so the typical fill rate is between 1/2 and 15/16. For this reason, each InnoDB table should preferably have an auto-increment column as its primary key, so that new records are written as sequentially as possible. When a page's fill rate drops below 1/2, InnoDB shrinks it, freeing the unused space.
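A minimal sketch of that recommendation (hypothetical table name): with an auto-increment primary key, new rows are appended in key order, which keeps page fill rates close to the ideal case described above:

CREATE TABLE event_log (
    id         BIGINT AUTO_INCREMENT PRIMARY KEY,  -- monotonically increasing key => near-sequential inserts
    created_at DATETIME NOT NULL,
    payload    VARCHAR(255)
) ENGINE=InnoDB;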

1.5 Compared with Redundant, the Compact row format saves roughly 20% of storage space; Compressed can save up to about 50% compared with Compact, but it can cause TPS to drop by as much as 90%. Using the Compressed row format is therefore strongly discouraged.

1.6 If the BLOB data type is used, is the value necessarily stored in the overflow segment? It is usually assumed that large objects such as BLOBs are always stored outside the data page, but the key point is whether a page can still hold two rows. A BLOB can be stored entirely in the data page (when the row is no longer than 8096 bytes), and a VARCHAR value can also end up in overflow pages (when the row exceeds 8096 bytes, only the first 768 bytes stay in the data page).

1.7 MySQL operates on data in units of pages. Whether a row is updated, inserted, or deleted, the page containing that row must first be read into memory before the operation can proceed, so there is a hit-rate issue: if each page can hold relatively many rows, the hit rate will be higher and performance will improve.

1.8 The off-page pages used to store a BLOB, TEXT, or long VARCHAR column are exclusive to that column and cannot be shared. Therefore, using multiple long columns in one table is strongly discouraged.

1.9 Compact is the default row format in MySQL 5.6 and is currently the most widely used. You can run SHOW TABLE STATUS LIKE 'table_name' to view the row format used by a table; the Row_format column shows the row record structure currently in use.


mysql> desc db_page;
+---------+--------------+------+-----+---------+----------------+
| Field   | Type         | Null | Key | Default | Extra          |
+---------+--------------+------+-----+---------+----------------+
| id      | int(11)      | NO   | PRI |         | auto_increment |
| title   | varchar(...) | NO   |     |         |                |
| name    | varchar(...) | YES  |     |         |                |
| content | text         | YES  |     |         |                |
+---------+--------------+------+-----+---------+----------------+

mysql> show variables like "innodb_file_format";
+--------------------+-----------+
| Variable_name      | Value     |
+--------------------+-----------+
| innodb_file_format | Barracuda |
+--------------------+-----------+

mysql> show table status like "db_page" \G
*************************** 1. row ***************************
           Name: db_page
         Engine: InnoDB
        Version: 10
     Row_format: Compact
           Rows: 2
 Avg_row_length: 8192
    Data_length: 16384
Max_data_length: 0
   Index_length: 0
      Data_free: 0
 Auto_increment: 3
    Create_time: 2017-03-07 13:30:19
    Update_time:
     Check_time:
      Collation: utf8_general_ci
       Checksum:
 Create_options:
        Comment:
   Block_format: original


In MySQL 5.7.9 and later, the default row format is determined by the innodb_default_row_format variable, whose default value is DYNAMIC; the row format can also be specified at table creation time with ROW_FORMAT=DYNAMIC.

Note that if you want to change the row format of an existing table to Compressed or Dynamic, you must first set the file format to Barracuda with set global innodb_file_format=Barracuda; and then run ALTER TABLE tablename ROW_FORMAT=COMPRESSED; for the change to take effect; otherwise the change is silently ignored with no warning.
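Putting the two statements together, a sketch of the full sequence (tablename is a placeholder, as in the text above; KEY_BLOCK_SIZE is optional and shown only as an example value):

SET GLOBAL innodb_file_format = Barracuda;
ALTER TABLE tablename ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=8;
-- Verify that the change took effect: Row_format should now read Compressed
SHOW TABLE STATUS LIKE 'tablename' \G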

II. How each row format affects large TEXT/BLOB types

2.1 Compact

Variable-length large field types include BLOB, TEXT, and VARCHAR; a VARCHAR value is also stored in overflow pages once it exceeds a certain length N. Under the latin1 character set, N can be estimated as follows: the InnoDB page size defaults to 16KB, and because InnoDB tables are index-organized tables whose leaf nodes form a doubly linked list, each page must hold at least two rows, so a single row cannot exceed about 8KB; subtracting the bytes used by the other columns gives roughly N. For InnoDB, memory is extremely valuable: keeping the 768-byte prefix of a BLOB in the data page saves some IO, but it also means fewer rows, and therefore fewer index values, can be cached, reducing index efficiency.
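A rough worked example under these assumptions (hypothetical column sizes, and the commonly cited row-size limit of about 8126 bytes for 16KB pages): if a row also contains an INT id (4 bytes) and a latin1 VARCHAR(255) name (up to roughly 256 bytes including the length byte), then N is approximately 8126 - 4 - 256 ≈ 7866 bytes, and a VARCHAR value longer than that will be pushed to overflow pages under Compact.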

2.2 Dynamic

With the Dynamic row format, whether a column is stored off-page depends mainly on the row size: the longest columns in the row are moved off-page until the data page can hold two rows. TEXT/BLOB values of 40 bytes or less are always stored in the data page. This avoids the Compact problem of putting too many large column prefixes into the B-tree node: the Dynamic format assumes that if any part of a large value has to go off-page, it is more efficient to store the entire value off-page.

The physical structure of Compressed is similar to Dynamic, but the table's data is compressed with the zlib algorithm. When there are many long BLOB columns, this can reduce the use of off-page storage and reduce storage space (generally by around 40%), but it demands more CPU, and the buffer pool may hold both the compressed and uncompressed versions of a page, so it also uses extra memory. The MySQL 5.6 manual section on InnoDB compression internals explains this very clearly.

In addition, because ROW_FORMAT=DYNAMIC and ROW_FORMAT=COMPRESSED are derived from ROW_FORMAT=COMPACT, they handle CHAR-type storage in the same way as COMPACT.

III. Optimizing access to TEXT/BLOB fields

MySQL performs IO in units of pages, so unneeded data (the large fields) is read into memory along with the data that actually needs to be operated on. Because these rows are much larger than ordinary rows, memory is used less efficiently, which leads to more random reads. The analysis above shows that the performance bottleneck comes from storing large fields in the data page: memory utilization drops and random reads increase. So how can the impact of large fields be reduced?

3.1 Compression & Consolidation

a. InnoDB's Barracuda file format stores large fields entirely in the overflow segment, leaving only 20 bytes in the data page. This greatly reduces the space a row consumes in the data page, so each page can hold more rows and the memory hit rate improves (for this instance, most rows are shorter than 8KB, so the improvement is limited). If the data in the overflow segment is also compressed, space usage drops sharply; the compression ratio can be tuned by setting KEY_BLOCK_SIZE.
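A minimal sketch (hypothetical table name) of enabling table compression; KEY_BLOCK_SIZE is the compressed page size in KB, with 8 or 4 being common choices for 16KB pages:

CREATE TABLE big_doc (
    id   INT AUTO_INCREMENT PRIMARY KEY,
    body LONGBLOB
) ENGINE=InnoDB ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=8;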

b. You can compress large fields with COMPRESS() and store them as BLOBs, or compress them in the application before sending them to MySQL.
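A minimal sketch of compressing in SQL (hypothetical table and column names; the column should be a binary type such as LONGBLOB because COMPRESS() returns a binary string):

CREATE TABLE article_zip (
    id       INT PRIMARY KEY,
    body_zip LONGBLOB
) ENGINE=InnoDB;

SET @long_text = REPEAT('some long content ', 5000);
INSERT INTO article_zip (id, body_zip) VALUES (1, COMPRESS(@long_text));

-- Decompress when reading
SELECT UNCOMPRESS(body_zip) AS body FROM article_zip WHERE id = 1;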

c. If a table has several BLOB-like fields, combine them into a single value such as <TEXT><f_big_col1>long..</f_big_col1><f_content>long..</f_content></TEXT> and then compress it before storing.
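A sketch of that consolidation (assuming a hypothetical table that still has the separate long columns plus a LONGBLOB column merged_zip to hold the combined, compressed value; CONCAT returns NULL if any argument is NULL, so the long columns are assumed NOT NULL here):

UPDATE article_merged
   SET merged_zip = COMPRESS(CONCAT(
           '<TEXT><f_big_col1>', f_big_col1, '</f_big_col1>',
           '<f_content>', f_content, '</f_content></TEXT>'))
 WHERE id = 1;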

d. If the expected length fits within VARCHAR's range, avoid using TEXT.

3.2 Splitting

Split the main table and move the large fields into a separate table with a one-to-one relationship. The rows of the main table become much shorter, the page density becomes much higher than in the original table, enough rows can be cached, and the buffer pool hit rate rises. The extra cost is that the application must maintain the separate table holding the large fields. Access can also be optimized with covering indexes: separating the needed columns into an index turns access to low-density data pages into access to high-density index pages, and turns random IO into sequential IO.
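A minimal sketch of such a split (hypothetical table and column names): the short, hot columns stay in the main table, and the long column moves to a 1:1 child table that is read only when actually needed:

-- Main table: short rows, high page density
CREATE TABLE post (
    id    INT AUTO_INCREMENT PRIMARY KEY,
    title VARCHAR(200) NOT NULL
) ENGINE=InnoDB;

-- Child table holding only the large field, keyed by the same id
CREATE TABLE post_content (
    post_id INT PRIMARY KEY,
    content TEXT,
    CONSTRAINT fk_post_content FOREIGN KEY (post_id) REFERENCES post (id)
) ENGINE=InnoDB;

-- Join in the large field only for queries that really need it
SELECT p.title, c.content
FROM post p
JOIN post_content c ON c.post_id = p.id
WHERE p.id = 1;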

Summary: the goal is still to let a single page hold enough rows and to keep improving the memory hit rate. Understanding the underlying storage principles of the database makes deeper optimization possible.

In summary, if your business really does need to store BLOB, TEXT, or long VARCHAR columns in InnoDB tables, here are some suggestions:

    • Whenever possible, serialize and compress the data and store it all in a single column, to avoid multiple off-page chains

    • If the expected length fits within VARCHAR's range, avoid using TEXT

    • If you cannot consolidate everything into a single column, fall back to the next best option: split the table into several child tables according to the maximum length of each column, keeping the total row length of each child table under 8KB as far as possible, to reduce how often off-page storage occurs

