[Mysql] storage and optimization of text blob large fields in InnoDB Storage engine, innodbblob
Recently, during database optimization, we can see that some tables use text or blob fields in design, and the storage space of a single table has reached nearly 100 GB, it is very difficult to change and optimize this situation.
I. Introduction
To understand the impact of large fields on performance, we must know the processing method of the innodb Storage engine:
1. Some knowledge points
1.1 before InnoDB 1.0.x, the InnoDB Storage engine providesCompact
AndRedundant (the Redundant format is reserved for compatibility with previous versions)
The compact and redundant formats are used to store row record data.Antelope (Antelope)
For large fields such as blob, text, and varchar (5120), innodb only stores the first 768 bytes on the data page, the remaining data is stored in the overflow segment (applicable when overflow occurs). The maximum value of 768 bytes is to facilitate the creation of prefix indexes and prefix indexes, the rest of the content is stored in an additional page, even if only one byte is added. Therefore,The shorter the length of all columns, the better.
- Large fields may waste a lot of space in InnoDB. For example, if the field value is only one byte more than the row value, the entire page is used to store the remaining bytes, which wastes most of the page space. If there is a value that only slightly exceeds the size of 32 pages, you actually need to use 96 pages
- Extended storage disables adaptive hashing, because the entire length of the entire column needs to be compared to find the correct data (hash helps InnoDB quickly find the "Location of speculation ", but you must check whether the "Location of the guess" is correct ). Because Adaptive Hash is a complete memory structure and direct to the Buffer Pool to access the "most" frequently accessed pages, Adaptive Hash cannot be used for extended storage space.
1.2 innodb_plugin MySQL 5.1 introduces a newFile Format:Barracuda)
The file format has two new row formats:compressed
AndDynamic, the two formats completely overflow blob fields. The data page only stores 20 bytes, and the rest are stored in overflow segments. ThereforeWe do not recommend that you use BLOB, TEXT, or VARCHAR column types with a length greater than 255.;
1.3 The default page size of innodb is 16 KB. the innodb Storage engine table is the index organization table, and the leaf node at the bottom of the tree is a two-way linked list. Therefore, each page should have at least two rows of records, this determines that innodb cannot store more than 8 KB of data in a row, but it should be smaller, because there are some InnoDB internal data structures to be stored. Versions later than 5.6, the new option innodb_page_size can be modified. In versions earlier than 5.6, you can only modify the source code and re-compile it. However, this configuration is not recommended.
1.4 when new data is written to the data page of InnoDB, 1/16 of the space is reserved. The reserved space can be used for subsequent writing of new records, reducing the overhead of frequent new data pages, limited by the InnoDB Storage method, if data is written in sequence, the data page filling rate is 15/16 in the ideal case, but generally there is no way to ensure full sequential writing, therefore, the filling rate of data page is generally 1/2 to 15/16. ThereforeIt is recommended that each InnoDB table have an auto-incrementing column as the primary key.So that new records are written in sequence as much as possible. When the data page filling rate is less than 1/2, InnoDB will shrink and free up space.
Compared with REDUNDANT, the 1.5 COMPACT row format can save about 20% of the storage space, while COMPRESSED can save about 50% of the storage space compared with COMPACT, but may cause a 90% decrease in TPS. ThereforeCOMPRESSED row format is strongly not recommended
1.6 If the blob data type is used, will it be stored in the overflow section? We usually think that the storage of large objects such as blob will store data outside the data page. Otherwise, the key point is whether a page can store two rows of data, blob can be fully stored on data pages (a single row cannot exceed 8096 bytes), and varchar types may also be stored on overflow pages (a single row can contain more than 8096 bytes, the first 768 bytes are stored on the data page)
1.7 when operating data, mysql reads the page of the row of data, whether it is Update, insert, or delete, into the memory, and then performs operations, in this way, there is a hit rate problem. If a page can store enough rows, the hit rate will be relatively high and the performance will be improved.
1.8 The pages of BLOB, TEXT, or long VARCHAR columns stored in off-page are exclusive and cannot be shared. ThereforeWe strongly recommend that you do not use multiple long columns in a table.
In MySQL 1.9, the default FORMAT is Compact, which is also the most commonly used row format. You can runSHOW TABLE STATUS LIKE'table_name'
To view the row format used by the current table.Row_formatColumn indicates the type of the currently used Row record Structure
mysql>desc db_page;+-----------------+----------------+----------------+---------------+-------------------+-----------------+| Field | Type | Null | Key | Default | Extra |+-----------------+----------------+----------------+---------------+-------------------+-----------------+| id | int(11) | NO | PRI | | auto_increment || title | varchar(100) | NO | | | || name | varchar(100) | YES | | | || content | text | YES | | | |+-----------------+----------------+----------------+---------------+-------------------+-----------------+mysql>show variables like "innodb_file_format";+-------------------------+-----------------+| Variable_name | Value |+-------------------------+-----------------+| innodb_file_format | Barracuda |+-------------------------+-----------------+mysql>show table status like "db_page" \G*************************** 1. row *************************** Name: db_page Engine: InnoDB Version: 10 Row_format: Compact Rows: 2 Avg_row_length: 8192 Data_length: 16384Max_data_length: 0 Index_length: 0 Data_free: 0 Auto_increment: 3 Create_time: 2017-03-07 13:30:19 Update_time: Check_time: Collation: utf8_general_ci Checksum: Create_options: Comment: Block_format: Original
In msyql 5.7.9 and later versions, the default row format isinnodb_default_row_format
The default value isDYNAMIC
You can also specifyROW_FORMAT=DYNAMIC
.
Note: If you want to modify the row mode of an existing tablecompressed
Ordynamic
, You must first set the file format to Barracuda:set global innodb_file_format=Barracuda;
, And then useALTER TABLE tablename ROW_FORMAT=COMPRESSED;
Modify to take effect, otherwise the modification is invalid but there is no prompt
Ii. Impact on large fields such as TEXT and BLOB: 2.1 compact
The types of variable-length fields include blob, text, and varchar. If the length of the varchar column is greater than N, the overflow page is saved. The N value under the latin1 character set can be calculated as follows: the default size of innodb blocks is 16 KB. Because the innodb Storage engine table is an index organization table and the leaf node at the bottom of the tree is a two-way linked list, each page should have at least two rows of records, this determines that innodb cannot store a row of data more than 8 k, minus the number of bytes occupied by other column values, which is approximately equal to N. For InnoDB, the memory is extremely precious. If we put 768 bytes of blob on the data page, although it can save some IO, the number of rows that can be cached is reduced, that is, the number of indexes that can be cached is reduced, reducing the index efficiency.
2.2 dynamic
In the dynamic row format, whether the column store is placed on the off-page depends on the row size. It places the longest column in the row to off-page, until two rows can be stored on the data page. TEXT/BLOB columns <= 40 bytes are always stored on the data page. This method can avoid putting too many big column values in B-tree Node as compact does, because dynamic considers that as long as some of the big column values are placed in off-page, it is more effective to put the entire value into off-page.
CompressedThe physical structure is similar to dynamic, but the zlib algorithm is used to compress and store data rows in the table. When long blob columns are of many types, the off-page usage can be reduced and the storage space can be reduced (about 40% in general), but the CPU needs to be higher, the buffer pool may store both the compressed and non-compressed versions of data, so it also occupies part of the memory. MySQL 5.6 Manual innodb-compression-internals is very clear here.
In addition, becauseROW_FORMAT=DYNAMIC
AndROW_FORMAT=COMPRESSED
YesfromROW_FORMAT=COMPACT
They handle the changes.CHAR
Type storage is the same as COMPACT.
3. Optimize access to TEXT/BLOB fields
Mysql io is in the unit of page, so unnecessary data (large fields) will be read to the memory along with the data to be operated, this causes a problem because large fields occupy a large amount of memory (compared to other small fields), which leads to poor memory utilization and more random reads. From the above analysis, we have seen that the performance bottleneck is that large fields are stored in the data page, resulting in poor memory utilization and excessive random reads, how can we optimize the impact of this large field?
3.1 compression & merge
A. innodb provides the barracuda file format, which stores all large fields in the overflow segment and only 20 bytes in the data segment. This greatly reduces the space usage of data pages, so that a data page can store more data rows, it increases the memory hit rate (for this instance, the length of most rows does not exceed 8 k, so the optimization is limited ); if the data in the overflow segment is compressed, the space usage will be greatly reduced. The specific compression ratio can be set to key_blok_size.
B. You can use COMPRESS () to COMPRESS large fields and save them as BLOB, or COMPRESS them in the application before sending them to MySQL.
C. A table has multiple blob-like fields.<TEXT><f_big_col1>long..</f_big_col1> <f_content>long..</f_content></TEXT>
And then compress the storage.
D. Avoid TEXT if the expected length range is varchar.
3.2 split
Split the master table into two one-to-one joined tables. After placing large fields in another table, the length of a single row becomes very small, the row density of the page is much larger than that of the original table, so that enough rows can be cached, And the hit rate of the buffer pool will be increased, the application needs to maintain a large-field sub-table. It can also be optimized by overwriting the index to separate the index from the original table structure, from data pages with low access density to index pages with high access density, random io is converted to sequential io
Conclusion: We can let a single page store enough rows and constantly prompt the memory hit rate. Starting from the principle of the underlying database storage, we can deeply optimize the database.
To sum up, if you really need to store BLOB, TEXT, and long VARCHAR columns in the InnoDB table in actual business, we have the following suggestions:
Store all data in the same column after serialization and compression to avoid multiple off-page operations.
Avoid TEXT if the expected length range is varchar.
If you cannot integrate all columns into one column, you can sort and combine them according to the maximum length of each column and split them into multiple sub-tables, make sure that the total row length of each sub-table is smaller than 8 KB to reduce the frequency of off-page operations.
Reference:
Http://www.hudong.com/wiki/%E3%80%8AMySQL%E6%8A%80%E6%9C%AF%E5%86%85%E5%B9%95%EF%BC%9AInnoDB%E5%AD%98%E5%82%A8%E5%BC%95%E6%93%8E%E3%80%8B
Http://www.mysqlperformanceblog.com/2008/01/11/mysql-blob-compression-performance-benefits/
Http://www.mysqlperformanceblog.com/2012/05/30/data-compression-in-innodb-for-text-and-blob-fields/
Http://yoshinorimatsunobu.blogspot.com/2010/11/handling-long-textsblobs-in-innodb-1-to.html
Http://blog.opskumu.com/mysql-blob.html
Http://hidba.org /? P = 551
Http://blog.chinaunix.net/uid-24485075-id-3523032.html
Http://dev.mysql.com/doc/refman/5.6/en/innodb-row-format-dynamic.html