How to improve the storage efficiency of BLOB columns in InnoDB tables in MySQL, innodbblob
First, we will introduce several key points about the InnoDB Engine storage format:
1. InnoDB supports the use of shared or independent tablespaces. We recommend that you use independent tablespaces for ease of management and maintenance. Enable the innodb_file_per_table option. After 5.5, You can dynamically modify the TABLE to take effect online and execute alter table xx ENGINE = InnoDB to convert the existing TABLE to an independent tablespace. After you modify this option earlier than 5.5, A restart is required to take effect;
2. The default data page of InnoDB is 16 KB. After MySQL 5.6, the innodb_page_size option can be modified. In versions earlier than MySQL 5.6, the source code can only be modified and re-compiled, but this configuration is not recommended, unless you are very clear about its advantages and disadvantages;
3. When new data is written to the InnoDB data page, 1/16 of the space is reserved. The reserved space can be used for subsequent New Record writing to reduce the overhead of frequent new data pages;
4. Each data page must store at least two rows of records. Therefore, the maximum length of the theoretical upstream record is 8 KB, but in fact it should be smaller, because some InnoDB internal data structures should be stored;
5. Limited by the InnoDB Storage method, if data is written in sequence, the data page filling rate is 15/16 in the ideal case, but generally there is no way to ensure full sequential writing. Therefore, data page filling rate is generally 1/2 to 15/16. Therefore, it is best for each InnoDB table to have an auto-incrementing column as the primary key so that new records are written in sequence as much as possible;
6. When the data page filling rate is less than 1/2, InnoDB will contract and free up space;
7. MySQL 5.6 InnoDB engine currently supports four formats: COMPACT, REDUNDANT, DYNAMIC, and COMPRESSED. The default format is COMPACT. COMPRESSED is rarely used and is not recommended (see the next one ), if you need to use the compression feature, you can directly consider the TokuDB engine;
8. Compared with REDUNDANT, the COMPACT row format can save about 20% of the storage space, while COMPRESSED can save about 50% of the storage space compared with COMPACT, but may cause a 90% decrease in TPS. Therefore, the COMPRESSED row format is strongly not recommended;
9. When the row format is DYNAMIC or COMPRESSED, long columns such as TEXT/BLOB may also be other long columns, not necessarily only of the TEXT/BLOB type, depends on the specific situation) will be fully stored in an independent data page, clustered index page uses only 20 bytes pointer to point to the new page, this is called off-page, similar to ORACLE row migration, the disk space is wasted heavily and the I/O performance is poor. Therefore, we strongly recommend that you do not use BLOB, TEXT, or VARCHAR column types that exceed 255 characters;
10. When the InnoDB file format (innodb_file_format) is set to Antelope and the row format is COMPACT or REDUNDANT, BLOB, TEXT, or long VARCHAR columns only store the first 768 bytes on the clustered index page (the maximum 768 bytes is used to facilitate the creation of prefix indexes/prefix indexes ), the rest of the content is stored in an additional page, even if only one byte is added. Therefore, the shorter the length of all columns, the better;
11. The page of BLOB, TEXT, or long VARCHAR columns stored in off-page is exclusive and cannot be shared. Therefore, it is strongly not recommended to use multiple long columns in a table.
To sum up, if you really need to store BLOB, TEXT, and long VARCHAR columns in the InnoDB table in actual business, we have the following suggestions:
1. serialize and compress all data as much as possible and store the data in the same column to avoid multiple off-pages;
2. convert a column whose actual maximum storage length is less than 255 to the VARCHAR or CHAR type. (if the data is variable-length, there is no difference between them. If the data is fixed-length, the CHAR type is used );
3. If you cannot integrate all columns into one column, you can sort and combine them according to the maximum length of each column and split them into multiple sub-tables, the total row length of each sub-table should be smaller than 8 KB to reduce the frequency of off-page;
4. We recommend that you modify the data page size to 8 kb or another size based on the above theory to find the most suitable value;
5. When the bytes column length is less than 255, no matter whether CHAR or VARCHAR is used for storage, or the VARCHAR column length is defined as 255, the actual tablespace will not increase;
6. Generally, a large number of BLOB column types are used in the game industry.
The following is the test and verification process. You can take a look at it with patience:
# Test Case: InnoDB long column storage efficiency # test scenario description: # store 64 KB data in the InnoDB table, compare different storage methods # Write 5000 rows of records per table, compare the last tablespace file size # TABLE 0: all data is stored in the create table 't_ longcol_0 '('id' int (10) unsigned not null AUTO_INCREMENT, 'longcol' blob not null comment' store all data in a blob column ', primary key ('id') ENGINE = InnoDB default charset = utf8 ROW_FORMAT = COMPACT; # Data Writing and storage process: mysp_longcol_0_ins () create procedure 'mysp _ longcol_0_ins '(in cnt int) beginset @ I = 1; while @ I <cnt doinsert into t_longcol_0 (longcol) select repeat ('A', 65535); set @ I = @ I + 1; end while; end; # table 1: store 64 kB bytes in 9 columns on average create table 't_ longcol_1 '('id' int (10) unsigned not null AUTO_INCREMENT, 'longcol1' blob not null comment' store all data in 9 blob columns ', 'longcol2' blob not null, 'longcol3' blob not null, 'longcol4' blob not null, 'longcol5' blob not null, 'longcol6' blob not null, 'longcol7' blob not null, 'longcol8' blob not null, 'longcol9' blob not null, primary key ('id') ENGINE = InnoDB default charset = utf8; # corresponding data writing stored PROCEDURE: mysp_longcol_ins ins () create procedure 'mysp _ longcol_1_ins '(in cnt int) beginset @ I = 1; while @ I <cnt doinsert into t_longcol_1 (longcol1, longcol2, longcol3, longcol4, longcol5, longcol6, longcol7, longcol8, longcol9) selectrepeat ('A ', 7500), repeat ('A', 7500), repeat ('A', 7500), repeat ('A', 7500), repeat ('A', 7500 ), repeat ('A', 7500), repeat ('A', 7500), repeat ('A', 7500), repeat ('A', 5535 ); set @ I = @ I + 1; end while; end; # Table 2: store 64 KB data discretely in multiple BLOB columns create table 't_ longcol_2 '('id' int (10) unsigned not null AUTO_INCREMENT, 'longcol1' blob not null comment' store 100 bytes data', 'longcol2' blob not null comment' store 100 bytes data ', 'longcol3' blob not null comment' store 100 bytes data', 'longcol4' blob not null comment' store 100 bytes data ', 'longcol5' blob not null comment' store 100 bytes data', 'longcol6' blob not null comment' store 255 bytes data ', 'longcol7' blob not null comment' store 368 bytes data', 'longcol8' blob not null comment' store 496 bytes data ', 'longcol9' blob not null comment' store 512 bytes data', 'longcol10' blob not null comment' store 640 bytes data ', 'longcol11' blob not null comment' store 768 bytes data', 'longcol12' blob not null comment' store 912 bytes data ', 'longcol13' blob not null comment' store 1024 bytes data', 'longcol14' blob not null comment' store 2048 bytes data ', 'longcol15' blob not null comment' store 3082 bytes data', 'longcol16' blob not null comment' store 4096 bytes data ', 'longcol17' blob not null comment' store 8192 bytes data', 'longcol18' blob not null comment' store 16284 bytes data ', 'longcol19' blob not null comment' store 20380 bytes data', 'longcol20' blob not null comment' store 5977 bytes data', primary key ('id ')) ENGINE = InnoDB default charset = utf8; # corresponding data writing stored PROCEDURE: mysp_longcol_ins ins () create procedure 'mysp _ longcol_1_ins '(in cnt int) beginset @ I = 1; while @ I <cnt doinsert into partition (longcol1, longcol2, longcol3, longcol4, longcol5, longcol6, longcol7, longcol8, week, longcol10, longcol11, longcol12, longcol13, longcol14, longcol15, longcol16, longcol17, longcol18, longcol19, longcol20) selectrepeat ('a, 100), repeat ('A', 100), repeat ('A', 100 ), repeat ('A', 100), repeat ('A', 100), repeat ('A', 256), repeat ('A', 368 ), repeat ('A', 496), repeat ('A', 512), repeat ('A', 640), repeat ('A', 768 ), repeat ('A', 912), repeat ('A', 1024), repeat ('A', 2048), repeat ('A', 3082 ), repeat ('A', 4096), repeat ('A', 8192), repeat ('A', 16284), repeat ('A', 20380 ), repeat ('A', 5977); set @ I = @ I + 1; end while; end; # Table 3: store 64 KB data discretely in multiple CHAR, VARCHAR, BLOB columns create table 't_longcol_3 '('id' int (10) unsigned not null AUTO_INCREMENT, 'longcol1' char (100) not null default ''comment' store 100 bytes data', 'longcol2' char (100) not null default ''comment' store 100 bytes data ', 'longcol3' char (100) not null default ''comment 'store 100 bytes data', 'longcol4' char (100) not null default ''comment 'store 100 bytes data ', 'longcol5' char (100) not null default ''' COMMENT 'store 100 bytes data', 'longcol6' varchar (256) not null default ''' COMMENT 'store 255 bytes data ', 'longcol7' varchar (368) not null default ''' COMMENT 'store 368 bytes data', 'longcol8' varchar (496) not null default ''comment 'store 496 bytes data ', 'longcol9' varchar (512) not null default ''' COMMENT 'store 512 bytes data', 'longcol10' varchar (640) not null default ''comment 'store 640 bytes data ', 'longcol11' varchar (768) not null default ''' COMMENT 'store 768 bytes data', 'longcol12' varchar (912) not null default ''comment 'store 912 bytes data ', 'longcol13' varchar (1024) not null default ''' COMMENT 'store 1024 bytes data', 'longcol14' varchar (2048) not null default ''comment 'store 2048 bytes data ', 'longcol15' varchar (3082) not null default ''' COMMENT 'store 3082 bytes data', 'longcol16' varchar (4096) not null default ''comment 'store 4096 bytes data ', 'longcol17' blob not null comment' store 8192 bytes data', 'longcol18' blob not null comment' store 16284 bytes data ', 'longcol19' blob not null comment' store 20380 bytes data', 'longcol20' varchar (5977) not null default ''comment' store 5977 bytes data ', primary key ('id') ENGINE = InnoDB default charset = utf8; # corresponding data writing stored PROCEDURE: mysp_longcol_3_ins () create procedure 'mysp _ longcol_1_ins' (in cnt int) beginset @ I = 1; while @ I <cnt doinsert into t_longcol_3 (longcol1, longcol2, longcol3, longcol4, longcol5, longcol6, longcol7, longcol8, longcol9, longcol10, longcol11, longcol12, longcol13, longcol14, longcol15, longcol16, longcol17, longcol18, longcol19, longcol20) selectrepeat ('A', 100), repeat ('A', 100), repeat ('A ', 100), repeat ('A', 100), repeat ('A', 100), repeat ('A', 256), repeat ('A', 368 ), repeat ('A', 496), repeat ('A', 512), repeat ('A', 640), repeat ('A', 768 ), repeat ('A', 912), repeat ('A', 1024), repeat ('A', 2048), repeat ('A', 3082 ), repeat ('A', 4096), repeat ('A', 8192), repeat ('A', 16284), repeat ('A', 20380 ), repeat ('A', 5977); set @ I = @ I + 1; end while; end; # Table 4: 64 KB data is stored discretely in multiple VARCHAR and BLOB columns. When several columns in t_longcol_3 are CHAR, create table 't_ longcol_4 ('id' int (10) unsigned not null AUTO_INCREMENT, 'longcol1' varchar (100) not null default ''comment' store 100 bytes data', 'longcol2' varchar (100) not null default ''comment' store 100 bytes data', 'longcol3' varchar (100) not null default ''comment' store 100 bytes data', 'longcol4' varchar (100) not null default ''comment' store 100 bytes data', 'longcol5' varchar (100) not null default ''comment' store 100 bytes data', 'longcol6' varchar (256) not null default ''comment' store 255 bytes data', 'longcol7' varchar (368) not null default ''comment' store 368 bytes data', 'longcol8' varchar (496) not null default ''comment' store 496 bytes data', 'longcol9' varchar (512) not null default ''comment' store 512 bytes data', 'longcol10' varchar (640) not null default ''comment' store 640 bytes data', 'longcol11' varchar (768) not null default ''comment' store 768 bytes data', 'longcol12' varchar (912) not null default ''comment' store 912 bytes data', 'longcol13' varchar (1024) not null default ''comment' store 1024 bytes data', 'longcol14' varchar (2048) not null default ''comment' store 2048 bytes data', 'longcol15' varchar (3082) not null default ''comment' store 3082 bytes data', 'longcol16' varchar (4096) not null default ''comment' store 4096 bytes data', 'longcol17' blob not null comment' store 8192 bytes data ', 'longcol18' blob not null comment' store 16284 bytes data', 'longcol19' blob not null comment' store 20380 bytes data', 'longcol20' varchar (5977) not null default ''comment 'store 5977 bytes data', primary key ('id') ENGINE = InnoDB default charset = utf8; # corresponding data writing stored procedure: mysp_longcol_4_ins () create procedure 'mysp _ longcol_1_ins '(in cnt int) beginset @ I = 1; while @ I <cnt doinsert into t_longcol_4 (longcol1, longcol2, longcol3, longcol4, longcol5, longcol6, longcol7, longcol8, longcol9, longcol10, longcol11, longcol12, longcol13, longcol14, longcol15, longcol16, longcol17, longcol18, longcol19, longcol20) selectrepeat ('a, 100 ), repeat ('A', 100), repeat ('A', 100), repeat ('A', 100), repeat ('A', 100 ), repeat ('A', 256), repeat ('A', 368), repeat ('A', 496), repeat ('A', 512 ), repeat ('A', 640), repeat ('A', 768), repeat ('A', 912), repeat ('A', 1024 ), repeat ('A', 2048), repeat ('A', 3082), repeat ('A', 4096), repeat ('A', 8192 ), repeat ('A', 16284), repeat ('A', 20380), repeat ('A', 5977); set @ I = @ I + 1; end while; end; # Table 5: 64 KB data is stored discretely in multiple VARCHAR and BLOB columns. Compared with t_longcol_4, the length of the preceding columns is changed to 255, but the actual storage length is still 100 bytes create table 't_ longcol_5 '('id' int (10) unsigned not null AUTO_INCREMENT, 'longcol1' varchar (255) not null default ''comment' store 100 bytes data', 'longcol2' varchar (255) not null default ''comment' store 100 bytes data', 'longcol3' varchar (255) not null default ''comment' store 100 bytes data', 'longcol4' varchar (255) not null default ''comment' store 100 bytes data', 'longcol5' varchar (255) not null default ''comment' store 100 bytes data', 'longcol6' varchar (256) not null default ''comment' store 255 bytes data', 'longcol7' varchar (368) not null default ''comment' store 368 bytes data', 'longcol8' varchar (496) not null default ''comment' store 496 bytes data', 'longcol9' varchar (512) not null default ''comment' store 512 bytes data', 'longcol10' varchar (640) not null default ''comment' store 640 bytes data', 'longcol11' varchar (768) not null default ''comment' store 768 bytes data', 'longcol12' varchar (912) not null default ''comment' store 912 bytes data', 'longcol13' varchar (1024) not null default ''comment' store 1024 bytes data', 'longcol14' varchar (2048) not null default ''comment' store 2048 bytes data', 'longcol15' varchar (3082) not null default ''comment' store 3082 bytes data', 'longcol16' varchar (4096) not null default ''comment' store 4096 bytes data ', 'longcol17' blob not null comment' store 8192 bytes data', 'longcol18' blob not null comment' store 16284 bytes data ', 'longcol19' blob not null comment' store 20380 bytes data', 'longcol20' varchar (5977) not null default ''comment' store 5977 bytes data ', primary key ('id') ENGINE = InnoDB default charset = utf8; # corresponding data writing stored PROCEDURE: mysp_longcol_5_ins () create procedure 'mysp _ longcol_1_ins' (in cnt int) beginset @ I = 1; while @ I <cnt doinsert into values (longcol1, longcol2, longcol3, longcol4, longcol5, longcol6, longcol7, longcol8, longcol9, longcol10, longcol11, longcol12, longcol13, longcol14, longcol15, longcol16, longcol17, longcol18, longcol19, longcol20) selectrepeat ('A', 100), repeat ('A', 100), repeat ('A ', 100), repeat ('A', 100), repeat ('A', 100), repeat ('A', 256), repeat ('A', 368 ), repeat ('A', 496), repeat ('A', 512), repeat ('A', 640), repeat ('A', 768 ), repeat ('A', 912), repeat ('A', 1024), repeat ('A', 2048), repeat ('A', 3082 ), repeat ('A', 4096), repeat ('A', 8192), repeat ('A', 16284), repeat ('A', 20380 ), repeat ('A', 5977); set @ I = @ I + 1; end while; end; # from below, refer to the 3rd suggestions for table sharding, total length of all columns in each table # Table sharding 1, the maximum length of a row is 100 + 100 + 100 + 100 + 100 + 255 + 368 + 496 + 512 + 640 + 768 + 912 = 3082 bytes create table 't_ longcol_51 '(' id' int (10) unsigned not null AUTO_INCREMENT, 'longcol1' varchar (255) not null default ''comment' store 100 bytes data', 'longcol2' varchar (255) not null default ''comment' store 100 bytes data', 'longcol3' varchar (255) not null default ''comment' store 100 bytes data', 'longcol4' varchar (255) not null default ''comment' store 100 bytes data', 'longcol5' varchar (255) not null default ''comment' store 100 bytes data', 'longcol6' varchar (256) not null default ''comment' store 255 bytes data', 'longcol7' varchar (368) not null default ''comment' store 368 bytes data', 'longcol8' varchar (496) not null default ''comment' store 496 bytes data', 'longcol9' varchar (512) not null default ''comment' store 512 bytes data', 'longcol10' varchar (640) not null default ''comment' store 640 bytes data', 'longcol11' varchar (768) not null default ''comment' store 768 bytes data', 'longcol12' varchar (912) not null default ''comment' store 912 bytes data', 'longcol15' varchar (3082) not null default ''comment' store 3082 bytes data ', primary key ('id') ENGINE = InnoDB default charset = utf8; # Table sharding 2, the maximum length of a row is 1024 + 2048 + 4096 = 7168 bytes create table 't_ longcol_52 '('id' int (10) unsigned not null AUTO_INCREMENT, 'longcol13' varchar (1024) not null default ''comment' store 1024 bytes data', 'longcol14' varchar (2048) not null default ''comment' store 2048 bytes data', 'longcol16' varchar (4096) not null default ''comment 'store 4096 bytes data', primary key ('id') ENGINE = InnoDB default charset = utf8; # Table sharding 3, the maximum length of a row is 8192 bytes. create table 't_ longcol_53 '('id' int (10) unsigned not null AUTO_INCREMENT, 'longcol17' blob not null comment 'store 8192 bytes data ', primary key ('id') ENGINE = InnoDB default charset = utf8; # Table sharding 4, the maximum length of a row is 16284 + 20380 = 36664 bytes create table 't_ longcol_54 '('id' int (10) unsigned not null AUTO_INCREMENT, 'longcol18' blob not null comment' store 16284 bytes data', 'longcol19' blob not null comment' store 20380 bytes data', primary key ('id ')) ENGINE = InnoDB default charset = utf8; # TABLE sharding 5. the maximum length of a row is 5977 + 4 = 5981 bytes create table 't_ longcol_55' ('id' int (10) unsigned not null AUTO_INCREMENT, 'longcol20' varchar (5977) not null default ''comment 'store 5977 bytes data', primary key ('id ')) ENGINE = InnoDB default charset = utf8; # corresponding data writing stored PROCEDURE: mysp_longcol_51_ins () create procedure 'mysp _ longcol_51_ins '(in cnt int) beginset @ I = 1; while @ I <cnt doinsert into t_longcol_51 (longcol1, longcol2, longcol3, longcol4, longcol5, longcol6, longcol7, longcol8, longcol9, longcol10, longcol11, longcol12, longcol15) selectrepeat ('A', 100), repeat ('A', 100), repeat ('A', 100), repeat ('A', 100 ), repeat ('A', 100), repeat ('A', 256), repeat ('A', 368), repeat ('A', 496 ), repeat ('A', 512), repeat ('A', 640), repeat ('A', 768), repeat ('A', 912 ), repeat ('A', 3082); insert into t_longcol_52 (longcol13, longcol14, longcol16) selectrepeat ('A', 1024), repeat ('A', 2048 ), repeat ('A', 4096); insert into t_longcol_53 (longcol17) select repeat ('A', 8192); insert into t_longcol_54 (longcol18, longcol19) selectrepeat ('A ', 16284), repeat ('A', 20380); insert into t_longcol_55 (longcol20) select repeat ('A', 5977); set @ I = @ I + 1; end while; end;
After writing 5000 rows of records to each of the preceding test tables, compare the tablespace file size and the size after the tablespace is reorganized to observe the fragmentation rate. For detailed comparison, see:
In the last table sharding method, the total size of the tablespace files in the five sub-tables is 40960 + 40960 + 98304 + 286720 + 40960 = 507904 bytes.
We can see that the total size of this method is the least different from the size of the original table, and the other storage methods are larger than this.