For most applications, access to hotspot data exists, that is, the Access Frequency of some data in a certain period of time is much higher than that of other data.
Common hot data includes "the latest news", "the most popular news", and "movies with the largest downloads.
To understand the support of MySQL Innodb for hotspot data, I conducted a benchmark test. The test environment is as follows:
[Hardware configuration]
Hardware |
Configuration |
CPU |
Intel (R) Xeon (R) CPU E5620 clock speed 2.40 GHz, 2 Physical CPUs, 16 logical CPUs |
Memory |
24G (6 blocks * 4G DDR3 1333 REG) |
Hard Disk |
300 GB * 3, SAS hard drive 15000 rpm, no RAID, RAID card, and write-back function |
OS |
RHEL5 |
MySQL |
5.1.49/5.1.54 |
[MySQL configuration]
Configuration item |
Configuration |
Innodb_buffer_pool_size |
4G |
Innodb_log_file_size |
200 M |
Innodb_log_files_in_group |
3 |
Sync_binlog |
100 |
Innodb_flush_log_at_trx_commit |
2 |
Hotspot Data Model]
To simulate that hotspot data is mainly stored in the memory, use the range query to load the first 20% of the data into the memory as hotspot data, for example: select count (*) FROM BT_KV_SHORT_INT_CHAR_10KW WHERE col1 <20000000
Project |
Model |
Number of table records |
1KW (3G), 2KW (6G), 5KW (15G), 10KW (30G) |
Key |
INT |
Value |
CHAR (250) |
Hotspot data |
20% of total data |
The performance test results are as follows:
[Query]
The detailed analysis is as follows:
==> When the hotspot data is smaller than the Innodb buffer pool (1KW/2KW/5KW), the query operation performance is very high, similar to the performance when the table data is smaller than the Innodb buffer pool;
==> When the hotspot data exceeds the Innodb buffer pool (10 kW), the query performance decreases significantly;
==> The overall performance of hotspot data access is better than that of random access;
[Insert]
The detailed analysis is as follows:
==> When the hotspot data is smaller than the Innodb buffer pool (1KW/2KW/5KW), the performance gradually decreases with the increase of hotspot data because when the Innodb buffer pool is close to saturation, more operations are required for buffer management;
==> When the hotspot data exceeds the Innodb buffer pool (10 kW), the performance decreases sharply because disk IO has become a performance bottleneck;
[Update]
Analysis is the same as INSERT.
[Delete]
The analysis is as follows:
==> The performance is slightly different from INSERT/UPDATE. When hotspot data is smaller than the Innodb buffer pool, the performance does not change much because the DELETE operation does not need to generate a new Page, saves buffer management operations;
==> When hotspot data is greater than the Innodb buffer pool, the performance is greatly reduced because disk IO has become a performance bottleneck.
[Summary]
The Innodb buffer pool uses LRU to manage and eliminate data. According to the LRU algorithm, hotspot data is preferentially placed in the memory. Therefore, the testing performance of hotspot data is much higher than that of random access.
However, when hotspot data exceeds the Innodb buffer pool, disk I/O becomes a major performance bottleneck and the performance will drop sharply.
[Application recommendations]
In practical applications, Innodb is a good choice for high performance when it involves hotspot data access, but the premise isAbility to estimate the size of hotspot dataOnly when hotspot data is smaller than the Innodb buffer pool (that is, all hotspot data can be stored in the memory) can high performance be achieved.
Note:
The test data is only used for comparison. It does not mean that the performance of MySQL is usually so high. For comparison, many preparations are made during the test, and the test operation is special.