Mysql data compression performance comparison

Source: Internet
Author: User

The data required by data cube is rarely written or not updated at all. This type of data is ideal for compression to reduce disk usage. MySQL provides two compression methods: archive engine and myisampack for MyISAM engine. The two methods are tested respectively today, and their advantages and disadvantages in disk usage and query performance are compared. You should understand why this is done. I will introduce it later. And read the text:
 
1. test environment:
Software and Hardware
One 64-bit 2.6.18-92 kernel Linux development machine, 4 GB memory, 4 2800 Mhz Dual-Core AMD Opteron (tm) Processor 2220 CPU.
 
MySQL is placed on a 7200-to-SAT hard disk without raid;
 
MySQL does not perform any optimization and disables the query cache to avoid interference with the test results.
 
Table Structure
2424753 records, the actual data of a certain shard in the production environment;
 
The joint indexes (partition_by1, idx_rank) and (partition_by1, chg_idx) are created respectively. The partition_by1 is a 32-length varchar type for retrieval. The remaining two fields are floating-point numbers and are mostly used for sorting;
 
Autokid is used as the sub-column and serves as the primary key. It is used only for Atomicity guarantee during data loading and has no practical significance.
 
2. Purpose:
Compressed Space Comparison
The larger the compression ratio, the smaller the disk space occupied, directly reducing the data storage cost;
 
Query performance comparison
The query performance should not be significantly reduced after compression. Archive does not support indexing. Therefore, it is inevitable to reduce the performance. Therefore, we should have a spectrum in our mind to see whether the reduction is acceptable.
 
3. Test tool:
Mysqlslap
Of course, official tools are the best choice. For more information about mysqlslap, see the official documentation.
 
Test query
A total of 9973 actual SQL statements accessing the topranks_v3 table in the production environment are intercepted, and 7 with a large access volume are extracted, with a concurrency of 50 and repeated execution for 10 times. The command is as follows:
 
. /Mysqlslap -- defaults-file = .. /etc/my. cnf-u *****-p *****-c50-i10-q .. /t. SQL -- debug-info4. test conclusion
 
Comparison items Disk Space Time (seconds) CPU Idle LOAD concurrency
Benchmark table (MyISAM) 403956004 2.308 30 15 50
ARCHIVE 75630745> 300 75 4 1
PACK 99302109 2.596 30 22 50
 
Based on the test data given in the above table, we can draw the following conclusions:
 
For test tables, Archive tables occupy About 18.7% of the previous space, and myisampack occupies about 24.6% of the previous space. The two are similar in terms of space utilization, we seem to need to select an archive table;
Let's look at the query performance and compare it with the reference table. In terms of total time consumption and system load, the query performance of the pack table under 50 concurrency is equivalent to that of the benchmark table; the archive table takes more than 5 minutes in the case of single concurrency (you can't wait, kill it )!
Then, we can draw a conclusion that the ARCHIVE engine can basically ignore the table to be queried online.
 
Why is the ARCHIVE engine so slow during this test?
 
We know that mysql provides archive storage engines to reduce disk overhead, but there is also a premise that archive data is not required or rarely queried online, occasionally slow queries do not matter. For the above reasons, the archive table does not allow the creation of indexes outside of the auto-increment column.
 
With this consensus, we use a test SQL to analyze why the query performance difference between the two indexes is so big. There is one in our test SQL:
 
SELECT c1, c2,..., cn FROM mysqlslap. rpt_topranks_v3
WHERE... AND partition_by1 = '20140901'
Order by added_quantity3 DESC
LIMIT 500 we mentioned earlier that the tested table has an index on the partition_by1 field. Then, we preliminarily judge that the index of partition_by1 should be used in the benchmark table and myisampack table; EXPLAIN:
 
Mysql & gt; EXPLAIN
-& Gt; SELECT... FROM mysqlslap. rpt_topranks_v3
-& Gt; WHERE... AND partition_by1 = '20140901'
-& Gt; order by added_quantity3 DESC
-& Gt; LIMIT 500 \ G
* *************************** 1. row ***************************
Id: 1
Select_type: SIMPLE
TABLE: rpt_topranks_v3
Type: ref
Possible_keys: idx_toprank_pid, idx_toprank_chg
KEY: idx_toprank_pid
Key_len: 99
Ref: const
Rows: 2477
Extra: using where; USING filesort
1 row in set (0.00 sec) as we expected, this query uses an index created on the partition_by1 field. The number of matched target rows is 2477, then there is an order on the added_quantity3 field. Because added_quantity3 does not have an index, filesort is used.
 
Let's take a look at the EXPLAIN results of this SQL statement on the archiving table:
 
Mysql & gt; EXPLAIN
-& Gt; SELECT... FROM mysqlslap. rpt_topranks_v3 _ <strong> archive </strong>
-& Gt; WHERE... AND partition_by1 = '20140901'
-& Gt; order by added_quantity3 DESC
-& Gt; LIMIT 500 \ G
* *************************** 1. row ***************************
Id: 1
Select_type: SIMPLE
TABLE: rpt_topranks_v3_archive
Type: ALL
Possible_keys: NULL
KEY: NULL
Key_len: NULL
Ref: NULL
Rows: 2424753
Extra: using where; USING filesort
1 row in set (0.00 sec) EXPLAIN said: "I don't have an index available, so I can only scan 2424753 rows of records IN the entire table, and then try another filesort ." If you want to pursue performance, it is obviously a grievance to MySQL.
 

Additional reading: http://www.bkjia.com/database/201202/120001.html

Related Article

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.