InnoDB table compression: principles and limitations

Source: Internet
Author: User

http://liuxin1982.blog.chinaunix.net/uid-24485075-id-3523032.html

Compression concept

At the cost of some additional CPU utilization, compression reduces database size and I/O load, which can significantly increase data throughput.

Compression principle

A compressed table reduces the size of the database on disk, so fewer reads and writes are needed to access the same user data. For many InnoDB workloads and typical user tables (especially read-intensive applications with enough memory to keep frequently used data cached), compression not only greatly reduces the storage the database requires but also improves throughput by reducing the I/O workload, at a modest cost in processing overhead. Storage savings are important, but the reduction in I/O cost can be even more valuable.
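To make the I/O argument concrete, here is a back-of-envelope sketch. It assumes each 16 KB page compresses into exactly one block of the table's compressed page size (set in MySQL with the KEY_BLOCK_SIZE table option, which the text above does not name) and that compression does not change the page count; both are simplifications, and scan_bytes is a hypothetical helper.

```python
from typing import Optional

PAGE_BYTES = 16 * 1024  # default InnoDB page size in bytes


def scan_bytes(num_pages: int, key_block_size_kb: Optional[int] = None) -> int:
    """Bytes read from disk to scan num_pages pages.

    Assumption: each compressed page occupies one KEY_BLOCK_SIZE block and
    the number of pages is unchanged by compression.
    """
    block = PAGE_BYTES if key_block_size_kb is None else key_block_size_kb * 1024
    return num_pages * block


pages = 65_536  # about 1 GiB worth of uncompressed 16 KB pages
print(scan_bytes(pages) // 2**20, "MiB uncompressed")           # 1024 MiB uncompressed
print(scan_bytes(pages, 8) // 2**20, "MiB with KEY_BLOCK_SIZE=8")  # 512 MiB with KEY_BLOCK_SIZE=8
```

Under these assumptions a full scan moves half as many bytes; the real saving depends on how well the data compresses.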

In InnoDB, the (default) 16 KB page is the basic unit of storage. InnoDB stores row data in the clustered index, while each secondary index record holds only the indexed columns plus the primary key of the corresponding row. Both clustered and secondary indexes are B-trees, so compressing InnoDB data and index pages is largely a matter of compressing B-tree node pages.
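A toy model of the two index types may help. Plain Python dicts and sorted lists stand in for B-trees here, and the table, its columns, and the lookup_by_email helper are hypothetical; the sketch only shows that a secondary-index entry carries the primary key, which is then used to fetch the full row from the clustered index.

```python
from bisect import bisect_left

# Clustered index (toy): primary key -> full row. In InnoDB this is a B-tree
# whose leaf pages hold the rows themselves.
clustered = {
    1: {"id": 1, "email": "a@example.com", "name": "Ann"},
    2: {"id": 2, "email": "b@example.com", "name": "Bob"},
}

# Secondary index on email (toy): sorted (email, primary key) pairs, like
# B-tree leaf entries that store the indexed column plus the PK only.
secondary_email = sorted((row["email"], pk) for pk, row in clustered.items())


def lookup_by_email(email: str):
    """Find the PK in the secondary index, then fetch the row via the clustered index."""
    i = bisect_left(secondary_email, (email,))
    if i < len(secondary_email) and secondary_email[i][0] == email:
        pk = secondary_email[i][1]
        return clustered[pk]
    return None


print(lookup_by_email("b@example.com")["name"])  # Bob
```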

Besides B-tree node pages, InnoDB has another kind of page called an "overflow page". When a row contains long columns, InnoDB stores all of its fields in the current page if they fit. If they do not, InnoDB selects the longest field and stores it in a separate page, the overflow page, repeating until the record fits. The original data page then needs to store only a 20-byte pointer to each off-page value.
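The overflow decision can be sketched as a loop that keeps moving the longest remaining column off-page until the record fits. This is a simplified model, not InnoDB's code: the max_record_bytes budget and the place_columns helper are hypothetical, and only the 20-byte pointer size comes from the text above.

```python
from typing import Dict, Tuple

POINTER_BYTES = 20  # off-page pointer left in the record (from the text)


def place_columns(columns: Dict[str, bytes],
                  max_record_bytes: int) -> Tuple[Dict[str, bytes], Dict[str, bytes]]:
    """Split columns into (inline, overflow), moving the longest off-page first.

    max_record_bytes is a hypothetical per-record budget; InnoDB's real limit
    depends on the page size and row format.
    """
    inline = dict(columns)
    overflow: Dict[str, bytes] = {}

    def record_size() -> int:
        # Inline values plus one pointer per column already moved off-page.
        return sum(len(v) for v in inline.values()) + POINTER_BYTES * len(overflow)

    while record_size() > max_record_bytes:
        # Only columns longer than a pointer are worth moving.
        candidates = [k for k, v in inline.items() if len(v) > POINTER_BYTES]
        if not candidates:
            raise ValueError("record does not fit even with long columns off-page")
        longest = max(candidates, key=lambda k: len(inline[k]))
        overflow[longest] = inline.pop(longest)
    return inline, overflow


row = {"id": b"\x00\x00\x00\x07", "title": b"t" * 100,
       "body": b"b" * 20000, "notes": b"n" * 9000}
inline, overflow = place_columns(row, max_record_bytes=8000)
print(sorted(inline), sorted(overflow))  # ['id', 'title'] ['body', 'notes']
```

Here "body" is moved first because it is the longest; the record is still too large, so "notes" follows, leaving two 20-byte pointers inline.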


InnoDB compresses pages with the zlib library, which implements the LZ77 compression algorithm.
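A minimal sketch of the idea, using Python's zlib module (a binding to the same C zlib library): compress one 16 KB page image and check whether the result fits a smaller compressed block. The helper names and the 8 KB target are assumptions for illustration; InnoDB's real page layout also reserves space for headers and a modification log, which this ignores.

```python
import os
import zlib

PAGE_SIZE = 16 * 1024  # default InnoDB page size in bytes


def compress_page(page: bytes, level: int = 6) -> bytes:
    """Compress one page image with zlib (DEFLATE: LZ77 matching plus Huffman coding)."""
    if len(page) != PAGE_SIZE:
        raise ValueError("expected a full 16 KB page")
    return zlib.compress(page, level)


def fits_in_block(page: bytes, key_block_size_kb: int) -> bool:
    """True if the compressed page fits in a block of key_block_size_kb KB."""
    return len(compress_page(page)) <= key_block_size_kb * 1024


def restore_page(compressed: bytes) -> bytes:
    """Decompress back to the original 16 KB page image."""
    return zlib.decompress(compressed)


text_page = (b"city=Berlin;status=active;" * 1000)[:PAGE_SIZE]
print(fits_in_block(text_page, 8))                # True
print(fits_in_block(os.urandom(PAGE_SIZE), 8))    # False
```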

Compression restrictions

To preserve the compatibility of database files, compression can be specified only when the "Barracuda" file format has been enabled with the innodb_file_format configuration parameter. Tables in the InnoDB system tablespace cannot be compressed: the system tablespace (space 0, the ibdata* files) contains InnoDB internal system information as well as user data, and is never compressed. Therefore, compression applies only to tables (and indexes) stored in their own tablespaces, that is, tables created with innodb_file_per_table enabled.
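The restrictions above can be expressed as a pre-flight check before issuing DDL. This is a hypothetical Python helper, assuming MySQL 5.5/5.6-era server variables read (for example from SHOW VARIABLES) into a plain dict. The ROW_FORMAT=COMPRESSED and KEY_BLOCK_SIZE=n table options are standard MySQL syntax but are not named in the original text.

```python
# Allowed KEY_BLOCK_SIZE values (in KB) for the default 16 KB page size.
VALID_KEY_BLOCK_SIZES = (1, 2, 4, 8, 16)


def compressed_table_ddl(table: str, columns_sql: str, key_block_size: int,
                         settings: dict) -> str:
    """Build a CREATE TABLE ... ROW_FORMAT=COMPRESSED statement after checking prerequisites.

    `settings` is a hypothetical dict of server variables keyed by name.
    Identifiers are not quoted or escaped: illustration only.
    """
    if str(settings.get("innodb_file_format", "")).lower() != "barracuda":
        raise ValueError("compression requires innodb_file_format=Barracuda")
    if str(settings.get("innodb_file_per_table", "OFF")).upper() not in ("ON", "1"):
        raise ValueError("compressed tables must live in their own tablespace "
                         "(innodb_file_per_table=ON), not the system tablespace")
    if key_block_size not in VALID_KEY_BLOCK_SIZES:
        raise ValueError(f"KEY_BLOCK_SIZE must be one of {VALID_KEY_BLOCK_SIZES}")
    return (f"CREATE TABLE {table} ({columns_sql}) ENGINE=InnoDB "
            f"ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE={key_block_size}")


print(compressed_table_ddl(
    "orders", "id INT PRIMARY KEY, note TEXT", 8,
    {"innodb_file_format": "Barracuda", "innodb_file_per_table": "ON"},
))
```

With the settings shown, this prints `CREATE TABLE orders (id INT PRIMARY KEY, note TEXT) ENGINE=InnoDB ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=8`; with innodb_file_format=Antelope it raises instead.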

When to use compression

In general, compression works best on tables that contain a reasonable number of character-string columns and whose data is read far more often than it is written. The key factor that determines how much compression shrinks the data file is the data itself. Compression works by identifying repeated strings of bytes within a block of data, so completely random data is the worst case. Typical data often contains repeated values and therefore compresses effectively. Character strings also tend to compress well, whether they are stored in CHAR, VARCHAR, TEXT, or BLOB columns. On the other hand, tables that contain mostly binary data (integers or floating-point numbers) or previously compressed data (such as JPEG or PNG images) often compress poorly.
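A quick way to see this is to compare zlib compression ratios on three synthetic page-sized samples. The samples are made up for illustration (the "already compressed" one is deflate output standing in for JPEG or PNG payloads); real ratios depend on your tables.

```python
import os
import random
import zlib

SIZE = 16 * 1024  # one uncompressed InnoDB page


def ratio(data: bytes) -> float:
    """Compressed size / original size; lower means more compressible."""
    return len(zlib.compress(data)) / len(data)


# Repetitive, string-heavy data: the common "typical table" case.
repetitive = (b"order_status=SHIPPED;country=DE;" * 1000)[:SIZE]

# Completely random data: the worst case for LZ77-style matching.
random_bytes = os.urandom(SIZE)

# Previously compressed data: deflate output of varied text, truncated to one
# page (the truncated stream is not decodable, but that does not matter here).
rng = random.Random(42)
words = [f"w{i}" for i in range(5000)]
varied_text = " ".join(rng.choice(words) for _ in range(40_000)).encode()
already_compressed = zlib.compress(varied_text, 9)[:SIZE]

for name, data in [("repetitive", repetitive), ("random", random_bytes),
                   ("already compressed", already_compressed)]:
    print(f"{name:>18}: {ratio(data):.2f}")
```

Only the repetitive sample shrinks substantially; the other two stay at or slightly above their original size.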

Besides choosing which tables to compress (and what compressed page size to use), the workload is another key factor in performance. InnoDB maintains a per-page "modification log" for compressed data. If the application is dominated by reads rather than updates, few pages need to be reorganized and recompressed after a page runs out of room for its modification log. If updates mostly change non-indexed columns, or columns containing BLOBs or large strings that happen to be stored off-page, the overhead of compression may be acceptable. If the only changes to a table are INSERTs that use a monotonically increasing primary key and there are few secondary indexes, there is little need to reorganize and recompress index pages. Because InnoDB can delete-mark and delete records on compressed pages in place by modifying the uncompressed data, DELETE operations on a table are relatively efficient.
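The per-page modification log can be modeled as follows: changes are appended to the log without touching the compressed image, and only when the log fills up is the page reorganized and recompressed. The CompressedPage class, its byte budgets, and the repr-based serialization are hypothetical simplifications. A real engine would split the B-tree node when the recompressed page no longer fits; the sketch only signals that case with an exception.

```python
import zlib


class CompressedPage:
    """Toy compressed page with a per-page modification log (not InnoDB's format)."""

    def __init__(self, records, block_size=8 * 1024, log_capacity=512):
        self.block_size = block_size      # hypothetical compressed block size
        self.log_capacity = log_capacity  # hypothetical mod-log budget in bytes
        self.records = dict(records)      # uncompressed view of the rows
        self.mod_log = []                 # changes not yet folded into `compressed`
        self.recompressions = 0
        self.compressed = self._compress()

    def _compress(self) -> bytes:
        payload = repr(sorted(self.records.items())).encode()
        data = zlib.compress(payload)
        if len(data) > self.block_size:
            # InnoDB would split the B-tree node here instead.
            raise OverflowError("recompressed page does not fit its block")
        return data

    def _log_bytes(self) -> int:
        return sum(len(k) + len(v) for k, v in self.mod_log)

    def update(self, key: bytes, value: bytes) -> None:
        """Apply a change by logging it; recompress only when the log is full."""
        self.records[key] = value
        self.mod_log.append((key, value))
        if self._log_bytes() > self.log_capacity:
            self.compressed = self._compress()
            self.mod_log.clear()
            self.recompressions += 1


page = CompressedPage({b"k%d" % i: b"v" for i in range(10)}, log_capacity=100)
for _ in range(9):
    page.update(b"k1", b"x" * 10)  # 12 bytes of log per update
print(page.recompressions)  # 1
```

A read-mostly workload rarely fills the log, so recompression (the expensive step) is rare; an update-heavy workload hits it often.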

In some environments, the time it takes to load data is as important as query run time. Especially in data warehouse environments, many tables are read-only or read-mostly. In those cases, the increased load time that compression brings may or may not be acceptable, unless the resulting savings in disk reads or storage cost are significant.

Fundamentally, compression works best when spare CPU time is available for compressing and decompressing data. So if a workload is I/O-bound rather than CPU-bound, compression can improve overall performance. For that reason, when testing an application with different compression configurations, test on a platform similar to the planned production system.
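As a rough first check of the CPU side on a candidate machine, one can time zlib compression and decompression of a representative page. This is a hypothetical microbenchmark: per_page_cost is an illustrative helper, the sample page is synthetic, and Python call overhead means it only approximates what InnoDB itself spends, so it complements rather than replaces a test of the real workload.

```python
import time
import zlib


def per_page_cost(page: bytes, rounds: int = 200) -> tuple:
    """Return (avg seconds to compress, avg seconds to decompress) one page."""
    compressed = zlib.compress(page)
    start = time.perf_counter()
    for _ in range(rounds):
        zlib.compress(page)
    mid = time.perf_counter()
    for _ in range(rounds):
        zlib.decompress(compressed)
    end = time.perf_counter()
    return (mid - start) / rounds, (end - mid) / rounds


# Use a page image representative of your own data; this one is synthetic.
sample_page = (b"sku=A-1001;qty=3;warehouse=east;" * 600)[:16 * 1024]
c, d = per_page_cost(sample_page)
print(f"compress {c * 1e6:.1f} us/page, decompress {d * 1e6:.1f} us/page")
```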

Compression process

When a compressed page is read, it is decompressed as it is loaded into the buffer pool, so the buffer pool then holds both a compressed and an uncompressed copy of the page. When the buffer pool needs to evict such a page, one of two things happens: if InnoDB judges that the workload is I/O-bound, so the CPU has spare capacity for decompression, it evicts only the uncompressed copy and keeps the compressed one in memory; otherwise, it evicts both copies.
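The eviction choice described above can be sketched as a two-branch rule. InnoDB makes the I/O-bound versus CPU-bound judgment adaptively at run time; in this sketch it is simply a boolean input, and the BufferedPage and evict names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class BufferedPage:
    """Which copies of one compressed page are currently in the buffer pool."""
    page_no: int
    has_compressed: bool = True    # the compressed image read from disk
    has_uncompressed: bool = True  # the decompressed 16 KB working copy


def evict(page: BufferedPage, io_bound: bool) -> None:
    """Simplified eviction choice for a page cached in both forms."""
    if io_bound:
        # Spare CPU: drop only the uncompressed copy. The compressed copy stays,
        # so the next access costs a decompression instead of a disk read.
        page.has_uncompressed = False
    else:
        # CPU-bound: free both copies so memory goes to other hot pages.
        page.has_uncompressed = False
        page.has_compressed = False


p = BufferedPage(page_no=42)
evict(p, io_bound=True)
print(p)  # BufferedPage(page_no=42, has_compressed=True, has_uncompressed=False)
```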
