Cassandra 1.0 adds data compression at the column family level, a feature users have long asked for. Compression not only reduces the volume of stored data, it also reduces disk I/O, which particularly benefits read-heavy workloads.
What are the benefits of compression
Compression reduces the volume of data, so more data can be stored on the same amount of memory and disk. In addition, because Cassandra decompresses only the specific data blocks a read needs, compression can also improve the performance of reading data from disk.
Unlike traditional database systems, where compression hurts write performance because the engine must first read and decompress the original data, modify it, and then recompress it before writing it back, Cassandra's write path is append-only. It never rewrites existing data in place, so the decompress-modify-recompress cycle is avoided entirely, and write performance with compression enabled can actually be about 10% higher than without it.
In general, enabling compression in Cassandra brings performance improvements along these lines:
Data size reduced to between one half and one quarter of the original
25-35% improvement in read performance
Roughly 10% improvement in write performance
When to use data compression
Compression works best on column families with many rows that share the same set of columns, for example a column family that stores user information such as username and email. The more values repeat across rows, the better the compression ratio.
Conversely, if each row contains a different set of columns, compression will not be very effective.
Column family compression configuration
When you create or modify a column family, you can set its compression options. There are two of them:
sstable_compression: selects the compression algorithm. Cassandra ships with two built-in algorithms, SnappyCompressor and DeflateCompressor, each with its own strengths: Snappy is faster at compressing and decompressing, while Deflate achieves a higher compression ratio. Which one to choose depends on your application scenario; for read-heavy applications, Snappy is recommended. Alternatively, you can plug in your own compression algorithm through the interface Cassandra provides, by implementing org.apache.cassandra.io.compress.ICompressor.
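For example, a column family holding rarely read archival data, where compression ratio matters more than decompression speed, could choose Deflate instead. The column family name below is illustrative, not from the example later in this article:

[default@demo] CREATE COLUMN FAMILY audit_log
WITH compression_options = {sstable_compression: DeflateCompressor};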
chunk_length_kb: sets the size of a compression block; the default is 64 KB, which is appropriate for most cases. For wide rows (rows with many columns), a 64 KB chunk makes it possible to fetch a 64 KB slice of data without decompressing the entire row. For skinny rows, a 64 KB chunk may mean decompressing more data than a read actually needs, but the compression ratio it achieves is still considerable, so this default is a reasonable trade-off between compression ratio and read overhead. You can of course tune this value for your own scenario to get the best performance for your typical read and write patterns.
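As a sketch, a column family with skinny rows and small reads might use a smaller chunk size to cut decompression overhead per read; the column family name and the value 32 here are illustrative, not a recommendation:

[default@demo] CREATE COLUMN FAMILY small_rows
WITH compression_options = {sstable_compression: SnappyCompressor, chunk_length_kb: 32};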
You can set compression options when you create a column family, or change them later on an existing column family. Such a change only affects data written afterwards, however; previously written SSTables are not recompressed automatically. If you do want the old data rewritten with the new compression settings, you can do so manually using the nodetool scrub tool that Cassandra provides.
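A sketch of what this could look like, assuming a keyspace named demo and the users column family; the exact nodetool invocation may differ depending on your host and port settings:

[default@demo] UPDATE COLUMN FAMILY users
WITH compression_options = {sstable_compression: SnappyCompressor, chunk_length_kb: 64};

$ nodetool -h localhost scrub demo users

Once scrub finishes, the rewritten SSTables use the current compression settings.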
Here is an example of creating a compressed column family from the Cassandra command line:
[default@demo] CREATE COLUMN FAMILY users
WITH key_validation_class = UTF8Type
AND column_metadata = [
{column_name: name, validation_class: UTF8Type},
{column_name: email, validation_class: UTF8Type},
{column_name: state, validation_class: UTF8Type},
{column_name: gender, validation_class: UTF8Type},
{column_name: birth_year, validation_class: LongType}
]
AND compression_options = {sstable_compression: SnappyCompressor, chunk_length_kb: 64};
Summary
In Cassandra 1.0, compression reduces the size of stored data and is an easy way to improve performance. After upgrading to Cassandra 1.0, you can also enable compression and tune the block size on your existing column families.