Cassandra 1.0 adds data compression at the column family level, a feature users have long asked for. Compression not only reduces the volume of stored data, it also reduces disk I/O, which particularly benefits read-heavy workloads.
What are the benefits of compression
Compression reduces the volume of data, so more data can be stored on the same amount of memory and disk. In addition, because Cassandra decompresses only the specific data blocks a read needs, compression can also improve the performance of reading data from disk.
Unlike traditional database systems, where compression hurts write performance because the engine must first read and decompress the original data, modify it, and then recompress it before writing it back, Cassandra's write path is append-only. It never rewrites existing data in place, so the decompress-modify-recompress cycle is avoided entirely, and write performance with compression enabled can actually be about 10% higher than without it.
In general, enabling compression in Cassandra brings performance improvements along these lines:
Data size reduced to between one half and one quarter of the original
25-35% improvement in read performance
Roughly 10% improvement in write performance
When to use data compression
Compression works best on column families with many rows that share the same set of columns, for example a column family that stores user information such as username and email. The more values repeat across rows, the better the compression ratio.
Conversely, if each row contains a different set of columns, compression will not be very effective.
Column family compression configuration
When you create or modify a column family, you can set its compression options. There are two of them:
sstable_compression: selects the compression algorithm. Cassandra ships with two built-in algorithms, SnappyCompressor and DeflateCompressor, each with its own strengths: Snappy is faster at compressing and decompressing, while Deflate achieves a higher compression ratio. Which one to choose depends on your application scenario; for read-heavy applications, Snappy is recommended. Alternatively, you can plug in your own compression algorithm through the interface Cassandra provides, by implementing org.apache.cassandra.io.compress.ICompressor.
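For example, a column family holding rarely read archival data, where compression ratio matters more than decompression speed, could choose Deflate instead. The column family name below is illustrative, not from the example later in this article:

[default@demo] CREATE COLUMN FAMILY audit_log
WITH compression_options = {sstable_compression: DeflateCompressor};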
chunk_length_kb: sets the size of a compression block; the default is 64 KB, which is appropriate for most cases. For wide rows (rows with many columns), a 64 KB chunk makes it possible to fetch a 64 KB slice of data without decompressing the entire row. For skinny rows, a 64 KB chunk may mean decompressing more data than a read actually needs, but the compression ratio it achieves is still considerable, so this default is a reasonable trade-off between compression ratio and read overhead. You can of course tune this value for your own scenario to get the best performance for your typical read and write patterns.
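As a sketch, a column family with skinny rows and small reads might use a smaller chunk size to cut decompression overhead per read; the column family name and the value 32 here are illustrative, not a recommendation:

[default@demo] CREATE COLUMN FAMILY small_rows
WITH compression_options = {sstable_compression: SnappyCompressor, chunk_length_kb: 32};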
You can set compression options when you create a column family, or change them later on an existing column family. Such a change only affects data written afterwards, however; previously written SSTables are not recompressed automatically. If you do want the old data rewritten with the new compression settings, you can do so manually using the nodetool scrub tool that Cassandra provides.
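A sketch of what this could look like, assuming a keyspace named demo and the users column family; the exact nodetool invocation may differ depending on your host and port settings:

[default@demo] UPDATE COLUMN FAMILY users
WITH compression_options = {sstable_compression: SnappyCompressor, chunk_length_kb: 64};

$ nodetool -h localhost scrub demo users

Once scrub finishes, the rewritten SSTables use the current compression settings.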
Here is an example of creating a compressed column family from the Cassandra command line:
[default@demo] CREATE COLUMN FAMILY users
WITH key_validation_class = UTF8Type
AND column_metadata = [
{column_name: name, validation_class: UTF8Type},
{column_name: email, validation_class: UTF8Type},
{column_name: state, validation_class: UTF8Type},
{column_name: gender, validation_class: UTF8Type},
{column_name: birth_year, validation_class: LongType}
]
AND compression_options = {sstable_compression: SnappyCompressor, chunk_length_kb: 64};
Summary
In Cassandra 1.0, compression reduces the size of stored data and is an easy way to improve performance. After upgrading to Cassandra 1.0, you can also enable compression and tune the block size on your existing column families.