Cassandra.yaml Configuration

Last Update: 2016-07-01 Source: Internet

Author: User

Tags cassandra


Copied from: http://blog.csdn.net/y_h_t/article/details/11917531

All of Cassandra's runtime settings are configured in the configuration file cassandra.yaml.

The following explains the configuration items in cassandra.yaml:
cluster_name
Sets the name of the Cassandra cluster.
Every server in a Cassandra cluster must be configured with the same cluster name. If the names do not match, the server cannot join the cluster.

initial_token
The initial token of the Cassandra server, which determines the server's position on the consistent hash ring.
The first time Cassandra starts, the token is read from this configuration item; if it is left blank, a token is generated randomly. On later starts, the token is read from the system table.
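
When tokens are assigned by hand rather than left blank, they are usually spaced evenly around the ring. The following is a minimal sketch, assuming the cluster uses RandomPartitioner, whose tokens range from 0 to 2**127. The helper name balanced_tokens is hypothetical and is not part of Cassandra.

```python
# Hypothetical helper: evenly spaced initial_token values.
# Assumption: the cluster uses RandomPartitioner (tokens from 0 to 2**127).
RING_SIZE = 2 ** 127


def balanced_tokens(node_count):
    """Return one token per node, spaced evenly around the ring."""
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    return [i * RING_SIZE // node_count for i in range(node_count)]


if __name__ == "__main__":
    # Print a candidate initial_token line for each node of a 4-node cluster.
    for index, token in enumerate(balanced_tokens(4)):
        print(f"node {index}: initial_token: {token}")
```

Each printed value would go into the initial_token item of one server before that server is started for the first time.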

auto_bootstrap
Whether the server, on its first start, fetches the data that belongs to it from other servers when it joins the Cassandra cluster.
If the current Cassandra server is not listed in the seeds configuration option and is starting for the first time, it obtains the data belonging to it from the other servers in the Cassandra cluster.

hinted_handoff_enabled
Whether hinted handoff is enabled on the current Cassandra server.
If this feature is enabled, the Cassandra server stores the writes destined for other Cassandra servers that are temporarily unavailable, waits for the failed server to recover, and then sends the stored data to the recovered server.
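
The store-and-replay behavior described above can be sketched as a toy model. This is an illustration only, not Cassandra's implementation: the class HintStore, its methods, and the deliver callback are all hypothetical names.

```python
from collections import defaultdict


class HintStore:
    """Toy model of hinted handoff (hypothetical, not Cassandra's code)."""

    def __init__(self):
        # Maps a target node to the writes it missed while it was down.
        self._hints = defaultdict(list)

    def write(self, target, mutation, target_is_up, deliver):
        """Deliver the write now, or keep a hint if the target is down."""
        if target_is_up:
            deliver(target, mutation)
        else:
            self._hints[target].append(mutation)

    def on_node_recovered(self, target, deliver):
        """Replay, in order, every hint stored for a recovered node."""
        for mutation in self._hints.pop(target, []):
            deliver(target, mutation)

    def pending(self, target):
        """Number of hints currently stored for a node."""
        return len(self._hints.get(target, []))


if __name__ == "__main__":
    store = HintStore()
    send = lambda node, mutation: print(f"sent {mutation} to {node}")
    store.write("node-b", "row1=x", target_is_up=False, deliver=send)
    print("pending hints:", store.pending("node-b"))
    store.on_node_recovered("node-b", send)
```

With hinted_handoff_enabled turned off, the first branch would be the only one taken, and writes to a server that is down would simply not be stored for later delivery.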

authenticator
Verifies that a user of Cassandra is legitimate; this is the first step of security authentication.
Cassandra defines a set of policies for validating users. The available options are:
1. org.apache.cassandra.auth.AllowAllAuthenticator
All users are accepted.
2. org.apache.cassandra.auth.SimpleAuthenticator
The legitimate users and their passwords are defined in the passwd.properties file.

authority
Verifies that a user has permission to operate on a column family; this is the second step of security authentication.
Cassandra defines a set of policies for verifying user permissions. The available options are:
1. org.apache.cassandra.auth.AllowAllAuthority
All users have all permissions.
2. org.apache.cassandra.auth.SimpleAuthority
The legitimate users and their permissions are defined in the access.properties file.

partitioner
The policy Cassandra uses to partition data across the cluster.
This setting must be identical on every server in the same Cassandra cluster.

Cassandra defines a series of data partitioning policies. The available options are:
1. org.apache.cassandra.dht.RandomPartitioner
2. org.apache.cassandra.dht.ByteOrderedPartitioner
3. org.apache.cassandra.dht.OrderPreservingPartitioner
4. org.apache.cassandra.dht.CollatingOrderPreservingPartitioner

data_file_directories
The locations where SSTable files are stored on disk.
This option accepts multiple values: if the server has more than one disk, you can specify each of them as a location for SSTable files. If possible, put data_file_directories and commitlog_directory on separate disks, which helps spread disk I/O across the system.

commitlog_directory
The location where commit log files are stored on disk.
If possible, put data_file_directories and commitlog_directory on separate disks, which helps spread disk I/O across the system.

saved_caches_directory
The location where saved cache files are stored on disk.

commitlog_rotation_threshold_in_mb
The size, in MB, at which a commit log file is rotated, i.e. the maximum size of each commit log file.

commitlog_sync
How writes are recorded in the commit log.
The available options are:
1. periodic
Records the commit log periodically: every data update is written to the commit log, and the file is flushed to disk at a fixed interval.
2. batch
Records the commit log in batches: the data updates that arrive within a short window are written to the commit log together.

commitlog_sync_period_in_ms
The interval, in milliseconds, at which the commit log file is flushed in periodic mode. This option can only be set when commitlog_sync is periodic.

commitlog_sync_batch_window_in_ms
The length of time, in milliseconds, for which updates are collected before being written in batch mode. This option can only be set when commitlog_sync is batch.
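
The pairing between commitlog_sync and its two companion options can be checked mechanically. The following is a minimal sketch that operates on an already parsed configuration dictionary. The function name check_commitlog_settings is hypothetical, and treating the matching option as required (not just allowed) is an assumption of this sketch.

```python
def check_commitlog_settings(config):
    """Return a list of problems found in the commit log sync settings."""
    # Which companion option belongs to which commitlog_sync mode.
    companion = {
        "periodic": "commitlog_sync_period_in_ms",
        "batch": "commitlog_sync_batch_window_in_ms",
    }
    mode = config.get("commitlog_sync")
    if mode not in companion:
        return [f"commitlog_sync must be 'periodic' or 'batch', got {mode!r}"]

    problems = []
    for sync_mode, option in companion.items():
        if sync_mode == mode and option not in config:
            # Assumption of this sketch: the matching option must be present.
            problems.append(f"{option} is missing for commitlog_sync: {mode}")
        if sync_mode != mode and option in config:
            problems.append(
                f"{option} can only be set when commitlog_sync is {sync_mode}"
            )
    return problems


if __name__ == "__main__":
    settings = {"commitlog_sync": "batch", "commitlog_sync_period_in_ms": 10000}
    for problem in check_commitlog_settings(settings):
        print(problem)
```

The example configuration at the bottom mixes batch mode with the periodic interval, so the sketch reports both the stray option and the missing batch window.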

seeds
The addresses of the seed nodes in the Cassandra cluster.
This option accepts multiple values, so a Cassandra cluster can have several seed nodes.
Every server in the cluster contacts a seed node at startup to obtain information about the cluster. A server that is configured as a seed node joins the cluster automatically at startup and does not bootstrap, i.e., it does not fetch data from other nodes in the cluster.
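
Taken together, the descriptions of auto_bootstrap and seeds imply a simple rule for when a server fetches its data from the cluster. The following sketch restates that rule; the function name should_bootstrap and its parameters are hypothetical and only summarize the text above.

```python
def should_bootstrap(address, seeds, auto_bootstrap, first_start):
    """Decide whether a server fetches its data from the cluster at startup.

    Restates the rule in the text: bootstrap only happens when auto_bootstrap
    is on, the server is starting for the first time, and it is not a seed.
    """
    is_seed = address in seeds
    return auto_bootstrap and first_start and not is_seed


if __name__ == "__main__":
    seeds = ["10.0.0.1", "10.0.0.2"]  # example addresses, not real hosts
    for node in ["10.0.0.1", "10.0.0.3"]:
        decision = should_bootstrap(node, seeds, auto_bootstrap=True, first_start=True)
        print(f"{node}: bootstrap = {decision}")
```

In this example only 10.0.0.3 bootstraps, because 10.0.0.1 appears in the seeds list.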

disk_access_mode
Whether Cassandra uses memory-mapped I/O (virtual memory mapping) when accessing the data files and index files of an SSTable.

The available options are:
1. auto
Selects the file access mode automatically: mmap on a 64-bit system, standard otherwise.

2. mmap
Both the data files and the index files of an SSTable are accessed through memory mapping.

3. mmap_index_only
Only the index files of an SSTable are accessed through memory mapping.

4. standard
Neither the data files nor the index files of an SSTable are accessed through memory mapping.

Memory-mapped access can speed up file reads and writes, but at the cost of memory consumption. Choose the access mode according to the actual memory size and the size of the files.
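
The four options can be summarized as a small lookup from the configured mode to the access method used for each kind of file. This is a sketch of the behavior described above, not Cassandra's code; the function name resolve_disk_access_mode is hypothetical.

```python
import sys


def resolve_disk_access_mode(mode, is_64bit):
    """Return (data_file_access, index_file_access) for a disk_access_mode."""
    if mode == "auto":
        # auto picks mmap on 64-bit systems and standard otherwise.
        mode = "mmap" if is_64bit else "standard"
    table = {
        "mmap": ("mmap", "mmap"),
        "mmap_index_only": ("standard", "mmap"),
        "standard": ("standard", "standard"),
    }
    if mode not in table:
        raise ValueError(f"unknown disk_access_mode: {mode!r}")
    return table[mode]


if __name__ == "__main__":
    # sys.maxsize exceeds 2**32 only on a 64-bit Python build.
    running_64bit = sys.maxsize > 2 ** 32
    print(resolve_disk_access_mode("auto", running_64bit))
```

The mmap_index_only row shows the middle ground: the index files are mapped while the larger data files use standard I/O, which limits the memory cost.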

concurrent_reads
The number of threads used for concurrent reads.
The larger the value, the more threads Cassandra can use to perform read operations. The recommended setting is the number of CPUs.

concurrent_writes
The number of threads used for concurrent writes.
The larger the value, the more threads Cassandra can use to perform write operations.

memtable_flush_writers
The number of threads that concurrently flush memtable data to disk as SSTable files.
The default for this option is the number of directories specified in data_file_directories.

sliced_buffer_size_in_kb
The size of the buffer used to read SSTable files when a range read is performed.

storage_port
The port that the Cassandra servers in a cluster use to communicate with each other.

listen_address
The address that the Cassandra servers in a cluster use to communicate with each other. If left blank, the server's hostname is used by default.

rpc_address
The address on which the Cassandra server serves external (client) requests. If left blank, the server's hostname is used by default.

rpc_port
The port on which the Cassandra server serves external (client) requests.

rpc_keepalive
Whether keepalive is enabled on the Cassandra server's external (client) connections.

thrift_framed_transport_size_in_mb
The size of each frame when data is transferred with Thrift's framed transport. If this option is 0, Thrift framing is disabled.

thrift_max_message_length_in_mb
The maximum size of a message transferred over Thrift.

snapshot_before_compaction
Whether Cassandra takes a snapshot of the SSTable files that are about to be compacted before it performs a compaction.

binary_memtable_throughput_in_mb
The buffer size of the binary memtable.
The binary memtable is used for the initial loading of large amounts of data.

column_index_size_in_kb
The interval, measured in KB of data, between column index entries for the data file of an SSTable. A smaller value makes it faster to find a value through the column index, but consumes more memory and disk space.

in_memory_compaction_limit_in_mb
When Cassandra performs a compaction, if the data belonging to a single key exceeds the in_memory_compaction_limit_in_mb limit, that data is compacted with a deferred mechanism to avoid using too much memory.

rpc_timeout_in_ms
If the Cassandra server takes longer than rpc_timeout_in_ms to process an external request, it throws a timeout exception to the calling client.

endpoint_snitch
The network topology selection strategy (snitch) used in the Cassandra cluster.
Cassandra defines a range of snitches. The available options are:
1. org.apache.cassandra.locator.SimpleSnitch
2. org.apache.cassandra.locator.RackInferringSnitch
3. org.apache.cassandra.locator.PropertyFileSnitch

dynamic_snitch
Whether to enable the dynamic node selection policy. Enabling this option helps avoid sending requests to nodes that respond slowly.

The other options associated with this option are:
dynamic_snitch_update_interval_in_ms
dynamic_snitch_reset_interval_in_ms
dynamic_snitch_badness_threshold

request_scheduler
Sets the policy for scheduling and allocating resources to requests.
Cassandra defines a range of scheduling policies. The available options are:
1. org.apache.cassandra.scheduler.NoScheduler
All requests are allocated equal compute resources.
2. org.apache.cassandra.scheduler.RoundRobinScheduler
Allocates separate compute resources to different keyspaces.
RoundRobinScheduler is suitable for multi-tenant deployments.

index_interval
The sampling interval between the entries of an SSTable index file that are kept in the in-memory index. A smaller value makes it faster to find a value through the in-memory index, but consumes more memory.
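
The trade-off can be made concrete with a small model of a sampled index. This is a simplified sketch, not Cassandra's data structure: plain sorted strings stand in for the index file entries, and the function names build_sample and scan_range are hypothetical.

```python
import bisect


def build_sample(sorted_keys, index_interval):
    """Keep every index_interval-th key, together with its position."""
    return [
        (key, position)
        for position, key in enumerate(sorted_keys)
        if position % index_interval == 0
    ]


def scan_range(sample, key, total_keys, index_interval):
    """Return the (start, end) positions that must be scanned to find key."""
    sampled_keys = [sampled_key for sampled_key, _ in sample]
    i = bisect.bisect_right(sampled_keys, key) - 1
    if i < 0:
        return (0, 0)  # key sorts before the first entry, nothing to scan
    start = sample[i][1]
    return (start, min(start + index_interval, total_keys))


if __name__ == "__main__":
    keys = [f"k{n:03d}" for n in range(1000)]
    for interval in (32, 128, 512):
        sample = build_sample(keys, interval)
        start, end = scan_range(sample, "k700", len(keys), interval)
        print(f"interval {interval}: {len(sample)} entries in memory, "
              f"scan up to {end - start} entries")
```

A smaller interval keeps more sampled entries in memory but leaves a shorter stretch of the index file to scan for each lookup, which is the trade-off described above.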

keyspaces
Defines the properties of the keyspaces.
name: Defines the name of the keyspace.
replica_placement_strategy: Defines the replication strategy for the data. The available options are:
1. org.apache.cassandra.locator.SimpleStrategy
2. org.apache.cassandra.locator.OldNetworkTopologyStrategy
3. org.apache.cassandra.locator.NetworkTopologyStrategy
4. org.apache.cassandra.locator.LocalStrategy

replication_factor: Defines the number of replicas of the data.
column_families: Defines the properties of the column families.
column_type: Defines the type of the column family. Can be set to Super or Standard; if not set, the type is Standard.
compare_with: The sort order of the column names. The available options are:
1. AsciiType
2. UTF8Type
3. LexicalUUIDType
4. TimeUUIDType
5. LongType
6. IntegerType

compare_subcolumns_with: The sort order of the column names under a super column. The available options are:
1. AsciiType
2. UTF8Type
3. LexicalUUIDType
4. TimeUUIDType
5. LongType
6. IntegerType
rows_cached: The number of rows to cache, given as an integer or a percentage.
keys_cached: The number of keys to cache, given as an integer or a percentage.
row_cache_save_period_in_seconds: Defines the interval at which the row cache of the column family is persisted; if 0, row cache persistence is turned off.
key_cache_save_period_in_seconds: Defines the interval at which the key cache of the column family is persisted; if 0, key cache persistence is turned off.
gc_grace_seconds: Defines the interval between the time data in the column family is marked for deletion and the time it is physically deleted; defaults to 10 days (864,000 seconds) if not set.
memtable_flush_after_mins: Defines the maximum lifetime of a memtable in the column family.
memtable_throughput_in_mb: Defines the maximum amount of data a memtable holds in the column family.
memtable_operations_in_millions: Defines the maximum number of operations, in millions, a memtable holds in the column family.
min_compaction_threshold: Defines the minimum number of SSTable files involved in a compaction in the column family.
max_compaction_threshold: Defines the maximum number of SSTable files involved in a compaction in the column family.

default_validation_class: Defines the default type used to validate column values in the column family. The available options are:
1. AsciiType
2. UTF8Type
3. LexicalUUIDType
4. TimeUUIDType
5. LongType
6. IntegerType

column_metadata: Defines the properties of a secondary index.
name: Defines the name of the column that requires a secondary index.

validator_class: Defines the type used to validate the value of the column. The available options are:
1. AsciiType
2. UTF8Type
3. LexicalUUIDType
4. TimeUUIDType
5. LongType
6. IntegerType

index_type: Defines the type of the secondary index. The currently supported option is: KEYS
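
As a worked example for the gc_grace_seconds property of a column family described earlier, the default of 10 days and the resulting deletion rule can be written out as follows. This is a sketch under stated assumptions: the function name can_purge is hypothetical, timestamps are plain seconds, and treating the exact boundary as purgeable is a choice made here, not something the text specifies.

```python
# 10 days expressed in seconds: 10 * 24 hours * 60 minutes * 60 seconds.
DEFAULT_GC_GRACE_SECONDS = 10 * 24 * 60 * 60


def can_purge(deleted_at, now, gc_grace_seconds=DEFAULT_GC_GRACE_SECONDS):
    """True once data marked for deletion is old enough to be removed.

    Assumption of this sketch: data becomes purgeable as soon as the full
    grace period has elapsed (>=).
    """
    return now - deleted_at >= gc_grace_seconds


if __name__ == "__main__":
    print("default grace period:", DEFAULT_GC_GRACE_SECONDS, "seconds")
    one_day = 24 * 60 * 60
    print("after 1 day:", can_purge(deleted_at=0, now=one_day))
    print("after 11 days:", can_purge(deleted_at=0, now=11 * one_day))
```

The arithmetic confirms the figure quoted for the default: 10 days is 864,000 seconds.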
