Cassandra distributed database, Part 1: Configuration, start-up and clustering

Source: Internet
Author: User

Detailed configuration of Cassandra

Understanding what each configuration item means is a prerequisite for using any piece of software. This section explains the items in Cassandra's configuration file, storage-conf.xml, many of which can be tuned to achieve the desired performance. To save space, the contents of the file are not listed here; refer to the file itself while reading the following sections.

ClusterName

ClusterName identifies a group of nodes, which normally means one cluster. It must be set before Cassandra has stored any data: on first start-up the name is written to Cassandra's system table, so if you later want to change ClusterName you must delete the Cassandra data.
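The rule above can be pictured with a small Python sketch. It is a model, not Cassandra's code: the system table is a plain dict, and `start_node` and `ClusterNameMismatch` are hypothetical names. It assumes, as the text implies, that the name is persisted on first start and that a later start-up with a different configured name is rejected until the stored data is removed.

```python
class ClusterNameMismatch(Exception):
    """Raised when the configured name differs from the stored one."""


def start_node(system_table: dict, configured_name: str) -> str:
    # Hypothetical model: system_table is a plain dict standing in for
    # Cassandra's system table.
    stored = system_table.get("ClusterName")
    if stored is None:
        # First start-up: no data yet, so the configured name is persisted.
        system_table["ClusterName"] = configured_name
        return "stored"
    if stored != configured_name:
        # Later start-ups: the configured name must match the stored one.
        raise ClusterNameMismatch(
            f"stored name {stored!r} != configured name {configured_name!r}"
        )
    return "ok"


table = {}
print(start_node(table, "Test Cluster"))   # stored
print(start_node(table, "Test Cluster"))   # ok
try:
    start_node(table, "New Cluster")
except ClusterNameMismatch as err:
    print("refused:", err)
table.clear()                              # "delete the Cassandra data"
print(start_node(table, "New Cluster"))    # stored
```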

AutoBootstrap

This configuration item looks very simple, but without a thorough understanding of Cassandra you may not know what Cassandra will do when you change it.

A Cassandra cluster keeps its nodes self-managing by maintaining an adaptive token ring. The ring not only keeps the state of every machine synchronized and consistent, but also keeps the tokens sensibly distributed among the nodes: how the token space is divided determines how the load is balanced across the machines.
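To make the ring concrete, here is a minimal Python sketch of token-based ownership, assuming the usual consistent-hashing rule: each node owns the range from the previous node's token (exclusive) up to its own token (inclusive), wrapping around the ring. `RING_SIZE`, `key_token` and `TokenRing` are illustrative names; `key_token` is a simplified stand-in for an MD5-based random partitioner, not Cassandra's exact implementation.

```python
import bisect
import hashlib

RING_SIZE = 2 ** 127  # assumed token space of an MD5-based random partitioner


def key_token(key: str) -> int:
    # Simplified: hash the key with MD5 and fold it into [0, RING_SIZE).
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % RING_SIZE


class TokenRing:
    def __init__(self, node_tokens: dict):
        # Keep (token, node) pairs sorted so ownership is a binary search.
        pairs = sorted((token, node) for node, token in node_tokens.items())
        self.tokens = [token for token, _ in pairs]
        self.nodes = [node for _, node in pairs]

    def owner(self, token: int) -> str:
        # First node whose token is >= the key's token; past the largest
        # token the search wraps around to the first node.
        i = bisect.bisect_left(self.tokens, token)
        return self.nodes[i % len(self.nodes)]


# Three nodes with evenly spaced tokens each own a third of the ring.
ring = TokenRing({"A": 0, "B": RING_SIZE // 3, "C": 2 * RING_SIZE // 3})
print(ring.owner(key_token("user42")))
```

Spacing the node tokens evenly around the ring gives each node an equal share of the key space, which is what "dividing the tokens to balance the load" amounts to in this model.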

What does this configuration item have to do with tokens and load? At first glance it seems to control whether a node automatically joins the cluster when it starts. But does the node stay out of the cluster when it is set to false? Clearly not: joining depends on whether seeds are configured, and if other seed nodes are configured, the node will still join the cluster.

So what does it actually change? Analysis of the start-up code shows that the effect of this item depends not only on the seed configuration but also on whether this is the node's first start-up. Cassandra's start-up rules are as follows:

1. When AutoBootstrap is set to false, on its first start-up Cassandra records autobootstrap=true in the system table to indicate that this value was set automatically by the system. This is only a flag used to decide how later start-ups behave.

2. When AutoBootstrap is set to true and this is the first start-up, Cassandra checks whether the current node is configured as a seed node, i.e. whether the local IP address is listed in Seeds. If it is, Cassandra starts exactly as in case 1.

3. When AutoBootstrap is set to true, this is the first start-up, and the node is not configured as a seed, Cassandra goes through a lengthy start-up process whose duration depends heavily on how much data the cluster currently holds. During this process Cassandra rebalances the cluster according to its current load: it assigns the node a suitable token based on the current token ring and transfers the data for that token to the new node.
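The three rules can be condensed into a single decision function. This is a hypothetical Python model of the behavior described above, not Cassandra's start-up code; the `bootstrapped` key stands in for the flag the text says is written to the system table, and recording it immediately (rather than after the data transfer finishes) is a simplification.

```python
def startup_mode(auto_bootstrap: bool, local_ip: str, seeds: list,
                 system_table: dict) -> str:
    # Hypothetical model; "bootstrapped" stands in for the flag that the
    # first start-up writes to the system table.
    first_start = not system_table.get("bootstrapped", False)
    is_seed = local_ip in seeds
    # Simplified: the flag is recorded immediately, so every later
    # start-up is no longer a "first" start-up.
    system_table["bootstrapped"] = True
    if auto_bootstrap and first_start and not is_seed:
        # Case 3: take a token chosen from the current ring load and receive
        # the data for that range (slow, proportional to data volume).
        return "bootstrap"
    # Cases 1 and 2, and every later start-up: join with the configured
    # token, or a random one if none is configured.
    return "normal"


print(startup_mode(False, "10.0.0.5", ["10.0.0.1"], {}))  # normal (case 1)
print(startup_mode(True, "10.0.0.1", ["10.0.0.1"], {}))   # normal (case 2)
print(startup_mode(True, "10.0.0.5", ["10.0.0.1"], {}))   # bootstrap (case 3)
```

In this model the flag has a visible consequence: a node that first started with AutoBootstrap set to false will not bootstrap later even if the setting is switched to true, because its next start-up is no longer a first start-up.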

As the analysis above shows, the main purpose of the AutoBootstrap setting is to balance the load in the current cluster. There is, however, an important problem with case 1: if no token is specified, the node's token is generated randomly. When this randomly generated token joins the cluster's token ring, how does Cassandra ensure that the token and the data corresponding to it stay consistent? This question is discussed later.
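The difference between the two ways a node can obtain its token can be sketched in Python. `random_token` models case 1 with no token configured. `balanced_token` is a simplified assumption about what "a suitable token" means in case 3: it halves the range of the most heavily loaded node. Cassandra's real choice is driven by the load of the existing nodes but is not necessarily a range midpoint, and the 2**127 token space assumes an MD5-based random partitioner.

```python
import random

RING_SIZE = 2 ** 127  # assumed token space of an MD5-based random partitioner


def random_token(rng: random.Random) -> int:
    # Case 1 with no token configured: any point on the ring.
    return rng.randrange(RING_SIZE)


def balanced_token(node_tokens: dict, load: dict) -> int:
    # Simplified "suitable token": the midpoint of the most loaded node's
    # range (previous token, own token], wrapping around the ring.
    busiest = max(load, key=load.get)
    tokens = sorted(node_tokens.values())
    own = node_tokens[busiest]
    prev = tokens[tokens.index(own) - 1]           # index -1 wraps to the last token
    span = (own - prev) % RING_SIZE or RING_SIZE   # a lone node owns the whole ring
    return (prev + span // 2) % RING_SIZE


tokens = {"A": 0, "B": RING_SIZE // 2}
print(balanced_token(tokens, {"A": 10, "B": 90}) == RING_SIZE // 4)  # True
print(random_token(random.Random()))
```

A random token can land anywhere, for example inside the small range of a lightly loaded node, so nothing guarantees a balanced ring; either way the new token takes over part of an existing node's range, which is exactly where the data-consistency question above arises.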

Keyspaces

A Cassandra keyspace is roughly equivalent to a tablespace in a relational database: it can be understood as a container for tables. A keyspace can define multiple column families, which correspond to tables and are the entities that actually store the data.

The attributes of a column family mean the following:

ColumnType. There are two column types, Standard and Super: standard columns, and super columns, in which each column additionally sits under a parent column.

CompareWith. Specifies how columns are sorted; columns can be ordered according to different data types. For example, with TimeUUIDType columns are sorted by insertion time.

CompareSubcolumnsWith. Specifies how subcolumns are sorted, analogous to CompareWith.

RowsCached. The amount of data cached for queries, given either as an absolute number of rows or as a percentage; for example, 10% caches 10% of the rows. This setting has a large impact on query performance: when the hit rate is high, it can significantly improve query efficiency.

KeysCached. Caches the column family's keys, which correspond to the entries in Index.db. If a query misses the row cache, every SSTable has to be searched, and that requires looking up the key; if the key is found in the key cache, there is no need to look it up in Index.db, which saves an I/O operation.
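CompareWith is easiest to see with concrete column names. The Python sketch below imitates three comparator types with simple sort keys (a simplification of the real comparators) and uses a hypothetical `time_uuid` helper to build version-1 UUIDs with chosen timestamps. It shows why TimeUUIDType is useful: the raw byte order of a time-based UUID does not follow time order, whereas TimeUUIDType orders columns by the embedded timestamp.

```python
import uuid


def time_uuid(timestamp: int) -> uuid.UUID:
    # Hypothetical helper: a version-1 UUID carrying a chosen 60-bit timestamp.
    return uuid.UUID(fields=(
        timestamp & 0xFFFFFFFF,                  # time_low
        (timestamp >> 32) & 0xFFFF,              # time_mid
        ((timestamp >> 48) & 0x0FFF) | 0x1000,   # time_hi with version 1
        0x80,                                    # clock_seq_hi with RFC 4122 variant
        0x00,                                    # clock_seq_low
        0,                                       # node
    ))


# Simplified sort keys imitating three comparator types.
COMPARATORS = {
    "BytesType": lambda name: name,                                   # raw bytes
    "LongType": lambda name: int.from_bytes(name, "big", signed=True),
    "TimeUUIDType": lambda name: (uuid.UUID(bytes=name).time, name),  # time first
}


def sorted_column_names(names, compare_with: str) -> list:
    return sorted(names, key=COMPARATORS[compare_with])


early, late = time_uuid(1), time_uuid(2 ** 32)
names = [late.bytes, early.bytes]
print(sorted_column_names(names, "BytesType") == [late.bytes, early.bytes])     # True
print(sorted_column_names(names, "TimeUUIDType") == [early.bytes, late.bytes])  # True
```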
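The interplay of RowsCached and KeysCached on a read can be modeled in a few lines of Python. Everything here is a hypothetical simplification: SSTables are dicts with an `index` (standing in for Index.db) and `data`, the caches simply stop growing when full instead of evicting entries, and later SSTables overwrite earlier columns instead of being reconciled by timestamp. `index_reads` counts the lookups that would cost disk I/O.

```python
def cache_capacity(setting: str, total_rows: int) -> int:
    # "10%" means 10% of the rows; a plain number is an absolute count.
    if setting.endswith("%"):
        return int(total_rows * float(setting[:-1]) / 100)
    return int(setting)


class ColumnFamilyStore:
    """Hypothetical model of a read that consults the row and key caches."""

    def __init__(self, sstables, row_cache_size: int, key_cache_size: int):
        self.sstables = sstables      # [{"index": {key: offset}, "data": {offset: columns}}]
        self.row_cache = {}           # key -> merged row
        self.key_cache = {}           # (sstable number, key) -> offset
        self.row_cache_size = row_cache_size
        self.key_cache_size = key_cache_size
        self.index_reads = 0          # lookups that would hit Index.db on disk

    def get(self, key) -> dict:
        if key in self.row_cache:                 # RowsCached hit: no SSTable work
            return self.row_cache[key]
        row = {}
        for n, sstable in enumerate(self.sstables):
            offset = self.key_cache.get((n, key))
            if offset is None:                    # KeysCached miss: read the index
                self.index_reads += 1
                offset = sstable["index"].get(key)
                if offset is None:
                    continue
                if len(self.key_cache) < self.key_cache_size:
                    self.key_cache[(n, key)] = offset
            row.update(sstable["data"][offset])   # simplified merge: later SSTables win
        if row and len(self.row_cache) < self.row_cache_size:
            self.row_cache[key] = row
        return row


sstable = {"index": {"alice": 0}, "data": {0: {"email": "alice@example.com"}}}
store = ColumnFamilyStore([sstable], row_cache_size=0,
                          key_cache_size=cache_capacity("10%", 100))
store.get("alice")
store.get("alice")
print(store.index_reads)  # 1: the second read found the key in KeysCached
```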

Cassandra is a key/value system whose logical storage structure is divided into keyspace, key, column family, super column and column. Each key/value pair lives inside an enclosing container, so the whole structure is in effect built from nested maps. This container structure is shown in Figure 1 and Figure 2:

Figure 1. Structure of standard columns

Figure 2. Structure including super columns
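The nested-map view can be written down directly. In this Python sketch the keyspace and column family names (`Keyspace1`, `Standard1`, `Super1`) and the row data are illustrative, and `get_column` is a hypothetical helper that walks the same keyspace, key, column family, super column, column path described above.

```python
# Illustrative data: keyspace -> row key -> column family -> (super column) -> column -> value
store = {
    "Keyspace1": {                                 # keyspace
        "user42": {                                # row key
            "Standard1": {                         # standard column family
                "name": "Alice",                   # column name -> value
                "email": "alice@example.com",
            },
            "Super1": {                            # super column family
                "address": {                       # super column (the parent column)
                    "city": "Springfield",         # subcolumn name -> value
                    "zip": "12345",
                },
            },
        },
    },
}


def get_column(data, keyspace, key, column_family, column, super_column=None):
    # Hypothetical helper: walk the nested maps one level at a time.
    columns = data[keyspace][key][column_family]
    if super_column is not None:
        columns = columns[super_column]
    return columns[column]


print(get_column(store, "Keyspace1", "user42", "Standard1", "name"))          # Alice
print(get_column(store, "Keyspace1", "user42", "Super1", "city", "address"))  # Springfield
```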
