Cassandra Primary Key Explanation


When designing a data model, we often face another question: how to specify the various keys used by each column family (table). In Cassandra documentation we keep running into a series of key-related terms: partition key, clustering key, primary key and composite key. So what does each of them refer to?

The primary key is actually a very general concept. In Cassandra, it consists of one or more columns that uniquely identify a row and are used to retrieve data from Cassandra:

```sql
CREATE TABLE Sample (
    Key text PRIMARY KEY,
    Data text
);
```
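To make the idea concrete, here is a minimal Python sketch, not Cassandra code: a hypothetical `SingleKeyTable` class models the `Sample` table above as a dictionary keyed by the primary key. It only illustrates that the primary key identifies a row; like a CQL `INSERT` without `IF NOT EXISTS`, writing an existing key overwrites the stored value instead of failing.

```python
from typing import Dict, Optional


class SingleKeyTable:
    """Hypothetical toy model of a table whose primary key is one column."""

    def __init__(self) -> None:
        # primary key value -> the row's remaining columns
        self.rows: Dict[str, Dict[str, str]] = {}

    def insert(self, key: str, data: str) -> None:
        # Like a CQL INSERT without IF NOT EXISTS, reusing a primary key
        # overwrites the Data column ("upsert") rather than raising an error.
        self.rows[key] = {"data": data}

    def select(self, key: str) -> Optional[Dict[str, str]]:
        # A row is located by its full primary key.
        return self.rows.get(key)


sample = SingleKeyTable()
sample.insert("k1", "first")
sample.insert("k1", "second")  # same primary key, so the row is overwritten
print(sample.select("k1"))     # {'data': 'second'}
print(len(sample.rows))        # 1
```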

In the example above, we specified the Key column as the primary key of Sample. If needed, a primary key can also be composed of multiple columns:

```sql
CREATE TABLE Sample (
    Key_one text,
    Key_two text,
    Data text,
    PRIMARY KEY (Key_one, Key_two)
);
```

In the example above, the primary key is a composite key made up of the two columns Key_one and Key_two. The first component of a composite key is called the partition key, and the remaining components are called the clustering key. The partition key determines which nodes in the cluster Cassandra uses to store the data, and each partition key value corresponds to a specific partition. The clustering key determines how rows are sorted within a partition. If a primary key contains only one column, it has only a partition key and no clustering key.
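The split between partition key and clustering key can be sketched in Python as follows. This is a toy model under explicit assumptions: the three node names are hypothetical, and `node_for` uses an MD5 digest modulo the node count, whereas real Cassandra hashes the partition key to a token (Murmur3Partitioner by default) and places it on a token ring with replication. What it does show is that all rows sharing a `Key_one` value land in one partition on one node, and that rows inside that partition come back ordered by `Key_two`.

```python
import hashlib
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical 3-node cluster (names are made up for illustration).
NODES = ["node-a", "node-b", "node-c"]


def node_for(partition_key: str) -> str:
    # Simplification: real Cassandra maps the partition key to a Murmur3 token
    # on a ring; here an MD5 digest modulo the node count stands in for that.
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return NODES[int.from_bytes(digest[:8], "big") % len(NODES)]


# Key_one value (partition key) -> {Key_two value (clustering key) -> Data}
partitions: Dict[str, Dict[str, str]] = defaultdict(dict)


def insert(key_one: str, key_two: str, data: str) -> None:
    # Writing the same (key_one, key_two) pair again overwrites the row.
    partitions[key_one][key_two] = data


def read_partition(key_one: str) -> List[Tuple[str, str]]:
    # Rows in a partition come back sorted by the clustering key
    # (ascending, which is CQL's default clustering order).
    return sorted(partitions.get(key_one, {}).items())


insert("user1", "b", "x")
insert("user1", "a", "y")
insert("user2", "a", "z")
print(node_for("user1"))        # the single node holding every "user1" row
print(read_partition("user1"))  # [('a', 'y'), ('b', 'x')]
```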

Both the partition key and the clustering key can also be made up of multiple columns:

```sql
CREATE TABLE Sample (
    Key_primary_one text,
    Key_primary_two text,
    Key_cluster_one text,
    Key_cluster_two text,
    Data text,
    PRIMARY KEY ((Key_primary_one, Key_primary_two), Key_cluster_one, Key_cluster_two)
);
```
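With a composite partition key, the partition is identified by the pair of values, not by either column alone. The following Python sketch (a hypothetical in-memory model, not how Cassandra stores data) keys each partition by the tuple `(Key_primary_one, Key_primary_two)` and orders rows by the tuple `(Key_cluster_one, Key_cluster_two)`, compared column by column. Rows that share `Key_primary_one` but differ in `Key_primary_two` therefore end up in different partitions, and a lookup needs both values.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Toy model of:
#   PRIMARY KEY ((Key_primary_one, Key_primary_two), Key_cluster_one, Key_cluster_two)
PartitionKey = Tuple[str, str]   # (Key_primary_one, Key_primary_two)
ClusteringKey = Tuple[str, str]  # (Key_cluster_one, Key_cluster_two)

table: Dict[PartitionKey, Dict[ClusteringKey, str]] = defaultdict(dict)


def insert(p1: str, p2: str, c1: str, c2: str, data: str) -> None:
    table[(p1, p2)][(c1, c2)] = data


def read_partition(p1: str, p2: str) -> List[Tuple[ClusteringKey, str]]:
    # Both partition key values are required to find the partition; rows are
    # ordered by Key_cluster_one first, then Key_cluster_two (tuple ordering).
    return sorted(table.get((p1, p2), {}).items())


insert("eu", "2024", "b", "1", "x")
insert("eu", "2024", "a", "2", "y")
insert("eu", "2023", "a", "1", "z")
print(len(table))                    # 2 partitions: ("eu", "2024") and ("eu", "2023")
print(read_partition("eu", "2024"))  # [(('a', '2'), 'y'), (('b', '1'), 'x')]
```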

In a CQL query, the conditions in the WHERE clause can generally reference only columns that are part of the primary key (secondary indexes and ALLOW FILTERING are the exceptions). Depending on how your data is distributed, you therefore need to decide which columns should form the partition key and which should serve as clustering keys that sort the data.
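The rule can be made a little more precise. The Python function below is a simplified, hypothetical checker (`where_is_allowed` is not a Cassandra API) that encodes the usual restrictions when no secondary index or ALLOW FILTERING is involved: only primary key columns may appear; if anything is restricted, every partition key column must be restricted by equality (`=` or `IN`); and restricted clustering columns must form a leading prefix of the clustering key, with at most the last one restricted by a range. It ignores other cases such as `token()` restrictions, so treat it as an approximation.

```python
from typing import Dict, List


def where_is_allowed(partition_keys: List[str],
                     clustering_keys: List[str],
                     restrictions: Dict[str, str]) -> bool:
    """Simplified check; restrictions maps column name -> "eq" or "range"."""
    if not restrictions:
        return True  # no WHERE clause: a full scan, allowed but expensive
    key_columns = set(partition_keys) | set(clustering_keys)
    if any(col not in key_columns for col in restrictions):
        return False  # a regular (non-key) column appears in WHERE
    # Every partition key column must be restricted by equality (= or IN).
    if any(restrictions.get(col) != "eq" for col in partition_keys):
        return False
    # Clustering columns: a contiguous leading prefix, only the last a range.
    seen_gap = False
    seen_range = False
    for col in clustering_keys:
        kind = restrictions.get(col)
        if kind is None:
            seen_gap = True
            continue
        if seen_gap or seen_range:
            return False  # skipped a clustering column, or restricted after a range
        if kind == "range":
            seen_range = True
    return True


PK = ["Key_primary_one", "Key_primary_two"]
CK = ["Key_cluster_one", "Key_cluster_two"]
print(where_is_allowed(PK, CK, {"Key_primary_one": "eq"}))  # False: partition key incomplete
```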

A good partition key design often greatly improves application performance. First, because the partition key controls which nodes store the data, it determines whether the data is distributed evenly across the Cassandra nodes so that all of them are put to good use. Second, a good partition key lets each read request involve as few nodes as possible. When a read request is executed, Cassandra has to coordinate and merge the data sets returned by every node it contacts, so a read that touches fewer nodes performs better. Deciding each table's partition key according to the queries it must serve is therefore a central part of the whole design process. A column whose values are evenly distributed and that frequently appears as an input condition in queries is often a good candidate for the partition key.
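The effect of the partition key choice on data distribution, and on how many nodes a read touches, can be illustrated with the same kind of toy hash placement. In this Python sketch the four node names, the `user-N` identifiers and the two-valued country column are made-up data, and MD5 modulo the node count stands in for Cassandra's token ring; replication is ignored. A high-cardinality key such as a user ID spreads rows over all nodes, while a key with only two distinct values can never use more than two nodes, however large the cluster.

```python
import hashlib
from collections import Counter
from typing import Iterable

NODES = ["node-a", "node-b", "node-c", "node-d"]  # hypothetical cluster


def node_for(partition_key: str) -> str:
    # Stand-in for Cassandra's token ring: MD5 digest modulo node count.
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return NODES[int.from_bytes(digest[:8], "big") % len(NODES)]


def rows_per_node(partition_keys: Iterable[str]) -> Counter:
    # How many rows each node would store for the given partition key values.
    return Counter(node_for(k) for k in partition_keys)


def nodes_touched(partition_keys_read: Iterable[str]) -> int:
    # Number of distinct nodes a read must contact (replication ignored).
    return len({node_for(k) for k in partition_keys_read})


user_ids = [f"user-{i}" for i in range(1000)]               # high cardinality
countries = ["CN" if i % 2 else "US" for i in range(1000)]  # two distinct values

print(rows_per_node(user_ids))     # rows spread across all nodes
print(rows_per_node(countries))    # all 1000 rows on at most 2 nodes
print(nodes_touched(["user-42"]))  # 1: a single-partition read hits one node
```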

Beyond that, we should also consider how to choose each table's clustering key. Because the clustering key sorts rows within a partition, it lets Cassandra serve queries that filter on a range of values efficiently.
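As a sketch of why clustering order helps range filtering, the Python model below keeps each partition as a list sorted by the clustering column, so a range condition becomes two binary searches plus one contiguous slice. The `readings` table in the comment, with `sensor_id` as partition key and `reading_time` as clustering key, is a hypothetical example; Cassandra's storage engine is more involved, but the principle that rows within a partition are stored in clustering order is the same.

```python
from bisect import bisect_left
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical table: PRIMARY KEY (sensor_id, reading_time)
# sensor_id -> [(reading_time, value), ...] kept sorted by reading_time
partitions: Dict[str, List[Tuple[int, float]]] = defaultdict(list)


def insert(sensor_id: str, reading_time: int, value: float) -> None:
    rows = partitions[sensor_id]
    i = bisect_left(rows, (reading_time,))  # first row with time >= reading_time
    if i < len(rows) and rows[i][0] == reading_time:
        rows[i] = (reading_time, value)  # same primary key overwrites
    else:
        rows.insert(i, (reading_time, value))


def range_query(sensor_id: str, start: int, end: int) -> List[Tuple[int, float]]:
    # Roughly: SELECT * FROM readings
    #          WHERE sensor_id = ? AND reading_time >= ? AND reading_time < ?
    rows = partitions.get(sensor_id, [])
    lo = bisect_left(rows, (start,))  # first row with time >= start
    hi = bisect_left(rows, (end,))    # first row with time >= end
    return rows[lo:hi]


insert("s1", 30, 1.5)
insert("s1", 10, 1.0)
insert("s1", 20, 1.2)
insert("s1", 20, 1.3)  # overwrites the reading at time 20
insert("s2", 15, 9.9)
print(range_query("s1", 10, 30))  # [(10, 1.0), (20, 1.3)]
```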
