Cassandra CQL Analysis

Source: Internet
Author: User
Tags: cassandra
Summary

CQL has many restrictions compared with SQL. Cassandra is designed for storing large volumes of data, and its deployment model is built on partitioning from the start, unlike MongoDB, where a small cluster begins as a replica set and is sharded only once the data grows large. To keep retrieval efficient, CQL syntax is deliberately limited so that inefficient queries cannot be written. Cassandra distributes data across nodes according to the hash of the partition key, and scanning every node is very inefficient, so a basic principle of querying Cassandra is to touch as few nodes as possible.

CQL Overview

A relational database is a collection of rows; Cassandra is a collection of partitions. If there is no clustering key, each partition holds a single row; a partition that contains multiple rows is called a wide row. Cassandra uses the hash of the partition key to decide which node stores the data. This works like a hash index, so the partition key cannot be queried by range. Within each partition, rows are sorted by the clustering key. That ordering is not hash-based, so range queries on the clustering key are possible. In the following table, for example, node is the partition key and (date, number) is the clustering key.

CREATE KEYSPACE test WITH REPLICATION = {
    'class': 'SimpleStrategy',
    'replication_factor': 1
};

CREATE TABLE log (
    node text,
    date text,
    name text,
    number int,
    PRIMARY KEY (node, date, number)
);

This is how the data would be stored in a relational database:

node  date  number  name
N1    Feb   1       name1
N1    Feb   2       name2
N2    Feb   1       name3
N2    Feb   2       name4

And this is how it is stored in Cassandra:

Partition 1

{
    N1: {Feb: {1:name: name1,
               2:name: name2}}
}

Partition 2

{
    N2: {Feb: {1:name: name3,
               2:name: name4}}
}

In an entry such as 1:name: name3, the key is 1:name and the value is name3.
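The layout above can be modelled, very roughly, as a map from partition key to a list of rows kept sorted by clustering key. The following is a minimal sketch under that assumption; ToyTable and its methods are hypothetical names for illustration, not part of Cassandra or any driver.

```python
from bisect import bisect_left, bisect_right


class ToyTable:
    """Toy model: partition key -> rows sorted by clustering key (date, number)."""

    def __init__(self):
        self.partitions = {}

    def insert(self, node, date, number, name):
        rows = self.partitions.setdefault(node, [])
        key = (date, number)
        keys = [k for k, _ in rows]
        i = bisect_left(keys, key)
        if i < len(rows) and rows[i][0] == key:
            rows[i] = (key, {"name": name})      # same primary key: overwrite
        else:
            rows.insert(i, (key, {"name": name}))  # keep clustering order

    def slice(self, node, start, end):
        """Rows of one partition with start <= clustering key <= end."""
        rows = self.partitions.get(node, [])
        keys = [k for k, _ in rows]
        return rows[bisect_left(keys, start):bisect_right(keys, end)]


table = ToyTable()
# Inserted out of order on purpose; the partition keeps rows sorted.
table.insert("N1", "Feb", 2, "name2")
table.insert("N1", "Feb", 1, "name1")
table.insert("N2", "Feb", 1, "name3")
table.insert("N2", "Feb", 2, "name4")
```

Because rows inside a partition stay sorted, a range over the clustering key is a contiguous slice of one list, whereas finding the partition itself is a plain key lookup.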

Because the set of columns in Cassandra is not fixed, the column name is stored together with each value.

Range Queries

IN Operator

Before Cassandra 2.2, IN could be used only on the last column of the partition key; from 2.2 onward it can be used on any partition key column. Note, however, that IN is inefficient. As the summary says, a query should touch as few nodes as possible, but an IN query reads from multiple nodes, and with a replication factor of 3 the number of nodes involved can be up to three times higher, depending on the consistency level. This increases the pressure on the coordinator node, which has to hold the results from every node, and that can lead to GC pauses and heap growth.
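To make the fan-out concrete, here is a small sketch that groups the keys of an IN list by the node that owns them. It is a toy: the node names are hypothetical, md5 stands in for Cassandra's real partitioner, and placement by modulo ignores token ranges and replication.

```python
import hashlib

NODES = ["node-a", "node-b", "node-c", "node-d"]  # hypothetical cluster


def owner(partition_key: str) -> str:
    """Toy placement: hash the key and pick a node (not Cassandra's algorithm)."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    token = int.from_bytes(digest[:8], "big")
    return NODES[token % len(NODES)]


def nodes_for_in_query(keys):
    """Group the keys of `WHERE node IN (...)` by the node that owns them."""
    plan = {}
    for key in keys:
        plan.setdefault(owner(key), []).append(key)
    return plan


plan = nodes_for_in_query(["1", "2", "3", "4", "5"])
for node_name, node_keys in plan.items():
    print(node_name, node_keys)
```

Every distinct entry in the plan is a node the coordinator must wait for, which is why a query on a single partition key is the cheapest case.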

Cassandra 3.0 supports IN on clustering key columns, but when IN is applied to a single clustering column, the preceding clustering columns must be restricted with = or IN.

Range Query >,<

Partition Key

Cassandra distributes data by the hash of the partition key, so range queries directly on the partition key are not supported.
The token function can be used instead to run a range query over the tokens of the partition key.

SELECT * FROM log WHERE token(node) > token('1');

Note: ByteOrderedPartitioner distributes data in key order, so in theory it can support range queries on the partition key. However, it easily leads to unbalanced data distribution, so its use is generally not recommended.
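The reason a token range is not the same as a key range can be seen in a short sketch. It assumes md5 as a stand-in hash (Cassandra's default partitioner uses a different function), and token here is a hypothetical helper, not the CQL function itself.

```python
import hashlib


def token(partition_key: str) -> int:
    """Toy token: a signed 64-bit value taken from an md5 digest."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big", signed=True)


def token_greater_than(keys, pivot):
    """Keys a query like `token(node) > token(pivot)` would select in this toy."""
    return [k for k in keys if token(k) > token(pivot)]


keys = [f"{i:03d}" for i in range(100)]   # "000" .. "099", already in key order
by_token = sorted(keys, key=token)        # order in which a token scan would visit them

print(by_token[:5])
print(token_greater_than(keys, "001")[:5])
```

The token order is effectively a shuffle of the key order, so a token range is useful for paging through all partitions but says nothing about which keys are "greater" in the application's sense.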

Clustering Key

Within a partition, a range restriction can only be applied to the last clustering column used in the query; all preceding clustering columns must be given with equality.

SELECT * FROM log WHERE node = '1' AND date >= 'date1';

SELECT * FROM log WHERE node = '1' AND date = 'date' AND number >= 1;

The following is not valid, because date is skipped:

SELECT * FROM log WHERE node = '1' AND number >= 1;
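The rule illustrated by the three queries above can be written as a small checker. This is a simplified sketch: it assumes the partition key is already restricted with =, it ignores IN, multi-column restrictions and ALLOW FILTERING, and clustering_restrictions_valid is a hypothetical helper name.

```python
CLUSTERING = ["date", "number"]  # clustering columns of the log table, in order


def clustering_restrictions_valid(restrictions):
    """restrictions maps a clustering column to '=', '>', '>=', '<' or '<='.

    Valid when the restricted columns form a prefix of CLUSTERING and only
    the last restricted column uses a range operator.
    """
    seen_gap = False    # an earlier clustering column was left unrestricted
    seen_range = False  # an earlier clustering column used a range operator
    for column in CLUSTERING:
        op = restrictions.get(column)
        if op is None:
            seen_gap = True
            continue
        if seen_gap or seen_range:
            return False
        if op != "=":
            seen_range = True
    return True


print(clustering_restrictions_valid({"date": ">="}))                 # first query
print(clustering_restrictions_valid({"date": "=", "number": ">="}))  # second query
print(clustering_restrictions_valid({"number": ">="}))               # third query
```

The check walks the clustering columns in declaration order, which mirrors how rows are sorted inside a partition: a range only describes a contiguous slice when everything before it is pinned.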

Summary

The partition key is hash-based: it does not support greater-than or less-than range queries, but it does support IN.

The clustering key is sorted and supports IN as well as greater-than and less-than queries. It behaves like a composite index, so the leading columns must be specified for it to take effect.

When no partition key is given and only a clustering key is given, all nodes have to be scanned and the results filtered, so execution can be inefficient. Cassandra therefore requires ALLOW FILTERING for such queries. They can still be reasonably efficient when the rows that pass the filter make up a high proportion of the rows scanned.

Problems

Problem 1: The WHERE clause does not support OR

OR is not supported. In SQL, OR is typically used to reduce the number of requests sent to the database, and each of its conditions in effect forms a separate query. Cassandra aims to touch as few nodes as possible and expects every query to specify the partition key; with OR conditions spread over many columns, it is hard to work out which partitions need to be read.
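Since each OR condition amounts to its own query, one common workaround is to issue one query per partition key and merge the results on the client. The sketch below shows only the merge logic against an in-memory dictionary; query_partition is a hypothetical stand-in for a real single-partition SELECT, and the row values are made up for illustration.

```python
def query_partition(table, node):
    """Stand-in for `SELECT * FROM log WHERE node = ?` on one partition."""
    return table.get(node, [])


def query_or(table, nodes):
    """Emulate `node = a OR node = b` with one single-partition query per key."""
    results = []
    for node in dict.fromkeys(nodes):  # drop duplicate keys, keep their order
        results.extend(query_partition(table, node))
    return results


# Hypothetical data: partition key -> rows
toy_log = {
    "N1": [("Feb", 1, "name1"), ("Feb", 2, "name2")],
    "N2": [("Feb", 1, "name3"), ("Feb", 2, "name4")],
}

print(query_or(toy_log, ["N1", "N2", "N1"]))
```

Each lookup names exactly one partition, so every request can be routed to the replicas that own it instead of being fanned out by a single coordinator.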

Problem 2: Does Cassandra support fuzzy (LIKE) queries?

LIKE is not supported in general. It can be used from Cassandra 3.4 onward, but with many restrictions. For an introduction, see http://www.tsoft.se/wp/2016/08/12/sql-like-operation-in-cassandra-is-possible-in-v3-4/

Problem 3: Does a table support thousands of columns?

Yes. There is no limit on the number of columns as such, only a limit of 2 billion cells (rows x columns) per partition. So the more columns a table has, the fewer rows each partition can hold; with 1,000 columns per row, for example, a partition could hold at most about 2 million rows.

Reference

https://lostechies.com/ryansvihla/2014/09/22/cassandra-query-patterns-not-using-the-in-query-for-multiple-partitions/

https://wiki.apache.org/cassandra/CassandraLimitations

http://www.tsoft.se/wp/2016/08/12/sql-like-operation-in-cassandra-is-possible-in-v3-4/

https://github.com/xedin/sasi

http://opensourceconnections.com/blog/2013/07/24/understanding-how-cql3-maps-to-cassandras-internal-data-structure/
