Cassandra CQL Analysis

Last Update: 2018-07-26 Source: Internet

Author: User

Tags cassandra


Summary

CQL has many limitations compared with SQL because Cassandra is designed to store very large data sets. Its deployment model is built on partitioning from the start, unlike MongoDB, whose replica-set design targets small clusters and adds sharding only when the data grows large. To keep retrieval efficient, CQL syntax is deliberately restricted so that inefficient queries cannot be written. Cassandra distributes data across nodes according to a hash of the partition key, and scanning every node is very inefficient, so a basic principle of querying Cassandra is to touch as few nodes as possible.

CQL Overview

A relational database is a collection of rows; Cassandra is a collection of partitions. If there is no clustering key, each partition holds a single row; a partition that holds multiple rows is called a wide row. Cassandra uses the hash of the partition key to decide which node stores the data. This works like a hash index, so the partition key cannot be queried by range. Within each partition, rows are sorted by the clustering key. This ordering is not hash-based, so range queries on the clustering key are possible. In the following table, for example, node is the partition key and (date, number) is the clustering key.

CREATE KEYSPACE test WITH REPLICATION = {
    'class': 'SimpleStrategy',
    'replication_factor': 1
};

CREATE TABLE log (
    node text,
    date text,
    name text,
    number int,
    PRIMARY KEY (node, date, number)
);
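The placement and ordering rules described above can be sketched with a toy in-memory model. This is only an illustration under stated assumptions: the three node names are hypothetical, an MD5-based hash stands in for Cassandra's real partitioner, and the token-modulo placement is a simplification of the token ring.

```python
import hashlib
from bisect import insort

NODES = ["node-a", "node-b", "node-c"]  # hypothetical three-node cluster


def token(partition_key: str) -> int:
    """Stand-in for the partitioner: hash the partition key to an integer."""
    digest = hashlib.md5(partition_key.encode()).digest()
    return int.from_bytes(digest[:8], "big")


def owner(partition_key: str) -> str:
    """Pick the node that stores a partition (simplified: token modulo node count)."""
    return NODES[token(partition_key) % len(NODES)]


class LogTable:
    """Toy model of the log table: one sorted row list per partition key."""

    def __init__(self):
        self.partitions = {}

    def insert(self, node, date, number, name):
        rows = self.partitions.setdefault(node, [])
        # Rows stay sorted by the clustering key (date, number) on every write.
        insort(rows, ((date, number), name))

    def range_in_partition(self, node, low, high):
        """Rows of one partition whose clustering key lies in [low, high]."""
        return [row for row in self.partitions.get(node, []) if low <= row[0] <= high]


table = LogTable()
table.insert("N1", "feb", 2, "Name2")
table.insert("N1", "feb", 1, "Name1")
table.insert("N2", "feb", 1, "Name3")
print(owner("N1"), table.partitions["N1"])
```

The same partition key always maps to the same node, while rows inside a partition can be sliced by clustering key because they are kept in sorted order.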

In a relational database, the data is stored like this:

node	date	number	name
N1	feb	1	Name1
N1	feb	2	Name2
N2	feb	1	Name3
N2	feb	2	Name4

In Cassandra, it is stored like this:

Partition1

{
    N1:feb {
        {1:name:name1}
        {2:name:name2}
    }
}

Partition2

{
    N2:feb {
        {1:name:name3}
        {2:name:name4}
    }
}

In an entry such as 1:name:name3, the key is 1:name and the value is name3.
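As a rough illustration of the layout shown above, the following sketch converts relational-style rows into the same nested key/value form. The function name `to_cells` and the string key format are assumptions made for this example; they mirror the notation of this article, not Cassandra's actual on-disk format.

```python
def to_cells(rows):
    """Group (node, date, number, name) rows into the nested layout used above.

    Outer key: "node:date". Inner key: "number:name" (clustering value plus
    column name). Inner value: the content of the name column.
    """
    layout = {}
    for node, date, number, name in rows:
        cells = layout.setdefault(f"{node}:{date}", {})
        cells[f"{number}:name"] = name
    return layout


rows = [
    ("N1", "feb", 1, "Name1"),
    ("N1", "feb", 2, "Name2"),
    ("N2", "feb", 1, "Name3"),
    ("N2", "feb", 2, "Name4"),
]
print(to_cells(rows))
```

Each inner key carries the column name, which is why the column names are stored along with the data.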

Cassandra columns are not fixed, so the column names are stored together with the values.

Range Queries

IN Operator

Before Cassandra 2.2, IN could be used only on the last column of the partition key; since 2.2, it can be used on any partition key column. Note, however, that IN is inefficient. As the summary said, a query should touch as few nodes as possible, but an IN query on the partition key reads from multiple nodes, and with a replication factor of 3 the number of nodes involved can triple. This increases the load on the coordinator node, which must hold the results from every node in memory, and can lead to heap growth and GC pauses.
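To see why an IN on the partition key fans out, here is a minimal sketch that maps each key of an IN list to its owning node. The four node names, the MD5-based hash, and the modulo placement are all assumptions made for illustration; a real cluster uses its configured partitioner and token ranges.

```python
import hashlib

NODES = ["node-a", "node-b", "node-c", "node-d"]  # hypothetical cluster


def owner(partition_key):
    """Simplified placement: hash of the partition key modulo the node count."""
    digest = hashlib.md5(partition_key.encode()).digest()
    return NODES[int.from_bytes(digest[:8], "big") % len(NODES)]


def nodes_for_in(keys):
    """Nodes a coordinator would have to contact for WHERE node IN (keys)."""
    return {owner(key) for key in keys}


def group_by_owner(keys):
    """Split an IN list into per-node batches, for example to query them separately."""
    batches = {}
    for key in keys:
        batches.setdefault(owner(key), []).append(key)
    return batches


keys = ["N1", "N2", "N3", "N4", "N5", "N6"]
print(len(nodes_for_in(["N1"])), "node for a single key")
print(len(nodes_for_in(keys)), "node(s) for the IN list")
print(group_by_owner(keys))
```

A single-key query involves one owner, while the IN list can involve up to as many nodes as it has keys, and the coordinator has to wait for and buffer all of their results.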

Cassandra 3.0 also supports IN on clustering key columns, but to restrict one clustering column you must also specify the values of the preceding clustering columns (with = or IN).

Range Query (>, <)

Partition Key

Cassandra distributes data by the hash of the partition key, so range queries on the partition key are not supported directly. You can, however, use the token function to run a range query over the tokens of the partition key columns.

SELECT * FROM log WHERE token(node) > token('1');

Note: ByteOrderedPartitioner distributes data in key order, so in theory it can support range queries on the partition key. However, this partitioner easily leads to unbalanced data distribution, so it is generally not recommended.
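The following sketch shows what a token range query means in practice: rows are selected and ordered by the hash of the key, not by the key itself. The `token` function here is an MD5-based stand-in chosen for this example, so the actual ordering will differ from a real cluster.

```python
import hashlib


def token(key):
    """Stand-in token function: a signed 64-bit integer derived from a hash."""
    digest = hashlib.md5(key.encode()).digest()
    return int.from_bytes(digest[:8], "big", signed=True)


def select_token_gt(keys, bound):
    """Emulate WHERE token(node) > token(bound): filter and order by token."""
    return sorted((key for key in keys if token(key) > token(bound)), key=token)


keys = ["0", "1", "2", "3", "4", "5"]
print("sorted by key:  ", sorted(keys))
print("sorted by token:", sorted(keys, key=token))
print("token > token('1'):", select_token_gt(keys, "1"))
```

Because the token order is unrelated to the key order, a token range is useful for walking through all partitions in slices, not for asking for keys greater than a given value.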

Clustering Key

Within a partition, a range condition can be applied only to the last clustering column in the query; every preceding clustering column must be given an exact value.

SELECT * FROM log WHERE node = '1' AND date >= 'date1';

SELECT * FROM log WHERE node = '1' AND date = 'date' AND number >= 1;

The following is not valid:

SELECT * FROM log WHERE node = '1' AND number >= 1;
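The rule can be expressed as a small check over the clustering columns in their declared order. This is a simplified sketch, not Cassandra's actual validation logic: the function name and the dictionary input format are assumptions, and it ignores ALLOW FILTERING and multi-column comparisons.

```python
CLUSTERING_COLUMNS = ["date", "number"]  # clustering order of the log table
RANGE_OPS = {">", ">=", "<", "<="}
EXACT_OPS = {"=", "IN"}


def is_valid_clustering_filter(conditions):
    """Check a WHERE clause's clustering restrictions.

    conditions maps a clustering column to its operator,
    for example {"date": "=", "number": ">="}.
    """
    closed = False  # True once a column was skipped or restricted by a range
    for column in CLUSTERING_COLUMNS:
        op = conditions.get(column)
        if op is None:
            closed = True          # a gap: later columns may not be restricted
        elif closed:
            return False           # restricted after a gap or after a range
        elif op in RANGE_OPS:
            closed = True          # a range must be the last restriction
        elif op not in EXACT_OPS:
            return False           # unknown operator
    return True


print(is_valid_clustering_filter({"date": ">="}))                 # True
print(is_valid_clustering_filter({"date": "=", "number": ">="}))  # True
print(is_valid_clustering_filter({"number": ">="}))               # False
```

The three calls correspond to the three queries above: the first two restrict a prefix of the clustering columns, while the third skips date and is rejected.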

Summary

The partition key is hash-based, so it does not support greater-than or less-than range queries; IN is supported.

The clustering key is sorted, so it supports IN as well as greater-than and less-than queries. It behaves like a composite index: the preceding columns must be specified for the index to take effect.

When a clustering key is given without a partition key, all nodes must be scanned and filtered, so execution may be slow. Cassandra therefore requires ALLOW FILTERING for such queries. Filtering is reasonably efficient only when the rows that pass the filter make up a high proportion of the rows scanned.

Problems

Problem 1: The WHERE clause does not support OR

OR is not supported. In SQL, OR is typically used to reduce the number of requests to the database, although each of its conditions is effectively a separate query. Cassandra aims to touch as few nodes as possible and expects every query to specify the partition key; with OR conditions spread over several fields, it is hard to determine which partitions a query needs.
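One common workaround is to issue one query per partition key on the client side and merge the results. The sketch below models this with an in-memory dictionary; `query_partition` is a hypothetical stand-in for a real single-partition query, and no driver API is used or implied.

```python
def query_partition(table, node):
    """Stand-in for SELECT * FROM log WHERE node = ?; returns one partition's rows."""
    return table.get(node, [])


def query_any_of(table, nodes):
    """Emulate WHERE node = a OR node = b with one query per partition key."""
    results = []
    for node in dict.fromkeys(nodes):  # drop duplicate keys, keep their order
        results.extend(query_partition(table, node))
    return results


table = {
    "N1": [("N1", "feb", 1, "Name1"), ("N1", "feb", 2, "Name2")],
    "N2": [("N2", "feb", 1, "Name3"), ("N2", "feb", 2, "Name4")],
}
print(query_any_of(table, ["N1", "N2"]))
```

Each per-partition query names its partition key, so it can be routed straight to the owning node. In a real application the individual queries would typically be sent concurrently.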

Problem 2: Does Cassandra support fuzzy queries?

LIKE was not supported originally. It can be used from Cassandra 3.4 on, but with many limitations. See this article for an introduction: http://www.tsoft.se/wp/2016/08/12/sql-like-operation-in-cassandra-is-possible-in-v3-4/

Problem 3: Does a table support thousands of fields?

Yes. There is no limit on the number of fields, only a limit of 2 billion cells (rows × columns) per partition. So the more columns a table has, the fewer rows each partition can store.

Reference

https://lostechies.com/ryansvihla/2014/09/22/cassandra-query-patterns-not-using-the-in-query-for-multiple-partitions/

https://wiki.apache.org/cassandra/CassandraLimitations

http://www.tsoft.se/wp/2016/08/12/sql-like-operation-in-cassandra-is-possible-in-v3-4/

https://github.com/xedin/sasi

http://opensourceconnections.com/blog/2013/07/24/understanding-how-cql3-maps-to-cassandras-internal-data-structure/

This article is an English version of an article originally written in Chinese on aliyun.com.
