Summary
CQL has many limitations compared with SQL because Cassandra is designed for storing large volumes of data and its deployment model is built on partitioning. This differs from MongoDB with replica sets, a design for small database clusters that is sharded only once the data grows large. To keep retrieval efficient, CQL syntax is restricted so that inefficient queries cannot be written. Cassandra distributes data across nodes according to a hash of the partition key, and scanning every node is very inefficient. So one of the basic principles of querying Cassandra is to touch as few nodes as possible.
CQL Overview
A relational database is a collection of rows; Cassandra is a collection of partitions. If there is no clustering key, each partition is a single row; a partition that contains multiple rows is called a wide row. Cassandra uses the hash of the partition key to decide which node stores the data, which is equivalent to a hash index, so the partition key cannot be queried by range. Within each partition, rows are sorted by the clustering key. This ordering is not hash-based, so range queries on the clustering key are possible. In the following table, for example, node is the partition key and (date, number) is the clustering key.
CREATE KEYSPACE test WITH REPLICATION = {
    'class': 'SimpleStrategy',
    'replication_factor': 1
};
USE test;
CREATE TABLE log (
    node text,
    date text,
    name text,
    number int,
    PRIMARY KEY (node, date, number)
);
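The placement rule described above (the hash of the partition key decides the node) can be sketched roughly as follows. This is only an illustration under stated assumptions: Cassandra's default partitioner uses Murmur3 and token ranges, whereas this sketch substitutes MD5 from the Python standard library and a simple modulo; the three-node cluster and the `token` and `owner_of` helpers are hypothetical.

```python
import hashlib

# Hypothetical three-node cluster; a real cluster assigns token ranges to nodes.
NODES = ["node-a", "node-b", "node-c"]

def token(partition_key: str) -> int:
    """Stand-in for the partitioner: hash the partition key to an integer.
    MD5 is used for convenience only; it is NOT Cassandra's Murmur3."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")

def owner_of(partition_key: str) -> str:
    """Pick the node that owns a partition from the key's token (simplified)."""
    return NODES[token(partition_key) % len(NODES)]

# Rows that share a partition key always land on the same node,
# whatever their clustering key values (date, number) are.
rows = [("n1", "feb", 1), ("n1", "feb", 2), ("n2", "feb", 1), ("n2", "feb", 2)]
placement = {row: owner_of(row[0]) for row in rows}
```

Because only the `node` column feeds the hash, a query that names the partition key can go straight to the owning node, while a query that does not has to ask every node.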
In a relational database, the data would be stored like this:
node | date | number | name
n1   | feb  | 1      | name1
n1   | feb  | 2      | name2
n2   | feb  | 1      | name3
n2   | feb  | 2      | name4
In Cassandra, the same data is laid out as:
Partition1
{
    n1:feb {
        {1:name: name1}
        {2:name: name2}
    }
}
Partition2
{
    n2:feb {
        {1:name: name3}
        {2:name: name4}
    }
}
In an entry such as 1:name:name3, 1:name is the key and name3 is the value.
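The layout above can be modelled with nested dictionaries. This is a simplified sketch of the idea, not Cassandra's actual on-disk format; the `partitions` structure and the `cells` helper are hypothetical names introduced for illustration.

```python
from collections import defaultdict

# Rows of the log table: (node, date, number, name)
rows = [
    ("n1", "feb", 1, "name1"),
    ("n1", "feb", 2, "name2"),
    ("n2", "feb", 1, "name3"),
    ("n2", "feb", 2, "name4"),
]

# partition key -> {(clustering values..., column name): value}
partitions = defaultdict(dict)
for node, date, number, name in rows:
    # The cell key combines the clustering values with the column name,
    # which is why the column name is stored alongside every value.
    partitions[node][(date, number, "name")] = name

def cells(partition_key):
    """Return the cells of one partition, sorted by clustering key."""
    return sorted(partitions[partition_key].items())
```

Calling `cells("n1")` returns the two cells of partition n1 in clustering order, which is what makes range scans inside a partition cheap.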
Cassandra columns are not fixed, so the column name is stored along with each value.
Range Queries
The IN Operator
Before Cassandra 2.2, IN could be used only on the last column of the partition key; from 2.2 onward it can be used on any partition key column. Note, however, that IN is inefficient. As the summary says, a query should touch as few nodes as possible, but an IN on the partition key queries multiple nodes, and with a replication factor of 3 the number of nodes queried can triple. This puts more pressure on the coordinator node, which has to hold the results from every node, and can lead to higher heap usage and GC pauses.
Cassandra 3.0 supports IN on clustering key columns, but to restrict a single clustering column you must also specify the values of the preceding clustering columns (with = or IN).
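To get a feel for why IN on the partition key is expensive, the sketch below counts the nodes that may be involved. It is a rough model under assumptions: the six-node ring, the MD5-based placement and the "owner plus next nodes on the ring" replica rule are simplifications, and how many replicas are actually contacted depends on the consistency level, so treat the result as an upper bound.

```python
import hashlib

# Hypothetical six-node ring.
NODES = ["node-a", "node-b", "node-c", "node-d", "node-e", "node-f"]

def replicas(partition_key, replication_factor):
    """Nodes holding a partition: the owner plus the next nodes on the ring
    (a simplification of SimpleStrategy placement)."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    start = int.from_bytes(digest[:8], "big") % len(NODES)
    return [NODES[(start + i) % len(NODES)] for i in range(replication_factor)]

def nodes_touched(in_values, replication_factor):
    """Upper bound on the nodes an IN query over partition keys may involve."""
    touched = set()
    for key in in_values:
        touched.update(replicas(key, replication_factor))
    return touched
```

Comparing `nodes_touched(["n1"], 1)` with `nodes_touched(["n1", "n2", "n3"], 3)` shows how both a longer IN list and a higher replication factor widen the set of nodes the coordinator may have to wait for.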
Range Query >,<
Partition Key
Cassandra distributes data by the hash of the partition key, so range queries on the partition key are not supported.
It does allow range queries on the partition key columns through the token function:
SELECT * FROM log WHERE token(node) > token('1');
Note: ByteOrderedPartitioner distributes data in key order, so in theory it can support range queries on the partition key directly. However, this partitioner easily leads to unbalanced data distribution, so it is generally not recommended.
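A token range compares hash values, not the keys themselves. The sketch below models that under assumptions: MD5 stands in for Cassandra's Murmur3 partitioner, and `token` and `token_greater_than` are hypothetical helpers, so the concrete ordering here will not match a real cluster.

```python
import hashlib

def token(partition_key: str) -> int:
    """Stand-in for token(): a signed 64-bit value derived from the key.
    MD5 is an assumption; Cassandra's default partitioner is Murmur3."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big", signed=True)

def token_greater_than(keys, bound):
    """Model of: SELECT ... WHERE token(node) > token(bound)."""
    return [k for k in keys if token(k) > token(bound)]

keys = ["1", "2", "3", "10", "n1", "n2"]
by_key = sorted(keys)               # order a user would expect from the values
by_token = sorted(keys, key=token)  # order the cluster actually stores them in
```

The two orderings are generally unrelated, so a token range is useful for walking through the whole table in chunks rather than for answering "all nodes greater than '1'".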
Clustering Key
Within a partition, a range restriction can be applied only to the last clustering column used in the query; the preceding clustering columns must be given.
SELECT * FROM log WHERE node = '1' AND date >= 'date1';
SELECT * FROM log WHERE node = '1' AND date = 'date1' AND number >= 1;
The following is not valid, because date is skipped:
SELECT * FROM log WHERE node = '1' AND number >= 1;
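The rule behind these three queries can be written as a small check. This is a sketch of the rule as described here, not Cassandra's actual validation code; `clustering_restrictions_valid` is a hypothetical helper and it ignores ALLOW FILTERING.

```python
# Clustering columns in declared order, from PRIMARY KEY (node, date, number).
CLUSTERING_COLUMNS = ["date", "number"]
RANGE_OPS = {">", ">=", "<", "<="}

def clustering_restrictions_valid(restrictions):
    """restrictions maps a clustering column to its operator
    ('=', 'IN', '>', '>=', '<', '<=').

    Valid when the restricted columns form a prefix of the clustering key
    and only the last restricted column uses a range operator."""
    seen_gap = False    # an earlier clustering column was left unrestricted
    seen_range = False  # an earlier clustering column already used a range
    for column in CLUSTERING_COLUMNS:
        op = restrictions.get(column)
        if op is None:
            seen_gap = True
            continue
        if seen_gap or seen_range:
            return False
        if op in RANGE_OPS:
            seen_range = True
    return True
```

With this check, `{"date": ">="}` and `{"date": "=", "number": ">="}` pass, while `{"number": ">="}` fails because date is skipped.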
Summary
The partition key is hash-based: it does not support greater-than or less-than range queries, but it does support IN.
The clustering key is sorted and supports IN as well as greater-than and less-than queries. It is the equivalent of a composite index, so the leading columns must be supplied for it to take effect.
When a clustering key is given without the partition key, all nodes have to be scanned and filtered, so execution may be inefficient. Cassandra therefore requires ALLOW FILTERING for such queries. Filtering is reasonably efficient only when the rows that pass the filter make up a large share of the rows scanned.
Questions
Question 1: Does the WHERE clause support OR?
No. In SQL, OR is typically used to reduce the number of requests to the database, although each OR branch effectively constitutes a separate query. Cassandra tries to contact as few nodes as possible and expects every query to specify the partition key; with OR conditions spread over many columns, it is hard to work out which partitions are involved.
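Since each OR branch amounts to its own query, one possible client-side workaround is to run one query per partition key and merge the results. The sketch below only models that idea in memory: `query_partition` stands in for a real `SELECT * FROM log WHERE node = ?` call, and both helper names are hypothetical.

```python
# In-memory stand-in for the log table.
TABLE = [
    {"node": "n1", "date": "feb", "number": 1, "name": "name1"},
    {"node": "n1", "date": "feb", "number": 2, "name": "name2"},
    {"node": "n2", "date": "feb", "number": 1, "name": "name3"},
    {"node": "n2", "date": "feb", "number": 2, "name": "name4"},
]

def query_partition(table, node):
    """Stand-in for: SELECT * FROM log WHERE node = ?"""
    return [row for row in table if row["node"] == node]

def query_any(table, nodes):
    """Emulate `node = a OR node = b` with one query per partition key,
    merged on the client."""
    results = []
    for node in dict.fromkeys(nodes):  # drop duplicate keys, keep order
        results.extend(query_partition(table, node))
    return results
```

Each call in the loop names a single partition key, so every individual request can still be routed to the nodes that own that partition.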
Question 2: Does Cassandra support fuzzy (LIKE) queries?
LIKE is not supported before Cassandra 3.4. From 3.4 onward it can be used, but with many limitations. This article gives an introduction: http://www.tsoft.se/wp/2016/08/12/sql-like-operation-in-cassandra-is-possible-in-v3-4/
Question 3: Does a table support thousands of columns?
Yes. There is no limit on the number of columns as such; the only limit is 2 billion cells per partition. So the more columns there are, the fewer rows a partition can hold.
References
https://lostechies.com/ryansvihla/2014/09/22/cassandra-query-patterns-not-using-the-in-query-for-multiple-partitions/
https://wiki.apache.org/cassandra/CassandraLimitations
http://www.tsoft.se/wp/2016/08/12/sql-like-operation-in-cassandra-is-possible-in-v3-4/
https://github.com/xedin/sasi
http://opensourceconnections.com/blog/2013/07/24/understanding-how-cql3-maps-to-cassandras-internal-data-structure/