Cassandra Data Model Design: Design Around Your Queries. The Essence of Denormalization Is Trading Space for Time


Reposted from: http://www.infoq.com/cn/articles/best-practice-of-cassandra-data-model-design

Don't think of the Cassandra model as a relational database table

Instead, think of it as a sorted map structure.

Newcomers often map relational database terminology onto the Cassandra model, for example a table onto a column family and a primary key onto a row key.

This analogy can help with the transition from a relational database to a non-relational one. But do not rely on it when designing Cassandra column families. Instead, think of a column family as a map nested inside another map: the outer map is keyed by row key, the inner map is keyed by column key, and both maps are sorted by key. As follows:

SortedMap<RowKey, SortedMap<ColumnKey, ColumnValue>>

Why?

A nested sorted map describes a column family more accurately than a relational database table does, and thinking of it this way will help you design your Cassandra model properly.

How?
    • A map gives efficient key lookup, and its sorted nature gives efficient scans. In Cassandra, we can use row keys and column keys for efficient lookups and range scans.
    • The number of column keys can be very large (translator's note: in Cassandra 1.2.5, the version the translator currently uses, each row supports up to 2 billion columns). In other words, you can have wide rows.
    • A column key can itself hold the data. In other words, you can have a column that has no value.
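The three properties above can be sketched with plain Python data structures. This is only an illustration of the nested sorted map idea, not Cassandra's storage engine or client API; the `ColumnFamily` class, its method names, and the sample keys are hypothetical.

```python
from bisect import bisect_left, bisect_right


class ColumnFamily:
    """Toy model of Map<RowKey, SortedMap<ColumnKey, ColumnValue>> (hypothetical)."""

    def __init__(self):
        # row key -> (sorted list of column keys, dict of column key -> value)
        self._rows = {}

    def insert(self, row_key, column_key, value=None):
        keys, values = self._rows.setdefault(row_key, ([], {}))
        if column_key not in values:
            # Keep the column keys of this row in sorted order.
            keys.insert(bisect_left(keys, column_key), column_key)
        values[column_key] = value

    def get(self, row_key, column_key):
        """Point lookup by row key and column key."""
        return self._rows[row_key][1][column_key]

    def slice(self, row_key, start, end):
        """Range scan: columns with start <= key <= end, in key order."""
        keys, values = self._rows[row_key]
        lo = bisect_left(keys, start)
        hi = bisect_right(keys, end)
        return [(k, values[k]) for k in keys[lo:hi]]


cf = ColumnFamily()
cf.insert("user42", "2013-06-03", "logout")
cf.insert("user42", "2013-06-01", "login")
cf.insert("user42", "2013-06-02", "purchase")
# A valueless column: the column key itself carries the information.
cf.insert("user42", "2013-07-15")

print(cf.get("user42", "2013-06-02"))
print(cf.slice("user42", "2013-06-01", "2013-06-30"))
```

The row for `user42` can keep growing with new column keys (a wide row), and the slice returns the June columns in key order regardless of the order in which they were inserted.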

If the cluster stores data using the Order Preserving Partitioner (OPP), range queries on row keys are possible. However, OPP is not recommended in most cases (translator's note: row keys are stored on the nodes in order, so an uneven distribution of keys leads to unbalanced reads and writes). You can therefore treat the outer map as unsorted, as follows:

Map<RowKey, SortedMap<ColumnKey, ColumnValue>>
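To see why an order-preserving layout can become unbalanced, the following sketch places row keys on a hypothetical four-node cluster in two ways: by key range and by hash. The node count, the key format, and both placement functions are assumptions made for illustration; Cassandra's real partitioners assign tokens on a ring rather than using these simplified rules.

```python
import hashlib
from collections import Counter

NODES = 4  # hypothetical cluster size


def ordered_node(row_key):
    """Order-preserving placement: split the key space by first letter."""
    # Assumed ranges: a-f, g-m, n-s, t-z (illustrative only).
    boundaries = ["g", "n", "t"]
    first = row_key[0].lower()
    return sum(first >= b for b in boundaries)


def hashed_node(row_key):
    """Hash placement: the node depends on a digest of the key, not its order."""
    digest = hashlib.sha256(row_key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % NODES


# Row keys that share a prefix, as time-series style keys often do.
row_keys = ["event-%04d" % i for i in range(1000)]

ordered_load = Counter(ordered_node(k) for k in row_keys)
hashed_load = Counter(hashed_node(k) for k in row_keys)

print("order-preserving:", dict(ordered_load))  # every key lands on node 0
print("hashed:", dict(hashed_load))             # keys spread across all nodes
```

The price of the hashed layout is that neighboring row keys no longer live next to each other, which is why the outer map is best treated as unsorted.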

Cassandra also has the "Super Column". Think of it as a group of columns, which turns the two-level nested map into a three-level nested map, as shown below:

Map<RowKey, SortedMap<SuperColumnKey,
           SortedMap<ColumnKey, ColumnValue>>>

Notes:

    • You need to pass a timestamp with each column value, because Cassandra uses it internally for conflict resolution. You can ignore it while modeling (translator's note: the timestamp is added to a column automatically when you write to it). Also, do not plan to use column timestamps in your application: they are not designed for that purpose, and unlike in HBase they do not produce versioned data (translator's note: HBase keeps multiple versions of the data for the same row key and column key, whereas Cassandra overwrites the data, and the timestamp only records the last update time).
    • Because of performance problems and the lack of secondary index support, the use of super columns has been heavily debated in the Cassandra community. It is therefore recommended to use composite columns instead of super columns. (Translator's note: with a super column, reading a single column value requires scanning the entire super column, which leads to poor query performance.)
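Both notes can be sketched in a few lines of Python. In this sketch a write keeps only the value with the newest timestamp, and a composite column key is modeled as a `(group, name)` tuple, which flattens the three-level super column map back into two levels while still allowing one group to be read as a range. The helper names `write` and `slice_group`, the key shape, and the tie-breaking rule are assumptions for illustration; this is not Cassandra's composite comparator or its exact conflict-resolution logic.

```python
from bisect import bisect_left


def write(row, column_key, value, timestamp):
    """Last-write-wins: keep only the value with the newest timestamp."""
    current = row.get(column_key)
    # Simplification: a tie on timestamp is resolved in favor of the incoming write.
    if current is None or timestamp >= current[1]:
        row[column_key] = (value, timestamp)


def slice_group(row, group):
    """Return the (name, value) pairs whose composite key starts with `group`."""
    keys = sorted(row)  # composite keys sort component by component
    start = bisect_left(keys, (group,))
    result = []
    for key in keys[start:]:
        if key[0] != group:
            break
        result.append((key[1], row[key][0]))
    return result


# One row whose column keys are (group, name) tuples instead of super columns.
row = {}
write(row, ("profile", "name"), "Alice", timestamp=1)
write(row, ("profile", "email"), "old@example.com", timestamp=2)
write(row, ("address", "city"), "Hangzhou", timestamp=3)
# A newer write overwrites the cell; no older version is kept.
write(row, ("profile", "email"), "new@example.com", timestamp=5)
# A stale write (older timestamp) is ignored.
write(row, ("profile", "email"), "stale@example.com", timestamp=4)

print(row[("profile", "email")][0])
print(slice_group(row, "profile"))
```

A single column such as `("profile", "email")` is read directly by its full key, without touching the other columns of the `profile` group.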

Model column families around query patterns

But start modeling from the entities and their relationships where possible

    • Unlike in a relational database, in Cassandra it is not easy to support a new or changed query simply by creating a secondary index or writing complex SQL (using joins, order by, group by), because Cassandra is highly distributed. So consider the query patterns before you design the column families.
    • Keeping in mind the nested sorted map data structure described earlier, think about how to organize your data into such maps to meet your requirements for fast lookup, sorting, grouping, filtering, and aggregation.

In most cases, entities and their relationships matter (the exceptions are special use cases such as log storage or other time series data). Suppose I gave you the query patterns for an e-commerce site and asked you to create a Cassandra model, without telling you anything about the entities and their relationships. You would, knowingly or not, work out the entities and relationships from the query patterns or from your prior understanding of the domain, because we describe the real world through entities and relationships. It is best to start from entities and relationships when designing the data model, and then model around the query patterns by denormalizing and duplicating data. If this sounds confusing, the detailed examples that follow should make it clearer.
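As a small illustration of the idea, suppose a hypothetical e-commerce site must answer two queries: which items a user has liked, and which users have liked an item. The entities (user, item) and the relationship (like) come first; the model then stores the relationship redundantly, once per query. The column family names and the `like` helper below are assumptions for illustration, and plain dictionaries stand in for column families.

```python
from collections import defaultdict

# Entities, identified first.
users = {"u1": {"name": "Alice"}, "u2": {"name": "Bob"}}
items = {"i1": {"title": "Kettle"}, "i2": {"title": "Lamp"}}

# One column family per query pattern: row key -> {column key: column value}.
items_by_user = defaultdict(dict)  # query: which items did this user like?
users_by_item = defaultdict(dict)  # query: which users liked this item?


def like(user_id, item_id):
    """Record one 'like' twice so that each query is a single-row read."""
    # Denormalized: copy the display fields so no join is needed at read time.
    items_by_user[user_id][item_id] = items[item_id]["title"]
    users_by_item[item_id][user_id] = users[user_id]["name"]


like("u1", "i1")
like("u1", "i2")
like("u2", "i1")

print(sorted(items_by_user["u1"].items()))
print(sorted(users_by_item["i1"].items()))
```

The extra writes and storage are the price paid for fast reads, which is the space-for-time trade named in the title.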

Note: It helps to consider the following points when modeling. Distinguish high-frequency queries from low-frequency ones: some queries may run only a few thousand times, while others may run billions of times. Also identify which queries are sensitive to latency. Make sure your model serves the high-volume and critical queries first.
