A Top-Down View of the Basic Cassandra Data Model

Source: Internet
Author: User
Tags cassandra

One Preface

  In the previous article, I briefly described how to install and launch Cassandra on Windows, and described Cassandra's basic data model from a bottom-up perspective. When I learn something new, though, I find the best approach is to grasp the big picture first and then work down to the details. This article therefore analyzes the Cassandra data model from a top-down perspective.

Two The concept of a cluster

From the start, Cassandra was designed to work across multiple hosts while presenting a single distributed system to the user. For this reason, the outermost structure in Cassandra is the cluster, also called the ring, because Cassandra arranges all the nodes in a cluster into a ring and distributes the data among them.
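As a rough illustration of how data can be assigned to nodes arranged in a ring, here is a minimal Python sketch. It is a toy model, not Cassandra's actual partitioner: the `Ring` class, the `token_for` helper, the node names, and the use of MD5 are all assumptions made for this example.

```python
import bisect
import hashlib


def token_for(key: str) -> int:
    """Map a string to a position on the ring (toy hash, chosen for illustration)."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class Ring:
    """Hypothetical ring: each node owns one token, keys go to the next node clockwise."""

    def __init__(self, node_names):
        # Sort (token, name) pairs so the ring can be searched in token order.
        self._tokens = sorted((token_for(name), name) for name in node_names)

    def owner(self, row_key: str) -> str:
        """Return the first node whose token is >= the key's token, wrapping around."""
        position = bisect.bisect_left(self._tokens, (token_for(row_key), ""))
        return self._tokens[position % len(self._tokens)][1]


ring = Ring(["node-a", "node-b", "node-c"])
print(ring.owner("user:42"))  # one of the three node names
```

Because the owner depends only on the key's hash and the node tokens, any node can work out where a key lives without consulting a central master.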

Clusters bring data synchronization to mind. A Cassandra cluster has no master-slave concept: all nodes are peers, and they synchronize data among themselves using a peer-to-peer protocol.

Three Keyspace

A cluster is a container for keyspaces, and it typically holds a single keyspace. The keyspace is the outermost container for data in Cassandra, so it can be likened to a database in a relational DB. In Cassandra's design, a keyspace has the following basic properties:

1 Replication factor

The replication factor controls how many copies of each piece of data exist in the cluster. In essence, it determines how much performance you are willing to pay for consistency, the C in the CAP theorem.

2 Replica placement strategy

This determines how replicas are distributed around the ring. The strategy directly affects how a row's key maps to nodes on the ring.

3 Column family

In Cassandra, a keyspace is a container for one or more column families. A column family is similar to a table in a relational DB: it is a container that groups multiple rows of data.
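To make the first two keyspace properties more concrete, the following Python sketch shows one possible placement rule: the node that owns a key holds the first copy, and the remaining copies go to the next nodes clockwise on the ring. This is a toy model under stated assumptions; `replicas_for`, `token_for`, the node names, and the MD5 hash are hypothetical and do not reproduce Cassandra's real strategies.

```python
import bisect
import hashlib


def token_for(key: str) -> int:
    """Map a string to a position on the ring (toy hash, chosen for illustration)."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


def replicas_for(row_key, node_names, replication_factor):
    """Return the nodes holding copies of row_key: the owner, then its clockwise neighbours."""
    ring = sorted((token_for(name), name) for name in node_names)
    copies = min(replication_factor, len(ring))  # cannot have more copies than nodes
    start = bisect.bisect_left(ring, (token_for(row_key), ""))
    return [ring[(start + i) % len(ring)][1] for i in range(copies)]


nodes = ["node-a", "node-b", "node-c", "node-d"]
print(replicas_for("user:42", nodes, replication_factor=3))  # three distinct node names
```

Raising `replication_factor` means every write has to reach more nodes, which is the performance cost that buys extra copies of the data.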

Four Column family

A column family is a container for an ordered collection of rows, each of which contains an ordered collection of columns. The following points distinguish a column family from a table in a relational DB:

1 Cassandra is considered schema-free because arbitrary columns can be added to a column family at will. In a relational DB, by contrast, the columns are fixed when the table is defined.

2 Like a table in a relational DB, a column family has a name, but it also has a property called the comparator. The comparator determines how columns are sorted when they are returned by a query: as long, bytes, UTF8, or some other ordering.

3 In a relational DB, the way data is organized on disk is transparent to the user. In Cassandra, however, each column family is stored in its own files, so for query performance it is important to put related columns in the same column family.

To be clear: a column family is a container for rows, and a row is a container for columns. A row is identified by its key and a column by its name, each unique within its container.
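The nesting described above, together with the comparator, can be sketched in Python as a map of maps. This is a toy model only: the `ColumnFamily` class and its method names are hypothetical, and it sorts columns at read time purely for simplicity.

```python
class ColumnFamily:
    """Toy model: row key -> {column name -> value}, with a comparator for column order."""

    def __init__(self, name, comparator=str):
        self.name = name
        self.comparator = comparator  # sort-key function applied to column names
        self.rows = {}

    def insert(self, row_key, column_name, value):
        # Rows and columns are created on first write; nothing is declared up front.
        self.rows.setdefault(row_key, {})[column_name] = value

    def get_row(self, row_key):
        """Return (name, value) pairs ordered by the column family's comparator."""
        columns = self.rows.get(row_key, {})
        return sorted(columns.items(), key=lambda item: self.comparator(item[0]))


as_text = ColumnFamily("events_text")                  # names compared as strings
as_long = ColumnFamily("events_long", comparator=int)  # names compared as integers
for cf in (as_text, as_long):
    for name in ("9", "10", "100"):
        cf.insert("row-1", name, "payload")

print([name for name, _ in as_text.get_row("row-1")])  # ['10', '100', '9']
print([name for name, _ in as_long.get_row("row-1")])  # ['9', '10', '100']
```

The same three column names come back in a different order depending on the comparator, which is why the comparator has to match the kind of data the application stores in its column names.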

Five Column

The column is the most basic data structure in Cassandra. It is a triple consisting of a name, a value, and a timestamp.

The structure of a column is very simple: plainly speaking, it is a name-value pair. It differs from a column in a relational DB in three ways.

First, in a relational DB you must define the names and types of all columns up front as part of the table structure, and only then can you supply values that fit those definitions. In Cassandra, the application writes columns as it needs them, which greatly improves flexibility and lets the data evolve incrementally over time.

Second, once a column name is defined in a relational DB, changing it is discouraged, and every row is interpreted against the same set of columns. In Cassandra, the definition of a column is entirely up to the application: both the name and the value can hold whatever data the user wants to store, so in a sense the name is itself data.

Finally, the biggest difference is that the number of columns in a Cassandra row is variable and depends on the application's requirements, whereas every row in a relational table has the same fixed set of columns.
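The column triple and the variable number of columns per row can be sketched in Python as follows. The `Column` class and the `write` helper are hypothetical names for this example. The rule that a newer timestamp replaces an older value is not spelled out in this article; it is included here because Cassandra uses column timestamps to resolve conflicting writes, and the tie-breaking shown is a simplification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Column:
    name: str
    value: bytes
    timestamp: int  # supplied by the writer, e.g. microseconds since the epoch


def write(row, column):
    """Store the column unless the row already holds a newer one with the same name."""
    current = row.get(column.name)
    if current is None or column.timestamp >= current.timestamp:
        row[column.name] = column


alice, bob = {}, {}  # two rows in the same (imaginary) column family
write(alice, Column("email", b"alice@example.com", 1))
write(alice, Column("phone", b"555-0100", 1))
write(bob, Column("email", b"bob@example.com", 1))
write(alice, Column("email", b"alice@new.example", 5))
write(alice, Column("email", b"stale@example.com", 3))  # older timestamp, ignored

print(sorted(alice))         # ['email', 'phone']
print(sorted(bob))           # ['email']
print(alice["email"].value)  # b'alice@new.example'
```

The two rows end up with different sets of columns, and no table definition had to change to allow it.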

Six Summary

This article analyzed Cassandra's data model from a top-down perspective and briefly described some of its differences from a relational DB. The biggest difference is that schema design in Cassandra is query-centric rather than data-centric: you design a specific data layout for a specific query in order to get the best query performance.
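As a small illustration of query-centric design, the Python sketch below stores the same posts in two separate structures, one per query, instead of normalizing them into a single table. The names `posts_by_user`, `posts_by_tag`, and `add_post`, and the sample data, are all made up for this example.

```python
from collections import defaultdict

# Query 1: "all posts by a given user" -> row key is the user, column name is the post id.
posts_by_user = defaultdict(dict)
# Query 2: "all posts with a given tag" -> row key is the tag, column name is the post id.
posts_by_tag = defaultdict(dict)


def add_post(post_id, user, tags, title):
    """Write the post once per query it must serve (deliberate duplication)."""
    posts_by_user[user][post_id] = title
    for tag in tags:
        posts_by_tag[tag][post_id] = title


add_post("p1", "alice", ["cassandra", "nosql"], "Data model notes")
add_post("p2", "bob", ["cassandra"], "Ring basics")

print(sorted(posts_by_tag["cassandra"]))  # ['p1', 'p2']
print(posts_by_user["alice"])             # {'p1': 'Data model notes'}
```

Each query becomes a single lookup by row key, at the price of writing the same data more than once.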

If you find any mistakes, please leave a comment!
