Common misconceptions about Cassandra


Like the Trojan prophetess it is named after, Apache Cassandra is the subject of a number of misunderstandings. Most of them had some basis in fact at first, but as Cassandra has matured and improved, they have stopped being true. In this article, I will go through five common misconceptions and clear up the confusion around them.

Misconception: Cassandra is a nested map

As applications built on Cassandra have grown more complex, it has become clear that schemas and data types make large applications easier to develop and maintain than an "everything is a byte buffer" or "everything is a string" design.

Today, the best way to understand the Cassandra data model is as tables and rows. As in a relational database, Cassandra columns are strongly typed and can be indexed.
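To make the typed-and-indexable point concrete, here is a minimal CQL sketch. The table songs, its columns, and the index name songs_genre_idx are made up for illustration and are not part of the original article.

```cql
-- Hypothetical table: every column has a declared type.
CREATE TABLE songs (
  song_id uuid PRIMARY KEY,
  title text,
  genre text,
  released int
);

-- Secondary index on a non-key column (index name is an assumption).
CREATE INDEX songs_genre_idx ON songs (genre);

-- With the index in place, the column can be used as a filter.
SELECT title, released FROM songs WHERE genre = 'jazz';

-- Types are enforced: the statement below would be rejected,
-- because released is declared as int and the value is a string.
-- INSERT INTO songs (song_id, released)
-- VALUES (73844cd1-c16e-11e2-8bbd-7cd1c3f676e3, 'nineteen sixty');
```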

You may have heard these other statements:

"Cassandra is a column database." A column database stores all the data for a given column together on disk, which suits data warehouse queries but not applications that need quick access to specific rows.

"Cassandra is a wide-row database." There is some truth to this, because Cassandra's storage engine was inspired by Bigtable, the ancestor of wide-row databases. But a wide-row data model is tightly coupled to the storage engine: that makes it easier to implement, but harder to develop against, and it rules out many optimizations.

One reason we avoided "tables and rows" at first is that Cassandra tables differ in subtle ways from the tables you know from relational databases. First, the first element of the primary key is the partition key: all rows in the same partition are stored on the same server, and partitions are distributed across the cluster.
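A small sketch may help show what the partition key does. The table user_events and its columns are hypothetical; only the rule that the first primary key element is the partition key comes from the text above.

```cql
-- Hypothetical table: one partition per user, many rows per partition.
CREATE TABLE user_events (
  user_id uuid,          -- partition key: decides where the rows are stored
  event_time timestamp,  -- clustering column: orders rows inside a partition
  event_type text,
  PRIMARY KEY (user_id, event_time)
);

-- Every row with this user_id belongs to the same partition,
-- so the query does not have to gather rows from across the cluster.
SELECT event_time, event_type
FROM user_events
WHERE user_id = 73844cd1-c16e-11e2-8bbd-7cd1c3f676e3;
```

Choosing user_id as the first key element is a design decision for this example: it keeps all of one user's events together, while different users are spread over the cluster.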

Second, Cassandra does not support joins or subqueries, because joins across machines in a distributed system perform poorly. Instead, Cassandra encourages denormalization, so that the data you need comes from a single table, and provides tools such as collections to make this easier.

For example, consider the users table represented by the following code:

CREATE TABLE users (
  user_id uuid PRIMARY KEY,
  name text,
  state text,
  birth_year int
);

Most modern services allow a user to have multiple email addresses. In a relational database, we would model this as a one-to-many relationship and then join the addresses to the users, as shown in the following:

CREATE TABLE users_addresses (
  user_id uuid REFERENCES users,
  email text
);

SELECT * FROM users NATURAL JOIN users_addresses;

In Cassandra, we would denormalize instead and add the email addresses directly to the users table. A set collection fits this perfectly:

ALTER TABLE users ADD email_addresses set<text>;

We can then add multiple addresses to the user in the following ways:

UPDATE users
SET email_addresses = {'jbe@gmail.com', 'jbe@datastax.com'}
WHERE user_id = 73844cd1-c16e-11e2-8bbd-7cd1c3f676e3;
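The statement above replaces the whole set. As a hedged sketch of how a set is typically maintained afterwards, the following statements add and remove a single element and then read everything back without a join. The address jbe@example.com is a placeholder invented for this example.

```cql
-- Add one address to the existing set without rewriting it.
UPDATE users
SET email_addresses = email_addresses + {'jbe@example.com'}
WHERE user_id = 73844cd1-c16e-11e2-8bbd-7cd1c3f676e3;

-- Remove that address again.
UPDATE users
SET email_addresses = email_addresses - {'jbe@example.com'}
WHERE user_id = 73844cd1-c16e-11e2-8bbd-7cd1c3f676e3;

-- Read the user together with all addresses from a single table.
SELECT name, email_addresses
FROM users
WHERE user_id = 73844cd1-c16e-11e2-8bbd-7cd1c3f676e3;
```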

For more on the Cassandra data model, including self-expiring data and distributed counters, refer to the online documentation.
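As a rough sketch of those two features, the statements below use two hypothetical tables, sessions and page_views. Their names, columns, the one-day TTL, and the sample values are all assumptions made for illustration.

```cql
-- Self-expiring data: the inserted values disappear after 86400 seconds (one day).
CREATE TABLE sessions (
  session_id uuid PRIMARY KEY,
  user_id uuid
);

INSERT INTO sessions (session_id, user_id)
VALUES (5b6962dd-3f90-4c93-8f61-eabfa4a803e2,
        73844cd1-c16e-11e2-8bbd-7cd1c3f676e3)
USING TTL 86400;

-- Distributed counter: a table whose only non-key column is a counter.
CREATE TABLE page_views (
  page text PRIMARY KEY,
  views counter
);

-- Counters are changed with UPDATE; there is no INSERT for them.
UPDATE page_views SET views = views + 1 WHERE page = '/home';
```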

Misconception: Cassandra is slow at reads

Cassandra's log-structured storage engine means that it does not seek on hard disks to apply updates, nor does it cause write amplification on solid-state drives, and its reads are fast as well.

The figures for the random-access read, random-access and sequential-scan, and mixed read-write workloads come from the results of the NoSQL performance benchmark at the University of Toronto.
