Cassandra Basic Introduction (2)-Cassandra Overview

Last Update:2016-06-07 Source: Internet

Author: User


In the previous section we described the problems that RDBMSs run into. This section introduces Cassandra and looks at whether it can solve them.

Through this section, we will learn:

What is Cassandra
Hash distribution of Cassandra data
Cassandra trade-offs in CAP
Cassandra Replication
Cassandra Adjustable Consistency
Cassandra Multi-Data Center

What is Cassandra

Apache Cassandra is an open-source, distributed, decentralized, elastically scalable, highly available, fault-tolerant, column-oriented database with tunable consistency. It was created at Facebook, and its design draws on the distributed design of Amazon Dynamo and on Google's Bigtable database. Its main features are summarized below:

Distributed and decentralized

Distributed means that it can run on more than one machine while appearing to the user as a single whole. Decentralized means that Cassandra has no single point of failure: every node is the same, and no node takes on special management tasks. In contrast to a master/slave structure, the Cassandra protocol is peer-to-peer, and it uses gossip to maintain a list of live and dead nodes.

PS: The gossip algorithm is also called anti-entropy. Entropy is a concept from physics that represents disorder, and anti-entropy means seeking agreement within that disorder. This captures the nature of gossip well: in a bounded network, each node communicates with randomly chosen other nodes, and after a period of seemingly chaotic communication the state of all nodes eventually converges. A node may know about all other nodes or only about a few neighbors; as long as the nodes are connected through the network, their state eventually becomes consistent. This is the same way an epidemic spreads.
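The convergence described above can be illustrated with a toy simulation. This is only a sketch of the general gossip idea, not Cassandra's actual gossip implementation: the function names, the push-pull exchange, and the use of a simple version number per node are all assumptions made for illustration.

```python
import random


def merge(view_a, view_b):
    """Combine two views, keeping the highest version seen for each node."""
    merged = dict(view_a)
    for node, version in view_b.items():
        if version > merged.get(node, -1):
            merged[node] = version
    return merged


def gossip_rounds(num_nodes, seed=0, max_rounds=1000):
    """Return how many rounds pass before every node knows about every node.

    Returns None if the views have not converged after max_rounds.
    """
    if num_nodes < 2:
        return 0  # a single node trivially agrees with itself
    rng = random.Random(seed)
    # Each node starts out knowing only its own state (version 1).
    views = [{node: 1} for node in range(num_nodes)]
    for round_no in range(1, max_rounds + 1):
        for node in range(num_nodes):
            # Pick one random peer other than ourselves.
            peer = rng.choice([n for n in range(num_nodes) if n != node])
            # Push-pull exchange: both sides end up with the merged view.
            merged = merge(views[node], views[peer])
            views[node] = dict(merged)
            views[peer] = dict(merged)
        if all(len(view) == num_nodes for view in views):
            return round_no
    return None


if __name__ == "__main__":
    print("rounds until convergence:", gossip_rounds(16, seed=1))
```

Even though every node only talks to one random peer per round, the information spreads quickly, which is why the epidemic analogy is used.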

High availability and fault tolerance

From a general architecture point of view, a system's availability is measured by its ability to satisfy requests. But computers can suffer many kinds of failure, from hardware faults to network outages. A system that needs to be highly available must therefore consist of multiple networked computers, and the software running on them must be able to operate as a cluster, detecting node failures and taking over the failed node's work on the remaining nodes. Cassandra is highly available: a failed node can be replaced without interrupting the system, and data can be distributed across multiple data centers, which provides better local access performance and prevents the system from being completely paralyzed by a disaster such as a fire in one data center.

Linear scaling

Because Cassandra uses a peer-to-peer protocol, it is easy to scale horizontally, and performance increases roughly linearly as nodes are added.

Tunable consistency and lightweight transactions

Cassandra does not offer full RDBMS-style ACID transactions. Instead, its consistency is adjustable, from strict consistency to eventual consistency. It also supports lightweight transactions through CAS (compare-and-set).

No SPOF (single point of failure)

Every node is a peer, so the failure of any one node does not take the system down.

Easy to manage and operate

With Cassandra it is easy to add, remove, and replace nodes.
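The compare-and-set (CAS) idea behind the lightweight transactions mentioned above can be shown with a small in-memory sketch. The `TinyRegister` class is hypothetical and single-process; in Cassandra the same check-then-write has to be agreed on by several replicas, which this sketch does not attempt to model.

```python
class TinyRegister:
    """A toy key-value store with a compare-and-set operation."""

    def __init__(self):
        self._data = {}

    def compare_and_set(self, key, expected, new_value):
        """Write new_value only if the current value equals expected.

        Use expected=None to mean "only if the key does not exist yet",
        which is similar in spirit to CQL's INSERT ... IF NOT EXISTS.
        Returns True if the write was applied, False otherwise.
        """
        if self._data.get(key) != expected:
            return False
        self._data[key] = new_value
        return True

    def get(self, key):
        return self._data.get(key)


if __name__ == "__main__":
    store = TinyRegister()
    print(store.compare_and_set("user:42", None, "alice"))     # first claim
    print(store.compare_and_set("user:42", None, "bob"))       # second claim
    print(store.compare_and_set("user:42", "alice", "carol"))  # conditional update
    print(store.get("user:42"))
```

The point of CAS is that the condition and the write happen as one step, so two clients racing to claim the same key cannot both succeed.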

Hash distribution of Cassandra data

The data is partitioned around the ring.
All nodes store data and respond to queries (both readable and writable)
Data is located by its partition key.
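The three points above can be sketched as a small hash ring. This is a simplified illustration under stated assumptions: it uses MD5 from the Python standard library as a stand-in hash (Cassandra's own partitioners are different), gives each node a single token derived from its name, and the names `token_for` and `Ring` are hypothetical.

```python
import bisect
import hashlib


def token_for(value: str) -> int:
    """Map a string to a position on the ring (stand-in hash: MD5)."""
    digest = hashlib.md5(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


class Ring:
    """A toy token ring: each node owns the range ending at its token."""

    def __init__(self, node_names):
        # Assumption: one token per node, derived by hashing the node name.
        self._entries = sorted((token_for(name), name) for name in node_names)
        self._tokens = [token for token, _ in self._entries]

    def owner(self, partition_key: str) -> str:
        """Return the node responsible for this partition key."""
        token = token_for(partition_key)
        # First node whose token is >= the key's token; wrap around at the end.
        index = bisect.bisect_left(self._tokens, token) % len(self._tokens)
        return self._entries[index][1]


if __name__ == "__main__":
    ring = Ring(["node-a", "node-b", "node-c", "node-d"])
    for key in ["user:1", "user:2", "user:3"]:
        print(key, "->", ring.owner(key))
```

Because the owner is computed from the partition key alone, any node can work out where a row lives without consulting a central directory.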


Cassandra trade-offs in CAP

Under a network partition, it is impossible to guarantee both consistency and high availability at the same time
Cross-data-center latency also leads to inconsistency
Cassandra chooses availability and partition tolerance (Cassandra's consistency is tunable)

CA:

Primarily supporting consistency and availability means that you will most likely need distributed transactions with two-phase commit. In other words, if the network partitions, the system may stop responding.

AP:

Primarily supporting availability and partition tolerance means that the system may return less accurate data, but it will always be available.


Cassandra Replication

Data is replicated automatically; you only need to choose the number of replicas. This number is called the "replication factor", or RF.

If a machine is down, the writes it missed are replayed to it through "hinted handoff". (Hinted handoff will be covered in a follow-up lesson.)
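To make the replication factor concrete, the sketch below picks the replicas for a key by walking clockwise around a toy ring, which is roughly how a simple, single-data-center placement works. It is an illustration only: MD5 is a stand-in hash, each node has one token derived from its name, and `replicas_for` is a hypothetical helper rather than a Cassandra API.

```python
import bisect
import hashlib


def token_for(value: str) -> int:
    """Map a string to a position on the ring (stand-in hash: MD5)."""
    digest = hashlib.md5(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def replicas_for(partition_key, node_names, replication_factor):
    """Return the nodes that hold copies of the given partition key."""
    ring = sorted((token_for(name), name) for name in node_names)
    if not 1 <= replication_factor <= len(ring):
        raise ValueError("replication factor must be between 1 and the node count")
    tokens = [token for token, _ in ring]
    # The first replica is the node that owns the key's token ...
    start = bisect.bisect_left(tokens, token_for(partition_key)) % len(ring)
    # ... and the remaining replicas are the next nodes clockwise.
    return [ring[(start + step) % len(ring)][1] for step in range(replication_factor)]


if __name__ == "__main__":
    nodes = ["node-a", "node-b", "node-c", "node-d", "node-e"]
    print(replicas_for("user:42", nodes, replication_factor=3))
```

With RF = 3, losing any single node still leaves two copies of every row available.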


Cassandra Adjustable Consistency
Each query can specify a consistency level: ALL, QUORUM, or ONE. The level determines how many replicas must respond.

Cassandra is often described as "eventually consistent", which is actually a bit misleading. Put simply, Cassandra gives up a little consistency in exchange for full availability. It is more accurately described as having "tunable consistency": you can choose anywhere between strict consistency and eventual consistency to find the balance you need.
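The trade-off can be made concrete with a little arithmetic. A commonly used rule of thumb, which is not stated in the original text, is that a read is guaranteed to overlap the latest acknowledged write when the number of replicas read (R) plus the number of replicas written (W) exceeds the replication factor (RF). The function names below are hypothetical helpers for illustration.

```python
def required_acks(level: str, replication_factor: int) -> int:
    """Number of replicas that must respond for a given consistency level."""
    level = level.upper()
    if level == "ONE":
        return 1
    if level == "QUORUM":
        # A majority of the replicas.
        return replication_factor // 2 + 1
    if level == "ALL":
        return replication_factor
    raise ValueError("unsupported consistency level: " + level)


def read_overlaps_write(write_level: str, read_level: str, replication_factor: int) -> bool:
    """True if R + W > RF, so every read touches at least one up-to-date replica."""
    w = required_acks(write_level, replication_factor)
    r = required_acks(read_level, replication_factor)
    return r + w > replication_factor


if __name__ == "__main__":
    rf = 3
    for write_level, read_level in [("ONE", "ONE"), ("QUORUM", "QUORUM"), ("ALL", "ONE")]:
        print(write_level, read_level, read_overlaps_write(write_level, read_level, rf))
```

Lower levels such as ONE respond faster and tolerate more failed replicas, while higher levels such as QUORUM or ALL give stronger guarantees at the cost of latency and availability.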


Cassandra Multi-Data Center

Typical use case: clients write to the local DC, and the data is replicated asynchronously to the other DCs
Each keyspace has a replication factor for each data center, which means that each data center is highly available
A data center can be physical or logical
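The per-data-center replication factor described above is set when a keyspace is created. The sketch below builds such a CQL statement as a string and computes the per-DC majority that a data-center-local quorum would need. The keyspace name `demo`, the data center names `dc1` and `dc2`, and both helper functions are assumptions for illustration; real data center names must match the cluster's own configuration.

```python
def create_keyspace_cql(keyspace: str, rf_per_dc: dict) -> str:
    """Build a CREATE KEYSPACE statement with one replication factor per DC."""
    parts = ["'class': 'NetworkTopologyStrategy'"]
    parts += ["'%s': %d" % (dc, rf) for dc, rf in rf_per_dc.items()]
    options = ", ".join(parts)
    return "CREATE KEYSPACE %s WITH replication = {%s};" % (keyspace, options)


def local_quorum(rf_per_dc: dict) -> dict:
    """Majority of replicas inside each data center."""
    return {dc: rf // 2 + 1 for dc, rf in rf_per_dc.items()}


if __name__ == "__main__":
    layout = {"dc1": 3, "dc2": 2}  # hypothetical data center names
    print(create_keyspace_cql("demo", layout))
    print(local_quorum(layout))
```

Waiting only for replicas in the local data center keeps client latency low, while the remote data centers receive the same writes asynchronously.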


This article is from the Java Architect's Road blog. Please retain this source link when reposting: http://eric100.blog.51cto.com/2535573/1786942
