Cassandra Basic Introduction (2)-Cassandra Overview


In the previous section we described the problems that RDBMSs run into. This section introduces Cassandra and examines whether it can solve those problems.

In this section, we will learn:

    1. What is Cassandra

    2. How Cassandra distributes data by hashing

    3. Cassandra's trade-offs under CAP

    4. Cassandra replication

    5. Cassandra's tunable consistency

    6. Cassandra multi-data center deployment


    • What is Cassandra

Apache Cassandra is an open-source, distributed, decentralized, elastically scalable, highly available, fault-tolerant, tunably consistent, column-oriented database. It was created at Facebook and draws on the distributed design of Amazon Dynamo and the data model of Google Bigtable. Its main features are summarized below:

    1. Distributed and decentralized

Distributed means that Cassandra can run on multiple machines while presenting itself to users as a single whole. Decentralized means that Cassandra has no single point of failure: every node is identical, and no node takes on special management duties. Unlike a master/slave architecture, Cassandra uses a peer-to-peer protocol and relies on gossip to maintain the list of live and dead nodes.

PS: The gossip algorithm is also known as anti-entropy. Entropy is a physics concept describing disorder, and anti-entropy means reaching agreement in the midst of disorder, which captures what gossip does: in a bounded network, each node communicates with randomly chosen other nodes, and after a period of seemingly chaotic communication the states of all nodes eventually converge. A node may know every other node or only a few neighbors; as long as the nodes are connected through the network, their states eventually become consistent. This is, of course, the same way an epidemic spreads.

    2. High availability and fault tolerance

From a general architectural point of view, a system's availability is measured by its ability to satisfy requests. But computers can fail in many ways, from hardware faults to network outages. A highly available system must therefore consist of multiple networked computers, and the software running on them must be able to operate as a cluster: it must detect node failures and move the interrupted work onto the remaining nodes. Cassandra is highly available. A failed node can be replaced without disrupting the system, and data can be distributed across multiple data centers, both to provide faster local access and to keep the whole system from going down if a data center suffers a disaster such as a fire.

    3. Linear scalability

Because Cassandra uses a peer-to-peer protocol, it is easy to scale horizontally, and performance increases linearly as nodes are added.

    4. ACID support

Cassandra's consistency is tunable, ranging from strong consistency to eventual consistency. It also supports lightweight transactions through CAS (compare-and-set).

    5. No single point of failure (SPOF)

Because every node plays the same role, no single node is critical to the cluster.

    6. Easy to manage and operate

Adding, removing, and replacing nodes in Cassandra is straightforward.
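The gossip convergence described under feature 1 can be illustrated with a small simulation. This is a toy sketch, not Cassandra's actual gossip implementation: the functions `merge` and `gossip_until_converged` are hypothetical, each node's state is reduced to a single version number, and every node contacts one random peer per round and both sides exchange views (push-pull).

```python
import random


def merge(a: dict[int, int], b: dict[int, int]) -> dict[int, int]:
    """Combine two views, keeping the highest version seen for each node."""
    merged = dict(a)
    for node, version in b.items():
        if version > merged.get(node, 0):
            merged[node] = version
    return merged


def gossip_until_converged(num_nodes: int, seed: int = 42) -> tuple[int, list[dict[int, int]]]:
    """Run push-pull gossip rounds until every node knows about every other node."""
    if num_nodes < 2:
        raise ValueError("need at least two nodes to gossip")
    rng = random.Random(seed)
    # Initially each node knows only its own state (version 1).
    views = [{i: 1} for i in range(num_nodes)]
    rounds = 0
    while any(len(view) < num_nodes for view in views):
        rounds += 1
        for i in range(num_nodes):
            # Each node contacts one randomly chosen peer per round.
            peer = rng.choice([j for j in range(num_nodes) if j != i])
            merged = merge(views[i], views[peer])
            # Push-pull: both sides end up with the merged view.
            views[i] = merged
            views[peer] = dict(merged)
    return rounds, views


if __name__ == "__main__":
    rounds, views = gossip_until_converged(16)
    print(f"16 nodes converged after {rounds} rounds")
```

Changing the seed or the node count changes how many rounds it takes, but the views eventually become identical, which is the agreement the PS paragraph describes.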


    • Hash distribution of Cassandra data

    1. Data is partitioned around a ring.

    2. Every node stores data and responds to queries (both reads and writes).

    3. Data is located by its partition key.

[Figure: http://s5.51cto.com/wyfs02/M01/82/78/wKioL1dWYyuhwJk8AABtLN_kCmQ548.jpg]

[Figure: http://s5.51cto.com/wyfs02/M01/82/78/wKioL1dWY8nwc1vMAACYLbk-LgQ086.jpg]
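The three points above can be sketched as a token ring: the partition key is hashed to a token, and the node whose token is the first one at or after that value owns the key, wrapping around at the end of the ring. This is a minimal sketch under stated assumptions: `token_for` and `TokenRing` are hypothetical names, MD5 from the standard library stands in for the hash (Cassandra's default partitioner uses Murmur3), and each node has exactly one token.

```python
import bisect
import hashlib


def token_for(partition_key: str) -> int:
    """Hash a partition key onto the ring (MD5 stand-in; Cassandra's default uses Murmur3)."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big")


class TokenRing:
    """Each node owns the range (previous node's token, its own token]."""

    def __init__(self, node_tokens: dict[str, int]) -> None:
        self._entries = sorted((token, node) for node, token in node_tokens.items())
        self._tokens = [token for token, _ in self._entries]

    def owner_of_token(self, token: int) -> str:
        # First node whose token is >= the key's token; past the last node, wrap to the first.
        idx = bisect.bisect_left(self._tokens, token)
        return self._entries[idx % len(self._entries)][1]

    def owner(self, partition_key: str) -> str:
        return self.owner_of_token(token_for(partition_key))


if __name__ == "__main__":
    space = 2 ** 128  # MD5 produces 128-bit tokens
    ring = TokenRing({"A": space // 4, "B": space // 2, "C": 3 * space // 4, "D": space - 1})
    for key in ["alice", "bob", "carol"]:
        print(key, "->", ring.owner(key))
```

Because ownership depends only on the hash of the partition key, any node can compute where a row lives without asking a central coordinator.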

    • Cassandra trade-offs under CAP

    1. When the network is partitioned, it is impossible to guarantee both consistency and availability at the same time.

    2. Latency between data centers can also cause temporary inconsistency.

    3. Cassandra chooses availability and partition tolerance (AP), while keeping consistency tunable.

CA:

Prioritizing consistency and availability means you will most likely need distributed transactions based on two-phase commit. In other words, if the network splits, the system may stop responding.

AP:

Prioritizing availability and partition tolerance means you may sometimes return less accurate (stale) data, but the system always remains available.
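The difference between the two choices shows up only once the network splits. The toy model below, a hypothetical `TwoReplicaRegister` holding one value on two replicas, is a sketch rather than how Cassandra is built: in consistency-first mode it refuses writes it cannot apply to both replicas (it stops responding, as described for CA), while in availability-first mode it accepts the write locally and lets the other replica serve stale data (AP).

```python
from typing import Optional


class TwoReplicaRegister:
    """Toy value stored on two replicas; `partitioned` simulates a network split between them."""

    def __init__(self, prefer_consistency: bool) -> None:
        self.prefer_consistency = prefer_consistency
        self.values: list[Optional[str]] = [None, None]
        self.partitioned = False

    def write(self, via: int, value: str) -> bool:
        """Write through replica `via`; return False if the write is refused."""
        if not self.partitioned:
            self.values = [value, value]  # both replicas reachable: update both
            return True
        if self.prefer_consistency:
            return False  # cannot reach the other replica, so refuse (unavailable)
        self.values[via] = value  # accept locally; the replicas now diverge
        return True

    def read(self, via: int) -> Optional[str]:
        return self.values[via]


if __name__ == "__main__":
    for prefer_consistency in (True, False):
        reg = TwoReplicaRegister(prefer_consistency)
        reg.write(0, "v1")
        reg.partitioned = True
        accepted = reg.write(0, "v2")
        print(prefer_consistency, accepted, reg.read(0), reg.read(1))
```

In the availability-first run, replica 1 keeps returning "v1" after "v2" was accepted: the system stays up but serves less accurate data until the replicas are reconciled.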

[Figure: CAP, http://s4.51cto.com/wyfs02/M01/82/78/wKioL1dWan3RVCNyAAFL33zoo_c876.png]


    • Cassandra Replication

Data is replicated automatically; you only need to choose how many replicas to keep. This number of copies is called the replication factor (RF).
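A common way to place RF replicas on the ring, and the approach used by Cassandra's SimpleStrategy, is to store the first replica on the node that owns the token and the remaining RF - 1 replicas on the next nodes clockwise. The function below is a hypothetical sketch of that rule; it assumes one token per node (no virtual nodes), so consecutive ring entries are always distinct nodes.

```python
import bisect


def replicas_for_token(ring: list[tuple[int, str]], token: int, rf: int) -> list[str]:
    """Pick `rf` replicas: the token's owner plus the next nodes clockwise.

    Assumes one token per node (no virtual nodes), so consecutive ring entries are distinct nodes.
    """
    if not 1 <= rf <= len(ring):
        raise ValueError("replication factor must be between 1 and the number of nodes")
    ordered = sorted(ring)
    tokens = [t for t, _ in ordered]
    start = bisect.bisect_left(tokens, token)  # index of the owning node
    return [ordered[(start + i) % len(ordered)][1] for i in range(rf)]


if __name__ == "__main__":
    ring = [(10, "A"), (20, "B"), (30, "C"), (40, "D")]
    print(replicas_for_token(ring, 15, rf=3))  # owner B, then C and D
    print(replicas_for_token(ring, 35, rf=3))  # owner D, then wraps to A and B
```

With RF = 3, any single node can fail and two copies of every row it held remain available on its clockwise neighbors.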

If a node goes down, the writes it missed are replayed to it through hinted handoff once it comes back. (Hinted handoff will be covered in a later lesson.)
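The idea behind hinted handoff can be shown with a toy coordinator: when a replica is unreachable, the coordinator keeps a "hint" for each write that replica missed and replays the hints when the replica is marked up again. The `Coordinator` class here is hypothetical and deliberately simplified; real hinted handoff has further limits (for example, hints are only kept for a bounded time), which the later lesson will cover.

```python
class Coordinator:
    """Toy coordinator that stores hints for down replicas and replays them on recovery."""

    def __init__(self, replicas: list[str]) -> None:
        self.storage: dict[str, dict[str, str]] = {r: {} for r in replicas}
        self.up: set[str] = set(replicas)
        self.hints: dict[str, list[tuple[str, str]]] = {}

    def mark_down(self, replica: str) -> None:
        self.up.discard(replica)

    def write(self, key: str, value: str) -> None:
        for replica, data in self.storage.items():
            if replica in self.up:
                data[key] = value
            else:
                # Replica unreachable: remember the write as a hint.
                self.hints.setdefault(replica, []).append((key, value))

    def mark_up(self, replica: str) -> None:
        self.up.add(replica)
        # Replay every hint the replica missed while it was down, in order.
        for key, value in self.hints.pop(replica, []):
            self.storage[replica][key] = value


if __name__ == "__main__":
    c = Coordinator(["A", "B", "C"])
    c.mark_down("C")
    c.write("k", "v1")
    c.mark_up("C")
    print(c.storage["C"])
```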

[Figure: http://s1.51cto.com/wyfs02/M00/82/78/wKioL1dWa5KQccgwAAKuWzrKgXk673.png]

    • Cassandra Tunable Consistency

Each query can specify a consistency level, such as ALL, QUORUM, or ONE, which determines how many replicas must respond before the request succeeds.

Cassandra is often described as "eventually consistent," which is somewhat misleading. Simply put, Cassandra gives up some consistency in exchange for full availability. But it is more accurately described as having "tunable consistency": you can choose the level you need, anywhere from strong consistency to eventual consistency, and strike a balance between the two.
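The trade-off can be made concrete with a little arithmetic. QUORUM means a strict majority, floor(RF / 2) + 1 replicas, and a read is guaranteed to see the latest successful write whenever the replicas it reads from must overlap the replicas that acknowledged the write, that is, when R + W > RF. The helper names below are hypothetical and the sketch covers a single data center only.

```python
def required_replicas(level: str, rf: int) -> int:
    """Replica responses needed for a consistency level (single data center)."""
    if level == "ONE":
        return 1
    if level == "QUORUM":
        return rf // 2 + 1  # a strict majority of the replicas
    if level == "ALL":
        return rf
    raise ValueError(f"unsupported consistency level: {level}")


def reads_see_latest_write(read_level: str, write_level: str, rf: int) -> bool:
    """True when every read replica set must overlap every write replica set (R + W > RF)."""
    return required_replicas(read_level, rf) + required_replicas(write_level, rf) > rf


if __name__ == "__main__":
    rf = 3
    for read_level, write_level in [("ONE", "ONE"), ("QUORUM", "QUORUM"), ("ONE", "ALL")]:
        print(read_level, write_level, reads_see_latest_write(read_level, write_level, rf))
```

With RF = 3, ONE/ONE is fast but only eventually consistent (1 + 1 is not greater than 3), while QUORUM/QUORUM (2 + 2 > 3) gives strong consistency and still tolerates one replica being down.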

[Figure: http://s4.51cto.com/wyfs02/M00/82/7A/wKiom1dWbDuwf98YAAXQB5rflow258.png]


    • Cassandra Multi-Data Center

    1. Typical use case: clients write to the local DC, and the data is replicated asynchronously to the other DCs.

    2. Each data center has its own replication factor for each keyspace, so each data center is highly available on its own.

    3. A data center can be physical or logical.
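Per-DC replication factors interact with consistency levels. Cassandra also offers LOCAL_QUORUM (a quorum in the coordinator's own DC only) and EACH_QUORUM (a quorum in every DC), in addition to QUORUM, which counts replicas across all DCs. The sketch below computes how many acknowledgements each level waits for; the function names and the DC names "DC1" and "DC2" are hypothetical.

```python
def quorum(n: int) -> int:
    """Strict majority of n replicas."""
    return n // 2 + 1


def acks_needed(level: str, rf_per_dc: dict[str, int], local_dc: str) -> dict[str, int]:
    """Acknowledgements required, keyed by data center ("total" = counted across all DCs)."""
    if level == "LOCAL_QUORUM":
        # Only the local DC is waited on; writes still reach other DCs without blocking the client.
        return {local_dc: quorum(rf_per_dc[local_dc])}
    if level == "EACH_QUORUM":
        return {dc: quorum(rf) for dc, rf in rf_per_dc.items()}
    if level == "QUORUM":
        # A majority of all replicas in all DCs, which may include remote ones.
        return {"total": quorum(sum(rf_per_dc.values()))}
    raise ValueError(f"unsupported consistency level: {level}")


if __name__ == "__main__":
    rf = {"DC1": 3, "DC2": 3}
    for level in ("LOCAL_QUORUM", "EACH_QUORUM", "QUORUM"):
        print(level, acks_needed(level, rf, local_dc="DC1"))
```

LOCAL_QUORUM matches the typical use case above: a client in DC1 waits for 2 of DC1's 3 replicas and never pays cross-DC latency, while DC2 catches up asynchronously.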

[Figure: http://s3.51cto.com/wyfs02/M01/82/7A/wKiom1dWbNbARFx7AAJMJUtBw5s564.png]


This article is from the "Java Architect's Road" blog. Please retain this source: http://eric100.blog.51cto.com/2535573/1786942
