Cassandra Internals: Write Operations

Reprinted from: http://www.dbthink.com/?p=420

We have started using Cassandra at OneSpot as our next-generation storage engine (replacing one very large PostgreSQL machine with a cluster of EC2 machines), so I have been working with Cassandra for the past few weeks. Because I am an infrastructure nerd and firmly believe in understanding every layer of the system stack, I have been reading up on how Cassandra works, and I want to write a summary to help those who come after me. Since Cassandra is well known for its excellent write performance, I think that is the right place to start.

The first thing to understand is that Cassandra is meant to run on multiple machines. As far as I know, Twitter uses a cluster of 45 machines. Running Cassandra on a single machine makes little sense, because you lose the advantage of a system with no single point of failure (SPOF).

The client sends a write request to a single, random Cassandra node. This node acts as a proxy and writes the data to the cluster. The nodes of the cluster are organized as a "ring", and the write is replicated to N nodes according to the replication placement strategy. With the RackAwareStrategy, Cassandra classifies each replica node by its distance from the current node, for reliability and availability purposes, into three buckets: in the same rack as the current node, in the same data center as the current node, or in a different data center. When you configure Cassandra to write data to N nodes for redundancy, it writes the first copy to the primary node for that data, the second copy to a node on the ring in another data center, and the remaining copies to machines in the same data center as the proxy node. This ensures that a single failure does not make the whole cluster unavailable, and the cluster stays available even if an entire data center goes down.
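The placement rule described above can be sketched in a few lines of Python. This is only an illustration of the description in this article, not Cassandra's actual code: the `Node` class, the `distance` and `place_replicas` functions, and the `proxy_dc` parameter are all hypothetical names, and the real RackAwareStrategy handles many more cases.

```python
from bisect import bisect_left
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """Hypothetical description of one node on the ring."""
    name: str
    token: int        # position on the ring
    datacenter: str
    rack: str


def distance(a, b):
    """Classify b relative to a: 0 = same rack, 1 = same data center, 2 = other data center."""
    if a.datacenter != b.datacenter:
        return 2
    if a.rack != b.rack:
        return 1
    return 0


def place_replicas(ring, key_token, n, proxy_dc):
    """Pick up to n replica nodes for key_token, following the article's description."""
    ring = sorted(ring, key=lambda node: node.token)
    tokens = [node.token for node in ring]
    # Primary: the first node at or after the key's token, wrapping around the ring.
    start = bisect_left(tokens, key_token) % len(ring)
    # Walk the ring clockwise, starting at the primary.
    walk = [ring[(start + i) % len(ring)] for i in range(len(ring))]
    primary = walk[0]
    replicas = [primary]
    # Second copy: the next node on the ring that sits in another data center.
    if n > 1:
        for node in walk[1:]:
            if distance(primary, node) == 2:
                replicas.append(node)
                break
    # Remaining copies: the next nodes that sit in the proxy's data center.
    for node in walk[1:]:
        if len(replicas) >= n:
            break
        if node not in replicas and node.datacenter == proxy_dc:
            replicas.append(node)
    return replicas[:n]
```

For example, on a four-node ring where only one node lives in a second data center, a key that falls between the first two tokens gets the second node as its primary, the node in the other data center as its second copy, and the next node in the proxy's data center as its third.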

So the write request goes from your client to a single random node, and that node sends the write to N different nodes according to the replication placement strategy. There are many edge cases here that I will not address (nodes that are down, new nodes joining the cluster, and so on), but the node waits for the N nodes to report success and then returns success to the client. (Translator's note: this description is not quite accurate. Cassandra has another parameter, W, and the result is returned to the client as soon as W copies have been written successfully.)
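A minimal sketch of the proxy node's role, taking the translator's note about W into account, might look like the following. The function name `coordinate_write` and the idea of modeling each replica as a callable are assumptions made for illustration. A real cluster sends the writes in parallel over the network, while this sketch simply loops.

```python
def coordinate_write(replica_writers, mutation, w):
    """Send mutation to every replica and return True if at least w acknowledged it.

    replica_writers is a list of hypothetical callables, one per replica node.
    Each returns True on success and may raise ConnectionError if the node is down.
    """
    acks = 0
    for write in replica_writers:
        try:
            if write(mutation):
                acks += 1
        except ConnectionError:
            # A replica that is down does not count toward w.
            continue
    return acks >= w
```

With three replicas and W set to 2, the write still succeeds when one replica is down, but fails when two are.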

Each of these nodes receives the write request in the form of a "RowMutation" message. For this message, the node performs two actions:

  • Append the change to the commit log for transactional purposes
  • Apply the change to an in-memory Memtable structure

And then it is done. This is why Cassandra writes so quickly: the slowest part is appending the change to a log file. Unlike a relational database, Cassandra does not update data in place on disk, nor does it update indexes, so there are no intensive synchronous disk operations to block the write.
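The two actions a node performs for each write can be sketched as follows. This is a toy model under stated assumptions: the `ReplicaNode` class, its `apply` method, and the JSON-lines log format are invented for illustration and do not reflect Cassandra's real commit log format or classes.

```python
import json


class ReplicaNode:
    """Toy model of the write path on one node."""

    def __init__(self, commit_log_path):
        # Opened in append mode: existing log entries are never rewritten.
        self.commit_log = open(commit_log_path, "a", encoding="utf-8")
        self.memtable = {}  # row key -> {column name: value}

    def apply(self, row_key, columns):
        # 1. Append the mutation to the commit log (a sequential write).
        record = {"key": row_key, "columns": columns}
        self.commit_log.write(json.dumps(record) + "\n")
        self.commit_log.flush()
        # 2. Apply the mutation to the in-memory Memtable.
        self.memtable.setdefault(row_key, {}).update(columns)

    def close(self):
        self.commit_log.close()
```

Note that nothing in `apply` reads or modifies existing data on disk: the only disk activity is the append to the log.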

There are also several asynchronous operations that run regularly:

  • When a Memtable structure is "full", it is written to a disk-based structure called an SSTable, so that not too much data lives only in memory.
  • The set of temporary SSTables that exist for a given ColumnFamily is merged into one large SSTable. At that point the temporary SSTables are obsolete and will be garbage collected at some point in the future.
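The flush and merge steps above can be illustrated with a small in-memory model. Everything here is an assumption made for the sake of the sketch: the `ColumnFamilyStore` class name, the `memtable_limit` parameter, and the use of sorted tuples to stand in for on-disk SSTable files. In the real system these operations run asynchronously, while this sketch runs them inline.

```python
class ColumnFamilyStore:
    """Toy model of Memtable flushing and SSTable merging for one ColumnFamily."""

    def __init__(self, memtable_limit):
        self.memtable_limit = memtable_limit  # number of keys that counts as "full"
        self.memtable = {}
        self.sstables = []  # immutable, key-sorted tuples of (key, value); oldest first

    def write(self, key, value):
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        """Write the full Memtable out as a new sorted, immutable SSTable."""
        if self.memtable:
            self.sstables.append(tuple(sorted(self.memtable.items())))
            self.memtable = {}

    def compact(self):
        """Merge all SSTables into one; values from newer SSTables win."""
        merged = {}
        for sstable in self.sstables:  # oldest first, so later updates overwrite
            merged.update(sstable)
        # The old SSTables are simply dropped here and left to the garbage collector.
        self.sstables = [tuple(sorted(merged.items()))] if merged else []
```

Because each SSTable is never modified after it is written, an overwritten key may exist in several SSTables until the merge keeps only its newest value.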

There are many edge cases and complications that I have not discussed here. I strongly suggest reading at least the ArchitectureInternals and Operations pages on the Cassandra wiki. Distributed systems are hard, and Cassandra is no exception.

If you find any errors or want to add more detail, please leave a comment. I am not a Cassandra developer, so I am sure there are one or two mistakes hidden in here.
