Data Consistency in Cassandra

In Cassandra, data consistency refers to how up to date and synchronized a row of data is across its replica nodes (replicas). Cassandra extends the concept of eventual consistency with tunable consistency: for every read or write operation, the client chooses the degree of consistency (per-request consistency) based on its response-time and data-accuracy requirements.
In addition to tunable consistency, Cassandra also provides several built-in repair mechanisms to ensure data consistency across replicas.
CAP theorem
• Consistency: all nodes see the same data at the same time.
• Availability: every request receives a response, whether it succeeded or failed.
• Partition Tolerance: the system continues to operate even when messages between nodes are lost.
In a distributed storage system it is impossible to provide all three properties at once; at most two of the three can be satisfied. Cassandra lets the client decide this trade-off for each request, choosing among consistency, performance (availability), and fault tolerance, as sketched below.
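For example (a minimal sketch, not from the original article, using the DataStax Python driver; the contact point and the keyspace name 'demo' are assumptions), an application can set a default consistency level for its session and later override it per request:

    from cassandra import ConsistencyLevel
    from cassandra.cluster import Cluster, ExecutionProfile, EXEC_PROFILE_DEFAULT

    # Default consistency level applied to every request on this session;
    # individual statements can still override it (see the later sketches).
    profile = ExecutionProfile(consistency_level=ConsistencyLevel.LOCAL_QUORUM)
    cluster = Cluster(['127.0.0.1'],
                      execution_profiles={EXEC_PROFILE_DEFAULT: profile})
    session = cluster.connect('demo')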
Write operation consistency

The write consistency level determines how many replica nodes must be successfully written (to the commit log and memtable) before the write is confirmed to the client; a driver-level sketch follows the list below.
Suppose: R = number of replica nodes read, W = number of replica nodes written, N = replication factor, Q = quorum = N/2 + 1 (integer division).
ANY: at least one node stores the write successfully, even if only as a hinted handoff.
ONE: at least one replica node is written successfully.
QUORUM: at least Q replica nodes (Q = N/2 + 1) are written successfully.
LOCAL_QUORUM: at least Q replica nodes in the DC of the coordinator node are written successfully.
EACH_QUORUM: Q replica nodes in every DC are written successfully.
ALL: every replica node in the cluster is written successfully.
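A hedged sketch of how a client picks the write consistency per statement (reusing the session from the earlier sketch; the users table and its columns are hypothetical):

    import uuid

    from cassandra import ConsistencyLevel
    from cassandra.query import SimpleStatement

    # QUORUM write: the coordinator waits for Q replicas to acknowledge
    # (commit log + memtable) before confirming success to the client.
    quorum_write = SimpleStatement(
        "INSERT INTO users (id, name) VALUES (%s, %s)",
        consistency_level=ConsistencyLevel.QUORUM)
    session.execute(quorum_write, (uuid.uuid4(), 'alice'))

    # ANY write: succeeds as soon as one node has stored the data,
    # even if only as a hinted handoff for a down replica.
    any_write = SimpleStatement(
        "INSERT INTO users (id, name) VALUES (%s, %s)",
        consistency_level=ConsistencyLevel.ANY)
    session.execute(any_write, (uuid.uuid4(), 'bob'))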
ANY provides the highest write availability but the weakest consistency, because there is no guarantee of when the written data becomes readable (it depends on when the down replicas come back and the hints are delivered). ANY applies only to writes: the write can be accepted by any node and is later delivered to the target replicas through the hinted handoff mechanism. ANY suits applications that must never lose a write and do not care about consistency or delivery latency.
ALL has the strongest consistency but the lowest availability.
QUORUM is a compromise: it still provides strong consistency (quorum writes and quorum reads always overlap on at least one replica) while tolerating a certain number of failures. For example, if replication_factor is 3, quorum is 2 (one replica may fail); if replication_factor is 6, quorum is 4 (two replicas may fail).
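The arithmetic behind these numbers (a sketch of the standard quorum reasoning, assuming integer division as in the formula above):

    def quorum(replication_factor: int) -> int:
        # Q = N // 2 + 1, i.e. a strict majority of the replicas.
        return replication_factor // 2 + 1

    assert quorum(3) == 2   # one replica may be unavailable
    assert quorum(6) == 4   # two replicas may be unavailable

    # Quorum writes and quorum reads always overlap on at least one
    # replica, because Q + Q > N; that overlap is what gives strong
    # consistency when both sides use QUORUM.
    for n in (3, 6):
        assert quorum(n) + quorum(n) > n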
Unlike writes to regular columns, counter writes require a background read to keep the distributed counter value consistent across replicas. When the write uses consistency level ONE, this implicit read does not delay the write, so counters are usually written with consistency level ONE.
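A hedged sketch of a counter increment at consistency level ONE (the page_views counter table is hypothetical):

    from cassandra import ConsistencyLevel
    from cassandra.query import SimpleStatement

    # Counter columns can only be incremented or decremented, never set;
    # ONE keeps the implicit background read from slowing the write down.
    bump = SimpleStatement(
        "UPDATE page_views SET hits = hits + 1 WHERE page_id = %s",
        consistency_level=ConsistencyLevel.ONE)
    session.execute(bump, ('home',))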
Read operation consistency

The read consistency level determines how many replicas must return results before a result is returned to the client; a driver-level sketch follows the list below. Suppose: R = number of replica nodes read, W = number of replica nodes written, N = replication factor, Q = quorum = N/2 + 1 (integer division).
ONE: the result is returned from the closest replica node (as determined by the snitch). By default, read repair runs in the background to bring the other replicas up to date.
QUORUM: once Q (Q = N/2 + 1) replica nodes have returned data, the record with the latest timestamp is returned to the client.
LOCAL_QUORUM: once Q replica nodes in the DC of the coordinator node have returned data, the record with the latest timestamp is returned to the client.
EACH_QUORUM: once Q replica nodes in every DC have returned data, the record with the latest timestamp is returned to the client.
ALL: only after every replica node in the cluster has returned data is the record with the latest timestamp returned to the client; any node failure causes the read to fail.
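A hedged sketch of choosing the read consistency per statement (again reusing the session and the hypothetical users table from the earlier sketches):

    import uuid

    from cassandra import ConsistencyLevel
    from cassandra.query import SimpleStatement

    user_id = uuid.uuid4()   # placeholder key for illustration

    # LOCAL_QUORUM read: wait for Q replicas in the coordinator's DC,
    # then return the version with the newest timestamp to the client.
    query = SimpleStatement(
        "SELECT id, name FROM users WHERE id = %s",
        consistency_level=ConsistencyLevel.LOCAL_QUORUM)
    row = session.execute(query, (user_id,)).one()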