"NoSQL essence" Reading notes, reproduced please indicate the source "Jiq Technical Blog"
As mentioned earlier, the main reason NoSQL emerged is the need for databases that can run on large clusters. When migrating from a relational database to a cluster-oriented NoSQL database, however, one of the biggest changes is how you have to think about consistency. Relational databases avoid consistency problems by providing "strong consistency", which is not the case with NoSQL.
1 Update Consistency
A "write conflict" occurs when two users modify the same data at the same time. The server will necessarily process the two requests in some order. The conflict is that the later request was based on data that did not yet reflect the earlier update, so applying it overwrites the earlier update.
A single server must guarantee update consistency. This can be ensured through concurrency control, which comes in two forms: pessimistic and optimistic. The pessimistic approach avoids conflicts by acquiring a lock before writing. The optimistic approach allows conflicts to happen and then detects them. Its most common form is the conditional update: just before performing the update, check whether the current value of the data is the same as the value last read. If it is, the update succeeds; if not, the update fails. As long as updates to the data can only be processed in a single order, both pessimistic and optimistic methods can be used to avoid write conflicts.
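The conditional update described above can be sketched as follows. This is a minimal illustration, not taken from the book: the class name `ConditionalCell` and its methods are hypothetical, and the internal lock only makes the compare-then-write step atomic on one server.

```python
import threading


class ConditionalCell:
    """A single-server value protected by optimistic concurrency (hypothetical helper)."""

    def __init__(self, value):
        self._value = value
        # The lock only guards the short compare-and-write step;
        # clients never hold it between their read and their write.
        self._lock = threading.Lock()

    def read(self):
        with self._lock:
            return self._value

    def conditional_update(self, expected, new_value):
        """Write new_value only if the stored value still equals `expected`."""
        with self._lock:
            if self._value != expected:
                return False  # someone else updated first: the update fails
            self._value = new_value
            return True


# Two users read the same balance, then both try to update it.
cell = ConditionalCell(100)
seen_by_a = cell.read()
seen_by_b = cell.read()

a_ok = cell.conditional_update(seen_by_a, seen_by_a - 30)  # succeeds
b_ok = cell.conditional_update(seen_by_b, seen_by_b - 50)  # fails: value changed since B read it
```

User B's update is rejected instead of silently overwriting A's, so B can re-read the value and retry.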
Handling write conflicts in a distributed environment:
1) With a "peer-to-peer replication" distribution model, where the same data has multiple copies, each server may process the update requests in a different order, so the values stored after the update can differ. In this case a technique called "write quorum" can be used to avoid write conflicts: even if two conflicting writes occur, only one of them can be acknowledged by more than half of the nodes.
2) With a "master-slave replication" distribution model, all updates to a given piece of data are routed to a single node, where the concurrency control mechanism ensures update consistency.
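The "write quorum" idea can be illustrated with a small sketch. It assumes, for illustration only, that each replica acknowledges just the first of the conflicting writes it receives; the function names are hypothetical.

```python
from collections import Counter


def write_quorum_size(n_replicas):
    """Smallest number of acknowledgements that is more than half of the replicas."""
    return n_replicas // 2 + 1


def winning_write(first_seen_by_replica):
    """Return the write acknowledged by a majority, or None if no write reached one.

    first_seen_by_replica: for each replica, the conflicting write it
    received first (and therefore acknowledged).
    """
    needed = write_quorum_size(len(first_seen_by_replica))
    acks = Counter(first_seen_by_replica)
    winners = [write for write, count in acks.items() if count >= needed]
    # Two different writes cannot both exceed half of the replicas.
    return winners[0] if winners else None


# Five replicas: three saw write "A" first, two saw write "B" first.
winner = winning_write(["A", "B", "A", "A", "B"])

# Four replicas split evenly: neither write reaches the quorum of 3.
no_winner = winning_write(["A", "B", "A", "B"])
```

In the even split, neither write is accepted, which is the price of guaranteeing that at most one can win.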
2 Read Consistency
A read-write conflict occurs when one client reads data while another client is in the middle of writing it. Consider an order that contains line items and a shipping charge. User A can update the order and user B can read it. When A updates the order, the line items are updated first and the shipping charge second. If B happens to read the order between these two writes, B gets inconsistent data. This is a read-write conflict.
A single server must guarantee read consistency. For example, a relational database uses "transactions" to wrap the two writes into a single unit, ensuring that the data read by other users is either the value before the transaction executes or the value after it executes.
Handling read-write conflicts in a distributed environment: most NoSQL databases, especially aggregate-oriented databases, do not support transactions, but they do support "atomic updates" on a single aggregate. Read consistency can therefore be maintained within an aggregate, for example by placing both the line items and the shipping charge in one order aggregate. Obviously we cannot put all data into a single aggregate, and when an update spans multiple aggregates, read consistency cannot be guaranteed.
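A rough sketch of why keeping the line items and the shipping charge in one order makes reads consistent. The in-memory `AggregateStore` and the shipping rule are made up for illustration; the point is only that a whole order is replaced in one atomic step.

```python
import copy
import threading


class AggregateStore:
    """Toy store in which each key holds one whole aggregate (hypothetical)."""

    def __init__(self):
        self._docs = {}
        self._lock = threading.Lock()

    def put(self, key, aggregate):
        # The whole aggregate is swapped in one step: readers see old or new, never a mix.
        with self._lock:
            self._docs[key] = copy.deepcopy(aggregate)

    def get(self, key):
        with self._lock:
            return copy.deepcopy(self._docs[key])


def shipping_for(items):
    return 5 * len(items)  # made-up rule, for illustration only


store = AggregateStore()
store.put("order-1", {"items": ["book"], "shipping": 5})

# User A edits a private copy of the order.
order = store.get("order-1")
order["items"].append("lamp")

# User B reads between A's two changes and still sees a self-consistent order.
snapshot_mid = store.get("order-1")

order["shipping"] = shipping_for(order["items"])
store.put("order-1", order)  # both changes become visible together
snapshot_after = store.get("order-1")
```

If the line items and the shipping charge lived under two separate keys, the read in the middle could observe the new items with the old shipping charge.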
In a distributed environment, whether with master-slave replication or peer-to-peer replication (in both cases data may be read from different replicas), a new kind of inconsistency appears. Suppose a hotel's online booking system has only one room left. A and B are a couple, one in London and one in Los Angeles, discussing the room over the phone, when C in Beijing books it. The update reaches the Los Angeles replica before it reaches the London replica, so when A and B refresh their browsers they see different results. This is also a "read consistency" problem: the same data item yields different values when read from different replicas. To maintain strong consistency in this situation:
1) With the "peer-to-peer replication" distribution model, a technique called "read quorum" can be used;
2) With the "master-slave replication" distribution model, simply read the data from the master node.
Even without these measures, the update will eventually propagate to all replicas; this is called "eventual consistency".
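The quorum-based read from option 1) can be sketched with the hotel example. The condition usually quoted for strongly consistent quorum reads is R + W > N (replicas contacted per read, replicas confirming each write, and total replicas); that condition, the version numbers, and the helper name below are illustrative assumptions not spelled out in these notes.

```python
from itertools import combinations


def quorum_read(replicas, contacted):
    """Read from the contacted replicas and return the value with the highest version."""
    return max((replicas[name] for name in contacted), key=lambda vv: vv[0])[1]


# (version, value) per replica: the booking has reached two of three replicas.
replicas = {
    "London": (1, "available"),
    "Los Angeles": (2, "booked"),
    "Beijing": (2, "booked"),
}
N, W, R = 3, 2, 2  # R + W > N, so every read set overlaps the write set

# Every possible pair of replicas includes at least one that saw the booking.
quorum_results = {quorum_read(replicas, pair) for pair in combinations(replicas, R)}

# Reading from a single replica (R = 1) can return either answer.
single_results = {quorum_read(replicas, one) for one in combinations(replicas, 1)}
```

With R = 2 both A and B see "booked" no matter which replicas they reach; with R = 1 the answer depends on the replica.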
3 Relaxing Consistency Constraints
To be truly consistent, a system must give up some of its other properties, which may be essential. In practice it is often necessary to give up a certain amount of consistency, depending on the scenario, in order to guarantee those other properties. For example, a single-server relational database ensures strong consistency through transactions, yet transactional systems usually allow the "isolation level" to be relaxed, and some relational database deployments abandon transactions altogether in pursuit of performance.
CAP theorem: this is the reason usually given for relaxing consistency constraints in the NoSQL world. The basic statement of the theorem is that of the three properties consistency, availability, and partition tolerance, a system can satisfy at most two.
- Consistency: the update consistency, read consistency, replication consistency, and so on discussed above.
- Availability: every request received by a non-failed node in the distributed system must result in a response, whether it succeeds or fails.
- Partition tolerance: the cluster remains available when a split brain occurs. A split brain is a communication failure that divides the cluster into multiple network partitions that cannot communicate with each other.
The essence of the CAP theorem: if the cluster became unavailable whenever a split brain occurred, the cost would be enormous, so a cluster must provide "partition tolerance", meaning the system remains available when a split brain occurs. The CAP theorem can therefore be restated as: "When a distributed system may suffer a split brain, we need to make a tradeoff between consistency and availability."
Note that this is not a binary choice. Usually we give up a little "consistency" to gain a certain degree of "availability". The resulting system has neither perfect availability nor perfect consistency, but the combination of the two imperfect properties can meet the specific need.
Available = reasonable latency: looking back at the meaning of availability, rather than weighing "consistency" against "availability", it is more useful to weigh "consistency" against "latency". When a non-failed node processes a request and exceeds the maximum tolerable latency, we should abandon the operation and treat the node as unavailable.
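Treating "too slow" as "unavailable" can be sketched with a deadline around each request. The function name `call_with_deadline` and the latency limits are hypothetical; this only illustrates the idea of abandoning an operation once the tolerable latency is exceeded.

```python
import concurrent.futures
import time


def call_with_deadline(fn, max_latency_s):
    """Run fn; if it takes longer than max_latency_s, treat the node as unavailable."""
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = pool.submit(fn)
    try:
        return ("ok", future.result(timeout=max_latency_s))
    except concurrent.futures.TimeoutError:
        return ("unavailable", None)  # abandon the operation
    finally:
        pool.shutdown(wait=False)  # do not wait for the slow call to finish


def fast_node():
    return "room available"


def slow_node():
    time.sleep(0.3)  # simulates a node that is working but too slow
    return "room available"


fast_result = call_with_deadline(fast_node, max_latency_s=1.0)
slow_result = call_with_deadline(slow_node, max_latency_s=0.05)
```

The slow node did not fail, yet from the caller's point of view it is indistinguishable from an unavailable one.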
Giving up strong consistency: as the earlier discussion of consistency shows, to ensure strong consistency the "master-slave replication" model must send both reads and writes to the master node, and the "peer-to-peer replication" model must use quorums; the more nodes that take part in a quorum, the stronger the consistency obtained. Both models therefore pay a heavy price in latency to ensure strong consistency. In addition, many scenarios in a distributed environment can tolerate "inconsistent updates" or "inconsistent reads". So in a distributed environment it is often necessary to sacrifice some consistency in exchange for lower latency.
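The latency cost of involving more nodes can be made concrete with a back-of-the-envelope sketch. It assumes requests are sent to all replicas in parallel and the operation completes once the required number have answered; the latency figures are invented.

```python
def quorum_latency(replica_latencies_ms, quorum):
    """Latency of an operation that waits for `quorum` replies sent in parallel.

    The operation finishes when the quorum-th fastest replica has answered.
    """
    return sorted(replica_latencies_ms)[quorum - 1]


# Invented response times for five replicas, one of them far away.
latencies_ms = [5, 8, 12, 40, 250]

latency_by_quorum = {q: quorum_latency(latencies_ms, q) for q in range(1, 6)}
```

Waiting for one reply costs 5 ms, a majority of three costs 12 ms, and all five cost 250 ms, because the slowest replica that must be waited for dominates.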
Latency is also related to durability: you can give up some "durability" to reduce latency, for example by keeping the database in memory most of the time, applying updates directly to memory, and periodically writing the changes back to disk.
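A minimal sketch of this tradeoff, assuming a toy key-value store that keeps everything in memory and writes a JSON snapshot to disk only when `flush` is called. The class and file names are hypothetical; a real system would call `flush` on a timer.

```python
import json
import os
import tempfile


class WriteBehindStore:
    """Toy store: fast in-memory updates, periodic flush to disk (hypothetical)."""

    def __init__(self, path):
        self.path = path
        self.memory = {}
        self.dirty = False

    def put(self, key, value):
        self.memory[key] = value  # low latency: no disk access on the write path
        self.dirty = True

    def flush(self):
        """Write the current state to disk; a real system would call this on a timer."""
        if not self.dirty:
            return False
        tmp_path = self.path + ".tmp"
        with open(tmp_path, "w") as f:
            json.dump(self.memory, f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp_path, self.path)  # swap in the new snapshot in one step
        self.dirty = False
        return True


def read_disk(path):
    """What a restarted server would recover after a crash."""
    if not os.path.exists(path):
        return {}
    with open(path) as f:
        return json.load(f)


with tempfile.TemporaryDirectory() as directory:
    path = os.path.join(directory, "db.json")
    store = WriteBehindStore(path)
    store.put("room-101", "booked")
    store.flush()
    store.put("room-102", "booked")  # not flushed yet
    recovered_after_crash = read_disk(path)  # simulate a crash before the next flush
```

The second booking is lost in the simulated crash: that is the durability given up in exchange for faster writes.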