ACID, Data Replication, CAP, and BASE

Source: Internet
Author: User

Http://www.cnblogs.com/hustcat/archive/2010/09/07/1820970.html

ACID

In a database system, transactions have the four ACID properties (Jim Gray discusses transactions in detail in Transaction Processing: Concepts and Techniques).

(1) Atomicity: A transaction is an atomic unit of work. Its modifications to the data are either all applied or not applied at all.

(2) Consistency: Data must be in a consistent state both when a transaction begins and when it completes. This means that all relevant data rules must be applied to the transaction's modifications to preserve data integrity. At the end of the transaction, all internal data structures, such as B-tree indexes or doubly linked lists, must also be correct.

(3) Isolation: The database system provides isolation mechanisms so that each transaction runs in an "independent" environment, unaffected by other concurrent operations. This means that the intermediate state of a transaction is not visible to other transactions, and vice versa.

(4) Durability: After a transaction completes, its changes to the data are permanent, even if a system failure occurs.
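The atomicity and consistency properties above can be illustrated with a small sketch using Python's built-in sqlite3 module. The accounts table, the transfer function, and the "no negative balance" rule are hypothetical names and rules chosen for illustration; they are not from the original article.

```python
import sqlite3


def transfer(conn, src, dst, amount):
    """Move `amount` from `src` to `dst` inside a single transaction."""
    try:
        # `with conn` commits on success and rolls back if an exception escapes.
        with conn:
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            # Assumed data rule for this sketch: no balance may go negative.
            (lowest,) = conn.execute("SELECT MIN(balance) FROM accounts").fetchone()
            if lowest < 0:
                raise ValueError("insufficient funds")
        return True
    except ValueError:
        return False


def balances(conn):
    """Return the current balances as a dict."""
    return dict(conn.execute("SELECT name, balance FROM accounts"))


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()

ok = transfer(conn, "alice", "bob", 30)       # both updates are applied
failed = transfer(conn, "alice", "bob", 500)  # both updates are rolled back
```

In the second call both UPDATE statements run before the rule is checked, yet neither is visible afterward: the transaction is undone as a unit.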

For transactions on a single node, the database guarantees the ACID properties through concurrency control (two-phase locking or multiversioning) and recovery mechanisms (logging). For distributed transactions that span multiple nodes, the ACID properties are guaranteed through a two-phase commit protocol.
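The idea behind two-phase commit can be sketched in a few lines. This is a minimal in-memory illustration under simplifying assumptions: there are no timeouts, no crashes, and no logging, all of which a real protocol must handle. The Participant class and the two_phase_commit function are hypothetical names.

```python
class Participant:
    """A node that can stage a change and then commit or abort it."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit  # simulates a node that votes "no"
        self.value = None             # committed state
        self.staged = None            # prepared but not yet committed

    def prepare(self, value):
        # Phase 1: stage the change and vote.
        if not self.can_commit:
            return False
        self.staged = value
        return True

    def commit(self):
        # Phase 2 (success): make the staged change visible.
        self.value, self.staged = self.staged, None

    def abort(self):
        # Phase 2 (failure): discard the staged change.
        self.staged = None


def two_phase_commit(participants, value):
    """Coordinator: commit everywhere only if every participant votes yes."""
    votes = [p.prepare(value) for p in participants]
    if all(votes):
        for p in participants:
            p.commit()
        return "committed"
    for p in participants:
        p.abort()
    return "aborted"


nodes = [Participant("a"), Participant("b")]
result_ok = two_phase_commit(nodes, 42)

nodes2 = [Participant("a"), Participant("b", can_commit=False)]
result_fail = two_phase_commit(nodes2, 42)
```

A single "no" vote in the first phase is enough to abort the transaction on every node, which is how the global transaction stays atomic.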

It can be said that database systems developed rapidly alongside the needs of the financial industry. For the financial industry, availability and performance are not the most important concerns; consistency is. Users can tolerate a system failure that stops the service, but they cannot tolerate money disappearing from an account for no reason (though an unexplained increase, of course, would be fine). Strongly consistent transactions are the fundamental guarantee of all this.

Data Replication

Data replication belongs to the field of distributed computing. It is not confined to databases, but here it mainly refers to replication in distributed databases.

In a distributed database system composed of multiple replicas, transaction properties differ from those of a single-database system mainly in two respects: atomicity and consistency. For atomicity, all operations of the same distributed transaction must either commit on all relevant replicas or roll back on all of them; that is, in addition to the atomicity of each local transaction, the atomicity of the global transaction must be controlled. For consistency, the replicas must provide single-copy consistency, meaning that together they behave like one copy.

After nearly 20 years of research, a variety of replication protocols have been proposed for the two core problems of replication, namely the atomicity and consistency of distributed transactions. These protocols differ significantly in both external behavior and internal implementation, and they can be classified along these two dimensions.

From the perspective of external behavior, according to [1], protocols can be classified by where transactions execute and by when updates propagate. By where transactions execute, they fall into two categories: master-slave (primary/copy) mode and update-anywhere mode.

In the former, usually only one primary node in the system accepts update requests. After the transaction's operations are executed there, they are broadcast to the other copy nodes either before or after the transaction commits.

The latter is slightly more complex. All replicas in the system have equal status, and any of them can accept update requests. Each node's updates are propagated to the other replica nodes, and transaction conflicts are detected, either before the transaction commits or afterward.

Concurrency control in primary/copy mode is relatively simple, because the primary's local transaction control is sufficient. Atomicity is also relatively simple to implement, generally with the primary node acting as the coordinator. The flaw is just as obvious: only a single node can process update requests, so update-intensive applications such as OLTP easily run into a single-point performance bottleneck. The update-anywhere mode is the complement: it can increase transaction throughput by accepting updates at multiple nodes, but it brings complicated concurrency control and atomicity problems among multiple distributed transactions.
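The contrast between the two modes can be sketched with a toy in-memory model. The Replica class, the per-key version numbers, and both helper functions are assumptions made for this illustration, not part of any particular replication protocol.

```python
class Replica:
    """A toy replica storing key -> (version, value)."""

    def __init__(self, name):
        self.name = name
        self.data = {}

    def read(self, key):
        return self.data.get(key, (0, None))

    def apply(self, key, version, value):
        self.data[key] = (version, value)


def primary_copy_write(primary, copies, key, value):
    """Primary/copy: the primary alone orders updates, then forwards them."""
    version, _ = primary.read(key)
    primary.apply(key, version + 1, value)
    for copy in copies:
        copy.apply(key, version + 1, value)


def has_conflict(replicas, key):
    """Update-anywhere: detect replicas holding different values at one version."""
    seen = {}
    for replica in replicas:
        version, value = replica.read(key)
        if version in seen and seen[version] != value:
            return True
        seen[version] = value
    return False


# Primary/copy: every write is serialized by the primary, so copies agree.
p, c1 = Replica("primary"), Replica("copy-1")
primary_copy_write(p, [c1], "x", "a")
primary_copy_write(p, [c1], "x", "b")

# Update-anywhere: two replicas accept concurrent local writes to the same key.
r1, r2 = Replica("r1"), Replica("r2")
r1.apply("x", 1, "from-r1")
r2.apply("x", 1, "from-r2")
conflict = has_conflict([r1, r2], "x")
```

The primary never produces a conflict because it is the only writer; the update-anywhere replicas need an explicit detection step, which is where the extra concurrency-control complexity comes from.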

From the point of view of when updates propagate relative to transaction commit, protocols can be divided into two categories: eager and lazy. The former propagates updates before the transaction commits; the latter propagates the transaction's operations to the other replicas after it commits. The former is what is usually called synchronous replication, and the latter is what is usually called asynchronous replication.

The advantage of asynchronous replication is better response time, but at the expense of consistency; algorithms that implement such protocols generally require additional compensation mechanisms. The advantage of synchronous replication is that it guarantees consistency (typically through a two-phase commit protocol), but it has greater overhead and poorer availability (see the CAP section), and it leads to more conflicts and deadlocks. It is worth mentioning that the lazy + primary/copy replication protocol is very practical in real production environments; MySQL replication belongs to this category.
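The difference between eager and lazy propagation can be sketched as follows. The Primary class and its methods are hypothetical, and the copies are plain dictionaries; a real system would propagate over a network and handle failures.

```python
from collections import deque


class Primary:
    """A toy primary node that replicates to in-memory copies (plain dicts)."""

    def __init__(self, copies):
        self.data = {}
        self.copies = copies
        self.pending = deque()  # committed locally, not yet propagated

    def write_eager(self, key, value):
        # Eager: update every copy before the write is considered committed.
        for copy in self.copies:
            copy[key] = value
        self.data[key] = value

    def write_lazy(self, key, value):
        # Lazy: commit locally first and queue the update for later.
        self.data[key] = value
        self.pending.append((key, value))

    def propagate(self):
        # Runs after commit, for example from a background task.
        while self.pending:
            key, value = self.pending.popleft()
            for copy in self.copies:
                copy[key] = value


copy = {}
primary = Primary([copy])

primary.write_eager("a", 1)
eager_visible = copy.get("a")        # already on the copy

primary.write_lazy("b", 2)
lazy_visible_before = copy.get("b")  # not yet on the copy
primary.propagate()
lazy_visible_after = copy.get("b")   # now on the copy
```

The window between write_lazy and propagate is the period in which a reader of the copy sees stale data, which is the consistency cost of the faster response.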

CAP

At the 2000 PODC (Principles of Distributed Computing) conference, Brewer presented the famous CAP conjecture. In 2002, Seth Gilbert and Nancy Lynch proved it. CAP stands for consistency, availability, and partition tolerance.

(1) Consistency: Consistency here refers to the atomicity of data operations. In a classic database it is guaranteed by transactions: when a transaction completes, whether it succeeded or was rolled back, the data is in a consistent state. In a distributed environment, consistency means that the data on multiple nodes is consistent.

(2) Availability: The service remains available at all times. When a user makes a request, the service returns a result within a bounded time.

(3) Partition tolerance: A partition here means a network partition. In general, key data and services are located in different IDCs (Internet data centers), and the system must keep working when the network between them is partitioned.

The CAP theorem tells us that a distributed system cannot satisfy consistency, availability, and partition tolerance all at the same time; it can have at most two of the three. As the saying goes, you cannot have both the fish and the bear's paw. For a distributed data system, partition tolerance is a basic requirement; otherwise it could hardly be called a distributed system. Architects should therefore not waste energy trying to design a perfect distributed system that satisfies all three, but should make trade-offs. This also means that designing a distributed system is a process of finding a balance between C (consistency) and A (availability) based on business characteristics, which requires architects to truly understand the system requirements and the business.
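The trade-off between C and A under a partition can be made concrete with a toy two-node model. The Node class, the write function, and the "CP" and "AP" mode labels are assumptions for this sketch; real systems offer many intermediate choices.

```python
class Unavailable(Exception):
    """Raised when a consistency-first node refuses a request during a partition."""


class Node:
    def __init__(self):
        self.value = None


def write(local, remote, value, partitioned, mode):
    """Write `value` at `local`, replicating to `remote` when it is reachable."""
    if not partitioned:
        local.value = value
        remote.value = value
        return "replicated"
    if mode == "CP":
        # Consistency first: refuse the write rather than let replicas diverge.
        raise Unavailable("peer unreachable")
    # Availability first: accept the write and let replicas diverge for now.
    local.value = value
    return "accepted locally"


a, b = Node(), Node()
write(a, b, "v1", partitioned=False, mode="CP")

try:
    write(a, b, "v2", partitioned=True, mode="CP")
    cp_result = "accepted"
except Unavailable:
    cp_result = "rejected"

ap_result = write(a, b, "v2", partitioned=True, mode="AP")
diverged = a.value != b.value
```

During the partition the consistency-first choice gives up availability (the request fails), while the availability-first choice gives up consistency (the two nodes now disagree).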

BASE

BASE comes from practice in Internet e-commerce and evolved gradually from the CAP theorem. Its core idea is that even when strong consistency cannot be achieved, an application can use means suited to its own characteristics (logging, retransmission, and so on) to achieve eventual consistency. BASE is shorthand for three phrases: Basically Available, Soft state, and Eventually consistent. It is an extension of the C and A in CAP. The meaning of BASE:

(1) Basically Available: the system is basically available;

(2) Soft state: soft state (flexible transactions), that is, the state may be out of sync for a period of time;

(3) Eventual consistency: the data becomes consistent eventually.

BASE is the opposite of ACID and is completely different from the ACID model: it sacrifices strong consistency in exchange for basic availability and flexible reliability, and requires only eventual consistency.
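The "log plus retransmission" route to eventual consistency can be sketched as follows. The FlakyReplica class, its failures counter, and the replicate_with_retry function are hypothetical; the sketch assumes updates are idempotent and are resent in log order.

```python
class FlakyReplica:
    """A replica that drops a fixed number of deliveries before accepting any."""

    def __init__(self, failures):
        self.data = {}
        self.failures = failures

    def deliver(self, key, value):
        if self.failures > 0:
            self.failures -= 1
            return False  # simulated lost message
        self.data[key] = value
        return True


def replicate_with_retry(log, replica, max_rounds=10):
    """Resend logged updates in order until all of them are acknowledged."""
    pending = list(log)
    rounds = 0
    while pending and rounds < max_rounds:
        rounds += 1
        while pending:
            key, value = pending[0]
            if not replica.deliver(key, value):
                break  # keep the entry and retry it in the next round
            pending.pop(0)
    return rounds, not pending


log = [("x", 1), ("y", 2)]
replica = FlakyReplica(failures=2)
rounds, converged = replicate_with_retry(log, replica)
```

Between the first failed delivery and the last successful one the replica is in the soft state described above: it lags behind the log, but it catches up once retransmission succeeds.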

CAP and BASE are the theoretical basis of NoSQL, which is currently very popular in the Internet field.

Main references

[1] J. N. Gray, P. Helland, P. O'Neil, and D. Shasha. The dangers of replication and a solution. In Proceedings of the 1996 ACM SIGMOD International Conference on Management of Data, pages 173–182, Montreal, Canada, June 1996.

[2] S. Gilbert and N. Lynch. Brewer's conjecture and the feasibility of consistent, available, partition-tolerant web services. ACM SIGACT News 33(2), 2002.

[3] http://www.allthingsdistributed.com/2008/12/eventually_consistent.html

[4] http://queue.acm.org/detail.cfm?id=1394128

[5] http://en.wikipedia.org/wiki/acid
