ACID, Data Replication, CAP, and BASE

Source: Internet
Author: User

ACID

In a database system, transactions have the ACID properties (Jim Gray discusses transactions in detail in Transaction Processing: Concepts and Techniques).

(1) Atomicity: a transaction is an atomic unit of work; its modifications to the data are either all performed or none of them is performed.

(2) Consistency: data must be in a consistent state at both the beginning and the end of a transaction. This means that all relevant data rules must be applied to the transaction's modifications to maintain data integrity. At the end of the transaction, all internal data structures (such as B-tree indexes or doubly linked lists) must also be correct.

(3) Isolation: the database system provides an isolation mechanism to ensure that each transaction executes in an "independent" environment, unaffected by external concurrent operations. This means that the intermediate states of a transaction are invisible to the outside, and vice versa.

(4) Durability: after a transaction completes, its modifications to the data are permanent and survive even a system failure.
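As a small illustration of atomicity and consistency, the following sketch uses Python's built-in sqlite3 module. The `accounts` table, the `transfer` and `balances` helpers, and the "no negative balance" rule are assumptions made for this example only; they do not come from the original text.

```python
import sqlite3


def transfer(conn, src, dst, amount):
    """Move `amount` between two accounts as one atomic transaction."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            # Consistency rule (assumed for this example): no negative balances.
            (low,) = conn.execute("SELECT MIN(balance) FROM accounts").fetchone()
            if low < 0:
                raise ValueError("insufficient funds")
        return True
    except ValueError:
        return False


def balances(conn):
    """Return the current balances as a {name: balance} dictionary."""
    return dict(conn.execute("SELECT name, balance FROM accounts ORDER BY name"))


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()
```

A transfer that would overdraw an account raises inside the `with conn:` block, so both updates are rolled back together and the balances stay as they were.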

For transactions on a single node, the database ensures the ACID properties through concurrency control (two-phase locking or multiversioning) and a recovery mechanism (logging). For distributed transactions across multiple nodes, the two-phase commit protocol is used to ensure the ACID properties.
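The two-phase commit protocol mentioned above can be sketched roughly as follows. This is a simplified in-memory model, not a production implementation: the `Participant` class and the `two_phase_commit` function are hypothetical names, and timeouts, logging, and coordinator failure are all left out.

```python
class Participant:
    """A hypothetical node holding one part of a distributed transaction."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "init"

    def prepare(self):
        # Phase 1: vote. A real node would first force its log to disk.
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self):
        self.state = "committed"

    def rollback(self):
        self.state = "aborted"


def two_phase_commit(participants):
    """Return True if the transaction committed on every participant."""
    # Phase 1: collect a vote from every participant.
    votes = [p.prepare() for p in participants]
    # Phase 2: commit only if all votes were yes, otherwise roll back everywhere.
    if all(votes):
        for p in participants:
            p.commit()
        return True
    for p in participants:
        p.rollback()
    return False
```

A single "no" vote in the first phase is enough to abort the transaction on all nodes, which is how the protocol keeps the global transaction atomic.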

It can be said that database systems developed rapidly in step with the needs of the financial industry. For the financial industry, availability and performance are not the most important; consistency is. Users can tolerate a system fault or a service outage, but the amount of money in an account must not decrease without reason (an unexplained increase, of course, would be acceptable). Strongly consistent transactions are the fundamental guarantee of all this.

Data Replication

Data replication is a technique in distributed computing. It is not limited to databases, but here it mainly refers to replication in distributed databases.

In a distributed database system composed of multiple replicas, transactions differ from those of a single database system mainly in two respects: atomicity and consistency. In terms of atomicity, all operations of the same distributed transaction must be either committed or rolled back on all related replicas; that is, in addition to ensuring the atomicity of each local transaction, we also need to control the atomicity of the global transaction. In terms of consistency, the multiple replicas must be kept consistent with one another so that they behave like a single copy.

Atomicity and consistency of distributed transactions are the two core issues for replication protocols. After nearly 20 years of research, a variety of replication protocols have been proposed. These protocols differ greatly in both external function and internal implementation, so we can classify and describe them along these two dimensions.

From the perspective of external function, protocols can be classified by where and when transactions are executed. By where transactions are executed, they fall into two types: primary copy (master/slave) mode and update-anywhere mode.

In the former, generally only one primary node in the system is designated to accept update requests. After the transaction's operations complete, they are broadcast to the other replica nodes either before or after the transaction commits.

The latter is a little more complicated. Every replica in the system has equal status and can accept update requests. Updates made at any node are propagated to the other replica nodes before or after the transaction commits, and conflicts between transactions must be detected along the way.

Concurrency control in primary copy mode is relatively simple: it can be implemented by the primary's local transaction control. Transaction atomicity is also relatively simple to implement, generally with the primary node acting as the coordinator. The drawback is equally obvious: only one node can process update requests, so update-intensive applications such as OLTP easily run into a single-point performance bottleneck. The update-anywhere approach makes up for this weakness and can improve transaction throughput by accepting updates at multiple nodes. However, it brings complicated concurrency control and atomicity problems among multiple distributed transactions.

By when updates are propagated relative to transaction commit, there are two types: eager and lazy. The former propagates updates before the transaction commits, and the latter propagates the transaction's operations to the other replicas only after the transaction commits. The former is what is usually called synchronous replication, and the latter asynchronous replication.

The advantage of asynchronous replication is faster response, but at the expense of consistency. Algorithms that implement this type of protocol generally require an additional compensation mechanism. The advantage of synchronous replication is that it can ensure consistency (generally through the two-phase commit protocol), but its overhead is high and its availability is poor (see the CAP section), and it brings more problems such as conflicts and deadlocks. It is worth mentioning that the lazy + primary copy replication protocol is very practical in real production environments; MySQL replication belongs to this type.
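The difference between eager and lazy propagation in primary copy mode can be sketched as follows. The `Primary` and `Replica` classes and the explicit `flush` step are hypothetical simplifications for illustration; a real system would ship log records over the network, and an eager protocol would typically use two-phase commit.

```python
class Replica:
    """A hypothetical read-only copy of the data."""

    def __init__(self):
        self.data = {}

    def apply(self, key, value):
        self.data[key] = value


class Primary:
    """A hypothetical primary node that accepts all updates."""

    def __init__(self, replicas, eager):
        self.data = {}
        self.replicas = replicas
        self.eager = eager
        self.pending = []  # updates committed locally but not yet propagated

    def write(self, key, value):
        if self.eager:
            # Eager: propagate to every replica before the commit is acknowledged.
            for r in self.replicas:
                r.apply(key, value)
            self.data[key] = value
        else:
            # Lazy: commit locally first and propagate later.
            self.data[key] = value
            self.pending.append((key, value))

    def flush(self):
        """Ship queued updates to the replicas (the asynchronous step)."""
        for key, value in self.pending:
            for r in self.replicas:
                r.apply(key, value)
        self.pending.clear()
```

In lazy mode the write returns as soon as the primary has committed, so a replica can serve stale data until `flush` runs; in eager mode the replicas are already up to date when the write returns, at the cost of waiting for every replica.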

CAP

At the 2000 PODC (Principles of Distributed Computing) conference, Brewer proposed the famous CAP conjecture. In 2002, Seth Gilbert and Nancy Lynch proved it. CAP stands for consistency, availability, and partition tolerance.

(1) Consistency: consistency here refers to the atomicity of data. In a classic database this is guaranteed by transactions: when a transaction completes, the data is consistent whether the transaction committed or rolled back. In a distributed environment, consistency means that the data on multiple nodes agrees.

(2) Availability: the service is always available. When a user sends a request, the service returns a result within a bounded period of time.

(3) Partition tolerance: a partition here means a network partition. Key data and services are generally located in different IDCs (Internet data centers), so the system must keep working when the network between them is cut.

The CAP theorem tells us that a distributed system cannot satisfy consistency, availability, and partition tolerance at the same time; at most two of the three can be met simultaneously. As the proverb goes, you cannot have both the fish and the bear's paw! For distributed data systems, partition tolerance is a basic requirement; without it, the system could hardly be called distributed. Architects therefore should not waste energy trying to design a perfect distributed system that satisfies all three, but should make trade-offs. This also means that designing a distributed system is a process of finding a balance between C (consistency) and A (availability) based on business characteristics, which requires the architect to truly understand the system requirements and grasp the business characteristics.
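The trade-off between C and A during a partition can be made concrete with a toy model. The `Cluster` class, its two replicas, and the "CP"/"AP" mode labels below are assumptions for illustration only; real systems make this choice in far more nuanced ways.

```python
class Replica:
    """A hypothetical node holding one copy of a single value."""

    def __init__(self):
        self.value = None


class Cluster:
    """Two replicas, a and b, with writes arriving at a."""

    def __init__(self, mode):
        self.mode = mode          # "CP" or "AP" (labels assumed for this sketch)
        self.a, self.b = Replica(), Replica()
        self.partitioned = False  # True when a cannot reach b

    def write(self, value):
        if self.partitioned:
            if self.mode == "CP":
                # Consistency first: reject the write, giving up availability.
                raise RuntimeError("write rejected: cannot reach replica b")
            # Availability first: accept the write, leaving b stale.
            self.a.value = value
            return
        self.a.value = self.b.value = value

    def read_from_b(self):
        return self.b.value
```

Under a partition the "CP" cluster refuses the write so both replicas still agree, while the "AP" cluster keeps accepting writes and lets the two replicas diverge until the partition heals.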

BASE

BASE comes from practice in the Internet e-commerce field and gradually evolved from the CAP theorem. Its core idea is that even if strong consistency cannot be achieved, eventual consistency can be reached through appropriate methods based on application characteristics. BASE is short for three phrases: basically available, soft state, and eventually consistent. It is an extension of C and A in CAP. The meaning of BASE:

(1) Basically available: the system stays basically available;

(2) Soft state: a soft state (flexible transactions), that is, the state may be out of sync for a period of time;

(3) Eventually consistent: the data becomes consistent in the end.

BASE is the opposite of ACID and is completely different from the ACID model. It sacrifices strong consistency to obtain basic availability and flexible reliability, and requires only eventual consistency. The CAP and BASE theories are the theoretical basis of NoSQL, which is currently very popular in the Internet field.
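One common way to reach eventual consistency is for replicas to exchange state periodically and keep the newest version of each key. The sketch below assumes a last-writer-wins rule with logical timestamps; the `merge` and `sync` functions and the sample data are hypothetical and are only one of several possible reconciliation strategies.

```python
def merge(local, remote):
    """Last-writer-wins merge of two {key: (timestamp, value)} stores."""
    merged = dict(local)
    for key, (ts, value) in remote.items():
        # Keep the remote entry only if the key is new or its timestamp is newer.
        if key not in merged or ts > merged[key][0]:
            merged[key] = (ts, value)
    return merged


def sync(a, b):
    """One reconciliation round: afterwards both replicas hold the same state."""
    merged = merge(a, b)
    a.clear()
    a.update(merged)
    b.clear()
    b.update(merged)
```

Between rounds the replicas may disagree (the soft state); after a round with no new writes they hold identical data (eventual consistency).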
