Distributed architecture details
I. Preface
In big data systems, distributed systems have become an unavoidable component; ZooKeeper, for example, has become an industry standard. To study big data, we must therefore also study the characteristics of distributed systems.
II. Centralized System
A centralized system has a central node made up of one or more computers. All data is stored on this central node, all business units of the system are deployed on it, and all functions of the system are processed there. Such a system is easy to deploy because there is no need to consider distributed collaboration among multiple nodes.
III. Distributed System
A distributed system is a system whose hardware or software components are located on different networked computers and communicate and coordinate with each other only by passing messages. It has the following features:
3.1 Distribution
The computers in a distributed system can be distributed arbitrarily in space, and the distribution of machines may change at any time.
3.2 Equality
In a distributed system, computers are not divided into masters and slaves: there is neither a host that controls the entire system nor a slave that is controlled. All computer nodes that make up the distributed system are peers.
A replica is a form of redundancy that a distributed system provides for data and services. To provide highly available services, we often replicate data and services. A data replica means that the same data is persisted on different nodes; when the data stored on one node is lost, it can be read from a replica, which is the most effective way to deal with data loss in distributed systems. A service replica means that multiple nodes provide the same service, and each node can accept external requests and process them.
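The idea of a data replica can be illustrated with a minimal in-memory sketch. The `Node` class and the `replicated_write` and `replicated_read` helpers are hypothetical names invented for this example; a real system would persist data to disk and replicate it over the network.

```python
class Node:
    """A hypothetical storage node holding key/value data in memory."""

    def __init__(self, name):
        self.name = name
        self.data = {}
        self.alive = True


def replicated_write(nodes, key, value):
    """Store the same value on every live node (one data replica per node)."""
    for node in nodes:
        if node.alive:
            node.data[key] = value


def replicated_read(nodes, key):
    """Return the value from the first live node that still holds the key."""
    for node in nodes:
        if node.alive and key in node.data:
            return node.data[key]
    raise KeyError(key)


nodes = [Node("n1"), Node("n2"), Node("n3")]
replicated_write(nodes, "user:1", "alice")

# Simulate data loss on the first node.
nodes[0].data.clear()

# The read is still served, this time from the replica on n2.
print(replicated_read(nodes, "user:1"))
```

The read succeeds as long as at least one live node still holds the key, which is the redundancy the paragraph above describes.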
3.3 Concurrency
Multiple nodes in the same distributed system may concurrently operate on shared resources, such as databases or distributed storage. How to coordinate these distributed concurrent operations efficiently has become one of the greatest challenges in distributed system architecture and design.
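To make the coordination problem concrete, the following sketch replays one fixed interleaving of two nodes updating a shared value. It first shows an update being lost, then shows one possible remedy, a version check before writing. The `VersionedStore` class and its methods are hypothetical and chosen only for illustration; the text above does not prescribe any particular technique.

```python
class VersionedStore:
    """A hypothetical shared resource holding one versioned value."""

    def __init__(self, value):
        self.value = value
        self.version = 0

    def read(self):
        return self.value, self.version

    def blind_write(self, value):
        """Overwrite without checking whether someone else wrote in between."""
        self.value = value
        self.version += 1

    def compare_and_set(self, value, expected_version):
        """Write only if the version is unchanged since the caller's read."""
        if self.version != expected_version:
            return False
        self.value = value
        self.version += 1
        return True


# Lost update: both nodes read 100, then each writes its own result.
store = VersionedStore(100)
a_value, _ = store.read()
b_value, _ = store.read()
store.blind_write(a_value + 10)
store.blind_write(b_value + 20)
lost_update_result = store.value  # node A's +10 has been overwritten

# Version check: the second writer detects the conflict and retries.
store = VersionedStore(100)
a_value, a_version = store.read()
b_value, b_version = store.read()
a_ok = store.compare_and_set(a_value + 10, a_version)
b_ok = store.compare_and_set(b_value + 20, b_version)
if not b_ok:
    b_value, b_version = store.read()
    b_ok = store.compare_and_set(b_value + 20, b_version)
checked_result = store.value  # both increments are applied

print(lost_update_result, checked_result)  # 120 130
```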
3.4 Lack of global clock
A typical distributed system consists of a set of processes distributed arbitrarily in space. These processes communicate with each other by exchanging messages. As a result, it is difficult to determine which of two events happened first, because the distributed system lacks a global clock to define their order.
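A small sketch shows why local timestamps cannot be trusted to order events. It assumes two hypothetical nodes whose clocks are skewed by fixed offsets; the `NodeClock` class and the offset values are invented for this example.

```python
class NodeClock:
    """A hypothetical node whose local clock is offset from true time."""

    def __init__(self, name, offset_ms):
        self.name = name
        self.offset_ms = offset_ms

    def stamp(self, true_time_ms):
        """Return the local timestamp the node would record for an event."""
        return true_time_ms + self.offset_ms


node_a = NodeClock("A", offset_ms=50)    # clock runs 50 ms ahead
node_b = NodeClock("B", offset_ms=-30)   # clock runs 30 ms behind

# In true time, the event on A happens 20 ms BEFORE the event on B.
event_a = node_a.stamp(true_time_ms=1000)
event_b = node_b.stamp(true_time_ms=1020)

# Comparing the local timestamps gives the opposite order.
order_by_local_stamps = "A first" if event_a < event_b else "B first"
print(event_a, event_b, order_by_local_stamps)  # 1050 990 B first
```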
3.5 Faults always happen
Any computer in a distributed system may fail in any way. Any exception that is considered during the design phase should be expected to occur during actual operation of the system.
IV. Distributed Environment Problems
4.1 Communication exceptions
Moving from a centralized to a distributed system inevitably introduces the network, and because the network itself is unreliable, it brings additional problems. Even when network communication between nodes works normally, its latency is much higher than that of operations within a single machine. Message loss and message delay are very common when sending and receiving messages.
4.2 Network partition
When a network exception occurs, the network latency between some nodes in the distributed system keeps increasing until, in the end, only some of the nodes can communicate normally while others cannot. This phenomenon is called a network partition. When a network partition occurs, small local clusters may form within the distributed system. In extreme cases, these local clusters independently perform functions that originally required the entire distributed system, including transaction processing on data, which poses a great challenge to distributed consistency.
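The formation of small local clusters can be sketched by treating nodes and working links as a graph and grouping the nodes that can still reach each other. The `partitions` function, the node names, and the link list are all hypothetical and exist only for this illustration.

```python
def partitions(nodes, links):
    """Group nodes into sets that can still reach each other over working links."""
    neighbors = {n: set() for n in nodes}
    for a, b in links:
        neighbors[a].add(b)
        neighbors[b].add(a)

    seen = set()
    groups = []
    for start in nodes:
        if start in seen:
            continue
        # Depth-first walk over everything reachable from this node.
        group, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node in group:
                continue
            group.add(node)
            stack.extend(neighbors[node] - group)
        seen |= group
        groups.append(group)
    return groups


nodes = ["n1", "n2", "n3", "n4", "n5"]
healthy = [("n1", "n2"), ("n2", "n3"), ("n3", "n4"), ("n4", "n5")]
# The link between n3 and n4 fails.
after_failure = [link for link in healthy if link != ("n3", "n4")]

print(len(partitions(nodes, healthy)))        # one cluster
print(len(partitions(nodes, after_failure)))  # two local clusters
```

After the failure, `n1` to `n3` and `n4` to `n5` each form a local cluster that cannot see the other, which is the situation in which independent transaction processing threatens consistency.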
4.3 Tri-state
Because of various network problems, every request and response in a distributed system has three possible states: success, failure, and timeout. When the network is abnormal, a timeout may occur, usually in one of the following two situations:
1. Due to network problems, the request is lost in transit and never reaches the receiver.
2. The receiver accepts and processes the request, but the response is lost on its way back to the sender.
When a timeout occurs, the sender cannot tell whether the request was processed successfully.
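The three states and the two timeout situations can be simulated in a few lines. The `Outcome` enum and the `call` function are hypothetical; message loss is passed in as flags instead of being produced by a real network.

```python
from enum import Enum


class Outcome(Enum):
    SUCCESS = "success"
    FAILURE = "failure"
    TIMEOUT = "timeout"


def call(request_lost, response_lost, server_ok=True):
    """Simulate one request.

    Returns (outcome seen by the sender, whether the request reached the receiver).
    """
    if request_lost:
        # Situation 1: the request never reaches the receiver.
        return Outcome.TIMEOUT, False
    if response_lost:
        # Situation 2: the receiver did its work, but the reply was lost.
        return Outcome.TIMEOUT, True
    return (Outcome.SUCCESS if server_ok else Outcome.FAILURE), True


for request_lost, response_lost in [(False, False), (True, False), (False, True)]:
    outcome, reached = call(request_lost, response_lost)
    print(outcome.value, "| reached receiver:", reached)
```

Both timeout cases look identical to the sender, even though the request was processed in only one of them.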
4.4 Node faults
A node fault means that a server node in the distributed system crashes or freezes. Any node may fail, and such failures do occur in practice.
V. From ACID to CAP/BASE
5.1 ACID
A transaction is a program execution unit consisting of a series of operations that access and update data in the system. In a narrow sense, a transaction refers to a database transaction. On the one hand, when multiple applications access the database concurrently, transactions provide isolation between these applications to prevent their operations from interfering with each other. On the other hand, transactions provide a way for a sequence of database operations to recover from failure to a normal state, and a way for the database to maintain data consistency even under abnormal conditions. Transactions have four properties: Atomicity, Consistency, Isolation, and Durability (ACID).
① Atomicity means that a transaction must be an atomic unit of operations. During one execution, the operations contained in a transaction can end in only one of two states: either all of them are executed successfully, or none of them is executed. The failure of any operation causes the entire transaction to fail, and the operations already executed are undone and rolled back. The transaction completes successfully only when all of its operations succeed.
② Consistency means that the execution of a transaction cannot violate the integrity and consistency of the data in the database. The database must be in a consistent state both before and after a transaction is executed; that is, a transaction moves the database from one consistent state to another. Therefore, when the database contains only the results of successfully committed transactions, it is said to be in a consistent state. If the database system fails during operation and some transactions are interrupted before they complete, and some of the modifications made by these unfinished transactions have already been written to the physical database, then the database is in an incorrect, or inconsistent, state.
③ Isolation means that in a concurrent environment, concurrent transactions are isolated from each other, and the execution of one transaction cannot be disturbed by other transactions. When different transactions operate on the same data concurrently, each transaction has its own complete data space: the operations and data used within a transaction are isolated from other concurrent transactions, and transactions that execute concurrently cannot interfere with each other.
④ Durability means that once a transaction is committed, its changes to the corresponding data in the database are permanent; that is, once a transaction completes successfully, the updates it made to the database must be saved permanently. Even if the system crashes or goes down, as long as the database can be restarted, it can be restored to the state it was in when the transaction completed successfully.
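Atomicity and consistency can be observed with Python's built-in `sqlite3` module. This is a minimal sketch: the `accounts` table, the `transfer` and `balances` helpers, and the rule that a balance may not go negative are assumptions made for this example, and an in-memory database is used, so durability is not demonstrated.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (name TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move money in one transaction; roll back both updates on any error."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def balances(conn):
    """Return the current balances as a dict of name to amount."""
    return dict(conn.execute("SELECT name, balance FROM accounts"))


# The debit would make alice's balance negative, so the whole transfer fails
# and the credit to bob that was already executed is rolled back as well.
failed = transfer(conn, "alice", "bob", 500)
after_failed = balances(conn)

# A valid transfer applies both updates together.
succeeded = transfer(conn, "alice", "bob", 30)
after_success = balances(conn)

print(failed, after_failed)
print(succeeded, after_success)
```

The credit is deliberately executed before the debit so that the rollback has something to undo; after the failed transfer, both balances are unchanged.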
5.2 Distributed transactions
A distributed transaction is one in which the transaction participants, the servers supporting transactions, the resource servers, and the transaction managers are located on different nodes of the distributed system. Generally, a distributed transaction involves operations on multiple data sources or business systems. A distributed transaction can be viewed as a composition of multiple distributed operation sequences, each of which is usually called a sub-transaction. Because the execution of each sub-transaction is distributed, it is extremely complicated to implement a distributed transaction processing system that guarantees the ACID properties.
5.3 CAP
The CAP theorem tells us that a distributed system cannot satisfy the three basic requirements of consistency, availability, and partition tolerance at the same time; it can satisfy at most two of them.
① Consistency refers to whether data stays consistent across multiple replicas. When a system in a consistent state performs an update operation, the system's data should still be consistent afterwards. Consider a system that keeps replicas of data on different nodes: if the data on the first node is updated successfully but the data on the second node is not updated accordingly, a read on the second node still returns the old data (dirty data). This is a typical case of distributed data inconsistency. If, after an update to a data item succeeds, all users can read the latest value, the system is considered strongly consistent.
② Availability means that the services provided by the system must always be available, and every operation request from a user must return a result within a limited period of time.
③ Partition tolerance means that when any network partition failure occurs, the distributed system must still provide services that satisfy consistency and availability, unless the entire network environment has failed.
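The trade-off during a partition can be sketched with a toy store that has two replicas. The `TwoReplicaStore` class and its "CP" and "AP" mode labels are hypothetical and invented for this example; the sketch only shows that, once the replicas cannot talk to each other, a write must either be refused or be allowed to leave one replica stale.

```python
class Replica:
    def __init__(self, value):
        self.value = value


class TwoReplicaStore:
    """A hypothetical store with two replicas and a link that can be cut."""

    def __init__(self, mode):
        self.mode = mode  # "CP" or "AP" (labels chosen for this sketch)
        self.r1 = Replica("v1")
        self.r2 = Replica("v1")
        self.partitioned = False

    def write(self, value):
        """Accept a write arriving at replica r1."""
        if self.partitioned:
            if self.mode == "CP":
                # Refuse the write: stay consistent, give up availability.
                raise RuntimeError("unavailable during partition")
            # AP: accept locally; r2 cannot be updated and becomes stale.
            self.r1.value = value
            return
        self.r1.value = value
        self.r2.value = value

    def read_from_r2(self):
        return self.r2.value


cp = TwoReplicaStore("CP")
cp.partitioned = True
try:
    cp.write("v2")
    cp_available = True
except RuntimeError:
    cp_available = False

ap = TwoReplicaStore("AP")
ap.partitioned = True
ap.write("v2")

print(cp_available, cp.r1.value, cp.read_from_r2())  # False v1 v1
print(ap.r1.value, ap.read_from_r2())                # v2 v1
```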
5.4 BASE
BASE is short for three phrases: Basically Available, Soft state, and Eventually consistent.
① Basic availability means that when unpredictable faults occur, the distributed system is allowed to lose part of its availability, for example through longer response times or reduced functionality.
② Soft state, also called weak state, means that data in the system is allowed to exist in an intermediate state, and that this intermediate state is assumed not to affect the overall availability of the system; that is, synchronization between data replicas on different nodes is allowed to lag.
③ Eventual consistency means that all data replicas in the system reach a consistent state after a period of synchronization. The essence of eventual consistency is therefore that the system must guarantee that the data eventually becomes consistent, without guaranteeing strong consistency in real time.
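Soft state and eventual consistency can be sketched with a primary that replicates to its followers asynchronously. The `EventuallyConsistentStore` class and its `write`, `sync`, and `is_consistent` methods are hypothetical; the explicit `sync()` call stands in for the background replication that a real system would run on its own.

```python
from collections import deque


class Replica:
    def __init__(self):
        self.data = {}


class EventuallyConsistentStore:
    """A hypothetical primary with asynchronous replication to followers."""

    def __init__(self, follower_count):
        self.primary = Replica()
        self.followers = [Replica() for _ in range(follower_count)]
        self.pending = deque()  # updates not yet applied to the followers

    def write(self, key, value):
        """Apply to the primary now; followers catch up later."""
        self.primary.data[key] = value
        self.pending.append((key, value))

    def sync(self):
        """Deliver all queued updates to every follower, in order."""
        while self.pending:
            key, value = self.pending.popleft()
            for follower in self.followers:
                follower.data[key] = value

    def is_consistent(self):
        return all(f.data == self.primary.data for f in self.followers)


store = EventuallyConsistentStore(follower_count=2)
store.write("x", 1)
store.write("x", 2)

consistent_before = store.is_consistent()  # intermediate (soft) state
store.sync()
consistent_after = store.is_consistent()   # replicas have converged

print(consistent_before, consistent_after)  # False True
```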