Zookeeper_01: ZooKeeper overview
Compared with developing a single program that runs on one computer, making multiple independent programs in an application work together is very difficult. Developers of such an application can easily get bogged down in the logic of making the programs cooperate, and end up with no time to think through and implement their own application logic. Alternatively, they pay too little attention to the coordination logic and spend only a small amount of time building a simple, fragile master coordinator, which becomes an unreliable single point of failure.
ZooKeeper's role is to coordinate multiple tasks in a distributed system. A coordination task is a task that involves multiple processes. Such a task can serve cooperation or regulate contention. Cooperation means that multiple processes need to handle something together, and some processes take actions that allow the others to keep working. For example, in a typical master-slave architecture, an idle slave node notifies the master node that it is ready to accept work, and the master node then assigns a task to that slave node. Contention is different: it means that two processes cannot do the work at the same time, so one process must wait for the other.
ZooKeeper provides an ordered shared storage component.
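The difference between cooperation and contention described above can be sketched with plain Python threads. This is only an illustration on a single machine, not ZooKeeper code: the function names `run_cooperation` and `run_contention`, the queues, and the lock are all assumptions made for the example.

```python
import queue
import threading


def run_cooperation(tasks):
    """Cooperation: idle slaves notify the master, which hands each one a task."""
    idle_slaves = queue.Queue()   # slave -> master: "I am idle"
    completed = []                # (slave name, task) pairs

    def slave(name):
        inbox = queue.Queue()
        idle_slaves.put((name, inbox))   # tell the master we can accept work
        task = inbox.get()               # block until the master assigns a task
        completed.append((name, task))   # list.append is atomic in CPython

    def master():
        for task in tasks:
            _name, inbox = idle_slaves.get()   # wait for an idle slave
            inbox.put(task)                    # assign the task to that slave

    # One slave per task, so every slave receives exactly one assignment.
    threads = [threading.Thread(target=slave, args=(f"slave-{i}",))
               for i in range(len(tasks))]
    threads.append(threading.Thread(target=master))
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return completed


def run_contention(n_threads, increments):
    """Contention: only one thread may update the counter at a time."""
    lock = threading.Lock()
    total = 0

    def work():
        nonlocal total
        for _ in range(increments):
            with lock:        # the other threads must wait here
                total += 1

    threads = [threading.Thread(target=work) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total


if __name__ == "__main__":
    print(run_cooperation(["task-a", "task-b"]))
    print(run_contention(4, 1000))
```

In the cooperation case each thread's action lets another thread continue; in the contention case the lock forces the threads to take turns.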
- Example: master-slave application
In this architecture, the master node process is responsible for tracking the status of the slave nodes and the tasks that are available, and for assigning tasks to slave nodes.
Such a system has to cope with three kinds of failure:
1. Master node crash
If the master node fails, the system cannot assign new tasks or reassign failed tasks.
When the master node crashes, a backup master node must take over the master role (failover). The new master node must be able to recover the state the system was in when the old master node crashed. This recoverability of the master node state can be obtained through ZooKeeper.
State recovery is not the only important issue. Suppose the master node is still working, but the backup master node believes it has crashed. This false suspicion can arise, for example, when the load on the master node is so high that its messages are delayed. The backup master node then takes over, runs the procedures needed to assume the master role, and may end up acting as a second master node.
This situation is called split-brain: two or more parts of the system start to work independently, resulting in inconsistent overall behavior.
2. Slave node crash
If a slave node crashes, the tasks assigned to it will not be completed.
3. Communication faults
If the master node and a slave node cannot exchange information, the slave node cannot learn of new tasks assigned to it.
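The split-brain case described under the master node crash can be reproduced with a small simulation. It is a sketch under stated assumptions: time is a simple integer counter, the backup suspects the master after a fixed heartbeat timeout, and the names `Node`, `backup_suspects`, and `simulate` are invented for this example. It does not show how ZooKeeper prevents the problem, only how the problem arises.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    is_master: bool = False


def backup_suspects(last_heartbeat, now, timeout):
    """Timeout-based failure detector: suspect the master if it has been silent too long."""
    return now - last_heartbeat > timeout


def simulate(heartbeat_arrivals, timeout, end_time):
    """Return the names of all nodes acting as master at end_time.

    heartbeat_arrivals lists the times at which the backup actually
    receives a heartbeat. The primary never crashes in this simulation;
    its messages may only arrive late.
    """
    primary = Node("primary", is_master=True)
    backup = Node("backup")
    arrivals = set(heartbeat_arrivals)
    last_seen = 0
    for now in range(1, end_time + 1):
        if now in arrivals:
            last_seen = now
        if not backup.is_master and backup_suspects(last_seen, now, timeout):
            # The backup takes over although the primary is still alive.
            backup.is_master = True
    return [n.name for n in (primary, backup) if n.is_master]


if __name__ == "__main__":
    # Heartbeats arrive on every tick: only one master.
    print(simulate(range(1, 11), timeout=3, end_time=10))
    # Heartbeats stall between t=3 and t=9 (overloaded master): two masters.
    print(simulate([1, 2, 3, 9], timeout=3, end_time=10))
```

In the second run the backup cannot tell a slow master from a crashed one, so both nodes end up in the master role.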
These failures lead to the following requirements for a master-slave architecture:
1. Master node selection
A master node must be chosen so that tasks can be assigned to slave nodes.
2. Crash detection
The master node must be able to detect when a slave node crashes or loses its connection.
3. Maintenance of group members
The master node must be able to determine which slave nodes are available to execute tasks.
4. Metadata management
The master node and slave nodes must be able to store task assignments and execution status in a reliable way.
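To make the requirements listed above concrete, here is a toy in-memory stand-in for a coordination service. It is not the ZooKeeper API: the class `ToyCoordinator` and all of its methods are hypothetical, and a real service would also have to replicate this state across machines and detect lost sessions by itself.

```python
class ToyCoordinator:
    """Hypothetical in-memory stand-in for a coordination service."""

    def __init__(self):
        self.master = None       # name of the current master, if any
        self.members = set()     # slave nodes currently believed alive
        self.assignments = {}    # task -> slave node (metadata)
        self.status = {}         # task -> "assigned" or "done" (metadata)

    # Master node selection: the first candidate to claim the role wins.
    def try_become_master(self, name):
        if self.master is None:
            self.master = name
        return self.master == name

    # Maintenance of group members: slave nodes register themselves.
    def join(self, slave):
        self.members.add(slave)

    # Crash detection: called when a node's session is considered lost.
    # Returns the unfinished tasks that now need to be reassigned.
    def session_lost(self, name):
        if name == self.master:
            self.master = None   # the master role is free again
        self.members.discard(name)
        return sorted(task for task, slave in self.assignments.items()
                      if slave == name and self.status[task] != "done")

    # Metadata management: record assignments and execution status.
    def assign(self, task, slave):
        if slave not in self.members:
            raise ValueError(f"{slave} is not an available slave node")
        self.assignments[task] = slave
        self.status[task] = "assigned"

    def complete(self, task):
        self.status[task] = "done"


if __name__ == "__main__":
    coord = ToyCoordinator()
    print(coord.try_become_master("m1"), coord.try_become_master("m2"))
    coord.join("s1")
    coord.assign("t1", "s1")
    print(coord.session_lost("s1"))   # tasks left behind by the crashed slave
```

Because only one caller can ever see `try_become_master` succeed while a master is registered, this sketch also avoids the second-master problem, but only because all state lives in one process.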