I. Distributed Coordination Technology
Before introducing ZooKeeper, let me introduce a technology: distributed coordination. So what is distributed coordination technology? In short, it is mainly used to solve synchronization control among multiple processes in a distributed environment, so that they access some critical resource in an orderly way and we avoid the consequences of "dirty data". At this point someone may say: that's simple, just write a scheduling algorithm. People who say this may not know much about distributed systems, which is why the misunderstanding arises. If these processes all run on a single machine, the problem is relatively easy; the trouble is that they run in a distributed environment. Then the next question is: what does "distributed" mean? I can't explain it clearly in one or two sentences, but I have drawn a picture to help you understand it; if you think something is wrong, feel free to point it out. Let's look at the picture, shown in Figure 1.1.
Figure 1.1 Distributed system diagram
Let's analyze this picture. There are three machines, each running an application. We connect the three machines through a network to form a system that serves users. The architecture of the system is transparent to users: they cannot perceive how the system is organized internally. We can call such a system a distributed system.
So let's take a look at how processes are scheduled in this distributed system. Suppose a resource is mounted on the first machine, and the three physically distributed processes all compete for it. We don't want them to access it at the same time, so we need a coordinator to let them access the resource in an orderly way. This coordinator is the lock we often talk about. For example, when "Process 1" wants to use the resource, it first obtains the lock; once it holds the lock it has exclusive access, and the other processes cannot touch the resource; when "Process 1" is done with the resource, it releases the lock so that another process can acquire it. Through this lock mechanism, we can ensure that multiple processes in the distributed system access the critical resource in an orderly way. A lock used in a distributed environment like this is called a distributed lock. This distributed lock is the core of our distributed coordination technology, and how to implement it is exactly what we will talk about next.
II. The Implementation of Distributed Locks
We now know that in order to prevent multiple processes in a distributed system from interfering with each other, we need a distributed coordination technique to schedule these processes, and the core of this coordination technique is the distributed lock. So how is this lock actually implemented? It turns out to be fairly difficult.
2.1 The problems we face
After looking at the distributed environment shown in Figure 1.1, some people may feel this is not difficult: just take the primitives used to schedule processes on a single machine and implement them over the network in the distributed environment. On the surface, yes. But the problem lies in the network: in a distributed system, none of the assumptions that hold on a single machine still hold, because the network is unreliable.
For example, on the same machine, if a call to a service succeeds, it succeeded; if it fails, for instance by throwing an exception, it failed. In a distributed environment, however, because the network is unreliable, a call that appears to fail may not actually have failed: it may have succeeded, but the response was lost on the way back. Likewise, if A and B both call service C, with A calling a little earlier and B a little later, we cannot conclude that A's request reaches C before B's. All the assumptions we take for granted on a single machine have to be rethought, and we have to consider how these issues affect our design and coding. Moreover, in a distributed environment we often deploy multiple copies of a service to improve reliability, but keeping multiple copies consistent, which is relatively easy for processes on the same machine, is a big challenge in a distributed environment.
So distributed coordination is much harder than scheduling multiple processes on the same machine. And if we developed a separate coordinator for each distributed application, then on the one hand, writing the coordination logic over and over is wasteful and it is hard to build a coordinator that is general and scales well; on the other hand, the coordinator itself has considerable overhead and will hurt the performance of the original system. Therefore, a highly reliable, highly available, general-purpose coordination mechanism is urgently needed to coordinate distributed applications.
2.2 Implementations of distributed locks
At present, the best-known implementations of distributed coordination are Google's Chubby and Apache ZooKeeper, both of which implement distributed locks. Some people will ask: since there is already Chubby, why build ZooKeeper? Isn't Chubby good enough? Not quite; the main issue is that Chubby is not open source and is used internally by Google. Later, Yahoo developed ZooKeeper along the lines of Chubby, implemented a similar distributed lock service, and donated ZooKeeper to Apache as an open source project, so anyone can use the lock service ZooKeeper provides. Moreover, its reliability and availability in the distributed field have been proven by both theory and practice. So when we build distributed systems, we can use such a system as a starting point, which saves a lot of cost and results in fewer bugs.
III. Overview of ZooKeeper
ZooKeeper is a highly available, high-performance, consistent open source coordination service designed for distributed applications. It provides one basic service: the distributed lock service. Because ZooKeeper is open source, developers later explored other uses on top of it: configuration maintenance, group services, distributed message queues, distributed notification/coordination, and so on. Note: ZooKeeper's performance characteristics allow it to be used in large distributed systems. In terms of reliability, it does not crash because a single node fails. In addition, its strict sequential access control means that complex control primitives can be built on the client side. ZooKeeper's guarantees around consistency, availability, and fault tolerance are also the key to its success, and all of this is inseparable from the protocol it uses, the ZAB protocol, which will be described later.
There are so many services, such as distributed locks, configuration maintenance, and group services; how are they implemented? I believe this is what people care about. To implement these services, ZooKeeper first designs a new data structure, the znode, and then defines some primitives on top of that data structure, that is, some operations on it. But data structures and primitives alone are not enough, because ZooKeeper works in a distributed environment and its services are delivered to distributed applications as messages over the network, so it also needs a notification mechanism: the watcher mechanism. To summarize, the services ZooKeeper provides are implemented through three parts: the data structure, the primitives, and the watcher mechanism. I will introduce ZooKeeper from these three aspects.
IV. The ZooKeeper Data Model
4.1 The ZooKeeper data model: znode
ZooKeeper has a hierarchical namespace, which is very similar to a standard file system, as shown in Figure 4.1 below.
Figure 4.1 Zookeeper data Model and file system directory tree
We can see from the figure that ZooKeeper's data model is structurally similar to a standard file system: it is a tree hierarchy, and each node in the ZooKeeper tree is called a znode. Like a directory in a file system, each node in the ZooKeeper tree can have child nodes. But there are also differences:
(1) Reference method
A znode is referenced by a path, like a file path in Unix. Paths must be absolute, so they must start with a slash. They must also be canonical: each path has only one representation, so paths cannot be written in any other form. In ZooKeeper, a path is made up of Unicode strings, with a few restrictions. The string "/zookeeper" is reserved for management information, such as quota information.
(2) Znode structure
A znode in the ZooKeeper namespace combines the characteristics of both a file and a directory: like a file, it maintains data, meta-information, an ACL, timestamps, and other data structures; like a directory, it can serve as part of a path identifier. Each node in the diagram is called a znode, and each znode consists of three parts:
① Stat: status information describing the znode's version, permissions, and other metadata
② Data: the data associated with the znode
③ Children: the child nodes under this znode
Although ZooKeeper can store data in znodes, it is not designed as a regular database or a big-data store. Instead, it is used to manage coordination data, such as configuration information, status information, rendezvous locations, and so on in a distributed application. The common characteristic of such data is that it is all very small, usually measured in kilobytes. Both the ZooKeeper server and the client strictly check and limit the data size of each znode: it can be at most 1 MB, and in normal usage it should be much smaller than that.
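As an illustration (not from the original article), here is a minimal Java sketch that connects to a ZooKeeper ensemble and reads a znode's data together with its Stat structure; the connect string "127.0.0.1:2181" and the path "/app/config" are made-up example values.

    import java.nio.charset.StandardCharsets;
    import java.util.concurrent.CountDownLatch;
    import org.apache.zookeeper.Watcher;
    import org.apache.zookeeper.ZooKeeper;
    import org.apache.zookeeper.data.Stat;

    public class ReadZnode {
        public static void main(String[] args) throws Exception {
            CountDownLatch connected = new CountDownLatch(1);
            // Hypothetical ensemble address; replace with your own servers.
            ZooKeeper zk = new ZooKeeper("127.0.0.1:2181", 15000, event -> {
                if (event.getState() == Watcher.Event.KeeperState.SyncConnected) {
                    connected.countDown();   // session established
                }
            });
            connected.await();

            Stat stat = new Stat();
            // Reads are atomic: getData returns all of the znode's data
            // and fills in the Stat (version, timestamps, data length, ...).
            byte[] data = zk.getData("/app/config", false, stat);
            System.out.println("version=" + stat.getVersion()
                    + " size=" + stat.getDataLength() + " bytes");
            System.out.println(new String(data, StandardCharsets.UTF_8));
            zk.close();
        }
    }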
(3) data access
The data stored in each znode is read and written atomically: a read returns all of the data associated with the node, and a write replaces all of the node's data. In addition, each node has its own ACL (access control list), which defines the permissions of users, that is, the operations a particular user may perform on that node.
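For example (a hedged sketch, not from the original text; the path "/app/readonly" is invented), the Java client lets you pass an ACL list when creating a znode and tighten it later with setACL:

    import java.util.List;
    import org.apache.zookeeper.CreateMode;
    import org.apache.zookeeper.ZooDefs;
    import org.apache.zookeeper.ZooKeeper;
    import org.apache.zookeeper.data.ACL;
    import org.apache.zookeeper.data.Stat;

    public class AclExample {
        // Assumes zk is an already-connected ZooKeeper handle.
        static void demo(ZooKeeper zk) throws Exception {
            // Completely open ACL: anyone may read and write this node.
            zk.create("/app/readonly", "v1".getBytes(),
                    ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);

            // Read back the current ACL, then restrict the node to read-only for everyone.
            Stat stat = new Stat();
            List<ACL> current = zk.getACL("/app/readonly", stat);
            System.out.println("current ACL: " + current);
            zk.setACL("/app/readonly", ZooDefs.Ids.READ_ACL_UNSAFE, stat.getAversion());
        }
    }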
(4) node type
There are two types of nodes in ZooKeeper: ephemeral (temporary) nodes and persistent (permanent) nodes. A node's type is determined when it is created and cannot be changed afterwards.
① Ephemeral node: the node's life cycle depends on the session that created it. Once the session ends, the ephemeral node is deleted automatically; it can of course also be deleted manually. Although each ephemeral znode is bound to one client session, it is still visible to all clients. In addition, an ephemeral znode is not allowed to have child nodes.
② Persistent node: the node's life cycle does not depend on the session; it is deleted only when a client explicitly deletes it. A small sketch of creating both kinds of node follows below.
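As a hedged illustration (the paths are invented), creating the two kinds of node with the Java client differs only in the CreateMode passed to create(); the ephemeral node below disappears automatically when the session that created it closes.

    import org.apache.zookeeper.CreateMode;
    import org.apache.zookeeper.ZooDefs;
    import org.apache.zookeeper.ZooKeeper;

    public class NodeTypes {
        // Assumes zk is an already-connected ZooKeeper handle.
        static void demo(ZooKeeper zk) throws Exception {
            // Persistent node: stays until someone explicitly deletes it.
            zk.create("/app", "app root".getBytes(),
                    ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);

            // Ephemeral node: removed automatically when this session ends.
            // Ephemeral nodes may not have children.
            zk.create("/app/worker-1", "alive".getBytes(),
                    ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL);
        }
    }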
(5) sequential node
When creating a znode, the caller can request that a monotonically increasing counter be appended to the end of the path. The counter is unique with respect to the parent node and has the format "%010d", that is, 10 decimal digits padded with leading zeros, for example "0000000001". The counter is a signed 32-bit integer, so it overflows once it is incremented beyond 2147483647 (2^31 - 1).
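A brief sketch (paths invented, and it assumes the parent "/queue" already exists): with a sequential node, ZooKeeper appends the counter to the name you pass in, and create() returns the actual path that was created.

    import org.apache.zookeeper.CreateMode;
    import org.apache.zookeeper.ZooDefs;
    import org.apache.zookeeper.ZooKeeper;

    public class SequentialNode {
        // Assumes zk is an already-connected ZooKeeper handle.
        static void demo(ZooKeeper zk) throws Exception {
            // The server appends a 10-digit, zero-padded counter to "task-".
            String actualPath = zk.create("/queue/task-", "job".getBytes(),
                    ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT_SEQUENTIAL);
            // e.g. "/queue/task-0000000007"
            System.out.println("created " + actualPath);
        }
    }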
(6) Observation
A client can set a watch on a node, which we call a monitor. When the node's state changes (the znode is created, deleted, or its data changes), the watch is triggered. When a watch fires, ZooKeeper sends the client one and only one notification, because a watch can only be triggered once; this reduces network traffic.
4.2 Time in ZooKeeper
ZooKeeper records time in several forms, involving the following main attributes:
(1) Zxid
Every operation that changes the state of a ZooKeeper node stamps the node with a timestamp in zxid (ZooKeeper transaction id) format, and these timestamps are globally ordered. In other words, every change to a node produces a unique zxid. If zxid1 is smaller than zxid2, then the event corresponding to zxid1 happened before the event corresponding to zxid2. In fact, ZooKeeper maintains three zxid values for each node: czxid, mzxid, and pzxid.
① czxid: the zxid-format timestamp corresponding to the creation of the node.
② mzxid: the zxid-format timestamp corresponding to the last modification of the node.
③ pzxid: the zxid-format timestamp corresponding to the last change to the node's children.
A zxid is implemented as a 64-bit number. Its high 32 bits are the epoch, used to identify whether the leader has changed; every time a new leader is elected, a new epoch is generated. The low 32 bits are an incrementing counter.
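To make this layout concrete, here is a tiny hedged Java sketch that splits a zxid into its epoch and counter halves (the sample value is invented):

    public class ZxidParts {
        public static void main(String[] args) {
            long zxid = 0x0000000500000007L;        // made-up example value
            long epoch = zxid >>> 32;               // high 32 bits: leader epoch
            long counter = zxid & 0xffffffffL;      // low 32 bits: per-epoch counter
            System.out.println("epoch=" + epoch + " counter=" + counter); // epoch=5 counter=7
        }
    }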
(2) Version number
Every operation on a node causes the corresponding version number of the node to increase. Each node maintains three version numbers:
①version: node data version number
②cversion: child node version number
③ aversion: the version number of the node's ACL
4.3 ZooKeeper node properties
From the previous introduction, we can see that a node itself has many important properties that represent its state, as shown in the following figure.
Figure 4.2 Znode Node Property structure
V. Operations in the ZooKeeper Service
There are 9 basic operations in zookeeper, as shown in the following figure:
Figure 5.1 Zookeeper class method description
There is a restriction on ZooKeeper's update operations: delete and setData must specify the expected version number of the znode being updated (which can be obtained with an exists call). If the version number does not match, the update fails.
ZooKeeper's update operations are also non-blocking: if a client loses an update race (because another process updated the znode at the same time), it can choose to retry or do something else, without blocking the execution of other processes.
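Below is a hedged Java sketch of this conditional-update pattern (the path passed in is up to the caller): read the current version with exists, attempt setData with that version, and retry on BadVersionException.

    import org.apache.zookeeper.KeeperException;
    import org.apache.zookeeper.ZooKeeper;
    import org.apache.zookeeper.data.Stat;

    public class ConditionalUpdate {
        // Assumes zk is an already-connected ZooKeeper handle.
        static void setIfUnchanged(ZooKeeper zk, String path, byte[] newData)
                throws KeeperException, InterruptedException {
            while (true) {
                Stat stat = zk.exists(path, false);     // read the current version
                if (stat == null) {
                    throw new KeeperException.NoNodeException(path);
                }
                try {
                    // Only succeeds if nobody changed the znode in the meantime.
                    zk.setData(path, newData, stat.getVersion());
                    return;
                } catch (KeeperException.BadVersionException e) {
                    // Someone else won the race; re-read the version and retry.
                }
            }
        }
    }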
Although ZooKeeper can be viewed as a file system, it deliberately drops some file system primitives for convenience. Because the files are very small and are read or written in their entirety, there is no need for open, close, or seek operations.
VI. Watch Triggers
(1) Watch Overview
ZooKeeper can set a watch on all read operations, including exists(), getChildren(), and getData(). A watch event is a one-time trigger: when the state of the watched object changes, the watch event for that object fires. Watch events are sent to clients asynchronously, and ZooKeeper provides an ordering guarantee for the watch mechanism: a client receives the watch event before it can see the new state of the watched object.
(2) Watch type
The watch that zookeeper manages can be divided into two categories:
① Data watches: getData and exists set data watches
② Child watches: getChildren sets child watches
We can set the different watches through these operations, which return different data:
① getData and exists: return the node's data information
② getChildren: returns the list of child nodes
Therefore:
① a successful setData operation triggers the znode's data watch
② a successful create operation triggers the data watch of the newly created znode and the child watch of its parent
③ a successful delete operation triggers both the data watch and the child watch of the deleted znode, as well as the child watch of its parent
(3) Watch registration and Trigger
The watch-setting operations and their corresponding triggers (Figure 6.1) are as follows:
① The watch set by an exists operation is triggered when the watched znode is created, deleted, or has its data updated.
② The watch set by a getData operation is triggered when the watched znode is deleted or has its data updated. It cannot be triggered by creation, because getData only succeeds if the znode already exists.
③ The watch set by a getChildren operation is triggered when a child of the watched znode is created or deleted, or when the watched znode itself is deleted. You can tell which case occurred from the watch event type: NodeDeleted means the znode itself was deleted, NodeChildrenChanged means a child node was created or deleted.
Watches are maintained locally by the ZooKeeper server the client is connected to, so they are lightweight to set, manage, and dispatch. When a client connects to a new server, its watches are triggered for any session events; watch notifications are not received while the client is disconnected from a server. When the client re-establishes a connection, any previously registered watches are re-registered and triggered if needed.
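As a hedged sketch of registering and re-registering a watch (the path is supplied by the caller): the Watcher below watches a znode's data with getData and, because watches are one-time triggers, registers itself again each time it fires.

    import java.nio.charset.StandardCharsets;
    import org.apache.zookeeper.WatchedEvent;
    import org.apache.zookeeper.Watcher;
    import org.apache.zookeeper.ZooKeeper;
    import org.apache.zookeeper.data.Stat;

    public class DataWatcher implements Watcher {
        private final ZooKeeper zk;
        private final String path;

        DataWatcher(ZooKeeper zk, String path) { this.zk = zk; this.path = path; }

        // Register (or re-register) the watch and read the current data.
        void watch() throws Exception {
            Stat stat = new Stat();
            byte[] data = zk.getData(path, this, stat);   // "this" is the Watcher
            System.out.println(path + " @version " + stat.getVersion()
                    + " = " + new String(data, StandardCharsets.UTF_8));
        }

        @Override
        public void process(WatchedEvent event) {
            if (event.getType() == Watcher.Event.EventType.NodeDataChanged) {
                try {
                    watch();   // watches are one-time triggers, so register again
                } catch (Exception e) {
                    e.printStackTrace();
                }
            }
        }
    }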
(4) points to note
Zookeeper Watch actually handles two types of events:
① Connection Status Events (Type=none, Path=null)
This type of event does not need to be registered and does not need to be re-registered to keep firing; we only have to handle it.
② Node Events
Node creation, deletion, and data modification. These are one-time triggers, so we need to keep re-registering the watch, and events may be missed between triggers.
Both kinds of events are handled in the watcher, that is, in the overridden process(WatchedEvent event) method.
Node events are triggered through the functions exists, getData, and getChildren, which have a dual effect:
① registering the watch that triggers the event
② performing the function's own job (returning the requested data)
The function's own job can also be performed with an asynchronous callback, in which case the result is delivered in the overridden processResult() method of the callback.
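A hedged sketch of the asynchronous form, using the Java client's AsyncCallback.DataCallback (the path is supplied by the caller): the watch is still registered by the call, while the data arrives later in processResult().

    import org.apache.zookeeper.AsyncCallback;
    import org.apache.zookeeper.KeeperException;
    import org.apache.zookeeper.ZooKeeper;
    import org.apache.zookeeper.data.Stat;

    public class AsyncRead {
        // Assumes zk is an already-connected ZooKeeper handle.
        static void readAsync(ZooKeeper zk, String path) {
            zk.getData(path, true, new AsyncCallback.DataCallback() {
                @Override
                public void processResult(int rc, String p, Object ctx,
                                          byte[] data, Stat stat) {
                    if (KeeperException.Code.get(rc) == KeeperException.Code.OK) {
                        System.out.println(p + " has " + data.length + " bytes,"
                                + " version " + stat.getVersion());
                    } else {
                        System.out.println("read of " + p + " failed: rc=" + rc);
                    }
                }
            }, null);   // ctx object, unused here
            // getData returns immediately; processResult runs on the event thread.
        }
    }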
VII. A ZooKeeper Application Example
To help everyone understand ZooKeeper, here is an example showing how ZooKeeper implements its services, taking the basic service ZooKeeper provides, the distributed lock, as the example.
7.1 Distributed lock application scenario
One of the most typical application scenarios for the distributed lock service is solving single points of failure in distributed systems through master election in a cluster. What is a single point of failure in a distributed system? A distributed system usually adopts a master-slave design: one master machine is connected to several processing nodes; the master distributes tasks, and the slaves process them. If the master fails, the whole system is paralyzed; we call this kind of failure a single point of failure. See Figures 7.1 and 7.2 below:
Figure 7.1 Master-slave mode Distributed System Figure 7.2 Single point of failure
7.2 Legacy Solutions
The traditional way is to use a standby node. The standby node periodically sends a ping packet to the current master; the master replies with an ack when it receives the ping; as long as the standby node receives the ack, it considers the current master alive and lets it continue to serve. As shown in Figure 7.3:
Figure 7.3 Traditional Solutions
When the master goes down, the standby node stops receiving acks, concludes that the master is dead, and takes over as the new master, as shown in Figure 7.4 below:
Figure 7.4 Traditional Solutions
But this approach has a hidden danger: network problems. Let's see what happens when the network fails, as shown in Figure 7.5:
Figure 7.5 Network failure
That is, the master is not actually down; the network merely failed while it was replying, so the standby node does not receive the ack, assumes the master is dead, and starts its own master instance. Now our distributed system has two masters, the so-called dual-master situation. Once this happens, some of our slave nodes report their work to one master and some to the other, and the whole service descends into chaos. To prevent this, we introduce ZooKeeper. It cannot avoid network failures, but it can guarantee that there is only one master at any given time. Let's see how ZooKeeper does this.
7.3 The ZooKeeper solution
(1) Master boot
After introducing ZooKeeper, we start two master nodes, "master node A" and "master node B". When they start, both register a node with ZooKeeper. Suppose "master node A" registers the node "master-00001" and "master node B" registers "master-00002". After registration an election takes place, and the node with the smallest number wins the election, acquires the lock, and becomes the master: here "master node A" acquires the lock and becomes the master, while "master node B" is blocked and becomes the standby. In this way, the scheduling of the two master processes is completed. A code sketch of this election follows Figure 7.6 below.
Figure 7.6 ZooKeeper Master Election
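The following Java sketch (not from the original article; the parent path "/election" is invented and assumed to exist) shows the ephemeral-sequential-node pattern behind this election: each candidate creates an EPHEMERAL_SEQUENTIAL child, and the candidate whose child has the smallest sequence number is the master; the others stand by.

    import java.util.Collections;
    import java.util.List;
    import org.apache.zookeeper.CreateMode;
    import org.apache.zookeeper.ZooDefs;
    import org.apache.zookeeper.ZooKeeper;

    public class MasterElection {
        // Assumes zk is an already-connected ZooKeeper handle and /election exists.
        static boolean runForMaster(ZooKeeper zk) throws Exception {
            // Ephemeral: the node vanishes (and the lock is released) if we crash.
            // Sequential: ZooKeeper appends a unique, increasing counter.
            String me = zk.create("/election/master-", new byte[0],
                    ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL_SEQUENTIAL);

            List<String> children = zk.getChildren("/election", false);
            Collections.sort(children);

            // The smallest sequence number wins the election and becomes master.
            boolean iAmMaster = me.endsWith(children.get(0));
            System.out.println(me + (iAmMaster ? " is the master" : " is a standby"));
            return iAmMaster;
            // A real standby would now set a watch (e.g. with exists) on the node
            // just ahead of it, so it is notified when that node disappears.
        }
    }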
(2) Master fault
If "Master node-a" hangs, the node that he registers will be automatically deleted, zookeeper will automatically perceive the node changes and then issue an election again, when "master-B" will win the election, instead of "Master node-A" become the master node.
Figure 7.7 ZooKeeper Master Election
(3) Master recovery
Figure 7.8 ZooKeeper Master election
If the failed master recovers, it registers a node with ZooKeeper again; this time the node it registers is "master-00003". ZooKeeper senses the change in the nodes and starts the election again, and this time "master node B" wins again and remains the master, while "master node A" serves as the standby.