Observer: Making ZooKeeper More Scalable

One, the ZooKeeper Observer

1.1 ZooKeeper roles
From the earlier introductions, we know that a ZooKeeper cluster has two roles: leader and follower. The leader accepts client requests, receives write requests forwarded by the other servers, and is responsible for updating the system state. A follower also accepts client requests; it forwards write requests to the leader for the state update, while read requests are answered directly from the follower's in-memory database. A ZooKeeper cluster is shown in Figure 1.1.
Figure 1.1 Zookeeper Cluster service
Starting with ZooKeeper release 3.3.3, however, a new role was added: the observer. An observer behaves much like a follower, except that it takes no part in leader election or in voting. Based on this distinction, the servers in a ZK cluster can be divided into two types:
(1) Voting servers: leader, follower
(2) Non-voting servers: observer
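To make this concrete, here is a minimal configuration sketch (the hostnames are hypothetical, not from this article). An observer declares itself with `peerType=observer` in its own `zoo.cfg`, and every node's server list tags the observer's entry with an `:observer` suffix:

```properties
# zoo.cfg on the observer node: declare this peer a non-voting observer
peerType=observer

# Server list, kept identical on every node in the ensemble.
# Format: server.N=host:quorumPort:electionPort[:observer]
server.1=zk1.example.com:2888:3888
server.2=zk2.example.com:2888:3888
server.3=zk3.example.com:2888:3888
server.4=zk4.example.com:2888:3888:observer
```

With this layout, servers 1-3 form the voting ensemble and server 4 serves clients without ever being asked to vote.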
1.2 Why introduce Observer
(1) Zookeeper scalability
So why did ZooKeeper introduce the observer role? Mainly to give ZooKeeper better scalability. What is scalability? Scalability means different things to different people, but in this context: if our workload can be shared by allocating more resources to the system, then the system is scalable; a non-scalable system cannot improve performance by adding resources, and its performance may even drop sharply as the workload grows.
Before observers existed, ZooKeeper's read scalability came from followers: we could improve the read performance of the service by adding follower nodes. However, as the number of follower nodes grows, the write performance of the ZooKeeper service suffers. Why does this happen? To answer that, we first need to look at how the ZK service actually works.
(2) ZK service process
Each server in the ZooKeeper service can serve multiple clients, and a client may connect to any server in the service to submit requests. A read request is answered directly from each server's local replica of the database. A write request, which changes the state of the service, must be handled by the consistency protocol, namely the ZAB protocol we described earlier.
Put simply, the ZAB protocol stipulates that: (1) all write requests from clients are forwarded to a single server in the ZK service, the leader; (2) the leader turns each request into a proposal, and the other servers vote on it; (3) the leader collects the votes and, once more than half of the servers have approved, sends a commit notification to all servers; (4) finally, when the server the client is connected to receives the notification, it applies the update to its in-memory database and responds to the client's write request. The workflow is shown in Figure 1.2.
Figure 1.2 ZK write-request workflow
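The steps above can be sketched as a toy model in Python. This covers only the happy path (no failure handling, no real networking), and the class and function names are illustrative, not ZooKeeper APIs:

```python
class Voter:
    """A voting server: it logs a proposal, acks it, and later commits it."""
    def __init__(self, name):
        self.name = name
        self.log = []   # proposals accepted but not yet committed
        self.db = {}    # in-memory database that serves reads

    def ack(self, proposal):
        self.log.append(proposal)
        return True     # a healthy voter always acks the proposal

    def commit(self, key, value):
        self.db[key] = value


def leader_write(leader, followers, key, value):
    """One write request under the ZAB happy path.

    The leader proposes, counts acks (including its own implicit ack),
    and commits only when a strict majority of voters has acked.
    """
    proposal = (key, value)
    acks = 1 + sum(f.ack(proposal) for f in followers)  # 1 = leader's own ack
    ensemble = 1 + len(followers)
    if acks > ensemble // 2:            # strict majority reached
        leader.commit(key, value)
        for f in followers:
            f.commit(key, value)        # the commit notification
        return True
    return False
```

In this model an observer would simply be a server that is never passed to `leader_write` as a voter but still receives the final commit.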
We can see that ZooKeeper servers actually perform two duties in this protocol: on one hand they accept connections and operation requests from clients, and on the other they vote on proposals. These two duties pull against each other as the cluster grows. For example, to support more clients in the ZK service we need to add more servers; but from the way ZAB handles write requests, we can see that every server we add also adds pressure to the voting phase of the protocol. Because the leader must wait for more than half of the servers in the cluster to respond to each proposal, a larger ensemble means a larger quorum, and a higher chance that it includes a slow machine, so each round of voting takes longer and write performance falls. This is exactly what we see in real-world operation: as a ZooKeeper cluster gets larger, write throughput drops.
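The quorum arithmetic behind this trade-off is easy to check. A minimal sketch (the `quorum` helper is ours for illustration, not a ZooKeeper API):

```python
def quorum(voters: int) -> int:
    """Smallest number of acks that forms a strict majority of the voters."""
    return voters // 2 + 1

# Every voting server we add raises the bar for committing each write:
for n in (3, 5, 7, 9):
    print(n, "voters -> needs", quorum(n), "acks per write")

# Observers never count toward the quorum: a 3-voter ensemble still
# commits after 2 acks no matter how many observers serve reads.
voters, observers = 3, 50
assert quorum(voters) == 2
```

This is why adding observers scales read capacity without touching the cost of a write.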
(3) Extending ZooKeeper with observers
So we have to weigh our desire to support more clients against our desire to keep write throughput high. To break this coupling, we introduce a type of server that does not vote, called the observer. An observer accepts client connections and forwards write requests to the leader node, but the leader never asks an observer to vote on a proposal. Instead, the observer skips the voting process entirely and, like the other servers, simply learns the outcome of the vote from the leader's commit notification described above.
Figure 1.3 Observer Write throughput test
Figure 1.3 shows the result of a simple benchmark. The vertical axis is the number of write operations per second that a single client can issue; the horizontal axis is the size of the ZooKeeper cluster. In the blue series every server is a voting server; in the green series only three servers vote and the rest are observers. As the chart shows, when we scale the cluster out with observers, write performance stays almost flat, whereas scaling out the number of voting servers degrades write performance significantly. Observers are clearly effective.
This simple extension gives ZooKeeper's scalability a whole new outlook. We can now add many observer nodes without worrying about a significant impact on write throughput. Observers are not invulnerable, though: the notification phase of the protocol is still linear in the number of servers. However, the per-server overhead of that phase is very low, so we can assume the notification phase will not become the major bottleneck.
Two, observer applications
(1) Observers improve read scalability
The growing number of clients was an important use case for observers, but in fact they bring many other benefits to the cluster. As an optimization, an observer can serve reads directly from its local copy of the data without going through the voting process. This does carry a certain "time travel" risk: a client may read an old value after having already read a newer one. But this only happens when a server fails, and in that case the client can issue a "sync" operation to ensure that the next read is up to date.
Therefore, under read-heavy workloads, observers can greatly improve ZooKeeper's performance. If we instead added voting servers to absorb the reads, the write performance of the service would suffer. Observers let us decouple read scalability from write performance, which makes ZooKeeper better suited to read-dominated scenarios.
(2) Observers provide WAN capability
Observers can do even more: an observer is a good candidate for serving clients that connect across a WAN. There are three reasons for this:
① To get good read performance, the client needs to be as close to the server as possible, so that the round-trip latency stays low. However, splitting the voting ZooKeeper cluster across two data centers is highly undesirable, because a well-configured ZooKeeper requires low-latency links between its voting servers; otherwise we run into the slowdown described above.
② Observers, by contrast, can be deployed in any data center that needs access to ZooKeeper. The voting protocol is then unaffected by the high latency of the links between data centers, and performance improves. During a write, the messages exchanged between an observer and the leader are far fewer than those between a voting server and the leader, which also helps reduce bandwidth requirements when a remote data center carries a heavy write load.
③ Because an observer's failure does not affect the voting cluster, the availability of the service itself is not harmed even if the link between data centers goes down. Such a link is much more likely to fail than the connections between racks inside a single data center, so not depending on it is a real advantage.
Three, a ZooKeeper cluster construction case
We have seen the several roles in a ZooKeeper cluster; now we'll show how to use them to build a cluster with good performance. I'll take a real project as an example and analyze how to plan our ZooKeeper cluster.
Suppose our project needs to operate across machine rooms. Our headquarters machine room is in Hangzhou, but it also exchanges data with machine rooms in the United States, Qingdao, and elsewhere. The network latency between these rooms is relatively high: the US-China link runs over a submarine cable with a ping latency of about 200 ms, and the Hangzhou-Qingdao link has about 70 ms. To improve the system's network performance, we deploy ZooKeeper nodes in each machine room and join the rooms into one large ensemble, so that the data of the whole ZK cluster stays consistent.
According to the previous introduction, the final deployment structure would be:
(Headquarters) Hangzhou machine room, >= 3 servers: the voting cluster, composed of the leader and followers
(Branch) Qingdao machine room, >= 1 server: observers
(Branch) US machine room, >= 1 server: observers
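Under the assumptions above, the shared server list for all five nodes might look like the following sketch (hostnames are hypothetical); the Qingdao and US nodes would additionally set `peerType=observer` in their own `zoo.cfg`:

```properties
# Hangzhou (headquarters): the voting cluster
server.1=hz-zk1.example.com:2888:3888
server.2=hz-zk2.example.com:2888:3888
server.3=hz-zk3.example.com:2888:3888
# Qingdao branch: observer
server.4=qd-zk1.example.com:2888:3888:observer
# US branch: observer
server.5=us-zk1.example.com:2888:3888:observer
```

Clients in each branch would then put their local observer first in their connection string, so reads stay inside the local machine room.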
Figure 3.1 Zookeeper cluster deployment diagram
As we can see, the voting cluster lives in a single machine room, while the peripheral machine rooms run observer nodes that exchange data with the voting cluster. The benefits of this deployment follow from the roles introduced earlier; comparing the deployment against those role descriptions should help you understand ZooKeeper. A structure like this also introduces a cluster-priority question: for example, a client in the US machine room should access the ZK nodes in its own room first, and only then the Hangzhou (headquarters) room.
If you are interested in what my blog covers, please keep following my future posts. I am "sunddenly".
This article is copyrighted by the author and cnblogs (Blog Park). Reprinting is welcome, but without the author's consent this notice must be retained and a clear link to the original must appear on the article page; otherwise the author reserves the right to pursue legal liability.
Hadoop development, part 20: ZooKeeper series (8)