An Analysis of ZooKeeper Technology

Last Update:2014-07-29 Source: Internet

Author: User

ZooKeeper began as a sub-project of Hadoop. Although it originated in Hadoop, I have found that ZooKeeper is increasingly used to build distributed frameworks outside the Hadoop ecosystem. Today I want to talk about ZooKeeper. This article does not explain how to use ZooKeeper. Instead, it covers what ZooKeeper is used for in practice and what its advantages are, and finally discusses the role ZooKeeper can play in a distributed website architecture.

ZooKeeper is a highly reliable coordination system for large-scale distributed systems. From this definition we know that ZooKeeper is a coordination system and that what it coordinates is distributed systems. Why do distributed systems need a coordination system? The reason is as follows:

Developing a distributed system is very difficult, and the difficulty lies mainly in the "partial failure" of distributed systems. "Partial failure" means that when a message is sent between two nodes and the network fails, the sender cannot know whether the receiver got the message. The cause of the failure can be complicated: the receiver may or may not have received the message before the network error occurred, or the receiver's process may have died. The only way for the sender to learn what really happened is to reconnect to the receiver and ask. This is the "partial failure" problem in distributed system development.

ZooKeeper is a framework for dealing with the "partial failure" of distributed systems. ZooKeeper cannot make partial failures go away, but it lets a distributed system handle them correctly when they occur, so that the system keeps running normally.

The following describes practical use cases for ZooKeeper:

Scenario 1: Naming service. A group of servers provides services to clients (for example, the server side of the distributed website I built previously is a cluster of four servers that serves the front-end cluster). We want every client request to find a server in the cluster that can provide the service the client needs. To do this, our program must hold a list of the servers in the group, and each client request picks a server from that list. The list cannot be stored on a single server, because if that server failed the entire cluster would fail with it. We want the list to be highly available. The high-availability solution is to store the list in a distributed fashion and have it managed jointly by the servers that store it. If a server in the list breaks down, the other servers can immediately take over from it and delete it from the list, so that the faulty server leaves the cluster. None of these operations is performed by the faulty server. They are all performed by the healthy servers in the cluster. This is an active distributed data structure: it can modify the state of its data items on its own when external conditions change. ZooKeeper provides this kind of service. It is called the Uniform Naming Service, and it is similar to the JNDI service in Java EE.
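To make the idea of a self-maintaining server list more concrete, here is a minimal in-memory sketch in Python. It does not use the ZooKeeper client API. The `ServerRegistry` class, the session ids, and the addresses are all hypothetical stand-ins, chosen only to show how an entry tied to a server's session can disappear without the failed server doing anything.

```python
import itertools


class ServerRegistry:
    """In-memory stand-in for a server list kept in a coordination service.

    This is an illustration only, not the ZooKeeper API.
    """

    def __init__(self):
        self._servers = {}  # session id -> server address

    def register(self, session_id, address):
        # A server announces itself; the entry lives as long as its session.
        self._servers[session_id] = address

    def expire(self, session_id):
        # Called when a server's session is lost. The entry is removed
        # by the registry, not by the failed server.
        self._servers.pop(session_id, None)

    def live_servers(self):
        return sorted(self._servers.values())


def pick_server(registry, counter):
    """Round-robin over whichever servers are currently registered."""
    servers = registry.live_servers()
    if not servers:
        raise RuntimeError("no live servers")
    return servers[next(counter) % len(servers)]


registry = ServerRegistry()
for i in range(1, 5):
    # Hypothetical addresses for a four-server cluster.
    registry.register(f"session-{i}", f"10.0.0.{i}:8080")

registry.expire("session-2")  # server 2 crashes and its session is lost

counter = itertools.count()
picks = [pick_server(registry, counter) for _ in range(3)]
print(picks)
```

After the simulated crash, clients only ever see the three remaining servers, which is the behavior the scenario asks for.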

Scenario 2: Distributed lock service. Consider operating on data in a distributed system: reading data, analyzing it, and finally modifying it. In a distributed system these operations may be spread across different nodes in the cluster, which raises the problem of consistency. If the operations are not consistent, we get an incorrect result. In a single-process program the consistency problem is easy to solve, but in a distributed system it is much harder: the operations run in independent processes on different servers, and their intermediate results and state must be passed over the network. ZooKeeper provides a lock service to solve this problem, which lets us keep data operations consistent in a distributed setting.
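One common way to build such a lock on top of a coordination service is to give every contender a sequence number and treat the lowest live number as the lock holder. The sketch below simulates that idea in memory, under that assumption. The `LockQueue` class and the node names are hypothetical, and no real ZooKeeper calls are involved.

```python
class LockQueue:
    """In-memory simulation of a lock built from ordered entries.

    Each contender gets an increasing sequence number; the lowest
    number still present holds the lock. Illustration only.
    """

    def __init__(self):
        self._next_seq = 0
        self._waiting = {}  # sequence number -> owner name

    def request(self, owner):
        # Join the queue and return our sequence number.
        seq = self._next_seq
        self._next_seq += 1
        self._waiting[seq] = owner
        return seq

    def holder(self):
        # The owner with the lowest sequence number holds the lock.
        if not self._waiting:
            return None
        return self._waiting[min(self._waiting)]

    def release(self, seq):
        # Removing an entry passes the lock to the next in line.
        self._waiting.pop(seq, None)


lock = LockQueue()
seq_a = lock.request("node-a")
seq_b = lock.request("node-b")

first_holder = lock.holder()   # node-a asked first, so it holds the lock
lock.release(seq_a)            # node-a finishes its read-modify-write
second_holder = lock.holder()  # the lock passes to node-b
print(first_holder, second_holder)
```

Because only one node at a time sees itself as the holder, a read-analyze-modify sequence can run on one node without another node interleaving its own changes.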

Scenario 3: Configuration management. In a distributed system we deploy the same service application to N servers, and the configuration files on these servers are identical (for example, in my distributed website framework the server side has four servers running the same application with the same configuration file). If an option in the configuration file changes, we have to modify the files one by one. With a small number of servers this is not much trouble. With a large number, for example a Hadoop cluster of thousands of servers at a large Internet company, changing a configuration option is both tedious and risky. This is where ZooKeeper helps. We can use ZooKeeper as a highly available configuration store and hand the job over to it: we copy the cluster configuration file to a node in the ZooKeeper file system, and every server in the distributed system watches that node. As soon as the configuration changes, each server receives a notification from ZooKeeper and synchronizes its configuration with the copy in ZooKeeper. ZooKeeper also guarantees that each update is atomic, so every server ends up with a correctly updated configuration.
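The notify-then-resync flow described above can be sketched as follows. This is an in-memory simulation, not the ZooKeeper client API: `ConfigNode`, `AppServer`, and the `timeout=...` settings are hypothetical. The sketch models watches as one-time triggers that a server re-registers each time it re-reads the data, which is how classic ZooKeeper watches behave.

```python
class ConfigNode:
    """In-memory stand-in for a node that holds the shared configuration."""

    def __init__(self, data):
        self.data = data
        self.version = 0
        self._watchers = []

    def get(self, watcher=None):
        # Reading may also register a one-time watcher.
        if watcher is not None:
            self._watchers.append(watcher)
        return self.data, self.version

    def set(self, data):
        # Replace the data in one step, then fire and clear the watchers.
        self.data = data
        self.version += 1
        watchers, self._watchers = self._watchers, []
        for notify in watchers:
            notify(self)


class AppServer:
    """A server that keeps a local copy of the shared configuration."""

    def __init__(self, name, node):
        self.name = name
        self.config, self.version = node.get(watcher=self._on_change)

    def _on_change(self, node):
        # Re-read the configuration and re-register the watcher.
        self.config, self.version = node.get(watcher=self._on_change)


node = ConfigNode("timeout=30")
servers = [AppServer(f"server-{i}", node) for i in range(1, 5)]

node.set("timeout=60")  # one change in the shared store
node.set("timeout=90")  # a second change is also seen, thanks to re-registration
print([(s.name, s.config, s.version) for s in servers])
```

The administrator changes the configuration in one place, and every server picks up the new value without anyone editing files on individual machines.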

Scenario 4: Fault recovery for distributed systems. Cluster management is very difficult, and adding ZooKeeper to a distributed system makes it much easier. The most troublesome part of cluster management is handling node failures. ZooKeeper lets the cluster select a healthy node as the master node, and the master knows the running status of every server in the cluster. Once a node fails, the master notifies the other servers in the cluster and reallocates computing tasks among the remaining nodes. ZooKeeper does more than detect that a failure has happened: it also makes it possible to identify which server failed, so that the system can attempt an automatic recovery or report the problem to the system administrator, who can then quickly locate and fix the faulty node. You may have another question: what happens if the master node itself fails? ZooKeeper takes this into account as well. It has a built-in leader election algorithm, so the master node is chosen dynamically. When the master fails, ZooKeeper can immediately elect a new master to manage the cluster.
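The combination of failure detection and master re-election can be illustrated with a small simulation. The sketch below assumes two simple rules that are not taken from the article: a node is considered failed when it has not sent a heartbeat within a timeout, and the master is the surviving node that joined earliest. The `Cluster` class, the node names, and the time values are hypothetical, and this is not ZooKeeper's internal election protocol.

```python
class Cluster:
    """In-memory simulation of heartbeat-based failure detection.

    Assumed rules: a member is dead once its last heartbeat is older
    than session_timeout; the master is the live member that joined first.
    """

    def __init__(self, session_timeout):
        self.session_timeout = session_timeout
        self._next_seq = 0
        self._members = {}  # name -> (join sequence number, last heartbeat time)

    def join(self, name, now):
        self._members[name] = (self._next_seq, now)
        self._next_seq += 1

    def heartbeat(self, name, now):
        seq, _ = self._members[name]
        self._members[name] = (seq, now)

    def expire(self, now):
        # Remove every member whose heartbeat is too old and report them.
        dead = [name for name, (_, last) in self._members.items()
                if now - last > self.session_timeout]
        for name in dead:
            del self._members[name]
        return sorted(dead)

    def master(self):
        if not self._members:
            return None
        return min(self._members, key=lambda name: self._members[name][0])


cluster = Cluster(session_timeout=5)
for name in ("node-a", "node-b", "node-c"):
    cluster.join(name, now=0)

old_master = cluster.master()      # node-a joined first

cluster.heartbeat("node-b", now=4)
cluster.heartbeat("node-c", now=4)  # node-a stays silent

failed = cluster.expire(now=6)     # node-a's last heartbeat is 6 time units old
new_master = cluster.master()      # the next-oldest live node takes over
print(old_master, failed, new_master)
```

Time is passed in explicitly so the example is deterministic. A real deployment would rely on the coordination service's own session handling rather than hand-written timestamps.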

The following describes the features of ZooKeeper:

ZooKeeper is a streamlined file system. In this respect it is a bit like Hadoop, except that the ZooKeeper file system manages small files, while Hadoop manages large files.
ZooKeeper provides a rich set of building blocks for coordination data structures and protocols, for example distributed queues, distributed locks, and leader election among a group of peer nodes.
ZooKeeper is highly available and very stable. Distributed clusters can rely on ZooKeeper for cluster management to avoid single points of failure.
ZooKeeper uses a loosely coupled interaction model. This is most evident in its distributed locking mechanism. ZooKeeper can be used as a rendezvous mechanism that lets participating processes discover and interact with one another without knowing about the other processes (or the network) in advance. The parties do not even need to exist at the same time: as long as one process leaves a message in ZooKeeper, another process can read it after the first has ended, which decouples the nodes from one another.
ZooKeeper provides a shared repository for the cluster, where the cluster can read and write shared information in one central place. This spares each node from implementing its own sharing logic and makes distributed systems easier to develop.
ZooKeeper is designed around the observer design pattern. It stores and manages the data that everyone cares about and accepts registrations from observers. When the state of the data changes, ZooKeeper notifies the observers registered for it so that they can respond accordingly, which makes it possible to implement a Master/Slave style of cluster management.

As you can see, ZooKeeper is very helpful for distributed system development: it can make distributed systems more robust and efficient.

Not long ago I took part in my department's Hadoop interest group and installed Hadoop, MapReduce, Hive, and HBase in the test environment. ZooKeeper has to be installed before HBase. At first I installed ZooKeeper on four servers, but a colleague told me that four servers are no better than three. The reason is that ZooKeeper can provide service only while more than half of its machines are available. More than half of three servers is two, and more than half of four servers is three, so either setup can survive the loss of only one machine. Three servers therefore achieve the same effect as four, which is why ZooKeeper is usually installed on an odd number of servers. While learning Hadoop, I felt that ZooKeeper was the hardest sub-project to understand. The reason is not that it is technically complicated, but that I found its intended use confusing. That is why my first article on Hadoop technology starts with ZooKeeper and does not go into the technical implementation. I started from ZooKeeper's application scenarios, because understanding where it is applied should help me learn it more effectively.
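The majority rule behind the odd-number advice is easy to check with a little arithmetic. The short Python sketch below uses two hypothetical helper names, `quorum` and `tolerated_failures`, and assumes only the rule stated above: more than half of the servers must be available.

```python
def quorum(n):
    """Smallest number of servers that is more than half of n."""
    return n // 2 + 1


def tolerated_failures(n):
    """How many servers can fail while a majority remains available."""
    return n - quorum(n)


for n in (3, 4, 5):
    print(f"{n} servers: quorum {quorum(n)}, "
          f"tolerates {tolerated_failures(n)} failure(s)")
# 3 servers: quorum 2, tolerates 1 failure(s)
# 4 servers: quorum 3, tolerates 1 failure(s)
# 5 servers: quorum 3, tolerates 2 failure(s)
```

Going from three servers to four raises the quorum without raising the number of failures the group can survive, so the fourth machine buys no extra fault tolerance.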

The reason I want to talk about ZooKeeper today is to supplement the distributed website framework from my previous article. Although I designed a distributed website architecture and implemented simple fault handling, such as a heartbeat mechanism, I still have no way to eliminate single points of failure in the cluster. If a server breaks down, clients will keep trying to connect to it, which blocks some requests and wastes server resources. However, I do not want to modify my framework for now, because I suspect that adding ZooKeeper to the existing service would hurt the website's efficiency. Deploying ZooKeeper on an independent server cluster would be worth considering, but that is unlikely because server resources are too valuable. Fortunately, our department has noticed the same problem. It plans to develop a powerful remote call framework that separates out cluster management and communication management and provides them as an efficient, highly available centralized service. Once the department's remote call framework is finished, our website will be moved onto the new service, and I think it will become more stable and efficient.

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
