ZooKeeper Technology Analysis

Last Update:2014-12-22 Source: Internet

Author: User

ZooKeeper started as a sub-project of Hadoop. Although it originated in Hadoop, I have found that ZooKeeper is increasingly used to build distributed frameworks outside the Hadoop ecosystem. Today I would like to talk about ZooKeeper. This article will not explain how to use ZooKeeper. Instead, it covers what ZooKeeper is actually good for, which types of applications can take advantage of it, and finally what effect ZooKeeper can have on a distributed website architecture.

ZooKeeper is a highly reliable coordination system for large distributed systems. From this definition we know that ZooKeeper is a coordination system and that the systems it serves are distributed systems. Why does a distributed system need a coordination system? The reasons are as follows:

Developing distributed systems is difficult, and the main difficulty lies in "partial failure". Partial failure means the following: when a message is sent between two nodes and the network fails, the sender cannot know whether the receiver got the message. The cause can be complicated: the receiver may have received the message before the network error occurred, it may never have received it, or the receiver's process may have died. The only way for the sender to learn what really happened is to reconnect to the receiver and ask. This is the "partial failure" problem in distributed system development.
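To make the sender's dilemma concrete, here is a toy Python model. It is not ZooKeeper code, and the Fate cases and the deliver function are hypothetical names chosen for illustration. It enumerates the three failure causes described in the paragraph and shows that the sender observes the same thing, a timeout, in every case, even though the receiver's state differs between them.

```python
from enum import Enum


class Fate(Enum):
    """Hypothetical ways a request can go wrong (names are illustrative only)."""
    REQUEST_LOST = "the network dropped the request"
    REPLY_LOST = "the receiver processed the request, but the reply was lost"
    RECEIVER_CRASHED = "the receiver process died before handling the request"


def deliver(fate: Fate) -> tuple[str, bool]:
    """Return (what the sender observes, whether the receiver applied the message)."""
    receiver_applied = fate is Fate.REPLY_LOST
    # In every case no reply comes back, so the sender only ever sees a timeout.
    sender_view = "timeout"
    return sender_view, receiver_applied


outcomes = {fate: deliver(fate) for fate in Fate}
sender_views = {view for view, _ in outcomes.values()}
receiver_states = {applied for _, applied in outcomes.values()}
print(sender_views)  # {'timeout'}
```

The sender's view collapses to a single value while receiver_states holds both True and False, which is exactly the information the sender is missing.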

ZooKeeper is a framework for dealing with partial failure in distributed systems. It does not make partial failures go away; rather, it lets a distributed system keep running correctly when a partial failure occurs.

Now let me describe some practical scenarios for ZooKeeper:

Scenario 1: A group of servers provides some service to clients (for example, the back end of the distributed website I described earlier is a cluster of four servers serving the front-end cluster). On each client request, we want to find a server in the cluster that can provide the service the client needs. For this, our program must keep a list of the servers and read it on every client request. This list obviously cannot be stored on a single node; otherwise, if that node goes down, the whole cluster fails. We want the list to be highly available. The highly available solution is to store the list in a distributed way, managed jointly by the servers that store it. If one of those servers breaks, the others immediately take over its role and remove it from the list, taking the failed server out of the cluster. None of these operations is performed by the failed server; they are carried out by the healthy servers in the cluster. This is an active, distributed data structure that can change the state of its data items when external conditions change. ZooKeeper provides this service, called a unified naming service, which is very similar to the JNDI service in Java EE.
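The behavior described here, where a dead server drops out of the list without doing anything itself, is what ZooKeeper's ephemeral nodes provide: a node tied to a client session that the ensemble deletes when the session ends. The sketch below is a toy, single-process Python model of that idea, not the ZooKeeper client API. The ToyEnsemble class, its methods, the /services/web path, and the addresses are all hypothetical.

```python
class ToyEnsemble:
    """Hypothetical in-memory stand-in for a ZooKeeper ensemble (not the real client API).

    Ephemeral nodes belong to a client session and are deleted by the ensemble
    when that session ends, which is how a dead server drops out of the list.
    """

    def __init__(self):
        self.nodes = {}  # path -> (data, owning session id)

    def create_ephemeral(self, path, data, session):
        self.nodes[path] = (data, session)

    def get_children(self, parent):
        prefix = parent.rstrip("/") + "/"
        return sorted(
            p[len(prefix):] for p in self.nodes
            if p.startswith(prefix) and "/" not in p[len(prefix):]
        )

    def expire_session(self, session):
        # The healthy ensemble removes the failed server's entry; the failed server does nothing.
        self.nodes = {p: v for p, v in self.nodes.items() if v[1] != session}


zk = ToyEnsemble()
for i in range(1, 5):
    zk.create_ephemeral(f"/services/web/server-{i}", f"10.0.0.{i}:8080", session=i)

zk.expire_session(2)  # server-2 stops sending heartbeats and its session times out
live = zk.get_children("/services/web")
print(live)  # ['server-1', 'server-3', 'server-4']
```

A client would pick one entry from live for each request (for example round-robin) and read that node's data to get the address.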

Scenario 2: Distributed lock service. Suppose a distributed system operates on data: it reads the data, analyzes it, and finally modifies it. In a distributed system these operations may be spread across different nodes of the cluster, which raises the question of whether operations on the data stay consistent; if they do not, we get wrong results. In a single-process program, consistency is easy to achieve, but in a distributed system it is much harder, because the servers run as independent processes and intermediate results are passed over the network. ZooKeeper provides a lock service that addresses this problem, letting us perform distributed data operations while keeping them consistent.
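One common way to build such a lock on ZooKeeper is the lock recipe: each contender creates an ephemeral, sequential child under a lock node, the smallest sequence number holds the lock, and every other contender watches only its immediate predecessor. The toy Python model below sketches that recipe under the assumption that all state lives in one process; ToyLock and its methods are hypothetical and are not the ZooKeeper API. The 10-digit zero-padded suffix mirrors the format ZooKeeper uses for sequential node names.

```python
import itertools


class ToyLock:
    """Hypothetical in-memory model of ZooKeeper's lock recipe (not the real API).

    Each contender creates an ephemeral, sequential child under the lock node.
    The child with the smallest sequence number holds the lock; every other
    contender watches only its immediate predecessor.
    """

    def __init__(self):
        self._seq = itertools.count()
        self.children = {}  # child name -> client id

    def request(self, client):
        # Zero-padded 10-digit suffix, so string order equals creation order.
        name = f"lock-{next(self._seq):010d}"
        self.children[name] = client
        return name

    def holder(self):
        return self.children[min(self.children)] if self.children else None

    def predecessor(self, name):
        """The child this contender should watch (None means it holds the lock)."""
        smaller = [c for c in self.children if c < name]
        return max(smaller) if smaller else None

    def release(self, name):
        # Called on explicit unlock, or implied when the client's session expires.
        self.children.pop(name, None)


lock = ToyLock()
a, b, c = (lock.request(client) for client in ("A", "B", "C"))
print(lock.holder())        # A
print(lock.predecessor(c))  # lock-0000000001, i.e. B's node
lock.release(a)             # A finishes, or crashes and its ephemeral node vanishes
print(lock.holder())        # B
```

Watching only the predecessor, rather than the whole lock node, means a release wakes up a single waiter instead of every contender at once.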

Scenario 3: Configuration management. In a distributed system, we often deploy one service application to n servers with the same configuration file (for example, in the distributed website framework I designed, four servers run the same server program with the same configuration file). If a configuration option changes, we have to edit these files one by one. With only a few servers this is not much trouble, but with a very large number of servers, such as the Hadoop clusters of thousands of machines at some large Internet companies, changing configuration options becomes tedious and risky. This is where ZooKeeper helps. We can use ZooKeeper as highly available configuration storage and hand this job over to it: we copy the cluster's configuration file into a node in ZooKeeper's file system, and every server in the distributed system watches that node. Once the configuration changes, each server receives a notification from ZooKeeper and synchronizes its configuration from ZooKeeper. Because ZooKeeper applies each update atomically, every server ends up with a correctly updated configuration.
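The notify-and-resync loop can be sketched with a toy model of a watched node. ZooKeeper watches are one-shot: a watch fires once and the client must register it again when it re-reads the data. The Python below is an in-memory illustration under that assumption; ToyConfigNode, AppServer, and the db_pool_size key are hypothetical and are not the ZooKeeper client API.

```python
class ToyConfigNode:
    """Hypothetical model of one znode holding shared configuration (not the real API).

    Like ZooKeeper watches, each registered watcher fires at most once and
    must re-register itself when it re-reads the data.
    """

    def __init__(self, data):
        self.data, self.version = data, 0
        self._watchers = []

    def get(self, watcher=None):
        if watcher is not None:
            self._watchers.append(watcher)
        return self.data, self.version

    def set(self, data):
        # One assignment models an atomic update: readers see old or new config, never a mix.
        self.data, self.version = data, self.version + 1
        fired, self._watchers = self._watchers, []  # one-shot: clear before notifying
        for notify in fired:
            notify()


class AppServer:
    """Hypothetical application server that keeps its local config in sync."""

    def __init__(self, name, node):
        self.name, self.node = name, node
        self.reload()

    def reload(self):
        # Re-read the config and re-arm the watch in the same call.
        self.config, self.version = self.node.get(watcher=self.reload)


node = ToyConfigNode({"db_pool_size": 10})
servers = [AppServer(f"server-{i}", node) for i in range(1, 5)]
node.set({"db_pool_size": 20})
node.set({"db_pool_size": 30})
print([s.config["db_pool_size"] for s in servers])  # [30, 30, 30, 30]
```

The second update still reaches every server only because reload re-registers the watch each time it reads.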

Scenario 4: Failure handling for distributed systems. Managing a cluster is hard, and adding ZooKeeper to a distributed system makes it much easier. The most troublesome part of cluster management is handling node failures. ZooKeeper lets the cluster elect a healthy node as master. The master knows the health of every server in the cluster; when a node fails, the master notifies the other servers and reassigns the computing tasks among the remaining nodes. Beyond noticing that a failure happened, this setup also identifies which server failed, so the system can attempt automatic recovery or report the cause to a system administrator, who can then locate and fix the problem quickly. You may ask: what if the master itself fails? ZooKeeper covers that too. It supports leader election, so the master is chosen dynamically, and when the master fails a new one can be elected immediately to manage the cluster.
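A common way to pick the master is the leader-election recipe: every server creates an ephemeral, sequential node, and the server owning the smallest node is master. When the master's session expires, its node disappears and the next-smallest server takes over. The toy Python model below sketches that recipe together with a simple round-robin task reassignment. ToyElection, reassign, and the server and task names are hypothetical; real systems would use a ZooKeeper client library and their own scheduling policy.

```python
import itertools


class ToyElection:
    """Hypothetical model of the leader-election recipe (not the real API).

    Each server creates an ephemeral, sequential node under an election path;
    the server owning the smallest node is the master. When the master's
    session expires its node disappears and the next-smallest takes over.
    """

    def __init__(self):
        self._seq = itertools.count()
        self.candidates = {}  # node name -> server name

    def join(self, server):
        name = f"n_{next(self._seq):010d}"
        self.candidates[name] = server
        return name

    def master(self):
        return self.candidates[min(self.candidates)] if self.candidates else None

    def session_expired(self, node_name):
        self.candidates.pop(node_name, None)

    def alive(self):
        return sorted(self.candidates.values())


def reassign(tasks, workers):
    """Master's job after a failure: spread tasks round-robin over live workers."""
    workers = sorted(workers)
    return {task: workers[i % len(workers)] for i, task in enumerate(sorted(tasks))}


election = ToyElection()
nodes = {s: election.join(s) for s in ("s1", "s2", "s3")}
print(election.master())  # s1

election.session_expired(nodes["s1"])  # the master crashes
print(election.master())  # s2
plan = reassign(["t1", "t2", "t3"], election.alive())
print(plan)  # {'t1': 's2', 't2': 's3', 't3': 's2'}
```

Round-robin is only one possible policy; the point of the sketch is that failover needs no coordination beyond watching which ephemeral nodes still exist.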

Now let me describe the characteristics of ZooKeeper:

ZooKeeper is a simplified file system. In this respect it resembles Hadoop's HDFS, but ZooKeeper is designed to manage small files (its nodes, called znodes, each hold a small amount of data), whereas HDFS manages large files.
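The "small file system" idea can be sketched as a tree of slash-separated paths where every node can carry a small payload and have children, and where a node can only be created under an existing parent. The Python below is a toy model, not ZooKeeper's API; ToyZnodeTree and its methods are hypothetical, and MAX_DATA is an assumption loosely modeled on ZooKeeper's default jute.maxbuffer setting of just under 1 MB.

```python
class ToyZnodeTree:
    """Hypothetical model of ZooKeeper's namespace (not the real API): a tree of
    paths where every node can hold a small byte payload and have children.
    As in ZooKeeper, a node can only be created if its parent exists."""

    MAX_DATA = 0xFFFFF  # assumption: approximates the ~1 MB default limit

    def __init__(self):
        self.data = {"/": b""}

    def create(self, path, payload=b""):
        parent = path.rsplit("/", 1)[0] or "/"
        if parent not in self.data:
            raise KeyError(f"no parent node: {parent}")
        if path in self.data:
            raise FileExistsError(path)
        if len(payload) > self.MAX_DATA:
            raise ValueError("znode payload too large")
        self.data[path] = payload

    def get(self, path):
        return self.data[path]

    def get_children(self, path):
        prefix = "/" if path == "/" else path + "/"
        return sorted(
            p[len(prefix):] for p in self.data
            if p != "/" and p.startswith(prefix) and "/" not in p[len(prefix):]
        )


tree = ToyZnodeTree()
tree.create("/app")
tree.create("/app/config", b"db_pool_size=10")
tree.create("/app/members")
print(tree.get_children("/app"))  # ['config', 'members']
try:
    tree.create("/missing/child")  # rejected: /missing does not exist
except KeyError as err:
    print("rejected:", err)
```

Unlike an ordinary file system, every node here can hold data and have children at the same time; there is no separate notion of directory.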

ZooKeeper provides a rich set of primitives from which many coordination data structures and protocols can be built, for example distributed queues, distributed locks, and leader election among a group of peer nodes.
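As one example of building a structure from these primitives, a distributed FIFO queue can be made from sequential nodes: producers create sequential children under a queue node, and a consumer takes and deletes the child with the smallest sequence number. The Python below is a single-process toy model of that idea; ToyQueue, its methods, and the qn- prefix are hypothetical and are not the ZooKeeper API.

```python
import itertools


class ToyQueue:
    """Hypothetical model of the distributed-queue recipe (not the real API):
    producers create sequential children under a queue node; a consumer takes
    the child with the smallest sequence number and deletes it."""

    def __init__(self):
        self._seq = itertools.count()
        self.children = {}  # child name -> payload

    def put(self, payload):
        # Zero-padded suffix keeps string order equal to insertion order.
        name = f"qn-{next(self._seq):010d}"
        self.children[name] = payload
        return name

    def take(self):
        if not self.children:
            return None
        head = min(self.children)
        # With real concurrent consumers the delete could fail because another
        # consumer won the race; the consumer would retry. Not modeled here.
        return self.children.pop(head)


q = ToyQueue()
for job in ("job-a", "job-b", "job-c"):
    q.put(job)
first, second = q.take(), q.take()
print(first, second)  # job-a job-b
```

The ordering comes entirely from the sequence numbers the store assigns, so producers never need to coordinate with each other.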

ZooKeeper is highly available and very stable in its own right. A distributed cluster can rely on a ZooKeeper ensemble for cluster management, and using ZooKeeper helps a distributed system avoid single points of failure.

ZooKeeper supports a loosely coupled interaction style, which is most visible in the way it provides distributed locks. ZooKeeper can serve as a rendezvous mechanism that lets participating processes discover and interact with each other without knowing about the other processes (or the network) in advance. The processes do not even need to exist at the same time: one process can leave a message in ZooKeeper and exit, and another process can read that message later. This decouples the nodes from one another.
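The "leave a message and exit" pattern relies on the difference between persistent nodes, which outlive the session that created them, and ephemeral nodes, which do not. The toy Python model below illustrates that difference; ToyStore, its methods, the session names, and the /rendezvous paths are hypothetical and are not the ZooKeeper API.

```python
class ToyStore:
    """Hypothetical in-memory stand-in for a ZooKeeper ensemble (not the real API).

    Persistent nodes outlive the session that created them; ephemeral nodes
    are removed when their session closes.
    """

    def __init__(self):
        self.nodes = {}  # path -> (data, owning session, is_ephemeral)

    def create(self, path, data, session, ephemeral=False):
        self.nodes[path] = (data, session, ephemeral)

    def close_session(self, session):
        self.nodes = {p: n for p, n in self.nodes.items()
                      if not (n[2] and n[1] == session)}

    def read(self, path):
        node = self.nodes.get(path)
        return None if node is None else node[0]


store = ToyStore()
# Process A leaves a message and a liveness marker, then exits.
store.create("/rendezvous/job-42", "input=/data/part-0", session="A")
store.create("/rendezvous/producer-A", "alive", session="A", ephemeral=True)
store.close_session("A")

# Process B starts later; it never talked to A, yet it finds A's message.
print(store.read("/rendezvous/job-42"))      # input=/data/part-0
print(store.read("/rendezvous/producer-A"))  # None
```

The persistent node carries the message across time, while the ephemeral one tells B that A is no longer running.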

ZooKeeper provides a shared repository for the cluster. The cluster can read and write shared information in one central place instead of having each node implement its own sharing logic, which makes distributed systems easier to develop.

ZooKeeper is designed around the observer pattern. It stores and manages data that all participants care about and accepts registrations from observers. When the state of that data changes, ZooKeeper notifies the registered observers so they can react, which makes it possible to build a master/slave style of management in the cluster.

In short, ZooKeeper makes distributed systems easier to develop and makes them more robust and efficient.

Not long ago I joined our department's Hadoop interest group and set up the Hadoop test environment, installing MapReduce, Hive, and HBase myself. HBase requires ZooKeeper to be installed first. I initially installed ZooKeeper on four servers, but a colleague told me that installing four is effectively the same as installing three. ZooKeeper can provide service only while more than half of its servers are available. With three servers, a majority is 2, so one server may fail; with four servers, a majority is 3, so still only one server may fail. Three servers therefore give the same fault tolerance as four, which is why ZooKeeper is usually installed on an odd number of servers. While learning Hadoop, I found ZooKeeper the hardest sub-project to understand, not because its technology is complicated but because its uses are confusing. That is why my first article on Hadoop technology is about ZooKeeper, and why it skips the implementation details and focuses on application scenarios: I think understanding where ZooKeeper applies makes learning it much easier.
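The majority arithmetic behind the odd-number rule is short enough to compute directly. The helper names quorum and tolerated_failures below are hypothetical; the sketch only encodes the "strictly more than half" rule described in the paragraph.

```python
def quorum(n: int) -> int:
    """Smallest number of servers that is strictly more than half of n."""
    return n // 2 + 1


def tolerated_failures(n: int) -> int:
    """How many servers may fail while a majority is still up."""
    return n - quorum(n)


for n in range(3, 8):
    print(f"{n} servers: quorum {quorum(n)}, tolerates {tolerated_failures(n)} failure(s)")
```

The output shows that 3 and 4 servers both tolerate one failure and 5 and 6 both tolerate two, so the extra even-numbered server adds hardware without adding fault tolerance.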

I am writing about ZooKeeper today partly as a supplement to my earlier article, Distributed Website Framework. Although the site architecture I designed is distributed and includes a simple fault-detection mechanism (a heartbeat), it does not handle the failure of a single server in the cluster: if a server goes down, clients will still try to connect to it, which blocks some requests and wastes server resources. However, I do not want to modify my framework right now, because I suspect that adding ZooKeeper to the existing services would hurt the website's performance. Deploying ZooKeeper on a separate server cluster would be worth considering, but servers are expensive, so that is unlikely. Fortunately, our department has noticed the same problem and is developing a powerful remote-invocation framework that separates cluster management and communication management into an efficient, centralized service. Once that framework is finished and our website is connected to the new service, I expect the site to become more stable and efficient.

Original link: http://f.dataguru.cn/forum.php?mod=viewthread&tid=344366

This article is an English version of an article originally written in Chinese on aliyun.com and is provided for information purposes only.