Analysis of ZooKeeper Technology

Source: Internet
Author: User

Transferred from: http://www.cnblogs.com/sharpxiajun/archive/2013/06/02/3113923.html

ZooKeeper is a sub-project of Hadoop, but although it originated there, I find it is increasingly used to build distributed frameworks outside the Hadoop ecosystem. Today I want to talk about ZooKeeper. This article is not about how to use it; instead it covers what ZooKeeper is actually good for in practice, which kinds of applications can take advantage of it, and finally what role it could play in a distributed website architecture.

ZooKeeper is a highly reliable coordination system for large-scale distributed systems. From this definition we know that ZooKeeper is a coordination system and that what it coordinates is a distributed system. Why does a distributed system need a coordination system? The reasons are as follows:

Developing a distributed system is difficult, and much of the difficulty lies in "partial failure". Partial failure arises when a message is travelling between two nodes and the network fails: the sender cannot know whether the receiver got the message. The possible causes are varied: the receiver may have received the message before the network failed, it may never have received it, or the receiver's process may have died. The only way for the sender to learn what really happened is to reconnect to the receiver and ask. This is the "partial failure" problem in distributed system development.

ZooKeeper is a framework for dealing with partial failure in distributed systems. It does not let a distributed system avoid partial failure; rather, it lets the system handle partial failure correctly when it occurs, so that the system keeps running normally.

Here are some practical uses of ZooKeeper:

  Scenario One: A group of servers provides a service to clients (for example, the server side of the distributed website I built earlier is a cluster of four servers serving the front-end cluster). We want each client request to find one server in that cluster that can provide the service it needs. For this, our program must keep a list of these servers, and the client reads the list each time it makes a request. Clearly the list cannot be stored on a single node, because if that node goes down the whole cluster fails; we want the list to be highly available. The highly available solution is to store the list in a distributed fashion and have the servers that store it manage it together: if one of the servers on the list breaks, another can immediately take its place, and the broken server is removed from the list and so drops out of the cluster. None of this is done by the failed server itself; it is done by the healthy servers in the cluster. This is an active distributed data structure, one that modifies the state of its data items when external conditions change. ZooKeeper provides this service. It is called the unified naming service and resembles the JNDI service in Java EE.
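To make the server list idea concrete, here is a minimal in-memory sketch in Python. It does not talk to a real ZooKeeper cluster: the `ServiceRegistry` class, the session ids, and the addresses are all hypothetical names invented for illustration. The point is only that an entry tied to a server's session disappears when that session ends, so clients never read a dead server from the list.

```python
# In-memory sketch only: ServiceRegistry is a hypothetical stand-in for a
# ZooKeeper cluster, not a real client API.
import random


class ServiceRegistry:
    """Tracks which servers are currently alive under a service name."""

    def __init__(self):
        # service name -> {session id: server address}
        self._services = {}

    def register(self, service, session_id, address):
        # Comparable to creating an entry that lives as long as the session.
        self._services.setdefault(service, {})[session_id] = address

    def session_expired(self, session_id):
        # When a server's session ends, its entries are dropped for it;
        # the failed server does not have to clean up after itself.
        for members in self._services.values():
            members.pop(session_id, None)

    def lookup(self, service):
        # Clients read the current list of live servers.
        return sorted(self._services.get(service, {}).values())

    def pick(self, service, rng=random):
        # Choose one live server for this request.
        servers = self.lookup(service)
        if not servers:
            raise LookupError(f"no live server for {service!r}")
        return rng.choice(servers)


registry = ServiceRegistry()
for i in range(1, 5):
    registry.register("backend", session_id=i, address=f"10.0.0.{i}:8080")

registry.session_expired(2)  # server 2 crashes and its session ends
live_servers = registry.lookup("backend")
print(live_servers)
```

After server 2 drops out, a call such as `registry.pick("backend")` can only return one of the three remaining addresses.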

  Scenario Two: Distributed lock service. Suppose a distributed system operates on data, for example by reading it, analyzing it, and finally modifying it. These operations may be spread across different nodes in the cluster, which raises the problem of consistency: if the operations are not consistent, we get a wrong result. In a single-process program consistency is easy to achieve, but in a distributed system it is much harder, because the operations run in independent processes on different servers and their intermediate results travel over the network. ZooKeeper provides a lock service that addresses this problem and lets us keep data operations consistent in a distributed setting.
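One common way to build such a lock on ZooKeeper is to give every contender a sequence number and let the lowest number hold the lock. The sketch below imitates that idea in memory; `LockQueue` and the node names are hypothetical and no real ZooKeeper API is used, so treat it as an illustration of the ordering rule rather than a working distributed lock.

```python
import itertools


class LockQueue:
    """In-memory sketch of a sequence-number lock (hypothetical helper)."""

    def __init__(self):
        self._counter = itertools.count()
        self._waiting = {}  # sequence number -> client name

    def enqueue(self, client):
        # Comparable to creating a sequentially numbered entry under the lock.
        seq = next(self._counter)
        self._waiting[seq] = client
        return seq

    def holder(self):
        # The client with the lowest sequence number owns the lock.
        if not self._waiting:
            return None
        return self._waiting[min(self._waiting)]

    def release(self, seq):
        # Removing the entry (or losing the session) passes the lock on.
        self._waiting.pop(seq, None)


lock = LockQueue()
a = lock.enqueue("node-a")
b = lock.enqueue("node-b")

first_holder = lock.holder()   # node-a arrived first
lock.release(a)                # node-a finishes its read-analyze-modify step
second_holder = lock.holder()  # node-b now holds the lock
print(first_holder, second_holder)
```

Because only the current holder performs the read, analyze, and modify steps, two nodes never interleave their changes to the same data.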

  Scenario Three: Configuration management. In a distributed system we deploy a service to n servers whose configuration files are identical (for example, in the distributed site framework I designed, the server side has 4 servers that are identical down to their configuration files). If a configuration option changes, we have to change every one of these files. With only a few servers that is not too tedious, but with many servers (some large internet companies run Hadoop clusters of thousands of machines) changing configuration options is tedious and risky. This is where ZooKeeper comes in handy. We can use it as a highly available configuration store: we copy the cluster configuration into a node of the ZooKeeper file system, and every server in the cluster watches that node. Once the configuration changes, each server receives a notification from ZooKeeper and synchronizes its own copy from the ZooKeeper node. ZooKeeper also guarantees that the update is atomic, which helps ensure that every server's configuration is updated correctly.
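The notify-then-resync flow can be sketched as follows. This is an in-memory illustration under my own assumptions: `ConfigNode`, `Server`, and the `max_connections` option are hypothetical, and the watch is modelled as firing once per registration, so each server registers again whenever it re-reads the node.

```python
class ConfigNode:
    """In-memory sketch of one configuration node with one-shot watches."""

    def __init__(self, data):
        self._data = data
        self.version = 0
        self._watchers = []

    def get(self, watcher=None):
        # Reading can optionally leave a watch that fires on the next change.
        if watcher is not None:
            self._watchers.append(watcher)
        return self._data, self.version

    def set(self, data):
        self._data = data
        self.version += 1
        # Each watch fires once and is then cleared; servers register again
        # when they re-read the node inside their callback.
        watchers, self._watchers = self._watchers, []
        for notify in watchers:
            notify(self)


class Server:
    def __init__(self, name, node):
        self.name = name
        self.config = None
        self._load(node)

    def _load(self, node):
        # Read the current configuration and watch for the next change.
        self.config, _ = node.get(watcher=self._load)


node = ConfigNode({"max_connections": 100})
servers = [Server(f"server-{i}", node) for i in range(1, 5)]

node.set({"max_connections": 200})  # change the option in one place
print([s.config["max_connections"] for s in servers])
```

The administrator changes the option once, on the node, and all four servers pick up the new value without anyone editing their local files.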

  Scenario Four: Fault recovery for distributed systems. Cluster management is hard, and adding the ZooKeeper service to a distributed system makes it easier. The most troublesome part of cluster management is handling node failures. ZooKeeper lets the cluster elect a healthy node as the master, and the master node knows the health of every server in the cluster. Once a node fails, the master notifies the other servers so that computing tasks are redistributed among the remaining nodes. Beyond detecting that a node has failed, a system built on ZooKeeper can also examine the failed server to determine what kind of fault it has; if the fault can be repaired, it can be repaired automatically, and otherwise the cause of the error is reported to the system administrator, who can then quickly locate the problem and fix the node. We may ask: what if the master itself fails? ZooKeeper has considered this too. It has a "leader election" algorithm, so the master can be chosen dynamically: when the master fails, ZooKeeper can promptly select a new master to manage the cluster.
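Here is a small sketch of how a master can be replaced dynamically. It uses one simple rule that is often built on top of ZooKeeper, namely that the surviving candidate with the lowest sequence number is the master. The `Election` class and the server names are hypothetical, everything runs in memory, and this is not a description of ZooKeeper's internal algorithm.

```python
class Election:
    """In-memory sketch of master election by lowest sequence number."""

    def __init__(self):
        self._next_seq = 0
        self._candidates = {}  # sequence number -> server name

    def join(self, server):
        # Every server that joins the cluster gets the next sequence number.
        seq = self._next_seq
        self._next_seq += 1
        self._candidates[seq] = server
        return seq

    def fail(self, server):
        # A crashed server's entry disappears from the candidate list.
        self._candidates = {
            seq: name for seq, name in self._candidates.items() if name != server
        }

    def master(self):
        # The surviving candidate with the lowest number is the master.
        if not self._candidates:
            return None
        return self._candidates[min(self._candidates)]


election = Election()
for name in ["s1", "s2", "s3", "s4"]:
    election.join(name)

master_before = election.master()
election.fail("s1")  # the current master goes down
master_after = election.master()
print(master_before, master_after)
```

When `s1` fails, the cluster does not stop: the next candidate in line takes over as master.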

Next, I want to talk about the characteristics of ZooKeeper:

    1. ZooKeeper is a streamlined file system. In that respect it is a bit like Hadoop, but ZooKeeper's file system manages small files, whereas Hadoop manages huge ones.
    2. ZooKeeper provides a rich set of building blocks for implementing coordination data structures and protocols, for example distributed queues, distributed locks, and "leader election" among a group of peer nodes.
    3. ZooKeeper is highly available and very stable in its own right. A distributed cluster can rely on a ZooKeeper cluster for management and use it to avoid single points of failure.
    4. ZooKeeper uses a loosely coupled interaction pattern. This is most evident in its distributed locks. ZooKeeper can act as a rendezvous mechanism that lets participating processes discover and interact with one another without knowing about each other (or each other's network details) in advance. The parties do not even have to exist at the same time: one process can leave a message in ZooKeeper, and another can read it after the first has exited. This decouples the nodes from one another.
    5. ZooKeeper provides a shared repository for the cluster, where nodes can read and write shared information in one place. This spares each node from programming its own sharing logic and eases the development of the distributed system.
    6. ZooKeeper's design uses the observer pattern. ZooKeeper stores and manages the data that everyone cares about and accepts registrations from observers. Once the state of that data changes, ZooKeeper notifies the registered observers, which react accordingly; this gives the cluster a management pattern similar to master/slave.

All of this shows that ZooKeeper helps in developing distributed systems and can make them more robust and efficient.

I recently joined the Hadoop interest group in my department and installed Hadoop, MapReduce, Hive, and HBase in the test environment; HBase requires ZooKeeper to be installed first. I initially installed ZooKeeper on four servers, but colleagues told me that installing four gains nothing over installing three. ZooKeeper can provide service only while more than half of its machines are available. More than half of 3 is 2, so a three-server cluster survives one failure; more than half of 4 is 3, so a four-server cluster also survives only one failure. Three servers therefore give the same fault tolerance as four, which is why ZooKeeper is usually installed on an odd number of servers. While learning Hadoop, I found ZooKeeper the hardest sub-project to understand. It is not technically complex, but I found it hard to see what it is for. So my first article on Hadoop technology starts with ZooKeeper, and rather than covering implementation details it starts from application scenarios. Understanding where ZooKeeper applies should make learning it more effective.
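The "more than half" rule is easy to check with a few lines of Python. The function names below are my own; the arithmetic is just the majority rule described above.

```python
def quorum(cluster_size):
    """Smallest number of servers that is more than half of the cluster."""
    return cluster_size // 2 + 1


def tolerated_failures(cluster_size):
    """How many servers can be down while a majority is still available."""
    return cluster_size - quorum(cluster_size)


# cluster size -> (quorum, tolerated failures)
table = {n: (quorum(n), tolerated_failures(n)) for n in range(3, 7)}
for n, (q, f) in table.items():
    print(f"{n} servers: quorum {q}, tolerates {f} failure(s)")
```

Three and four servers both tolerate one failure, and five and six both tolerate two, so the extra server in an even-sized cluster adds no fault tolerance.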

Another reason to talk about ZooKeeper today is to supplement my earlier post on the distributed website framework. Although the site architecture I designed is distributed and includes simple fault-handling mechanisms such as heartbeats, it still cannot cope with a single point of failure in the cluster: if a server breaks, clients still try to connect to it, which blocks some requests and wastes server resources. However, I do not want to change the framework for now, because I feel that adding ZooKeeper to the existing servers would hurt the site's efficiency. Deploying ZooKeeper on a separate server cluster would be worth considering, but server resources are too precious for that to be likely. Fortunately, our department has recognized the problem and will develop a robust remote-call framework that splits out cluster management and communication management and provides them as a centralized, efficient service. Once that framework is finished and our site adopts the new service, I think the site will be more stable and efficient.
