A Shallow Introduction to ZooKeeper to Help You Understand It

Source: Internet
Author: User
Tags: zookeeper

Some people have heard of ZooKeeper and some have not. I only came across it because I was building a distributed system that integrates Dubbo with ZooKeeper. So what exactly is this thing? To answer that, I first went to its official website and to Baidu Encyclopedia. Roughly speaking, ZooKeeper began as a subproject of Hadoop and is a project under the Apache Software Foundation. It plays the role of a distributed coordinator, much like our brain. As for what Hadoop is, I can only tell you that it is a big data framework; the details I am not clear on myself, haha.

The role of ZooKeeper in Hadoop can be explained with a metaphor. Hadoop, or any other distributed system, is like the human body, with its heart, stomach, respiratory system, legs, hands, and so on. With so many subsystems in a distributed environment, how are they coordinated? That is the job of the brain (ZooKeeper). We can think of ZooKeeper as a brain with no other function: it is only responsible for coordinating and connecting the subsystems. For the specific coordination functions, read on.


Back to the topic. The official ZooKeeper website introduces it like this:

ZooKeeper is a centralized service for maintaining configuration information, naming, providing distributed synchronization, and providing group services. All of these kinds of services are used in some form or another by distributed applications. Each time they are implemented there is a lot of work that goes into fixing the bugs and race conditions that are inevitable. Because of the difficulty of implementing these kinds of services, applications initially usually skimp on them, which make them brittle in the presence of change and difficult to manage. Even when done correctly, different implementations of these services lead to management complexity when the applications are deployed.




How ZooKeeper coordinates Hadoop and other distributed systems can be seen from that introduction, which names four services: maintaining configuration information, naming, distributed synchronization, and group services. Below I go through these four briefly. The goal is only a simple understanding of ZooKeeper.


1. Maintaining configuration information

The application scenario is this:

Let's say you have 20 or more projects that work together and share a lot of configuration files. When you want to modify a configuration file, you have to copy the change into every project, which costs time and energy and is hard to manage. On top of that, the projects are now distributed, and if they are already online, changing each one and then restarting its service is a huge amount of trouble.

ZooKeeper provides configuration management for exactly this case. The common configuration is extracted into one place, a directory node in ZooKeeper, and every application watches that node. Once the configuration changes, each application receives a notification from ZooKeeper, fetches the new configuration from ZooKeeper, and applies it to the running system. The project does not need to be restarted, so the problem above is solved.
Reference article: http://www.jianshu.com/p/01388f06e75d
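
The watch-and-reload flow described above can be sketched in a few lines. This is only an in-memory illustration, not the ZooKeeper client API: `TinyConfigStore`, `App`, the path `/config/db_url`, and the JDBC URLs are all made-up names for this example. The one behavior borrowed from real ZooKeeper is that a watch fires once, so the application registers a new watch each time it re-reads the node.

```python
from typing import Callable, Dict, List, Optional


class TinyConfigStore:
    """Hypothetical in-memory stand-in for a ZooKeeper directory node tree."""

    def __init__(self) -> None:
        self._data: Dict[str, str] = {}
        self._watches: Dict[str, List[Callable[[str], None]]] = {}

    def get(self, path: str, watch: Optional[Callable[[str], None]] = None) -> str:
        # Reading a node may also leave a one-shot watch on it.
        if watch is not None:
            self._watches.setdefault(path, []).append(watch)
        return self._data[path]

    def set(self, path: str, value: str) -> None:
        self._data[path] = value
        # Fire and clear the watches: each one triggers at most once.
        for callback in self._watches.pop(path, []):
            callback(path)


class App:
    """An application that picks up new configuration without a restart."""

    def __init__(self, name: str, store: TinyConfigStore, path: str) -> None:
        self.name, self.store, self.path = name, store, path
        self.reloads = 0
        self.config = store.get(path, watch=self._on_change)

    def _on_change(self, path: str) -> None:
        # Fetch the new value and register a fresh watch for the next change.
        self.config = self.store.get(path, watch=self._on_change)
        self.reloads += 1


store = TinyConfigStore()
store.set("/config/db_url", "jdbc:mysql://db1/app")
apps = [App(f"app-{i}", store, "/config/db_url") for i in range(3)]

# One write to the shared node reaches every running application.
store.set("/config/db_url", "jdbc:mysql://db2/app")
print([a.config for a in apps])
```

The point of the design is that the configuration lives in one place and the applications are told when it changes, instead of each project carrying its own copy.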


2. Naming

The naming function shows up in Dubbo. I believe everybody knows JNDI (if you do not, I cannot help you there). ZooKeeper's naming service does much the same job as JNDI, so you probably already have a feel for it. If you know a Dubbo+ZooKeeper integrated system, you know that it is this naming service that makes it easy to link the distributed projects together.

In an SOA or RMI framework, a service is invoked by simply using a URL on one server to get an object on a remote server. But in a cluster, in a distributed environment, the relationships between subprojects are complicated, and it becomes hard to tell which service is being called and where it is. So a dedicated server is needed to hold the information about the services and to manage them, so that we can focus on the project's business logic. ZooKeeper's naming function is such a server.

In a cluster the same service has many providers. When a provider starts, it registers its information (service interface, address, port, and so on) in ZooKeeper. When a consumer wants to consume a service, it fetches the list of all providers of that service from ZooKeeper and selects one of them according to Dubbo's load-balancing mechanism.

From functions 1 and 2 you can see that ZooKeeper can be viewed as a file system (similar to the Linux file system). And, to repeat the earlier sentence, ZooKeeper is the brain of the distributed system.
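
The register-then-look-up flow can be sketched as follows. This is a toy in-memory model, not Dubbo or the ZooKeeper client: `TinyRegistry`, `choose_provider`, the path layout `/<service>/providers`, the service name, and the addresses are all assumptions made for illustration, and a plain random choice stands in for whatever load-balancing strategy Dubbo is configured with.

```python
import random
from typing import Dict, List


class TinyRegistry:
    """Hypothetical in-memory stand-in for ZooKeeper's tree of nodes."""

    def __init__(self) -> None:
        self._children: Dict[str, List[str]] = {}

    def register(self, service: str, address: str) -> None:
        # A provider announces itself under its service's "providers" node.
        self._children.setdefault(f"/{service}/providers", []).append(address)

    def get_children(self, path: str) -> List[str]:
        return list(self._children.get(path, []))


def choose_provider(registry: TinyRegistry, service: str, rng: random.Random) -> str:
    """Look the service up by name and pick one of its providers at random."""
    providers = registry.get_children(f"/{service}/providers")
    if not providers:
        raise LookupError(f"no provider registered for {service}")
    return rng.choice(providers)


registry = TinyRegistry()
# Two providers of the same service register when they start.
registry.register("com.example.OrderService", "10.0.0.1:20880")
registry.register("com.example.OrderService", "10.0.0.2:20880")

# The consumer only knows the service name, not where the providers live.
chosen = choose_provider(registry, "com.example.OrderService", random.Random(0))
print(chosen)
```

The consumer never hard-codes a provider address; it asks the registry by name each time, which is what makes adding or removing providers painless.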


3. Distributed synchronization

Distributed synchronization is also known as distributed locking. I believe we all know the concept of a lock, and we all know how to lock on a single server. In a distributed system, however, when a method on A invokes B and B in turn invokes C, the lock cannot be added the way it is on a single server, because the calls run across different machines. It is the same as with transactions: a transaction on a single server is not the same as a distributed transaction. ZooKeeper provides a way to do distributed synchronization (locking), and using it saves a lot of work.

The idea behind a ZooKeeper distributed lock

When a process needs to access shared data, it creates a sequential child node under the "/locks" node; call this node thisPath. When thisPath is the smallest of all the child nodes, the process holds the lock and can access the shared resource. After the access is complete, the process deletes thisPath, and the lock passes to whichever child node is now the smallest.

With the idea clear, a few details remain. How does the process know whether thisPath is the smallest of all the child nodes? Right after creating the node, it calls the getChildren method to get the list of child nodes, finds the node ranked immediately before thisPath in that list (call it waitPath), and registers a listener on waitPath. When waitPath is deleted, the process is notified, and this indicates that the process has acquired the lock.
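
The lock idea above can be sketched with a small in-memory model. This is not the ZooKeeper client API: `TinyLockTree`, `Process`, the `lock-` prefix, and the method names are hypothetical, and everything runs in a single thread. One deliberate addition to the description: after being notified, each process checks again that its node is now the smallest before taking the lock.

```python
from typing import Callable, Dict, List, Optional


class TinyLockTree:
    """Hypothetical in-memory stand-in for the children of "/locks"."""

    def __init__(self) -> None:
        self._counter = 0
        self._nodes: List[str] = []
        self._delete_watches: Dict[str, Callable[[], None]] = {}

    def create_sequential(self, prefix: str) -> str:
        # The zero-padded counter orders the nodes by creation.
        name = f"{prefix}{self._counter:010d}"
        self._counter += 1
        self._nodes.append(name)
        return name

    def get_children(self) -> List[str]:
        return sorted(self._nodes)

    def watch_delete(self, name: str, callback: Callable[[], None]) -> None:
        self._delete_watches[name] = callback

    def delete(self, name: str) -> None:
        self._nodes.remove(name)
        callback = self._delete_watches.pop(name, None)
        if callback is not None:
            callback()


class Process:
    def __init__(self, name: str, tree: TinyLockTree, log: List[str]) -> None:
        self.name, self.tree, self.log = name, tree, log
        self.this_path: Optional[str] = None

    def acquire(self) -> None:
        self.this_path = self.tree.create_sequential("lock-")
        self._try_lock()

    def _try_lock(self) -> None:
        children = self.tree.get_children()
        index = children.index(self.this_path)
        if index == 0:
            self.log.append(self.name)  # smallest node: the lock is ours
        else:
            # waitPath is the node ranked just before ours; check again when it goes.
            self.tree.watch_delete(children[index - 1], self._try_lock)

    def release(self) -> None:
        self.tree.delete(self.this_path)


tree, log = TinyLockTree(), []
a, b, c = (Process(n, tree, log) for n in "ABC")
for p in (a, b, c):
    p.acquire()      # A gets the lock; B waits on A's node, C on B's node
a.release()          # B is notified and takes the lock
b.release()          # C is notified and takes the lock
print(log)           # ['A', 'B', 'C']
```

Because each process watches only the node directly before its own, a release wakes exactly one waiter rather than all of them.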

[Figure: approximate flowchart of the lock process]


Reference documents

https://my.oschina.net/91jason/blog/500503


4. Providing group services

In fact, group services are cluster management, as in Dubbo+ZooKeeper. Registering a service from a server, and looking services up, are both done through ZooKeeper, so the so-called group services include creating a group, adding group members, listing group members, and deleting group members. These services rely mainly on ZooKeeper's heartbeat mechanism, which tracks how many servers are connected and their information. When a server connects to ZooKeeper, or disconnects from it, it is the heartbeat mechanism that detects the change.
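
A minimal sketch of heartbeat-based group membership follows. It is an in-memory illustration only: `TinyGroup`, its methods, the server names, and the 10-second timeout are assumptions for this example, and time is passed in explicitly as `now` instead of being read from a real clock. It mirrors the idea that a member stays in the group only as long as its heartbeats keep arriving.

```python
from typing import Dict, List


class TinyGroup:
    """Hypothetical in-memory stand-in for a group node and its members."""

    def __init__(self, session_timeout: float) -> None:
        self.session_timeout = session_timeout
        self._last_heartbeat: Dict[str, float] = {}

    def join(self, member: str, now: float) -> None:
        self._last_heartbeat[member] = now

    def heartbeat(self, member: str, now: float) -> None:
        # Only current members can refresh their session.
        if member in self._last_heartbeat:
            self._last_heartbeat[member] = now

    def leave(self, member: str) -> None:
        self._last_heartbeat.pop(member, None)

    def members(self, now: float) -> List[str]:
        # Drop every member whose last heartbeat is older than the timeout.
        expired = [m for m, t in self._last_heartbeat.items()
                   if now - t > self.session_timeout]
        for m in expired:
            del self._last_heartbeat[m]
        return sorted(self._last_heartbeat)


group = TinyGroup(session_timeout=10.0)
group.join("server-1", now=0.0)
group.join("server-2", now=0.0)
before = group.members(now=5.0)        # both servers are still alive
group.heartbeat("server-1", now=8.0)   # server-2 stays silent from here on
after = group.members(now=12.0)        # server-2 has timed out
print(before, after)
```

Listing the members is just reading who is still present, so the other servers in the cluster see a crashed member disappear without that member having to announce it.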





These are only a few of my own views on ZooKeeper. If you have any questions, feel free to raise them.

