It's nice to see that ZooKeeper, donated by Yahoo, has migrated from SourceForge to Apache and become a subproject of Hadoop. So what is ZooKeeper? ZooKeeper is often described as an open-source implementation of Google's Chubby: an efficient and reliable coordination system. It can be used for leader election, configuration maintenance, and so on. In a distributed environment we often need to elect a single master instance, or store some configuration information, in order to keep file writes consistent. For watches, ZooKeeper guarantees the following 3 points:
Watches are ordered with respect to other events, other watches, and asynchronous replies. The ZooKeeper client libraries ensure that everything is dispatched in order.
A client will see a watch event for a znode it is watching before seeing the new data that corresponds to that znode.
The order of watch events from ZooKeeper corresponds to the order of the updates as seen by the ZooKeeper service.
In ZooKeeper, a znode is a node addressed by a path, much like a path in a UNIX file system, where data can be stored and retrieved. If the ephemeral flag is set when a znode is created, the znode is removed from ZooKeeper once the client that created it disconnects. ZooKeeper uses watchers to deliver event information: when a client receives an event such as a connection timeout, a change to a node's data, or a change to a node's children, it can invoke the corresponding handler to process it. ZooKeeper's wiki page shows how to use ZooKeeper to implement event notifications, queues, priority queues, locks, shared locks, revocable shared locks, and two-phase commit.
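To make the ephemeral flag and the watcher idea concrete, here is a minimal sketch. It is a toy in-memory model, not the ZooKeeper client API: the class `ToyZnodeStore`, its methods, and the event string are all hypothetical names invented for illustration, and the model only covers deletion events.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

/** Toy in-memory stand-in for a znode tree. This is NOT the ZooKeeper client API. */
class ToyZnodeStore {
    /** A stored node: its data plus the session that owns it (null = persistent). */
    private static final class Node {
        final byte[] data;
        final String ownerSession;

        Node(byte[] data, String ownerSession) {
            this.data = data;
            this.ownerSession = ownerSession;
        }
    }

    private final Map<String, Node> nodes = new HashMap<>();
    private final Map<String, List<Consumer<String>>> watches = new HashMap<>();

    /** Creates a node; pass a session id to make it ephemeral, or null for persistent. */
    void create(String path, byte[] data, String ephemeralOwner) {
        if (nodes.containsKey(path)) {
            throw new IllegalStateException("node already exists: " + path);
        }
        nodes.put(path, new Node(data, ephemeralOwner));
    }

    /** Returns the node's data, or null if the node does not exist. */
    byte[] getData(String path) {
        Node node = nodes.get(path);
        return node == null ? null : node.data;
    }

    /** Returns whether the node exists and optionally leaves a one-shot watch on it. */
    boolean exists(String path, Consumer<String> watcher) {
        if (watcher != null) {
            watches.computeIfAbsent(path, k -> new ArrayList<>()).add(watcher);
        }
        return nodes.containsKey(path);
    }

    /** Ends a session: its ephemeral nodes are deleted and their watches fire once. */
    void closeSession(String session) {
        List<String> doomed = new ArrayList<>();
        for (Map.Entry<String, Node> entry : nodes.entrySet()) {
            if (session.equals(entry.getValue().ownerSession)) {
                doomed.add(entry.getKey());
            }
        }
        for (String path : doomed) {
            nodes.remove(path);
            // One-shot: the watch is forgotten as soon as it has fired.
            List<Consumer<String>> fired = watches.remove(path);
            if (fired != null) {
                for (Consumer<String> watcher : fired) {
                    watcher.accept("NodeDeleted " + path);
                }
            }
        }
    }
}
```

In this model, a node created with a session id disappears when `closeSession` is called for that session, and any watcher left on it is notified exactly once, while nodes created with a null owner stay in place.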
So what can ZooKeeper do for us? A simple example: let's say we have 20 search engine servers (each responsible for searching a portion of the total index), a master server (responsible for sending a search request to the 20 search engine servers and merging the result sets), a standby master server (responsible for replacing the master server when it goes down), and a web CGI (which sends search requests to the master server). At the moment, 15 of the search engine servers are providing search service and 5 are generating indexes. These 20 servers switch roles frequently: a server that is providing search service stops doing so in order to start generating indexes, or a server that has finished generating indexes goes back to providing search service. With ZooKeeper, the master server can automatically sense which servers are currently providing search service and send search requests only to those servers, the standby master server can be enabled automatically when the master server goes down, and the web CGI can automatically learn when the master server's network address changes.
1. Each server that provides search service creates a znode in ZooKeeper: zk.create("/search/nodes/node1", "hostname".getBytes(), Ids.OPEN_ACL_UNSAFE, CreateFlags.EPHEMERAL);
2. The master server gets the list of child znodes from ZooKeeper: zk.getChildren("/search/nodes", true);
3. The master server traverses these child nodes and reads each one's data to build the list of servers currently providing search service. When the master server receives an event saying that the child nodes have changed, it goes back to step 2.
4. The master server creates its own node in ZooKeeper: zk.create("/search/master", "hostname".getBytes(), Ids.OPEN_ACL_UNSAFE, CreateFlags.EPHEMERAL);
5. The standby master server watches the "/search/master" node in ZooKeeper. When this znode changes, it turns itself into the master server and writes its own network address into the node.
6. The web CGI reads the master server's network address from the "/search/master" node in ZooKeeper and sends search requests to it. The web CGI also watches the "/search/master" node; when this znode changes, it reads the new network address from the node and switches to it.
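The workflow above can be sketched end to end with a toy in-memory model. This is not the ZooKeeper client API: `ToySearchRegistry`, `ToySearchDemo`, the host names, and the session ids are all hypothetical, and the model simplifies watches to a callback that runs after every change (real ZooKeeper watches are one-shot and set per znode). It only shows how ephemeral nodes under "/search/nodes" and "/search/master" give live membership and failover.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Toy in-memory model of the znodes used above. This is NOT the ZooKeeper client API. */
class ToySearchRegistry {
    // path -> {data, owning session}; every node in this model is ephemeral
    private final Map<String, String[]> nodes = new TreeMap<>();
    private final List<Runnable> watchers = new ArrayList<>();

    /** Registers a callback that runs after every change (a simplification of watches). */
    void watch(Runnable callback) {
        watchers.add(callback);
    }

    /** Creates an ephemeral node; returns false if the path is already taken. */
    boolean createEphemeral(String path, String data, String session) {
        if (nodes.containsKey(path)) {
            return false;
        }
        nodes.put(path, new String[] {data, session});
        notifyWatchers();
        return true;
    }

    /** Data stored under each direct child of parent, in path order. */
    List<String> childData(String parent) {
        List<String> result = new ArrayList<>();
        String prefix = parent + "/";
        for (Map.Entry<String, String[]> entry : nodes.entrySet()) {
            String path = entry.getKey();
            if (path.startsWith(prefix) && path.indexOf('/', prefix.length()) < 0) {
                result.add(entry.getValue()[0]);
            }
        }
        return result;
    }

    /** Data stored at path, or null if the node does not exist. */
    String data(String path) {
        String[] node = nodes.get(path);
        return node == null ? null : node[0];
    }

    /** Simulates a client disconnecting: all of its ephemeral nodes vanish. */
    void expire(String session) {
        boolean changed = nodes.values().removeIf(node -> node[1].equals(session));
        if (changed) {
            notifyWatchers();
        }
    }

    private void notifyWatchers() {
        for (Runnable watcher : new ArrayList<>(watchers)) {
            watcher.run();
        }
    }
}

class ToySearchDemo {
    static final String MASTER = "/search/master";
    static final String NODES = "/search/nodes";

    /** Runs the scenario and records what the web CGI would see before and after failures. */
    static List<String> run() {
        ToySearchRegistry zk = new ToySearchRegistry();
        List<String> log = new ArrayList<>();

        // The active server claims /search/master with its address.
        zk.createEphemeral(MASTER, "master-host", "master-session");
        // The standby tries to claim /search/master on every change; it only
        // succeeds once the current owner's node has disappeared.
        zk.watch(() -> zk.createEphemeral(MASTER, "standby-host", "standby-session"));

        // Two search servers register themselves under /search/nodes.
        zk.createEphemeral(NODES + "/node1", "host1", "s1");
        zk.createEphemeral(NODES + "/node2", "host2", "s2");
        log.add(zk.childData(NODES) + " via " + zk.data(MASTER));

        zk.expire("s2");              // node2 leaves to generate an index
        zk.expire("master-session");  // the active server goes down
        log.add(zk.childData(NODES) + " via " + zk.data(MASTER));
        return log;
    }
}
```

Calling `ToySearchDemo.run()` returns two entries: the first lists both search hosts reached through `master-host`, and the second lists only `host1` reached through `standby-host`, since the standby's callback claims the vacant "/search/master" path as soon as the previous owner's session expires.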
In my test, I ran a ZooKeeper cluster of 3 ZooKeeper nodes: one leader and two followers. After I stopped the leader, the two followers elected a new leader, and the data I read back was unchanged. I think ZooKeeper can help Hadoop in the following ways:
Hadoop: use ZooKeeper's event handling to ensure that the entire cluster has only one NameNode, to store configuration information, and so on.
HBase: use ZooKeeper's event handling to ensure that the entire cluster has only one HMaster, to detect HRegionServers coming online and going down, to store access control lists, and so on.
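As a side note on the three-node test above: a ZooKeeper ensemble stays available as long as a majority of its servers are running, which is why the two remaining followers could still elect a leader. The arithmetic can be sketched as follows; `QuorumMath` and its method names are hypothetical helpers for illustration, not part of ZooKeeper.

```java
/** Illustrative majority arithmetic; not part of the ZooKeeper API. */
class QuorumMath {
    /** Smallest number of servers that forms a strict majority of the ensemble. */
    static int majority(int ensembleSize) {
        return ensembleSize / 2 + 1;
    }

    /** True if the servers still running can form a majority. */
    static boolean hasQuorum(int ensembleSize, int alive) {
        return alive >= majority(ensembleSize);
    }
}
```

With 3 servers the majority is 2, so losing one server is tolerated, while losing two is not.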
Zookeeper Doc:
Zookeeper.pdf
Zookeeper Video:
A bit of gossip: on the ZooKeeper wiki I saw a page called "The Tao of ZooKeeper" (the ZooKeeper way). Could it be that Yahoo's ZooKeeper team has Chinese members as well?