ZooKeeper Getting Started Guide
Getting Started: Coordinating Distributed Applications with ZooKeeper
This document contains information to get you started quickly with ZooKeeper. It is aimed primarily at developers hoping to try it out, and contains simple installation instructions for a single ZooKeeper server, a few commands to verify that it is running, and a simple programming example. Finally, as a convenience, there are a few sections regarding more complicated installations, for example running replicated deployments and optimizing the transaction log. However, for the complete instructions for commercial deployments, please refer to the ZooKeeper Administrator's Guide.
Pre-requisites
See System Requirements in the Admin Guide.
Download
To get a ZooKeeper distribution, download a recent stable release from one of the Apache download mirrors.
Standalone operation
Setting up a ZooKeeper server in standalone mode is straightforward. The server is contained in a single JAR file, so installation consists of creating a configuration.
Once you've downloaded a stable ZooKeeper release, unpack it and cd to the root.
To start ZooKeeper you need a configuration file. Here is a sample; create it in conf/zoo.cfg:
tickTime=2000
dataDir=/var/lib/zookeeper
clientPort=2181
This file can be called anything, but for the sake of this discussion call it conf/zoo.cfg. Change the value of dataDir to specify an existing (empty to start with) directory. Here are the meanings for each of the fields:
tickTime
The basic time unit, in milliseconds, used by ZooKeeper. It is used for heartbeats, and the minimum session timeout is twice the tickTime.
dataDir
The location to store the in-memory database snapshots and, unless specified otherwise, the transaction log of updates to the database.
clientPort
The port to listen on for client connections.
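As a rough illustration of the key=value layout used by the sample file, the sketch below reads such text into a dictionary and checks the three fields used in this guide. It is not ZooKeeper's own parser or validation logic: the function names parse_zoo_cfg and check_standalone are hypothetical, and treating all three fields as required is an assumption made only for this example.

```python
def parse_zoo_cfg(text):
    """Parse key=value lines, skipping blank lines and '#' comments."""
    settings = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"not a key=value line: {raw_line!r}")
        settings[key.strip()] = value.strip()
    return settings


def check_standalone(settings):
    """Return a list of problems found (hypothetical checks, not ZooKeeper's)."""
    problems = []
    # Assumption for this sketch: expect exactly the fields the sample uses.
    for key in ("tickTime", "dataDir", "clientPort"):
        if key not in settings:
            problems.append(f"missing {key}")
    for key in ("tickTime", "clientPort"):
        if key in settings and not settings[key].isdigit():
            problems.append(f"{key} must be a whole number")
    return problems


SAMPLE = """\
tickTime=2000
dataDir=/var/lib/zookeeper
clientPort=2181
"""

settings = parse_zoo_cfg(SAMPLE)
print(settings["dataDir"], check_standalone(settings))
```

A check like this can catch a mistyped key before the server is started, but the server's own startup messages remain the authority on whether a configuration is valid.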
Now that you have created the configuration file, you can start ZooKeeper:
bin/zkServer.sh start
ZooKeeper logs messages using log4j; more detail is available in the Logging section of the Programmer's Guide. You will see log messages coming to the console (default) and/or a log file depending on the log4j configuration.
The steps outlined here run ZooKeeper in standalone mode. There is no replication, so if the ZooKeeper process fails, the service will go down. This is fine for most development situations, but to run ZooKeeper in replicated mode, please see Running Replicated ZooKeeper.
Managing ZooKeeper Storage
For long-running production systems, ZooKeeper storage must be managed externally (dataDir and logs). See the section on maintenance for more details.
Connecting to ZooKeeper
Once ZooKeeper is running, you have several options for connecting to it:
- Java: Use
bin/zkCli.sh -server 127.0.0.1:2181
This lets you perform simple, file-like operations.
- C: Compile cli_mt (multi-threaded) or cli_st (single-threaded) by running make cli_mt or make cli_st in the src/c subdirectory in the ZooKeeper sources. See the README contained within src/c for full details.
You can run the program from src/c using:
LD_LIBRARY_PATH=. cli_mt 127.0.0.1:2181
or
LD_LIBRARY_PATH=. cli_st 127.0.0.1:2181
This will give you a simple shell to execute file-system-like operations on ZooKeeper.
Once you have connected, you should see something like:
Connecting to localhost:2181
log4j:WARN No appenders could be found for logger (org.apache.zookeeper.ZooKeeper).
log4j:WARN Please initialize the log4j system properly.
Welcome to ZooKeeper!
JLine support is enabled
[zkshell: 0]
From the shell, type help to get a listing of commands that can be executed from the client, as in:
[zkshell: 0] help
ZooKeeper host:port cmd args
        get path [watch]
        ls path [watch]
        set path data [version]
        delquota [-n|-b] path
        quit
        printwatches on|off
        create path data acl
        stat path [watch]
        listquota path
        history
        setAcl path acl
        getAcl path
        sync path
        redo cmdno
        addauth scheme auth
        delete path [version]
        deleteall path
        setquota -n|-b val path
From here, you can try a few simple commands to get a feel for this command line interface. First, start by issuing the list command, as in ls /, yielding:
[zkshell: 8] ls /
[zookeeper]
Next, create a new znode by running create /zk_test my_data. This creates a new znode and associates the string "my_data" with the node. You should see:
[zkshell: 9] create /zk_test my_data
Created /zk_test
Issue another ls / command to see what the directory looks like:
[zkshell: 11] ls /
[zookeeper, zk_test]
Notice that the zk_test directory has now been created.
Next, verify that the data is associated with the znode by running the get command, as in:
[zkshell: 12] get /zk_test
my_data
cZxid = 5
ctime = Fri June 13:57:06 PDT 2009
mZxid = 5
mtime = Fri June 13:57:06 PDT 2009
pZxid = 5
cversion = 0
dataVersion = 0
aclVersion = 0
ephemeralOwner = 0
dataLength = 7
numChildren = 0
We can change the data associated with zk_test by issuing the set command, as in:
[zkshell: 14] set /zk_test junk
cZxid = 5
ctime = Fri June 13:57:06 PDT 2009
mZxid = 6
mtime = Fri June 14:01:52 PDT 2009
pZxid = 5
cversion = 0
dataVersion = 1
aclVersion = 0
ephemeralOwner = 0
dataLength = 4
numChildren = 0
[zkshell: 15] get /zk_test
junk
cZxid = 5
ctime = Fri June 13:57:06 PDT 2009
mZxid = 6
mtime = Fri June 14:01:52 PDT 2009
pZxid = 5
cversion = 0
dataVersion = 1
aclVersion = 0
ephemeralOwner = 0
dataLength = 4
numChildren = 0
(Notice we did a get after setting the data and it did, indeed, change.)
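The two stat fields that differ between the get transcripts, dataVersion and dataLength, can be illustrated with a toy in-memory model. This is not ZooKeeper code and not its client API: the ToyZnode class is hypothetical and only mirrors the values shown in the transcripts, on the assumption that dataVersion counts the number of set operations and dataLength is the byte length of the stored data.

```python
class ToyZnode:
    """Hypothetical stand-in for a znode; tracks data and a version counter."""

    def __init__(self, data):
        self.data = data
        self.data_version = 0  # a freshly created node starts at version 0

    def set(self, data):
        """Replace the data and bump the version, as the set transcript shows."""
        self.data = data
        self.data_version += 1

    def stat(self):
        """Return only the two fields this sketch models."""
        return {
            "dataVersion": self.data_version,
            "dataLength": len(self.data.encode("utf-8")),
        }


node = ToyZnode("my_data")
print(node.stat())  # {'dataVersion': 0, 'dataLength': 7}
node.set("junk")
print(node.stat())  # {'dataVersion': 1, 'dataLength': 4}
```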
Finally, let's delete the node by issuing:
[zkshell: 16] delete /zk_test
[zkshell: 17] ls /
[zookeeper]
[zkshell: 18]
That's it for now. To explore more, continue with the rest of this document and see the Programmer's Guide.
Programming to ZooKeeper
ZooKeeper has Java bindings and C bindings. They are functionally equivalent. The C bindings exist in two variants: single-threaded and multi-threaded. These differ only in how the messaging loop is done. For more information, see the Programming Examples in the ZooKeeper Programmer's Guide for sample code using the different APIs.
Running Replicated ZooKeeper
Running ZooKeeper in standalone mode is convenient for evaluation, some development, and testing. But in production, you should run ZooKeeper in replicated mode. A replicated group of servers in the same application is called a quorum, and in replicated mode, all servers in the quorum have copies of the same configuration file. The file is similar to the one used in standalone mode, but with a few differences. Here is an example:
tickTime=2000
dataDir=/var/lib/zookeeper
clientPort=2181
initLimit=5
syncLimit=2
server.1=zoo1:2888:3888
server.2=zoo2:2888:3888
server.3=zoo3:2888:3888
The new entry, initLimit, is a timeout ZooKeeper uses to limit the length of time the ZooKeeper servers in the quorum have to connect to a leader. The entry syncLimit limits how far out of date a server can be from a leader.
With both of these timeouts, you specify the unit of time using tickTime. In this example, the timeout for initLimit is 5 ticks at 2000 milliseconds a tick, or 10 seconds.
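The arithmetic behind these timeouts is a single multiplication, sketched here with the values from the example configuration. The helper name limit_in_seconds is hypothetical and exists only for this illustration.

```python
def limit_in_seconds(ticks, tick_time_ms):
    """Convert a limit expressed in ticks to seconds, given tickTime in ms."""
    return ticks * tick_time_ms / 1000


tick_time_ms = 2000  # tickTime from the example configuration

print(limit_in_seconds(5, tick_time_ms))  # initLimit=5 -> prints 10.0
print(limit_in_seconds(2, tick_time_ms))  # syncLimit=2 -> prints 4.0
```

Because both limits are expressed in ticks, changing tickTime alone scales both timeouts.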
The entries of the form server.X list the servers that make up the ZooKeeper service. When the server starts up, it knows which server it is by looking for the file myid in the data directory. That file contains the server number, in ASCII.
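One way to create that file is sketched below. The helper names write_myid and read_myid are hypothetical, and a temporary directory stands in for the real data directory so the example can run anywhere; on a real server the path would be the dataDir from that server's configuration file.

```python
import tempfile
from pathlib import Path


def write_myid(data_dir, server_id):
    """Write the server number as ASCII text to <data_dir>/myid."""
    path = Path(data_dir) / "myid"
    path.write_text(f"{int(server_id)}\n", encoding="ascii")
    return path


def read_myid(data_dir):
    """Read the server number back from <data_dir>/myid."""
    return int((Path(data_dir) / "myid").read_text(encoding="ascii").strip())


# Assumption: a temporary directory stands in for dataDir in this sketch.
with tempfile.TemporaryDirectory() as data_dir:
    write_myid(data_dir, 1)  # this machine would be server.1
    print(read_myid(data_dir))  # prints 1
```

The number written must match the X of the server.X entry that describes the same machine.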
Finally, note the two port numbers after each server name: "2888" and "3888". Peers use the former port to connect to other peers. Such a connection is necessary so that peers can communicate, for example, to agree upon the order of updates. More specifically, a ZooKeeper server uses this port to connect followers to the leader. When a new leader arises, a follower opens a TCP connection to the leader using this port. Because the default leader election also uses TCP, we currently require another port for leader election. This is the second port in the server entry.
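To make the layout of a server entry concrete, the sketch below splits one line into its parts. The names ServerEntry and parse_server_entry are hypothetical, and the code handles only the host:port:port form shown in this guide, not any other variants ZooKeeper may accept.

```python
from typing import NamedTuple


class ServerEntry(NamedTuple):
    server_id: int      # the X in server.X
    host: str
    peer_port: int      # first port: followers connect to the leader
    election_port: int  # second port: leader election


def parse_server_entry(line):
    """Parse a line such as 'server.1=zoo1:2888:3888'."""
    key, _, value = line.partition("=")
    prefix, _, server_id = key.strip().partition(".")
    if prefix != "server":
        raise ValueError(f"not a server entry: {line!r}")
    host, peer_port, election_port = value.strip().split(":")
    return ServerEntry(int(server_id), host, int(peer_port), int(election_port))


entry = parse_server_entry("server.1=zoo1:2888:3888")
print(entry.host, entry.peer_port, entry.election_port)  # zoo1 2888 3888
```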
Note
If you want to test multiple servers on a single machine, specify the servername as localhost with unique quorum and leader election ports (i.e. 2888:3888, 2889:3889, 2890:3890 in the example above) for each server.X in that server's config file. Of course, separate dataDirs and distinct clientPorts are also necessary (in the above replicated example, running on a single localhost, you would still have three config files).
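The three config files for such a single-machine test could be generated along the following lines. This is only a sketch: the function name single_host_configs, the file names zoo1.cfg to zoo3.cfg, the base directory /tmp/zookeeper, and the client ports 2181 to 2183 are all assumptions chosen for illustration; only the quorum and election port pairs come from the note above.

```python
def single_host_configs(count, base_dir="/tmp/zookeeper"):
    """Build one config text per server, all running on localhost."""
    # The server list is shared: 2888:3888, 2889:3889, 2890:3890, ...
    servers = [
        f"server.{i}=localhost:{2887 + i}:{3887 + i}"
        for i in range(1, count + 1)
    ]
    configs = {}
    for i in range(1, count + 1):
        lines = [
            "tickTime=2000",
            f"dataDir={base_dir}/zoo{i}",  # separate data directory per server
            f"clientPort={2180 + i}",      # distinct client port per server
            "initLimit=5",
            "syncLimit=2",
            *servers,
        ]
        configs[f"zoo{i}.cfg"] = "\n".join(lines) + "\n"
    return configs


for name, text in single_host_configs(3).items():
    print(f"--- {name}")
    print(text)
```

Each generated dataDir would still need its own myid file containing the matching server number before the servers are started.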
Other optimizations
There are a couple of other configuration parameters that can greatly increase performance:
To get low latencies on updates, it is important to have a dedicated transaction log directory. By default, transaction logs are put in the same directory as the data snapshots and the myid file. The dataLogDir parameter indicates a different directory to use for the transaction logs.
[TBD: what is the other config param?]