ZooKeeper Operation and Maintenance

Source: Internet
Author: User

ZooKeeper has many pitfalls when you program against it, and its API is hard to use, but the ZooKeeper service itself is very reliable. As a result, there are relatively few articles online about operating and maintaining it.

A reliable service does not mean there will never be trouble, though. Below is a summary of things related to ZooKeeper operations.

Important reference materials

Here is a good PDF that covers a lot about ZooKeeper; its author is one of ZooKeeper's committers:
In addition, here is a summary: http://marcin.cylke.com.pl/blog/2013/03/21/zookeeper-tips/
Configuring ZooKeeper to start at boot

First, modify bin/zkEnv.sh to set the ZOO_LOG_DIR and ZOO_LOG4J_PROP environment variables. ZOO_LOG_DIR is the directory ZooKeeper writes its logs to, and ZOO_LOG4J_PROP is the log4j logging configuration:

if [ "x${ZOO_LOG_DIR}" = "x" ]
then
    ZOO_LOG_DIR="$ZOOBINDIR/../logs"
fi

if [ "x${ZOO_LOG4J_PROP}" = "x" ]
then
    ZOO_LOG4J_PROP="INFO,ROLLINGFILE"
fi

Add a zookeeper1 file in the /etc/init.d directory and make it executable:

chmod +x zookeeper1
The contents of zookeeper1 are:

#!/bin/bash
# chkconfig: 2345 20 90
# (runlevels 2345; 20 and 90 are example start/stop priorities, adjust as needed)
# description: zookeeper1
case "$1" in
    start)   su zookeeper -c "/home/zookeeper/zookeeper345_1/bin/zkServer.sh start" ;;
    stop)    su zookeeper -c "/home/zookeeper/zookeeper345_1/bin/zkServer.sh stop" ;;
    status)  su zookeeper -c "/home/zookeeper/zookeeper345_1/bin/zkServer.sh status" ;;
    restart) su zookeeper -c "/home/zookeeper/zookeeper345_1/bin/zkServer.sh restart" ;;
    *)       echo "Usage: $0 {start|stop|status|restart}"; exit 1 ;;
esac
Finally, run chkconfig --add zookeeper1 to register the service, and you are done. Note that su zookeeper runs the commands as the zookeeper user.
If you want to start ZooKeeper with Upstart instead, see: http://blog.csdn.net/hengyunabc/article/details/18967627

ZooKeeper's VIRT (virtual memory) footprint is very large:

This is related to how ZooKeeper is implemented; see: http://zookeeper-user.578899.n2.nabble.com/setting-zookeeper-heap-size-td6983511.html

On our production servers, ZooKeeper's VIRT exceeds 30 GB, while data and datalog together take only a few hundred MB. It has never caused any problems, though.

"Unreasonable length" error: https://issues.apache.org/jira/browse/ZOOKEEPER-1513
Production runs version 3.4.5, which is also ZooKeeper's latest release; there has been no new release for about a year.
This error can occur when a client tries to write more than about 1 MB of data to a znode.
To raise this default limit, set the jute.maxbuffer Java system property. It is passed to the JVM with -D and is not a shell environment variable. Reference: http://zookeeper.apache.org/doc/r3.3.3/zookeeperAdmin.html
In our production environment, however, the error was caused by a port-scanning tool, which is rather odd: once the port scanner was stopped, the problem went away.
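Here is a minimal sketch of raising the limit. It assumes a 3.4.x layout in which bin/zkEnv.sh sources conf/java.env when that file exists. The helper name and the 4 MB value are illustrative and do not come from the original setup. The admin guide says jute.maxbuffer must be set on all servers and clients, so client JVMs need the same -D option.

```shell
#!/bin/sh
# Hypothetical helper: raise jute.maxbuffer for a ZooKeeper server by
# appending to conf/java.env, which zkEnv.sh sources at startup (3.4.x).
# Usage: set_jute_maxbuffer <conf_dir> <bytes>
set_jute_maxbuffer() {
    jm_conf_dir=$1
    jm_bytes=$2
    # Reject anything that is not a plain non-negative integer.
    case $jm_bytes in
        ''|*[!0-9]*) echo "bytes must be a number" >&2; return 1 ;;
    esac
    # Single quotes keep $JVMFLAGS literal, so it expands only when java.env
    # is sourced, preserving any flags that were set earlier.
    printf 'export JVMFLAGS="-Djute.maxbuffer=%s $JVMFLAGS"\n' "$jm_bytes" \
        >> "$jm_conf_dir/java.env"
}

# Example (then restart the server):
# set_jute_maxbuffer /home/zookeeper/zookeeper345_1/conf 4194304
```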

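Number of watches: Dubbo sets a watch on every node, so the number of watches grows large. Thousands is common.
Use the wchs, wchc and wchp commands to view watch information. wchs shows a summary that includes the total number of watches, wchp lists watches by path, and wchc lists them by client session. The admin guide warns that wchc and wchp can be expensive when there are many watches.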
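If a script needs to track the watch count (for example, a monitoring job), it can pull the total out of the wchs response. This sketch assumes the 3.4.x output format, which ends with a line of the form Total watches:N. The function name is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read `wchs` output on stdin and print the total
# number of watches. Assumes the 3.4.x format:
#   <n> connections watching <m> paths
#   Total watches:<k>
wchs_total() {
    sed -n 's/^Total watches: *\([0-9][0-9]*\).*$/\1/p'
}

# Example against a live server (requires nc):
# echo wchs | nc localhost 2181 | wchs_total
```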

Finding out why startup failed:

ZooKeeper can fail to start for many reasons. Running:

./zkServer.sh start-foreground
shows the exceptions thrown during startup and while the server runs.
In addition, running:
./zkServer.sh print-cmd
shows the parameters ZooKeeper starts with, including the Java path, which also makes problems easier to find.

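Configuring automatic log cleanup:

Starting with 3.4.0, ZooKeeper can automatically purge old snapshots and transaction logs. The feature is disabled by default, so you have to turn it on explicitly.

Configure the autopurge.snapRetainCount and autopurge.purgeInterval parameters.
autopurge.snapRetainCount is the number of snapshots to retain; the default is 3, which is also the minimum. autopurge.purgeInterval is the purge interval in hours; its default of 0 leaves purging disabled.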

Reference here: http://nileader.blog.51cto.com/1381108/932156
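Here is a minimal sketch of enabling auto-purge by appending both parameters to zoo.cfg. The helper name and the 24-hour interval are example choices. Restart the server so it picks up the change.

```shell
#!/bin/sh
# Hypothetical helper: enable ZooKeeper (3.4.0+) auto-purge in zoo.cfg.
# Keeps the 3 most recent snapshots (the minimum) and purges every 24 hours;
# autopurge.purgeInterval is in hours and its default of 0 disables purging.
# Usage: enable_autopurge <path/to/zoo.cfg>
enable_autopurge() {
    ap_cfg=$1
    # Leave the file alone if a purge interval is already configured.
    if grep -q '^autopurge\.purgeInterval=' "$ap_cfg"; then
        return 0
    fi
    cat >> "$ap_cfg" <<'EOF'
autopurge.snapRetainCount=3
autopurge.purgeInterval=24
EOF
}

# Example: enable_autopurge /home/zookeeper/zookeeper345_1/conf/zoo.cfg
```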

Also note that restarting ZooKeeper clears zookeeper.out, because zkServer.sh redirects output into it with >, which truncates the file. If an error occurs, back the file up before restarting.

Configuring the zookeeper.out location and log4j rolling log output

Today we found that bin/zookeeper.out in production had grown to 6 GB. Looking at the zkServer.sh code, zookeeper.out is actually the nohup output.

nohup's output is simply the process's stdout and stderr, so the root cause lies in ZooKeeper's own log configuration.

After studying bin/zkServer.sh and conf/log4j.properties, we found that ZooKeeper already has log output configuration built in. All you need to do is define the relevant variables.

The two key environment variables are ZOO_LOG_DIR and ZOO_LOG4J_PROP:

In zkServer.sh:

if [ ! -w "$ZOO_LOG_DIR" ] ; then
    mkdir -p "$ZOO_LOG_DIR"
fi

_ZOO_DAEMON_OUT="$ZOO_LOG_DIR/zookeeper.out"

    nohup "$JAVA" "-Dzookeeper.log.dir=${ZOO_LOG_DIR}" "-Dzookeeper.root.logger=${ZOO_LOG4J_PROP}" \
    -cp "$CLASSPATH" $JVMFLAGS $ZOOMAIN "$ZOOCFG" > "$_ZOO_DAEMON_OUT" 2>&1 < /dev/null &

In log4j.properties:

# Add ROLLINGFILE to rootLogger to get log file output
#    Log DEBUG level and above messages to a log file
log4j.appender.ROLLINGFILE.Threshold=${zookeeper.log.threshold}

zkServer.sh also sources zkEnv.sh.

So it is enough to modify bin/zkEnv.sh:


if [ "x${ZOO_LOG_DIR}" = "x" ]
then
    ZOO_LOG_DIR="$ZOOBINDIR/../logs"
fi
if [ "x${ZOO_LOG4J_PROP}" = "x" ]
then
    ZOO_LOG4J_PROP="INFO,ROLLINGFILE"
fi
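zkEnv.sh only fills in these two variables when they are empty, so you can also supply them for a single launch without editing the file. The sketch below assumes zk_home is the ZooKeeper installation directory; the wrapper name and log directory are hypothetical. With this wrapper, zookeeper.out and the ROLLINGFILE log both end up in the chosen directory.

```shell
#!/bin/sh
# Hypothetical wrapper: start ZooKeeper with the log directory and root
# logger chosen at launch time. zkEnv.sh keeps non-empty ZOO_LOG_DIR and
# ZOO_LOG4J_PROP values, so these override its defaults for this run only.
# Usage: zk_start_logged <zk_home> <log_dir>
zk_start_logged() {
    sl_home=$1
    sl_log_dir=$2
    mkdir -p "$sl_log_dir" || return 1
    ZOO_LOG_DIR=$sl_log_dir ZOO_LOG4J_PROP=INFO,ROLLINGFILE \
        "$sl_home/bin/zkServer.sh" start
}

# Example:
# zk_start_logged /home/zookeeper/zookeeper345_1 /var/log/zookeeper
```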

You can also edit conf/log4j.properties to keep at most 10 rolling log files:

# Max log file size of 10MB
log4j.appender.ROLLINGFILE.MaxFileSize=10MB
# uncomment the next line to limit number of backup files
log4j.appender.ROLLINGFILE.MaxBackupIndex=10

"Too many connections from" error

This error occurs when a single IP has more than 60 socket connections to a ZooKeeper server. By default, the server allows at most 60 connections per client IP.

We hit it on a test server because too many processes were running on that server.
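To check which client addresses are near the limit, you can count the established connections on the client port per IP. This sketch assumes IPv4 peers, the default client port 2181, and netstat -tn output in which column 4 is the local address, column 5 the peer address, and column 6 the state. The function name is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read "ip:port" peer addresses on stdin (one per line)
# and print "<count> <ip>" for every IP with at least <limit> connections.
# Usage: conns_over_limit [limit]   (limit defaults to 60)
conns_over_limit() {
    awk -v limit="${1:-60}" '
        {
            ip = $1
            sub(/:[0-9]+$/, "", ip)   # strip the trailing :port (IPv4 form)
            count[ip]++
        }
        END {
            for (ip in count)
                if (count[ip] >= limit) print count[ip], ip
        }
    '
}

# Example on a ZooKeeper server (established connections to port 2181):
# netstat -tn | awk '$4 ~ /:2181$/ && $6 == "ESTABLISHED" { print $5 }' | conns_over_limit 50
```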

The fix is to raise the maxClientCnxns setting in zoo.cfg; a value of 0 removes the limit. See:

http://zookeeper.apache.org/doc/current/zookeeperAdmin.html#sc_advancedConfiguration

"This ZooKeeper instance is not currently serving requests" error

This error appears when fewer than a majority of the servers in the cluster are running, for example when only one is left.

Typically, it is reported when only the first ZooKeeper server of a cluster has been started.

The ZooKeeper server log will contain entries like:

Exception causing close of session 0x0 due to java.io.IOException: ZooKeeperServer not running
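The underlying rule is that an ensemble serves requests only while a strict majority of its configured servers are up. Here is the arithmetic as a small sketch; the function names are hypothetical.

```shell
#!/bin/sh
# Smallest number of servers that forms a majority of an ensemble of size n.
quorum_size() {
    echo $(( $1 / 2 + 1 ))
}

# Succeeds (exit 0) if <running> servers out of <total> can form a quorum.
has_quorum() {
    [ "$1" -ge "$(quorum_size "$2")" ]
}

# Examples:
# quorum_size 3   -> 2   (a 3-server ensemble tolerates 1 failure)
# quorum_size 5   -> 3   (a 5-server ensemble tolerates 2 failures)
# has_quorum 1 3  -> fails, matching "not currently serving requests"
```

The same arithmetic shows why a 4-server ensemble (quorum of 3) tolerates no more failures than a 3-server one.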
Slow ZooKeeper connections, slow Dubbo initialization and slow application startup

After the offline environment was migrated to new machines, application startup became slow: applications that used to start in about 10 seconds took several minutes.

There were no errors during startup, but the Dubbo registration log lines scrolled by very slowly.

We first suspected the network, but iptables was not enabled, and iptraf showed that traffic was low. The machines also had enough free memory.

Next we checked the ZooKeeper configuration, disk space, the applications' Dubbo configuration and the JVM configuration, and found no problems.

Running out of ideas, we profiled with JProfiler and found that the call "org.I0Itec.zkclient.ZkClient$1.call" was taking much more time than expected.

This confirmed that ZooKeeper itself was slow and that the application was not at fault.

Testing with the ZooKeeper benchmark tool described below showed that reads were reasonably fast, but create/write operations were very slow, with QPS in the single digits.

We then asked our operations colleagues and learned that the new machines used a shared disk, which was very slow.

ZooKeeper writes every write request to its transaction log and flushes it to disk, so writes were very slow.

Later our operations colleagues switched the machines to local disks, and everything returned to normal.
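A crude way to spot this kind of disk before deploying is to time small synchronous writes in the target directory. This sketch assumes GNU dd (for oflag=dsync) and GNU date (for %N); the helper name is hypothetical. ZooKeeper batches log appends before forcing them to disk, so treat the number only as a relative comparison between disks, not a prediction of ZooKeeper QPS. The admin guide also recommends putting the transaction log (dataLogDir in zoo.cfg) on a dedicated device.

```shell
#!/bin/sh
# Rough probe of synchronous small-write speed in a directory, to compare
# a local disk with a shared one before placing ZooKeeper's data there.
# Each 512-byte block is written with O_DSYNC (GNU dd and GNU date assumed),
# so every block must reach the disk before the next one is written.
# Usage: sync_write_rate [dir] [count]   -> prints writes per second
sync_write_rate() {
    sw_dir=${1:-.}
    sw_count=${2:-200}
    sw_file="$sw_dir/.zk_sync_probe.$$"
    sw_start=$(date +%s%N)
    if ! dd if=/dev/zero of="$sw_file" bs=512 count="$sw_count" oflag=dsync 2>/dev/null; then
        rm -f "$sw_file"
        return 1
    fi
    sw_end=$(date +%s%N)
    rm -f "$sw_file"
    sw_ms=$(( (sw_end - sw_start) / 1000000 ))
    [ "$sw_ms" -gt 0 ] || sw_ms=1   # avoid dividing by zero on very fast disks
    echo $(( sw_count * 1000 / sw_ms ))
}

# Example: sync_write_rate /home/zookeeper/data 500
```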

Management tools:

ZooKeeper's official administrator guide: http://zookeeper.apache.org/doc/trunk/zookeeperAdmin.html. The official command-line tools can handle most tasks.

zktop (https://github.com/phunt/zktop) is an interesting little tool written in Python.


Taokeeper project address: https://github.com/alibaba/taokeeper
Taokeeper is a monitoring tool from Taobao that can also run monitoring scripts. It is open source, but in practice it is hard to use: the code is hard to extend, and some of the jar packages are internal to Taobao.
I modified it so that it works normally; the code is at: https://github.com/hengyunabc/taokeeper
However, we do not use it in production; production is monitored with Zabbix only.
Installation and configuration:

Compiling
1. Download these two projects:
git clone https://github.com/hengyunabc/common-toolkit.git
git clone https://github.com/nileader/zkclient.git

and run mvn -Dmaven.test.skip install in each of them.
2. Download this project:
git clone https://github.com/hengyunabc/taokeeper.git
and run mvn -Dmaven.test.skip clean package.
The generated war package is in the taokeeper-monitor/target/ directory.

Deploying
Taokeeper uses a MySQL database to store configuration and logs.
Import the taokeeper-build/etc/taokeeper.sql file, or download it from the attachment taokeeper.sql.zip.
Configure Tomcat's startup parameters by adding a JVM option:
JAVA_OPTS="-DconfigFilePath=$HOME/taokeeper/taokeeper-monitor-config.properties"
and set the parameters in that configuration file, for example:
# SSH account of ZK server
This is where the SSH user and password are configured; the machines running ZooKeeper must have the SSH service enabled.
Rename the generated war package to ROOT.war, put it in Tomcat's webapps directory, and start Tomcat.
If log4j errors are reported, also configure webapps/ROOT/WEB-INF/classes/log4j.properties. You can also edit it before packaging.
Open http://localhost:8080/ to see the Taokeeper interface.
How it works
Taokeeper connects to the ZooKeeper machines over SSH and runs ZooKeeper's four-letter-word commands to collect statistics. It then analyzes the results and saves them to the MySQL database.
Reference: http://zookeeper.apache.org/doc/trunk/zookeeperAdmin.html#sc_zkCommands
To monitor the target machines' load, it likewise connects over SSH, runs the top command, and analyzes the output.
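You can also query the same four-letter words from a shell script without Taokeeper. This sketch assumes ZooKeeper 3.4.0 or later (where mntr exists) and its tab-separated key/value output. The helper name is hypothetical, and the live examples assume nc is installed.

```shell
#!/bin/sh
# Hypothetical helper: read `mntr` output on stdin and print the value
# for one key, e.g. zk_server_state or zk_watch_count.
# Usage: mntr_get <key>
mntr_get() {
    awk -F '\t' -v key="$1" '$1 == key { print $2 }'
}

# Examples against a live server (requires nc):
# echo ruok | nc localhost 2181              # a healthy server answers "imok"
# echo mntr | nc localhost 2181 | mntr_get zk_server_state
```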

Notes
In Chrome, the "Machine monitor" page sometimes shows its information at the very bottom of the page. Scroll to the end to see it; the feature is not actually broken.


Exhibitor is a ZooKeeper monitoring tool from Netflix, although it is quite hard to use. Its main functions are:
monitoring the ZooKeeper service on its machine and restarting it automatically;
regularly backing up data;
regularly cleaning up ZooKeeper logs;
providing a web interface for modifying ZooKeeper data.
Installing Exhibitor
Exhibitor can run in three ways: as a standalone jar file, as a war package, or as a core jar. Running the standalone jar is recommended because configuration management is then very convenient.
For installation, see: https://github.com/Netflix/exhibitor/wiki/Building-Exhibitor. You can also download the prebuilt jar (attachment: exhibitor-war-1.0-jar-with-dependencies.zip) and change its extension to .jar after downloading.
Running
java -jar <path>/exhibitor-xxx.jar -c file
Exhibitor creates its configuration file automatically, and configuration changes made in the web interface are saved to exhibitor.properties.
Configuration items
Reference: https://github.com/Netflix/exhibitor/wiki/Configuration-UI
When setting the "Servers" parameter, note that it takes hostnames, not IPs. If you enter IPs, check on each target machine that its hostname and IP are consistent.
Notes
Exhibitor uses the jps command to determine whether the ZooKeeper service is running, so jps must be available. If it is not, create a soft link to the jps binary in the JDK.
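A small helper like the one below can create the soft link. It assumes the JDK's jps lives under <java_home>/bin and that <bin_dir> (for example /usr/bin) is on the PATH Exhibitor runs with. The helper name is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: make jps visible on PATH for Exhibitor by linking
# <java_home>/bin/jps into <bin_dir>. Does nothing if <bin_dir>/jps exists.
# Usage: link_jps <java_home> <bin_dir>
link_jps() {
    lj_src="$1/bin/jps"
    lj_dst="$2/jps"
    if [ ! -f "$lj_src" ]; then
        echo "no jps at $lj_src" >&2
        return 1
    fi
    [ -e "$lj_dst" ] || ln -s "$lj_src" "$lj_dst"
}

# Example (as root): link_jps "$JAVA_HOME" /usr/bin
```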

Exhibitor automatically creates and overwrites the ZooKeeper configuration file, so configure all ZooKeeper parameters in the web interface.
Otherwise, when Exhibitor restarts ZooKeeper, ZooKeeper may fail to start because of a configuration error.

In the Control Panel, green means the ZooKeeper service is healthy and can serve clients. Yellow or red means ZooKeeper cannot serve clients.
This is a separate matter from whether the ZooKeeper process exists: the process may be running and still be unable to serve.

Exhibitor periodically checks whether the ZooKeeper service is healthy, but the check interval defaults to 0, which eats up the machine's CPU. Set the Live Check (ms) parameter in the web interface.

Exhibitor automatically starts the ZooKeeper process when it detects that the service is not running, so stop Exhibitor before upgrading ZooKeeper.
Other notes:

Performance testing:

The output of this tool is rather messy, but it works well.
mvn -DZooKeeperVersion=3.4.5 package
./runBenchmark.sh test
The results are then written to the test folder, mainly to the zk-benchmark.log file.

The documents http://zookeeper.apache.org/doc/r3.4.5/zookeeperOver.html and http://wiki.apache.org/hadoop/ZooKeeper/ServiceLatencyOverview contain some performance figures, but they do not seem to have been updated.
A test on the Hadoop wiki: http://wiki.apache.org/hadoop/ZooKeeper/ServiceLatencyOverview
A test by Taobao: http://rdc.taobao.com/team/jm/archives/1070
