Install and Deploy a Storm Cluster [Details]


Author: Those Things in the Big Circle | This article may be reproduced, provided the original source and author information are credited in the form of a hyperlink.

Web: http://www.cnblogs.com/panfeng412/archive/2012/11/30/how-to-install-and-deploy-storm-cluster.html

Based on the official Twitter Storm wiki, this article describes in detail how to quickly build a Storm cluster. Problems and lessons learned from project practice are summarized as "Notes" in the corresponding sections.

1. Storm cluster components

A Storm cluster contains a master node and multiple worker nodes, with the following roles:

    • The master node runs a daemon called Nimbus, which is responsible for distributing code within the Storm cluster, assigning tasks to worker nodes, and monitoring the running status of the cluster. Nimbus is similar to the JobTracker in Hadoop.
    • Each worker node runs a daemon called the Supervisor. The Supervisor listens for tasks assigned to its machine by Nimbus and starts or stops worker processes accordingly. Each worker process executes a subset of a topology; a running topology consists of many worker processes distributed across different worker nodes.

[Figure: Storm cluster components]

All coordination between Nimbus and the Supervisors is done through a ZooKeeper cluster. In addition, both the Nimbus and Supervisor processes are fail-fast and stateless; all the state of a Storm cluster is kept either in the ZooKeeper cluster or on local disk. This means you can kill the Nimbus and Supervisor processes with kill -9, and they will resume work after being restarted. This design gives Storm clusters remarkable stability.

2. Install the Storm Cluster

This section describes how to build a Storm cluster. The installation steps are as follows:

    • Set up a ZooKeeper cluster;
    • Install Storm's dependent libraries;
    • Download and extract a Storm release;
    • Modify the storm.yaml configuration file;
    • Start Storm's daemon processes.
2.1 Set up a ZooKeeper cluster

Storm uses ZooKeeper to coordinate the cluster. Since ZooKeeper is not used for message passing, the load Storm places on it is quite low. In most cases, a single-node ZooKeeper cluster is sufficient; however, for failure recovery or when deploying a large-scale Storm cluster, a larger ZooKeeper cluster may be needed (for ZooKeeper clusters, the officially recommended minimum number of nodes is 3). Complete the following installation and deployment steps on each machine in the ZooKeeper cluster:

1) Download and install the Java JDK. The official download link is http://java.sun.com/javase/downloads/index.jsp. JDK 6 or later is required.

2) Set the Java heap size appropriately for the expected load on the ZooKeeper cluster, to avoid swapping, which degrades ZooKeeper performance. As a conservative estimate, a machine with 4 GB of memory can allocate at most 3 GB of heap space to ZooKeeper.
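One way to apply such a cap: ZooKeeper's startup script (bin/zkServer.sh, via zkEnv.sh) reads JVM flags from an optional conf/java.env file if it exists. A minimal sketch, using the 3 GB figure from the guideline above:

```shell
# conf/java.env -- sourced by bin/zkServer.sh (via zkEnv.sh) if present.
# Cap the ZooKeeper heap at 3 GB on a 4 GB machine to avoid swapping.
JVMFLAGS="-Xmx3g"
```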

3) Download and extract the ZooKeeper package. The official download link is http://hadoop.apache.org/zookeeper/releases.html.

4) Create the ZooKeeper configuration file zoo.cfg according to the nodes in your ZooKeeper cluster, in the following format:

tickTime=2000
dataDir=/var/zookeeper/
clientPort=2181
initLimit=5
syncLimit=2
server.1=zoo1:2888:3888
server.2=zoo2:2888:3888
server.3=zoo3:2888:3888

Here dataDir specifies ZooKeeper's data directory. Each server.id=host:port:port line declares a node: id is the ID of that ZooKeeper node, saved in the myid file under the dataDir directory; zoo1 through zoo3 are the hostnames of the ZooKeeper nodes; the first port is used by followers to connect to the leader, and the second port is used for leader election.

5) Create a myid file under the dataDir directory. The file contains a single line whose content is the node's ID number from the corresponding server.id entry.
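For example, on the node zoo1 (server.1 in the sample zoo.cfg above), the myid file could be created like this. The directory below is a temporary stand-in for the real dataDir (/var/zookeeper in the sample configuration), so the sketch is safe to run anywhere:

```shell
# Create the myid file under dataDir. On zoo1 (server.1) the id is 1,
# on zoo2 it is 2, and so on. DATADIR here stands in for the real
# dataDir path from zoo.cfg.
DATADIR=/tmp/zookeeper-demo
mkdir -p "$DATADIR"
echo 1 > "$DATADIR/myid"
cat "$DATADIR/myid"    # prints: 1
```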

6) Start the ZooKeeper service:

java -cp zookeeper.jar:lib/log4j-1.2.15.jar:conf \
  org.apache.zookeeper.server.quorum.QuorumPeerMain zoo.cfg

You can also start the ZooKeeper service with the bin/zkServer.sh script.

7) Test whether the service is available through a ZooKeeper client:

    • Run the following command on the Java client:

java -cp zookeeper.jar:src/java/lib/log4j-1.2.15.jar:conf:src/java/lib/jline-0.9.94.jar \
  org.apache.zookeeper.ZooKeeperMain -server 127.0.0.1:2181

You can also start the ZooKeeper Java client with the bin/zkCli.sh script.

    • For the C client, enter src/c and compile the single-threaded or multi-threaded client:

./configure
make cli_st
make cli_mt

Then run the C client:

cli_st 127.0.0.1:2181
cli_mt 127.0.0.1:2181

At this point, the ZooKeeper cluster has been deployed and started.

Note:

    1. Because ZooKeeper is fail-fast and its process exits on any error, it is best to manage ZooKeeper through a supervision program that automatically restarts it after it exits. For more information, see here.
    2. While running, ZooKeeper generates many transaction logs and snapshot files under the dataDir directory, and the ZooKeeper process does not clean up or merge these files itself, so they can consume a large amount of disk space. You should therefore periodically remove useless log and snapshot files, for example via cron. For more information, see here. The command format is: java -cp zookeeper.jar:log4j.jar:conf org.apache.zookeeper.server.PurgeTxnLog <dataDir> <snapDir> -n <count>
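As a sketch, the cleanup above could be scheduled with a crontab entry like the following. The paths and retention count are placeholders to adjust for your deployment (PurgeTxnLog requires the count to be at least 3):

```
# Hypothetical crontab entry: purge old ZooKeeper snapshots and transaction
# logs every night at 3:00, keeping the 5 most recent snapshots.
0 3 * * * java -cp zookeeper.jar:log4j.jar:conf org.apache.zookeeper.server.PurgeTxnLog /var/zookeeper /var/zookeeper -n 5
```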
2.2 Install Storm's dependent libraries

Next, install Storm's dependency libraries on both the Nimbus and Supervisor machines, namely:

    1. ZeroMQ 2.1.7 — do not use version 2.1.10, because serious bugs in that version can cause strange problems in a running Storm cluster. A few users may hit an IllegalArgumentException in version 2.1.7, which can be fixed by downgrading to version 2.1.4.
    2. Jzmq
    3. Java 6
    4. Python 2.6.6
    5. Unzip

The dependency versions above have been tested with Storm; Storm is not guaranteed to work with other versions of the Java or Python libraries.

2.2.1 Install zmq 2.1.7

Download, then compile and install ZeroMQ:

wget http://download.zeromq.org/zeromq-2.1.7.tar.gz
tar -xzf zeromq-2.1.7.tar.gz
cd zeromq-2.1.7
./configure
make
sudo make install

Note:

1. If uuid cannot be found during installation, install the uuid library from the following packages:

sudo yum install e2fsprogsl -b current
sudo yum install e2fsprogs-devel -b current
2.2.2 Install jzmq

Download, then compile and install JZMQ:

git clone https://github.com/nathanmarz/jzmq.git
cd jzmq
./autogen.sh
./configure
make
sudo make install

To ensure that JZMQ works properly, you may need to complete the following configuration:

    1. Correctly set the JAVA_HOME environment variable
    2. Install the Java SDK
    3. Upgrade autoconf
    4. On Mac OS X, refer here.

Note:

1. If the ./configure command fails, refer here.

2.2.3 Install Java 6

1. Download and install JDK 6; refer here;

2. Configure the JAVA_HOME environment variable;

3. Run the java and javac commands to verify that Java is installed correctly.

2.2.4 Install Python 2.6.6

1. Download Python 2.6.6:

wget http://www.python.org/ftp/python/2.6.6/Python-2.6.6.tar.bz2

2. Compile and install Python 2.6.6:

tar -jxvf Python-2.6.6.tar.bz2
cd Python-2.6.6
./configure
make
make install

3. Test Python 2.6.6:

$ python -V
Python 2.6.6
2.2.5 Install unzip

1. On RedHat-family Linux systems, install unzip with:

yum install unzip

2. On Debian-family Linux systems, install unzip with:

apt-get install unzip
2.3 Download and extract a Storm release

Next, install a Storm release on the Nimbus and Supervisor machines.

1. Download a Storm release; Storm 0.8.1 is recommended:

wget https://github.com/downloads/nathanmarz/storm/storm-0.8.1.zip

2. Extract it to the installation directory:

unzip storm-0.8.1.zip
2.4 Modify the storm.yaml configuration file

The Storm release contains a conf/storm.yaml file used to configure Storm. The default configuration can be viewed here. Options set in conf/storm.yaml override the defaults in defaults.yaml. The following options must be configured in conf/storm.yaml:

1) storm.zookeeper.servers: the addresses of the ZooKeeper cluster used by the Storm cluster, in the following format:

storm.zookeeper.servers:
  - "111.222.333.444"
  - "555.666.777.888"

If the ZooKeeper cluster is not using the default port, you must also set the storm.zookeeper.port option.
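For instance, if the ZooKeeper ensemble listened on port 2287 instead of the default 2181 (a made-up value for illustration), the option would be set as follows:

```yaml
storm.zookeeper.port: 2287
```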

2) storm.local.dir: a local disk directory that the Nimbus and Supervisor processes use to store small amounts of state (jars, confs, and so on). You need to create this directory in advance, grant it sufficient access permissions, and then configure it in storm.yaml, for example:

storm.local.dir: "/home/admin/storm/workdir"

3) java.library.path: the load path for the native libraries (ZeroMQ and JZMQ) used by Storm. The default value is "/usr/local/lib:/opt/local/lib:/usr/lib"; since ZeroMQ and JZMQ are installed under /usr/local/lib by default, this generally does not need to be configured.

4) nimbus.host: the address of the Nimbus machine in the Storm cluster. Each Supervisor worker node needs to know which machine is Nimbus in order to download topology jars, confs, and other files, for example:

nimbus.host: "111.222.333.444"

5) supervisor.slots.ports: for each Supervisor worker node, this configures how many workers the node can run. Each worker occupies a separate port for receiving messages; this option defines which ports are available to workers. By default, each node can run 4 workers, on ports 6700, 6701, 6702, and 6703. For example:

supervisor.slots.ports:
  - 6700
  - 6701
  - 6702
  - 6703
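Putting the required options together, a minimal conf/storm.yaml might look like the following sketch; the hostnames and paths are placeholders to substitute with your own:

```yaml
storm.zookeeper.servers:
  - "zoo1"
  - "zoo2"
  - "zoo3"
# storm.zookeeper.port: 2181       # only needed if not using the default port
nimbus.host: "nimbus-host"
storm.local.dir: "/home/admin/storm/workdir"
supervisor.slots.ports:
  - 6700
  - 6701
  - 6702
  - 6703
```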
2.5 Start Storm's daemon processes

As the last step, start all of Storm's daemons. Like ZooKeeper, Storm is a fail-fast system, so Storm processes can be stopped safely at any time and recover correctly after being restarted. This is also why Storm keeps no state in its processes: even if Nimbus or the Supervisors are restarted, running topologies are not affected.

Start Storm's daemons as follows:

    1. Nimbus: run "bin/storm nimbus >/dev/null 2>&1 &" on the Storm master node to start the Nimbus daemon in the background;
    2. Supervisor: run "bin/storm supervisor >/dev/null 2>&1 &" on each Storm worker node to start the Supervisor daemon in the background;
    3. UI: run "bin/storm ui >/dev/null 2>&1 &" on the Storm master node to start the UI daemon in the background. Once started, you can visit http://{nimbus host}:8080 to observe the cluster's worker resource usage and the running status of topologies.

Note:

    1. After the Storm daemons start, each process writes its log files under the logs/ subdirectory of the Storm installation directory.
    2. Testing shows that the Storm UI must be deployed on the same machine as Storm Nimbus; otherwise the UI will not work properly, because the UI process checks whether a Nimbus link exists on the local machine.
    3. For convenience, bin/storm can be added to the system PATH environment variable.

At this point, the Storm cluster has been fully deployed and configured, and you can submit topologies to the cluster to run.

3. Submit a task to the cluster

1) Start a Storm topology:

storm jar allmycode.jar org.me.MyTopology arg1 arg2 arg3

Here, allmycode.jar is the jar package containing the topology implementation code, the main method of org.me.MyTopology is the topology's entry point, and arg1, arg2, and arg3 are the parameters to pass to org.me.MyTopology when it runs.

2) Stop a Storm topology:

storm kill {toponame}

Here, {toponame} is the topology name specified when the topology was submitted to the Storm cluster.

4. References

1. https://github.com/nathanmarz/storm/wiki/Tutorial

2. https://github.com/nathanmarz/storm/wiki/Setting-up-a-Storm-cluster
