Storm Cluster Installation and Configuration

Last Update: 2016-03-08 Source: Internet

Author: User


This article describes in detail how to install and configure a Storm cluster. If you want to install Storm on AWS, look at the storm-deploy project instead. Storm-deploy automatically provisions, configures, and installs a Storm cluster on EC2, and also sets up Ganglia so that you can monitor CPU, disk, and network usage.

If you run into problems with your Storm cluster, check the "problem and resolution" article for a solution. If it does not cover your problem, send a message describing it to the community mailing list.

Here are the steps to install Storm:

Install a ZooKeeper cluster;
Install the dependencies required to run Storm on each machine;
Download a Storm release and extract it on every machine in the cluster;
Add the cluster's configuration to storm.yaml;
Use the "storm" script to start the daemons on each machine.

Installing the ZooKeeper Cluster

Storm uses ZooKeeper to coordinate the cluster. ZooKeeper is not used for message passing, so the load Storm places on ZooKeeper is quite low. A single ZooKeeper server is sufficient in most cases, but if you need failover or are deploying a large Storm cluster, you may want a multi-server ZooKeeper ensemble. Please refer to this article for instructions on deploying a ZooKeeper cluster.
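A multi-server ZooKeeper ensemble keeps working only while a strict majority of its servers is up, which is why ensembles are usually built from an odd number of servers. As a small illustration (Python 3; the function name is hypothetical and not part of Storm or ZooKeeper), this computes how many server failures an ensemble of a given size can survive:

```python
def tolerated_failures(ensemble_size: int) -> int:
    """Return how many ZooKeeper servers may fail while a majority quorum survives.

    A quorum needs strictly more than half of the ensemble, i.e.
    ensemble_size // 2 + 1 servers; the remaining servers may be lost.
    """
    if ensemble_size < 1:
        raise ValueError("an ensemble needs at least one server")
    quorum = ensemble_size // 2 + 1
    return ensemble_size - quorum
```

For example, tolerated_failures(3) and tolerated_failures(4) are both 1, so a fourth server adds load without adding failover capacity.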

A few notes on ZooKeeper deployment:

ZooKeeper must be run under supervision. ZooKeeper is a fail-fast system: it shuts itself down whenever it encounters an error. Please refer to this article for more details.
You need to set up a cron job that periodically compacts ZooKeeper's data and transaction logs. The ZooKeeper daemon does not do this on its own, so without such a cron job ZooKeeper's files will quickly fill up the disk. Please refer to this article for more details.
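The selection logic of such a cleanup job can be sketched in Python 3 as below. This is a simplified, hypothetical helper that only decides which files are safe to delete: it assumes ZooKeeper's file naming (snapshot.<hex zxid> and log.<hex zxid> in the data directory's version-2/ subdirectory) and keeps the newest snapshots plus the transaction logs needed to replay from the oldest snapshot kept. ZooKeeper also ships its own cleanup script (zkCleanup.sh), and from version 3.4.0 the autopurge.snapRetainCount and autopurge.purgeInterval settings can do this without cron.

```python
import re

# Assumed naming: "snapshot.<zxid>" and "log.<zxid>", with <zxid> in hexadecimal.
_NAME = re.compile(r"^(snapshot|log)\.([0-9a-fA-F]+)$")


def files_to_purge(file_names, snap_retain_count=3):
    """Return the file names a cron cleanup job could delete (simplified sketch).

    Keeps the newest `snap_retain_count` snapshots plus every transaction log
    that may be needed to replay from the oldest retained snapshot.
    """
    if snap_retain_count < 1:
        raise ValueError("keep at least one snapshot")
    snaps, logs = [], []
    for name in file_names:
        match = _NAME.match(name)
        if not match:
            continue  # ignore anything that is not a snapshot or a log
        kind, zxid = match.group(1), int(match.group(2), 16)
        (snaps if kind == "snapshot" else logs).append((zxid, name))
    snaps.sort()
    logs.sort()
    kept_snaps = snaps[-snap_retain_count:]
    if not kept_snaps:
        return []  # no snapshot yet: nothing is safe to delete
    oldest_kept = kept_snaps[0][0]
    # The newest log that starts at or before the oldest kept snapshot may
    # still contain later transactions, so it must be kept as well.
    earlier = [zxid for zxid, _ in logs if zxid <= oldest_kept]
    log_cutoff = max(earlier) if earlier else oldest_kept
    purge = [name for zxid, name in snaps if zxid < oldest_kept]
    purge += [name for zxid, name in logs if zxid < log_cutoff]
    return sorted(purge)
```

A cron job could pass it the result of os.listdir() on the version-2 directory and delete each returned file.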

Installing the Required Dependencies

Next, install Storm's dependencies on every machine in the cluster:

Java 6 (translator's note: JDK 7 or later is recommended)
Python 2.6.6 (translator's note: Python 2.7.x is recommended)

These are the versions Storm has been tested with. Storm may or may not work with other versions of Java or Python.
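Since Storm only guarantees the versions listed above, it can help to check what each machine actually has before going further. This Python 3 sketch (helper names are hypothetical) parses the banner printed by java -version, which reports Java 6 as "1.6.0_xx" but newer releases as, for example, "11.0.2":

```python
import re
import shutil
import subprocess


def java_major_version(version_output):
    """Extract the Java major version from `java -version` output."""
    match = re.search(r'version "([^"]+)"', version_output)
    if match is None:
        raise ValueError("unrecognized `java -version` output")
    parts = re.split(r"[._-]", match.group(1))
    major = int(parts[0])
    # Legacy scheme: "1.6.0_45" means Java 6; newer scheme: "11.0.2" means Java 11.
    return int(parts[1]) if major == 1 else major


def installed_java_major_version():
    """Run `java -version` on this machine; return None if java is not on PATH."""
    if shutil.which("java") is None:
        return None
    # `java -version` prints its banner on stderr, not stdout.
    result = subprocess.run(["java", "-version"], capture_output=True, text=True)
    return java_major_version(result.stderr)
```

For Python, running python -V on each machine is enough.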

Downloading and Extracting a Storm Release

The next step is to download a Storm release and extract the archive on every machine in the cluster. Storm releases can be downloaded here (translator's note: we recommend downloading from one of the Apache mirrors listed on Storm's official download page).
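If you script the extraction step, keep in mind that Python's zipfile module does not restore Unix permission bits, so the scripts under bin/ would lose their executable flag. The following Python 3 sketch (a hypothetical helper, not part of Storm) unpacks a .zip or .tar.gz release and restores those bits:

```python
import os
import tarfile
import zipfile
from pathlib import Path


def extract_release(archive, dest_dir):
    """Unpack a Storm release (.zip or .tar.gz) into dest_dir.

    Returns the release's top-level directory inside dest_dir.
    The tar branch does not filter member paths, so use trusted archives only.
    """
    archive, dest_dir = Path(archive), Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    if zipfile.is_zipfile(archive):
        with zipfile.ZipFile(archive) as zf:
            names = zf.namelist()
            for info in zf.infolist():
                extracted = zf.extract(info, dest_dir)
                # zipfile drops Unix permission bits; restore them so that
                # scripts such as bin/storm remain executable.
                mode = (info.external_attr >> 16) & 0o777
                if mode:
                    os.chmod(extracted, mode)
    elif tarfile.is_tarfile(archive):
        with tarfile.open(archive) as tf:  # tarfile keeps permission bits
            names = tf.getnames()
            tf.extractall(dest_dir)
    else:
        raise ValueError(f"{archive} is neither a zip nor a tar archive")
    top_level = {Path(name).parts[0] for name in names if name}
    if len(top_level) != 1:
        raise ValueError("expected exactly one top-level directory in the archive")
    return dest_dir / top_level.pop()
```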

Configuring storm.yaml

The Storm release contains a file, conf/storm.yaml, that configures the Storm daemons. You can see the default value of each configuration option here. Any value set in storm.yaml overrides the corresponding default in defaults.yaml. The following options must be configured for a working cluster:

1) storm.zookeeper.servers: the list of hosts in the ZooKeeper cluster used by Storm. It should look like this:

storm.zookeeper.servers:
  - "111.222.333.444"
  - "555.666.777.888"

Note that if your ZooKeeper cluster does not use the default port, you also need to set storm.zookeeper.port accordingly.
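The two settings combine into one address per ZooKeeper server, all sharing the single port from storm.zookeeper.port (2181 in Storm's defaults.yaml). The hypothetical Python 3 helper below builds the resulting host:port list, the form accepted by ZooKeeper clients such as zkCli.sh -server, which is handy for checking that every server is reachable from the Storm machines:

```python
DEFAULT_ZK_PORT = 2181  # default value of storm.zookeeper.port


def zookeeper_connect_string(servers, port=DEFAULT_ZK_PORT):
    """Join storm.zookeeper.servers and storm.zookeeper.port into "host:port,host:port"."""
    if not servers:
        raise ValueError("storm.zookeeper.servers must list at least one host")
    if not 0 < port < 65536:
        raise ValueError(f"invalid ZooKeeper port: {port}")
    # Every server in the list shares the same port.
    return ",".join(f"{host}:{port}" for host in servers)
```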

2) storm.local.dir: The Nimbus and Supervisor daemons need a directory on the local disk to store small amounts of state (such as jars and configuration files). Create this directory on each machine, give it appropriate read and write permissions, and then set its location in the configuration file:

storm.local.dir: "/mnt/storm"
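The directory preparation can be scripted. This Python 3 sketch is hypothetical: it assumes it runs as the account that will run Nimbus or the Supervisor, and the 0o750 mode is only an illustrative choice (the process umask may restrict it further):

```python
import os


def prepare_local_dir(path, mode=0o750):
    """Create the storm.local.dir directory if needed and check that the
    current user can read, write, and enter it."""
    os.makedirs(path, mode=mode, exist_ok=True)  # no-op if it already exists
    if not os.access(path, os.R_OK | os.W_OK | os.X_OK):
        raise PermissionError(f"{path} is not usable by the current user")
    return path
```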

3) nimbus.host: The worker nodes need to know which machine is the master (Nimbus) so that they can download topology jars and configuration files from it:

nimbus.host: "111.222.333.44"

4) supervisor.slots.ports: For each worker machine, this option sets how many worker processes can run on it. Each worker uses its own port to receive messages, and this option lists the ports that workers may use. If you define five ports here, Storm will run up to five workers on that machine; if you define three ports, Storm will run at most three. By default, four workers can run, on ports 6700, 6701, 6702, and 6703:

supervisor.slots.ports:
    - 6700
    - 6701
    - 6702
    - 6703
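Putting the four required settings together, the following Python 3 sketch renders a minimal storm.yaml. The helper is hypothetical; it avoids third-party YAML libraries by emitting the text directly (JSON-style double-quoted strings are valid YAML scalars) and rejects duplicate slot ports, since each worker needs its own port:

```python
import json


def render_storm_yaml(zk_servers, local_dir, nimbus_host, slot_ports, zk_port=None):
    """Render the required storm.yaml settings as YAML text."""
    if len(set(slot_ports)) != len(slot_ports):
        raise ValueError("supervisor.slots.ports must not contain duplicates")
    lines = ["storm.zookeeper.servers:"]
    lines += [f"    - {json.dumps(host)}" for host in zk_servers]
    if zk_port is not None:  # only needed when ZooKeeper is not on its default port
        lines.append(f"storm.zookeeper.port: {int(zk_port)}")
    lines.append(f"storm.local.dir: {json.dumps(local_dir)}")
    lines.append(f"nimbus.host: {json.dumps(nimbus_host)}")
    lines.append("supervisor.slots.ports:")
    lines += [f"    - {int(port)}" for port in slot_ports]
    return "\n".join(lines) + "\n"
```

With the four default ports, a machine offers at most four worker slots; the cluster's total capacity is the sum of the slots over all Supervisor machines.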

Configuring External Libraries and Environment Variables (Optional)

If you need external libraries or custom plugins, you can put the corresponding jars into the extlib/ and extlib-daemon/ directories. Note that the extlib-daemon/ directory holds jars used only by the daemons (Nimbus, Supervisor, DRPC, UI, Logviewer), such as HDFS libraries and custom scheduler libraries. Correspondingly, you can set two environment variables, STORM_EXT_CLASSPATH and STORM_EXT_CLASSPATH_DAEMON, to add an external classpath for all processes and an external classpath used only by the daemons.
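The split between the two directories and the two variables can be summarized in code. This Python 3 sketch is a simplified, hypothetical model of how the extra classpath is assembled; the real bin/storm script may order or expand the entries differently:

```python
import os
from pathlib import Path


def external_classpath(storm_home, daemon=False, env=None):
    """Return the extra classpath entries for a Storm process.

    Every process gets the jars in extlib/ plus STORM_EXT_CLASSPATH; daemons
    additionally get the jars in extlib-daemon/ plus STORM_EXT_CLASSPATH_DAEMON.
    """
    env = os.environ if env is None else env
    # glob() on a missing directory simply yields nothing.
    entries = sorted(str(jar) for jar in Path(storm_home, "extlib").glob("*.jar"))
    if env.get("STORM_EXT_CLASSPATH"):
        entries.append(env["STORM_EXT_CLASSPATH"])  # appended verbatim, e.g. "/opt/libs/*"
    if daemon:
        entries += sorted(str(jar) for jar in Path(storm_home, "extlib-daemon").glob("*.jar"))
        if env.get("STORM_EXT_CLASSPATH_DAEMON"):
            entries.append(env["STORM_EXT_CLASSPATH_DAEMON"])
    return os.pathsep.join(entries)
```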

Starting the Daemons with the storm Script

The final step is to start all the Storm daemons. It is critical that each daemon runs under supervision. Like ZooKeeper, Storm is a fail-fast system: its processes halt whenever they encounter an unexpected error. Storm is designed so that a process can safely halt at any point and recover correctly when it is restarted. This is why Storm keeps no state inside its processes: if Nimbus or a Supervisor restarts, the running topologies are unaffected. Here's how to start the daemons:

Nimbus: On the master machine, run the command bin/storm nimbus under supervision.
Supervisor: On each worker node, run the command bin/storm supervisor under supervision. The Supervisor daemon is responsible for starting and stopping worker processes on its machine.
UI: On the master machine, run the command bin/storm ui under supervision to start the Storm UI, a website for monitoring the health of the cluster and its topologies from a browser. The UI is available at http://{nimbus.host}:8080.
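Running under supervision means that something restarts a daemon whenever it exits. In production this is normally done by a dedicated tool such as daemontools, supervisord, or systemd; the Python 3 loop below is only a hypothetical, minimal illustration of the idea:

```python
import subprocess
import time


def supervise(command, max_restarts=None, backoff_seconds=1.0):
    """Run `command` and restart it every time it exits.

    With max_restarts=None this loops forever, as a real supervisor would;
    otherwise it returns the number of restarts performed once the limit is hit.
    """
    restarts = 0
    while True:
        exit_code = subprocess.call(command)
        print(f"{command[0]} exited with status {exit_code}")
        if max_restarts is not None and restarts >= max_restarts:
            return restarts
        restarts += 1
        time.sleep(backoff_seconds)  # avoid a tight crash-restart loop
```

For example, supervise(["bin/storm", "nimbus"]) run from the Storm directory would keep Nimbus running indefinitely.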

As you can see, starting the daemons is straightforward. Each daemon writes its log files to the logs/ directory of the Storm release it runs from.


This article is an English version of an article originally written in Chinese on aliyun.com and is provided for information purposes only.
