Translated from: https://github.com/nathanmarz/storm/wiki/setting-up-a-storm-cluster.
This article describes how to set up and run a Storm cluster. If you use AWS, you should take a look at the storm-deploy project. storm-deploy fully automates the provisioning and configuration of Storm clusters on Amazon EC2, and it also sets up Ganglia for you so that you can monitor CPU, disk, and network usage.
The following are the main steps to set up a Storm cluster:
- Set up a ZooKeeper cluster.
- Install the dependencies on Nimbus and all worker machines.
- Download and decompress a Storm release on Nimbus and all worker machines.
- Configure storm.yaml.
- Use the storm script to start all necessary daemons (nimbus, supervisor, ui).
Build a ZooKeeper Cluster
Storm uses ZooKeeper to coordinate the cluster, but note that Storm does not use ZooKeeper to pass messages, so the load Storm places on ZooKeeper is quite low. A single-node ZooKeeper is sufficient in most cases; however, if you want to deploy a large Storm cluster, you will need a correspondingly larger ZooKeeper cluster. For more information about how to deploy ZooKeeper, see here.
Some notes about deploying ZooKeeper:
- It is very important to run ZooKeeper under supervision. ZooKeeper is fail-fast and will exit the process if it encounters any error, so you must monitor it and restart it automatically. For more details, see here.
- You must set up a cron job to compact ZooKeeper's data and transaction logs. ZooKeeper does not compact these files on its own, and if you do not set up such a cron job, ZooKeeper will quickly run out of disk space (a sample cron entry is sketched after this list). For more details, see here.
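As a minimal sketch of such a cron job, the entry below invokes ZooKeeper's bundled PurgeTxnLog utility once a day. The installation path, the data directories, and the retention count of 5 snapshots are all assumptions; adapt them to your own deployment:

# Purge old ZooKeeper snapshots and transaction logs daily at 03:00,
# keeping the 5 most recent snapshots (all paths below are placeholders)
0 3 * * * java -cp /opt/zookeeper/zookeeper.jar:/opt/zookeeper/lib/* org.apache.zookeeper.server.PurgeTxnLog /var/zookeeper/data /var/zookeeper/data -n 5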
Install Necessary Software on Nimbus and Worker Machines
Next, you need to install Storm's dependencies on Nimbus and the worker machines:
- ZeroMQ 2.1.7
- JZMQ
- Java 6
- Python 2.6.6
- Unzip
These are the dependency versions that Storm has been tested against; Storm may or may not work with different versions.
If you have problems installing ZeroMQ and JZMQ, see the notes on installing the native dependencies.
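As a rough sketch, both libraries are usually built from source with the standard autotools steps. The download URL and the jzmq repository below are assumptions (mirrors move over time), so adjust them as needed:

# Build and install ZeroMQ 2.1.7 (the download URL may have moved)
wget http://download.zeromq.org/zeromq-2.1.7.tar.gz
tar xzf zeromq-2.1.7.tar.gz
cd zeromq-2.1.7
./configure && make && sudo make install
cd ..

# Build and install JZMQ, the Java bindings for ZeroMQ
git clone https://github.com/nathanmarz/jzmq.git
cd jzmq
./autogen.sh && ./configure && make && sudo make install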
Download and Decompress the Storm Release on Nimbus and Worker Machines
Next, download a Storm release and decompress it on each machine. Storm releases can be downloaded here.
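For example (the version number and download URL are placeholders; substitute the release you actually downloaded):

# Unpack a Storm release into place (the version here is a placeholder)
wget https://github.com/downloads/nathanmarz/storm/storm-0.8.2.zip
unzip storm-0.8.2.zip
cd storm-0.8.2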
Configure storm.yaml
The Storm release contains a file, conf/storm.yaml, that configures the Storm daemons. You can see the default configuration values here. Settings in storm.yaml override those in defaults.yaml. The following configurations are required to run a Storm cluster:
1. storm.zookeeper.servers: the addresses of the ZooKeeper cluster used by the Storm cluster, for example:
storm.zookeeper.servers:
  - "111.222.333.444"
  - "555.666.777.888"
2. storm.local.dir: Nimbus and the Supervisors need a directory on the local disk in which to store a small amount of state (jar packages, configuration files, and so on). You should create this directory on each machine and give it the correct permissions (a shell sketch for this follows the list), for example:
storm.local.dir: "/mnt/storm"
3. java.library.path: the load path of the native libraries Storm depends on (ZeroMQ and JZMQ). The default value is /usr/local/lib:/opt/local/lib:/usr/lib, which is correct for most installations, so you usually do not need to change it.
4. nimbus.host: the worker nodes need to know the address of the Nimbus machine so that they know where to download topology jar packages and configuration files from:
nimbus.host: "111.222.333.44"
5. supervisor.slots.ports: for each worker machine, this setting determines how many worker processes run on that machine; each worker process uses a single, independent port to receive messages, and this setting lists which ports are used. If you define five ports here, Storm will run up to five worker processes on this machine; if you define three, at most three. The default configuration is four ports:
supervisor.slots.ports:
  - 6700
  - 6701
  - 6702
  - 6703
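As mentioned under storm.local.dir, you have to create the state directory on every machine yourself. A minimal sketch, assuming the daemons run as a storm user (the user name and permissions are assumptions):

# Create the local state directory and hand it to the daemon user
sudo mkdir -p /mnt/storm
sudo chown -R storm:storm /mnt/storm
sudo chmod 750 /mnt/storm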
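Putting the required settings together, a minimal conf/storm.yaml looks like the following (the addresses, directory, and ports are just the placeholder values from the examples above):

storm.zookeeper.servers:
  - "111.222.333.444"
  - "555.666.777.888"
storm.local.dir: "/mnt/storm"
nimbus.host: "111.222.333.44"
supervisor.slots.ports:
  - 6700
  - 6701
  - 6702
  - 6703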
Start the Daemon Processes
The last step is to start all of Storm's daemon processes. As mentioned above, every one of these processes must run under supervision. Storm is a fail-fast system, meaning the processes will exit whenever an unexpected error occurs. Storm is designed so that it can safely halt at any point and recover correctly when the process is restarted; this is why Storm keeps no state in-process: if Nimbus or the Supervisors restart, the running topologies are unaffected. Start these daemons as follows:
- Nimbus: run bin/storm nimbus under supervision on the master machine.
- Supervisor: run bin/storm supervisor under supervision on each worker machine. The supervisor is responsible for starting and terminating the worker processes on that machine.
- UI: the Storm UI is a web interface for viewing the running state of the cluster. Run it with bin/storm ui under supervision and access it at http://{nimbus.host}:8080/.
As you can see, running a Storm cluster is straightforward. These processes write their logs to the logs directory inside the Storm release directory.
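Since every daemon must be monitored and restarted on failure, one common approach is a process supervisor such as supervisord. A minimal sketch, assuming Storm is installed under /opt/storm and runs as a storm user (both are assumptions):

; storm.conf for supervisord -- restart the Storm daemons when they exit
; (on the Nimbus machine)
[program:storm-nimbus]
command=/opt/storm/bin/storm nimbus
user=storm
autorestart=true

[program:storm-ui]
command=/opt/storm/bin/storm ui
user=storm
autorestart=true

; (on each worker machine instead)
[program:storm-supervisor]
command=/opt/storm/bin/storm supervisor
user=storm
autorestart=true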