Let's take a holistic look at the steps to build a Storm cluster:
- Set up a ZooKeeper cluster
- Install the dependencies on all Nimbus and worker nodes
- Download and extract a Storm release on all Nimbus and worker nodes
- Configure storm.yaml
- Launch the daemons
1. Configuring the ZooKeeper cluster
Storm uses ZooKeeper to coordinate the cluster. ZooKeeper is not used for message passing, so the load Storm places on it is quite low. A single-node ZooKeeper deployment is adequate in most situations, but if you want better reliability or are deploying a large cluster, you may need a larger ZooKeeper ensemble. ZooKeeper deployment itself is beyond the scope of this article; refer to the ZooKeeper documentation for details. Two points about deploying ZooKeeper are worth adding here:
- It is critical to run ZooKeeper under a supervision process, because ZooKeeper is fail-fast: it exits automatically whenever it encounters an error, so something must restart it.
- It is equally critical to compact and clean up ZooKeeper's data on a schedule. ZooKeeper does not purge its old snapshots and transaction logs by default, so if no cron job manages this data it will quickly fill the disk (one approach is sketched right after this list).
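As a sketch of one approach (the install path, retention count, and schedule are assumptions; ZooKeeper 3.4+ can also purge old data by itself via the autopurge settings in zoo.cfg):

```
# Option 1 (ZooKeeper 3.4+): let ZooKeeper purge old snapshots/logs itself.
# Add to conf/zoo.cfg:
#   autopurge.snapRetainCount=3   # keep the 3 most recent snapshots
#   autopurge.purgeInterval=24    # purge every 24 hours
#
# Option 2: a nightly cron entry running the cleanup script that ships in
# ZooKeeper's bin directory (install path and flags per your ZK version):
0 3 * * * /opt/zookeeper/bin/zkCleanup.sh -n 3 >> /var/log/zk-cleanup.log 2>&1
```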
PS: If ZooKeeper fails to start, check the zookeeper.out file in its bin directory and verify that the myid file is configured correctly.
2. Installing the dependencies on the Nimbus and worker nodes
Storm depends on:
- Java 6
- Python 2.6.6
Note that these are the versions Storm has been tested against; Storm may or may not work with other versions of Java or Python.
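A quick sanity check, run on each Nimbus and worker node, might look like this:

```
# Verify the dependency versions listed above are present on this node.
java -version    # expect 1.6.x (Java 6)
python -V        # expect Python 2.6.6
```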
3. Downloading and extracting a Storm release to the Nimbus and worker nodes
The next step is to download a Storm release and extract the archive on each machine. Storm releases can be downloaded from the project's official download page.
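A minimal sketch of that step; the version number and mirror URL below are hypothetical, so substitute the real ones from the download page:

```
# Hypothetical release version and mirror URL -- substitute the real ones.
wget http://mirrors.example.com/apache/storm/storm-0.9.0.zip
unzip storm-0.9.0.zip -d /opt
ln -s /opt/storm-0.9.0 /opt/storm   # optional convenience symlink
```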
4. Configuring the storm.yaml file
The Storm release contains a file, conf/storm.yaml, that configures the Storm daemons. The default value of every setting can be found in defaults.yaml; anything set in storm.yaml overrides those defaults. The settings below are mandatory for a working cluster.
1) storm.zookeeper.servers: the list of hosts in the ZooKeeper cluster, for example:
storm.zookeeper.servers:
  - "111.222.333.444"
  - "555.666.777.888"
If your ZooKeeper cluster uses a port other than the default, you must also set storm.zookeeper.port.
2) storm.local.dir: the Nimbus and Supervisor daemons need a local directory in which to store a small amount of state (jars, confs, and so on). Create this directory on each machine, give it appropriate permissions, and point the setting at it:
storm.local.dir: "/mnt/storm"
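Preparing that directory on each node can be as simple as the following; the "storm" account name is an assumption, so substitute whatever user runs the daemons:

```
# Create the local state directory and give it to the daemon user.
mkdir -p /mnt/storm
chown -R storm:storm /mnt/storm   # "storm" user is an assumption
```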
3) nimbus.host: the worker nodes need to know which machine is the master node so that they can download topology jars and confs from it:
nimbus.host: "111.222.333.44"
4) supervisor.slots.ports: for each worker machine, this setting determines how many worker processes the machine may run. Each worker process uses a single port to receive messages, and this setting lists the ports that are open for that purpose. If you list five ports here, Storm can assign up to five worker processes to the machine; if you list three, at most three. By default the setting opens four slots on ports 6700, 6701, 6702, and 6703. For example:
supervisor.slots.ports:
  - 6700
  - 6701
  - 6702
  - 6703
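Pulling the four mandatory settings together, a complete minimal conf/storm.yaml could be written in one step like this (the addresses and ports are the example values used above):

```
# Write a minimal conf/storm.yaml using the example values from this article.
cat > conf/storm.yaml <<'EOF'
storm.zookeeper.servers:
  - "111.222.333.444"
  - "555.666.777.888"
storm.local.dir: "/mnt/storm"
nimbus.host: "111.222.333.44"
supervisor.slots.ports:
  - 6700
  - 6701
  - 6702
  - 6703
EOF
```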
5. Launching the Storm daemons with the storm command
The last step is to launch all the Storm daemons, and it is essential to run each of them under a supervision process. Storm is a fail-fast system, meaning its processes terminate as soon as they hit an unexpected error. Storm is designed so that any daemon can die safely at any time and recover when it is restarted; this is why Storm keeps no state inside the daemon processes, and why running topologies are unaffected if Nimbus or the Supervisors restart (a supervision sketch follows the list below). Here are the commands to start each daemon:
- Nimbus: run "bin/storm nimbus" on the master node.
- Supervisor: run "bin/storm supervisor" on each worker node; the Supervisor daemon is responsible for starting and stopping worker processes on that node.
- UI: run "bin/storm ui" to start the Storm UI, a tool for monitoring a running cluster through a web page that can be reached at http://{nimbus host}:8080.
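One way to provide the supervision mentioned above is supervisord; this is only a sketch, and the installation path and user account are assumptions:

```
; Sketch of a supervisord config for the three daemons. supervisord is just
; one possible supervisor; /opt/storm and the "storm" user are assumptions.
[program:storm-nimbus]
command=/opt/storm/bin/storm nimbus
user=storm
autorestart=true    ; restart the fail-fast daemon whenever it exits

[program:storm-supervisor]
command=/opt/storm/bin/storm supervisor
user=storm
autorestart=true

[program:storm-ui]
command=/opt/storm/bin/storm ui
user=storm
autorestart=true
```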
As you can see, starting the Storm daemons is fairly straightforward. Each machine writes its logs to the logs/ directory under the Storm installation, and Storm manages its logging with Logback, so you can change the log directory and output by editing its logback.xml file.