This blog post explains how to set up a local Storm development environment. It takes two steps:
1. Download a Storm release package from the official website, decompress it, and add the decompressed bin directory to the PATH environment variable so that storm commands are easy to run later.
2. Edit the Storm configuration file (storm.yaml) so that its cluster information matches your actual cluster, then place the edited file in the ~/.storm/ directory so that you can remotely start and stop computing tasks (that is, topologies) on the cluster.
Next, we will detail each operation step.
First, what is a Storm development environment?
Storm has two modes: local mode and remote mode. In local mode, we can develop and test computing tasks entirely on the local machine. In remote mode, we submit computing tasks to a Storm cluster, and they run on the remote cluster.
A Storm development environment has all the required components installed, so you can develop and test Storm topologies in local mode, compile and package computing tasks for execution on a remote cluster, and submit tasks to a remote cluster or stop tasks running there.
Why can the local machine do all this? Let's quickly review the relationship between the local machine and the remote cluster. A Storm cluster is managed by a master node called "Nimbus". The local machine communicates with Nimbus to submit code (which must be packaged as a jar) and to control the execution of tasks on the cluster. Nimbus distributes the code evenly across the cluster and assigns workers to run the submitted computing tasks (in practice, Nimbus does this through the supervisors). The local machine talks to Nimbus through a command-line client called storm. This client is used only for interacting with a remote cluster; it is not used in local mode.
(That is how the official documentation describes it. The blogger believes that in practice this can be more flexible: tasks can also be run in local mode through the storm command, with the topology code deciding whether to use local mode or cluster mode. For details, see Figure 1. The official documentation says otherwise because it recommends a different way of running tasks in local mode, which is described below.)
Figure 1. Example of task running modes
In Figure 1, the code marked with the red number 1 runs in local mode, and the code marked with the red number 2 runs in cluster mode. Both can be launched from the command line with the storm command.
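As a rough sketch of what these storm command invocations can look like, the two helper functions below wrap the client's jar and kill subcommands. The function names, the jar path, the class name, and the topology name are all hypothetical. The trailing mode argument in the usage comment is only an assumed convention: whether it selects local or cluster mode depends entirely on how the topology's own main method handles its arguments.

```shell
#!/bin/sh
# Sketch only: thin wrappers around the storm command-line client.
# Assumes the storm client is already on PATH.

# submit_topology JAR MAIN_CLASS [ARGS...]
# Runs MAIN_CLASS from JAR through the storm client; any extra arguments are
# passed unchanged to the topology's main method.
submit_topology() (
    jar=$1
    main_class=$2
    shift 2
    storm jar "$jar" "$main_class" "$@"
)

# kill_topology NAME [WAIT_SECS]
# Stops a running topology by name. WAIT_SECS, if given, is passed to the
# client's -w option (seconds to wait before the workers are shut down).
kill_topology() (
    name=$1
    wait_secs=${2:-}
    if [ -n "$wait_secs" ]; then
        storm kill "$name" -w "$wait_secs"
    else
        storm kill "$name"
    fi
)

# Example with hypothetical names:
#   submit_topology target/my-topology.jar com.example.MyTopology cluster
#   kill_topology my-topology 30
```

Keeping the mode out of the wrappers and in the topology's arguments matches the idea in Figure 1 that one package can serve both modes.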
If we want to submit tasks from the local machine to a remote cluster and control them, what do we need on the local machine? The answer is simple: install a Storm release package locally. Once it is installed, we have the command-line client (the storm command) and can use it to interact with remote clusters. The installation steps are as follows:
1. Download the release package for the version you need from the official Storm website.
2. After the download completes, decompress the package into a local working directory.
3. Add the decompressed bin directory to your PATH environment variable and make sure the bin/storm script has execute permission.
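The installation steps above could be scripted roughly as follows. This is a minimal sketch, not an official installer: the function name, the archive file name, and the install directory are hypothetical, and it assumes the release is a .tar.gz archive that unpacks into a single top-level directory.

```shell
#!/bin/sh
# Sketch only: unpack a Storm release and locate its bin directory.

# install_storm_release ARCHIVE TARGET_DIR
# Extracts ARCHIVE into TARGET_DIR, makes bin/storm executable, and prints
# the path of the unpacked bin directory.
install_storm_release() (
    archive=$1
    target_dir=$2
    mkdir -p "$target_dir" || exit 1
    # Assumption: the archive has one top-level directory. Strip it so that
    # bin/ ends up directly under TARGET_DIR.
    tar -xzf "$archive" -C "$target_dir" --strip-components=1 || exit 1
    chmod +x "$target_dir/bin/storm" || exit 1
    printf '%s\n' "$target_dir/bin"
)

# Example with a hypothetical archive name and install location:
#   bin_dir=$(install_storm_release "$HOME/Downloads/storm-release.tar.gz" "$HOME/opt/storm")
#   export PATH="$bin_dir:$PATH"
```

To make the PATH change permanent, the export line would go into your shell's startup file.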
The Storm release is installed on the local machine only to interact with remote clusters. If you want to develop and test computing tasks locally, we strongly recommend including the Storm dependency in your packaged project so that you can run computing tasks with the java command. Packaging with Maven and running the jar with the java command are covered separately. (This is the officially recommended way to run locally. Its drawback is that cluster mode and local mode need two different packages, which is cumbersome. The advantage of the approach in Figure 1 is that cluster mode and local mode use the same package, so no separate packaging is needed.)
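For illustration, running a package that bundles the Storm dependency with the java command might look like the sketch below. The function name, jar path, and class name are hypothetical, and the sketch assumes the topology's main method starts the topology in local mode by itself.

```shell
#!/bin/sh
# Sketch only: run a topology jar that bundles Storm, without the storm client.

# run_topology_locally JAR MAIN_CLASS [ARGS...]
# Checks that JAR exists, then runs MAIN_CLASS with JAR as the class path.
run_topology_locally() (
    jar=$1
    main_class=$2
    shift 2
    if [ ! -f "$jar" ]; then
        printf 'jar not found: %s\n' "$jar" >&2
        exit 1
    fi
    java -cp "$jar" "$main_class" "$@"
)

# Example with hypothetical names:
#   run_topology_locally target/my-topology-with-dependencies.jar com.example.MyTopology
```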
That covers what is needed to submit tasks to a remote cluster. Now it is time to start and stop computing tasks.
As the steps above show, Storm is installed locally so that the command-line client can interact with the cluster. We therefore have to tell the client the address of the cluster it should talk to. How? This is also simple:
1. Edit the Storm configuration file (storm.yaml) and add the Nimbus host address. This can be either a host name or an IP address; if you use a host name, add a mapping for it to the local hosts file. The setting looks like this:
nimbus.host: "123.45.678.890"
2. Place the edited configuration file in the ~/.storm/ directory.
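The two configuration steps can be combined into a small helper, sketched below. The function name is hypothetical, and the sketch overwrites any existing storm.yaml in the target directory. It keeps the nimbus.host key used in this post; note that Storm 1.0 and later use nimbus.seeds instead, so adjust the key to match your Storm version.

```shell
#!/bin/sh
# Sketch only: write a minimal client-side storm.yaml.

# write_storm_client_config NIMBUS_HOST [CONFIG_DIR]
# Creates CONFIG_DIR (default: ~/.storm) and writes a storm.yaml containing
# only the Nimbus address. An existing storm.yaml there is overwritten.
write_storm_client_config() (
    nimbus_host=$1
    config_dir=${2:-"$HOME/.storm"}
    mkdir -p "$config_dir" || exit 1
    printf 'nimbus.host: "%s"\n' "$nimbus_host" > "$config_dir/storm.yaml"
)

# Example with a hypothetical host name:
#   write_storm_client_config nimbus.example.com
```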
There is another option for setting up the environment:
We can use the one-click installation tool storm-deploy to install Storm on AWS. This tool automatically sets up the ~/.storm/storm.yaml file. We can also use the attach command to tell the client which cluster to connect to (this command is very convenient for switching between multiple clusters). The command is as follows:
lein run :deploy --attach --name mystormcluster
For more information about storm-deploy, see the project wiki.