Twitter Storm: Running a topology on a production cluster
Posted on October 07, 2011 by xumingming. Author: xumingming. This article may be reproduced, but only with a hyperlink to the original source together with the author information and this copyright notice.
URL: http://xumingming.sinaapp.com/185/twitter-storm-running on a production cluster topology/
This article is translated from https://github.com/nathanmarz/storm/wiki/Running-topologies-on-a-production-cluster
Running a topology on a production cluster is similar to running it in local mode. Here are the steps:
1) Define the topology (in Java, use TopologyBuilder).
2) Use StormSubmitter to submit the topology to the cluster. StormSubmitter takes three arguments: the name of the topology, a configuration object for the topology, and the topology itself. For example:
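```java
// Config and StormSubmitter are in the backtype.storm package (Storm 0.x);
// topology is the StormTopology built with TopologyBuilder in step 1.
Config conf = new Config();
conf.setNumWorkers(20);
conf.setMaxSpoutPending(5000);
StormSubmitter.submitTopology("mytopology", conf, topology);
```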
3) Create a jar containing your code and all of its dependencies, except Storm itself: the Storm jars are added to the classpath on the worker nodes automatically. If you use Maven, the Maven Assembly plugin can do the packaging for you. Just add the following to your pom.xml:
```xml
<plugin>
  <artifactId>maven-assembly-plugin</artifactId>
  <configuration>
    <descriptorRefs>
      <descriptorRef>jar-with-dependencies</descriptorRef>
    </descriptorRefs>
    <archive>
      <manifest>
        <mainClass>com.path.to.main.Class</mainClass>
      </manifest>
    </archive>
  </configuration>
</plugin>
```
Then run mvn assembly:assembly to build the jar. Again, make sure not to include the Storm jars, since they are added to the classpath automatically.
4) Submit the jar to the cluster with the storm client:
```shell
storm jar allmycode.jar org.me.MyTopology arg1 arg2 arg3
```
storm jar uploads the jar to the cluster and configures the StormSubmitter class to talk to the right cluster. In this example, after uploading the jar, storm jar calls the main function of org.me.MyTopology with the arguments arg1, arg2, and arg3. For how to configure your storm client to talk to a Storm cluster, see the article on configuring the Storm development environment.
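To make the argument passing concrete, here is a minimal sketch of what the entry point might look like. The class below is a hypothetical stand-in for org.me.MyTopology that only reports what it receives. Using the first argument as the topology name is an assumption of this sketch, not something storm jar requires; a real class would build the topology and call StormSubmitter.submitTopology where the comment indicates.

```java
// Hypothetical stand-in for org.me.MyTopology. "storm jar" passes every token
// after the class name (arg1 arg2 arg3) to main() as the String[] args array.
public class MyTopology {

    // Assumption for this sketch: the first argument is the topology name,
    // with a fallback default when no arguments are given.
    public static String topologyName(String[] args) {
        return args.length > 0 ? args[0] : "mytopology";
    }

    public static void main(String[] args) {
        String name = topologyName(args);
        System.out.println("Would submit topology: " + name);
        // Any remaining arguments are available for other settings.
        for (int i = 1; i < args.length; i++) {
            System.out.println("extra argument: " + args[i]);
        }
        // A real topology class would build the topology with TopologyBuilder
        // and call StormSubmitter.submitTopology(name, conf, topology) here.
    }
}
```

Run with the arguments arg1 arg2 arg3, this sketch reports arg1 as the topology name and lists arg2 and arg3 as extra arguments.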
Common configuration
There are many topology-level configurations you can set. There is a complete list of all configurations; the ones whose names start with "topology" are topology-level settings, and they override the corresponding global (cluster-wide) settings. Here are some of the more commonly used ones:
1) Config.TOPOLOGY_WORKERS: This sets how many worker processes are used to execute the topology. For example, if you set it to 25, there will be 25 Java processes across the cluster executing all of the topology's tasks. If the parallelism of all the components in your topology adds up to 150, each worker process will run 6 threads (150 / 25 = 6).
2) Config.TOPOLOGY_ACKERS: This sets the number of acker threads. Ackers are part of Storm's reliability API; for details on that API, see "How Twitter Storm ensures that messages are not lost".
3) Config.TOPOLOGY_MAX_SPOUT_PENDING: This sets the maximum number of tuples that can be pending on a single spout task at once, where pending means the tuple has not yet been acked or failed. Setting this is strongly recommended to keep the tuple queues from blowing up.
4) Config.TOPOLOGY_MESSAGE_TIMEOUT_SECS: This sets how long Storm gives a tuple to be fully processed; a tuple that is not processed within this time is considered failed. The default is 30 seconds, which is enough for most topologies. For more on Storm's reliability API, see "How Twitter Storm ensures that messages are not lost".
5) Config.TOPOLOGY_SERIALIZATIONS: To use custom types inside your tuples, register custom serializers for them with this configuration.
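The arithmetic behind Config.TOPOLOGY_WORKERS can be sketched in plain Java. WorkerSpread is a hypothetical helper that deals threads out round-robin so that every worker gets as equal a share as possible; it reproduces the 150 / 25 = 6 example and shows what happens when the numbers do not divide evenly. Storm's own scheduler decides the real placement, so treat this only as a model of the division.

```java
import java.util.Arrays;

// Illustrative model of spreading a topology's total parallelism over
// TOPOLOGY_WORKERS processes. Not Storm's scheduler, just the arithmetic.
public class WorkerSpread {

    // Returns how many threads each of `workers` processes receives when
    // `totalParallelism` threads are dealt out round-robin.
    public static int[] threadsPerWorker(int totalParallelism, int workers) {
        if (workers <= 0) {
            throw new IllegalArgumentException("workers must be positive");
        }
        int[] counts = new int[workers];
        for (int t = 0; t < totalParallelism; t++) {
            counts[t % workers]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        // 150 threads over 25 workers: every worker gets 6.
        System.out.println(Arrays.toString(threadsPerWorker(150, 25)));
        // 150 threads over 20 workers: the first 10 workers get 8, the other 10 get 7.
        System.out.println(Arrays.toString(threadsPerWorker(150, 20)));
    }
}
```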
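To see how Config.TOPOLOGY_MAX_SPOUT_PENDING and Config.TOPOLOGY_MESSAGE_TIMEOUT_SECS interact, here is a conceptual model of the bookkeeping for a single spout task. PendingTracker is a hypothetical class, not part of Storm's API (Storm does this tracking internally). In this sketch the spout stops emitting once the pending count reaches the cap, and any tuple that stays un-acked longer than the timeout is treated as failed and frees its slot.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

// Conceptual model of the per-spout-task bookkeeping controlled by
// TOPOLOGY_MAX_SPOUT_PENDING and TOPOLOGY_MESSAGE_TIMEOUT_SECS.
// PendingTracker is hypothetical; Storm does this tracking internally.
public class PendingTracker {
    private final int maxPending;      // plays the role of TOPOLOGY_MAX_SPOUT_PENDING
    private final long timeoutMillis;  // TOPOLOGY_MESSAGE_TIMEOUT_SECS, in milliseconds
    // Tuple id -> emit time in milliseconds, kept in emit order.
    private final Map<Long, Long> pending = new LinkedHashMap<>();

    public PendingTracker(int maxPending, long timeoutSecs) {
        this.maxPending = maxPending;
        this.timeoutMillis = timeoutSecs * 1000L;
    }

    // The spout may emit a new tuple only while the pending set is below the cap.
    public boolean canEmit() {
        return pending.size() < maxPending;
    }

    // Records a newly emitted tuple as pending.
    public void emitted(long id, long nowMillis) {
        pending.put(id, nowMillis);
    }

    // An ack (or an explicit fail) removes the tuple from the pending set.
    public void acked(long id) {
        pending.remove(id);
    }

    // Drops every tuple pending longer than the timeout and returns how many
    // were dropped; in this model they count as failed tuples.
    public int expire(long nowMillis) {
        int expired = 0;
        Iterator<Map.Entry<Long, Long>> it = pending.entrySet().iterator();
        while (it.hasNext()) {
            if (nowMillis - it.next().getValue() > timeoutMillis) {
                it.remove();
                expired++;
            }
        }
        return expired;
    }

    public int pendingCount() {
        return pending.size();
    }

    public static void main(String[] args) {
        PendingTracker spout = new PendingTracker(2, 30);
        spout.emitted(1L, 0L);
        spout.emitted(2L, 0L);
        System.out.println("can emit: " + spout.canEmit());      // false: 2 pending, cap is 2
        spout.acked(1L);
        System.out.println("can emit: " + spout.canEmit());      // true: 1 pending
        System.out.println("expired: " + spout.expire(31_000L)); // tuple 2 timed out
    }
}
```

The cap bounds how much un-acked work a spout can have in flight, while the timeout guarantees that a lost tuple eventually releases its slot instead of blocking the spout forever.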
Terminating a topology
To terminate a topology, run:
```shell
storm kill {stormname}
```
where {stormname} is the name you gave the topology when you submitted it to the Storm cluster.
Storm does not terminate the topology immediately. Instead, it deactivates all the spouts so that they no longer emit any new tuples, and then waits Config.TOPOLOGY_MESSAGE_TIMEOUT_SECS seconds before killing all the worker processes. This gives the topology enough time to finish processing any tuples that were still in flight when the storm kill command was executed.
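The shutdown sequence described above can be modeled as a simple time window. KillWindow is a hypothetical illustration rather than Storm code: it assumes the spouts are deactivated at time 0 and the workers are destroyed TOPOLOGY_MESSAGE_TIMEOUT_SECS seconds later, and it reports which in-flight tuples need little enough remaining time to finish inside that window.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model of "storm kill": spouts are deactivated at time 0, Storm
// waits TOPOLOGY_MESSAGE_TIMEOUT_SECS, then destroys the workers.
// KillWindow is a hypothetical illustration, not Storm code.
public class KillWindow {

    // remainingSecs[i] is how long in-flight tuple i still needs to finish.
    // Returns the indices of the tuples that complete before the workers go away.
    public static List<Integer> completedBeforeShutdown(double[] remainingSecs, int timeoutSecs) {
        List<Integer> done = new ArrayList<>();
        for (int i = 0; i < remainingSecs.length; i++) {
            if (remainingSecs[i] <= timeoutSecs) {
                done.add(i);
            }
        }
        return done;
    }

    public static void main(String[] args) {
        double[] inFlight = {0.5, 12.0, 45.0};
        // With the default 30-second timeout, tuples 0 and 1 finish; tuple 2 does not.
        System.out.println(completedBeforeShutdown(inFlight, 30));
    }
}
```

A tuple that would need longer than the window would also have exceeded the message timeout, so with this design no tuple that could still succeed normally is cut off by the kill.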
Updating a running topology
To update a running topology, the only option currently is to kill it and then submit the new version. A planned storm swap command would replace a running topology with a new one at runtime, keeping downtime to a minimum and ensuring that the old and new topologies never run at the same time.
Monitoring topologies
The best way to monitor a topology is with the Storm UI. The Storm UI shows the errors that occur within tasks, along with statistics on the throughput and latency of each component in the topology. You can also look at the worker logs on the machines in the cluster.