Storm Configuration Instructions

Source: Internet
Author: User

What is Storm?

Storm is a real-time data processing framework open-sourced by Twitter. It lets you process data streams in real time with relatively simple programming.


Storm's configuration file is typically stored under $STORM_HOME/conf and is usually named storm.yaml. It must conform to the YAML format.


Configuration items in detail:

Here are all the configuration items that Storm supports, collected from Storm's Config class (based on Storm 0.6.0):


Configuration item: Description

storm.zookeeper.servers: List of ZooKeeper servers.

storm.zookeeper.port: ZooKeeper connection port.

storm.local.dir: Local file system directory used by Storm (it must exist, and the Storm process must be able to read and write it).

storm.cluster.mode: Storm cluster operating mode ([distributed|local]).

storm.local.mode.zmq: Whether to use ZeroMQ as the messaging system in local mode. If set to false, the Java messaging system is used. The default is false.

storm.zookeeper.root: Root location of Storm's data in ZooKeeper.

storm.zookeeper.session.timeout: Timeout for client connections to ZooKeeper.

storm.id: The ID of the running topology, consisting of the storm (topology) name and a unique random number.
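As a rough illustration of the storm.* items above, a storm.yaml fragment might look like the following sketch. The host names and the directory are hypothetical placeholders, and the numeric values are illustrative rather than recommendations, so check them against the defaults shipped with your Storm version. storm.id is left out because it describes a running topology rather than something an operator would normally write by hand.

```yaml
# Sketch only: host names and paths below are hypothetical placeholders.
storm.zookeeper.servers:
  - "zk1.example.com"
  - "zk2.example.com"
storm.zookeeper.port: 2181               # ZooKeeper's conventional client port
storm.zookeeper.root: "/storm"           # illustrative root location
storm.zookeeper.session.timeout: 20000   # assumed to be in milliseconds
storm.local.dir: "/var/storm"            # must already exist and be writable by Storm
storm.cluster.mode: "distributed"        # or "local"
storm.local.mode.zmq: false              # only matters in local mode
```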

nimbus.host: Address of the Nimbus server.

nimbus.thrift.port: Port on which the Nimbus Thrift service listens.

nimbus.childopts: JVM options passed to the Nimbus process when deploying with the storm-deploy project.

nimbus.task.timeout.secs: Task heartbeat timeout. Once it is exceeded, Nimbus considers the task dead and reassigns it to another location.

nimbus.monitor.freq.secs: Interval at which Nimbus checks heartbeats and reassigns tasks. Note that if a machine goes down, Nimbus wakes up and takes action immediately.

nimbus.supervisor.timeout.secs: Supervisor heartbeat timeout. Once it is exceeded, Nimbus considers the supervisor dead and stops assigning new tasks to it.

nimbus.task.launch.secs: A special timeout used when a task is first launched. Until the first heartbeat after startup, this value temporarily replaces nimbus.task.timeout.secs.

nimbus.reassign: Whether Nimbus should reassign tasks when it detects that a task has failed. The default is true, and changing it is not recommended.

nimbus.file.copy.expiration.secs: Timeout for upload/download connections. When a connection has been idle for longer than this setting, Nimbus considers it dead and closes it.
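The relationship between the Nimbus timeouts is easier to see side by side. In the sketch below, the launch timeout is deliberately larger than the regular task timeout, since it only applies before the first heartbeat. The host name is a hypothetical placeholder and every number is an illustrative value, not a verified default.

```yaml
# Sketch only: illustrative values, hypothetical host name.
nimbus.host: "nimbus.example.com"
nimbus.thrift.port: 6627
nimbus.childopts: "-Xmx1024m"            # JVM heap option for the Nimbus process
nimbus.task.timeout.secs: 30             # task is considered dead after 30 s of silence
nimbus.task.launch.secs: 120             # more generous limit until the first heartbeat
nimbus.supervisor.timeout.secs: 60
nimbus.monitor.freq.secs: 10             # how often Nimbus checks heartbeats
nimbus.reassign: true                    # leave enabled in normal operation
nimbus.file.copy.expiration.secs: 600    # idle upload/download connections are closed
```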

ui.port: Service port of the Storm UI.

drpc.servers: List of DRPC servers, so that DRPCSpout knows which servers to communicate with.

drpc.port: Service port of Storm DRPC.
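A minimal sketch of the UI and DRPC entries follows. The DRPC host name is a hypothetical placeholder and the port numbers are illustrative. Note that drpc.servers is a YAML list even when it holds a single server.

```yaml
# Sketch only: placeholder host name, illustrative ports.
ui.port: 8080
drpc.servers:
  - "drpc1.example.com"
drpc.port: 3772
```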

supervisor.slots.ports: List of ports on the supervisor that can run workers. Each worker occupies one port, and only one worker runs per port. Use this setting to adjust the number of workers (slots) running on each machine.

supervisor.childopts: JVM options for the supervisor daemon, used by the storm-deploy project.

supervisor.worker.timeout.secs: Worker heartbeat timeout as seen by the supervisor. Once it is exceeded, the supervisor attempts to restart the worker process.

supervisor.worker.start.timeout.secs: Worker heartbeat timeout during initial startup. When it is exceeded, the supervisor attempts to restart the worker. It overrides supervisor.worker.timeout.secs during launch, because starting and configuring the JVM adds overhead and the first heartbeat can take longer than that setting allows.

supervisor.enable: Whether the supervisor should run the workers assigned to it. The default is true. This option exists for Storm's unit tests and should not normally be changed.

supervisor.heartbeat.frequency.secs: How often the supervisor sends heartbeats.

supervisor.monitor.frequency.secs: How often the supervisor checks worker heartbeats.
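To make the slot mechanism concrete, the following sketch lists four ports, which would give this machine four worker slots. The port numbers, heap size, and timings are illustrative assumptions, not verified defaults.

```yaml
# Sketch only: four ports means at most four workers on this machine.
supervisor.slots.ports:
  - 6700
  - 6701
  - 6702
  - 6703
supervisor.childopts: "-Xmx1024m"
supervisor.worker.timeout.secs: 30          # restart a worker after 30 s without a heartbeat
supervisor.worker.start.timeout.secs: 240   # longer allowance while the worker JVM starts
supervisor.heartbeat.frequency.secs: 5
supervisor.monitor.frequency.secs: 3
supervisor.enable: true
```

Adding or removing entries under supervisor.slots.ports is the way to change how many workers a single machine can host.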

worker.childopts: JVM options the supervisor uses when starting a worker. Every "%ID%" substring is replaced with the identifier of the corresponding worker.

worker.heartbeat.frequency.secs: Interval between worker heartbeats.

task.heartbeat.frequency.secs: Interval at which tasks send status heartbeats.

task.refresh.poll.secs: How often a task synchronizes its connections with other tasks. (If a task is reassigned, the tasks that send it messages need to refresh their connections.) Normally, tasks are notified when a reassignment occurs. This setting is only a safeguard for cases where the notification does not arrive.
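The worker and task intervals might be set as in this sketch. The values are illustrative assumptions. The only point it tries to show is that heartbeat intervals should be comfortably shorter than the corresponding timeouts on the Nimbus and supervisor side.

```yaml
# Sketch only: illustrative values.
worker.childopts: "-Xmx768m"            # JVM heap option for each worker process
worker.heartbeat.frequency.secs: 1
task.heartbeat.frequency.secs: 3
task.refresh.poll.secs: 10              # fallback poll in case a reassignment notice is missed
```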

topology.debug: If set to true, Storm logs every message that is emitted.

topology.optimize: Whether the master should optimize topologies by running multiple tasks within a single thread where appropriate.

topology.workers: The number of processes that should be started across the cluster to run the topology. Each process runs some number of tasks as threads. Use this parameter together with the parallelism hint of each component to tune the topology's performance.

topology.ackers: The number of acker tasks started in the topology. Ackers keep a record of the tuples emitted by spouts and detect when a tuple has been fully processed. When an acker detects that a tuple is fully processed, it sends a confirmation message to the spout. The number of ackers should be chosen based on the throughput of the topology, but usually not many are needed. Setting it to 0 is equivalent to disabling message reliability: Storm acks tuples immediately after the spout emits them.

topology.message.timeout.secs: The maximum time allowed for a message emitted by a spout to be fully processed. If a message is not acked within this window, Storm tells the spout that the message failed. Some spouts implement replay of failed messages.

topology.kryo.register: A list of serializations registered with Kryo (Storm's underlying serialization framework). A serialization can be a class name or an implementation of com.esotericsoftware.kryo.Serializer.

topology.skip.missing.kryo.registrations: Whether Storm should skip Kryo serializations that it does not recognize. If set to false, tasks may fail to load or may throw errors at run time.

topology.max.task.parallelism: The maximum parallelism allowed for any component in a topology. This setting is primarily used in testing, to limit the number of threads started in local mode.

topology.max.spout.pending: The maximum number of tuples that can be pending on a single spout task at one time. The setting applies to each individual task, not to a spout or topology as a whole.

topology.state.synchronization.timeout.secs: Maximum timeout for a component to synchronize with its state source (reserved option, not currently used).

topology.stats.sample.rate: Percentage of tuples sampled to generate task statistics.

topology.fall.back.on.java.serialization: Whether to fall back on Java serialization in the topology.
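The topology.* items are commonly set per topology when it is submitted. The sketch below assumes they can also be given in storm.yaml as cluster-wide defaults. All values are illustrative, and topology.kryo.register is omitted because its entries depend on the application's own classes.

```yaml
# Sketch only: illustrative defaults for topologies on this cluster.
topology.debug: false                   # true would log every emitted message
topology.workers: 2                     # worker processes per topology
topology.ackers: 1                      # 0 would disable message reliability
topology.message.timeout.secs: 30       # tuples not acked in time are reported as failed
topology.max.spout.pending: 1000        # per spout task, not per topology
topology.skip.missing.kryo.registrations: false
topology.fall.back.on.java.serialization: true
```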

zmq.threads: The number of threads used for ZeroMQ communication in each worker process.

zmq.linger.millis: How long a connection keeps trying to resend messages to the target host after the connection is closed. This is an advanced option that can almost always be ignored.

java.library.path: The java.library.path value used when the JVMs for Nimbus, the supervisors, and the workers are started. This option tells the JVM where to look for native libraries.
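For completeness, the remaining low-level items could be written as in the sketch below. The thread count and linger time are illustrative, and the library directories are assumed locations where native libraries such as ZeroMQ might be installed. Adjust them to your system.

```yaml
# Sketch only: illustrative values and assumed library locations.
zmq.threads: 1
zmq.linger.millis: 5000
java.library.path: "/usr/local/lib:/opt/local/lib:/usr/lib"
```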

