Storm Buffer Settings


Citation: http://www.michael-noll.com/blog/2013/06/21/understanding-storm-internal-message-buffers/

When you are optimizing the performance of your Storm topologies it helps to understand how Storm's internal message queues are configured and put to use. In this short article I will explain and illustrate how Storm version 0.8/0.9 implements the intra-worker communication that happens within a worker process and its associated executor threads.

Internal messaging within Storm worker processes

Terminology: I will use the terms message and (Storm) tuple interchangeably in the following sections.

When I say "internal messaging" I mean the messaging that happens within a worker process in Storm, i.e. communication that is restricted to the same Storm machine/node. For this communication Storm relies on various message queues backed by LMAX Disruptor, a high-performance inter-thread messaging library.
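
To give a feel for what such a Disruptor-backed queue looks like in code, here is a minimal sketch using the Disruptor 3.x DSL. Treat it as an illustration of the library, not of Storm's internal usage (Storm 0.8/0.9 wraps an older Disruptor version in its own utilities):

import com.lmax.disruptor.dsl.Disruptor;
import com.lmax.disruptor.util.DaemonThreadFactory;

public class DisruptorSketch {
    // A mutable event holding a single message payload.
    static class MessageEvent {
        Object payload;
    }

    public static void main(String[] args) {
        // The ring buffer size must be a power of 2 -- the same
        // requirement that surfaces in Storm's buffer settings below.
        Disruptor<MessageEvent> disruptor = new Disruptor<>(
                MessageEvent::new, 1024, DaemonThreadFactory.INSTANCE);

        // Consumer side: an event handler invoked on a dedicated thread.
        disruptor.handleEventsWith(
                (event, sequence, endOfBatch) ->
                        System.out.println("got " + event.payload));
        disruptor.start();

        // Producer side: claim a slot in the ring buffer, fill it, publish it.
        disruptor.getRingBuffer().publishEvent(
                (event, sequence, msg) -> event.payload = msg, "hello");
    }
}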

Note that this communication within the threads of a worker process is different from Storm's inter-worker communication, which normally happens across machines and thus over the network. For the latter Storm uses ZeroMQ by default (in Storm 0.9 there is experimental support for Netty as the network messaging backend). That is, ZeroMQ/Netty is used when a task in one worker process wants to send data to a task that runs in a worker process on a different machine in the Storm cluster.

So for your reference:

    • Intra-worker communication in Storm (inter-thread on the same Storm node): LMAX Disruptor
    • Inter-worker communication (node-to-node across the network): ZeroMQ or Netty
    • Inter-topology communication: nothing is built into Storm, so you must take care of this yourself, e.g. with a messaging system such as Kafka/RabbitMQ, a database, etc. (see the sketch after this list)
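
For instance, a bolt in one topology could publish its results to a Kafka topic that a spout in a second topology consumes. Here is a minimal, hypothetical sketch of the producing side of such a bridge, using the org.apache.kafka.clients producer API (which postdates Storm 0.8/0.9, so treat it as an assumption, not something Storm ships):

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

// Hypothetical helper that hands results of topology A to topology B via Kafka.
public class TopologyBridge {
    private final KafkaProducer<String, String> producer;

    public TopologyBridge(String brokers) {
        Properties props = new Properties();
        props.put("bootstrap.servers", brokers);
        props.put("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        this.producer = new KafkaProducer<>(props);
    }

    // Called e.g. from a bolt's execute() with the tuple's serialized result.
    public void forward(String topic, String key, String value) {
        producer.send(new ProducerRecord<>(topic, key, value));
    }
}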

If you do not know what the differences are between Storm's worker processes, executor threads and tasks, please take a look at Understanding the Parallelism of a Storm Topology.

Illustration

Let us start with a picture before we discuss the nitty-gritty details in the next section.

Figure 1: Overview of a worker's internal message queues in Storm. Queues related to the worker process are colored red; queues related to the worker's various executor threads are colored green. For readability I show only one worker process (though a single Storm node normally runs multiple such processes) and only one executor thread within that worker process (of which, again, there are usually many per worker process).

Detailed description

Now that you have gotten a first glimpse of Storm's intra-worker messaging setup, we can discuss the details.

Worker processes

To manage its incoming and outgoing messages, each worker process has a single receive thread that listens on the worker's TCP port (as configured via supervisor.slots.ports). The parameter topology.receiver.buffer.size determines the batch size that the receive thread uses to place incoming messages into the incoming queues of the worker's executor threads. Similarly, each worker has a single send thread that is responsible for reading messages from the worker's transfer queue and sending them over the network to downstream consumers. The size of the transfer queue is configured via topology.transfer.buffer.size.

    • topology.receiver.buffer.size is the maximum number of messages that are batched together at once for appending to an executor's incoming queue by the worker receive thread (which reads the messages from the network). Setting this parameter too high can cause a lot of problems ("heartbeat thread gets starved, throughput plummets"). The default value is 8 elements, and the value must be a power of 2 (this requirement comes indirectly from LMAX Disruptor).
// Example: configuring via Java API
Config conf = new Config();
conf.put(Config.TOPOLOGY_RECEIVER_BUFFER_SIZE, 16); // default is 8
Note that topology.receiver.buffer.size, in contrast to the other buffer-size-related parameters described in this article, does not actually configure the size of an LMAX Disruptor queue. Rather, it sets the size of a simple ArrayList that is used to buffer incoming messages; in this specific case the data structure does not need to be shared with other threads, i.e. it is local to the worker's receive thread. But because the content of this buffer is used to fill a Disruptor-backed queue (the executor incoming queues), it must still be a power of 2 (a quick way to check this is sketched after this list). See launch-receive-thread! in backtype.storm.messaging.loader for details.
    • Each element of the transfer queue configured with topology.transfer.buffer.size is actually a list of tuples. The various executor send threads batch outgoing tuples off their outgoing queues onto the transfer queue. The default value is 1024 elements.
// Example: configuring via Java API
conf.put(Config.TOPOLOGY_TRANSFER_BUFFER_SIZE, 32); // default is 1024
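
Because several of these parameters must be powers of 2, it can be handy to sanity-check values before submitting a topology. A tiny sketch of my own (not part of Storm's API):

// Minimal power-of-2 check -- the same constraint LMAX Disruptor enforces.
static boolean isPowerOfTwo(int n) {
    return n > 0 && (n & (n - 1)) == 0;
}

// e.g. isPowerOfTwo(1024) == true, isPowerOfTwo(1000) == false
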
Executors

Each worker process controls one or more executor threads. Each executor thread has its own incoming queue and outgoing queue. As described above, the worker process runs a dedicated worker receive thread that is responsible for moving incoming messages to the appropriate incoming queue of the worker's various executor threads. Similarly, each executor has its own dedicated send thread that moves the executor's outgoing messages from its outgoing queue to the "parent" worker's transfer queue. The sizes of the executors' incoming and outgoing queues are configured via topology.executor.receive.buffer.size and topology.executor.send.buffer.size, respectively.

Each executor thread has a single thread that handles the user logic for the spout/bolt (i.e. your application code), and a single send thread that moves messages from the executor's outgoing queue to the worker's transfer queue.

    • topology.executor.receive.buffer.size is the size of the incoming queue for an executor. Each element of this queue is a list of tuples; tuples are appended in batches. The default value is 1024 elements, and the value must be a power of 2 (this requirement comes from LMAX Disruptor).
// Example: configuring via Java API
conf.put(Config.TOPOLOGY_EXECUTOR_RECEIVE_BUFFER_SIZE, 16384); // batched; default is 1024
    • topology.executor.send.buffer.size is the size of the outgoing queue for an executor. Each element of this queue contains a single tuple. The default value is 1024 elements, and the value must be a power of 2 (this requirement comes from LMAX Disruptor). See the toy model after this list for the batched-vs-single distinction.
// Example: configuring via Java API
conf.put(Config.TOPOLOGY_EXECUTOR_SEND_BUFFER_SIZE, 16384); // individual tuples; default is 1024
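
To make the batched-vs-single distinction concrete, here is a toy model of an executor's two queues, using plain Java collections as stand-ins for the actual Disruptor-backed queues (the type parameters are the point, not the implementation):

import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Toy stand-ins for an executor's queues; Tuple is a placeholder type.
class ExecutorQueuesSketch<Tuple> {
    // Incoming queue: each element is a *batch* (list) of tuples,
    // appended by the worker's receive thread.
    BlockingQueue<List<Tuple>> incoming = new ArrayBlockingQueue<>(1024);

    // Outgoing queue: each element is a *single* tuple, emitted by the
    // executor's user-logic thread and drained by its send thread.
    BlockingQueue<Tuple> outgoing = new ArrayBlockingQueue<>(1024);
}
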
Where to go from here

How to configure Storm's internal message buffers

The various default values mentioned above are defined in conf/defaults.yaml. You can override these values globally in a Storm cluster's conf/storm.yaml. You can also configure these parameters per individual Storm topology via backtype.storm.Config in Storm's Java API.
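
Putting the per-topology route together, a submission might look like the following sketch; MySpout and MyBolt are hypothetical placeholders for your own components:

import backtype.storm.Config;
import backtype.storm.StormSubmitter;
import backtype.storm.topology.TopologyBuilder;

public class BufferTunedTopology {
    public static void main(String[] args) throws Exception {
        TopologyBuilder builder = new TopologyBuilder();
        builder.setSpout("spout", new MySpout(), 2);   // hypothetical spout
        builder.setBolt("bolt", new MyBolt(), 4)       // hypothetical bolt
               .shuffleGrouping("spout");

        Config conf = new Config();
        // Per-topology overrides of the buffer sizes discussed above.
        conf.put(Config.TOPOLOGY_RECEIVER_BUFFER_SIZE, 8);
        conf.put(Config.TOPOLOGY_TRANSFER_BUFFER_SIZE, 32);
        conf.put(Config.TOPOLOGY_EXECUTOR_RECEIVE_BUFFER_SIZE, 16384);
        conf.put(Config.TOPOLOGY_EXECUTOR_SEND_BUFFER_SIZE, 16384);

        StormSubmitter.submitTopology("buffer-tuned", conf, builder.createTopology());
    }
}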

How to configure Storm's parallelism

The correct configuration of Storm's message buffers is closely tied to the workload pattern of your topology as well as the configured parallelism of your topologies. See Understanding the Parallelism of a Storm Topology for more details about the latter.
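
For orientation, the same Config object also carries the parallelism-related knobs (a sketch with illustrative values, not recommendations):

// Parallelism-related settings that interact with the buffer sizes above.
Config conf = new Config();
conf.setNumWorkers(2);          // number of worker processes for the topology
conf.setMaxSpoutPending(1000);  // caps in-flight tuples per spout task
// The number of executors is the parallelism hint passed to
// builder.setSpout(...)/builder.setBolt(...), e.g.:
// builder.setBolt("bolt", new MyBolt(), 4); // 4 executors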

Understand what's going on in your Storm topology

The Storm UI is a good place to start when inspecting key metrics of your running Storm topologies. For instance, it shows you the so-called "capacity" of a spout/bolt. The various metrics will help you decide whether your changes to the buffer-related configuration parameters described in this article had a positive or negative effect on the performance of your Storm topologies. See Running a Multi-Node Storm Cluster for details.
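
As a rough sketch of what "capacity" expresses, here is my paraphrase of how the metric is commonly described (not Storm source code): the fraction of the measurement window a bolt spent executing tuples.

// Back-of-the-envelope version of the UI's capacity metric.
// Values approaching 1.0 suggest the bolt is a bottleneck.
static double capacity(long executedCount, double executeLatencyMs, long windowMs) {
    return (executedCount * executeLatencyMs) / windowMs;
}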

Apart from that you can also generate your own application metrics and track them with a tool like Graphite. See my articles Sending Metrics from Storm to Graphite and Installing and Running Graphite via RPM and Supervisord for details. It might also be worth checking out Ooyala's metrics_storm project on GitHub (I haven't used it yet).

Advice on performance tuning

Watch Nathan Marz's talk on Tuning and Productionization of Storm.

The TL;DR version is: try the following settings as a first start and see whether they improve the performance of your Storm topology.

conf.put(Config.TOPOLOGY_RECEIVER_BUFFER_SIZE, 8);
conf.put(Config.TOPOLOGY_TRANSFER_BUFFER_SIZE, 32);
conf.put(Config.TOPOLOGY_EXECUTOR_RECEIVE_BUFFER_SIZE, 16384);
conf.put(Config.TOPOLOGY_EXECUTOR_SEND_BUFFER_SIZE, 16384);
