Storm Primer (11): Twitter Storm Source Code Analysis: CoordinatedBolt


Author: xumingming | Reproduction is permitted, but the original source, the author information and this copyright notice must be indicated in the form of hyperlinks
Website: http://xumingming.sinaapp.com/811/twitter-storm-code-analysis-coordinated-bolt/

The question most often asked about Twitter Storm's new feature, transactional topology, is: how does Storm know that a bolt has finished processing all of its tuples? Answering it involves quite a lot, and fortunately Storm provides a bolt that takes care of it for us. This awesome bolt is CoordinatedBolt. Importantly, CoordinatedBolt is itself built on Storm's primitives, spout and bolt, which means that even if Storm's author had not provided it, we could implement it ourselves. Let's look at how this class works.

Although CoordinatedBolt plays an important role, its principle is actually not very complex. It is currently used in two scenarios:

    • DRPC
    • Transactional topology

Before looking at how CoordinatedBolt works, let's first clarify what "finished" means: finished with what?
In fact, CoordinatedBolt is not completely non-intrusive to your business logic. To use the features it provides, you must ensure that the first field of every tuple emitted by each of your bolts is the request-id. "Done" then means that the current bolt has completed all the work for the current request-id. In DRPC a request-id represents one DRPC request; in a transactional topology it represents one batch.
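To make the request-id convention concrete, here is a minimal Java sketch, assuming tuples are modeled as plain `List` values rather than Storm's own tuple classes. The names `RequestIdConvention`, `requestIdOf` and `countByRequest` are hypothetical. The point is only that, because field 0 always carries the request-id, a tracker can keep tuples from interleaved requests apart without knowing anything else about their contents.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of the "first field is the request-id" convention.
// Tuples are modeled as List values; this is not Storm's Tuple/Values API.
public class RequestIdConvention {

    // Reads the request-id from field 0, as the convention requires.
    static Object requestIdOf(List<?> tuple) {
        if (tuple.isEmpty()) {
            throw new IllegalArgumentException("tuple has no request-id field");
        }
        return tuple.get(0);
    }

    // Counts tuples per request-id; interleaved requests are tracked separately.
    static Map<Object, Integer> countByRequest(List<List<?>> tuples) {
        Map<Object, Integer> counts = new LinkedHashMap<>();
        for (List<?> t : tuples) {
            counts.merge(requestIdOf(t), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<List<?>> stream = List.of(
                List.of("req-1", "a"),
                List.of("req-2", "b"),
                List.of("req-1", "c"));
        System.out.println(countByRequest(stream)); // prints {req-1=2, req-2=1}
    }
}
```

A bolt that put the request-id anywhere else, or omitted it, would break this kind of tracking, which is exactly why the convention makes CoordinatedBolt intrusive.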

The principle of CoordinatedBolt is as follows:

  • Every bolt that the user supplies to DRPC or to a transactional topology is wrapped in a CoordinatedBolt layer. In other words, what actually runs inside a DRPC or transactional topology is not the user's original bolts but a set of CoordinatedBolts, each acting as a proxy for one of those bolts.
  • With this proxy layer in place, CoordinatedBolt can do its job.
  • It maintains the following data itself:
    • Which upstream tasks will send me tuples? (This can be derived from the grouping information supplied when the topology is built.)
    • Which downstream tasks do I send tuples to? (This can also be derived from the grouping information.)
  • Whenever the real bolt wrapped by a CoordinatedBolt emits a tuple, the CoordinatedBolt records which task that tuple was sent to.
  • Once the bolt has sent all of its tuples (how does it know it is done? That is explained below), it uses emitDirect on a separate, special stream to tell each task it sent tuples to how many tuples it sent that task.
  • A downstream bolt, after receiving these count messages from all of its upstream tasks, compares the announced counts with the number of tuples it actually received. If they match, it has received all of its tuples, and it is done.
  • Once it is done, it repeats the steps above to notify its own downstream tasks, which in turn notify theirs, and so on.
  • To summarize: how does each bolt know that it has finished processing? Its upstream tells it. So as long as a bolt has an upstream CoordinatedBolt, it can learn when it is done.
  • But there is always one bolt with no upstream CoordinatedBolt to notify it: the topmost bolt. How does it know that it is done? It relies on Storm's ack system: once it acks the tuple sent by its upstream (a component that is not a CoordinatedBolt; in DRPC this is PrepareRequest), it has finished processing. In other words, the topmost bolt only has to finish processing a single tuple, whereas its downstream bolts may process many.
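The counting protocol in the list above can be simulated in a single process. The following is a minimal Java sketch under stated assumptions: all class and method names (`CoordinationSketch`, `Task`, `recordEmit`, `finish`, `onData`, `onCount`, `isDone`) are hypothetical and are not Storm's CoordinatedBolt API; messages are delivered synchronously by direct method calls instead of over streams; and the topmost bolt's ack-based rule is not modeled. One design choice to note: this sketch reports a count to every downstream task, including a count of zero, so that each receiver can simply wait for exactly one report per upstream task.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical single-process simulation of CoordinatedBolt-style completion
// tracking. None of these names are Storm's actual API.
public class CoordinationSketch {

    // Bookkeeping one task keeps for a single request-id.
    static final class RequestState {
        int received;  // data tuples received for this request-id
        int reports;   // count messages received from upstream tasks
        int expected;  // sum of the counts announced in those messages
        final Map<Integer, Integer> sentTo = new HashMap<>(); // downstream task -> tuples sent
    }

    static final class Task {
        final int id;
        final int upstreamTasks;         // in Storm, derivable from the groupings
        final List<Integer> downstream;  // likewise derivable from the groupings
        final Map<String, RequestState> states = new HashMap<>();

        Task(int id, int upstreamTasks, List<Integer> downstream) {
            this.id = id;
            this.upstreamTasks = upstreamTasks;
            this.downstream = downstream;
        }

        RequestState state(String requestId) {
            return states.computeIfAbsent(requestId, k -> new RequestState());
        }

        // Sender side: the wrapped bolt emitted one tuple to downstreamTask.
        void recordEmit(String requestId, int downstreamTask) {
            state(requestId).sentTo.merge(downstreamTask, 1, Integer::sum);
        }

        // Sender side: after finishing, tell every downstream task (zero counts
        // included) how many tuples it was sent; stands in for emitDirect on a
        // separate coordination stream.
        void finish(String requestId, Map<Integer, Task> cluster) {
            RequestState s = state(requestId);
            for (int target : downstream) {
                cluster.get(target).onCount(requestId, s.sentTo.getOrDefault(target, 0));
            }
        }

        // Receiver side: one data tuple arrived.
        void onData(String requestId) {
            state(requestId).received++;
        }

        // Receiver side: one upstream task reported how many tuples it sent here.
        void onCount(String requestId, int count) {
            RequestState s = state(requestId);
            s.reports++;
            s.expected += count;
        }

        // Done once every upstream task has reported and the totals agree.
        boolean isDone(String requestId) {
            RequestState s = state(requestId);
            return s.reports == upstreamTasks && s.received == s.expected;
        }
    }

    // Deliver one data tuple: the sender records it, the receiver counts it.
    static void send(Task from, Task to, String requestId) {
        from.recordEmit(requestId, to.id);
        to.onData(requestId);
    }

    public static void main(String[] args) {
        // Tasks 1 and 2 play the upstream bolts; their own completion would be
        // decided further up, so upstreamTasks is 0 and their isDone is unused.
        Task t1 = new Task(1, 0, List.of(3, 4));
        Task t2 = new Task(2, 0, List.of(3, 4));
        Task t3 = new Task(3, 2, List.of());
        Task t4 = new Task(4, 2, List.of());
        Map<Integer, Task> cluster = new HashMap<>();
        for (Task t : List.of(t1, t2, t3, t4)) {
            cluster.put(t.id, t);
        }

        String req = "req-42";
        send(t1, t3, req);
        send(t1, t3, req);
        send(t2, t4, req); // task 1 never sends to task 4, so it will report 0 there

        t1.finish(req, cluster);
        // prints "after task 1 reports: t3=false, t4=false"
        System.out.println("after task 1 reports: t3=" + t3.isDone(req) + ", t4=" + t4.isDone(req));
        t2.finish(req, cluster);
        // prints "after task 2 reports: t3=true, t4=true"
        System.out.println("after task 2 reports: t3=" + t3.isDone(req) + ", t4=" + t4.isDone(req));
    }
}
```

In this run neither downstream task is done after only task 1 has reported, because each still expects a report from task 2; once task 2 reports, both tasks' received totals match the announced totals. Because `isDone` compares totals rather than relying on arrival order, the same check would still give the right answer if a count message arrived before some of the data tuples it describes, provided it is re-checked after each message.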

[Figure: the detailed working principle of CoordinatedBolt]

When discussing what "done" means, we said that CoordinatedBolt is intrusive to your business logic: the first field of every tuple must carry the current request-id, otherwise CoordinatedBolt cannot track it. A more elegant approach would be the way the IP and TCP protocols are handled in a network protocol stack. The IP layer wraps the information it needs around the outside of a TCP packet instead of requiring IP-layer fields to be mixed into the TCP packet itself. When sending data, the TCP layer fills in only its own fields, and the IP layer automatically adds the IP-layer information. Before passing a packet up to the TCP layer, the IP layer automatically removes its own information, so TCP sees only the fields of its own layer and nothing intrudes on it. The author has proposed some improvements for this problem here.
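The IP/TCP analogy can be sketched in a few lines of Java. This is not the improvement the author refers to; it is only a hypothetical illustration (the names `EnvelopeSketch`, `UserBolt` and `process` are made up, and tuples are again modeled as `List` values) of how a framework layer could strip the request-id before calling user code and re-attach it to everything the user code emits, so the user bolt never sees that field.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical layering sketch: the framework owns the request-id "header",
// the user bolt sees and emits only its own fields (like TCP under IP).
public class EnvelopeSketch {

    // What the user writes: no request-id in its input or output.
    interface UserBolt {
        void execute(List<Object> fields, List<List<Object>> out);
    }

    // Framework layer: strip field 0 on the way in, re-attach it on the way out.
    static List<List<Object>> process(UserBolt bolt, List<Object> wireTuple) {
        Object requestId = wireTuple.get(0);
        List<Object> userFields = wireTuple.subList(1, wireTuple.size());

        List<List<Object>> userOut = new ArrayList<>();
        bolt.execute(userFields, userOut);

        List<List<Object>> wireOut = new ArrayList<>();
        for (List<Object> fields : userOut) {
            List<Object> t = new ArrayList<>();
            t.add(requestId);  // header added automatically, as IP does for TCP
            t.addAll(fields);
            wireOut.add(t);
        }
        return wireOut;
    }

    public static void main(String[] args) {
        // A user bolt that upper-cases a word, unaware of request-ids.
        UserBolt upper = (fields, out) -> out.add(List.of(((String) fields.get(0)).toUpperCase()));
        List<Object> wire = List.of("req-9", "hello");
        System.out.println(process(upper, wire)); // prints [[req-9, HELLO]]
    }
}
```

The trade-off is that the framework must be the only path through which user bolts emit, since any tuple sent around the wrapper would lose its request-id.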
