Spark Streaming source code walkthrough: the architecture design and concrete implementation of ReceiverTracker on the driver

Source: Internet
Author: User

Contents of this issue:

    • Architecture design of ReceiverTracker
    • Message loop
    • Concrete implementation of ReceiverTracker

First, the architecture design of ReceiverTracker

1, ReceiverTracker runs on the driver and uses its own scheduling algorithm to start each Receiver on a specific executor. To start a Receiver, it wraps the Receiver in a task, and that task is the only task in its job. In essence, ReceiverTracker starts every Receiver as a separate job: there are as many jobs as there are Receivers. Each job contains a single task, and the single piece of data inside that task is the Receiver itself.
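The one-job-per-Receiver idea can be sketched as follows. This is a simplified Python model written only for illustration, not Spark's Scala code; `Task`, `Job`, and `launch_receivers` are hypothetical names that do not appear in the original article.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Task:
    # The only piece of data carried by the task is the receiver itself.
    receiver: str


@dataclass
class Job:
    # In this model every receiver job holds exactly one task.
    tasks: List[Task]


def launch_receivers(receivers: List[str]) -> List[Job]:
    """Wrap every receiver in its own single-task job (hypothetical helper)."""
    return [Job(tasks=[Task(receiver=r)]) for r in receivers]


jobs = launch_receivers(["socket-receiver-0", "kafka-receiver-1"])
for job in jobs:
    print(len(job.tasks), job.tasks[0].receiver)
```

Two receivers produce two jobs, and each job has one task. Keeping the jobs separate means one Receiver can fail and be restarted without touching the others.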

2, When ReceiverTracker starts a Receiver, it does so through a ReceiverSupervisor, whose concrete implementation is ReceiverSupervisorImpl. When the ReceiverSupervisor starts, it in turn starts the Receiver. The Receiver continuously receives data and hands it to BlockGenerator, which, driven by a timer, keeps turning the data into blocks and storing them. There are two ways to store the data: one through BlockManager, the other by writing a write-ahead log (WAL). After storing the data, ReceiverSupervisorImpl reports the metadata of the stored data to ReceiverTracker; in essence, it sends the report to ReceiverTracker's RPC communication endpoint. ReceiverTracker receives the metadata via RPC and then carries out the subsequent data management work.
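The store-then-report flow described above can be sketched as a toy model. This is Python written for illustration, not Spark's implementation: `ToySupervisor` and `ToyTracker` are hypothetical names, the block store is a plain dict, and blocks are cut by record count rather than by a timer (an assumption made to keep the example deterministic).

```python
from typing import Dict, List


class ToyTracker:
    """Stands in for the driver side: it only ever sees block metadata."""

    def __init__(self) -> None:
        self.reported: List[dict] = []

    def add_block(self, info: dict) -> bool:
        self.reported.append(info)
        return True


class ToySupervisor:
    """Buffers records, cuts them into blocks, stores them, reports metadata."""

    def __init__(self, stream_id: int, tracker: ToyTracker, block_size: int) -> None:
        self.stream_id = stream_id
        self.tracker = tracker
        self.block_size = block_size            # assumption: cut by count, not by timer
        self.buffer: List[str] = []
        self.store: Dict[str, List[str]] = {}   # stands in for the block store
        self.next_block = 0

    def push_single(self, record: str) -> None:
        self.buffer.append(record)
        if len(self.buffer) >= self.block_size:
            self._cut_block()

    def _cut_block(self) -> None:
        block_id = f"input-{self.stream_id}-{self.next_block}"
        self.next_block += 1
        self.store[block_id] = self.buffer      # step 1: store the data locally
        self.tracker.add_block({                # step 2: report only the metadata
            "stream_id": self.stream_id,
            "block_id": block_id,
            "num_records": len(self.buffer),
        })
        self.buffer = []


tracker = ToyTracker()
supervisor = ToySupervisor(stream_id=0, tracker=tracker, block_size=3)
for record in "abcdefg":
    supervisor.push_single(record)
print([info["block_id"] for info in tracker.reported])
```

The point of the sketch is the split of responsibilities: the data stays on the receiving side, and only a small metadata record travels to the tracker.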

  

Second, the concrete implementation of ReceiverTracker

How ReceiverTracker handles data after it has been received:

    

  Storing the data and reporting it to the driver:

    

ReceivedBlockInfo:

    

    

    

    

ReceiverTracker acts as the RPC message loop that receives messages from the Receivers. It manages the entire Receiver lifecycle: startup, recovery, data management during execution, and restart.

    

These messages carry the communication between a Receiver and ReceiverTracker.
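A message loop of this kind can be sketched as a dispatcher over message types. The message names `RegisterReceiver`, `AddBlock`, and `DeregisterReceiver` follow the names used in Spark's ReceiverTracker, but their fields here are simplified assumptions, and `ToyTrackerEndpoint` is a hypothetical Python stand-in rather than the real RPC endpoint.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class RegisterReceiver:
    stream_id: int
    host: str


@dataclass
class AddBlock:
    stream_id: int
    block_id: str


@dataclass
class DeregisterReceiver:
    stream_id: int
    reason: str


class ToyTrackerEndpoint:
    """Handles one message at a time, like a single-threaded message loop."""

    def __init__(self) -> None:
        self.active: Dict[int, str] = {}   # stream id -> host running the receiver
        self.blocks: List[str] = []        # block ids reported so far

    def receive(self, msg: object) -> bool:
        if isinstance(msg, RegisterReceiver):
            self.active[msg.stream_id] = msg.host
            return True
        if isinstance(msg, AddBlock):
            self.blocks.append(msg.block_id)
            return True
        if isinstance(msg, DeregisterReceiver):
            return self.active.pop(msg.stream_id, None) is not None
        return False                       # unknown message type


endpoint = ToyTrackerEndpoint()
endpoint.receive(RegisterReceiver(0, "host-a"))
endpoint.receive(AddBlock(0, "input-0-0"))
endpoint.receive(DeregisterReceiver(0, "stopped"))
```

Routing every request through one `receive` method keeps the tracker's state changes serialized, which is the main appeal of the message-loop style.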

    

    

All input streams are already determined at this point, and every one of them needs to be started.

    

    

    

getReceivedBlockQueue: returns the queue of received blocks for a given stream. The queues are kept in a HashMap. There can be many input streams, and the different input streams are independent of one another. From the driver's point of view, they are managed together as one large HashMap collection, and the data received later is processed from there.

    

    

All received blocks are tracked, and the blocks received by the Receivers are allocated to batches as needed; the data is assigned to the job that executes for the corresponding batch time.
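The per-stream queues and the allocation of blocks to batches can be sketched together. This is a toy Python model under stated assumptions: `ToyBlockTracker` and its method names are illustrative, batch times are plain integers, and nothing is persisted to a write-ahead log.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, List


class ToyBlockTracker:
    def __init__(self, stream_ids: List[int]) -> None:
        self.stream_ids = stream_ids
        # One independent queue of not-yet-allocated blocks per input stream.
        self.queues: Dict[int, Deque[str]] = defaultdict(deque)
        # batch time -> {stream id -> blocks assigned to that batch}
        self.allocated: Dict[int, Dict[int, List[str]]] = {}

    def add_block(self, stream_id: int, block_id: str) -> None:
        self.queues[stream_id].append(block_id)

    def allocate_blocks_to_batch(self, batch_time: int) -> None:
        # Drain every stream's queue and pin its contents to this batch.
        assigned: Dict[int, List[str]] = {}
        for stream_id in self.stream_ids:
            queue = self.queues[stream_id]
            assigned[stream_id] = list(queue)
            queue.clear()
        self.allocated[batch_time] = assigned

    def get_blocks_of_batch(self, batch_time: int) -> Dict[int, List[str]]:
        return self.allocated.get(batch_time, {})


block_tracker = ToyBlockTracker(stream_ids=[0, 1])
block_tracker.add_block(0, "a")
block_tracker.add_block(0, "b")
block_tracker.add_block(1, "c")
block_tracker.allocate_blocks_to_batch(1000)
block_tracker.add_block(0, "d")
block_tracker.allocate_blocks_to_batch(2000)
print(block_tracker.get_blocks_of_batch(1000))
print(block_tracker.get_blocks_of_batch(2000))
```

Blocks that arrive after a batch has been allocated wait in the queue for the next batch, so each block belongs to exactly one batch.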

    

 

Third, the message communication endpoint

StartAllReceivers: starts all Receivers.

    

    

    

UpdateReceiverRateLimit: lets ReceiverTracker dynamically adjust the rate limit at which a Receiver receives data.
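A dynamic rate-limit update might look like the sketch below. This is a hedged Python illustration, not Spark's code: `ToyRateLimiter` and `handle_update_receiver_rate_limit` are hypothetical names, and the policy shown (ignore non-positive rates, never exceed a configured maximum) is an assumption chosen for the example.

```python
from typing import Dict


class ToyRateLimiter:
    def __init__(self, max_rate: float) -> None:
        # Assumption: a max_rate <= 0 means "no configured upper bound".
        self.max_rate = max_rate
        self.current_rate = max_rate if max_rate > 0 else float("inf")

    def update_rate(self, new_rate: float) -> None:
        if new_rate <= 0:
            return                                  # ignore invalid updates
        if self.max_rate > 0:
            self.current_rate = min(new_rate, self.max_rate)
        else:
            self.current_rate = new_rate


def handle_update_receiver_rate_limit(
    limiters: Dict[int, ToyRateLimiter], stream_id: int, new_rate: float
) -> None:
    """Forward the new rate to the limiter of one stream, if it is known."""
    limiter = limiters.get(stream_id)
    if limiter is not None:
        limiter.update_rate(new_rate)


limiters = {0: ToyRateLimiter(max_rate=1000)}
handle_update_receiver_rate_limit(limiters, 0, 500)
print(limiters[0].current_rate)
```

The tracker only decides the new rate and sends it; the receiving side applies it, so the limit can change while the Receiver keeps running.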

    

    

    

    

    

    

  Summary:

1, After a Receiver receives data, the data is merged and stored, and ReceiverSupervisorImpl reports the metadata of the stored data to ReceiverTracker.

2, ReceiverTracker receives the metadata reports through its internal RPC message communication endpoint; internally, a ReceivedBlockTracker handles the allocation of the received data.

3, JobGenerator treats each batch as a time window; when it runs, it uses the batch time to obtain the corresponding metadata from ReceiverTracker and generates the RDD.

4, ReceivedBlockTracker manages the metadata of all blocks, but only as an internal management object.

In design-pattern terms, ReceiverTracker and ReceivedBlockTracker (that is, the RPC communication object and ReceivedBlockTracker) follow the Facade design pattern:

ReceivedBlockTracker: does the work internally.

ReceiverTracker: the external communication endpoint, or representative.
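The Facade relationship can be sketched in a few lines. This is an illustrative Python model only; `ToyReceiverTracker` and `ToyReceivedBlockTracker` are hypothetical stand-ins, and their methods are assumptions rather than the real classes' interfaces.

```python
from typing import Dict, List


class ToyReceivedBlockTracker:
    """Internal object: does the real bookkeeping."""

    def __init__(self) -> None:
        self.blocks: Dict[int, List[str]] = {}

    def add_block(self, stream_id: int, block_id: str) -> bool:
        self.blocks.setdefault(stream_id, []).append(block_id)
        return True


class ToyReceiverTracker:
    """Facade: the only object that outside callers talk to."""

    def __init__(self) -> None:
        self._block_tracker = ToyReceivedBlockTracker()  # hidden from callers

    def add_block(self, stream_id: int, block_id: str) -> bool:
        # Delegate to the internal tracker instead of doing the work here.
        return self._block_tracker.add_block(stream_id, block_id)

    def block_count(self, stream_id: int) -> int:
        return len(self._block_tracker.blocks.get(stream_id, []))


facade = ToyReceiverTracker()
facade.add_block(0, "input-0-0")
facade.add_block(0, "input-0-1")
print(facade.block_count(0))
```

Callers never hold a reference to the internal tracker, so its bookkeeping can change without affecting anything outside the facade.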

   Note:

      • Data from: Liaoliang (Spark release version customization)
      • Sina Weibo: http://www.weibo.com/ilovepains
