Spark Streaming source code interpretation: a thorough study of the architecture design and concrete implementation of ReceiverTracker on the driver

Last Update:2016-05-24 Source: Internet

Author: User


Contents of this issue:

Architecture design of ReceiverTracker
Concrete implementation of ReceiverTracker
The ReceiverTracker message loop

First, the architecture design of ReceiverTracker

1. On the driver, ReceiverTracker uses its own scheduling algorithm to decide on which executors the receivers run, and it starts each receiver by wrapping it in a task. That task is the only task of its job: in essence, ReceiverTracker starts a receiver by packaging it as a job. There are as many jobs as there are receivers, each job contains exactly one task, and the single piece of data inside that task is the receiver itself.

2. When ReceiverTracker starts a receiver, the receiver runs under a ReceiverSupervisor, whose concrete implementation is ReceiverSupervisorImpl. When the ReceiverSupervisor itself starts, it in turn starts the receiver. The receiver keeps receiving data and hands it to BlockGenerator, which, driven by a timer, continuously cuts the incoming data into blocks that are then stored. There are two ways to store the data: in the BlockManager alone, or in the BlockManager plus a write-ahead log (WAL). After a block is stored, ReceiverSupervisorImpl reports the metadata of the stored data to ReceiverTracker. In essence, the report is sent to ReceiverTracker's RPC message communication entity (ReceiverTrackerEndpoint); once ReceiverTracker has received the metadata over RPC, it prepares for the next stage of data management.
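The launch strategy in point 1 can be modelled in a few lines. This is a toy Python sketch, not Spark's API: `Receiver`, `Task`, `Job` and `ToyReceiverTracker` are hypothetical names. In the Spark 1.5/1.6-era Scala source, the corresponding step wraps each receiver in a one-element RDD and submits it with `SparkContext.submitJob`; the sketch only models the shape of that idea: one job per receiver, each job with exactly one task whose single data element is the receiver, and a resubmission (restart) when a receiver job ends while the tracker is still active.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Receiver:
    """Hypothetical stand-in for a Spark Streaming receiver."""
    stream_id: int


@dataclass
class Task:
    # The task's only piece of data is the receiver it must run.
    data: Receiver


@dataclass
class Job:
    tasks: List[Task]


@dataclass
class ToyReceiverTracker:
    """Toy model: every receiver is launched as its own single-task job."""
    active: bool = True
    submitted: List[Job] = field(default_factory=list)

    def start_receiver(self, receiver: Receiver) -> Job:
        job = Job(tasks=[Task(data=receiver)])  # one job, one task, one receiver
        self.submitted.append(job)
        return job

    def launch_receivers(self, receivers: List[Receiver]) -> List[Job]:
        # As many jobs as there are receivers.
        return [self.start_receiver(r) for r in receivers]

    def on_job_finished(self, job: Job) -> None:
        # A receiver job that ends while the tracker is still active is
        # resubmitted, which restarts the receiver.
        if self.active:
            self.start_receiver(job.tasks[0].data)
```

Launching three receivers yields three jobs of one task each; calling `on_job_finished` on one of them submits a fourth job for the same receiver, unless `active` has been set to `False`.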

Second, the concrete implementation of ReceiverTracker

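How ReceiverTracker handles the data after a receiver has received it:

The receiver side stores the data and reports it to the driver.

Each stored block is reported as a ReceivedBlockInfo, which describes the block (its stream ID, number of records, optional metadata, and the result of storing it) rather than carrying the records themselves.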
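The store-and-report path can be sketched as a toy Python model. The names `BlockGenerator` and `ReceivedBlockInfo` echo the text, but the classes here are simplified and hypothetical: the real `ReceivedBlockInfo` case class carries the stream ID, record count, optional metadata and a block-store result, while this one keeps only a block ID and a WAL flag. Spark's `BlockGenerator` cuts blocks from a recurring timer firing every `spark.streaming.blockInterval` (200 ms by default); here the caller "ticks" the timer by hand so the behavior is deterministic, and the `input-<stream>-<time>` block naming is only illustrative.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional, Tuple


@dataclass
class ReceivedBlockInfo:
    # Metadata only: the records themselves stay where they were stored.
    stream_id: int
    num_records: Optional[int]
    block_id: str
    stored_in_wal: bool


class BlockGenerator:
    """Buffers records; each timer tick cuts the non-empty buffer into one block."""

    def __init__(self, stream_id: int) -> None:
        self.stream_id = stream_id
        self.buffer: List[Any] = []
        self.ready_blocks: List[Tuple[str, List[Any]]] = []

    def add_data(self, record: Any) -> None:
        self.buffer.append(record)

    def on_timer_tick(self, time_ms: int) -> None:
        # Spark drives this from a recurring timer; here the caller ticks.
        if self.buffer:
            block_id = f"input-{self.stream_id}-{time_ms}"
            self.ready_blocks.append((block_id, self.buffer))
            self.buffer = []


class ToySupervisor:
    """Stores each block, then reports its metadata to the (toy) tracker."""

    def __init__(self, use_wal: bool, reports: List[ReceivedBlockInfo]) -> None:
        self.use_wal = use_wal
        self.block_manager: Dict[str, List[Any]] = {}
        self.wal: List[Tuple[str, List[Any]]] = []
        self.reports = reports  # stands in for the RPC channel to the driver

    def push_and_report(self, stream_id: int, block_id: str, records: List[Any]) -> None:
        self.block_manager[block_id] = records  # always stored in the block manager
        if self.use_wal:
            self.wal.append((block_id, records))  # and additionally logged
        self.reports.append(
            ReceivedBlockInfo(stream_id, len(records), block_id, self.use_wal))
```

Only the small `ReceivedBlockInfo` travels to the driver; the records stay in the executor-side stores, which is why the driver side can track data without holding it.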

ReceiverTracker acts as the RPC message loop that receives messages from the receivers. It manages the whole execution of the receivers: starting them, recovering and restarting them after a failure, and managing the data they receive while they run.

ReceiverTracker's messages (the ReceiverTrackerMessage types) are what carry the communication between the receivers and ReceiverTracker.

All receiver input streams are known when ReceiverTracker is created (they come from the DStream graph), and a receiver must be started for every one of them.
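The receiver-to-tracker messages can be pictured as a small set of message types dispatched by one handler. In Spark's Scala source they are case classes of the sealed trait `ReceiverTrackerMessage` (`RegisterReceiver`, `AddBlock`, `ReportError`, `DeregisterReceiver`); the Python classes below only borrow those names and carry simplified, hypothetical fields, and `ToyTrackerEndpoint.handle` is an illustrative stand-in for the endpoint's message handling. The set of valid stream IDs is fixed at construction, echoing the fixed set of input streams; the toy rejects an unknown stream ID by returning `False`.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class RegisterReceiver:
    stream_id: int
    host: str


@dataclass
class AddBlock:
    stream_id: int
    num_records: int


@dataclass
class ReportError:
    stream_id: int
    message: str


@dataclass
class DeregisterReceiver:
    stream_id: int
    message: str


class ToyTrackerEndpoint:
    """Toy message loop: dispatches receiver-to-tracker messages by type."""

    def __init__(self, stream_ids: List[int]) -> None:
        self.known_streams = set(stream_ids)   # fixed when the tracker is created
        self.active: Dict[int, str] = {}       # stream id -> host of its receiver
        self.blocks: Dict[int, List[int]] = {s: [] for s in stream_ids}
        self.errors: List[str] = []

    def handle(self, msg: object) -> bool:
        if isinstance(msg, RegisterReceiver):
            if msg.stream_id not in self.known_streams:
                return False  # not one of the known input streams
            self.active[msg.stream_id] = msg.host
            return True
        if isinstance(msg, AddBlock):
            self.blocks.setdefault(msg.stream_id, []).append(msg.num_records)
            return True
        if isinstance(msg, ReportError):
            self.errors.append(f"stream {msg.stream_id}: {msg.message}")
            return True
        if isinstance(msg, DeregisterReceiver):
            self.active.pop(msg.stream_id, None)
            return True
        raise TypeError(f"unexpected message: {msg!r}")
```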

getReceivedBlockQueue: returns the queue of received blocks for a given input stream. The queues are kept in a HashMap keyed by stream ID, so there can be many input streams, and each one is handled independently of the others.

From the driver's point of view, this is one larger HashMap that collects all the received blocks, which are then processed later.

ReceivedBlockTracker keeps track of all received blocks and, when required, allocates the blocks received from the receivers to batches: according to the batch time, each block is assigned to the job of the batch being executed.
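Per-stream queues and batch allocation can be sketched as follows. This is a toy Python model: `get_received_block_queue`, `allocate_blocks_to_batch` and `get_blocks_of_batch` loosely mirror the Scala methods `getReceivedBlockQueue`, `allocateBlocksToBatch` and `getBlocksOfBatchAndStream`, but the class and its fields are hypothetical, and the real tracker can also write allocations to a write-ahead log. Allocation drains every stream's queue into the batch for the given time, and a batch time that is not later than the last allocated one is ignored.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, Dict, List, Optional


@dataclass(frozen=True)
class ReceivedBlockInfo:
    stream_id: int
    block_id: str


class ToyReceivedBlockTracker:
    """Per-stream queues of unallocated blocks, drained into batches on demand."""

    def __init__(self, stream_ids: List[int]) -> None:
        self.stream_ids = list(stream_ids)
        # stream id -> blocks not yet assigned to any batch
        self.unallocated: Dict[int, Deque[ReceivedBlockInfo]] = {}
        # batch time -> (stream id -> blocks of that batch)
        self.allocated: Dict[int, Dict[int, List[ReceivedBlockInfo]]] = {}
        self.last_allocated_time: Optional[int] = None

    def get_received_block_queue(self, stream_id: int) -> Deque[ReceivedBlockInfo]:
        # Like getOrElseUpdate on a HashMap: each stream gets its own queue.
        return self.unallocated.setdefault(stream_id, deque())

    def add_block(self, info: ReceivedBlockInfo) -> None:
        self.get_received_block_queue(info.stream_id).append(info)

    def allocate_blocks_to_batch(self, batch_time: int) -> bool:
        # Batches are allocated in increasing time order; a repeated or stale
        # batch time is ignored instead of taking blocks meant for a later batch.
        if self.last_allocated_time is not None and batch_time <= self.last_allocated_time:
            return False
        batch: Dict[int, List[ReceivedBlockInfo]] = {}
        for sid in self.stream_ids:
            queue = self.get_received_block_queue(sid)
            batch[sid] = list(queue)  # everything received so far for this stream
            queue.clear()
        self.allocated[batch_time] = batch
        self.last_allocated_time = batch_time
        return True

    def get_blocks_of_batch(self, batch_time: int, stream_id: int) -> List[ReceivedBlockInfo]:
        return self.allocated.get(batch_time, {}).get(stream_id, [])
```

A JobGenerator-style loop would call `allocate_blocks_to_batch(t)` at each batch time `t` and then build one RDD per input stream from `get_blocks_of_batch(t, stream_id)`.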

Third, the message loop body (ReceiverTrackerEndpoint)

StartAllReceivers: starts all the receivers.

UpdateReceiverRateLimit: lets ReceiverTracker dynamically adjust the rate limit of a receiver, that is, how fast the receiver may ingest data.
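The two messages above can be sketched as a toy Python model. The message classes reuse the names from the text, but everything else is hypothetical: the round-robin placement stands in for Spark's receiver scheduling policy, which is more elaborate, and `update_rate_limit` is an illustrative method. The rate rule assumes the documented meaning of `spark.streaming.receiver.maxRate`, where 0 or a negative value means no limit: a new rate is applied only if it is positive, and it is capped by the maximum when one is set.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ToyReceiver:
    stream_id: int
    max_rate: int = 0  # <= 0 means "no upper bound", as for spark.streaming.receiver.maxRate
    rate: int = 0

    def update_rate_limit(self, new_rate: int) -> None:
        # Ignore non-positive rates; never exceed a configured maximum.
        if new_rate > 0:
            self.rate = min(new_rate, self.max_rate) if self.max_rate > 0 else new_rate


@dataclass
class StartAllReceivers:
    receivers: List[ToyReceiver]


@dataclass
class UpdateReceiverRateLimit:
    stream_id: int
    new_rate: int


@dataclass
class ToyTrackerLocalEndpoint:
    executors: List[str]
    placement: Dict[int, str] = field(default_factory=dict)
    running: Dict[int, ToyReceiver] = field(default_factory=dict)

    def send(self, msg: object) -> None:
        if isinstance(msg, StartAllReceivers):
            # Round-robin placement: a simplification of Spark's scheduling policy.
            for i, r in enumerate(msg.receivers):
                self.placement[r.stream_id] = self.executors[i % len(self.executors)]
                self.running[r.stream_id] = r
        elif isinstance(msg, UpdateReceiverRateLimit):
            receiver = self.running.get(msg.stream_id)
            if receiver is not None:  # unknown stream: nothing to update
                receiver.update_rate_limit(msg.new_rate)
        else:
            raise TypeError(f"unexpected message: {msg!r}")
```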

Summary:

1. The data received by a receiver is merged into blocks and stored; ReceiverSupervisorImpl then reports the metadata of the stored data to ReceiverTracker.

2. These metadata reports are actually received by ReceiverTracker's internal RPC message communication body (ReceiverTrackerEndpoint); inside ReceiverTracker, a ReceivedBlockTracker is what records the received blocks and allocates them.

3. JobGenerator treats each batch as a time window: at each batch time, it obtains the corresponding block metadata from ReceiverTracker and generates RDDs from it.

4. ReceivedBlockTracker manages the metadata of all the blocks, but only as an internal management object.

From a design-pattern point of view, the relationship between ReceiverTracker and ReceivedBlockTracker (or between our RPC communication object and ReceivedBlockTracker) follows the façade design pattern:

ReceivedBlockTracker: does the actual work internally.

ReceiverTracker: the external communication body, i.e. the representative that faces the outside world.
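The façade relationship can be illustrated with a minimal Python sketch. Both classes are hypothetical: `_BlockBookkeeper` plays the ReceivedBlockTracker role and `TrackerFacade` plays the ReceiverTracker role. The point is only the shape of the pattern: callers (receivers reporting blocks, the JobGenerator asking for a batch's blocks) talk to the façade alone, and the façade forwards each request to the hidden internal object that does the bookkeeping.

```python
from typing import Dict, List


class _BlockBookkeeper:
    """Internal object (ReceivedBlockTracker role): does the real work."""

    def __init__(self) -> None:
        self.pending: Dict[int, List[str]] = {}
        self.batches: Dict[int, Dict[int, List[str]]] = {}

    def add(self, stream_id: int, block_id: str) -> None:
        self.pending.setdefault(stream_id, []).append(block_id)

    def allocate(self, batch_time: int) -> None:
        # Hand all pending blocks to this batch and start a fresh pending map.
        self.batches[batch_time] = self.pending
        self.pending = {}

    def blocks_of(self, batch_time: int) -> Dict[int, List[str]]:
        return self.batches.get(batch_time, {})


class TrackerFacade:
    """External face (ReceiverTracker role): the only object callers see."""

    def __init__(self) -> None:
        self._bookkeeper = _BlockBookkeeper()  # hidden from callers

    def on_add_block(self, stream_id: int, block_id: str) -> bool:
        # Entry point for receiver reports (in Spark these arrive over RPC).
        self._bookkeeper.add(stream_id, block_id)
        return True

    def allocate_blocks_to_batch(self, batch_time: int) -> None:
        # Entry point for the batch scheduler (the JobGenerator role).
        self._bookkeeper.allocate(batch_time)

    def get_blocks_of_batch(self, batch_time: int) -> Dict[int, List[str]]:
        return self._bookkeeper.blocks_of(batch_time)
```

Because only the façade is exposed, the internal bookkeeping can change (for example, adding a write-ahead log) without touching the receivers or the scheduler.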

Note:

- Data from: Liaoliang (Spark release version customization)
- Sina Weibo: http://www.weibo.com/ilovepains


This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
