Contents of this issue:
- Architecture design of ReceiverTracker
- The message loop
- Concrete implementation of ReceiverTracker
First, the architecture design of ReceiverTracker
1. ReceiverTracker runs on the driver and uses its own scheduling algorithm to start each receiver on a specific executor. To start a receiver, it wraps the receiver in a task,
and that task is the only task in its job. In essence, ReceiverTracker starts a receiver by wrapping it in a job: there are as many jobs as there are receivers.
Put the other way, each receiver gets its own job, each job has a single task, and the single piece of data inside that task is the receiver itself.
2. When ReceiverTracker starts a receiver, the receiver comes with a ReceiverSupervisor, whose concrete implementation is ReceiverSupervisorImpl. When the ReceiverSupervisor itself starts,
it in turn starts the receiver. The receiver continuously receives data, BlockGenerator assembles that data into blocks, and a timer periodically triggers storage of the blocks.
There are two ways to store the data: through BlockManager, or by first writing to the write-ahead log (WAL). After storing a block, ReceiverSupervisorImpl reports the metadata of the stored data to ReceiverTracker. In essence,
it reports to the RPC message endpoint inside ReceiverTracker. ReceiverTracker receives the metadata over RPC and then prepares for the data management work that follows.
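The flow in point 2 (buffer records, cut a block on a timer tick, store it, report only the metadata) can be sketched as a toy model. This is plain Python, not Spark code: `ToySupervisor`, its method names, the dictionary used as a block store, and the `input-<stream>-<n>` block id format are all assumptions made for illustration.

```python
from typing import Callable, Dict, List


class ToySupervisor:
    """Hypothetical model of a supervisor: buffer, cut blocks, store, report."""

    def __init__(self, stream_id: int, store: Dict[str, List[str]],
                 report: Callable[[dict], None]) -> None:
        self.stream_id = stream_id
        self.store = store      # stands in for the block store
        self.report = report    # stands in for the RPC call to the tracker
        self.buffer: List[str] = []
        self.next_block = 0

    def push_single(self, record: str) -> None:
        # Records accumulate until the next timer tick.
        self.buffer.append(record)

    def on_timer_tick(self) -> None:
        # On each tick the buffered records become one block.
        if not self.buffer:
            return
        block_id = f"input-{self.stream_id}-{self.next_block}"
        self.next_block += 1
        self.store[block_id] = self.buffer
        self.buffer = []
        # Only metadata goes to the tracker, never the records themselves.
        self.report({"stream_id": self.stream_id,
                     "block_id": block_id,
                     "num_records": len(self.store[block_id])})


store: Dict[str, List[str]] = {}
reports: List[dict] = []
supervisor = ToySupervisor(0, store, reports.append)
for record in ["a", "b", "c"]:
    supervisor.push_single(record)
supervisor.on_timer_tick()
supervisor.on_timer_tick()  # empty buffer, so nothing is stored or reported
```

The point of the split is that the driver side only ever sees the small metadata dictionary, while the records stay in the store on the receiving side.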
Second, the concrete implementation of ReceiverTracker
How ReceiverTracker handles data after receiving it:
Storing the data and reporting it to the driver:
ReceivedBlockInfo:
ReceiverTracker acts as the RPC message loop that receives messages from receivers, and it manages the whole receiver lifecycle: startup, recovery, data management during execution, and restart.
The ReceiverTracker messages are what a receiver uses to communicate with ReceiverTracker.
All input streams are determined up front, and every input stream must be started.
getReceivedBlockQueue: returns the queue of received blocks for a given stream. The queues live in a HashMap, which can hold many input streams; different input streams are independent of each other.
From the driver's point of view, this acts as one large HashMap collection, and the data received is processed later.
All received blocks are tracked. The blocks reported by receivers are allocated to batches as needed, and the data is assigned to the currently executing job according to the batch time.
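The per-stream queues and the allocation of blocks to batches described above can be sketched roughly as follows. This is a simplified toy in Python, not the Spark implementation: `ToyBlockTracker` and its method names are hypothetical, and write-ahead logging and thread safety are left out.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, Iterable, List


class ToyBlockTracker:
    """Hypothetical tracker: one queue per stream, drained into each batch."""

    def __init__(self, stream_ids: Iterable[int]) -> None:
        self.stream_ids = list(stream_ids)
        # stream id -> blocks reported but not yet assigned to a batch
        self.unallocated: Dict[int, Deque[str]] = defaultdict(deque)
        # batch time -> (stream id -> blocks assigned to that batch)
        self.allocated: Dict[int, Dict[int, List[str]]] = {}

    def add_block(self, stream_id: int, block_id: str) -> None:
        self.unallocated[stream_id].append(block_id)

    def allocate_blocks_to_batch(self, batch_time: int) -> None:
        # Every stream's pending blocks move into the batch, stream by stream.
        allocation: Dict[int, List[str]] = {}
        for stream_id in self.stream_ids:
            queue = self.unallocated[stream_id]
            allocation[stream_id] = list(queue)
            queue.clear()
        self.allocated[batch_time] = allocation

    def get_blocks_of_batch(self, batch_time: int) -> Dict[int, List[str]]:
        return self.allocated.get(batch_time, {})


tracker = ToyBlockTracker([0, 1])
tracker.add_block(0, "b0")
tracker.add_block(1, "b1")
tracker.add_block(0, "b2")
tracker.allocate_blocks_to_batch(1000)
tracker.add_block(0, "b3")
tracker.allocate_blocks_to_batch(2000)
```

Because each stream has its own queue, a slow or idle stream simply contributes an empty list to a batch and does not hold up the others.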
Third, the message loop
StartAllReceivers: starts all receivers
UpdateReceiverRateLimit: lets ReceiverTracker dynamically adjust the rate limit at which a receiver ingests data
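A message loop of this kind boils down to matching on the message type and updating state. The sketch below is a toy in Python under stated assumptions: the two message names come from the text above, but their fields (`stream_ids`, `stream_id`, `new_rate`) and the `ToyTrackerEndpoint` class are hypothetical and do not mirror the real Spark signatures.

```python
from dataclasses import dataclass
from typing import Dict, Set, Tuple


@dataclass(frozen=True)
class StartAllReceivers:
    stream_ids: Tuple[int, ...]  # hypothetical field


@dataclass(frozen=True)
class UpdateReceiverRateLimit:
    stream_id: int  # hypothetical field
    new_rate: int   # hypothetical field, records per second


class ToyTrackerEndpoint:
    """Hypothetical endpoint that dispatches on the message type."""

    def __init__(self) -> None:
        self.running: Set[int] = set()
        self.rate_limits: Dict[int, int] = {}

    def receive(self, message: object) -> None:
        if isinstance(message, StartAllReceivers):
            self.running.update(message.stream_ids)
        elif isinstance(message, UpdateReceiverRateLimit):
            # A rate update for a receiver that is not running is ignored.
            if message.stream_id in self.running:
                self.rate_limits[message.stream_id] = message.new_rate
        else:
            raise ValueError(f"unknown message: {message!r}")


endpoint = ToyTrackerEndpoint()
endpoint.receive(StartAllReceivers((0, 1)))
endpoint.receive(UpdateReceiverRateLimit(1, 500))
endpoint.receive(UpdateReceiverRateLimit(7, 100))  # stream 7 is not running
```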
Summary:
1. After a receiver receives data, the data is merged into blocks and stored, and ReceiverSupervisorImpl reports the metadata of the stored data to ReceiverTracker.
2. ReceiverTracker receives the metadata reports through its internal RPC message endpoint. Inside, a ReceivedBlockTracker takes the reported metadata and handles its allocation.
3. JobGenerator treats each batch as a time window. When it runs, it uses the batch time to fetch the corresponding metadata from ReceiverTracker and generates the RDDs from it.
4. ReceivedBlockTracker manages the metadata of all blocks, but only as an internal management object.
In design-pattern terms, ReceiverTracker and ReceivedBlockTracker (that is, the RPC communication object and ReceivedBlockTracker) follow the Facade design pattern:
ReceivedBlockTracker: does the actual work internally.
ReceiverTracker: the external communication body, or representative.
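As a minimal illustration of that Facade split, the toy below keeps the bookkeeping in an internal object and exposes only a thin outward-facing class. The class and method names (`ToyTrackerFacade`, `_InternalBlockBook`, `handle_add_block`) are hypothetical and chosen only to mirror the two roles described above.

```python
from typing import List


class _InternalBlockBook:
    """Does the real bookkeeping; callers outside never touch it directly."""

    def __init__(self) -> None:
        self._blocks: List[str] = []

    def record(self, block_id: str) -> bool:
        # Reject duplicates so that the caller gets a meaningful answer.
        if block_id in self._blocks:
            return False
        self._blocks.append(block_id)
        return True

    def count(self) -> int:
        return len(self._blocks)


class ToyTrackerFacade:
    """The outward-facing representative; it only delegates."""

    def __init__(self) -> None:
        self._book = _InternalBlockBook()

    def handle_add_block(self, block_id: str) -> bool:
        return self._book.record(block_id)

    def block_count(self) -> int:
        return self._book.count()


facade = ToyTrackerFacade()
first = facade.handle_add_block("b0")
duplicate = facade.handle_add_block("b0")
```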
Note:
- Source: Liaoliang (Spark release version customization)
- Sina Weibo: http://www.weibo.com/ilovepains
Spark Streaming source code walkthrough: a thorough study of the architecture design and concrete implementation of the driver-side ReceiverTracker