Spark Release Notes 11


Overview of this issue:

ReceiverTracker architecture design

The message circulation system

ReceiverTracker implementation details

Spark Streaming is an application built on top of the Spark Core infrastructure. After the ReceiverTracker receives the data, how does data processing take place?

To understand this question, let's first open the source code and find the ReceiverSupervisorImpl class.

From the source code we can see that data is written through the ReceivedBlockHandler object. There are two ways to write: one is fault-tolerant writing based on the WAL (write-ahead log); the other is direct writing (relatively unsafe), as shown in the sketch below.
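A minimal sketch of how ReceiverSupervisorImpl chooses between the two handlers, condensed from the Spark Streaming source (constructor arguments vary across Spark versions):

private val receivedBlockHandler: ReceivedBlockHandler = {
  if (WriteAheadLogUtils.enableReceiverLog(env.conf)) {
    // Fault-tolerant path: each block is also written to a write-ahead log
    // under the checkpoint directory before it is reported to the driver.
    new WriteAheadLogBasedBlockHandler(env.blockManager, receiver.streamId,
      receiver.storageLevel, env.conf, hadoopConf, checkpointDirOption.get)
  } else {
    // Direct path: blocks are stored only through the BlockManager,
    // which is faster but not safe against failures.
    new BlockManagerBasedBlockHandler(env.blockManager, receiver.storageLevel)
  }
}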

The data is then stored, and the result is reported to the driver so that the driver can store the metadata, as follows.

The message class used to report to the driver is AddBlock, as shown in the sketch below.
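A simplified sketch of ReceiverSupervisorImpl.pushAndReportBlock, which stores the block through the handler and then reports its metadata to the driver with the AddBlock message (condensed from the source; in older Spark versions the ask call is askWithRetry rather than askSync):

def pushAndReportBlock(
    receivedBlock: ReceivedBlock,
    metadataOption: Option[Any],
    blockIdOption: Option[StreamBlockId]): Unit = {
  val blockId = blockIdOption.getOrElse(nextBlockId)
  // Store the block through the chosen ReceivedBlockHandler (WAL-based or direct).
  val blockStoreResult = receivedBlockHandler.storeBlock(blockId, receivedBlock)
  // Package the metadata and report it to the driver's ReceiverTracker.
  val blockInfo = ReceivedBlockInfo(
    streamId, blockStoreResult.numRecords, metadataOption, blockStoreResult)
  trackerEndpoint.askSync[Boolean](AddBlock(blockInfo))
}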

A note on records: when describing the scale of data processing, it is more scientific to talk about how many records are processed rather than how much raw capacity the data reaches. Saying that a data set reaches petabytes is not very meaningful on its own, because each record may have many fields: 1 PB of data with 5 fields per record is comparable to 5 PB of data with 1 field per record, so a 1 PB data set does not necessarily say less about a big data engine's processing power than a 5 PB one. Likewise, some data is video or audio, where describing the scale in PB says little about the actual processing work.

This indicates that ReceiverSupervisorImpl holds a reference to a ReceiverTracker endpoint, through which it can communicate with the ReceiverTracker,

and ReceiverSupervisorImpl sends the block metadata information to the ReceiverTracker.

So let's enter the ReceiverTracker class, which is the center of data management for the entire stream processing job.

Inside the ReceiverTracker there is an endpoint (ReceiverTrackerEndpoint) that receives the metadata reported by ReceiverSupervisorImpl.

Next, let's step back into the ReceiverTracker itself to get an overall understanding of it.

ReceiverState records a receiver's three states: INACTIVE (not running), SCHEDULED (a start task has been scheduled), and ACTIVE (receiving data), as in the sketch below.
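In the source this is an Enumeration along roughly these lines:

private[streaming] object ReceiverState extends Enumeration {
  type ReceiverState = Value
  // INACTIVE: not running; SCHEDULED: a start job has been scheduled; ACTIVE: receiving data.
  val INACTIVE, SCHEDULED, ACTIVE = Value
}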

The sealed keyword means that all subclasses must be declared in this same source file, which keeps the message hierarchy easy to manage and to match on exhaustively.

/**
 * This message will trigger ReceiverTrackerEndpoint to restart a Spark job for the receiver.
 */

This message is used to restart the Spark job that runs the receiver, and ReceiverTracker has many of these case classes for communication.

private[streaming] case class RestartReceiver(receiver: Receiver[_])
  extends ReceiverTrackerLocalMessage

Here is another example of the same kind of message:

/**
 * This message will trigger ReceiverTrackerEndpoint to send stop signals to all registered
 * receivers.
 */

private[streaming] case object StopAllReceivers extends ReceiverTrackerLocalMessage

Note: the constructor parameter skipReceiverLaunch means do not launch the receiver; this is useful for testing, as the signature below shows.
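The class signature in the source is roughly:

private[streaming]
class ReceiverTracker(ssc: StreamingContext, skipReceiverLaunch: Boolean = false) extends Logging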

Simply put, the ReceiverTracker covers three processes for receivers: starting data reception, managing it, and cleaning it up.

Before going further, let's filter through the whole streaming processing code path, so as to see the whole of Spark Streaming through a single drop of water.

All input streams are handed to the graph object (the DStreamGraph), because that object dispatches all the streams uniformly.

There is a member inside called receivedBlockTracker (a ReceivedBlockTracker).

The ListenerBus is very important; we will analyze its source code in depth later, since it plays an important role at the monitoring level.

Here you can see that the ReceiverTracker has the following four states, respectively:

Initialized, Started, Stopping, and Stopped, as in the sketch below.
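In the source, the tracker state is an Enumeration along these lines:

private[streaming] object TrackerState extends Enumeration {
  type TrackerState = Value
  val Initialized, Started, Stopping, Stopped = Value
}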

This is where the tracker receives messages that ReceiverSupervisorImpl sent from a remote endpoint.

This is one of today's key points.

The log is written before proceeding to the next step; this is done for fault-tolerance reasons.

Note: this makes isWriteAheadLogEnabled true if the checkpoint directory is specified.

A ReceivedBlockTrackerLogEvent is essentially metadata information written to the log.

A HashMap is used to map each stream, one by one, to its queue of unallocated blocks (a ReceivedBlockQueue); it is a really elegant design, as the sketch below shows.
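A sketch of the relevant fields in ReceivedBlockTracker (condensed from the source):

private type ReceivedBlockQueue = mutable.Queue[ReceivedBlockInfo]

// One queue of not-yet-allocated blocks per input stream id.
private val streamIdToUnallocatedBlockQueues = new mutable.HashMap[Int, ReceivedBlockQueue]

// Blocks that have already been allocated to a batch, keyed by batch time.
private val timeToAllocatedBlocks = new mutable.HashMap[Time, AllocatedBlocks]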

Now back to our message communication level.

The endpoint replies to the sender, informing it that AddBlock succeeded, and the tracker now holds the metadata for the received data.

The main task of the ReceivedBlockTracker class is to allocate the unallocated blocks of each stream to a batch.

This is the code that actually allocates blocks to a batch; it shows that the allocation is keyed by batch time, as in the sketch below.
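A simplified sketch of ReceivedBlockTracker.allocateBlocksToBatch (condensed from the source; error handling omitted):

def allocateBlocksToBatch(batchTime: Time): Unit = synchronized {
  if (lastAllocatedBatchTime == null || batchTime > lastAllocatedBatchTime) {
    // Drain every stream's queue of unallocated blocks for this batch.
    val streamIdToBlocks = streamIds.map { streamId =>
      (streamId, getReceivedBlockQueue(streamId).dequeueAll(_ => true))
    }.toMap
    val allocatedBlocks = AllocatedBlocks(streamIdToBlocks)
    // Record the allocation in the write-ahead log first, for fault tolerance.
    if (writeToLog(BatchAllocationEvent(batchTime, allocatedBlocks))) {
      timeToAllocatedBlocks.put(batchTime, allocatedBlocks)
      lastAllocatedBatchTime = batchTime
    }
  }
}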

Let's look again at the message communication endpoint.

This is the StartAllReceivers message, which says to start all receivers.

The endpoint then starts all the receivers, as sketched below.

In this way, the entire data reception link is opened.
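A simplified sketch of how ReceiverTrackerEndpoint handles StartAllReceivers (condensed from the source; scheduling details omitted):

case StartAllReceivers(receivers) =>
  // Decide which executors each receiver should run on.
  val scheduledLocations = schedulingPolicy.scheduleReceivers(receivers, getExecutors)
  for (receiver <- receivers) {
    val executors = scheduledLocations(receiver.streamId)
    updateReceiverScheduledExecutors(receiver.streamId, executors)
    receiverPreferredLocations(receiver.streamId) = receiver.preferredLocation
    // Launch a long-running Spark job that runs this receiver on the chosen executors.
    startReceiver(receiver, executors)
  }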

Finally, a few additions:

This phase is the CleanupOldBlocks phase: a CleanupOldBlocks message is sent to ReceiverSupervisorImpl, telling it to execute its cleanupOldBlocks method, as sketched below.
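A sketch of both ends of this step (condensed from the source): the driver cleans its own tracker state and then sends CleanupOldBlocks to every registered receiver, whose supervisor handles the message.

// Driver side (ReceiverTracker): clean up tracker state, then notify every receiver.
def cleanupOldBlocksAndBatches(cleanupThreshTime: Time): Unit = {
  receivedBlockTracker.cleanupOldBatches(cleanupThreshTime, waitForCompletion = false)
  receiverTrackingInfos.values.flatMap(_.endpoint)
    .foreach(_.send(CleanupOldBlocks(cleanupThreshTime)))
}

// Executor side (ReceiverSupervisorImpl's endpoint): react to the message.
case CleanupOldBlocks(threshTime) =>
  cleanupOldBlocks(threshTime)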


/** Update a receiver's maximum ingestion rate */
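That doc comment belongs to the UpdateReceiverRateLimit message, which in the source is roughly:

private[streaming] case class UpdateReceiverRateLimit(streamUID: Int, newRate: Long)
  extends ReceiverTrackerLocalMessage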

Last comes StopAllReceivers, and with that the whole data reception lifecycle is over.
