Spark Release Notes 11


Overview of this issue:

ReceiverTracker architecture design

The message circulation system

ReceiverTracker implementation details

Spark Streaming is an application built on top of the Spark Core infrastructure. After the ReceiverTracker receives the data, how does data processing take place?

To understand this question, let's first open the source code and find the ReceiverSupervisorImpl class.

From the source code we can see that data is written through the ReceivedBlockHandler object. There are two ways to write: one is fault-tolerant writing based on the WAL (write-ahead log); the other is direct writing (relatively unsafe), as shown in the sketch below.
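A minimal sketch of how ReceiverSupervisorImpl chooses between the two handlers, condensed from the Spark Streaming source (constructor arguments vary across Spark versions):

private val receivedBlockHandler: ReceivedBlockHandler = {
  if (WriteAheadLogUtils.enableReceiverLog(env.conf)) {
    // Fault-tolerant path: each block is also written to a write-ahead log
    // under the checkpoint directory before it is reported to the driver.
    new WriteAheadLogBasedBlockHandler(env.blockManager, receiver.streamId,
      receiver.storageLevel, env.conf, hadoopConf, checkpointDirOption.get)
  } else {
    // Direct path: blocks are stored only through the BlockManager,
    // which is faster but not safe against failures.
    new BlockManagerBasedBlockHandler(env.blockManager, receiver.storageLevel)
  }
}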

The data is then stored, and the result is reported to the driver so that the driver can store the metadata, as follows.

The message class used to report to the driver is AddBlock, as shown in the sketch below.
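A simplified sketch of ReceiverSupervisorImpl.pushAndReportBlock, which stores the block through the handler and then reports its metadata to the driver with the AddBlock message (condensed from the source; in older Spark versions the ask call is askWithRetry rather than askSync):

def pushAndReportBlock(
    receivedBlock: ReceivedBlock,
    metadataOption: Option[Any],
    blockIdOption: Option[StreamBlockId]): Unit = {
  val blockId = blockIdOption.getOrElse(nextBlockId)
  // Store the block through the chosen ReceivedBlockHandler (WAL-based or direct).
  val blockStoreResult = receivedBlockHandler.storeBlock(blockId, receivedBlock)
  // Package the metadata and report it to the driver's ReceiverTracker.
  val blockInfo = ReceivedBlockInfo(
    streamId, blockStoreResult.numRecords, metadataOption, blockStoreResult)
  trackerEndpoint.askSync[Boolean](AddBlock(blockInfo))
}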

A note on records: when describing the scale of data processing, it is more scientific to talk about how many records are processed rather than how much raw capacity the data reaches. Saying that a data set reaches petabytes is not very meaningful on its own, because each record may have many fields: 1 PB of data with 5 fields per record is comparable to 5 PB of data with 1 field per record, so a 1 PB data set does not necessarily say less about a big data engine's processing power than a 5 PB one. Likewise, some data is video or audio, where describing the scale in PB says little about the actual processing work.

This indicates that ReceiverSupervisorImpl holds a reference to a ReceiverTracker endpoint, through which it can communicate with the ReceiverTracker,

and ReceiverSupervisorImpl sends the block metadata information to the ReceiverTracker.

So let's enter the ReceiverTracker class, which is the center of data management for the entire stream processing job.

Inside the ReceiverTracker there is an endpoint (ReceiverTrackerEndpoint) that receives the metadata reported by ReceiverSupervisorImpl.

Next, let's step back into the ReceiverTracker itself to get an overall understanding of it.

ReceiverState records a receiver's three states: INACTIVE (not running), SCHEDULED (a start task has been scheduled), and ACTIVE (receiving data), as in the sketch below.
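In the source this is an Enumeration along roughly these lines:

private[streaming] object ReceiverState extends Enumeration {
  type ReceiverState = Value
  // INACTIVE: not running; SCHEDULED: a start job has been scheduled; ACTIVE: receiving data.
  val INACTIVE, SCHEDULED, ACTIVE = Value
}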

The sealed keyword means that all subclasses must be declared in this same source file, which keeps the message hierarchy easy to manage and to match on exhaustively.

/**
 * This message will trigger ReceiverTrackerEndpoint to restart a Spark job for the receiver.
 */

This message is used to restart the Spark job that runs the receiver, and ReceiverTracker has many of these case classes for communication.

private[streaming] case class RestartReceiver(receiver: Receiver[_])
  extends ReceiverTrackerLocalMessage

Here is another example of the same kind of message:

/**
 * This message will trigger ReceiverTrackerEndpoint to send stop signals to all registered
 * receivers.
 */

private[streaming] case object StopAllReceivers extends ReceiverTrackerLocalMessage

Note: the constructor parameter skipReceiverLaunch means do not launch the receiver; this is useful for testing, as the signature below shows.
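The class signature in the source is roughly:

private[streaming]
class ReceiverTracker(ssc: StreamingContext, skipReceiverLaunch: Boolean = false) extends Logging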

Simply put, the ReceiverTracker covers three processes for receivers: starting data reception, managing it, and cleaning it up.

Before going further, let's filter through the whole streaming processing code path, so as to see the whole of Spark Streaming through a single drop of water.

All input streams are handed to the graph object (the DStreamGraph), because that object dispatches all the streams uniformly.

There is a member inside called receivedBlockTracker (a ReceivedBlockTracker).

The ListenerBus is very important; we will analyze its source code in depth later, since it plays an important role at the monitoring level.

Here you can see that the ReceiverTracker has the following four states, respectively:

Initialized, Started, Stopping, and Stopped, as in the sketch below.
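In the source, the tracker state is an Enumeration along these lines:

private[streaming] object TrackerState extends Enumeration {
  type TrackerState = Value
  val Initialized, Started, Stopping, Stopped = Value
}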

This is where the tracker receives messages that ReceiverSupervisorImpl sent from a remote endpoint.

This is one of today's key points.

The log is written before proceeding to the next step; this is done for fault-tolerance reasons.

Note: this makes isWriteAheadLogEnabled true if the checkpoint directory is specified.

A ReceivedBlockTrackerLogEvent is essentially metadata information written to the log.

A HashMap is used to map each stream, one by one, to its queue of unallocated blocks (a ReceivedBlockQueue); it is a really elegant design, as the sketch below shows.
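A sketch of the relevant fields in ReceivedBlockTracker (condensed from the source):

private type ReceivedBlockQueue = mutable.Queue[ReceivedBlockInfo]

// One queue of not-yet-allocated blocks per input stream id.
private val streamIdToUnallocatedBlockQueues = new mutable.HashMap[Int, ReceivedBlockQueue]

// Blocks that have already been allocated to a batch, keyed by batch time.
private val timeToAllocatedBlocks = new mutable.HashMap[Time, AllocatedBlocks]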

Now back to our message communication level.

The endpoint replies to the sender, informing it that AddBlock succeeded, and the tracker now holds the metadata for the received data.

The main task of the ReceivedBlockTracker class is to allocate the unallocated blocks of each stream to a batch.

This is the code that actually allocates blocks to a batch; it shows that the allocation is keyed by batch time, as in the sketch below.
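A simplified sketch of ReceivedBlockTracker.allocateBlocksToBatch (condensed from the source; error handling omitted):

def allocateBlocksToBatch(batchTime: Time): Unit = synchronized {
  if (lastAllocatedBatchTime == null || batchTime > lastAllocatedBatchTime) {
    // Drain every stream's queue of unallocated blocks for this batch.
    val streamIdToBlocks = streamIds.map { streamId =>
      (streamId, getReceivedBlockQueue(streamId).dequeueAll(_ => true))
    }.toMap
    val allocatedBlocks = AllocatedBlocks(streamIdToBlocks)
    // Record the allocation in the write-ahead log first, for fault tolerance.
    if (writeToLog(BatchAllocationEvent(batchTime, allocatedBlocks))) {
      timeToAllocatedBlocks.put(batchTime, allocatedBlocks)
      lastAllocatedBatchTime = batchTime
    }
  }
}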

Let's look again at the message communication endpoint.

This is the StartAllReceivers message, which says to start all receivers.

The endpoint then starts all the receivers, as sketched below.

In this way, the entire data reception link is opened.
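A simplified sketch of how ReceiverTrackerEndpoint handles StartAllReceivers (condensed from the source; scheduling details omitted):

case StartAllReceivers(receivers) =>
  // Decide which executors each receiver should run on.
  val scheduledLocations = schedulingPolicy.scheduleReceivers(receivers, getExecutors)
  for (receiver <- receivers) {
    val executors = scheduledLocations(receiver.streamId)
    updateReceiverScheduledExecutors(receiver.streamId, executors)
    receiverPreferredLocations(receiver.streamId) = receiver.preferredLocation
    // Launch a long-running Spark job that runs this receiver on the chosen executors.
    startReceiver(receiver, executors)
  }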

Finally, a few additions:

This phase is the CleanupOldBlocks phase: a CleanupOldBlocks message is sent to ReceiverSupervisorImpl, telling it to execute its cleanupOldBlocks method, as sketched below.
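A sketch of both ends of this step (condensed from the source): the driver cleans its own tracker state and then sends CleanupOldBlocks to every registered receiver, whose supervisor handles the message.

// Driver side (ReceiverTracker): clean up tracker state, then notify every receiver.
def cleanupOldBlocksAndBatches(cleanupThreshTime: Time): Unit = {
  receivedBlockTracker.cleanupOldBatches(cleanupThreshTime, waitForCompletion = false)
  receiverTrackingInfos.values.flatMap(_.endpoint)
    .foreach(_.send(CleanupOldBlocks(cleanupThreshTime)))
}

// Executor side (ReceiverSupervisorImpl's endpoint): react to the message.
case CleanupOldBlocks(threshTime) =>
  cleanupOldBlocks(threshTime)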


/** Update a receiver's maximum ingestion rate */
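That doc comment belongs to the UpdateReceiverRateLimit message, which in the source is roughly:

private[streaming] case class UpdateReceiverRateLimit(streamUID: Int, newRate: Long)
  extends ReceiverTrackerLocalMessage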

Last comes StopAllReceivers, and with that the whole data reception lifecycle is over.
