Lesson 84: An In-Depth Analysis of StreamingContext, DStream, and Receiver
This lesson is divided into four parts: the first covers StreamingContext functions and source code analysis; the second covers DStream functions and source code analysis; the third covers Receiver functions and source code analysis; and the last combines StreamingContext, DStream, and Receiver to analyze the overall flow.
First, StreamingContext functions and source code analysis:
1. Create the application's main entry point via the Spark Streaming object (jssc) and, on the driver, receive source data from the data service on port 9999:
2. The main functions of Spark Streaming's StreamingContext are:
- It is the entry point of the main program;
- It provides various methods for creating DStreams to receive data from different input sources (for example: Kafka, Flume, Twitter, ZeroMQ, and plain TCP sockets);
- When instantiating a StreamingContext through its constructors, you can specify the master URL and AppName, pass in a SparkConf configuration object, or reuse a SparkContext that has already been created;
- Incoming data is turned into DStream objects;
- The start method of the StreamingContext instance launches the application's stream-computation framework, and the stop method shuts it down.
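The lifecycle above (construct the context, register input streams, then start/stop) can be sketched in plain Python. This is a conceptual stand-in, not Spark's actual API; the class and method names (`MiniStreamingContext`, `socket_text_stream`) are invented for illustration.

```python
# Conceptual sketch of the StreamingContext role described above.
# All names here are illustrative stand-ins, NOT Spark's real API.

class MiniStreamingContext:
    def __init__(self, master, app_name, batch_interval_s):
        # Mirrors passing a master URL / app name (or a SparkConf) to the constructor.
        self.master = master
        self.app_name = app_name
        self.batch_interval_s = batch_interval_s
        self.input_streams = []
        self.running = False

    def socket_text_stream(self, host, port):
        # Registers an input source; Spark would return a DStream here.
        stream = {"host": host, "port": port}
        self.input_streams.append(stream)
        return stream

    def start(self):
        # In Spark, start() launches the receivers and the job scheduler.
        if not self.input_streams:
            raise RuntimeError("no input streams registered")
        self.running = True

    def stop(self):
        self.running = False

# Usage mirroring the text: entry point on the driver, data service on port 9999.
ssc = MiniStreamingContext("local[2]", "NetworkWordCount", batch_interval_s=1)
lines = ssc.socket_text_stream("localhost", 9999)
ssc.start()
# ... batches would be processed here ...
ssc.stop()
```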
Second, DStream functions and source code analysis:
1. A DStream is a template for RDDs; DStream is an abstract class, and so is RDD.
2. The subclasses that implement DStream are as shown:
3. Take the socketTextStream method of a StreamingContext instance as an example. Executing it returns a DStream instance; its source-code call flow is as follows:
socket.getInputStream obtains the data, and a while loop stores the received data (to memory or disk).
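The receive loop just described (get the socket's input stream, then loop and store the incoming data) can be sketched with Python's standard `socket` module. The list standing in for memory/disk block storage and the name `receive_loop` are illustrative assumptions, not Spark internals.

```python
# Sketch of the receive loop described above: read from the socket's input
# stream in a loop and store each line. A plain list stands in for Spark's
# block storage (memory/disk); names are illustrative, not Spark's code.
import socket

def receive_loop(sock, store):
    """Read newline-delimited text from `sock` and store each line."""
    with sock.makefile("rb") as stream:   # analogous to socket.getInputStream
        for raw in stream:                # the while-loop over incoming data
            store.append(raw.decode("utf-8").rstrip("\n"))

# Usage with a local socket pair standing in for a server on port 9999:
a, b = socket.socketpair()
a.sendall(b"hello spark\nhello streaming\n")
a.close()                                 # end of stream ends the loop
store = []
receive_loop(b, store)
b.close()
```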
Third, Receiver functions and source code analysis:
1. A Receiver represents the data input: it receives external input data, for example fetching data from Kafka;
2. Receivers run on worker nodes;
3. A receiver on a worker node fetches data from the Kafka distributed messaging framework; the concrete implementation class is KafkaReceiver;
4. Receiver is an abstract class; the subclasses that implement data fetching are as shown:
5. If the implementation classes above do not meet your requirements, you can define your own receiver class: you only need to extend the Receiver abstract class and implement your subclass's business requirements.
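The extension point in item 5 can be sketched as an abstract base class plus a subclass. The `on_start`/`on_stop`/`store` shape loosely mirrors Spark's Receiver API, but this is a plain-Python illustration, not Spark code; `ListReceiver` is an invented toy subclass.

```python
# Sketch of the custom-receiver pattern: extend an abstract Receiver base
# class and implement the data-fetching logic in a subclass.
# Plain-Python illustration only; not Spark's actual Receiver class.
from abc import ABC, abstractmethod

class Receiver(ABC):
    def __init__(self):
        self._stored = []

    def store(self, item):
        # In Spark this would hand data to block storage; here we just buffer it.
        self._stored.append(item)

    @abstractmethod
    def on_start(self):
        """Begin receiving data (Spark calls this on a worker node)."""

    @abstractmethod
    def on_stop(self):
        """Clean up resources when the receiver is stopped."""

class ListReceiver(Receiver):
    """A toy receiver that 'fetches' records from an in-memory list
    (a KafkaReceiver would pull from Kafka brokers instead)."""
    def __init__(self, source):
        super().__init__()
        self.source = source

    def on_start(self):
        for record in self.source:
            self.store(record)

    def on_stop(self):
        pass

r = ListReceiver(["a", "b", "c"])
r.on_start()
```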
Fourth, combined flow analysis of StreamingContext, DStream, and Receiver:
(1) InputStream represents the data input stream (for example: socket, Kafka, Flume, etc.);
(2) Transformation represents a series of operations on the data, such as flatMap, map, etc.;
(3) OutputStream represents the output of the data, such as the print method in WordCount.
After the data flows in, it eventually generates jobs, and execution is ultimately based on Spark Core's RDDs. When a DStream applies transformations to incoming data, nothing actually runs at that point: the StreamingContext generates a "DStream chain" and a DStreamGraph from the transformations, and the DStreamGraph is a template for the DAG, managed by the framework. When we specify a time interval, the driver side triggers jobs at that interval based on the specific function given on the output DStream, such as print in WordCount. That function is passed to a ForEachDStream, which hands it to the RDD generated by the last DStream; the print operation on that RDD is the action that triggers the RDD computation.
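The lazy "DStream chain" behavior described above can be sketched in a few lines of Python: transformations only link nodes into a chain, and nothing computes until a per-interval job is generated from the final output node. All names here (`MiniDStream`, `generate_job`) are illustrative, not Spark's internals.

```python
# Sketch of the lazy DStream chain: map/flat_map only build the graph;
# generate_job (standing in for the per-interval job trigger) walks the
# chain and computes the batch. Illustrative names, not Spark's code.

class MiniDStream:
    def __init__(self, parent=None, func=None):
        self.parent = parent      # link in the "DStream chain"
        self.func = func          # transformation to apply per batch

    def map(self, f):
        # Building the chain does not run f; it only records it.
        return MiniDStream(self, lambda batch: [f(x) for x in batch])

    def flat_map(self, f):
        return MiniDStream(self, lambda batch: [y for x in batch for y in f(x)])

    def generate_job(self, batch):
        # Called once per interval: walk the chain, compute this batch's data.
        if self.parent is not None:
            batch = self.parent.generate_job(batch)
        return self.func(batch) if self.func else batch

# Only the output operation (like print / ForEachDStream) triggers execution:
source = MiniDStream()
words = source.flat_map(str.split).map(str.upper)
result = words.generate_job(["hello spark", "hello streaming"])
```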
Summary:
With Spark Streaming you can handle a wide variety of data-source types, such as databases, HDFS, server logs, and network streams. It is more powerful than you might imagine, yet it is often not used, and the real reason is that people do not understand Spark and Spark Streaming itself.
Written by: the IMF Spark Streaming Enterprise-Level Development Practice Team
Chief editor: Liaoliang
Note:
Material from: DT Big Data DreamWorks (IMF Legendary Action top-secret course)
For more exclusive content, please follow the WeChat public account: DT_Spark
If you are interested in big data and Spark, you can listen free of charge to teacher Liaoliang's permanently free public Spark class, every night at 20:00, in YY room number 68917580.