Lesson 84: StreamingContext, DStream, Receiver In-Depth Analysis


This lesson is divided into four parts: the first covers StreamingContext functionality and source code analysis; the second covers DStream functionality and source code analysis; the third covers Receiver functionality and source code analysis; the last combines StreamingContext, DStream, and Receiver to analyze the overall flow.

First, StreamingContext functionality and source code analysis:

1. Create the application's main entry point on the driver via the Spark Streaming object (jssc), receiving source data from the data service on port 9999:
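The code screenshot from the original post does not survive in this reprint. As a sketch using the Java API (JavaStreamingContext, the jssc object mentioned in the lesson), creating the entry point and listening on port 9999 might look like the following; the master URL, app name, and 1-second batch interval are illustrative assumptions:

```java
import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;

public class StreamingEntry {
    public static void main(String[] args) throws InterruptedException {
        // Configure the application; master URL and app name can also be
        // passed straight to the StreamingContext constructor.
        SparkConf conf = new SparkConf().setMaster("local[2]").setAppName("NetworkWordCount");

        // The batch interval (here 1 second) controls how often jobs are generated.
        JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(1));

        // Receive text lines from a TCP socket on port 9999, as in the lesson.
        JavaDStream<String> lines = jssc.socketTextStream("localhost", 9999);
        lines.print();  // output operation: triggers job generation each interval

        jssc.start();             // start the streaming computation
        jssc.awaitTermination();  // wait until stopped (e.g. via jssc.stop())
    }
}
```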

2. The main functions of Spark Streaming's StreamingContext are:

    • The entry point of the main program;
    • Methods for creating DStreams that receive from various input data sources (for example: Kafka, Flume, Twitter, ZeroMQ, and plain TCP sockets);
    • When instantiating a StreamingContext through its constructors, you can specify the master URL and app name, pass in a SparkConf configuration object, or reuse an already created SparkContext;
    • Incoming data is streamed into DStream objects;
    • Start the application's stream computation with the instance's start method, and stop it with the stop method;

Second, DStream functionality and source code analysis:

1. A DStream is a template for RDDs: just as RDD is an abstract class, DStream is abstract as well.

2. The concrete subclasses that implement DStream are as shown:

3. Take the socketTextStream method of a StreamingContext instance as an example: it returns a DStream instance. Its source-code call chain is as follows:

socket.getInputStream obtains the data, and a while loop stores the received data (to memory or disk).
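That receive path can be sketched in plain Java (without Spark) to show the shape of the call chain: get the socket's input stream, then loop, storing each record. The `received` list here is a hypothetical stand-in for Spark's `store(...)`/block-storage step, which would persist blocks to memory or disk:

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class SocketReadLoop {
    // Hypothetical stand-in for Spark's store(): Spark would buffer the data
    // into blocks and persist them (memory/disk); here we just collect lines.
    static List<String> received = new ArrayList<>();

    static void receive(String host, int port) throws Exception {
        try (Socket socket = new Socket(host, port);
             BufferedReader reader = new BufferedReader(
                     new InputStreamReader(socket.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            // while loop: keep reading until the stream ends, storing each line
            while ((line = reader.readLine()) != null) {
                received.add(line); // in Spark this would be store(line)
            }
        }
    }
}
```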

Third, Receiver functionality and source code analysis:

1. A Receiver represents the data input: it receives external input data, for example fetching data from Kafka;

2. Receivers run on worker nodes;

3. A receiver on a worker node fetches data from the Kafka distributed messaging framework; the concrete implementation class is KafkaReceiver;

4. Receiver is an abstract class; its data-fetching subclasses are as shown:

5. If the above implementation classes do not meet your requirements, you can define your own receiver class: simply extend the Receiver abstract class and implement your business requirements in the subclass.
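A minimal custom receiver along these lines, based on the Receiver API (onStart/onStop/store) from the Spark Streaming programming guide; the host/port fields and restart messages are illustrative:

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import org.apache.spark.storage.StorageLevel;
import org.apache.spark.streaming.receiver.Receiver;

// A custom receiver: extend the Receiver abstract class and implement
// onStart/onStop, handing each record to Spark via store().
public class CustomSocketReceiver extends Receiver<String> {
    private final String host;
    private final int port;

    public CustomSocketReceiver(String host, int port) {
        // The storage level decides where received blocks are kept (memory/disk).
        super(StorageLevel.MEMORY_AND_DISK_2());
        this.host = host;
        this.port = port;
    }

    @Override
    public void onStart() {
        // Start a separate thread so onStart() returns immediately,
        // as the Receiver contract requires.
        new Thread(this::receive).start();
    }

    @Override
    public void onStop() {
        // The receive thread checks isStopped(), so nothing extra is needed here.
    }

    private void receive() {
        try (Socket socket = new Socket(host, port);
             BufferedReader reader = new BufferedReader(
                     new InputStreamReader(socket.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while (!isStopped() && (line = reader.readLine()) != null) {
                store(line); // hand the record to Spark for block storage
            }
            restart("Trying to connect again");
        } catch (Throwable t) {
            restart("Error receiving data", t);
        }
    }
}
```

Such a receiver would be plugged into the stream with `jssc.receiverStream(new CustomSocketReceiver("localhost", 9999))`.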

Fourth, combined flow analysis of StreamingContext, DStream, and Receiver:

(1) InputStream represents the data input stream (for example: socket, Kafka, Flume, etc.)

(2) Transformation represents a series of operations on the data, such as flatMap, map, etc.

(3) OutputStream represents the data output, such as the print method in WordCount:

After data flows in, it eventually generates jobs, and execution ultimately rests on Spark Core's RDDs. When transformations are applied to a DStream, nothing runs yet: StreamingContext builds a chain of DStreams and a DStreamGraph from the transformations, and the DStreamGraph is a template for the DAG, managed by the framework. Once a batch interval is specified, the driver triggers a job at each interval based on the specific function given to the output DStream, such as print in WordCount. That function is passed to a ForEachDStream, which hands it to the RDD generated by the last DStream; the print operation on that RDD is the action that actually triggers execution.
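This lazy-evaluation behavior can be illustrated with a WordCount-style program. Until start() is called and a batch interval fires, the calls below only build the DStream chain recorded in the DStreamGraph; the master URL and interval are illustrative:

```java
import java.util.Arrays;
import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaDStream;
import org.apache.spark.streaming.api.java.JavaPairDStream;
import org.apache.spark.streaming.api.java.JavaReceiverInputDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
import scala.Tuple2;

public class LazyDStreamChain {
    public static void main(String[] args) throws InterruptedException {
        SparkConf conf = new SparkConf().setMaster("local[2]").setAppName("WordCount");
        JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(1));

        // Nothing executes yet: these calls only build the DStream chain that
        // the DStreamGraph records as a DAG template.
        JavaReceiverInputDStream<String> lines = jssc.socketTextStream("localhost", 9999);
        JavaDStream<String> words =
                lines.flatMap(line -> Arrays.asList(line.split(" ")).iterator());
        JavaPairDStream<String, Integer> counts =
                words.mapToPair(w -> new Tuple2<>(w, 1)).reduceByKey(Integer::sum);

        // print() registers the output operation (a ForEachDStream). Only when
        // start() runs and each batch interval fires does the driver generate a
        // job whose final step is an RDD action (here, collecting for print).
        counts.print();

        jssc.start();
        jssc.awaitTermination();
    }
}
```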

Summary:

With Spark Streaming you can handle many types of data sources, such as databases, HDFS, server logs, and network streams. It is more powerful than you might imagine, yet it is often underused, and the real reason is that people do not understand Spark and Spark Streaming itself well enough.

Written by: the IMF Spark Streaming enterprise-level development practice team

Main editor: Liaoliang

Note:

Data from: DT Big Data DreamWorks

For more exclusive content, please follow the public account: Dt_spark

If you are interested in big data and Spark, you can listen free of charge to teacher Liaoliang's permanently free public Spark class, held every night at 20:00 in YY room number 68917580.
