Lesson 3: Interpreting the Spark Streaming Operating Mechanism


Thanks to DT Big Data DreamWorks for supporting the following content. DT Big Data DreamWorks specializes in Spark release customization. For more information, contact:
Email: [email protected]
Tel: 18610086859
QQ: 1740415547

Customized course, Lesson 3: Interpreting the Spark Streaming operating mechanism through hands-on practice

First, we run the following program; then, by tracing its execution, we deepen our understanding of how Spark Streaming executes a stream-processing job:

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object OnlineForeachRDD {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf() // Create the SparkConf object
    conf.setAppName("OnlineForeachRDD") // Set the application name, visible in the monitoring UI while the program runs
    // conf.setMaster("spark://Master:7077") // Use this to run the program on a Spark cluster
    conf.setMaster("local[6]")
    // Set the batch duration to control the frequency of job generation,
    // and create the Spark Streaming entry point
    val ssc = new StreamingContext(conf, Seconds(5))

    val lines = ssc.socketTextStream("Master", 9999)
    val words = lines.flatMap(_.split(" "))
    val wordCounts = words.map(x => (x, 1)).reduceByKey(_ + _)

    wordCounts.foreachRDD { rdd =>
      rdd.foreachPartition { partitionOfRecords =>
        // ConnectionPool is a static, lazily initialized pool of connections
        val connection = ConnectionPool.getConnection()
        partitionOfRecords.foreach { record =>
          val sql = "INSERT INTO streaming_itemcount (item, count) VALUES ('" + record._1 + "'," + record._2 + ")"
          val stmt = connection.createStatement()
          stmt.executeUpdate(sql)
        }
        ConnectionPool.returnConnection(connection) // Return to the pool for future reuse
      }
    }

    ssc.start()
    ssc.awaitTermination()
  }
}
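The ConnectionPool used above is a user-supplied helper, not part of Spark. Below is a minimal sketch of such a pool, assuming a JDBC data source; the URL, credentials, and pooling strategy are placeholder assumptions, and a production job would use a proper connection pool library instead.

import java.sql.{Connection, DriverManager}
import java.util.concurrent.ConcurrentLinkedQueue

object ConnectionPool {
  // Thread-safe queue holding idle connections; connections are created lazily on first use
  private val pool = new ConcurrentLinkedQueue[Connection]()
  private val url = "jdbc:mysql://localhost:3306/streaming" // placeholder URL, adapt to your environment
  private val user = "root"                                 // placeholder credentials
  private val password = ""                                 // placeholder credentials

  def getConnection(): Connection = {
    val conn = pool.poll()
    if (conn != null) conn else DriverManager.getConnection(url, user, password)
  }

  def returnConnection(conn: Connection): Unit = {
    pool.offer(conn) // keep the connection for future reuse
  }
}

To feed the example, start a socket source on the host named Master before submitting the job, for example with nc -lk 9999, and then type words into that terminal.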
Part Two: Revealing the Inside of the Operating Mechanism

1. Calling StreamingContext.start() actually starts the start method of JobScheduler and its message loop. Inside JobScheduler.start(), a JobGenerator and a ReceiverTracker are constructed, and the start methods of both are called (see the first sketch after this list):
(1) Once started, JobGenerator continuously generates jobs based on batchDuration.
(2) ReceiverTracker first starts the receivers in the Spark cluster (in fact, it starts the ReceiverSupervisor on each executor). After a receiver receives data, it stores the data on the executor via the ReceiverSupervisor and sends the metadata of that data to the ReceiverTracker on the driver; internally, ReceiverTracker manages the received metadata via ReceivedBlockTracker.
2. Each batchInterval produces a specific job. The job here is not the job referred to in Spark Core; it is merely the DAG of RDDs generated from the DStreamGraph and, from a Java perspective, is equivalent to an instance of the Runnable interface. To run the job, it must be submitted to JobScheduler, which uses a thread pool to find a separate thread that submits the job to the cluster (in fact, the RDD-based action inside that thread triggers the real Spark job). Why use a thread pool (see the second sketch after this list)?
(1) Jobs are continuously generated, so a thread pool is needed for efficiency; this is similar to how an executor runs its tasks through a thread pool.
(2) Jobs may be configured to use the FAIR scheduling mode, which also requires multi-thread support.
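To make the startup chain in point 1 concrete, here is a self-contained sketch of the control flow. The class names mirror the real ones in org.apache.spark.streaming.scheduler, but the bodies are stubs written for this lesson, not the actual Spark source.

object StartupSketch {
  // Stubs standing in for the real org.apache.spark.streaming.scheduler classes
  class ReceiverTracker {
    def start(): Unit = println("ReceiverTracker: launching ReceiverSupervisors on the executors")
  }
  class JobGenerator {
    def start(): Unit = println("JobGenerator: generating a job every batchDuration")
  }
  class JobScheduler {
    def start(): Unit = {
      // JobScheduler.start constructs both components, then starts each of them
      val receiverTracker = new ReceiverTracker
      val jobGenerator = new JobGenerator
      receiverTracker.start()
      jobGenerator.start()
    }
  }

  def main(args: Array[String]): Unit = {
    // StreamingContext.start() ultimately delegates here
    new JobScheduler().start()
  }
}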
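And a sketch of point 2: each generated job is, in effect, a Runnable whose body triggers an RDD action, and JobScheduler hands it to a thread pool so that submitting one batch's job never blocks generating the next. The Job class and pool size below are illustrative stand-ins, not Spark's actual code; in real Spark the pool size is governed by the spark.streaming.concurrentJobs setting (default 1).

import java.util.concurrent.Executors

object JobSubmissionSketch {
  // Stand-in for the per-batch job built from the DStreamGraph's RDD DAG
  class Job(body: () => Unit) extends Runnable {
    override def run(): Unit = body() // the RDD action fires here, triggering the real Spark job
  }

  // One worker thread mirrors the default of spark.streaming.concurrentJobs = 1
  private val jobExecutor = Executors.newFixedThreadPool(1)

  def submitJob(job: Job): Unit = {
    jobExecutor.submit(job) // submission returns immediately; the pool thread runs the job
  }

  def main(args: Array[String]): Unit = {
    submitJob(new Job(() => println("RDD action runs; a real Spark job is triggered")))
    jobExecutor.shutdown()
  }
}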

