Lesson 3: Interpreting the Spark Streaming Operating Mechanism


Thanks to DT Big Data DreamWorks for supporting the following content. DT Big Data DreamWorks specializes in Spark release customization. For more information, contact:
Email: [email protected]
Tel: 18610086859
QQ: 1740415547

Customized course, Lesson 3: Interpreting the Spark Streaming operating mechanism through hands-on practice

First, we run the following program; then, by tracing its execution, we deepen our understanding of how Spark Streaming executes a stream-processing job:

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object OnlineForeachRDD {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf() // Create the SparkConf object
    conf.setAppName("OnlineForeachRDD") // Set the application name, visible in the monitoring UI while the program runs
    // conf.setMaster("spark://Master:7077") // Use this to run the program on a Spark cluster
    conf.setMaster("local[6]")
    // Set the batch duration to control the frequency of job generation,
    // and create the Spark Streaming entry point
    val ssc = new StreamingContext(conf, Seconds(5))

    val lines = ssc.socketTextStream("Master", 9999)
    val words = lines.flatMap(_.split(" "))
    val wordCounts = words.map(x => (x, 1)).reduceByKey(_ + _)

    wordCounts.foreachRDD { rdd =>
      rdd.foreachPartition { partitionOfRecords =>
        // ConnectionPool is a static, lazily initialized pool of connections
        val connection = ConnectionPool.getConnection()
        partitionOfRecords.foreach { record =>
          val sql = "INSERT INTO streaming_itemcount (item, count) VALUES ('" + record._1 + "'," + record._2 + ")"
          val stmt = connection.createStatement()
          stmt.executeUpdate(sql)
        }
        ConnectionPool.returnConnection(connection) // Return to the pool for future reuse
      }
    }

    ssc.start()
    ssc.awaitTermination()
  }
}
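The ConnectionPool used above is a user-supplied helper, not part of Spark. Below is a minimal sketch of such a pool, assuming a JDBC data source; the URL, credentials, and pooling strategy are placeholder assumptions, and a production job would use a proper connection pool library instead.

import java.sql.{Connection, DriverManager}
import java.util.concurrent.ConcurrentLinkedQueue

object ConnectionPool {
  // Thread-safe queue holding idle connections; connections are created lazily on first use
  private val pool = new ConcurrentLinkedQueue[Connection]()
  private val url = "jdbc:mysql://localhost:3306/streaming" // placeholder URL, adapt to your environment
  private val user = "root"                                 // placeholder credentials
  private val password = ""                                 // placeholder credentials

  def getConnection(): Connection = {
    val conn = pool.poll()
    if (conn != null) conn else DriverManager.getConnection(url, user, password)
  }

  def returnConnection(conn: Connection): Unit = {
    pool.offer(conn) // keep the connection for future reuse
  }
}

To feed the example, start a socket source on the host named Master before submitting the job, for example with nc -lk 9999, and then type words into that terminal.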
Part Two: Revealing the Inside of the Operating Mechanism

1. Calling StreamingContext.start() actually starts the start method of JobScheduler and its message loop. Inside JobScheduler.start(), a JobGenerator and a ReceiverTracker are constructed, and the start methods of both are called (see the first sketch after this list):
(1) Once started, JobGenerator continuously generates jobs based on batchDuration.
(2) ReceiverTracker first starts the receivers in the Spark cluster (in fact, it starts the ReceiverSupervisor on each executor). After a receiver receives data, it stores the data on the executor via the ReceiverSupervisor and sends the metadata of that data to the ReceiverTracker on the driver; internally, ReceiverTracker manages the received metadata via ReceivedBlockTracker.
2. Each batchInterval produces a specific job. The job here is not the job referred to in Spark Core; it is merely the DAG of RDDs generated from the DStreamGraph and, from a Java perspective, is equivalent to an instance of the Runnable interface. To run the job, it must be submitted to JobScheduler, which uses a thread pool to find a separate thread that submits the job to the cluster (in fact, the RDD-based action inside that thread triggers the real Spark job). Why use a thread pool (see the second sketch after this list)?
(1) Jobs are continuously generated, so a thread pool is needed for efficiency; this is similar to how an executor runs its tasks through a thread pool.
(2) Jobs may be configured to use the FAIR scheduling mode, which also requires multi-thread support.
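To make the startup chain in point 1 concrete, here is a self-contained sketch of the control flow. The class names mirror the real ones in org.apache.spark.streaming.scheduler, but the bodies are stubs written for this lesson, not the actual Spark source.

object StartupSketch {
  // Stubs standing in for the real org.apache.spark.streaming.scheduler classes
  class ReceiverTracker {
    def start(): Unit = println("ReceiverTracker: launching ReceiverSupervisors on the executors")
  }
  class JobGenerator {
    def start(): Unit = println("JobGenerator: generating a job every batchDuration")
  }
  class JobScheduler {
    def start(): Unit = {
      // JobScheduler.start constructs both components, then starts each of them
      val receiverTracker = new ReceiverTracker
      val jobGenerator = new JobGenerator
      receiverTracker.start()
      jobGenerator.start()
    }
  }

  def main(args: Array[String]): Unit = {
    // StreamingContext.start() ultimately delegates here
    new JobScheduler().start()
  }
}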
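And a sketch of point 2: each generated job is, in effect, a Runnable whose body triggers an RDD action, and JobScheduler hands it to a thread pool so that submitting one batch's job never blocks generating the next. The Job class and pool size below are illustrative stand-ins, not Spark's actual code; in real Spark the pool size is governed by the spark.streaming.concurrentJobs setting (default 1).

import java.util.concurrent.Executors

object JobSubmissionSketch {
  // Stand-in for the per-batch job built from the DStreamGraph's RDD DAG
  class Job(body: () => Unit) extends Runnable {
    override def run(): Unit = body() // the RDD action fires here, triggering the real Spark job
  }

  // One worker thread mirrors the default of spark.streaming.concurrentJobs = 1
  private val jobExecutor = Executors.newFixedThreadPool(1)

  def submitJob(job: Job): Unit = {
    jobExecutor.submit(job) // submission returns immediately; the pool thread runs the job
  }

  def main(args: Array[String]): Unit = {
    submitJob(new Job(() => println("RDD action runs; a real Spark job is triggered")))
    jobExecutor.shutdown()
  }
}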

