(Version Customization) Lesson 7: Spark Streaming Source Code Interpretation: JobScheduler Inner Workings and Deep Thinking


Contents of this issue:

1. JobScheduler inner workings

2. JobScheduler deep thinking


JobScheduler is the scheduling core of Spark Streaming. Its importance is comparable to that of the DAGScheduler at the scheduling center of Spark Core!

JobGenerator dynamically generates a JobSet every batch duration and submits it to JobScheduler. When JobScheduler receives the JobSet, how does it process it?

Creating the jobs

/** Generate jobs and perform checkpoint for the given `time`. */
private def generateJobs(time: Time) {
  // Set the SparkEnv in this thread, so that job generation code can access the environment
  // Example: BlockRDDs are created in this thread, and it needs to access BlockManager
  // Update: This is probably redundant after threadlocal stuff in SparkEnv has been removed.
  SparkEnv.set(ssc.env)
  Try {
    jobScheduler.receiverTracker.allocateBlocksToBatch(time) // allocate received blocks to batch
    graph.generateJobs(time) // generate jobs using allocated block
  } match {
    case Success(jobs) =>
      val streamIdToInputInfos = jobScheduler.inputInfoTracker.getInfo(time)
      jobScheduler.submitJobSet(JobSet(time, jobs, streamIdToInputInfos))
    case Failure(e) =>
      jobScheduler.reportError("Error generating jobs for time " + time, e)
  }
  eventLoop.post(DoCheckpoint(time, clearCheckpointDataLater = false))
}
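The `Try { … } match { case Success(…) / case Failure(…) }` structure above is a standard Scala idiom for separating fallible work from the handling of its outcome. A self-contained sketch of the same control flow, with hypothetical names (this is not Spark code):

```scala
import scala.util.{Failure, Success, Try}

// Hypothetical stand-in for the generateJobs control flow (not Spark code):
// run the fallible work inside Try, then branch on Success/Failure instead
// of using try/catch.
object GenerateSketch {
  def generate(batchTime: Long): String =
    Try {
      require(batchTime > 0, s"bad batch time $batchTime") // fallible "job generation"
      Seq(s"job-$batchTime")                               // the generated jobs
    } match {
      case Success(jobs) => s"submitted ${jobs.size} job(s) for time $batchTime"
      case Failure(e)    => s"error generating jobs for time $batchTime: ${e.getMessage}"
    }
}
```

Note that, just as in generateJobs, an exception thrown anywhere inside the `Try` block is converted into a `Failure` value and reported, rather than propagating up the calling thread.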

Processing the resulting JobSet

def submitJobSet(jobSet: JobSet) {
  if (jobSet.jobs.isEmpty) {
    logInfo("No jobs added for time " + jobSet.time)
  } else {
    listenerBus.post(StreamingListenerBatchSubmitted(jobSet.toBatchInfo))
    jobSets.put(jobSet.time, jobSet)
    jobSet.jobs.foreach(job => jobExecutor.execute(new JobHandler(job)))
    logInfo("Added jobs for time " + jobSet.time)
  }
}

This creates a new JobHandler for each job and hands it to jobExecutor to run.

The most important processing logic here is job => jobExecutor.execute(new JobHandler(job)): each job is wrapped in a new JobHandler and executed on the jobExecutor thread pool.
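As a rough sketch of this pattern (simplified, with hypothetical names): each job is wrapped in a Runnable handler and submitted to a fixed-size thread pool. In Spark the pool is a daemon fixed thread pool whose size comes from the spark.streaming.concurrentJobs setting, which defaults to 1.

```scala
import java.util.concurrent.{Executors, TimeUnit}

// Simplified sketch of the JobScheduler pattern (hypothetical names, not
// Spark's actual code): each job is wrapped in a Runnable handler and
// executed on a fixed-size thread pool. With a pool size of 1 (Spark's
// default), the handlers run serially, in submission order.
object JobExecutorSketch {
  final case class Job(time: Long, func: () => Unit)

  class JobHandler(job: Job) extends Runnable {
    override def run(): Unit = job.func() // the actual job body runs here
  }

  /** Submits all jobs to a fixed pool and blocks until they finish. */
  def runJobs(jobs: Seq[Job], concurrentJobs: Int = 1): Unit = {
    val jobExecutor = Executors.newFixedThreadPool(concurrentJobs)
    jobs.foreach(job => jobExecutor.execute(new JobHandler(job)))
    jobExecutor.shutdown()
    jobExecutor.awaitTermination(10, TimeUnit.SECONDS)
  }
}
```

With a pool size of 1, the handlers execute one after another, which is why, by default, the jobs of a new batch wait until the previous batch's jobs have finished.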

Let's take a look at JobHandler's main processing logic for the job:

var _eventLoop = eventLoop
if (_eventLoop != null) {
  _eventLoop.post(JobStarted(job, clock.getTimeMillis()))
  // Disable checks for existing output directories in jobs launched by the streaming
  // scheduler, since we may need to write output to an existing directory during checkpoint
  // recovery; see SPARK-4835 for more details.
  PairRDDFunctions.disableOutputSpecValidation.withValue(true) {
    job.run()
  }
  _eventLoop = eventLoop
  if (_eventLoop != null) {
    _eventLoop.post(JobCompleted(job, clock.getTimeMillis()))
  }
}

In other words, besides recording some state, the most important thing JobHandler does is call job.run()! This corresponds to our earlier analysis of how DStreams generate RDD instances: ForEachDStream.generateJob(time) defines the job's processing logic, i.e. job.func. And it is here, in JobHandler, that job.run() is actually invoked, triggering the real execution of job.func!

def run() { _result = Try(func()) }
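A minimal sketch of this wrapping (a hypothetical SketchJob class, simplified from the real Job class): the job body is an arbitrary closure, and run() captures its outcome in a Try instead of letting an exception escape to the executor thread.

```scala
import scala.util.Try

// Hypothetical, simplified version of the Job class (not Spark code):
// run() records the closure's outcome (Success or Failure) instead of
// letting an exception propagate to the thread that called run().
class SketchJob(func: () => Unit) {
  private var _result: Option[Try[Unit]] = None // None until run() is called

  def run(): Unit = { _result = Some(Try(func())) }

  def result: Option[Try[Unit]] = _result
}
```

For example, a job whose closure throws ends up with a Failure result rather than killing its worker thread, which is what lets the scheduler post a JobCompleted event and report the error through its own channels.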


Reference Blog: http://lqding.blog.51cto.com/9123978/1773391

Note:

Information from: DT Big Data Dream Factory (Spark release version customization)

For more exclusive content, please follow the public account: DT_Spark

If you are interested in big data and Spark, you are welcome to listen to the permanent free public Spark class given by teacher Liaoliang every night at -:xx, at YY room number 68917580.

This article is from the "DT_Spark Big Data DreamWorks" blog; please make sure to keep this source: http://18610086859.blog.51cto.com/11484530/1775258

