(Version Customization) Lesson 7: Spark Streaming Source Code Interpretation: JobScheduler Inner Workings and Deep Thinking


Contents of this issue:

1. JobScheduler inner workings

2. JobScheduler deep thinking


JobScheduler is the scheduling core of Spark Streaming. Its importance is comparable to that of the DAGScheduler at the scheduling center of Spark Core!

JobGenerator dynamically generates a JobSet every batch duration and submits it to JobScheduler. When JobScheduler receives the JobSet, how does it process it?

Creating the jobs

/** Generate jobs and perform checkpoint for the given `time`. */
private def generateJobs(time: Time) {
  // Set the SparkEnv in this thread, so that job generation code can access the environment
  // Example: BlockRDDs are created in this thread, and it needs to access BlockManager
  // Update: This is probably redundant after threadlocal stuff in SparkEnv has been removed.
  SparkEnv.set(ssc.env)
  Try {
    jobScheduler.receiverTracker.allocateBlocksToBatch(time) // allocate received blocks to batch
    graph.generateJobs(time) // generate jobs using allocated block
  } match {
    case Success(jobs) =>
      val streamIdToInputInfos = jobScheduler.inputInfoTracker.getInfo(time)
      jobScheduler.submitJobSet(JobSet(time, jobs, streamIdToInputInfos))
    case Failure(e) =>
      jobScheduler.reportError("Error generating jobs for time " + time, e)
  }
  eventLoop.post(DoCheckpoint(time, clearCheckpointDataLater = false))
}
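The `Try { … } match { case Success(…) / case Failure(…) }` structure above is a standard Scala idiom for separating fallible work from the handling of its outcome. A self-contained sketch of the same control flow, with hypothetical names (this is not Spark code):

```scala
import scala.util.{Failure, Success, Try}

// Hypothetical stand-in for the generateJobs control flow (not Spark code):
// run the fallible work inside Try, then branch on Success/Failure instead
// of using try/catch.
object GenerateSketch {
  def generate(batchTime: Long): String =
    Try {
      require(batchTime > 0, s"bad batch time $batchTime") // fallible "job generation"
      Seq(s"job-$batchTime")                               // the generated jobs
    } match {
      case Success(jobs) => s"submitted ${jobs.size} job(s) for time $batchTime"
      case Failure(e)    => s"error generating jobs for time $batchTime: ${e.getMessage}"
    }
}
```

Note that, just as in generateJobs, an exception thrown anywhere inside the `Try` block is converted into a `Failure` value and reported, rather than propagating up the calling thread.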

Processing the resulting JobSet

def submitJobSet(jobSet: JobSet) {
  if (jobSet.jobs.isEmpty) {
    logInfo("No jobs added for time " + jobSet.time)
  } else {
    listenerBus.post(StreamingListenerBatchSubmitted(jobSet.toBatchInfo))
    jobSets.put(jobSet.time, jobSet)
    jobSet.jobs.foreach(job => jobExecutor.execute(new JobHandler(job)))
    logInfo("Added jobs for time " + jobSet.time)
  }
}

This creates a new JobHandler for each job and hands it to jobExecutor to run.

The most important processing logic here is job => jobExecutor.execute(new JobHandler(job)): each job is wrapped in a new JobHandler and executed on the jobExecutor thread pool.
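As a rough sketch of this pattern (simplified, with hypothetical names): each job is wrapped in a Runnable handler and submitted to a fixed-size thread pool. In Spark the pool is a daemon fixed thread pool whose size comes from the spark.streaming.concurrentJobs setting, which defaults to 1.

```scala
import java.util.concurrent.{Executors, TimeUnit}

// Simplified sketch of the JobScheduler pattern (hypothetical names, not
// Spark's actual code): each job is wrapped in a Runnable handler and
// executed on a fixed-size thread pool. With a pool size of 1 (Spark's
// default), the handlers run serially, in submission order.
object JobExecutorSketch {
  final case class Job(time: Long, func: () => Unit)

  class JobHandler(job: Job) extends Runnable {
    override def run(): Unit = job.func() // the actual job body runs here
  }

  /** Submits all jobs to a fixed pool and blocks until they finish. */
  def runJobs(jobs: Seq[Job], concurrentJobs: Int = 1): Unit = {
    val jobExecutor = Executors.newFixedThreadPool(concurrentJobs)
    jobs.foreach(job => jobExecutor.execute(new JobHandler(job)))
    jobExecutor.shutdown()
    jobExecutor.awaitTermination(10, TimeUnit.SECONDS)
  }
}
```

With a pool size of 1, the handlers execute one after another, which is why, by default, the jobs of a new batch wait until the previous batch's jobs have finished.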

Let's take a look at JobHandler's main processing logic for the job:

var _eventLoop = eventLoop
if (_eventLoop != null) {
  _eventLoop.post(JobStarted(job, clock.getTimeMillis()))
  // Disable checks for existing output directories in jobs launched by the streaming
  // scheduler, since we may need to write output to an existing directory during checkpoint
  // recovery; see SPARK-4835 for more details.
  PairRDDFunctions.disableOutputSpecValidation.withValue(true) {
    job.run()
  }
  _eventLoop = eventLoop
  if (_eventLoop != null) {
    _eventLoop.post(JobCompleted(job, clock.getTimeMillis()))
  }
}

In other words, besides recording some state, the most important thing JobHandler does is call job.run()! This corresponds to our earlier analysis of how DStreams generate RDD instances: ForEachDStream.generateJob(time) defines the job's processing logic, i.e. job.func. And it is here, in JobHandler, that job.run() is actually invoked, triggering the real execution of job.func!

def run() { _result = Try(func()) }
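A minimal sketch of this wrapping (a hypothetical SketchJob class, simplified from the real Job class): the job body is an arbitrary closure, and run() captures its outcome in a Try instead of letting an exception escape to the executor thread.

```scala
import scala.util.Try

// Hypothetical, simplified version of the Job class (not Spark code):
// run() records the closure's outcome (Success or Failure) instead of
// letting an exception propagate to the thread that called run().
class SketchJob(func: () => Unit) {
  private var _result: Option[Try[Unit]] = None // None until run() is called

  def run(): Unit = { _result = Some(Try(func())) }

  def result: Option[Try[Unit]] = _result
}
```

For example, a job whose closure throws ends up with a Failure result rather than killing its worker thread, which is what lets the scheduler post a JobCompleted event and report the error through its own channels.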


Reference Blog: http://lqding.blog.51cto.com/9123978/1773391

Note:

Information from: DT Big Data Dream Factory (Spark release version customization)

For more exclusive content, please follow the public account: DT_Spark

If you are interested in big data and Spark, you are welcome to listen to the permanent free public Spark class given by teacher Liaoliang every night at -:xx, at YY room number 68917580.

This article is from the "DT_Spark Big Data DreamWorks" blog; please make sure to keep this source: http://18610086859.blog.51cto.com/11484530/1775258

