Contents of this issue:
1 RDD generation lifecycle
2 Deeper thoughts
In the stream-processing era, data that cannot be processed in real time rapidly loses its value, so Spark Streaming has strong appeal and bright prospects. Backed by Spark's ecosystem, a streaming job can easily call on other powerful frameworks such as Spark SQL and MLlib, which helps it stand out.
The Spark Streaming runtime is not so much a streaming framework built on Spark Core as it is one of the most complex applications running on Spark Core. If you can master an application as complex as Spark Streaming, other complex applications become straightforward. Choosing Spark Streaming as the starting point for a custom Spark version is therefore a natural choice.
DStream is the template for RDDs in Spark Streaming: at every batchInterval, a new RDD is generated from the DStream template. Each generated RDD is stored in the generatedRDDs map. Once an RDD has been generated for a batch, it is operated on via foreachRDD.
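To make the "DStream as an RDD template" idea concrete, here is a minimal self-contained sketch (not Spark source; FakeRDD and FakeDStream are illustrative names) of how a template can materialize one RDD per batch time and cache it in a generatedRDDs map, mirroring DStream.getOrCompute:

```scala
import scala.collection.mutable

object DStreamTemplateSketch {
  // Stand-in for an RDD: just records which batch time produced it.
  final case class FakeRDD(time: Long)

  class FakeDStream {
    // Mirrors DStream.generatedRDDs: batch time -> materialized RDD.
    val generatedRDDs = mutable.HashMap[Long, FakeRDD]()

    // Mirrors getOrCompute: reuse the cached RDD for this time, else compute it.
    def getOrCompute(time: Long): FakeRDD =
      generatedRDDs.getOrElseUpdate(time, FakeRDD(time))
  }

  def main(args: Array[String]): Unit = {
    val ds = new FakeDStream
    // Each batch interval asks the template for that batch's RDD.
    Seq(1000L, 2000L, 3000L).foreach(ds.getOrCompute)
    println(ds.generatedRDDs.size)       // 3
    println(ds.getOrCompute(2000L).time) // cached instance for batch 2000
  }
}
```

The key point the sketch shows: the same template object produces a distinct RDD per batch time, and repeated lookups for the same time return the cached instance rather than recomputing.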
private def foreachRDD(
    foreachFunc: (RDD[T], Time) => Unit,
    displayInnerRDDOps: Boolean): Unit = {
  new ForEachDStream(this,
    context.sparkContext.clean(foreachFunc, false), displayInnerRDDOps).register()
}
foreachRDD creates a new ForEachDStream object and registers it with the DStreamGraph.
The generateJobs method of DStreamGraph is then called once per batchInterval:
def generateJobs(time: Time): Seq[Job] = {
  logDebug("Generating jobs for time " + time)
  val jobs = this.synchronized {
    outputStreams.flatMap { outputStream =>
      val jobOption = outputStream.generateJob(time)
      jobOption.foreach(_.setCallSite(outputStream.creationSite))
      jobOption
    }
  }
  logDebug("Generated " + jobs.length + " jobs for time " + time)
  jobs
}
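The pattern in generateJobs can be sketched without any Spark dependencies: the graph asks each registered output stream for an optional job at a given batch time and flattens the results. This is an illustrative simplification (Graph, OutputStream, and Job here are stand-ins, not Spark classes):

```scala
object GenerateJobsSketch {
  // A job is just a batch time plus a deferred action.
  final case class Job(time: Long, run: () => Unit)

  // Each output stream may or may not produce a job for a given batch.
  trait OutputStream { def generateJob(time: Long): Option[Job] }

  class Graph {
    private var outputStreams = List.empty[OutputStream]
    def register(os: OutputStream): Unit = outputStreams ::= os

    // Mirrors DStreamGraph.generateJobs: collect one Option[Job] per
    // output stream and keep only the defined ones.
    def generateJobs(time: Long): Seq[Job] =
      this.synchronized { outputStreams.flatMap(_.generateJob(time)) }
  }

  def main(args: Array[String]): Unit = {
    val graph = new Graph
    graph.register(t => Some(Job(t, () => println(s"job at $t"))))
    graph.register(_ => None) // a stream with no job this batch
    val jobs = graph.generateJobs(5000L)
    println(jobs.length) // 1
    jobs.foreach(_.run())
  }
}
```

Note that generateJobs only builds job descriptions; nothing runs until each job's function is actually invoked, which in Spark Streaming happens later via the JobScheduler.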
Inside it, each output stream's generateJob method is called for the given batch time:
override def generateJob(time: Time): Option[Job] = {
  parent.getOrCompute(time) match {
    case Some(rdd) =>
      val jobFunc = () => createRDDWithLocalProperties(time, displayInnerRDDOps) {
        foreachFunc(rdd, time)
      }
      Some(new Job(time, jobFunc))
    case None => None
  }
}
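The essence of generateJob above is: ask the parent DStream for this batch's RDD, and if one exists, wrap the user's foreachFunc and that RDD in a deferred function that becomes the job. A minimal sketch of that pattern (simplified, not Spark source; all names here are illustrative):

```scala
object GenerateJobSketch {
  final case class FakeRDD(time: Long)
  final case class Job(time: Long, run: () => Unit)

  class ForEachLikeStream(
      parentGetOrCompute: Long => Option[FakeRDD], // stand-in for parent.getOrCompute
      foreachFunc: (FakeRDD, Long) => Unit) {

    // Only produce a job when the parent materialized an RDD for this time.
    def generateJob(time: Long): Option[Job] =
      parentGetOrCompute(time) match {
        case Some(rdd) =>
          val jobFunc = () => foreachFunc(rdd, time) // deferred, not run yet
          Some(Job(time, jobFunc))
        case None => None
      }
  }

  def main(args: Array[String]): Unit = {
    var ran = List.empty[Long]
    val stream = new ForEachLikeStream(
      t => if (t % 2000L == 0L) Some(FakeRDD(t)) else None,
      (rdd, t) => ran ::= t)
    val job = stream.generateJob(2000L)
    job.foreach(_.run())               // runs foreachFunc on the batch's RDD
    println(ran)                       // List(2000)
    println(stream.generateJob(3000L)) // None: parent had no RDD for 3000
  }
}
```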
Following the graph back: at the next batchInterval a new RDD is generated, the cycle repeats, and every generated RDD is placed into the generatedRDDs collection.
Note:
Source: DT Big Data Dream Factory (Spark release version customization course)
Spark version customization, day 8: a thorough look at the RDD generation lifecycle