The previous lesson covered how a Receiver continuously receives data and how the metadata of the received data is reported to ReceiverTracker. Below we look at the specific functions and implementation of ReceiverTracker.
First, the main functions of ReceiverTracker:
Start the receivers on the executors.
Stop the receivers.
Update the rate at which receivers receive data (that is, the rate limit).
Continuously monitor the receivers and restart any receiver that stops running. This is how receivers are made fault tolerant.
Accept the registration of receivers.
Use ReceivedBlockTracker to manage the metadata of the data received by receivers.
Report error messages sent by receivers.
ReceiverTracker manages a message endpoint, ReceiverTrackerEndpoint, which handles messages from the receivers and from ReceiverTracker itself.
In the start method of ReceiverTracker, the ReceiverTrackerEndpoint is instantiated and the receivers are started on the executors:
    /** Start the endpoint and receiver execution thread. */
    def start(): Unit = synchronized {
      if (isTrackerStarted) {
        throw new SparkException("ReceiverTracker already started")
      }

      if (!receiverInputStreams.isEmpty) {
        endpoint = ssc.env.rpcEnv.setupEndpoint(
          "ReceiverTracker", new ReceiverTrackerEndpoint(ssc.env.rpcEnv))
        if (!skipReceiverLaunch) launchReceivers()
        logInfo("ReceiverTracker started")
        trackerState = Started
      }
    }
Starting the receivers actually means that ReceiverTracker sends a local message to ReceiverTrackerEndpoint, which wraps the receivers into an RDD and submits it to the cluster to run as a job.
    endpoint.send(StartAllReceivers(receivers))
The endpoint here is a reference to ReceiverTrackerEndpoint.
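The hand-off described above can be pictured with a small, Spark-free sketch. It assumes a hypothetical `ToyEndpoint` whose `send` method is fire-and-forget and dispatches on case-class messages; `ToyEndpoint`, `StartAll`, `StopAll` and `launched` are illustrative names, not Spark's API.

```scala
import scala.collection.mutable

object EndpointSketch {
  // Hypothetical messages, loosely modelled on StartAllReceivers and similar.
  sealed trait TrackerMessage
  final case class StartAll(receiverIds: Seq[Int]) extends TrackerMessage
  case object StopAll extends TrackerMessage

  // A toy endpoint: send returns nothing and dispatches by pattern matching.
  class ToyEndpoint {
    val launched = mutable.ArrayBuffer.empty[Int]

    def send(msg: TrackerMessage): Unit = msg match {
      // The real endpoint would submit work to the cluster here.
      case StartAll(ids) => launched ++= ids
      case StopAll       => launched.clear()
    }
  }
}
```

The caller only holds a reference to the endpoint and never calls the handling logic directly, which is the point of routing the start request through a message.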
After a receiver starts, it registers with ReceiverTracker. Only when the registration succeeds is the receiver officially started.
    override protected def onReceiverStart(): Boolean = {
      val msg = RegisterReceiver(
        streamId, receiver.getClass.getSimpleName, host, executorId, endpoint)
      trackerEndpoint.askWithRetry[Boolean](msg)
    }
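The register-then-start handshake can be sketched as a request that returns a Boolean. This is a simplified illustration under stated assumptions: `ToyTracker`, `Register` and the rule "accept only known stream ids" are hypothetical stand-ins, and the real tracker applies additional checks.

```scala
import scala.collection.mutable

object RegistrationSketch {
  // Hypothetical registration message carrying the receiver's identity.
  final case class Register(streamId: Int, typ: String, host: String, executorId: String)

  class ToyTracker(knownStreamIds: Set[Int]) {
    private val registered = mutable.HashMap.empty[Int, Register]

    // Returns true when the registration is accepted.
    // A receiver should only begin receiving after it gets true back.
    def ask(msg: Register): Boolean =
      if (knownStreamIds.contains(msg.streamId)) {
        registered.put(msg.streamId, msg)
        true
      } else {
        false
      }

    def isRegistered(streamId: Int): Boolean = registered.contains(streamId)
  }
}
```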
When the receiver side receives data, it writes the data to the BlockManager and reports the metadata of the data to ReceiverTracker:
    /** Store block and report it to driver */
    def pushAndReportBlock(
        receivedBlock: ReceivedBlock,
        metadataOption: Option[Any],
        blockIdOption: Option[StreamBlockId]
      ) {
      val blockId = blockIdOption.getOrElse(nextBlockId)
      val time = System.currentTimeMillis
      val blockStoreResult = receivedBlockHandler.storeBlock(blockId, receivedBlock)
      logDebug(s"Pushed block $blockId in ${(System.currentTimeMillis - time)} ms")
      val numRecords = blockStoreResult.numRecords
      val blockInfo = ReceivedBlockInfo(streamId, numRecords, metadataOption, blockStoreResult)
      trackerEndpoint.askWithRetry[Boolean](AddBlock(blockInfo))
      logDebug(s"Reported block $blockId")
    }
When ReceiverTracker receives the metadata and write-ahead-log batching is enabled, the write is handed to a thread in a thread pool. Otherwise the metadata is written directly in the message handler:
    case AddBlock(receivedBlockInfo) =>
      if (WriteAheadLogUtils.isBatchingEnabled(ssc.conf, isDriver = true)) {
        walBatchingThreadPool.execute(new Runnable {
          override def run(): Unit = Utils.tryLogNonFatalError {
            if (active) {
              context.reply(addBlock(receivedBlockInfo))
            } else {
              throw new IllegalStateException("ReceiverTracker RpcEndpoint shut down.")
            }
          }
        })
      } else {
        context.reply(addBlock(receivedBlockInfo))
      }
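The two branches above, run on a pool when batching is on and run inline otherwise, can be sketched without Spark. Everything here is an assumption made for illustration: `handle`, the fixed pool of two threads, and the use of a `Promise` as a stand-in for `context.reply`. The sketch blocks on the reply only to make the result easy to observe.

```scala
import java.util.concurrent.Executors
import scala.concurrent.{Await, Promise}
import scala.concurrent.duration._
import scala.util.Try

object AddBlockDispatchSketch {
  // Hypothetical stand-in for the WAL batching thread pool.
  private val pool = Executors.newFixedThreadPool(2)

  def handle(batchingEnabled: Boolean, block: String)(addBlock: String => Boolean): Boolean = {
    val reply = Promise[Boolean]() // plays the role of context.reply
    if (batchingEnabled) {
      // Offload the potentially slow write so the message loop is not blocked.
      pool.execute(new Runnable {
        override def run(): Unit = reply.complete(Try(addBlock(block)))
      })
    } else {
      // No batching: do the write directly on the calling thread.
      reply.complete(Try(addBlock(block)))
    }
    Await.result(reply.future, 5.seconds)
  }

  def shutdown(): Unit = pool.shutdown()
}
```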
The metadata of the received data is managed by ReceivedBlockTracker.
The metadata is eventually written to streamIdToUnallocatedBlockQueues, which maps each stream to a queue of the blocks received for that stream.
    private type ReceivedBlockQueue = mutable.Queue[ReceivedBlockInfo]
    private val streamIdToUnallocatedBlockQueues = new mutable.HashMap[Int, ReceivedBlockQueue]
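As a minimal sketch of this structure, the following uses the same map-of-queues shape with a simplified, hypothetical `BlockInfo` record. `addBlock` and `pending` are illustrative helper names and are not taken from the Spark source.

```scala
import scala.collection.mutable

object BlockQueueSketch {
  // Simplified stand-in for ReceivedBlockInfo.
  final case class BlockInfo(streamId: Int, numRecords: Long)
  type BlockQueue = mutable.Queue[BlockInfo]

  // One queue of not-yet-allocated blocks per stream id.
  private val queues = new mutable.HashMap[Int, BlockQueue]

  // Create the stream's queue on first use, then append the block.
  def addBlock(info: BlockInfo): Unit =
    queues.getOrElseUpdate(info.streamId, mutable.Queue.empty[BlockInfo]) += info

  // Blocks waiting for allocation, in arrival order.
  def pending(streamId: Int): List[BlockInfo] =
    queues.get(streamId).map(_.toList).getOrElse(Nil)
}
```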
Whenever Spark Streaming triggers a job, the blocks in the queues are allocated to a batch and recorded in the timeToAllocatedBlocks data structure.
    private val timeToAllocatedBlocks = new mutable.HashMap[Time, AllocatedBlocks]
    ....
    def allocateBlocksToBatch(batchTime: Time): Unit = synchronized {
      if (lastAllocatedBatchTime == null || batchTime > lastAllocatedBatchTime) {
        val streamIdToBlocks = streamIds.map { streamId =>
          (streamId, getReceivedBlockQueue(streamId).dequeueAll(x => true))
        }.toMap
        val allocatedBlocks = AllocatedBlocks(streamIdToBlocks)
        if (writeToLog(BatchAllocationEvent(batchTime, allocatedBlocks))) {
          timeToAllocatedBlocks.put(batchTime, allocatedBlocks)
          lastAllocatedBatchTime = batchTime
        } else {
          logInfo(s"Possibly processed batch $batchTime need to be processed again in WAL recovery")
        }
      } else {
        // This situation occurs when:
        // 1. WAL is ended with BatchAllocationEvent, but without BatchCleanupEvent,
        // possibly processed batch job or half-processed batch job need to be processed again,
        // so the batchTime will be equal to lastAllocatedBatchTime.
        // 2. Slow checkpointing makes recovered batch time older than WAL recovered
        // lastAllocatedBatchTime.
        // This situation will only occurs in recovery time.
        logInfo(s"Possibly processed batch $batchTime need to be processed again in WAL recovery")
      }
    }
As can be seen, a batch contains data from multiple streams.
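The allocation step can be sketched in isolation. This is a simplified model under stated assumptions: blocks are plain strings, batch times are `Long` milliseconds, there is no write-ahead log, and `addBlock`, `allocate` and `blocksOf` are hypothetical names.

```scala
import scala.collection.mutable

object AllocationSketch {
  type Block = String

  private val unallocated = new mutable.HashMap[Int, mutable.Queue[Block]]
  private val allocated = new mutable.HashMap[Long, Map[Int, List[Block]]]
  private var lastBatchTime: Option[Long] = None

  def addBlock(streamId: Int, block: Block): Unit = synchronized {
    unallocated.getOrElseUpdate(streamId, mutable.Queue.empty[Block]) += block
  }

  // Drain every stream's queue into one batch, but only for a newer batch time.
  def allocate(batchTime: Long): Unit = synchronized {
    if (lastBatchTime.forall(batchTime > _)) {
      val drained = unallocated.map { case (id, queue) =>
        id -> queue.dequeueAll(_ => true).toList
      }.toMap
      allocated.put(batchTime, drained)
      lastBatchTime = Some(batchTime)
    }
  }

  def blocksOf(batchTime: Long, streamId: Int): List[Block] = synchronized {
    allocated.get(batchTime).flatMap(_.get(streamId)).getOrElse(Nil)
  }
}
```

Repeating `allocate` with the same batch time is a no-op in this sketch, which mirrors the guard on lastAllocatedBatchTime in the code above.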
Every time a Spark Streaming job finishes running:
    private def handleJobCompletion(job: Job, completedTime: Long) {
      val jobSet = jobSets.get(job.time)
      jobSet.handleJobCompletion(job)
      job.setEndTime(completedTime)
      listenerBus.post(StreamingListenerOutputOperationCompleted(job.toOutputOperationInfo))
      logInfo("Finished job " + job.id + " from job set of time " + jobSet.time)
      if (jobSet.hasCompleted) {
        jobSets.remove(jobSet.time)
        jobGenerator.onBatchCompletion(jobSet.time)
        logInfo("Total delay: %.3f s for time %s (execution: %.3f s)".format(
          jobSet.totalDelay / 1000.0, jobSet.time.toString,
          jobSet.processingDelay / 1000.0
        ))
        listenerBus.post(StreamingListenerBatchCompleted(jobSet.toBatchInfo))
      }
      .....
JobScheduler invokes the handleJobCompletion method, which eventually triggers
    jobScheduler.receiverTracker.cleanupOldBlocksAndBatches(time - maxRememberDuration)
Here maxRememberDuration is the longest period for which any DStream retains the RDDs it has generated.
    def cleanupOldBatches(cleanupThreshTime: Time, waitForCompletion: Boolean): Unit = synchronized {
      require(cleanupThreshTime.milliseconds < clock.getTimeMillis())
      val timesToCleanup = timeToAllocatedBlocks.keys.filter { _ < cleanupThreshTime }.toSeq
      logInfo("Deleting batches " + timesToCleanup)
      if (writeToLog(BatchCleanupEvent(timesToCleanup))) {
        timeToAllocatedBlocks --= timesToCleanup
        writeAheadLogOption.foreach(_.clean(cleanupThreshTime.milliseconds, waitForCompletion))
      } else {
        logWarning("Failed to acknowledge batch clean up in the Write Ahead Log.")
      }
    }
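The threshold logic can be illustrated with a small sketch that drops every batch older than a cutoff. It assumes `Long` batch times and leaves out the write-ahead log; `record`, `cleanupOlderThan` and `batchTimes` are hypothetical names.

```scala
import scala.collection.mutable

object CleanupSketch {
  private val allocated = new mutable.HashMap[Long, List[String]]

  def record(batchTime: Long, blocks: List[String]): Unit = synchronized {
    allocated.put(batchTime, blocks)
  }

  // Remove all batches strictly older than the threshold and return their times.
  def cleanupOlderThan(threshTime: Long): List[Long] = synchronized {
    val timesToCleanup = allocated.keys.filter(_ < threshTime).toList.sorted
    allocated --= timesToCleanup
    timesToCleanup
  }

  def batchTimes: List[Long] = synchronized { allocated.keys.toList.sorted }
}
```

For example, with batches at 1000, 2000 and 3000 ms, a current time of 3000 and a remember duration of 500, the cutoff is 2500, so the first two batches are removed.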
and finally
    listenerBus.post(StreamingListenerBatchCompleted(jobSet.toBatchInfo))
This event is dispatched to the onBatchCompleted method of each listener:
    case batchCompleted: StreamingListenerBatchCompleted =>
      listener.onBatchCompleted(batchCompleted)
Following the call chain all the way down leads to ReceiverRateController:
    /**
     * A RateController that sends the new rate to receivers, via the receiver tracker.
     */
    private[streaming] class ReceiverRateController(id: Int, estimator: RateEstimator)
        extends RateController(id, estimator) {
      override def publish(rate: Long): Unit =
        ssc.scheduler.receiverTracker.sendRateUpdate(id, rate)
    }
    /** Update a receiver's maximum ingestion rate */
    def sendRateUpdate(streamUID: Int, newRate: Long): Unit = synchronized {
      if (isTrackerStarted) {
        endpoint.send(UpdateReceiverRateLimit(streamUID, newRate))
      }
    }
    case UpdateReceiverRateLimit(streamUID, newRate) =>
      for (info <- receiverTrackingInfos.get(streamUID); eP <- info.endpoint) {
        eP.send(UpdateRateLimit(newRate))
      }
The message is finally delivered to the receiver, where each BlockGenerator adjusts its rate, and this is what controls the rate of the incoming data flow.
    case UpdateRateLimit(eps) =>
      logInfo(s"Received a new rate limit: $eps.")
      registeredBlockGenerators.foreach { bg =>
        bg.updateRate(eps)
      }
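The last hop of the rate update can be sketched as a supervisor that fans the new rate out to its generators. This is an illustration under assumptions: `ToyGenerator` and `ToySupervisor` are hypothetical, and the rule used here (ignore non-positive rates, cap at a configured maximum when one is set) is a simplification rather than a copy of Spark's rate limiter.

```scala
object RateUpdateSketch {
  // maxRate <= 0 is treated as "no configured maximum" in this sketch.
  class ToyGenerator(maxRate: Long) {
    @volatile private var rate: Long = if (maxRate > 0) maxRate else Long.MaxValue

    def currentRate: Long = rate

    def updateRate(newRate: Long): Unit =
      if (newRate > 0) {
        rate = if (maxRate > 0) math.min(newRate, maxRate) else newRate
      }
  }

  // Plays the role of the receiver side handling UpdateRateLimit(eps).
  class ToySupervisor(generators: Seq[ToyGenerator]) {
    def onRateLimit(eps: Long): Unit = generators.foreach(_.updateRate(eps))
  }
}
```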
Note:
1. DT Big Data Dream Factory WeChat public account: DT_Spark
2. IMF big data hands-on sessions, 8 o'clock every evening, YY live channel number: 68917580
3. Sina Weibo: http://www.weibo.com/ilovepains
This article is from the "Ding Dong" blog, please be sure to keep this source http://lqding.blog.51cto.com/9123978/1774994
Lesson 11: Spark Streaming source code interpretation: the architecture design and concrete implementation of ReceiverTracker on the driver