Lesson 11: Spark Streaming's ReceiverTracker on the driver: architecture design and concrete implementation, interpreted from the source code


In the last lesson we looked at how the Receiver continuously receives data and reports the metadata of the received data to the ReceiverTracker. In this lesson we look at the ReceiverTracker's concrete functions and their implementation.

The main functions of the ReceiverTracker are as follows (a simplified sketch of the message protocol behind them follows the list):

    1. Start the receivers on the executors.

    2. Stop the receivers.

    3. Update the rate at which each receiver receives data (i.e., rate limiting).

    4. Continuously monitor the receivers and restart a receiver whenever it stops running. This is the receiver fault-tolerance function.

    5. Accept receiver registration.

    6. Use the ReceivedBlockTracker to manage the metadata of the data received by the receivers.

    7. Report error messages sent by the receivers.
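
All of these functions are driven by messages exchanged between the receivers (on the executors) and the ReceiverTracker (on the driver). The following is a minimal, self-contained sketch modeled on that message protocol; the case-class names mirror Spark's internal ReceiverTrackerMessage types, but the fields are simplified for illustration and this is not the actual Spark source.

sealed trait TrackerMessage

// Messages sent from the receiver side (executor) to the tracker (driver).
case class RegisterReceiver(streamId: Int, typ: String, host: String, executorId: String) extends TrackerMessage
case class AddBlock(streamId: Int, numRecords: Long, blockId: String) extends TrackerMessage
case class ReportError(streamId: Int, message: String, error: String) extends TrackerMessage
case class DeregisterReceiver(streamId: Int, message: String, error: String) extends TrackerMessage

// Local messages the tracker sends to its own endpoint.
case class StartAllReceivers(streamIds: Seq[Int]) extends TrackerMessage
case class RestartReceiver(streamId: Int) extends TrackerMessage
case class UpdateReceiverRateLimit(streamId: Int, newRate: Long) extends TrackerMessage

object TrackerMessageDemo extends App {
  // Handling a message is just pattern matching on the sealed trait.
  def handle(msg: TrackerMessage): String = msg match {
    case RegisterReceiver(id, typ, host, _) => s"receiver $id ($typ) registered on $host"
    case AddBlock(id, n, blockId)           => s"stream $id reported block $blockId with $n records"
    case ReportError(id, m, _)              => s"stream $id reported error: $m"
    case DeregisterReceiver(id, m, _)       => s"receiver $id deregistered: $m"
    case StartAllReceivers(ids)             => s"starting receivers for streams ${ids.mkString(",")}"
    case RestartReceiver(id)                => s"restarting receiver $id"
    case UpdateReceiverRateLimit(id, r)     => s"stream $id rate limit updated to $r records/sec"
  }

  println(handle(AddBlock(0, 100L, "input-0-0")))
}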


The ReceiverTracker manages a message-communication body, the ReceiverTrackerEndpoint, which is used to communicate with the receivers and with the ReceiverTracker itself.

In the ReceiverTracker's start method, the ReceiverTrackerEndpoint is instantiated and the receivers are started on the executors:

/** Start the endpoint and receiver execution thread. */
def start(): Unit = synchronized {
  if (isTrackerStarted) {
    throw new SparkException("ReceiverTracker already started")
  }
  if (!receiverInputStreams.isEmpty) {
    endpoint = ssc.env.rpcEnv.setupEndpoint(
      "ReceiverTracker", new ReceiverTrackerEndpoint(ssc.env.rpcEnv))
    if (!skipReceiverLaunch) launchReceivers()
    logInfo("ReceiverTracker started")
    trackerState = Started
  }
}

Starting the receivers actually means the ReceiverTracker sends a local StartAllReceivers message to the ReceiverTrackerEndpoint; the ReceiverTrackerEndpoint then wraps each receiver into an RDD and submits it to the cluster to run as a job.

endpoint.send(StartAllReceivers(receivers))

The endpoint here is a reference to the ReceiverTrackerEndpoint.
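
How does the ReceiverTrackerEndpoint turn a receiver into a running task? Roughly speaking, it wraps each receiver into a single-partition RDD and submits it with SparkContext.submitJob, so the receiver runs as a long-lived task on an executor. Below is a minimal, self-contained sketch of that idea; FakeReceiver, startReceiverFunc and the object name StartReceiverSketch are stand-ins of mine, not Spark's code, and the real ReceiverTrackerEndpoint additionally computes preferred locations for the receiver, starts a ReceiverSupervisor inside the task, and sends itself a RestartReceiver message when the returned future completes unexpectedly.

import org.apache.spark.{SparkConf, SparkContext, TaskContext}
import scala.concurrent.Await
import scala.concurrent.duration._

object StartReceiverSketch {
  // Stand-in for a real Receiver object; it must be serializable so it can be shipped to an executor.
  case class FakeReceiver(streamId: Int)

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setMaster("local[2]").setAppName("receiver-sketch"))

    // Wrap the single receiver into a one-partition RDD, in the spirit of
    // ssc.sc.makeRDD(Seq(receiver), 1) on the driver side.
    val receiverRDD = sc.makeRDD(Seq(FakeReceiver(streamId = 0)), numSlices = 1)
    receiverRDD.setName("Receiver 0")

    // The function run inside the task on the executor; in Spark it starts a
    // ReceiverSupervisor and blocks until the receiver terminates. Here we only simulate that step.
    val startReceiverFunc = (iterator: Iterator[FakeReceiver]) => {
      if (TaskContext.get().attemptNumber() == 0) {
        val receiver = iterator.next()
        println(s"starting receiver ${receiver.streamId} on an executor")
      }
    }

    // submitJob returns a future; the driver watches its completion to decide
    // whether the receiver ended normally or needs to be restarted.
    val future = sc.submitJob[FakeReceiver, Unit, Unit](
      receiverRDD, startReceiverFunc, Seq(0), (_, _) => (), ())

    Await.ready(future, 1.minute)
    sc.stop()
  }
}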


After a receiver starts, it registers itself with the ReceiverTracker; only when the registration succeeds is the receiver considered officially started.

override protected def onReceiverStart(): Boolean = {
  val msg = RegisterReceiver(
    streamId, receiver.getClass.getSimpleName, host, executorId, endpoint)
  trackerEndpoint.askWithRetry[Boolean](msg)
}

When the receiver side receives data, it needs to write the data to the BlockManager and report the block's metadata to the ReceiverTracker:

/** Store block and report it to driver */
def pushAndReportBlock(
    receivedBlock: ReceivedBlock,
    metadataOption: Option[Any],
    blockIdOption: Option[StreamBlockId]
  ) {
  val blockId = blockIdOption.getOrElse(nextBlockId)
  val time = System.currentTimeMillis
  val blockStoreResult = receivedBlockHandler.storeBlock(blockId, receivedBlock)
  logDebug(s"Pushed block $blockId in ${(System.currentTimeMillis - time)} ms")
  val numRecords = blockStoreResult.numRecords
  val blockInfo = ReceivedBlockInfo(streamId, numRecords, metadataOption, blockStoreResult)
  trackerEndpoint.askWithRetry[Boolean](AddBlock(blockInfo))
  logDebug(s"Reported block $blockId")
}


When the ReceiverTracker receives the metadata, it starts a thread from a thread pool to write the metadata (batched into the write-ahead log when WAL batching is enabled):

case AddBlock(receivedBlockInfo) =>
  if (WriteAheadLogUtils.isBatchingEnabled(ssc.conf, isDriver = true)) {
    walBatchingThreadPool.execute(new Runnable {
      override def run(): Unit = Utils.tryLogNonFatalError {
        if (active) {
          context.reply(addBlock(receivedBlockInfo))
        } else {
          throw new IllegalStateException("ReceiverTracker RpcEndpoint shut down.")
        }
      }
    })
  } else {
    context.reply(addBlock(receivedBlockInfo))
  }
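
Whether the metadata actually goes through the write-ahead-log path above depends on configuration. Below is a minimal sketch of enabling the driver-side WAL; the config key spark.streaming.receiver.writeAheadLog.enable and the requirement to set a checkpoint directory are standard, while the batching behaviour (governed, as far as I know, by spark.streaming.driver.writeAheadLog.allowBatching, enabled by default in recent versions) may vary across Spark releases. The checkpoint path here is a placeholder; in production it should point to a fault-tolerant file system such as HDFS.

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object WalConfigSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setMaster("local[2]")
      .setAppName("wal-config-sketch")
      // Enable the write-ahead log so received data and block metadata are logged.
      .set("spark.streaming.receiver.writeAheadLog.enable", "true")

    val ssc = new StreamingContext(conf, Seconds(2))
    // The WAL files live under the checkpoint directory, so a checkpoint directory must be set.
    ssc.checkpoint("/tmp/streaming-checkpoint")

    // ... define input streams and output operations here, then ssc.start() / ssc.awaitTermination() ...
  }
}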

The metadata of the received data is managed by the ReceivedBlockTracker.

The metadata is ultimately written into streamIdToUnallocatedBlockQueues, a map in which each stream id corresponds to a queue of that stream's not-yet-allocated data blocks.

private type ReceivedBlockQueue = mutable.Queue[ReceivedBlockInfo]

private val streamIdToUnallocatedBlockQueues = new mutable.HashMap[Int, ReceivedBlockQueue]
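
To make the role of this structure concrete, here is a small self-contained sketch (not the Spark source) of how such a per-stream queue map behaves: blocks are appended to the owning stream's queue as AddBlock messages arrive, and the whole queue is drained when a batch is allocated. BlockInfo, addBlock and allocateAll are simplified stand-ins for ReceivedBlockInfo and the corresponding ReceivedBlockTracker methods.

import scala.collection.mutable

object UnallocatedBlockQueuesSketch extends App {
  // Simplified stand-in for ReceivedBlockInfo: just a stream id and a block id.
  case class BlockInfo(streamId: Int, blockId: String)

  type BlockQueue = mutable.Queue[BlockInfo]
  private val streamIdToUnallocatedBlockQueues = new mutable.HashMap[Int, BlockQueue]

  private def queueFor(streamId: Int): BlockQueue =
    streamIdToUnallocatedBlockQueues.getOrElseUpdate(streamId, new BlockQueue)

  // Called (conceptually) for each AddBlock the tracker receives.
  def addBlock(info: BlockInfo): Unit = queueFor(info.streamId) += info

  // Called (conceptually) when a batch is allocated: drain every stream's queue.
  def allocateAll(streamIds: Seq[Int]): Map[Int, List[BlockInfo]] =
    streamIds.map(id => id -> queueFor(id).dequeueAll(_ => true).toList).toMap

  addBlock(BlockInfo(0, "input-0-0"))
  addBlock(BlockInfo(0, "input-0-1"))
  addBlock(BlockInfo(1, "input-1-0"))
  println(allocateAll(Seq(0, 1)))   // each stream's blocks, now assigned to one batch
  println(allocateAll(Seq(0, 1)))   // queues are empty until new blocks arrive
}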


Whenever Spark Streaming triggers a job, the blocks waiting in these queues are allocated to a batch, and the allocation is recorded in the timeToAllocatedBlocks data structure.

private val timeToAllocatedBlocks = new mutable.HashMap[Time, AllocatedBlocks]
...
def allocateBlocksToBatch(batchTime: Time): Unit = synchronized {
  if (lastAllocatedBatchTime == null || batchTime > lastAllocatedBatchTime) {
    val streamIdToBlocks = streamIds.map { streamId =>
      (streamId, getReceivedBlockQueue(streamId).dequeueAll(x => true))
    }.toMap
    val allocatedBlocks = AllocatedBlocks(streamIdToBlocks)
    if (writeToLog(BatchAllocationEvent(batchTime, allocatedBlocks))) {
      timeToAllocatedBlocks.put(batchTime, allocatedBlocks)
      lastAllocatedBatchTime = batchTime
    } else {
      logInfo(s"Possibly processed batch $batchTime need to be processed again in WAL recovery")
    }
  } else {
    // This situation occurs when:
    // 1. WAL is ended with BatchAllocationEvent, but without BatchCleanupEvent,
    // possibly processed batch job or half-processed batch job need to be processed again,
    // so the batchTime will be equal to lastAllocatedBatchTime.
    // 2. Slow checkpointing makes recovered batch time older than WAL recovered
    // lastAllocatedBatchTime.
    // This situation will only occurs in recovery time.
    logInfo(s"Possibly processed batch $batchTime need to be processed again in WAL recovery")
  }
}

From this we can see that a single batch can contain data from multiple streams.
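
To illustrate that point, here is a toy model (again, not the Spark source) of the AllocatedBlocks idea: one batch time maps to the blocks of every input stream in that batch, and each input DStream later asks for its own stream's blocks to build the RDD for that batch. BlockInfo and getBlocksOfStream are simplified stand-ins for the real types.

import scala.collection.mutable

object AllocatedBlocksSketch extends App {
  case class BlockInfo(streamId: Int, blockId: String)
  case class AllocatedBlocks(streamIdToAllocatedBlocks: Map[Int, Seq[BlockInfo]]) {
    def getBlocksOfStream(streamId: Int): Seq[BlockInfo] =
      streamIdToAllocatedBlocks.getOrElse(streamId, Seq.empty)
  }

  val timeToAllocatedBlocks = new mutable.HashMap[Long, AllocatedBlocks]

  // One batch (at time 1000) holding blocks from two different input streams.
  timeToAllocatedBlocks(1000L) = AllocatedBlocks(Map(
    0 -> Seq(BlockInfo(0, "input-0-0"), BlockInfo(0, "input-0-1")),
    1 -> Seq(BlockInfo(1, "input-1-0"))
  ))

  // When the job for batch 1000 is generated, each receiver input stream asks
  // for its own stream's blocks and builds an RDD (e.g. a BlockRDD) from them.
  println(timeToAllocatedBlocks(1000L).getBlocksOfStream(0).map(_.blockId))
}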


Every time a streaming job finishes running:

private def handleJobCompletion(job: Job, completedTime: Long) {
  val jobSet = jobSets.get(job.time)
  jobSet.handleJobCompletion(job)
  job.setEndTime(completedTime)
  listenerBus.post(StreamingListenerOutputOperationCompleted(job.toOutputOperationInfo))
  logInfo("Finished job " + job.id + " from job set of time " + jobSet.time)
  if (jobSet.hasCompleted) {
    jobSets.remove(jobSet.time)
    jobGenerator.onBatchCompletion(jobSet.time)
    logInfo("Total delay: %.3f s for time %s (execution: %.3f s)".format(
      jobSet.totalDelay / 1000.0, jobSet.time.toString,
      jobSet.processingDelay / 1000.0
    ))
    listenerBus.post(StreamingListenerBatchCompleted(jobSet.toBatchInfo))
  }
  ...
}

The JobScheduler invokes the handleJobCompletion method, which eventually triggers:

jobScheduler.receiverTracker.cleanupOldBlocksAndBatches(time - maxRememberDuration)


The maxRememberDuration here is the maximum amount of time that the RDDs generated by a DStream at each point in time are retained.
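
This retention window is not something you normally compute by hand; a user can lengthen it through the public StreamingContext.remember API, which, as far as I can tell, feeds into the maxRememberDuration used in the cleanup call above. A minimal sketch:

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Minutes, Seconds, StreamingContext}

object RememberDurationSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("remember-sketch")
    val ssc = new StreamingContext(conf, Seconds(10))

    // Keep the generated RDDs (and hence their block metadata) around for at
    // least 5 minutes, instead of the minimum the DStream graph requires.
    // Older batches become eligible for the cleanup path shown above.
    ssc.remember(Minutes(5))

    // ... define streams, then ssc.start() / ssc.awaitTermination() ...
  }
}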

def cleanupOldBatches(cleanupThreshTime: Time, waitForCompletion: Boolean): Unit = synchronized {
  require(cleanupThreshTime.milliseconds < clock.getTimeMillis())
  val timesToCleanup = timeToAllocatedBlocks.keys.filter { _ < cleanupThreshTime }.toSeq
  logInfo("Deleting batches " + timesToCleanup)
  if (writeToLog(BatchCleanupEvent(timesToCleanup))) {
    timeToAllocatedBlocks --= timesToCleanup
    writeAheadLogOption.foreach(_.clean(cleanupThreshTime.milliseconds, waitForCompletion))
  } else {
    logWarning("Failed to acknowledge batch clean up in the Write Ahead Log.")
  }
}

And finally:

listenerBus.post(StreamingListenerBatchCompleted(jobSet.toBatchInfo))

This call eventually reaches the following code:

case batchCompleted: StreamingListenerBatchCompleted =>
  listener.onBatchCompleted(batchCompleted)

... all the way down ...

/**
 * A RateController that sends the new rate to receivers via the receiver tracker.
 */
private[streaming] class ReceiverRateController(id: Int, estimator: RateEstimator)
    extends RateController(id, estimator) {
  override def publish(rate: Long): Unit =
    ssc.scheduler.receiverTracker.sendRateUpdate(id, rate)
}

/** Update a receiver's maximum ingestion rate */
def sendRateUpdate(streamUID: Int, newRate: Long): Unit = synchronized {
  if (isTrackerStarted) {
    endpoint.send(UpdateReceiverRateLimit(streamUID, newRate))
  }
}

case UpdateReceiverRateLimit(streamUID, newRate) =>
  for (info <- receiverTrackingInfos.get(streamUID); eP <- info.endpoint) {
    eP.send(UpdateRateLimit(newRate))
  }

The rate-control message is finally sent to the receiver, and the receiver adjusts the rate at which it receives data through its BlockGenerator.

case UpdateRateLimit(eps) =>
  logInfo(s"Received a new rate limit: $eps.")
  registeredBlockGenerators.foreach { bg =>
    bg.updateRate(eps)
  }
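
Under the hood, Spark's BlockGenerator extends a RateLimiter that wraps Guava's RateLimiter. The following is a simplified stand-in of my own (not the Spark source, though modeled on it) showing what such an updateRate amounts to: change the permits-per-second bound, capped by a maximum rate in the spirit of spark.streaming.receiver.maxRate. It assumes Guava is on the classpath; SimpleRateLimiter and its method bodies are a sketch, not the library API.

import com.google.common.util.concurrent.{RateLimiter => GuavaRateLimiter}

// waitToPush() blocks so that at most `rate` records per second are accepted,
// and updateRate() changes that bound when the driver pushes a new limit.
class SimpleRateLimiter(initialRate: Double, maxRateLimit: Long) {
  private val limiter = GuavaRateLimiter.create(initialRate)

  def waitToPush(): Unit = limiter.acquire()   // blocks until a permit is available

  def updateRate(newRate: Long): Unit = if (newRate > 0) {
    val effective = if (maxRateLimit > 0) math.min(newRate, maxRateLimit) else newRate
    limiter.setRate(effective.toDouble)
  }
}

object RateLimitDemo extends App {
  val rl = new SimpleRateLimiter(initialRate = 1000.0, maxRateLimit = 500)
  rl.updateRate(2000)      // capped to 500 records/sec by maxRateLimit
  (1 to 5).foreach { i => rl.waitToPush(); println(s"pushed record $i") }
}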




This article is from the "Ding Dong" blog; please be sure to keep the source: http://lqding.blog.51cto.com/9123978/1774994

