Lesson 13: Spark Streaming source code interpretation of Driver fault tolerance


Contents of this issue:

    1. ReceivedBlockTracker fault tolerance

    2. DStream and JobGenerator fault tolerance


The Driver has two levels of fault tolerance: 1. the metadata of the data received by the Receivers; 2. the state of the components the Driver manages (the scheduling and driving layer).
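Both levels are switched on from the application side: checkpointing covers the driver-side state, and the receiver write-ahead log covers the received data and its metadata. A minimal sketch, assuming a hypothetical app name and checkpoint directory (not from the original post):

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

val conf = new SparkConf()
  .setAppName("DriverFaultToleranceDemo")                        // hypothetical app name
  .set("spark.streaming.receiver.writeAheadLog.enable", "true")  // WAL for received data/metadata

val ssc = new StreamingContext(conf, Seconds(5))
// checkpointing persists the driver-side state (DStream graph, pending batches, ...)
ssc.checkpoint("hdfs:///tmp/streaming-checkpoint")               // hypothetical directory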


Receiver metadata is protected by the WAL (write-ahead log) mechanism:

case AddBlock(receivedBlockInfo) =>
  if (WriteAheadLogUtils.isBatchingEnabled(ssc.conf, isDriver = true)) {
    walBatchingThreadPool.execute(new Runnable {
      override def run(): Unit = Utils.tryLogNonFatalError {
        if (active) {
          context.reply(addBlock(receivedBlockInfo))
        } else {
          throw new IllegalStateException("ReceiverTracker RpcEndpoint shut down.")
        }
      }
    })
  } else {
    context.reply(addBlock(receivedBlockInfo))
  }

...

/** Add new blocks for the given stream */
private def addBlock(receivedBlockInfo: ReceivedBlockInfo): Boolean = {
  receivedBlockTracker.addBlock(receivedBlockInfo)
}


The metadata is actually managed by ReceivedBlockTracker.

def addBlock(receivedBlockInfo: ReceivedBlockInfo): Boolean = {
  try {
    val writeResult = writeToLog(BlockAdditionEvent(receivedBlockInfo))
    if (writeResult) {
      synchronized {
        getReceivedBlockQueue(receivedBlockInfo.streamId) += receivedBlockInfo
      }
      logDebug(s"Stream ${receivedBlockInfo.streamId} received " +
        s"block ${receivedBlockInfo.blockStoreResult.blockId}")
    } else {
      logDebug(s"Failed to acknowledge stream ${receivedBlockInfo.streamId} receiving " +
        s"block ${receivedBlockInfo.blockStoreResult.blockId} in the Write Ahead Log.")
    }
    writeResult
  } catch {
    case NonFatal(e) =>
      logError(s"Error adding block $receivedBlockInfo", e)
      false
  }
}

The addBlock method first calls writeToLog:

/** Write an update to the tracker to the write ahead log */
private def writeToLog(record: ReceivedBlockTrackerLogEvent): Boolean = {
  if (isWriteAheadLogEnabled) {
    logTrace(s"Writing record: $record")
    try {
      writeAheadLogOption.get.write(ByteBuffer.wrap(Utils.serialize(record)),
        clock.getTimeMillis())
      true
    } catch {
      case NonFatal(e) =>
        logWarning(s"Exception thrown while writing record: $record to the WriteAheadLog.", e)
        false
    }
  } else {
    true
  }
}


The block metadata is then appended to the streamIdToUnallocatedBlockQueues queue.
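For reference, this per-stream queue and the batch allocation map are declared inside ReceivedBlockTracker roughly as follows (a simplified sketch of the relevant fields, not the complete class):

// inside ReceivedBlockTracker (simplified sketch)
private type ReceivedBlockQueue = mutable.Queue[ReceivedBlockInfo]

// one queue of not-yet-allocated block metadata per input stream id
private val streamIdToUnallocatedBlockQueues = new mutable.HashMap[Int, ReceivedBlockQueue]

// batch time -> blocks that have already been allocated to that batch
private val timeToAllocatedBlocks = new mutable.HashMap[Time, AllocatedBlocks]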


Every batchInterval, a streaming job is triggered, and the blocks waiting in streamIdToUnallocatedBlockQueues are allocated to a specific batch time.

def allocateBlocksToBatch(batchTime: Time): Unit = synchronized {
  if (lastAllocatedBatchTime == null || batchTime > lastAllocatedBatchTime) {
    val streamIdToBlocks = streamIds.map { streamId =>
      (streamId, getReceivedBlockQueue(streamId).dequeueAll(x => true))
    }.toMap
    val allocatedBlocks = AllocatedBlocks(streamIdToBlocks)
    if (writeToLog(BatchAllocationEvent(batchTime, allocatedBlocks))) {
      timeToAllocatedBlocks.put(batchTime, allocatedBlocks)
      lastAllocatedBatchTime = batchTime
    } else {
      logInfo(s"Possibly processed batch $batchTime need to be processed again in WAL recovery")
    }
  } else {
    // This situation occurs when:
    // 1. WAL is ended with BatchAllocationEvent, but without BatchCleanupEvent,
    // possibly processed batch job or half-processed batch job need to be processed again,
    // so the batchTime will be equal to lastAllocatedBatchTime.
    // 2. Slow checkpointing makes recovered batch time older than WAL recovered
    // lastAllocatedBatchTime.
    // This situation will only occurs in recovery time.
    logInfo(s"Possibly processed batch $batchTime need to be processed again in WAL recovery")
  }
}

The WAL is written during this allocation as well (the BatchAllocationEvent in the code above).


The JobGenerator is triggered every batchInterval to generate jobs:

/** Generate jobs and perform checkpoint for the given `time`. */
private def generateJobs(time: Time) {
  // Set the SparkEnv in this thread, so that job generation code can access the environment
  // Example: BlockRDDs are created in this thread, and it needs to access BlockManager
  // Update: This is probably redundant after threadlocal stuff in SparkEnv has been removed.
  SparkEnv.set(ssc.env)
  Try {
    jobScheduler.receiverTracker.allocateBlocksToBatch(time) // allocate received blocks to batch
    graph.generateJobs(time) // generate jobs using allocated block
  } match {
    case Success(jobs) =>
      val streamIdToInputInfos = jobScheduler.inputInfoTracker.getInfo(time)
      jobScheduler.submitJobSet(JobSet(time, jobs, streamIdToInputInfos))
    case Failure(e) =>
      jobScheduler.reportError("Error generating jobs for time " + time, e)
  }
  eventLoop.post(DoCheckpoint(time, clearCheckpointDataLater = false))
}

Finally, a DoCheckpoint message is posted to the event loop queue.

When the JobGenerator receives the message, it processes it:

/** Processes all events */
private def processEvent(event: JobGeneratorEvent) {
  logDebug("Got event " + event)
  event match {
    case GenerateJobs(time) => generateJobs(time)
    case ClearMetadata(time) => clearMetadata(time)
    case DoCheckpoint(time, clearCheckpointDataLater) =>
      doCheckpoint(time, clearCheckpointDataLater)
    case ClearCheckpointData(time) => clearCheckpointData(time)
  }
}

/** Perform checkpoint for the given `time`. */
private def doCheckpoint(time: Time, clearCheckpointDataLater: Boolean) {
  if (shouldCheckpoint && (time - graph.zeroTime).isMultipleOf(ssc.checkpointDuration)) {
    logInfo("Checkpointing graph for time " + time)
    ssc.graph.updateCheckpointData(time)
    checkpointWriter.write(new Checkpoint(ssc, time), clearCheckpointDataLater)
  }
}

A Checkpoint object is created from the StreamingContext (ssc) and the batch time. The StreamingContext holds all of the driver-side information, so when the driver crashes it can be rebuilt from the checkpoint data.
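On the application side, this recovery path is only taken when the StreamingContext is (re)created from the checkpoint, typically via StreamingContext.getOrCreate. A minimal sketch, assuming a hypothetical checkpoint directory and an elided DStream graph:

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

val checkpointDir = "hdfs:///tmp/streaming-checkpoint"   // hypothetical directory

def createContext(): StreamingContext = {
  val conf = new SparkConf().setAppName("DriverFaultToleranceDemo")
  val ssc = new StreamingContext(conf, Seconds(5))
  ssc.checkpoint(checkpointDir)
  // define the DStream graph here ...
  ssc
}

// After a driver crash, the context, its DStream graph and the pending/missed
// batches are rebuilt from the checkpoint instead of calling createContext() again.
val ssc = StreamingContext.getOrCreate(checkpointDir, createContext _)
ssc.start()
ssc.awaitTermination()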

Inside the JobGenerator, the recovery code is as follows:

/** Restarts the generator based on the information in checkpoint */
private def restart() {
  // If manual clock is being used for testing, then
  // either set the manual clock to the last checkpointed time,
  // or if the property is defined set it to that time
  if (clock.isInstanceOf[ManualClock]) {
    val lastTime = ssc.initialCheckpoint.checkpointTime.milliseconds
    val jumpTime = ssc.sc.conf.getLong("spark.streaming.manualClock.jump", 0)
    clock.asInstanceOf[ManualClock].setTime(lastTime + jumpTime)
  }

  val batchDuration = ssc.graph.batchDuration

  // Batches when the master was down, that is,
  // between the checkpoint and current restart time
  val checkpointTime = ssc.initialCheckpoint.checkpointTime
  val restartTime = new Time(timer.getRestartTime(graph.zeroTime.milliseconds))
  val downTimes = checkpointTime.until(restartTime, batchDuration)
  logInfo("Batches during down time (" + downTimes.size + " batches): "
    + downTimes.mkString(", "))

  // Batches that were unprocessed before failure
  val pendingTimes = ssc.initialCheckpoint.pendingTimes.sorted(Time.ordering)
  logInfo("Batches pending processing (" + pendingTimes.size + " batches): " +
    pendingTimes.mkString(", "))
  // Reschedule jobs for these times
  val timesToReschedule = (pendingTimes ++ downTimes).filter { _ < restartTime }
    .distinct.sorted(Time.ordering)
  logInfo("Batches to reschedule (" + timesToReschedule.size + " batches): " +
    timesToReschedule.mkString(", "))
  timesToReschedule.foreach { time =>
    // Allocate the related blocks when recovering from failure, because some blocks that were
    // added but not allocated, are dangling in the queue after recovering, we have to allocate
    // those blocks to the next batch, which is the batch they were supposed to go.
    jobScheduler.receiverTracker.allocateBlocksToBatch(time) // allocate received blocks to batch
    jobScheduler.submitJobSet(JobSet(time, graph.generateJobs(time)))
  }

  // Restart the timer
  timer.start(restartTime.milliseconds)
  logInfo("Restarted JobGenerator at " + restartTime)
}



Note:

1. DT Big Data Dream Factory WeChat public account: DT_Spark
2. IMF 8 pm big data hands-on YY live channel: 68917580
3. Sina Weibo: http://www.weibo.com/ilovepains


This article is from the "Ding Dong" blog; please be sure to keep this source: http://lqding.blog.51cto.com/9123978/1775946

