Lesson 13: Spark Streaming Source Code Interpretation of Driver Fault Tolerance


The objectives of this blog post are as follows:
1. ReceivedBlockTracker fault tolerance
2. DStream and JobGenerator fault tolerance

The article is organized as follows:
1. Considering driver fault tolerance, what do we have to think about?
2. A detailed analysis of ReceivedBlockTracker, DStream, and JobGenerator fault tolerance

One: Fault tolerance
1. ReceivedBlockTracker is responsible for managing the metadata of a running Spark Streaming program. It is the data plane.
2. DStream and JobGenerator form the core of job scheduling, that is, they determine exactly what gets scheduled, viewed from the operational side. DStream is the logical plane.
3. JobGenerator is the job-scheduling plane: viewed from the runtime, it tracks how far the data has been consumed.

When talking about driver fault tolerance, you have to consider the state that the driver must maintain while running:
1. ReceivedBlockTracker tracks the received data and therefore requires fault tolerance, which it achieves through the WAL (write-ahead log).
2. DStreamGraph expresses the dependencies among DStreams; when state is restored, the logical-level dependencies must be recovered from the checkpointed DStreams. It achieves fault tolerance through checkpointing.
3. JobGenerator records how jobs are created from the data in ReceivedBlockTracker and from the dependencies the DStreams form, and how far that data has been consumed.

A configuration sketch that enables all three mechanisms follows this list.
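To make these three recovery paths concrete, below is a minimal configuration sketch. The configuration key and APIs are standard Spark Streaming, but the program itself is illustrative only; the checkpoint path and socket source are hypothetical.

import org.apache.spark.SparkConf
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.{Seconds, StreamingContext}

val conf = new SparkConf()
  .setMaster("local[2]") // at least 2 threads: one for the receiver, one for processing
  .setAppName("DriverFaultToleranceDemo")
  // WAL for receiver block data and metadata (ReceivedBlockTracker fault tolerance)
  .set("spark.streaming.receiver.writeAheadLog.enable", "true")

val ssc = new StreamingContext(conf, Seconds(10))
// Checkpoint directory for DStreamGraph and JobGenerator state
// (hypothetical path; use a reliable file system such as HDFS in production)
ssc.checkpoint("/tmp/checkpoint")

// A disk-backed storage level is the usual companion of the WAL
val lines = ssc.socketTextStream("localhost", 9999, StorageLevel.MEMORY_AND_DISK_SER)
lines.count().print()
ssc.start()
ssc.awaitTermination()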

Summarized as follows:

ReceivedBlockTracker:
1. ReceivedBlockTracker manages all block data during a Spark Streaming run and assigns that data to the batches that need it. All of its actions are written to the log via the WAL, so after a driver failure the tracker's state can be restored from the history. When the ReceivedBlockTracker is created, it uses the checkpoint to obtain the history directory.

/**
 * This class manages the execution of the receivers of ReceiverInputDStreams. Instance of
 * this class must be created after all input streams have been added and StreamingContext.start()
 * has been called because it needs the final set of input streams at the time of instantiation.
 *
 * @param skipReceiverLaunch Do not launch the receiver. This is useful for testing.
 */
private[streaming]
class ReceiverTracker(ssc: StreamingContext, skipReceiverLaunch: Boolean = false) extends Logging {

Below is what happens after a receiver has received data.
2. The ReceivedBlockTracker.addBlock source code is as follows.
When a receiver receives data, it reports the metadata; the report goes up through ReceiverSupervisorImpl, and fault tolerance is provided directly by the WAL.
When ReceiverSupervisorImpl, the receiver's manager, reports the metadata to the driver, the processing is handed to ReceivedBlockTracker. ReceivedBlockTracker writes the data into the WAL file before writing it into memory, where it is used by the scheduler of the current Spark Streaming program, namely JobGenerator. JobGenerator cannot use the WAL directly, since the WAL's data is on disk; instead it uses the in-memory cached data structure.

/** Add received block. This event will get written to the write ahead log (if enabled). */
def addBlock(receivedBlockInfo: ReceivedBlockInfo): Boolean = {
  try {
    // Write to the WAL first
    val writeResult = writeToLog(BlockAdditionEvent(receivedBlockInfo))
    if (writeResult) {
      synchronized {
        // Only when the metadata has been successfully written to the WAL
        // is the ReceivedBlockInfo put into the queue
        getReceivedBlockQueue(receivedBlockInfo.streamId) += receivedBlockInfo
      }
      logDebug(s"Stream ${receivedBlockInfo.streamId} received " +
        s"block ${receivedBlockInfo.blockStoreResult.blockId}")
    } else {
      logDebug(s"Failed to acknowledge stream ${receivedBlockInfo.streamId} receiving " +
        s"block ${receivedBlockInfo.blockStoreResult.blockId} in the Write Ahead Log.")
    }
    writeResult
  } catch {
    case NonFatal(e) =>
      logError(s"Error adding block $receivedBlockInfo", e)
      false
  }
}

At this point the relevant data structure is streamIdToUnallocatedBlockQueues: the metadata received on the driver side is stored in streamIdToUnallocatedBlockQueues.

private val streamIdToUnallocatedBlockQueues = new mutable.HashMap[Int, ReceivedBlockQueue]
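As a side note, here is a simplified model of this structure, not the actual Spark code: each receiver stream id maps to its own queue of block metadata, and a queue is created lazily on first use, which is what the getReceivedBlockQueue helper does via getOrElseUpdate. BlockInfo is a stand-in name for illustration.

import scala.collection.mutable

object UnallocatedQueuesModel {
  // Simplified stand-in for Spark's ReceivedBlockInfo, for illustration only
  case class BlockInfo(streamId: Int, blockId: String)
  type ReceivedBlockQueue = mutable.Queue[BlockInfo]

  val streamIdToUnallocatedBlockQueues = new mutable.HashMap[Int, ReceivedBlockQueue]

  // Create a stream's queue on first access, then reuse it
  def getReceivedBlockQueue(streamId: Int): ReceivedBlockQueue =
    streamIdToUnallocatedBlockQueues.getOrElseUpdate(streamId, new ReceivedBlockQueue)

  def main(args: Array[String]): Unit = {
    getReceivedBlockQueue(0) += BlockInfo(0, "input-0-0001")
    println(streamIdToUnallocatedBlockQueues) // Map(0 -> Queue(BlockInfo(0,input-0-0001)))
  }
}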
3. allocateBlocksToBatch assigns all currently unallocated blocks to a given batch.

So what exactly is batchTime?
batchTime is the time at which, after the data for the previous job has been allocated, the system begins collecting data again for the next batch.

/**
 * Allocate all unallocated blocks to the given batch.
 * This event will get written to the write ahead log (if enabled).
 */
def allocateBlocksToBatch(batchTime: Time): Unit = synchronized {
  if (lastAllocatedBatchTime == null || batchTime > lastAllocatedBatchTime) {
    // streamIdToBlocks obtains all of the unallocated data
    val streamIdToBlocks = streamIds.map { streamId =>
      // getReceivedBlockQueue holds the data received for this streamId;
      // to assign it to the batch, simply dequeue everything from the queue
      (streamId, getReceivedBlockQueue(streamId).dequeueAll(x => true))
    }.toMap
    val allocatedBlocks = AllocatedBlocks(streamIdToBlocks)
    // The allocation is written to the WAL before it takes effect in memory,
    // so if the driver fails, recovery can restore which blocks were
    // allocated to which batchTime.
    if (writeToLog(BatchAllocationEvent(batchTime, allocatedBlocks))) {
      // JobGenerator gets its data from timeToAllocatedBlocks:
      // for this batchTime it knows which allocatedBlocks to process
      timeToAllocatedBlocks.put(batchTime, allocatedBlocks)
      lastAllocatedBatchTime = batchTime
    } else {
      logInfo(s"Possibly processed batch $batchTime needs to be processed again in WAL recovery")
    }
  } else {
    // This situation occurs when:
    // 1. WAL is ended with BatchAllocationEvent, but without BatchCleanupEvent,
    //    possibly processed batch job or half-processed batch job need to be processed again,
    //    so the batchTime is equal to lastAllocatedBatchTime.
    // 2. Slow checkpointing makes recovered batch time older than WAL recovered
    //    lastAllocatedBatchTime.
    // This situation will only occur in recovery time.
    logInfo(s"Possibly processed batch $batchTime needs to be processed again in WAL recovery")
  }
}
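The key call above is dequeueAll(x => true): it removes every element from the queue and returns them, so once blocks are allocated to a batch the unallocated queue is empty and no block can be assigned twice. A standalone illustration:

import scala.collection.mutable

val queue = mutable.Queue("blk-1", "blk-2", "blk-3")
val batch = queue.dequeueAll(_ => true) // returns Seq(blk-1, blk-2, blk-3)
assert(queue.isEmpty)                   // the blocks now belong to exactly one batch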
4. timeToAllocatedBlocks can hold the blocks of many time windows, that is, the blocks of many batch durations. It maintains the data allocated across many batch durations. Suppose one batch duration is 10 seconds, i.e. a job is generated every 10s; if you then want to compute over past data, you only need to aggregate by time (a sketch of such an aggregation follows the declaration below).
private val timeToAllocatedBlocks = new mutable.HashMap[Time, AllocatedBlocks]
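Following that idea, here is a sketch of aggregating the blocks of the last few batch durations, using plain Long milliseconds in place of Spark's Time class; all names here are illustrative, not Spark's API:

import scala.collection.mutable

val batchDurationMs = 10000L // a 10-second batch duration
val timeToAllocatedBlocks = mutable.HashMap[Long, Seq[String]](
  10000L -> Seq("blk-a"),
  20000L -> Seq("blk-b"),
  30000L -> Seq("blk-c"))

// Collect the blocks of the three most recent batches, ending at t = 30000
val windowBlocks = (0 until 3)
  .map(i => 30000L - i * batchDurationMs)
  .flatMap(t => timeToAllocatedBlocks.getOrElse(t, Seq.empty))
// windowBlocks == Vector(blk-c, blk-b, blk-a)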
5. Get the block information for a given streamId:
/** Class representing the blocks of all the streams allocated to a batch */
private[streaming]
case class AllocatedBlocks(streamIdToAllocatedBlocks: Map[Int, Seq[ReceivedBlockInfo]]) {
  def getBlocksOfStream(streamId: Int): Seq[ReceivedBlockInfo] = {
    streamIdToAllocatedBlocks.getOrElse(streamId, Seq.empty)
  }
}
6. cleanupOldBatches: as time passes, RDDs are continuously generated and data is continuously processed, so it is impossible to keep all the historical data forever.
/**
 * Clean up block information of old batches. If waitForCompletion is true, this method
 * returns only after the files are cleaned up.
 */
def cleanupOldBatches(cleanupThreshTime: Time, waitForCompletion: Boolean): Unit = synchronized {
  require(cleanupThreshTime.milliseconds < clock.getTimeMillis())
  val timesToCleanup = timeToAllocatedBlocks.keys.filter { _ < cleanupThreshTime }.toSeq
  logInfo("Deleting batches " + timesToCleanup)
  // The cleanup event is written to the WAL first
  if (writeToLog(BatchCleanupEvent(timesToCleanup))) {
    timeToAllocatedBlocks --= timesToCleanup
    writeAheadLogOption.foreach(_.clean(cleanupThreshTime.milliseconds, waitForCompletion))
  } else {
    logWarning("Failed to acknowledge batch clean up in the Write Ahead Log.")
  }
}
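The in-memory side of the cleanup is just a filter over the map keys followed by a bulk removal with --=; in miniature:

import scala.collection.mutable

val timeToBlocks = mutable.HashMap(10000L -> "a", 20000L -> "b", 30000L -> "c")
val timesToCleanup = timeToBlocks.keys.filter(_ < 25000L).toSeq
timeToBlocks --= timesToCleanup // only the 30000L entry remains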
7. The writeToLog source code is as follows:
/** Write an update to the tracker to the write ahead log */
private def writeToLog(record: ReceivedBlockTrackerLogEvent): Boolean = {
  if (isWriteAheadLogEnabled) {
    logTrace(s"Writing record: $record")
    try {
      writeAheadLogOption.get.write(ByteBuffer.wrap(Utils.serialize(record)),
        clock.getTimeMillis())
      true
    } catch {
      case NonFatal(e) =>
        logWarning(s"Exception thrown while writing record: $record to the WriteAheadLog.", e)
        false
    }
  } else {
    true
  }
}

Summary:
The WAL's management of data covers its generation (addBlock), its allocation to batches (allocateBlocksToBatch), and its destruction after consumption (cleanupOldBatches). Each of the operations above is written to the WAL file before the in-memory state is changed.
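That same write-ahead pattern can be shown in miniature, independent of Spark's WriteAheadLog API. This sketch (with a hypothetical log path) only illustrates the invariant that the durable log is written before the in-memory state changes:

import java.io.{File, FileWriter}
import scala.collection.mutable
import scala.util.control.NonFatal

// Append one record to the log file; report success or failure
def writeToLog(log: File, record: String): Boolean =
  try {
    val w = new FileWriter(log, true) // append mode
    try { w.write(record + "\n"); true } finally w.close()
  } catch { case NonFatal(_) => false }

val inMemory = mutable.Buffer[String]()
val record = "BlockAdditionEvent(input-0-0001)"
if (writeToLog(new File("/tmp/tracker.wal"), record)) // hypothetical path
  inMemory += record // mutate state only after the log write succeeded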

JobGenerator:
Checkpoints are written at an interval of one batch duration, and a checkpoint is performed both before and after a batch executes.
doCheckpoint is the method invoked in that pre- and post-processing:
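As context, the checkpoint cadence is tied to the batch duration: ssc.checkpoint(dir) sets the directory, and in the Spark version this post discusses, checkpointDuration defaults to the batch duration, while the RDD checkpoint interval of an individual stream can be widened via DStream.checkpoint. A minimal sketch, with hypothetical paths and sources:

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

val conf = new SparkConf().setMaster("local[2]").setAppName("CheckpointCadence")
val ssc = new StreamingContext(conf, Seconds(10)) // 10-second batches
ssc.checkpoint("/tmp/checkpoint") // checkpointDuration defaults to the batch duration

val lines = ssc.socketTextStream("localhost", 9999)
// A windowed computation requires checkpointing; widen this stream's
// RDD checkpoint interval to every other batch:
val counts = lines.countByWindow(Seconds(30), Seconds(10))
counts.checkpoint(Seconds(20))
counts.print()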

1. generateJobs:

/** Generate jobs and perform checkpoint for the given `time`. */
private def generateJobs(time: Time) {
  // Set the SparkEnv in this thread, so that job generation code can access the environment
  // Example: BlockRDDs are created in this thread, and it needs to access BlockManager
  // Update: This is probably redundant after threadlocal stuff in SparkEnv has been removed.
  SparkEnv.set(ssc.env)
  Try {
    jobScheduler.receiverTracker.allocateBlocksToBatch(time) // allocate received blocks to batch
    graph.generateJobs(time) // generate jobs using allocated block
  } match {
    case Success(jobs) =>
      val streamIdToInputInfos = jobScheduler.inputInfoTracker.getInfo(time)
      jobScheduler.submitJobSet(JobSet(time, jobs, streamIdToInputInfos))
    case Failure(e) =>
      jobScheduler.reportError("Error generating jobs for time " + time, e)
  }
  // After the jobs above have been submitted, a checkpoint is performed
  eventLoop.post(DoCheckpoint(time, clearCheckpointDataLater = false))
}
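Note the Try { ... } match shape above: both the block allocation and the job generation run inside a single Try, so an exception in either step surfaces as one Failure. The same pattern in isolation, with stand-in strings for the two real calls:

import scala.util.{Failure, Success, Try}

Try {
  val allocated = "allocateBlocksToBatch" // stand-ins for the two real calls
  s"generateJobs using $allocated"
} match {
  case Success(jobs) => println(s"submit: $jobs")
  case Failure(e)    => println(s"report error: ${e.getMessage}")
}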
2. processEvent receives the message:
/** Processes all events */
private def processEvent(event: JobGeneratorEvent) {
  logDebug("Got event " + event)
  event match {
    case GenerateJobs(time) => generateJobs(time)
    case ClearMetadata(time) => clearMetadata(time)
    case DoCheckpoint(time, clearCheckpointDataLater) =>
      // doCheckpoint is called here
      doCheckpoint(time, clearCheckpointDataLater)
    case ClearCheckpointData(time) => clearCheckpointData(time)
  }
}
3. Checkpoint the current state:
/** Perform checkpoint for the given `time`. */
private def doCheckpoint(time: Time, clearCheckpointDataLater: Boolean) {
  if (shouldCheckpoint && (time - graph.zeroTime).isMultipleOf(ssc.checkpointDuration)) {
    logInfo("Checkpointing graph for time " + time)
    ssc.graph.updateCheckpointData(time)
    checkpointWriter.write(new Checkpoint(ssc, time), clearCheckpointDataLater)
  }
}
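A worked example of the guard above, with plain milliseconds standing in for Spark's Time and Duration classes:

val zeroTimeMs = 0L
val checkpointDurationMs = 10000L // a 10-second checkpoint interval

def onCheckpointBoundary(timeMs: Long): Boolean =
  (timeMs - zeroTimeMs) % checkpointDurationMs == 0

assert(onCheckpointBoundary(20000L))  // boundary: the graph is checkpointed
assert(!onCheckpointBoundary(25000L)) // mid-interval: the write is skipped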
4. The updateCheckpointData source in DStream is as follows; it ultimately causes the RDDs to be checkpointed:
/**
 * Refresh the list of checkpointed RDDs that will be saved along with checkpoint of
 * this stream. This is an internal method that should not be called directly. This is
 * a default implementation that saves only the file names of the checkpointed RDDs to
 * checkpointData. Subclasses of DStream (especially those of InputDStream) may override
 * this method to save custom checkpoint data.
 */
private[streaming] def updateCheckpointData(currentTime: Time) {
  logDebug("Updating checkpoint data for time " + currentTime)
  checkpointData.update(currentTime)
  dependencies.foreach(_.updateCheckpointData(currentTime))
  logDebug("Updated checkpoint data for time " + currentTime + ": " + checkpointData)
}
5. shouldCheckpoint is a state flag:
// This is marked lazy so that this is initialized after checkpoint duration has been set
// in the context and the generator has been started.
private lazy val shouldCheckpoint = ssc.checkpointDuration != null && ssc.checkpointDir != null
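On the recovery side, the standard driver restart pattern is StreamingContext.getOrCreate: if a checkpoint exists in the directory, the context (and with it the DStreamGraph and JobGenerator state) is rebuilt from it, and pending or half-processed batches are recomputed with the help of the WAL. A sketch, where createContext is the same factory function used on first start (a hypothetical name, as in the earlier configuration sketch):

import org.apache.spark.streaming.StreamingContext

// Rebuild from the checkpoint if present, otherwise create a fresh context
val ssc = StreamingContext.getOrCreate("/tmp/checkpoint", createContext _)
ssc.start()
ssc.awaitTermination()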

JobGenerator fault tolerance is illustrated by a figure in the original post (image not reproduced here).

