12th Lesson: Spark Streaming source interpretation of executor fault-tolerance


Data received by a Receiver is managed by ReceiverSupervisorImpl.

After ReceiverSupervisorImpl receives the data, it stores the data and reports the metadata to ReceiverTracker.

There are three ways to achieve executor fault tolerance for the received data:

    1. WAL (write-ahead log)

    2. Data replication

    3. Replaying the data stream received by the receiver
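As a minimal sketch of the first approach, the receiver WAL is switched on per application through a configuration key, and requires a checkpoint directory (the application name and the HDFS path below are placeholders):

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

// Enable the receiver write-ahead log. Without a checkpoint directory,
// receivedBlockHandler construction throws a SparkException (see below).
val conf = new SparkConf()
  .setAppName("WALDemo")
  .set("spark.streaming.receiver.writeAheadLog.enable", "true")

val ssc = new StreamingContext(conf, Seconds(5))
ssc.checkpoint("hdfs:///tmp/streaming-checkpoint") // placeholder path
```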

/** Store block and report it to driver */
def pushAndReportBlock(
    receivedBlock: ReceivedBlock,
    metadataOption: Option[Any],
    blockIdOption: Option[StreamBlockId]
  ) {
  val blockId = blockIdOption.getOrElse(nextBlockId)
  val time = System.currentTimeMillis
  val blockStoreResult = receivedBlockHandler.storeBlock(blockId, receivedBlock)
  logDebug(s"Pushed block $blockId in ${(System.currentTimeMillis - time)} ms")
  val numRecords = blockStoreResult.numRecords
  val blockInfo = ReceivedBlockInfo(streamId, numRecords, metadataOption, blockStoreResult)
  trackerEndpoint.askWithRetry[Boolean](AddBlock(blockInfo))
  logDebug(s"Reported block $blockId")
}

The data is stored by the receivedBlockHandler, which has two implementations:

private val receivedBlockHandler: ReceivedBlockHandler = {
  if (WriteAheadLogUtils.enableReceiverLog(env.conf)) {
    if (checkpointDirOption.isEmpty) {
      throw new SparkException(
        "Cannot enable receiver write-ahead log without checkpoint directory set. " +
          "Please use streamingContext.checkpoint() to set the checkpoint directory. " +
          "See documentation for more details.")
    }
    new WriteAheadLogBasedBlockHandler(env.blockManager, receiver.streamId,
      receiver.storageLevel, env.conf, hadoopConf, checkpointDirOption.get)
  } else {
    new BlockManagerBasedBlockHandler(env.blockManager, receiver.storageLevel)
  }
}


WriteAheadLogBasedBlockHandler hands the data to BlockManager for management on the one hand, and writes it to the WAL on the other.

Once a node crashes, the data that was in memory can be recovered from the WAL. When the WAL is enabled, storing extra replicas of the data is not recommended.

private val effectiveStorageLevel = {
  if (storageLevel.deserialized) {
    logWarning(s"Storage level serialization ${storageLevel.deserialized} is not supported when " +
      s"write ahead log is enabled, change to serialization false")
  }
  if (storageLevel.replication > 1) {
    logWarning(s"Storage level replication ${storageLevel.replication} is unnecessary when " +
      s"write ahead log is enabled, change to replication 1")
  }
  StorageLevel(storageLevel.useDisk, storageLevel.useMemory, storageLevel.useOffHeap, false, 1)
}


BlockManagerBasedBlockHandler, by contrast, hands the data directly to BlockManager for management.

If the WAL is not written, will data be lost when a node crashes? Not necessarily, because the receiver's storageLevel is passed in when WriteAheadLogBasedBlockHandler and BlockManagerBasedBlockHandler are constructed. StorageLevel describes where the data is stored (memory, disk) and the number of replicas.

class StorageLevel private(
    private var _useDisk: Boolean,
    private var _useMemory: Boolean,
    private var _useOffHeap: Boolean,
    private var _deserialized: Boolean,
    private var _replication: Int = 1)
  extends Externalizable

The public StorageLevel values are the following:

val NONE = new StorageLevel(false, false, false, false)
val DISK_ONLY = new StorageLevel(true, false, false, false)
val DISK_ONLY_2 = new StorageLevel(true, false, false, false, 2)
val MEMORY_ONLY = new StorageLevel(false, true, false, true)
val MEMORY_ONLY_2 = new StorageLevel(false, true, false, true, 2)
val MEMORY_ONLY_SER = new StorageLevel(false, true, false, false)
val MEMORY_ONLY_SER_2 = new StorageLevel(false, true, false, false, 2)
val MEMORY_AND_DISK = new StorageLevel(true, true, false, true)
val MEMORY_AND_DISK_2 = new StorageLevel(true, true, false, true, 2)
val MEMORY_AND_DISK_SER = new StorageLevel(true, true, false, false)
val MEMORY_AND_DISK_SER_2 = new StorageLevel(true, true, false, false, 2)
val OFF_HEAP = new StorageLevel(false, false, true, false)


By default, the data uses MEMORY_AND_DISK_2, which means two replicas are kept and the data is written to disk when there is not enough memory.
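As a sketch of how the storage level reaches the handlers, it can be passed explicitly when creating an input stream; this assumes an already-constructed StreamingContext `ssc`, and the host/port are placeholders:

```scala
import org.apache.spark.storage.StorageLevel

// The storage level given here becomes receiver.storageLevel, which the
// receivedBlockHandler (shown above) uses when storing blocks.
val lines = ssc.socketTextStream("localhost", 9999, StorageLevel.MEMORY_AND_DISK_2)
```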


The final storage of the data is done and managed by BlockManager:

def storeBlock(blockId: StreamBlockId, block: ReceivedBlock): ReceivedBlockStoreResult = {
  var numRecords = None: Option[Long]
  val putResult: Seq[(BlockId, BlockStatus)] = block match {
    case ArrayBufferBlock(arrayBuffer) =>
      numRecords = Some(arrayBuffer.size.toLong)
      blockManager.putIterator(blockId, arrayBuffer.iterator, storageLevel,
        tellMaster = true)
    case IteratorBlock(iterator) =>
      val countIterator = new CountingIterator(iterator)
      val putResult = blockManager.putIterator(blockId, countIterator, storageLevel,
        tellMaster = true)
      numRecords = countIterator.count
      putResult
    case ByteBufferBlock(byteBuffer) =>
      blockManager.putBytes(blockId, byteBuffer, storageLevel, tellMaster = true)
    case o =>
      throw new SparkException(
        s"Could not store $blockId to block manager, unexpected block type ${o.getClass.getName}")
  }
  if (!putResult.map { _._1 }.contains(blockId)) {
    throw new SparkException(
      s"Could not store $blockId to block manager with storage level $storageLevel")
  }
  BlockManagerBasedStoreResult(blockId, numRecords)
}


For reading data directly from Kafka, fault tolerance can be achieved by recording the data offsets. If the program crashes, on the next start it re-reads the data from the offset of the last unprocessed record.
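A minimal sketch of this direct approach with the Spark 1.x Kafka integration, assuming an existing StreamingContext `ssc`; the broker address and topic name are placeholders:

```scala
import kafka.serializer.StringDecoder
import org.apache.spark.streaming.kafka.{HasOffsetRanges, KafkaUtils}

// Direct approach: no receiver, no WAL; Spark tracks the consumed offsets itself.
val kafkaParams = Map("metadata.broker.list" -> "broker1:9092")
val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
  ssc, kafkaParams, Set("events"))

stream.foreachRDD { rdd =>
  // The offset range of each partition can be read and saved to an external
  // store, so a restarted job can resume from the last unprocessed offset.
  val offsetRanges = rdd.asInstanceOf[HasOffsetRanges].offsetRanges
  offsetRanges.foreach { r =>
    println(s"${r.topic} ${r.partition}: ${r.fromOffset} -> ${r.untilOffset}")
  }
}
```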



Note:

1. DT Big Data Dream Factory WeChat public account: DT_Spark
2. IMF big data hands-on practice, YY live channel at 8 pm: 68917580
3. Sina Weibo: http://www.weibo.com/ilovepains


This article is from the "Ding Dong" blog; please keep this source: http://lqding.blog.51cto.com/9123978/1775918
