The Receiver hands the data it receives to ReceiverSupervisorImpl for management. After ReceiverSupervisorImpl stores the data, it reports the block's metadata to the ReceiverTracker on the driver.
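To see where this flow begins, here is a minimal custom receiver sketch (the class name and its one-second "tick" payload are made up for illustration); every store(...) call below is what ultimately arrives at ReceiverSupervisorImpl on the executor:

```scala
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.receiver.Receiver

// Hypothetical receiver that emits the string "tick" once per second.
class ConstantReceiver extends Receiver[String](StorageLevel.MEMORY_AND_DISK_SER_2) {
  override def onStart(): Unit = {
    new Thread("constant-receiver") {
      override def run(): Unit = {
        while (!isStopped()) {
          store("tick") // handed to ReceiverSupervisorImpl for storage and reporting
          Thread.sleep(1000)
        }
      }
    }.start()
  }
  override def onStop(): Unit = {}
}
```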
There are three ways to achieve executor fault tolerance:

WAL (write-ahead log), enabled as shown in the sketch after this list
Data replication
Replaying the data stream from the receiving source
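Of the three, the WAL is the only one that must be switched on explicitly. A minimal sketch of enabling it; the app name, master, batch interval, and checkpoint path are illustrative, while the configuration key is the one checked by WriteAheadLogUtils.enableReceiverLog:

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

val conf = new SparkConf()
  .setAppName("wal-demo")   // illustrative
  .setMaster("local[2]")    // receivers need at least 2 cores locally
  .set("spark.streaming.receiver.writeAheadLog.enable", "true")
val ssc = new StreamingContext(conf, Seconds(5))
// The receiver WAL requires a checkpoint directory; WAL files live under it.
ssc.checkpoint("hdfs:///tmp/wal-demo-checkpoint") // illustrative path
```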
```scala
/** Store block and report it to driver */
def pushAndReportBlock(
    receivedBlock: ReceivedBlock,
    metadataOption: Option[Any],
    blockIdOption: Option[StreamBlockId]
  ) {
  val blockId = blockIdOption.getOrElse(nextBlockId)
  val time = System.currentTimeMillis
  val blockStoreResult = receivedBlockHandler.storeBlock(blockId, receivedBlock)
  logDebug(s"Pushed block $blockId in ${(System.currentTimeMillis - time)} ms")
  val numRecords = blockStoreResult.numRecords
  val blockInfo = ReceivedBlockInfo(streamId, numRecords, metadataOption, blockStoreResult)
  trackerEndpoint.askWithRetry[Boolean](AddBlock(blockInfo))
  logDebug(s"Reported block $blockId")
}
```
The data is stored by a ReceivedBlockHandler, which has two implementations:
```scala
private val receivedBlockHandler: ReceivedBlockHandler = {
  if (WriteAheadLogUtils.enableReceiverLog(env.conf)) {
    if (checkpointDirOption.isEmpty) {
      throw new SparkException(
        "Cannot enable receiver write-ahead log without checkpoint directory set. " +
          "Please use streamingContext.checkpoint() to set the checkpoint directory. " +
          "See documentation for more details.")
    }
    new WriteAheadLogBasedBlockHandler(env.blockManager, receiver.streamId,
      receiver.storageLevel, env.conf, hadoopConf, checkpointDirOption.get)
  } else {
    new BlockManagerBasedBlockHandler(env.blockManager, receiver.storageLevel)
  }
}
```
WriteAheadLogBasedBlockHandler hands the data to the BlockManager for management on the one hand, and writes it to the WAL on the other. If the node crashes, the data that was in memory can be recovered from the WAL. Once the WAL is enabled, keeping additional replicas of the data is no longer recommended.
```scala
private val effectiveStorageLevel = {
  if (storageLevel.deserialized) {
    logWarning(s"Storage level serialization ${storageLevel.deserialized} is not supported when" +
      s" write ahead log is enabled, change to serialization false")
  }
  if (storageLevel.replication > 1) {
    logWarning(s"Storage level replication ${storageLevel.replication} is unnecessary when " +
      s"write ahead log is enabled, change to replication 1")
  }

  StorageLevel(storageLevel.useDisk, storageLevel.useMemory, storageLevel.useOffHeap, false, 1)
}
```
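In other words (an illustrative sketch, not Spark source): with the WAL enabled, a replicated, deserialized level requested by the user is effectively downgraded to a serialized, single-replica level, since the WAL itself already provides durability:

```scala
import org.apache.spark.storage.StorageLevel

// Illustrative: what effectiveStorageLevel amounts to when the user
// asked for MEMORY_AND_DISK_2 while the WAL is enabled.
val requested = StorageLevel.MEMORY_AND_DISK_2
val effective = StorageLevel(requested.useDisk, requested.useMemory,
  requested.useOffHeap, deserialized = false, replication = 1)
```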
BlockManagerBasedBlockHandler, by contrast, hands the data directly to the BlockManager for management.
If the WAL is not written, will data be lost when the node crashes? Not necessarily, because the receiver's StorageLevel is passed in when WriteAheadLogBasedBlockHandler and BlockManagerBasedBlockHandler are constructed. The StorageLevel describes where the data is stored (memory, disk) and the number of replicas.
```scala
class StorageLevel private(
    private var _useDisk: Boolean,
    private var _useMemory: Boolean,
    private var _useOffHeap: Boolean,
    private var _deserialized: Boolean,
    private var _replication: Int = 1)
  extends Externalizable
```
The predefined public StorageLevel values are the following:
```scala
val NONE = new StorageLevel(false, false, false, false)
val DISK_ONLY = new StorageLevel(true, false, false, false)
val DISK_ONLY_2 = new StorageLevel(true, false, false, false, 2)
val MEMORY_ONLY = new StorageLevel(false, true, false, true)
val MEMORY_ONLY_2 = new StorageLevel(false, true, false, true, 2)
val MEMORY_ONLY_SER = new StorageLevel(false, true, false, false)
val MEMORY_ONLY_SER_2 = new StorageLevel(false, true, false, false, 2)
val MEMORY_AND_DISK = new StorageLevel(true, true, false, true)
val MEMORY_AND_DISK_2 = new StorageLevel(true, true, false, true, 2)
val MEMORY_AND_DISK_SER = new StorageLevel(true, true, false, false)
val MEMORY_AND_DISK_SER_2 = new StorageLevel(true, true, false, false, 2)
val OFF_HEAP = new StorageLevel(false, false, true, false)
```
By default, receiver data is stored as MEMORY_AND_DISK_SER_2: the data is serialized, two replicas are kept, and blocks spill to disk when there is not enough memory.
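The storage level can also be passed explicitly when creating a receiver-based stream. A minimal sketch (host, port, and app name are placeholders) that makes the replicated, serialized level explicit, so a surviving replica covers a crashed receiver node:

```scala
import org.apache.spark.SparkConf
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.{Seconds, StreamingContext}

val ssc = new StreamingContext(
  new SparkConf().setAppName("storage-level-demo").setMaster("local[2]"), Seconds(5))
// Two serialized replicas, spilling to disk when memory is tight.
val lines = ssc.socketTextStream("localhost", 9999, StorageLevel.MEMORY_AND_DISK_SER_2)
```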
The final storage of the data is done and managed by the BlockManager:
```scala
def storeBlock(blockId: StreamBlockId, block: ReceivedBlock): ReceivedBlockStoreResult = {

  var numRecords = None: Option[Long]

  val putResult: Seq[(BlockId, BlockStatus)] = block match {
    case ArrayBufferBlock(arrayBuffer) =>
      numRecords = Some(arrayBuffer.size.toLong)
      blockManager.putIterator(blockId, arrayBuffer.iterator, storageLevel,
        tellMaster = true)
    case IteratorBlock(iterator) =>
      val countIterator = new CountingIterator(iterator)
      val putResult = blockManager.putIterator(blockId, countIterator, storageLevel,
        tellMaster = true)
      numRecords = countIterator.count
      putResult
    case ByteBufferBlock(byteBuffer) =>
      blockManager.putBytes(blockId, byteBuffer, storageLevel, tellMaster = true)
    case o =>
      throw new SparkException(
        s"Could not store $blockId to block manager, unexpected block type ${o.getClass.getName}")
  }
  if (!putResult.map { _._1 }.contains(blockId)) {
    throw new SparkException(
      s"Could not store $blockId to block manager with storage level $storageLevel")
  }
  BlockManagerBasedStoreResult(blockId, numRecords)
}
```
For reading data directly from Kafka, fault tolerance can be achieved by recording data offsets. If the program crashes, then on the next start it re-reads the data beginning at the offset of the last unprocessed record.
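A hedged sketch of that direct approach using the Spark 1.x spark-streaming-kafka API; the broker address and topic name are placeholders, and where the offsets are persisted (ZooKeeper, a database, etc.) is left as a comment:

```scala
import kafka.serializer.StringDecoder
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.{HasOffsetRanges, KafkaUtils, OffsetRange}

val ssc = new StreamingContext(
  new SparkConf().setAppName("direct-kafka").setMaster("local[2]"), Seconds(5))
val kafkaParams = Map("metadata.broker.list" -> "broker1:9092") // placeholder broker
val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
  ssc, kafkaParams, Set("events")) // placeholder topic

stream.foreachRDD { rdd =>
  // Offsets for this batch; persisting them lets a restarted job resume
  // from the last unprocessed record.
  val offsetRanges: Array[OffsetRange] = rdd.asInstanceOf[HasOffsetRanges].offsetRanges
  offsetRanges.foreach(o => println(s"${o.topic} ${o.partition} ${o.fromOffset} ${o.untilOffset}"))
  // ... process rdd, then commit offsetRanges to external storage (left to the application)
}
```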
This article is from the "Ding Dong" blog; please be sure to keep this source: http://lqding.blog.51cto.com/9123978/1775918