The Receiver hands the data it receives to ReceiverSupervisorImpl for management. After ReceiverSupervisorImpl stores the data, it reports the block's metadata to the ReceiverTracker on the driver.
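To see where this flow begins, here is a minimal custom receiver sketch (the class name and its one-second "tick" payload are made up for illustration); every store(...) call below is what ultimately arrives at ReceiverSupervisorImpl on the executor:

```scala
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.receiver.Receiver

// Hypothetical receiver that emits the string "tick" once per second.
class ConstantReceiver extends Receiver[String](StorageLevel.MEMORY_AND_DISK_SER_2) {
  override def onStart(): Unit = {
    new Thread("constant-receiver") {
      override def run(): Unit = {
        while (!isStopped()) {
          store("tick") // handed to ReceiverSupervisorImpl for storage and reporting
          Thread.sleep(1000)
        }
      }
    }.start()
  }
  override def onStop(): Unit = {}
}
```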
There are three ways to achieve executor fault tolerance:

WAL (write-ahead log), enabled as shown in the sketch after this list
Data replication
Replaying the data stream from the receiving source
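Of the three, the WAL is the only one that must be switched on explicitly. A minimal sketch of enabling it; the app name, master, batch interval, and checkpoint path are illustrative, while the configuration key is the one checked by WriteAheadLogUtils.enableReceiverLog:

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

val conf = new SparkConf()
  .setAppName("wal-demo")   // illustrative
  .setMaster("local[2]")    // receivers need at least 2 cores locally
  .set("spark.streaming.receiver.writeAheadLog.enable", "true")
val ssc = new StreamingContext(conf, Seconds(5))
// The receiver WAL requires a checkpoint directory; WAL files live under it.
ssc.checkpoint("hdfs:///tmp/wal-demo-checkpoint") // illustrative path
```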
```scala
/** Store block and report it to driver */
def pushAndReportBlock(
    receivedBlock: ReceivedBlock,
    metadataOption: Option[Any],
    blockIdOption: Option[StreamBlockId]
  ) {
  val blockId = blockIdOption.getOrElse(nextBlockId)
  val time = System.currentTimeMillis
  val blockStoreResult = receivedBlockHandler.storeBlock(blockId, receivedBlock)
  logDebug(s"Pushed block $blockId in ${(System.currentTimeMillis - time)} ms")
  val numRecords = blockStoreResult.numRecords
  val blockInfo = ReceivedBlockInfo(streamId, numRecords, metadataOption, blockStoreResult)
  trackerEndpoint.askWithRetry[Boolean](AddBlock(blockInfo))
  logDebug(s"Reported block $blockId")
}
```
The data is stored by a ReceivedBlockHandler, which has two implementations:
```scala
private val receivedBlockHandler: ReceivedBlockHandler = {
  if (WriteAheadLogUtils.enableReceiverLog(env.conf)) {
    if (checkpointDirOption.isEmpty) {
      throw new SparkException(
        "Cannot enable receiver write-ahead log without checkpoint directory set. " +
          "Please use streamingContext.checkpoint() to set the checkpoint directory. " +
          "See documentation for more details.")
    }
    new WriteAheadLogBasedBlockHandler(env.blockManager, receiver.streamId,
      receiver.storageLevel, env.conf, hadoopConf, checkpointDirOption.get)
  } else {
    new BlockManagerBasedBlockHandler(env.blockManager, receiver.storageLevel)
  }
}
```
WriteAheadLogBasedBlockHandler hands the data to the BlockManager for management on the one hand, and writes it to the WAL on the other. If the node crashes, the data that was in memory can be recovered from the WAL. Once the WAL is enabled, keeping additional replicas of the data is no longer recommended.
```scala
private val effectiveStorageLevel = {
  if (storageLevel.deserialized) {
    logWarning(s"Storage level serialization ${storageLevel.deserialized} is not supported when" +
      s" write ahead log is enabled, change to serialization false")
  }
  if (storageLevel.replication > 1) {
    logWarning(s"Storage level replication ${storageLevel.replication} is unnecessary when " +
      s"write ahead log is enabled, change to replication 1")
  }

  StorageLevel(storageLevel.useDisk, storageLevel.useMemory, storageLevel.useOffHeap, false, 1)
}
```
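In other words (an illustrative sketch, not Spark source): with the WAL enabled, a replicated, deserialized level requested by the user is effectively downgraded to a serialized, single-replica level, since the WAL itself already provides durability:

```scala
import org.apache.spark.storage.StorageLevel

// Illustrative: what effectiveStorageLevel amounts to when the user
// asked for MEMORY_AND_DISK_2 while the WAL is enabled.
val requested = StorageLevel.MEMORY_AND_DISK_2
val effective = StorageLevel(requested.useDisk, requested.useMemory,
  requested.useOffHeap, deserialized = false, replication = 1)
```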
BlockManagerBasedBlockHandler, by contrast, hands the data directly to the BlockManager for management.
If the WAL is not written, will data be lost when the node crashes? Not necessarily, because the receiver's StorageLevel is passed in when WriteAheadLogBasedBlockHandler and BlockManagerBasedBlockHandler are constructed. The StorageLevel describes where the data is stored (memory, disk) and the number of replicas.
```scala
class StorageLevel private(
    private var _useDisk: Boolean,
    private var _useMemory: Boolean,
    private var _useOffHeap: Boolean,
    private var _deserialized: Boolean,
    private var _replication: Int = 1)
  extends Externalizable
```
The predefined public StorageLevel values are the following:
```scala
val NONE = new StorageLevel(false, false, false, false)
val DISK_ONLY = new StorageLevel(true, false, false, false)
val DISK_ONLY_2 = new StorageLevel(true, false, false, false, 2)
val MEMORY_ONLY = new StorageLevel(false, true, false, true)
val MEMORY_ONLY_2 = new StorageLevel(false, true, false, true, 2)
val MEMORY_ONLY_SER = new StorageLevel(false, true, false, false)
val MEMORY_ONLY_SER_2 = new StorageLevel(false, true, false, false, 2)
val MEMORY_AND_DISK = new StorageLevel(true, true, false, true)
val MEMORY_AND_DISK_2 = new StorageLevel(true, true, false, true, 2)
val MEMORY_AND_DISK_SER = new StorageLevel(true, true, false, false)
val MEMORY_AND_DISK_SER_2 = new StorageLevel(true, true, false, false, 2)
val OFF_HEAP = new StorageLevel(false, false, true, false)
```
By default, receiver data is stored as MEMORY_AND_DISK_SER_2: the data is serialized, two replicas are kept, and blocks spill to disk when there is not enough memory.
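The storage level can also be passed explicitly when creating a receiver-based stream. A minimal sketch (host, port, and app name are placeholders) that makes the replicated, serialized level explicit, so a surviving replica covers a crashed receiver node:

```scala
import org.apache.spark.SparkConf
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.{Seconds, StreamingContext}

val ssc = new StreamingContext(
  new SparkConf().setAppName("storage-level-demo").setMaster("local[2]"), Seconds(5))
// Two serialized replicas, spilling to disk when memory is tight.
val lines = ssc.socketTextStream("localhost", 9999, StorageLevel.MEMORY_AND_DISK_SER_2)
```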
The final storage of the data is done and managed by the BlockManager:
```scala
def storeBlock(blockId: StreamBlockId, block: ReceivedBlock): ReceivedBlockStoreResult = {

  var numRecords = None: Option[Long]

  val putResult: Seq[(BlockId, BlockStatus)] = block match {
    case ArrayBufferBlock(arrayBuffer) =>
      numRecords = Some(arrayBuffer.size.toLong)
      blockManager.putIterator(blockId, arrayBuffer.iterator, storageLevel,
        tellMaster = true)
    case IteratorBlock(iterator) =>
      val countIterator = new CountingIterator(iterator)
      val putResult = blockManager.putIterator(blockId, countIterator, storageLevel,
        tellMaster = true)
      numRecords = countIterator.count
      putResult
    case ByteBufferBlock(byteBuffer) =>
      blockManager.putBytes(blockId, byteBuffer, storageLevel, tellMaster = true)
    case o =>
      throw new SparkException(
        s"Could not store $blockId to block manager, unexpected block type ${o.getClass.getName}")
  }
  if (!putResult.map { _._1 }.contains(blockId)) {
    throw new SparkException(
      s"Could not store $blockId to block manager with storage level $storageLevel")
  }
  BlockManagerBasedStoreResult(blockId, numRecords)
}
```
For reading data directly from Kafka, fault tolerance can be achieved by recording data offsets. If the program crashes, then on the next start it re-reads the data beginning at the offset of the last unprocessed record.
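A hedged sketch of that direct approach using the Spark 1.x spark-streaming-kafka API; the broker address and topic name are placeholders, and where the offsets are persisted (ZooKeeper, a database, etc.) is left as a comment:

```scala
import kafka.serializer.StringDecoder
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.{HasOffsetRanges, KafkaUtils, OffsetRange}

val ssc = new StreamingContext(
  new SparkConf().setAppName("direct-kafka").setMaster("local[2]"), Seconds(5))
val kafkaParams = Map("metadata.broker.list" -> "broker1:9092") // placeholder broker
val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
  ssc, kafkaParams, Set("events")) // placeholder topic

stream.foreachRDD { rdd =>
  // Offsets for this batch; persisting them lets a restarted job resume
  // from the last unprocessed record.
  val offsetRanges: Array[OffsetRange] = rdd.asInstanceOf[HasOffsetRanges].offsetRanges
  offsetRanges.foreach(o => println(s"${o.topic} ${o.partition} ${o.fromOffset} ${o.untilOffset}"))
  // ... process rdd, then commit offsetRanges to external storage (left to the application)
}
```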
This article is from the "Ding Dong" blog; please be sure to keep this source: http://lqding.blog.51cto.com/9123978/1775918