Lesson 12: Spark Streaming source code analysis of executor fault tolerance and data safety

Last Update:2016-05-28 Source: Internet

Author: User


I. Data safety considerations in Spark Streaming:

Spark Streaming continuously receives data, continuously generates jobs, and continuously submits those jobs to the cluster to run. This raises a very important question: data safety.
Spark Streaming is built on Spark Core. If the received data is kept safe, then because the jobs Spark Streaming generates are based on RDDs, any failure at run time can be handled automatically by Spark Core's fault-tolerance mechanism.
Executor fault tolerance is therefore primarily about fault tolerance of the data.
> Why not consider fault tolerance of the computation? Because at computation time Spark Streaming relies on Spark Core's fault tolerance, so the computation is inherently safe and reliable.

Executor fault-tolerance modes:
1. The simplest is replication, which relies on the replica mechanism of the underlying BlockManager and is the default fault-tolerance mode.

2. Write-ahead log (WAL) mode.

3. No copy is made after the data is received; instead the source supports data replay, meaning the same data can be read again.

BlockManager backup:

By default two replicas are kept: when the Spark Streaming receiver stores received data, it specifies the StorageLevel MEMORY_AND_DISK_SER_2. The underlying storage is delegated to BlockManager, whose semantics ensure that when two replicas are requested they are generally kept in memory, so the data resides on two different executors.

The receiver hands data to BlockManager through a ReceivedBlockHandler, which has two implementations:
1. WriteAheadLogBasedBlockHandler
2. BlockManagerBasedBlockHandler
The StorageLevel used here is passed in when the InputDStream is built; for socketTextStream the default storage level is StorageLevel.MEMORY_AND_DISK_SER_2.

WriteAheadLogBasedBlockHandler is used only when the WAL is enabled, and the WAL is disabled by default.
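The choice between the two handlers can be sketched roughly as follows. This is a minimal, self-contained model rather than Spark's actual code: `HandlerChoice`, `HandlerSelection` and `choose` are hypothetical names, and only the configuration key `spark.streaming.receiver.writeAheadLog.enable` is Spark's own. The sketch also assumes that a checkpoint directory must be set when the WAL is on, since the log files live there.

```scala
// Simplified stand-ins for the two ReceivedBlockHandler implementations (hypothetical types).
sealed trait HandlerChoice
case object BlockManagerBased extends HandlerChoice
final case class WriteAheadLogBased(checkpointDir: String) extends HandlerChoice

object HandlerSelection {
  // Spark configuration key that switches on the receiver write-ahead log.
  val WalEnableKey = "spark.streaming.receiver.writeAheadLog.enable"

  def choose(conf: Map[String, String], checkpointDir: Option[String]): HandlerChoice = {
    // A missing key means the WAL is off, which mirrors the default.
    val walEnabled = conf.get(WalEnableKey).exists(_.toBoolean)
    if (walEnabled) {
      // Assumption: the WAL cannot work without a checkpoint directory.
      val dir = checkpointDir.getOrElse(
        throw new IllegalStateException("WAL enabled but no checkpoint directory set"))
      WriteAheadLogBased(dir)
    } else {
      BlockManagerBased // rely on BlockManager replication only
    }
  }
}
```

With an empty configuration the sketch falls back to the BlockManager-based handler, matching the default described above.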

WAL mode: the data is first written to a log file under the checkpoint directory; after a failure, the data is re-read from the checkpoint directory to recover. With the WAL enabled, a replication factor greater than 1 is unnecessary, and the data is stored in serialized form (a deserialized storage level is not supported).
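As a hedged illustration, an application might enable the WAL along the following lines. The sketch assumes the spark-streaming library is on the classpath; the application name, the local master, the checkpoint path, and the host and port are placeholders chosen for the example, not values from the original article.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.{Seconds, StreamingContext}

object WalEnabledStream {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("WalEnabledStream")
      .setMaster("local[2]") // placeholder: a receiver needs at least two local threads
      .set("spark.streaming.receiver.writeAheadLog.enable", "true")

    val ssc = new StreamingContext(conf, Seconds(5))
    // Placeholder path; a real deployment would use a fault-tolerant file system such as HDFS.
    ssc.checkpoint("/tmp/streaming-checkpoint")

    // Serialized, single-replica storage level: the WAL already makes the data recoverable.
    val lines = ssc.socketTextStream("localhost", 9999, StorageLevel.MEMORY_AND_DISK_SER)
    lines.count().print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```

The storage level is passed explicitly here because the default for socketTextStream keeps two replicas, which adds little once the log is on.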

With the WAL enabled, each block is written to both BlockManager and the write-ahead log. The two writes run in parallel, and the call returns only after both have completed.

Storing a block in BlockManager:

Storing a block in the WAL:
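The two stores and the "return only when both are done" behavior can be sketched with plain Scala futures. This is an illustrative model under stated assumptions: the two in-memory maps stand in for BlockManager and the log, and `ParallelBlockStore` and `storeBlock` are hypothetical names, not Spark classes.

```scala
import scala.collection.concurrent.TrieMap
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

object ParallelBlockStore {
  private implicit val ec: ExecutionContext = ExecutionContext.global

  // In-memory stand-ins for BlockManager and the write-ahead log (hypothetical).
  val blockManager = TrieMap.empty[String, Array[Byte]]
  val writeAheadLog = TrieMap.empty[String, Array[Byte]]

  // Starts both writes concurrently and returns only after both have finished.
  def storeBlock(blockId: String, data: Array[Byte]): String = {
    val toBlockManager = Future { blockManager.put(blockId, data); () }
    val toWal = Future { writeAheadLog.put(blockId, data); blockId }
    // zip completes when both futures complete; the WAL result is kept as the handle.
    val both = toBlockManager.zip(toWal).map(_._2)
    Await.result(both, 30.seconds)
  }
}
```

If either write fails, the combined future fails as well, so the caller never sees a block that was stored in only one of the two places as successfully stored.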

The WAL is written sequentially and its records are never modified, so a record is read simply by pointer, that is, by where the record starts and how long it is. This makes the WAL very fast.

Looking at WriteAheadLog, it is an abstract class:

Next, look at the write method of FileBasedWriteAheadLog, an implementation class of WriteAheadLog:

It obtains a different writer depending on the time, writes the serialized result to the file, and returns a FileBasedWriteAheadLogSegment, an object that describes a file segment.

Reading data:

Reading creates a FileBasedWriteAheadLogRandomReader object and then calls that object's read method:
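The write-then-random-read pattern can be sketched with a local file. This is a simplified model, not the Spark implementation: `LogSegment` and `SimpleWal` are hypothetical names, a single local file replaces the time-rotated log files, and the record bytes are stored without any framing.

```scala
import java.io.RandomAccessFile

// Where a record lives: which file, the byte offset where it starts, and its length.
final case class LogSegment(path: String, offset: Long, length: Int)

object SimpleWal {
  // Appends a record at the end of the file and returns the segment describing it.
  def write(path: String, record: Array[Byte]): LogSegment = {
    val file = new RandomAccessFile(path, "rw")
    try {
      val offset = file.length()
      file.seek(offset) // always append: existing records are never modified
      file.write(record)
      LogSegment(path, offset, record.length)
    } finally file.close()
  }

  // Random read: seek straight to the segment's offset and read exactly `length` bytes.
  def read(segment: LogSegment): Array[Byte] = {
    val file = new RandomAccessFile(segment.path, "r")
    try {
      file.seek(segment.offset)
      val buffer = new Array[Byte](segment.length)
      file.readFully(buffer)
      buffer
    } finally file.close()
  }
}
```

Because each write returns the segment, a later read needs no scan of the file: it jumps to the offset and reads the stated number of bytes.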

Data replay support:

Kafka offers a receiver mode and a direct mode.
Receiver mode: the position in the data, that is, the offset, is managed by ZooKeeper. If processing crashes midway, no ACK is sent to ZooKeeper, so ZooKeeper considers the message not yet consumed, and after the failure the data is re-read from Kafka based on the stored offset. In practice, however, direct mode is used more and more: it works with Kafka offsets directly and manages the offsets itself.
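The replay behavior of receiver mode can be modeled in a few lines. This is a toy sketch under stated assumptions: a `Vector` stands in for a Kafka partition, the returned number stands in for the offset kept in ZooKeeper, and `OffsetReplay`, `consume` and `crashAt` are hypothetical names.

```scala
object OffsetReplay {
  // Processes messages starting at the committed offset. The offset advances (the "ACK")
  // only after a message has been processed, so a crash leaves it at the failed message.
  def consume(log: Vector[String], committed: Long, crashAt: Option[Long])
             (process: String => Unit): Long = {
    var offset = committed
    while (offset < log.length) {
      if (crashAt.contains(offset)) return offset // simulated crash: nothing acknowledged
      process(log(offset.toInt))
      offset += 1 // acknowledge: advance the stored offset
    }
    offset
  }
}
```

Calling `consume` again with the offset returned after a crash resumes at the unacknowledged message, which is the re-read described above.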

DirectKafkaInputDStream checks the latest offset and records the offset range in each batch.
Each time a batch is generated, latestLeaderOffsets is called to look up the latest offsets; subtracting the previous offsets from them gives the range of the batch, which is then used to read the data.
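The range computation described above can be sketched as follows. The names `PartitionKey`, `BatchRange` and `BatchPlanner` are hypothetical and only model the idea; they are not the classes used by the Kafka integration.

```scala
// Identifies one partition of a topic (hypothetical stand-in).
final case class PartitionKey(topic: String, partition: Int)

// Half-open offset range [fromOffset, untilOffset) to be read in one batch.
final case class BatchRange(key: PartitionKey, fromOffset: Long, untilOffset: Long) {
  def count: Long = untilOffset - fromOffset
}

object BatchPlanner {
  // Pairs each partition's previous offset with its latest offset.
  def plan(current: Map[PartitionKey, Long], latest: Map[PartitionKey, Long]): List[BatchRange] =
    current.toList.map { case (key, from) =>
      // If no newer offset is known, the range is empty (until == from).
      BatchRange(key, from, latest.getOrElse(key, from))
    }
}
```

For example, a partition whose previous offset is 100 and whose latest offset is 250 yields a batch of 150 records; the latest offsets then become the starting point of the next batch.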

@tailrec
protected final def latestLeaderOffsets(retries: Int): Map[TopicAndPartition, LeaderOffset] = {
  val o = kc.getLatestLeaderOffsets(currentOffsets.keySet)
  // Either.fold would confuse @tailrec, do it manually
  if (o.isLeft) {
    val err = o.left.get.toString
    if (retries <= 0) {
      throw new SparkException(err)
    } else {
      log.error(err)
      Thread.sleep(kc.config.refreshLeaderBackoffMs)
      latestLeaderOffsets(retries - 1)
    }
  } else {
    o.right.get
  }
}



This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
