Spark Streaming Backpressure Analysis

Source: Internet
Author: User


1. Why introduce backpressure

By default, Spark Streaming receives data through a Receiver at the rate the producer generates it, so during computation the situation Batch processing time > Batch interval can arise, where Batch processing time is the time actually spent computing a batch and Batch interval is the batch interval configured for the Streaming application. This means Spark Streaming's data receive rate is higher than the rate at which Spark removes data from the queue; in other words, processing capacity is too low to fully handle the current receive rate within the configured interval. If this condition persists for too long, data accumulates in memory, which can cause memory overflow on the Executor hosting the Receiver, among other problems (if the configured StorageLevel includes disk, data that does not fit in memory spills to disk, increasing latency). In versions before Spark 1.5, a user who wanted to restrict the Receiver's receive rate could set the static configuration parameter "spark.streaming.receiver.maxRate". Capping the receive rate to match current processing capacity does prevent memory overflow, but it introduces other problems: for example, when both the producer's production rate and the cluster's processing capacity are higher than maxRate, resource utilization declines. To better match the data receive rate to the cluster's processing capability, Spark Streaming introduced the backpressure mechanism in v1.5, which dynamically controls the data receive rate to match the cluster's processing capacity.
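For example, the pre-1.5 static cap is set through SparkConf (a sketch; the value is illustrative and must be tuned by hand):

```scala
import org.apache.spark.SparkConf

// Static, pre-backpressure rate cap: at most 1000 records/sec per receiver.
// The value is illustrative; nothing adjusts it when cluster load changes.
val conf = new SparkConf()
  .set("spark.streaming.receiver.maxRate", "1000")
```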

2. Backpressure

Spark Streaming backpressure dynamically adjusts the Receiver's data receive rate according to job-execution feedback from the JobScheduler. Whether the backpressure mechanism is enabled is controlled by the property "spark.streaming.backpressure.enabled"; the default value is false, i.e. not enabled.
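Enabling it is a one-line configuration change (a sketch; the initialRate property and its value are illustrative):

```scala
import org.apache.spark.SparkConf

val conf = new SparkConf()
  // turn on dynamic rate control (disabled by default)
  .set("spark.streaming.backpressure.enabled", "true")
  // optional: cap the very first batch, before any feedback is available
  .set("spark.streaming.backpressure.initialRate", "1000")
```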

2.1 The Streaming architecture is as shown in the figure (see the Streaming data reception process and Streaming source parsing documents)

2.2 The backpressure execution process is as follows:

On top of the original architecture, a new component, RateController, is added. It listens for the OnBatchCompleted event and extracts the processingDelay and schedulingDelay information from it. An estimator (RateEstimator) estimates the maximum processing rate from this information, and for receiver-based input streams the rate is finally passed through ReceiverTracker and ReceiverSupervisorImpl down to BlockGenerator (which inherits from RateLimiter).
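The feedback loop above can be sketched in a few lines of self-contained Scala. This is a toy model, not Spark's actual classes: `compute` stands in for the RateEstimator, and `rateLimit` for the limit held by RateLimiter/BlockGenerator (Spark's default PIDRateEstimator is considerably more sophisticated than this naive throughput estimate):

```scala
import java.util.concurrent.atomic.AtomicLong

object BackpressureSketch {
  // Plays the role of RateLimiter/BlockGenerator: the receive-side cap.
  val rateLimit = new AtomicLong(Long.MaxValue)

  // Plays the role of the RateEstimator: if a batch of `elems` records took
  // `workDelay` ms to process, the sustainable rate is roughly
  // elems / workDelay * 1000 records/sec.
  def compute(elems: Long, workDelay: Long): Long =
    math.max(1L, elems * 1000 / math.max(1L, workDelay))

  // Plays the role of RateController.computeAndPublish: react to batch completion.
  def onBatchCompleted(elems: Long, workDelay: Long): Unit =
    rateLimit.set(compute(elems, workDelay))

  def main(args: Array[String]): Unit = {
    // 2000 records took 4000 ms to process -> sustainable rate ~500 records/sec.
    onBatchCompleted(2000L, 4000L)
    println(rateLimit.get()) // prints 500
  }
}
```

The key design point, as in Spark itself, is that the estimate is published asynchronously to a shared limit that the receive path consults, so batch completion feedback throttles ingestion without blocking job execution.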

3. Backpressure source parsing

3.1 The RateController class hierarchy

RateController inherits from StreamingListener and is used to handle the BatchCompleted event. The core code is:

```scala
/**
 * A StreamingListener that receives batch completion updates, and maintains
 * an estimate of the speed at which this stream should ingest messages,
 * given an estimate computation from a `RateEstimator`
 */
private[streaming] abstract class RateController(val streamUID: Int, rateEstimator: RateEstimator)
    extends StreamingListener with Serializable {

  ...

  /**
   * Compute the new rate limit and publish it asynchronously.
   */
  private def computeAndPublish(time: Long, elems: Long, workDelay: Long, waitDelay: Long): Unit =
    Future[Unit] {
      val newRate = rateEstimator.compute(time, elems, workDelay, waitDelay)
      newRate.foreach { s =>
        rateLimit.set(s.toLong)
        publish(getLatestRate())
      }
    }

  def getLatestRate(): Long = rateLimit.get()

  override def onBatchCompleted(batchCompleted: StreamingListenerBatchCompleted) {
    val elements = batchCompleted.batchInfo.streamIdToInputInfo

    for {
      processingEnd <- batchCompleted.batchInfo.processingEndTime
      workDelay <- batchCompleted.batchInfo.processingDelay
      waitDelay <- batchCompleted.batchInfo.schedulingDelay
      elems <- elements.get(streamUID).map(_.numRecords)
    } computeAndPublish(processingEnd, elems, workDelay, waitDelay)
  }
}
```

  

3.2 RateController registration

When the JobScheduler starts, it extracts the RateController of every InputDStream registered in the DStreamGraph and registers it with the ListenerBus for monitoring.
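The relevant excerpt from JobScheduler.start() (Spark 1.5-era sources, abbreviated) looks approximately like this:

```scala
def start(): Unit = synchronized {
  ...
  // attach rate controllers of input streams to receive batch completion updates
  for {
    inputDStream <- ssc.graph.getInputStreams
    rateController <- inputDStream.rateController
  } ssc.addStreamingListener(rateController)
  ...
}
```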
