Contents of this issue:
Why do we need dynamic resource allocation?
Spark is coarse-grained by default: resources are allocated up front, before computation begins. A Spark Streaming application has peak and off-peak periods that require different amounts of resources; if resources are sized for the peak, much of them sit idle the rest of the time.
Because Spark Streaming runs continuously, resource consumption and resource management are also factors to consider.
Challenges Spark Streaming faces when dynamically adjusting resources:
Spark Streaming runs batch by batch according to the batch duration. One batch may need a lot of resources while the next batch needs far fewer, so by the time an adjustment takes effect, the batch it was made for may already have finished. The adjustment interval therefore has to be tuned against the batch duration.
Spark Streaming dynamic resource allocation
1. Dynamic resource allocation is not enabled by default in SparkContext, but it can be configured manually through SparkConf.
```scala
// Optionally scale number of executors dynamically based on workload. Exposed for testing.
val dynamicAllocationEnabled = Utils.isDynamicAllocationEnabled(_conf)
if (!dynamicAllocationEnabled &&
    // This parameter configures whether dynamic resource allocation is enabled
    _conf.getBoolean("spark.dynamicAllocation.enabled", false)) {
  logWarning("Dynamic Allocation and num executors both set, thus dynamic allocation disabled.")
}

_executorAllocationManager =
  if (dynamicAllocationEnabled) {
    Some(new ExecutorAllocationManager(this, listenerBus, _conf))
  } else {
    None
  }
_executorAllocationManager.foreach(_.start())
```
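As a concrete illustration of the manual SparkConf configuration mentioned above, here is a minimal sketch of enabling dynamic allocation. The application name and the min/max executor values are made up for the example; the configuration keys themselves are standard Spark settings, and `spark.shuffle.service.enabled` must be on because dynamic allocation relies on the external shuffle service.

```scala
import org.apache.spark.SparkConf

// Sketch: enabling dynamic resource allocation via SparkConf.
// App name and executor bounds are illustrative values.
val conf = new SparkConf()
  .setAppName("DynamicAllocationDemo")
  .set("spark.dynamicAllocation.enabled", "true")    // turn on dynamic allocation
  .set("spark.dynamicAllocation.minExecutors", "2")  // lower bound on executor count
  .set("spark.dynamicAllocation.maxExecutors", "10") // upper bound on executor count
  .set("spark.shuffle.service.enabled", "true")      // external shuffle service, required
```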
ExecutorAllocationManager contains a timer that continuously scans the state of the executors and the running stages, and increases or decreases the number of executors accordingly. The schedule method in ExecutorAllocationManager is triggered periodically to perform this dynamic resource adjustment.
```scala
/**
 * This is called at a fixed interval to regulate the number of pending executor requests
 * and number of executors running.
 *
 * First, adjust our requested executors based on the add time and our current needs.
 * Then, if the remove time for an existing executor has expired, kill the executor.
 *
 * This is factored out into its own method for testing.
 */
private def schedule(): Unit = synchronized {
  val now = clock.getTimeMillis

  updateAndSyncNumExecutorsTarget(now)

  removeTimes.retain { case (executorId, expireTime) =>
    val expired = now >= expireTime
    if (expired) {
      initializing = false
      removeExecutor(executorId)
    }
    !expired
  }
}
```
In ExecutorAllocationManager, a timer running in the thread pool invokes schedule continuously.
```scala
/**
 * Register for scheduler callbacks to decide when to add and remove executors, and start
 * the scheduling task.
 */
def start(): Unit = {
  listenerBus.addListener(listener)

  val scheduleTask = new Runnable() {
    override def run(): Unit = {
      try {
        schedule()
      } catch {
        case ct: ControlThrowable =>
          throw ct
        case t: Throwable =>
          logWarning(s"Uncaught exception in thread ${Thread.currentThread().getName}", t)
      }
    }
  }
  // intervalMillis is the timer trigger interval
  executor.scheduleAtFixedRate(scheduleTask, 0, intervalMillis, TimeUnit.MILLISECONDS)
}
```
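The fixed-rate scheduling pattern used in start() can be demonstrated in a small self-contained sketch, outside of Spark. This is not Spark code: the object name, the no-op task standing in for schedule(), and the timing values are all invented for illustration, using only java.util.concurrent from the standard library.

```scala
import java.util.concurrent.{Executors, TimeUnit}
import java.util.concurrent.atomic.AtomicInteger

// Minimal sketch (not Spark code) of the fixed-rate scheduling pattern
// that ExecutorAllocationManager.start() uses to run schedule() periodically.
object FixedRateSchedulerSketch {
  // Runs a stand-in "schedule()" at a fixed rate for `millis` ms
  // and returns how many times it fired.
  def runFor(millis: Long, periodMillis: Long): Int = {
    val ticks = new AtomicInteger(0)
    val executor = Executors.newSingleThreadScheduledExecutor()

    val scheduleTask = new Runnable {
      // In Spark this body would be schedule(): scan executors, adjust resources.
      override def run(): Unit = ticks.incrementAndGet()
    }

    // Initial delay 0, then trigger every periodMillis, like intervalMillis in Spark.
    executor.scheduleAtFixedRate(scheduleTask, 0, periodMillis, TimeUnit.MILLISECONDS)
    Thread.sleep(millis)
    executor.shutdown()
    ticks.get()
  }

  def main(args: Array[String]): Unit =
    println(s"schedule() fired ${runFor(550, 100)} times")
}
```

The key design point is that the task runs on a dedicated scheduler thread, so a slow or failing schedule() invocation does not block the driver's main thread; Spark additionally wraps the body in a try/catch so that one failure does not kill the timer.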
Dynamically controlling the consumption rate: Spark Streaming provides an elastic mechanism that relates the speed at which data flows in to the speed at which it is processed, i.e., whether data is being processed in time. If it is not, Spark Streaming automatically throttles the rate at which data flows in; this is controlled by the spark.streaming.backpressure.enabled parameter.
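A minimal sketch of enabling this backpressure mechanism, using standard Spark Streaming configuration keys (the application name and the rate values are illustrative assumptions, not recommendations):

```scala
import org.apache.spark.SparkConf

// Sketch: enabling Spark Streaming backpressure via SparkConf.
val conf = new SparkConf()
  .setAppName("BackpressureDemo")
  .set("spark.streaming.backpressure.enabled", "true")     // rate adapted from batch statistics
  .set("spark.streaming.backpressure.initialRate", "1000") // initial records/sec per receiver
  .set("spark.streaming.receiver.maxRate", "10000")        // hard upper bound per receiver
```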
The principle behind dynamically controlling the consumption rate is described in the paper "Adaptive Stream Processing using Dynamic Batch Sizing".
This article is from the "Ding Dong" blog; please keep this source: http://lqding.blog.51cto.com/9123978/1784901
Lesson 17: Spark Streaming dynamic resource allocation and dynamic rate control, principle analysis