Contents of this issue:
- Spark Streaming dynamic resource allocation
- Spark Streaming dynamic control of the consumption rate
Why dynamic adjustment is required:
Spark uses coarse-grained resource allocation: by default, resources are allocated up front, before computation starts. The benefit of coarse granularity is that the resources are already assigned, so a task can use them directly the moment it needs to compute. The drawback, from the Spark Streaming point of view, is that traffic has peaks and troughs, and the resources needed at peak and off-peak times are not the same; if resources are allocated for the peak, they sit wasted during the troughs. Moreover, a Spark Streaming program runs continuously, so the ongoing consumption and management of its resources are also factors that need to be considered.
One, Spark Streaming dynamic resource allocation:
Where dynamic resource allocation comes from:
It is enabled by setting the corresponding configuration in SparkConf.
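A minimal configuration sketch, assuming the standard dynamic-allocation keys (the values are only examples; dynamic allocation also requires the external shuffle service):

```scala
import org.apache.spark.SparkConf

val conf = new SparkConf()
  // Turn on dynamic executor allocation.
  .set("spark.dynamicAllocation.enabled", "true")
  // Required so shuffle data survives executor removal.
  .set("spark.shuffle.service.enabled", "true")
  // Example bounds on the executor count.
  .set("spark.dynamicAllocation.minExecutors", "2")
  .set("spark.dynamicAllocation.maxExecutors", "20")
  // An executor idle longer than this is released (the 60-second scan below).
  .set("spark.dynamicAllocation.executorIdleTimeout", "60s")
```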
A timer scans the executors at a fixed frequency. Running tasks are scheduled onto different executors, so executors need to be added or removed dynamically. Take a 60-second interval as an example: if no task has run on an executor within that interval, the executor is removed. How is an executor removed? The driver keeps a data structure holding a reference to every executor running in the current application; each time tasks are scheduled, that executor list is traversed and the list of available resources is consulted. The clock in this class loops continuously, checking whether the conditions for adding or removing executors are met, and when they are, it triggers the addition or removal (a simplified sketch follows).
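A minimal sketch of that timer-driven idle scan; the names here (IdleExecutorScanner, recordTask, removeExecutor) are illustrative assumptions, not Spark's actual ExecutorAllocationManager internals:

```scala
import java.util.concurrent.{Executors, TimeUnit}
import scala.collection.mutable

class IdleExecutorScanner(idleTimeoutMs: Long = 60000L) {
  // Driver-side bookkeeping: executor id -> time of its last finished task.
  private val lastTaskTime = mutable.Map.empty[String, Long]
  private val timer = Executors.newSingleThreadScheduledExecutor()

  // Called whenever a task finishes on an executor.
  def recordTask(executorId: String): Unit = this.synchronized {
    lastTaskTime(executorId) = System.currentTimeMillis()
  }

  // Scan on a fixed schedule; executors idle past the timeout are removed.
  def start(removeExecutor: String => Unit): Unit = {
    val scan = new Runnable {
      override def run(): Unit = IdleExecutorScanner.this.synchronized {
        val now = System.currentTimeMillis()
        val idle = lastTaskTime.collect {
          case (id, t) if now - t > idleTimeoutMs => id
        }
        idle.foreach { id =>
          removeExecutor(id) // trigger the removal, as the timer loop does
          lastTaskTime -= id
        }
      }
    }
    timer.scheduleAtFixedRate(scan, idleTimeoutMs, idleTimeoutMs, TimeUnit.MILLISECONDS)
  }
}
```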
From the Spark Streaming point of view, the dynamic resource adjustment in question is the dynamic adjustment of executor resources. What is the biggest challenge? Spark Streaming runs batch by batch according to the batch duration: one batch may need a lot of resources while the next does not, and by the time the resources for the current batch have been adjusted, that batch may already have finished running, so the adjustment comes too late.
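One hedged way to react at batch granularity is to watch batch statistics through the public StreamingListener interface and request executors via SparkContext.requestExecutors (a developer API); the threshold and the one-executor step here are illustrative assumptions, not Spark's built-in behavior:

```scala
import org.apache.spark.SparkContext
import org.apache.spark.streaming.scheduler.{StreamingListener, StreamingListenerBatchCompleted}

// If a batch takes longer to process than the batch duration, request more
// executors before the backlog grows. Sketch only; production logic would
// also scale down and smooth the decision over several batches.
class BatchScalingListener(sc: SparkContext, batchDurationMs: Long)
    extends StreamingListener {
  override def onBatchCompleted(batch: StreamingListenerBatchCompleted): Unit = {
    val processingMs = batch.batchInfo.processingDelay.getOrElse(0L)
    if (processingMs > batchDurationMs) {
      // Falling behind: ask for one more executor (step size is an assumption).
      sc.requestExecutors(1)
    }
  }
}

// Registration, given an existing StreamingContext `ssc`:
// ssc.addStreamingListener(new BatchScalingListener(ssc.sparkContext, 1000L))
```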
Second, dynamic control of the consumption rate:
Spark Streaming's elasticity mechanism watches how incoming stream data is handled, that is, the relationship between the arrival rate and whether processing can keep up in time; if processing cannot keep up, the rate at which data flows in is throttled dynamically. Spark Streaming also has a rate control that can be adjusted manually; manual tuning requires a feel for how fast Spark Streaming processes data within one batch duration, and then capping the rate of incoming stream data accordingly. Alternatively, the batch duration itself can be adjusted so that each batch takes in more or less data. The relevant settings are sketched below.
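A configuration sketch of both the automatic and the manual rate controls (these are standard Spark Streaming keys; the values are only examples):

```scala
import org.apache.spark.SparkConf

val conf = new SparkConf()
  // Automatic: let Spark estimate the ingestion rate from recent
  // batch statistics (backpressure, available since Spark 1.5).
  .set("spark.streaming.backpressure.enabled", "true")
  // Manual cap: max records per second per receiver.
  .set("spark.streaming.receiver.maxRate", "10000")
  // Manual cap for the direct Kafka API: max records per second per partition.
  .set("spark.streaming.kafka.maxRatePerPartition", "2000")
```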
Remark:
- Source: Liaoliang (Spark release version customization)
- Sina Weibo: http://www.weibo.com/ilovepains