The content of this lecture:
A. Review and demonstration of the case: online dynamic computation of the most popular products per category
B. A case-driven walkthrough of the Spark Streaming runtime source code
Note: This lecture is based on Spark 1.6.1 (the latest version of Spark as of May 2016).
Review of the Previous Lecture
In the last lesson, we explored the Spark Streaming architecture from the perspective of transaction processing. A Spark Streaming program is divided into two parts: the driver side and the executor side. By analyzing both the driver and the executor, we gained insight into how to achieve transactional consistency with exactly-once semantics, guaranteeing zero data loss and that each record is processed exactly once.
Data is consumed directly through the Kafka Direct API: all executors read from Kafka directly and manage the offsets themselves, so no data is consumed twice. This is the foundation for transactional processing.
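As a concrete sketch, this is roughly what direct Kafka consumption looks like with the Spark 1.6 API; the broker addresses and topic name below are placeholders:

```scala
import kafka.serializer.StringDecoder
import org.apache.spark.SparkConf
import org.apache.spark.streaming.kafka.KafkaUtils
import org.apache.spark.streaming.{Seconds, StreamingContext}

object DirectKafkaSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("DirectKafkaSketch")
    val ssc = new StreamingContext(conf, Seconds(5))

    // Placeholder broker list and topic name.
    val kafkaParams = Map("metadata.broker.list" -> "broker1:9092,broker2:9092")
    val topics = Set("popular-products")

    // Each executor reads its assigned Kafka partitions directly, and the
    // offsets are tracked by Spark Streaming itself rather than ZooKeeper,
    // so no record is consumed twice.
    val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
      ssc, kafkaParams, topics)

    stream.map(_._2).print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```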
To prevent Spark Streaming from writing the same output multiple times, set spark.task.maxFailures to 1, turn spark.speculation off, and set auto.offset.reset to "largest".
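A minimal sketch of these settings, assuming a standard Spark 1.6 job (the application name is a placeholder):

```scala
import org.apache.spark.SparkConf

object ExactlyOnceConf {
  // Spark-side settings: fail fast rather than retry a task, and disable
  // speculation, so no task attempt can write its output more than once.
  val conf = new SparkConf()
    .setAppName("ExactlyOnceConf") // placeholder name
    .set("spark.task.maxFailures", "1")
    .set("spark.speculation", "false")

  // auto.offset.reset is a Kafka consumer setting, passed through
  // kafkaParams when creating the direct stream.
  val kafkaParams = Map("auto.offset.reset" -> "largest")
}
```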
Finally, transform and foreachRDD let business-logic code control consumption and output so that neither is repeated. These two operators are like Spark's back door: they expose the underlying RDDs of each batch and can be manipulated in any way you can imagine.
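To illustrate the pattern, here is a minimal sketch; the DStream type and the upsertToStore sink are hypothetical stand-ins for real business logic:

```scala
import org.apache.spark.streaming.dstream.DStream

object ExactlyOnceOutput {
  // Hypothetical idempotent sink; a real job would upsert into an external store.
  def upsertToStore(key: String, value: Int): Unit =
    println(s"UPSERT $key -> $value")

  // Assumes a DStream[(String, Int)] of (productId, clicks).
  def writeExactlyOnce(stream: DStream[(String, Int)]): Unit = {
    // transform exposes each batch as an RDD, so arbitrary RDD logic
    // (aggregation, deduplication) can run before any output happens.
    val aggregated = stream.transform(rdd => rdd.reduceByKey(_ + _))

    // foreachRDD is the output "back door": this code has full control
    // over how and where each batch is written.
    aggregated.foreachRDD { rdd =>
      rdd.foreachPartition { partition =>
        partition.foreach { case (productId, clicks) =>
          // An idempotent upsert keyed by productId means a replayed
          // batch overwrites rather than duplicates previous output.
          upsertToStore(productId, clicks)
        }
      }
    }
  }
}
```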
Lecture
Case Source Code
Contributed by Ding Liqing (Shanghai)
Note:
1. DT Big Data Dream Factory WeChat public account: Dt_spark
2. Spark master-level expert: Liaoliang
3. Sina Weibo: http://www.weibo.com/ilovepains
Spark customization class 005: A walkthrough of the runtime source code of the Spark Streaming stream computing framework