Spark Customization 005: Tracing the Spark Streaming Framework's Runtime Source Code Through a Case

Contents of this lecture:

A. Review and demonstration of the case: online dynamic computation of the most popular products in each category
B. Walking through the Spark Streaming runtime source code based on the case

Note: This lecture is based on Spark 1.6.1 (the latest Spark release as of May 2016).

Review of the Previous Lesson

In the last lesson, we explored the Spark Streaming architecture from a business perspective. A Spark Streaming program has two parts: the driver and the executors. By analyzing the driver and the executors, we saw how to guarantee complete semantics and transactional consistency, that is, zero data loss and exactly-once transaction processing.

With the Kafka Direct API, the executors consume data directly from Kafka and the offsets are managed directly. As a result, no data is consumed twice, and this is what makes transactional processing possible.
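The direct-consumption idea can be illustrated with a minimal Python simulation. It assumes a toy in-memory log in place of Kafka. `DirectStreamSim` and its methods are hypothetical names, not Spark's API, and `OffsetRange` only loosely mirrors Spark's class of the same name. The driver fixes each batch's per-partition `[from, until)` offset range before the executors read it. Because of that, re-running a batch re-reads exactly the same records, and offsets advance only after the batch's output has succeeded. The `auto_offset_reset` parameter models how `"largest"` and `"smallest"` pick the starting offset when nothing has been stored yet.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OffsetRange:
    """One partition's [from_offset, until_offset) slice (loosely modeled on Spark's OffsetRange)."""
    partition: int
    from_offset: int
    until_offset: int


class DirectStreamSim:
    """Hypothetical toy model of direct (receiver-less) Kafka consumption.

    `log` maps partition -> list of messages and stands in for Kafka.
    The next offset to read is tracked here, not by the message broker.
    """

    def __init__(self, log, auto_offset_reset="largest"):
        self.log = log
        # With no stored offsets, "largest" starts at the end of each partition
        # (only new data is read); "smallest" starts at the beginning.
        self.next_offsets = {
            p: (len(msgs) if auto_offset_reset == "largest" else 0)
            for p, msgs in log.items()
        }

    def plan_batch(self):
        """Driver side: fix this batch's offset ranges before any data is read."""
        return [
            OffsetRange(p, self.next_offsets[p], len(self.log[p]))
            for p in sorted(self.log)
        ]

    def read(self, ranges):
        """Executor side: read exactly the planned ranges, so a re-run is deterministic."""
        return [
            msg
            for r in ranges
            for msg in self.log[r.partition][r.from_offset:r.until_offset]
        ]

    def commit(self, ranges):
        """Advance offsets only after the batch's output has succeeded."""
        for r in ranges:
            self.next_offsets[r.partition] = r.until_offset
```

A real direct stream also has to persist the committed offsets, for example in a checkpoint or together with the output, so that they survive a driver restart. This sketch keeps them only in memory.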

Setting spark.task.maxFailures to 1, spark.speculation to false (off), and auto.offset.reset to "largest" prevents Spark Streaming from writing the same output data multiple times.
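The reason the first two settings matter can be shown with a toy Python model. `run_output_task` and its `fail_after` fault-injection parameter are hypothetical. Only the two configuration keys are real Spark settings, and Spark's default for `spark.task.maxFailures` is 4. With a non-idempotent sink such as a plain append, a retried task writes the same records again after a crash. A speculative duplicate of a task does the same even when nothing fails.

```python
# Configuration values recommended in the lecture (real Spark configuration keys).
SPARK_CONF = {"spark.task.maxFailures": "1", "spark.speculation": "false"}


def run_output_task(records, sink, max_failures, speculation, fail_after=None):
    """Toy model of one output task under Spark-style retries and speculation.

    `fail_after` (hypothetical fault injection) makes the first attempt crash
    after writing that many records. Returns True if some attempt finished.
    """
    for attempt in range(1, max_failures + 1):
        crashed = False
        for i, rec in enumerate(records):
            if attempt == 1 and fail_after is not None and i == fail_after:
                crashed = True  # partial output is already in the sink
                break
            sink.append(rec)  # non-idempotent side effect (plain append)
        if not crashed:
            if speculation:
                # A speculative copy of the same task also runs to completion.
                sink.extend(records)
            return True
    return False  # out of attempts: the task, and therefore the job, fails
```

With the recommended values a failed task is not retried, so nothing is written twice and the batch fails instead. The partial write left behind in that case shows why the output logic itself still needs business-level control.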

Finally, transform and foreachRDD let business logic control data consumption and output so that neither is repeated. These two methods act like back doors into Spark: they expose the underlying RDDs, which can then be manipulated in any way you can imagine.
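One way to apply this kind of business-logic control is sketched below in plain Python. It assumes a hypothetical output store that records each batch's time together with that batch's results. `process_batch` stands in for a `transform` step (per-category product counts, echoing the case) followed by a `foreachRDD` output step. None of these names are Spark APIs. A replayed batch is recognized by its batch time and skipped, so its output is not written twice.

```python
class TransactionalSink:
    """Hypothetical output store that keeps results and committed batch times together."""

    def __init__(self):
        self.rows = []
        self.committed_batches = set()

    def write_batch(self, batch_time, results):
        """What a foreachRDD body might do: skip batches that were already committed."""
        if batch_time in self.committed_batches:
            return False  # replayed batch: its output is already present
        # In a real store these two updates would form one atomic transaction.
        self.rows.extend(results)
        self.committed_batches.add(batch_time)
        return True


def process_batch(batch_time, records, sink):
    """transform-like step (count products per category), then a foreachRDD-like write."""
    counts = {}
    for category, product in records:
        key = (category, product)
        counts[key] = counts.get(key, 0) + 1
    results = sorted((batch_time, c, p, n) for (c, p), n in counts.items())
    return sink.write_batch(batch_time, results)
```

Keying the commit on the batch time works because Spark Streaming assigns each batch a fixed time. A sketch keyed on the batch's offset ranges would serve the same purpose.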

Lecture

Case Source




(From Ding Liqing, Shanghai)

Note:
1. DT Big Data Dream Factory WeChat official account: Dt_spark
2. Top Spark expert: Liaoliang
3. Sina Weibo: http://www.weibo.com/ilovepains
