Handling empty RDDs in Spark Streaming and stopping a streaming program gracefully

Source: Internet
Author: User

Contents of this issue:

    • Handling empty RDDs in Spark Streaming
    • Stopping a Spark Streaming program gracefully

  

Since Spark Streaming produces an RDD in every batch duration, empty RDDs are highly likely to appear, and how they are handled affects both running efficiency and the effective use of resources.

Spark Streaming receives data continuously, without knowing what state the received data is in at any given moment; forcing it to stop can therefore leave operations incomplete or cause consistency problems.

1. Handling empty RDDs in Spark Streaming:

foreachRDD is the core method (operator) through which DStreams trigger real action operations.

  When writing data to a database, if foreachPartition and the database write are executed even though the RDD is empty (the same applies when saving data to HDFS), compute resources are acquired for nothing.

  How can resources be saved and efficiency improved as much as possible? Add a check before processing:
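A sketch of this guard in plain Python, using a minimal stand-in for an RDD (`FakeRDD` and `save_batch` are illustrative names, not Spark's API):

```python
# Minimal stand-in for an RDD: the data is held as a list of partitions.
class FakeRDD:
    def __init__(self, partitions):
        self.partitions = partitions  # e.g. [[1, 2], [3]]

    def count(self):
        # Like Spark's count(), this visits every partition,
        # which on a real cluster launches a full job.
        return sum(len(p) for p in self.partitions)


def save_batch(rdd, writer):
    """Write a batch out, but only if it actually contains data."""
    # The guard: skip empty batches entirely, so no write job
    # (and no database connection) is ever started for them.
    if rdd.count() == 0:
        return False
    for partition in rdd.partitions:
        for record in partition:
            writer.append(record)
    return True
```

The guard keeps empty batches from doing any work; as the article notes next, however, a count itself has to scan all the data, so it is not the cheapest possible check.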

  Judging by the data itself is still not ideal, because a count operation launches a job of its own and wastes resources. The candidate checks are as follows:

  If there are several partitions but their contents are all empty, take may still launch a job:
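The trade-off can be sketched in plain Python (again with an illustrative `FakeRDD`, not Spark's API): looking at the partition list is a free metadata check, while `take(1)` must scan partitions until it finds a record, so when partitions exist but are all empty it still ends up doing the work of a job. Spark's own `RDD.isEmpty` combines the two checks in this order:

```python
class FakeRDD:
    def __init__(self, partitions):
        self.partitions = partitions  # list of lists, e.g. [[], [1]]

    def take(self, n):
        # Like Spark's take(): scan partitions one by one until n
        # records are found. If every partition is empty, all of
        # them are visited, i.e. a job is still launched.
        taken = []
        for partition in self.partitions:
            for record in partition:
                taken.append(record)
                if len(taken) == n:
                    return taken
        return taken

    def is_empty(self):
        # Mirrors the logic of Spark's RDD.isEmpty: a free metadata
        # check first, then take(1) only when partitions exist.
        return len(self.partitions) == 0 or len(self.take(1)) == 0
```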

  What can be observed when no data arrives: an RDD is still generated for every batch, but since no data came in, no block is generated, and the resulting RDD contains no partitions.

  Even when there are partitions, the job will not be executed if there is no blockId.

 Summary:

In fact, you can not generate the RDD, because it is necessary to maintain a concept, each bachduration will produce a job,job if there is no rdd can not be produced;

The job is generated at each interval, and if there is no job at the time of submission, what action does your action take, and on the surface it does not produce an RDD efficiency;

But at the scheduling level depends on each batchduration generated job, the scheduling level to determine whether there is an rdd, no RDD job will not be able to execute.

2. Stopping a Spark Streaming program:

  In general, how is a Spark Streaming application stopped? By calling stop on its StreamingContext.

 This way of stopping halts the streams, but it does not wait for all data processing to complete, and by default the SparkContext is stopped as well.

Use the stopGracefully option to handle this:
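The difference between the two stop modes can be illustrated with a toy model in plain Python (`ToyStreamingContext` is an illustrative class, not Spark's API): received batches wait in a queue, and a graceful stop drains that queue before shutting down, while a plain stop discards it.

```python
from collections import deque

# Toy model of a streaming context: received batches wait in a
# queue until they are processed.
class ToyStreamingContext:
    def __init__(self, batches):
        self.pending = deque(batches)  # received but not yet processed
        self.processed = []

    def stop(self, stop_gracefully=False):
        if stop_gracefully:
            # Graceful stop: drain everything already received
            # before shutting down, so no batch is lost.
            while self.pending:
                self.processed.append(self.pending.popleft())
        # A non-graceful stop simply drops whatever is still pending.
        self.pending.clear()


forced = ToyStreamingContext(["batch-1", "batch-2"])
forced.stop()                        # pending batches are discarded

graceful = ToyStreamingContext(["batch-1", "batch-2"])
graceful.stop(stop_gracefully=True)  # pending batches are processed first
```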

  When the application starts, stopOnShutdown is registered as a shutdown hook, with the stop callback passed in.
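Besides calling stop with stopGracefully set to true, Spark Streaming also provides the configuration flag `spark.streaming.stopGracefullyOnShutdown`; when it is enabled, the shutdown hook stops the streams gracefully. A configuration fragment (not runnable code):

```properties
# spark-defaults.conf (or pass via --conf on spark-submit)
spark.streaming.stopGracefullyOnShutdown  true
```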

  If there is still unprocessed data at that point, this is reported before the application stops.

  Summary: with stopGracefully, all received data is processed to completion before the application stops.

Note:

      • Data from: Liaoliang (Spark release version customization)
      • Sina Weibo: http://www.weibo.com/ilovepains
