Spark Streaming and Storm are two popular real-time stream computing frameworks that are widely used in real-time processing scenarios. Spark Streaming is an extension built on Spark and appeared later than Storm. This chapter compares the two from the following angles, which can serve as a reference when choosing between them.
A. Data processing model
Spark Streaming is a real-time stream computing framework built on Spark. It uses a time-based batch window to turn the input stream into RDDs, generates a job for each RDD, and queues those jobs for execution on the Spark engine; the bottom layer relies on Spark's resource scheduling and task execution framework. In other words, Spark Streaming processes data in small batches and ships computation to where the data is (moving computation, not data). Storm, on the other hand, streams data through the compute nodes (moving data, not computation), and batch processing over time windows has to be implemented by the user, as described in the earlier Storm chapters.
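The contrast between the two models can be sketched in a few lines of plain Python. This is an illustration only; none of these names are Spark or Storm APIs.

```python
# Illustrative sketch: micro-batch (Spark Streaming style) vs.
# record-at-a-time (Storm style) processing. Names are made up.
from typing import Callable, Iterable, List

def micro_batch(events: Iterable[int], batch_size: int,
                job: Callable[[List[int]], int]) -> List[int]:
    """Group events into time/size windows and run one job per batch (per RDD)."""
    results, batch = [], []
    for e in events:
        batch.append(e)
        if len(batch) == batch_size:      # the batch window closes
            results.append(job(batch))    # one job queued per batch
            batch = []
    if batch:
        results.append(job(batch))
    return results

def per_record(events: Iterable[int],
               op: Callable[[int], int]) -> List[int]:
    """Every record flows through the operator individually as it arrives."""
    return [op(e) for e in events]

print(micro_batch(range(10), 4, sum))          # [6, 22, 17]
print(per_record(range(5), lambda x: x * 2))   # [0, 2, 4, 6, 8]
```

Note that in the micro-batch model the job only runs when a window closes, whereas in the per-record model each event is handled immediately; this difference drives the latency comparison below.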
B. Ecosystem
Spark Streaming is built on Spark, so it can be combined with other Spark components, such as interactive (ad hoc) queries and machine learning with MLlib. Storm, by contrast, is purely a stream computing framework and integrates far less with the existing Hadoop ecosystem.
C. Latency and throughput
Spark Streaming processes data in batches and relies on Spark's scheduling and execution framework, so its latency is higher than Storm's: the minimum latency is typically around 2 s, while Storm can reach latencies within 100 ms. On the other hand, precisely because Spark Streaming processes data in batches, its overall throughput is relatively high.
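A back-of-the-envelope model shows where the micro-batch latency floor comes from. The numbers below are assumed for illustration, not benchmarks.

```python
# Illustrative latency model for micro-batching (assumed numbers).
batch_interval = 1.0    # seconds between micro-batches
processing_time = 0.5   # time to run the job for one batch

# A record arriving just after a window closes must wait a full
# interval before it is even scheduled, so its worst-case latency is:
worst_case = batch_interval + processing_time   # 1.5 s
# A record arriving just before the cut-off only pays processing time:
best_case = processing_time                     # 0.5 s

print(f"micro-batch latency: {best_case:.1f}s .. {worst_case:.1f}s")
# A per-record framework has no batching wait, only per-tuple
# processing, which is how Storm stays in the sub-100 ms range.
```

So the batch interval effectively sets a lower bound on latency, which is why Spark Streaming cannot match Storm's per-record latency no matter how fast each batch job runs.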
D. Fault tolerance
Spark Streaming achieves fault tolerance through lineage plus two in-memory copies of the received data: lineage records the chain of operations that produced each RDD, so if a node fails at runtime, the lost results can be recomputed on other nodes from the backed-up input data.
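The idea of lineage-based recovery can be sketched as follows. This is a toy model, not Spark's API: a "partition" remembers its replicated source data and the transformations applied to it, so a lost result can be rebuilt by replaying the lineage.

```python
# Toy sketch of lineage-based recovery (illustrative, not Spark's API).
class Partition:
    def __init__(self, source):
        self.source = list(source)   # replicated input (the in-memory copy)
        self.lineage = []            # recorded transformations
        self.result = list(source)

    def map(self, fn):
        self.lineage.append(fn)
        self.result = [fn(x) for x in self.result]
        return self

    def recompute(self):
        """Rebuild the result on another node by replaying the lineage
        against the surviving replica of the source data."""
        data = list(self.source)
        for fn in self.lineage:
            data = [fn(x) for x in data]
        return data

p = Partition([1, 2, 3]).map(lambda x: x + 1).map(lambda x: x * 10)
assert p.result == [20, 30, 40]
p.result = None                  # simulate losing the node holding the result
print(p.recompute())             # [20, 30, 40] -- recovered via lineage
```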
Storm instead uses acker components to track every tuple as it flows through the topology, which is considerably more expensive than Spark Streaming's approach.
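Storm's ack tracking is based on a clever XOR trick: the acker keeps one running value per spout tuple and XORs in the id of every tuple that is emitted and every tuple that is acked; when the value returns to zero, the whole tuple tree has been fully processed. The sketch below is simplified and the names are illustrative.

```python
# Simplified sketch of Storm's XOR-based ack tracking (illustrative).
class Acker:
    def __init__(self):
        self.pending = {}  # spout tuple id -> running XOR of child tuple ids

    def emit(self, root, tuple_id):
        """A bolt anchors a new child tuple to the spout tuple `root`."""
        self.pending[root] = self.pending.get(root, 0) ^ tuple_id

    def ack(self, root, tuple_id):
        """A bolt acks a tuple; XORing the same id twice cancels it out."""
        self.pending[root] ^= tuple_id
        if self.pending[root] == 0:
            del self.pending[root]
            return True          # tree complete: spout tuple fully processed
        return False

acker = Acker()
t1, t2 = 0x1A2B, 0x3C4D         # in Storm these are random 64-bit ids
acker.emit("spout-1", t1)        # bolt emits two child tuples
acker.emit("spout-1", t2)
print(acker.ack("spout-1", t1))  # False: t2 still outstanding
print(acker.ack("spout-1", t2))  # True: XOR back to zero, tree complete
```

The per-tuple bookkeeping this implies (one emit and one ack message per tuple) is the overhead the paragraph above refers to.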
E. Transactional guarantees
Spark Streaming guarantees that data is processed exactly once, with the guarantee applying at the level of a batch.
Storm's tracking mechanism ensures that each record is processed at least once; if state must be updated exactly once, the user has to implement that themselves.
So for stateful computations that need stronger transactional guarantees, Spark Streaming is the better choice.
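One common way a user can turn at-least-once delivery into effectively-once state updates is to deduplicate on a record id before applying each update. The sketch below is illustrative, not a Storm API.

```python
# Sketch: deduplicating on a record id so that replayed (at-least-once)
# records do not update state twice. Illustrative, not a Storm API.
class CounterState:
    def __init__(self):
        self.seen = set()    # ids of records already applied
        self.total = 0

    def update(self, record_id, value):
        if record_id in self.seen:
            return False     # replayed record: skip, keep state exactly-once
        self.seen.add(record_id)
        self.total += value
        return True

state = CounterState()
state.update("r1", 5)
state.update("r2", 3)
state.update("r1", 5)        # at-least-once replay of r1 is ignored
print(state.total)           # 8
```

In practice the `seen` set has to be bounded (e.g. by expiring old ids), but the principle is the same: the framework guarantees delivery, and the user's state logic supplies idempotence.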