A comparison of Spark Streaming and Storm for stream computing


Spark Streaming and Storm are both popular real-time stream computing frameworks and are widely used in real-time computing scenarios. Spark Streaming is an extension built on Spark and appeared later than Storm. This chapter compares the two from the following angles as a reference for choosing between them.

A. Data processing model

Spark Streaming is a real-time stream computing framework built on Spark. It cuts the incoming stream into batches by time window, turns each batch into an input RDD for Spark, generates a job for that RDD, and queues the job for execution in the Spark computing framework. Underneath, it relies on Spark's resource scheduling and task computing framework. In other words, Spark Streaming processes data in batches and generates computation tasks for the data: it moves the computation rather than the data. Storm, by contrast, streams data into the compute nodes: it moves the data rather than the computation. In Storm, batch processing over time windows has to be implemented by the user, as described in the earlier chapters of the Storm series.
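The time-window batching described above can be illustrated with a small sketch in plain Python. It is not Spark code: the function names `micro_batches` and `run_job`, the sample events, and the 1000 ms interval are all assumptions made for illustration. It only shows the idea of cutting a stream into fixed intervals and running one job per resulting batch.

```python
from collections import defaultdict

def micro_batches(events, interval_ms):
    """Group (timestamp_ms, value) events into fixed-length time batches.

    Returns a list of (batch_start_ms, [values]) sorted by batch start,
    loosely mimicking how each batch interval yields one input dataset.
    """
    buckets = defaultdict(list)
    for ts, value in events:
        # Every event falls into the interval that contains its timestamp.
        batch_start = (ts // interval_ms) * interval_ms
        buckets[batch_start].append(value)
    return sorted(buckets.items())

def run_job(batch):
    """Hypothetical per-batch job: count how often each value occurs."""
    counts = {}
    for value in batch:
        counts[value] = counts.get(value, 0) + 1
    return counts

# Hypothetical input stream: (arrival time in ms, payload).
events = [(100, "a"), (900, "b"), (1200, "a"), (1900, "a"), (2100, "b")]

# One job per batch, executed in batch order.
results = [(start, run_job(values)) for start, values in micro_batches(events, 1000)]
```

Here the computation (`run_job`) is applied to each batch as a whole, which is the "move the computation, not the data" view; a record-at-a-time system would instead call its processing logic once per arriving event.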

B. Ecosystem

Because Spark Streaming is built on Spark, it can be combined with other Spark components to support ad hoc interactive queries, machine learning with MLlib, and more. Storm, in contrast, is purely a stream computing framework and lacks this kind of integration with the existing Hadoop ecosystem.

C. Latency and throughput

Because Spark Streaming processes data in batches and depends on Spark's scheduling and computing framework, its latency is higher than Storm's: the minimum latency is generally around 2 s, whereas Storm can stay within 100 ms. On the other hand, because Spark Streaming processes data in batches, its overall throughput is relatively high.
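One reason batching adds latency is that an event cannot be processed until the batch it belongs to has closed. The following sketch computes only that waiting time; it ignores scheduling and processing time. The function name `batching_delay_ms`, the 2000 ms interval, and the arrival times are assumed values for illustration, not measurements.

```python
def batching_delay_ms(arrival_ms, interval_ms):
    """Time an event waits until its batch interval ends.

    This covers only the wait for the batch boundary; job scheduling
    and execution time would come on top of it.
    """
    batch_end = (arrival_ms // interval_ms + 1) * interval_ms
    return batch_end - arrival_ms

# Hypothetical arrival times (ms) within a single 2000 ms batch interval.
arrivals = [100, 700, 1500, 1900]
delays = [batching_delay_ms(t, 2000) for t in arrivals]
average_delay = sum(delays) / len(delays)
```

Under these assumptions an event that arrives early in the interval waits almost the full interval, which is why shrinking the batch interval is the main lever for reducing latency in a batch-based design.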

D. Fault tolerance

Spark Streaming achieves fault tolerance through lineage and by keeping two copies of the data in memory. Lineage records the operations previously applied to an RDD, so if a node fails at runtime, the lost results can be recomputed on another node from the backup copy.
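The combination of replicated input and recorded lineage can be sketched as a toy model in plain Python. The class `LineageDataset`, the node names, and the sample data are hypothetical and are not part of Spark's API; the sketch only shows why replaying recorded operations on a surviving copy reproduces the lost result.

```python
class LineageDataset:
    """Toy stand-in for an RDD: replicated input plus recorded transformations."""

    def __init__(self, replicas, lineage=()):
        # Maps node name -> copy of the input data, or None if that copy is lost.
        self.replicas = replicas
        # Ordered functions to apply to every record (the "lineage").
        self.lineage = tuple(lineage)

    def map(self, fn):
        # Record the transformation instead of running it, so it can be replayed.
        return LineageDataset(self.replicas, self.lineage + (fn,))

    def compute(self):
        # Take the first surviving copy of the input and replay the lineage on it.
        for node, data in self.replicas.items():
            if data is not None:
                out = list(data)
                for fn in self.lineage:
                    out = [fn(x) for x in out]
                return node, out
        raise RuntimeError("all replicas of the input were lost")

# Two in-memory copies of the same received data (hypothetical node names).
replicas = {"node-1": [1, 2, 3], "node-2": [1, 2, 3]}
ds = LineageDataset(replicas).map(lambda x: x * 10).map(lambda x: x + 1)

first_node, first_result = ds.compute()
replicas["node-1"] = None  # simulate node-1 failing at runtime
second_node, second_result = ds.compute()
```

After the simulated failure, the same result is obtained from the copy on the other node, because the lineage is deterministic and the input was kept twice.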

Storm uses ACK (acker) components to track the flow of each piece of data through the topology, which costs much more than Spark Streaming's approach.
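To give a feel for this kind of per-tuple tracking, here is a simplified model in plain Python. Storm's documentation describes the acker as keeping one running XOR of tuple ids per root tuple, which returns to zero once every tuple in the tree has been acknowledged. The class `Acker`, its method names, and the fixed ids below are assumptions for illustration and are not Storm's API (real ids are random 64-bit values).

```python
class Acker:
    """Simplified model of XOR-based tuple-tree tracking."""

    def __init__(self):
        # Maps root id -> running XOR of the ids of tuples still in flight.
        self.pending = {}

    def track(self, root_id, tuple_id):
        """Register a newly emitted root tuple."""
        self.pending[root_id] = tuple_id

    def ack(self, root_id, input_id, emitted_ids=()):
        """Record that input_id was processed and emitted_ids were derived from it.

        Returns True once the whole tuple tree has been acknowledged.
        """
        value = input_id
        for emitted in emitted_ids:
            value ^= emitted
        # Acked ids cancel out; newly emitted ids are added to the pending set.
        self.pending[root_id] ^= value
        if self.pending[root_id] == 0:
            del self.pending[root_id]
            return True
        return False

acker = Acker()
acker.track("msg-1", 0x1A)                      # source emits the root tuple
step1 = acker.ack("msg-1", 0x1A, [0x2B, 0x3C])  # first stage emits two children
step2 = acker.ack("msg-1", 0x2B)                # first child fully processed
step3 = acker.ack("msg-1", 0x3C)                # second child fully processed
```

Every tuple in the tree causes at least one update of the tracking state, which illustrates where the extra cost of record-level tracking comes from.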

E. Transactional semantics

Spark Streaming guarantees that data is processed exactly once, and it provides this guarantee at the level of a batch.

Storm's tracking mechanism ensures that each record is processed at least once. If state must be updated exactly once, the user has to implement that guarantee.

So for stateful computations with strict transactional requirements, Spark Streaming is the better choice.
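As an illustration of what "implemented by the user" can mean on top of at-least-once delivery, the sketch below makes a state update idempotent by remembering which record ids have already been applied. This is one possible approach under stated assumptions, not Storm's built-in mechanism: the class `CounterStore`, the record ids, and the replayed delivery are all hypothetical.

```python
class CounterStore:
    """Keyed counters that apply each record id at most once."""

    def __init__(self):
        self.counts = {}
        self.applied = set()  # ids of records whose update is already in the state

    def apply(self, record_id, key, amount=1):
        """Apply an update unless this record id was seen before.

        Returns True if the state changed, False for a duplicate delivery.
        """
        if record_id in self.applied:
            return False
        self.counts[key] = self.counts.get(key, 0) + amount
        self.applied.add(record_id)
        return True

store = CounterStore()

# At-least-once delivery: record 2 is replayed after a simulated failure.
deliveries = [(1, "clicks"), (2, "clicks"), (2, "clicks"), (3, "views")]
outcomes = [store.apply(record_id, key) for record_id, key in deliveries]
```

The replayed record is detected and skipped, so the counters reflect each record exactly once even though one record was delivered twice. A real implementation would also need to persist the applied ids together with the state, which is part of the effort the user takes on.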
