Differences between Spark and Storm

Source: Internet
Author: User
1. Comparison
| Comparison point | Storm | Spark Streaming |
|---|---|---|
| Real-time computation model | Pure real-time: each record is processed as soon as it arrives | Quasi-real-time: data arriving within a time interval is collected and processed as one RDD |
| Real-time computation latency | Milliseconds | Seconds |
| Throughput | Low | High |
| Transaction mechanism | Supported, and mature | Supported, but less complete |
| Robustness / fault tolerance | ZooKeeper, Acker; very strong | Checkpoint, WAL; average |
| Dynamic adjustment of parallelism | Supported | Not supported |
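The latency gap between the two computation models can be illustrated with a toy calculation. The sketch below is plain Python, not Storm or Spark code; the function names, the fixed processing cost, and the sample arrival times are all assumptions made for illustration, and queueing effects are ignored.

```python
def per_record_latencies(arrival_ms, processing_ms):
    """Storm-style model: each record is processed as soon as it arrives,
    so its latency is just the (assumed constant) processing cost."""
    return [processing_ms for _ in arrival_ms]


def micro_batch_latencies(arrival_ms, batch_interval_ms, processing_ms):
    """Spark Streaming-style model: a record waits until the batch interval
    it arrived in has closed, and only then is the batch processed."""
    latencies = []
    for t in arrival_ms:
        # End of the interval that contains arrival time t.
        batch_close = (t // batch_interval_ms + 1) * batch_interval_ms
        latencies.append(batch_close - t + processing_ms)
    return latencies


arrivals = [100, 500, 900, 1200]  # hypothetical arrival times in ms
print(per_record_latencies(arrivals, 2))         # [2, 2, 2, 2]
print(micro_batch_latencies(arrivals, 1000, 2))  # [902, 502, 102, 802]
```

With a 1-second batch interval, the waiting time for the batch to close dominates the latency, which is why the table puts Spark Streaming at the level of seconds and Storm at milliseconds.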

2. Application scenarios of Spark Streaming and Storm
Storm

Storm is recommended for scenarios that require pure real-time processing and cannot tolerate delays of more than 1 second, such as real-time financial systems that need financial transactions and analysis to happen in pure real time.

In addition, if the real-time computation requires a reliable transaction mechanism and reliability guarantees, that is, every record must be processed exactly once, no more and no less, you can also consider using Storm.

If you also need to dynamically adjust the parallelism of the real-time computation program between peak and off-peak periods to make the most of cluster resources (usually in small companies, where cluster resources are tight), you can also consider using Storm.

If a big data application is purely real-time computation, with no need for interactive SQL queries, complex transformation operators and the like along the way, then Storm is the better choice.

Spark Streaming

If a real-time scenario needs none of the three things that favor Storm above, namely pure real-time processing, a strong and reliable transaction mechanism, and dynamic adjustment of parallelism, then you can consider using Spark Streaming.

The most important factor in choosing Spark Streaming should be a big-picture view of the whole project. If the project includes offline batch processing, interactive queries and other business functions in addition to real-time computing, and the real-time computing itself may also involve high-latency batch processing, interactive queries and similar functions, then the Spark ecosystem should be preferred: Spark Core for offline batch processing, Spark SQL for interactive queries, and Spark Streaming for real-time computing. This seamless integration gives the system very high extensibility.
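The selection criteria in this section can be summarized as a small decision helper. This is only a sketch of the article's rules of thumb: the `Requirements` fields and the `recommend` function are hypothetical names, and letting the Storm-specific requirements take precedence when a project also needs batch or SQL workloads is an assumption, since the article does not cover that conflict.

```python
from dataclasses import dataclass


@dataclass
class Requirements:
    max_latency_seconds: float               # largest delay the business can tolerate
    needs_exactly_once: bool = False         # reliable transaction mechanism
    needs_dynamic_parallelism: bool = False  # rescale between peak and off-peak
    needs_batch_or_sql: bool = False         # offline batch or interactive queries


def recommend(req):
    """Return (framework, reasons) following the rules of thumb above."""
    storm_reasons = []
    if req.max_latency_seconds <= 1.0:
        storm_reasons.append("cannot tolerate delays of more than 1 second")
    if req.needs_exactly_once:
        storm_reasons.append("needs a reliable transaction mechanism")
    if req.needs_dynamic_parallelism:
        storm_reasons.append("needs dynamic adjustment of parallelism")
    if storm_reasons:
        # Assumption: Storm-only requirements outweigh ecosystem convenience.
        return "Storm", storm_reasons
    if req.needs_batch_or_sql:
        return "Spark Streaming", ["shares one stack with Spark Core and Spark SQL"]
    return "Spark Streaming", ["none of the Storm-specific requirements apply"]


print(recommend(Requirements(max_latency_seconds=0.05))[0])  # Storm
print(recommend(Requirements(max_latency_seconds=5, needs_batch_or_sql=True))[0])  # Spark Streaming
```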

3. The advantages and disadvantages of Spark Streaming and Storm
In fact, Spark Streaming is by no means simply better than Storm. Both frameworks are excellent in the field of real-time computing, but the niche scenarios they are good at are not the same.

Spark Streaming is superior to Storm only in throughput, and throughput is the point that people who praise Spark Streaming and disparage Storm have always emphasized. But do all real-time computing scenarios really care that much about throughput? Not really. Therefore, it is not sound to claim that Spark Streaming is stronger than Storm on the basis of throughput alone.

In fact, Storm's real-time latency is much better than Spark Streaming's: the former is pure real-time, the latter quasi-real-time. In addition, Storm's transaction mechanism, robustness and fault tolerance, and dynamic adjustment of parallelism are all better than Spark Streaming's.

Spark Streaming does have one advantage that Storm absolutely cannot match: it is part of the Spark ecosystem technology stack, so it integrates seamlessly with Spark Core and Spark SQL. This means that we can take the intermediate data produced by real-time processing and immediately, within the same program, run high-latency batch processing, interactive queries and other operations on it. This feature greatly strengthens Spark Streaming's advantages and capabilities.
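As an illustration of that integration, here is a minimal sketch that runs a Spark SQL query on every micro-batch of a stream, using PySpark's DStream API. It assumes the pyspark package is installed; the socket source, host, port, batch interval, view name and function names are placeholders chosen for this sketch, not anything prescribed by the article.

```python
def split_words(line):
    """Split one input line into words (plain Python, no Spark needed)."""
    return line.split()


def run_word_count(host="localhost", port=9999, batch_seconds=5):
    """Start a streaming job that queries each micro-batch with Spark SQL."""
    # Imported here so that the helper above stays usable without pyspark.
    from pyspark.sql import Row, SparkSession
    from pyspark.streaming import StreamingContext

    spark = SparkSession.builder.appName("StreamingSqlSketch").getOrCreate()
    ssc = StreamingContext(spark.sparkContext, batch_seconds)

    # Each batch interval yields one RDD of words.
    words = ssc.socketTextStream(host, port).flatMap(split_words)

    def query_batch(time, rdd):
        if rdd.isEmpty():
            return
        # Reuse the session that belongs to this RDD's SparkContext.
        session = SparkSession.builder.config(
            conf=rdd.context.getConf()).getOrCreate()
        df = session.createDataFrame(rdd.map(lambda w: Row(word=w)))
        df.createOrReplaceTempView("words")
        # Interactive-style SQL over the data that just streamed in.
        session.sql(
            "SELECT word, COUNT(*) AS total FROM words GROUP BY word"
        ).show()

    words.foreachRDD(query_batch)
    ssc.start()
    ssc.awaitTermination()
```

Calling `run_word_count()` while a text source is listening on the chosen port (for example `nc -lk 9999`) prints a word-count table for each batch. The same RDDs could just as well be handed to ordinary Spark Core batch code inside `query_batch`.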
