A comparison of two high-performance parallel computing engines: Storm and Spark

Source: Internet
Author: User

From http://blog.csdn.net/iefreer/article/details/32715153

Spark is based on the idea that when the data set is large, it is more efficient to send the computation to the data than to send the data to the computation. Each node stores (or caches) its part of the data set, and tasks are then submitted to the nodes.

In other words, the process is passed to the data. This is very similar to Hadoop map/reduce, except that Spark makes aggressive use of memory to avoid I/O operations, so iterative algorithms (where the output of one step is the input of the next) perform better.
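The idea of sending the computation to cached data can be illustrated with a small toy model. This is plain Python, not the Spark API: the `Node` class, the `iterative_mean` function, and the damped update rule are all assumptions made up for this sketch. It only shows that each pass ships a function to the nodes and reuses the partitions already held in memory.

```python
class Node:
    """A hypothetical worker that keeps its partition cached in memory."""

    def __init__(self, partition):
        self.partition = list(partition)

    def run(self, task):
        # The task (a function) is sent to the data; only its small
        # result is sent back.
        return task(self.partition)


def iterative_mean(nodes, steps):
    """Refine an estimate of the mean over several passes.

    Each pass feeds the previous estimate back in and reuses the cached
    partitions, so no data is reloaded between iterations.
    """
    estimate = 0.0
    for _ in range(steps):
        def residuals(partition, current=estimate):
            # Runs "on the node": sum of differences and item count.
            return sum(x - current for x in partition), len(partition)

        partials = [node.run(residuals) for node in nodes]
        total = sum(p[0] for p in partials)
        count = sum(p[1] for p in partials)
        estimate += 0.5 * total / count  # damped step toward the mean
    return estimate


nodes = [Node([1, 2, 3]), Node([4, 5, 6])]
print(iterative_mean(nodes, 30))  # converges toward the true mean, 3.5
```

In a real cluster the partitions would live on different machines; the point of the sketch is only that the data stays where it is while the function moves.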

Shark is simply a query engine built on Spark (it supports ad hoc analytical queries).

Storm's architecture is the opposite of Spark's. Storm is a distributed stream computing engine. Each node implements a basic computation step, and data items flow in and out through a network of interconnected nodes. In contrast to Spark, this passes the data to the process.
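The data-to-process model can be sketched as a chain of small processing steps through which items flow one at a time. The example below is a toy in plain Python generators, not the Storm API. The names borrow Storm's terminology (a "spout" produces data, a "bolt" processes it), but the functions themselves are hypothetical and run in a single process.

```python
from collections import Counter


def sentence_spout(sentences):
    """Source step: emits one sentence at a time."""
    for sentence in sentences:
        yield sentence


def split_bolt(stream):
    """Processing step: splits each incoming sentence into words."""
    for sentence in stream:
        for word in sentence.split():
            yield word


def count_bolt(stream):
    """Processing step: keeps running counts and emits each update at once."""
    counts = Counter()
    for word in stream:
        counts[word] += 1
        yield word, counts[word]


# Wire the steps together; each item flows through the whole chain
# before the next one is read from the source.
topology = count_bolt(split_bolt(sentence_spout(["a b", "a"])))
for word, count in topology:
    print(word, count)
```

Each step stays fixed while the data moves past it, which is the reverse of shipping a task to a stored data set.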

Both frameworks are used for parallel computation over large amounts of data.

Storm is better at processing large numbers of small data items as they are generated (for example, computing aggregate functions or running analysis over a Twitter data stream in real time).

Spark works on an existing, complete data set (such as Hadoop data) that has already been imported into the Spark cluster. Because Spark manages data in memory, it can scan it quickly and minimize global I/O operations for iterative algorithms.

However, Spark's streaming module (Spark Streaming) is similar to Storm (both are stream computing engines), although the two are not exactly the same.

Spark Streaming first collects incoming data into batches and then distributes them as blocks (treated as immutable data sets), whereas Storm processes and distributes each data item as soon as it is received.
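The difference between the two delivery models can be shown with two small generator functions. This is a simplified sketch in plain Python, not either framework's API: `per_record` and `micro_batch` are hypothetical names, and batches here are cut by item count for simplicity, whereas a real micro-batch engine would normally cut them by time interval.

```python
def per_record(stream, handle):
    """Record-at-a-time: hand every item over the moment it arrives."""
    for item in stream:
        yield handle((item,))


def micro_batch(stream, batch_size, handle):
    """Micro-batching: collect items, then process each batch as one
    immutable block (a tuple)."""
    batch = []
    for item in stream:
        batch.append(item)
        if len(batch) == batch_size:
            yield handle(tuple(batch))
            batch = []
    if batch:
        # Flush the final, possibly smaller, batch.
        yield handle(tuple(batch))


print(list(per_record(range(3), sum)))
print(list(micro_batch(range(5), 2, sum)))
```

With per-record delivery a result is available after every item, while with batching the first result appears only once a whole batch has been collected. That waiting time is one source of the latency difference between the two approaches.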

It is not clear which approach has the advantage in data throughput, but Storm has lower processing latency.

In summary, Spark and Storm follow opposite designs, while Spark Streaming is similar to Storm. Spark Streaming provides a sliding window over the data, whereas in Storm the window has to be maintained by the application itself.
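As a rough illustration of what maintaining a sliding window by hand involves when the engine does not provide one, the sketch below keeps the last few values in a bounded queue and recomputes an aggregate on every arrival. The class name `SlidingWindowSum` and the count-based window are assumptions for this example; real streaming windows are usually defined by time rather than by item count.

```python
from collections import deque


class SlidingWindowSum:
    """Hypothetical hand-rolled window over the most recent `size` values."""

    def __init__(self, size):
        # A deque with maxlen drops the oldest value automatically
        # when a new one is appended to a full window.
        self.window = deque(maxlen=size)

    def add(self, value):
        """Add a value and return the sum over the current window."""
        self.window.append(value)
        return sum(self.window)


window = SlidingWindowSum(3)
for value in [1, 2, 3, 4]:
    print(window.add(value))
```

In a distributed setting this state would also have to be partitioned and recovered after failures, which is the part a built-in window takes care of.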


