Many beginners run into confusion when learning big data, especially around the three computational frameworks MapReduce, Storm, and Spark.
Which one is suitable for processing large volumes of data? Which one is suitable for real-time stream processing? And how do we tell them apart?
I've collected the basics of these three computational frameworks so that you can get an overall picture of them.
Big Data Learning Group 119599574
MapReduce
Distributed Offline Computing Framework
It is mainly suited to large-scale cluster tasks; because jobs execute in batches, latency is high and timeliness is low.
MapReduce natively supports development in Java; other languages must be developed with Hadoop Streaming.
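MapReduce's batch model boils down to a map phase, a shuffle that groups values by key, and a reduce phase. The classic word-count example can be sketched in plain Python (no Hadoop required; the function names here are illustrative, not a real Hadoop API):

```python
from collections import defaultdict

def map_phase(line):
    # Map: emit a (word, 1) pair for every word in one line of input
    return [(word, 1) for word in line.split()]

def shuffle(pairs):
    # Shuffle: group all values by key, as Hadoop does between map and reduce
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    # Reduce: sum the counts for one word
    return key, sum(values)

lines = ["big data with hadoop", "big data with spark"]
mapped = [pair for line in lines for pair in map_phase(line)]
counts = dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
print(counts)  # {'big': 2, 'data': 2, 'with': 2, 'hadoop': 1, 'spark': 1}
```

In real Hadoop, each phase runs distributed across the cluster and intermediate results are written to disk, which is exactly why the batch model has high latency.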
Spark
Spark is a fast, general-purpose computing engine designed for large-scale data processing; it performs iterative, in-memory computation.
Spark retains the benefits of MapReduce while significantly improving timeliness, making it a good fit for systems that require iterative computation and low latency.
Developers can write data analysis jobs in languages such as Java, Scala, or Python, and use more than 80 advanced operators.
Spark is fully compatible with HDFS and works alongside other Hadoop components, including YARN and HBase.
Spark can handle a variety of job types, such as real-time data analysis, machine learning, and graph processing. It suits recommendation and computing systems that can tolerate small delays.
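Spark's programming model chains operators over datasets held in memory. The toy class below imitates that style in plain Python; it is a sketch of the idea, not the real PySpark API (the name `MiniRDD` is invented for illustration):

```python
class MiniRDD:
    """A toy stand-in for Spark's RDD: each transformation produces a
    new in-memory dataset, so chains of operators never touch disk."""

    def __init__(self, data):
        self._data = list(data)  # held in memory, like a cached RDD

    def map(self, fn):
        # Transformation: apply fn to every element
        return MiniRDD(fn(x) for x in self._data)

    def filter(self, pred):
        # Transformation: keep only elements matching the predicate
        return MiniRDD(x for x in self._data if pred(x))

    def collect(self):
        # Action: materialize the results back to the driver
        return self._data

result = (MiniRDD(range(10))
          .map(lambda x: x * x)
          .filter(lambda x: x % 2 == 0)
          .collect())
print(result)  # [0, 4, 16, 36, 64]
```

Because intermediate datasets stay in memory, an iterative algorithm can re-run transformations over the same data cheaply, which is where Spark gains over MapReduce's write-to-disk-between-jobs model.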
Storm
Storm is a distributed, reliable and fault-tolerant streaming computing framework.
Storm was designed for real-time processing, so it is widely used for real-time analytics, performance monitoring, and other areas with strict timeliness requirements.
In theory, Storm supports any language; only a small amount of adapter code is needed.
Storm keeps cluster state in ZooKeeper (or on local disk), so its background daemons are stateless — they keep no state of their own, since everything lives in ZooKeeper — and can fail or be restarted without affecting the health of the system.
Storm can be applied to stream processing, continuous computation (continuously pushing results to clients so that data such as site metrics can be updated and presented in real time), and distributed remote procedure calls (making CPU-intensive operations easy to parallelize).
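The continuous-computation idea — process each tuple as it arrives and push fresh results downstream, rather than waiting for a finished batch — can be sketched in plain Python. This is only an illustration of the streaming model, not Storm's actual spout/bolt API:

```python
from collections import Counter

def word_count_bolt(stream):
    """A toy stand-in for a Storm bolt: consumes tuples one at a time
    and emits an updated running count after each one."""
    counts = Counter()
    for word in stream:           # each tuple is processed as it arrives
        counts[word] += 1
        yield word, counts[word]  # push the fresh value downstream

# a "spout" feeding the stream; here just a finite sample of events
spout = iter(["click", "view", "click", "click"])
updates = list(word_count_bolt(spout))
print(updates)  # [('click', 1), ('view', 1), ('click', 2), ('click', 3)]
```

In a real Storm topology the spout would be unbounded (e.g. a message queue) and the bolt instances would run in parallel across the cluster, but the per-tuple, always-up-to-date character is the same.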
How do you learn Hadoop development in 4 months and land a job with a yearly salary of 250,000?
I'm sharing 18 of the latest Hadoop big data tutorials for free, along with 100 must-know Hadoop big data interview questions.
The tutorials have helped 300+ people successfully move into Hadoop development; 90% of them started at salaries above 20K, doubling their previous wages.
They were recorded by a core Hadoop architect at Baidu (T7 level).
The content covers three parts: a primer for complete beginners, the Hadoop ecosystem, and hands-on real business projects. The business cases let you work with a real production environment and sharpen your own development skills.
To learn big data, start by figuring out the differences between Spark, Storm, and MapReduce.