How should we view Spark in contrast to Hadoop?

Source: Internet
Author: User

The main thing is to ask: what is wrong with the MapReduce model?

First, you have to write a lot of low-level code, which is inefficient. Second, everything must be expressed as two operations, map and reduce, which is awkward in itself and does not fit every problem.
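To make the second point concrete, here is a minimal sketch in plain Python (not Hadoop or Spark code) of what it means to force a problem into map and reduce. The function names `map_phase`, `shuffle`, `reduce_phase`, and `word_count` are made up for illustration; a real framework performs the shuffle step for you and distributes the work across machines.

```python
from collections import defaultdict


def map_phase(lines):
    """Map: emit a (word, 1) pair for every word in every line."""
    for line in lines:
        for word in line.split():
            yield word.lower(), 1


def shuffle(pairs):
    """Shuffle: group values by key, as a framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: collapse each key's list of values into a single count."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(lines):
    # Hypothetical driver: the whole job is exactly one map and one reduce.
    return reduce_phase(shuffle(map_phase(lines)))


counts = word_count(["spark is fast", "hadoop is mature", "spark is flexible"])
```

Word counting fits this shape naturally. Anything that needs several passes over the data, such as an iterative algorithm, has to be split into a chain of such jobs, which is where the model starts to feel strained.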

Spark appeared precisely to solve these problems. Some background on its origins: Spark came out of the Berkeley AMPLab in 2010, with a paper published at HotCloud, and it is a successful example of a project moving from academia to industry. It also attracted investment from the top venture capital firm Andreessen Horowitz. The AMPLab is a very strong laboratory in big data and cloud computing with close ties to industry, and it had earlier produced Mesos and Hadoop Online. In 2013, several leading researchers from the AMPLab (one of them described as MIT's youngest assistant professor) left to found Databricks. Spark is written in the functional language Scala and is, put simply, an in-memory computing framework covering iterative, DAG, and streaming computation. MapReduce had often been ridiculed for its inefficiency, so Spark's arrival felt refreshing. Reynold Xin, a Spark core developer, has described Spark as outperforming Hadoop while needing only 1/10 or 1/100 as much code to implement an algorithm.

Why use Spark? The most direct answer is speed. A job that takes hours to run on Hadoop can finish in tens of seconds on Spark. That is more than an order-of-magnitude improvement; it changes the way you develop. For example, if you want to validate an algorithm and do not know how well it will work, feedback within seconds lets you adjust it immediately. Spark is also more flexible than MapReduce: it supports iterative algorithms and ad hoc queries, and you do not have to spend much effort building supporting software. In last year's sorting benchmark, Spark sorted 100 TB in 23 minutes using fewer nodes than Hadoop, breaking the world record previously held by Hadoop. Consider Hadoop and Spark running a regression algorithm. In the Hadoop world, iterative computation is very resource-intensive: every iteration pays a heavy I/O and serialization cost, so each one involves a long wait. Spark loads the data into memory on the first pass and then iterates directly on in-memory intermediate results, so the time per subsequent iteration is almost negligible.
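The difference between the two iteration styles can be sketched with a toy in plain Python. This is not Spark or Hadoop code: `FakeDisk`, `fit_reloading`, and `fit_cached` are hypothetical names, the "disk" is just an object that counts reads, and the model is a one-parameter regression y = w * x fitted by gradient descent.

```python
class FakeDisk:
    """Stand-in for distributed storage; counts how often the data set is read."""

    def __init__(self, points):
        self._points = points
        self.reads = 0

    def load(self):
        self.reads += 1
        return list(self._points)


def gradient_step(points, w, lr):
    """One gradient-descent step for y = w * x under mean squared error."""
    grad = sum(2 * (w * x - y) * x for x, y in points) / len(points)
    return w - lr * grad


def fit_reloading(disk, iterations, lr=0.05):
    """MapReduce style: each iteration is a separate job that rereads its input."""
    w = 0.0
    for _ in range(iterations):
        w = gradient_step(disk.load(), w, lr)
    return w


def fit_cached(disk, iterations, lr=0.05):
    """Spark style: read once, keep the data in memory, iterate on it."""
    points = disk.load()
    w = 0.0
    for _ in range(iterations):
        w = gradient_step(points, w, lr)
    return w


data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # points on the line y = 2x
disk_a, disk_b = FakeDisk(data), FakeDisk(data)
w_a = fit_reloading(disk_a, iterations=50)
w_b = fit_cached(disk_b, iterations=50)
```

Both functions arrive at the same weight, but `disk_a.reads` grows with the number of iterations while `disk_b.reads` stays at one. On a real cluster each of those reads also involves disk I/O and serialization, which is the cost the paragraph above describes.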
