How does Spark compare with Hadoop?

Last Update:2015-04-23 Source: Internet

Author: User


The key question is: what is wrong with the MapReduce model?

First, you have to write a lot of low-level code, which is inefficient. Second, everything must be expressed as just two operations, map and reduce. That is an awkward fit for many problems and cannot cover every case.
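To make the second point concrete, here is a minimal sketch in plain Python (not the Hadoop API) of a job forced into map and reduce phases. The helper names `run_job`, `word_mapper`, `sum_reducer`, `to_single_key`, and `max_reducer` are hypothetical and exist only for illustration. Counting words fits the model naturally, but the follow-up question "which word is most frequent?" already needs a second full map/reduce pass with an artificial single key.

```python
from collections import defaultdict


def map_phase(records, mapper):
    """Apply the mapper to every record and collect the emitted (key, value) pairs."""
    pairs = []
    for record in records:
        pairs.extend(mapper(record))
    return pairs


def shuffle(pairs):
    """Group values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups, reducer):
    """Collapse each key's list of values into a single result."""
    return {key: reducer(key, values) for key, values in groups.items()}


def run_job(records, mapper, reducer):
    """One complete map -> shuffle -> reduce pass (a toy stand-in for one job)."""
    return reduce_phase(shuffle(map_phase(records, mapper)), reducer)


# Job 1: word count, the classic case that fits the model well.
def word_mapper(line):
    return [(word, 1) for word in line.split()]


def sum_reducer(key, values):
    return sum(values)


# Job 2: finding the most frequent word has to be squeezed into the same shape,
# by sending every (count, word) pair to one artificial key.
def to_single_key(item):
    word, count = item
    return [("max", (count, word))]


def max_reducer(key, values):
    return max(values)


lines = ["spark and hadoop", "hadoop map reduce", "spark in memory"]
word_counts = run_job(lines, word_mapper, sum_reducer)
top = run_job(word_counts.items(), to_single_key, max_reducer)
print(word_counts)
print(top)
```

In a real cluster, each `run_job` call would be a separate job with its own read from and write to storage, which is where much of the overhead described here comes from.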

Spark emerged to solve exactly these problems. First, some background on where Spark came from. It originated at Berkeley's AMPLab in 2010 and was published at HotCloud. It is a successful example of technology moving from academia to industry, and it attracted investment from the top venture capital firm Andreessen Horowitz. AMPLab is a very strong laboratory with close ties to industry in big data and cloud computing, and it had previously produced Mesos and Hadoop Online. In 2013, these experts (including MIT's youngest assistant professor) left Berkeley's AMPLab to found Databricks. Spark is written in the functional language Scala and is, simply put, an in-memory computing framework (covering iterative computing, DAG computing, and stream computing). MapReduce had often been ridiculed for its inefficiency, so Spark's arrival felt refreshing. Reynold, a core Spark developer, has said that Spark outperforms Hadoop while needing only 1/10 or even 1/100 of the code to implement an algorithm.

Why use Spark? The most direct answer is speed. A job that takes hours on Hadoop can finish in tens of seconds on Spark. That is more than an order-of-magnitude improvement: it fundamentally changes how you develop. For example, when you want to validate an algorithm and do not know how well it will work, getting feedback in seconds lets you adjust it immediately. Spark is also more flexible than MapReduce: it supports iterative algorithms and ad hoc queries, and you do not need to spend much effort building the surrounding software. In last year's sorting benchmark, Spark sorted 100 TB in 23 minutes using fewer nodes than Hadoop, breaking the world record previously held by Hadoop.

Consider Hadoop and Spark running a regression algorithm. In the Hadoop world, iterative computation is very resource-intensive: every iteration incurs heavy I/O and serialization costs, so each iteration involves a long wait. Spark loads the data into memory on the first run and then iterates directly in memory on the intermediate results, so the time per iteration is almost negligible.
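The difference between reloading data on every iteration and iterating over data held in memory can be illustrated with a toy simulation in plain Python. This is only a sketch under stated assumptions: it does not use Spark or Hadoop, the read counter stands in for disk I/O, and the names `make_loader`, `fit_reloading`, and `fit_cached` are hypothetical. Both loops fit a one-parameter linear regression by gradient descent and arrive at the same weight; only the number of simulated reads differs.

```python
def make_loader(points):
    """Return a loader plus a counter of how often the simulated 'disk' is read."""
    stats = {"reads": 0}

    def load():
        stats["reads"] += 1  # stand-in for an expensive read from storage
        return list(points)

    return load, stats


def gradient_step(data, w, lr):
    """One gradient-descent step on mean squared error for the model y = w * x."""
    grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
    return w - lr * grad


def fit_reloading(load, iterations, lr=0.05):
    """MapReduce-style loop in this toy model: input is re-read on every iteration."""
    w = 0.0
    for _ in range(iterations):
        data = load()
        w = gradient_step(data, w, lr)
    return w


def fit_cached(load, iterations, lr=0.05):
    """In-memory loop in this toy model: input is read once and then reused."""
    data = load()
    w = 0.0
    for _ in range(iterations):
        w = gradient_step(data, w, lr)
    return w


points = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # generated by y = 2x
load_a, stats_a = make_loader(points)
load_b, stats_b = make_loader(points)

w_reload = fit_reloading(load_a, 50)
w_cached = fit_cached(load_b, 50)

print("reloading:", w_reload, "reads:", stats_a["reads"])
print("cached:   ", w_cached, "reads:", stats_b["reads"])
```

The arithmetic per iteration is identical in both loops, so any speed difference in a real system comes from the repeated reads and serialization, not from the algorithm itself.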


This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only.
