
Five reasons not to use Hadoop for data analysis

Last Update:2014-12-10 Source: Internet

Author: User



I was once a staunch supporter of Hadoop. I liked how easily it handles petabyte-scale data, how it extends distributed computing to thousands of nodes, and how flexibly it stores and loads data. But after a period of exploration and use, I became very disappointed with Hadoop.

Here's why I don't use Hadoop for data analysis.

--Hadoop is just a framework, not a complete solution. People expect Hadoop to solve big data analysis problems satisfactorily, but in practice it works well only for simple problems; complex problems still require us to write map/reduce code. In that respect, building on Hadoop is indistinguishable from developing a business analytics solution in a Java EE programming environment.

--Pig and Hive are very good, but they are limited by the architecture. Pig and Hive are ingenious tools that let people get started quickly and improve productivity. But they are only tools that translate regular SQL or script text into map/reduce jobs on a Hadoop environment. Their performance is bounded by that of the map/reduce framework, especially for operations that require communication between nodes (such as sorts and joins).

--There is no software cost and deployment is relatively easy, but maintenance and development are expensive. Hadoop is popular because anyone is free to download, install, and run it. Because it is an open source project, there is no software cost, which makes it a very attractive replacement for Oracle and Teradata. But once you enter the maintenance and development phase, the real cost of Hadoop becomes apparent.

--Good at big data analysis, but performs poorly in some specific areas. Hadoop is very good at large-scale data analysis and at transforming raw data into the useful data that applications such as search or text mining need. But if we don't know exactly what to analyze and want to explore the data by pattern matching, Hadoop quickly becomes a mess. Of course, Hadoop is very flexible, but writing the map/reduce code takes a long time.

--Parallel processing performance is excellent, but there are exceptions. Hadoop can put thousands of nodes to work on a computation, which gives it great performance potential. But not all work can be done in parallel, for example data analysis that involves user interaction. If you design applications that are not specifically optimized for Hadoop clusters, performance is not ideal, because each map/reduce task waits for the previous job to complete.
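To make the last point concrete, here is a minimal in-memory sketch of the map/reduce pattern in Python. It is not Hadoop code and uses no Hadoop API; `run_job`, `top_words`, and the other names are hypothetical helpers invented for this illustration. It shows how even a simple "most frequent words" question turns into two jobs, where the second cannot start until the first has produced all of its output.

```python
from collections import defaultdict


def run_job(records, mapper, reducer):
    """Run one map/reduce job in memory and return its complete output."""
    groups = defaultdict(list)
    # Map phase: each input record may emit zero or more (key, value) pairs.
    for record in records:
        for key, value in mapper(record):
            # Shuffle: group values by key. On a real cluster this is the
            # step that moves data between nodes.
            groups[key].append(value)
    # Reduce phase: starts only after all map output has been grouped.
    output = []
    for key in sorted(groups):
        output.extend(reducer(key, groups[key]))
    return output


# Job 1: count how often each word occurs.
def wc_map(line):
    for word in line.lower().split():
        yield word, 1


def wc_reduce(word, counts):
    yield word, sum(counts)


# Job 2: pick the n most frequent words. Every pair is sent to a single
# key so that one reducer sees all counts.
def top_map(pair):
    word, count = pair
    yield "all", (count, word)


def make_top_reduce(n):
    def top_reduce(_key, pairs):
        # Highest count first; ties broken alphabetically.
        ranked = sorted(pairs, key=lambda p: (-p[0], p[1]))
        for count, word in ranked[:n]:
            yield word, count
    return top_reduce


def top_words(lines, n):
    counts = run_job(lines, wc_map, wc_reduce)            # job 1 must finish
    return run_job(counts, top_map, make_top_reduce(n))   # before job 2 starts


if __name__ == "__main__":
    sample = [
        "hadoop is a framework",
        "hive runs on hadoop",
        "pig runs on hadoop",
    ]
    print(top_words(sample, 2))  # [('hadoop', 3), ('on', 2)]
```

In SQL the same question is a single query with GROUP BY, ORDER BY, and LIMIT. The sketch suggests why a tool that translates such a query into chained jobs inherits the cost of each shuffle and of the wait between jobs.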

To sum up, Hadoop is indeed an impressive computational framework that can perform large-scale data analysis. On the other hand, that analysis has to be built on a lot of programming work. (Compiled by Zhang Zhiping)

(Responsible editor: The good of the Legacy)

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
