The five reasons why data analysis does not use Hadoop

Source: Internet
Author: User
Keywords Can analyze no work solution we

I was once a staunch supporter of Hadoop. I like it to handle PB-level data easily, like it can extend operations to thousands of-node http://www.aliyun.com/zixun/aggregation/13452.html "> Distributed computing Capabilities, It also likes the flexibility of storing and loading data. But after a series of explorations and use, I was very disappointed with Hadoop.

Here's why I don't use Hadoop for data analysis.

--hadoop is just a framework, not a complete solution. Hadoop is expected to satisfactorily solve large data analysis problems, but the fact is that Hadoop is OK for simple problems, and it still requires us to develop map/reduce code for complex problems. So it looks like Hadoop and the way the Business Analytics solution is developed using the Java EE programming environment indistinguishable!

--pig and hive are very good, but they are limited by the architecture. Pig and hive are ingenious tools that allow people to get started quickly and improve productivity. But they're just a tool for translating regular SQL or text into map/reduce queries on Hadoop environments. Pig and Hive are limited by the operational performance of the Map/reduce framework, especially in the case of node communications (such as sorting and connectivity).

-Without software costs, deployment is relatively easy, but maintenance and development costs a lot. The reason Hadoop is very popular is that we are free to download, install, and run. Because it is an open source project, there is no software cost, which makes it a very attractive solution to replace Oracle and Teradata. But once you get into the maintenance and development phase, the real cost of Hadoop is highlighted.

--Good at large data analysis, but poor performance in some specific areas. Hadoop is very good at large data analysis and the useful data needed to transform raw data into applications such as search or text mining. But if we don't know exactly what to analyze, and we want to explore the data in a pattern-matching way, Hadoop will soon be a mess. Of course, Hadoop is very flexible, but it takes you a long time to write map/reduce code.

The performance of parallel processing is excellent, but no exceptions are excluded. Hadoop can put thousands of nodes into the calculation, which is very performance potential. But not all work can be done in parallel, such as data analysis with user interaction. If you're designing applications that aren't specifically optimized for Hadoop clusters, performance is not ideal because each map/reduce task waits for the previous job to complete.

To sum up, Hadoop is indeed a shocking computational framework that can perform large-scale data analysis. On the other hand, this means that data analysis must be based on a lot of programming work. (Zhang Zhiping/compiling)

(Responsible editor: The good of the Legacy)

Related Article

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.