Don't put everything in one Hadoop database


Those who recommend putting everything into a Hadoop database have clearly forgotten some of the bitter lessons from the history of databases.

There's a proverb that says "elephants never forget," but I have my doubts about it. I know of one particular kind of elephant, named Hadoop, that doesn't seem to remember what the enterprise data warehousing (EDW) market went through on its road to maturity. Some of the products now on the Hadoop stage don't seem to have addressed flaws that plagued that market for years, and they keep making the same mistakes.

I am skeptical that Hadoop can or should be the central hub for all of an enterprise's analytics data.

In the early days of big data, the EDW field put forward its own idea of "putting all your eggs in one basket." Although building a single-version-of-truth data warehouse for all analytic subject domains made sense in theory, few customers were willing to spend the money, time, and resources needed to consolidate their various analytic databases onto a single platform. In the EDW market, many enterprises did consolidate their core system-of-record data, yet we still see tactical data warehouses, data marts, operational data stores, online analytical processing (OLAP) databases, and other analytic databases dedicated to particular regions, business domains, applications, and users.

In the Hadoop era, the idea of a single "enterprise data hub" still has its critics. For example, a recent article by Loraine Lawson questions an equivalent concept, the Hadoop-centered "data lake." Lawson compares the idea to the "Big Rock Candy Mountain": a data-centric architecture in which distributed computing eliminates data silos. Citing Edd Dumbill's discussion of data lakes, Lawson writes: "Dumbill points out that developers at Google and Facebook already live this dream completely, proving that it's not just a developer's dream."

I don't follow the logic behind Dumbill's argument. The fact that particular developers live this dream does not prove that it is anything more than a developer's dream. The developers in question, at Google and Facebook, were early developers and users of Hadoop, at two companies that built their own web services on the platform. Nor does their experience prove that the dream exists outside Silicon Valley.

In fact, user thinking in the big data era has begun shifting toward a "hybrid deployment" model. This hybrid model consolidates EDW, Hadoop, NoSQL, in-memory, and other data platforms into a cloud-capable heterogeneous infrastructure.

In a hybrid architecture, the "data lake" dream seems to fit one particular big data deployment role: an experimental sandbox. The sandbox serves as a data consolidation and statistical modeling hub for teams of data scientists who need to sift through vast amounts of structured data. As I've noted before, data scientists around the world are adopting Hadoop as their data sandbox.

Hadoop is becoming a critical application deployment and execution platform for big data analytics, and I have no quarrel with the outlook for the data lake in that role. Data scientists are the key application developers of the big data era, and Hadoop is rapidly becoming a multipurpose distributed execution layer, capable of running a wide range of tasks written in many different languages.

But that doesn't mean Hadoop will be the only platform. In fact, all big data platforms, including Hadoop, massively parallel processing (MPP) EDW, NoSQL, in-memory, and streaming, are application development and execution platforms. Any notion that a single platform will unify all analytics-centric application development is mistaken.
