Cloudera aims to make Hadoop a universal data solution

Cloudera's vision of Hadoop as an enterprise data hub is bold, but reality looks quite different. Hadoop still has a long way to go before it eclipses other big data solutions.

When your hammer is big enough, everything looks like a nail. This is one of the many potential problems facing Hadoop 2.0. For now, the biggest concern for developers and end users is that Hadoop 2.0 has substantially overhauled the framework for big data processing. Cloudera plans to make Hadoop 2.0 a universal hammer that can drive every kind of nail.

There is no doubt that Hadoop 2.0 delivers a significant performance boost over previous releases. In the past, Hadoop was merely a batch-processing framework for MapReduce jobs. Hadoop 2.0 is now a general-purpose framework for deploying applications across a cluster of nodes, and MapReduce is just one of the workloads that can run on it, a change that clearly excites Cloudera. In a keynote at the O'Reilly Strata-Hadoop conference in New York at the end of October 2013, Cloudera presented the idea of an "enterprise data hub" powered by Hadoop. Data of every kind can be fed into this hub, where it is processed appropriately and extracted on demand.

That sounds appealing, but how feasible is it? For companies that have not yet found the right home for their massive data farms, such a hub remains a distant prospect. Folding these "data islands" into a Hadoop installation is not easy.

Although Hadoop presents big hurdles of its own, the biggest obstacle to this vision is not Hadoop itself. Talking with vendors and users at the Strata-Hadoop conference, we found that both simply treat Hadoop as a pile of loose parts that must be welded together before it is fully functional.

Much of Hadoop's functionality is being delivered by third parties, which package it into ready-to-deploy offerings. These include not only Cloudera and its rival Hortonworks, but also Microsoft (a Hortonworks partner), Amazon, SoftLayer, Rackspace, and other cloud service providers. Only a few of them fail to provide the various levels of abstraction that software tools require; with these offerings, Puppet or Python scripts become an option rather than a requirement.

Even in small deployments, the sheer number of moving parts and rough edges in Hadoop is daunting. At a breakout session, Oracle product manager Dan McClary described the hard work Oracle put into building its Hadoop tools. This gave us a glimpse of how much effort it takes to turn Hadoop into a deliverable product, even for a company as large as Oracle. McClary says that, over time, the combined efforts of the community and vendors will smooth out Hadoop's rough edges and imperfections, but that day will certainly not come soon.

Another major obstacle is migrating applications to Hadoop. The new infrastructure built on Hadoop YARN (Yet Another Resource Negotiator) is more open than ever, but existing applications must be rewritten to run on it, and that is not easy. Some stopgap tools may emerge to speed up the process; for example, virtualization wrappers could be plugged into the framework, but even that is no simple job.

Much work is underway in the industry, such as building connectors and data pipelines, to make Hadoop work better with existing applications. Although most people believe existing applications will eventually migrate to Hadoop, few of the conference sessions focused on actually migrating them. Most people would still rather bring their existing applications to Hadoop than discard them and start over.

In short, the level of activity at the O'Reilly conference is a telling indicator of how long this situation will last. At this time in 2014, when the conference is held at the Cloudera Conference Center in Manhattan, New York, some of these claims may not inspire much optimism. For now, the trend is to develop Hadoop as a complement to existing big data systems rather than as an upgrade that replaces them.
