The deployment of Hadoop requires careful consideration

Source: Internet
Author: User
Keywords Express need to run very

In recent years, Hadoop has received a lot of praise, as well as "moving to the Big data analysis engine". For many people, Hadoop means big data technology. But in fact, open source distributed processing framework may not be able to solve all the big data problems. This requires companies that want to deploy Hadoop to think carefully about when to apply Hadoop and when to apply other products.

For example, using Hadoop for large-scale unstructured or semi-structured data can be said to be more than sufficient. But the speed with which it handles small datasets is little known. This limits the use of Hadoop in the Metamarkets group. Metamarkets Group is located in San Francisco, providing real-time marketing analysis for online advertising.

Metamarkets CEO Michael Driscoll revealed that, in a tight time, the company uses Hadoop to process large, distributed data, including running a day end report to review the day's turnover, or to view historical data several months ago.

But in its core business of delivering to customers-running real-time analytics-metamarkets does not use Hadoop. Driscoll that the best approach is to run a batch job in a database to view each file. In the final analysis, this is a trade-off: Hadoop sacrifices speed in order to establish a deep correlation between data points. Driscoll said: "Using Hadoop is like having a pen pal, you write a letter to him, send it to the past few days before getting a reply." This is a far cry from the experience (SMS) or email. ”

10gen's product marketing manager, also MongoDB NoSQL database developer Kelly Stirman, says that fast response is critical online, while Hadoop is constrained by time. For example, online analytics applications such as product recommendation engines rely on fast processing of small amounts of information, but Hadoop does not do it effectively.

Do not consider the permutation database

Because open source technology has greatly reduced the cost of technology, some companies may consider scrapping traditional data warehouses to select Hadoop clusters. But Carl Olofson, a market research analyst at IDC, says the two are simply not comparable.

Olofson says relational databases provide the power for most data warehouses to accommodate data flows that are remitted over a period of time at fixed frequencies, such as transactions in daily business processes. On the other hand, Hadoop is good at processing large amounts of cumulative data.

Because Hadoop is primarily used in large-scale projects, which themselves require a large number of servers and employees with specialized projects and data management techniques, the cost of each unit of data is lower than the relational database, but the supplement to Hadoop is expensive. Olofson said: "The cost of these needs to add a plus will find that it is only ' seemingly ' cheap." ”

The company needs specialized development skills because Hadoop uses the MapReduce software program framework that is known to developers, says Todd Goldman, vice president of enterprise data Integration at Informatica, a software vendor. This makes it difficult for companies to get data from SQL databases in Hadoop.

Many vendors have developed associated software to help implement data movement between Hadoop systems and relational databases. But Goldman says there is too much regulation for open source technology for many organizations. "There's no need to reinvent all of the company's data structures for Hadoop," he said. ”

It's helpful, not overly publicized.

Goldman cites an example of a temporary space and data integration platform for running extract, transform, and load (ETL) functions to illustrate when to apply Hadoop. The app may not be as exciting as the hype of Hadoop, but it's especially about when it needs to merge big files. At this point, the processing power of Hadoop comes in handy.

Driscoll says that Hadoop is good at extracting, transforming, and loading (ETL) operations because it can isolate integration tasks from many servers in a cluster. By using Hadoop to integrate, organize data, and load it into a data warehouse or other database, companies are able to understand whether it is reasonable to invest in technical investments in larger projects that require better use of Hadoop scalability.

Of course, industry pioneers such as Google, Yahoo, Facebook and Amazon.com were users of Hadoop many years ago. The new technology to make up for Hadoop's flaws has also been formed. For example, some vendors have already released tools to analyze Hadoop data in real time, and Hadoop 2.0 will also allow applications other than MapReduce to run on Hadoop.

Ultimately, Stirman, it is important for it and business decision-makers to be able to think clearly about how Hadoop can be combined with their work. Hadoop can support many useful data analysis functions and is undoubtedly a powerful tool, but it still needs to be presented in the form of technology.

"There is so much hype about Hadoop that people think it can do almost anything," says

Stirman. The reality is that it is a very complex technology and still immature. Organizations need to focus on and control it in order to make it worthwhile. The

Related Article

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.