Companies deploying Hadoop need to think carefully
Source: Internet
Author: User
KeywordsInstall wwwhttp run express
In recent years, Hadoop has received a lot of praise, as well as "moving to the Big data analysis engine". For many people, Hadoop means big data technology. But in fact, open source distributed processing framework may not be able to solve all the big data problems. This requires companies that want to deploy Hadoop to think carefully about when to apply Hadoop and when to apply other products.
For example, using Hadoop for large-scale unstructured or semi-structured data can be said to be more than sufficient. But the speed with which it handles small datasets is little known. This limits the use of Hadoop in the Metamarkets group. Metamarkets Group is located in San Francisco, providing real-time marketing analysis for online advertising.
Metamarkets CEO Michael Driscoll revealed that, in a tight time, the company uses Hadoop to process large, distributed data, including running a day end report to review the day's turnover, or to view historical data several months ago.
But in its core business of delivering to customers-running real-time analytics-metamarkets does not use Hadoop. Driscoll that the best approach is to run a batch job in a database to view each file. In the final analysis, this is a trade-off: Hadoop sacrifices speed in order to establish a deep correlation between data points. Driscoll said: "Using Hadoop is like having a pen pal, you write a letter to him, send it to the past few days before getting a reply." This is a far cry from the experience (SMS) or email. ”
10gen's product marketing manager, also MongoDB NoSQL database developer Kelly Stirman, says that fast response is critical online, while Hadoop is constrained by time. For example, online analytics applications such as product recommendation engines rely on fast processing of small amounts of information, but Hadoop does not do it effectively.
Do not consider the permutation database
Because open source technology has greatly reduced the cost of technology, some companies may consider scrapping traditional data warehouses to select Hadoop clusters. But Carl Olofson, a market research analyst at IDC, says the two are simply not comparable.
Olofson says relational databases provide the power for most data warehouses to accommodate data flows that are remitted over a period of time at fixed frequencies, such as transactions in daily business processes. On the other hand, Hadoop is good at processing large amounts of cumulative data.
The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion;
products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the
content of the page makes you feel confusing, please write us an email, we will handle the problem
within 5 days after receiving your email.
If you find any instances of plagiarism from the community, please send an email to:
info-contact@alibabacloud.com
and provide relevant evidence. A staff member will contact you within 5 working days.