Six major misconceptions about Hadoop

Hadoop and big data have become practically synonymous. But as the hype around big data has grown, so have the misunderstandings about how Hadoop applies to it.


Hadoop is an open-source software framework for storing and analyzing large datasets, and it works with data distributed across many existing servers. It is designed to handle diverse, high-volume data from mobile phones, e-mail, social media, sensor networks and other channels, and it is often described as an operating system for big data. And that is the first source of misunderstanding:


1, Hadoop is a complete solution.


That is not the case. Whether you call it a "framework" or a "platform," Hadoop should not be thought of as the solution to every big data problem.


"There is no standard Hadoop product on the market," says Phil Simon, author of "Too Big to Ignore: The Business Case for Big Data." "It is not like a database, where you can get a standard product from IBM or SAP."


However, Simon does not see this as a long-term problem. First, because Hadoop is an open-source project, many Hadoop-related projects, such as Cassandra and HBase, have grown up to meet specific needs. HBase, for example, provides a distributed database that supports structured data storage for large tables.


In addition, just as Red Hat, IBM and other vendors packaged Linux into a variety of user-friendly products, many big data startups are doing the same with Hadoop. So while Hadoop itself is not a complete solution, most organizations will actually encounter it as part of a relatively complete big data solution.


2, Hadoop is a database.


Hadoop is often talked about as if it were a database, but it is not. "There is nothing like querying or indexing in the core Hadoop platform," said Marshall Bockrath-Vandegrift, a software engineer at the security company Damballa. Damballa uses Hadoop to analyze real-time security risks.


"We use HBase to help our risk analysts run real-time queries against passive DNS data," he added. "HBase and other real-time technologies are not only complementary to Hadoop; most of them rely on Hadoop's core distributed storage technology (HDFS) for high-performance access to distributed datasets."

"Hadoop is not built to replace the database system, but it can be used to build a database system," said Prateek Gupta, a scientist at BloomReach, a data marketing analytics company.
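
The distinction between scanning and querying can be sketched in a few lines of Python. This is a toy illustration only, not Hadoop or HBase code: the record layout, the domain names and the functions `scan_for_domain` and `build_index` are all hypothetical. A batch job in the Hadoop style reads every record to answer a question, while a database layered on top maintains an index so that a single key can be looked up directly.

```python
from collections import defaultdict

# Hypothetical passive DNS observations: (domain, resolved IP address).
RECORDS = [
    ("example.com", "192.0.2.10"),
    ("example.org", "192.0.2.20"),
    ("example.com", "192.0.2.11"),
    ("example.net", "192.0.2.30"),
]


def scan_for_domain(records, domain):
    """Batch style: touch every record and keep the ones that match."""
    return [ip for name, ip in records if name == domain]


def build_index(records):
    """Database style: pay once to build an index keyed by domain."""
    index = defaultdict(list)
    for name, ip in records:
        index[name].append(ip)
    return index


INDEX = build_index(RECORDS)

if __name__ == "__main__":
    # Both approaches return the same answer; only the access cost differs.
    print(scan_for_domain(RECORDS, "example.com"))
    print(INDEX["example.com"])
```

The scan costs time proportional to the whole dataset on every question, which is acceptable for batch analysis but not for an analyst waiting on an answer. The index answers in roughly constant time, at the price of extra storage and upkeep.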


3, Enterprise-class Hadoop applications are too risky.


Many businesses worry that Hadoop is too new, untested and unsuitable for enterprise applications. Nothing could be further from the truth. Don't forget, Hadoop is modeled on the Google File System, a distributed storage platform, and on Google MapReduce, the data analysis tool that runs on top of that file system. Yahoo invested money and energy in Hadoop and launched its first large Hadoop application in 2008: a search "webmap" that indexes all known web pages and the metadata needed to search them.


Hadoop is now used by companies such as Netflix, Twitter and eBay, and vendors including Microsoft, IBM and Oracle all sell Hadoop tools. As with any big data platform, it is still too early to call Hadoop a "mature" technology, but it has been adopted and validated by large enterprises.


This does not mean the platform is without risk; security in particular remains a difficult problem. But businesses are far from being scared away by Hadoop's youth.


4, To use Hadoop, you have to hire a bunch of programmers.


Depending on what you are going to do, this may be true. If you plan to develop the next great Hadoop big data suite, you will probably need professional Java and MapReduce programmers. If you are willing to build on other people's work, however, programming need not be a problem. Data integration vendor Syncsort recommends that analysts use Hadoop-compatible data integration tools to run advanced queries without writing any code.


Most data integration tools have a graphical interface that hides the complexity of MapReduce programming, and many come with preset templates. In addition, startups including Alpine Data Labs, Continuuity and Hortonworks offer tools that simplify big data and Hadoop applications.
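
To give a sense of what those tools hide, here is a minimal single-process sketch of the MapReduce programming model in Python, using the classic word count example. It is not Hadoop code and makes no Hadoop API calls; the function names `map_phase`, `shuffle_phase`, `reduce_phase` and `word_count` are made up for illustration. On a real cluster the framework runs the map and reduce steps in parallel across many machines and handles the grouping step itself.

```python
from collections import defaultdict


def map_phase(line):
    """Emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Group all emitted values by key, as the framework does between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle and reduce in sequence over the input lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle_phase(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["big data big hype", "data is data"]))
```

Even this trivial job has to be recast as key-value pairs and split into separate phases, which is the kind of work a graphical tool or a preset template takes off the analyst's hands.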


5, Hadoop is not suitable for small and medium-sized enterprises.


Many SMEs worry about being shut out of the big data trend. Large vendors such as IBM and Oracle are naturally inclined to sell large, expensive solutions, but that does not mean the market has no suitable tools for SMEs.


Cloud computing is rapidly bringing cutting-edge technologies into widespread use. "Cloud computing is transforming capital spending into operating costs," says Phil Simon. "You can use the same cloud services as Netflix. The same thing is starting to happen in big data: a business with only five employees can use Kaggle."


Kaggle calls itself "a bridge between data problems and data solutions." For example, the startup Jetpac offered a $5,000 prize for an algorithm that could pick out the most attractive vacation photos. Most vacation photos are not good, and screening them is a tedious, time-consuming process.


Jetpac rated 30,000 photos by hand and sought an algorithm that could approximate those manual ratings simply by analyzing metadata (photo size, caption, descriptive information). Had the company developed this algorithm on its own, it would have cost far more than $5,000, and it would have ended up with only one solution rather than a selection of alternatives. Jetpac's image processing tool eventually helped it raise $2.4 million in venture capital.


6, Hadoop is cheap.


This misconception applies to any open-source software. Saving on the initial purchase price does not mean you will save money overall. One of the problems with cloud computing, for example, is that it is so easy to spin up a research project on Amazon's platform that many people have launched projects in AWS, then forgotten about them while continuing to pay.
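
The arithmetic behind a forgotten project is simple, as the following back-of-the-envelope sketch shows. The instance count, hourly rate and duration are hypothetical numbers chosen for illustration, not actual AWS prices, and `idle_cost` is a made-up helper.

```python
HOURS_PER_MONTH = 730  # average hours in a month (8,760 hours / 12)


def idle_cost(instances, hourly_rate, months):
    """Total spend on instances left running for the given number of months."""
    return instances * hourly_rate * HOURS_PER_MONTH * months


if __name__ == "__main__":
    # Hypothetical figures: 4 forgotten instances at $0.10 per hour for 6 months.
    print(f"${idle_cost(4, 0.10, 6):,.2f}")
```

With these assumed figures, four small instances left running for half a year cost about $1,752, even though the software on them was free.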

The uncontrolled sprawl of virtual servers has dwarfed the growth of physical servers. And while Hadoop can help you store and analyze data, how do you import old data into the new system? How do you visualize the data? How do you protect data that will be shared more widely?


  Hadoop is actually a patchwork solution. You can get a complete enterprise solution from a company like Cloudera, or you can set up your own highly customized solution. No matter what route you choose, be careful with your budget, because free software is never really free.
