Insight into Big Data: Seven Misconceptions about Hadoop and Cloud Analytics

Source: Internet
Author: User
Keywords: big data, misconceptions, Hadoop

The seven major misconceptions about big data and Hadoop

Hadoop has become something of a legend in the open source world, but the industry has surrounded it with rumors that can lead IT executives to build their strategies on a distorted picture.

An IDC analyst report projects that data storage will grow 53.4% in 2013, and AT&T claims its wireless data traffic has grown 200-fold over the past 5 years. Internet content, email, application notifications, social feeds, and the messages we receive every day are all growing significantly, which is why so many large companies are focusing on big data.

There is no doubt that Hadoop has become one of the main areas of investment for addressing big data needs, and the publicly touted success of internet giants such as Facebook pushes other companies to look at Hadoop first. But Hadoop is a multidimensional technology that can be deployed and used in different ways. Here are seven common misconceptions about Hadoop and big data:

1. Big data is just about volume

When describing big data, variety, variability, velocity, and value are often mentioned alongside volume. The key point is that big data is not just about growth in volume; it is also about real-time analysis and the growth of structured and unstructured data that enterprise CIOs can use for better decision-making.

In short, value does not come only from analyzing enormous amounts of data. For example, storing and analyzing 1 PB of historical data may be worth less than analyzing 1 GB of data in real time; getting value from "fresh" data is often more valuable than dissecting outdated data.

2. Traditional SQL cannot be used on Hadoop

Many vendors devote their energy to Hadoop, and their marketing makes it clear that HDFS and MapReduce alone are limited in their ability to handle SQL-like workloads, which is why tools such as Hive, Pig, and Sqoop were eventually promoted. More and more enterprises manage large amounts of data through Hadoop combined with SQL compatibility; Pivotal HD, for example, is an enhanced Hadoop distribution that combines a parallel SQL processing database with Hadoop 2.0 and is optimized for enterprise data analysis requirements.
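
As a minimal sketch of what SQL-on-Hadoop looks like in practice, the Java snippet below runs an ordinary SQL query against a Hive table through HiveServer2's JDBC interface. The host name, credentials, and the sales table are illustrative assumptions rather than details from the article, and the Hive JDBC driver must be on the classpath.

    // Minimal sketch: running ordinary SQL over data stored in Hadoop via
    // HiveServer2's JDBC interface. Host, user, and table are assumptions.
    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class HiveSqlExample {
        public static void main(String[] args) throws Exception {
            // Register the Hive JDBC driver (needed with older JDBC setups).
            Class.forName("org.apache.hive.jdbc.HiveDriver");

            // Standard HiveServer2 URL; 10000 is the default port.
            String url = "jdbc:hive2://hive-server.example.com:10000/default";

            try (Connection conn = DriverManager.getConnection(url, "analyst", "");
                 Statement stmt = conn.createStatement();
                 ResultSet rs = stmt.executeQuery(
                         "SELECT region, COUNT(*) AS orders FROM sales GROUP BY region")) {
                while (rs.next()) {
                    // Hive compiles the query into jobs that read the table's
                    // files directly from HDFS.
                    System.out.println(rs.getString("region") + "\t" + rs.getLong("orders"));
                }
            }
        }
    }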

3. Hadoop is the only new IT data platform

When it comes to data platforms, mainframes remain a long-term investment in the IT portfolio, evolving alongside ERP, CRM, and SCM systems. In the big data era, mainframes are not about to be abandoned; they must prove their value within the existing IT investment environment. Many customers who run into problems of speed, scale, and cost use in-memory data grids such as vFabric SQLFire to provide high-speed data access and to accelerate mainframe batch processing or real-time analytics reporting.

4. Virtualization can lead to performance degradation

Hadoop was originally designed to run on physical servers, but as cloud computing has matured, many enterprises want to run it in cloud data centers. Before virtualizing Hadoop, an enterprise should first consider the scalability of its management infrastructure and recognize that, when data and compute are separated, elastically adding compute resources such as virtual Hadoop nodes can actually help performance; otherwise, shutting down a Hadoop node means losing the data stored on it or adding an empty node that holds no data.

5. Hadoop can only run in the data center

Thanks to SaaS and cloud service offerings, many providers already let you run Hadoop and SQL in the cloud, which undoubtedly saves companies the time and money of building out their own data centers. For the public cloud in particular, Java developers can benefit from Spring Data for Hadoop and a number of other use cases on GitHub.
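
As a small illustration of why this works, the sketch below uses the standard HDFS FileSystem API: the same client code runs unchanged against an on-premises cluster or a cloud-hosted one, with only the fs.defaultFS address changing. The NameNode address and directory path are assumptions for illustration.

    // Minimal sketch: the same HDFS client code works whether the cluster
    // lives in your own data center or in a public cloud; only the
    // fs.defaultFS address differs. Host and path are assumptions.
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class HdfsListingExample {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Point the client at the cluster's NameNode (local or cloud-assigned).
            conf.set("fs.defaultFS", "hdfs://namenode.example.com:8020");

            try (FileSystem fs = FileSystem.get(conf)) {
                // List the files waiting in an (assumed) input directory.
                for (FileStatus status : fs.listStatus(new Path("/data/incoming"))) {
                    System.out.println(status.getPath() + "\t" + status.getLen() + " bytes");
                }
            }
        }
    }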

6. Virtualizing Hadoop has no economic value

Many people view Hadoop as something that runs on commodity servers and assume that adding a virtualization layer brings extra cost without extra value. This view ignores the fact that data and data analysis are dynamic. A virtualized infrastructure reduces the amount of physical hardware required, bringing capex (capital expenditure) close to the cost of the commodity hardware alone, and it also reduces opex (operating expenses) through automation and efficient use of shared infrastructure.

7. Hadoop cannot run on a SAN or NAS

While Hadoop typically runs on local disks, it can also perform well on a shared SAN for small and medium clusters, and high-bandwidth networking such as 10 GbE, FCoE, and iSCSI can sustain good performance.
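
As a rough sketch of what this looks like in practice, a DataNode can simply be pointed at directories on SAN- or NAS-backed mounts through hdfs-site.xml. The mount path below is an assumption for illustration, and lowering HDFS replication to lean on the array's own redundancy is a common but optional choice.

    <!-- Sketch of an hdfs-site.xml fragment for SAN/NAS-backed DataNode storage.
         The mount path is an assumption, not a value from the article. -->
    <property>
      <name>dfs.datanode.data.dir</name>
      <value>/mnt/san-volume1/hdfs/data</value>
    </property>
    <property>
      <!-- Optional: lower replication when the shared array already provides
           its own redundancy. -->
      <name>dfs.replication</name>
      <value>2</value>
    </property>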

Big data has clearly become the focus of the industry, and the seven "misconceptions" above deserve an objective look. As with any project requirement, Hadoop is one tool that helps companies deal with big data problems better. Whether the need is a data grid such as GemFire or SQLFire, or message-oriented middleware such as RabbitMQ, a complete SaaS solution is now much easier to implement than it once was in a pure Hadoop environment.
