Seven Misunderstandings: Big Data and Hadoop
Hadoop has become something of a legend in the open-source world. Yet a number of misconceptions still circulate in the industry, and they can lead IT executives to build strategies on a distorted view of the technology.
According to an IDC analyst report, data storage was projected to grow 53.4 percent by 2013, and AT&T claims that wireless data traffic has increased 200-fold over the past five years. Internet content, e-mail, application notifications, and social messages received every day are all growing sharply, which is why so many large enterprises are focusing on big data.
There is no doubt that Hadoop has become one of the main areas of investment for addressing big data needs. Internet giants such as Facebook publicly tout their success with Hadoop, and companies new to the big data field naturally focus on it as well. Hadoop, however, is a multi-dimensional solution that can be deployed and used in different ways. The following addresses seven mistaken ideas about Hadoop and big data:
1. Big data is only about volume
Big data is usually characterized not only by volume but also by variety, variability, velocity, and value. The key point is that big data is not just about growth in size; it is about the real-time analysis of structured and unstructured data that enterprise CIOs can use to make better decisions.
In short, value does not come from the sheer scale of analysis alone. For example, storing and analyzing 1 PB of stale data may be worth less than analyzing 1 GB of data in real time; extracting value from "fresh" data is often more useful than dissecting outdated data.
2. Traditional SQL cannot be used on Hadoop
As vendors poured effort into Hadoop and laid out their market strategies, they understood that HDFS and MapReduce were limited in their ability to handle SQL-like query languages. That is why Hive, Pig, and Sqoop were eventually promoted. More and more enterprises manage large volumes of data through Hadoop combined with SQL compatibility; Pivotal HD, for instance, is an enhanced Hadoop distribution that couples a SQL parallel-processing database with Hadoop 2.0 and is optimized for enterprise data analysis.
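For a concrete picture of SQL-style access on Hadoop, here is a minimal Java sketch that runs a query through HiveServer2's JDBC interface. The host, database, table, and column names are illustrative assumptions, not part of the article; the hive-jdbc driver is assumed to be on the classpath.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HiveQueryExample {
    public static void main(String[] args) throws Exception {
        // Standard HiveServer2 JDBC URL; "analytics" is a hypothetical database.
        String url = "jdbc:hive2://hive-server.example.com:10000/analytics";
        try (Connection conn = DriverManager.getConnection(url, "hive", "");
             Statement stmt = conn.createStatement()) {
            // Plain SQL over data stored in HDFS; Hive compiles it into cluster jobs.
            ResultSet rs = stmt.executeQuery(
                "SELECT region, COUNT(*) AS order_count FROM web_orders GROUP BY region");
            while (rs.next()) {
                System.out.println(rs.getString("region") + "\t" + rs.getLong("order_count"));
            }
        }
    }
}
```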
3. Hadoop is the only new IT data platform
When it comes to data platforms, the mainframe remains a long-term piece of the IT portfolio, much like ERP, CRM, and SCM systems. In the big data era, if the mainframe is not to be sidelined by the architecture, it must demonstrate its value within the existing IT investment. Many customers who run into speed, scale, or cost problems use an in-memory big data grid such as vFabric SQLFire to provide high-speed data access and to support mainframe batch processing or real-time analytics and reporting.
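As an illustration of fronting slower mainframe-fed data with an in-memory SQL grid, the following Java sketch reads through a SQLFire-style JDBC endpoint. The driver class name, the jdbc:sqlfire:// URL form, the port, and the table are assumptions made for illustration only.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class SqlFireLookupExample {
    public static void main(String[] args) throws Exception {
        // Assumed SQLFire client driver and Derby-style connection URL.
        Class.forName("com.vmware.sqlfire.jdbc.ClientDriver");
        String url = "jdbc:sqlfire://grid-node.example.com:1527/";
        try (Connection conn = DriverManager.getConnection(url);
             PreparedStatement ps = conn.prepareStatement(
                 "SELECT balance FROM accounts WHERE account_id = ?")) {
            ps.setLong(1, 42L); // hypothetical account id
            try (ResultSet rs = ps.executeQuery()) {
                if (rs.next()) {
                    // The lookup is served from memory; the mainframe is touched
                    // only by the batch jobs that load or reconcile the grid.
                    System.out.println("balance = " + rs.getBigDecimal("balance"));
                }
            }
        }
    }
}
```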
4. Virtualization degrades Hadoop performance
Hadoop was originally designed to run on physical servers, but with the rise of cloud computing many enterprises now want to run it in virtualized cloud data centers. To virtualize Hadoop, an enterprise must first consider the scalability of its management infrastructure and recognize that compute resources can then be expanded elastically. For example, separating data from compute across virtual Hadoop nodes helps performance; without that separation, shutting down a Hadoop node loses the data stored on it, and adding a node simply brings an empty node with no data.
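To make the data/compute-separation argument concrete, here is a small Java sketch using the standard Hadoop FileSystem API to list which hosts hold the replicas of a file's blocks. If those hosts are dedicated data nodes, a virtualized compute node can be powered off without losing any replica. The file path is hypothetical.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class BlockLocationExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration(); // picks up core-site.xml / hdfs-site.xml
        FileSystem fs = FileSystem.get(conf);
        Path file = new Path("/data/clickstream/part-00000"); // hypothetical file
        FileStatus status = fs.getFileStatus(file);
        // Each block reports the hosts that store a replica of it.
        BlockLocation[] blocks = fs.getFileBlockLocations(status, 0, status.getLen());
        for (BlockLocation block : blocks) {
            System.out.println("offset " + block.getOffset()
                + " stored on " + String.join(", ", block.getHosts()));
        }
    }
}
```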
5. Hadoop can only run in the data center
Many cloud services, including SaaS offerings, now let enterprises run Hadoop and SQL in the cloud, which undoubtedly saves the time and money of building out a data center. On public clouds in particular, Java developers can benefit from Spring Data for Hadoop and the related examples on GitHub.
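As a sketch of how little client code changes when the cluster lives outside the data center, the following Java example reads a file from a remote, cloud-hosted HDFS endpoint with the plain Hadoop FileSystem API (not the Spring Data for Hadoop helpers mentioned above). The hostname and file path are hypothetical.

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class RemoteClusterRead {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Only the endpoint changes when the cluster moves to a cloud provider.
        conf.set("fs.defaultFS", "hdfs://hadoop.cloud-provider.example.com:8020");
        try (FileSystem fs = FileSystem.get(conf);
             BufferedReader reader = new BufferedReader(new InputStreamReader(
                 fs.open(new Path("/logs/app/2013-06-01.log")), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                System.out.println(line);
            }
        }
    }
}
```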
6. Virtualizing Hadoop has no economic value
Many people assume that because Hadoop runs on commodity servers, adding a virtualization layer brings no additional benefit. That view ignores the fact that data and data analysis are dynamic. A virtualized infrastructure can also reduce the amount of physical hardware needed, so that CAPEX (capital expenditure) approaches the bare cost of the commodity hardware, while OPEX (operating expense) is reduced through automation and the efficient use of shared infrastructure.
7. Hadoop cannot run on SAN or NAS
Although Hadoop typically runs on local disks, it can deliver excellent performance in a shared SAN environment for small and medium-sized clusters, and high-bandwidth fabrics such as 10 Gbit/s Ethernet, FCoE, and iSCSI also provide strong performance support.
Big data has become a hot topic in the industry, and the seven "misunderstandings" above should be viewed objectively. Different projects have different requirements; Hadoop is a tool that helps enterprises cope better with big data problems. Whether the component in question is GemFire, SQLFire, or the message-oriented middleware RabbitMQ, a complete SaaS solution is now easier to deliver in a Hadoop environment.