Five misconceptions about big data and how to address them

Source: Internet
Author: User
Keywords: big data, misconceptions, data warehouse

Some people think that the term "big data" is nothing but corporate marketing hype. But even those who accept the big data concept need to dispel some of the myths surrounding it.

Gartner, the world's leading information technology research and consulting firm, has pointed out that the hype around big data makes it harder for companies to choose the right course of action, and does nothing to dispel the myths that persist.

For example, the claim that 80% of data is unstructured is wrong, and so, Gartner points out, is the claim that advanced analytics is just a more complex form of ordinary analytics.

In two reports, "Major misconceptions about the impact of big data on analysis functions" and "Major misconceptions about the impact of big data on information infrastructure", Gartner examines the misunderstandings about how big data affects analytics and information infrastructure, with the aim of giving a more realistic picture. The following are five misconceptions about big data.

Myth One: Everyone else is ahead of us in deploying big data technology
More and more companies are turning their attention to big data technologies and services: by Gartner's count, 73% of companies are investing in or planning to invest in big data technologies. But most of them are only at the early stages of adopting the technology.

It is therefore alarmist to worry that competitors are racing ahead with big data technology. In fact, only 13% of respondents had actually started deploying big data technologies.

"The biggest challenge for companies is how to get value from big data and how to deploy big data technologies," Gartner said. "Most organizations get stuck in the pilot phase because they do not tie the technology to business processes or concrete use cases."

Gartner's conclusion: you are not behind. Develop strategies around actual tasks, and work with both IT and the business units.

Myth Two: The volume of data is so large that small defects do not matter
Some argue that, by the law of large numbers, individual data defects do not matter and do not affect the results of the analysis.

An individual defect does have a much smaller effect on a large dataset than on a small one, but as the volume of data grows, the number of defects grows with it.

"As a result, the overall impact of low-quality data on the entire dataset remains unchanged," Gartner said. "In addition, most of the data used by enterprises in big data environments comes from external data sources, whose data structures and origins are unknown."

"This means that the risk of data quality problems is higher than ever. As a result, data quality is actually more important in big data deployments."

Gartner's conclusion: design new approaches to data quality management and choose appropriate data quality levels, while adhering strictly to the core principles of data quality assurance.
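
The arithmetic behind this myth can be illustrated with a minimal sketch. The numbers are hypothetical and not taken from Gartner: every good record is assumed to hold the value 100, every defective record the value 0, and the defect rate is assumed to stay at 20 per 1,000 records as the dataset grows.

```python
def mean_with_bad_records(n_records, n_bad, true_value=100.0, bad_value=0.0):
    """Mean of a dataset in which n_bad records carry a wrong value.

    true_value and bad_value are illustrative assumptions.
    """
    n_good = n_records - n_bad
    return (n_good * true_value + n_bad * bad_value) / n_records


def bad_records_at_rate(n_records, defects_per_thousand):
    """Number of defective records when the defect rate stays constant."""
    return n_records * defects_per_thousand // 1000


sizes = (1_000, 1_000_000, 1_000_000_000)

# Case 1: a single defect. Its effect on the mean shrinks as the data grows.
single_defect_means = [mean_with_bad_records(n, 1) for n in sizes]

# Case 2: defects grow with the data (constant rate of 20 per 1,000).
# The mean is off by the same amount at every size.
constant_rate_means = [
    mean_with_bad_records(n, bad_records_at_rate(n, 20)) for n in sizes
]

for n, single, constant in zip(sizes, single_defect_means, constant_rate_means):
    print(f"{n:>13,} records: one defect -> {single:.6f}, "
          f"constant defect rate -> {constant:.6f}")
```

In the first case the error fades with scale, which is what the law-of-large-numbers argument assumes. In the second case, which matches the situation Gartner describes, the result is equally wrong no matter how large the dataset becomes.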

Myth Three: Big data will replace data integration
Enterprises hope to process information with schema on read, applying multiple data models to the same data source as needed. This flexibility would let end users decide how to interpret any piece of data on demand and would give each user customized access to the data. In practice, however, most users rely on schema on write, in which the data is described and its content defined before it is stored, so that data integrity stays consistent.
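
The difference between the two approaches can be sketched as follows. This is an illustration only: the `ORDER_SCHEMA` fields, the in-memory lists standing in for storage, and all function names are hypothetical.

```python
import json

# Hypothetical schema: field name -> required Python type.
ORDER_SCHEMA = {"order_id": int, "amount": float}


def write_with_schema(store, record):
    """Schema on write: validate before storing, reject anything that does not conform."""
    for field, field_type in ORDER_SCHEMA.items():
        if not isinstance(record.get(field), field_type):
            raise ValueError(f"field {field!r} must be {field_type.__name__}")
    store.append(record)


def write_raw(store, raw_line):
    """Schema on read: store the raw text untouched, with no checks."""
    store.append(raw_line)


def read_amounts(raw_store):
    """One reader's interpretation: amounts as floats, missing ones as 0.0."""
    return [float(json.loads(line).get("amount", 0)) for line in raw_store]


def read_order_ids(raw_store):
    """Another reader's interpretation of the same data: order IDs as strings."""
    return [str(json.loads(line)["order_id"]) for line in raw_store]


# Schema on write: every stored record is known to be consistent.
warehouse = []
write_with_schema(warehouse, {"order_id": 1, "amount": 9.5})
try:
    write_with_schema(warehouse, {"order_id": "A-2"})
    rejected = False
except ValueError:
    rejected = True

# Schema on read: anything is accepted; each reader decides what it means.
lake = []
write_raw(lake, '{"order_id": 1, "amount": "9.5"}')
write_raw(lake, '{"order_id": "A-2"}')
amounts = read_amounts(lake)
order_ids = read_order_ids(lake)
```

With schema on write, the malformed record never enters the store. With schema on read, it is stored as is, and every reader has to decide how to handle it, which is why that flexibility does not remove the need to agree on what the data means.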

Myth Four: It is pointless to use a data warehouse for advanced analytics
Some people think that building a data warehouse is a waste of time when advanced analytics can work with new types of data. In fact, most advanced analytics projects use a data warehouse during the analysis.

New data types may also need to be refined before they are suitable for analysis. In addition, decisions have to be made about which data is relevant, how to aggregate it, and what level of data quality is required.

Gartner's conclusion: wherever possible, use data warehouses to store the manually collected data sets used for advanced analytics.

Myth Five: Data lakes will replace the data warehouse
Data lakes are typically marketed as enterprise platforms for analyzing a variety of data sources in their native formats. But Gartner considers it a mistake to treat data lakes as a replacement for data warehouses, or as a critical component of the analytics infrastructure.

Compared with established data warehouse technology, data lake technology is immature and its functionality is incomplete. "The data warehouse already has the ability to support multiple user groups," Gartner said. "Therefore, enterprises do not have to wait for data lake technology to mature."

Gartner's conclusion: data lake technologies such as Hadoop should be used with existing data warehouses. Only by investing in metadata management technology, tools and training can enterprises create business value from data lake technology.
