Understanding big data correctly: beware of the data "bubble"
Source: Internet
Author: User
Keywords: big data, real big data, understanding
Big data, almost deified by the news media and academic conferences, has recently had cold water poured on it more than once. In early 2013, Gurjeet Singh, co-founder and CEO of the leading U.S. data-analytics firm Ayasdi, hinted that "big data" might not be so reliable. He pointed out that starting an analysis from a query is a dead end: researchers extract only about 1% of the collected data for analysis, and that 1% is then used to drive enterprise innovation and shape ideas, which is clearly unscientific. In May 2013, a speech by Alibaba Group's Wang Jian, "Big data, you have all got it wrong," immediately caught people's attention, and it is worth reflecting on: what is big data, and have we really grasped the point?
Do we misunderstand the real meaning of big data?
Gurjeet Singh's point of view has a solid basis: the speed of technological development has not matched the explosive growth of the data universe. According to IDC's latest report, people collect enormous amounts of data every day, and the digital universe has already reached 2.8 ZB; IDC predicts it will reach 40 ZB by 2020. With the development of mobile and sensing technology, people's ability to collect data keeps growing, but data-recognition technology is not so encouraging. For example, much of the valuable data in the current data universe is unlabeled, document-based, unstructured data, and research on recognizing and processing this kind of data has only just begun. Yet the supposedly mature big-data analysis methods on the market generally collect data for analysis on the basis of identifying valid data first. Under this approach, data that cannot be recognized cannot be labeled, is therefore not considered valid, and is discarded before it is ever analyzed. As a result, a large amount of valuable data is lost and never exploited at all.
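To make the discard problem concrete, here is a minimal sketch (not from the article; the records and schema are made up) of an ingestion pipeline that keeps only records it can parse into a known structure and silently throws the rest away:

```python
import json

# Made-up incoming records: one matches the expected schema, one is
# unstructured free text, one is structured but incomplete.
raw_records = [
    '{"user": "a1", "purchase": 19.99}',
    'support ticket: "the app crashes every time I log in"',
    '{"user": "b2"}',
]

def ingest(records):
    """Keep only 'recognized' records; discard everything else."""
    valid, discarded = [], []
    for rec in records:
        try:
            obj = json.loads(rec)
        except json.JSONDecodeError:
            discarded.append(rec)   # unstructured text is simply thrown away
            continue
        if "user" in obj and "purchase" in obj:   # only the known schema counts
            valid.append(obj)
        else:
            discarded.append(rec)
    return valid, discarded

valid, discarded = ingest(raw_records)
print(f"analyzed {len(valid)} of {len(raw_records)} records")   # analyzed 1 of 3
```

Only one record in three survives, even though the discarded support ticket may carry exactly the kind of value the text says is being lost.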
Wang Jian's argument is also supported by plenty of examples. Big data has been around for a long time, but it is not enough for the data to be "big": even the European collider laboratory, which has more data than almost anyone in the world, would be meaningless to the public if its data were not connected to the Internet. So the essential property of data today is not "big" but "online." Online data is easy to collect. For example, when the United States elected a president in the past, a Gallup poll had to sample some 2,000 people to fill out questionnaires; now, simply analyzing the statuses everyone publishes on Twitter can suggest who the president will be, and can influence society quickly. But getting products and data to work well together will still take a long process.
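As a toy illustration of that "online" polling idea (the posts and candidate names below are invented, and real election inference would of course need far more care with sampling bias and sentiment), counting mentions in public posts looks like this:

```python
from collections import Counter

# Invented public posts standing in for a Twitter stream.
posts = [
    "Just voted -- go Candidate A!",
    "Candidate B's debate answer was weak",
    "candidate a has my vote",
    "Not sure yet, maybe Candidate B?",
    "CANDIDATE A all the way",
]

def tally_mentions(posts, candidates=("candidate a", "candidate b")):
    """Count case-insensitive candidate mentions across all posts."""
    counts = Counter()
    for post in posts:
        text = post.lower()
        for name in candidates:
            if name in text:
                counts[name] += 1
    return counts

print(tally_mentions(posts))
# Counter({'candidate a': 3, 'candidate b': 2})
```

The point is not that a mention count beats a Gallup poll, but that the raw material arrives continuously and at scale because it is online.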
Too few workers playing with big data
Claudia Perlich, chief scientist of the New York start-up Media6Degrees, also hit out at big data on its way to being deified: "You can fool yourself with data, but I'm worried that big data is becoming a bubble."
Perlich fears that many people who call themselves "data scientists" have not actually done their homework and are discrediting the field. Big data does appear to be facing a labor bottleneck, as the existing pool of big-data experts cannot keep pace with the speed at which data grows. A report published in 2012 by the McKinsey Global Institute likewise showed that the United States needs 140,000 to 190,000 workers with "deep analytical" experience, plus another 1.5 million data-literate managers, whether retrained or newly hired. These figures are undoubtedly huge.
Managing big data is far more meaningful than collecting it. How do you ask questions, how do you define a problem, and where do you extract the data from? This requires professional data-analyst skills; if the algorithms applied to the digital world are too simple, the expected intelligence will not emerge. Just imagine: if the real value behind the data is not dug out, then the huge volume of data is only a dud bomb that will never go off. Big-data experts should therefore be aware of the limitations and shortcomings of big-data technology, cultivate experience and keen intuition, and not treat listening to the data as the only thing that matters.
Perhaps most of the companies that use big data today are only moving through versions 1.0 and 2.0; the real big data age may not be achieved until version 3.0.
-- Zhou, director of the Internet Science Center, professor and doctoral supervisor
Upgrading the version of data use
In a possible new third industrial revolution, data and computing will play the role that materials, energy, and advanced technology once played. If computing is regarded as energy, it can be allowed to enter public life and flow like electricity, billed in a unified way regardless of where the computation comes from, just as we do not know whether the five kilowatt-hours of electricity we use today come from Daya Bay or the Three Gorges. It is conceivable that cloud computing and other forms of computing power will become one of a country's most critical core strategic resources. Data is likewise a strategic material, and every enterprise and research team has a responsibility to collect, process, analyze, and index data purposefully through deliberate programs. But if big data is to build great businesses in the future, truly advanced technology must come from deeper analysis; what is needed is no longer the equivalent of the machinery of the Industrial Revolution, but smarter minds.
Of course, the big data age differs from previous industrial revolutions in that its character is personalized, and it brings great conceptual changes as well as changes in business models. Zhou, director of the Internet Science Center of the University of Technology, professor and doctoral supervisor, summarizes the commercial application of big data as versions 1.0, 2.0, and 3.0. "Perhaps most of the companies that use big data now are only moving through versions 1.0 and 2.0, but the real big data age may not be achieved until version 3.0," Zhou said.
What version 1.0 shows is that an enterprise generates a large amount of data through its own business needs and then uses in-depth analysis of that data to optimize the relevant business. At this stage, data plays the role of guiding decisions.
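A minimal sketch of that 1.0 stage (the order log, channel names, and figures below are hypothetical): the company analyzes only the data its own operations produce, and the result guides an internal decision.

```python
# Made-up log of the company's own marketing channels.
orders = [
    {"channel": "search", "visits": 1000, "sales": 50},
    {"channel": "email",  "visits": 400,  "sales": 32},
    {"channel": "social", "visits": 800,  "sales": 16},
]

# "In-depth analysis" here is just a conversion rate per channel ...
rates = {o["channel"]: o["sales"] / o["visits"] for o in orders}

# ... and the analysis guides the business decision.
best = max(rates, key=rates.get)
print(rates)                            # {'search': 0.05, 'email': 0.08, 'social': 0.02}
print(f"shift budget toward: {best}")   # shift budget toward: email
```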
The idea behind big data 2.0 changes relative to 1.0. Version 2.0 emphasizes the externalities of data: beyond running its own business and solving its own problems, the data itself can help solve other people's problems, and, even more, other data can be gathered to solve one's own problems. This requires the enterprise to collect a large amount of heterogeneous data directly or indirectly related to the target business and to build complex analysis and prediction models whose output serves that business. At this stage, the data itself is the decision-maker.
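A hypothetical sketch of that 2.0 stage: external heterogeneous data (invented daily temperatures, standing in for a weather feed) is joined to the company's own sales log, and a one-variable least-squares fit stands in for the "complex analysis and prediction model" the text mentions.

```python
# Internal data: the company's own daily sales.
own_sales = {"mon": 120, "tue": 135, "wed": 160, "thu": 150, "fri": 180}
# External data: made-up daily temperatures (degrees C) from a weather feed.
weather   = {"mon": 18.0, "tue": 21.0, "wed": 26.0, "thu": 24.0, "fri": 29.0}

days = list(own_sales)
xs = [weather[d] for d in days]     # external signal
ys = [own_sales[d] for d in days]   # internal target

# Ordinary least squares for y = intercept + slope * x.
n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n
slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
intercept = my - slope * mx

forecast_temp = 31.0                # tomorrow's forecast, again external data
print(f"predicted sales: {intercept + slope * forecast_temp:.0f}")   # ~189
```

The decision (how much stock to order) now rests on data the business did not generate itself, which is exactly the shift version 2.0 describes.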
Version 3.0 may lead us into the real big data age. It is more concerned with the quality of data: how good the data is, how much it is worth, how exchanges of it should be paid for, and especially how data privacy and security are protected. At this stage, data operators analogous to telecom operators will appear, so that all academic groups, enterprises, and governments can use big data. That will be the real big data era.