The "island dilemma" of big data

Is it possible to integrate the "island" data held by different enterprises and services and establish a data exchange platform, while protecting personal information and normalizing the data?

This would benefit companies, individuals and society as a whole. But for businesses, big data is now part of their assets, and sharing data could mean losing their competitive edge.

Take out a pen and, on a map of China, draw a line between Heihe in Heilongjiang province and Tengchong in Yunnan province. The area to the right of the line accounts for only 36% of the country's land but is home to 96% of its population. This is the "Heihe-Tengchong line", discovered in 1935 by the well-known population geographer Hu Huanyong and also known as the Hu Huanyong line. It is of great significance in China's geography and demography.

"This is the big data of 80 years ago." On July 25, at the "Big Data Link Future" forum organized by the Tencent Internet and Social Research Institute, Shenyacheng, deputy general manager of a Tencent product department, showed a picture of the number of QQ users online at the same time and compared it with the "Heihe-Tengchong line". The two pictures are strikingly similar.

Shenyacheng added that collecting and mining big data can serve the needs of government, enterprises and individuals, for example by using the predictive capability of big data to provide a reference for their decisions.

Scale is not the sole basis for judgment

Wikipedia defines big data as data whose volume is so large that it cannot be captured, managed, processed and organized into information that people can interpret within a reasonable time. The Baidu Encyclopedia entry reads: big data, or massive data, refers to information whose volume is so large that current mainstream software tools cannot capture, manage, process and organize it within a reasonable time into information that helps businesses make better decisions.

Professor Wenjirong, vice president of the Information College of Renmin University of China, said the Wikipedia and Baidu Encyclopedia definitions of big data basically focus on the notion of "big" and do not get at the deeper issues.

"Big data is first and foremost an ability to make judgments and predictions," Wenjirong explained. "It rests on mastering the technology for collecting, storing and processing massive data, which gives rise to a new ability to judge or predict."

"In fact, so-called big data has no absolute size; there is no threshold, such as 100 TB, that decides whether data counts as big data. Whether data is big depends mainly on the scale of the problem it is applied to," Wenjirong explained. "That is, when you use data on a problem, the scale of the problem, and especially the size of its sample space, determines whether the data is enough."

Wenjirong said: "If a data set adequately covers the sample space of a problem, it is big data for that problem. The data is large enough that every possible situation has corresponding data."
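Wenjirong's criterion can be made concrete with a small sketch. The Python below is only an illustration under stated assumptions: the function `sample_space_coverage`, the two attributes (`region`, `period`) and the data sets are all hypothetical and do not come from the forum. It measures what fraction of the possible situations in a problem's sample space appear at least once in the data.

```python
from itertools import product


def sample_space_coverage(records, dimensions):
    """Return the fraction of possible situations that appear in the data.

    `dimensions` maps each attribute name to the values it can take; the
    sample space is every combination of those values (an assumption made
    for this sketch).
    """
    space = set(product(*dimensions.values()))
    seen = {tuple(record[name] for name in dimensions) for record in records}
    return len(seen & space) / len(space)


# Hypothetical problem: behaviour by region and time of day (2 x 3 = 6 situations).
dimensions = {"region": ["east", "west"], "period": ["morning", "noon", "night"]}

# Many records, but all from one region.
large_narrow = [
    {"region": "east", "period": p} for p in ["morning", "noon", "night"]
] * 1000

# Few records, but one for every situation.
small_complete = [
    {"region": r, "period": p} for r, p in product(*dimensions.values())
]

print(len(large_narrow), sample_space_coverage(large_narrow, dimensions))
print(len(small_complete), sample_space_coverage(small_complete, dimensions))
```

In this toy case the 3,000-row data set covers only half of the sample space, while the 6-row data set covers all of it, which is the sense in which size alone does not make data "big" for a given problem.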

Talking about scale and quality

Experts cautioned that a pattern has emerged in the study of big data: many people assume that data is good enough as long as it is big enough, and overlook the quality of the data. Running so-called statistical analysis on a pile of unreliable data produces very dangerous results.

"Traditionally, when we do statistical analysis, we put particular emphasis on unbiased, random sampling. But today, when we use big data, we seem to forget this. People think that as long as they have collected a lot of data, simple statistics will do: because I have big data, I have the full sample, so I need not worry about data quality. This is undoubtedly a very dangerous trend," Wenjirong said.
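The point about unbiased sampling can be illustrated with a toy simulation. The sketch below is an assumption-laden example, not data from the forum: the population, the ages and the "service log" are all invented. It compares a large sample drawn from a skewed source with a much smaller random sample when estimating a population mean.

```python
import random

# Hypothetical population: 80,000 people aged 50 and 20,000 people aged 20.
population = [50] * 80_000 + [20] * 20_000
true_mean = sum(population) / len(population)

# "Big" but biased data: a service log that captures every young person
# and only 5,000 of the older ones (25,000 records in total).
biased_sample = [20] * 20_000 + [50] * 5_000
biased_mean = sum(biased_sample) / len(biased_sample)

# Small but random data: 500 people drawn uniformly from the population.
rng = random.Random(0)  # fixed seed so the run is repeatable
random_sample = rng.sample(population, 500)
random_mean = sum(random_sample) / len(random_sample)

biased_error = abs(biased_mean - true_mean)
random_error = abs(random_mean - true_mean)

print(f"true mean age:         {true_mean:.1f}")
print(f"biased sample (25000): {biased_mean:.1f}")
print(f"random sample (500):   {random_mean:.1f}")
```

The biased sample is fifty times larger, yet its estimate is off by 18 years, because collecting more records from a skewed source does not remove the skew.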

Wang, a researcher at the Internet Research Institute at Oxford University, warned at the forum that big data carries two major risks: one is misreading the data, and the other is bias in the data.

Xuan, director of data products at Primeton, said in an earlier media interview that some companies base their marketing forecasts on data analysis, but if the data itself is wrong, the conclusions drawn from it may be useless.

There is a saying in the industry: if data accuracy is 60%, users will certainly complain; at about 80%, users will say "not bad"; only at 90% will users be truly impressed.

"Information Island" needs to be broken

Tong Dawsong, president of Tencent's social networking business group and senior executive vice president of Tencent, also raised the "information island" problem at the forum.

Tong Dawsong said that most of the data we use today is collected by different enterprises and different services; in other words, it is captured on isolated islands. This runs counter to a very important property of big data, scalability, and the growth of big data makes scalability matter even more.

"On these information islands, every company may have its own cloud, and integrating the data on different islands to create a more unified scenario that everyone can benefit from poses many challenges," Tong Dawsong said.

Tong Dawsong said the question he keeps thinking about is whether it is possible to integrate the data on these different islands and establish a data exchange platform, while protecting personal information and normalizing the data.

"This would benefit companies, individuals and society as a whole. But I know how difficult it is, because for businesses, big data is now part of their assets, and sharing data may mean losing their competitive edge," Tong Dawsong said.

"One of the major bottlenecks in the development of big data is balancing the competing interests along the industry chain. Take the Internet of Things and smart cities: these concepts cannot be realized without big data, yet achieving such large-scale visions does not depend on any one enterprise or even any one industry, but on integrating and balancing resources across the whole of society," said Dr. Meng Zhaoli, director of the Industrial Economics Center of the Tencent Internet and Social Research Institute.

Meng Zhaoli suggested that a cross-sector data-sharing pool is needed, preferably one led by the government or a neutral third party and joined by leaders from all sectors.

"This will inevitably bring a number of enterprises that both compete and cooperate into the ecosystem. At that point the most critical thing is to establish a reasonable management mechanism, so that enterprises that contribute more get something back, while enterprises that contribute less could be offered paid services," Meng Zhaoli said.

Data security problems cannot be neglected

In the era of the big data explosion, enterprises can inform their business decisions by developing big data operations, but this comes with a test of data security. How to guarantee the security and privacy of their own data and their users' data has become the primary issue for big data.

Chen, general manager of Tencent's cloud platform department, said Tencent once ran a security scan of 90 e-commerce websites and WeChat public accounts that require users to pay by credit card or bank card. It found that more than 60 of them had security problems of one kind or another, including more than 20 with problems such as theft of users' identities and malicious charges to consumers.

"So today, when I stay at a hotel and the staff ask me to leave my credit card, I am very worried, because the Internet products we face today have so many security problems," Chen said.

"To do big data services well, we first have to solve the problem of information security. For Tencent especially, the first challenge is the security challenge," Chen said.

According to Shenyacheng, Tencent QQ users generate a large amount of data every day. For example, users send 15.5 billion QQ messages a day, and processing this data adds 200 TB of new storage daily.
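To give a sense of scale, the two figures quoted by Shenyacheng can be turned into a back-of-the-envelope estimate. The short Python sketch below uses only those two numbers; the constant daily rate and the decimal unit conversion (1 PB = 1000 TB) are assumptions made for the calculation, not statements from the forum.

```python
MESSAGES_PER_DAY = 15_500_000_000  # figure quoted in the article
NEW_STORAGE_TB_PER_DAY = 200       # figure quoted in the article
SECONDS_PER_DAY = 24 * 60 * 60

# Average message rate, assuming traffic is spread evenly over the day.
messages_per_second = MESSAGES_PER_DAY / SECONDS_PER_DAY

# Yearly storage growth, assuming the daily rate stays constant
# and decimal units (1 PB = 1000 TB).
storage_tb_per_year = NEW_STORAGE_TB_PER_DAY * 365
storage_pb_per_year = storage_tb_per_year / 1000

print(f"{messages_per_second:,.0f} messages per second on average")
print(f"{storage_pb_per_year:.0f} PB of new storage per year")
```

Under these assumptions the figures work out to roughly 179,000 messages per second on average and about 73 PB of new storage per year.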

So, faced with this mass of data, how does Tencent play the role of data "security guard"?

According to Chen, Tencent has a complete set of security measures: strong protection starts at the carrier network level, and technical means, including an externally facing application firewall, are then used to harden users' servers and help users solve security problems.

However, not disclosing user data is only one side of the issue. Chat tools such as QQ and WeChat produce a large amount of privacy-related information every day. Will Tencent include this in its big data development as well, and infringe on user privacy?

In an interview with Rule of Law Weekend, Chen said: "Tencent will not use chat records and will not use content stored in Weiyun; it will analyze only content that users share."

"But even shared content is graded. For information that users share within private circles, for example, Tencent will not trace it back to the individual user, and strips out the sensitive parts before using it," Chen stressed.
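The graded handling Chen describes can be sketched in a few lines of Python. This is an illustrative guess at the general idea, not Tencent's implementation: the function `prepare_for_analysis`, the record fields (`user_id`, `scope`, `text`), the scope names and the two regular expressions are all hypothetical, and a real system would need far more thorough detection of sensitive content.

```python
import re

# Hypothetical patterns for sensitive fragments (assumed for this sketch).
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
PHONE = re.compile(r"\b\d{11}\b")


def prepare_for_analysis(item):
    """Return an analysable copy of a shared item, or None if it must not be used.

    `item` is a dict with the hypothetical keys user_id, scope and text.
    """
    if item["scope"] == "chat":  # chat records are never analysed
        return None
    # Strip sensitive fragments: e-mail addresses first, then phone numbers.
    text = PHONE.sub("[removed]", EMAIL.sub("[removed]", item["text"]))
    cleaned = {"scope": item["scope"], "text": text}
    # Assumption: only publicly shared items keep a link to the user;
    # items shared in private circles are not traceable to the individual.
    if item["scope"] == "public":
        cleaned["user_id"] = item["user_id"]
    return cleaned


private_share = {
    "user_id": 42,
    "scope": "private",
    "text": "call 13800000000 or mail a@b.com",
}
print(prepare_for_analysis(private_share))
print(prepare_for_analysis({"user_id": 42, "scope": "chat", "text": "hello"}))
```

In this sketch the private share comes back without a user identifier and with both the phone number and the e-mail address replaced, while the chat record is rejected outright.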
