True and False Big Data (I)

Source: Internet
Author: User

On October 24, the China TMT International Chamber of Commerce held a forum at the Fairmont Center in Beijing on the theme "Big Data, Big Impact." The well-known IT commentator Xie gave a speech on the current hype around big data that drew an enthusiastic response from the audience. The chairman of the China Broadband Industry Fund commented on the spot: "This is the best speech about big data I have heard in six months." China Cloud Network has edited the speech; the full text follows:

Well-known IT commentator Xie

Over the past half year or so I have written about 10 articles on big data and published them in magazines. I did this purely out of my own interest in the Internet and in how big data is developing, and I did not expect the response to be far stronger than anything I had experienced before. The concept of big data has only just emerged, yet lately all kinds of agencies, departments, and companies have been calling me to attend events and give talks. What confuses me most is this: everyone says they are interested in big data, but as soon as they open their mouths it turns out to be the opposite, with no real point of interest at all. So, from the standpoint of the history of Internet development and of investment, I would like to introduce what I understand to be true big data, half-true big data, things only loosely linked to big data, and false big data.

Start with the early days of the Internet in China. I call 1999 to 2010 the Internet controversy. In 1999 I attended meetings in the brokerage and investment field, and at that time someone could count 80 Internet concept stocks in China. It was frightening, and the results speak for themselves. From 2003 to 2005, when the Internet first saw real money, we all survived on the leniency of the telecom companies. But the ISP concept of that period was a mixed bag; the business ended in chaos, and that too was a confusion of concepts. From 2007 to 2009 it was Web 2.0. When the dust finally settled, China had, unexpectedly, no company like Facebook that could stand out; the campus site I worked on early now ranks 12th by traffic. There were also the true and false new media and the true and false group buying, whose aftershocks have still not subsided: tens of thousands of group-buying sites, 2 or 3 billion, and now little is left. So I think that this year, and for the next two or three years, big data will be caught up in a similar melee.

Whether we are investors, entrepreneurs, operators, or media, we like to look at the macro, the micro, and the local, and that is a good thing. The danger is feeling for stones as you go: the further you grope, the worse it gets. People also disagree about the "tense" of big data, and most of the friends I talk to hold the present or in-progress view.

Past tense: big data has existed in physics, biology, and medicine for decades. The concept itself sounds imposing, but "big data" is an adjective, not a strict definition. Those who speak in the past tense are mostly academics and intellectuals, and their question is purely what to do with the data.

Completed tense: "We are able to provide a complete big data solution, from hardware to software to methods," he says casually. Everything is ready, and all you need to provide is the money. I am not saying this road is wrong. But in my discussion I never put innovation and technical barriers up front as a prerequisite. I assume the technical conditions are available. If we recall the several waves of Internet innovation, the technical conditions were there each time. For so many Internet companies, as long as you have a good idea, the technology will be able to keep up.

In-progress tense: "We are increasing investment and making strenuous efforts in data acquisition, storage, integration, mining, and other areas." This comes mostly from Internet companies or telecom companies, and what they mean is that it is all in the bag. I personally do not agree with any of these three views. I use the future tense: big data is the future not only of the network industry but of social and economic development as a whole, and now is the time to start discussing, studying, and trying out this direction.

I treat it as the unknown, as a direction of effort, as the future to be discussed. I do not dare say that I am right and the other three views are wrong, but comparing them, I personally think that the past tense and the completed tense, at least, are certainly wrong. Our TMT Chamber of Commerce is not interested in archaeology or in studying history, but in the future.

Another point of observation is the timing of big data, which I discussed in my third article. By coincidence, Yahoo listed in 1996, Google in 2004, and Facebook in 2012: the interval is eight years.

Yahoo, in my opinion, solved the problem of aggregating Web pages. Web pages were in disorder, and it came up with the portal solution: pages can be categorized, so you do not have to search for them yourself; our team of experts edits them into categories for you in an all-encompassing portal. Looking back, I call that the Web 1.0 era.

Then Google said: I do not care what Web page it is. It went one level finer and grabbed the keywords directly, aggregating them by an algorithm so that people could reach content efficiently. This is much more efficient than aggregating pages, but in essence it is the same as the page approach, because it still starts from the content. So I call it Web 1.5.

Facebook changed the angle and aggregated users. It starts from the user, from contacts and relationships, and finds a way forward along the channels through which information really spreads. This is what I call Web 2.0, and it represents an era.

In another eight years, I am now fairly sure, we will enter the next phase: data aggregation. Starting from data, we can take in everything that is already online, and material things as well. Everything in the world can become data, and we can gather it up. I think the next big innovation should be this.

In addition, looking back, the 2008 economic crisis was exactly when Facebook created the separation of platform and application. Is that a sign that bad times are no time for routine moves, and that finding a way to take part in a major innovation is the way to survive? I think that is why the recent discussion of big data is so hot: people from all walks of life have joined in, and even senior government officials are paying close attention.

When Yahoo listed in 1996, Google had not yet been founded, and when Google listed, Facebook had not yet been founded. So we can speculate that, with Facebook listing this year, the company that will lead big data innovation does not exist yet.

The innovation difficulty of big data (from the PowerPoint presentation)

What is not big data? I can say categorically that data does not equal big data. Most people simply call data big data: in the past it was measured in K, then in megabytes, and now in T. More than 20 unit names are already waiting in line beyond that. Is this simple growth in number really quantitative change turning into qualitative change? Does quantity by itself create a gap? Data that most existing equipment and technical methods can already process is not big data.

Data mining, refined operations, precision advertising, personalized service, and promotion are not the main part of the future business model of big data services. You can imagine that the cost of refined operations roughly offsets the revenue from refined marketing, so the net is essentially zero, and if the effect is poor, the work has to be pushed back and done again.

The most common definition of big data today is the three-V definition: diversity, volume, and speed. Following that logic, I first give an operational definition. Take diversity: what counts as diversity? Name, height, and so on only describe one person in ever finer detail; "more" is not "diverse." So I say the first requirement is that the data sources be diverse, an issue our leader addressed specifically just now. Big data must be open and must be public. No matter how big a company is, and Tencent is the largest in China, you have to realize that its data is quite homogeneous. It has considerable limitations and is subject to a considerable degree of intervention. It would be much better if data crossed company lines: if two major companies such as Tencent and Baidu made their data fully exchangeable and shared, that would be interesting.

The data-opening policy of the U.S. government, now involving 40 countries including the United States and the United Kingdom, is very interesting. It shows that boundaries we cannot break in the real world have been broken at the data level. Big data sources are likely to be diverse, and I will come back to that later.

There is the diversity of data types and the change in data form: text, voice, graphics, pictures, video. Information and data are different. One definition holds that information is data, but there are quite a lot of things we cannot yet process, and these cannot be called data, only information.

There is the diversity of data objects: personal data, business service data, social and public data, and data about the natural, material world. Diversity should be examined in this sense, rather than as one company's own data getting finer and finer; data like that can only go so far. The higher the diversity of big data, the greater its potential value.

Volume is very simple. When we talk about big data now, the basic unit of measurement is at least the TB. When I studied and worked in the United States, I made my living doing data analysis, and back then a single G was basically more than we could handle. Now a TB is manageable, the cost is not so high, and many existing vendors have solutions. Will the next stage be the PB? Perhaps. Facebook now says it handles 500 T of data every day, and Google says it has three P of data; that is the concept of volume.

There is another concept I value highly: the complexity of the relationships among data. Anyone who has done modeling and data mining on something extremely complex will know this from experience. At one point, at the end of 2008, Google approached me hoping I would join them. Their model is a group of models, a large model matrix of 62,000 that can be linked arbitrarily, countless models of every kind. The complexity of relationships should also be placed under the concept of volume. The larger the volume of big data, the greater its potential value.

Then there is speed. Both of the articles I wrote on this use Moore's Law, and in fact Moore's Law still holds for big data. In one direction, the number of data types doubles every year and the volume of data doubles every year. The other direction is basically a variant of Moore's Law: the unit cost of data acquisition halves every year, the unit cost of data storage halves every year, and the unit cost of data utilization halves every year. If the volume doubled and the cost doubled along with it, that would be impossible to sustain. The higher the growth rate of big data, the greater its potential value.
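The doubling and halving argument above can be checked with a few lines of arithmetic. The following is a minimal sketch in Python, not something from the speech: the function name `project` and the starting figures (1 TB, 100 cost units per TB) are hypothetical values chosen only for illustration. It assumes volume doubles each year and unit cost halves each year, as the speaker describes.

```python
def project(volume_tb, unit_cost, years):
    """Project data volume and unit cost under the doubling/halving rule.

    volume_tb and unit_cost are hypothetical starting values, not figures
    from the speech. Returns one (year, volume, unit_cost, total_cost)
    tuple per year, starting with year 0.
    """
    rows = []
    for year in range(years + 1):
        volume = volume_tb * 2 ** year   # volume doubles every year
        cost = unit_cost / 2 ** year     # unit cost halves every year
        rows.append((year, volume, cost, volume * cost))
    return rows


if __name__ == "__main__":
    for year, volume, cost, total in project(1.0, 100.0, 5):
        print(f"year {year}: {volume:g} TB at {cost:g} per TB, total {total:g}")
```

Under these assumptions the total cost stays flat while the volume grows. If the unit cost did not fall, the total would double every year along with the volume, which is the case the speaker calls impossible.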

There are several misunderstandings in how big data is understood. The first is to speak only of quantity. You watch the data grow, but you cannot draw a line, and you cannot tell ordinary data from big data. What is the essential difference between a T of data and a P of data? Just that one is bigger? H-P says no problem: I handle a P the same way as a T, solved.

Discussing big data apart from the background of industrial development and social progress has little meaning. As long as you set out to do this, the technology can be worked out; someone will always find a way. Privacy, algorithms, there is bound to be a way. If big data is treated as a technical issue, its value is hard to prove. Of the first wave of stars in our Internet history, even the companies still alive today count for less than the real gold diggers.

So big data cannot be separated from industrial development or from social progress. Big data is a bit like the Internet when it was just starting: the Internet had existed for many years, but real interconnection began under the lead of Gore in the United States. Last year the U.S. government launched a national big data strategy. The U.S. government took the lead: any unit that uses a penny of federal money must publish its data. This spread to all developed countries and has now spread to a large number of developing countries; Kenya, the Philippines, and others are starting to do it. So we have to start thinking about the social, economic, and broader benefits behind big data.

(Responsible editor: Schpeppen)
