CAS Academician: Big Data and the Scientific and Technical Problems of Big Data

The Fifth China Cloud Computing Conference opened on June 5-7, 2013 at the National Convention Center in Beijing. With an international perspective and insight into global cloud computing trends, the conference approached the field from the application side, exploring topics such as cloud computing and big data, cloud computing and the mobile Internet, cloud security, and industry applications of cloud computing. The conference also set up a cloud computing services exhibition area to exchange the latest international research results, showcase the achievements of domestic cloud computing pilot cities, share development experience, and promote global cooperation and innovation in cloud computing.

Huai Jinpeng, chairman of the Cloud Computing Expert Committee of the Chinese Institute of Electronics and academician of the Chinese Academy of Sciences

On the second day of keynote speeches at the Fifth China Cloud Computing Conference, Huai Jinpeng, chairman of the Cloud Computing Expert Committee of the Chinese Institute of Electronics and academician of the Chinese Academy of Sciences, delivered a keynote titled "Big Data and the Scientific and Technical Problems of Big Data." Huai Jinpeng first pointed out the economics underlying IT development: over the past 20 years, thanks to advances in microelectronics, CPU performance has increased about 3,500-fold, while the cost of memory and hard disks has fallen about 45,000-fold and 3.6 million-fold respectively. With bandwidth costs falling even faster than Moore's law would suggest, data processing has moved from local machines to the network. Cloud computing, the hot technology of recent years, clearly fits this pattern.

Huai Jinpeng then analyzed the 4 Vs of big data. He pointed out that big data is not just an enormous volume of data: the point is to have an enormous volume of data together with the ability to process and analyze it, so that mining the data yields its value and gets at the truth. Beyond that, big data brings a secondary value by changing how we understand data: what we need to obtain from it is a trend, a prediction.

The following is a transcript of the speech:

I am very glad to have the opportunity to share with you my understanding of big data. Some of the content may be rather technical or theoretical; I will keep it as simple as possible.

Prerequisites for Cloud computing

Big data has become a very lively topic. Today I mainly want to share with you some thoughts on big data on today's Internet and on the problems we will face in the future.

Seen from the application side, information technology is a flow: from acquisition and transmission to computation and storage, and finally to use. Over the course of this development, Moore's law drove the rapid growth of microelectronics; in effect, a prediction further propelled technological change. Another is Gilder's law, which says that backbone bandwidth doubles every six months, so the cost per bit tends toward zero.

In computing and storage, thanks to microelectronics, computing speed and storage capacity have increased about 3,500-fold over the past 20 years, while the price of memory and hard disks has dropped about 45,000-fold and 3.6 million-fold. As bandwidth becomes ever cheaper, with communication bandwidth growing far faster than Moore's law, stand-alone computing has become networked computing and offline has become online. This is a very, very big change.
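As a rough, illustrative check of these figures (the numbers quoted in the talk are approximate), the sketch below converts each overall growth factor into an implied doubling period and contrasts it with the six-month bandwidth doubling of Gilder's law as cited above:

```python
import math

def months_per_doubling(growth_factor: float, years: float) -> float:
    """Doubling period (in months) implied by an overall growth factor over `years`."""
    return 12 * years / math.log2(growth_factor)

# Figures quoted in the talk, treated here as rough orders of magnitude.
quoted = {
    "CPU performance (x3,500)": 3_500,
    "memory cost reduction (x45,000)": 45_000,
    "disk cost reduction (x3,600,000)": 3_600_000,
}

for label, factor in quoted.items():
    print(f"{label}: one doubling every ~{months_per_doubling(factor, 20):.0f} months")

# Gilder's law as cited: backbone bandwidth doubles every 6 months,
# i.e. about 2**40 (~1e12) over the same 20 years -- far beyond Moore's law.
print(f"bandwidth at 6-month doublings over 20 years: ~{2**40:.1e}x")
```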

What the cloud brings us, and why we use it

Why is there a cloud, and why is there such big data? Acquiring information resources over the Internet has become cheaper and cheaper, and communication bandwidth now grows even faster than Moore's law, which pushes Internet applications into a second round of value mining. The stand-alone machine has given way to the network and offline has become online, so maintaining terminals and maintaining systems has entered a new era: we no longer need to know where a service runs, only which services and resources we want to obtain.

Internet applications, especially Web 2.0, have moved us from the one-way communication of earlier technology into an era of two-way communication, which further accelerates the development of the Internet and the creation of new capabilities.

Evolution of computational patterns

Computing has gone through three typical eras: the mainframe era, the network computing era, and now cloud computing. We call this the virtual network computing environment: we have moved from closed, controllable platforms into an open network environment without centralized control. Going forward, as broadband keeps developing, the price of microelectronics keeps falling, and the ability to acquire resources keeps growing, the computing model will change again. My understanding is that this is a change in the computing model driven by the continual change in bandwidth and cost.

Looking back, the first computing revolution of the 1980s came with the PC era, when software became a commodity for the first time and, sold as an intangible good through copyright, began to spread through the market. By the 1990s, Internet applications reached effective scale and the Internet became the platform for acquiring and exchanging information. The third time, it is precisely this new computing model that is driving us into an era of unprecedented quantitative accumulation and qualitative change.

For any technology or product in this space, the window of time is not long. With PCs, mobile phones, and the Internet, once a technical standard or a certain scale of maturity is reached, followers never really get another chance, or can only remain bystanders.

In recent years several very hot and very effective approaches have emerged:

1. First, cloud computing. In a period of such rapid Internet development, it raises the utilization of high-end computing and applications and improves the ability to handle low-end computing transactions and services; important changes will follow. Perhaps this computing model will further deepen our understanding of it.

2. The second category, also enabled by technical support for interaction, is social networks, which have changed greatly; clearly, with the likes of Facebook and Renren, the scale of the Internet is changing.

3. There is another category: production control systems, embedded systems, and sensor transmission systems, which have brought us an important new kind of application. And of course, scientific computing has always been a basic source of big data.

But whether the data comes from business, industry, or scientific computing, and now from social computing, we face a new question: where are the Internet's second round of development and its new challenges? Cloud computing as a computing model is beginning to play a real role, and behind cloud computing lie real applications and real needs; big data is one of the answers people have proposed.

What is big data and what does it bring us?

1. The 4 Vs of big data

There is a lot of talk about big data. In terms of explicit features it is usually summarized as 4 Vs or 5 Vs, from the angles of scale, frequency of change, type, and value density (Volume, Velocity, Variety, Value). Wikipedia also gives an informal definition: data so large that existing methods cannot handle it. What matters about big data is not simply that it is a lot of data; more importantly, it represents a change from quantity to quality, and the question is how we face that change. Going from what we used to call mass data or massive data to big data is not simply a matter of scale; a qualitative change has taken place, and it brings new problems. Traditional data has gone from static to dynamic, from simple and low-dimensional to enormously high-dimensional, and its types are beyond our control.

2. Big data: value vs. flood

In this context we know many specific figures and situations, but a Turing Award winner has proposed a so-called law of data: the volume of data doubles every 18 months. This huge volume of data is different from the structured data of traditional processing, and that brings many problems. So how do we get data under control when it floods in and is not as easy to handle as business data? Whether we call it a data flood or simply a need for new methods, there are in fact many types of data. Some of it has little to do with what we need to handle and carries little value; the key is how we identify the data that is real and valuable, and make good use of it.

In 2010, The Economist ran a special report titled "The Data Deluge," which argued that many new problems arise when data goes from scarcity to abundance. The report also addressed the economics of data and raised a new point: data has entered a new economic era.

3. Big data in production, daily life, and scientific research

In past research, new value has been found by discovering relationships in data and the statistical characteristics of data. The development of informatization has created a great deal of man-made, non-natural data. Some of this data, especially data related to the economy and society, can give us a lot of inspiration, and it also contains much that is of scientific research value.

How big is big data? On Twitter, information about Japan's tsunami spread early enough to warn those affected. During Beijing's rainstorm on July 21 last year, more than 9 million microblog posts were made, and possible rescue plans were released on Weibo in advance. Around the Diaoyu Islands, microblogs reflected social information and public sentiment, raising the question of how to handle such issues more effectively. We also know that before the 2009 H1N1 (swine flu) outbreak, Google predicted the spread of winter flu several weeks ahead of the official figures.

We know very well how the official picture emerges: only after the CDC confirms swine flu symptoms are local statistics reported up to the national CDC, and the waiting, confirming, and reporting take two or three weeks. Google could detect the outbreak by analyzing users' search habits and behavior nationally and globally and issuing early warnings: from the symptoms users queried, the remedies they looked up, and the advice they sought, it explored looming social problems on the basis of online information. Similarly, at Alibaba, Jack Ma told me he had a premonition of the financial crisis. The reason was that in his e-commerce business, real-time payments on transactions had dropped sharply. Normally, purchases for Christmas are ordered about six months in advance; but the orders did not come in March, nor in June, and by September they were still falling, which signaled a new problem for small and medium-sized enterprises and for manufacturing. Baidu, likewise, analyzes its 400 million users to provide personalized search.
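The flu example boils down to checking whether search-query volume moves ahead of the officially reported case counts. The sketch below is a minimal, hypothetical illustration of that idea with made-up weekly figures; it is not Google's actual model.

```python
# Hypothetical illustration: does flu-related query volume anticipate the
# official case counts that only appear ~2 weeks later? All numbers are made up.
from statistics import correlation  # Python 3.10+

query_volume   = [120, 150, 210, 400, 650, 700, 520, 300]  # weekly searches
reported_cases = [8, 9, 12, 16, 20, 42, 63, 72]            # weekly official reports

lag = 2  # reporting delay in weeks
# Pair the queries of week t with the cases reported in week t + lag.
r = correlation(query_volume[:-lag], reported_cases[lag:])
print(f"correlation of queries with cases reported {lag} weeks later: {r:.2f}")
```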

We also know that in the West, microblogs and social networks are used to create new value: analyzing public mood or the sentiment around a particular stock lets hedge funds decide whether to buy, and analyzing a listed company's business can help financial analysts judge whether it is heading for bankruptcy. You can also discover what different groups of people are interested in. Traditionally we have examples such as placing baby diapers, milk powder, and cigarettes together in a store, or the opposite strategy of placing infant formula and cigarettes far apart. One approach captures purchases made in passing; the other keeps customers in the store longer, stimulating consumption. Both are judgments and analyses based on the statistical patterns of actual behavior.
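The shelf-placement example rests on the co-occurrence statistics of purchases. Below is a minimal association-rule sketch over made-up transactions (item names and figures are illustrative only), computing the support and confidence of the rule "customers who buy diapers also buy milk powder":

```python
# Made-up transactions; compute support/confidence for {diapers} -> {milk powder}.
transactions = [
    {"diapers", "milk powder", "cigarettes"},
    {"diapers", "milk powder"},
    {"milk powder", "bread"},
    {"diapers", "cigarettes"},
    {"diapers", "milk powder", "beer"},
]

def support(itemset: set, txns: list) -> float:
    """Fraction of transactions containing every item in `itemset`."""
    return sum(itemset <= t for t in txns) / len(txns)

antecedent, consequent = {"diapers"}, {"milk powder"}
supp = support(antecedent | consequent, transactions)
conf = supp / support(antecedent, transactions)
print(f"support = {supp:.2f}, confidence = {conf:.2f}")  # 0.60, 0.75
```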

4. The actual value of big data

In fact, the value of data for economic and social development lies in how we classify and analyze it and make effective predictions. Having big data, having truly usable data at scale, and being able to analyze and process it may be an important force in continuously improving our competitiveness.

As for the value of big data for future investment and development: last year Gartner forecast that cloud computing and big data will be important opportunities in the coming years. Of course, analyst forecasts always carry risk. In Gartner's view, there will be a new round of major opportunities in global data around 2016. Gartner also analyzed the current big data investment landscape, breaking down the areas that account for more than 30% of current investment and the areas of future investment, listing fields such as education, transport, and healthcare where such activity may take place and in some cases is already under way.

5. Big data brings changes in social patterns and ways of thinking

We all know that the Internet has changed the way we communicate: the younger the user, the more accustomed they are to communicating by email, Weibo, or WeChat. Will big data change our economic life? I mentioned some examples earlier. With Baidu or Google, we can become familiar with users' browsing behavior; with Taobao and Amazon, we can understand their shopping habits; with the content on Weibo, we get a different reflection of how people think and how they understand society at a given stage. This is one way big data is changing our lives.

From another perspective, might big data change the way scientific research is done? Academician Li gave a very good report yesterday. In the past there were three paradigms: theoretical research, experimental validation, and simulation or computation. It is now being suggested that we are moving toward data-intensive scientific discovery. Can big data become a new paradigm of science? If this approach can be used in future development, our way of thinking will change:

First, because of the explicit 4V characteristics of the data, our concepts of research methods and of big data processing change; consider, for instance, the sheer volume of information. The statistical sampling methods of the past are not fully applicable, because sampling requires uniformity. It is like cooking: we add the ingredients that cook slowly before those that cook quickly. If the pot heats evenly, then when you think a dish is nearly done you may taste it; a taste is the idea of sampling. You estimate that it is done and serve it. Unless you have very strong experience and can judge by the color within three to five minutes, that is how you decide. Our assumption has been that sampling works and that everything is homogeneous; a small sketch after the third point below illustrates what happens when it is not.

The second change is from exact to approximate. When you want to buy a pair of sneakers, you do not visit every shoe store in Beijing. In other words, we no longer need to calculate exactly and compare price, style, and every other aspect before making a decision; the decision is based on your conclusions about the goal and the trend.

The third change is from causation to correlation. In the Google flu example just mentioned, they may not have understood the flu itself, but they established the trend and the likely correlation. It is like a famous doctor: consulting the pharmacopoeia for a prescription is one way to treat a patient, but much of his practice rests on experience, and he may not fully know why a remedy works for a given patient with the same symptoms.
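To make the first point concrete: sampling works when the population is homogeneous and degrades when it is not. The sketch below (synthetic data, not from the talk) estimates a mean from a 100-item sample drawn first from a well-behaved population and then from a heavy-tailed one:

```python
# Synthetic illustration of the sampling assumption: a small uniform sample
# estimates the mean well for homogeneous data, typically much less reliably
# for skewed (heavy-tailed) data.
import random
from statistics import mean

random.seed(42)
homogeneous = [random.gauss(100, 5) for _ in range(100_000)]
skewed = [random.expovariate(1 / 100) for _ in range(100_000)]  # heavy-tailed, mean ~100

for name, data in (("homogeneous", homogeneous), ("skewed", skewed)):
    sample = random.sample(data, 100)
    rel_err = abs(mean(sample) - mean(data)) / mean(data)
    print(f"{name}: relative error of a 100-item sample ~ {rel_err:.1%}")
```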

In this context, the way we think about big data processing may change. Will it lead us to change how we do research? How to handle big data becomes an important question. From the discussion above, I personally feel that the explicit 4V features of big data may force changes in the computing model; what will the impact be?

I would describe it as a move from the 4 Vs of the data to 3 Is of the computation. The first is approximate: traditional exact processing is no longer applicable, and we allow the pursuit of approximate solutions within an acceptable range. As I mentioned earlier, when you buy a pair of shoes you do not visit every shoe store in Beijing; you judge according to your understanding of the goal and the trend. The second is incremental: data arrives in a steady, dynamically changing stream, while traditional computation assumes a closed world and recomputes over all the data from scratch. Given the dynamic, changing character of big data, incremental computation is needed. At the same time, as Academician Li mentioned yesterday, computation used to follow a reductionist method: given problem A, decompose it into A1 through An, and an effective solution to A1 through An represents a solution to A. That has now changed completely, because the data for each sub-problem is insufficient and deviations must be handled. The third is inductive: the way to handle the problem is to generalize from the data, because the implicit relationships among the bulk of the data are what matter. On Weibo, for example, some people use audio, some video, some text; the same thing is expressed in different forms, across different and even completely unrelated domains. So how to induce effectively is also an important problem.
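As a small concrete example of the incremental property (my own sketch, not from the talk): a running mean can fold each new record in with constant work instead of recomputing over the whole accumulated data set every time.

```python
# Incremental computation in miniature: maintain a running mean as records
# arrive, instead of recomputing over the full data set on every update.
class RunningMean:
    def __init__(self) -> None:
        self.count = 0
        self.mean = 0.0

    def update(self, x: float) -> float:
        """Fold one new value into the running mean in O(1) time."""
        self.count += 1
        self.mean += (x - self.mean) / self.count
        return self.mean

stream = [3.0, 7.0, 4.0, 10.0, 6.0]
rm = RunningMean()
for x in stream:
    rm.update(x)

assert abs(rm.mean - sum(stream) / len(stream)) < 1e-9
print(rm.mean)  # 6.0
```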

From big data to big data computation: we go from the external 4V features of the data to the 3I properties of the computation. That is my basic understanding of this problem.

Big data vs. algorithms

Why should the study of big data start from the point of view of computation? Because, as we all know, computation is the essence of computer science. What we actually do on a computer is always to evaluate a formula G = f(X), where f is an algorithm or software program, X is the input data, and G is the result the program produces for that input. What has happened over the past 50 years? Everything has been based on algorithms. Early research dealt with simple algorithms. By the 1970s, polynomial-time algorithms had been characterized, and it became clear that not every problem can be solved efficiently by computation. In the 1980s randomized algorithms were developed, because they offer speedups. By the 1990s came the so-called approximation algorithms, used when the optimal solution cannot be found.
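To illustrate what "approximation algorithm" means here (a textbook example, not one given in the talk): the greedy edge-picking algorithm for Vertex Cover returns a cover at most twice the optimal size, trading exactness for tractability.

```python
# Classic 2-approximation for Vertex Cover: repeatedly take both endpoints
# of any edge not yet covered. The result is a valid cover whose size is at
# most twice that of an optimal cover, without solving the NP-hard problem.
def approx_vertex_cover(edges):
    cover = set()
    for u, v in edges:
        if u not in cover and v not in cover:
            cover.update((u, v))  # take both endpoints of an uncovered edge
    return cover

edges = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "a"), ("a", "c")]
print(approx_vertex_cover(edges))  # a valid cover, at most 2x the optimum
```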
