August 13 news: Huai Jinpeng, president of Beihang University (Beijing University of Aeronautics and Astronautics), said this afternoon at the "Hall of Fame: Internet Trends Forum" that big data on top of cloud computing will produce new value of two kinds: commercial and social value on one hand, and academic value on the other.
Big data is not only changing how we communicate; it is also reshaping our models of economic and social development. More importantly, it may become an important means of future academic and scientific discovery.
But he also said that applications of big data inevitably touch on privacy.
Huai Jinpeng noted that websites using big data can mine out a great deal of personal information. As big data applications spread, Baidu can learn your online behavior and even infer what you may be thinking about; Taobao can learn your shopping habits; Weibo can reveal your thinking on a given topic. Trust and privacy will therefore be issues of growing concern.
Below is the transcript of Huai Jinpeng's speech:
Huai Jinpeng: Thank you, host. Mr. Wu has just given a wonderful report on big data, and I would like to share some thoughts on cloud computing and big data. There are two parts: first, how the development of the Internet gives rise to and shapes a new model, with data as the focus now and in the future; second, some thoughts on research in cloud computing and big data.
As we all know, backbone bandwidth doubles every six months while its marginal cost approaches zero; this is the well-known Gilder's Law. Over the past twenty years, as computing and storage capacity grew, memory and hard-disk prices fell by factors of roughly 45,000 and 3.6 million respectively. Such figures give a simple sense of how cheap bandwidth, computation, and storage have become. In today's Internet applications we have in effect entered a new era of data services: bandwidth has become a basic, cheap commodity, we no longer worry that going online costs a great deal of money, communications capacity grows faster than Moore's Law, and everything we do online is computation on the Internet, so the IT and communications industries are merging into one business. It is now hard to say whether Apple, Google, Yahoo, or even parts of Microsoft's business belong to traditional IT or to new content services.
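As a back-of-envelope check (an editorial illustration, not part of the talk), cumulative fold-changes like these imply price-performance doubling periods of roughly 11 to 16 months:

```python
# Given a total fold-change over 20 years, what doubling period does it imply?
# Illustrative arithmetic only; the 45,000x and 3.6-million-x figures are the
# ones cited in the talk above.
import math

def implied_doubling_months(fold_change: float, years: float) -> float:
    """Months per doubling implied by a total fold-change over a span of years."""
    doublings = math.log2(fold_change)
    return years * 12 / doublings

print(f"45,000x over 20 years: one doubling every "
      f"{implied_doubling_months(45_000, 20):.1f} months")    # ~15.5 months
print(f"3.6 million x over 20 years: one doubling every "
      f"{implied_doubling_months(3_600_000, 20):.1f} months")  # ~11.0 months
```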
There is a story in this field. A pioneer of the Internet in the 1960s, J.C.R. Licklider, who was also a psychologist, had a vision for the networked computer: connect all the world's computers so that users could use remote machines, achieving two things. The first was how to obtain data remotely; the second was how to use someone else's computer. These two goals, accessing remote data and harnessing many computers, have always been the direction of research and practice in computer and information science. The earliest data-transfer services that emerged let us serve data content anywhere in the world, and this mattered enormously in the computing world. On the other hand, the 1970s and 1980s brought another important development, the operating system, which took the computer away from the traditional specialist and put it in the hands of ordinary, non-computer professionals. Its important contribution is that we need not know how to manage memory or allocate computing time; it provided the techniques and methods for managing standalone machines and mainframes. Because of these important breakthroughs, two Turing Awards have been given in this field, one associated with the IBM 360 operating system and another for related work; both were awarded for contributions to operating systems.
An eternal theme of computing is how to process data better, faster, and more powerfully. Around this theme, the application models of the past and present, from peer-to-peer to today's cloud computing, mobile computing, the Internet of Things, and the Smart Earth, all take data as their core. The emphasis is on using data resources more efficiently by transforming past computing patterns into the cloud and other forms. In this process, whether viewed from science, from applications in a given field, or from the transformation of the computing model, the goal is better computing power for each class of terminal. Cheap bandwidth makes it inexpensive to get online, which means the Internet offers us a new opportunity: it is turning from a simple communication platform into a broader computing platform.
What the cloud is, we all know: a new service and operating model based on the data center, emphasizing cost-effectiveness, efficiency, and trustworthiness. It raises the utilization of high-end computing while boosting the processing power available to low-end devices; we no longer depend on the capability of our own machines but hand work to the back end, relying on its strong processing capacity. Big data rests on this cloud computing model, though the model cannot yet fully guarantee service quality. It is like eating out: if you want to eat well, you must pay more. Suppose we normally serve 5,000 people upstairs at this venue and suddenly 50,000 come to eat. A simple, nourishing dish, say cabbage stewed with tofu, can still be served, but only as a basic service. To a certain extent cloud computing cannot yet deliver high-quality service; it is a computing model still evolving amid large-scale Internet applications.
In this basic application model, information technology can for the first time serve as infrastructure. Its core technology, as Academician Wu just presented, uses virtualization-style isolation to deliver services more efficiently, introducing new computing tools and capabilities for Internet applications; the past decade has explored this from many angles. Internet applications in turn impose new requirements on the cloud computing model, which brings us to the characteristics of big data itself: large scale, rapid change, and great variety, visible in social and search applications with their many data types, and in many aspects of our social life. A Turing Award winner once observed that the volume of data doubles every 18 months; where past data was deterministic, today's data comes from the fusion of humans and machines, and diversity and heterogeneity are its defining characteristics. Data growth now far exceeds that 18-month pace and continues to accelerate.
Last year big data became a buzzword across information technology and society, the second most popular term worldwide, attracting global attention. Last year also showed the trend of big data developing on top of cloud computing. One chart gives the private and hybrid clouds of cloud computing, together with the industry space that big data may open up as it develops; the forecast is that by 2016 big-data-related industries will reach a scale of more than 200 billion, with data becoming an important driving force in the economy and society. Another analysis, of global enterprise investment in big data in the second half of last year, shows that education, transportation, and energy each drew roughly 30% or more in actual investment, with much more work and investment in big data and cloud computing still to come.
We see that, with the rapid spread of the Internet, computing services have moved from the mainframe to client-server and on to virtualized computing. Perhaps cloud computing is only one expression of a virtualized computing environment; there are many others, including what we call the Internet of Things. As time and application patterns change, such terms will multiply, and people's understanding of and demands on them will change too.
The second point: in the future, ordinary users increasingly want the Internet to become one large computing-services platform, the equivalent of the system resources we use on our own laptops and desktops. What is needed is, in effect, a large operating system for the Internet to manage and configure this system, realizing the early conjecture about the Internet: letting remote computing and remote data resources work together to deliver the services you need.
This model is really a rethinking of the Internet, from the center outward to more efficient connection of terminals. We say the Internet changed how we communicate, and big data has now changed much of our economy and our lives. Mr. Wu's report contained particularly convincing examples of how big data is not only a means of communication but a change in our economic and social life. I have a few examples here. Google in 2007 trained a language model on two trillion words, a very good result produced by big data. We know that medical research benefits too, for instance in producing new drugs. Using 450 million models, an H7N9 influenza outbreak was predicted, including the affected regions, weeks earlier than the traditional CDC forecasts. Alibaba and Baidu have likewise done much outstanding work, mainly because they hold strong, real operational data. Baidu and Google make it possible to study and analyze the online behavior of each of us as individuals; Taobao and Amazon know users' shopping and social habits; Weibo has changed how we understand social thinking in many ways. From these few examples we can see the Internet's first phase of changing communication giving way to deep mining as a new mode, and this macro-level statistical analysis of data has also changed our past style of research, letting us pursue not only the analysis but the reasons behind what we observe.
Third, the new value of big data under cloud computing. The more meaningful value here is academic: theory, experimentation, and computation have been the three means of scientific research, and we see many studies conducted in this basic manner. Now many scholars predict that data-intensive computing is becoming a fourth paradigm of scientific research, advancing our understanding of society and nature. So this change carries commercial and social value on one hand and academic value on the other. The challenge it gives computer researchers is that we are in a period of transition and renewal in software and theory. My rough picture is this: from the birth of the computer we revolved around scientific computing; the second stage was the business phase; now, taking cloud computing as a shorthand term, we know that the basic problems of computer research used to be the Turing machine, algorithms, and complexity, then business processes and data processing; cloud computing must consider data science and data theory. Scientific computing and data processing drove the development of databases; within big data we now have data science, where Hadoop and MapReduce are only a beginning, and many approaches will carry the work forward.
The transformation of computing and the process of forming a new computing model pose many challenges to software theory, complexity, and software systems, particularly Internet-scale software. The first big question is, for example, the service capability of software and data, because software complexity has surpassed that of our traditional software. Beyond functional properties, getting the adding and subtracting right, we must now weigh quality of service and availability. In current Internet applications, software maintenance and production used to be very costly; now there is no need for complex system configuration, no need to manage terminal resources, no need to care where your service object resides: you focus only on what services and resources you need on the Internet. So software under cloud computing differs greatly from traditional software research in how it is distributed and how it is maintained.
In the past we considered standalone machines or simple local-area networks. How do we develop software under the Internet, where the computing platform is no longer a simple small application? In the future application model we want the Internet as a complete computing platform, so a pattern we may now begin to taste is that the user is both developer and consumer. Operators of data and services, to whom software and data are uploaded, support the configuration, integration, development, and application of software services; perhaps in the future, like telecom operators, service operators will become an increasingly important new computing platform for the Internet. On this platform the technical challenges of data processing itself are many. A US presidential advisory committee report noted that for more than ten consecutive years, industries worth over a billion dollars have kept emerging in computing, driven mainly by data processing. Parallel databases and data-mining tools built for traditional data processing have not adapted to processing in the cloud: by 2010 processing reached 70 TB of compressed data, volumes that traditional databases not only cannot hold but can handle only at great expense. Yahoo's Hadoop clusters have grown past 4,000 nodes in a year, no longer a simple single database, with over 3,000 nodes on the integrated data side. Measured against such processing demands and the equipment current technology can provide, this area clearly holds important challenges and opportunities. Meanwhile, within data processing itself, maintenance costs, update costs, and the maintenance patterns of the data all differ in many ways.
So this field will have boundless vitality and new technical challenges. There are many problems here. Take MapReduce, which in effect processes all the data at once: many problems have surfaced in application, and papers at OSDI a couple of years ago considered upgrading MapReduce further, since incremental computation over data faces many limitations, as do new algorithms for new problems. Big-data algorithms face great challenges, not merely in going from small data to large, but in more fundamental changes. In computational support for data processing, the computing model for big data, distributed architecture, data mining, and prediction are all beyond what current technology can fully solve. The platform supporting future large-scale data processing is likewise an important problem for cloud computing and big-data computing. In particular, our past computation was passive; to make computation active, storage and computation must be linked in the transformation. As a new architecture and approach this is still under research and development: how to store and compute effectively, and especially how to design a new organization and processing platform around data-driven, active computation, becomes an important problem.
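To make the batch-versus-incremental distinction concrete, here is a minimal sketch (plain Python, not Hadoop's API; word count is the standard illustrative task): the batch job rescans everything, while the incremental job folds only new records into prior results.

```python
# Contrast MapReduce-style batch processing with incremental updating:
# the batch version recomputes over all data; the incremental version
# touches only the new records, reusing prior counts.
from collections import Counter

def batch_word_count(documents):
    """Batch job: map each document to words, reduce by summing counts."""
    counts = Counter()
    for doc in documents:           # "map" phase
        counts.update(doc.split())  # "reduce" phase, folded in for brevity
    return counts

def incremental_update(counts, new_documents):
    """Incremental job: fold only the new data into existing results."""
    for doc in new_documents:
        counts.update(doc.split())
    return counts

corpus = ["big data needs new algorithms", "cloud computing serves big data"]
counts = batch_word_count(corpus)   # full pass over everything
counts = incremental_update(counts, ["new data arrives every day"])
print(counts.most_common(3))
```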
As mentioned earlier, big data not only changes how we communicate but also affects our model of economic and social development; more importantly, it may be an important means of future academic and scientific discovery. This important tool has also opened many new horizons for computer research. For example, since the 1950s we have relied on sampling, with industrial testing done by sampling, but big data is not done by sampling. When we taste a dish while cooking, we have prior knowledge: if the pot is heated evenly and well stirred, then although a taste is only a partial analysis, the whole is assured. Under big data, such local data no longer suffices: it is as if new ingredients are constantly added while we stir-fry, so local processing can no longer guarantee the integrity of the data.
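The stir-fry point can be made concrete. A small synthetic sketch (illustrative numbers only): an early sample estimates the pot well until newly added "ingredients" shift the distribution, after which the fixed sample no longer represents the whole.

```python
# An early sample stops representing the whole once new data keeps
# arriving from a shifted distribution. Synthetic data throughout.
import random

random.seed(0)
pot = [random.gauss(0.0, 1.0) for _ in range(10_000)]  # initial, well-stirred
taste = random.sample(pot, 100)                        # an early sample

# New "ingredients" keep arriving from a shifted distribution.
pot += [random.gauss(3.0, 1.0) for _ in range(10_000)]

sample_mean = sum(taste) / len(taste)
true_mean = sum(pot) / len(pot)
print(f"early-sample estimate: {sample_mean:+.2f}")  # stays near 0.0
print(f"actual mean now:       {true_mean:+.2f}")    # has drifted to ~1.5
```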
When we go to buy shoes we do not run to every shoe store in Beijing; weighing our costs as we usually do, we do not pursue the absolute optimum.
In medicine it is hard to say a doctor pins down the cause of a problem in a rigorously scientific way; it is more a matter of correlation, a mapping of past experience onto our understanding of the current treatment. Big data brings us much new research and many challenges. For us in computing, the mathematical foundation has been discrete mathematics and the traditional statistics of two hundred years ago; now we face the analysis and understanding of new kinds of statistical data, developed not only over recent decades, especially alongside industrial development, but across applied mathematics and the whole of our computer science.
Big data on cloud computing is quite significant for scientific research, so let me set out my understanding of the problems of big data. Beyond the four "V" characteristics already widely discussed in society, from a research perspective we should understand how big data affects computing itself: how to move to incremental computation, how to handle non-deterministic computation, how to study inductive computation, changing how our computing systems work, turning one problem into n problems, where each sub-problem represents part of our solution. Because big data is incremental and more uncertain, we need inductive methods of comprehensive analysis. For scientific computation over big data we must consider continuously growing data, especially under high real-time requirements: incremental computation, combined with the classical reduction methods, into new forms of computation. We see these as the new problems of big-data computation, and this is also our understanding of future data processing.
The second question concerns the basic problems of computer science. For computer people, not every problem is computable, and only computable problems are worth the computer's effort. Take secrecy and encryption: when the key space is on the order of 10 to the 61st power, it cannot be exhaustively searched within any meaningful period. Among the basic questions we study, consider visiting five cities without repeating any point, our traditional TSP: no algorithm is known that completes the tour efficiently without retracing roads, and circuit-board design poses similar problems. Many things cannot be computed, and algorithms, together with what can and cannot be computed, are the basic problems for us in computing.
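For scale, a back-of-envelope on the 10^61 figure (the guess rate below is an assumed, deliberately generous number, not from the talk):

```python
# Even at a wildly optimistic 10^18 guesses per second, a key space of
# 10^61 candidates cannot be enumerated in any meaningful time.
SPACE = 10**61            # candidate keys, as cited in the talk
RATE = 10**18             # guesses/second: an assumed, generous rate
SECONDS_PER_YEAR = 3.156e7

years = SPACE / RATE / SECONDS_PER_YEAR
print(f"exhaustive search would take about {years:.1e} years")  # ~3e35 years
```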
Since computers have existed, algorithmic research has always been the fundamental problem of computer science. I list here, just from the 1970s to the 1990s, ten Turing Award winners who, for algorithms at important stages of the computer's history, won the field's highest honor. We know that in the 1960s the United States ran a long-term research program ranked as important as the treatment of cancer and the moon-landing program. Algorithm research is the core research of computing, and the computational complexity and algorithms of big data raise new problems. The basic reason is clear to us: the volume of data is so large that the storage and processing capacity of our machines and algorithms is saturated. So, as computer workers facing big data and new computing models, we face new problems: data that cannot be computed or stored under existing means of support.
We used to study such problems, so let me report some numbers. The world's fastest hard-disk read speed is 6 GB per second. At that rate a linear scan of one PB of data takes nearly two days, and an EB takes more than five years; Baidu processes some 10 PB of pages a day, which would take 19 days just to scan. Clearly the work in this area is a huge difficulty, but also a great new research opportunity. The picture above shows the world's fastest scanning device reading the fastest disk, still needing 19 days to complete the scan. For such problems, moving big data is necessarily hard, so big data evidently brings new problems: under traditional computational complexity, with scanning demands like these, and results needed within minutes, how do we define a problem? How do we analyze it? How do we study it? These should be central questions for the computational-complexity and algorithms work built up over the past 50 years.
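The arithmetic behind these figures is easy to reproduce (a sketch using the 6 GB/s figure cited above):

```python
# Scan-time arithmetic: at 6 GB/s of linear read bandwidth, how long
# does it take to scan a petabyte, ten petabytes, or an exabyte?
GB = 10**9
RATE = 6 * GB                      # bytes/second, the figure cited in the talk

def scan_time_days(num_bytes: float) -> float:
    return num_bytes / RATE / 86_400

PB, EB = 10**15, 10**18
print(f"1 PB:  {scan_time_days(PB):.1f} days")         # ~1.9 days
print(f"10 PB: {scan_time_days(10 * PB):.1f} days")    # ~19 days
print(f"1 EB:  {scan_time_days(EB) / 365:.1f} years")  # ~5.3 years
```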
Here is a chart discussed at a famous conference about twelve years ago. The vertical axis is accuracy on test data and the horizontal axis is data scale. On small data samples the gap between good and bad algorithms is wide; but the "bad" algorithms, those below an 80% recognition rate near the bottom of the axis, steadily approach the best and most elegant algorithms as the data scale grows 10, 100, 1000 times. This raises new problems for our analysis and design of complexity. Our scientific problem, in this second case, is this: we used to consider an algorithm f with input s, where applying f to s yields the result, and the question was how well f could be designed; now s changes not only in quantity but in quality, and that affects the algorithm. For small data the algorithm itself mattered most; under big data, the algorithm is greatly affected by the data.
So we must consider how the algorithm and the data change dynamically together, to find the most effective method that best approximates the optimum, and to find the equilibrium point in big-data computation. This requires weighing data volume against the algorithm, considering f and s stacked together; the problems this poses for new system design are numerous, so we face many open problems in computation.
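The effect described above can be reproduced on synthetic data. A minimal sketch, assuming numpy and scikit-learn are available (the dataset and both models are illustrative choices, not from the talk's chart): a crude memorization-style learner, 1-nearest-neighbor, starts well behind a simple fitted model but closes much of the gap as the training set grows.

```python
# As training data grows, a crude algorithm's accuracy typically climbs
# toward that of a better one, echoing the accuracy-vs-scale chart.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)

def make_data(n):
    """Two features; the label depends (noisily) on the sign of sin(x0)."""
    X = rng.uniform(-3, 3, size=(n, 2))
    y = (np.sin(X[:, 0]) + 0.1 * rng.normal(size=n) > 0).astype(int)
    return X, y

X_test, y_test = make_data(20_000)
for n in (100, 1_000, 10_000, 100_000):
    for name, model in (("1-NN (crude)", KNeighborsClassifier(n_neighbors=1)),
                        ("logistic    ", LogisticRegression())):
        X, y = make_data(n)
        acc = model.fit(X, y).score(X_test, y_test)
        print(f"n={n:>6}  {name}  accuracy={acc:.3f}")
```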
The third problem is that much big data cannot yet be represented: a great deal of new data cannot be used in its current form. When we move from one-dimensional and ten-dimensional data to data of 30 million dimensions, how do we express it for processing? We need, first, to step outside the traditional computing model, and second, to learn to recognize, extract, and compute quantitative features in high-dimensional spaces.
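One concrete reason representation matters at that scale: a dense vector of 30 million dimensions costs hundreds of megabytes per record, while in practice most entries are zero. A minimal sketch, assuming numpy and scipy are available (the 200-nonzeros figure is an illustrative assumption):

```python
# Dense versus sparse representation at 30 million dimensions.
import numpy as np
from scipy import sparse

DIM = 30_000_000                       # dimensionality cited in the talk
dense_bytes = DIM * 8                  # one float64 per dimension
print(f"dense vector: {dense_bytes / 1e6:.0f} MB per record")  # 240 MB

# A record that touches only 200 of the 30 million dimensions:
idx = np.random.default_rng(0).choice(DIM, size=200, replace=False)
vec = sparse.csr_matrix((np.ones(200), (np.zeros(200, dtype=int), idx)),
                        shape=(1, DIM))
print(f"sparse vector: {vec.data.nbytes + vec.indices.nbytes} bytes of payload")
```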
Writing a microblog post looks very simple to us, but for the back-end computing to keep pace with existing processing methods is itself an important research opportunity: representation, computation, and comparison of heterogeneous data. Today we can mostly only store, and the data arriving every day keeps raising new problems. The analysis and mining of data under cloud computing, with its large scale, variety, and rapid change, also faces new problems. For example, China now has four microblogging systems, while past mining worked only within a single large one; the same subject may be expressed in language, text, voice, and images in different ways, so how do we link across them and migrate across domains? Mining within a single system and a small space produced beautiful results, but as the scope widens and the data grow, processing capacity and comprehensive analysis run into many problems. So understanding and analyzing data is vital: given so much data, are your analytical results valid and credible? Our understanding of the data itself has changed, and the visualization of data becomes more important, giving intuitive visual results for multivariate, heterogeneous data; this too is a research problem for our big-data models. As for big data and cloud computing themselves, we regard cloud computing as a computing model in which the underlying processing matters, and as cloud applications spread, quality of service will become an important research topic: mining valid information, correcting uncertain information, and combining the diversity of data may be the next challenges for service quality under big data. That includes intelligent search: we used to do keyword and document search; in social networks a new search model is emerging, and it too is important to the development of every Internet company.
The fourth important question concerns trust and privacy. Let me give an example from a few years ago: a company could follow a person's web-browsing habits and, without ever tapping his name directly, knew that he was an architect, knew the demographic structure of his family, and knew his recent buying habits. We know that Westerners' birthdays relate directly to their buying habits, and such data is absolutely personal privacy: for any organization on the Internet to demand your home address and birth date is illegal and prohibited, yet through analysis such a website can dig out a great deal of related information about you. Social networks are likewise heavily used for discovering sensitive information. So in the future, as big data is analyzed and applied more widely, Baidu can know your online behavior and perhaps what you are thinking about; your behavior on Taobao reveals your shopping habits; Weibo reveals your thinking in a given area. Trust and privacy will thus be concerns for the future.
If the Internet's rapid development so far has been built on information services, then in the future, around big data, or call it cloud computing, the new virtualized computing model will be important, and its basic hallmark is that data services become ever more central to industry, technology, and research. A change in the computing model may usher in a change of era. As we explore the new value of China's Internet, whether scientific or industrial, I think we have many opportunities here, and we will work hard to explore them.
Thank you!