University of Delaware Professor Gao Guangrong: Core Technologies of Big Data Systems

Keywords: big data, BDTC, BDTC 2014

[CSDN Live Report] From December 12 to 14, 2014, the 2014 China Big Data Technology Conference (Big Data Technology Conference 2014, BDTC 2014) and the second CCF Big Data Academic Conference opened at the New Yunnan Crowne Plaza Hotel in Beijing. The event, whose theme was promoting big data research, application, and industrial development, was sponsored by the China Computer Federation (CCF), organized by the CCF Big Data Expert Committee, and co-organized by the Chinese Academy of Sciences and CSDN.

At the plenary session on the first day of the 2014 China Big Data Technology Conference, Gao Guangrong, professor in the Department of Electrical and Computer Engineering at the University of Delaware and founder and director of its Computer Architecture and Parallel Systems Laboratory, gave a talk titled "Core Technologies of Big Data Systems". He covered the serious challenges facing big data systems, the key technologies of big data systems, innovation in dataflow and big data engines, and the opportunities and challenges for the development of big data systems in China.


Gao Guangrong, professor of Electrical and Computer Engineering at the University of Delaware and founder and director of the Computer Architecture and Parallel Systems Laboratory

The following is a transcript of the speech:

Today we have heard a lot of reports. Big data has reached the highest point of the hype, but it is still a long way from really landing. I just want to summarize some of my recent work and my history. I am not much good at advertising, but I probably have one or two slides to illustrate it. My talk has two aspects: first, the latest forecast of the big data hype cycle, and second, how to combine big data with cloud computing.

Big data cannot fall from the sky. You cannot cut it off from the past and say that you suddenly have a way of doing things that is completely different from what you did before. That wish is understandable, but it is not necessarily realistic. So when we talk about real-time big data, we must not forget the long-term accumulation in high-performance computing. In China especially, driven by various needs, our work on high-performance computing is obvious to all. The Changsha machine is still number one in the world, which shows that we are at the frontier in this area. I did not see Yang Xuezhi here today, but I do see several other colleagues. So what I really want to talk about is the relationship between two things, HPC and big data.

Big data has in fact already passed the peak of the hype. The next step should be a gradual move to the landing stage, followed by stable development. On the hype cycle, as I just said, big data sits at the Peak of Inflated Expectations. With inflated expectations, we talk too much and have too little experience. A friend of mine in Shenzhen says that of big data, knowledge mining, and so on, what matters most to him is "money mining": can all of these technologies finally produce real results? We have too few examples of that. There is also a third piece of advice: never forget to work with small companies. The advice is aimed at the CTOs and CIOs of big companies. I think these three points are a good summary of where we currently stand on the hype cycle.

I have two slides left to illustrate the current challenge: should the engine undergo revolutionary innovation? Or do we just watch whoever is a little ahead of us and hurry to patch our own so that it ends up a little better than theirs? Or should the country as a whole pursue revolutionary innovation?

I will not talk about the history of the computer; I want to talk about the history of another field. More than 100 years ago, the question for the airplane was whether the engine concept from the car could simply be applied to the aircraft, so that no revolution in concept was needed. What happened instead was that a new model was found: flight dynamics. Only by moving from the ground dynamics model to the flight dynamics model could the engine really be built; it did not come from reworking the car engine. The new model created a new structure: from legs to wings, the structure changed. That history of more than 100 years ago ran from model to structure, and the development of the computer has been the same: first a new computational model, a model of execution, and then the development of the structure. So I ask those of us who work on big data architectures and build big data systems: where is the model? Only then comes the impact on structure.

With this as a basis, we can start talking about the evolution of the engine's core structure. I think there are three key technologies for big data engines. The first is execution model and architecture technology. The second is system software technology. The third is the engine's programming model and optimization technology. The three complement one another and none can be left out. People in HPC have had painful experiences and learned profound lessons along these three directions, and all three matter just as much for big data engines.

My main focus today is the execution model. The API that defines the model is called the execution model API, and on top of it you build seamless connections between the model's definitions so that it can achieve the goals you need. The latest view is that the execution model affects not only this layer of APIs but also the relationships between the other layers. So it is very important. What is an execution model? Take the one summed up in 1948: that execution model has lived for so many years, in all our interfaces between software and hardware for serial operation. The whole field has long tried to carry that success over to an execution model for parallel operations and parallel systems; unfortunately, to this day it has not succeeded. With big data, the data is not only what the program itself generates or what can be determined statically from the program; there must also be dynamic data. What is dynamic data? Data from all kinds of sensors, for example. You turn the problem into a mathematical model and then program it, taking a large number of random events into account, and the execution model has access to that data, so that both kinds of data can be integrated seamlessly in your system.
The dataflow execution model did not have this in earlier years. Data in a pipe was either blocked or not blocked; nobody imagined a blockage that also carried a temperature, so that kind of dependency could not be expressed. The consistency and integrity of the execution model, which were proved back in 1970, 1971, 1972, and 1980, all need to be verified again. My point is that this must not be forgotten, and that it is very important for us to draw on historical experience when we work at large scale.
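To make the data-driven idea concrete, here is a minimal sketch in Python of a dataflow-style firing rule in which a node runs only when all of its operands have arrived. It is an illustration only and not the speaker's engine: the `Node` and `Graph` classes, the `scale` and `alarm` nodes, and the sample readings are all hypothetical. The constants stand in for data known statically from the program, and the readings stand in for dynamic data such as sensor input.

```python
from collections import defaultdict


class Node:
    """A dataflow node: it fires once every input port holds a token."""

    def __init__(self, name, func, num_inputs):
        self.name = name
        self.func = func
        self.num_inputs = num_inputs
        self.tokens = {}       # port index -> value waiting on that port
        self.consumers = []    # (downstream node, downstream port) pairs

    def connect(self, consumer, port):
        self.consumers.append((consumer, port))


class Graph:
    def __init__(self):
        self.ready = []                    # nodes whose operands have all arrived
        self.results = defaultdict(list)   # node name -> outputs, in firing order

    def send(self, node, port, value):
        """Deliver a token; the node becomes ready when all ports are filled."""
        node.tokens[port] = value
        if len(node.tokens) == node.num_inputs:
            self.ready.append(node)

    def run(self):
        """Fire ready nodes until none remain (data-driven, no program counter)."""
        while self.ready:
            node = self.ready.pop()
            args = [node.tokens[i] for i in range(node.num_inputs)]
            node.tokens = {}               # firing consumes the tokens
            out = node.func(*args)
            self.results[node.name].append(out)
            for consumer, port in node.consumers:
                self.send(consumer, port, out)


def demo():
    g = Graph()
    scale = Node("scale", lambda reading, factor: reading * factor, 2)
    alarm = Node("alarm", lambda scaled, limit: scaled > limit, 2)
    scale.connect(alarm, 0)

    for reading in [10, 30]:        # dynamic data: stand-in for sensor input
        g.send(scale, 1, 2)         # static data: factor known in advance
        g.send(alarm, 1, 50)        # static data: threshold known in advance
        g.send(scale, 0, reading)   # the arriving reading completes the operands
        g.run()
    return dict(g.results)
```

Calling `demo()` feeds two readings through the graph. Nothing runs until a reading arrives, which is the sense in which execution is driven by the data rather than by a fixed instruction order.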

What is innovation? Innovation means not forgetting the knowledge that humanity has accumulated, and working out how to adapt it to the execution model in a new environment. That is an important part of innovation. Our field of computer systems has many contradictions. We very easily forget the past, not intentionally, but because there is too much going on: every year we chase what is coming next year and hurry after it, and there is no time for anything else.

Next I will use an animation to explain where, under the execution model, the misunderstanding in implementation lies. The misunderstanding is about the role of the OS. My teacher is well known as one of the originators of the operating system, and he won the biggest prize last year. Of his two contributions, dataflow is the second; the first is his contribution to operating systems.

This animation shows the machine and the runtime system. Many machine models can be used to implement it, and between the hardware and the execution model there will always be some holes. For example, you want a certain operation in your execution model, but the hardware, its instruction set, or its architecture does not support it directly. Then you have to add a layer of software whose task is to fill the hole. This layer of software has nothing to do with the OS, and the biggest mistake is to let the OS do its job. If you have followed the major research efforts in the United States over the last three years, they all emphasize the relationship between the runtime system and the OS; the runtime system is what connects the execution model and the OS. It is not that the OS is useless, but its mission is to work together with the runtime.
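The "filling the hole" idea can be sketched as follows. This is a toy Python example under stated assumptions, not a description of any real runtime: `ToyHardware` is a hypothetical machine that offers only plain load and store, and the atomic `fetch_and_add` is an operation chosen purely for illustration as something an execution model might require but the hardware does not provide. The `Runtime` layer builds it in software.

```python
import threading


class ToyHardware:
    """Hypothetical machine that only offers plain load and store."""

    def __init__(self):
        self.memory = {}

    def load(self, addr):
        return self.memory.get(addr, 0)

    def store(self, addr, value):
        self.memory[addr] = value


class Runtime:
    """Software layer that supplies an operation the hardware lacks."""

    def __init__(self, hardware):
        self.hw = hardware
        self._lock = threading.Lock()

    def fetch_and_add(self, addr, delta):
        # The execution model assumes this is atomic. The toy hardware has
        # no such instruction, so the runtime builds it from load + store
        # and uses a lock to make the pair indivisible.
        with self._lock:
            old = self.hw.load(addr)
            self.hw.store(addr, old + delta)
            return old


def demo(num_threads=4, increments=1000):
    rt = Runtime(ToyHardware())

    def worker():
        for _ in range(increments):
            rt.fetch_and_add("counter", 1)

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return rt.hw.load("counter")
```

With the default arguments, `demo()` returns 4000, because every increment goes through the runtime's atomic operation. The point of the sketch is only the layering: programs written against the execution model call the runtime, and the runtime hides what the hardware cannot do directly.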

On system software: with parallel many-core machines, we have to break away from traditional OS control, from all-in-one integrated OS control, in order to support high performance, high scalability, low energy consumption, and flexibility. These are the fundamental challenges we face.

The third item: once you have the system and the structure, you certainly need a programming model and optimization techniques. I only want to emphasize that current optimization technology concentrates on static optimization methods. Our programming models and optimization techniques all make the same assumption: everything is to be done on the chip, and the optimization is done there too. Some of my own work on execution models also assumed support in the chip hardware, but only at very small scale. What matters most now is dynamic scheduling, with concurrent management of multiple resources in the runtime. Professor Li Yonghui's talk this morning was the first in which I heard this stated clearly. Even on the Internet, fine-grained monitoring that turns the overall plan into dynamic virtualization is really the same thing. Self-adjustment is based on this: the program monitors itself.

Here are some examples of our work, only four. First, we established a core technology base for a dynamic, fine-grained, multithreaded engine with a dataflow background. Second, in supercomputing, we undertook the overall design and engineering implementation of multithreaded system software, which was used successfully in a supercomputer built on world-leading many-core chip technology (NE Total investment over 30 million USD, 2004-2011). Third, research and development of a hyper-parallel execution model, an important research topic in engine execution models against the background of big data flow. Fourth, developing the hyper-parallel engine and undertaking a runtime based on dataflow, together with research and development of the main system software.

The following preliminary chart is a comparison with Spark; the Spark results were measured in China. HT here uses dataflow technology. It does not use it fully, but its thinking is the idea of dataflow. At each test point you can see the ratio against Spark. Across several commonly used groups of data, five sets in all, you can see that its advantage is obvious. If you want to know why, I am here and you can ask me. Not only that, its storage footprint is also smaller than Spark's, by a factor of 5 to 10.

On opportunities and challenges in China: there is a saying that stones from other hills can be used to polish jade. Take our high-speed rail. High-speed rail was not started by us; Europe and Japan did it first. We can learn from the good things that others have done, but we must go beyond them in our own development. Here I have marked it with a red pen: from Made in China to Created in China.

For more highlights, follow the live coverage of the 2014 China Big Data Technology Conference (BDTC), follow @CSDN Cloud Computing on Sina Weibo, and subscribe to the CSDN big data WeChat account.
