The Fifth China Cloud Computing Conference opened on June 5-7, 2013 at the Beijing National Convention Center. With an international perspective and insight into global cloud computing trends, the conference approached the field from the application side, exploring focus topics such as cloud computing and big data, cloud computing and the mobile Internet, cloud security, and industry applications of cloud computing. The conference also set up a cloud computing services exhibition area to exchange the latest international research results, showcase the achievements of China's cloud computing pilot cities, share development experience, and promote global cooperation on cloud computing innovation.
Zhang, Director of System Software at Inspur Group and General Manager of its Cloud Computing Products Department
On the second day of the Fifth China Cloud Computing Conference, Zhang, Director of System Software at Inspur Group and General Manager of its Cloud Computing Products Department, delivered a keynote titled "The Big Data Era: Challenges and Solutions." He analyzed in depth the problems encountered in large-scale data processing and shared Inspur's solutions.
Zhang pointed out that although big data and cloud computing have been evolving for years, many problems remain as data scales grow and traditional data centers transform into cloud computing data centers. Since the data center is the foundation of high-performance computing, it inevitably becomes another threshold in moving traditional applications onto big data technology. During the talk, Zhang analyzed in depth the crucial point of "security and availability" in the computing model and shared Inspur's practice; he then discussed data center modularization and resource scheduling, and finally shared real-world use cases.
The following is a transcript of the talk:
Let me start with cloud computing, since this is a cloud computing conference and this year's theme has become "big data and broadband driving cloud computing applications and innovation." Our understanding of the evolution of cloud computing is this: cloud computing means gathering resources to provide services. Earlier, the focus was more on the gathering: pooling resources into a data center and gradually working out how to do that. The next question is how to make the aggregated data play a bigger role. So cloud computing has now entered a new stage of development: the era of big data.
What the industry cloud is and how to build it
Big data did not emerge any later than the concept of cloud computing; indeed, many of the big data technologies we see today, including many of their application patterns, predate it. Before this, Inspur proposed a concept called the industry cloud, and we want to promote the development of industry clouds in China and drive the adoption of applications across different industries. When we proposed the industry cloud concept, we held that data is the pivotal point in the cloud transformation of applications across industries.
All of information technology is ultimately data processing technology: we collect data, process it into information, transform it into knowledge, and finally use it to influence decisions. How do we turn data, in the end, into a service? That is the ultimate goal of cloud computing; pooling resources and business together is merely a means.
Through past information system construction we have already gathered a great deal of data. How do we process it now? Data processing faces many new problems and challenges: performance and scalability, the fusion of diverse data types, and the cost-effectiveness of storage and processing. In the past we also did data analysis and plenty of data mining, typically with data warehouses or other high-end software, at high cost. On top of that there is the demand for fast response at massive scale. Over these few days many experts have spoken about big data, and I believe the audience has heard plenty about its background. So how do we address these challenges, and how do we make big data technology serve industry applications better? Let us look concretely at the problems we see in taking big data from mature technology to deployed applications.
The challenges of putting big data technology into practice
The technologies we are promoting now, whether or not they can completely solve every problem, are at least not entirely original to us. Many have been used for years in Internet companies, research institutes, and universities, and many frameworks are mature. But how do these technologies reach ordinary industry users? The industry cloud is a very important direction for the future development of cloud computing in China, because it addresses where most of China's information resides: government, public security, industry and commerce, taxation, people's livelihood, and so on, all of which have enormous amounts of data to process. However, we feel the application threshold of many existing big data technologies is still somewhat high; some users try on their own and, after much tinkering, get nowhere. If we take an existing piece of software or an existing platform and simply move it over, will its performance be optimal? How do we migrate the original business? These are problems that call for professional companies and institutions to provide mature products and solutions, helping users apply these new technologies to the problems they face today.
Second, we believe a successful big data application cannot do without a few elements: a good platform, a good application, and, finally, the data itself. What we IT people ultimately build is the platform; handing a platform to users once the data is collected is the comparatively easy part. From Inspur's side, we have our own big data platform.
The biggest challenges for users in real deployment environments are:
1. Data collection. We assumed that the Public Security Bureau, as a powerful department, would have its data well integrated, but the actual situation we learned of is not optimistic: data generated by different departments and at different stages is still scattered. How do we aggregate that data? Can business data break through the existing barriers? This is planning and consolidation work that a big data application effort must help the user with.
2. Turning data into application services. Looking back, many data analysis models are not difficult; the models are often quite simple. But producing these models, and continuously refining them, takes a long process. Many users are not mathematicians themselves and have no one with that background to help them. So how do we make these applications good? How do we keep optimizing the models so the data can play a bigger role? For example, a user collects a great deal of video from street cameras; can anomalies be found in it quickly? This, too, requires support from professional institutions.
3. Big data applications ultimately have to run on computing and storage equipment, and that equipment lives in a data center. As data scales grow, applications dedicated to data processing, including high-performance applications, keep growing too, and traditional data centers face many problems in transforming into new-style ones. Building cloud computing data centers still presents plenty of difficulties: high energy consumption and ever more complex management. Under the new conditions of big data processing, the requirements on equipment and storage must be carried through to the data center itself.
Given the problems above, there is a threshold between the technology and user applications. How can our industry users make better use of these technologies? Beyond providing the platform, can we help users plan their data? How should they plan their applications? Our answer is Inspur's big data approach: an integrated big data solution oriented toward applications. Its main features:
software and hardware integration; an innovative data processing platform; serialized products for different applications; and security designed in.
Much of the technology here is not brand new. From the distributed system at the bottom, through the algorithms in the middle, to the distributed databases and data mining on top, many pieces are already quite mature and have been applied in many fields. But how does an ordinary user integrate all of this? That requires an integrated solution: bring such a device into your environment, put it in the machine room, and start using it immediately. That is the integration. From Inspur's angle, our most important work is still at the hardware level. What hardware suits big data processing? Many people simply buy a common server or storage server and add more disks and more memory, but is that really suited to big data processing? In Inspur's big data appliance, the hardware level is likewise optimized for data processing and storage requirements:
First, the first link is data storage. As a big data platform we must first store the data, and store it better and faster. A number of technologies serve this, including global load balancing, dynamically encoded multiple replicas, and multi-step pipelining to raise storage speed.
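To make the replica idea concrete, here is a minimal sketch, with invented node and rack names, of rack-aware multi-replica placement combined with simple load balancing. It illustrates the general technique, not Inspur's implementation:

```python
from collections import defaultdict

# Hypothetical cluster description: node -> rack.
RACKS = {"n1": "rackA", "n2": "rackA", "n3": "rackB", "n4": "rackB", "n5": "rackC"}

class ReplicaPlacer:
    """Pick replica targets: spread copies across racks, prefer least-loaded nodes."""
    def __init__(self, racks):
        self.racks = racks
        self.load = defaultdict(int)  # bytes currently stored per node

    def place(self, block_size, replicas=3):
        # Group candidate nodes by rack so each copy lands on a distinct rack.
        by_rack = defaultdict(list)
        for node, rack in self.racks.items():
            by_rack[rack].append(node)
        chosen = []
        # Visit racks starting with the least-loaded one (global load balancing).
        for rack in sorted(by_rack, key=lambda r: min(self.load[n] for n in by_rack[r])):
            if len(chosen) == replicas:
                break
            # Within a rack, pick the node with the least data on it.
            node = min(by_rack[rack], key=lambda n: self.load[n])
            chosen.append(node)
            self.load[node] += block_size
        return chosen

placer = ReplicaPlacer(RACKS)
print(placer.place(block_size=64 * 2**20))  # e.g. ['n1', 'n3', 'n5']
```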
Second, the cluster. A big data processing platform is still a cluster, and in cluster computing the three links of compute, storage, and network are bound tightly together; optimizing every one of them is indispensable. How do you raise the transmission and data-exchange efficiency between nodes? We have proposed a big data interconnect chip to make data exchange between different nodes more efficient. Of course, as users have told us, a simple text request may not demand that much. But big data has gradually expanded into the traditional high-performance computing field, processing more data than we see in many applications today, and there the demands on data exchange are very high: data must keep flowing inside the system without pause, so going beyond a purely network-based approach is also key to the improvement.
The concept of big data
Second, the concept. People now speak of big data processing as if it were a single thing. We drew a chart of the characteristics of big data and extracted several dimensions. How do you analyze the application characteristics of a dataset? First look at total data volume, the first requirement of big data. But volume only tells you the size, which may be large or small; do not assume a large volume necessarily makes the problem hard. For example, give every person in China one task each: each task is tiny. What matters is whether the data is structured or unstructured and how tightly coupled the pieces are. Another dimension is the update model: banking and trading businesses constantly perform transaction processing, either updating the original data or continuously appending to it. Another is response processing: do I process once a day, or must a submitted request return within milliseconds?
Data volume, update model, and response processing
From these angles, different data has different characteristics. We roughly separated them and drew three rings, not necessarily very strictly. From the perspective of our product platform, the question is how to introduce different things to meet different application needs. The innermost ring is what everyone sees most: the most traditional database applications, such as banking, civil aviation, and third-party payment. These remain primarily database territory, where many of the existing distributed technologies can do very little. The outermost ring is data that, although large in scale, is loosely coupled and can be fully distributed; traditional high-performance data can be placed here. The middle layer holds text search and data mining; a great deal of data falls in this middle ring.
Data at different levels may be tightly or loosely coupled, distributable or not, and each kind requires a corresponding class of device.
In other words, when it comes to big data applications, many people are still staring mostly at text retrieval and image analysis. But in fact many core database applications are also moving in this direction. So how do you build a platform that truly integrates unstructured and structured data and meets both needs? The most capable answer is a hybrid-architecture appliance that can run a traditional database as well as a new type of database, covering both large data volumes and heavy computing demands on smaller data.
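As a toy illustration of the three-ring classification above (the thresholds and argument names are invented for the sketch, not Inspur's criteria):

```python
# Toy classifier for the three-ring model described in the talk.
# Thresholds and argument names are illustrative assumptions only.

def classify(volume_tb, tightly_coupled, transactional, latency_ms):
    """Place a workload in a ring using the dimensions named in the talk:
    volume, coupling, update model, and response requirement."""
    # Per the talk, volume alone does not decide difficulty, so it is
    # recorded here but is not the deciding factor.
    if transactional and latency_ms < 100:
        return "inner ring: traditional transactional database (banking, payment)"
    if not tightly_coupled:
        return "outer ring: loosely coupled, fully distributable processing"
    return "middle ring: text search and data mining"

print(classify(volume_tb=50, tightly_coupled=True, transactional=True, latency_ms=10))
print(classify(volume_tb=2000, tightly_coupled=False, transactional=False, latency_ms=86_400_000))
```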
Safe and reliable: metadata high availability
Some of our users have raised this: the original data was scattered across different departments and different units, and when those departments are asked to hand the data over, they do not seem to object much. But once you hold the data, the questions come:
Is it reliable and safe in your hands? Will anyone else be able to see it there?
There are many big data platforms available, and many of the concepts here are not new: guaranteeing availability through metadata high availability, and guaranteeing control through access control and encryption. But current big data processing platforms are not strongly oriented toward this in their design, so there is much work to do on today's platforms: bringing the control mechanisms long established in operating systems, such as multi-level access control and encryption, onto the big data platform to harden it and meet users' requirements for protecting aggregated data. This includes high availability at the metadata layer, hardware-based encryption and storage, access control that draws on operating system security technology, and our backup software with off-site redundancy built on existing big data platforms.
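A minimal sketch of the metadata high-availability idea mentioned above, with invented class names rather than Inspur's design: a primary metadata server journals every mutation, and a standby replays the journal so it can take over on failure.

```python
# Illustrative sketch of metadata high availability via a replicated
# write-ahead journal; names and structure are assumptions, not a real product API.

class MetadataServer:
    def __init__(self):
        self.table = {}    # path -> block locations
        self.journal = []  # replayable log of mutations

    def apply(self, op, path, value=None):
        # Journal first, then mutate, so a standby can replay the log.
        self.journal.append((op, path, value))
        if op == "put":
            self.table[path] = value
        elif op == "delete":
            self.table.pop(path, None)

class Standby(MetadataServer):
    def sync_from(self, primary):
        # Replay only the journal entries we have not yet applied.
        for entry in primary.journal[len(self.journal):]:
            self.apply(*entry)

primary, standby = MetadataServer(), Standby()
primary.apply("put", "/video/cam01", ["n1", "n3", "n5"])
standby.sync_from(primary)             # continuous replication in practice
assert standby.table == primary.table  # standby can take over on failure
```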
A big data application development platform:
The trouble with adopting a big data platform today is that very few people working on real industry applications know Hadoop, and few understand the new development frameworks; many applications remain tied to their original platforms, and users are unwilling to change even the interfaces. This is a major drag on pushing the technology out of the new fields and into established industries. We are now working on this, including how to move original business programs seamlessly onto the new platform. The interfaces many of us have long used to write programs are well understood, but those interfaces are often absent on the new platform; that is the first level.
The second level: when a lot of data really does move to a new platform and users are asked to switch to something like MPI and change their whole way of thinking, it is very troublesome. How do we support that at the technical level? This is work we want to do, and an important lever for promoting new applications; we hope to do it together with you.
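One possible shape for such a compatibility layer, sketched with hypothetical classes rather than an actual Inspur API: a thin adapter keeps the familiar file-style interface while delegating to the new distributed store underneath, so legacy call sites need not change.

```python
# Illustrative compatibility shim: legacy code keeps calling read/write-style
# methods while the implementation delegates to a new distributed store.
# DistributedStore and LegacyFile are hypothetical stand-ins.

class DistributedStore:
    """Stand-in for a new big data platform's native client."""
    def __init__(self):
        self._blobs = {}
    def get(self, key):
        return self._blobs.get(key, b"")
    def put(self, key, data):
        self._blobs[key] = data

class LegacyFile:
    """File-like wrapper so existing programs need no interface changes."""
    def __init__(self, store, path):
        self.store, self.path = store, path
    def read(self):
        return self.store.get(self.path)
    def write(self, data):
        self.store.put(self.path, data)

store = DistributedStore()
f = LegacyFile(store, "/archive/case-2013.dat")
f.write(b"record")            # legacy call site unchanged
assert f.read() == b"record"
```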
So in this area Inspur provides users with professional services: doing data analysis up front, classifying the data, and dividing it into types by complexity, scale, transaction model, operation model, and final response requirement. For different data we provide different models and solutions, combined with our hardware and software, to finally deliver the application.
Another very important thing is helping users with data modeling. Data modeling is not purely a computing task; it is a task for people who do mathematics. But how can computers plan and rearrange the user's existing data for a wide variety of new applications? We need to be able to tell the user: to process better on this platform, adjust your original tables and data structures, and then, based on this model, use your data better.
In the end, big data has to sit in a data center, and at a very large scale; we have seen the machine rooms of some traditional high-performance computing and data processing users grow very large. Data centers have developed from the mainframe era through the PC and Internet eras to today's cloud computing era, and the requirements keep rising: the whole machine room must be green, energy-efficient, and efficiently manageable, with further professional requirements for security and reliability. Inspur's industry cloud data center solution comes down to a few key words: modular, professional, intelligent, safe, and reliable. Through several layers we ensure the data center is efficient, flexible, and dependable.
Today I will briefly introduce two of these levels:
1. Modular construction
Over the past two years modularity has been discussed a great deal; the modular concept comes from the Internet players, including Google, Microsoft, and several Internet companies in China. But how do we bring this concept to ordinary users? Many of Inspur's users doing high-performance computing still build their machine rooms the traditional way; how do we do better? We need to keep promoting the concept. Modularity begins with the centralization of functions. Traditionally, every machine had its own power supply and its own fans; centralizing power and cooling brings economies of scale, improves overall space and energy utilization, reduces total consumption, and shrinks the footprint.
At the same time, standardized design means a full range of products can be placed into the modular data center. Each machine is a small module, each cabinet a medium module, each data center a large module. Through automated management we monitor energy consumption and heat dissipation across the whole machine room and improve efficiency.
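One concrete metric such monitoring typically tracks is PUE (power usage effectiveness), the ratio of total facility power to IT equipment power, where 1.0 is the ideal. A minimal sketch over made-up sensor readings:

```python
# Minimal PUE calculation over hypothetical sensor readings.
# PUE = total facility power / IT equipment power; 1.0 is the ideal.

readings = [
    # (it_load_kw, cooling_kw, power_distribution_kw, lighting_kw)
    (400.0, 160.0, 30.0, 10.0),
    (420.0, 150.0, 30.0, 10.0),
    (390.0, 170.0, 30.0, 10.0),
]

def pue(it_kw, *overhead_kw):
    total = it_kw + sum(overhead_kw)
    return total / it_kw

for it, cool, dist, light in readings:
    print(f"IT={it:.0f} kW  PUE={pue(it, cool, dist, light):.2f}")
```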
2. Operations management
In operations management, the most important thing in running a cloud computing data center is how to schedule resources, and that is still needed in the big data era. Deploying a new data processing system is still troublesome today. Can we combine it with cloud computing's deployment and management technology, so that big data processing platforms, such as a Hadoop platform or a columnar database processing platform, can be combined and delivered on demand? The cloud computing approach does not necessarily require virtualization; rapid deployment and application switching on physical machines also achieves better sharing of resources.
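A minimal sketch of this on-demand carving of physical nodes, with invented platform names and a simple free-pool model rather than any real scheduler:

```python
# Illustrative on-demand allocation of physical nodes to big data platforms;
# platform names and the free-pool model are assumptions for the sketch.

free_nodes = {"n1", "n2", "n3", "n4", "n5", "n6"}
deployments = {}  # deployment name -> (platform, set of nodes)

def deploy(name, platform, node_count):
    """Carve physical nodes out of the shared pool for one platform instance."""
    if node_count > len(free_nodes):
        raise RuntimeError("not enough free physical nodes")
    nodes = {free_nodes.pop() for _ in range(node_count)}
    deployments[name] = (platform, nodes)
    return nodes

def release(name):
    """Return nodes to the pool so another platform can reuse them."""
    _, nodes = deployments.pop(name)
    free_nodes.update(nodes)

deploy("etl-1", "hadoop", 4)        # e.g. a Hadoop cluster on 4 physical nodes
release("etl-1")                    # reclaim, then redeploy something else
deploy("olap-1", "columnar-db", 3)  # the same machines now serve a columnar DB
```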
Through tuning at the software level, on top of hardware better suited to the big data platform, we provide software performance optimizations, including the storage distribution algorithm and the task scheduling algorithm, to raise the overall performance of the whole platform. Resource management likewise runs through our integrated resource manager, which monitors the running state of the entire big data platform.
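For the task scheduling side, one classic optimization (shown below as a generic sketch with hypothetical inputs, not Inspur's algorithm) is locality-aware scheduling: prefer running a task on a node that already holds a replica of the data block it reads, so the block never crosses the network.

```python
# Generic locality-aware task scheduling sketch: prefer data-local nodes
# to avoid moving blocks over the network. Inputs are hypothetical.

block_locations = {          # block -> nodes holding a replica
    "b1": {"n1", "n3"},
    "b2": {"n2", "n5"},
}
node_slots = {"n1": 1, "n2": 1, "n3": 1, "n4": 2, "n5": 0}  # free task slots

def schedule(block):
    """Pick a node with a free slot, preferring ones that hold the block."""
    local = [n for n in block_locations.get(block, ()) if node_slots[n] > 0]
    candidates = local or [n for n, s in node_slots.items() if s > 0]
    if not candidates:
        return None  # queue the task until a slot frees up
    node = max(candidates, key=node_slots.get)  # break ties toward idle nodes
    node_slots[node] -= 1
    return node

print(schedule("b1"))  # runs data-local on n1 or n3
print(schedule("b2"))  # n5 has no free slot, so n2 keeps it local
```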
Success Stories:
The Jinan Public Security example: among traditional industries, public security accumulates an extremely large amount of data. At one stage we provided Jinan Public Security with more than 2 PB of storage space. Because of earlier limits in storage and processing technology, much data could only be kept for a while and then discarded, since it could not all be stored, and keeping such volumes made processing very troublesome. They are now using our new big data platform to solve their earlier problems of data isolation and integration.