On October 25, 2012, the Cloud Computing Architect Summit was held in Beijing. In recent years, the development of IT and the Internet has reshaped the entire industry landscape and brought fresh new business models. In response to these changes, the conference invited more than a hundred industry leaders for in-depth discussion of IT development, practical application experience, and other hot topics. Mr. Feng, director of the China Cloud Computing Innovation Center at the Microsoft Asia-Pacific Research and Development Group, delivered a speech on the theme "New World of Big Data". The following is a transcript of the speech:
First of all, I am very happy to have this opportunity to share a new trend in IT with you, the leaders and colleagues of the IT industry. Just now President Sun said that the IT world is in a period of chaos, and that chaotic times produce heroes. I think we all know that, with the next generation of IT trends and the development of new Internet technologies, we are indeed entering a new phase of IT. At this stage, I think three areas matter most: the first is cloud computing, the second is the Internet of Things, and the third is big data.
I would like to emphasize that the cloud brings three main advantages to IT and to people: it is economical, it is fast, and it enables enterprises to innovate more. One of the most important aspects here is the integration of resources. We have heard a key term in cloud computing, the resource pool: in effect, all resources and data can be integrated through cloud computing technology and the cloud computing concept, and here data is very, very important. We have just heard about cloud computing, both from our colleagues at IBM and from Dr. Sun, who talked about a variety of applications. In fact, data is at the core of every application. Whatever your infrastructure, whatever your platform, whatever your application, without data it is an empty shell. The Internet of Things, for its part, is really about collecting all the attributes of a target and tracking, managing, and analyzing those attributes, and that is data. So what I want to say in this opening is that data is the most important thing in cloud computing and the Internet of Things, and one of the core elements of the next generation of IT trends.
My focus today is big data, and I will cover it from several angles. First, the concept: what big data is, why big data, and why it matters now. Next, the life cycle of big data: its main stages, the main technologies for implementing it, and the value it brings to IT, to people, and to enterprises. Then what innovations Microsoft has made in this area, and some cases we have in China and abroad.
Dr. Xu showed this slide just now. Why are we entering the big data stage at this particular time? I think the main reason is the large number of devices and platforms driving data growth, whether wireless devices, the public Internet, all kinds of social networking sites, or all kinds of Internet applications, along with cloud computing and advances in hardware. As these technologies and Internet platforms mature and the devices keep multiplying, the resulting data grows to a very large scale. At the same time, why do people talk about the concept of big data now? Data was also growing rapidly 10 or 20 years ago, so why is big data only now called an important part of IT trends? Because to handle data effectively you also need hardware that is good enough in computing, in storage, and in every other respect, at a cost that has come down. As Dr. Xu said just now, 10 or 20 years ago a 1 GB hard disk cost a lot of money, and there was no point in talking about big data. Whether it is cloud computing or big data, what matters most to enterprises and governments is its economic value. From this point of view, I think we have now reached such a stage, in hardware and in every aspect of software.
Everyone probably has the same question: what is big data? The two words "big" and "data" give everyone an intuitive sense of it: first there is data, and second it is big. But what does "big" mean here? Consider several aspects. With the development of IT, the maturing of the Internet, and the maturing of the data sources I just mentioned, the world now produces data that is notable not only for its sheer volume but also for its variety. It is no longer mostly structured data, as in the traditional case. There is now all kinds of unstructured data: video, music, documents, files in all sorts of formats. With existing technology these sources generate every kind of data, and together they form the "big" in big data. So "big" is not simply a measure of size; it also stands for the complexity of the data.
As you can see from this picture, before the Internet the main data sources were internal. Each enterprise had its own IT center and its own enterprise applications, and produced data through ERP and other systems. The Internet was a very big innovation: it provides a huge platform for applications and data, and it generates a wide variety of data. You now hear the term "consumerization of IT": the data that each of us touches, and that each of us produces as a data source, is inexhaustible. IDC recently published a statistic that the amount of data in the world has so far reached 1,000 ZB. How big is a ZB? One ZB is equivalent to a million PB, or a billion TB, so the volume of data has grown to a very, very large scale. On top of that, according to the IDC report, this data will keep expanding over the next 10 years, possibly to dozens of times 1,000 ZB. What value will big data bring to IT and to our businesses? That is what big data is really concerned with, and it is the core of big data.
If I were to define big data, I would use the diagram below. Big data is in fact an industrial chain. The data is already out there, much of it possibly free, more than 1,000 ZB of it. To exploit that data and tap its potential you need an industrial chain: from data generation to data collection, to data storage, to data transmission, to data processing, to data analysis, and finally to using the results of that analysis to guide your business and set better enterprise strategies and policies. So it is an industrial chain, and every link in it is indispensable.
The big data trend and the current state of big data do bring enterprises many problems and challenges. The data is there, and I know it has value, but how do I mine it so that my enterprise can make better strategies and policies? How do I handle real-time data better across the industrial chain I just described: collection, storage, processing, and analysis? And finally, how do I analyze and visualize this data with better tools? These are the challenges, and the opportunities for innovation, facing enterprises.
Big data really does offer many opportunities for businesses, for the world, for IT, and for governments. One of them concerns business decisions. For any decision an enterprise makes, what ensures that it arrives at better strategies and policies? I do not think this can be done on a hunch; it has to rest on solid evidence. Now that such good data resources exist, making better use of them is very important. As you can see, nearly 50% of business leaders around the world know how to apply big data to enterprise planning and strategy. At the same time, big data is an industrial chain, and this chain is not just an abstract concept. It drives not only software, IT services, and software development but also hardware. As I just said, big data cannot be separated from its hardware foundation of computing and storage, so it also drives the development of the entire IT industry.
So far I have covered some concepts. Next I will go through the life cycle of big data, which we at Microsoft see as having three main stages, and give you a more specific introduction. First, you have to collect the data in some way and store it: what kind of storage technology will guarantee its security, continuity, sustainability, and scalability? Second, once you have the data, how do you process and enrich it so that it meets the requirements of your business? Finally, how do you analyze the data that has already been processed, integrated, and cleaned, and present it in a suitable form? I think all of these are very important to realizing the value of big data.
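The stages just described can be sketched as a small pipeline. This is only an illustration in Python, not anything shown in the talk: the function names (collect, store, process, analyze), the sample records, and the in-memory list standing in for a storage system are all assumptions made for the example.

```python
from collections import Counter

def collect():
    """Gather raw records. Hard-coded, hypothetical sample data."""
    return [
        {"user": "a", "city": " Beijing ", "amount": "120"},
        {"user": "b", "city": "shanghai", "amount": "80"},
        {"user": "c", "city": "Beijing", "amount": None},  # incomplete record
        {"user": "d", "city": "BEIJING", "amount": "40"},
    ]

def store(records, storage):
    """Persist the raw records. A plain list stands in for real storage."""
    storage.extend(records)
    return storage

def process(storage):
    """Clean and enrich: drop incomplete rows and normalize the fields."""
    cleaned = []
    for row in storage:
        if row["amount"] is None:
            continue
        cleaned.append({
            "user": row["user"],
            "city": row["city"].strip().title(),
            "amount": int(row["amount"]),
        })
    return cleaned

def analyze(cleaned):
    """Summarize the cleaned data per city, ready for presentation."""
    totals = Counter()
    for row in cleaned:
        totals[row["city"]] += row["amount"]
    return dict(totals)

storage = []
report = analyze(process(store(collect(), storage)))
print(report)  # {'Beijing': 160, 'Shanghai': 80}
```

Keeping each stage as its own function mirrors the point made in the talk that no stage can be skipped: the analysis only works because the raw rows were stored first and cleaned second.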
On storage: big data places very, very high demands on storage. At the purely technical level, being able to hold a large volume of data while keeping it secure, uninterrupted, sustainable, and scalable is the basic requirement of data storage technology. Beyond that we face another question. Now that everyone is talking about cloud computing, there are in general only a few ways for an IT application to store this data. One is the traditional IT way, in which every enterprise has its own data center. That data center can follow the traditional physical-machine model, without cloud computing technology, or it can be a private cloud. And as Dr. Xu also said, Microsoft has its own public cloud, and not only Microsoft: other companies in the industry offer public clouds too, so I can also store and manage my data through a public cloud. What is the difference? Let me sum up one of the larger differences. Dr. Xu kept emphasizing security, and I think that with the public cloud model, security and network speed, compared with a private cloud or your own data center, are factors you need to consider. Once you are on the public cloud, once you are open, your data has to travel across a lot of network, and that is unavoidable, especially when the data is very large, as with big data: large volumes of varied, complex data. The public cloud can save you management and operating costs, since you do not need to run a data center, but it also has its drawbacks. On the other hand, with a private cloud inside your intranet, or a traditional data center, you can keep the data secure under your own control and improve efficiency, especially within the data center.
From this perspective, a hybrid cloud, that is, a mixed model, may be the best approach. In other words, for big data processing, depending on the characteristics of your business, you can decide which applications and which data need to stay in the private cloud and which can be placed on the public cloud to take advantage of its strengths, so that in the end the enterprise manages the storage of its big data in this hybrid way.
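The placement decision described here can be written down as a simple rule. The following is a minimal sketch in Python under stated assumptions: the Dataset fields, the place function, and the 10 TB threshold are all hypothetical and chosen only to reflect the two factors raised in the talk, security and the cost of moving large data across the network.

```python
from dataclasses import dataclass

@dataclass
class Dataset:
    name: str
    sensitive: bool         # e.g. customer or financial records
    size_tb: float          # volume that would have to cross the network
    latency_critical: bool  # needs fast access from inside the intranet

def place(ds: Dataset, max_public_tb: float = 10.0) -> str:
    """Return 'private' or 'public' for one dataset.

    The rule and the threshold are illustrative assumptions, not a
    recommendation from the talk.
    """
    if ds.sensitive or ds.latency_critical:
        return "private"   # keep under the enterprise's own control
    if ds.size_tb > max_public_tb:
        return "private"   # too large to move over the public network
    return "public"        # save management and operating cost

datasets = [
    Dataset("customer_accounts", sensitive=True, size_tb=2, latency_critical=False),
    Dataset("public_web_logs", sensitive=False, size_tb=5, latency_critical=False),
    Dataset("video_archive", sensitive=False, size_tb=400, latency_critical=False),
]
plan = {ds.name: place(ds) for ds in datasets}
print(plan)
```

In practice the criteria would come from the business characteristics of each enterprise, which is exactly why the talk leaves the split open.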
Coming back to the core of the technology from the storage point of view, there are two main kinds: first, relational data storage; second, non-relational data storage. Take Microsoft SQL Server as an example. It is not only for structured data. SQL Server can store and process a variety of data types, and users can even define their own data types with the technology it provides. And because it is a relational database, for large, complex data I can use a variety of techniques to query and index it, such as full-text queries and queries on the attributes of unstructured data. The technologies it provides can all be used to manage and store the data and make better use of it.
The other kind is non-relational data storage. How does it differ from the relational kind? The main point is that it focuses simply on storing and reading data, whereas relational data storage is quite complex. For non-relational data, such as large video or audio files, you care mostly about how to store the data, how to retrieve it, and how to run simple queries on its attributes; you do not need complex relational operations. The benefit of this approach is that in performance, efficiency, and simplicity of operation it has an advantage over relational databases. So from this point of view, I see these as two different technologies. On the Microsoft side, SQL Server is Microsoft's big data platform, especially for storage.
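The contrast between the two storage styles can be illustrated with a short Python sketch. It does not use SQL Server: the standard-library sqlite3 module is swapped in as a stand-in relational database, and the BlobStore class is a hypothetical, in-memory stand-in for a non-relational store. Table names, keys, and sample data are all assumptions.

```python
import sqlite3

# Relational style: structured rows, queried with a join and an aggregate.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount INTEGER);
    INSERT INTO customers VALUES (1, 'Alice'), (2, 'Bob');
    INSERT INTO orders VALUES (1, 1, 30), (2, 1, 70), (3, 2, 50);
""")
totals = conn.execute("""
    SELECT c.name, SUM(o.amount)
    FROM customers c JOIN orders o ON o.customer_id = c.id
    GROUP BY c.name ORDER BY c.name
""").fetchall()
print(totals)  # [('Alice', 100), ('Bob', 50)]

# Non-relational style: put a blob under a key, get it back,
# and run only simple attribute lookups.
class BlobStore:
    def __init__(self):
        self._blobs = {}
        self._meta = {}

    def put(self, key, data, **attrs):
        self._blobs[key] = data
        self._meta[key] = attrs

    def get(self, key):
        return self._blobs[key]

    def find(self, **attrs):
        """Return the sorted keys whose metadata matches every given attribute."""
        return sorted(
            key for key, meta in self._meta.items()
            if all(meta.get(a) == v for a, v in attrs.items())
        )

blobs = BlobStore()
blobs.put("talk.mp4", b"\x00\x01", kind="video", year=2012)
blobs.put("intro.mp3", b"\x02\x03", kind="audio", year=2012)
blobs.put("demo.mp4", b"\x04\x05", kind="video", year=2011)
print(blobs.find(kind="video"))  # ['demo.mp4', 'talk.mp4']
```

The relational half needs a schema and a query language but can answer questions that span tables; the key-value half is simpler and only supports store, fetch, and attribute match, which is the trade-off the talk describes.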
Now for Hadoop. The main feature of Hadoop is that it offers world-leading technology for processing all kinds of data in parallel, whether the data sits in Hadoop's own storage or in storage integrated with Microsoft's. Above all, it is a world-leading data processing technology. How do you deal with 1,000 ZB of data? How do you use existing hardware resources to process it in parallel and process it intelligently? Hadoop is undoubtedly a leading technology for this today. From Microsoft's perspective, we integrate our big data platform with Hadoop so that we can better offer users this flexible and convenient technology.
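The parallel processing idea mentioned here follows a split, map, and reduce pattern, which can be sketched in a few lines of Python. This is not Hadoop's API: threads on a single machine stand in for the nodes of a cluster, and the function names (map_chunk, reduce_counts, word_count) and sample lines are assumptions made for the example.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def map_chunk(lines):
    """Map step: count the words in one chunk, independently of the others."""
    counts = Counter()
    for line in lines:
        counts.update(line.lower().split())
    return counts

def reduce_counts(partials):
    """Reduce step: merge the per-chunk counts into one result."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total

def word_count(lines, workers=4, chunk_size=2):
    """Split the input into chunks, map them in parallel, then reduce."""
    chunks = [lines[i:i + chunk_size] for i in range(0, len(lines), chunk_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(map_chunk, chunks))
    return reduce_counts(partials)

lines = [
    "big data is data",
    "cloud and big data",
    "data about data",
]
result = word_count(lines)
print(result["data"], result["big"])  # 5 2
```

Because each chunk is counted without reference to any other, adding more workers (or, in a real cluster, more machines) lets the same job handle more data, which is the property that makes this style of processing suit existing commodity hardware.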
Let me give you a few examples. By integrating our Active Directory with Hadoop, we improve Hadoop's data security. By integrating our main cloud computing management tools with Hadoop, we provide the most advanced and most flexible management for Hadoop-based big data processing. And by integrating SQL Server, with its storage and its business intelligence, with Hadoop, we can better present the data that Hadoop has processed. So our integration with Hadoop has been carried out very thoroughly. At the same time, from Microsoft's point of view, we also provide an open-source, more open platform that combines well with the world's leading IT technologies.
(Responsible editor: The good of the Legacy)