Today almost no one doubts the value of big data. People are more concerned with how to put big data into practical use and realize that value. Yet despite the industry's continuing exploration, an objective look shows that big data as a whole is still in its infancy. In other words, accepting the concept of big data does not mean being able to exploit it. In China in particular, big data has a good industrial base and good development prospects, but practical problems such as the low level of data resources and the protection of data assets are the main challenges at present. How can these challenges be met so that big data moves quickly toward widespread application? The "Big Data White Paper" published by the China Academy of Telecommunication Research of the Ministry of Industry and Information Technology offers its own answers to these questions. --Editor
No effective application model has been found yet
At present, big data is not yet widely applied. The reason is that most enterprises, especially traditional ones, have not found an effective application model.
--The idea has spread faster than the data has been put to use
Data is an asset. The current wave of big data has rapidly popularized this idea. Although much of their data cannot yet be put to use, many companies already treat data as an asset: they plan for it, store it, develop it themselves, or actively look for buyers or partners.
Telecom operators are the most likely to become typical operators of data assets. They hold a wealth of user identity data, voice data, video data, traffic data and location data, and the volume, variety and timeliness of these data give them an inherent advantage in operating big data. The major telecom operators are already actively exploring ways to develop their internal big data resources. Judging from the applications developed so far, however, operators' big data is still used mainly to support internal customer churn analysis, marketing analysis and network optimization analysis; an external application model has not yet taken shape.
--Big data applications are scattered
At present, big data applications have not yet spread like a prairie fire; they are concentrated mainly in internet marketing. Finance, telecommunications, retail, manufacturing, healthcare, transportation, logistics, IT and other industries have all shown great enthusiasm for big data applications, but the application cases made public in the media and at various forums are still very fragmented. This shows that, for all the attention big data receives, putting it into practice remains difficult. The only area in which many companies have launched or adopted big data applications is internet-based marketing. The companies using big data in this area include not only large internet companies and many specialized small and medium-sized internet companies, but also offline companies that work with internet companies to develop the value of this area.
In terms of data sources, big data applications are still in a self-sufficient "small-scale peasant economy" era. Applications rely mainly on internal data because opening up and trading data have not yet become mainstream in the market. Take China's main e-commerce trading platforms as an example: although they are introducing many big data applications, these applications are essentially confined to the platforms themselves. Because the legal framework and data trading mechanisms are immature, the platforms remain cautious about opening up and trading data. A Gartner survey shows that, even globally, the main feature of big data applications is their reliance on internal data. The most widely used data are still internal enterprise transaction data (used in more than 50% of applications, and in more than 80% in most industries) and log data.
From a technical point of view, big data applications are still mostly at a primary stage. Most of them still use traditional analysis processes and tools, and have only broadened the sources of data and increased its volume. The research found that, compared with traditional data analysis, the new big data applications have begun to use unstructured data, but in practice these unstructured data are merely compressed, cleaned and structured, and then fed into the traditional ETL and analysis process. Other big data applications use cloud storage and cloud processing to improve processing efficiency and so increase the scale of the data they handle, but they too still use the original ETL and analysis process. This lack of innovation in application models keeps current big data applications at a primary technical stage.
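The pattern described above, in which unstructured data is cleaned and structured and then handed to an unchanged ETL flow, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the log format, the field names and the `extract`, `transform` and `load` helpers are hypothetical and do not come from the white paper.

```python
import re
from collections import Counter

# Hypothetical raw log lines; the format is an assumption for illustration.
RAW_LOGS = [
    "2014-05-20 10:01:02 user=alice action=view item=A17",
    "2014-05-20 10:01:09 user=bob action=buy item=A17",
    "garbled ### line",
    "2014-05-20 10:02:30 user=alice action=buy item=B02",
]

LINE_PATTERN = re.compile(
    r"^(?P<date>\d{4}-\d{2}-\d{2}) (?P<time>\d{2}:\d{2}:\d{2}) "
    r"user=(?P<user>\w+) action=(?P<action>\w+) item=(?P<item>\w+)$"
)


def extract(lines):
    """Clean and structure: keep only lines that parse, as dict records."""
    records = []
    for line in lines:
        match = LINE_PATTERN.match(line.strip())
        if match:
            records.append(match.groupdict())
    return records


def transform(records):
    """Traditional transform step: keep purchase events only."""
    return [r for r in records if r["action"] == "buy"]


def load(records):
    """Stand-in for loading into a warehouse: count purchases per item."""
    return Counter(r["item"] for r in records)


purchases_per_item = load(transform(extract(RAW_LOGS)))
print(dict(purchases_per_item))
```

Once the log lines have been turned into records, nothing downstream differs from a conventional reporting pipeline, which is the point the paragraph makes about these applications.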
In terms of results, current big data applications mainly deliver incremental improvements to existing businesses and products; breakthrough innovations are still rare. Take internet marketing, the most common application, as an example. Before the rise of big data, precision marketing and personalized recommendation were already the direction in which enterprise marketing was heading. New data sources and big data technology have further strengthened companies' precision marketing, but this only improves a capability they already had. The breakthrough innovation most often cited today is online microlending, which has completely changed the lending process, credit evaluation and risk control of traditional financial institutions, greatly reducing the cost of loans and widening their reach. But such groundbreaking innovations are rare. Gartner's survey shows that the main purposes of companies investing in big data are to improve customer service, optimize processes, market more precisely and cut costs; developing new products or new business models is not a main purpose.
China's distinctive problems
Globally, big data is still at an early stage of development, and technology, institutions and ways of thinking all need to change. In China's case, data resources that are not rich enough, a large technology gap, and incomplete laws and regulations are the distinctive problems that big data development currently faces.
--Data resources are not rich enough and data openness is low
Rich, high-quality data resources are the prerequisite for developing a big data industry. In recent years, with the rapid growth of the internet industry and the informatization of finance and telecommunications, the total volume of China's data resources has grown quickly and has reached 13% of the global total. Other industries, however, are held back by their level of informatization, and their data reserves are still not rich. The data resources that do exist also suffer from low levels of standardization, accuracy and completeness, so their usable value is low. At the same time, the information systems built by China's government bodies, enterprises and industries have, for various reasons, formed many "information islands", and data openness lags seriously behind. Establishing a healthy system for storing and sharing data resources is the first task for big data development in China.
--Low technical level and poor technology diffusion
The pattern of big data technology development in China resembles the global one. Internet companies are able to integrate advanced international open source big data technology quickly into their own systems, and have built large-scale systems in which a single cluster contains tens of thousands of nodes. But they still lack original technology, contribute too little to the open source community, and have relatively weak influence on the direction of frontier technology. At the same time, because local open source communities and other industry organizations are slow to develop, the big data innovations of leading domestic companies are difficult to spread to the rest of society.
--Relevant laws and regulations need further improvement
As big data mining and analysis become more accurate and their areas of application expand, protecting personal privacy and data security becomes urgent. In privacy protection, the existing legal system faces two challenges. First, the privacy that the law protects is mainly defined as "personally identifiable information (PII)", but as technology advances, data that was not previously PII may become PII, which blurs the scope of protection. Second, a personal information protection system based on the principles of "clear purpose, prior consent and use restriction" is becoming harder and harder to operate in big data scenarios. China's laws and regulations on personal information protection and on cross-border data flows are still incomplete, which has become one of the important factors holding back the healthy development of the big data industry. We need to take account of the actual state of the rule of law in China and explore ways of making up for the gaps in the legal system through industry self-regulation.
Measures and development pitfalls
To develop China's big data industry, we first need to define strategic objectives and strategic priorities. We need to plan the overall layout of key areas such as big data applications, research and development of key technologies, industry cultivation, data opening and data protection, market supervision, and laws and regulations, so as to guide the direction of big data development across the country and avoid a blind, herd-like rush.
In big data applications, the first area is government affairs and public services. Here the focus should be on improving services for people's livelihood and urban governance, actively promoting the integrated application of big data in key fields such as environmental protection, healthcare, education and transportation, and further improving the efficiency of government affairs and public services. The second area is market-oriented applications. Here the focus should be on policies that promote cross-industry big data applications, encouraging internet, telecommunications, finance and other enterprises to work with other industries on big data integration and application innovation, and deepening the use of big data throughout society.
In technical innovation, the first step is to make big data research and development more forward-looking and systematic. In the near term, support should focus on deep learning and artificial intelligence, real-time big data processing, mass storage management, interactive data visualization, and application-related analysis techniques. The second step is to pool research efforts into a combined force, achieve breakthroughs in platform-level big data software, and build an open source ecosystem around it as the core. The third step is to innovate in the way research projects are supported: use open source and open standards as evaluation criteria, and use direct subsidies or after-the-fact subsidies to encourage enterprises and research institutions to take part in open source technology development and so promote the diffusion of big data technology.
In opening up government data, the white paper proposes carrying out a census of the data resources held by government and public utilities. In accordance with relevant laws and regulations, a security and privacy protection checklist should be drawn up for opening government and public data, and the risk points that may involve national security or citizens' privacy should be strictly controlled. On this basis, government and public data should be classified by sensitivity, priorities for opening should be set, and a step-by-step roadmap for data opening should be developed. At the same time, the government should actively standardize and guide commercial big data trading, to create favorable conditions for the flow of data resources.
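As a rough illustration of the sequence just described (classify by sensitivity, set priorities, then build a phased roadmap), the following sketch may help. It is only one possible reading: the `Dataset` fields, the three sensitivity labels, the `public_demand` score and the sample datasets are all hypothetical and are not taken from the white paper.

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    name: str
    involves_national_security: bool
    involves_personal_privacy: bool
    public_demand: int  # hypothetical 1-5 score, higher means more requested


def sensitivity(ds):
    """Assumed three-level classification based on the checklist risk points."""
    if ds.involves_national_security:
        return "restricted"
    if ds.involves_personal_privacy:
        return "needs_review"
    return "open"


def roadmap(datasets):
    """Order releasable datasets by phase, then by demand within a phase."""
    phase_of = {"open": 1, "needs_review": 2}
    candidates = [d for d in datasets if sensitivity(d) != "restricted"]
    ordered = sorted(
        candidates,
        key=lambda d: (phase_of[sensitivity(d)], -d.public_demand),
    )
    return [(phase_of[sensitivity(d)], d.name) for d in ordered]


# Hypothetical inventory, as might result from a data resource census.
inventory = [
    Dataset("bus timetables", False, False, 4),
    Dataset("air quality readings", False, False, 5),
    Dataset("hospital visit records", False, True, 5),
    Dataset("border surveillance logs", True, False, 1),
]

plan = roadmap(inventory)
for phase, name in plan:
    print(f"phase {phase}: {name}")
```

In this sketch, data touching national security is simply excluded, while data touching personal privacy is deferred to a later phase pending review. A real classification scheme would need to follow the applicable laws and regulations rather than two boolean flags.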
In protecting personal information, some international institutions have proposed shifting the focus of regulation "from the data collection stage to the data use stage". We should pay close attention to how international legislative thinking is evolving, and carry out forward-looking research on the relevant systems in light of technology trends and China's national conditions. At the same time, to meet the urgent need for personal information and data protection, we can rely on industry organizations to summarize best practices and gradually form an industry consensus. Once pilots have matured, these can be promoted into standards or laws and regulations, safeguarding the healthy development of big data.
Original link: http://www.cnii.com.cn/industry/2014-05/20/content_1365298.htm