I recently returned home and attended several university seminars, government symposiums and corporate training events, all on the theme of big data. The major domestic newspapers and magazines are also discussing this hot topic. Yet I found that Chinese society still understands the concept of "big data" imprecisely, and in some respects misunderstands it. In particular, the strategic significance of big data at the national level is underestimated, and the understanding of it needs to deepen.
From small data to big data
"Big data" is a new wave of technology and also a historical phenomenon that has taken shape gradually. Specifically, as the volume of stored information grows, people have come to realize that by opening, integrating and analyzing data they can discover new knowledge and create new value, bringing new opportunities such as "big technology", "big profit", "big intelligence" and "big development". The idea of big data dates back to the 1980s, but the word "data" here differs from our traditional understanding of it.
In its traditional sense, "data" means "well-founded numbers". In the information age the meaning of the word has expanded: it refers not only to numbers but to all information stored in computers, including text, sound and video. More importantly, as information technology advances, the volume of data is exploding, especially since the emergence of new media. Collecting, storing, maintaining and using data has become a challenge that cuts across every field.
The "big" in big data refers not to the large volume on the surface but to the great value hidden within. Many examples show that, thanks to new tools, great value can be found in the small data we already hold. In the United States, for example, more than 20 years of crime data and traffic accident data were plotted on the same map, and analysts were surprised to find a high degree of overlap: the areas with the most traffic accidents were also the areas with the most crime, and the two also coincided in when they occurred most often. This led the U.S. highway safety agency to work jointly with the Justice Department, and by policing the "black spots" identified in the data together, both the traffic accident rate and the crime rate fell. In another example, scholars recently digitized more than 200 years of White House records of presidents' laundry, analyzed them, and drew some new conclusions. All of these are genuine small data. They show that small data can produce great value as long as it has accumulated over time, is recorded at a fine level of detail, and is integrated with other data. From this point of view, big data can also be understood as "holographic" data about an object along the two dimensions of space and time. In the big data era this "holographic" quality also appears as "multi-source" data: the same object is recorded by multiple sources from different directions, and the records can corroborate one another.
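The map overlay described above can be sketched in a few lines. This is only an illustration under stated assumptions: the coordinates are invented, the 1 km grid and the helper names `cell_of` and `hot_cells` are hypothetical, and the real analysis will have used far richer data and methods.

```python
import math
from collections import Counter

def cell_of(x_km, y_km, cell_km=1.0):
    """Snap a point to the grid cell that contains it (hypothetical 1 km grid)."""
    return (math.floor(x_km / cell_km), math.floor(y_km / cell_km))

def hot_cells(points, top_n):
    """Return the top_n grid cells with the most recorded events."""
    counts = Counter(cell_of(x, y) for x, y in points)
    return {cell for cell, _ in counts.most_common(top_n)}

# Invented sample points, in km on a local grid (not real data).
crimes = [(0.2, 0.3), (0.4, 0.9), (0.7, 0.1), (2.5, 2.5), (2.1, 2.8), (5.5, 1.2)]
accidents = [(0.5, 0.5), (0.9, 0.2), (2.2, 2.2), (2.9, 2.4), (2.6, 2.6), (7.3, 7.3)]

crime_hot = hot_cells(crimes, top_n=2)
accident_hot = hot_cells(accidents, top_n=2)

# Cells that are hot spots in both datasets: candidates for joint patrols.
shared_hot = crime_hot & accident_hot
print(sorted(shared_hot))
```

Neither dataset is large; the value comes from putting the two on the same grid.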
Moreover, judging by the distribution of global investment in data technology, traditional small data still holds the dominant share. According to the International Data Group (IDG), global investment in small data analysis tools reached 34.9 billion U.S. dollars in 2012, while investment in the big data analysis tool Hadoop was only 130 million U.S. dollars, less than 1% of the former. IDG concludes that traditional small data software meets 95% of the needs of businesses and organizations. The latest industry trend is for "big" and "small" data analysis tools to converge and to migrate to the cloud.
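The "less than 1%" comparison can be checked directly from the two figures quoted above. The variable names are illustrative only.

```python
# Figures as quoted in the text, in U.S. dollars.
small_data_investment = 34.9e9   # small data analysis tools, 2012
hadoop_investment = 130e6        # Hadoop, 2012

share = hadoop_investment / small_data_investment
print(f"Hadoop investment is {share:.2%} of small data investment")
```

The ratio works out to roughly 0.37%, which is consistent with the claim.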
The strategic significance of big data
The significance of big data goes far beyond stories such as "beer and diapers", the tales of data mining and precision marketing that many news reports now relish. In fact, the frontier of the big data field is not data mining but machine learning. Data mining means automatically analyzing large amounts of data with a specific algorithm in order to reveal the historical patterns and future trends hidden in the data and give decision-makers a reference. Machine learning, now on the rise, also relies on computer algorithms, but unlike data mining its algorithms are not fixed: they carry self-tuning parameters, so performance improves as the number of computations and runs grows. In other words, by "feeding" the machine data we let it improve itself through learning, as a person does, so that its mining and prediction become more accurate. That is why the technology is called "machine learning". It is also the fundamental reason big data is called a revolutionary phenomenon: in essence it signals that human society is moving rapidly from the information age, through the knowledge age, toward the age of intelligence.
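The idea of a self-tuning parameter that improves as the machine is "fed" data can be illustrated with a toy example. The sketch below assumes a single-parameter model trained by online gradient descent; the function name `train_online`, the learning rate and the sample data are all invented for illustration and do not represent any particular product or library.

```python
def train_online(samples, learning_rate=0.1):
    """Fit y ~ w * x one sample at a time, recording w after each update."""
    w = 0.0            # the self-tuning parameter, starting with no knowledge
    history = []
    for x, y in samples:
        error = w * x - y                 # how wrong the current prediction is
        w -= learning_rate * error * x    # nudge w to reduce that error
        history.append(w)
    return w, history

# Invented data that follows the hidden rule y = 2x.
samples = [(1.0, 2.0), (2.0, 4.0), (1.5, 3.0), (0.5, 1.0)] * 10

w, history = train_online(samples)
print(f"after 1 sample:   w = {history[0]:.3f}")
print(f"after {len(samples)} samples: w = {w:.3f}")
```

The more samples pass through, the closer `w` moves to the hidden value 2; no one has to rewrite the algorithm for the prediction to improve.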
One or two examples illustrate how big data affects the shape of society and why it matters to national strategy.
This year a wave of online education is sweeping the U.S. education sector. New intelligent learning platforms have become a focus of innovation and investment in high technology, and many have had initial success. Coursera, a well-known online education company, has reached agreements with more than 30 universities around the world, including Princeton, Berkeley, Duke and Hong Kong Polytechnic, to offer courses free of charge through its platform. These schools now have courses that hundreds of thousands of people around the world can take simultaneously. Learners anywhere can not only attend the same teacher's lectures at the same time but also do the same homework and receive the same grading and exams as enrolled students. Some universities have seen the value and potential of such platforms and have even begun investing in their own: in May 2012 Harvard and MIT announced that they would invest 60 million U.S. dollars to develop a similar platform and open it to the world free of charge.
The rise of these learning platforms has attracted wide attention and intense discussion in the United States. The reason is that such a platform is more than a camera and a video: it can automatically prompt, guide and evaluate the learner's behavior, making up for the lack of face-to-face guidance from a teacher. For example, by recording mouse clicks, the computer can record how long you stay on a slide, determine whether you go back to review after a wrong answer, and discover how different people respond to different knowledge points. From this it can work out which knowledge points need repeating or emphasizing, and which form of presentation or learning tool is most effective in which situation.
It is not hard to see that the platform's strength comes from big data. The data on a single learner's behavior seems disorganized, but once enough data accumulates, the behavior of the group shows order and regularity. By collecting and analyzing large amounts of data we can identify this order and regularity and then give targeted help to different learners. Harvard and MIT are opening their platform to the world free of charge partly because they want more learners to use it so that they can collect more data. With that data they can study the behavior patterns of learners worldwide and build a better intelligent learning platform.
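How group patterns emerge from individual records can be sketched as follows. This is a hypothetical illustration, not how any real platform works: the event format, the slide names, the helper names `summarize` and `needs_review`, and the 50% threshold are all assumptions.

```python
from collections import defaultdict

# Invented events: (learner, slide, seconds_on_slide, answered_correctly)
events = [
    ("a", "slide-1", 30, True),
    ("b", "slide-1", 25, True),
    ("c", "slide-1", 35, True),
    ("a", "slide-2", 95, False),
    ("b", "slide-2", 110, False),
    ("c", "slide-2", 80, True),
]

def summarize(events):
    """Aggregate individual records into per-slide averages."""
    totals = defaultdict(lambda: {"seconds": 0, "wrong": 0, "views": 0})
    for _, slide, seconds, correct in events:
        t = totals[slide]
        t["seconds"] += seconds
        t["views"] += 1
        t["wrong"] += 0 if correct else 1
    return {
        slide: {
            "avg_seconds": t["seconds"] / t["views"],
            "error_rate": t["wrong"] / t["views"],
        }
        for slide, t in totals.items()
    }

def needs_review(summary, max_error_rate=0.5):
    """Slides whose error rate suggests the point should be repeated."""
    return sorted(s for s, v in summary.items() if v["error_rate"] > max_error_rate)

summary = summarize(events)
flagged = needs_review(summary)
print(summary)
print("repeat or emphasize:", flagged)
```

One learner's record says little; only after aggregation does it become visible that the second slide takes longer and is answered wrongly more often.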
The data-driven age of intelligence
The example above shows that data is becoming the basis of an organization's wealth and innovation, and that big data is indeed creating a more intelligent society. So how should we understand the intelligent society we are moving toward?
The key to this question is that in our time information, knowledge and intelligence all exist with data as their carrier. Data is a record of the objective world. When we give data a context, it becomes information. Information is the source of knowledge: when patterns are distilled from information, it rises to knowledge. Knowledge is the basis of intelligence: when computers and networks can apply knowledge to make judgments automatically and act in the service of people, machine intelligence is born. The range of the world that humans record keeps expanding. In the past we decided what to record; now and in the future we are entering an era in which we decide what not to record. Together with our growing ability to analyze data, this will speed our progress toward the age of intelligence. That age is characterized by ubiquitous computers and networks that work for and serve people as intelligent humans would. In other words, more and more work will be taken over by computers or robots. In addition, thanks to accurate calculation and prediction, the whole of society can run like countless gears, large and small, interlocking tooth for tooth. Day-to-day management will be optimized through data, tasks and collaboration will connect seamlessly, and the cost of running society will fall sharply.
Returning to the example above, it is not hard to imagine how intelligent learning platforms will affect education. Schools have always been the most important educational resource, and good schools are extremely scarce. As these platforms spread, elite schools will in the near future be within everyone's reach, which means that, handled properly, China's shortage of educational resources could soon be effectively eased. For individuals, learning at any time and throughout life becomes possible: high school students can try university courses, and people who have left campus can log on to the platform and attend class alongside enrolled students. These are dreams that educators have pursued for years. The other side of the coin is that China's education sector will face fiercer global competition. In the past students competed for schools; in the future schools may compete for students worldwide. First-class universities in developed countries will squeeze the room that ordinary universities in developing countries have to survive and grow. How should ordinary universities attract students? Will they decline? If the best teaching videos and other learning resources are available free of charge, does the role of teachers need to change, and if so how? These questions are among the major challenges of the big data age.
The intelligent learning platform is just one splash thrown up by the tide of big data in education. It is no exaggeration to say that big data will affect every aspect of human social development and will optimize and transform every industry; its role is hard to bound. Take another popular term, "smart city". In recent years there has been a wave of smart city construction at home and abroad. According to Guo Wei, chairman of the board of Digital China, a leading domestic smart city company, more than 60 Chinese cities have written smart city construction into their 12th Five-Year Plans, and he believes the smart city will become a main driver of the sustainable development of China's economy. Seen from a higher vantage point, building a smart city is really a large problem of managing a city's data comprehensively. First, data must be collected where none was collected before, mainly using Internet of Things technology. Second, the data of different systems must be connected effectively, which is the task of system integration. Third, data visualization is used to reveal the knowledge hidden in massive data, so that the intelligence in the data flows to city managers, decision-makers and the public in an intuitive way. In other words, collecting, integrating, analyzing and displaying data is the core of the smart city. The smart cities of the future will be data-driven cities, and big data will be their brain. Guo Wei also pointed out that building a smart city means using information technology to solve problems of social governance and raise people's happiness index. This shows that the application and value of big data extend well beyond business.
Beyond advancing the shape of society, accelerating corporate innovation and leading a new economic boom, big data can also, through open data, be a sharp tool for building transparent government, as I pointed out in my book Big Data: The Coming Data Revolution. This has real practical significance for China today. It was precisely because of such strategic considerations that in March 2012 the U.S. federal government announced a major investment to launch a big data research and development initiative, raising big data to the level of national strategy, on a par with the Internet and supercomputers in their day.
What the government needs to do
First, government agencies, industry organizations and large enterprises should establish dedicated data governance bodies, such as a data governance committee or a big data management bureau, to coordinate data governance work. The focus of data governance is the consistency of data definitions and the quality of data. In the big data age, integrating data across different systems requires unified metadata definitions, which is currently a challenge not only for China but for the whole world. Once data standards are well established in each field and industry, the payoff will be multiplied. For an individual enterprise, the point to grasp is that future competition will be over knowledge productivity rather than labor productivity. The value produced by data analysis may be fragmented and spread across every link of the business process, and the return on investment in data mining is uncertain. Even so, business leaders must have the vision to put data governance in order as early as possible and so prepare to strengthen the enterprise's competitiveness in the big data age. In addition, the head of the data governance body should be a senior leader of the organization; otherwise standards cannot be pushed through across the board, and the industry or organization as a whole cannot improve.
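Why unified metadata definitions matter for integration can be shown with a small sketch. Everything here is hypothetical: the two source systems, their field names, the shared names `full_name` and `birth_date`, and the helper `to_shared_schema` are invented for illustration.

```python
# Hypothetical mapping from each system's own field names to one shared definition.
FIELD_MAP = {
    "hospital": {"pat_name": "full_name", "dob": "birth_date"},
    "insurer": {"name": "full_name", "birthday": "birth_date"},
}

def to_shared_schema(source, record):
    """Rename a record's fields to the shared definition, dropping unmapped ones."""
    mapping = FIELD_MAP[source]
    return {mapping[key]: value for key, value in record.items() if key in mapping}

# The same person as recorded by two systems with different field names.
a = to_shared_schema("hospital", {"pat_name": "Li Lei", "dob": "1980-01-02"})
b = to_shared_schema("insurer", {"name": "Li Lei", "birthday": "1980-01-02", "plan": "basic"})

records_match = a == b
print(a)
print("same person under the shared definition:", records_match)
```

Without an agreed definition, every pair of systems needs its own translation table; with one, each system needs only a single mapping to the standard.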
Second, open up data. The key to adding value to data is integration, and the precondition for free integration is that data be open. Open data means publishing raw data and its related metadata on the Internet in a downloadable electronic format so that other parties can use it freely. Opening data and disclosing data are two different concepts: disclosure works at the level of information, item by item, while opening works at the level of the database, one whole dataset at a time. Open does not necessarily mean free of charge; corporate data can be opened for a fee. Openness also has levels: data can be open to a group, to an organization, or to society as a whole. In the big data era, the significance of open data lies not only in satisfying citizens' right to know but also in letting data, the most important resource of the era, flow freely so as to spur innovation, promote the knowledge economy and the network economy, and help China's economic growth shift from an extensive model to a refined one.
Third, encourage and support data-based innovation and entrepreneurship. The traditional approach to policy support might be a government-led big data industrial park that offers start-ups facilities or cash. That works, but mobilizing the whole of society works better. For example, grants could support big data open source communities and the building of civil organizations such as programmers' associations; backing such groups quickly spreads new technologies and ideas through society. As another example, application development contests based on open data could be held to gather ideas from across society on using data and innovating with it. The organizer could be the government or an enterprise, putting up a sum of money to reward the best applications and stimulate creativity in the private sector.
Fourth, promote a data culture throughout society. A data culture respects facts, values rationality and stresses precision. It must be acknowledged that, historically, China has lacked a data culture. As for the present, it is an indisputable fact that Chinese data suffers from weak credibility, low quality and inconsistent definitions. Here the government should take the lead. First, it should promote the idea of governing by data in the public sphere, recognizing that in the big data age the most important basis for public decisions will be systematic data rather than personal experience or the will of officials. Past working methods, such as going deep among the masses and making field visits, remain effective, but for decision-making, systematically collected data and the results of scientific analysis matter more. The government should step up publicity for governing by data, bring data knowledge into the regular training of civil servants, and work to foster across society a culture of "speaking with data, managing with data, deciding with data and innovating with data".
Finally, privacy legislation on the security of personal data should be stepped up. Every technology is a double-edged sword, and big data is no exception. How to protect citizens' privacy effectively while promoting open data will be a major challenge of the big data age.
The new year has just begun. I hope the relevant departments of the Chinese government will formulate policies on big data and introduce concrete measures, so as to seize this historic opportunity and advance the development and progress of Chinese society. 2013 should become China's year of big data.
(Responsible editor: The good of the Legacy)