The year is 1948, the place Northeast China. The Liaoxi campaign has entered its critical phase. For Lin Biao, commander of the Fourth Field Army, the most important goal after Jinzhou is to defeat the Kuomintang's New Sixth Army. Lin Biao's method is to listen to the "military intelligence report" every day, in which the officers on duty read out each unit's battle situation and captures.
The data were almost uniform, and boring. Until one day, Lin Biao suddenly noticed that in an encounter at Hujiawopeng, the ratio of short guns to long guns captured was slightly higher than in other battles, the ratio of cars to carts captured or destroyed was slightly higher, and the ratio of officers to soldiers among those captured and killed was slightly higher.
Lin Biao concluded that the Kuomintang army's command post was nearby, and ordered an immediate pursuit of the defeated troops fleeing from there. Sure enough, the troops soon captured the Kuomintang commander Liao Yaoxiang, and the Communist army's victory in Liaoxi began to unfold.
Lin Biao's approach fits the plain definition of big data, which has swept the world in recent years and is changing it profoundly: discovering valuable information in data that others find boring, and turning it into opportunity.
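The comparison in the anecdote can be sketched in a few lines of Python. This is only an illustration: every figure below is invented, the field names are hypothetical, and the rule of "higher than 1.2 times the median of the other battles" is an assumption chosen to stand in for "slightly higher".

```python
from statistics import median

# Hypothetical battle reports; every number here is invented for illustration.
REPORTS = {
    "battle_a": {"short_guns": 20, "long_guns": 400, "cars": 5, "carts": 100,
                 "officers": 10, "soldiers": 300},
    "battle_b": {"short_guns": 18, "long_guns": 380, "cars": 4, "carts": 90,
                 "officers": 9, "soldiers": 280},
    "battle_c": {"short_guns": 22, "long_guns": 420, "cars": 6, "carts": 110,
                 "officers": 11, "soldiers": 310},
    "hujiawopeng": {"short_guns": 45, "long_guns": 410, "cars": 12, "carts": 95,
                    "officers": 25, "soldiers": 290},
}

# The three ratios mentioned in the story: (numerator, denominator).
RATIOS = [("short_guns", "long_guns"), ("cars", "carts"), ("officers", "soldiers")]


def ratio_profile(report):
    """Return the three ratios for one battle report."""
    return [report[num] / report[den] for num, den in RATIOS]


def unusual_battles(reports, factor=1.2):
    """Flag battles where all three ratios exceed the median of the other battles."""
    profiles = {name: ratio_profile(r) for name, r in reports.items()}
    flagged = []
    for name, profile in profiles.items():
        others = [p for other, p in profiles.items() if other != name]
        baselines = [median(p[i] for p in others) for i in range(len(RATIOS))]
        if all(value > factor * base for value, base in zip(profile, baselines)):
            flagged.append(name)
    return flagged


print(unusual_battles(REPORTS))
```

With these made-up figures only the Hujiawopeng report is flagged. Requiring all three ratios to be elevated at once mirrors the story, where it was the combination of signals that pointed to a command post.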
The comprehensive digitization of human society has led to explosive growth in data volume. Flows of people, capital and goods are all represented as data. Today, the data generated every day across all sectors worldwide is enough to fill more than 200 million DVDs, and these seemingly haphazard data hold limitless gold deposits.
In recent years, advances in storage capacity, computing power and transmission capability have made it possible to mine these gold deposits. Data has not only become a strategic asset that drives the convergence of industries, but also an important component of a country's comprehensive national strength, another kind of national core asset alongside land, sea and air power.
The arrival of the big data age is now undisputed. It is the result of decades of accumulated technology worldwide, yet only in the last two or three years has it rapidly penetrated every sector. China is no exception: the core asset of Baidu, Alibaba, Sina Weibo, Tencent's WeChat and others is big data, many industries and enterprises have begun to use big data to transform themselves, and the online records of China's more than 600 million internet users are becoming the core content of big data.
Many industry insiders interviewed by Caixin reporters said the momentum of big data is even fiercer than that of the dotcom bubble more than 10 years ago. Big data is more practical, easier to put into practice, and has a clearer profit model. The industry changes it drives will have a disruptive impact on the world as a whole.
For the general public, big data remains unfamiliar even though they are living in its era. What is big data doing in China? What conveniences can it bring a business or an individual, and what "harm" might they encounter? Understanding big data begins with this confusion.
"National Core assets"
On March 29, 2012, the Obama administration announced a $200 million investment to boost big data-related industries, elevating its "big data strategy" to a national strategy. The administration described big data as "the new oil of the future" and said that the scale and activity of the data a country holds, and its ability to interpret that data, would be an important part of comprehensive national strength, and that possession and control of data would be another core national asset beyond land, sea and air power.
The US move is another sweeping initiative following the Clinton administration's "information superhighway" plan of 1993. At a time when emerging countries, led by China, are increasingly challenging America's economic and political influence, strengthening US control over data assets through big data research will help the United States seize a new strategic high ground internationally.
But as to what big data is, neither industry nor academia has yet arrived at a recognized scientific definition. Tang Quanlong, director of the Shanghai Software Industry Promotion Center, said in an interview that he had once asked a Chinese professor at Imperial College London what big data is. The professor's view was that data that can be processed cannot be called big data.
McKinsey, the international consultancy that first put forward the big data concept, holds that big data is data whose scale exceeds what conventional database tools can handle. IDC, the International Data Corporation, defines big data by four characteristics: volume, variety, velocity (fast processing) and high value.
Because of these characteristics, traditional methods of data analysis, mining and processing no longer apply. What is needed is a formalized, structured way of describing dynamic, high-dimensional and complex big data, on which big data processing technology can then be built.
Just as in the early days of cloud computing, many people argued over its definition; once IT infrastructure, from bandwidth to storage, had matured enough for cloud computing to be put to real use, no one cared about the definition any more, Tang Quanlong said. Big data is the same: the concept has risen along with data processing, storage and sharing capabilities. Seen this way, big data is not only the data that needs to be processed but also the technology for processing it.
If cloud computing provides the place and channel for storing and accessing data assets, then the data itself is the truly valuable asset. With the digitization of human activity and of resource and environmental information, and with stronger storage and processing capabilities, we can now extract valuable information from documents, pictures, videos and even the mass of sensor data produced by the Internet of Things.
Although such data is hard to process, the value that can be obtained from it is higher. In the US, the use of big data could help retailers increase their profits by 60% and help manufacturers cut assembly costs by 50%, and output built on big data intelligence could reach as much as $300 billion.
Yianyang, deputy secretary-general of the Zhongguancun Big Data Industry Alliance, said the current trend is to turn data into assets. Facebook's listing shows where the value lies: the company has few physical assets, its market capitalization of more than $100 billion comes from intangible assets, and the most important of these is its data.
Take China's three internet giants, known as BAT (Baidu, Alibaba, Tencent): each holds a different kind of big data, and all are priceless assets. Baidu has user search data and public web page data. In fact Baidu and Google are themselves big data companies: by crawling and analyzing web page data from around the world, they help users find search results in a mass of data, which is in essence a process of data acquisition, organization, analysis and mining.
Alibaba has transaction data and credit data. These two types of data are easier to monetize and to mine for business value. Tencent has user relationship data and the social data built on it. These data can be used to analyze people's lives and behavior, to mine information on politics, society, culture, commerce, health and other fields, and even to predict the future.
Industry insiders said that it is precisely because it has seen the huge value of user data that the microblogging site now blocks web crawlers from accessing its content, which makes microblog content hard to find through search, while Weibo itself packages the data for sale.
The chief technology officer of Rui Xianglin Technology Co., Ltd, which specializes in big data analysis, said that data would soon become an important enterprise asset, like technology, equipment and human resources. Unlike other assets, the more open a big data asset is and the more it is shared, the more it benefits the entire industry and the well-being of the community.
The nature of big data, he argues, is to free data from applications so that it forms a value chain of its own. In the future, data will define applications, define software, define networks, define data centers, and define everything.
According to IDC forecasts, the world will hold a total of 35 ZB of data by 2020 (1 ZB = 1 trillion GB). McKinsey predicts that the use of big data products in the personal location services market will generate $800 billion in value. The potential market for big data products in China is expected to reach 1.57 trillion yuan, which will not only open a new golden age for the IT industry but also upend the competitive landscape of various industries.
Digging Bokhary Data
"We're digging," Ayang, executive vice president of the company, told a Caixin reporter at the Big Data Technology Leaders' Summit in Dalian on June 21. They are digging into the financial data of small and micro enterprises, analyzing the enterprises' credit ratings and helping them obtain loans.
For most small and medium-sized enterprises, in the absence of a credit system in China, the only way to obtain loans is through mutual guarantees, which in effect amount to mortgages and pledges. And for banks, holding collateral is no guarantee of safety. In an economic downturn, every form of guarantee can fail.
"Abroad, there is a complete system to deter breaches of trust, but we do not have one," Ayang said. So they thought of another way: proving an enterprise's ability to create value, which could serve as a necessary condition for bank lending.
That way is big data. They use big data technology to describe an enterprise's earning power. The data sources include the enterprise's purchases, orders and inventory, how much it has in accounts receivable and cash, how much it pays in wages and taxes, and even its energy consumption.
"We do not trust financial statements. We look at the real data of an enterprise's production and operations, and not the totals but the details, because totals often obscure the true pattern," Ayang said. On a financial statement, for example, a company can manufacture a profit with a bridging loan, but in the raw data the sudden appearance of cash will be spotted.
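To make the bridging-loan point concrete, here is a minimal Python sketch of how a sudden, short-lived cash inflow stands out in detailed records while vanishing from a period-end total. It is not the company's actual method: the function name, the thresholds and the daily balances are all hypothetical.

```python
from statistics import median


def find_transient_cash_spikes(balances, jump_factor=5.0, window=5, reversal=0.8):
    """Return (arrival_day, departure_day) pairs for cash that appears and soon leaves.

    A jump counts as suspicious when it is more than `jump_factor` times the
    typical daily change and at least `reversal` of it flows back out within
    `window` days. All thresholds are illustrative assumptions.
    """
    changes = [later - earlier for earlier, later in zip(balances, balances[1:])]
    typical = median(abs(change) for change in changes)
    if typical == 0:
        return []  # a perfectly flat series gives no baseline to compare against
    spikes = []
    for i, change in enumerate(changes):
        if change <= jump_factor * typical:
            continue
        # changes[i] lands on day i + 1; look for an offsetting outflow soon after
        for j in range(i + 1, min(i + 1 + window, len(changes))):
            if changes[j] <= -reversal * change:
                spikes.append((i + 1, j + 1))
                break
    return spikes


# Hypothetical daily cash balances (in thousands of yuan), invented for illustration.
daily_cash = [100, 104, 98, 101, 99, 600, 603, 598, 102, 100, 103]
print(find_transient_cash_spikes(daily_cash))
```

In this invented series the balance jumps on day 5 and falls back on day 8, so a snapshot taken on day 6 or 7 would look healthy while the day-by-day record exposes the round trip.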
From at least two years of raw data, one can reconstruct how an enterprise creates wealth and judge whether it has sufficient earning power. In 2010, Minsheng issued its first purely credit-based loan to a company on the strength of their credit report. To date they have helped more than 800 companies obtain more than 4 billion yuan in loans, the largest single loan being 68 million yuan.
Ayang proudly said that not one of the more than 800 companies had produced a bad loan. Their big data technology can also help banks supervise these enterprises after a loan is made. It is like fitting the enterprise with a wearable device that monitors its health at any time: the enterprise's growth, stability and activity can all be read from the data.
Giving enterprises a health check-up is one of the more unconventional applications of big data. Giving individuals a check-up, meanwhile, earned Mi Wanjun, who returned to China from Silicon Valley in 2011, his first pot of gold.
When he first started a business in the United States in 2000, having just completed his studies in computer science and finance at Stanford, he won a research project on a real-time translation system for the military. After completing it, his second company set out to use personal medical information to predict individuals' medical costs, as a reference for insurance companies.
"It's technically much simpler than the first project," he said. Using data mining techniques, they analyzed the medical records of all Stanford staff and predicted each person's annual medical expenses, which let the insurance company know on whom it was making money and on whom it was losing money.
But after that project, large companies replaced insurers as their main customers. Because many large US companies bear their employees' medical costs themselves, the firm helps them predict each employee's likely medical expenses and then develop personalized fitness programs, improving employees' health in advance and thereby saving on medical expenses.
This saves the enterprise money, improves staff efficiency and makes employees happier. The business has therefore been welcomed by many of America's biggest companies, and its clients now range from Stanford to big companies such as Cisco and Apple. "This project has both economic and social value," Mi Wanjun said.
In August 2013, Mi Wanjun moved from Beijing to Shanghai and set up Seoul Data Technology Co., Ltd., which focuses on big data platforms for vertical applications, including advertising and marketing, computational sociology and financial applications. On the financial side, they originally intended to analyze public data to produce investment advice for hedge funds. But they then found the method worked so well that they decided not to sell it and set up a hedge fund themselves.
"It's a trillion-level opportunity," he told a Caixin reporter. The current wave of big data enthusiasm has no less momentum than the internet bubble at the start of this century; it is fiercer, is being put into practice faster, and has a clearer profit model. In his view, data mining can be combined with all kinds of industries to create value. He is also an investment partner at Broadband Capital, where he is mainly responsible for its big data laboratory, which specializes in developing industry applications with big data.
Shanghai Star Red Eucalyptus Data Technology Co., Ltd. is a start-up in which he invested after arriving in Shanghai, and which uses big data technology for media analysis.
The company's founding team split off from a ratings survey company. In China, ratings data are the basis on which huge sums of TV advertising are placed, but in the past ratings were surveyed through sample households. Even in large cities such as Beijing and Shanghai there are only 500 to 600 sample households, so the figures are easy to fake: getting just a few sample households to watch a particular channel or program is enough to move the ratings significantly, and huge economic interests are at stake.
With the large-scale digitization of television, it has become possible to collect every user's data from the back end, avoiding the volatility and uncertainty of sample surveys. Li Futzen, general manager of Shanghai Star Red Eucalyptus Data Technology Co., Ltd., said that these data can be used not only for ratings analysis but also for in-depth analysis of users' viewing behavior, so that advertising can be placed more precisely and its effectiveness evaluated more clearly.
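The arithmetic behind the tampering risk is easy to check. The Python sketch below compares a 500-household sample with full back-end collection; the census size, the baseline audience and the number of influenced households are hypothetical figures chosen only to show the scale of the effect.

```python
def rating_percent(viewing_households, total_households):
    """Share of measured households watching a program, in percent."""
    return 100 * viewing_households / total_households


SAMPLE_SIZE = 500        # sample households, the lower figure cited in the text
CENSUS_SIZE = 1_000_000  # hypothetical number of set-top boxes reporting directly
INFLUENCED = 5           # hypothetical households induced to watch one channel

honest_sample = rating_percent(10, SAMPLE_SIZE)
rigged_sample = rating_percent(10 + INFLUENCED, SAMPLE_SIZE)
honest_census = rating_percent(20_000, CENSUS_SIZE)
rigged_census = rating_percent(20_000 + INFLUENCED, CENSUS_SIZE)

print(f"sample: {honest_sample:.2f}% -> {rigged_sample:.2f}%")
print(f"census: {honest_census:.4f}% -> {rigged_census:.4f}%")
```

Under these assumptions, five households lift the sampled rating from 2% to 3%, a 50% increase, while the same five households barely register when every set-top box is counted.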
In the new media era, they can also collect viewing data from set-top boxes, smart TVs, tablets, mobile phones and other terminals, and carry out data analysis and mining and the valuation of advertising and programs. They can further use these data for intelligent program-guide recommendations, risk assessment for film and television dramas, analysis of user churn, analysis of embedded advertising and many other scenarios.
It can be said that the only obstacle to big data penetrating every industry is imagination. Researchers have summed up nine high-value application areas for big data: understanding customers and meeting their service needs, business process optimization, personal life services, personalized medical care, athlete status monitoring, optimizing the performance of machines and equipment, improving public service capacity, real-time traffic optimization, and high-frequency stock trading that draws on social media and internet news.
Digging ability to win
At present, more than 90% of the world's top 500 enterprises base their important investment and business decisions on in-depth data analysis and mining. Wu Lianfeng, assistant vice president of IDC China, has said that big data applications have a promising future and will gradually enter traditional industries. The compound growth rate of China's big data market over the next five years will reach 51.4%.
Yianyang said it is no exaggeration to say that technology is changing all of this: there are no unsolvable problems, only needs not yet imagined, and the only difference may lie in the user experience. In the big data age, whoever has superior data mining technology holds the key to the vault. In the data age, some data is structured but more of it is unstructured, update frequencies differ, and data sources keep multiplying.
In the past, people tried to handle unstructured data with traditional structured databases, but the results were inadequate. It was not until Google, in developing its web search service, solved the problem of fast access to data such as web pages and documents that it became a pioneer of big data technology. A Yahoo development team later built on Google's results to develop a programming framework for big data processing, known as Hadoop.
The practice of these companies restored confidence in tackling the difficulties of processing all kinds of unstructured data, and processing technology for images, video, audio and other data is now on the fast track as well.
Faced with a mass of unstructured data, the first step is to model it: starting from traditional analysis methods and adding wavelet analysis, collaborative filtering, machine learning and many other more complex methods, one builds a good regression model for the data and then predicts from it, helping enterprises optimize their business plans, helping banks manage customer risk, and helping advertisers target their marketing precisely.
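The idea of fitting a regression model and then predicting from it can be shown in its simplest form, a one-variable least-squares line. This Python sketch covers none of the more complex methods named above, and the data points are invented for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(x, slope, intercept):
    """Predict y for a new x using the fitted line."""
    return intercept + slope * x


# Hypothetical observations, e.g. a monthly input and the outcome that followed.
inputs = [1, 2, 3, 4, 5]
outcomes = [2.1, 3.9, 6.2, 7.8, 10.1]

slope, intercept = fit_line(inputs, outcomes)
print(f"slope={slope:.2f} intercept={intercept:.2f}")
print(f"prediction for x=6: {predict(6, slope, intercept):.2f}")
```

Real models of the kind the article describes would use many variables and far more data, but the workflow is the same: fit on history, then score new cases.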
Hal Varian, Google's chief economist, has said that in an era when almost everything can be monitored and measured, "statisticians will be the sexiest profession for the next 10 years."
Rui Xianglin says they can now help banks build a 360-degree view of a particular customer, giving the bank a full picture of the customer's situation. They have also built a complex event processing model for a large Asian stock exchange that predicts possible problems through real-time analysis of transaction data; carried out predictive maintenance for manufacturers, analyzing equipment's historical data to locate likely faults; and enabled rapid CT diagnosis for physicians.
In Rui Xianglin's view, big data is not mysterious: it is simply that the knowledge and experience accumulated up to today have given people the ability to predict from data. Nor, of course, is there any need to place blind faith in big data. What it provides is mostly a prediction of trends, a probability.
In many cases, the timeliness of data analysis matters more than its accuracy. The key is to "forecast the trend". What do Wal-Mart's profits have to do with satellite imagery? UBS needed a more accurate estimate of the company's profitability, so in addition to traditional methods it bought satellite image data and used data on Wal-Mart's parking lots as one dimension of its model.
It can be said that the key value of big data lies in gaining an information advantage. Its core capability is discovering patterns and predicting the future.
For SuperMap Software, which works in geographic information systems, the geographic information it deals with consists largely of unstructured data. Wang Kangyu, vice president of SuperMap Software, told a Caixin reporter that geographic information is becoming ever broader in scope, covering satellite information, UAV surveying information, radar remote sensing information and more rather than being limited to traditional mapping, which places higher demands on technology and calls for cloud computing and mobile computing.
Of course, this wealth of information also brings business model innovation. Geospatial analysis has become an essential part of spatial planning, and mapping geographical and meteorological factors onto agricultural prices and futures prices also relies on it. The currently popular concept of GBI (geo-business intelligence) likewise makes use of big geospatial data.
"There are a great many businesses that could grow out of this," Wang Kangyu said, including product and service assurance and consulting and decision support. The industry is actively exploring and building up technical reserves.
Dr. Dinzogg, director of the Telecommunications Industry Division of IBM Greater China, told a Caixin reporter that network data collection has become fine-grained enough to capture how long and how many times a user waits while watching a video, and that location information from telecom operators is already being used by insurers to assess drivers' risk. It can be said that no industry today is short of big data. The key is how one thinks about the problem: with the user at the center, no longer the producer.
Who's going to regulate big data?
With the advent of the big data age, digital existence will truly be realized. Networked, digital life brings convenience, but it also makes it easier for criminals to obtain information about people and gives them methods that are harder to trace and prevent. More sophisticated scams may appear. In other words, big data has sold you out.
It is hard to avoid being captured by "big data". Mi Wanjun said that AOL in the United States once ran an experiment showing that a person's name and address could be worked out from nothing more than that person's search records. It can be said that anyone who goes online leaves traces, and as long as enough traces are left, big data technology can describe that person clearly enough.
Yianyang says that in the big data age, complete privacy does not exist. Once you are on the internet, solving the privacy problem is very difficult; if you stay off it, you are easily marginalized.
The technical director of an online mall told a Caixin reporter that, to carry out precision marketing, they need to analyze not only users' activity on their own site but also users' activity elsewhere, and that this information can be bought from the major portal sites.
Xu Tang, deputy technical director of the Zhongguancun big data trading platform, said that as long as users go online there is no absolute privacy, and as long as data is valuable there will be deals. In Beijing, underground trading in real estate data runs to as much as 600 million yuan a year, and 20 GB of data from an online payment site sells for several hundred thousand to several million yuan.
But the question is who owns the data. Take a rather extreme example: the major sites all offer users free "network disks" on which they can store all kinds of information, with capacities of up to 100 GB. For users, this saves the few hundred yuan a hard drive would cost and makes accessing and sharing information more convenient, but the information also becomes a core asset of these sites.
How to define the ownership of data assets and the right to use them, how to protect users' privacy, and how to ensure users' data is not exploited by bad actors are all questions that must be settled before big data can move forward. They may not be solvable by technology alone; they require the participation of legal professionals and top-level design by the state.
Yianyang said that the value of big data lies partly in its use value and partly in its exchange value: data may be of no use to its owner yet helpful to other industries. Of course, such exchange must comply with the law.
Exchange value can in fact be seen in past cases where personal information was frequently misused. Large volumes of courier-customer and property-owner information were sold by those who held it at very low prices, generating large illegal gains for the buyers. In the future, he says, there will be a data trading market in which data can be traded in standardized form.
The most direct way to prevent data from being resold is to create a transparent, above-board trading platform on which data providers, analytics providers and buyers can deal with one another directly. Xu Tang said that big data exchange markets already exist abroad, such as the AnyPoint platform for developers that Microsoft launched in 2010 and "Data Plaza", the enterprise electronic information trading platform that Japan's Fujitsu launched in 2013.
In "Data Plaza", the data on sale includes shopping site transactions, smartphone location information and social networking site (SNS) posts. However, all personal information must be anonymized before the data is traded, which prevents disclosure of private information while still meeting the needs of big data analysis.
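What removing personal information before a trade might involve can be sketched as follows. This is an assumption-laden illustration, not a description of how "Data Plaza" works: the field names are hypothetical, and replacing a user ID with a keyed hash is strictly pseudonymization, a weaker guarantee than full anonymization.

```python
import hashlib
import hmac

# Hypothetical field names for direct identifiers that must never leave the seller.
DIRECT_IDENTIFIERS = {"name", "phone", "address"}


def pseudonymize(record, secret_key):
    """Drop direct identifiers and replace user_id with a keyed hash.

    The same user always maps to the same pseudonym, so buyers can still
    analyze behavior per user without learning who the user is.
    """
    cleaned = {
        field: value
        for field, value in record.items()
        if field not in DIRECT_IDENTIFIERS and field != "user_id"
    }
    digest = hmac.new(secret_key, str(record["user_id"]).encode("utf-8"), hashlib.sha256)
    cleaned["pseudonym"] = digest.hexdigest()[:16]
    return cleaned


# Invented example record; the key would be kept secret by the data provider.
key = b"provider-secret-key"
order = {"user_id": 42, "name": "Example User", "phone": "000-0000",
         "address": "Example Street 1", "item": "tea", "amount": 35.0}
print(pseudonymize(order, key))
```

A keyed hash is used rather than a plain hash so that a buyer cannot rebuild the mapping by hashing guessed user IDs. Behavioral fields left in the record can still identify people when combined, so a scheme like this is a starting point rather than a complete safeguard.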
According to the Japanese Market Research Association (JMRA), the market size of companies participating in Japanese data transactions is about 220 billion yen per year.
Tang Quanlong believes that, on the one hand, trading platforms should be established so that public listing displaces underground transactions and third-party supervision is introduced. On the other hand, the cost of disclosing private information must be raised: when users' privacy is abused, the party that disclosed it should bear joint and several liability, which would require sellers to mask personal information when selling data.
Xu Tang said that China is relatively backward in big data trading, mainly because users are concerned about the legal validity of transactions. Although the country has not yet issued a national big data strategy, it should clarify big data transaction legislation as soon as possible and promote a pricing mechanism for big data, so that data assets can eventually be securitized. "It may seem distant, but it will not be long."
(Responsible editor: Mengyishan)