Viktor Mayer-Schönberger, co-author of Big Data: A Revolution That Will Transform How We Live, Work, and Think, has said that just as the telescope let humans perceive the universe and the microscope let humans observe microbes, big data is opening a major transformation of our times.
Big data, currently the most fashionable term in the IT field, is, simply put, the ability to quickly extract valuable information from a wide variety of data.
The United States was the first country to recognize and exploit the value of big data. In March 2012, the Obama administration announced a $200 million investment to drive big-data-related industries, elevating its "Big Data Strategy" to a national strategy and even defining big data as "the new oil of the future". The U.S. government stated that improving the nation's ability to extract knowledge and insight from large, complex data sets would enhance national competitiveness, placing the initiative on the same level of importance as the Internet. Clearly, big data is not just a word: it is a technology, and it marks an industrial era.
For China, the world's most populous country and the second largest by GDP, establishing a national big data team is very timely. The essence of big data is "big": not a sample, but the whole population. It is not the blind men feeling a leg or a trunk, but the entire elephant itself. Its other essential feature is that more data carries more weight: from such fuzzy macro-level judgments, precise individual recommendations can be made, greatly improving overall production efficiency.
However, as a new field, big data brings great opportunities and great application value, but it also faces major challenges in engineering, management policy, talent cultivation, capital investment, and many other areas. Only by addressing these fundamental challenges can we make full use of this great opportunity and let big data deliver its greatest value and contribution to enterprises.
Challenge one: Data sources are complex
Rich data sources are a prerequisite for the development of the big data industry. Yet China's digital data resources remain far below those of the United States and Europe: its annual volume of new data is only 7% of the U.S. figure and 12% of Europe's, and the accumulation of government and manufacturing data lags far behind other countries. The limited data resources that do exist also suffer from poor standardization, accuracy, and completeness, and from low utilization value, all of which greatly reduce the value of the data.
Today, enterprises of almost any scale generate large amounts of data every moment, but how to collect and refine these data has always been a problem. The significance of big data technology lies not in possessing huge volumes of information, but in processing the data intelligently, analyzing it, and mining out valuable information; the prerequisite, however, is obtaining large quantities of valuable data in the first place.
In the future, data acquisition will be a big market, because analytical models can be built to suit any demand or line of thinking, but only on the premise that the data is collected accurately. The current problems are threefold: some data is never collected, some is collected incorrectly, and collection efficiency is constrained by network bandwidth. Data afflicted by any of these problems is difficult to turn into value.
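The collection problems described above are usually tackled with a validation pass before any analysis. Below is a minimal sketch, not from the article, of such a pass: the field names (`user_id`, `timestamp`, `amount`) and the sanity rules are illustrative assumptions.

```python
def validate_record(record):
    """Return True if a collected record passes basic sanity checks."""
    # Required fields must be present and non-empty.
    if not record.get("user_id") or "timestamp" not in record:
        return False
    # Values must fall in a plausible range (here: a non-negative amount).
    if record.get("amount", 0) < 0:
        return False
    return True

def clean(records):
    """Keep only valid records and drop exact duplicates."""
    seen = set()
    result = []
    for r in filter(validate_record, records):
        key = (r["user_id"], r["timestamp"])
        if key not in seen:
            seen.add(key)
            result.append(r)
    return result

raw = [
    {"user_id": "u1", "timestamp": 1, "amount": 9.5},
    {"user_id": "u1", "timestamp": 1, "amount": 9.5},   # duplicate
    {"user_id": "",   "timestamp": 2, "amount": 3.0},   # missing id
    {"user_id": "u2", "timestamp": 3, "amount": -1.0},  # implausible value
]
print(len(clean(raw)))  # only the first record survives -> 1
```

However simple, a filter like this embodies the article's point: data that is wrong or incomplete at collection time never becomes usable value downstream.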
In the big data age, we need more comprehensive data to improve the accuracy of analysis and prediction, and therefore more convenient, inexpensive, automated tools for producing data. Besides the personal information our web browsers record, intentionally or not, mobile phones, smart watches, smart wristbands, and other wearable devices produce data constantly. Even our home routers, televisions, air conditioners, refrigerators, water dispensers, and purifiers have grown more intelligent and networked; while serving us better, these appliances also generate large amounts of data. And when we go shopping, the merchant's WiFi, the carrier's 3G network, ubiquitous cameras and electronic eyes, department-store self-service screens, bank ATMs, and the credit card machines in gas stations and convenience stores everywhere are all generating data as well.
With the rapid development of mobile Internet, cloud computing, and related technologies, web portals such as mobile phones and ubiquitous sensors collect, store, use, and share personal data anytime and anywhere, often without people's knowledge. Your every move and location, even where you go in a day, is recorded, joining a mass of disordered data to be integrated and analyzed together with other data.
For example, when you scan a QR code with your phone and forward it to Twitter, your consumption habits, preferences, and even information about your social circle are captured by the merchant's big data analysis tools. While providing services, big data platforms also constantly collect users' personal information: consumption habits, reading habits, even living habits. On the one hand these data bring great convenience; on the other, because loopholes remain in data management, the vast amounts of information released or stored are easily monitored or stolen.
Big data radiates incalculable commercial value. But it is disturbing that as the means of collecting information grow ever more sophisticated, convenient, and covert, the protection of citizens' personal information remains stretched thin, in both technical means and legal support. People face not only endless harassment but also the threat of various criminal acts. In the big data age, who protects the privacy of citizens? This is not only a question everyone should think about, but also an unshirkable responsibility of government departments.
Challenge two: Building data mining and analysis models
Entering the big data age, everyone is talking about big data; it seems to have become a new fashion. Data is rooted in every corner of our lives more deeply than ever before. We try to use data to solve problems, improve welfare, and promote new economic prosperity. People voice high expectations for big data and great interest in big data analysis techniques. Yet while the clamor about its magical value is loud, practical models and methods for applying it are rare. There are two main reasons for this dilemma: first, insufficient insight into the value logic of big data analysis; second, the immaturity of some important elements and techniques of big data analysis. The massive growth of data, combined with the lag in analytical logic and in the development of big data technology, is the challenge we face in the big data age.
Most people take "big data" to refer simply to the massive scale of the data. With the technological revolution in recording, acquiring, and transmitting data, data has become convenient and cheap to produce, turning what was once a finite amount of information describing human attitudes or behavior at high cost into huge, massive data sets. But this is a one-sided understanding. In fact, large data sets existed before the big data age; because of their single dimension, and their detachment from the organic activities of people and society, their value for analyzing and understanding the truth was very limited. The real value of big data lies not in its size but in its comprehensiveness: multiple angles in the spatial dimension, overlapping layers of information, and the continuous presentation, along the time dimension, of information associated with the organic activity of people or society.
In addition, handling big data in a low-cost, scalable manner requires refactoring the entire IT architecture and developing advanced software platforms and algorithms. Here, other countries have once again moved ahead of us. In recent years in particular, big data processing platforms such as Hadoop have developed under the open source model, and related industries have begun to form in the United States. China's data processing technology base is weak and largely follows others, making it difficult to meet the demands of large-scale data applications. If big data is compared to oil, then data analysis tools are the technologies of exploration, drilling, refining, and processing. To turn the resource into value, we must master the key technologies of big data. It should be said that open source technology provides a good foundation for clearing this hurdle.
Many businesses are therefore beginning to realize that to really do data analysis and data mining on the Hadoop platform, there are two options: either operate it with a technical team that understands data, analysis, and programming, or choose a mature big data platform launched by a commercial company.
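The Hadoop-style processing mentioned above follows the MapReduce pattern: a map phase emits key-value pairs, and a reduce phase aggregates them per key. Below is a minimal in-memory sketch of the classic word-count example; it illustrates the programming model only, with no cluster, and all names and data are illustrative.

```python
from collections import defaultdict

def map_phase(documents):
    """Map: emit a (word, 1) pair for every word in every document."""
    for doc in documents:
        for word in doc.split():
            yield (word.lower(), 1)

def reduce_phase(pairs):
    """Reduce: sum the counts for each word.

    On a real Hadoop cluster this aggregation runs per key, in
    parallel, across many machines."""
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] += n
    return dict(counts)

docs = ["big data needs big tools", "data beats intuition"]
print(reduce_phase(map_phase(docs)))
# -> {'big': 2, 'data': 2, 'needs': 1, 'tools': 1, 'beats': 1, 'intuition': 1}
```

The appeal of the model is that both phases are stateless per record or per key, which is what lets a platform like Hadoop distribute them across a cluster; this is also why operating it well demands the cross-disciplinary team the paragraph above describes.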
In a word, although computer intelligence has made great progress, it is still limited to analyzing small-scale, structured or semi-structured data, to say nothing of deep data mining, and existing data mining algorithms remain difficult to generalize across industries.
Challenge three: The trade-off between data openness and privacy
The premise of data applications is open data; this is already a consensus. Some professionals point out that although China has the largest population in the world, its newly added data storage in 2010 was 250 PB, only 60% of Japan's and 7% of North America's. At present, some departments and institutions in China hold large amounts of data but would rather sit on them than share them with related departments, resulting in incomplete information or duplicated investment. China's data storage reached 64 EB in 2012; 55% of that data needed some level of protection, but less than half of it was actually protected.
Consider the U.S. approach to data openness. The U.S. government provides policy and funding guarantees so that clusters of data and information centers become national bases for information production and services, ensuring a continuous supply of data and using networks to deliver data and information promptly to the desks and homes of all citizens, including scientists, government employees, company staff, and school teachers and students, bringing the whole society into the information age.
Across the country, government, enterprise, and industry information systems are often built without unified planning or scientific evaluation and lack unified standards, forming many "information islands". Constrained further by administrative monopoly and commercial interests, the degree of data openness is low, beggar-thy-neighbour attitudes prevail, and sharing is difficult, posing a huge obstacle to data utilization. One important factor restricting the opening and sharing of data resources in China is imperfect policy and regulation: legislation on big data mining is lacking. China has no national law on data sharing, only assorted regulations, charters, and opinions, which guarantee neither sharing nor protection against abuse. On one hand there is a lack of policies promoting the opening of government and public data; on the other, the imperfect system of data protection and privacy protection inhibits the enthusiasm for openness. Establishing a healthy data-sharing ecosystem is therefore a pass that big data development in China must get through.
The balance between openness and privacy is also a major problem. Any technology is a double-edged sword, and big data is no exception. How to promote the full opening, application, and sharing of data while effectively protecting the privacy of citizens and corporations, and gradually strengthening privacy legislation, will be a major challenge of the big data age.
The difficulty of opening and sharing data across society also greatly degrades data quality. The key to adding value to data is integration, but the prerequisite of free integration is open data. In the big data age, the significance of open data lies not only in satisfying citizens' right to know, but in letting data, the most important means of production and living in this era, flow freely, promoting the development of the knowledge and network economies and helping China's economic growth shift from extensive to intensive. However, the lack of strategic vision, the difficulty of coordination among government agencies, weak awareness of data sharing, insufficient investment, and a shortage of scientists equal to big data are the difficulties big data must face in China's current development.
Challenge four: Big data management and decision making
The technical challenges of big data are obvious, but the challenges it poses to decision making are even more daunting. An important aspect of big data is that it directly affects how organizations make decisions and who makes them. In an era when information was limited, costly, and not digitized, the people making major decisions within an organization were typically its most senior figures, or external think tanks with deep expertise and prominent resumes. But in today's business world, executives' decisions still rely more on personal experience and intuition than on data.
The fundamental goal of big data development is to help people make smarter decisions and to optimize business and social operations on the basis of data analysis. The Harvard Business Review says big data is essentially "a management revolution". Decisions in the big data age need not rest on experience alone, but can really "speak with data". For big data to truly play its role, then, we must also, at a deeper level, improve our management models, matching management methods and structures to big data tools and techniques. This may be the hardest hurdle of all.
The application of big data is still narrow and its cost too high, which restricts adoption. The domestic industries able to exploit big data are concentrated in finance, telecommunications, energy, securities, tobacco, and other very large, monopolistic enterprises; for other industries it is premature to speak of the value of big data. As the volume of information within enterprises grows, big data will become a major factor in IT spending, especially the cost of data storage, which may burden enterprises to the point of being prohibitive. Far-sighted CIOs must prepare in advance.
Challenge five: Big data talent gap
If the big data represented by Hadoop is a little elephant, then the enterprise must have a trainer who can tame it. As more and more companies embrace big data technologies, people proficient in them have become a serious gap.
Every link in big data construction depends on professionals, so we must train and build a specialized force for big data that understands leadership, technology, and management.
It can be said that truly launching big data in enterprises and applying it comprehensively across society is not only a matter of technology and tools; more important is transforming business thinking and organizational structure, so that this big data "gold mine" can really be mined. So in the big data age, what strategies should we adopt to seize victory?
Integration and openness are the cornerstones
Connotate, a big data services start-up, surveyed more than 800 business and IT executives. Sixty percent of respondents said: "It is too early to say that these big data investment projects will certainly bring good returns." This is because big data currently lacks the necessary openness: data sits in the hands of different departments and companies that are unwilling to share it. Big data finds objective laws by studying correlations in data, which depends on the data's authenticity and universality. How to share and open up data is thus the weak point of current big data development and a major problem to be solved.
In the 2012 U.S. election, Obama benefited from data consolidation. The Obama campaign had a low-profile data mining team that helped raise $1 billion by mining huge amounts of data, raised the efficiency of campaign advertising by 14%, and, by building detailed models of "swing state" voters, ran 66,000 simulated elections every night to estimate Obama's odds in each swing state and guide the allocation of resources. The Obama campaign's greatest advantage over the Romney campaign was precisely this consolidation of big data. Obama's data mining team recognized a problem common everywhere: data scattered across too many databases. So in the first 18 months, the campaign created a single huge data system integrating information from pollsters, donors, field workers, consumer databases, social media, and key Democratic voters in swing states. It could tell the campaign team not only how to find voters and get their attention, but also help the data team predict which types of people were likely to be persuaded by particular messages. As campaign manager Jim Messina put it, almost nothing in the entire campaign was assumed without data to support it.
In March 2012, the Obama administration announced a $200 million investment to launch a "Big Data Research and Development Initiative", raising big data research to the level of national will. The scale of data a country holds, and its capacity to use it, will become an important component of comprehensive national strength. Accordingly, one of the goals of China's smart city construction is to realize the centralized sharing of data.
Therefore, at the social and national levels, China urgently needs to attach great importance to big data, especially by giving strong support in policy formulation, resource investment, and talent cultivation. At the same time, establishing a benign big data ecosystem is the main way to meet big data challenges effectively and put big data to use; this requires the scientific and technological community, industry, and government departments to work together under the guidance of national policy, removing barriers, establishing alliances, setting big data quality standards, and founding professional organizations to build a harmonious big data ecosystem.
A business model of cooperation and shared success
As cloud computing, big data technology, and the related business environment mature, more and more "software developers" are using large cross-industry data platforms to create innovative, value-adding big data applications, and the threshold for doing so is falling. First, data owners can gain additional revenue at minimal cost and increase their profits. Second, makers of big data equipment need applications to attract buyers; partnering for shared success is more profitable than simply selling hardware, and some far-sighted vendors have begun supporting these "software developers" with funding, technical support, equity, and other means. Third, demand for data analysis applications in segmented industry markets keeps rising; across the whole big data industry chain, innovative developers of industry data applications will be its most active part in the future.
In the future, three kinds of enterprises will occupy important positions in the big data industry chain: those holding large amounts of effective data, those with strong data analysis capabilities, and innovative "software developers". Social networks, mobile Internet firms, information enterprises, and telecom operators are producers of huge volumes of data: Facebook has 850 million users, Taobao has more than 370 million registered users, and Tencent's WeChat has passed 300 million users. These vast user bases supply the data and are waiting for the opportunity to release enormous commercial energy. It is predictable that in the near future, big data holders such as Facebook, Tencent, and the telecom operators will either become data analysis providers themselves or partner closely with enterprises such as IBM and ZTE as upstream and downstream collaborators; at some point the big data industry chain will erupt and grow at an astonishing rate.
The destructive potential of big data must be guarded against
In the big data age, traditional random sampling is replaced by the "full picture of all the data", and people's thinking and decisions can be judged directly from "what is", because such conclusions are free from the interference of individual emotion, psychological motives, sampling error, and so on, and are therefore more accurate and more predictive. However, because big data depends so heavily on data collection, once the data itself has problems, "catastrophic data" may result: flaws in the data itself lead to erroneous predictions and decisions.
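The "catastrophic data" risk described above can be shown in a few lines. This is a minimal illustrative sketch, not from the article: the demand figures, the glitch values, and the decision threshold are all hypothetical.

```python
# Illustrative sketch of "catastrophic data": when analysis ingests *all*
# the data rather than a vetted sample, a handful of corrupted records
# can silently flip the conclusion.

def average(values):
    return sum(values) / len(values)

# Genuine daily demand figures; the true average is 100.
clean_data = [98, 102, 99, 101, 100, 97, 103]

# The same feed, but two sensor glitches inject absurd readings.
corrupted = clean_data + [9999, 8888]

threshold = 150  # hypothetical rule: scale up capacity above this level

print(average(clean_data) > threshold)  # False: no scale-up needed
print(average(corrupted) > threshold)   # True: bad data forces a wrong decision
```

Two bad records out of nine are enough to move the average by more than a factor of twenty, which is exactly why predictions built on unvetted data can fail catastrophically.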
The logic of big data is to "find a needle in a haystack"; but what if all the straws look like needles? Too much information of doubtful authenticity and value, like too little information, is a hazard in situations that demand instant judgment and where a mistake carries serious consequences. The theory of big data rests on the premise that "massive data are facts"; what if the data providers are faking it? In the big data age this becomes more harmful, because people cannot control the biases of the data providers and collectors themselves. The Wall Street investment banks and the big rating agencies, which have the best databases and were the first to embrace the idea of big data, often err on major issues, exposing the limits of "big data".
Moreover, the big data age has created a world of ubiquitous databases, and data regulators face unprecedented pressures and responsibilities: how to prevent data leaks from harming national interests, public interests, and personal privacy? How to prevent wrong information from harming vulnerable groups? Until the risks can be effectively controlled, it may be better to keep "big data" in a cage.
The economic value of big data has been recognized, and its technologies are maturing; once data are integrated and supervision is in place, the era of the big data explosion will arrive. What we must do now is choose our own direction and prepare ahead of time for the arrival of big data.
Looking to the future, whether for governments, Internet companies, IT enterprises, or industry users, as long as we embrace "big data" with an open mind and the courage to innovate, the big data age will surely hold opportunities that belong to China.
The training of big data talent is urgent
The shortage of big data talent will become an important factor restricting the development of the big data market. Gartner predicts that by 2015 there will be 4.4 million new big data jobs worldwide and that 25% of organizations will have a Chief Data Officer position. Big data positions require versatile talent able to command mathematics, statistics, data analysis, machine learning, natural language processing, and more. In the future there will be a talent gap of roughly one million; across industries and fields, high-end big data talent will be the hottest, spanning engineers, planners, analysts, architects, application developers, and other specialized roles. Society, universities, and enterprises therefore need to cultivate and discover this talent together. Companies can partner with schools to train people, set up dedicated teams of data scientists, or work with professional service companies to meet their staffing needs.
While big data is being discussed in full swing, we need to think calmly about how to land the technology solidly and effectively. Although we still have a long way to go in the big data age, as Masayoshi Son said in his recent speech at Wuzhen: "What I want to say is, we must have confidence. China will become the world's largest economy within a few years; humanity's future will be full of opportunities and much happiness, and many bright prospects await us."