The 2014 Zhongguancun Big Data Day was held on December 11, 2014 in Zhongguancun. Under the theme "Aggregate data assets, promote industrial innovation," the conference explored key issues such as data asset management and transformation, in-depth big data technology, innovation in industry data applications, and ecosystem construction. It also discussed the needs and practices of government authorities, the financial sector, telecom operators, and others, and the path by which they can achieve transformation and industry innovation through the management and operation of data assets.
At the Operators @ Big Data forum that afternoon, Wu Hongwei, product manager of the AsiaInfo big data analysis platform, delivered a keynote speech titled "Making industry data fusion more efficient and analysis more convenient." The speech had four parts: first, how the Internet has driven the development of data analysis; second, AsiaInfo's thinking on big data and some of the problems it has encountered; third, how AsiaInfo accelerates big data fusion analysis for industry; and fourth, the stories behind AsiaInfo's big data.
Wu Hongwei: Hello, everyone. I am Wu Hongwei from AsiaInfo big data. This talk has four parts: first, how the Internet has driven the development of data analysis; second, our thinking on big data and some of the problems we have encountered; third, how AsiaInfo accelerates big data fusion analysis for industry; and fourth, the stories behind AsiaInfo's big data. First, let's look at the history of the Internet in China. From 1986 to 1990 the basic Internet construction was completed. From 1997 to 2008 the Internet developed rapidly, with the number of users and the bandwidth both growing fast. From 2009 to 2014 we entered the mobile Internet period: with the issuing of 3G and 4G licenses, users could get online from mobile networks. By this time the Internet model was already very mature, and users were making direct use of it through things like online shopping. The Internet has also driven the development of big data. According to incomplete statistics from 2013, the data generated on the Internet worldwide each day had reached a volume equivalent to about 1,000 PB; burning it to discs would take about 180 million of them. Data analysis is also becoming "big" in three ways. The first is big volume: the data being analyzed has grown from the GB level to the PB level or even more. The second is big scope: analysis has moved from structured data inside the enterprise to unstructured data, and it also takes in data from outside the enterprise. The third is a big role: data used to serve mainly daily operations, whereas big data now drives enterprise operations and bears on marketing costs, so it is of great help to the enterprise.
Next, let's look at the importance of big data to industry. In the Internet way of doing things, big data sits at the top of the pyramid. Big data analysis lets an enterprise quickly understand policies and regulations, understand public opinion about itself, avoid certain risks, and understand consumer needs, so it is very important to industry. How do we do big data analysis? We face five challenges. The first is massive data: with data coming in from the Internet, volumes have already moved from the GB level to the PB level. The second is diversity: structured and unstructured data together make the data far more varied. The third is convenience: big data analysis used to serve a small group of people, and only professional IT staff could use the analysis platforms and tools, whereas we hope front-line business staff can use big data analysis directly. The fourth is accuracy: big data analysis has to produce accurate results. The fifth is real time: in Internet scenarios we often want to make predictions and analyze data in real time. Now let's look at the limitations of traditional data analysis. The first is delay: the construction cycle is relatively long, and real-time use was not considered in the original design. The second is limited insight: traditional data analysis draws its data from inside the enterprise and mostly supports only structured data, so the analysis is not particularly broad and its accuracy is not particularly high. The third is a high threshold: traditional data analysis relies mostly on commercial analysis tools, such as BU, which limits who can use it.
The fourth is high cost: when building a system we often use high-end commercial servers and commercial data analysis software, and even though the technology threshold keeps getting lower, growing data volumes bring ever more cost. So big data analysis has an opportunity here. We think major changes are needed in four areas. First, provide real-time analysis. Second, improve accuracy by bringing in third-party data to complement the enterprise's own data. Third, build a big data platform with low-cost input and high return. Fourth, lower the threshold of use, so that big data analysis serves not only professionals but also front-line business staff.
So let's look at the changes AsiaInfo has made in big data. We summarize them in eight points: the first is connection, the second is fusion, the third is real time, the fourth is simple, the fifth is fast, the sixth is professional, the seventh is open, and the eighth is cloud. First, connection: we want to collect data from various industries and avoid islands of data. For example, a business needs to know its competitors' data, or needs to base its next strategy on data; it needs third-party data reports and feedback on how its Internet products are used. So we collect all of this to form big data, covering Internet data, unstructured data, and structured data. We also provide a data protocol with interfaces for both real-time and non-real-time access. Second, fusion: fusing the data is very difficult because the data we take in is large and includes both structured and unstructured data. We handle this in two modes, one outside the database and one inside it, and complete the fusion across them. Outside the database we rely more on stream processing and in-memory pipeline technology, while traditional data is computed according to the particular business. Third, real-time data processing: real-time processing is widely used in business, for example for highway traffic conditions and in tourism, where it can feed recommendations to customers. These first three points are mainly about how data is accessed and how it is processed. From the fourth point on, the topic is how data is analyzed. The fourth point is simple: make self-service analysis available to everyone and lower the threshold. We have talked with enterprises about this. The people who can really make use of the data are generally front-line staff, who want both to see the data and to know how to use it.
But they do not know how to process and analyze the data, because traditional data analysis depends on third-party tools and they need IT staff to support them. As a result, first, the cycle is long, and second, the data is hard to cover fully. We therefore built the data analysis platform around business capabilities, so that business staff can work directly on the self-service data platform. First, the IT data on the platform is turned into business-oriented data. Second, the conversion processing is done for them. Third, the complex processing that particular scenarios require is supported. For example, given 100,000 users, I might filter them twice according to some of their behavioral characteristics and in that way complete the customer profile, so that follow-up delivery to those customers is effective.
The fifth point, fast, is mainly about user experience. Speed is a very hard problem in the big data era. We tested a scenario with a dataset of more than 1 TB and 2,000 business fields, and our goal was to respond within 10 seconds. We optimize during data computation: we plan the data layout in advance as far as possible, make data lookups efficient during computation, and let the storage node that holds the data carry out the corresponding computation, which makes good use of resources and improves efficiency.
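The idea of letting each storage node compute over the data it already holds can be illustrated with a minimal Python sketch. This is only an illustration under stated assumptions: the function names, the `NODES` sample rows, and the `region`/`amount` fields are hypothetical and are not taken from the platform described in the talk. Each "node" produces a small partial result locally, and only those partial results are merged.

```python
from collections import defaultdict

def local_aggregate(rows, key_field, value_field):
    """Sum value_field per key_field using only the rows held by one node."""
    partial = defaultdict(float)
    for row in rows:
        partial[row[key_field]] += row[value_field]
    return dict(partial)

def merge_partials(partials):
    """Combine the small per-node results into one final result."""
    total = defaultdict(float)
    for partial in partials:
        for key, value in partial.items():
            total[key] += value
    return dict(total)

# Hypothetical sample data: each inner list stands for the rows on one node.
NODES = [
    [{"region": "north", "amount": 10.0}, {"region": "south", "amount": 5.0}],
    [{"region": "north", "amount": 2.5}],
    [{"region": "south", "amount": 1.5}, {"region": "east", "amount": 4.0}],
]

def total_by_region(nodes=NODES):
    # Only the per-node summaries are moved, never the raw rows.
    return merge_partials(local_aggregate(rows, "region", "amount") for rows in nodes)

if __name__ == "__main__":
    print(total_by_region())
```

The design point is that the raw rows never leave their node; only the per-key sums are transferred and merged.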
The sixth point is professional, which means in-depth analysis: we let business people understand the current situation more quickly and focus on the future. In-depth analysis mines enterprise-related data and, starting from the enterprise's raw data, produces summarized data that points the direction for the enterprise's development. Data mining algorithms, such as PageRank, commodity purchase forecasting, and consumption forecasting, can be put to use quickly in enterprise marketing, where the analysis results can drive recommendations. Enterprise data and decisions can also be shared, so there is an open platform that realizes value by providing data and making it available for download.
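PageRank is named above as one of the mining algorithms. For readers unfamiliar with it, the following is a minimal Python sketch of the standard power-iteration form of PageRank. It is a generic textbook version, not the implementation used on the platform; the damping factor of 0.85 and the iteration count are conventional assumptions, and the sample graph is hypothetical.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank.

    links maps each node to the list of nodes it links to.
    Nodes with no outgoing links spread their rank evenly over all nodes.
    """
    nodes = set(links)
    for targets in links.values():
        nodes.update(targets)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}

    for _ in range(iterations):
        # Rank held by nodes without outgoing links is redistributed evenly.
        dangling = sum(rank[node] for node in nodes if not links.get(node))
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {node: base for node in nodes}
        for source, targets in links.items():
            if targets:
                share = damping * rank[source] / len(targets)
                for target in targets:
                    new_rank[target] += share
        rank = new_rank
    return rank

if __name__ == "__main__":
    # Hypothetical graph: both "a" and "b" link to "c", and "c" links back to "a".
    for node, score in sorted(pagerank({"a": ["c"], "b": ["c"], "c": ["a"]}).items()):
        print(node, round(score, 3))
```

In this small graph the node that receives the most incoming links, "c", ends up with the highest score.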
The last point is cloud. Through SaaS we help enterprises increase revenue and improve operational efficiency. Because the service is in the cloud, data in the SaaS offering is additionally protected by encryption, so user privacy is guaranteed. Compute and storage resources are allocated according to how the user uses the platform. As for fees, we charge on demand, for example according to the storage and application capabilities the user consumes. Those eight points are what AsiaInfo has attempted, and at the core is one thing: AsiaInfo's big data. Viewed from above, it is divided into a functional domain and a user domain. The common functional domain integrates big data for analysis, maximizes the value of the data, and opens the data up through the open platform. On the user side, we hope this big data analysis capability will serve not only large and medium-sized enterprises but also, through SaaS, small and micro enterprises. Big data analysis is inseparable from cloud computing technology. Here are three examples: the first we implemented with Hadoop, and for online analysis we use the PSESG approach.
Next I will share the stories of AsiaInfo's big data fusion across industries such as transportation, finance, and services. I will use four stories of big data fusion analysis to review what big data analysis can do and how it differs from traditional data analysis. The first is the story behind user operations. An enterprise wanted to present information about its users, for example past users and end users, and push it to a big data channel; we connected to that channel and displayed the data. This story shows that big data analysis favors real-time analysis, which can effectively help companies make decisions and carry out fast, accurate marketing. The second story is about users on the Internet. A user leaves scattered pieces of information on the Internet: for example, this person is 29 years old, male, uses an HT mobile phone, is single, and likes mobile games. We take these fragments of data and integrate them into one big picture.
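The second story, integrating scattered fragments into one picture of a user, can be sketched in a few lines of Python. This is a minimal illustration only: the fragment structure, the source names, and the rule that the first source to supply a field wins are all assumptions made for the example, not the method described in the talk. The attribute values reuse the example from the story.

```python
def fuse_profile(fragments):
    """Merge attribute fragments from several sources into one profile.

    Assumption for this sketch: fragments are ordered by trust, so the
    first source that supplies a field wins and later values are ignored.
    Returns the merged profile and a map of which source supplied each field.
    """
    profile = {}
    sources = {}
    for fragment in fragments:
        for field, value in fragment["attributes"].items():
            if field not in profile:
                profile[field] = value
                sources[field] = fragment["source"]
    return profile, sources

# Hypothetical fragments about one user, collected from different places.
SAMPLE_FRAGMENTS = [
    {"source": "forum", "attributes": {"age": 29, "gender": "male"}},
    {"source": "device_log", "attributes": {"phone": "HT", "age": 30}},
    {"source": "social", "attributes": {"marital_status": "single",
                                        "hobby": "mobile games"}},
]

if __name__ == "__main__":
    merged, origin = fuse_profile(SAMPLE_FRAGMENTS)
    print(merged)
    print(origin)
```

A real system would also need to decide that the fragments belong to the same person and how to resolve conflicting values; the sketch sidesteps both by assuming the fragments are already matched and ordered.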
The third story is the story behind users' e-commerce, and it is a story of in-depth mining analysis. Based on a user's recent browsing behavior and on current product sales, we make personalized recommendations. As a result, the valid order access rate reached 87.6% and input was reduced by 17%. Big data analysis is really about finding the value behind the data: through in-depth analysis and automatic mining algorithms we uncover that value and provide effective help for the enterprise's future operations and market decisions.
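As a rough illustration of combining recent browsing behavior with current sales to produce recommendations, here is a minimal Python sketch. The scoring rule (category view count multiplied by units sold), the field names, and the sample catalog are hypothetical choices for the example; the talk does not say how the actual recommendations were computed.

```python
from collections import Counter

def recommend(browsed_categories, catalog, top_n=3):
    """Rank catalog items by (views of the item's category) * (units sold).

    Items in categories the user has not browsed are skipped.
    Ties are broken alphabetically by item name.
    """
    interest = Counter(browsed_categories)
    scored = []
    for item in catalog:
        affinity = interest.get(item["category"], 0)
        if affinity == 0:
            continue
        scored.append((affinity * item["units_sold"], item["name"]))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for _, name in scored[:top_n]]

# Hypothetical sample data.
RECENT_VIEWS = ["phones", "phones", "games"]
CATALOG = [
    {"name": "phone case", "category": "phones", "units_sold": 50},
    {"name": "game card", "category": "games", "units_sold": 80},
    {"name": "kettle", "category": "kitchen", "units_sold": 500},
]

if __name__ == "__main__":
    print(recommend(RECENT_VIEWS, CATALOG))
```

The best-selling item overall, the kettle, is not recommended here because the user has shown no interest in its category, which is the point of personalizing on browsing behavior rather than on sales alone.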
The fourth story, one that is mentioned quite often, is public opinion analysis. Here, through third-party analysis reports and data obtained from the Internet, we gather negative news about the enterprise, the situation of its competitors, product sales, customer feedback, and so on, to effectively support the enterprise's operations.
From these stories we can see that big data analysis is inseparable from the Internet. Although the Internet has shifted from the consumer Internet to the industrial Internet, big data fusion analysis can in fact be of even greater help to the industrial Internet. That is all from me, thank you!
(Responsible editor: Mengyishan)