"Editor's note" This article is reproduced in the public number "Fu Zhihua", the author has been Tencent social networking business Group data center director and Tencent Data Association president. Before Tencent, he worked in the market consulting, Analysys International, China Internet Association, and served as vice president of Dcci Internet Data Center.
In the internet industry, Baidu, Tencent and Alibaba are the most noteworthy companies in terms of big data accumulation and application. Their big data applications have things in common, but because their data sources and business models differ, each also has its own characteristics. This article analyzes the data assets they hold and how they apply them, to make it easier to understand the current big data situation and future strategies of large internet companies.
Baidu, Alibaba and Tencent data assets
In terms of data types, Tencent's data is the most comprehensive, which reflects the breadth of its internet businesses. Its most prominent data is social data and game data. The core of its social data is relationship-chain data, user interaction data, and user-generated text, pictures and video. Its game data mainly covers large online games, web games and mobile games, and the core of that data is in-game activity data and payment behavior data. The defining feature of Tencent's data is its variety of social user behavior and entertainment data.

Alibaba's most prominent data is e-commerce data, especially users' browsing, search, click, favorite and purchase data on Taobao and Tmall. Its defining feature is funnel-style conversion data that follows the user from browsing through to payment.

Baidu's data consists mainly of users' search keywords and of the web pages, pictures and video collected by its crawlers. Its defining features are that search keywords reflect users' interests and needs more directly, and that a larger share of the data is unstructured.
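To make the idea of funnel-style conversion data concrete, here is a minimal sketch in Python. It is only an illustration and not Alibaba's actual method: the three stage names, the event format and the function name funnel are assumptions, and each stage is simply counted as the number of distinct users who performed that action.

```python
# Hypothetical funnel stages, ordered from first touch to payment.
STAGES = ["browse", "click", "purchase"]


def funnel(events, stages=STAGES):
    """Return (stage, distinct_users, conversion_from_previous_stage) rows.

    `events` is an iterable of (user_id, action) pairs. Actions that are
    not listed in `stages` are ignored.
    """
    users_by_stage = {stage: set() for stage in stages}
    for user_id, action in events:
        if action in users_by_stage:
            users_by_stage[action].add(user_id)

    rows = []
    previous = None
    for stage in stages:
        count = len(users_by_stage[stage])
        # The first stage has no rate; avoid dividing by zero later on.
        rate = None if previous in (None, 0) else count / previous
        rows.append((stage, count, rate))
        previous = count
    return rows


sample_events = [
    ("u1", "browse"), ("u2", "browse"), ("u3", "browse"), ("u4", "browse"),
    ("u1", "click"), ("u2", "click"),
    ("u1", "purchase"),
]
for row in funnel(sample_events):
    print(row)
```

A production funnel would usually also require that a user passed through the earlier stages in order and within a time window; this sketch leaves that out for brevity.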
Baidu, Alibaba and Tencent data application scenarios
The data application scenarios of Baidu, Alibaba and Tencent share a common framework. It has seven layers, each representing a different level at which the value of enterprise data is applied; together they form a pyramid of data value in enterprise operations:
(1) Data foundation platform layer. This is the bottom of the pyramid and the foundation for all the other layers; if it is weak, the application layers above it can hardly play an effective role in business operations. The technical goal of this layer is efficient data storage, computation and quality management. The business goal is to link all of the enterprise's user (customer) data under a unique ID, including user (customer) profiles (such as gender and age), behavior and interests, so as to understand the user (customer) comprehensively;
(2) Business operation monitoring layer. This layer first builds the system of key metrics for business operations. On that basis, data products built on intelligent models monitor abnormal movements in the key metrics and, through various analysis models, quickly locate the cause of a movement to support operational decisions.
(3) User/customer experience optimization layer. This layer uses data to monitor and optimize the user/customer experience. It draws on both structured data and unstructured data (such as text). The former is mainly handled with various user (customer) experience monitoring models or tools; the latter is mainly handled by monitoring text from microblogs, forums and the enterprise's internal customer feedback system to find negative word of mouth and optimize the product or service in time;
(4) Refined operations and marketing layer. This layer uses data to drive refined business operations and marketing. It has four main aspects: first, building user-based data extraction and operation tools so that operations and marketing staff can select customers by audience targeting and then run marketing or operational campaigns for them; second, using data mining to improve customer response to campaigns; third, managing the customer lifecycle through data mining; fourth, using personalized recommendation algorithms to recommend different goods or products according to each user's interests and needs, so that promotion resources are used as efficiently and effectively as possible, as with Taobao's personalized product recommendations;
(5) External data service and market communication layer. External data services generally serve the customers or users of the internet company. For example, Baidu serves its core advertiser customers through Baidu Public Opinion, Baidu Spokesperson, Baidu Index and other services; Taobao serves its customers through Data Cube, Taobao Intelligence and its cloud-based products; and Tencent serves the customers of its open business through Tencent Analysis and Tencent Cloud Analysis. Market communication is achieved mainly through engaging data infographics and data visualization products (such as Taobao Index, Baidu Index and the Baidu Spring Festival migration map).
(6) Business analysis layer. Analysts compile statistics from big data into weekly, monthly and quarterly business analysis reports, analyze how user, business and revenue targets are being met, identify problems, and optimize business strategy.
(7) Strategic analysis layer. Here internal big data is combined into data views for decision makers, and is combined with external data, especially competitive intelligence monitoring data and research data on overseas trends, to support strategic analysis and decision making.
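The business goal of layer (1), linking all of a user's data under one unique ID, can be sketched with a union-find structure. This is only an illustrative sketch under stated assumptions, not the method any of the three companies uses: the class name IdentityGraph, the function merge_profiles and the identifier formats such as "account:1001" are all hypothetical.

```python
class IdentityGraph:
    """Union-find over identifiers (accounts, devices, cookies, ...)."""

    def __init__(self):
        self._parent = {}

    def _find(self, key):
        self._parent.setdefault(key, key)
        root = key
        while self._parent[root] != root:
            root = self._parent[root]
        # Path compression: point every visited node straight at the root.
        while self._parent[key] != root:
            next_key = self._parent[key]
            self._parent[key] = root
            key = next_key
        return root

    def link(self, a, b):
        """Record that identifiers `a` and `b` belong to the same person."""
        root_a, root_b = self._find(a), self._find(b)
        if root_a != root_b:
            self._parent[root_b] = root_a

    def unified_id(self, key):
        """Return the canonical ID shared by all linked identifiers."""
        return self._find(key)


def merge_profiles(graph, records):
    """Merge (identifier, attributes) records into one profile per person."""
    profiles = {}
    for identifier, attributes in records:
        profiles.setdefault(graph.unified_id(identifier), {}).update(attributes)
    return profiles


graph = IdentityGraph()
graph.link("account:1001", "device:abc")   # the account logged in on this device
graph.link("device:abc", "cookie:xyz")     # the device was seen with this cookie
records = [
    ("account:1001", {"gender": "F", "age": 28}),
    ("cookie:xyz", {"interest": "mobile games"}),
    ("account:2002", {"gender": "M"}),
]
print(merge_profiles(graph, records))
```

Attributes recorded against the account and against the cookie end up in one profile because both identifiers resolve to the same unified ID, while the unrelated account keeps its own profile.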
Although Baidu, Alibaba and Tencent share this common framework for applying data value in business operations, their business models and data assets differ, so their overall big data strategies also differ markedly.
Baidu Big Data Strategy
The most important source of Baidu's big data is the nearly one trillion web pages its crawlers have collected from more than 100 countries, a volume at the exabyte (EB) scale. Baidu's data is very diverse. Much of what it collects is unstructured or semi-structured, including web pages, video and pictures, but it also holds structured data, such as users' click behavior and advertisers' payment behavior.
Baidu's big data mainly serves three groups. The first is internet users: big data and natural language processing make their searches more accurate. The second is advertisers: big data improves the match between their ads and search keywords, or between their ads and the content of the page a user is viewing. The third is traditional-industry enterprises that hold data at a certain scale; these are the focus of the Big Data Engine that Baidu is now promoting.
The Baidu Big Data Engine reflects the trend for internet companies to open up their data service capabilities and cooperate with others. It consists of the following three parts:
Open Cloud: Baidu's large-scale distributed computing and ultra-large-scale storage cloud. The Open Cloud opens up infrastructure and hardware capabilities for big data. Baidu's cloud used to be aimed mainly at developers; the Open Cloud in the Big Data Engine is aimed at "big developers" with big data storage and processing needs. According to Baidu staff, the Open Cloud also offers high CPU utilization, high elasticity and low cost. Baidu was the first company in the world to deploy ARM servers commercially at large scale, and the ARM architecture is characterized by low energy consumption and high storage density. Baidu was also the first company to apply GPUs (graphics processors) to machine learning, again with the aim of saving energy.
Data Factory: the Data Factory is Baidu's capability for organizing massive data. Its role is similar to that of database software; the difference is that the Data Factory is used to process data at the terabyte scale or larger. It supports queries over large-scale heterogeneous data, supports SQL-like and more complex query statements, and supports a variety of query scenarios. It can also handle concurrent queries and scans over TB-scale tables; for large queries at low concurrency, throughput can reach the order of a hundred GB per second.
Baidu Brain: Baidu Brain opens up Baidu's existing artificial intelligence capabilities, mainly large-scale machine learning and deep learning. These were previously used for speech, image and text recognition and for natural language and semantic understanding, and were opened to smart hardware through platforms such as Baidu Inside. They will now be used to intelligently analyze, learn from, process and make use of big data, and will themselves be opened up.
By packaging its infrastructure capabilities, software system capabilities and intelligent algorithm technology, Baidu opens them up as the Big Data Engine, and enterprises that hold big data can connect their data to the engine for processing. The framework is modular: an enterprise or organization can choose to use only one of the three parts, for example keeping its data in its own cloud while using some of Baidu Brain's intelligent algorithms, or storing its data in Baidu's cloud and writing its own algorithms.
The role of the Baidu Big Data Engine
The specific role of the Baidu Big Data Engine can be seen from two angles:
(1) For government agencies: the transport sector, for example, holds big data from the Internet of Vehicles, the Internet of Things, road network monitoring, ship networking, and the monitoring of docks and stations. If this data is combined with Baidu's search records, whole-web data and LBS data, and processed with the big data capabilities of the Big Data Engine, it can support intelligent route planning and transport capacity management. The health sector holds statutory influenza reporting data, national sentinel surveillance data on influenza-like cases, and pathogen monitoring data; combined with Baidu's search records and whole-web data, these can be used for influenza prediction and vaccination guidance.
(2) For enterprises: many enterprises also hold large amounts of data, but their capacity to process and mine it is relatively weak. With the Baidu Big Data Engine they can store massive amounts of information reliably and at low cost, and mine its value intelligently. For example, at the Baidu Technology Open Day in April 2014, Ping An of China described how it uses Baidu's big data capabilities to better understand and predict consumers, and to segment customers in order to develop personalized products and marketing plans.
Alibaba Big Data Strategy
The overall direction of Alibaba's big data is the DT (data technology) era, whose main aim is to unleash productivity. In the future, Alibaba's big data will consist of "data openness based on cloud computing" plus "big data tool applications":
(1) Data openness based on cloud computing. Cloud computing lets small and medium-sized enterprises obtain data storage and processing services on Ali Cloud and build their own data applications. Cloud computing is the foundation of data openness: it can provide a data workbench for data developers around the world, on which Alibaba's distributed storage platform and algorithm tools become easier for them to use. At the same time, Alibaba needs to do a good job of data desensitization and of defining the business meaning of its data, labeling each item clearly enough that data developers worldwide can start thinking with data on the Alibaba platform and make that data useful to governments, consumers and industries. Once Alibaba's big data is open, online and offline data can be linked together, and everyone becomes both a data provider and a data user.
(2) Big data applications. Ma Yun (Jack Ma) has set two guidelines for data applications as a whole:
The first guideline: move from IT to DT (data technology). DT is the force that ignites and activates data as a whole, so that it can be used for social management, for sales, for manufacturing and for consumer credit. As analyzed earlier, Alibaba's data assets are mainly e-commerce data: Taobao and Tmall generate rich and varied data every day, and Alibaba has accumulated many kinds of data covering transactions, finance, life services and more. This data helps Alibaba run data-driven operations (see the figure below).
Another of its most important applications is in finance, specifically small and micro loans. When small and micro enterprises seek financing, banks cannot obtain their real operating data. As a result, many of these enterprises cannot get loans, and the lack of data also makes the assessment process long. Alibaba uses transaction, credit, SNS and other data from its e-commerce business to decide whether a loan can be issued and for how much.
The second guideline: let Alibaba's data and Alibaba's tools become the infrastructure of Chinese commerce. Alibaba has begun this transition, moving from serving consumers directly itself to supporting the online merchants who serve consumers. Drawing on its existing operational and data experience, it will develop more tools to help online merchants grow, so that they can use the best tools and services to serve consumers better. As Ma Yun said: "I believe there is no online merchant who does not want customers of their own, and none who does not want to know whether the customer's experience is really good or bad and how to keep those customers. We feel that a country's economy should be left to its entrepreneurs, and that the future of Taobao's commercial economy should be left for online merchants to decide, rather than for us to decide."
Tencent Big Data Strategy
At present Tencent's big data mostly serves Tencent's own internal operations; compared with Alibaba and Baidu, its data is less open. For Tencent, therefore, we mainly introduce the application scenarios and services of its big data inside the company.
More than 90% of Tencent's data is now centrally managed by the data platform department. Data from more than 100 products has been brought under centralized management and is stored centrally in TDW, Tencent's self-developed data warehouse. Viewed by stage of data application, Tencent's big data can be divided into four layers: data analysis, data mining, data management and data visualization:
(1) The data analysis layer has four products: self-service analysis, user portraits, real-time multidimensional analysis, and an intelligent location tool for data anomalies. Self-service analysis lets non-technical staff produce statistics and displays through simple condition settings. User portraits automatically generate a profile of a group of users or of the users of a particular business. The real-time multidimensional analysis tool breaks a metric down across multiple dimensions in real time, so that analysts can examine it from different angles. The intelligent location tool automatically pinpoints where an abnormal movement in the data comes from.
(2) The data mining layer includes a precision advertising system, a personalized recommendation engine and customer lifecycle management. Precision advertising systems such as Guangdiantong are built on the massive data of Tencent's large social platforms and use accurate recommendation algorithms and intelligently targeted promotion slots to deliver ads precisely. The personalized recommendation engine uses personalized recommendation algorithms (collaborative filtering, content-based recommendation, graph algorithms, Bayesian methods and so on) to recommend products according to each user's interests and preferences. The customer lifecycle management system uses big data to mine data at each stage of the user/customer lifecycle, builds prediction, early-warning and user-characteristic models, and supports refined operations and marketing tailored to the characteristics of each lifecycle stage.
(3) The data management layer includes TDW (Tencent Data Warehouse), TDBank (data bank), the metadata management platform, the task scheduling system and data monitoring. This layer provides efficient centralized storage of data, definition and management of data, data quality management, timely scheduling and execution of computing tasks, and monitoring and alerting on data problems.
(4) The data visualization layer includes self-service reporting tools, Tencent Compass, Tencent Analysis and Tencent Cloud Analysis. Self-service reporting tools let users build, on their own, reports that are relatively simple in form and logic. Tencent Compass has an internal and an external version: the internal version is an efficient reporting tool for Tencent's internal users (product managers, operations staff, technical staff and so on), and the external version is a reporting tool for Tencent partners such as developers. Tencent Analysis is a website analytics tool that helps webmasters analyze their sites from every angle. Tencent Cloud Analysis is a tool that helps developers make decisions and optimize their operations.
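Collaborative filtering, one of the algorithm families named for the personalized recommendation engine in layer (2), can be illustrated with a small item-based sketch. This is a generic textbook formulation, not Tencent's implementation: the function names item_similarities and recommend, the sample users and the product names are all assumptions, and interactions are treated as simple yes/no signals.

```python
from collections import defaultdict
from math import sqrt


def item_similarities(user_items):
    """Cosine similarity between items, from binary user -> items data."""
    item_users = defaultdict(set)
    for user, items in user_items.items():
        for item in items:
            item_users[item].add(user)

    sims = defaultdict(dict)
    items = sorted(item_users)
    for index, a in enumerate(items):
        for b in items[index + 1:]:
            common = len(item_users[a] & item_users[b])
            if common:
                score = common / sqrt(len(item_users[a]) * len(item_users[b]))
                sims[a][b] = score
                sims[b][a] = score
    return sims


def recommend(user, user_items, sims, top_n=3):
    """Rank unseen items by summed similarity to the items the user has."""
    seen = user_items.get(user, set())
    scores = defaultdict(float)
    for item in seen:
        for other, score in sims.get(item, {}).items():
            if other not in seen:
                scores[other] += score
    # Highest score first; ties are broken alphabetically for stability.
    return sorted(scores, key=lambda item: (-scores[item], item))[:top_n]


# Hypothetical interaction data: which products each user has used.
user_items = {
    "u1": {"game_a", "game_b"},
    "u2": {"game_a", "game_b", "game_c"},
    "u3": {"game_a"},
    "u4": {"game_c", "game_d"},
}
sims = item_similarities(user_items)
print(recommend("u3", user_items, sims))
```

Item-based similarity is used here because item-to-item scores can be precomputed offline and reused for every user; a real engine would typically weight interactions and blend in the other methods listed above.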
Overall, Baidu, Alibaba and Tencent all hold big data, and all three use it to improve their own business operations; in this respect, their application scenarios for data value are similar. However, their businesses, business models and data assets differ, and this shapes their future big data strategies. In particular, from the perspective of big data openness and cooperation, Baidu and Alibaba are relatively more open. For internet companies that value openness and cooperation, the main hope is to exchange more data with more traditional industries through an open big data strategy, in order to enrich their own online data, create synergy between online and offline data, and open up new business models such as smart hardware and big data for health.