Improving enterprise operating efficiency through big data technology is an important goal, but it is not easy for every enterprise. On January 21, at the launch of the "1 Billion Speaks: TalkingData Mobile Internet Industry Index Data Report", a number of industry experts and senior TalkingData staff shared the pitfalls that stand in the way of releasing the value of big data, and how they can be addressed.
The "1 billion" refers to the 1.06 billion mobile smart devices that the TalkingData platform now covers, across the iOS and Android platforms and across device forms such as smartphones, tablets, and smart TVs.
Experts: real analysis is still in its infancy
Wu Hongxiao of the National Mobile Media Committee, Secretary-General of the China Mobile Internet Industry Alliance; Ni, academician of the Chinese Academy of Engineering and head of the China Mobile Internet Development Index Expert Group; and Liu De, vice dean of the New Media Institute at Peking University, shared their views on big data at the press conference, pointing out the many challenges of big data applications from different angles.
Wu Hongxiao:
Many individual databases are already very large, exceeding several PB, and as data grows, processing it becomes more and more difficult. There are two challenges:
1. The data source problem. In data collection it is necessary to separate the false from the true, verifying the credibility of data by tagging the time and place of information on the network and the organization it came from, and by comparing it against historical data. More importantly, it is difficult for any one party to form a sufficiently valuable data pool from its own data alone.
2. The lack of effective data mining tools. Data processing involves many parameters. Its complexity lies not only in the data samples themselves but also in the diversity of their sources and the interactions between entities and space, which are hard to describe and interpret with traditional statistical or mining methods. Finally, data mining results should be presented visually: the more intuitive the results, the easier it is to put the value of data processing to use.
Ni:
At the outset, data is provided on six aspects: the Internet industry, operating systems, device models, brands, operators, and apps. Geographic location and real-time data may be added later. Being able to monitor the behavior of 1 billion people in real time has enormous potential, but it also raises very complex issues, including information security, personal privacy, and the need for better regulation. Data should be interconnected and shared, but for commercial reasons many organizations are unwilling to release their data. This requires coordination with the relevant departments, so that organizations become more willing to share information in a way that serves national interests while protecting their own. Data quality and analysis tools also need to improve. For example, discovering correlations in data can create a great deal of new value.
Liu De:
There are several major problems in the development of cloud computing and big data in China:
1. More data is published than is actually used, because too much data is released for publicity.
2. Data islands.
3. Redundant construction of cloud infrastructure. The entire cloud storage platform needs to be redesigned at the level of national strategy because, as in mining, only what is useful and valuable is worth collecting.
4. Data analysis capability. Today's data mining resembles the data displays of ERP and CRM systems, while real analysis is very weak and should be considered in its infancy (for example, current methods for analyzing websites, including domain name resolution, still have many shortcomings). Moreover, the gap between China and developed countries is widening, not narrowing.
TalkingData: a big data analysis of the 2014 mobile Internet
For a long time, TalkingData has used a distributed computing architecture, massive data processing, and mining algorithms to analyze mobile Internet user data along multiple dimensions, taking the 1 billion smart devices covered by its statistical analysis platform as the data basis, and has published its observations of the industry as a whole.
Taotingchi, director of the TalkingData data platform, explained the 2014 TalkingData Mobile Internet Data Report in detail at the press conference. The report is organized around topics such as "Mobile Internet Industry Profile", "Mobile Internet User Behavior", "Overall Inventory of Mobile Applications", and "Offline Consumption Habits of Mobile Internet Users", and gives a multidimensional analysis of the overall development of China's mobile industry in 2014. (If you are interested in the report, you can download it for free from the TalkingData official website.)
The report shows that the mobile Internet has passed through an "embryonic period", in which demand centered on communication and social networking, and an "initial development period" represented by shopping and entertainment, and has now entered a "high-speed development period". In this period, specialized applications closely tied to daily life have emerged in areas such as travel, healthcare, education, and catering; diversified life services bring users great convenience; and online-to-offline (O2O) integration has become a trend. The continuous emergence of flagship applications has also brought the O2O industry a twin surge in user growth and capital market financing, and a closed consumption loop on mobile is gradually taking shape.
In conversation with TalkingData: technology, data sources, and neutrality are key
After the press conference, TalkingData COO Xu Yi, vice president of products Shanhui, pre-sales director Damien, and data platform director Taotingchi were interviewed by a CSDN reporter. They discussed in more depth TalkingData's big data practice, how it addresses the problems raised by the experts above, and what TalkingData's technology offers enterprises and developers.
TalkingData believes that the most important thing about big data is to put analysis results to work for the operation of the whole enterprise. However, the data of a single enterprise is not enough to reflect the dynamics of a whole industry; the best decisions come from analyzing data for the industry as a whole. This is consistent with Mayer-Schönberger's idea of "not random samples, but all the data". Not all data is reliable, however, and TalkingData stresses that at present only a neutral third-party platform can guarantee the reference value of analysis results. Of course, neutral analysis of industry-wide data has to rest on a reliable big data technology platform: one that can accommodate the volume and diversity of data across the whole industry, offers a unified interface for data interconnection, and ensures that analysis is both real-time and effective.
Q: How should we understand the claim that big data applications in China are still at an early stage?
A: Many enterprises have no way to apply the data they have stored to everyday sales optimization and service optimization, because they lack the analytical capability. The key to using big data is how to make the business data-driven. From this point of view, whether in traditional industries or even in the Internet industry, data-driven business is still in its infancy.
For example, many games are operated by specific people who make decisions by judgment, rather than using the collected data to find patterns that make operations smoother, such as effectively predicting and winning back users who are likely to churn, and so protecting game revenue. The amount of data involved is not necessarily large, but the churn model behind it is built on many games, the operator's own and others. We compute one model across many games and then apply it back to a single game to guide its operation.
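The idea of fitting one churn model on data pooled from many games and then applying it back to a single game can be sketched roughly as follows. This is only an illustration and not TalkingData's actual method: the single feature (days since last login), the bucket boundaries, the sample records, and the 0.5 threshold are all assumptions invented for the example.

```python
from collections import defaultdict

# Hypothetical pooled history from several games:
# (game, days_since_last_login, churned). All values are invented.
HISTORY = [
    ("game_a", 1, False), ("game_a", 2, False), ("game_a", 9, True),
    ("game_b", 1, False), ("game_b", 4, True), ("game_b", 5, False),
    ("game_c", 8, True), ("game_c", 10, True), ("game_c", 3, False),
]


def bucket(days):
    """Map inactivity days to a coarse bucket (boundaries are assumptions)."""
    if days <= 2:
        return "0-2"
    if days <= 6:
        return "3-6"
    return "7+"


def fit_churn_rates(history):
    """Estimate the churn rate per inactivity bucket from all games pooled."""
    churned = defaultdict(int)
    total = defaultdict(int)
    for _game, days, did_churn in history:
        b = bucket(days)
        total[b] += 1
        churned[b] += int(did_churn)
    return {b: churned[b] / total[b] for b in total}


def at_risk(users, rates, threshold=0.5):
    """Return ids of users whose bucket's pooled churn rate meets the threshold."""
    return [uid for uid, days in users
            if rates.get(bucket(days), 0.0) >= threshold]


rates = fit_churn_rates(HISTORY)

# Apply the pooled model to the current players of one game.
my_players = [("u1", 1), ("u2", 5), ("u3", 12)]
flagged = at_risk(my_players, rates)
```

A real churn model would use many more behavioral features and a proper learning algorithm; the point here is only that the rates are learned from all games together and then used to flag players in one game for retention efforts.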
In this area, TalkingData first does some industry-standard work: connecting data that easily ends up isolated, or consolidating it in a DMP (data management platform). For an enterprise, this means first helping it build up its first-party data so that it can manage that data and build user profiles. Second, its data can be connected with third-party data, so that it can make use of a wide range of third-party data.
So the scope of big data applications goes beyond the simple original understanding of "I have a lot of data; how do I process it?". It is more advanced than that, and increasingly it produces data applications that cross the boundaries between fields.
Q: What is unique about TalkingData's technology?
A: From the perspective of data analysis and mining, we use an OLAP model: data is stored by metric and dimension, and then repeatedly sliced, diced, and extracted. This technical model is basically the same as everyone else's. We have also adopted open source technologies such as Hadoop, Hive, Storm, and Spark. But we have made some small technical contributions to the big data industry, because we have our own computing engine, recommendation algorithms, some mining algorithms, and our own storage model. Some of these systems are open source, and we have an internally code-named system that we often compare with the technical frameworks on the market so that we can exchange ideas and learn. For example, there is an open source system called Kirin, which also handles big data operations. We cannot claim to be in the lead, but we are always willing to share what we know in this area.
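The OLAP pattern described in this answer, storing a metric by dimension and then slicing and aggregating it, can be illustrated with a small in-memory sketch. The dimensions (os, brand, day), the "active" metric, the placeholder brand names, and all numbers are assumptions made up for the example; a production system would run this on an engine such as Hive or Spark rather than on Python lists.

```python
from collections import defaultdict

# Hypothetical fact rows: three dimensions (os, brand, day) and one metric.
FACTS = [
    {"os": "Android", "brand": "BrandX", "day": "2014-12-01", "active": 120},
    {"os": "Android", "brand": "BrandY", "day": "2014-12-01", "active": 80},
    {"os": "iOS", "brand": "BrandZ", "day": "2014-12-01", "active": 100},
    {"os": "Android", "brand": "BrandX", "day": "2014-12-02", "active": 130},
    {"os": "iOS", "brand": "BrandZ", "day": "2014-12-02", "active": 90},
]


def slice_cube(facts, **fixed):
    """Slice/dice: keep rows whose dimensions match every fixed value."""
    return [row for row in facts
            if all(row[dim] == value for dim, value in fixed.items())]


def roll_up(facts, dims, metric="active"):
    """Roll-up: sum the metric, grouped by the listed dimensions only."""
    totals = defaultdict(int)
    for row in facts:
        totals[tuple(row[dim] for dim in dims)] += row[metric]
    return dict(totals)


# Total active devices per operating system.
by_os = roll_up(FACTS, ["os"])

# Slice to Android only, then aggregate per day.
android_by_day = roll_up(slice_cube(FACTS, os="Android"), ["day"])
```

Here `by_os` collapses the brand and day dimensions, while `android_by_day` first fixes one dimension and then aggregates along another, which is the "repeated cutting and slicing" the answer refers to.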
Q: What does TalkingData offer enterprises and developers?
A: Every business has to solve two problems: how to acquire customers, and how to manage them once acquired. On the first, we use a cross-industry data platform to help enterprises find customers. On the second, we provide a set of products that close the loop from big data analysis to operations, helping enterprises manage customers according to their characteristics and get better returns. This mainly includes:
1. Providing a full set of operational reports.
2. Profiling users across the industry to help companies find the best partners and potential customers.
3. Providing direct operational tools that let enterprises segment potential customers, together with targeted operational strategy recommendations for better conversion.
Q: Don't many big data platforms also offer user profiling?
A: Traditional user profiles can be somewhat misleading: attributes such as a user's gender, age, and province have limited reference value. Our user profiles focus more on what is actionable. For mobile products, by looking at which apps users like to use, we can analyze which users' interests best match a product's positioning. Wouldn't it also be valuable to know which shopping malls they visit, which products they like, and which cities they live in?
Q: Do you provide on-premises solutions, or do you collect and analyze data through a pure SaaS platform and feed back the results?
A: We have two types of customers. For large customers such as China Merchants Bank, whose privacy requirements are relatively high, a full first-party DMP platform has to be deployed inside the bank. All data can only flow into it, while public, non-commercial data can be obtained as reports directly from our official website. For the second type of customer, data can be uploaded directly to our platform.
Take China Merchants Bank (pocket life and mobile banking) as an example:
The first step is to help them build their mobile data infrastructure: collecting, cleaning, storing, and analyzing the basic data from their early operations. With this basic data they can generate insights they did not have before, which helps them optimize their products and find the right channels.
The second step is to help them build a data management platform. A data management platform is a system and service that integrates a customer's data from its various channels and supports its big data and marketing scenarios. With this system, the customer can build a full view of each user from its own multi-channel data. These views can be grouped into segments and combined with its marketing systems, such as WeChat, SMS, the call center, and delivery systems. We can then help monitor the marketing effect, forming a closed loop of big data marketing that raises the ROI of its overall marketing.
The third step is to help connect the customer's first-party DMP with our third-party data. TalkingData is a large third-party data center. Our accumulated third-party data covers the everyday online and offline behavior of users beyond what the bank's first-party data records, and combining the two helps customers gain deeper insight and discover more customer needs.
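The second step above, merging data from several channels into one view per user and then grouping users for marketing, can be sketched as follows. This is a minimal illustration under stated assumptions: the channel names, event types, user ids, and the two segment rules are all hypothetical, and a real DMP would also need identity matching across channels, which is assumed away here by using a shared user id.

```python
from collections import defaultdict

# Hypothetical first-party events per channel, keyed by a shared user id.
CHANNEL_EVENTS = {
    "mobile_banking": [("u1", "transfer"), ("u2", "login"), ("u1", "login")],
    "pocket_life": [("u1", "coupon_view"), ("u3", "coupon_view")],
    "call_center": [("u2", "card_inquiry")],
}


def build_user_views(channel_events):
    """Merge events from every channel into one unified view per user."""
    views = defaultdict(lambda: {"channels": set(), "actions": []})
    for channel, events in channel_events.items():
        for user_id, action in events:
            views[user_id]["channels"].add(channel)
            views[user_id]["actions"].append(action)
    return dict(views)


def segment(views, predicate):
    """Group users: sorted ids whose unified view satisfies the predicate."""
    return sorted(uid for uid, view in views.items() if predicate(view))


views = build_user_views(CHANNEL_EVENTS)

# Two example segments that a marketing system could target.
multi_channel = segment(views, lambda v: len(v["channels"]) >= 2)
coupon_fans = segment(views, lambda v: "coupon_view" in v["actions"])
```

Each segment is just a list of user ids, which is the form a downstream SMS or push system would typically consume; measuring the response of each segment afterward is what closes the marketing loop described in the text.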
Q: Some data is credible and some is not. Will the final results contain errors?
A: It depends on whether you look from the industry perspective or from the perspective of operational analysis. Some large companies have launched platforms of this kind that also give developers operational analysis and operational tools. When the data is only used to help a mobile startup team analyze its own business, the discrepancy in such data is certainly not very large. But from an industry-wide perspective things may be different. For example, when Baidu publishes a report, it draws not only on data collected from the market in a neutral way but even more on data gathered from Baidu search and Baidu Maps, so neutral data may make up only a small part of what it uses. As an industry data report, it leans toward Baidu's own products. Baidu may not see it that way, but the sample itself is biased. Tencent is the same: its apps account for a large share of the top 50 applications, so it feels its user base is already broadly representative. A report that Tencent considers neutral will still be skewed toward its own user base, so there will be some deviation. For example, one customer uses our statistical analysis product just to analyze its own business and also uses Baidu's and Tencent's; the numbers differ between them, even though we are all doing the same job.
Q: How do you solve the data source problem?
A: TalkingData is the only neutral big data platform in the industry. We provide our service as an SDK plug-in, so we collect relatively accurate first-hand data directly. In collecting it, we pay close attention to how developers and end users feel, so we negotiate a sound user license agreement and, while protecting privacy, obtain the data that people care about. That is the first aspect. In addition, we use data exchange and cooperation to obtain more diverse data. This data also has to be compliant and legal, and it can only be obtained without infringing on the interests of any partner. There is one more aspect: we are doing a lot of offline deployment, including collecting store information and installing Wi-Fi points to gather offline data, which can be connected with online data and matched extensively. In short, this will form an industry-wide data network that makes the data more comprehensive and rigorous.
Q: How do you avoid data privacy problems?
A: Data privacy is the foundation on which a data services company stands. Data collection and use must pay attention to legality:
1. Developers are explicitly required to tell users in the user agreement what data TalkingData will collect and on what basis, to protect the user's right to know.
2. For data exchange with third parties, the legality of the other party's data sources also has to be audited.
3. Within TalkingData, there are very strict controls over what data to collect, what data to process, and the internal data management process. TalkingData regards data such as a user's mobile phone number and ID number as a red line that must not be crossed. TalkingData has been certified to ISO 27001, and the data the company stores is well protected.
4. Handling of data reports. TalkingData publishes industry-level data, such as the overall ratio of men to women, but does not disclose any individual's detailed data.
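Two of the controls listed above, never keeping red-line fields and publishing only aggregates, can be sketched in a few lines. This is an illustration under assumptions, not TalkingData's pipeline: the field names, the sample records, and the choice of gender share as the reported metric are all invented for the example.

```python
from collections import Counter

# Fields treated as off-limits in this sketch (the names are assumptions).
RED_LINE_FIELDS = {"phone_number", "id_number"}


def strip_red_line(record):
    """Drop off-limits fields before a record is stored or processed."""
    return {key: value for key, value in record.items()
            if key not in RED_LINE_FIELDS}


def gender_share(records):
    """Report only aggregate percentages, never per-user rows."""
    counts = Counter(record["gender"] for record in records)
    total = sum(counts.values())
    return {gender: round(100 * n / total, 1) for gender, n in counts.items()}


# Hypothetical incoming records; identifiers here are placeholders.
raw = [
    {"device": "d1", "gender": "F", "phone_number": "000-0000"},
    {"device": "d2", "gender": "M", "id_number": "X000"},
    {"device": "d3", "gender": "M"},
    {"device": "d4", "gender": "F"},
]

clean = [strip_red_line(record) for record in raw]
report = gender_share(clean)
```

Filtering at ingestion means the sensitive fields never reach storage at all, and the report exposes only percentages, so no individual's record can be read back from it. Small groups can still leak information in aggregate reports, so a real system would likely also enforce a minimum group size.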