"Csdn Live Report" December 2014 12-14th, sponsored by the China Computer Society (CCF), CCF large data expert committee contractor, the Chinese Academy of Sciences and CSDN jointly co-organized to promote large data research, application and industrial development as the main theme of the 2014 China Data Technology Conference (big Data Marvell Conference 2014,BDTC 2014) and the second session of the CCF Grand Symposium was opened at Crowne Plaza Hotel, New Yunnan, Beijing.
Changling, general manager of China Mobile Suzhou Research and Development Center, delivered a speech titled "Research on Big Data Business Models for Telecom Operators". The communication pipeline is an important source of big data. Changling sees six business models for big data: data sales and online data access; the cloud computing tool store, which leases platforms and tools; free big data services that charge advertising fees; operating a real-time bidding advertising platform; intelligence analysis; and enhancing machine intelligence to assist the human brain. The successful modes of operating big data are: the forward mode, where bringing in sales revenue directly is the most convenient way; the backward mode, where advertising is the most successful business model on the Internet; the indirect mode, which reduces operating costs and equipment or labor input; and the auxiliary mode, which provides technology platforms and tools.
Changling, general manager of the Big Data Products Department, China Mobile Suzhou Research and Development Center
The following is a transcript of the speech:
Changling: Good afternoon, ladies and gentlemen! It is a great honor to have this opportunity to share with you some of my thoughts and some of our recent research and development work at the Suzhou Research and Development Center.
My speech is divided into three parts.
First, we have spent many years on technology and product development, followed by many application solutions. After all that time we found that we still had not found the way into big data, or were still somewhat muddled. Why? It is not a technical problem: our team has overcome many technical difficulties. The problem is the business model. What the data is worth, to whom, and what role each party plays in the ecosystem have not been clearly sorted out. So I will start from a recent project that triggered this line of thinking.
Second, I will share the big data business models I have looked at, some of which may already exist. It is not very systematic; it goes bit by bit, scenario by scenario.
Finally, I will talk about how a telecom operator's big data should be operated in these scenarios.
I have been tracking big data hotspots. Even in 2014, big data is still something of a bubble, and we have not thought clearly about what to do with it. Around 2011 and 2012 cloud computing also ran up to a peak and then dropped quickly into a trough, but by then one thing had happened: the layers of cloud computing had been clearly defined. Today nobody can say clearly what big data is for; it is hype.
Telecom operators are a traditional industry, but the technology keeps changing, from 256K fixed-network connections to today's 4G, so wireless is not all that traditional. Operators have their own form of big data. Taking a broader view, life science, astronomy and high-energy physics each have their own big data requirements, models and forms; for operators, the form is pipeline big data. We also have ERP and business-system data, but at that scale it is a bucket; the largest volume is pipeline data. Pipeline data has low value density. It provides user behavior data: user habits, including social network behavior, trajectories, preferences and other characteristics. It strings user behavior together, but it does not say directly what the user likes. What the user really likes, what they like today and what they will like later all have to be worked out by characterizing these attributes over time.
We know that in wireless communication the pipeline mainly carries signaling data, including LTE signaling. There are also customer service calls: multimedia recordings of calls are analyzed too, to see whether the caller is dissatisfied and how the customer service agent responded, as a source for evaluating and improving the business.
We have thought through several uses, such as network optimization, precision marketing and business innovation. Big data in telecom is said to be used in four domains, but even so, exactly how the data relates to the functions of those four domains has not been fully sorted out.
Let me then talk about a project, not a large one. It collects signaling data, about 6 Gb every second, and associates it with enterprise and enterprise-customer data to give the customer a basis for decision support. At first glance the project looks doable and not especially demanding. But work out the cost carefully and there is a problem. First, transmission: carrying 6 Gb of bandwidth back to our own site over the Internet costs, as far as I checked, about 1 million in broadband fees. Then what do storage, processing and report generation cost? The servers are basically a 10 million investment, plus switches and infrastructure. For operations, maintenance, electricity and labor, 100,000 a month is a modest estimate. You also have to deliver an analysis report to the customer, and in the initial stage each report takes an analyst's work. Yet there are only a few customers, each paying about 10,000 yuan. With one report every two months, the figures come to something like 200,000 yuan over two months and 500,000 yuan a year. It is not worth doing: the income cannot cover the cost.
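The back-of-the-envelope argument above can be sketched as a small calculation. The inputs are the round figures quoted in the talk; the five-year amortization period, and treating the bandwidth fee as an annual cost, are assumptions made only for illustration, and the function names are hypothetical.

```python
def annual_cost(server_capex, amortization_years, bandwidth_per_year, opex_per_month):
    """Rough yearly cost: amortized servers + bandwidth + running costs."""
    amortized_servers = server_capex / amortization_years
    running_costs = 12 * opex_per_month
    return amortized_servers + bandwidth_per_year + running_costs


def annual_shortfall(cost, revenue):
    """Positive result means the project loses money each year."""
    return cost - revenue


# Figures quoted in the talk (all in the same currency unit).
cost = annual_cost(
    server_capex=10_000_000,      # server investment
    amortization_years=5,         # ASSUMPTION: not stated in the talk
    bandwidth_per_year=1_000_000, # ASSUMPTION: broadband fee taken as yearly
    opex_per_month=100_000,       # operations, electricity, labor
)
revenue = 500_000                 # yearly report income mentioned in the talk

print("annual cost:", cost)
print("annual shortfall:", annual_shortfall(cost, revenue))
```

Even with a generous amortization period, the yearly cost is several times the yearly income, which is the speaker's point.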
Traditional technology cannot solve some data processing and analysis problems, and these problems point to a paradox. We say the data has value, but the premise is that you can find that value, and finding it takes a great deal of resources. Who pays for that input? The customer only wants a 10,000 yuan report; if you have to spend 10 million to produce a 10,000 yuan report, it is not worth the candle.
How to make big data pay off is also a big problem. To some extent it is a hot potato: you have invested heavily, so do you keep it or let it go? So I looked at why others can do big data, or claim to, and how they make money from it. Put simply, a business model is what I produce, how it meets customer demand, how it generates cash flow, and how that cash flow keeps me alive rather than inflating a bubble. Building a small company just to sell it is not our style.
First of all, the data sales model, and on top of it a finer-grained form: data access through an API. There are two ways to deliver, offline and online. Either the data is packaged up and sold to you, or you are given an API. The API requires authentication and is charged by volume, quality, size and number of calls. This is very intuitive. If I process and package the data and sell it for around three or four million, that is cheaper for the user than spending more than 10 million. This approach looks relatively simple.
Foreign telecom and fixed-network operators have a history of selling data. Companies such as Hitwise buy the data, generate reports, sell them to enterprise and industry users, and then do consulting on top.
We also know Sina Weibo, a fairly large microblog service. It sells access to its full stream of microblog data, which it calls licensed use; this non-advertising revenue is more than 20 million U.S. dollars, not counting Weibo value-added services. Why did I notice this? Because of a small company called Effyis. In 2013 its business was data sales: with the full Weibo data it helped customers find complaints about their products, broken down by what came from Weibo, from Twitter and from other social networks. It had sales of 7.5 million dollars and was bought by a Japanese company for more than 20 million dollars. Weibo itself publishes figures such as monthly active users and the number of posts per month.
Another form is the data marketplace: a trading platform for data, which can be business-to-consumer or consumer-to-consumer. The "data hall" (Datatang) has a consumer-to-consumer model: users upload data and set their own prices. The data on offer includes processed microblog data and information on "Big V" Weibo accounts, in packages of roughly 1 to 10 MB.
Some small companies, such as juhe.cn, offer what we called data access. This includes base station location lookup, which tells you the latitude and longitude of a position, and vehicle lookups, where you give your license plate number and get back your traffic violations. Each query costs roughly 0.1 to 1 cent.
In addition there is the underground data black market, which we have come across. The data sells at very high prices and includes bank data grabbed by hackers; earlier data leaks were related to it.
So the first model is the provision of data sales services, offered in a variety of ways, online and offline.
The second is a bit like a cloud computing supply store: renting out platforms and tools. You do not own them; you use the provider's platform to compute, and when the computation is done you take the results away. Computing and storage, query machines and data marts are rented to customers, who pay on demand. One feature is that the rent is quite transparent: the price is written on the Internet and charged by the hour and by the GB.
The biggest player today is Amazon, with databases and data collection, import and export. Data storage comes in several forms (hot, cold, and real-time online), and there are processing services and a data warehouse (UC2). The prices are clearly marked and easy to find online.
Then there is Google's BigQuery. Google published a paper on Dremel, the system behind it, which basically runs on one or two thousand machines. Data storage costs 2.6 cents per GB per month. Data processing, including streaming and other modes, is charged by volume: whether you run a query or an analysis, the charge is five dollars per TB. I do not understand how that can make money, unless the resource scheduling is very good and the computation runs in idle time.
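To make the pricing concrete, here is a minimal sketch of a monthly bill under the two rates quoted in the talk (2.6 cents per GB per month for storage, 5 dollars per TB processed). These are the 2014 figures as cited by the speaker, not current prices, and the function and constant names are hypothetical.

```python
from decimal import Decimal

# Rates as quoted in the talk (2014); NOT current list prices.
STORAGE_USD_PER_GB_MONTH = Decimal("0.026")
QUERY_USD_PER_TB = Decimal("5")


def monthly_cost(stored_gb, scanned_tb):
    """Estimated monthly charge in U.S. dollars for storage plus queries."""
    storage = STORAGE_USD_PER_GB_MONTH * Decimal(str(stored_gb))
    queries = QUERY_USD_PER_TB * Decimal(str(scanned_tb))
    return storage + queries


# Example: 1 TB (1000 GB) stored, 10 TB scanned by queries in a month.
print(monthly_cost(1000, 10))
```

`Decimal` is used instead of floats so that small per-GB rates do not pick up rounding noise when multiplied out.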
There is also GoodData, with data marts and dashboards. The prices I found from 2010, when it started offering this, were 500 U.S. dollars for 1 GB and 5,000 U.S. dollars for 50 GB, a lot more expensive than simply renting storage and compute.
With DataHero you do not have to supply the data yourself: it fetches your data from anywhere on the web, for example from Google's cloud storage or from Amazon, and visualizes it. This is a data service model; for small data of about 10 MB it costs 49 to 59 dollars a month.
There is also the software cloud. All kinds of software, open source or commercial, are deployed in the cloud, as with Amazon; Tableau and Qlik follow this model. In this second model, the tool store model, rents are fairly transparent and revenue grows through scale effects.
Then there is the free big data service that charges advertising fees. The provider offers free big data services such as search engines, encyclopedias, document libraries, maps, music and all kinds of search, and charges for advertising on the back end: advertisers and advertising agents pay the advertising fees. Around the advertising business it extends to services such as ranking and audience monitoring, all of which are value-added advertising services attached to the big data service. The biggest advertising company is Google, which has a wide variety of search, music, maps and more, and had indexed more than 1 trillion pages by 2010. Google's 2014 Q3 revenue was cited as 2.188 billion U.S. dollars. Behind this there is also the RTB advertising model. Whoever has a lot of traffic can survive.
The Baidu family is similar: all kinds of search, plus a document library, forums and maps; the figure cited is likewise 21.88 U.S. dollars. Charging is mainly per thousand impressions, though rates differ: video and websites are not priced the same. There are also advertising-related businesses such as monitoring ad effectiveness. Once ads run, someone is paying for them, and the payer may say the ads were not worth the money, so someone provides monitoring. This too is big data work, and we have helped with some of it before. China has its own counterparts to Nielsen, and Amazon's Alexa does this as well; the income is not low.
Next is operating a real-time bidding (RTB) advertising platform, which uses user behavior data to build a DSP. There are independent third-party DSP platforms on the advertiser side; a DMP provides user matching, typically based on cookies; and the SSP on the portal side works with the ad trading platform to provide the RTB service. The claim is that this is more accurate than traditional advertising: with RTB the advertiser's ads are matched more precisely, whereas traditional advertising relies on attributes, perhaps supplemented by data from other sites that is not sufficient and not as accurate. The marketing slogan is "I am more accurate." The figure cited for 2016 is 14 billion dollars for the full year.
DoubleClick Ad Exchange and Yahoo are the two largest trading platforms, accounting for 90% of U.S. RTB traffic.
Then there is Alibaba's Tanx. Its distinguishing feature is its user behavior data: what people search for and are interested in. Alibaba is an e-commerce company, so browsing, search and shopping behavior combined in one place is meaningful.
Billion praise acts as a DSP and ADE. The price per thousand impressions is higher than in the earlier models, but advertisers buy fewer impressions and no longer place ads as blindly as before. The volume is cited as 20 trillion; the price is similar, and the promise is a better advertising effect. All kinds of other players with data, including Tencent and Baidu, can do the same: as long as you have user behavior data, including social data, you can build an RTB platform and generate transactions.
The data is obtained by purchase or by cooperating with others, and is then mixed together. The key output is a set of user IDs and the behavior corresponding to each ID. The ID is usually a terminal ID or something similar, such as the ID of an Apple phone. Based on the behavior behind the ID and the current request from that user, the advertiser decides whether to advertise to that user. When many advertisers want to place ads there is a bidding problem, which has to be settled within about 100 milliseconds. Auction advertising is a fairly common model and one that does produce value.
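The bidding step described above can be sketched roughly as follows. The talk only says that bidders decide from the behavior behind a user ID and that the auction must settle within about 100 milliseconds; the second-price clearing rule, the tag-overlap bidding heuristic, and all names below are assumptions for illustration, not a description of any particular exchange.

```python
from dataclasses import dataclass
from typing import List, Optional, Set, Tuple

DEADLINE_MS = 100  # the talk mentions roughly 100 milliseconds


@dataclass
class BidResponse:
    bidder: str
    price_cpm: float   # offered price per thousand impressions
    latency_ms: int    # how long the bidder took to answer


def make_bid(campaign_tags: Set[str], user_tags: Set[str], base_cpm: float) -> float:
    """Hypothetical heuristic: bid more when the user's behavior tags
    overlap more with the campaign's target tags; zero means no bid."""
    return base_cpm * len(campaign_tags & user_tags)


def run_auction(responses: List[BidResponse],
                deadline_ms: int = DEADLINE_MS) -> Optional[Tuple[str, float]]:
    """Drop late or empty bids, pick the highest bidder, and charge the
    second-highest price (ASSUMPTION: second-price rule)."""
    valid = [r for r in responses
             if r.latency_ms <= deadline_ms and r.price_cpm > 0]
    if not valid:
        return None
    ranked = sorted(valid, key=lambda r: r.price_cpm, reverse=True)
    winner = ranked[0]
    clearing = ranked[1].price_cpm if len(ranked) > 1 else winner.price_cpm
    return winner.bidder, clearing


# Behavior tags attached to one (made-up) device ID.
user_tags = {"cars", "travel"}
responses = [
    BidResponse("dsp_a", make_bid({"cars", "travel"}, user_tags, 2.0), 40),
    BidResponse("dsp_b", 6.0, 150),  # highest price, but misses the deadline
    BidResponse("dsp_c", make_bid({"travel", "music"}, user_tags, 3.0), 80),
]
print(run_auction(responses))
```

The deadline filter is the part that matters operationally: a bidder with better data still loses if it cannot answer in time.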
Intelligence analysis is more complicated. What is commonly known as the "human flesh search" has very strong vitality. It relies on data collection and analysis to learn about target objects (individuals, organizations, industries, countries, units of different granularity): their dynamics, what has happened, what their attributes are, what lies behind them, and their credit standing. The data is collected, aggregated and analyzed, and then used for business support. This has been done for decades. Public opinion analysis typically crawls data from microblogs, forums, news portals, communities and other websites, and uses natural language processing and sentiment analysis to find what a person has said and to infer that person's attributes and character. We often call this a user portrait: describing what kind of person someone is and attaching different labels.
Credit services are now more common as well. This used to be the banking sector's business; now behavioral data, location data and consumption data are used to build personal credit files. The premise, of course, is that the data is collected lawfully and the credit information is recorded objectively; value-added services are then layered on top of the information that may be disclosed.
Third, public safety: the discovery and prediction of public security incidents. This area of intelligence analysis is not very transparent, and prices for intelligence vary widely. When the Public Security Bureau offers a reward of 100,000 yuan to catch a fugitive, one message is worth 100,000 yuan. Whether this is legal is a matter worth considering.
The last model is enhancing machine intelligence to assist the human brain, for business innovation or business optimization. The main work is to collect data, analyze and mine it, look for regularities in the data, and gradually use machine intelligence to assist or partly replace people. I have discussed this with colleagues. Inside China Mobile there is a job called spam message processing: deciding whether a message is spam and dealing with it. Spammers avoid writing the word "invoice" plainly; they add a space or insert digits, so a person understands it but a machine does not. To keep the communication environment clean, operators have to identify the sending numbers and shut them down. How do you identify them? The messages are not identical, so you cannot rely on simple counting, but if the human eye sees spam, it has to be stopped. The final approach uses data mining to find similarity. Once similarity is established, if one number keeps sending such messages, five is under the threshold and fine, but over the threshold, and taking into account its previous calling behavior, such as never having made a call, the number is judged to be a problem and is suspended. Without machines this has to be done manually. China Mobile puts a lot of customer service staff on this, and the cost is very high; using machines to identify spam will improve the user experience.
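A minimal sketch of the spam-number idea described above: normalize away the inserted spaces and digits, measure similarity to known spam, count similar messages per sending number, and flag numbers that exceed a threshold and have no call history. The character-bigram Jaccard measure, the 0.8 similarity cutoff and all names are assumptions for illustration; the talk does not say which similarity method is actually used.

```python
from collections import defaultdict


def normalize(text):
    """Keep letters only, so 'in voice 1' and 'invoice' look the same."""
    return "".join(ch for ch in text.lower() if ch.isalpha())


def bigrams(s):
    """Set of adjacent character pairs in a string."""
    return {s[i:i + 2] for i in range(len(s) - 1)}


def similarity(a, b):
    """Jaccard similarity of character bigrams after normalization."""
    ga, gb = bigrams(normalize(a)), bigrams(normalize(b))
    if not ga or not gb:
        return 0.0
    return len(ga & gb) / len(ga | gb)


def flag_senders(messages, known_spam, has_call_history,
                 sim_threshold=0.8, count_threshold=5):
    """Return sending numbers that sent more than `count_threshold`
    spam-like messages and have never made a call."""
    counts = defaultdict(int)
    for sender, text in messages:
        if any(similarity(text, spam) >= sim_threshold for spam in known_spam):
            counts[sender] += 1
    return sorted(
        sender for sender, count in counts.items()
        if count > count_threshold and not has_call_history.get(sender, False)
    )


# Made-up example data: six disguised variants of the same spam text.
variants = [
    "cheap invoice for sale", "cheap in voice for sale",
    "cheap in-voice for sale", "cheap invoice1 for sale",
    "c h e a p invoice for sale", "cheap invo ice for 2 sale",
]
messages = ([("138", v) for v in variants] +
            [("139", v) for v in variants] +
            [("137", "see you at dinner"), ("137", variants[0])])
calls = {"138": False, "139": True, "137": True}
print(flag_senders(messages, ["cheap invoice for sale"], calls))
```

Number 139 sends the same texts but has a normal calling record, so under this rule it is left alone; how much weight call behavior should carry is a policy choice the talk leaves open.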
To give a few more examples: Google's driverless cars have traveled more than 300,000 kilometers without an accident. They monitor their surroundings through cameras and other sensors, much as the Internet of Vehicles provides road recognition; surrounding objects and scenery are captured, recognized and combined with speed and speed-limit assessment so that the car can drive without problems. A Chinese car company is also doing this and says it is better than Google.
There is also Google Brain, which simulates learning. The end result is that it can recognize faces on YouTube. That in itself does not mean much, and it used 16,000 computers; the cost is very high, and it does not make money.
In addition, there is Evernote, which assists thinking. From the notes users record it identifies their way of thinking and recommends relevant content accordingly, including while you are writing, which simplifies note-taking. This is the use of some artificial intelligence combined with big data analysis to make a business better and easier. This is the future of big data, but the benefits are not yet very clear.
To summarize what I have just discussed: first, selling data directly is the most convenient way. Package the data, run it through ETL, and sell the package; that is the simplest. For social networking sites it works well even to release the data without processing it.
Second, the backward mode: advertising is the most successful Internet business model. It was in the past, it is now, and it will be in the future. I do not fully understand why the Internet lives so well on limited advertising, but the advertising model is evidently still very successful.
Third, the indirect mode: earning a return by reducing operating costs and cutting equipment or labor input.
Fourth, the auxiliary mode: I provide technology, platforms and tools, but not the data, a bit like selling shovels to gold miners. Big data services can be DSN or application-level services.
It takes Internet companies a long time to build a big data platform. Google started early and by 2010 reached a milestone with the release of Dremel; Facebook spent three or four years building its big data platform; LinkedIn spent six years building its big data unit.
Looking at the stages, the functions and the work involved, you have to go through several stages: independent applications, architecture adjustment, data integration, and a data platform. Applications start out as silos and are integrated later.
The third part is my thinking on several business models for operators.
First, the question of deciding what role to play. Operators are a source of data collection. Combined with data aggregation, they can provide aggregation services; combined with the ability to serve external customers, they can provide data services; combined with application development capability, they can become data application providers. Data is collected from the pipeline at the bottom, integrated quickly, and then offered as data services. Operators will of course also provide tools, development and consulting services on site.
The first model: as a big data collector, provide DaaS. Everyone wants telecom operators' data. The operator collects, processes and analyzes the data and provides access through an API, one record at a time or in batches. Pricing is based on data size, value and access frequency. The drawbacks: this is selling raw material, the value is not fully mined, and there is a considerable risk of privacy disclosure.
Second, relying on the mobile cloud platform, provide PaaS tool services for users who have data but do not know how to analyze it: host the user's data, provide computing power, and provide data processing tools. Pricing follows the cloud computing model. The drawback is that big data is not used to raise the value of cloud computing, so it sells at cabbage prices. The advantage is that the operator does not need to use its own data, so there is no data risk.
There are too many SaaS models to cover, so I will give one example: building a data management platform to serve Internet advertising. The service targets DSPs: it supplies a large amount of cookie data to support DSP bidding. Pricing is based on data size, value and access frequency. The drawback is a high risk of privacy leakage. The advantages: the cookie library is rich, it can be combined with IDFA, IMEI and other identifier inventories, it can provide accurate location services, and the cost is relatively low.
Finally, an advertisement: you are welcome to visit the China Mobile Suzhou Research and Development Center. Thank you!