Thank you to Gross, He and Chen, who have just explained the concept of cloud computing and its future direction from their different perspectives. Now I would like to share my own view of cloud computing with you.
Besides the three service models of IaaS, PaaS and SaaS, cloud computing is also divided into public cloud and private cloud. Among domestic enterprises, private cloud is the more common application: they move from the data center to a private cloud, and then from the private cloud to a public cloud, such as those of Intel, Million Network and others. That progression is inseparable from the continuous improvement and innovation of the basic service providers.
What I want to share with you is why a company like ours needs a cloud computing platform, and what process we go through in converting from a data center to a private cloud, and from a private cloud to a public cloud. Those details are the subject of this talk.
First let me introduce our company. Leisurely Exchange is a relatively leading audience network and digital marketing company in China. That term is rather specialized; in plain words, we are a company that "sells people". In Chinese internet advertising, most companies "sell media": they place ads according to the properties of the media. What we want to do is completely different. We want to target what each person is interested in: what kinds of ads, what content, and what behavior they show on the Internet. For us, that is the core value. The company was founded in 2007 and now has roughly 500 people.
I will cover three levels. The first is our concept of user behavior targeting and, following from it, why a company like ours needs cloud computing and a large computing platform. The second is the current state of our private cloud. The third is the data we have accumulated and the applications built on it.
As you know, targeting has been talked about in internet advertising for many years. User behavior targeting was first put forward by Yahoo, and Yahoo's sales people heard about it very early. It is an old concept, born at about the same time as the internet itself. Why did no one do it until recent years, and why has it developed so rapidly now? The answer is inseparable from large-scale computing platforms.
In traditional internet advertising, we differentiate by media. We can target by geography, by time, by client environment, by site and other basic data, including simple keyword targeting. But none of these targets people. We may target Beijing, but Beijing may have more than 17 million netizens, and we do not know what each of them is interested in. Meanwhile, market demand is changing: with both brand advertisers and performance advertisers, internet ad spending is growing rapidly, and the standards for data and for audience selection are higher. So we need more precise means of targeting.
Then there is the trend toward audience-targeted advertising. If we want to reach a large number of people, we need to classify the population before we place the ads, and the scale of that classification is actually frightening. According to current statistics, the number of active netizens in China is roughly 450 million, and is likely to be larger. These netizens produce huge amounts of data on the internet every day: what web pages they look at, what products they buy, what they consume, what games they play, what clients they use, and even how they behave toward advertising. This volume of data was unthinkable under earlier computing platforms and business models. The targeting we want to do is based on this data, which is why we need very large storage and a platform that can compute over a huge amount of data.
The characteristic of user behavior targeting is that it analyzes the behavior of every Internet audience member, judges their interests and psychological expectations, and places advertising accordingly. In addition to massive data storage and analysis, we also require real-time service. Every time a user views a page and that page wants to show an ad, the system must extract that user's attributes from a back-end library of a billion records in real time and match them against the ads. This has to be truly real-time and supported by a large-scale platform. That is the second requirement we place on the computing platform.
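The lookup-and-match step just described can be sketched roughly as follows. This is only an illustration under stated assumptions: the in-memory dictionaries stand in for the billion-record attribute library, and the names `USER_ATTRIBUTES`, `ADS`, `pick_ad` and the 0.5 threshold are hypothetical, not taken from the talk.

```python
# Hypothetical stand-in for the user attribute library, keyed by cookie ID.
# A production system would query a distributed store instead of a dict.
USER_ATTRIBUTES = {
    "cookie-001": {"autos": 0.8, "travel": 0.3},
    "cookie-002": {"cosmetics": 0.9},
}

# Hypothetical ad inventory: each ad names the interest category it targets.
# A category of None marks an untargeted fallback ad.
ADS = [
    {"id": "ad-car", "category": "autos"},
    {"id": "ad-lipstick", "category": "cosmetics"},
    {"id": "ad-generic", "category": None},
]


def pick_ad(cookie_id, min_score=0.5):
    """Return the ID of the ad whose category best matches the user's interests."""
    interests = USER_ATTRIBUTES.get(cookie_id, {})
    best_ad, best_score = None, min_score
    for ad in ADS:
        score = interests.get(ad["category"], 0.0)
        if score >= best_score:
            best_ad, best_score = ad, score
    if best_ad is None:
        # Unknown user or no strong interest: fall back to the untargeted ad.
        best_ad = next(ad for ad in ADS if ad["category"] is None)
    return best_ad["id"]
```

The point of the sketch is that the per-request work is a single keyed lookup plus a small scan over candidate ads; the heavy analysis happens offline.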
User behavior targeting has different advantages for different parties. For advertisers, ads are delivered to people who are really interested. For agencies, it reduces ineffective exposure and saves the agency's media resources. For the media, it raises the value of the inventory, because targeted advertising sells at a higher price. It also improves the experience of the media's users, who always see ads they are interested in rather than ads that are irrelevant to their interests or even objectionable. That is a good thing for those users as well.
For the audience, advertising is generally something forced on them, so receiving ads whose content they like feels better than receiving ads they find objectionable. This is our multi-dimensional targeting concept, which I will introduce briefly.
First, we determine the interest keywords from each user's browsing, their buying behavior on e-commerce sites, and their interactions with ads. This data is modeled to form each user's interests. We divide users into 22 major categories and 230 subcategories for analysis. Each user's level of interest in each attribute is calculated from a series of surrounding data, and there is a very complex mathematical model behind it.
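To make the idea of combining these three signals concrete, here is a deliberately simple sketch. The talk says only that the real model is complex; the weighted sum below, the `WEIGHTS` values and the function name `interest_scores` are assumptions made purely for illustration.

```python
from collections import defaultdict

# Assumed weights: a purchase is treated as stronger evidence than an ad click,
# which in turn is stronger than a page view. These values are illustrative only.
WEIGHTS = {"browse": 1.0, "ad_click": 2.0, "purchase": 3.0}


def interest_scores(events):
    """Turn (signal_kind, category) events into normalized per-category scores."""
    raw = defaultdict(float)
    for kind, category in events:
        raw[category] += WEIGHTS[kind]
    total = sum(raw.values())
    if not total:
        return {}
    # Normalize so that one user's scores sum to 1.
    return {category: value / total for category, value in raw.items()}
```

For example, two page views in one category and one purchase in another give the purchase category the higher score, because of the assumed weights.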
These points show that what we actually have to deal with is a very wide range of information collected about each of the 460 million netizens on the Internet. We take in terabytes of data every day and analyze it to compute and categorize users. That is not something a small group of computers, or an application in a single computing center, can satisfy, so we built a private cloud. In fact, user behavior targeting presents technical difficulties in several respects.
The first is the collection and analysis of user behavior data. We cooperate with a lot of media, including the four major portals and the top 20 sites in each vertical industry, so we collect very rich media traffic. Each traffic source tells us, for every cookie, what kind of web pages it browsed, and that is our data base. The second is storing a huge amount of data. The first problem to solve is storage, and it is beyond what a single server, a disk array, or a conventional storage mode can handle: one month of our data may be more than 20 TB. The third is that once we have the data and have solved the storage problem, we still have to analyze, correct, model and classify it and validate the results. That requires a very large computing platform that can process data in near real time and can change its mode of calculation according to our needs.
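A back-of-the-envelope calculation shows why 20 TB a month outgrows a single server quickly. Only the 20 TB figure comes from the talk; the three-way replication (the HDFS default) and the 30-day month are assumptions.

```python
MONTHLY_LOGICAL_TB = 20   # figure quoted in the talk
REPLICATION_FACTOR = 3    # assumed: the HDFS default replication factor
DAYS_PER_MONTH = 30       # assumed, for a rough daily figure


def raw_capacity_tb(months, monthly_tb=MONTHLY_LOGICAL_TB,
                    replication=REPLICATION_FACTOR):
    """Raw disk needed to retain `months` of data with replicated storage."""
    return months * monthly_tb * replication


# Average logical ingest per day, in TB.
daily_tb = MONTHLY_LOGICAL_TB / DAYS_PER_MONTH
```

Under these assumptions, keeping one year of data means provisioning `raw_capacity_tb(12)` terabytes of raw disk, before any headroom for intermediate computation.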
The fourth comes after the computing and storage problems are resolved: we have to verify the computed data in applications. This requires us to work with the business units and the business model and to judge how the data performs in real advertising. It involves the computing capacity, mentioned a moment ago, to serve advertising applications in real time.
For these four difficulties, I will introduce our solutions one by one. The first concerns how we collect user data. First, we collect each user's browsing behavior on different pages. Second, we collect the user's interactions with ads: because we serve rich media ads, we can see which ads a user plays to completion, which they replay, and which they click repeatedly, and we use this interactive behavior to judge interest. Third, there is e-commerce behavior. We have a lot of e-commerce partners, and they tell us what kind of purchases a user makes on their sites, what products they buy, and what their spending power is. These three kinds of data form the main body of our user behavior data. Our user behavior analysis module turns them into our user attribute library, which is our most valuable asset, and a real-time targeting API exposes it to the ad serving system.
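As a rough illustration of how the three feeds might be brought together per cookie before analysis, consider the sketch below. The record layouts and the name `build_profiles` are hypothetical; the talk does not describe the actual schema.

```python
from collections import defaultdict


def build_profiles(page_views, ad_events, purchases):
    """Group three event feeds by cookie ID.

    Each feed is an iterable of (cookie_id, payload) pairs; the payload shapes
    are placeholders chosen for this example.
    """
    profiles = defaultdict(lambda: {"pages": [], "ads": [], "purchases": []})
    for cookie_id, url in page_views:
        profiles[cookie_id]["pages"].append(url)
    for cookie_id, interaction in ad_events:
        profiles[cookie_id]["ads"].append(interaction)
    for cookie_id, product in purchases:
        profiles[cookie_id]["purchases"].append(product)
    return dict(profiles)
```

A cookie that appears in only one feed still gets a profile, with empty lists for the other two sources.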
Within this system we also have a basic supporting system, our search system. Because our analysis of browsing behavior is based on the text of the pages, obtaining that data and computing on it requires a search system running in the background, something quite similar to the search engines of Baidu or Google. It too has very high real-time requirements, and needs massive storage and a platform for computing over massive data.
This is a schematic of our search system. We crawl all the content of our partner media and parse out the main body area of each page to form the analysis results. There is also our interaction area, and the users' browsing behavior is combined with the search database for data analysis.
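The step of parsing a page's main body and deriving keywords from it can be sketched with Python's standard library alone. This is a toy under clear assumptions: real main-body detection is far more involved, the list of skipped tags is a guess, and the regular expression handles only ASCII words, whereas Chinese pages would need a word segmenter.

```python
import re
from collections import Counter
from html.parser import HTMLParser


class BodyTextExtractor(HTMLParser):
    """Collect text while ignoring tags that rarely hold article content."""

    SKIP = {"script", "style", "nav", "header", "footer"}  # assumed heuristic

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.chunks.append(data)


def top_keywords(html, n=3):
    """Return the n most frequent words of four or more letters in the body text."""
    parser = BodyTextExtractor()
    parser.feed(html)
    parser.close()
    words = re.findall(r"[a-z]{4,}", " ".join(parser.chunks).lower())
    return [word for word, _ in Counter(words).most_common(n)]
```

Keywords obtained this way for each crawled page could then be joined with the browsing records to attribute interests to a cookie.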
At present we have indexed more than 2 billion pages. Next I will introduce the user data. To solve the problems above, we built the following architecture. First we built our own distributed storage system, which is HDFS. Google, Baidu and other companies with large private clouds, including Taobao, build their own storage systems. We use HDFS, which is open source; Yahoo and Facebook use the same thing. On top of HDFS we built a MapReduce platform. These are low-level technical details, so I will introduce them only briefly. The idea of MapReduce is to distribute one computing task across different computers, so that the whole cluster serves a single task. This is the foundation of private cloud computing, or its most basic technical method. This is an example diagram: we distribute different tasks to different machines, each machine computes its part, and a scheduling task then gathers the partial results and produces the final result.
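The map, shuffle and reduce steps can be imitated in a single process to show the flow of data. This is a teaching sketch, not the Hadoop API: the function names are hypothetical, and the loop over `splits` only stands in for work that a real cluster would run on separate machines.

```python
from collections import defaultdict


def map_phase(record):
    """Emit (category, 1) for one (cookie_id, category) page-view record."""
    _cookie_id, category = record
    yield (category, 1)


def shuffle(pairs):
    """Group all emitted values by key, as the framework does between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all values for one key into a single total."""
    return key, sum(values)


def run_job(splits):
    """Run the map phase over every input split, then shuffle and reduce."""
    mapped = []
    for split in splits:  # on a cluster, each split is handled by a different machine
        for record in split:
            mapped.extend(map_phase(record))
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())
```

Because each split is mapped independently and each key is reduced independently, both phases can be spread over as many machines as are available.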
Our own MapReduce computation is divided by business type, including advertising business data analysis, ad optimization data analysis, and the mining of user behavior. We have gone through hundreds of changes. What we did first was a data center. Later we found that the traditional data center has many drawbacks, such as unbalanced use of resources, poor virtualization, and less than ideal scheduling and expansion of applications, so we gradually converted to a private cloud structure.
On this platform we also did some selection work, including choosing a NoSQL database. We have done a lot of follow-up research on NoSQL, as well as on HDFS and other underlying open-source cloud support. This is the architecture diagram of one of our systems, and it is a very complex pattern of application computing. This year we upgraded our architecture to be more like a cloud computing platform. We use MySQL, Memcached and so on, and have brought in some new technologies, so that the structure of the cloud computing platform is more complete.
Next I will introduce how we model audience behavior. We model each user along four dimensions: the keywords of recently browsed pages, the user's keywords accumulated over a long period, the industries the user is interested in, and the product categories the user is interested in. Together with ad interactions and specific behavior on e-commerce sites, we model through attribute analysis and divide users into 22 major categories and 230 subcategories, which provide the data basis for ad serving and ad analysis services. This is the complete process of building a data portrait. On top of this we have a specific basic analysis support system, including the SNM analysis system and the classification algorithm for users' interests.
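A two-level classification of this kind implies rolling subcategory scores up into their parent categories. The sketch below shows that roll-up on a toy taxonomy; the category names and the `TAXONOMY` mapping are invented for illustration and are not the company's actual 22 categories and 230 subcategories.

```python
# Hypothetical mapping from subcategory to major category.
TAXONOMY = {
    "sedans": "autos",
    "suvs": "autos",
    "flights": "travel",
}


def roll_up(subclass_scores):
    """Sum subcategory interest scores into their parent categories."""
    totals = {}
    for subclass, score in subclass_scores.items():
        category = TAXONOMY[subclass]
        totals[category] = totals.get(category, 0.0) + score
    return totals
```

An advertiser could then target either the broad category or a specific subcategory, depending on how narrow the campaign is.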
We are also conducting demographic analysis under the existing models, and we will keep adding different tags to each user, which requires very scalable computing, or a scalable pattern of application services. Our private cloud architecture now provides targeting services for our business, serving over 400 advertisers. Click-through, the figure that most directly reflects ad effectiveness, has improved by 50% to 150%, and by as much as 280% at best. This is how our accumulated data is being applied. As of the end of June, we count the active fixed samples usable for user behavior targeting. An active fixed sample is a user we can reach 10 times within one month and for whom we have captured more than 10 behaviors. We have stored 260 million of them. In other words, for those of you sitting here, there is perhaps a 50% chance that your computer is in our database, and that we know what you are interested in and what kinds of pages you have seen.
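The definition of an active fixed sample can be written as a small predicate. The thresholds follow the wording of the talk as closely as it allows (reached 10 times in a month, more than 10 behaviors captured); the function and parameter names are hypothetical.

```python
def is_active_sample(reaches_in_month, behaviours_captured,
                     min_reaches=10, min_behaviours=10):
    """True if a cookie was reached at least `min_reaches` times in the month
    and more than `min_behaviours` behaviors were captured for it."""
    return reaches_in_month >= min_reaches and behaviours_captured > min_behaviours


def count_active(samples):
    """Count active samples in an iterable of (reaches, behaviours) pairs."""
    return sum(1 for reaches, behaviours in samples
               if is_active_sample(reaches, behaviours))
```

Filtering this way keeps only cookies with enough recent history for their interest attributes to be meaningful.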
The user access records and other user data entries we have collected now exceed 20.6 billion. Given these two figures, this is not a problem a simple computing model can solve. It has to rely on cloud computing.
Based on this data, we provide audience group attribute analysis reports for all users: what they are interested in, which media they prefer, when they are active, where they are, and so on. We are the only company in the industry that can produce this report.
That concludes this part, which was a very simple introduction to how we built our private cloud. I hope to work with the major IDC vendors and cloud solution providers, because our next plan is to convert to the public cloud, and we hope to have everyone's help and attention in completing the whole process. Thank you!
(Speaker: Zhaozheng, CTO of Beijing Leisurely Exchange Network advertising company)
(Responsible editor: admin)