CTR Estimation in Programmatic Ad Trading
Ad click-through rate (CTR) estimation is a core component of the programmatic ad trading framework. CTR estimation is evaluated at two levels:
1. Ranking metric. The ranking metric is the most basic one: it determines whether we can find the most suitable ads to show to the most appropriate users. This is the foundation of monetization; technically, we measure it with AUC.
2. Calibration metrics. Calibration metrics go a step further: they are the basis for further optimizing the bidding process, and DSPs generally care about them. If we systematically underestimate CTR, our bids will be too conservative, so the budget is spent too slowly or not fully; if we systematically overestimate CTR, our bids will be too aggressive, driving CPC too high. Technically, Facebook's NE (normalized entropy) can be used, as can OE (observation over expectation, the ratio of observed clicks to expected clicks).
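The two calibration metrics above can be sketched in a few lines. This is a minimal illustration: NE follows Facebook's definition (average log loss normalized by the entropy of the empirical background CTR), and the toy `clicks`/`preds` data is made up for demonstration.

```python
import math

def normalized_entropy(y_true, y_pred):
    """Facebook's NE: average log loss normalized by the entropy of the
    empirical background CTR.  Lower is better; NE < 1 means the model
    beats a constant prediction of the background CTR."""
    n = len(y_true)
    log_loss = -sum(y * math.log(p) + (1 - y) * math.log(1 - p)
                    for y, p in zip(y_true, y_pred)) / n
    ctr = sum(y_true) / n  # empirical background CTR
    baseline = -(ctr * math.log(ctr) + (1 - ctr) * math.log(1 - ctr))
    return log_loss / baseline

def observation_over_expectation(y_true, y_pred):
    """OE: observed clicks over expected (predicted) clicks.
    1.0 means perfectly calibrated; < 1 means over-prediction."""
    return sum(y_true) / sum(y_pred)

# Toy data: 8 impressions, 3 clicks.
clicks = [1, 0, 0, 1, 0, 0, 0, 1]
preds  = [0.6, 0.2, 0.1, 0.7, 0.3, 0.2, 0.1, 0.5]
print(normalized_entropy(clicks, preds))
print(observation_over_expectation(clicks, preds))
```

Note that predicting the constant background CTR for every impression gives NE exactly 1, which is why values below 1 indicate genuine ranking-plus-calibration skill.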
Framework
Industry mostly uses LR-based CTR estimation. I think one important reason is interpretability: when a bad case occurs, the simpler the model, the easier it is to debug and explain, and the more precisely we can improve on that bad case. Despite this, among the advertising algorithm engineers I have seen, few actually exploit this advantage of LR to drive model improvements, which is a pity.

DNNs are very hot recently; Baidu announced that using a DNN for CTR estimation yields a 20% gain over LR. I do not know what benchmark was used, but mechanically speaking, a DNN beating an LR built on manual feature engineering by 20% would not surprise me at all. However, I do not think a DNN has any clear advantage over an LR that generates high-order features automatically with FM and GBDT. After all, a DNN generates high-order features with linear combinations plus nonlinear functions (tanh/sigmoid, etc.), while GBDT + FM generates them with trees and factorization machines, followed by a nonlinear transformation at the last layer. In terms of domains, DNN-style high-order features may be better suited to perception-like signals (such as vision and audio); in the advertising context, I prefer the GBDT + FM approach.
The overall CTR estimation module also contains exploit/explore logic.
A block diagram of the basic click-through rate estimation pipeline is as follows:
Step-by-Step
1. Data exploration
This step mainly covers coarse screening and normalization of the basic features (raw/fundamental features).
A display ad scenario can be described as "showing an ad to a user in a certain context through a certain medium," so we search for basic features in these four areas:
Context – when and where: time, location, device, browser, etc.
Ad – advertiser features, plus features of the ad itself such as campaign, creative, type, retargeting, etc.
Media – features of the medium (web, app, etc.), features of the ad slot, etc.
User – user profile, browsing history, etc.
There are several ways to select individual features:
1. Simple statistical methods: examine each feature's coverage and value balance. For features dominated by a single value, selectively discard the feature or merge some values into a new one to restore balance.
2. Feature selection metrics. Feature selection serves two purposes. One is removing redundancy, i.e., features that duplicate one another; the other is removing useless features, those that contribute little or nothing to the CTR estimation task. The latter kind of removal should be done conservatively (better to remove too little than too much), because a feature whose individual contribution is small may still form very effective combination features with other features in the later feature-combination stage.
a) Removing redundancy. This mainly uses correlation between features, such as the Pearson correlation, or exponential regression (which, from the angle of Taylor's theorem, can approximate higher-order polynomial relationships between features).
b) Removing useless features. This mainly uses the information gain ratio.
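As a sketch of the two selection metrics above, here is a minimal, illustrative implementation of the Pearson correlation (for redundancy) and the information gain ratio (for usefulness). The function names and toy data are mine, not from any particular library.

```python
import math
from collections import Counter

def pearson(xs, ys):
    """Pearson correlation between two feature columns; values near
    +/-1 indicate redundancy between the features."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def entropy(values):
    """Shannon entropy (base 2) of a list of discrete values."""
    n = len(values)
    return -sum(c / n * math.log2(c / n) for c in Counter(values).values())

def info_gain_ratio(feature, labels):
    """Information gain of `labels` given `feature`, divided by the
    split information (entropy of the feature itself); features with a
    ratio near 0 contribute little to the task on their own."""
    n = len(labels)
    cond = 0.0
    for v, cnt in Counter(feature).items():
        subset = [l for f, l in zip(feature, labels) if f == v]
        cond += cnt / n * entropy(subset)
    gain = entropy(labels) - cond
    split_info = entropy(feature)
    return gain / split_info if split_info > 0 else 0.0

# Toy example: a feature that perfectly separates clicks from non-clicks.
print(info_gain_ratio(["a", "a", "b", "b"], [1, 1, 0, 0]))
```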
2. Feature combination
There are two main families of methods:
FM family – categorical features are generally one-hot encoded, and their combinations are well suited to FM (factorization machines).
Tree family – for numerical and ordinal features, combinations can be generated with decision-tree methods, generally random forests or GBDT. GBDT should work better, because boosting keeps strengthening the model's ability to discriminate misclassified samples.
Ad click-through estimation involves all three feature types at once, so a simple approach is to cascade the two methods to obtain better feature combinations.
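One concrete way to realize the cascade (a sketch under my own assumptions, not the exact production setup) is the GBDT-to-linear-model pipeline: trees generate leaf-index features from numerical inputs, the leaf indices are one-hot encoded as categorical features, and these feed the downstream linear model (an FM could be substituted for the LR used here). The sketch assumes scikit-learn and synthetic data.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import OneHotEncoder

# Synthetic stand-in for (numerical + ordinal) ad features.
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)

# 1) GBDT generates high-order feature combinations: each sample is
#    described by the index of the leaf it falls into in every tree.
gbdt = GradientBoostingClassifier(n_estimators=30, max_depth=3, random_state=0)
gbdt.fit(X, y)
leaves = gbdt.apply(X)[:, :, 0]        # shape: (n_samples, n_trees)

# 2) One-hot encode the leaf indices into sparse categorical features,
#    which then feed the downstream linear model as usual.
enc = OneHotEncoder(handle_unknown="ignore")
leaf_onehot = enc.fit_transform(leaves)

lr = LogisticRegression(max_iter=1000)
lr.fit(leaf_onehot, y)
print("train accuracy:", lr.score(leaf_onehot, y))
```

Each tree path is effectively a learned conjunction of raw-feature thresholds, so the one-hot leaf vector is exactly the kind of high-order combination feature the text describes.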
3. LR
A. OWL-QN
This is a batch training method, mainly used to optimize LR under L1 regularization.
B. Online learning (FTRL and Facebook's enhancements)
Online learning feeds click information back promptly and keeps evolving the LR model, so it converges faster for new ads.
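A minimal sketch of the per-coordinate FTRL-Proximal update for logistic regression, following the pseudocode McMahan et al. published; the hyperparameter values and class/method names here are my own illustrative choices.

```python
import math
from collections import defaultdict

class FTRLProximal:
    """Per-coordinate FTRL-Proximal for logistic regression.
    Input examples are sparse dicts {feature_index: value}."""

    def __init__(self, alpha=0.5, beta=1.0, l1=0.1, l2=1.0):
        self.alpha, self.beta, self.l1, self.l2 = alpha, beta, l1, l2
        self.z = defaultdict(float)   # accumulated adjusted gradients
        self.n = defaultdict(float)   # accumulated squared gradients

    def _weight(self, i):
        z = self.z[i]
        if abs(z) <= self.l1:
            return 0.0                 # L1 keeps weights sparse
        return -(z - math.copysign(self.l1, z)) / (
            (self.beta + math.sqrt(self.n[i])) / self.alpha + self.l2)

    def predict(self, x):
        s = sum(self._weight(i) * v for i, v in x.items())
        return 1.0 / (1.0 + math.exp(-s))

    def update(self, x, y):
        """One online step: predict, then fold the log-loss gradient
        into the per-coordinate z and n accumulators."""
        p = self.predict(x)
        for i, v in x.items():
            g = (p - y) * v
            sigma = (math.sqrt(self.n[i] + g * g)
                     - math.sqrt(self.n[i])) / self.alpha
            self.z[i] += g - sigma * self._weight(i)
            self.n[i] += g * g
        return p

# Stream a toy pattern: feature 0 always clicks, feature 1 never does.
model = FTRLProximal()
for _ in range(200):
    model.update({0: 1.0}, 1)
    model.update({1: 1.0}, 0)
```

Because the model state is just the `z` and `n` accumulators, each click or non-click can be folded in as soon as it is logged, which is what gives the fast convergence for new ads.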
4. Is the predicted CTR credible?
Given any input feature vector, the algorithm will confidently output a predicted CTR. But is this CTR really credible? We know that machine learning is quintessentially data driven, and when some situation is insufficiently represented in the training data, the prediction for that situation is likely to be skewed by other data. So there must be cases where the predicted value is not credible; how do we judge the credibility of the current predicted CTR?
Google's FTRL paper also proposes a method for assessing the credibility of a predicted CTR. The idea is simple: the more training data, the higher the confidence. In the formula, n_i refers to the number of training vectors whose i-th feature is non-zero. There are many ways to normalize the score into [0, 1]; the choice should be finalized based on the total volume of business data and the prior CTR.
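The exact formula and normalization are left open above; the sketch below is one illustrative possibility, where n_i counts training examples with feature i active and per-coordinate uncertainty shrinks like 1/sqrt(n_i), in the spirit of per-coordinate learning rates. The mapping into [0, 1] is my assumption, not Google's.

```python
import math
from collections import defaultdict

class CTRConfidence:
    """Sketch: confidence of a predicted CTR grows with how often the
    active features were seen in training.  The 1/sqrt(n_i) uncertainty
    per coordinate mirrors per-coordinate learning-rate intuition; the
    final [0, 1] normalization is an illustrative assumption."""

    def __init__(self):
        self.counts = defaultdict(int)  # n_i: examples with feature i active

    def observe(self, features):
        """Record one training example's active feature indices."""
        for i in features:
            self.counts[i] += 1

    def confidence(self, features):
        # Total uncertainty: sum of 1/sqrt(n_i) over active features
        # (unseen features contribute maximal uncertainty 1.0), then
        # squashed so more data -> score closer to 1.
        u = sum(1.0 / math.sqrt(self.counts[i]) if self.counts[i] else 1.0
                for i in features)
        return 1.0 / (1.0 + u)

scorer = CTRConfidence()
for _ in range(100):
    scorer.observe([0, 1])
```

A DSP can then, for example, bid cautiously or route the impression to the explore branch whenever the confidence score falls below a threshold.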
5. Refinement
The remaining work builds on the framework above, patching bad cases. For example, the model initially assumes that a feature's weight affects CTR consistently across different CTR ranges, but in practice it does not; you can then partition by CTR range and train a separate model per partition (the MLR model Alibaba uses is of this kind). None of this leaves the framework: it is refinement driven by data analysis, and it never escapes the broad circle of "piecewise approximation."
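A toy sketch of the partition idea (illustrative only, not Alibaba's actual MLR): a base LR's predicted CTR routes each sample into a CTR-range bucket, and a separate LR is trained per bucket, giving a piecewise approximation. The bucket edges, scikit-learn usage, and synthetic data are all my assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in for ad features and click labels.
X, y = make_classification(n_samples=3000, n_features=10, random_state=1)

# Base model whose predicted CTR defines the partition.
base = LogisticRegression(max_iter=1000).fit(X, y)
edges = [0.0, 0.3, 0.7, 1.01]          # assumed CTR-range bucket edges
bucket = np.digitize(base.predict_proba(X)[:, 1], edges) - 1

# One expert LR per bucket (skip buckets lacking both classes).
experts = {}
for b in range(len(edges) - 1):
    mask = bucket == b
    if mask.sum() > 0 and len(set(y[mask])) > 1:
        experts[b] = LogisticRegression(max_iter=1000).fit(X[mask], y[mask])

def predict(x):
    """Route through the base model, then score with that bucket's
    expert LR (falling back to the base model if no expert exists)."""
    row = x.reshape(1, -1)
    p = base.predict_proba(row)[0, 1]
    b = int(np.digitize([p], edges)[0]) - 1
    return experts.get(b, base).predict_proba(row)[0, 1]
```

Each expert only has to fit its own CTR range, which is exactly the "segmented approximation" the text describes.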