The main content comes from Facebook's paper "Practical Lessons from Predicting Clicks on Ads at Facebook".
1. Basic idea
Use GBDT to transform the original features into new ones: each leaf of each tree acts as one feature, and these features are then fed into LR. For example:
(1) Train the GBDT trees: suppose we have M samples with a total of 6,000 tags. We use these samples to train a GBDT with 10 trees and 100 leaves per tree (in practice the number of leaves may differ from tree to tree), which gives 1,000 leaves in total. These 1,000 leaves are used as features.
(2) Convert features with the GBDT trees: take the same M samples and pass each one through the 10 trees trained above. In each tree a sample falls into exactly one leaf; the position corresponding to that leaf is set to 1 and all other positions to 0. This yields the converted features: a 1,000-dimensional vector with exactly 10 ones per sample.
(3) Train LR on the converted features: use the converted features from step (2) as the sample features for training LR, which gives the final model.
(4) Predict: when a new sample needs to be predicted, generate its converted features as in step (2) and feed them into the model trained in step (3) to obtain the final prediction.
The Facebook paper includes a chart illustrating this structure.
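The conversion in step (2) can be sketched in a few lines of plain Python. This is only a minimal illustration, assuming the leaf that a sample reaches in each tree is already known; the function name leaves_to_one_hot and the toy tree sizes are hypothetical and do not come from the paper.

```python
from typing import List


def leaves_to_one_hot(leaf_indices: List[int], leaves_per_tree: List[int]) -> List[int]:
    """Concatenate one one-hot block per tree.

    leaf_indices[t] is the index of the leaf the sample reached in tree t.
    leaves_per_tree[t] is the number of leaves in tree t.
    """
    if len(leaf_indices) != len(leaves_per_tree):
        raise ValueError("need exactly one leaf index per tree")
    features: List[int] = []
    for leaf, n_leaves in zip(leaf_indices, leaves_per_tree):
        if not 0 <= leaf < n_leaves:
            raise ValueError(f"leaf index {leaf} out of range for a tree with {n_leaves} leaves")
        block = [0] * n_leaves  # one position per leaf of this tree
        block[leaf] = 1         # mark the leaf the sample landed in
        features.extend(block)
    return features


# Toy example (assumed sizes): 3 trees with 4, 3 and 2 leaves, so 9 leaf features.
example = leaves_to_one_hot([2, 0, 1], [4, 3, 2])
print(example)  # [0, 0, 1, 0, 1, 0, 0, 0, 1]
```

With 10 trees of 100 leaves each, as in the example above, the same function would return a vector of length 1,000 containing exactly 10 ones.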
2. The problem
(1) Once GBDT is used to extract features, those features should stay fixed. That is, the GBDT trained on the first day is frozen and no longer retrained; it is only used to generate features for incoming samples. But our raw features change over time, and if a new raw feature is added it never reaches the model, because the GBDT trees no longer change. The only way to include it is to retrain the GBDT, but then LR also has to be retrained from scratch on the initial data, because the converted features have changed.
Solution: the original features continue to go through the existing GBDT, while the newly added features enter training directly as features of their own. Then retrain the GBDT at regular intervals and retrain LR from scratch on the newly converted features.
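One possible reading of this solution, sketched below, is that between GBDT retrains the LR input is the frozen GBDT's leaf features with the new raw features appended. This is an assumption about how the two feature groups are combined, and build_lr_input is a hypothetical helper name used only for illustration.

```python
from typing import List, Sequence


def build_lr_input(leaf_one_hot: Sequence[int], new_raw_features: Sequence[float]) -> List[float]:
    """Append the raw values of newly added features to the frozen GBDT leaf features."""
    return [float(v) for v in leaf_one_hot] + [float(v) for v in new_raw_features]


# Hypothetical sample: 5 leaf features from the frozen GBDT plus 2 newly added raw features.
row = build_lr_input([0, 1, 0, 0, 1], [0.5, 3.0])
print(row)  # [0.0, 1.0, 0.0, 0.0, 1.0, 0.5, 3.0]
```

Under this reading the leaf part of the vector keeps its layout until the next scheduled GBDT retrain, at which point the leaf layout changes and LR is trained again from scratch.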