1. Preface
Sorting is a core module in many applications, the most direct of which is the search engine. When a user submits a query, the engine recalls many candidate documents and then orders them by their relevance to the query and the user; how these documents are ranked directly determines the user experience of the search engine. Other important applications include online advertising, collaborative filtering, and multimedia retrieval.
LambdaMART is a learning-to-rank algorithm suitable for many ranking scenarios. It is the work of Chris Burges and his colleagues at Microsoft. It has been very popular in recent years, appearing repeatedly in machine learning competitions; the winning teams of the Yahoo! Learning to Rank Challenge used this model [1], and Bing and Facebook are reportedly using it as well.
This article first briefly introduces the components of the LambdaMART model, then covers two related predecessor models, RankNet and LambdaRank, and then focuses on the principles of LambdaMART. Finally, it introduces RankLib, an open-source implementation of LambdaMART, and describes an application of LambdaMART to personalized ranking of search drop-down (query auto-completion) suggestions.
2. Symbol Description
Before diving in, we define the symbols used in this article:
| Symbol | Description |
| --- | --- |
| $q$ | A query request submitted by the user |
| $d_i$ | A document to be ranked |
| $D$ | The set of documents to be ranked that are recalled for one query request |
| $s_i$ | The score of document $d_i$ computed by the model |
| $(i, j)$ | An ordered pair composed of documents $d_i$ and $d_j$ |
| $P$ | The set of all document pairs |
| $d_i \triangleright d_j$ | $d_i$ is ranked before $d_j$ |
| $I$ | The set of pair indices for each document |
| $P_{ij}$ | The predicted probability that $d_i$ ranks before $d_j$ |
| $\bar{P}_{ij}$ | The true probability that $d_i$ ranks before $d_j$: 1 if $d_i$ actually ranks before $d_j$, 0 otherwise |
| $S_{ij}$ | The true order relation between $d_i$ and $d_j$, taking values in $\{0, \pm 1\}$: 0 means the two are equally relevant and either order is fine; 1 means $d_i$ is more relevant than $d_j$; $-1$ means the opposite |
3. LambdaMART
The LambdaMART model can be split into two parts, Lambda and MART, which indicates that the underlying training model is MART (Multiple Additive Regression Trees). If MART looks unfamiliar, its other name will not: GBDT (Gradient Boosting Decision Tree). That's right, MART is GBDT. Lambda is the gradient used when solving MART; its physical meaning is the direction (up or down) and strength with which a document to be ranked should move in the next iteration. Combining MART and Lambda gives the LambdaMART we are about to introduce.
4. Magic Lambda
Why does LambdaMART apply so well to ranking scenarios? This is mainly thanks to the Lambda gradient. As introduced above, Lambda quantifies the direction and strength with which a document should move during the next iteration.
However, Lambda was not born in LambdaMART; it was first proposed in the LambdaRank model, which in turn was an improvement on the RankNet model. Clearly, the relationships among RankNet, LambdaRank, and LambdaMART are no ordinary ones; they form a close-knit circle of friends. Next we analyze the relationships among the three, one by one [2].
5. RankNet
RankNet [3] is a pairwise model. It converts the ranking problem into comparing the order of document pairs $(i, j)$, i.e., comparing the probability that one document ranks before the other. It first computes a score $s_i$ for each document and then models the ranked-order probability of a document pair based on the scores:

$$P_{ij} = P(d_i \triangleright d_j) = \frac{1}{1 + e^{-\sigma(s_i - s_j)}}$$

As you can see, this is in fact the sigmoid function familiar from logistic regression [4]. Since the parameter $\sigma$ only affects the shape of the sigmoid curve and has little impact on the final result, it is simplified to $\sigma = 1$ by default. RankNet proves that if we know the ranked-order probabilities between adjacent documents in a permutation of the documents, the ranked-order probability between any two documents can be derived. Therefore, for a sequence of documents to be ranked, we only need to compute the probabilities between adjacent documents rather than all pairs, which reduces the amount of computation.
RankNet then uses cross entropy [5] as the loss function to measure how well the predicted probability $P_{ij}$ fits the true probability $\bar{P}_{ij}$:

$$C_{ij} = -\bar{P}_{ij} \log P_{ij} - (1 - \bar{P}_{ij}) \log (1 - P_{ij})$$

With $\bar{P}_{ij} = \frac{1}{2}(1 + S_{ij})$, the loss can be rewritten as

$$C_{ij} = \frac{1}{2}(1 - S_{ij})\,\sigma(s_i - s_j) + \log \left( 1 + e^{-\sigma(s_i - s_j)} \right)$$
This loss function has the following features:
1) When two documents with different relevance labels receive the same model score, the loss is still greater than zero, so such pairs are penalized and their positions pushed apart.
2) As the score difference grows, the loss becomes asymptotically linear, which limits the influence of abnormal samples on the model, making it robust.
The ultimate goal of RankNet is to train a scoring function $s = f(x; w)$ that minimizes the total loss of the ranked-order probability estimates over all document pairs:

$$C = \sum_{(i,j) \in I} C_{ij}$$
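To make the loss concrete, here is a minimal Python sketch of the simplified pairwise loss above; the function name and the NumPy usage are illustrative assumptions, not RankNet's original implementation:

    import numpy as np

    def ranknet_pair_loss(s_i, s_j, S_ij, sigma=1.0):
        # Simplified RankNet cross-entropy for one document pair:
        # C = (1/2)(1 - S_ij) * sigma * (s_i - s_j) + log(1 + exp(-sigma * (s_i - s_j)))
        # S_ij in {0, +1, -1}: +1 if d_i is more relevant than d_j, -1 if less, 0 if equal.
        diff = sigma * (s_i - s_j)
        return 0.5 * (1.0 - S_ij) * diff + np.log1p(np.exp(-diff))

    # Two documents with different labels but equal scores still incur a loss (feature 1):
    print(ranknet_pair_loss(0.5, 0.5, S_ij=1))  # log(2) ~ 0.693

Note how the loss grows roughly linearly in the score difference when a pair is badly mis-ordered, which is the robustness property noted above.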
RankNet uses a neural network as the model and optimizes the loss function with gradient descent [6]:

$$w_k \leftarrow w_k - \eta \frac{\partial C}{\partial w_k}$$

where $\eta$ is the learning rate.
Common evaluation metrics for ranking problems include NDCG [7], ERR [8], MAP [9], and MRR [10]. These metrics are non-smooth and discontinuous, so their gradients cannot be computed and gradient descent cannot be applied to them directly. The innovation of RankNet is that it does not optimize these metrics directly; instead it converts the optimization target into a probability-based cross-entropy loss whose gradient can be computed. This approach therefore works with any model that optimizes its objective via gradient descent: RankNet uses a neural network, but other models such as boosting trees can be used as well.
6. LambdaRank
Figure 1. Pairwise error
In Figure 1, each horizontal line represents a document: blue denotes relevant documents and gray denotes irrelevant ones. RankNet computes its cost as the number of pairwise errors: the cost of the ranking on the left is 13. In the ranking on the right, moving the first relevant document down three positions and the second relevant document up five positions reduces the cost to 11. However, evaluation metrics such as NDCG and ERR only care about the ranking of the top-$k$ results, so pushing a top relevant document down during optimization is not what we want. The black arrows in the right part of Figure 1 indicate the direction and strength of RankNet's next round of adjustment, but what we really want is the red arrows, which pay more attention to pushing up relevant documents that already sit near the top. LambdaRank [11] evolved from exactly this idea: Lambda is the red arrow, representing the direction and strength of the next round of optimization, i.e., the gradient.
Inspired by RankNet, LambdaRank factorizes RankNet's gradient with respect to a model weight $w_k$:

$$\frac{\partial C}{\partial w_k} = \sum_{(i,j) \in I} \left( \frac{\partial C_{ij}}{\partial s_i} \frac{\partial s_i}{\partial w_k} + \frac{\partial C_{ij}}{\partial s_j} \frac{\partial s_j}{\partial w_k} \right) = \sum_{(i,j) \in I} \lambda_{ij} \left( \frac{\partial s_i}{\partial w_k} - \frac{\partial s_j}{\partial w_k} \right)$$

where the above formula defines

$$\lambda_{ij} = \frac{\partial C_{ij}}{\partial s_i} = -\frac{\partial C_{ij}}{\partial s_j} = \sigma \left( \frac{1}{2}(1 - S_{ij}) - \frac{1}{1 + e^{\sigma(s_i - s_j)}} \right)$$

For the document pairs in $I$, summing the contributions of all pairs that contain document $d_i$ gives

$$\lambda_i = \sum_{j:(i,j) \in I} \lambda_{ij} - \sum_{j:(j,i) \in I} \lambda_{ij}$$

That is, each document's Lambda, the direction and strength of its movement in the next round, is determined by all the other documents under the same query that carry a different label.
Meanwhile, LambdaRank also introduces an evaluation metric $Z$ (such as NDCG or ERR) into Lambda, treating $|\Delta Z_{ij}|$, the change in the metric caused by swapping the positions of the two documents, as a weighting factor. Experiments show that this improves the model significantly:

$$\lambda_{ij} = \frac{-\sigma}{1 + e^{\sigma(s_i - s_j)}} \left| \Delta Z_{ij} \right|$$
It can be seen that LambdaRank does not solve the ranking problem by first defining a loss function and then deriving its gradient. Instead, it analyzes what physical meaning the gradient of a ranking problem should have and defines the gradient directly; the loss function that LambdaRank implicitly optimizes can be recovered by working backwards:

$$C_{ij} = \log \left( 1 + e^{-\sigma(s_i - s_j)} \right) \left| \Delta Z_{ij} \right|$$
Compared with RankNet, LambdaRank trains faster thanks to the factorization, and because it takes the evaluation metric into account, it optimizes the ranking objective more directly and is more effective.
7. LambdaMART
LambdaRank redefined the gradient and gave it a new physical meaning. Therefore, any model that can be trained with gradient descent can use this gradient; MART is one of them. Combining the Lambda gradient with MART yields the well-known LambdaMART [12].
The principle of MART [13][14] is to solve for the model directly in function space. The resulting model consists of many trees, and each tree fits the gradient of the loss function with respect to the current model's outputs; in LambdaMART, that gradient is Lambda. The LambdaMART algorithm proceeds as follows:
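As an illustration, here is a minimal Python sketch of this training loop. It uses scikit-learn's DecisionTreeRegressor as the regression-tree learner; the simplified delta_ndcg helper, all names, and the default parameters are our own assumptions for exposition, not RankLib's implementation:

    import numpy as np
    from sklearn.tree import DecisionTreeRegressor

    def delta_ndcg(li, lj, ri, rj):
        # |dZ|: change in DCG when the documents at ranks ri and rj swap positions
        # (simplified: not normalized by the ideal DCG).
        gain = lambda l, r: (2.0 ** l - 1.0) / np.log2(2.0 + r)
        return abs(gain(li, ri) + gain(lj, rj) - gain(li, rj) - gain(lj, ri))

    def train_lambdamart(X, labels, qids, n_trees=100, n_leaves=10, lr=0.1):
        scores = np.zeros(len(labels))       # current model scores
        trees = []
        for _ in range(n_trees):
            lambdas = np.zeros(len(labels))  # lambda_i for each document
            weights = np.zeros(len(labels))  # w_i, second-order term for the Newton step
            for q in np.unique(qids):
                idx = np.where(qids == q)[0]
                rank = np.empty(len(idx), dtype=int)
                rank[np.argsort(-scores[idx])] = np.arange(len(idx))
                for a in range(len(idx)):
                    for b in range(len(idx)):
                        i, j = idx[a], idx[b]
                        if labels[i] > labels[j]:    # only pairs with different labels
                            rho = 1.0 / (1.0 + np.exp(scores[i] - scores[j]))
                            dz = delta_ndcg(labels[i], labels[j], rank[a], rank[b])
                            lambdas[i] += rho * dz   # push d_i up
                            lambdas[j] -= rho * dz   # push d_j down
                            weights[i] += rho * (1.0 - rho) * dz
                            weights[j] += rho * (1.0 - rho) * dz
            # Fit a regression tree with n_leaves leaves to the lambdas (MSE splits).
            tree = DecisionTreeRegressor(max_leaf_nodes=n_leaves).fit(X, lambdas)
            # Newton step: overwrite each leaf value with sum(lambda_i) / sum(w_i).
            leaf_of = tree.apply(X)
            for leaf in np.unique(leaf_of):
                m = leaf_of == leaf
                tree.tree_.value[leaf, 0, 0] = lambdas[m].sum() / (weights[m].sum() + 1e-10)
            scores += lr * tree.predict(X)   # shrinkage by the learning rate
            trees.append(tree)
        return trees

This sketch enumerates every pair within each query and is therefore quadratic per query; it is for exposition only, and production implementations are far more optimized.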
As the sketch shows, the framework of LambdaMART is exactly that of MART; the main innovation is that the gradient computed in the middle is Lambda, which is pairwise. The parameters MART needs are the number of trees $M$, the number of leaf nodes per tree $L$, and the learning rate $v$; all three can be tuned on a validation set.
MART supports "warm start": training can continue on top of an already-trained model, which is loaded during initialization. The following describes what each step of LambdaMART does:
1) To train each tree, first traverse all the training data (pairs of documents with different labels under the same query), compute the metric change $|\Delta Z_{ij}|$ caused by swapping each pair's positions and the resulting Lambda, i.e., the $\lambda_i$ of each document, and then compute the derivative $w_i$ of $\lambda_i$ with respect to the score, which the later Newton step uses to solve for the leaf-node values.
2) Fit a regression tree to the $\lambda_i$ from step 1, splitting nodes by mean squared error, producing a regression tree with $L$ leaf nodes.
3) For the regression tree generated in step 2, compute the value of each leaf node by a Newton step: for the set of documents $R_{lm}$ falling into a leaf, the leaf's output value is

$$\gamma_{lm} = \frac{\sum_{d_i \in R_{lm}} \lambda_i}{\sum_{d_i \in R_{lm}} w_i}$$

4) Update the model: add the newly learned regression tree to the existing model, shrunk by the learning rate $v$ (also called the shrinkage coefficient) for regularization.
LambdaMART has many advantages:
1) Directly suited to ranking: it solves the ranking problem directly rather than recasting it as traditional classification or regression.
2) Differentiable loss: by transforming the loss function, non-differentiable IR metrics such as NDCG become optimizable through a differentiable surrogate whose gradient has a real physical meaning; the mathematics is elegant.
3) Incremental learning: since each training run can continue from an existing model, it is suitable for incremental learning.
4) Combined features: because tree models are used, combinations of different features can be learned.
5) Feature selection: being based on MART, it inherits MART's ability to estimate the importance of each feature, which can be used for feature selection (see the snippet after this list).
6) Robust to imbalanced data: because the training objects are pairs of documents with different labels rather than per-document label predictions, it is not sensitive to imbalance between positive and negative samples.
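As an aside on advantage 5, with the sketch above feature importances can be aggregated across the boosted trees using scikit-learn's built-in estimates (again, purely illustrative; assumes the trees returned by train_lambdamart):

    import numpy as np
    # Average per-tree feature importances over the boosted ensemble.
    importances = np.mean([t.feature_importances_ for t in trees], axis=0)
    print(sorted(enumerate(importances), key=lambda kv: -kv[1])[:10])  # top-10 features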
8. RankLib Open-Source Toolkit
RankLib [15] is an open-source learning-to-rank toolkit that implements many learning-to-rank models, including LambdaMART. Its source code roughly follows the algorithm flow described in the previous section.
The data format defined by this toolkit is as follows:

    label qid:$id $feaid:$feavalue ... # description

Each row represents one sample; samples belonging to the same query request share the same qid. The label indicates the relevance of the sample to the query request, and the description identifies the document the sample corresponds to, distinguishing it from other documents.
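For instance, a tiny training file in this format might look like the following; the feature ids and values are made up for illustration:

    2 qid:1 1:0.8 2:0.1 # doc-a
    0 qid:1 1:0.2 2:0.7 # doc-b
    1 qid:2 1:0.5 2:0.4 # doc-c
    0 qid:2 1:0.3 2:0.9 # doc-d

A LambdaMART model can then be trained with a command along these lines (in RankLib, ranker type 6 selects LambdaMART; check your version's documentation for the exact options):

    java -jar RankLib.jar -train train.txt -ranker 6 -metric2t NDCG@10 -save model.txt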
The toolkit is implemented in Java. Its memory usage looks somewhat inefficient, but the overall design is quite good; the Ranker interfaces are well designed and worth studying.
There are also many other open-source LambdaMART implementations; for more information, see [16][17][18].
9. LambdaMART Application
Finally, we introduce an application of LambdaMART in a practical scenario. Many search engines today provide a drop-down suggestion feature, known academically as QAC (query auto-completion): as the user types in the search box, the engine outputs a list of queries matching the typed prefix for the user to pick from, reducing input effort and making search more convenient.
Milad Shokouhi [19] found that query popularity differs markedly across user groups. For example, when typing "i", young female users tend to search for Instagram while male users tend to search for IMDB, so the queries in the drop-down list can be ranked in a personalized way.
Shokouhi used LambdaMART as the personalized ranking model, with features such as the user's long-term history, short-term history, gender, age, region, and the original rank of the suggested query. The final result improved by 9%, a very noticeable gain.
Shokouhi's work shows that LambdaMART can be applied to personalized ranking with very good results.
10. Summary
Based on related books, papers, and open-source code, this article has briefly sorted out the ins and outs of LambdaMART. To summarize: Lambda was born in RankNet, grew up in LambdaRank, and flourished in LambdaMART. Both the model's mathematical derivation and its practical results are beautiful, and it can be applied to almost any scenario that involves ranking, making it a jack-of-all-trades for ranking problems.
References:
[1] Learning to Rank Using an Ensemble of Lambda-Gradient Models
[2] From RankNet to LambdaRank to LambdaMART: An Overview
[3] Learning to Rank Using Gradient Descent
[4] Wikipedia: Sigmoid function
[5] Wikipedia: Cross entropy
[6] Wikipedia: Gradient descent
[7] Wikipedia: NDCG
[8] Expected Reciprocal Rank for Graded Relevance
[9] Wikipedia: MAP
[10] Wikipedia: MRR
[11] Learning to Rank with Nonsmooth Cost Functions
[12] Adapting Boosting for Information Retrieval Measures
[13] Greedy Function Approximation: A Gradient Boosting Machine
[14] The Elements of Statistical Learning
[15] RankLib
[16] jforests
[17] XGBoost
[18] GBM
[19] Learning to Personalize Query Auto-Completion
Reprinted from: http://blog.csdn.net/huagong_adu/article/details/40710305 (original title: "Learning to Rank: LambdaMART's Past and Present"). Please credit the source when reproducing.