Notes on Learning to Rank Algorithms: RankSVM and IR SVM


The previous post (http://www.cnblogs.com/bentuwuying/p/6681943.html) briefly introduced the basic principles of learning to rank and its three common families of methods: pointwise, pairwise, and listwise. This post introduces the pairwise methods widely used in industry practice, starting with the relatively simple RankSVM and IR SVM.

1. RankSVM

The basic idea of RankSVM is to transform the ranking problem into a pairwise classification problem, and then use an SVM classification model to learn and solve it.

1.1 Transforming the ranking problem into a classification problem

Each query-doc pair can be represented by a feature vector x. Given a ranking function f(x), we decide which doc ranks ahead of which by comparing the values of f(x): if f(x_i) > f(x_j), then x_i should be ranked before x_j, and vice versa. This can be expressed by the following formula:

$$x_i \succ x_j \;\Leftrightarrow\; f(x_i) > f(x_j)$$

In theory, f(x) can be any function; for simplicity, we assume it is a linear function: $f(x) = \langle w, x \rangle$.

If the ranking function f(x) is linear, then the ranking problem can be turned into a binary classification problem, for the following reason:

First, for any two feature vectors x_i and x_j, the following equivalence holds under the premise that f(x) is linear:

$$f(x_i) > f(x_j) \;\Leftrightarrow\; \langle w,\ x_i - x_j \rangle > 0$$

We can then treat the difference vector x_i - x_j as the input of a binary classification problem. Specifically, we assign it a label:

$$y = \begin{cases} +1, & x_i \succ x_j \\ -1, & x_j \succ x_i \end{cases}$$
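As a minimal sketch of this transformation (in Python, with made-up feature vectors and hypothetical relevance grades; none of this code is from the original post):

```python
import numpy as np

# A minimal sketch of the pairwise transformation, assuming made-up
# feature vectors and graded relevance judgments (3 = most relevant).
def make_pair(x_i, x_j, rel_i, rel_j):
    """Return (difference vector, binary label), or None for equal grades."""
    if rel_i == rel_j:
        return None                      # no preference, no training pair
    y = 1 if rel_i > rel_j else -1
    return x_i - x_j, y

x_i = np.array([0.8, 0.1, 0.3])          # doc with grade 3
x_j = np.array([0.2, 0.5, 0.1])          # doc with grade 1
print(make_pair(x_i, x_j, 3, 1))         # (array([ 0.6, -0.4,  0.2]), 1)
```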

1.2 Solving the ranking problem with an SVM model

After the ranking problem has been transformed into a classification problem, any common classification model can be used to learn it. Here we choose a linear SVM; as usual, it can be extended to a nonlinear SVM via the kernel trick.

As shown in the left-hand figure below, there are two queries with their retrieved documents, where document relevance is divided into three grades. The weight vector w corresponds to the ranking function, which scores and sorts the query-doc pairs.

The right-hand figure shows how the ranking problem is turned into a classification problem. Within the same group (i.e., under the same query), feature vectors of docs with different relevance grades are combined into new feature vectors: x1 - x2, x1 - x3, x2 - x3. Labels are then reassigned; for example, x1 - x2, x1 - x3, and x2 - x3 are given the positive label of the resulting classification problem. Furthermore, to form a standard classification problem we also need negative samples, so we take the reversed versions of the new positive feature vectors as the corresponding negative samples: x2 - x1, x3 - x1, x3 - x2. Note that when combining feature vectors into new ones, we must never pair two feature vectors that share the same relevance grade in the original ranking problem, nor two feature vectors from different queries.
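The construction can be sketched as follows (a hypothetical build_pairs helper; the data layout, one list of (features, grade) tuples per query, is my assumption):

```python
import numpy as np

# Sketch of training-pair construction under the two constraints above:
# pairs are formed only within the same query and only across different
# relevance grades.
def build_pairs(groups):
    """groups: list of [(x, grade), ...], one list per query."""
    X_diff, y = [], []
    for docs in groups:                       # never pair across queries
        for a in range(len(docs)):
            for b in range(a + 1, len(docs)):
                (x_a, g_a), (x_b, g_b) = docs[a], docs[b]
                if g_a == g_b:
                    continue                  # never pair equal grades
                sign = 1 if g_a > g_b else -1
                X_diff.append(sign * (x_a - x_b))   # positive sample
                y.append(1)
                X_diff.append(-sign * (x_a - x_b))  # reversed, negative
                y.append(-1)
    return np.array(X_diff), np.array(y)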

1.3 The solving process of the SVM model

After the conversion to a classification problem, we can solve it with the standard SVM machinery. First we obtain the following optimization problem:

$$\min_{w,\,\xi}\ \frac{1}{2}\|w\|^2 + C \sum_{(i,j)} \xi_{ij} \qquad \text{s.t.}\quad y_{ij}\,\langle w,\ x_i - x_j \rangle \ge 1 - \xi_{ij},\quad \xi_{ij} \ge 0$$

By substituting the constraints into the objective to eliminate the slack variables, we can further transform it into an unconstrained optimization problem:

$$\min_{w}\ \sum_{(i,j)} \big[\,1 - y_{ij}\,\langle w,\ x_i - x_j \rangle\,\big]_+ \;+\; \lambda \|w\|^2$$

The first term of the sum is the hinge loss, and the second is the regularization term. The primal QP problem is hard to solve directly, and a general-purpose QP solver is slow and laborious, so we convert it to the dual problem, which takes a form that is easier to solve:

$$\max_{\alpha}\ \sum_{(i,j)} \alpha_{ij} \;-\; \frac{1}{2} \sum_{(i,j)} \sum_{(k,l)} \alpha_{ij}\,\alpha_{kl}\,y_{ij}\,y_{kl}\,\langle x_i - x_j,\ x_k - x_l \rangle \qquad \text{s.t.}\quad 0 \le \alpha_{ij} \le C$$

After the final solution, the ranking function can be expressed through the optimal parameters:

$$f(x) = \langle w^*,\, x \rangle = \sum_{(i,j)} \alpha^*_{ij}\,y_{ij}\,\langle x_i - x_j,\ x \rangle$$

In summary, the RankSVM method solves the ranking problem in the following steps: (1) within each query, form difference vectors from docs of different relevance grades and label them; (2) train an SVM classifier on these pairs; (3) use the learned scoring function f(x) to sort the candidate docs of a new query.
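A compact end-to-end sketch of these steps, using scikit-learn's LinearSVC as the off-the-shelf linear SVM, the hypothetical build_pairs helper from above, and synthetic data:

```python
import numpy as np
from sklearn.svm import LinearSVC

# End-to-end RankSVM sketch on synthetic data: two queries, three docs
# each, with grades 3 > 2 > 1 and features loosely correlated with grade.
rng = np.random.default_rng(0)
groups = [[(rng.normal(size=5) + g, g) for g in (3, 2, 1)] for _ in range(2)]

X_diff, y = build_pairs(groups)                # pairwise transformation
# Difference vectors come in sign-symmetric +/- pairs, so no intercept.
clf = LinearSVC(C=1.0, fit_intercept=False).fit(X_diff, y)
w = clf.coef_.ravel()                          # ranking weight vector w

# Rank the docs of the first query by the learned f(x) = <w, x>.
docs = [x for x, _ in groups[0]]
scores = [float(w @ x) for x in docs]
print(np.argsort(scores)[::-1], scores)        # highest score ranks first
```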

2. IR SVM

2.1 Modification of the loss function

The basic idea of RankSVM described above is to transform the ranking problem into a pairwise classification problem and then learn it with an SVM classification model. The learning process therefore optimizes the 0-1 classification loss (in practice replaced by its surrogate, the hinge loss). However, there is a gap between the optimization target of this loss function and the common evaluation metrics of information retrieval, which require not only that the relative order between docs be correct, but care especially about the ordering of the top docs. Researchers have therefore modified the loss function of RankSVM so that the optimization objective agrees better with the common IR evaluation metrics.

First, let us illustrate the problems RankSVM runs into when applied to document ranking, using the examples shown below.

The first problem is that applying RankSVM directly treats doc pairs from different relevance-grade combinations without differentiation. This shows up in two concrete forms:

1) In Example 1, the three pairs 3 vs 1, 3 vs 2, and 2 vs 1 are treated identically by the 0-1 loss function; that is, reversing the order of any one of them increases the loss by the same amount. This is clearly unreasonable, since reversing the order of a 3 vs 1 pair is obviously more serious than reversing a 3 vs 2 pair, and the two need to be distinguished by different weights.

2) In Example 2, ranking-1 swaps the positions of the two docs at positions 1 and 2, while ranking-2 swaps the two docs at positions 3 and 4; the 0-1 loss function treats both cases identically. This is also unreasonable: in IR problems the top docs are especially important, so the error in ranking-1 is more serious than the one in ranking-2, and the two again need different weights to distinguish them.

The second problem is that RankSVM does not differentiate doc pairs coming from different queries, even though the number of docs per query can vary widely. As shown in Example 3, query-4 retrieves many more docs, so during training the doc pairs from query-4 exert far more influence on the model than the single doc pair under query-3, and the resulting model will be biased.

IR SVM solves the above two problems by replacing the 0-1 classification loss with a cost-sensitive one, i.e., by modifying the usual hinge loss. Specifically, it assigns different loss weights to doc pairs from different grade combinations and to doc pairs from different queries (a code sketch follows the two rules below):

1) Doc pairs involving the top docs, i.e., pairs whose docs have higher relevance grades, are given a larger loss weight.

2) Doc pairs under queries that retrieve only a few docs are given a larger loss weight.
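As a sketch of the resulting cost-sensitive objective (names are illustrative; how to set the two weights is described in section 2.2):

```python
import numpy as np

# Sketch of the cost-sensitive hinge loss used by IR SVM. tau holds the
# per-pair grade weights (rule 1) and mu the per-query weights (rule 2).
def ir_svm_loss(w, X_diff, y, tau, mu, lam=0.01):
    """Weighted pairwise hinge loss plus L2 regularization."""
    margins = 1.0 - y * (X_diff @ w)          # one margin per doc pair
    hinge = np.maximum(0.0, margins)          # the [.]_+ of the formula
    return np.sum(tau * mu * hinge) + lam * np.dot(w, w)
```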

2.2 The solving process of IR SVM

The optimization problem of IR SVM can be expressed as follows:

$$\min_{w}\ \sum_{(i,j)} \tau_{k(i,j)}\,\mu_{q(i,j)}\,\big[\,1 - y_{ij}\,\langle w,\ x_i - x_j \rangle\,\big]_+ \;+\; \lambda \|w\|^2$$

where k(i,j) denotes the grade pair and q(i,j) the query that the doc pair (i, j) belongs to.

$\tau_k$ is the loss weight of an instance belonging to the k-th grade pair. There is an empirical method for determining its value: for two docs belonging to that grade pair, randomly swap their rank positions and observe the resulting drop in NDCG@1; the average of all such drops is taken as the loss weight. Intuitively, the larger this loss weight, the greater the impact of this kind of doc pair on the overall evaluation metric, so its importance at training time is correspondingly large. This situation generally corresponds to the top docs, which makes the training result pay particular attention to the ranking of the top positions, and vice versa.
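A simplified, single-query sketch of this empirical procedure (in practice the drops would be averaged over all training queries; the 2^grade - 1 gain inside NDCG@1 is a common convention I am assuming, not stated in the post):

```python
import numpy as np
from itertools import combinations

def ndcg_at_1(grades_in_rank_order):
    """NDCG@1: gain of the top-ranked doc over the best possible top gain."""
    gains = [2**g - 1 for g in grades_in_rank_order]
    ideal = max(gains)
    return gains[0] / ideal if ideal > 0 else 1.0

def estimate_tau(ranked_grades):
    """Average NDCG@1 drop when swapping the docs of each grade pair.

    ranked_grades: grades of one query's docs in the ideal order, e.g.
    [3, 2, 1, 1]. Returns {(higher_grade, lower_grade): average drop}.
    """
    base = ndcg_at_1(ranked_grades)
    drops = {}
    for i, j in combinations(range(len(ranked_grades)), 2):
        g_i, g_j = ranked_grades[i], ranked_grades[j]
        if g_i == g_j:
            continue                        # equal grades form no pair
        swapped = list(ranked_grades)
        swapped[i], swapped[j] = swapped[j], swapped[i]
        key = (max(g_i, g_j), min(g_i, g_j))
        drops.setdefault(key, []).append(base - ndcg_at_1(swapped))
    return {k: float(np.mean(v)) for k, v in drops.items()}

print(estimate_tau([3, 2, 1, 1]))
# pairs involving the top grade get the largest average drop, hence
# the largest tau, exactly the "pay attention to top docs" behavior
```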

$\mu_q$ is the normalization factor of query q. It can be taken as the reciprocal of the number of doc pairs under that query. This is easy to understand: if a query yields only a few doc pairs, their relative importance during RankSVM training would otherwise be low; raising this weight parameter appropriately increases the importance of the doc pairs under that query, so that the model pays a comparable degree of attention to doc pairs from different queries.
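A one-line sketch of the normalizer, assuming each training pair is tagged with the id of the query it came from:

```python
from collections import Counter

# mu_q = 1 / (#pairs from query q), so every query contributes equally
# to the total loss regardless of how many doc pairs it generated.
def query_weights(pair_queries):
    counts = Counter(pair_queries)
    return [1.0 / counts[q] for q in pair_queries]

print(query_weights(["q3", "q4", "q4", "q4", "q4"]))
# [1.0, 0.25, 0.25, 0.25, 0.25]: q3's lone pair weighs as much as all of q4's
```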

In constrained form, the optimization problem of IR SVM is:

$$\min_{w,\,\xi}\ \frac{1}{2}\|w\|^2 + \sum_{(i,j)} C_{ij}\,\xi_{ij} \qquad \text{s.t.}\quad y_{ij}\,\langle w,\ x_i - x_j \rangle \ge 1 - \xi_{ij},\quad \xi_{ij} \ge 0,\quad C_{ij} = C\,\tau_{k(i,j)}\,\mu_{q(i,j)}$$

Similarly, it is converted into the dual problem for solving:

$$\max_{\alpha}\ \sum_{(i,j)} \alpha_{ij} \;-\; \frac{1}{2} \sum_{(i,j)} \sum_{(k,l)} \alpha_{ij}\,\alpha_{kl}\,y_{ij}\,y_{kl}\,\langle x_i - x_j,\ x_k - x_l \rangle \qquad \text{s.t.}\quad 0 \le \alpha_{ij} \le C_{ij}$$

After the final solution, the ranking function is again expressed through the optimal parameters:

$$f(x) = \langle w^*,\, x \rangle = \sum_{(i,j)} \alpha^*_{ij}\,y_{ij}\,\langle x_i - x_j,\ x \rangle$$

In summary, the IR SVM method solves the ranking problem in the following steps: (1) construct labeled doc pairs as in RankSVM; (2) compute the grade-pair weight $\tau_k$ and the query normalizer $\mu_q$ for every pair; (3) solve the cost-sensitive SVM problem above; (4) rank the docs of a new query with the learned f(x).
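Finally, a small subgradient-descent sketch of the weighted primal above, offered as an alternative to solving the dual QP (the optimizer choice and all names are mine, not from the original post):

```python
import numpy as np

# Subgradient descent on the weighted hinge objective of IR SVM.
# tau and mu are per-pair weight arrays as in the earlier sketches.
def train_ir_svm(X_diff, y, tau, mu, lam=0.01, lr=0.1, epochs=200):
    w = np.zeros(X_diff.shape[1])
    c = tau * mu                              # per-pair cost C_ij
    for _ in range(epochs):
        margins = y * (X_diff @ w)
        active = margins < 1.0                # pairs inside the margin
        # subgradient of sum(c * hinge) + lam * ||w||^2
        grad = -(c[active] * y[active]) @ X_diff[active] + 2 * lam * w
        w -= lr * grad
    return w
```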

Copyright Notice:

This article is by bentuwuying, published at http://www.cnblogs.com/bentuwuying. If you reproduce it, please indicate the source; anyone who uses this article for commercial purposes without the author's consent will be held legally responsible.
