Introduction to Learning to Rank


During an internship last year, I came across Learning to Rank (L2R) through project work and found it both interesting and of great practical value. L2R applies machine learning techniques to ranking and has produced new theories and algorithms that not only solve the ranking problem effectively, but also introduce ideas (such as LambdaRank's) novel enough to be borrowed by other fields. Given the central role of ranking in many areas, L2R is widely used in information (document) retrieval, collaborative filtering, and other fields.

This article gives a fairly in-depth introduction to L2R, drawing mainly on the articles by Tie-Yan Liu, Hang Li, and others [1, 2, 3]. We cover the following aspects: why machine learning is needed for ranking, L2R feature selection, obtaining L2R training data, L2R training and testing, classification and overview of L2R algorithms, and L2R performance evaluation.

1. Existing Ranking Models

Ranking has always been a core research problem in Information Retrieval, and many mature methods exist. They can be divided into two types: relevance ranking models and importance ranking models.

1.1 Relevance Ranking Model

A relevance ranking model ranks documents by the similarity between the query and each document. Common models include the Boolean Model, the Vector Space Model, Latent Semantic Analysis, BM25, and LMIR.

1.2 Importance Ranking Model

An importance ranking model does not consider the query at all; it judges a document's authority from the graph structure among web pages (that is, documents). Typical authoritative websites include Google and Yahoo!. Common models include PageRank, HITS, HillTop, and TrustRank.

2. Why Use Machine Learning for Ranking?

A single traditional ranking model can only consider one aspect (relevance or importance), so a single model cannot meet practical requirements. Search engines therefore usually combine multiple ranking models. But how to combine several ranking models into a new one, and how to tune the combination's parameters, is a hard problem.

With machine learning, we can take the outputs of the existing ranking models as features and train a new model, learning its parameters automatically. This makes it easy to combine multiple existing ranking models into a new one.

3. L2R Feature Selection

Unlike text classification, L2R considers the ranking of a document set under a given query. The features used by L2R therefore include not only features of document d itself (such as whether it is spam), but also the relevance between document d and the given query q, and the importance of the document on the whole web (such as its PageRank value). In other words, we can use the outputs of relevance ranking models and importance ranking models as L2R features:

1) Outputs of traditional ranking models, including both a relevance ranking model's output f(q, d) and an importance ranking model's output.

2) Features of the document itself, such as whether it is spam. A minimal sketch of such a feature vector follows this list.
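
As a concrete illustration, here is a minimal sketch of assembling an L2R feature vector for one (query, document) pair. The scoring functions below (bm25_score, pagerank_of, is_spam) are hypothetical stand-ins for real ranking-model outputs, not any particular system's API:

```python
import numpy as np

# Hypothetical stand-ins for real model outputs; a production system would
# call its actual BM25/LMIR scorer and a precomputed PageRank table instead.
def bm25_score(query, doc):
    return float(len(set(query.split()) & set(doc["text"].split())))

def pagerank_of(doc):
    return doc.get("pagerank", 0.0)

def is_spam(doc):
    return doc.get("spam", False)

def l2r_features(query, doc):
    """Feature vector for one (query, document) pair."""
    return np.array([
        bm25_score(query, doc),   # relevance ranking model output f(q, d)
        pagerank_of(doc),         # importance ranking model output (query-independent)
        float(is_spam(doc)),      # feature of the document itself
    ])

doc = {"text": "learning to rank tutorial", "pagerank": 0.42, "spam": False}
print(l2r_features("learning to rank", doc))  # e.g. [3.   0.42 0.  ]
```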

4. Obtaining L2R Training Data

L2R training data can take three forms. For each query: (a) an absolute relevance judgment for each document (very relevant, somewhat relevant, irrelevant, etc.); (b) relative relevance between pairs of documents (document 1 is more relevant than document 2, document 4 is more relevant than document 3, and so on); (c) a list of all documents sorted by relevance (document 1 > document 2 > document 3). The three forms can be converted into one another; see [1] for details.
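
For concreteness, a minimal sketch of the three label formats for a single query, using illustrative document ids:

```python
# Three label formats for one query (document ids are illustrative).
absolute = {"d1": 2, "d2": 1, "d3": 0}    # graded relevance: 2 = very, 1 = somewhat, 0 = not
pairwise = [("d1", "d2"), ("d4", "d3")]   # (more_relevant, less_relevant) pairs
listwise = ["d1", "d2", "d3"]             # full ranking, best first
```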

There are two main ways to obtain training data: manual labeling [3] and mining from log files [4].

Manual labeling: first, some queries are randomly sampled from a search engine's query logs and submitted to several different search engines; then the top K results returned by each engine are collected; finally, human experts label these documents by their relevance to the query.

Mining from logs: search engines keep extensive logs of user behavior, from which L2R training data can be extracted. Joachims proposed an interesting method [4]: given a query, let L be the result list returned by the search engine and C the set of documents the user clicked. If document di was clicked, document dj was not, and dj appears before di in the result list, then di > dj is taken as a training record. That is, the training data is {di > dj | di ∈ C, dj ∈ L − C, p(dj) < p(di)}, where p(d) is the position of document d in the result list (smaller means closer to the top).
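
A minimal sketch of this extraction rule, assuming only that we have the ranked result list and the set of clicked documents:

```python
def pairs_from_clicks(result_list, clicked):
    """Joachims' heuristic [4]: a clicked document is preferred over every
    unclicked document ranked above it.

    result_list: documents in rank order (index 0 = top, i.e. smallest p(d)).
    clicked:     set of documents the user clicked (C).
    Returns preference pairs (di, dj) meaning di should rank above dj."""
    pairs = []
    for pos, di in enumerate(result_list):
        if di not in clicked:
            continue
        # dj in L - C with p(dj) < p(di)
        for dj in result_list[:pos]:
            if dj not in clicked:
                pairs.append((di, dj))
    return pairs

print(pairs_from_clicks(["d1", "d2", "d3", "d4"], clicked={"d3"}))
# [('d3', 'd1'), ('d3', 'd2')]
```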

5. L2R Model Training

L2R is a supervised learning process.

For each query-document pair, we extract the corresponding features (covering both the relevance between the query and the document, and the document's own characteristics and importance). In addition, the true ordering of the document set under the given query is obtained by manual labeling or by mining from logs. We then use an L2R algorithm to learn a ranking model whose output document ordering is as close as possible to the true ordering.

6. L2R Algorithm Classification and Introduction

L2R algorithms fall into three classes: pointwise, pairwise, and listwise.

1). Pointwise L2R

The pointwise approach considers only the absolute relevance of a single document under a given query, ignoring the relevance of the other documents to that query. In other words, given the true document ordering for query q, we only consider the degree of relevance between each individual document di and the query q; each training instance is therefore a single (feature vector, relevance label) pair (xi, yi), where xi is the feature vector of (q, di) and yi its relevance grade.

The pointwise approach mainly includes the following algorithms: Pranking (NIPS 2002), OAP-BPM (ECML 2003), Ranking with Large Margin Principles (NIPS 2002), and Constraint Ordinal Regression (ICML 2005).

The pointwise approach simply applies traditional classification, regression, or ordinal regression to model the relevance of each document under a given query. It ignores properties specific to ranking: the ranking result is defined over the whole document set for a query, while the pointwise approach considers each document's absolute relevance in isolation. Moreover, in ranking, the top few documents have a disproportionate impact on quality, which pointwise methods also fail to account for.
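
As an illustration, here is a minimal pointwise sketch: treat the graded relevance label as a regression target and fit one linear scoring function by least squares over synthetic query-document features. Real pointwise methods (ordinal regression, etc.) are more refined:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))                      # feature vectors x_i = phi(q, d_i)
w_true = np.array([1.5, -0.5, 0.8])
y = X @ w_true + rng.normal(scale=0.1, size=100)   # graded relevance labels y_i

# Learn a scoring function f(x) = w . x by least squares, one document at a time.
w, *_ = np.linalg.lstsq(X, y, rcond=None)

# Rank five documents of one query by predicted relevance (highest first).
print(np.argsort(-(X[:5] @ w)))
```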

2). Pairwise L2R

The pairwise approach considers the relative relevance between pairs of documents under a given query. That is, given the true document ordering for query q, we only consider, for any two documents with different relevance, which one is more relevant: di > dj or di < dj.

Pairwise methods include the following algorithms: Learning to Retrieve Information (SCC 1995), Learning to Order Things (NIPS 1998), Ranking SVM (ICANN 1999), RankBoost (JMLR 2003), LDM (SIGIR 2005), RankNet (ICML 2005), FRank (SIGIR 2007), MHR (SIGIR 2007), Round Robin Ranking (ECML 2003), GBRank (SIGIR 2007), QBRank (NIPS 2007), MPRank (ICML 2007), and IR-SVM (SIGIR 2006).

Compared with the pointwise approach, the pairwise approach ranks documents by considering the relative relevance of document pairs. However, a pairwise loss function defined over pairs of documents can differ greatly from the metrics that actually measure ranking quality, and is sometimes even negatively correlated with them. (Figure omitted: it showed the pairwise loss function and NDCG exhibiting a negative correlation.)

In addition, some pairwise methods account neither for the outsized importance of the top-ranked results nor for the effect that differing document-set sizes across queries have on training. (Some pairwise methods do address these issues; for example, IR-SVM improves Ranking SVM along exactly these lines.)
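
A minimal pairwise sketch in the spirit of Ranking SVM: each preference di > dj becomes the difference vector xi − xj with a positive label, reducing ranking to binary classification. A plain perceptron stands in for the SVM solver, and the data is synthetic:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 3))                   # document feature vectors
true_scores = X @ np.array([1.0, -2.0, 0.5])   # hidden relevance used to generate pairs

# Preference pairs (i, j): document i clearly more relevant than document j.
pairs = [(i, j) for i in range(50) for j in range(50)
         if true_scores[i] > true_scores[j] + 0.5]

w = np.zeros(3)
for _ in range(20):                            # perceptron on difference vectors
    for i, j in pairs:
        if (X[i] - X[j]) @ w <= 0:             # pair mis-ordered by current w: update
            w += X[i] - X[j]

violations = sum((X[i] - X[j]) @ w <= 0 for i, j in pairs)
print(f"mis-ordered pairs after training: {violations}/{len(pairs)}")
```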

3). Listwise L2R

Unlike the pointwise and pairwise approaches, the listwise approach directly considers the overall ordering of the document set under a given query, and directly optimizes the model's output ordering so that it is as close as possible to the true ordering.

Listwise algorithms include LambdaRank (NIPS 2006), AdaRank (SIGIR 2007), SVM-MAP (SIGIR 2007), SoftRank (LR4IR 2007), CCA (SIGIR 2007), RankCosine (IP&M 2007), ListNet (ICML 2007), and ListMLE (ICML 2008).

Compared with the pointwise and pairwise approaches, the listwise approach directly optimizes the ordering of the whole document set under a given query, which addresses the shortcomings described above. LambdaMART (an improvement on RankNet and LambdaRank), a listwise method, delivered the best performance in the Yahoo! Learning to Rank Challenge.
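
As an illustration, here is a minimal listwise sketch in the spirit of ListNet's top-one model: the true labels and the model scores for one query's documents are both mapped to "probability of being ranked first" via softmax, and a linear scorer is trained by gradient descent on their cross entropy. The data is synthetic; a real ListNet sums this loss over all queries:

```python
import numpy as np

def softmax(s):
    e = np.exp(s - s.max())
    return e / e.sum()

rng = np.random.default_rng(2)
X = rng.normal(size=(10, 3))                     # one query's document features
y = rng.integers(0, 3, size=10).astype(float)    # graded relevance labels
p_true = softmax(y)                              # target top-one distribution

w = np.zeros(3)
for _ in range(200):
    p_model = softmax(X @ w)
    w -= 0.1 * X.T @ (p_model - p_true)          # gradient of the cross entropy wrt w

print("predicted order:", np.argsort(-(X @ w)))
print("true order:     ", np.argsort(-y))
```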

7. L2R Performance Evaluation

Since L2R performs ranking via machine learning, it is evaluated with the standard ranking metrics, chiefly the following (minimal implementations appear after the list):

1) WTA (Winner Takes All): for a given query q, WTA(q) = 1 if the top document in the model's result list is relevant, and 0 otherwise.

2) MRR (Mean Reciprocal Rank): for a given query q, if the first relevant document appears at position r(q), the reciprocal rank is 1/r(q); MRR averages this over queries.

3) MAP (Mean Average Precision): for each truly relevant document d, take its position p(d) in the model's ranking and compute the precision of the list up to that position; average these precisions over the relevant documents, then over queries.

4) NDCG (Normalized Discounted Cumulative Gain): a metric that jointly accounts for the model's ranking and the true graded relevance; it is the most commonly used measure of ranking quality. See Wikipedia for details.

5) RC (Rank Correlation): measures the similarity between the predicted and true rankings with a rank correlation coefficient, commonly Kendall's Tau.
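
Minimal implementations of the first four metrics for a single query; here ranked_rels holds the graded relevance of each returned document in the model's rank order (index 0 = top), and relevance > 0 counts as "relevant":

```python
import math

def wta(ranked_rels):
    return 1.0 if ranked_rels[0] > 0 else 0.0

def rr(ranked_rels):                       # reciprocal rank; its mean over queries is MRR
    for pos, rel in enumerate(ranked_rels, start=1):
        if rel > 0:
            return 1.0 / pos
    return 0.0

def ap(ranked_rels):                       # average precision; its mean over queries is MAP
    hits, total = 0, 0.0
    for pos, rel in enumerate(ranked_rels, start=1):
        if rel > 0:
            hits += 1
            total += hits / pos            # precision of the list up to this position
    return total / hits if hits else 0.0

def ndcg(ranked_rels):                     # one common gain/discount variant
    def dcg(rels):
        return sum((2 ** r - 1) / math.log2(pos + 1)
                   for pos, r in enumerate(rels, start=1))
    ideal = dcg(sorted(ranked_rels, reverse=True))
    return dcg(ranked_rels) / ideal if ideal else 0.0

rels = [2, 0, 1, 0]                        # graded relevance in ranked order
print(wta(rels), rr(rels), round(ap(rels), 3), round(ndcg(rels), 3))
```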

 

References:

[1] Tie-Yan Liu. Learning to Rank for Information Retrieval.

[2] Hang Li. Learning to Rank for Information Retrieval and Natural Language Processing.

[3] Hang Li. A Short Introduction to Learning to Rank.

[4] Thorsten Joachims. Optimizing Search Engines Using Clickthrough Data. SIGKDD 2002.

[5] Summary of Learning to Rank.
