Learning to Rank (LTR) Summary

Source: Internet
Author: User
Learning to Rank (LTR)
Learning to rank (LTR) is a supervised learning approach to ranking. It is widely used in text mining: ranking the documents returned in information retrieval (IR), the candidate products and users in recommender systems, the candidate translations in machine translation, and so on. The traditional approach to ranking in IR is to construct a relevance function and sort by the relevance score. Many factors affect relevance, such as term frequency (TF), inverse document frequency (IDF), and document length (DL). A number of classic models do this, such as the vector space model (VSM), the Boolean model, and probabilistic models. Traditional ranking methods find it hard to combine many kinds of information: a vector space model that uses TF*IDF as the weight in its relevance function, for example, cannot easily use other signals. If the model has many parameters, tuning them becomes very difficult and overfitting is likely. It was therefore natural to turn to machine learning, which led to learning to rank. Machine learning methods combine many features easily and rest on a mature theoretical foundation: parameters are optimized iteratively, and there is established theory for handling sparsity, overfitting, and related problems.


The learning-to-rank system framework is shown in Figure 2.1:

Figure 2.1 Learning-to-rank system framework


Given an annotated training set, we choose an LTR method, define a loss function, and minimize that loss as the optimization objective to obtain the parameters of the ranking model. This is the learning process. In the prediction process, the candidates to be ranked are fed into the learned ranking model, which outputs a relevance score for each one; sorting by these scores gives the final order.


LTR methods generally fall into three types: the single-document approach (pointwise), the document-pair approach (pairwise), and the document-list approach (listwise).


1 Pointwise
Pointwise methods process one document at a time. After the document is converted into a feature vector, the ranking problem is turned into an ordinary classification or regression problem in machine learning. Take multi-class classification as an example. Table 2-1 shows part of a manually annotated training set. Each document has three features: the BM25 similarity between query and document, the cosine similarity between query and document, and the PageRank value of the page. The relevance between the query and document Di is graded on several levels, so the label takes one of 5 values: {Perfect, Excellent, Good, Fair, Bad}. This yields 5 labeled training instances, which can then be learned with any multi-class classification algorithm, such as maximum entropy or support vector machines.





Table 2-1 Manually annotated training instances

Document ID   Query                BM25 similarity   Cosine similarity   PageRank value   Label
1             Apple Cook           0.15              0.13                5                Bad
2             Apple IPad           0.32              0.24                7                Good
3             Microsoft products   0.22              0.19                2                Good
4             Google Mobile        0.55              0.53                3                Perfect
5             Baidu Strategy       0.35              0.28                1                Excellent


Once the model parameters have been learned, the model can be used to judge relevance: for a new query and document, the model's scoring function gives the relevance score.
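The learn-then-score loop can be sketched in a few lines of Python. This is a minimal illustration, not the method of any particular library: it swaps the multi-class classifier for a plain linear regression (also a pointwise formulation), and the numeric grade encoding, learning rate, and epoch count are assumptions chosen for the example. The training rows of Table 2-1 double as the candidates to score, purely for illustration; in practice candidates are ranked within a single query.

```python
# Pointwise sketch: each (query, document) row is an independent regression example.
# The grade encoding below is an assumption, not part of the original table.
GRADE = {"Bad": 0, "Fair": 1, "Good": 2, "Excellent": 3, "Perfect": 4}

# Rows of Table 2-1: (doc_id, [BM25, cosine, PageRank], label)
TRAIN = [
    (1, [0.15, 0.13, 5.0], "Bad"),
    (2, [0.32, 0.24, 7.0], "Good"),
    (3, [0.22, 0.19, 2.0], "Good"),
    (4, [0.55, 0.53, 3.0], "Perfect"),
    (5, [0.35, 0.28, 1.0], "Excellent"),
]

def score(weights, bias, features):
    """Linear scoring function f(x) = w . x + b."""
    return sum(w * x for w, x in zip(weights, features)) + bias

def mse(weights, bias, rows):
    """Mean squared error between predicted scores and numeric grades."""
    return sum((score(weights, bias, x) - GRADE[y]) ** 2 for _, x, y in rows) / len(rows)

def fit(rows, lr=0.01, epochs=2000):
    """Minimize the squared loss with batch gradient descent (lr and epochs are assumptions)."""
    n_features = len(rows[0][1])
    weights, bias = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for _, x, y in rows:
            err = score(weights, bias, x) - GRADE[y]
            for j in range(n_features):
                grad_w[j] += 2 * err * x[j] / len(rows)
            grad_b += 2 * err / len(rows)
        weights = [w - lr * g for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b
    return weights, bias

weights, bias = fit(TRAIN)
initial_loss = mse([0.0, 0.0, 0.0], 0.0, TRAIN)
final_loss = mse(weights, bias, TRAIN)

# Prediction: score every candidate, then sort by score, highest first.
ranking = sorted(TRAIN, key=lambda row: score(weights, bias, row[1]), reverse=True)
ranked_ids = [doc_id for doc_id, _, _ in ranking]
```

Note that each row contributes to the loss on its own; nothing in the objective looks at the order of two documents relative to each other.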


Pointwise works entirely from the classification of single documents and does not consider the relative order of documents. It also assumes that relevance is query-independent: any (query, Di) pairs with the same relevance are assigned the same level and belong to the same class. In fact, relevance is relative to the query. A common query has many relevant documents, so the label given to one of its lower-ranked documents may still be higher than the label given to the few highly relevant documents of a rare query. This makes the training samples inconsistent. In addition, documents predicted to be at the same label level cannot be ordered relative to one another.


Commonly used pointwise methods include McRank.


2 Pairwise
Pairwise is currently a popular approach. Compared with pointwise, it focuses on the order relation between documents. It reduces the ranking problem to a binary classification problem, for which many machine learning methods are available, such as boosting, SVM, and neural networks. Within the set of documents relevant to the same query, any two documents with different labels yield a training instance (Di, Dj): if Di is more relevant than Dj the instance is labeled +1, otherwise -1. This gives the training samples needed for the binary classifier, as shown in Figure 2.2. At test time, classifying all pairs yields a partial order over all the documents, which produces the ranking.




Figure 2.2 Schematic of the pairwise ranking method
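The pair construction described above can be sketched as follows. The document IDs and numeric grades are hypothetical, and skipping equal-grade pairs follows the rule that only documents with different labels form a training instance.

```python
from itertools import combinations

def make_pairs(graded_docs):
    """Turn one query's graded documents into (d_i, d_j, label) instances.

    label is +1 if d_i is more relevant than d_j, and -1 otherwise.
    Pairs with equal grades carry no preference and are skipped.
    """
    pairs = []
    for (doc_i, grade_i), (doc_j, grade_j) in combinations(graded_docs, 2):
        if grade_i == grade_j:
            continue
        pairs.append((doc_i, doc_j, 1 if grade_i > grade_j else -1))
    return pairs

# Hypothetical documents for a single query, with numeric relevance grades.
docs_for_query = [("d1", 2), ("d2", 0), ("d3", 1), ("d4", 2)]
pairs = make_pairs(docs_for_query)
```

Pairs are built per query: documents retrieved for different queries are never compared with each other. Any binary classifier can then be trained on the resulting instances.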


Although pairwise improves on pointwise, the approach has obvious problems:


A. Only the relative order of two documents is considered, not where they appear in the search results list. Documents near the top of the list matter more: a misjudgment at the top should cost significantly more than a misjudgment further down. Positional factors therefore need to be introduced, so that each document pair is weighted according to its position in the result list. The nearer the top, the higher the weight and the greater the penalty if the order is wrong.


B. The number of relevant documents varies greatly between queries. After conversion to document pairs, some queries may have only a dozen or so pairs while others have hundreds, which biases the evaluation of the learning system. Suppose query 1 corresponds to 500 document pairs and query 2 to 10, and the system judges 480 pairs correctly for query 1 and 2 for query 2. Measured over all document pairs, the accuracy is (480+2)/(500+10) ≈ 94.5%. Measured per query, the accuracies are 96% and 20%, which average to 58%. The gap between the two figures is large, and it biases the model toward queries with large sets of relevant documents.
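The arithmetic in point B can be checked with a short Python sketch. The counts come from the text; the variable names are illustrative.

```python
# query -> (correctly ordered pairs, total pairs), using the numbers from the text
results = {"query 1": (480, 500), "query 2": (2, 10)}

# Accuracy pooled over all document pairs: large queries dominate.
pooled = sum(c for c, _ in results.values()) / sum(t for _, t in results.values())

# Accuracy computed per query and then averaged: every query counts equally.
per_query = {q: c / t for q, (c, t) in results.items()}
query_average = sum(per_query.values()) / len(per_query)

print(f"pooled over pairs: {pooled:.1%}")
print(f"averaged per query: {query_average:.1%}")
```

A loss summed over pairs behaves like the pooled figure, so a model can look accurate overall while ranking the small query badly.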


Pairwise has many implementations, such as Ranking SVM, RankNet, FRank, and RankBoost.


3 Listwise
Listwise differs from the two approaches above: it treats the entire list of search results for a query as one training sample. From these training samples, listwise learns the best scoring function F. For a new query, F scores each document, and sorting the scores from high to low gives the final ranking.


As for how to train the optimal scoring function F, this article describes a method based on the probability distribution over permutations of the search results. As shown in Figure 2-2, suppose that for query q the search engine returns three documents A, B, and C. These three documents can be arranged in 6 permutations. A scoring function F assigns each document a relevance score, giving F(A), F(B), and F(C), and from these three values the probability of each of the 6 permutations can be computed. Different scoring functions F yield different probability distributions over the six permutations.






Suppose the scoring function g is the reference answer that corresponds to the human annotations. We do not know how g is computed, so we try to find a scoring function f whose scores agree with the human scores as closely as possible. Suppose there are two candidate scoring functions, h and f, whose computation is known, with the permutation probability distributions shown in the figure. Measured by KL distance, f is closer than h to the hypothetical optimal function g. Training searches the space of possible functions for the one closest to g, and that function is then used for scoring at prediction time.
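The comparison of f and h against g can be sketched in Python. The original does not give the formula for the permutation probabilities, so this sketch assumes a Plackett-Luce model with exponentiated scores, which is the formulation used by ListNet; the three score sets are made-up values for illustration.

```python
import math
from itertools import permutations

def permutation_probs(scores):
    """Map every ordering of the documents to its probability under `scores`.

    Assumes a Plackett-Luce model: documents are picked one position at a
    time, each with probability proportional to exp(score) among those left.
    """
    probs = {}
    for perm in permutations(sorted(scores)):
        p = 1.0
        remaining = list(perm)
        for doc in perm:
            denom = sum(math.exp(scores[d]) for d in remaining)
            p *= math.exp(scores[doc]) / denom
            remaining.remove(doc)
        probs[perm] = p
    return probs

def kl_divergence(p, q):
    """KL(p || q) over the shared set of permutations."""
    return sum(p[k] * math.log(p[k] / q[k]) for k in p)

# Hypothetical scores: g stands in for the human judgments, f and h are candidates.
g = permutation_probs({"A": 3.0, "B": 2.0, "C": 1.0})
f = permutation_probs({"A": 2.8, "B": 2.1, "C": 1.0})
h = permutation_probs({"A": 1.0, "B": 2.0, "C": 3.0})

print("KL(g || f) =", kl_divergence(g, f))
print("KL(g || h) =", kl_divergence(g, h))
```

With these values f orders the documents almost as g does while h reverses them, so the divergence from g is smaller for f. Enumerating all permutations is only feasible for very short lists, which is why practical methods work with approximations such as the top-one probability.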


The listwise approach is often more direct: it focuses on the ranking task itself and optimizes the document ranking directly, so it often gives the best results. Commonly used listwise methods include AdaRank, SoftRank, and LambdaMART.


Acquisition of LTR Training Data
1. Manual labeling. If a large amount of training data is needed, manual labeling is not realistic.


2. For a search engine, training data can be obtained from user click logs. Some of the pages in the results returned for a query will be clicked, and the pages a user clicks first are assumed to be more relevant to the query. Although this assumption often does not hold, practical experience shows that obtaining training data this way is feasible.




LTR Feature Selection
When using LTR, a series of text features is selected, and the machine learning method combines them into a ranking model that determines the final order of the results. Each of these inputs is called a "feature". For web page text, the document field a feature is computed on can be the body field, anchor field, title field, URL field, whole-document field, and so on.


Document features fall into two types. The first is features of the document itself, such as PageRank value, content richness, spam score, number of slashes, URL length, inlink count, outlink count, and SiteRank. The second is query-document features: the relevance of the document to the query, such as the TF and IDF values in each field and the relevance scores of the Boolean model, VSM, BM25, and language models.


Combining the two types of feature with the different fields of the document yields many features. Some of them correlate positively with relevance and some negatively, so the learning process is needed to select and weight them.
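As a rough sketch of how the fields and feature types combine, the Python below enumerates one query-document feature per (field, measure) pair and appends the document-only features. The specific names and the fixed ordering are assumptions made for this example, not a standard feature set.

```python
from itertools import product

# Illustrative subsets of the fields and features listed above.
FIELDS = ["body", "anchor", "title", "url", "whole_document"]
QUERY_DOC_MEASURES = ["tf", "idf", "bm25", "vsm", "language_model"]
DOC_FEATURES = ["pagerank", "url_length", "inlink_count", "outlink_count"]

def feature_names():
    """One query-document feature per (field, measure), then document-only features."""
    names = [f"{measure}_{field}" for field, measure in product(FIELDS, QUERY_DOC_MEASURES)]
    return names + DOC_FEATURES

def to_vector(raw):
    """Lay out a dict of raw feature values in a fixed order; missing values become 0.0."""
    return [float(raw.get(name, 0.0)) for name in feature_names()]

# Hypothetical raw values for one (query, document) pair.
vector = to_vector({"bm25_title": 0.32, "pagerank": 7})
```

Five fields and five measures already give 25 query-document features, which is why the weighting is left to the learning process rather than set by hand.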
