The idea behind RankBoost is relatively simple. It follows the common binary Learning-to-Rank formulation: construct a target classifier so that the objects in each pair stand in a relative order. In plain terms, a ranking such as r1 > r2 > r3 > r4 yields the pairs (r1, r2), (r1, r3), (r1, r4), (r2, r3), (r2, r4), (r3, r4); each of these gets a positive value, i.e. label +1, while a reversed pair such as (r2, r1) gets label -1 (or 0 for ties). The ranking problem is thus cleverly converted into a classification problem. Recently, many people in the CV community have applied this learning-to-rank idea to recognition problems (the earliest being the paper "Person Re-Identification by Support Vector Ranking"), that is, converting recognition into ranking and then into classification.
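The pair construction described above can be sketched in a few lines of Python (a toy illustration, not the article's code; the item names are hypothetical):

```python
# Enumerate ordered pairs from a ranking r1 > r2 > r3 > r4 and label
# them for binary classification: (higher, lower) -> +1, reversed -> -1.
from itertools import combinations

ranking = ["r1", "r2", "r3", "r4"]  # hypothetical items, best first

pairs = []
for hi, lo in combinations(ranking, 2):
    pairs.append(((hi, lo), +1))   # hi ranked above lo -> label +1
    pairs.append(((lo, hi), -1))   # reversed pair -> label -1

print(pairs[:2])
```

With 4 items this produces all 6 ordered pairs and their 6 reversals, exactly the positive and negative examples fed to the pairwise classifier.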
Pairwise ranking methods mainly include RankSVM and RankBoost. Here we mainly talk about RankBoost, which as a whole follows the Boost framework:
Note that when the distribution is updated each round, it also differs from conventional Boost: it is defined over pairs rather than single instances. Also note that the final ranking value, i.e. the ranking score, has no absolute meaning; only the relative order matters. For example, whether the final scores of r1 and r2 are 10 and 1 or 100 and 1, the information conveyed is much the same: we can conclude that r1 should be placed before r2.
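This scale-invariance is easy to check with toy numbers: two score vectors that differ in scale but agree in order induce the same ranking.

```python
# Toy check: rescaled scores give the same ordering of items.
import numpy as np

scores_a = np.array([10.0, 1.0])    # final scores of r1, r2
scores_b = np.array([100.0, 1.0])   # a rescaled alternative

order_a = np.argsort(-scores_a)     # indices sorted by descending score
order_b = np.argsort(-scores_b)
print(order_a, order_b)
```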
Unlike the traditional Boost objective, solving it also requires a very clever trick. First define the loss of the final ranking function H over the distribution D as the weighted fraction of pairs it mis-orders:

    rloss_D(H) = sum_{(x0,x1)} D(x0,x1) * [[H(x1) <= H(x0)]]

Specifically, by bounding the 0-1 indicator with an exponential and unwinding the distribution update D_{t+1}(x0,x1) = D_t(x0,x1) * exp(alpha_t * (h_t(x0) - h_t(x1))) / Z_t, we get:

    rloss_D(H) <= prod_t Z_t,  where  Z_t = sum_{(x0,x1)} D_t(x0,x1) * exp(alpha_t * (h_t(x0) - h_t(x1)))

Therefore, the goal of each round is to minimize Z_t.

At this point the traditional Boost line-search strategy over alpha could already solve it, but there is a cleverer way, based on the convexity of the exponential function: for x in the range [-1, +1],

    exp(-alpha * x) <= ((1+x)/2) * exp(-alpha) + ((1-x)/2) * exp(alpha)

Therefore, writing x = h(x1) - h(x0) (assuming the weak rankers are scaled so that this difference lies in [-1, +1]) and r = sum_{(x0,x1)} D(x0,x1) * (h(x1) - h(x0)), Z can be approximated (upper-bounded) by:

    Z <= ((1+r)/2) * exp(-alpha) + ((1-r)/2) * exp(alpha)

Setting the derivative with respect to alpha to zero gives the minimizer directly: alpha = (1/2) * ln((1+r)/(1-r)), at which the bound equals sqrt(1 - r^2). In this way Z can be minimized in closed form, and choosing the best weak ranker is converted to the problem of maximizing |r|.
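This closed-form choice of alpha can be checked numerically; the pair weights D and score differences below are made-up toy numbers, not values from the article.

```python
# Toy numeric sketch: given pair weights D and weak-ranker score
# differences h(x1) - h(x0) in [-1, +1], compute r, the optimal alpha,
# and verify that the bound at that alpha equals sqrt(1 - r^2).
import math

D = [0.5, 0.3, 0.2]       # hypothetical weights of three crucial pairs
diff = [0.8, 0.4, -0.2]   # hypothetical h(x1) - h(x0) for each pair

r = sum(d * x for d, x in zip(D, diff))       # r = sum D*(h(x1)-h(x0))
alpha = 0.5 * math.log((1 + r) / (1 - r))     # minimizer of the bound
Z = math.sqrt(1 - r * r)                      # bound value at that alpha

# bound evaluated directly at alpha, for comparison
bound = 0.5 * (1 + r) * math.exp(-alpha) + 0.5 * (1 - r) * math.exp(alpha)
print(r, alpha, Z, bound)
```

Here r = 0.48, so alpha is positive and Z = sqrt(1 - 0.48^2) < 1, confirming that a weak ranker with larger |r| drives the bound further below 1.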
The following is a piece of RankBoost code (MATLAB):
function [ rbf ] = RankBoost( X,Y,D,T )
%RankBoost  Implementation of the RankBoost algorithm.
% Input:
%   X - train set.
%   Y - train labels.
%   D - distribution function over X times X, in the form of a 2D matrix.
%   T - number of iterations of the boosting.
% Output:
%   rbf - ranking function.
rbf = RankBoostFunc(T);
% w - the current distribution at each iteration, initialized to D
w = D;
for t=1:T
    tic;
    fprintf('RankBoost: creating the function, iteration %d out of %d\n',t,T);
    WL = getBestWeakLearner(X,Y,w);
    rbf.addWeakLearner(WL,t);
    rbf.addAlpha(WL.alpha,t);
    alpha = WL.alpha;
    % update the distribution:
    % evaluate the weak learner on the set of X and Y
    h = WL.eval(X);
    [hlen, ~] = size(h);
    tmph = (repmat(h,1,hlen) - repmat(h',hlen,1));
    w = w.*exp(tmph.*alpha);
    % normalize w
    w = w./sum(w(:));
    toc;
end
end
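For readers without MATLAB, the distribution-update step above can be sketched in NumPy (a translation sketch under my own naming, not the author's code; the weak-learner search and the RankBoostFunc container are outside this snippet):

```python
# One distribution-update step of RankBoost: each pair weight w(i,j) is
# reweighted by exp(alpha * (h(x_i) - h(x_j))) and then renormalized,
# mirroring the repmat-based MATLAB update.
import numpy as np

def update_distribution(w, h, alpha):
    """w: n-by-n pair weights; h: length-n weak-ranker scores."""
    diff = h[:, None] - h[None, :]      # tmph(i,j) = h(i) - h(j)
    w = w * np.exp(alpha * diff)
    return w / w.sum()                  # normalize over all pairs

n = 3
w = np.full((n, n), 1.0 / (n * n))      # toy uniform distribution
h = np.array([1.0, 0.0, -1.0])          # toy weak-ranker output
w = update_distribution(w, h, 0.5)
print(w.sum())
```

Broadcasting (`h[:, None] - h[None, :]`) plays the role of the two `repmat` calls in the MATLAB version.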
One obvious problem is that RankBoost needs to maintain a very large |X| * |X| matrix, so the program consumes a great deal of memory and often throws an Out of memory error. For this reason, matrix operations such as

tmph = (repmat(h,1,hlen) - repmat(h',hlen,1));

are not recommended, since repmat materializes additional |X| * |X| temporaries; the update can instead be computed element by element:
% tmph = (repmat(h,1,hlen) - repmat(h',hlen,1));
% w = w.*exp(tmph.*alpha);
[rows, cols] = size(w);
sumw = 0;
for r=1:rows
    for c=1:cols
        w(r,c) = w(r,c)*exp((h(r)-h(c))*alpha);
        sumw = sumw + w(r,c);
    end
end
% normalize w
% w = w./sum(w(:));
w = w./sumw;
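The same pair-by-pair update can be sketched in plain Python (again a translation sketch, not the author's code): no n-by-n temporary difference matrix is built, only the weight matrix itself is touched.

```python
# Element-wise RankBoost distribution update, mirroring the MATLAB loop:
# reweight each w(i,j) by exp(alpha * (h(i) - h(j))), accumulate the
# total on the fly, then normalize in a second pass.
import math

def update_distribution_loop(w, h, alpha):
    """w: n x n pair weights (list of lists); h: n scores. Returns w."""
    n = len(h)
    total = 0.0
    for i in range(n):
        for j in range(n):
            w[i][j] *= math.exp(alpha * (h[i] - h[j]))
            total += w[i][j]
    for i in range(n):
        for j in range(n):
            w[i][j] /= total      # normalize so weights sum to 1
    return w

n = 3
w = [[1.0 / (n * n)] * n for _ in range(n)]   # toy uniform distribution
w = update_distribution_loop(w, [1.0, 0.0, -1.0], 0.5)
print(sum(sum(row) for row in w))
```

The trade-off is speed for memory: the loop avoids the repmat-style temporaries at the cost of interpreted iteration.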
(When reprinting, please credit the author and source: http://blog.csdn.net/xiaowei_cqu. Commercial use is not allowed.)