This problem came up when I reproduced the experimental results of an information retrieval paper. The paper's experiments used three metrics: P@10, MAP, and NDCG@10. I first used the evaluation tools provided by Galago and found that while P@10 matched, the MAP and NDCG@10 values differed considerably from those reported in the paper. I noticed, however, that even though the numbers were different, the trend of the results was the same (the experiment evaluated several ranking algorithms, and although my numbers differed, they ranked the algorithms in the same order).
Interestingly, another paper cited by this one reproduced the experiment in the same way, also using the MAP and NDCG tools provided by Galago, and obtained exactly the same results as I did. So even when two papers use a metric of the same name, there can be differences in how it is calculated; even two papers from top conferences in the same field may compute it differently. I then wrote my own code to compute MAP and NDCG and, after repeated adjustments, finally worked out where the differences lie.

1, P@10

Before discussing how the same metric can be computed differently, let me first go over a few common information retrieval evaluation metrics. P@10 is the precision of the first 10 returned results (P stands for precision). Look at the example below.
To be exact, the figure shows P@5. The five answers in the figure are returned for a single query; the 1st, 3rd, and 5th answers are correct and the 2nd and 4th are wrong, so this query's P@5 = 3/5 = 0.6. To evaluate the precision of a whole system, you usually test it with several different queries, compute P@10 for each query, and take the average of these per-query P@10 values as the system's P@10. This calculation is essentially fixed and leaves no room for ambiguity, so P@10 is computed the same way in every paper I have seen.
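As a small illustration, here is a minimal Python sketch of P@k and its average over queries (the function names and the hard-coded relevance list are my own, chosen to match the P@5 example above):

```python
def precision_at_k(relevance, k):
    """Precision of the top-k results; relevance is a list of 0/1 judgments in ranked order."""
    return sum(relevance[:k]) / k

def mean_precision_at_k(runs, k):
    """Average P@k over the result lists of several queries."""
    return sum(precision_at_k(r, k) for r in runs) / len(runs)

# The P@5 example above: answers 1, 3 and 5 correct, 2 and 4 wrong.
print(precision_at_k([1, 0, 1, 0, 1], 5))  # 0.6
```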
2, MAP

MAP stands for Mean Average Precision: the mean of a quantity called Average Precision (AP). To measure a system's performance you again test it with many different queries; each query's result list yields one AP value, and the mean of all the AP values is the system's MAP. The question then becomes how to calculate the AP value for one query's results. AP is essentially an extension of P@n: where P@10 fixes n at 10, AP averages the precision values P@i computed at every rank i where a relevant answer appears. An example is shown below:
The figure above shows the results of two queries to a system. For result 1, AP = (1.0 + 0.67 + 0.75 + 0.8 + 0.83 + 0.6) / 6 = 0.78; for result 2, AP = (0.5 + 0.4 + 0.5 + 0.57 + 0.56 + 0.6) / 6 = 0.52.
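Here is a minimal Python sketch of AP and MAP under the definition used above: the precision at each rank that holds a relevant answer, averaged over the relevant answers that appear in the result list (as in the example). The 0/1 relevance lists are reconstructed from the precision terms above; the names are my own:

```python
def average_precision(relevance):
    """AP of one ranked result list; relevance is a list of 0/1 judgments in ranked order."""
    hits, precisions = 0, []
    for i, rel in enumerate(relevance, start=1):
        if rel:
            hits += 1
            precisions.append(hits / i)  # precision at this relevant rank
    return sum(precisions) / len(precisions) if precisions else 0.0

def mean_average_precision(runs):
    """MAP: the mean of the per-query AP values."""
    return sum(average_precision(r) for r in runs) / len(runs)

# Result 1 and result 2 from the example above (1 = relevant, 0 = not).
run1 = [1, 0, 1, 1, 1, 1, 0, 0, 0, 1]  # AP ≈ 0.78
run2 = [0, 1, 0, 0, 1, 1, 1, 0, 1, 1]  # AP ≈ 0.52
print(mean_average_precision([run1, run2]))  # ≈ 0.65
```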
For a single query, the AP value measures how good that query's result is; to judge the overall effect of a system you use MAP (Mean Average Precision), which is simply the mean of the AP values over all test queries. For the two results above, MAP = (0.78 + 0.52) / 2 = 0.65.

The difference in calculation across papers

The MAP calculation is also simple, but you have to pay attention to how many of the top results are used, that is, the n in AP@n. MAP is not like P@10, which clearly uses the first 10 returned results; most papers do not state how many of the top results they use to compute it. The paper I was reproducing returned the top 1000 answers for each query but used only the top 100 to compute MAP. It is a simple detail, but confusing if you do not notice it.

3, NDCG@n

NDCG can be broken down into four parts: N (normalized), D (discounted), C (cumulative), and G (gain). The four parts combine into NDCG according to the following formula.
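The original post shows the formula as an image; a standard form consistent with the description that follows (leaving G, D, and the ideal ranking open, since those are exactly where implementations differ) would be:

```latex
\mathrm{DCG}@n(x) = \sum_{i=1}^{n} \frac{G(\mathrm{rel}_i)}{D(i)},
\qquad
\mathrm{NDCG}@n(x) = \frac{\mathrm{DCG}@n(x)}{N(x)},
\quad N(x) = \mathrm{DCG}@n \text{ of the ideal ranking}
```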
Here x is a query, n means the query's NDCG is computed over the first n returned answers, and i is the position of an answer in the result list. G can be understood as the points awarded for the quality of one returned answer; the size of G is independent of i and depends only on the answer's quality. D can be understood as a discount applied to those points: answers nearer the top should earn more and answers further down should earn less, and since G does not depend on position, D is what controls how much each answer contributes. D is therefore a quantity that grows as the position i grows. C is the accumulation of G/D over positions 1 to n, which gives the quality score of the query. N normalizes that score; it can be understood as the ideal score, the highest score that could possibly be obtained.

The difference in NDCG calculation

NDCG has a more complicated formula than the first two metrics, so there is more room for its calculation to differ. Apart from C, which is just accumulation and causes no dispute, N, D, and G can each be computed differently.

The difference in G is the largest: some implementations take the relevance score rel directly as G, others take 2^rel - 1 as G, and there are still other formulations. In both cases the relevance score is rel = {0, 1, 2, ...}.

For D, the common basis is log(i), but obviously log(1) = 0 when i = 1, which cannot be used as a denominator, so there are two workarounds. The first: D = 1 when i = 1, and D = log(i) otherwise. The second: D = log(1 + i).

For N there are also two calculations. Both use the same DCG method; the difference is which values the DCG is computed over. The first: take the first n of the current returned results, reorder them optimally, and compute their DCG as N. For example, if an NDCG@5 computation has returned relevances x = {1, 0, 2, 2, 1}, reorder them to x = {2, 2, 1, 1, 0} and compute the DCG of that as N. In other words, the values used for N must appear among the returned answers. But if all of the first n returned answers have relevance 0, then N = 0 and the division fails. This is how most implementations found online do it. The second: take the best n answers in the entire search space, sort them from high to low, and compute their DCG as N. The values used for N do not have to appear in the answers the system returned. This is what the paper I reproduced did.
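To make these degrees of freedom concrete, here is a minimal Python sketch of a DCG/NDCG computation that exposes the gain, discount, and normalization choices described above. The function names, keyword values, and the judged answer pool in the usage example are hypothetical, not taken from Galago or from the paper:

```python
import math

def dcg(rels, gain="exp", discount="log1p"):
    """DCG over a list of relevance scores, with the variants described above.
    gain:     "linear" uses rel, "exp" uses 2**rel - 1.
    discount: "log1p" uses log2(i + 1); "log_floor" uses 1 for i = 1 and log2(i) otherwise."""
    total = 0.0
    for i, rel in enumerate(rels, start=1):
        g = rel if gain == "linear" else 2 ** rel - 1
        d = math.log2(i + 1) if discount == "log1p" else (1.0 if i == 1 else math.log2(i))
        total += g / d
    return total

def ndcg_at_n(returned_rels, pool_rels, n, ideal="pool", **kw):
    """NDCG@n of one query.
    returned_rels: relevances of the system's returned answers, in ranked order.
    pool_rels:     relevances of all judged answers for the query (the search space).
    ideal="returned" normalizes by the re-sorted top-n of the returned list;
    ideal="pool"     normalizes by the best n answers in the whole search space."""
    numerator = dcg(returned_rels[:n], **kw)
    if ideal == "returned":
        ideal_rels = sorted(returned_rels[:n], reverse=True)
    else:
        ideal_rels = sorted(pool_rels, reverse=True)[:n]
    denominator = dcg(ideal_rels, **kw)
    return numerator / denominator if denominator > 0 else 0.0

# The NDCG@5 example above: returned relevances {1, 0, 2, 2, 1}; the pool is made up.
returned = [1, 0, 2, 2, 1]
pool = [2, 2, 2, 1, 1, 1, 0, 0]
print(ndcg_at_n(returned, pool, 5, ideal="returned"))  # ideal taken from the returned answers
print(ndcg_at_n(returned, pool, 5, ideal="pool"))      # ideal taken from the whole search space
```

With the second convention the denominator can only be zero if every judged answer has relevance 0, which avoids the division-by-zero problem mentioned above.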