[Repost] Concepts related to the NDCG measure
18:04:20 | Category: Default category
Normalized discounted cumulative gain
A measure of the effectiveness of a search engine or a related program (such as a ranking algorithm).
Assumptions:
- Highly relevant documents are more useful when they appear earlier in the result list (at a higher rank).
- Highly relevant documents are more useful than marginally relevant documents, which in turn are more useful than irrelevant documents.
NDCG builds on the following progression of concepts:
Graded relevance:
Relevance is measured on several levels, for example:
Highly relevant: 2
Marginally relevant: 1
Irrelevant: 0
Relevance grades are usually assigned by human judges. In the formulas below, rel_i denotes the graded relevance of the result at position i.
Cumulative gain:
CG ignores the order of the results and simply adds up their graded relevance. The CG at a particular rank position p is:
$$\mathrm{CG}_p = \sum_{i=1}^{p} rel_i$$
Swapping any two of the top p results does not change CG_p.
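A minimal Python sketch of CG_p, assuming the relevance grades are given as a list ordered by rank (first element = rank 1). The function name `cumulative_gain` is illustrative. The usage lines show the order-insensitivity: two rankings that differ only by a swap inside the top 3 get the same CG_3.

```python
from typing import Sequence


def cumulative_gain(rels: Sequence[float], p: int) -> float:
    """CG_p: sum of the graded relevance of the top-p results, ignoring their order."""
    return float(sum(rels[:p]))


# Swapping the first two results (both within the top 3) leaves CG_3 unchanged.
ranking = [2, 0, 1, 2]
swapped = [0, 2, 1, 2]
print(cumulative_gain(ranking, 3), cumulative_gain(swapped, 3))  # 3.0 3.0
```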
Under the two assumptions stated at the start, DCG is therefore a better measure than CG.
Discounted cumulative gain
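A highly relevant document that appears lower in the result list should be penalized: its graded relevance is reduced by a discount that grows logarithmically with its position:
$$\mathrm{DCG}_p = \sum_{i=1}^{p} \frac{rel_i}{\log_2(i+1)}$$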
This formula is not unique; in theory, the choice of a logarithmic discount factor is justified mainly by the fact that it produces a smooth reduction.
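A minimal Python sketch of DCG_p, assuming the variant with discount log2(i + 1) for 1-based rank i (so the first result is not discounted) and grades given as a list in rank order. The name `dcg` is illustrative. The usage lines put the same three grades in two orders to show that moving the highly relevant document down the list lowers DCG.

```python
import math
from typing import Sequence


def dcg(rels: Sequence[float], p: int) -> float:
    """DCG_p = sum over i = 1..p of rel_i / log2(i + 1), with 1-based rank i."""
    return sum(rel / math.log2(i + 1) for i, rel in enumerate(rels[:p], start=1))


# Same grades, different order: moving the grade-2 document
# from rank 1 to rank 3 lowers DCG_3.
print(round(dcg([2, 1, 0], 3), 3))  # 2.631
print(round(dcg([0, 1, 2], 3), 3))  # 1.631
```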
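An alternative formulation places stronger emphasis on highly relevant documents:
$$\mathrm{DCG}_p = \sum_{i=1}^{p} \frac{2^{rel_i} - 1}{\log_2(i+1)}$$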
If the relevance grades are restricted to 0 and 1, then 2^{rel_i} - 1 = rel_i, so the two formulas give the same result.
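The two gain functions can be compared side by side in a short Python sketch, assuming both use the log2(i + 1) discount; the names `dcg_linear` and `dcg_exponential` are illustrative. With graded relevance the exponential gain rewards the grade-2 document more, while with binary grades the two results coincide.

```python
import math
from typing import Sequence


def dcg_linear(rels: Sequence[float], p: int) -> float:
    """Gain rel_i, discounted by log2(i + 1)."""
    return sum(rel / math.log2(i + 1) for i, rel in enumerate(rels[:p], start=1))


def dcg_exponential(rels: Sequence[float], p: int) -> float:
    """Gain 2**rel_i - 1, discounted by log2(i + 1)."""
    return sum((2 ** rel - 1) / math.log2(i + 1) for i, rel in enumerate(rels[:p], start=1))


graded = [2, 1, 0]
print(round(dcg_linear(graded, 3), 3), round(dcg_exponential(graded, 3), 3))  # 2.631 3.631

# With grades in {0, 1}, 2**rel - 1 == rel, so both formulas agree.
binary = [1, 0, 1, 1]
print(dcg_linear(binary, 4) == dcg_exponential(binary, 4))  # True
```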
Normalized discounted cumulative gain
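The length of the result list varies from query to query, so DCG values from different queries cannot be compared directly. The DCG at position p is therefore normalized by its ideal value:
$$\mathrm{nDCG}_p = \frac{\mathrm{DCG}_p}{\mathrm{IDCG}_p}$$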
IDCG_p (ideal DCG) is the maximum possible DCG at position p, obtained when the results are sorted in decreasing order of relevance.
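A minimal Python sketch of IDCG_p, assuming the ideal ordering is built from the grades of the retrieved list itself (some definitions instead use all judged documents for the query) and the log2(i + 1) discount. The helper names `dcg` and `idcg` are illustrative.

```python
import math
from typing import Sequence


def dcg(rels: Sequence[float], p: int) -> float:
    """DCG_p with discount log2(i + 1) for 1-based rank i."""
    return sum(rel / math.log2(i + 1) for i, rel in enumerate(rels[:p], start=1))


def idcg(rels: Sequence[float], p: int) -> float:
    """Ideal DCG_p: DCG_p of the same grades sorted in decreasing order."""
    return dcg(sorted(rels, reverse=True), p)


rels = [0, 1, 2]
print(idcg(rels, 3) == dcg([2, 1, 0], 3))  # True
```

Sorting the grades in decreasing order maximizes DCG because the discount 1/log2(i + 1) decreases with rank, so the largest gains should get the largest weights.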
Because of this normalization, NDCG values from all queries are on the same scale and can be averaged to obtain a measure of the average performance of a ranking algorithm across queries.
A perfect ranking algorithm makes DCG_p equal to IDCG_p, which gives an NDCG_p of 1. In general, NDCG values lie between 0 and 1.
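Putting the pieces together, here is a minimal Python sketch of per-query NDCG and its average over queries. It assumes the log2(i + 1) discount, an ideal ordering built from each query's own retrieved grades, and the convention (an assumption, not from the text) that a query with no relevant documents, where IDCG_p is 0, scores 0.0. The query IDs and grades are hypothetical.

```python
import math
from statistics import mean
from typing import Dict, List, Sequence


def dcg(rels: Sequence[float], p: int) -> float:
    """DCG_p with discount log2(i + 1) for 1-based rank i."""
    return sum(rel / math.log2(i + 1) for i, rel in enumerate(rels[:p], start=1))


def ndcg(rels: Sequence[float], p: int) -> float:
    """NDCG_p = DCG_p / IDCG_p, always in [0, 1]."""
    ideal = dcg(sorted(rels, reverse=True), p)
    # Assumed convention: no relevant documents means IDCG_p == 0, so return 0.0.
    return dcg(rels, p) / ideal if ideal > 0 else 0.0


# Hypothetical per-query grades; the result lists have different lengths.
runs: Dict[str, List[int]] = {"q1": [2, 1, 0], "q2": [0, 1, 2, 1], "q3": [0, 0]}
scores = {q: ndcg(rels, len(rels)) for q, rels in runs.items()}
print(scores["q1"])  # 1.0 (already in ideal order)
print(round(mean(scores.values()), 3))  # average NDCG over the three queries
```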
Example:
Suppose the six documents in the result list, D1, D2, D3, D4, D5, D6, have been judged with relevance grades ...; then:
An ideal ordering sorts them by decreasing relevance: ..., so
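The steps of the example can be sketched in Python as follows. The grades below are hypothetical placeholders on the 0/1/2 scale, not the original example's values, and the DCG uses the log2(i + 1) discount. Ties in the ideal ordering keep their original relative order because Python's `sorted` is stable, including with `reverse=True`.

```python
import math
from typing import Sequence


def dcg(rels: Sequence[float], p: int) -> float:
    """DCG_p with discount log2(i + 1) for 1-based rank i."""
    return sum(rel / math.log2(i + 1) for i, rel in enumerate(rels[:p], start=1))


# Hypothetical grades for D1..D6 (2 = highly relevant, 1 = marginal, 0 = irrelevant).
grades = {"D1": 2, "D2": 1, "D3": 2, "D4": 0, "D5": 1, "D6": 0}
ranking = ["D1", "D2", "D3", "D4", "D5", "D6"]

rels = [grades[d] for d in ranking]
ideal_ranking = sorted(ranking, key=lambda d: grades[d], reverse=True)
ideal_rels = [grades[d] for d in ideal_ranking]

p = 6
dcg_p = dcg(rels, p)
idcg_p = dcg(ideal_rels, p)
print("ideal order:", ideal_ranking)
print(f"DCG_6 = {dcg_p:.3f}, IDCG_6 = {idcg_p:.3f}, NDCG_6 = {dcg_p / idcg_p:.3f}")
```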