A discussion of the downward trend of the precision-recall curve

Source: Internet
Author: User

Because I work on retrieval, I need precision and recall to measure how effective a retrieval algorithm is. It is well known that the precision-recall curve usually trends downward: even when the same retrieval method is applied to different queries, the higher the precision of the results, the lower the recall tends to be. I was curious about this and have been trying to understand the reason, including searching online for material. So far I have not found a systematic discussion of the cause. Most sources just repeat the same description: there is no necessary connection between the two, but this pattern always shows up in large-scale retrieval ...

So, after some thought, I worked out a preliminary explanation and am sharing it here. If you have a better explanation, please do share it.

Definitions: precision = (number of correct results retrieved) / (number of results retrieved); recall = (number of correct results retrieved) / (number of correct results in the database).

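Let:

A = number of correct results that are retrieved
B = number of incorrect results that are retrieved
C = number of correct results in the database that are not retrieved
D = number of incorrect results in the database that are not retrieved

Then: precision = A/(A+B), recall = A/(A+C), where A, B, C, D, precision and recall are all non-negative.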
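As a minimal sketch of the two formulas, the following Python function computes precision and recall from the counts A, B and C. The function name `precision_recall` and the example counts are illustrative assumptions, not part of the original discussion.

```python
def precision_recall(a, b, c):
    """Return (precision, recall) from the counts A, B and C.

    a: correct results that were retrieved (A)
    b: incorrect results that were retrieved (B)
    c: correct results in the database that were not retrieved (C)
    """
    retrieved = a + b   # everything the search returned
    relevant = a + c    # every correct result in the database
    # Guard against division by zero when nothing was retrieved or nothing is relevant.
    precision = a / retrieved if retrieved else 0.0
    recall = a / relevant if relevant else 0.0
    return precision, recall


# Hypothetical counts: 10 results returned, 6 of them correct,
# and 14 more correct results left in the database.
p, r = precision_recall(a=6, b=4, c=14)
print(f"precision={p:.2f} recall={r:.2f}")
```

With these made-up counts, precision is 6/10 and recall is 6/20.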

The downward trend of the precision-recall curve can be explained as follows:

Assume that every search considers only a fixed number of top-ranked results. Under this assumption, if A increases for some query, B must decrease by the same amount so that A+B stays constant. Precision clearly increases in that case.

Next, recall = A/(A+C) = 1/(1 + C/A). The shape of the precision-recall curve tells us that when the same retrieval method is applied to different queries, recall usually falls as precision rises. Here A is increasing, and for recall to fall even though A increases there is only one possibility: C also increases, and by a larger proportion than A.
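A small numeric check may make this step more concrete. The sketch below compares two hypothetical queries that both return 10 results; the counts are invented for illustration only. From the first query to the second, A doubles while C grows eightfold, so the ratio C/A rises and recall falls even though precision rises.

```python
def recall_from_ratio(a, c):
    """Recall written as 1 / (1 + C/A); equal to A / (A + C) when A > 0."""
    return 1.0 / (1.0 + c / a)


# Hypothetical queries, each returning a fixed number of results (A + B = 10).
queries = {
    "query 1": {"a": 4, "b": 6, "c": 4},
    "query 2": {"a": 8, "b": 2, "c": 32},
}

for name, q in queries.items():
    precision = q["a"] / (q["a"] + q["b"])
    recall = recall_from_ratio(q["a"], q["c"])
    # Both forms of the recall formula should agree.
    assert abs(recall - q["a"] / (q["a"] + q["c"])) < 1e-12
    print(f"{name}: precision={precision:.2f} recall={recall:.2f}")
```

In this made-up case precision goes from 4/10 to 8/10, while recall goes from 4/8 to 8/40.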

C is the number of results that are correct and in the database but were not retrieved. So the shape of the precision-recall curve can be restated as: when A and precision increase, C also increases, and proportionally more than A. This reflects the fact that the retrieval method is stable. When the same method returns more correct results (A) for a certain query, it is usually not because the method works especially well on that kind of query, but simply because the database contains more correct results (A+C) for it, which pushes up the number of correct results (A) that can be returned.

As for why C grows more than A, this can be understood by looking at the objects in the database that are relevant to each kind of query. In general, a database holds many correct results for one class of queries because those results, although they belong to the same class, each have distinct characteristics, which makes it worthwhile to store more of them. The more diverse they are, the harder it is for a query to retrieve all the correct results with their different features. General-purpose retrieval algorithms do not handle this well. As a result, when A+C for a class of queries becomes larger, the same retrieval algorithm gains less in A than in C. This produces the downward trend of the precision-recall curve.
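The argument can be illustrated with a toy model. Everything in the sketch below is an assumption made for illustration and does not come from the original discussion: the search always returns `TOP_K` results, and the expected number of correct results among them follows a saturating rule A = TOP_K * R / (R + h), where R = A + C is the number of correct results in the database and `h` is a hypothetical constant. Under this rule A grows with R but more slowly than C does.

```python
TOP_K = 10  # fixed number of returned results (A + B), as assumed in the text


def expected_counts(relevant_total, h=20.0):
    """Toy model (an assumption, not a real retrieval algorithm).

    relevant_total: number of correct results in the database (A + C)
    h: hypothetical saturation constant; larger h means harder retrieval
    Returns the expected counts (A, C).
    """
    a = TOP_K * relevant_total / (relevant_total + h)
    c = relevant_total - a
    return a, c


rows = []
for relevant_total in (5, 10, 20, 40, 80):
    a, c = expected_counts(relevant_total)
    precision = a / TOP_K
    recall = a / (a + c)
    rows.append((relevant_total, precision, recall))
    print(f"A+C={relevant_total:3d}  A={a:5.2f}  C={c:6.2f}  "
          f"precision={precision:.2f}  recall={recall:.2f}")
```

In this model, queries with more correct results in the database get higher precision and lower recall, which matches the trend described above. A different choice of rule for A could give a different picture, so this is only one way to make the intuition concrete.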

Of course, the precision-recall curve is not strictly monotonically decreasing; the downward trend only shows up in large-scale retrieval. The discussion above is an intuitive explanation of the trend, not a mathematical proof.
