Beginner Information Retrieval 5: Precision, Recall, and Search Engine Evaluation

Last Update: 2018-12-06  Source: Internet

Author: User


This article briefly introduces methods for evaluating search engines. From the user's perspective, the best way to evaluate a search engine's retrieval performance would be to count how many documents the user has to browse before finding a satisfactory one. In practice, however, queries vary endlessly and document collections keep changing, so this method is not feasible. Instead, researchers proposed the concepts below and built an evaluation standard on them.

There are three common concepts: precision, accuracy, and recall.

Precision (P for short) is defined as: P = number of relevant documents in the returned results / number of returned results.

Accuracy (A for short) is defined as: A = number of documents judged correctly / total number of documents, where a document is judged correctly if it is relevant and returned, or irrelevant and not returned.

Recall (R for short) is defined as: R = number of relevant documents in the returned results / total number of relevant documents.

                                          Actually relevant     Actually irrelevant
Returned (engine judges relevant)         TP                    FP
Not returned (engine judges irrelevant)   FN                    TN

Expressed in terms of these four counts, the definitions of precision, accuracy, and recall become:

P = TP/(TP + FP)

A = (TP + TN)/(TP + FP + FN + TN)

R = TP/(TP + FN)
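The following Python sketch turns the set-based definitions and the table above into code: it derives TP, FP, FN, and TN for one query from the set of returned documents and the set of relevant documents, then applies the three formulas. The function names and the toy collection are hypothetical, chosen only for illustration. Returning 0.0 precision when nothing is returned is an assumed convention, since P is undefined in that case.

```python
def confusion_counts(collection, returned, relevant):
    """Count TP, FP, FN, TN for one query.

    Assumes `returned` and `relevant` are subsets of `collection`."""
    tp = len(returned & relevant)          # returned and relevant
    fp = len(returned - relevant)          # returned but not relevant
    fn = len(relevant - returned)          # relevant but not returned
    tn = len(collection) - tp - fp - fn    # neither returned nor relevant
    return tp, fp, fn, tn


def precision(tp, fp):
    # P = TP / (TP + FP); assumed to be 0.0 when nothing is returned
    return tp / (tp + fp) if tp + fp else 0.0


def accuracy(tp, fp, fn, tn):
    # A = (TP + TN) / (TP + FP + FN + TN)
    return (tp + tn) / (tp + fp + fn + tn)


def recall(tp, fn):
    # R = TP / (TP + FN); assumes at least one relevant document exists
    return tp / (tp + fn)


# Hypothetical toy example: 10 documents, 4 relevant, 3 returned.
collection = {f"d{i}" for i in range(1, 11)}
relevant = {"d1", "d2", "d3", "d4"}
returned = {"d1", "d2", "d5"}

tp, fp, fn, tn = confusion_counts(collection, returned, relevant)
print(tp, fp, fn, tn)  # 2 1 2 5
print(round(precision(tp, fp), 3), accuracy(tp, fp, fn, tn), recall(tp, fn))  # 0.667 0.7 0.5
```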

Note that precision and accuracy are two different concepts.

Why not evaluate a search engine by accuracy? Accuracy is commonly used to evaluate binary classifiers, and it works well there, but it is unsuitable for evaluating search. For a binary classifier, accuracy measures the fraction of items that are classified correctly; search evaluation, by contrast, should measure how much of what the user wants is actually delivered. Because more than 99% of the documents in a collection are typically irrelevant to a given query, an engine that simply judges every document irrelevant already achieves over 99% accuracy, even though it returns none of the roughly 1% of documents the user actually wants. Accuracy therefore cannot be used.
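A small Python sketch makes this arithmetic concrete. It assumes a hypothetical collection of 10,000 documents of which 100 (1%) are relevant, and a degenerate engine that returns nothing, i.e. judges every document irrelevant. The numbers are illustrative, not measurements.

```python
def accuracy(tp, fp, fn, tn):
    # A = (TP + TN) / (TP + FP + FN + TN)
    return (tp + tn) / (tp + fp + fn + tn)


def recall(tp, fn):
    # R = TP / (TP + FN)
    return tp / (tp + fn)


# Hypothetical collection: 10,000 documents, 1% (100) relevant to the query.
N_DOCS = 10_000
N_RELEVANT = 100

# Degenerate engine that judges every document irrelevant and returns nothing.
tp, fp = 0, 0
fn = N_RELEVANT              # every relevant document is missed
tn = N_DOCS - N_RELEVANT     # every irrelevant document is "correctly" rejected

print(accuracy(tp, fp, fn, tn))  # 0.99
print(recall(tp, fn))            # 0.0
```

The engine scores 99% accuracy while finding nothing the user wants, which is exactly why accuracy is rejected as a search metric.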

What if only recall is used to evaluate search engine performance? That does not work either: simply returning every document yields 100% recall, even though the user wants only about 1% of them. Therefore, precision and recall are used together to evaluate a search engine. The best known such measure is the 11-point precision-recall (P-R) curve.
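The opposite degenerate case can be sketched the same way in Python, again assuming a hypothetical collection of 10,000 documents with 100 relevant: returning every document gives perfect recall but only 1% precision.

```python
def precision(tp, fp):
    # P = TP / (TP + FP)
    return tp / (tp + fp)


def recall(tp, fn):
    # R = TP / (TP + FN)
    return tp / (tp + fn)


# Hypothetical collection: 10,000 documents, 100 relevant to the query.
N_DOCS = 10_000
N_RELEVANT = 100

# Degenerate engine that returns every document in the collection.
tp = N_RELEVANT            # all relevant documents are returned
fp = N_DOCS - N_RELEVANT   # and so are all irrelevant ones
fn = 0                     # nothing relevant is missed

print(recall(tp, fn))      # 1.0
print(precision(tp, fp))   # 0.01
```

Neither number alone reveals that this engine is useless; only looking at both together does.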

The figure plots the precision of a search engine's results at each recall level from 0% to 100% (the "11 points" are the recall levels 0%, 10%, 20%, ..., 100%). Such a graph is used to measure and compare search engines; in the figure, for example, IR2 performs worse than IR1. How is this curve obtained?

To obtain it, researchers build a standard test collection that contains a certain number of documents, a set of queries, and relevance judgments between the queries and the documents.

Test-set = <D, Q, R(q, d)>, where Test-set denotes the test collection, D is the sample document set, Q is the sample query set, and R(q, d) is the relevance judgment between each query q and each document d, which must be made manually in advance.

The system then processes each sample query and, based on its retrieval model, returns a ranked list of documents. Because the relevance of each document to the query is known in advance, you can scan the ranked list from the top and compute precision at the different recall levels. This yields the curve.
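A minimal Python sketch of this procedure follows. It assumes the relevance judgments R(q, d) are stored as a set of relevant document IDs per query, and it applies the usual interpolation convention for the 11-point curve (precision at recall level r is the highest precision observed at any recall at or above r), then averages the per-query curves over all queries. The interpolation and averaging steps are standard practice rather than details stated in this article, and all function names are hypothetical.

```python
def pr_points(ranking, relevant):
    """Walk down a ranked list and record (recall, precision) each time a
    relevant document is found."""
    points = []
    hits = 0
    for rank, doc_id in enumerate(ranking, start=1):
        if doc_id in relevant:
            hits += 1
            points.append((hits / len(relevant), hits / rank))
    return points


def eleven_point_curve(ranking, relevant):
    """Interpolated precision at recall levels 0.0, 0.1, ..., 1.0 for one query.

    Interpolated precision at level r is the highest precision observed at
    any recall >= r, or 0.0 if recall r is never reached."""
    points = pr_points(ranking, relevant)
    curve = []
    for i in range(11):
        level = i / 10
        curve.append(max((p for r, p in points if r >= level), default=0.0))
    return curve


def average_curve(runs, judgments):
    """Average the per-query 11-point curves over all queries in a test set.

    runs:      query ID -> ranked list of document IDs returned by the system
    judgments: query ID -> set of document IDs judged relevant
    """
    curves = [eleven_point_curve(runs[q], judgments[q]) for q in judgments]
    return [sum(c[i] for c in curves) / len(curves) for i in range(11)]


# Hypothetical single-query example: 4 relevant documents in a ranking of 10.
ranking = [f"d{i}" for i in range(1, 11)]
relevant = {"d1", "d3", "d6", "d10"}
print([round(p, 3) for p in eleven_point_curve(ranking, relevant)])
# [1.0, 1.0, 1.0, 0.667, 0.667, 0.667, 0.5, 0.5, 0.4, 0.4, 0.4]
```

Interpolation makes each curve non-increasing, so curves for different queries (and different engines, such as IR1 and IR2) can be averaged and compared point by point.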

In fact, a search engine is measured not only by precision and recall but also by its response latency and the friendliness of its interface; these indicators all reflect the user's perspective. Other indicators, such as index construction overhead and update overhead, evaluate system performance. These are not covered in this article.

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness, ownership, or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
