[IR Course note] Probabilistic retrieval model

Source: Internet
Author: User

Notation:

R: the set of relevant documents

NR: the set of non-relevant documents

Q: the user query

dj: document j

The 1/0 loss case

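PRP (probability ranking principle): use a probabilistic model to estimate, for each document, the probability that it is relevant to the information need, and then rank the results by that probability.

Bayes optimal decision rule: to minimize the expected loss under 1/0 loss, return the documents that are more likely to be relevant than non-relevant:

P(R | D) > P(NR | D)

The probability ranking principle with retrieval costs: return D when the expected cost of retrieving it is lower than the expected cost of not retrieving it:

Crr P(R | D) + Crn P(NR | D) < Cnr P(R | D) + Cnn P(NR | D)

Reading the subscripts as (action, true state): the first is r for "retrieve" or n for "do not retrieve", and the second is r for "relevant" or n for "non-relevant".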
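The cost-based rule above can be sketched as a small function. This is a minimal illustration, not part of the original notes: the function name and the default costs are assumptions, chosen so that the defaults reduce to the 1/0 loss case (a wrong decision costs 1, a right one costs 0).

```python
def should_retrieve(p_rel, c_rr=0.0, c_rn=1.0, c_nr=1.0, c_nn=0.0):
    """Decide whether to return document D, given P(R | D) = p_rel.

    Cost subscripts are (action, true state); the defaults are an assumed
    1/0 loss, under which the rule becomes P(R | D) > P(NR | D).
    """
    p_nonrel = 1.0 - p_rel
    # Expected cost if we retrieve D.
    cost_retrieve = c_rr * p_rel + c_rn * p_nonrel
    # Expected cost if we do not retrieve D.
    cost_skip = c_nr * p_rel + c_nn * p_nonrel
    return cost_retrieve < cost_skip


if __name__ == "__main__":
    print(should_retrieve(0.7))            # 1/0 loss, likely relevant
    print(should_retrieve(0.3))            # 1/0 loss, likely non-relevant
    print(should_retrieve(0.3, c_nr=5.0))  # missing a relevant document is costly
```

Raising the cost of missing a relevant document (c_nr) lowers the probability threshold at which a document is returned.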

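How to calculate the probabilities

A document D can be represented as a binary vector (d1, d2, ..., dn), where di = 1 if term i appears in D and di = 0 otherwise.

pi = P(di=1 | R),  1 - pi = P(di=0 | R)

qi = P(di=1 | NR),  1 - qi = P(di=0 | NR)

Assuming the terms are independent of one another, the odds P(R | D) / P(NR | D) factor into a product over the terms; taking the logarithm turns that product into a sum with one contribution per term.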
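The equation being referred to is not reproduced in these notes. The following is a reconstruction of the standard binary independence argument using the pi and qi defined above, and it assumes the terms are conditionally independent given R and given NR.

```latex
\begin{align*}
\frac{P(R \mid D)}{P(NR \mid D)}
  &= \frac{P(R)}{P(NR)} \prod_{i=1}^{n} \frac{P(d_i \mid R)}{P(d_i \mid NR)} \\
  &= \frac{P(R)}{P(NR)} \prod_{i=1}^{n}
     \left(\frac{p_i}{q_i}\right)^{d_i}
     \left(\frac{1-p_i}{1-q_i}\right)^{1-d_i} \\[4pt]
\log \frac{P(R \mid D)}{P(NR \mid D)}
  &= \sum_{i=1}^{n} d_i \log \frac{p_i\,(1-q_i)}{q_i\,(1-p_i)}
   + \sum_{i=1}^{n} \log \frac{1-p_i}{1-q_i}
   + \log \frac{P(R)}{P(NR)}
\end{align*}
```

The last two terms do not depend on D, so for ranking only the first sum matters: each term that appears in the document (di = 1) contributes log [pi (1 - qi) / (qi (1 - pi))].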

How to obtain the initial estimates for R and NR

pi = c, where c is usually set to 0.5

qi = ni/N, where ni is the number of documents that contain term i and N is the total number of documents in the collection.
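As a minimal sketch of these initial estimates, the function below plugs pi = c and qi = ni/N into the per-term log-odds weight log [pi (1 - qi) / (qi (1 - pi))]. The function and parameter names are illustrative assumptions, and the sketch assumes 0 < ni < N so that the logarithm is defined.

```python
import math


def initial_term_weight(n_i, n_docs, c=0.5):
    """Log-odds weight of term i before any relevance information exists.

    n_i    : number of documents containing term i (assumed 0 < n_i < n_docs)
    n_docs : total number of documents N in the collection
    c      : constant initial guess for p_i
    """
    p_i = c
    q_i = n_i / n_docs
    return math.log(p_i * (1 - q_i) / (q_i * (1 - p_i)))


if __name__ == "__main__":
    # A rare term gets a larger weight than a common one.
    print(initial_term_weight(10, 1000))
    print(initial_term_weight(500, 1000))
```

With c = 0.5 the weight simplifies to log((N - ni)/ni), which behaves like an inverse document frequency.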

Improving the estimates:

For a query Q, rank the documents with the initial estimates and take the top K results. Add these K results to R. The probabilities are then re-estimated as:

pi = P(di=1 | R) = si/t

qi = P(di=1 | NR) = (ni - si)/(N - t)

Here t is the number of documents in R, and si is the number of those t documents that contain term i.
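The re-estimation step can be sketched as follows. Documents are modeled as Python sets of terms, which is an assumption made for illustration, as are the function names; the sketch also assumes 0 < t < N.

```python
def feedback_counts(top_k_docs, term):
    """Return (s_i, t): how many of the t documents in R contain the term."""
    t = len(top_k_docs)
    s_i = sum(1 for doc in top_k_docs if term in doc)
    return s_i, t


def reestimate(s_i, t, n_i, n_docs):
    """Unsmoothed estimates p_i = s_i / t and q_i = (n_i - s_i) / (N - t)."""
    p_i = s_i / t
    q_i = (n_i - s_i) / (n_docs - t)
    return p_i, q_i


if __name__ == "__main__":
    # Hypothetical top-3 results, each a set of terms.
    top_k = [{"retrieval", "model"}, {"retrieval"}, {"ranking"}]
    s_i, t = feedback_counts(top_k, "retrieval")
    print(reestimate(s_i, t, n_i=20, n_docs=103))
```

Note that these unsmoothed estimates can be exactly 0 or 1 (for example when si = 0), in which case the log-odds weight is undefined.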

Smoothing

pi = (si + 0.5)/(t + 1)

qi = (ni - si + 0.5)/(N - t + 1)
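A short sketch of the smoothed estimates and the term weight they produce. The function names are assumptions made for illustration; the point is that adding 0.5 keeps pi and qi strictly between 0 and 1, so the log-odds weight stays finite even when si = 0 or si = t.

```python
import math


def smoothed_estimates(s_i, t, n_i, n_docs):
    """Smoothed p_i and q_i, following the two formulas above."""
    p_i = (s_i + 0.5) / (t + 1)
    q_i = (n_i - s_i + 0.5) / (n_docs - t + 1)
    return p_i, q_i


def term_weight(p_i, q_i):
    """Log-odds weight log[p_i (1 - q_i) / (q_i (1 - p_i))]."""
    return math.log(p_i * (1 - q_i) / (q_i * (1 - p_i)))


if __name__ == "__main__":
    # s_i = 0 would give p_i = 0 without smoothing; here the weight is finite.
    p_i, q_i = smoothed_estimates(s_i=0, t=10, n_i=50, n_docs=1000)
    print(p_i, q_i, term_weight(p_i, q_i))
```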

Weighting

Replace di with wi · di, where wi is the weight of term i and di is 1 if the term appears in the document and 0 otherwise.
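As an illustration of the weighted form, a document score can be computed as the sum of wi · di over the terms. In this sketch the document is a set of terms and the weights are a dictionary; both representations and the function name are assumptions, and the weights could be, for example, the log-odds weights built from pi and qi.

```python
def score(doc_terms, weights):
    """Sum of w_i * d_i, where d_i = 1 if term i is in the document, else 0."""
    return sum(w for term, w in weights.items() if term in doc_terms)


if __name__ == "__main__":
    # Hypothetical term weights and documents.
    weights = {"retrieval": 1.5, "model": 2.0, "the": -0.5}
    docs = {
        "d1": {"retrieval", "the"},
        "d2": {"retrieval", "model"},
        "d3": {"the"},
    }
    # Rank documents by descending score.
    ranking = sorted(docs, key=lambda name: score(docs[name], weights), reverse=True)
    print(ranking)
```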

BM25 weighting method
