I. Development
Origin: Statistical language models for retrieval originated in the paper by Ponte and Croft published at SIGIR 1998.
Applications: Language models have been applied to many tasks:
Cross-lingual retrieval
Distributed IR
Expert finding
Passage retrieval
Web search
Genomics retrieval
Topic tracking
Subtopic retrieval
II. Basic models
1. Ponte and Croft
Core idea: query likelihood scoring, i.e. each document is ranked by the probability that its language model generates the query.
Algorithm:
Two core questions: (1) How to define θ_D, the document language model?
(2) How to estimate θ_D?
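In standard notation, query likelihood scoring can be written as below. This is a sketch of the general form only; the symbols Q (query), D (document), and the use of the logarithm are assumptions and may differ from the notation of the original paper.

```latex
\[
\mathrm{score}(Q, D) = \log P(Q \mid \theta_D)
\]
```

Here θ_D is the language model estimated from document D, so the two questions above amount to choosing the form of P(· | θ_D) and estimating its parameters.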
Multiple Bernoulli model: each word is modeled as either present (=1) or absent (=0); only these two cases are distinguished.
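A minimal sketch of the multiple Bernoulli query likelihood, in which every vocabulary word contributes p(w | θ_D) if it is present in the query and 1 - p(w | θ_D) if it is absent. The function name, the `eps` parameter, and the crude presence-based estimate of p(w | θ_D) are hypothetical placeholders for illustration, not the estimator used by Ponte and Croft.

```python
import math


def bernoulli_log_likelihood(query_terms, doc_terms, vocabulary, eps=0.01):
    """Return log P(Q | theta_D) under a multiple Bernoulli model.

    Assumption (illustrative only): p(w | theta_D) is 1 - eps when the
    document contains w and eps otherwise.
    """
    query = set(query_terms)  # only presence/absence matters
    doc = set(doc_terms)      # term frequency is deliberately discarded
    log_p = 0.0
    for w in vocabulary:
        p = 1.0 - eps if w in doc else eps
        # Present in query: factor p; absent from query: factor (1 - p).
        log_p += math.log(p) if w in query else math.log(1.0 - p)
    return log_p


vocab = ["a", "b", "c"]
score_match = bernoulli_log_likelihood(["a"], ["a", "b"], vocab)
score_miss = bernoulli_log_likelihood(["a"], ["c"], vocab)
score_repeat = bernoulli_log_likelihood(["a"], ["a", "a", "b"], vocab)
```

In this toy setup the document containing the query word scores higher than the one that does not, and repeating a word in the document leaves the score unchanged, which is exactly the sense in which the binary model ignores TF.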
The formula above does not take term frequency (TF) into account; when TF is considered, it becomes:
2. BBN and Twenty-One in TREC-7
Essence: unigram (multinomial) model
Formula:
Smoothing the formula above gives:
The document score is then computed with the following formula:
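A minimal sketch of smoothed unigram query likelihood scoring, assuming linear interpolation between the document model and a collection model (Jelinek-Mercer smoothing). The function name, the weight `lam`, and the toy data are hypothetical; whether this matches the exact formulas used by the BBN and Twenty-One systems is an assumption.

```python
import math
from collections import Counter


def query_log_likelihood(query_terms, doc_terms, collection_terms, lam=0.5):
    """Return the sum over query words of log p(w | D) with interpolation smoothing.

    p(w | D) = (1 - lam) * tf(w, D) / |D| + lam * cf(w) / |C|
    """
    doc_tf = Counter(doc_terms)
    coll_tf = Counter(collection_terms)
    doc_len = len(doc_terms)
    coll_len = len(collection_terms)
    score = 0.0
    for w in query_terms:
        p_doc = doc_tf[w] / doc_len if doc_len else 0.0
        p_coll = coll_tf[w] / coll_len if coll_len else 0.0
        p = (1.0 - lam) * p_doc + lam * p_coll
        if p == 0.0:
            # Word unseen in the whole collection: likelihood is zero.
            return float("-inf")
        score += math.log(p)
    return score


d1 = "language model retrieval model".split()
d2 = "web search engine".split()
collection = d1 + d2
query = ["model", "retrieval"]
s1 = query_log_likelihood(query, d1, collection)
s2 = query_log_likelihood(query, d2, collection)
```

Because of the collection component, a document that contains none of the query words (`d2`) still receives a finite score, while the document that matches (`d1`) ranks higher.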
III. Summary of basic model variants
(1) The multiple Bernoulli model is less popular than the multinomial model, because the former does not consider TF, and even when TF is incorporated the formulation is not natural.
(2) The multiple Bernoulli model assumes that the presence or absence of a term is independent of the other terms.
The multinomial model assumes that all term occurrences are independent of one another, including occurrences of the same term at different positions.