Language Model
I. Basic Idea
Most retrieval models work from queries to documents: given a user query, they find the relevant documents. The language model approach works in the opposite direction, from documents to queries. It builds a separate language model for each document, computes the probability that the document's model generates the user's query, and then ranks documents by this generation probability from high to low to produce the search results.
II. Generating Query Probabilities
A language model is created for each document; this model represents the distribution of words (or word sequences) in that document. Each word in the query is assigned a generation probability under the document's model, and multiplying these probabilities together gives the probability that the document generates the query.
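As a minimal sketch of this product of per-word probabilities (the function name and toy document are illustrative, not from the source), a unigram maximum-likelihood document model scores a query as:

```python
from collections import Counter

def query_likelihood(query_terms, doc_terms):
    """P(query | document) under a unigram maximum-likelihood model:
    the product of each query term's relative frequency in the document."""
    counts = Counter(doc_terms)
    total = len(doc_terms)
    prob = 1.0
    for term in query_terms:
        prob *= counts[term] / total  # 0.0 if the term never occurs in the document
    return prob

doc = "the cat sat on the mat".split()
print(query_likelihood(["cat", "mat"], doc))  # (1/6) * (1/6) ~ 0.0278
print(query_likelihood(["cat", "dog"], doc))  # 0.0 -- "dog" is unseen
```

The second call already illustrates the sparseness problem discussed next: a single unseen query word drives the whole product to zero.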
III. Existing Problems
Because the text of any single document is limited, many query words may not appear in it at all. Their generation probability is then 0, which makes the generation probability of the entire query 0. This is called the data sparseness problem of language models, and it is the main problem the language model approach must solve.
IV. Solutions
Data smoothing is generally used to solve the data sparseness problem. In language model retrieval, the usual method is to introduce a background model over all words in the collection and use its probabilities to smooth each document's model, so that unseen words receive a small nonzero probability instead of 0.
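One common form of this smoothing is Jelinek-Mercer interpolation, which mixes each document model with a collection-wide background model. In the sketch below, the mixing weight `lam` and the toy two-document collection are illustrative assumptions:

```python
from collections import Counter

def smoothed_score(query_terms, doc_terms, collection_terms, lam=0.5):
    """P(query | document) with Jelinek-Mercer smoothing: each term's
    probability interpolates its document frequency with its frequency
    in the whole collection, so unseen terms no longer force the
    query probability to zero."""
    doc_counts, doc_len = Counter(doc_terms), len(doc_terms)
    col_counts, col_len = Counter(collection_terms), len(collection_terms)
    prob = 1.0
    for term in query_terms:
        p_doc = doc_counts[term] / doc_len      # document model
        p_col = col_counts[term] / col_len      # background (collection) model
        prob *= lam * p_doc + (1 - lam) * p_col
    return prob

docs = ["the cat sat on the mat".split(),
        "the dog chased the cat".split()]
collection = [term for d in docs for term in d]

# Rank documents by smoothed query generation probability, high to low.
query = ["cat", "dog"]
ranked = sorted(range(len(docs)),
                key=lambda i: smoothed_score(query, docs[i], collection),
                reverse=True)
print(ranked)  # [1, 0]: the second document mentions both query terms
```

Note that the first document, which never mentions "dog", still gets a small nonzero score from the background model rather than being eliminated outright.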