First, the language model
Designed to: compute a joint probability for a sentence, i.e., a sequence of words
Role:
- Machine translation: used to rank alternative translations (the more fluent candidate should get the higher probability)
- Spelling correction: the sentence containing the correctly spelled word is more probable than the one with the misspelling, so we pick the correction
- Speech recognition: among acoustically confusable hypotheses, the intended sentence should receive the higher probability
- Summarization and question-answering systems
Related task: given the words so far, compute the conditional probability of the next word, e.g. P(w5 | w1 w2 w3 w4), which is closely related to the joint probability P(w1 w2 w3 w4 w5).
Any model that computes either of these probabilities is called a language model (LM).
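For example (a standard classroom illustration, not taken from these notes verbatim): a language model assigns a probability both to a whole sentence, P(its water is so transparent), and to an upcoming word given a prefix, P(transparent | its water is so).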
Second, how to calculate the probability
Method: the chain rule of conditional probability
Thus:
P(w1 w2 ... wn) = P(w1) · P(w2 | w1) · P(w3 | w1 w2) · ... · P(wn | w1 ... w(n-1)) = ∏i P(wi | w1 ... w(i-1))
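Applied to a concrete sentence, this expands as:
P(its water is so transparent) = P(its) × P(water | its) × P(is | its water) × P(so | its water is) × P(transparent | its water is so)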
Question: how do we estimate these conditional probabilities?
Method One: count and divide (maximum-likelihood estimation):
P(w5 | w1 w2 w3 w4) = Count(w1 w2 w3 w4 w5) / Count(w1 w2 w3 w4)
But this cannot be done!
Reason: there are far too many possible sentences; we will never see enough data to estimate these counts (no corpus can ever be complete)
Method Two: the Markov assumption
Assume each word depends only on the previous k words, so:
P(w1 w2 ... wn) ≈ ∏i P(wi | w(i-k) ... w(i-1))
Or, equivalently, for each factor of the chain rule:
P(wi | w1 ... w(i-1)) ≈ P(wi | w(i-k) ... w(i-1))
That is, the full history is replaced by a short window of the most recent words.
So the sentence probability becomes a product of these truncated conditionals, which we can actually estimate.
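For instance, with k = 1 the last factor of the chain-rule expansion above collapses to a single word of context: P(transparent | its water is so) ≈ P(transparent | so).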
Third, Markov models
1. Unigram model
Assumes words are independent of one another, so the sentence probability is just the product of individual word probabilities: P(w1 w2 ... wn) ≈ ∏i P(wi)
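To make this concrete, a minimal sketch in plain Python (the toy corpus and function names are my own illustration, not from the course):

```python
from collections import Counter

def train_unigram(tokens):
    """Estimate P(w) = Count(w) / N by simple counting."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

def unigram_sentence_prob(model, sentence):
    """Under the independence assumption, the sentence probability
    is the product of the individual word probabilities."""
    p = 1.0
    for w in sentence.split():
        p *= model.get(w, 0.0)  # unseen words get probability 0 (no smoothing)
    return p

# toy corpus standing in for real training data
corpus = "its water is so transparent that the water is clear".split()
model = train_unigram(corpus)
print(unigram_sentence_prob(model, "the water is transparent"))
```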
2. Bigram model
Conditions each word on the single preceding word: P(w1 w2 ... wn) ≈ ∏i P(wi | w(i-1))
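The corresponding bigram sketch, again with maximum-likelihood counts (the <s> and </s> boundary markers are the usual convention, an assumption on my part since the notes do not mention padding):

```python
from collections import Counter

def train_bigram(sentences):
    """Estimate P(wi | w(i-1)) = Count(w(i-1) wi) / Count(w(i-1))."""
    context_counts, bigram_counts = Counter(), Counter()
    for s in sentences:
        tokens = ["<s>"] + s.split() + ["</s>"]
        context_counts.update(tokens[:-1])            # every token that starts a bigram
        bigram_counts.update(zip(tokens, tokens[1:]))
    return {bg: c / context_counts[bg[0]] for bg, c in bigram_counts.items()}

def bigram_sentence_prob(model, sentence):
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    p = 1.0
    for bg in zip(tokens, tokens[1:]):
        p *= model.get(bg, 0.0)  # unseen bigrams get probability 0 (no smoothing)
    return p

model = train_bigram(["its water is so transparent", "the water is clear"])
print(bigram_sentence_prob(model, "the water is so transparent"))
```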
3. N-gram models
In general, condition each word on the previous n-1 words (trigrams, 4-grams, and so on).
But even this is an insufficient model of language, because language has long-distance dependencies.
For example, in "the computer which ... crashed", the word "crashed" really depends on the distant subject "computer"; with a long clause in between, a Markov model with a short window cannot capture that dependency.
In practice, however, n-gram models are found to handle this problem well enough for many applications.
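A sketch of how the context window is formed for general n makes the limitation visible; the sentence below is my own illustrative stand-in for the elided example above:

```python
def ngram_contexts(tokens, n):
    """Yield (context, word) pairs: each word with its n-1 predecessors,
    which is all an n-gram model ever sees of the history."""
    padded = ["<s>"] * (n - 1) + tokens
    for i in range(n - 1, len(padded)):
        yield tuple(padded[i - n + 1 : i]), padded[i]

tokens = "the computer which we discussed at length yesterday crashed".split()
for context, word in ngram_contexts(tokens, 3):
    if word == "crashed":
        # prints ('length', 'yesterday') -> crashed: "computer" is outside the window
        print(context, "->", word)
```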
Source: the language-model lectures of the Stanford NLP course ("Nlp_stanford" classroom).