MIT Natural Language Processing, Lecture 3: Probabilistic Language Modeling (Part I)

Natural Language Processing: Probabilistic Language Modeling

Author: Regina Barzilay (MIT, EECS Department, November 15, 2004)

Translator: I Love Natural Language Processing (www.52nlp.cn, January 16, 2009)

Review of the previous lecture:

Corpus processing

Zipf's law

The data sparseness problem

Main content of this lecture:

Probabilistic language modeling

First, a brief introduction

A) Predicting string probabilities

I. Which string is more likely, or more grammatical?

1. Grill doctorate candidates.

2. Grill doctorate updates.

(Example from Lee 1997)

II. Methods for assigning probabilities to strings are called language models.

B) Motivation

I. Applications: speech recognition, spelling correction, optical character recognition, and others.

II. Let E be the physical evidence; we need to determine whether the string W is the message encoded by E.

III. Use Bayes' rule:

P(W|E) = P_{LM}(W) * P(E|W) / P(E)

where P_{LM}(W) is the language model probability.

IV. P_{LM}(W) provides the information necessary for disambiguation (especially when the physical evidence alone is not sufficient).
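The decision rule above can be sketched in a few lines: since P(E) is the same for every candidate string, we can rank candidates by P_{LM}(W) * P(E|W) alone. The probabilities below are invented toy values, not from the lecture.

```python
# Noisy-channel sketch: rank candidate strings W by P_LM(W) * P(E|W).
# P(E) is constant across candidates, so it can be dropped from the ranking.
def best_decoding(candidates):
    # candidates: list of (string, p_lm, p_e_given_w) tuples (toy values)
    return max(candidates, key=lambda c: c[1] * c[2])[0]

candidates = [
    ("grill doctorate candidates", 1e-5, 0.90),  # toy probabilities
    ("grill doctorate updates",    1e-8, 0.95),
]
print(best_decoding(candidates))  # the language model term breaks the tie
```

Even though the second candidate fits the evidence slightly better, the much higher language model probability of the first one decides the ranking.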

C) How do we compute it?

I. Naive approach:

1. Use the maximum likelihood estimate (MLE): the number of times the string occurs in the corpus S, normalized by the corpus size:

P_{MLE}(grill doctorate candidates) = count(grill doctorate candidates) / |S|

2. For unseen events, P_{MLE} = 0

-- dreadful behavior in the presence of data sparseness.
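A minimal sketch of the MLE estimate, shown at the word level for simplicity (the same idea applies to whole strings); the tiny corpus is invented for illustration. Anything unseen in the corpus gets probability exactly zero:

```python
# MLE sketch: count of the event in corpus S, normalized by corpus size |S|.
corpus = "grill doctorate candidates cook professors ask professors".split()

def p_mle(word):
    # occurrences of `word` in the corpus, divided by the corpus size
    return corpus.count(word) / len(corpus)

print(p_mle("professors"))  # 2/7: seen twice in a 7-token corpus
print(p_mle("updates"))     # 0.0: unseen event -- the sparseness problem
```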

D) Two famous sentences

I. "It is fair to assume that neither sentence

"Colorless green ideas sleep furiously"

nor

"Furiously sleep ideas green colorless"

... has ever occurred ... Hence, in any statistical model ... these sentences would be ruled out on identical grounds as equally 'remote' from English. Yet (1), though nonsensical, is grammatical, while (2) is not." [Chomsky 1957]

II. Note: this passage appears on page 9 of Chomsky's Syntactic Structures. Neither of the following two sentences has ever occurred in English discourse, and from a statistical point of view both are equally "remote" from English, yet only sentence (1) is grammatical:

1) Colorless green ideas sleep furiously.

2) Furiously sleep ideas green colorless.

Whether the sentences have "never occurred in English discourse" and are "equally remote statistically" depends on the angle of view: judging by the specific words and their forms rather than the whole sentences, sentence (1) has probably occurred in English with a higher statistical frequency than sentence (2).

To be continued: Part II

Attached: MIT page for the course, with lecture slides (PDF) for download:

http://people.csail.mit.edu/regina/6881/

Note: this document is published in accordance with the MIT OpenCourseWare creation and sharing license. When reproducing, please credit the source, "I Love Natural Language Processing": www.52nlp.cn

From: http://www.52nlp.cn/mit-nlp-third-lesson-probabilistic-language-modeling-first-part/

**MIT Natural Language Processing, Lecture 3: Probabilistic Language Modeling (Part II)**


Translator: I Love Natural Language Processing (www.52nlp.cn, January 17, 2009)

Second, constructing the language model

A) The language modeling problem

I. Start with some vocabulary:

V = {the, a, doctorate, candidate, professors, grill, cook, ask, ...}

II. Get a training sample drawn from V:

Grill Doctorate candidate.

Cook professors.

Ask professors.

...

III. Assumption: the training sample is drawn from some underlying distribution P.

IV. Goal: learn a probability distribution P' that is as close to P as possible:

sum_{x in V} P'(x) = 1,  P'(x) >= 0

P'(candidates) = 10^{-5}

P'(ask candidates) = 10^{-8}

B) Deriving the language model

I. Assign a probability to a word sequence w_1 w_2 ... w_n.

II. Apply the chain rule:

1. P(w_1 w_2 ... w_n) = P(w_1|S) * P(w_2|S, w_1) * P(w_3|S, w_1, w_2) * ... * P(E|S, w_1, w_2, ..., w_n), where S marks the start of the sentence and E its end.

2. History-based model: we predict following things from past things.

3. How much context do we need to take into account?

C) Markov assumption

I. For arbitrarily long contexts, P(w_i | w_1 ... w_{i-1}) is difficult to estimate.

II. Markov assumption: w_i depends only on the n preceding words.

III. Trigram model (second-order Markov model):

1. P(w_i | S, w_1, ..., w_{i-1}) = P(w_i | w_{i-2}, w_{i-1})

2. P(w_1 w_2 ... w_n) = P(w_1|S) * P(w_2|S, w_1) * P(w_3|w_1, w_2) * ... * P(E|w_{n-1}, w_n)
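The trigram factorization above can be sketched as follows. Each token is conditioned on the two preceding ones; here the sequence is padded with two start symbols S (the slides write a single S for the first factors) and closed with an end marker E. `cond_p` stands in for a real table of trigram probabilities, which this sketch does not estimate:

```python
# Trigram sentence probability sketch: product of P(w_i | w_{i-2}, w_{i-1}),
# with start padding S and end marker E. cond_p(w, u, v) is a placeholder
# for a learned trigram probability P(w | u, v).
def sentence_prob(words, cond_p):
    tokens = ["S", "S"] + list(words) + ["E"]
    p = 1.0
    for i in range(2, len(tokens)):
        p *= cond_p(tokens[i], tokens[i - 2], tokens[i - 1])
    return p

# Toy check: a uniform conditional of 0.1 per factor gives 0.1^3 for a
# two-word sentence (two words plus the end marker E).
print(sentence_prob(["cook", "professors"], lambda w, u, v: 0.1))
```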

D) A computational model of language

I. A useful conceptual and practical device: coin-flipping models.

1. A sentence is generated by a randomized algorithm:

-- The generator can be in one of several "states"

-- Flip coins to choose the next state

-- Flip other coins to decide which letter or word to output

II. Shannon: "The states would correspond to the 'residue of influence' from preceding letters."

E) Word-based approximations

Note: the following sentences were generated at random from models trained on Shakespeare; see also Jurafsky & Martin's Speech and Language Processing.

I. Unigram approximation (the MIT courseware is mislabeled here: this is a unigram model, not a "first-order approximation")

1. To him swallowed confess hear both. Which. of Save

2. On trail for is AY device and rote life has

3. Every enter now severally so, let

4. Hill he late speaks; or! A more to leg less first you

5. Enter

II. Trigram approximation (again mislabeled in the courseware: this is a trigram model, not a "third-order approximation")

1. King Henry. what! I'll go seek the traitor Gloucester.

2. Exeunt some of the watch. A great banquet serv'd in;

3. Would you tell me how I am?

4. It cannot be and so.
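The "coin-flipping" generator behind examples like these can be sketched with a bigram table: start in state S, repeatedly flip a (weighted) coin to pick the next word given the current one, and stop at E. The tiny bigram table below is invented for illustration, not trained on Shakespeare:

```python
import random

# Shannon-style coin-flipping generator over a toy bigram table:
# each state's list gives (next_word, probability) pairs.
bigrams = {
    "S":          [("grill", 0.5), ("cook", 0.5)],
    "grill":      [("doctorate", 1.0)],
    "doctorate":  [("candidates", 1.0)],
    "candidates": [("E", 1.0)],
    "cook":       [("professors", 1.0)],
    "professors": [("E", 1.0)],
}

def generate(rng):
    state, out = "S", []
    while True:
        words, weights = zip(*bigrams[state])
        state = rng.choices(words, weights=weights)[0]  # flip the coin
        if state == "E":
            return " ".join(out)
        out.append(state)

print(generate(random.Random(0)))  # one of the two toy sentences
```

With richer n-gram tables estimated from a real corpus, exactly this loop produces the unigram and trigram "Shakespeare" sentences shown above.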

To be continued: Part III


From: http://www.52nlp.cn/mit-nlp-third-lesson-probabilistic-language-modeling-second-part/

**MIT Natural Language Processing, Lecture 3: Probabilistic Language Modeling (Part III)**


Translator: I Love Natural Language Processing (www.52nlp.cn, January 18, 2009)

Third, evaluating language models

A) Evaluating a language model

I. We have n test strings:

s_1, s_2, ..., s_n

II. Consider the probability of these strings under our model:

prod_{i=1}^{n} P(s_i)

or the log probability:

log prod_{i=1}^{n} P(s_i) = sum_{i=1}^{n} log P(s_i)

III. Perplexity:

perplexity = 2^{-x}

where x = (1/W) * sum_{i=1}^{n} log P(s_i)

and W is the total number of words in the test data.

IV. Perplexity is a measure of the effective "branching factor".

1. Suppose we have a vocabulary V of size N, and the model predicts:

P(w) = 1/N for all words in V.

V. What is the perplexity then?

perplexity = 2^{-x}

where x = log(1/N)

so perplexity = N
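The uniform-model case above can be checked numerically. A minimal sketch, using base-2 logs to match the 2^{-x} definition; the test sentence length and vocabulary size are arbitrary choices for the check:

```python
import math

# Perplexity sketch: x = (1/W) * sum_i log2 P(s_i), perplexity = 2^{-x},
# where W is the total number of words in the test data.
def perplexity(sentence_log2_probs, total_words):
    x = sum(sentence_log2_probs) / total_words
    return 2 ** (-x)

# Uniform model over N words: each word has probability 1/N, so a 5-word
# test sentence has log2 probability 5 * log2(1/N), and the perplexity
# comes out to N, as derived above.
N = 1000
logp = 5 * math.log2(1.0 / N)
print(perplexity([logp], 5))  # N, up to float rounding
```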

VI. Estimate of human performance (Shannon, 1951):

1. Shannon game: humans guess the next letter in a text.

2. PP = 142 (1.3 bits/letter), uncased, open vocabulary.

VII. Estimate for a trigram language model (Brown et al. 1992):

PP = 790 (1.75 bits/letter), cased, open vocabulary.

To be continued: Part IV


From: http://www.52nlp.cn/mit-nlp-third-lesson-probabilistic-language-modeling-third-part/