This section is based on my notes from the Coursera NLP course by Michael Collins. Course link: https://class.coursera.org/nlangp-001
1. Tagging Problems
1.1 POS Tagging
Problem description
Input: profits soared at Boeing Co., easily topping forecasts on Wall Street, as their CEO Alan Mulally announced first quarter results.
Output: profits/N soared/V at/P Boeing/N Co./N ,/, easily/ADV topping/V forecasts/N on/P Wall/N Street/N ,/, as/P their/POSS CEO/N Alan/N Mulally/N announced/V first/ADJ quarter/N results/N ./.
PS: N = noun; V = verb; P = preposition; ADV = adverb; ADJ = adjective; POSS = possessive.
Given a training set (x(i), y(i)) for i = 1, ..., m, where x(i) is the sentence x1(i) ... xni(i), y(i) is the tag sequence y1(i) ... yni(i), and ni is the length of the i-th sample; xj(i) is the j-th word of sentence x(i), and yj(i) is its tag. A standard example is the Penn WSJ Treebank corpus. POS tagging is difficult because of (1) ambiguity (the same word can take different tags in different contexts) and (2) rare words (words that do not appear in the training corpus). Taggers therefore exploit both word-level statistics and grammatical knowledge: for example, "quarter" generally appears as a noun rather than a verb, and the structure D N V is more common in sentences than D V N.
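As a minimal sketch of the notation above (the sentences and tags are my own made-up toy data, not from the course), the training set is a list of (sentence, tag-sequence) pairs, and the ambiguity in point (1) shows up as a word being seen with more than one tag:

```python
# A toy training set: each sample pairs a sentence x = x1...xn with a
# tag sequence y = y1...yn of equal length. Made-up data for illustration.
from collections import defaultdict

training_set = [
    (["the", "dog", "saw", "the", "cat"], ["D", "N", "V", "D", "N"]),
    (["the", "saw", "laughs"], ["D", "N", "V"]),
]

# Ambiguity (point 1): collect every tag each word appears with.
tags_seen = defaultdict(set)
for x, y in training_set:
    for word, tag in zip(x, y):
        tags_seen[word].add(tag)

print(sorted(tags_seen["saw"]))  # ['N', 'V'] -- "saw" is ambiguous
```

A word such as "saw" here gets two possible tags, which is exactly what makes the tagging problem non-trivial; a word absent from `tags_seen` entirely is the rare-word case of point (2).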
1.2 named-entity Recognition
Problem description
Input: profits soared at Boeing Co., easily topping forecasts on Wall Street, as their CEO Alan Mulally announced first quarter results.
Output1: profits soared at [Company Boeing Co.], easily topping forecasts on [Location Wall Street], as their CEO [Person Alan Mulally] announced first quarter results.
This output marks named entities such as person, location, company, and so on. Unlike POS tagging, each word is either tagged NA (not part of a named entity) or tagged as part of a named entity (e.g. SC = start of a company name, CC = continuation of a company name, ...). That is, the tagged output looks like:
Output2: profits/NA soared/NA at/NA Boeing/SC Co./CC ,/NA easily/NA topping/NA forecasts/NA on/NA Wall/SL Street/CL ,/NA as/NA their/NA CEO/NA Alan/SP Mulally/CP announced/NA first/NA quarter/NA results/NA ./NA
PS: NA = no entity; SC = start company; CC = continue company; SL = start location; CL = continue location; SP = start person; CP = continue person; ...
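The span-to-tag encoding above can be sketched in a few lines (the helper `encode_entities` and its span format are my own, introduced only for illustration):

```python
# Convert entity spans into the SC/CC-style per-word tags of Output2.
# spans: list of (start, end, code) with end exclusive and code a single
# letter like "C" (company), "L" (location), "P" (person).
def encode_entities(words, spans):
    tags = ["NA"] * len(words)
    for start, end, code in spans:
        tags[start] = "S" + code          # start of the entity
        for i in range(start + 1, end):
            tags[i] = "C" + code          # continuation of the entity
    return tags

words = ["profits", "soared", "at", "Boeing", "Co.", ","]
print(encode_entities(words, [(3, 5, "C")]))
# ['NA', 'NA', 'NA', 'SC', 'CC', 'NA']
```

This makes the point in the text concrete: once entities are encoded this way, NER becomes a per-word tagging problem of exactly the same shape as POS tagging.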
2. Generative Models
2.1 Hidden Markov Models
Training examples: (x(1), y(1)), ..., (x(m), y(m)); from these samples we want to learn a function f: x → y.
Method one: conditional model
Given a test sample x, the model outputs:
f(x) = arg max_y p(y|x)
Method Two: Generative model
Apply the joint probability distribution p(x, y), with p(x, y) = p(y) p(x|y),
where p(y) is the prior probability and p(x|y) is the conditional probability of x given label y.
We can then use Bayes' rule to obtain the conditional probability p(y|x):
p(y|x) = p(y) p(x|y) / p(x)
where
p(x) = sum_y p(y) p(x|y)
So
f(x) = arg max_y p(y|x) = arg max_y p(y) p(x|y) / p(x) = arg max_y p(y) p(x|y)
since p(x) does not depend on y and can be dropped inside the arg max.
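A toy generative classifier makes the arg max concrete (the probability tables below are made-up numbers, not estimates from any corpus):

```python
# Pick the y maximizing p(y) * p(x|y). The denominator p(x) is the same
# for every y, so it never needs to be computed.
prior = {"N": 0.6, "V": 0.4}                      # p(y), assumed numbers
likelihood = {                                     # p(x|y), assumed numbers
    ("quarter", "N"): 0.01,
    ("quarter", "V"): 0.001,
}

def classify(x):
    return max(prior, key=lambda y: prior[y] * likelihood.get((x, y), 0.0))

print(classify("quarter"))  # 'N'
```

Here p(N) p(quarter|N) = 0.006 beats p(V) p(quarter|V) = 0.0004, matching the earlier observation that "quarter" usually appears as a noun.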
2.2 Generative Tagging Models
V: the word set, e.g. V = {the, dog, saw, cat, laughs, ...}
K: the tag set
S: the set of sequence/tag-sequence pairs <x1, ..., xn, y1, ..., yn>
Given a generative tagging model p, the tagging result for x1...xn is the y1...yn given by:
f(x1...xn) = arg max_{y1...yn} p(x1...xn, y1...yn)
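The arg max over tag sequences can be sketched by brute force (the `joint_prob` scoring function below is a made-up stand-in for a real model; this enumeration is exponential in n, which is why the Viterbi algorithm is used in practice):

```python
# Brute-force tagging under a generative model: enumerate every tag
# sequence y1...yn over the tag set and keep the one maximizing p(x, y).
from itertools import product

def tag(words, tagset, joint_prob):
    best = max(product(tagset, repeat=len(words)),
               key=lambda y: joint_prob(words, y))
    return list(best)

# Toy joint probability: a product of per-word scores (an assumption
# made only for this illustration, not the HMM form defined below).
score = {("the", "D"): 0.5, ("dog", "N"): 0.4, ("dog", "V"): 0.1}

def joint_prob(words, y):
    p = 1.0
    for w, t in zip(words, y):
        p *= score.get((w, t), 0.0)
    return p

print(tag(["the", "dog"], ["D", "N", "V"], joint_prob))  # ['D', 'N']
```

Any model that assigns a joint probability to a (sentence, tag-sequence) pair can be plugged in as `joint_prob`; the trigram HMM of the next section is one such model.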
2.3 Trigram Hidden Markov Models (trigram HMMs)
q(s|u,v): the probability of tag s following the bigram (u,v), forming the trigram (u,v,s); s belongs to K ∪ {STOP}, and u, v belong to K ∪ {*}.
e(x|s): the probability of observing x in state s; x belongs to V, and s belongs to K.
S: the set of sequence/tag-sequence pairs <x1...xn, y1...yn+1>, with yn+1 = STOP
PS: y0 = y-1 = *
For example, if n = 3, x1x2x3 = the dog laughs, and y1y2y3y4 = D N V STOP, then:
p(x1...x3, y1...y4) = q(D|*,*) q(N|*,D) q(V|D,N) q(STOP|N,V) × e(the|D) e(dog|N) e(laughs|V)
The model is a noisy-channel model: q(D|*,*) q(N|*,D) q(V|D,N) q(STOP|N,V) is a second-order Markov chain giving the prior probability of the tag sequence D N V STOP, and e(the|D) e(dog|N) e(laughs|V) is the conditional probability p(the dog laughs | D N V STOP).
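The example can be computed directly (the q and e values below are made-up numbers chosen only to illustrate the formula):

```python
# Joint probability under a trigram HMM: the product of trigram
# transition terms q(s|u,v) and emission terms e(x|s).
def hmm_joint(words, tags, q, e):
    padded = ["*", "*"] + tags            # so that y0 = y-1 = '*'
    p = 1.0
    for i in range(2, len(padded)):       # trigram transitions, incl. STOP
        p *= q[(padded[i], padded[i - 2], padded[i - 1])]
    for word, tag in zip(words, tags):    # emissions (STOP emits nothing)
        p *= e[(word, tag)]
    return p

# Assumed parameter values, keyed as (s, u, v) and (x, s).
q = {("D", "*", "*"): 0.8, ("N", "*", "D"): 0.9,
     ("V", "D", "N"): 0.7, ("STOP", "N", "V"): 0.6}
e = {("the", "D"): 0.5, ("dog", "N"): 0.4, ("laughs", "V"): 0.3}

p = hmm_joint(["the", "dog", "laughs"], ["D", "N", "V", "STOP"], q, e)
print(p)  # 0.8*0.9*0.7*0.6 * 0.5*0.4*0.3 = 0.018144
```

The first product is the Markov-chain prior over the tag sequence and the second is the channel term, mirroring the noisy-channel decomposition described above.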
Tagging Problems & Hidden Markov Models---NLP learning notes (original)