This section is based on my notes from the Coursera NLP course by Michael Collins. Course link: https://class.coursera.org/nlangp-001
1. Tagging Problems
1.1 POS Tagging
Problem description
Input: profits soared at Boeing Co., easily topping forecasts on Wall Street, as their CEO Alan Mulally announced first quarter results.
Output: profits/N soared/V at/P Boeing/N Co./N ,/, easily/ADV topping/V forecasts/N on/P Wall/N Street/N ,/, as/P their/POSS CEO/N Alan/N Mulally/N announced/V first/ADJ quarter/N results/N ./.
PS: N = noun; V = verb; P = preposition; ADV = adverb; ADJ = adjective; POSS = possessive.
Given a training set (x(i), y(i)) for i = 1, ..., m, where x(i) is the sentence x1(i) ... xni(i), y(i) is the tag sequence y1(i) ... yni(i), and ni is the length of the i-th sample; xj(i) is the j-th word of sentence x(i), and yj(i) is its tag. A standard example is the Penn WSJ Treebank corpus. POS tagging is difficult because of (1) ambiguity (the same word can take different tags in different contexts) and (2) rare words (words that do not appear in the training corpus). Taggers therefore exploit both word-level statistics and grammatical knowledge: for example, "quarter" generally appears as a noun rather than a verb, and the structure D N V is more common in sentences than D V N.
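As a minimal sketch of the notation above (the sentences and tags are my own made-up toy data, not from the course), the training set is a list of (sentence, tag-sequence) pairs, and the ambiguity in point (1) shows up as a word being seen with more than one tag:

```python
# A toy training set: each sample pairs a sentence x = x1...xn with a
# tag sequence y = y1...yn of equal length. Made-up data for illustration.
from collections import defaultdict

training_set = [
    (["the", "dog", "saw", "the", "cat"], ["D", "N", "V", "D", "N"]),
    (["the", "saw", "laughs"], ["D", "N", "V"]),
]

# Ambiguity (point 1): collect every tag each word appears with.
tags_seen = defaultdict(set)
for x, y in training_set:
    for word, tag in zip(x, y):
        tags_seen[word].add(tag)

print(sorted(tags_seen["saw"]))  # ['N', 'V'] -- "saw" is ambiguous
```

A word such as "saw" here gets two possible tags, which is exactly what makes the tagging problem non-trivial; a word absent from `tags_seen` entirely is the rare-word case of point (2).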
1.2 named-entity Recognition
Problem description
Input: profits soared at Boeing Co., easily topping forecasts on Wall Street, as their CEO Alan Mulally announced first quarter results.
Output1: profits soared at [Company Boeing Co.], easily topping forecasts on [Location Wall Street], as their CEO [Person Alan Mulally] announced first quarter results.
This output marks named entities such as person, location, company, and so on. Unlike POS tagging, each word is either tagged NA (not part of a named entity) or tagged as part of a named entity (e.g. SC = start of a company name, CC = continuation of a company name, ...). That is, the tagged output looks like:
Output2: profits/NA soared/NA at/NA Boeing/SC Co./CC ,/NA easily/NA topping/NA forecasts/NA on/NA Wall/SL Street/CL ,/NA as/NA their/NA CEO/NA Alan/SP Mulally/CP announced/NA first/NA quarter/NA results/NA ./NA
PS: NA = no entity; SC = start company; CC = continue company; SL = start location; CL = continue location; SP = start person; CP = continue person; ...
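The span-to-tag encoding above can be sketched in a few lines (the helper `encode_entities` and its span format are my own, introduced only for illustration):

```python
# Convert entity spans into the SC/CC-style per-word tags of Output2.
# spans: list of (start, end, code) with end exclusive and code a single
# letter like "C" (company), "L" (location), "P" (person).
def encode_entities(words, spans):
    tags = ["NA"] * len(words)
    for start, end, code in spans:
        tags[start] = "S" + code          # start of the entity
        for i in range(start + 1, end):
            tags[i] = "C" + code          # continuation of the entity
    return tags

words = ["profits", "soared", "at", "Boeing", "Co.", ","]
print(encode_entities(words, [(3, 5, "C")]))
# ['NA', 'NA', 'NA', 'SC', 'CC', 'NA']
```

This makes the point in the text concrete: once entities are encoded this way, NER becomes a per-word tagging problem of exactly the same shape as POS tagging.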
2. Generative Models
2.1 Hidden Markov Models
Training examples: (x(1), y(1)), ..., (x(m), y(m)); from these samples we want to learn a function f: x → y.
Method one: conditional model
Given a test sample x, the model outputs:
f(x) = arg max_y p(y|x)
Method Two: Generative model
Apply the joint probability distribution p(x, y), with p(x, y) = p(y) p(x|y),
where p(y) is the prior probability and p(x|y) is the conditional probability of x given label y.
We can then use Bayes' rule to obtain the conditional probability p(y|x):
p(y|x) = p(y) p(x|y) / p(x)
where
p(x) = sum_y p(y) p(x|y)
So
f(x) = arg max_y p(y|x) = arg max_y p(y) p(x|y) / p(x) = arg max_y p(y) p(x|y)
since p(x) does not depend on y and can be dropped inside the arg max.
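A toy generative classifier makes the arg max concrete (the probability tables below are made-up numbers, not estimates from any corpus):

```python
# Pick the y maximizing p(y) * p(x|y). The denominator p(x) is the same
# for every y, so it never needs to be computed.
prior = {"N": 0.6, "V": 0.4}                      # p(y), assumed numbers
likelihood = {                                     # p(x|y), assumed numbers
    ("quarter", "N"): 0.01,
    ("quarter", "V"): 0.001,
}

def classify(x):
    return max(prior, key=lambda y: prior[y] * likelihood.get((x, y), 0.0))

print(classify("quarter"))  # 'N'
```

Here p(N) p(quarter|N) = 0.006 beats p(V) p(quarter|V) = 0.0004, matching the earlier observation that "quarter" usually appears as a noun.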
2.2 Generative Tagging Models
V: the word set, e.g. V = {the, dog, saw, cat, laughs, ...}
K: the tag set
S: the set of sequence/tag-sequence pairs <x1, ..., xn, y1, ..., yn>
Given a generative tagging model p, the tagging result for x1...xn is the y1...yn given by:
f(x1...xn) = arg max_{y1...yn} p(x1...xn, y1...yn)
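The arg max over tag sequences can be sketched by brute force (the `joint_prob` scoring function below is a made-up stand-in for a real model; this enumeration is exponential in n, which is why the Viterbi algorithm is used in practice):

```python
# Brute-force tagging under a generative model: enumerate every tag
# sequence y1...yn over the tag set and keep the one maximizing p(x, y).
from itertools import product

def tag(words, tagset, joint_prob):
    best = max(product(tagset, repeat=len(words)),
               key=lambda y: joint_prob(words, y))
    return list(best)

# Toy joint probability: a product of per-word scores (an assumption
# made only for this illustration, not the HMM form defined below).
score = {("the", "D"): 0.5, ("dog", "N"): 0.4, ("dog", "V"): 0.1}

def joint_prob(words, y):
    p = 1.0
    for w, t in zip(words, y):
        p *= score.get((w, t), 0.0)
    return p

print(tag(["the", "dog"], ["D", "N", "V"], joint_prob))  # ['D', 'N']
```

Any model that assigns a joint probability to a (sentence, tag-sequence) pair can be plugged in as `joint_prob`; the trigram HMM of the next section is one such model.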
2.3 Trigram Hidden Markov Models (trigram HMMs)
q(s|u,v): the probability of tag s following the bigram (u,v), forming the trigram (u,v,s); s belongs to K ∪ {STOP}, and u, v belong to K ∪ {*}.
e(x|s): the probability of observing x in state s; x belongs to V, and s belongs to K.
S: the set of sequence/tag-sequence pairs <x1...xn, y1...yn+1>, with yn+1 = STOP
PS: y0 = y-1 = *
For example, if n = 3, x1x2x3 = the dog laughs, and y1y2y3y4 = D N V STOP, then:
p(x1...x3, y1...y4) = q(D|*,*) q(N|*,D) q(V|D,N) q(STOP|N,V) × e(the|D) e(dog|N) e(laughs|V)
The model is a noisy-channel model: q(D|*,*) q(N|*,D) q(V|D,N) q(STOP|N,V) is a second-order Markov chain giving the prior probability of the tag sequence D N V STOP, and e(the|D) e(dog|N) e(laughs|V) is the conditional probability p(the dog laughs | D N V STOP).
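The example can be computed directly (the q and e values below are made-up numbers chosen only to illustrate the formula):

```python
# Joint probability under a trigram HMM: the product of trigram
# transition terms q(s|u,v) and emission terms e(x|s).
def hmm_joint(words, tags, q, e):
    padded = ["*", "*"] + tags            # so that y0 = y-1 = '*'
    p = 1.0
    for i in range(2, len(padded)):       # trigram transitions, incl. STOP
        p *= q[(padded[i], padded[i - 2], padded[i - 1])]
    for word, tag in zip(words, tags):    # emissions (STOP emits nothing)
        p *= e[(word, tag)]
    return p

# Assumed parameter values, keyed as (s, u, v) and (x, s).
q = {("D", "*", "*"): 0.8, ("N", "*", "D"): 0.9,
     ("V", "D", "N"): 0.7, ("STOP", "N", "V"): 0.6}
e = {("the", "D"): 0.5, ("dog", "N"): 0.4, ("laughs", "V"): 0.3}

p = hmm_joint(["the", "dog", "laughs"], ["D", "N", "V", "STOP"], q, e)
print(p)  # 0.8*0.9*0.7*0.6 * 0.5*0.4*0.3 = 0.018144
```

The first product is the Markov-chain prior over the tag sequence and the second is the channel term, mirroring the noisy-channel decomposition described above.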
Tagging Problems & Hidden Markov Models---NLP learning notes (original)