Natural Language Processing 1: Markov Chains and Hidden Markov Models (HMM)

Source: Internet
Author: User

Statistics-based language models have a natural advantage over rule-based ones, and (Chinese) word segmentation is the foundation of Chinese natural language processing. In what follows we introduce statistics-based Chinese word segmentation and part-of-speech tagging. The plan is as follows: first introduce the basic concepts involved in Chinese text processing, and then analyze the principles behind some open-source, statistics-based Chinese word segmenters.

The basic concepts involved in Chinese word segmentation include the Markov chain, the Hidden Markov Model (HMM), the N-gram model, the Maximum Entropy Markov Model (MEMM), and the Conditional Random Field (CRF).

1. Markov Chain

Informally, a Markov chain is a sequence over a state space in which the current state depends only on the previous n (n = 1, 2, ...) states.

The specific definition is as follows:

A Markov chain is a sequence of random variables x1, x2, x3, ... with the Markov property: the next state depends only on the current state, not on the states before it.

The mathematical formula is as follows:

Pr(Xn+1 = x | X1 = x1, X2 = x2, ..., Xn = xn) = Pr(Xn+1 = x | Xn = xn)

The set of all possible values of Xn (n = 1, 2, 3, ...) is called the "state space", and the value of Xn is the state of the chain at time n.

A Markov chain is usually drawn as a directed graph: the states are the vertices of the graph, and the state transition probabilities label its edges, as shown in Figure 1.

Figure 1: A Markov chain with two states
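As a quick illustration of this graph view, here is a minimal sketch that samples a random walk through a hypothetical two-state chain. The states A/B and their transition probabilities are invented for the example; the point is only that each next state is sampled from the edges leaving the current vertex.

```python
import random

states = ["A", "B"]                 # hypothetical state space
P = {                               # hypothetical transition matrix (rows sum to 1)
    "A": {"A": 0.7, "B": 0.3},
    "B": {"A": 0.4, "B": 0.6},
}

def step(state, rng):
    """Sample the next state given only the current one (the Markov property)."""
    r = rng.random()
    cum = 0.0
    for nxt, p in P[state].items():
        cum += p
        if r < cum:
            return nxt
    return nxt  # guard against floating-point round-off

def walk(start, n, seed=0):
    """Generate a chain of n transitions starting from `start`."""
    rng = random.Random(seed)
    chain = [start]
    for _ in range(n):
        chain.append(step(chain[-1], rng))
    return chain

print(walk("A", 10))
```

Because the chain is memoryless, `step` never looks at anything but the current state.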

2. Hidden Markov Model

HMM definition

An HMM is a triple (π, A, B):

π: the initial state probability vector

A = (aij): the state transition probability matrix; aij = Pr(xi | xj)

B = (bij): the confusion (emission) matrix; bij = Pr(yi | xj)

All state transition probabilities and confusion probabilities are assumed to remain constant over time. This is also the most unrealistic assumption in the HMM.

An HMM is characterized by two kinds of states and three sets of probabilities.

Two kinds of states: observed and hidden.

Three sets of probabilities: the initial probabilities, the state transition probabilities, and the emission (confusion) probabilities that link the two kinds of states.

Example

We use a part-of-speech tagging example to illustrate the principle of HMM.

Observed sequence: He / is / computer / doctor.

Hidden states: pronoun, verb, noun

Assume that, based on corpus statistics, the transitions between the hidden states are as follows. This is called the state transition probability matrix.

 

         Pronoun   Verb    Noun
Pronoun  0.5       0.25    0.25
Verb     0.375     0.125   0.375
Noun     0.125     0.625   0.375

 

From the corpus we can likewise obtain the probabilities linking the hidden states to the observed words, i.e. the confusion (emission) matrix, as shown below:

 

         He     Is     Computer   Doctor
Pronoun  0.60   0.20   0.15       0.05
Verb     0.25   0.25   0.25       0.25
Noun     0.05   0.10   0.35       0.50

At the same time, we assume that the initial probability is as follows:

Pronoun   Verb   Noun
[0.63     0.17   0.20]

So far, we have a part-of-speech tagging HMM trained from corpus statistics.

What can we do with the HMM model?

(1) Evaluation: find the probability of an observed sequence under a known HMM. For example, we can evaluate the probability that the sequence "He is computer doctor" appears. The forward algorithm computes the probability that a given HMM generates an observed sequence.
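The forward recursion can be sketched in plain Python on the toy model above. State order is pronoun/verb/noun and the observation indices follow the confusion-matrix columns (He, is, computer, doctor); the numbers are copied from the tables in this article, not from a real corpus.

```python
pi = [0.63, 0.17, 0.20]            # initial probabilities from the text
A = [[0.5,   0.25,  0.25],         # state transition matrix from the text
     [0.375, 0.125, 0.375],
     [0.125, 0.625, 0.375]]
B = [[0.60, 0.20, 0.15, 0.05],     # confusion (emission) matrix from the text
     [0.25, 0.25, 0.25, 0.25],
     [0.05, 0.10, 0.35, 0.50]]

def forward(obs):
    """Return Pr(obs | HMM) by summing over all hidden state paths."""
    n = len(pi)
    # Initialization: alpha[i] = pi_i * b_i(o_0)
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]
    # Induction: alpha'[j] = (sum_i alpha[i] * a_ij) * b_j(o_t)
    for o in obs[1:]:
        alpha = [sum(alpha[i] * A[i][j] for i in range(n)) * B[j][o]
                 for j in range(n)]
    # Termination: sum over the final states
    return sum(alpha)

print(forward([0, 1, 2, 3]))       # "He is computer doctor"
```

Each step costs O(n^2), so the whole evaluation is O(n^2 T) instead of the O(n^T) cost of enumerating every hidden path.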

(2) Decoding: find the hidden state sequence that generated an observed sequence. For example, from the sequence "He is computer doctor" we can recover the corresponding pronoun/verb/noun tag sequence. This problem is solved with the Viterbi algorithm.
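A minimal Viterbi sketch over the same toy model follows. Note that with these illustrative numbers the probability-maximizing path need not match the linguistically expected tags; the point here is the recursion and the backtracking, not tagging accuracy.

```python
pi = [0.63, 0.17, 0.20]            # initial probabilities from the text
A = [[0.5,   0.25,  0.25],         # state transition matrix from the text
     [0.375, 0.125, 0.375],
     [0.125, 0.625, 0.375]]
B = [[0.60, 0.20, 0.15, 0.05],     # confusion (emission) matrix from the text
     [0.25, 0.25, 0.25, 0.25],
     [0.05, 0.10, 0.35, 0.50]]
tags = ["pronoun", "verb", "noun"]

def viterbi(obs):
    """Return the most probable hidden state path for obs, and its probability."""
    n = len(pi)
    # delta[i]: probability of the best path so far that ends in state i
    delta = [pi[i] * B[i][obs[0]] for i in range(n)]
    back = []                      # backpointers, one list per time step
    for o in obs[1:]:
        ptr, new = [], []
        for j in range(n):
            best = max(range(n), key=lambda i: delta[i] * A[i][j])
            ptr.append(best)
            new.append(delta[best] * A[best][j] * B[j][o])
        back.append(ptr)
        delta = new
    # Backtrack from the best final state
    last = max(range(n), key=lambda i: delta[i])
    path = [last]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    path.reverse()
    return [tags[i] for i in path], delta[last]

path, p = viterbi([0, 1, 2, 3])    # "He is computer doctor"
print(path, p)
```

The structure mirrors the forward algorithm, with max replacing the sum and backpointers recording which predecessor achieved each maximum.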

A third question remains: how is the trained hidden Markov model obtained in the first place, so that evaluation and decoding can be run against it?

This is the most difficult of the HMM problems: given an observed sequence (over a known alphabet) and the associated set of hidden states, estimate the most suitable hidden Markov model, i.e. determine the (π, A, B) triple that best describes the known sequences. When the matrices A and B cannot be measured (estimated) directly, the forward-backward algorithm is used for learning (parameter estimation), which is also common in practical applications.

Because learning directly with the forward-backward algorithm is not very accurate, the common practice is to build the HMM from a manually tagged corpus. Note, however, that manual corpus tagging requires a large amount of work.
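To show what forward-backward learning looks like, here is a sketch of a single Baum-Welch re-estimation step, run on the toy model from this article with one short sequence. This is only an illustration of the E-step/M-step mechanics: a real trainer iterates this over many sequences and adds numerical scaling.

```python
pi = [0.63, 0.17, 0.20]            # initial probabilities from the text
A = [[0.5,   0.25,  0.25],         # state transition matrix from the text
     [0.375, 0.125, 0.375],
     [0.125, 0.625, 0.375]]
B = [[0.60, 0.20, 0.15, 0.05],     # confusion (emission) matrix from the text
     [0.25, 0.25, 0.25, 0.25],
     [0.05, 0.10, 0.35, 0.50]]

def baum_welch_step(pi, A, B, obs):
    """One forward-backward (Baum-Welch) re-estimation of (pi, A, B)."""
    n, T = len(pi), len(obs)
    # Forward pass: alpha[t][i] = Pr(o_0..o_t, state i at time t)
    alpha = [[pi[i] * B[i][obs[0]] for i in range(n)]]
    for t in range(1, T):
        alpha.append([sum(alpha[t - 1][i] * A[i][j] for i in range(n)) * B[j][obs[t]]
                      for j in range(n)])
    # Backward pass: beta[t][i] = Pr(o_{t+1}..o_{T-1} | state i at time t)
    beta = [[1.0] * n for _ in range(T)]
    for t in range(T - 2, -1, -1):
        for i in range(n):
            beta[t][i] = sum(A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j]
                             for j in range(n))
    prob = sum(alpha[T - 1])
    # E-step: posterior state occupancies and transition posteriors
    gamma = [[alpha[t][i] * beta[t][i] / prob for i in range(n)] for t in range(T)]
    xi = [[[alpha[t][i] * A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j] / prob
            for j in range(n)] for i in range(n)] for t in range(T - 1)]
    # M-step: re-estimate the parameters from expected counts
    new_pi = gamma[0][:]
    new_A = [[sum(xi[t][i][j] for t in range(T - 1)) /
              sum(gamma[t][i] for t in range(T - 1)) for j in range(n)]
             for i in range(n)]
    new_B = [[sum(g[i] for t, g in enumerate(gamma) if obs[t] == k) /
              sum(g[i] for g in gamma) for k in range(len(B[0]))]
             for i in range(n)]
    return new_pi, new_A, new_B

new_pi, new_A, new_B = baum_welch_step(pi, A, B, [0, 1, 2, 3])  # "He is computer doctor"
print(new_pi)
```

One useful property: whatever the input numbers, the re-estimated rows of A and B and the new initial vector are proper probability distributions, because they are ratios of expected counts.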

Disadvantages of the Hidden Markov Model:

The HMM makes two assumptions: first, that the output observations are strictly independent of one another; second, that during state transitions the current state depends only on the previous state (a first-order Markov model).

Consider the earlier example again:

Observed sequence: He / is / computer / doctor.

Hidden states: pronoun, verb, noun

For example, when calculating Pr(doctor | noun), the context word "computer" preceding "doctor" is not taken into account; likewise, the probability that the first "noun" appears depends only on the preceding "verb". The model's ability to capture context information is therefore limited.
