"NLP" revealing Markov Model mystery series article (iii)

The Forward Algorithm for Solving the Likelihood Problem of the Hidden Markov Model

Bai Ningsu

July 11, 2016 22:54:57

Abstract: I first encountered the definition of the Markov model in Mr. Wu Jun's book "The Beauty of Mathematics", where it initially seemed esoteric and of little practical use. Only when we study natural language processing and actually use the hidden Markov model do we come to appreciate the power of this model. The Markov model is a workhorse of sequence classification, with applications such as part-of-speech tagging, speech recognition, sentence segmentation, grapheme-to-phoneme conversion, partial parsing, chunking, named entity recognition, and information extraction. It is also widely used in science, engineering, biotechnology, public utilities, channel coding, and many other fields. This series is organized as follows: the first article introduces Markov himself and Markov chains; the second introduces the Markov chain (the visible Markov model), the hidden Markov model, and its three fundamental problems (likelihood, decoding, and parameter learning); the third through fifth articles introduce the algorithms for those three problems (the forward algorithm, the Viterbi algorithm, and the forward-backward algorithm). Finally, thanks go to Mr. Feng Zhiwei's natural language processing tutorial; Professor Feng has studied natural language for several decades and has made remarkable achievements in this field. (This article is original; when reprinting, please cite the source: The Forward Algorithm for Solving the Likelihood Problem of the Hidden Markov Model.)

Directory

"Natural language Processing: Markov model (i)": initial knowledge of Markov and Markov chains

"Natural language Processing: Markov model (ii)": Markov model and hidden Markov model

"Natural language Processing: Markov model (three)": A forward algorithm to solve the likelihood problem of hidden Markov model

A Brief Profile of Markov

Andrey Markov was a Russian mathematician, holder of a doctorate in physics and mathematics, an academician of the St. Petersburg Academy of Sciences, and a representative figure of the St. Petersburg school of mathematics, known for his work in number theory and probability theory; his principal work is "The Calculus of Probabilities". He won a gold medal in 1878 and was awarded the title of Merited Professor in 1905. In number theory he studied continued fractions and indefinite quadratic forms, solving many difficult problems. In probability theory he developed the method of moments and extended the range of application of the law of large numbers and the central limit theorem. Markov's most important contribution is the Markov chain, a general scheme he proposed between 1906 and 1912 for studying natural processes by means of mathematical analysis; at the same time he initiated the study of a memoryless stochastic process, the Markov process. Observing repeated experiments, he found that the state of a system after its n-th transition is often determined by the outcome of the (n-1)-th trial. Markov showed that, for such a system, there is a transition probability governing the change from one state to another, and that this probability can be derived from the immediately preceding state alone, independent of the system's original state and of the course of the process before the transition. Markov chain theory and methods are now widely applied in natural science, engineering technology, and public utilities.

1 Principle of the Forward Algorithm

The forward algorithm solves Problem 1 (the likelihood problem): given an HMM λ = (A, B) and an observation sequence O, determine the likelihood P(O|λ).

For a Markov chain, the surface observations and the underlying states are one and the same: to score the ice cream observation sequence "3 1 3" we only need to follow the states labeled "3 1 3" and multiply the probabilities on the arcs between them. The hidden Markov model is not so simple, because the states are hidden: we do not know what the hidden state sequence is.

To simplify the problem: suppose we already knew the weather and also knew how many ice creams Xiao Ming ate, and we wanted the likelihood of the observation sequence. For example, for the given hidden state sequence "hot hot cold", we compute the output likelihood of the observation sequence "3 1 3".

How is this calculated? First, in the hidden Markov model each hidden state produces only a single observation, a one-to-one mapping, so the hidden state sequence has the same length as the observation sequence. Given this one-to-one mapping and the Markov assumption, for a particular hidden state sequence Q = q_1, q_2, …, q_T and an observation sequence O = o_1, o_2, …, o_T, the likelihood of the observation sequence is:

P(O|Q) = ∏_{i=1}^{T} P(o_i|q_i)

So, for the hidden state sequence "hot hot cold", the output likelihood of the ice cream observation sequence "3 1 3" is:

P(3 1 3 | hot hot cold) = P(3|hot) × P(1|hot) × P(3|cold) = 0.4 × 0.2 × 0.1 = 0.008

In fact, the hidden state sequence "hot hot cold" was only our assumption. We do not know the hidden state sequence, so we must consider every possible weather sequence and compute all the corresponding joint probabilities, which makes the calculation particularly burdensome.

Let us compute the joint probability of a weather sequence Q producing a particular sequence of ice cream events O:

P(O, Q) = P(O|Q) × P(Q) = ∏_{i=1}^{T} P(o_i|q_i) × ∏_{i=1}^{T} P(q_i|q_{i-1})

For instance, if one possible hidden sequence is "hot hot cold", then the joint probability of our ice cream observation sequence "3 1 3" with that hidden state sequence is:

P(3 1 3, hot hot cold) = P(hot|start) × P(hot|hot) × P(cold|hot) × P(3|hot) × P(1|hot) × P(3|cold) = 0.8 × 0.7 × 0.3 × 0.4 × 0.2 × 0.1 = 0.001344

P(3 1 3) = P(3 1 3, cold cold cold) + P(3 1 3, cold cold hot) + P(3 1 3, cold hot cold) + P(3 1 3, cold hot hot) + P(3 1 3, hot cold cold) + P(3 1 3, hot cold hot) + P(3 1 3, hot hot cold) + P(3 1 3, hot hot hot)

For N hidden states and an observation sequence of T observations, there are N^T possible hidden sequences. In practice T is often very large; text processing, for example, may involve tens or hundreds of thousands of words, so the amount of computation rises exponentially. In the hidden Markov model, the forward algorithm replaces this exponentially complex computation and greatly reduces the cost: the complexity of the forward algorithm is only O(N²T).
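To make the contrast concrete, here is a minimal brute-force sketch in Python that enumerates all N^T hidden sequences for the ice cream example. The probability values not stated explicitly in this article (the transitions out of cold and some of the emission entries) are assumptions consistent with the worked numbers in the next section, and all names are illustrative.

    from itertools import product

    # Ice cream HMM from this article's running example.
    # Entries not quoted in the article text are assumed values.
    states = ["hot", "cold"]
    pi = {"hot": 0.8, "cold": 0.2}            # P(first state | start)
    A = {"hot":  {"hot": 0.7, "cold": 0.3},   # transition probabilities P(next | current)
         "cold": {"hot": 0.4, "cold": 0.6}}
    B = {"hot":  {1: 0.2, 2: 0.4, 3: 0.4},    # emission probabilities P(ice creams | state)
         "cold": {1: 0.5, 2: 0.4, 3: 0.1}}

    def brute_force_likelihood(obs):
        """Sum the joint probability P(O, Q) over all N**T hidden sequences Q."""
        total = 0.0
        for q in product(states, repeat=len(obs)):
            p = pi[q[0]] * B[q[0]][obs[0]]               # start transition and first emission
            for t in range(1, len(obs)):
                p *= A[q[t - 1]][q[t]] * B[q[t]][obs[t]]  # next transition and emission
            total += p
        return total

    print(brute_force_likelihood([3, 1, 3]))  # about 0.026264, summed over 2**3 = 8 paths

The inner loop runs once per hidden sequence, so with N states and T observations this enumeration costs on the order of N^T work, which is exactly the exponential growth described above.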

2 A Worked Example of the Forward Algorithm

The forward algorithm is a dynamic programming algorithm: while computing the probability of the observation sequence, it uses a table (a trellis) to store intermediate values. Like the computation above, it obtains the observation probability by summing the probabilities over all possible hidden state paths that could generate the observation sequence, but it does so implicitly through the trellis. In the forward trellis, the observation sequence is laid out horizontally and the state sequence vertically.

Below is an example of the forward trellis for computing the likelihood of the observation sequence "3 1 3", in which:

Horizontal: the observation sequence, over time. Vertical: the state sequence, over the state space. Squares: observation states. Circles: hidden states. Dashed lines: invalid transitions. Values on the solid lines: the probabilities on the arcs.

Each cell of the trellis, α_t(j), represents the probability of being in state j after the first t observations, given the automaton λ: α_t(j) = P(o_1, o_2, …, o_t, q_t = j | λ), where q_t = j means that the t-th state in the sequence is state j. For example, α_1(1) is the probability of being in state 1 at time 1 having observed the number 3.

The forward probability at each step is computed from the previous column of the trellis by the recursion α_t(j) = Σ_{i=1}^{N} α_{t-1}(i) · a_{ij} · b_j(o_t). The three factors of this formula are:

α_{t-1}(i): the previous forward path probability, from the previous time step;

a_{ij}: the transition probability from the previous state q_i to the current state q_j;

b_j(o_t): the state observation likelihood of the observation o_t given the current state j.

The forward trellis is as follows:

Forward probability at time 1, state 1 (cold):

α_1(C) = P(C|start) × P(3|C) = 0.2 × 0.1 = 0.02

(the likelihood of eating 3 ice creams starting from the state cold is 0.02)

Forward probability at time 1, state 2 (hot):

α_1(H) = P(H|start) × P(3|H) = 0.8 × 0.4 = 0.32

(the likelihood of eating 3 ice creams starting from the state hot is 0.32)

Forward probability at time 2, state 1 (cold):

α_2(C) = α_1(C) × P(C|C) × P(1|C) + α_1(H) × P(C|H) × P(1|C) = 0.02 × 0.6 × 0.5 + 0.32 × 0.3 × 0.5 = 0.054

(the likelihood of observing the ice creams "3 1" along the weather paths start→cold→cold and start→hot→cold is 0.054)

Forward probability at time 2, state 2 (hot):

α_2(H) = α_1(C) × P(H|C) × P(1|H) + α_1(H) × P(H|H) × P(1|H) = 0.02 × 0.4 × 0.2 + 0.32 × 0.7 × 0.2 = 0.0464

(the likelihood of observing the ice creams "3 1" along the weather paths start→cold→hot and start→hot→hot is 0.0464)

In the same way we can calculate the forward probabilities at time step 3 for state 1 and state 2, and so on until the end of the sequence. Clearly, each value the forward algorithm computes is a local observation likelihood, and caching these local likelihoods is far more useful than the global observation likelihood expressed through the joint probabilities of whole sequences: each new column of the trellis is built from the previous column instead of re-deriving every path from scratch.
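As a quick numerical check, the following sketch recomputes the four trellis cells above, reusing the states, pi, A, B dictionaries (including the assumed entries) from the brute-force sketch in section 1:

    # Reuses states, pi, A, B from the brute-force sketch in section 1.
    alpha1 = {j: pi[j] * B[j][3] for j in states}    # time 1, observation 3
    print(alpha1)                                    # hot: ~0.32, cold: ~0.02

    alpha2 = {j: sum(alpha1[i] * A[i][j] for i in states) * B[j][1]
              for j in states}                       # time 2, observation 1
    print(alpha2)                                    # hot: ~0.0464, cold: ~0.054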

3 Definition of the Forward Algorithm

Recursive definition of the forward algorithm:

1. Initialization: α_1(j) = a_{0j} · b_j(o_1), for 1 ≤ j ≤ N

2. Recursion: α_t(j) = Σ_{i=1}^{N} α_{t-1}(i) · a_{ij} · b_j(o_t), for 1 ≤ j ≤ N and 1 < t ≤ T

3. Termination: P(O|λ) = Σ_{i=1}^{N} α_T(i)

where a_{0j} is the initial transition probability from the start state to state j, a_{ij} is the transition probability from state i to state j, and b_j(o_t) is the observation likelihood of o_t in state j.
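Translated directly into code, the three steps above give the following self-contained Python sketch (function and variable names are illustrative; the parameters repeat the ice cream example, including the assumed values noted in section 1):

    # Ice cream HMM parameters (entries not quoted in the article are assumed).
    states = ["hot", "cold"]
    pi = {"hot": 0.8, "cold": 0.2}
    A = {"hot":  {"hot": 0.7, "cold": 0.3},
         "cold": {"hot": 0.4, "cold": 0.6}}
    B = {"hot":  {1: 0.2, 2: 0.4, 3: 0.4},
         "cold": {1: 0.5, 2: 0.4, 3: 0.1}}

    def forward_likelihood(obs):
        """Compute P(O | lambda) with the forward algorithm in O(N^2 * T) time."""
        # 1. Initialization: alpha_1(j) = a_0j * b_j(o_1)
        alpha = {j: pi[j] * B[j][obs[0]] for j in states}
        # 2. Recursion: alpha_t(j) = sum_i alpha_{t-1}(i) * a_ij * b_j(o_t)
        for o in obs[1:]:
            alpha = {j: sum(alpha[i] * A[i][j] for i in states) * B[j][o]
                     for j in states}
        # 3. Termination: P(O | lambda) = sum_i alpha_T(i)
        return sum(alpha.values())

    print(forward_likelihood([3, 1, 3]))  # about 0.026264, matching the brute-force sum

Each time step reuses only the N cached values from the previous step; that is how the trellis replaces the enumeration of N^T paths.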

4 References

"1" The basic christopher.manning of natural language processing, such as the Law of Wan Chun

"2" A concise tutorial on natural language processing Feng Zhiwei

"3" The beauty of mathematics Wu

"4" Viterbi algorithm analysis article Wang Yachang

Statement: In each chapter of this series I aim to organize the main threads and write clearly and plainly, partly by consulting the relevant literature and partly by arranging it according to my own understanding, avoiding clutter so that readers of each article can grasp the core ideas and then go on to read the related literature systematically. Also, learn to extrapolate rather than staring at a definition or a single example. For instance, this article uses ice cream quantities (observed values) and the weather (hidden values), and a reader may well ask what use that is. Replace the ice cream quantities with Chinese text or speech (the observation sequence) and the hot and cold weather with English text or phonetic transcriptions (the hidden sequence), and solving this problem becomes solving text translation, speech recognition, natural language understanding, and so on. Solve natural language recognition and understanding, apply it to today's robots or other devices, and have we not achieved the goal of practical use connected to real life? This article is original; when reprinting, please cite the source: The Forward Algorithm for Solving the Likelihood Problem of the Hidden Markov Model.

"NLP" revealing Markov Model mystery series article (iii)

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.