Conditional random field (CRF) - 4 - Learning methods and prediction algorithms (the Viterbi algorithm)


Statement:

1. This article is my study summary of Li Hang's "Statistical Learning Methods" (PDF). It is not for commercial use; reprints are welcome, but please credit the source (i.e., this address).

2. Because I had forgotten a lot of the math when I started, I consulted many references to understand the material, so small parts of this article may echo other posts. If you are one of those original authors, send me a private message and I will add a link to your post below.

3. If anything here is wrong or inaccurate, please correct me.

4. I hope this can be of some help to you.

Learning methods

The conditional random field model is in fact a log-linear model defined on sequential data, and it can be learned by maximum likelihood estimation or by regularized maximum likelihood estimation.

Concrete optimization algorithms include the improved iterative scaling method (IIS), gradient descent, and quasi-Newton methods.

Improved iterative scaling (IIS)

Given the training data set, we can obtain the empirical probability distribution $\tilde{P}(x, y)$.

The model parameters can be learned by maximizing the log-likelihood of the training data.

The log-likelihood function of the training data is

$$L(w) = L_{\tilde{P}}(P_w) = \log \prod_{x,y} P_w(y \mid x)^{\tilde{P}(x,y)} = \sum_{x,y} \tilde{P}(x, y) \log P_w(y \mid x)$$

When $P_w$ is the conditional random field model given by (11.15) and (11.16),

$$P_w(y \mid x) = \frac{\exp\left(\sum_{k=1}^{K} w_k f_k(y, x)\right)}{Z_w(x)}, \qquad Z_w(x) = \sum_{y} \exp\left(\sum_{k=1}^{K} w_k f_k(y, x)\right),$$

the log-likelihood function becomes

$$L(w) = \sum_{x,y} \tilde{P}(x, y) \log P_w(y \mid x) = \sum_{j=1}^{N} \sum_{k=1}^{K} w_k f_k(y_j, x_j) - \sum_{j=1}^{N} \log Z_w(x_j),$$

where $(x_j, y_j)$, $j = 1, \ldots, N$, are the training samples.

IIS iteratively maximizes a lower bound of the log-likelihood function, thereby driving up the log-likelihood itself.

Assume the model's current parameter vector is $w = (w_1, w_2, \ldots, w_K)^T$ and the increment vector is $\delta = (\delta_1, \delta_2, \ldots, \delta_K)^T$, so the updated parameter vector is $w + \delta = (w_1 + \delta_1, w_2 + \delta_2, \ldots, w_K + \delta_K)^T$. In each iteration step, IIS obtains $\delta = (\delta_1, \delta_2, \ldots, \delta_K)^T$ by solving equations (11.36) and (11.37) below, one component at a time.

The update equation for the transition features $t_k$ is:

$$E_{\tilde{P}}[t_k] = \sum_{x,y} \tilde{P}(x) P(y \mid x) \sum_{i=1}^{n+1} t_k(y_{i-1}, y_i, x, i) \exp\big(\delta_k T(x, y)\big), \quad k = 1, 2, \ldots, K_1 \tag{11.36}$$

The update equation for the state features $s_l$ is:

$$E_{\tilde{P}}[s_l] = \sum_{x,y} \tilde{P}(x) P(y \mid x) \sum_{i=1}^{n} s_l(y_i, x, i) \exp\big(\delta_{K_1 + l} T(x, y)\big), \quad l = 1, 2, \ldots, K_2 \tag{11.37}$$

Here $T(x, y)$ is the total count of all features appearing in the data $(x, y)$:

$$T(x, y) = \sum_k f_k(y, x) = \sum_{k=1}^{K} \sum_{i=1}^{n+1} f_k(y_{i-1}, y_i, x, i)$$

Putting this together, the algorithm is as follows.

Algorithm: the improved iterative scaling method for learning the conditional random field model

Input: feature functions $t_1, t_2, \ldots, t_{K_1}$, $s_1, s_2, \ldots, s_{K_2}$; empirical distribution $\tilde{P}(x, y)$

Output: parameter estimates $\hat{w}$; model $P_{\hat{w}}$.

Procedure:

(1) For all $k \in \{1, 2, \ldots, K\}$, initialize $w_k = 0$.

(2) For each $k \in \{1, 2, \ldots, K\}$: (a) let $\delta_k$ be the solution of equation (11.36) for the transition features ($1 \le k \le K_1$) or of equation (11.37) for the state features ($k = K_1 + l$, $1 \le l \le K_2$); (b) update $w_k \leftarrow w_k + \delta_k$.

(3) If not all $w_k$ have converged, repeat step (2).
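Equations (11.36) and (11.37) generally have no closed-form solution and must be solved numerically (e.g., by Newton's method). In the special case where $T(x, y)$ equals a constant $S$ for every sample (which can be forced by adding a slack feature), each $\delta_k$ has the closed form $\delta_k = \frac{1}{S} \log \frac{E_{\tilde{P}}[f_k]}{E_P[f_k]}$. Below is a minimal Python sketch of one update sweep under that assumption; the names `feature_counts` and `model_expectations` are my own placeholders, and computing $E_{P_w}[f_k]$ for a real CRF would require the forward-backward algorithm.

```python
import numpy as np

def iis_step(w, feature_counts, model_expectations, S):
    """One IIS sweep under the constant T(x, y) = S assumption.

    w                 : (K,) current parameter vector
    feature_counts    : (N, K) array, f_k evaluated on each training pair
    model_expectations: callable w -> (K,) vector of E_{P_w}[f_k]
                        (in a real CRF, computed via forward-backward)
    S                 : the constant total feature count T(x, y)
    """
    emp = feature_counts.mean(axis=0)   # empirical expectations E_emp[f_k]
    mod = model_expectations(w)         # model expectations under current w
    delta = np.log(emp / mod) / S       # closed-form increment delta_k
    return w + delta                    # updated parameters w + delta
```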

Quasi-Newton method

For the conditional random field model

$$P_w(y \mid x) = \frac{\exp\left(\sum_{k=1}^{K} w_k f_k(x, y)\right)}{\sum_y \exp\left(\sum_{k=1}^{K} w_k f_k(x, y)\right)},$$

the optimization objective function of learning is

$$\min_{w} f(w) = \sum_{x} \tilde{P}(x) \log \sum_{y} \exp\left(\sum_{k=1}^{K} w_k f_k(x, y)\right) - \sum_{x,y} \tilde{P}(x, y) \sum_{k=1}^{K} w_k f_k(x, y)$$

Its gradient function is

$$g(w) = \sum_{x,y} \tilde{P}(x) P_w(y \mid x) f(x, y) - E_{\tilde{P}}(f)$$

The BFGS algorithm, a quasi-Newton method, is as follows:

Algorithm: the BFGS algorithm for learning the conditional random field model

Input: feature functions $f_1, f_2, \ldots, f_K$; empirical distribution $\tilde{P}(x, y)$

Output: optimal parameter value $\hat{w}$; optimal model $P_{\hat{w}}(y \mid x)$

(1) Choose an initial point $w^{(0)}$, take $B_0$ to be a positive definite symmetric matrix, and set $k = 0$.

(2) Compute $g_k = g(w^{(k)})$. If $g_k = 0$, stop; otherwise go to (3).

(3) Obtain $p_k$ from $B_k p_k = -g_k$.

(4) One-dimensional search: find $\lambda_k$ such that $f(w^{(k)} + \lambda_k p_k) = \min_{\lambda \ge 0} f(w^{(k)} + \lambda p_k)$.

(5) Set $w^{(k+1)} = w^{(k)} + \lambda_k p_k$.

(6) Compute $g_{k+1} = g(w^{(k+1)})$. If $g_{k+1} = 0$, stop; otherwise compute

$$B_{k+1} = B_k + \frac{y_k y_k^T}{y_k^T \delta_k} - \frac{B_k \delta_k \delta_k^T B_k}{\delta_k^T B_k \delta_k},$$

where $y_k = g_{k+1} - g_k$ and $\delta_k = w^{(k+1)} - w^{(k)}$.

(7) Set $k = k + 1$ and go to (3).
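In practice there is little reason to hand-roll BFGS: you can hand the objective $f(w)$ and gradient $g(w)$ to an off-the-shelf quasi-Newton routine. A minimal sketch using `scipy.optimize.minimize` is below; `neg_log_likelihood` and `neg_gradient` are hypothetical helpers you would implement yourself (for a real CRF, via the forward-backward algorithm).

```python
import numpy as np
from scipy.optimize import minimize

def fit_crf(neg_log_likelihood, neg_gradient, K):
    """Fit CRF weights with BFGS.

    neg_log_likelihood: callable w -> f(w), the objective above
    neg_gradient      : callable w -> g(w), its gradient
    K                 : number of feature functions / parameters
    """
    w0 = np.zeros(K)                       # initial point w^(0)
    result = minimize(neg_log_likelihood, w0,
                      jac=neg_gradient, method="BFGS")
    return result.x                        # parameter estimate w-hat
```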

Prediction algorithms

The conditional random field prediction problem is: given a conditional random field $P(y \mid x)$ and an input sequence (observation sequence) $x$, find the output sequence (label sequence) $y^*$ with the largest conditional probability; that is, to label the observation sequence.

The prediction algorithm for conditional random fields is the famous Viterbi algorithm.

Before introducing the Viterbi algorithm formally, let me first describe it in plain language.

Suppose we have the following problem:

In college, you ask your roommate to bring a meal back for you (if you went to college, don't tell me you never did this...). Your roommate asks what you want to eat, and you answer: "Whatever you're having, I'll have too, but also bring back a bottle of Sprite, thanks." Then the interesting part happens: your roommate comes back with rice and Sprite and says cheerfully, "The chef at the canteen I went to and the cashier at the kiosk have both been replaced by pretty girls!" You ask him, "Which canteen and which kiosk did you go to?" He answers, "You guess."

Okay then, let's guess.

Guess, my foot. (╯‵-′)╯︵┻━┻

Alright, calm down and put the table back. However he went about it, your roommate did bring you your food, so let's play along with his little prank; consider it his errand fee.

PS: Suppose your school has 2 kiosks and 2 canteens.

So, Mission start!

First, ask him: "Did you go to the kiosk first?" He replies: "Yes."

OK, so the order of the stops is settled.

Then you start to think:

Step 1: kiosks A and B are about the same distance from the dormitory, so that alone doesn't tell us anything;

Step 2: from kiosks A and B to canteens 1 and 2 there are four routes (A1, A2, B1, B2); of these, the shortest route ending at canteen 1 is B1, and the shortest ending at canteen 2 is A2;

Step 3: look at the food he brought. Hmm... I remember canteen 1 sells this dish; I'm not sure about canteen 2. If canteen 2 doesn't sell it, then he must have gone to canteen 1;

Step 4: given that he went to canteen 1, and knowing this guy's habits, he would definitely take the nearest route, so he must have come via kiosk B, the kiosk closest to canteen 1;

Step 5: say to him, "You went to kiosk B and then to canteen 1." He says, "Damn, how did you know?!"

Now let's map this example onto the Viterbi algorithm, because the Viterbi algorithm is exactly the process above: first work out, step by step, the most likely way to reach each element of the first destination set from the starting point, then each element of the second destination set from the first, and so on; then, once the end point is determined, pick out the preceding step of the path in reverse (the final destination is canteen 1; the roommate reaching canteen 1 via route B1 is more likely than via route A1, so following that route back, in the previous destination set he must have chosen kiosk B; and so on), until the optimal path is fully determined.

Now that you have a concept of what the Viterbi algorithm is, let's look at its mathematical description.

In the example above, you will notice that determining "how likely each path is" is crucial. So let's look at what plays that role in a conditional random field: the local feature vectors of the conditional random field.

From

$$y^* = \arg\max_y P_w(y \mid x)$$

we can get:

$$y^* = \arg\max_y \frac{\exp\big(w \cdot F(y, x)\big)}{Z_w(x)} = \arg\max_y \exp\big(w \cdot F(y, x)\big) = \arg\max_y \big(w \cdot F(y, x)\big)$$

Therefore, the conditional random field prediction problem becomes the optimal-path problem of maximizing the non-normalized probability

$$\max_y \big(w \cdot F(y, x)\big) \tag{11.52}$$

Here a path represents a label sequence, where

$$w = (w_1, w_2, \ldots, w_K)^T$$

$$F(y, x) = (f_1(y, x), f_2(y, x), \ldots, f_K(y, x))^T$$

$$f_k(y, x) = \sum_{i=1}^{n} f_k(y_{i-1}, y_i, x, i), \quad k = 1, 2, \ldots, K$$

Note that only the non-normalized probabilities need to be computed; there is no need to compute the normalization factor $Z_w(x)$ or the probabilities themselves, which greatly improves efficiency.

To solve for the optimal path, rewrite (11.52) in the following form:

$$\max_y \sum_{i=1}^{n} w \cdot F_i(y_{i-1}, y_i, x)$$

where

$$F_i(y_{i-1}, y_i, x) = \big(f_1(y_{i-1}, y_i, x, i), f_2(y_{i-1}, y_i, x, i), \ldots, f_K(y_{i-1}, y_i, x, i)\big)^T$$

is the local feature vector.

The Viterbi algorithm is described below.

First, compute the non-normalized probability of each label $j = 1, 2, \ldots, m$ at position 1:

$$\delta_1(j) = w \cdot F_1(y_0 = \text{start}, y_1 = j, x), \quad j = 1, 2, \ldots, m$$

PS: $m$ is the number of possible labels at each position. In the example above, position 1 had only 2 possibilities (kiosk A and kiosk B), so $m$ would be 2. The same applies below.

In general, the following recursion formulas compute, for each label $l = 1, 2, \ldots, m$ at position $i$, the maximum of the non-normalized probabilities, and at the same time record the path attaining that maximum:

$$\delta_i(l) = \max_{1 \le j \le m} \big\{\delta_{i-1}(j) + w \cdot F_i(y_{i-1} = j, y_i = l, x)\big\}, \quad l = 1, 2, \ldots, m$$

$$\Psi_i(l) = \arg\max_{1 \le j \le m} \big\{\delta_{i-1}(j) + w \cdot F_i(y_{i-1} = j, y_i = l, x)\big\}, \quad l = 1, 2, \ldots, m$$

The recursion terminates at $i = n$. At that point the maximum of the non-normalized probability is:

$$\max_y \big(w \cdot F(y, x)\big) = \max_{1 \le j \le m} \delta_n(j)$$

and the end point of the optimal path is

$$y_n^* = \arg\max_{1 \le j \le m} \delta_n(j)$$

Tracing back from this end point,

$$y_i^* = \Psi_{i+1}(y_{i+1}^*), \quad i = n-1, n-2, \ldots, 1$$

we obtain the optimal path

$$y^* = (y_1^*, y_2^*, \ldots, y_n^*)^T$$

Putting all of this together, we get the Viterbi algorithm for conditional random field prediction.

Algorithm: Viterbi Algorithm for conditional random field prediction
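As a concrete illustration, here is a minimal Python sketch of the algorithm for a linear-chain CRF. It assumes the weighted feature sums $w \cdot F_i$ have already been collapsed into score tables: `unary[i, l]` is the state-feature score of label `l` at position `i`, and `trans[i, j, l]` is the transition score from label `j` at position `i` to label `l` at position `i+1` (these array names are my own, not from the book).

```python
import numpy as np

def viterbi(unary, trans):
    """Viterbi decoding for a linear-chain CRF over non-normalized scores.

    unary: (n, m) array; unary[i, l] = state score of label l at position i
    trans: (n-1, m, m) array; trans[i, j, l] = transition score from label j
           at position i to label l at position i+1
    Returns (best_score, best_path) with labels as 0-based indices.
    """
    n, m = unary.shape
    delta = unary[0].copy()                 # delta_1(j): initialization
    psi = np.zeros((n, m), dtype=int)       # backpointers Psi_i(l)
    for i in range(1, n):                   # recursion over positions
        scores = delta[:, None] + trans[i - 1] + unary[i][None, :]
        psi[i] = scores.argmax(axis=0)      # best predecessor for each label
        delta = scores.max(axis=0)          # delta_i(l)
    best_score = delta.max()                # termination
    path = [int(delta.argmax())]            # end point y_n*
    for i in range(n - 1, 0, -1):           # backtracking via Psi
        path.append(int(psi[i][path[-1]]))
    return best_score, path[::-1]
```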

The Viterbi algorithm is illustrated below with an example.

Example

For Example 11.1 used earlier, use the Viterbi algorithm to find the output sequence (label sequence) $y^* = (y_1^*, y_2^*, y_3^*)$ corresponding to the given input sequence (observation sequence) $x$.

Solution :

The computation proceeds in the steps below, which I explain one at a time.

Explanation:

First, recall what Example 11.1 actually says: the labels take values in $\{1, 2\}$, the sequence length is $n = 3$, and the (simplified) feature functions and their weights are:

Transition features: $t_1 = t_1(y_{i-1}=1, y_i=2, x, i)$, $i = 2, 3$, $\lambda_1 = 1$; $t_2 = t_2(y_1=1, y_2=1, x, 2)$, $\lambda_2 = 0.6$; $t_3 = t_3(y_2=2, y_3=1, x, 3)$, $\lambda_3 = 1$; $t_4 = t_4(y_1=2, y_2=1, x, 2)$, $\lambda_4 = 1$; $t_5 = t_5(y_2=2, y_3=2, x, 3)$, $\lambda_5 = 0.2$.

State features: $s_1 = s_1(y_1=1, x, 1)$, $\mu_1 = 1$; $s_2 = s_2(y_i=2, x, i)$, $i = 1, 2$, $\mu_2 = 0.5$; $s_3 = s_3(y_i=1, x, i)$, $i = 2, 3$, $\mu_3 = 0.8$; $s_4 = s_4(y_3=2, x, 3)$, $\mu_4 = 0.5$.

Step 1, initialization:

Since $y_1 = 1$ and $y_1 = 2$ are starting states, no transition needs to be considered (there is in fact no $t$ (transition) function "from $y_0$ to $y_1$"), so we go straight to the state functions ($s$ functions). We find that $s_1$ and $s_2$ correspond to $y_1 = 1$ and $y_1 = 2$ respectively, meaning both states can occur, and the weights of $s_1$ and $s_2$ are 1 and 0.5. Since all the $s$ and $t$ function values listed above equal 1, the non-normalized probabilities of $y_1 = 1$ and $y_1 = 2$ are 1 and 0.5 respectively.

Therefore the maxima of the non-normalized probabilities at position 1 are: $\delta_1(1) = 1$, $\delta_1(2) = 0.5$.

Step 2, recursion:

$i = 2$ (moving to the second "destination set" $\{y_2 = 1, y_2 = 2\}$):

First, the routes (only the case of reaching $y_2 = 1$ is described):

The routes to $y_2 = 1$ are as follows:

Route 1: depart from $y_1 = 1$ ----(via $t_2$)----> arrive at $y_2 = 1$;

Route 2: depart from $y_1 = 2$ ----(via $t_4$)----> arrive at $y_2 = 1$;

Next, the states (again only the case of reaching $y_2 = 1$):

From the problem setup, the only state functions active at $i = 2$ are $s_2$ and $s_3$, and the one corresponding to $y_2 = 1$ is $s_3$.

So the maximum non-normalized probability of reaching $y_2 = 1$ is:

$$\delta_2(1) = \max\{1 + \lambda_2 t_2 + \mu_3 s_3,\ 0.5 + \lambda_4 t_4 + \mu_3 s_3\} = \max\{2.4,\ 2.3\} = 2.4$$

PS: compared with the original problem I have added the $\mu_3 s_3$ terms here, because with this reading all the other $\delta_i$ can be explained consistently. If there is a problem with this reading, please let me know; much appreciated.

The path attaining this maximum of the non-normalized probability is:

$$\Psi_2(1) = 1$$

$\delta_2(2)$ is computed in the same way (it works out to $\delta_2(2) = 2.5$, with $\Psi_2(2) = 1$).

$i = 3$ is handled the same way (except that in the book's expression for $\delta_3(1)$ a term $\mu_5 s_5$ appears, which I believe should be $\mu_3 s_3$: leaving aside that in the original problem it is $s_3$ that corresponds to $y_3 = 1$, there is no $s_5$ function at all).

Step 3, termination:

This step is simple: among the $\delta_3(l)$, $\delta_3(1) = 4.3$ is the largest, so $y_3 = 1$ is the most likely, i.e., $y_3^* = 1$.

Step 4, backtracking:

Now push backwards:

Which value of $y_2$ most likely led into $y_3$? This was already solved in step 2: $\Psi_3(1) = 2$, i.e., among the routes from $y_2$ into $y_3 = 1$, the one with the largest weight starts at $y_2 = 2$, so $y_2^* = 2$.

Similarly, $y_2 = 2$ is most likely reached from $y_1 = 1$, so $y_1^* = 1$.

Finally, we obtain the label sequence:

$$y^* = (y_1^*, y_2^*, y_3^*) = (1, 2, 1)$$
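Plugging the numbers of Example 11.1 into the `viterbi()` sketch given earlier reproduces this result. The score tables below simply collect the $\lambda_k t_k$ and $\mu_l s_l$ values per position (labels are 0-based indices, so index 0 means $y = 1$):

```python
import numpy as np

# State scores mu_l * s_l (rows: positions 1..3; columns: y=1, y=2).
unary = np.array([[1.0, 0.5],    # position 1: mu1*s1, mu2*s2
                  [0.8, 0.5],    # position 2: mu3*s3, mu2*s2
                  [0.8, 0.5]])   # position 3: mu3*s3, mu4*s4

# Transition scores lambda_k * t_k; trans[i, j, l] scores label j at
# position i+1 going to label l at position i+2 (0 where no feature fires).
trans = np.array([[[0.6, 1.0],   # y1=1 -> y2=1 (t2); y1=1 -> y2=2 (t1)
                   [1.0, 0.0]],  # y1=2 -> y2=1 (t4); y1=2 -> y2=2 (none)
                  [[0.0, 1.0],   # y2=1 -> y3=1 (none); y2=1 -> y3=2 (t1)
                   [1.0, 0.2]]]) # y2=2 -> y3=1 (t3); y2=2 -> y3=2 (t5)

score, path = viterbi(unary, trans)        # uses the sketch defined above
print(score)                               # 4.3
print([l + 1 for l in path])               # [1, 2, 1], i.e. y* = (1, 2, 1)
```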
