The prediction problem for a conditional random field (CRF) is: given the model parameters and an input sequence (observation sequence) $x$, find the output sequence (label sequence) $y^*$ with the highest conditional probability, that is, label the observation sequence. As with the HMM, the CRF prediction algorithm is the Viterbi algorithm. From the CRF model we obtain:
\begin{aligned}
y^* &= \arg \max_y P_w (y|x) \\
&= \arg \max_y \frac{\exp \left\{ w \cdot F (y,x) \right\}}{Z_w (x)} \\
&= \arg \max_y \exp \left\{ w \cdot F (y,x) \right\} \\
&= \arg \max_y \ w \cdot F (y,x)
\end{aligned}
The CRF prediction problem therefore becomes the problem of finding the optimal path, the one with the largest unnormalized probability:
\[\arg \max_y \ w \cdot F (y,x) \]
Note that only the unnormalized score has to be computed, not the probability itself. The normalization factor does not depend on $y$, so skipping it leaves the maximizer unchanged and greatly improves efficiency. To find the optimal path, the objective is written in the following form:
\[\max_y \sum_{i=1}^n w \cdot f_i (y_{i-1},y_i,x) \]
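where
\[f_i (y_{i-1},y_i,x) = \left( f_1 (y_{i-1},y_i,x), f_2 (y_{i-1},y_i,x), \ldots, f_k (y_{i-1},y_i,x) \right)^T\]
is the local feature vector at position $i$. The subscript on the left-hand side denotes the position, while the subscripts on the right-hand side index the $k$ feature functions.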
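The decomposition above can be checked by brute force on a toy example. The following is a minimal sketch, not part of the original text: the tables `EMIT` and `TRANS` and the function `local_score` are hypothetical stand-ins for $w \cdot f_i (y_{i-1},y_i,x)$, and the sizes `M` and `N` are assumed values. It sums local scores along each path and confirms that maximizing the normalized probability and maximizing the unnormalized score select the same path.

```python
import itertools
import math

M = 2        # number of labels (assumed for illustration)
N = 3        # sequence length (assumed for illustration)
START = -1   # stand-in for y_0 = start

# Hypothetical local scores standing in for w . f_i(y_{i-1}, y_i, x).
# EMIT[i][cur] depends on the position and the current label,
# TRANS[prev][cur] on the pair of adjacent labels.
EMIT = [[1.0, 0.0], [0.3, 0.9], [0.5, 0.6]]
TRANS = [[0.4, -0.1], [-0.2, 0.7]]

def local_score(i, prev, cur):
    """Score of a single step; positions are 0-based in this sketch."""
    transition = 0.0 if prev == START else TRANS[prev][cur]
    return EMIT[i][cur] + transition

def path_score(y):
    """Unnormalized score w . F(y, x), written as a sum of local scores."""
    prev, total = START, 0.0
    for i, cur in enumerate(y):
        total += local_score(i, prev, cur)
        prev = cur
    return total

# Enumerate all M**N label sequences (feasible only for tiny examples).
paths = list(itertools.product(range(M), repeat=N))

# Normalization factor Z_w(x): the sum of exp(score) over all paths.
Z = sum(math.exp(path_score(y)) for y in paths)

best_by_prob = max(paths, key=lambda y: math.exp(path_score(y)) / Z)
best_by_score = max(paths, key=path_score)
print(best_by_prob, best_by_score)  # both are (0, 1, 1) for these toy scores
```

Enumerating every path costs $m^n$ evaluations, which is why the recursion described next is needed in practice.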
The Viterbi algorithm is described below. First, compute the unnormalized probability of each label $j = 1,2,\ldots,m$ at position 1:
\[\delta_1 (j) = w \cdot f_1 (y_0 = start,y_1 = j,x), \quad j = 1,2,\ldots,m \]
In general, the recursion computes, for each label $l = 1,2,\ldots,m$ at position $i$, the maximum unnormalized probability, and records the path that attains this maximum:
\begin{aligned}
\delta_i (l) &= \max_{1 \le j \le m} \left\{ \delta_{i-1} (j) + w \cdot f_i (y_{i-1} = j,y_i = l,x) \right\}, &\ l = 1,2,\ldots,m \\
\psi_i (l) &= \arg\max_{1 \le j \le m} \left\{ \delta_{i-1} (j) + w \cdot f_i (y_{i-1} = j,y_i = l,x) \right\}, &\ l = 1,2,\ldots,m
\end{aligned}
The recursion terminates at $i = n$. At this point the maximum unnormalized probability is
\[\max_y (w \cdot F (y,x)) = \max_{1 \le j \le m} \delta_n (j) \]
and the end point of the optimal path is
\[y_n^* = \arg \max_{1 \le j \le m} \delta _n (j) \]
Backtracking from this end point recovers the optimal label at each earlier position:
\[y_i^* = \psi_{i+1} (y^*_{i+1}), \ i = n-1,n-2,..., 1\]
This yields the optimal path:
\[y^* = (y_1^*,y_2^*,\ldots, y_n^*)^T\]
This is the Viterbi algorithm for conditional random field prediction.
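The steps above (initialization, recursion, termination, backtracking) can be sketched in plain Python. This is an illustrative sketch under stated assumptions, not a reference implementation: `score(i, prev, cur)` is a hypothetical callable standing in for $w \cdot f_i (y_{i-1} = prev, y_i = cur, x)$, labels and positions are 0-based, and the `EMIT` and `TRANS` tables are made-up toy values.

```python
def viterbi(n, m, score):
    """Return (best_score, best_path) over positions 0..n-1 and labels 0..m-1.

    score(i, prev, cur) is a hypothetical stand-in for
    w . f_i(y_{i-1} = prev, y_i = cur, x); prev is None at the first position.
    """
    # Initialization: delta_1(j) = w . f_1(y_0 = start, y_1 = j, x)
    delta = [score(0, None, j) for j in range(m)]
    psi = []  # one list of back-pointers per position after the first

    # Recursion: delta_i(l) = max_j { delta_{i-1}(j) + w . f_i(j, l, x) }
    for i in range(1, n):
        new_delta, back = [], []
        for l in range(m):
            candidates = [delta[j] + score(i, j, l) for j in range(m)]
            best_j = max(range(m), key=candidates.__getitem__)
            new_delta.append(candidates[best_j])
            back.append(best_j)  # psi_i(l): the maximizing previous label
        delta = new_delta
        psi.append(back)

    # Termination: best final label and its unnormalized score
    last = max(range(m), key=delta.__getitem__)
    best_score = delta[last]

    # Backtracking: y_i* = psi_{i+1}(y_{i+1}*)
    path = [last]
    for back in reversed(psi):
        path.append(back[path[-1]])
    path.reverse()
    return best_score, path


# Toy local scores (assumed values, for illustration only).
EMIT = [[1.0, 0.0], [0.3, 0.9], [0.5, 0.6]]
TRANS = [[0.4, -0.1], [-0.2, 0.7]]

def toy_score(i, prev, cur):
    return EMIT[i][cur] + (0.0 if prev is None else TRANS[prev][cur])

best_score, best_path = viterbi(3, 2, toy_score)
print(best_path)  # [0, 1, 1] for these toy scores
```

Each position considers $m \times m$ label pairs, so the running time is $O(n m^2)$, compared with $m^n$ for enumerating every path.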
Getting Started with Conditional Random Fields (IV): The Prediction Algorithm for Conditional Random Fields