Overview
Background
In real-world sequence learning, the data is often unsegmented and noisy, and a label sequence must be produced for the whole input; speech recognition, for example, requires converting an acoustic signal into a sequence of words or sub-words.
This paper proposes a new training method for RNNs that lets them generate label sequences for unsegmented input sequences directly.

Problems with Traditional Methods
Traditional methods such as HMMs and CRFs can solve sequence labelling problems, but they have corresponding drawbacks:
1. They usually require a large amount of domain-specific knowledge, such as designing the label set or choosing the input features of a CRF.
2. They rely on restrictive dependency assumptions; for example, an HMM assumes the observations are independent of each other given the hidden states.
3. HMM training is generative, even though sequence labelling is a discriminative problem.

Advantages of RNNs
1. They require no prior knowledge about the data, beyond the choice of input and output representation.
2. They have strong feature-representation ability, can model the problem well, and are discriminative models.
3. They handle noise more robustly.
4. They can model long-range context within a sequence.

Effect
The proposed method performs better than both HMMs and hybrid HMM-RNN models.

Specific Ideas
The output of the network is interpreted as a probability distribution over all possible label sequences; from this distribution, an objective function can be defined that directly maximizes the probability of the correct label sequence.
Because the objective function is differentiable, the network can be trained with standard backpropagation.
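Concretely, maximizing the probability of every correct labelling is equivalent to minimizing its negative log-probability over the training set. Using the notation defined in the next section ($S$ is the training set, $p(z \mid x)$ the distribution the network defines over labellings), this maximum-likelihood objective can be written as:

```latex
% Maximum-likelihood objective: minimizing O maximizes the probability
% the network assigns to the correct labelling z of each training input x.
O = -\sum_{(x,z) \in S} \ln p(z \mid x)
```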
Temporal Classification
Let $S$ denote a set of training examples drawn from a fixed distribution $\mathcal{D}_{\mathcal{X} \times \mathcal{Z}}$.
The input space $\mathcal{X} = (\mathbb{R}^m)^*$ is the set of all sequences of $m$-dimensional real-valued vectors.
The target space $\mathcal{Z} = L^*$ is the set of all sequences over the (finite) label alphabet $L$.
We call the elements of $L^*$ label sequences or labellings.
Each sample $s \in S$ is a pair $s = (x, z)$, with
target sequence $z = (z_1, z_2, \ldots, z_U)$ and
input sequence $x = (x_1, x_2, \ldots, x_T)$,
where $U \le T$.
The goal is to use $S$ to train a temporal classifier $h: \mathcal{X} \to \mathcal{Z}$ that classifies previously unseen input sequences.
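For intuition, here is a minimal sketch of what training such a classifier looks like in practice. It uses PyTorch's built-in nn.CTCLoss; the framework, network architecture, and all tensor shapes below are illustrative assumptions, not details from the paper:

```python
# Illustrative only: training an RNN with PyTorch's built-in CTC loss.
# Dimensions and hyperparameters are assumptions, not from the paper.
import torch
import torch.nn as nn

m, hidden, num_labels = 13, 128, 27      # input dim, RNN size, |L|
num_classes = num_labels + 1             # CTC adds a "blank" class (index 0)

rnn = nn.LSTM(input_size=m, hidden_size=hidden, batch_first=False)
proj = nn.Linear(hidden, num_classes)
ctc = nn.CTCLoss(blank=0)

T, N, U = 50, 4, 10                      # input length, batch size, target length
x = torch.randn(T, N, m)                 # batch of input sequences
z = torch.randint(1, num_classes, (N, U))  # target labellings (no blanks)

h, _ = rnn(x)                            # (T, N, hidden)
log_probs = proj(h).log_softmax(dim=-1)  # per-frame label distributions

# CTC objective: negative log-probability of the correct labelling;
# it is differentiable, so standard backpropagation applies.
loss = ctc(log_probs, z,
           input_lengths=torch.full((N,), T, dtype=torch.long),
           target_lengths=torch.full((N,), U, dtype=torch.long))
loss.backward()
```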
Label Error Rate (LER) Metric
Given a test set $S' \subset \mathcal{D}_{\mathcal{X} \times \mathcal{Z}}$ disjoint from $S$,
the label error rate (LER) of $h$ is defined as the normalized edit distance between its classifications and the targets on $S'$:

$$LER(h, S') = \frac{1}{Z} \sum_{(x,z) \in S'} ED(h(x), z)$$

where $Z$ is the total number of target labels in $S'$ and $ED(p, q)$ is the edit distance between $p$ and $q$, i.e. the minimum number of insertions, substitutions, and deletions needed to change $p$ into $q$.
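As a concrete sketch, the LER can be computed as follows; the dynamic-programming edit distance is standard, and the function names are my own, not from the paper:

```python
# Minimal sketch: label error rate as normalized edit distance.
# Function names are illustrative, not from the paper.
def edit_distance(p, q):
    """Minimum insertions, deletions, and substitutions to turn p into q."""
    d = list(range(len(q) + 1))              # row for the empty prefix of p
    for i, a in enumerate(p, 1):
        prev, d[0] = d[0], i
        for j, b in enumerate(q, 1):
            prev, d[j] = d[j], min(d[j] + 1,          # deletion
                                   d[j - 1] + 1,      # insertion
                                   prev + (a != b))   # substitution
    return d[-1]

def label_error_rate(h, test_set):
    """Total edit distance between h's outputs and the targets,
    normalized by the total number of target labels in the test set."""
    total_labels = sum(len(z) for _, z in test_set)
    total_edits = sum(edit_distance(h(x), z) for x, z in test_set)
    return total_edits / total_labels
```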