In daily life, the distance we use most often is undoubtedly the Euclidean distance. In some special cases, however, it has obvious defects, and time series are one of them. Take two time series A and B: if the Euclidean distance is used, that is, if distance[i][j] = (B[j]-A[i]) * (B[j]-A[i]) is computed point by point, the total distance comes to 128. That is a very large distance, yet the plots of the two sequences are in fact very similar. This led people to look for a new way of measuring the distance between time series, and the DTW (dynamic time warping) algorithm was proposed. The method plays an important role in speech recognition and machine learning.
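To make the 128 figure concrete, here is a minimal sketch of the point-by-point computation. The function name `pointwiseSquaredDistance` is hypothetical, and the two sequences mentioned in the usage note are assumptions chosen only so that the total matches the 128 quoted above; they are not necessarily the author's data.

```cpp
#include <cstddef>
#include <vector>

// Sum of squared point-by-point differences: element i of a is always
// compared with element i of b, with no shifting in time.
// Assumes both sequences have the same length; if they do not, only the
// common prefix is compared.
long long pointwiseSquaredDistance(const std::vector<int>& a, const std::vector<int>& b)
{
    long long total = 0;
    for (std::size_t i = 0; i < a.size() && i < b.size(); ++i) {
        long long diff = static_cast<long long>(b[i]) - a[i];
        total += diff * diff;
    }
    return total;
}
```

With the assumed sequences {1, 1, 10, 2, 3} and {1, 1, 2, 10, 3}, the only nonzero terms are (2-10)*(2-10) and (10-2)*(10-2), so the function returns 64 + 64 = 128 even though the two sequences differ only by a small shift.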
The algorithm is based on dynamic programming (DP) and solves the problem of matching templates whose pronunciations differ in length. In short, it builds an adjacency matrix of pairwise distances and finds the shortest path through it, that is, the path with the smallest distance sum.
Take the two sequences above as an example. When 10 in A is paired with 2 in B, and 2 in A with 10 in B, distance[3] and distance[4] are both very large, which directly inflates the final distance sum. What we need is to adjust the time alignment: if 10 in A is paired with 10 in B, and 1 in A with 2 in B, the final distance sum shrinks greatly. This can be seen as warping time. One may ask why 2 in A cannot simply be paired with 2 in B as well, so that the distance sum becomes zero, the smallest value possible. That is not allowed: 10 in A occurs before 2, while 2 in B occurs before 10, so the pairings would cross, which scrambles the time order and violates causality.
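The no-crossing rule can be checked mechanically. The sketch below is illustrative only: the names `Path`, `isValidWarpingPath` and `pathCost` are hypothetical, indices are 0-based, and a pairing is written as a list of (index in A, index in B) pairs. It assumes the usual DTW step rule, in which each step advances by 0 or 1 in each sequence but never stands still or moves backwards.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

using Path = std::vector<std::pair<int, int>>;

// A pairing is a valid warping path if it starts at the first pair of points,
// ends at the last pair, and never steps backwards or skips a point.
// n and m are the lengths of the two sequences (both assumed >= 1).
bool isValidWarpingPath(const Path& path, int n, int m)
{
    if (path.empty()) return false;
    if (path.front() != std::make_pair(0, 0)) return false;
    if (path.back() != std::make_pair(n - 1, m - 1)) return false;
    for (std::size_t k = 1; k < path.size(); ++k) {
        int di = path[k].first - path[k - 1].first;
        int dj = path[k].second - path[k - 1].second;
        if (di < 0 || di > 1 || dj < 0 || dj > 1 || (di == 0 && dj == 0)) return false;
    }
    return true;
}

// Sum of squared differences over every pair on the path.
long long pathCost(const Path& path, const std::vector<int>& a, const std::vector<int>& b)
{
    long long total = 0;
    for (const auto& p : path) {
        long long diff = static_cast<long long>(b[p.second]) - a[p.first];
        total += diff * diff;
    }
    return total;
}
```

For the assumed sequences {1, 1, 10, 2, 3} and {1, 1, 2, 10, 3}, the pairing (0,0), (1,1), (1,2), (2,3), (3,4), (4,4) is valid and costs 2. The crossed pairing (0,0), (1,1), (3,2), (2,3), (4,4) costs 0 but is rejected, because it moves backwards in A.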
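Next, we record the DTW distance between A and B in output[6][6] (all subscripts start from 1, and every entry is set to 0 at the beginning). The algorithm itself is a simple DP whose state transition formula is output[i][j] = min(min(output[i-1][j], output[i][j-1]), output[i-1][j-1]) + distance[i][j]. For this formula to give the true DTW distance, the border cells output[i][0] and output[0][j], other than output[0][0], should then be raised to a very large value; otherwise a path could start anywhere along the border instead of at the first pair of points. The final output[5][5] is the DTW distance we need.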
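Written out as equations, the recurrence reads as follows. The lengths n and m are symbols introduced here for illustration, and the border conditions are the standard DTW ones rather than something stated in the original text: if every border cell were 0, the path would not be forced to start at the first pair of points.

```latex
\[
\text{distance}[i][j] = \bigl(B[j] - A[i]\bigr)^2
\]
\[
\text{output}[i][j] = \text{distance}[i][j]
  + \min\bigl(\text{output}[i-1][j],\ \text{output}[i][j-1],\ \text{output}[i-1][j-1]\bigr),
  \qquad 1 \le i \le n,\ 1 \le j \le m
\]
\[
\text{output}[0][0] = 0, \qquad
\text{output}[i][0] = \text{output}[0][j] = \infty \quad (i, j \ge 1)
\]
```

The three terms inside the minimum correspond to advancing only in A, advancing only in B, and advancing in both, so every cell extends the cheapest path that reaches it without going backwards in time.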
DTW implementation in C++
    #include <algorithm>
    #include <iostream>
    using namespace std;

    const int num = 5;          // for simplicity, both sequences have the same number of sample points
    const int INF = 1000000000; // marks border cells that no path may pass through

    int main()
    {
        int i, j;
        int A[num], B[num];
        int distance[num + 1][num + 1] = {};
        int output[num + 1][num + 1];

        for (i = 0; i < num; i++) cin >> A[i];
        for (i = 0; i < num; i++) cin >> B[i];

        // squared distance between every pair of points (subscripts start from 1)
        for (i = 1; i <= num; i++)
            for (j = 1; j <= num; j++)
                distance[i][j] = (B[j - 1] - A[i - 1]) * (B[j - 1] - A[i - 1]);

        // print the whole pairwise distance matrix
        for (i = 1; i <= num; i++)
        {
            for (j = 1; j <= num; j++)
                cout << distance[i][j] << '\t';
            cout << endl;
        }
        cout << endl;

        // only output[0][0] is a valid starting point; the rest of the border is unreachable
        for (i = 0; i <= num; i++)
            output[i][0] = output[0][i] = INF;
        output[0][0] = 0;

        // DP process: calculate the DTW distance
        for (i = 1; i <= num; i++)
            for (j = 1; j <= num; j++)
                output[i][j] = min(min(output[i - 1][j - 1], output[i][j - 1]), output[i - 1][j]) + distance[i][j];

        // print the final DTW matrix; output[num][num] is the final DTW distance
        for (i = 1; i <= num; i++)
        {
            for (j = 1; j <= num; j++)
                cout << output[i][j] << '\t';
            cout << endl;
        }

        return 0;
    }
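The listing above fixes both sequences at num points, while the motivation given earlier is matching templates of different lengths. As a rough sketch under that assumption, the same recurrence can be wrapped in a function that accepts two sequences of any length. The name `dtwDistance` and the use of std::vector are choices made here for illustration, not part of the original program.

```cpp
#include <algorithm>
#include <cstddef>
#include <limits>
#include <vector>

// DTW distance with squared point differences as the local cost.
// Returns 0 when both sequences are empty and a very large value when
// exactly one of them is empty (no alignment exists in that case).
long long dtwDistance(const std::vector<int>& a, const std::vector<int>& b)
{
    const std::size_t n = a.size();
    const std::size_t m = b.size();
    const long long INF = std::numeric_limits<long long>::max() / 2;

    // output[i][j] is the cheapest cost of aligning the first i points of a
    // with the first j points of b; the border is unreachable except (0, 0).
    std::vector<std::vector<long long>> output(n + 1, std::vector<long long>(m + 1, INF));
    output[0][0] = 0;

    for (std::size_t i = 1; i <= n; ++i) {
        for (std::size_t j = 1; j <= m; ++j) {
            long long diff = static_cast<long long>(b[j - 1]) - a[i - 1];
            long long best = std::min({output[i - 1][j - 1], output[i][j - 1], output[i - 1][j]});
            output[i][j] = best + diff * diff;
        }
    }
    return output[n][m];
}
```

For the assumed sequences {1, 1, 10, 2, 3} and {1, 1, 2, 10, 3}, the function returns 2, compared with 128 for the point-by-point sum. It also handles unequal lengths: {1, 2, 3} against {1, 2, 2, 3} gives 0, because the repeated 2 is absorbed by the warping.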