Basic introduction
Levenshtein distance is a string metric that measures how different two strings are. It is the minimum number of single-character edits (substitutions, insertions, or deletions) needed to change one string into the other. The Russian scientist Vladimir Levenshtein introduced this concept in 1965.
Simple example
Changing the string "kitten" into the string "sitting" requires only 3 single-character edits, as follows:
- sitten (substitute k → s)
- sittin (substitute e → i)
- sitting (insert g at the end)
Therefore, the Levenshtein distance between "kitten" and "sitting" is 3.
Implementation idea
How can we implement this algorithm in a program? Many people explain it with a matrix, but the matrix is really the final visualization: it is convenient for understanding "why", but it is hard to get from the matrix to "how to do it".
Let us look for the sub-problem structure of the problem "modify string $a$ into string $b$". Note that "modify string $b$ into string $a$" is the same problem, because deleting a character from $a$ to match $b$ is equivalent to inserting a character into $b$ to match $a$; the two operations can be converted into each other.
Suppose $a[1\dots i]$ and $b[1\dots j]$ are the substrings made of the first $i$ characters of $a$ and the first $j$ characters of $b$. This gives the sub-problem "modify string $a[1\dots i]$ into string $b[1\dots j]$": $$\left[\begin{matrix}a: & a[1] & a[2] & \cdots & a[i-2] & a[i-1] & a[i]\\ b: & b[1] & b[2] & \cdots & b[j-2] & b[j-1] & b[j]\end{matrix}\right]$$
① Insert operation:
- If modifying $a[1\dots i]$ into $b[1\dots j-1]$ takes $op_1$ operations, then inserting a character $a[i']=b[j]$ between $a[i]$ and $a[i+1]$ matches $b[j]$, so modifying $a[1\dots i]$ into $b[1\dots j]$ takes $op_1+1$ operations: $$\left[\begin{matrix}\cdots & \color{red}{a[i-2]} & \color{red}{a[i-1]} & \mathbf{\color{red}{a[i]}} & \mathbf{\color{blue}{a[i']}}\\ \cdots & \color{red}{b[j-2]} & \mathbf{\color{red}{b[j-1]}} & \mathbf{\color{blue}{b[j]}} & \phi\end{matrix}\right]$$
② Delete operation:
- If modifying $a[1\dots i-1]$ into $b[1\dots j]$ takes $op_2$ operations, then deleting the character $a[i]$ makes the two substrings match in $op_2+1$ operations: $$\left[\begin{matrix}\cdots & \color{red}{a[i-2]} & \mathbf{\color{red}{a[i-1]}} & \mathbf{\color{blue}{\phi}}\\ \cdots & \color{red}{b[j-2]} & \color{red}{b[j-1]} & \mathbf{\color{red}{b[j]}}\end{matrix}\right]$$
③ Modify operation:
- If modifying $a[1\dots i-1]$ into $b[1\dots j-1]$ takes $op_3$ operations, then replacing the character $a[i]$ with $a[i']=b[j]$ completes the match in $op_3+1$ operations: $$\left[\begin{matrix}\cdots & \color{red}{a[i-2]} & \mathbf{\color{red}{a[i-1]}} & \mathbf{\color{blue}{a[i']}}\\ \cdots & \color{red}{b[j-2]} & \mathbf{\color{red}{b[j-1]}} & \mathbf{\color{blue}{b[j]}}\end{matrix}\right]$$
- However, if $a[i]=b[j]$ already, no replacement is needed and the number of operations is still $op_3$.
In summary, the number of operations needed to modify $a[1\dots i]$ into $b[1\dots j]$ is $\min\{op_1+1,\ op_2+1,\ op_3+1_{(a_i\neq b_j)}\}$, where $1_{(a_i\neq b_j)}$ is $1$ when $a_i\neq b_j$ and $0$ otherwise.
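The three cases can be written down almost literally as a recursive function. The following is a minimal sketch, not an efficient implementation: the name `lev_naive` is hypothetical, and the base cases for an empty prefix (cost $i$ or $j$) are an assumption added so that the recursion terminates. It runs in exponential time and is only meant to mirror the reasoning above.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>

// Hypothetical helper (not from the article): cost of modifying the first i
// characters of a into the first j characters of b.
// Exponential time; it only mirrors the three cases, it is not practical.
int lev_naive(const std::string& a, const std::string& b,
              std::size_t i, std::size_t j) {
    // Assumed base cases: one prefix is empty.
    if (j == 0) return static_cast<int>(i);  // delete the i characters of a
    if (i == 0) return static_cast<int>(j);  // insert the j characters of b

    int op1 = lev_naive(a, b, i, j - 1);      // a[1..i]   -> b[1..j-1], then insert b[j]
    int op2 = lev_naive(a, b, i - 1, j);      // a[1..i-1] -> b[1..j],   then delete a[i]
    int op3 = lev_naive(a, b, i - 1, j - 1);  // a[1..i-1] -> b[1..j-1], then replace a[i] if needed

    // C++ strings are 0-indexed, so a_i is a[i - 1] and b_j is b[j - 1].
    int diff = (a[i - 1] != b[j - 1]) ? 1 : 0;
    return std::min({op1 + 1, op2 + 1, op3 + diff});
}
```

Calling `lev_naive(a, b, a.size(), b.size())` gives the distance for the whole strings; the repeated sub-problems it recomputes are exactly what a table avoids.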
Mathematical definition
Mathematically, we define the Levenshtein distance between two strings $a$ and $b$ as $lev_{a,\ b}(|a|,\ |b|)$, where $|a|$ and $|b|$ are the lengths of the strings $a$ and $b$, and $$lev_{a,\ b}(i,\ j)=\begin{cases}i, & j=0\\ j, & i=0\\ \min\left\{\begin{matrix}lev_{a,\ b}(i,\ j-1)+1\\ lev_{a,\ b}(i-1,\ j)+1\\ lev_{a,\ b}(i-1,\ j-1)+1_{(a_i\neq b_j)}\end{matrix}\right. & \text{otherwise}\end{cases}$$
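As a small sanity check of the definition, take the illustrative inputs $a=$ "ab" and $b=$ "b" (chosen here only as an example, not taken from the article). The boundary values are $lev(0,\ 0)=0$, $lev(1,\ 0)=1$, $lev(2,\ 0)=2$ and $lev(0,\ 1)=1$. Since $a_1\neq b_1$ and $a_2=b_1$, the recurrence gives:

$$\begin{align*}lev(1,\ 1)&=\min\{lev(1,\ 0)+1,\ lev(0,\ 1)+1,\ lev(0,\ 0)+1\}=\min\{2,\ 2,\ 1\}=1\\ lev(2,\ 1)&=\min\{lev(2,\ 0)+1,\ lev(1,\ 1)+1,\ lev(1,\ 0)+0\}=\min\{3,\ 2,\ 1\}=1\end{align*}$$

So the distance is $1$, which matches the single edit of deleting the leading "a".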
Please refer to wikipedia-levenshtein_distance for more information.
C++ code
With the state transition equation in hand, we can go straight to dynamic programming. Both the time complexity and the space complexity are $O(mn)$, where $m$ and $n$ are the lengths of $a$ and $b$.
    #include <stdio.h>
    #include <string.h>
    #include <algorithm>
    using std::min;

    int lena, lenb;
    char a[1010], b[1010];

    // Read the two strings; the field width keeps input inside the buffers.
    void read() {
        scanf("%1009s%1009s", a, b);
        lena = (int)strlen(a);
        lenb = (int)strlen(b);
    }

    // dp[i][j] = distance between the first i chars of a and the first j chars of b.
    int dp[1010][1010];

    void work() {
        for (int i = 1; i <= lena; i++) dp[i][0] = i;  // delete i characters
        for (int j = 1; j <= lenb; j++) dp[0][j] = j;  // insert j characters
        for (int i = 1; i <= lena; i++)
            for (int j = 1; j <= lenb; j++)
                if (a[i-1] == b[j-1])
                    dp[i][j] = dp[i-1][j-1];  // same character, no edit needed
                else
                    dp[i][j] = min(dp[i-1][j-1], min(dp[i][j-1], dp[i-1][j])) + 1;
        printf("%d\n", dp[lena][lenb]);
    }

    int main() {
        read();
        work();
        return 0;
    }
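For use outside a contest-style program, the same table can be wrapped in a function. The following is a sketch under my own assumptions: the name `levenshtein`, the `std::string` parameters, and the `std::vector` table are not from the article; only the recurrence is.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Sketch (hypothetical name): full-table DP, O(mn) time and O(mn) space.
int levenshtein(const std::string& a, const std::string& b) {
    const std::size_t m = a.size(), n = b.size();
    // dp[i][j] = distance between the first i chars of a and the first j chars of b.
    std::vector<std::vector<int>> dp(m + 1, std::vector<int>(n + 1, 0));
    for (std::size_t i = 1; i <= m; ++i) dp[i][0] = static_cast<int>(i);
    for (std::size_t j = 1; j <= n; ++j) dp[0][j] = static_cast<int>(j);

    for (std::size_t i = 1; i <= m; ++i) {
        for (std::size_t j = 1; j <= n; ++j) {
            int diff = (a[i - 1] != b[j - 1]) ? 1 : 0;
            dp[i][j] = std::min({dp[i][j - 1] + 1,          // insert
                                 dp[i - 1][j] + 1,          // delete
                                 dp[i - 1][j - 1] + diff}); // replace or keep
        }
    }
    return dp[m][n];
}
```

With this version, `levenshtein("kitten", "sitting")` evaluates to 3, in line with the example at the start of the article. Sizing the table from the inputs avoids the fixed 1010-character limit, at the cost of an allocation per call.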
Several small optimizations
1. If $a[i]=b[j]$ (subscripts starting from $1$), you can take $lev(i,\ j)=lev(i-1,\ j-1)$ directly, because the characters are the same and no edit is needed. This also follows from the recurrence, since neighboring entries never differ by more than $1$, so $lev(i-1,\ j-1)$ can never exceed $lev(i,\ j-1)+1$ or $lev(i-1,\ j)+1$.
2. If you use a rolling array (keeping only the previous row and the current row), the space complexity drops to $O(2\cdot\max\{m,\ n\})$. You can go further by saving $lev(i-1,\ j-1)$ in a temporary variable, which needs only a single row and reduces the space to $O(\max\{m,\ n\})$, as follows:
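    // Reuses a, b, lena, lenb and min from the program above.
    int dp[1010];

    void work() {
        for (int j = 1; j <= lenb; j++) dp[j] = j;
        int t1, t2;
        for (int i = 1; i <= lena; i++) {
            t1 = dp[0]++;  // t1 = lev(i-1, 0); dp[0] becomes lev(i, 0)
            for (int j = 1; j <= lenb; j++) {
                t2 = dp[j];  // remember lev(i-1, j) before overwriting it
                if (a[i-1] == b[j-1])
                    dp[j] = t1;
                else
                    dp[j] = min(t1, min(dp[j-1], dp[j])) + 1;
                t1 = t2;  // becomes lev(i-1, j-1) for the next column
            }
        }
        printf("%d\n", dp[lenb]);
    }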
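For comparison, the two-row rolling array mentioned in the second optimization could look like the sketch below. The function name `levenshtein_two_rows` and the use of `std::vector` and `std::swap` are my own assumptions; the article itself only shows the single-row variant.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Sketch (hypothetical name): keep only row i-1 (prev) and row i (cur).
int levenshtein_two_rows(const std::string& a, const std::string& b) {
    const std::size_t m = a.size(), n = b.size();
    std::vector<int> prev(n + 1), cur(n + 1);
    for (std::size_t j = 0; j <= n; ++j) prev[j] = static_cast<int>(j);  // row 0

    for (std::size_t i = 1; i <= m; ++i) {
        cur[0] = static_cast<int>(i);  // lev(i, 0) = i
        for (std::size_t j = 1; j <= n; ++j) {
            int diff = (a[i - 1] != b[j - 1]) ? 1 : 0;
            cur[j] = std::min({cur[j - 1] + 1,        // insert
                               prev[j] + 1,           // delete
                               prev[j - 1] + diff});  // replace or keep
        }
        std::swap(prev, cur);  // the finished row becomes the previous row
    }
    return prev[n];  // after the last swap, prev holds row m
}
```

The two-row form is a little easier to read than the single-row one because no temporary is needed for $lev(i-1,\ j-1)$; the single-row form halves the memory again.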
That concludes this basic introduction to the Levenshtein distance algorithm. If you found it useful, please recommend it, and feel free to leave any comments or suggestions below.
This article is published under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. You are welcome to quote, reprint, or adapt it, but you must retain the attribution BlackStorm and the link to this article, http://www.cnblogs.com/BlackStorm/p/5400809.html, and you may not use it for commercial purposes without permission. Please contact me if you have any questions or would like to discuss authorization.
String edit distance (Levenshtein distance) algorithm