String Matching: Levenshtein Distance
- Purpose: to measure the minimum number of single-character edits needed to convert one string into the other
- Intuition behind the method: count the replacements, additions, or deletions of a character needed in a string
- Steps
Step | Description |
1 | Set n to be the length of s. Set m to be the length of t. If n = 0, return m and exit. If m = 0, return n and exit. Construct a matrix containing 0..m rows and 0..n columns. |
2 | Initialize the first row to 0..n. Initialize the first column to 0..m. |
3 | Examine each character of s (i from 1 to n). |
4 | Examine each character of t (j from 1 to m). |
5 | If s[i] equals t[j], the cost is 0. If s[i] doesn't equal t[j], the cost is 1. |
6 | Set cell d[i,j] of the matrix equal to the minimum of: A. The cell immediately above plus 1: d[i-1,j] + 1. B. The cell immediately to the left plus 1: d[i,j-1] + 1. C. The cell diagonally above and to the left plus the cost: d[i-1,j-1] + cost. |
7 | After the iteration steps (3, 4, 5, 6) are complete, the distance is found in cell d[n,m]. |
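The seven steps above can be sketched in Python roughly as follows. This is a minimal illustration under stated assumptions, not a reference implementation: the function name `levenshtein` and the list-of-lists matrix are choices made here, and the matrix is indexed as `d[i][j]` with i running over s and j over t, so that the result lands in `d[n][m]` as in step 7.

```python
def levenshtein(s: str, t: str) -> int:
    """Return the Levenshtein distance between s and t (sketch of steps 1-7)."""
    n, m = len(s), len(t)

    # Step 1: trivial cases.
    if n == 0:
        return m
    if m == 0:
        return n

    # Steps 1-2: build the matrix and fill the border cells.
    # (Assumption: d[i][0] = i and d[0][j] = j, i.e. i indexes s and j indexes t.)
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i
    for j in range(m + 1):
        d[0][j] = j

    # Steps 3-4: visit every pair of characters.
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Step 5: cost is 0 on a match, 1 otherwise (strings are 0-based in Python).
            cost = 0 if s[i - 1] == t[j - 1] else 1
            # Step 6: minimum over the three neighbouring cells.
            d[i][j] = min(d[i - 1][j] + 1,
                          d[i][j - 1] + 1,
                          d[i - 1][j - 1] + cost)

    # Step 7: the distance is in the last cell.
    return d[n][m]


print(levenshtein("GUMBO", "GAMBOL"))  # prints 2
```

Because the three candidates in step 6 are symmetric, it does not matter which string is laid out along the rows and which along the columns; the final value is the same.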
This section shows how the Levenshtein distance is computed when the source string is "GUMBO" and the target string is "GAMBOL".
Steps 1 and 2
  |   | G | U | M | B | O |
  | 0 | 1 | 2 | 3 | 4 | 5 |
G | 1 |   |   |   |   |   |
A | 2 |   |   |   |   |   |
M | 3 |   |   |   |   |   |
B | 4 |   |   |   |   |   |
O | 5 |   |   |   |   |   |
L | 6 |   |   |   |   |   |
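Steps 1 and 2 for this example might look like the following in Python. The helper name `init_matrix` is hypothetical, introduced only for illustration. The matrix is stored row-wise to mirror the table layout (one row per character of the target, one column per character of the source), and `None` marks cells that have not been filled yet.

```python
from typing import List, Optional


def init_matrix(s: str, t: str) -> List[List[Optional[int]]]:
    """Build a (len(t)+1) x (len(s)+1) matrix with only the first row and column set."""
    rows, cols = len(t) + 1, len(s) + 1
    d: List[List[Optional[int]]] = [[None] * cols for _ in range(rows)]
    d[0] = list(range(cols))      # first row: 0..n
    for r in range(rows):
        d[r][0] = r               # first column: 0..m
    return d


d = init_matrix("GUMBO", "GAMBOL")
print(d[0])                   # first row
print([row[0] for row in d])  # first column
```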
Steps 3 to 6 when i = 1
  |   | G | U | M | B | O |
  | 0 | 1 | 2 | 3 | 4 | 5 |
G | 1 | 0 |   |   |   |   |
A | 2 | 1 |   |   |   |   |
M | 3 | 2 |   |   |   |   |
B | 4 | 3 |   |   |   |   |
O | 5 | 4 |   |   |   |   |
L | 6 | 5 |   |   |   |   |
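Each pass of steps 3 to 6 fills one column using only the column to its left. A hedged sketch of a single pass is given below. `next_column` is a hypothetical helper name, and a column is represented as a plain Python list read from top to bottom.

```python
from typing import List


def next_column(prev: List[int], s_char: str, t: str) -> List[int]:
    """Given column i-1 of the matrix, return column i for source character s_char."""
    col = [prev[0] + 1]  # top cell: the first row holds 0..n
    for j in range(1, len(t) + 1):
        cost = 0 if s_char == t[j - 1] else 1
        col.append(min(col[j - 1] + 1,        # cell above
                       prev[j] + 1,           # cell to the left
                       prev[j - 1] + cost))   # cell diagonally above and to the left
    return col


first = list(range(len("GAMBOL") + 1))    # column 0: 0..6
print(next_column(first, "G", "GAMBOL"))  # prints [1, 0, 1, 2, 3, 4, 5]
```

Calling `next_column` once per character of the source string, feeding each result into the next call, reproduces the tables column by column while keeping only two columns in memory at a time.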
Steps 3 to 6 when i = 2
  |   | G | U | M | B | O |
  | 0 | 1 | 2 | 3 | 4 | 5 |
G | 1 | 0 | 1 |   |   |   |
A | 2 | 1 | 1 |   |   |   |
M | 3 | 2 | 2 |   |   |   |
B | 4 | 3 | 3 |   |   |   |
O | 5 | 4 | 4 |   |   |   |
L | 6 | 5 | 5 |   |   |   |
Steps 3 to 6 when i = 3
  |   | G | U | M | B | O |
  | 0 | 1 | 2 | 3 | 4 | 5 |
G | 1 | 0 | 1 | 2 |   |   |
A | 2 | 1 | 1 | 2 |   |   |
M | 3 | 2 | 2 | 1 |   |   |
B | 4 | 3 | 3 | 2 |   |   |
O | 5 | 4 | 4 | 3 |   |   |
L | 6 | 5 | 5 | 4 |   |   |
Steps 3 to 6 when i = 4
  |   | G | U | M | B | O |
  | 0 | 1 | 2 | 3 | 4 | 5 |
G | 1 | 0 | 1 | 2 | 3 |   |
A | 2 | 1 | 1 | 2 | 3 |   |
M | 3 | 2 | 2 | 1 | 2 |   |
B | 4 | 3 | 3 | 2 | 1 |   |
O | 5 | 4 | 4 | 3 | 2 |   |
L | 6 | 5 | 5 | 4 | 3 |   |
Steps 3 to 6 when i = 5
  |   | G | U | M | B | O |
  | 0 | 1 | 2 | 3 | 4 | 5 |
G | 1 | 0 | 1 | 2 | 3 | 4 |
A | 2 | 1 | 1 | 2 | 3 | 4 |
M | 3 | 2 | 2 | 1 | 2 | 3 |
B | 4 | 3 | 3 | 2 | 1 | 2 |
O | 5 | 4 | 4 | 3 | 2 | 1 |
L | 6 | 5 | 5 | 4 | 3 | 2 |
Step 7
The distance is in the lower right-hand corner of the matrix, i.e. 2. This corresponds to our intuitive realization that "GUMBO" can be transformed into "GAMBOL" by substituting "A" for "U" and adding "L" (one substitution and one insertion = 2 changes).
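To make the "one substitution and one insertion" reading concrete, the filled matrix can be traced back from the lower right corner to the upper left. The following is one possible sketch. `edit_script` is a hypothetical name, the operation labels are made up for illustration, and when several optimal edit sequences exist this version simply returns one of them.

```python
from typing import List


def edit_script(s: str, t: str) -> List[str]:
    """Return one minimal list of edits that turns s into t."""
    n, m = len(s), len(t)

    # Fill the matrix (assumption: i indexes s, j indexes t).
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i
    for j in range(m + 1):
        d[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if s[i - 1] == t[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + cost)

    # Walk back from d[n][m], preferring the diagonal move when it is optimal.
    ops: List[str] = []
    i, j = n, m
    while i > 0 or j > 0:
        cost = 1 if i == 0 or j == 0 or s[i - 1] != t[j - 1] else 0
        if i > 0 and j > 0 and d[i][j] == d[i - 1][j - 1] + cost:
            if cost:
                ops.append(f"substitute {s[i - 1]} -> {t[j - 1]}")
            i, j = i - 1, j - 1
        elif j > 0 and d[i][j] == d[i][j - 1] + 1:
            ops.append(f"insert {t[j - 1]}")
            j -= 1
        else:
            ops.append(f"delete {s[i - 1]}")
            i -= 1

    ops.reverse()  # the walk collects edits from the end of the strings
    return ops


print(edit_script("GUMBO", "GAMBOL"))  # prints ['substitute U -> A', 'insert L']
```

The number of edits returned always equals the value in the lower right cell, which is a handy sanity check on the matrix.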