DP5: Edit Distance (from "Edit Distance" @ geeksforgeeks)


 

Problem: Given two strings of sizes m and n and a set of operations, replace (R), insert (I) and delete (D), all at equal cost, find the minimum number of edits (operations) required to convert one string into the other.

Identifying Recursive Methods:

What will the subproblem be in this case? Consider finding the edit distance of parts of the strings, say small prefixes. Let us denote them as [1...i] and [1...j] for some 1 < i < m and 1 < j < n. Clearly this solves a smaller instance of the final problem; denote it as E(i, j). Our goal is to find E(m, n) while minimizing the cost.

In the prefixes, we can right-align the strings in three ways: (i, -), (-, j) and (i, j). The hyphen symbol (-) represents no character. An example makes this clearer.

Take the strings SUNDAY and SATURDAY. We want to convert SUNDAY into SATURDAY with the minimum number of edits. Let us pick i = 2 and j = 4, i.e. the prefix strings are SU and SATU respectively (assume the string indices start at 1). The rightmost characters can be aligned in three different ways.

Case 1: Align the characters U and U. They are equal, so no edit is required. We are left with the problem of i = 1 and j = 3, E(i-1, j-1).

Case 2: Align the rightmost character of the first string with no character from the second string. We need a deletion (D) here. We are left with the problem of i = 1 and j = 4, E(i-1, j).

Case 3: Align the rightmost character of the second string with no character from the first string. We need an insertion (I) here. We are left with the problem of i = 2 and j = 3, E(i, j-1).

Combining all the subproblems, the minimum cost of aligning the prefix strings ending at i and j is given by

E(i, j) = min(E(i-1, j) + D, E(i, j-1) + I, E(i-1, j-1) + R if the characters at i and j are not the same, or E(i-1, j-1) + 0 if they are the same)

We are not done yet. What will the base case(s) be?

When both strings are of size 0, the cost is 0. When only one of the strings is empty, we need as many edit operations as the length of the non-empty string. Mathematically,

E(0, 0) = 0, E(i, 0) = i, E(0, j) = j
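The recurrence and base cases above translate almost line for line into a recursive function. The sketch below is only an illustration and is not taken from the original article: the class name RecurrenceSketch and the cost parameters del, ins and rep (standing in for D, I and R) are assumptions made for this example. The base cases are scaled by the operation costs, so they reduce to E(i, 0) = i and E(0, j) = j when every cost is 1.

```java
// Hypothetical helper class, written only to illustrate the recurrence.
class RecurrenceSketch {

    // E(i, j): minimum cost to convert the first i characters of s
    // into the first j characters of t.
    static int e(String s, int i, String t, int j, int del, int ins, int rep) {
        if (i == 0) {
            return j * ins; // E(0, j): insert the j remaining characters
        }
        if (j == 0) {
            return i * del; // E(i, 0): delete the i remaining characters
        }
        // Aligning character i of s with character j of t is free when they match.
        int alignCost = s.charAt(i - 1) == t.charAt(j - 1) ? 0 : rep;
        int viaDelete = e(s, i - 1, t, j, del, ins, rep) + del;
        int viaInsert = e(s, i, t, j - 1, del, ins, rep) + ins;
        int viaAlign = e(s, i - 1, t, j - 1, del, ins, rep) + alignCost;
        return Math.min(viaDelete, Math.min(viaInsert, viaAlign));
    }

    public static void main(String[] args) {
        // Unit costs on the SUNDAY -> SATURDAY example.
        System.out.println(e("SUNDAY", 6, "SATURDAY", 8, 1, 1, 1)); // prints 3
    }
}
```

This version recomputes the same E(i, j) many times, so it is only practical for short strings.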

Now it is easy to complete the recursive method. Go through the code for the recursive algorithm (edit_distance_recursive); the Java listing at the end of this page implements it as rec.

Dynamic Programming Method:

We can calculate the complexity of the recursive expression fairly easily.

T(m, n) = T(m-1, n-1) + T(m, n-1) + T(m-1, n) + C

The complexity of T(m, n) can be calculated by the successive substitution method or by solving a homogeneous equation of two variables. The result is an algorithm of exponential complexity. It is evident from the recursion tree that it solves the same subproblems again and again. Some strings result in many overlapping subproblems (try the program below with the strings "exponential" and "polynomial" and note the delay in the recursive method).
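One way to see the overlap is to count how many calls the plain recursion makes and compare that with the number of distinct subproblems, (m + 1)(n + 1). The sketch below is a hypothetical instrumentation, not part of the original listing: the class RecursionCallCounter and its calls counter are names invented for this example, and the recursion follows the unit-cost scheme described above.

```java
// Hypothetical instrumentation of the plain recursion, for illustration only.
class RecursionCallCounter {

    static long calls = 0; // number of recursive invocations made so far

    static int rec(String a, int i, String b, int j) {
        calls++;
        if (i == 0) {
            return j;
        }
        if (j == 0) {
            return i;
        }
        if (a.charAt(i - 1) == b.charAt(j - 1)) {
            return rec(a, i - 1, b, j - 1); // matching characters: no edit
        }
        int best = Math.min(rec(a, i - 1, b, j - 1),
                Math.min(rec(a, i, b, j - 1), rec(a, i - 1, b, j)));
        return best + 1;
    }

    // Runs the recursion from scratch and returns the number of calls it made.
    static long countCalls(String a, String b) {
        calls = 0;
        rec(a, a.length(), b, b.length());
        return calls;
    }

    public static void main(String[] args) {
        String a = "exponential";
        String b = "polynomial";
        long distinct = (long) (a.length() + 1) * (b.length() + 1);
        System.out.println("recursive calls:      " + countCalls(a, b));
        System.out.println("distinct subproblems: " + distinct);
    }
}
```

Even for two-character strings with no characters in common, such as "ab" and "cd", the recursion makes 19 calls for only 9 distinct subproblems.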

We can tabulate the repeating subproblems and look them up when they are required again (bottom up). A two-dimensional array formed by the strings can keep track of the minimum cost up to the current character comparison. The visualization code will help in understanding the construction of the matrix.

The time complexity of the dynamic programming method is O(mn), as we need to construct the table fully. The space complexity is also O(mn). If we need only the cost of the edit, we need just O(min(m, n)) space, as only the current row and the previous row have to be kept.
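The two-row idea can be sketched as follows. This is a minimal illustration under unit costs, with the class name TwoRowEditDistance made up for the example. It swaps the arguments so that the shorter string runs along the row, which is safe here because the unit-cost edit distance is symmetric.

```java
// Hypothetical class illustrating the O(min(m, n)) space variant.
class TwoRowEditDistance {

    static int distance(String a, String b) {
        // Keep the shorter string along the row: each row has min(m, n) + 1 cells.
        if (b.length() > a.length()) {
            String tmp = a;
            a = b;
            b = tmp;
        }
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // row 0: E(0, j) = j
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // column 0: E(i, 0) = i
            for (int j = 1; j <= b.length(); j++) {
                int replace = prev[j - 1] + (a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1);
                curr[j] = Math.min(replace, Math.min(prev[j] + 1, curr[j - 1] + 1));
            }
            // The row just filled becomes the previous row of the next iteration.
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    public static void main(String[] args) {
        System.out.println(distance("SUNDAY", "SATURDAY")); // prints 3
    }
}
```

The trade-off is that the full table is gone, so the sequence of edits can no longer be traced back; only the cost is available.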

Usually the costs D, I and R are not the same. In that case the problem can be represented as a directed acyclic graph (DAG) with weights on each edge, and finding the shortest path gives the edit distance.
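A rough sketch of that shortest-path view, assuming non-negative integer costs: each table cell (i, j) is a node, with a delete edge to (i+1, j), an insert edge to (i, j+1), and a diagonal edge to (i+1, j+1) that costs 0 on a match and R otherwise. Visiting the cells row by row is a topological order, so one pass of edge relaxation is enough. The class name WeightedEditDistance and the parameters del, ins and rep are assumptions made for this example.

```java
import java.util.Arrays;

// Hypothetical class: edit distance as a shortest path in a grid-shaped DAG.
class WeightedEditDistance {

    static int distance(String s, String t, int del, int ins, int rep) {
        int m = s.length();
        int n = t.length();
        int[][] cost = new int[m + 1][n + 1];
        for (int[] row : cost) {
            Arrays.fill(row, Integer.MAX_VALUE); // "not reached yet"
        }
        cost[0][0] = 0; // source node: both prefixes empty

        // Row-major order is a topological order: every edge points
        // down, right, or diagonally down-right.
        for (int i = 0; i <= m; i++) {
            for (int j = 0; j <= n; j++) {
                int here = cost[i][j]; // already final when we get here
                if (i < m) {
                    cost[i + 1][j] = Math.min(cost[i + 1][j], here + del);
                }
                if (j < n) {
                    cost[i][j + 1] = Math.min(cost[i][j + 1], here + ins);
                }
                if (i < m && j < n) {
                    int w = s.charAt(i) == t.charAt(j) ? 0 : rep;
                    cost[i + 1][j + 1] = Math.min(cost[i + 1][j + 1], here + w);
                }
            }
        }
        return cost[m][n]; // sink node: both strings fully consumed
    }

    public static void main(String[] args) {
        // With replace costing more than delete + insert, replacements are avoided.
        System.out.println(distance("SUNDAY", "SATURDAY", 1, 1, 3)); // prints 4
    }
}
```

With all three costs set to 1, the same function returns the ordinary edit distance.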

Applications:

There are many practical applications of the edit distance algorithm; refer to the Lucene API for a sample. Another example: display all the words in a dictionary that are in near proximity to a given word or an incorrectly spelled word.
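The dictionary example can be sketched with a simple linear scan. Everything here is hypothetical and for illustration only: the class NearbyWords, the suggest method, the threshold maxDistance and the tiny in-memory word list are not taken from Lucene or any other library.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, written only to illustrate the dictionary use case.
class NearbyWords {

    // Bottom-up edit distance with unit costs.
    static int editDistance(String a, String b) {
        int[][] d = new int[a.length() + 1][b.length() + 1];
        for (int i = 0; i <= a.length(); i++) {
            d[i][0] = i;
        }
        for (int j = 0; j <= b.length(); j++) {
            d[0][j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            for (int j = 1; j <= b.length(); j++) {
                int sub = d[i - 1][j - 1] + (a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1);
                d[i][j] = Math.min(sub, Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1));
            }
        }
        return d[a.length()][b.length()];
    }

    // Returns the dictionary words within maxDistance edits of word,
    // in dictionary order.
    static List<String> suggest(String word, List<String> dictionary, int maxDistance) {
        List<String> result = new ArrayList<>();
        for (String candidate : dictionary) {
            if (editDistance(word, candidate) <= maxDistance) {
                result.add(candidate);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> dictionary = Arrays.asList("sunday", "monday", "saturday", "sundae");
        System.out.println(suggest("sundey", dictionary, 1)); // prints [sunday]
    }
}
```

A scan like this costs one full table per dictionary word, so it only suits small word lists.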


 

 

package DP;

import java.util.Arrays;

/**
 * Edit Distance
 *
 * Given two words word1 and word2, find the minimum number of steps required
 * to convert word1 to word2 (each operation is counted as 1 step).
 * The following 3 operations are permitted on a word:
 *   a) Insert a character
 *   b) Delete a character
 *   c) Replace a character
 */
public class EditDistance {

    // Memo table for the top-down version (rec2); -1 means "not computed yet".
    // It is sized in main for the two words used there.
    static int[][] dist = null;

    public static void main(String[] args) {
        String word1 = "sdfsssjdfhsb";
        String word2 = "cvdsfadfgkdfgj";
        dist = new int[word1.length() + 1][word2.length() + 1];
        for (int[] row : dist) {
            Arrays.fill(row, -1);
        }
        System.out.println(minDistance(word1, word2));
        System.out.println(minDistance2(word1, word2));
    }

    public static int min3(int a, int b, int c) {
        return Math.min(Math.min(a, b), c);
    }

    // DP bottom-up
    public static int minDistance(String word1, String word2) {
        int[][] distance = new int[word1.length() + 1][word2.length() + 1];

        // Boundary condition: if one of the strings is empty, insert or
        // delete all the characters of the other.
        for (int i = 0; i <= word1.length(); i++) {
            distance[i][0] = i;
        }
        for (int j = 1; j <= word2.length(); j++) {
            distance[0][j] = j;
        }

        // Recurrence: [i][j] comes from the top, the left, or the top left.
        for (int i = 1; i <= word1.length(); i++) {
            for (int j = 1; j <= word2.length(); j++) {
                distance[i][j] = min3(
                        distance[i - 1][j] + 1, // from the top (delete)
                        distance[i][j - 1] + 1, // from the left (insert)
                        // from the top left; replace only if the characters differ
                        distance[i - 1][j - 1]
                                + (word1.charAt(i - 1) == word2.charAt(j - 1) ? 0 : 1));
            }
        }
        return distance[word1.length()][word2.length()]; // the bottom right corner
    }

    // Recursion, too slow
    public static int minDistance2(String word1, String word2) {
        return rec(word1, word1.length(), word2, word2.length());
        // return rec2(word1, word1.length(), word2, word2.length());
    }

    public static int rec(String word1, int len1, String word2, int len2) {
        if (len1 == 0) {
            return len2;
        }
        if (len2 == 0) {
            return len1;
        }
        if (word1.charAt(len1 - 1) == word2.charAt(len2 - 1)) {
            return rec(word1, len1 - 1, word2, len2 - 1);
        } else {
            return min3(rec(word1, len1 - 1, word2, len2 - 1) + 1,
                    rec(word1, len1, word2, len2 - 1) + 1,
                    rec(word1, len1 - 1, word2, len2) + 1);
        }
    }

    // DP top-down: a global array saves the state, trading space for time.
    public static int rec2(String word1, int len1, String word2, int len2) {
        if (len1 == 0) {
            return len2;
        }
        if (len2 == 0) {
            return len1;
        }
        if (word1.charAt(len1 - 1) == word2.charAt(len2 - 1)) {
            if (dist[len1 - 1][len2 - 1] == -1) {
                dist[len1 - 1][len2 - 1] = rec2(word1, len1 - 1, word2, len2 - 1);
            }
            return dist[len1 - 1][len2 - 1];
        } else {
            if (dist[len1 - 1][len2 - 1] == -1) {
                dist[len1 - 1][len2 - 1] = rec2(word1, len1 - 1, word2, len2 - 1);
            }
            if (dist[len1][len2 - 1] == -1) {
                dist[len1][len2 - 1] = rec2(word1, len1, word2, len2 - 1);
            }
            if (dist[len1 - 1][len2] == -1) {
                dist[len1 - 1][len2] = rec2(word1, len1 - 1, word2, len2);
            }
            dist[len1][len2] = min3(dist[len1 - 1][len2 - 1] + 1,
                    dist[len1][len2 - 1] + 1,
                    dist[len1 - 1][len2] + 1);
            return dist[len1][len2];
        }
    }
}

 

 
