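The similarity of two strings can be measured by the cost of converting one string into the other. Three editing operations are allowed: inserting a character, deleting a character, and replacing a character. The cost of a conversion, called the edit distance, is the minimum number of such edits needed. For example, converting "kitten" to "sitting" takes three edits: replace "k" with "s", replace "e" with "i", and insert "g" at the end.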
For comparison, two approaches are used: a recursive algorithm and a dynamic programming algorithm.
Simple recursive approach: the plain recursion is clear and very concise, but its time complexity is very high (exponential in the string lengths in the worst case).
public static int editDistance(String srcStr, String destSrc, int srcStart, int descStart) {
    // If either string has ended, the remaining characters of the other
    // must all be inserted or deleted, so return the length difference.
    if ((srcStr.length() - srcStart) == 0 || (destSrc.length() - descStart) == 0) {
        return Math.abs((srcStr.length() - srcStart) - (destSrc.length() - descStart));
    }
    // Matching characters cost nothing; move on to the next pair.
    if (srcStr.charAt(srcStart) == destSrc.charAt(descStart)) {
        return editDistance(srcStr, destSrc, srcStart + 1, descStart + 1);
    }
    int edIns = editDistance(srcStr, destSrc, srcStart, descStart + 1) + 1;     // insert a character
    int edDel = editDistance(srcStr, destSrc, srcStart + 1, descStart) + 1;     // delete a character
    int edRep = editDistance(srcStr, destSrc, srcStart + 1, descStart + 1) + 1; // replace a character
    return Math.min(edIns, Math.min(edDel, edRep));
}
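To make the cost of the plain recursion concrete, here is a small sketch that counts how many times the function is called and compares that with the number of distinct (srcStart, descStart) states. The class name RecursiveCallCounter and the callCount field are hypothetical, added only for this illustration; the recursion itself follows the method above.

```java
// Hypothetical demo class: callCount is instrumentation, not part of the algorithm.
class RecursiveCallCounter {
    static long callCount = 0;

    static int editDistance(String srcStr, String destSrc, int srcStart, int descStart) {
        callCount++; // count every invocation, including repeated states
        int srcLeft = srcStr.length() - srcStart;
        int destLeft = destSrc.length() - descStart;
        if (srcLeft == 0 || destLeft == 0) {
            return Math.abs(srcLeft - destLeft);
        }
        if (srcStr.charAt(srcStart) == destSrc.charAt(descStart)) {
            return editDistance(srcStr, destSrc, srcStart + 1, descStart + 1);
        }
        int edIns = editDistance(srcStr, destSrc, srcStart, descStart + 1) + 1;
        int edDel = editDistance(srcStr, destSrc, srcStart + 1, descStart) + 1;
        int edRep = editDistance(srcStr, destSrc, srcStart + 1, descStart + 1) + 1;
        return Math.min(edIns, Math.min(edDel, edRep));
    }

    public static void main(String[] args) {
        String src = "abcdefgh";
        String dest = "ijklmnop"; // no characters in common, so every call branches three ways
        int distance = editDistance(src, dest, 0, 0);
        int distinctStates = (src.length() + 1) * (dest.length() + 1);
        System.out.println("distance = " + distance);
        System.out.println("recursive calls = " + callCount);
        System.out.println("distinct states = " + distinctStates);
    }
}
```

With two strings of length 8 there are only 81 distinct states, yet the number of calls is far larger, because the same states are reached again and again along different paths.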
Now consider improving on this with dynamic programming. First define the state of the problem, then derive the stages and the recurrence between subproblems from the state transitions.
Suppose the problem is to find the minimum number of edits (the edit distance) needed to convert the characters source[1..N] into the characters target[1..M]. A subproblem can then be defined as the minimum number of edits needed to convert source[1..i] into target[1..j]. This is the optimal substructure of the problem, so we define the state as the edit distance from the substring source[1..i] to the substring target[1..j].
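The state defined above can be filled in bottom-up with a two-dimensional table. The following is a minimal sketch under that definition, not code from the original article: the names EditDistanceTable, editDistanceDp, and the table d are assumptions made for illustration. Here d[i][j] holds the edit distance from the first i characters of source to the first j characters of target, so d[i][0] = i (delete everything) and d[0][j] = j (insert everything).

```java
// Hypothetical class and method names, used only for this sketch.
class EditDistanceTable {
    static int editDistanceDp(String source, String target) {
        int n = source.length();
        int m = target.length();
        // d[i][j] = edit distance from source[1..i] to target[1..j] (1-based, as in the text)
        int[][] d = new int[n + 1][m + 1];
        for (int i = 0; i <= n; i++) {
            d[i][0] = i; // delete all i characters
        }
        for (int j = 0; j <= m; j++) {
            d[0][j] = j; // insert all j characters
        }
        for (int i = 1; i <= n; i++) {
            for (int j = 1; j <= m; j++) {
                if (source.charAt(i - 1) == target.charAt(j - 1)) {
                    d[i][j] = d[i - 1][j - 1]; // last characters match, no edit needed
                } else {
                    int ins = d[i][j - 1] + 1;     // insert target's j-th character
                    int del = d[i - 1][j] + 1;     // delete source's i-th character
                    int rep = d[i - 1][j - 1] + 1; // replace source's i-th character
                    d[i][j] = Math.min(ins, Math.min(del, rep));
                }
            }
        }
        return d[n][m];
    }

    public static void main(String[] args) {
        System.out.println(editDistanceDp("kitten", "sitting")); // prints 3
    }
}
```

Each of the (N+1) x (M+1) states is computed exactly once, so the running time is proportional to N x M.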
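The simple recursive approach is slow because a large number of states are computed repeatedly. To avoid this, we introduce a memo: the value of each state is recorded in a two-dimensional table, and the recursion consults the table first before computing anything. In the recursive code the state is indexed by the start positions (srcStart, descStart), that is, by the edit distance between the remaining suffixes of the two strings.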
/* Improved recursive method with memo */
// tRecords must have (srcStr.length() + 1) x (destSrc.length() + 1) entries,
// each already initialized with a new TagMemoRecord().
public static int editDistanceBetter(String srcStr, String destSrc, int srcStart, int descStart,
                                     TagMemoRecord[][] tRecords) {
    // First look up the table
    if (tRecords[srcStart][descStart].refCount != 0) {
        tRecords[srcStart][descStart].refCount++;
        return tRecords[srcStart][descStart].distance;
    }
    int distance = 0;
    // If either string has ended, return the length difference
    if ((srcStr.length() - srcStart) == 0 || (destSrc.length() - descStart) == 0) {
        distance = Math.abs((srcStr.length() - srcStart) - (destSrc.length() - descStart));
    } else if (srcStr.charAt(srcStart) == destSrc.charAt(descStart)) {
        distance = editDistanceBetter(srcStr, destSrc, srcStart + 1, descStart + 1, tRecords);
    } else {
        // Recurse through the memoized method so that subproblems also use the table
        int edIns = editDistanceBetter(srcStr, destSrc, srcStart, descStart + 1, tRecords) + 1;     // insert a character
        int edDel = editDistanceBetter(srcStr, destSrc, srcStart + 1, descStart, tRecords) + 1;     // delete a character
        int edRep = editDistanceBetter(srcStr, destSrc, srcStart + 1, descStart + 1, tRecords) + 1; // replace a character
        distance = Math.min(edIns, Math.min(edDel, edRep));
    }
    // Record the result so that later visits to this state are table lookups
    tRecords[srcStart][descStart].distance = distance;
    tRecords[srcStart][descStart].refCount = 1;
    return distance;
}

class TagMemoRecord {
    int distance;  // edit distance for this state, -1 until computed
    int refCount;  // 0 means not yet computed; otherwise the number of times the state was visited

    public TagMemoRecord() {
        this.distance = -1;
        this.refCount = 0;
    }
}
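The memo table has to be allocated and filled with records before the first call, because a new array of objects in Java contains only null references. The sketch below shows one possible way to do that. The class MemoDemo and the helper newMemo are hypothetical names introduced for this example, and a compact copy of the memoized method is repeated so that the block runs on its own.

```java
// Hypothetical driver; MemoDemo and newMemo are illustrative names, not from the article.
class MemoDemo {
    static class TagMemoRecord {
        int distance = -1; // not computed yet
        int refCount = 0;  // 0 means the state has not been visited
    }

    // Allocate one record per state: start positions run from 0 to length inclusive.
    static TagMemoRecord[][] newMemo(String srcStr, String destSrc) {
        TagMemoRecord[][] t = new TagMemoRecord[srcStr.length() + 1][destSrc.length() + 1];
        for (int i = 0; i < t.length; i++) {
            for (int j = 0; j < t[i].length; j++) {
                t[i][j] = new TagMemoRecord();
            }
        }
        return t;
    }

    static int editDistanceBetter(String srcStr, String destSrc, int srcStart, int descStart,
                                  TagMemoRecord[][] tRecords) {
        TagMemoRecord rec = tRecords[srcStart][descStart];
        if (rec.refCount != 0) { // already computed: answer from the table
            rec.refCount++;
            return rec.distance;
        }
        int srcLeft = srcStr.length() - srcStart;
        int destLeft = destSrc.length() - descStart;
        int distance;
        if (srcLeft == 0 || destLeft == 0) {
            distance = Math.abs(srcLeft - destLeft);
        } else if (srcStr.charAt(srcStart) == destSrc.charAt(descStart)) {
            distance = editDistanceBetter(srcStr, destSrc, srcStart + 1, descStart + 1, tRecords);
        } else {
            int edIns = editDistanceBetter(srcStr, destSrc, srcStart, descStart + 1, tRecords) + 1;
            int edDel = editDistanceBetter(srcStr, destSrc, srcStart + 1, descStart, tRecords) + 1;
            int edRep = editDistanceBetter(srcStr, destSrc, srcStart + 1, descStart + 1, tRecords) + 1;
            distance = Math.min(edIns, Math.min(edDel, edRep));
        }
        rec.distance = distance;
        rec.refCount = 1;
        return distance;
    }

    public static void main(String[] args) {
        String src = "kitten";
        String dest = "sitting";
        TagMemoRecord[][] memo = newMemo(src, dest);
        System.out.println("distance = " + editDistanceBetter(src, dest, 0, 0, memo));
        // Count how many visits were answered from the table instead of being recomputed.
        int reused = 0;
        for (TagMemoRecord[] row : memo) {
            for (TagMemoRecord r : row) {
                if (r.refCount > 1) {
                    reused += r.refCount - 1;
                }
            }
        }
        System.out.println("lookups served from the memo = " + reused);
    }
}
```

A fresh table is needed for every pair of strings, since the recorded distances are only valid for the pair they were computed from.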