Edit distance between sentences

Source: Internet
Author: User

Machine translation sometimes relies on a similarity ratio between sentences, and computing that ratio requires an edit distance. Most material available online uses the character as the smallest unit of the edit distance calculation. For sentences, however, the word is often a more reasonable smallest unit. With dynamic programming, the edit distance is straightforward to compute.
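To see why the choice of unit matters, the sketch below runs the same dynamic-programming distance over characters and over whitespace-separated words. The names `splitWords` and `editDistance` are illustrative and do not come from the original code, and splitting on whitespace is an assumption that only suits languages that put spaces between words.

```cpp
#include <algorithm>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: split a sentence on whitespace into word tokens.
std::vector<std::string> splitWords(const std::string& sentence) {
    std::istringstream in(sentence);
    std::vector<std::string> words;
    std::string w;
    while (in >> w) words.push_back(w);
    return words;
}

// Edit distance over any indexable sequence whose elements support operator!=.
// Passing std::string compares characters; passing a vector of words compares words.
template <typename Seq>
int editDistance(const Seq& a, const Seq& b) {
    const std::size_t m = a.size(), n = b.size();
    std::vector<std::vector<int>> d(m + 1, std::vector<int>(n + 1, 0));
    for (std::size_t i = 0; i <= m; ++i) d[i][0] = static_cast<int>(i);
    for (std::size_t j = 0; j <= n; ++j) d[0][j] = static_cast<int>(j);
    for (std::size_t i = 1; i <= m; ++i) {
        for (std::size_t j = 1; j <= n; ++j) {
            int cost = (a[i - 1] != b[j - 1]) ? 1 : 0;
            d[i][j] = std::min({d[i - 1][j] + 1,          // delete
                                d[i][j - 1] + 1,          // insert
                                d[i - 1][j - 1] + cost}); // substitute or match
        }
    }
    return d[m][n];
}
```

For "I love baby" versus "I love me", calling `editDistance` on the two strings counts 4 character edits, while calling it on the `splitWords` results counts a single word substitution.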

The boundary of the recursion needs care. When every word of the sentence to be translated has been deleted, so that its remaining length is 0, the possible matches are 0-0, 0-1, ..., 0-n (where n is the number of words in the candidate sentence), and the number of edits is known to be 0, 1, ..., n. Similarly, when the candidate sentence has length 0 after deletion, the matches are 0-0, 1-0, 2-0, ..., m-0 (where m is the number of words in the sentence to be translated), and the number of edits is 0, 1, ..., m.
So the algorithm starts by initializing an array with these known edit counts.
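The boundary initialization described above can be sketched on its own. The helper name `initBoundaries` and the use of -1 to mark cells that have not been computed yet are assumptions made for this illustration.

```cpp
#include <vector>

// Hypothetical helper: build an (m+1) x (n+1) table whose first row and
// first column hold the known boundary distances. Interior cells are set
// to -1 to mark them as "not computed yet".
std::vector<std::vector<int>> initBoundaries(int m, int n) {
    std::vector<std::vector<int>> dist(m + 1, std::vector<int>(n + 1, -1));
    for (int j = 0; j <= n; ++j) dist[0][j] = j;  // matches 0-0, 0-1, ..., 0-n
    for (int i = 0; i <= m; ++i) dist[i][0] = i;  // matches 0-0, 1-0, ..., m-0
    return dist;
}
```

Every other cell of the table is then derived from its left, upper, and upper-left neighbours, so these two edges are all the algorithm needs to get started.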

The code below contains both a recursive and a non-recursive implementation.
EditDistanceReverse is the recursive method and EditDistance is the non-recursive one. For Chinese sentences, it is best to add a custom word segmentation step first, for the reason given at the beginning: words are a more reasonable unit than characters.
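Chinese text has no spaces between words, so splitting on whitespace yields nothing useful. As a rough stand-in until a real segmenter is plugged in, the sketch below splits a UTF-8 string into one token per character. The function name `splitUtf8Chars` is hypothetical, the input is assumed to be valid UTF-8, and this is a per-character fallback, not word segmentation.

```cpp
#include <string>
#include <vector>

// Hypothetical fallback tokenizer: one token per UTF-8 code point.
// Assumes valid UTF-8 input; it does not group characters into words.
std::vector<std::string> splitUtf8Chars(const std::string& text) {
    std::vector<std::string> tokens;
    std::size_t i = 0;
    while (i < text.size()) {
        unsigned char lead = static_cast<unsigned char>(text[i]);
        std::size_t len = 1;                      // ASCII, or a stray byte
        if ((lead & 0xE0) == 0xC0) len = 2;       // lead byte 110xxxxx
        else if ((lead & 0xF0) == 0xE0) len = 3;  // lead byte 1110xxxx
        else if ((lead & 0xF8) == 0xF0) len = 4;  // lead byte 11110xxx
        tokens.push_back(text.substr(i, len));    // substr clamps at the end
        i += len;
    }
    return tokens;
}
```

The resulting tokens can be fed to a word-level edit distance in place of words. Swapping in a proper segmentation algorithm later only changes how the token list is produced, not how the distance is computed.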

// EditDistance.cpp: word-level edit distance between two sentences.
#include <iostream>
#include <string>
using namespace std;

// Assumption: maximum number of words per sentence. The array bounds in the
// source listing are illegible, so 100 is only a placeholder.
const int MAX_WORDS = 100;
int dist[MAX_WORDS + 1][MAX_WORDS + 1];

// Non-recursive version: fills the table row by row.
int EditDistance(const string pattern[], int pattern_size,
                 const string candidate[], int candidate_size) {
    int i, j;
    // Because the 0-0 edit distance is 0, 0-1 is 1, and so on
    for (i = 0; i <= candidate_size; i++) dist[0][i] = i;
    for (i = 0; i <= pattern_size; i++) dist[i][0] = i;
    for (i = 1; i <= pattern_size; i++) {
        for (j = 1; j <= candidate_size; j++) {
            int r1 = dist[i - 1][j] + 1;  // Delete
            int r2 = dist[i][j - 1] + 1;  // Insert
            int delta = (pattern[i - 1] != candidate[j - 1] ? 1 : 0);
            int r3 = dist[i - 1][j - 1] + delta;  // Substitute or match
            int best = r1;
            best = best > r2 ? r2 : best;
            best = best > r3 ? r3 : best;
            dist[i][j] = best;
        }
    }
    return dist[pattern_size][candidate_size];
}

// Recursive core with memoization. A cell holding -1 has not been computed
// yet; 0 cannot serve as that marker because 0 is a valid distance.
int EditDistanceCore(const string pattern[], int pattern_size,
                     const string candidate[], int candidate_size) {
    if (pattern_size == 0 || candidate_size == 0)
        return dist[pattern_size][candidate_size];

    if (dist[pattern_size - 1][candidate_size] < 0)
        dist[pattern_size - 1][candidate_size] =
            EditDistanceCore(pattern, pattern_size - 1, candidate, candidate_size);
    int r1 = dist[pattern_size - 1][candidate_size] + 1;  // Delete

    if (dist[pattern_size][candidate_size - 1] < 0)
        dist[pattern_size][candidate_size - 1] =
            EditDistanceCore(pattern, pattern_size, candidate, candidate_size - 1);
    int r2 = dist[pattern_size][candidate_size - 1] + 1;  // Insert

    int delta = (pattern[pattern_size - 1] != candidate[candidate_size - 1] ? 1 : 0);
    if (dist[pattern_size - 1][candidate_size - 1] < 0)
        dist[pattern_size - 1][candidate_size - 1] =
            EditDistanceCore(pattern, pattern_size - 1, candidate, candidate_size - 1);
    int r3 = dist[pattern_size - 1][candidate_size - 1] + delta;  // Substitute or match

    int best = r1;
    best = best > r2 ? r2 : best;
    best = best > r3 ? r3 : best;
    dist[pattern_size][candidate_size] = best;
    return best;
}

// Recursive version: initializes the table, then calls the recursive core.
int EditDistanceReverse(const string pattern[], int pattern_size,
                        const string candidate[], int candidate_size) {
    int i, j;
    // Mark interior cells as not computed so earlier calls leave no stale values
    for (i = 1; i <= pattern_size; i++)
        for (j = 1; j <= candidate_size; j++) dist[i][j] = -1;
    // Because the 0-0 edit distance is 0, 0-1 is 1, and so on
    for (i = 0; i <= candidate_size; i++) dist[0][i] = i;
    for (i = 0; i <= pattern_size; i++) dist[i][0] = i;
    return EditDistanceCore(pattern, pattern_size, candidate, candidate_size);
}

int main() {
    string pattern[] = {"I", "Love", "Baby", "Me2"};
    string candidate[] = {"I", "Love", "Me"};
    cout << EditDistanceReverse(pattern, sizeof(pattern) / sizeof(string),
                                candidate, sizeof(candidate) / sizeof(string));
    // cout << EditDistance(pattern, sizeof(pattern) / sizeof(string),
    //                      candidate, sizeof(candidate) / sizeof(string));
    return 0;
}
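The article mentions a similarity ratio but does not say how it is derived from the distance. One common convention, assumed here and not taken from the source, is to normalize the distance by the length of the longer sentence. The names `wordEditDistance` and `similarity` are illustrative.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Word-level edit distance that keeps only two rows of the table.
int wordEditDistance(const std::vector<std::string>& a,
                     const std::vector<std::string>& b) {
    std::vector<int> prev(b.size() + 1), cur(b.size() + 1);
    for (std::size_t j = 0; j <= b.size(); ++j) prev[j] = static_cast<int>(j);
    for (std::size_t i = 1; i <= a.size(); ++i) {
        cur[0] = static_cast<int>(i);
        for (std::size_t j = 1; j <= b.size(); ++j) {
            int cost = (a[i - 1] != b[j - 1]) ? 1 : 0;
            cur[j] = std::min({prev[j] + 1,          // delete
                               cur[j - 1] + 1,       // insert
                               prev[j - 1] + cost}); // substitute or match
        }
        std::swap(prev, cur);  // the row just filled becomes the previous row
    }
    return prev[b.size()];
}

// Assumed definition: 1 - distance / max(length), so 1.0 means identical
// and 0.0 means every word differs.
double similarity(const std::vector<std::string>& a,
                  const std::vector<std::string>& b) {
    std::size_t longest = std::max(a.size(), b.size());
    if (longest == 0) return 1.0;  // two empty sentences are treated as identical
    return 1.0 - static_cast<double>(wordEditDistance(a, b)) / longest;
}
```

For the sentences used in the listing above, {"I", "Love", "Baby", "Me2"} and {"I", "Love", "Me"}, the distance is 2 and the longer sentence has 4 words, which gives a ratio of 0.5 under this definition.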
