Using the minimum edit distance algorithm to measure string similarity

Source: Internet
Author: User

Edit distance, also known as Levenshtein distance, is the minimum number of edit operations required to transform one string into another. The permitted edit operations are replacing one character with another, inserting a character, and deleting a character.
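To make the three operations concrete, here is a small sketch that applies each of them as a single edit. The class and helper names (EditOperationsDemo, delete, insert, replace) are hypothetical and chosen only for illustration. It turns "beauty" into "batyu" with three edits, which shows that the edit distance between the two strings is at most 3.

```java
class EditOperationsDemo {
    // Hypothetical helper: delete the character at the given index.
    static String delete(String s, int index) {
        return new StringBuilder(s).deleteCharAt(index).toString();
    }

    // Hypothetical helper: insert character c so that it ends up at the given index.
    static String insert(String s, int index, char c) {
        return new StringBuilder(s).insert(index, c).toString();
    }

    // Hypothetical helper: replace the character at the given index with c.
    static String replace(String s, int index, char c) {
        StringBuilder sb = new StringBuilder(s);
        sb.setCharAt(index, c);
        return sb.toString();
    }

    public static void main(String[] args) {
        String s = "beauty";
        s = delete(s, 1);      // remove 'e' -> "bauty"
        s = delete(s, 2);      // remove 'u' -> "baty"
        s = insert(s, 4, 'u'); // append 'u' -> "batyu"
        System.out.println(s);

        // A single replacement, shown separately: "cat" -> "bat".
        System.out.println(replace("cat", 0, 'b'));
    }
}
```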

The following illustration shows the two-dimensional table used to calculate the distance between the strings "beauty" and "batyu":


Detailed steps:

1) First create a two-dimensional table (matrix), as shown in the figure above. Position (0,0) of the matrix does not correspond to any letter.
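A minimal sketch of this step, assuming the border cells hold 0, 1, 2, ... (which agrees with the neighbor values 0, 1, 1 used for the first inner cell in step 2). The class and method names are hypothetical.

```java
class TableInit {
    // Build a (str.length + 1) x (target.length + 1) table.
    // Row 0 and column 0 stand for the empty prefix, so cell (0,0) matches no letter.
    static int[][] initTable(String str, String target) {
        int[][] d = new int[str.length() + 1][target.length() + 1];
        for (int i = 0; i <= str.length(); i++) {
            d[i][0] = i; // first column: 0, 1, 2, ...
        }
        for (int j = 0; j <= target.length(); j++) {
            d[0][j] = j; // first row: 0, 1, 2, ...
        }
        return d;
    }

    public static void main(String[] args) {
        int[][] d = initTable("beauty", "batyu");
        System.out.println(d.length + " rows, " + d[0].length + " columns");
    }
}
```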

2) Compute the value at matrix position (1,1), the cell marked with red circle 1. For convenience, call this cell point A.

The value of point A is determined by the values of its upper-left, left, and upper neighbors. For convenience, collect these three values in a variable a, so a = (0,1,1).

Point A corresponds to the letter pair (b,b). The letters are the same, so add 0 to the upper-left value (add 1 if they differ), and add 1 to both the left value and the upper value.

Now a = (0,2,2). Take the minimum value in a and fill it in at point A (see the right-hand figure).

Next, consider the matrix position marked with red circle 2, defined as point B.

Point B's neighbor values are b = (1,0,2). Point B corresponds to the letter pair (b,e). The letters differ, so add 1 to the upper-left value, and add 1 to both the left value and the upper value.

Now b = (2,1,3). Take the minimum value in b and assign it to point B.
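The rule for a single cell can be sketched as a small function. The names CellStep and cellValue are hypothetical. The two calls in main reproduce the values worked out for point A and point B.

```java
class CellStep {
    // Compute one cell from its three neighbors.
    // The upper-left value gets +0 when the letters match and +1 when they differ;
    // the left and upper values always get +1.
    static int cellValue(int upperLeft, int left, int top, boolean sameLetter) {
        int diagonal = upperLeft + (sameLetter ? 0 : 1);
        return Math.min(diagonal, Math.min(left + 1, top + 1));
    }

    public static void main(String[] args) {
        // Point A: neighbors (0,1,1), letters (b,b) match -> min(0,2,2)
        System.out.println(cellValue(0, 1, 1, true));  // prints 0
        // Point B: neighbors (1,0,2), letters (b,e) differ -> min(2,1,3)
        System.out.println(cellValue(1, 0, 2, false)); // prints 1
    }
}
```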

3) Compute the value of every remaining cell as in step 2. Once all values are filled in, the value in the lower-right corner is the minimum edit distance.

Note: the left and upper values always have 1 added, whether the letters match, as with (b,b) at point A, or differ, as with (b,e) at point B.
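The steps and the note can be summarized as a recurrence. The notation is introduced here for illustration and is not part of the original text: d(i,j) is the table cell in row i and column j, s and t are the two strings, and c(i,j) is the amount added to the upper-left value.

```latex
\begin{aligned}
d(i,0) &= i, \qquad d(0,j) = j, \\
d(i,j) &= \min\bigl(\, d(i-1,j) + 1,\;\; d(i,j-1) + 1,\;\; d(i-1,j-1) + c(i,j) \,\bigr), \\
c(i,j) &= \begin{cases} 0 & \text{if } s_i = t_j, \\ 1 & \text{otherwise.} \end{cases}
\end{aligned}
```

With s of length n and t of length m, the edit distance is d(n,m), the lower-right cell.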


Java implementation:

	private static int compare(String str, String target) {
		int[][] d; // matrix
		int n = str.length();
		int m = target.length();
		int i; // index used to traverse str
		int j; // index used to traverse target
		char ch1; // current character of str
		char ch2; // current character of target
		int temp; // amount added to the upper-left value: 0 if the characters match, otherwise 1
		if (n == 0) {
			return m;
		}
		if (m == 0) {
			return n;
		}
		d = new int[n + 1][m + 1];
		for (i = 0; i <= n; i++) { // initialize first column
			d[i][0] = i;
		}
		for (j = 0; j <= m; j++) { // initialize first row
			d[0][j] = j;
		}
		for (i = 1; i <= n; i++) { // traverse str
			ch1 = str.charAt(i - 1);
			// match against target
			for (j = 1; j <= m; j++) {
				ch2 = target.charAt(j - 1);
				if (ch1 == ch2) {
					temp = 0;
				} else {
					temp = 1;
				}
				// minimum of top + 1, left + 1, and upper left + temp
				d[i][j] = Math.min(Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1),
						d[i - 1][j - 1] + temp);
			}
		}
		return d[n][m];
	}
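The title speaks of similarity, while the method above returns a distance. One common way to turn the distance into a similarity score between 0 and 1 is to divide it by the length of the longer string and subtract the result from 1. This normalization is an assumption added here, not something the original text specifies, and the class and method names are hypothetical. The sketch carries its own compact distance function so that it runs on its own.

```java
class EditDistanceSimilarity {
    // Compact version of the table-filling method described in this article.
    static int distance(String a, String b) {
        int[][] d = new int[a.length() + 1][b.length() + 1];
        for (int i = 0; i <= a.length(); i++) {
            d[i][0] = i;
        }
        for (int j = 0; j <= b.length(); j++) {
            d[0][j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            for (int j = 1; j <= b.length(); j++) {
                int cost = (a.charAt(i - 1) == b.charAt(j - 1)) ? 0 : 1;
                d[i][j] = Math.min(Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1),
                        d[i - 1][j - 1] + cost);
            }
        }
        return d[a.length()][b.length()];
    }

    // Assumed normalization: 1.0 means identical, 0.0 means nothing in common.
    static double similarity(String a, String b) {
        int longest = Math.max(a.length(), b.length());
        if (longest == 0) {
            return 1.0; // two empty strings are treated as identical
        }
        return 1.0 - (double) distance(a, b) / longest;
    }

    public static void main(String[] args) {
        System.out.println(distance("beauty", "batyu"));   // prints 3
        System.out.println(similarity("beauty", "batyu")); // prints 0.5
    }
}
```

Other normalizations exist, for example dividing by the sum of the two lengths, so the right choice depends on how the score will be used.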

