Longest common subsequence (LCS) and string edit distance


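The longest common subsequence (LCS) problem is to find the longest sequence of characters that appears in both strings in the same relative order, where the characters need not be contiguous. If the characters must be contiguous, the problem becomes the longest common substring problem. For example, the LCS of "cnblog" and "belong" is "blog".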
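To make the contrast with the contiguous variant concrete, here is a minimal sketch of the longest common substring, which is not part of the original post. The class `CommonSubstring` and method `longestCommonSubstring` are hypothetical names. The only structural difference from the subsequence table is that a mismatch resets the cell to 0, because contiguity is broken, instead of carrying the best value forward.

```java
public class CommonSubstring {
    // Hypothetical helper: returns the longest contiguous substring shared by a and b.
    // cur[j] = length of the longest common suffix of a[0..i) and b[0..j).
    static String longestCommonSubstring(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int bestLen = 0, bestEnd = 0; // bestEnd: exclusive end index of the best run in a
        for (int i = 1; i <= a.length(); i++) {
            int[] cur = new int[b.length() + 1];
            for (int j = 1; j <= b.length(); j++) {
                if (a.charAt(i - 1) == b.charAt(j - 1)) {
                    cur[j] = prev[j - 1] + 1; // extend the diagonal run
                    if (cur[j] > bestLen) { bestLen = cur[j]; bestEnd = i; }
                } // on a mismatch cur[j] stays 0: the run is broken
            }
            prev = cur;
        }
        return a.substring(bestEnd - bestLen, bestEnd);
    }

    public static void main(String[] args) {
        System.out.println(longestCommonSubstring("cnblog", "belong")); // prints lo
    }
}
```

For "cnblog" and "belong" this returns "lo", which is shorter than the subsequence "blog".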

Reference: http://www.cnblogs.com/huangxincheng/archive/2012/11/11/2764625.html

The solution is a dynamic programming algorithm; once the recurrence is understood, the code is quick to write. Dynamic programming fits problems whose subproblems overlap: the optimal solutions of subproblems are reused again and again, so they are usually stored in an auxiliary matrix (space complexity O(m*n), which can be reduced to O(n) when only the length is needed). Here the matrix C, of size (m+1)*(n+1), is used, where C[i][j] is the length of the longest common subsequence of the prefix X_i (the first i characters of X) and the prefix Y_j (the first j characters of Y).
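For reference, the standard LCS recurrence in the same C[i][j] notation is given below, where x_i is the i-th character of X and y_j the j-th character of Y. It matches the table-filling loop in the Java code, and C[m][n] is the final LCS length.

```latex
C[i][j] =
\begin{cases}
0 & \text{if } i = 0 \text{ or } j = 0, \\
C[i-1][j-1] + 1 & \text{if } i, j > 0 \text{ and } x_i = y_j, \\
\max\bigl(C[i][j-1],\ C[i-1][j]\bigr) & \text{if } i, j > 0 \text{ and } x_i \neq y_j.
\end{cases}
```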



The Java implementation is as follows:

/**
 * Uses dynamic programming to find the longest common subsequence of two strings str1 and str2.
 * Table-filling time complexity O(m*n), auxiliary matrix O(m*n), backtracking output time complexity O(m+n).
 * The common subsequence need not be a contiguous substring.
 * @author pixel
 */
public class LCS {
	private static final byte LEFT = 1;  // value came from matrix[i][j-1]
	private static final byte UP = 2;    // value came from matrix[i-1][j]
	private static final byte LU = 3;    // diagonal (upper-left): str1[i-1] == str2[j-1]

	private final String str1;
	private final String str2;
	private int[][] matrix;
	private byte[][] direction;
	private int longest = 0;
	private String sequence = "";

	/** Constructor; both strings must be passed in. */
	public LCS(String str1, String str2) {
		this.str1 = str1;
		this.str2 = str2;
	}

	/** Computes the LCS length and prints the subsequence itself. */
	public void lcs() {
		if (str1 == null || str2 == null || str1.length() == 0 || str2.length() == 0) return;
		sequence = "";  // reset so repeated calls do not append twice
		dp();
		System.out.print("The longest common subsequence is: ");
		sub(str1.length(), str2.length());
	}

	/** Returns the LCS length; lcs() must be called first, otherwise returns 0. */
	public int getLongest() { return longest; }

	/** Returns the LCS; lcs() must be called first, otherwise returns "". */
	public String getSequence() { return sequence; }

	private void dp() {
		int m = str1.length(), n = str2.length();
		matrix = new int[m + 1][n + 1];
		direction = new byte[m + 1][n + 1];
		// Base case: when i = 0 or j = 0 (an empty prefix), the LCS length is 0
		for (int i = 0; i <= m; i++) matrix[i][0] = 0;
		for (int j = 0; j <= n; j++) matrix[0][j] = 0;
		// Fill the table, reusing the optimal results of smaller subproblems
		for (int i = 1; i <= m; i++) {
			for (int j = 1; j <= n; j++) {
				if (str1.charAt(i - 1) == str2.charAt(j - 1)) {
					matrix[i][j] = matrix[i - 1][j - 1] + 1;
					direction[i][j] = LU;
					System.out.println("str1[" + i + "]=str2[" + j + "], direction=LU");
				} else if (matrix[i][j - 1] > matrix[i - 1][j]) {
					matrix[i][j] = matrix[i][j - 1];
					direction[i][j] = LEFT;
				} else {
					matrix[i][j] = matrix[i - 1][j];
					direction[i][j] = UP;
				}
			}
		}
		longest = matrix[m][n];
		System.out.println("The longest common subsequence length is: " + longest);
	}

	/**
	 * Backtracks through the direction matrix and outputs the characters of the
	 * common subsequence in order. Time complexity O(m+n).
	 * @param i row coordinate in the auxiliary matrix
	 * @param j column coordinate in the auxiliary matrix
	 */
	private void sub(int i, int j) {
		if (i == 0 || j == 0) return;
		if (direction[i][j] == LU) {
			sub(i - 1, j - 1);
			System.out.print(str1.charAt(i - 1));
			sequence += str1.charAt(i - 1);
		} else if (direction[i][j] == UP) {
			sub(i - 1, j);
		} else {
			sub(i, j - 1);
		}
	}

	public static void main(String[] args) {
		LCS lcs = new LCS("cnblog", "belong");
		lcs.lcs();
		System.out.println();
		System.out.println(lcs.getLongest());
		System.out.println(lcs.getSequence());
	}
}
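The O(m*n) matrix can be reduced to O(n) space when only the length is needed, because row i of C depends only on row i-1. Below is a minimal sketch of that optimization, keeping just two rows. The class `LcsLength` and method `lcsLength` are hypothetical names. The backtracking in sub() needs the full direction matrix, so this version returns the length but not the subsequence itself.

```java
public class LcsLength {
    // Hypothetical helper: LCS length using two rows of C, O(min(m, n)) extra space.
    static int lcsLength(String x, String y) {
        if (y.length() > x.length()) { String t = x; x = y; y = t; } // keep rows short
        int n = y.length();
        int[] prev = new int[n + 1]; // row i-1 of C
        int[] cur = new int[n + 1];  // row i of C; cur[0] stays 0 (empty prefix of y)
        for (int i = 1; i <= x.length(); i++) {
            for (int j = 1; j <= n; j++) {
                if (x.charAt(i - 1) == y.charAt(j - 1)) {
                    cur[j] = prev[j - 1] + 1;
                } else {
                    cur[j] = Math.max(cur[j - 1], prev[j]);
                }
            }
            int[] tmp = prev; prev = cur; cur = tmp; // reuse the arrays for the next row
        }
        return prev[n]; // after the final swap, prev holds the last row
    }

    public static void main(String[] args) {
        System.out.println(lcsLength("cnblog", "belong")); // prints 4
    }
}
```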

The string edit distance, a measure of how similar two strings are, is closely related to the longest common subsequence problem. It is the minimum number of single-character insertions, deletions, and substitutions needed to transform one string into the other.

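The solution again uses a matrix C to store the optimal solutions of subproblems: C[i][j] holds the edit distance between the prefixes X_i and Y_j. The base cases are C[i][0] = i (delete all i characters) and C[0][j] = j (insert all j characters), and the recurrence is:

①: When x_i = y_j, C[i][j] = C[i-1][j-1];
②: When x_i != y_j, C[i][j] = min{C[i-1][j-1], C[i-1][j], C[i][j-1]} + 1;

In ②, the three terms correspond to substituting x_i with y_j, deleting x_i, and inserting y_j, and the +1 counts the operation just performed.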
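A minimal Java sketch of this recurrence follows, assuming unit cost for every insertion, deletion, and substitution (the classic Levenshtein distance). It fills the full (m+1)*(n+1) table in the same style as the LCS code; the class `EditDistance` and method `editDistance` are hypothetical names.

```java
public class EditDistance {
    // Hypothetical helper: minimum number of single-character insertions, deletions and
    // substitutions that turn x into y. c[i][j] = edit distance of x[0..i) and y[0..j).
    static int editDistance(String x, String y) {
        int m = x.length(), n = y.length();
        int[][] c = new int[m + 1][n + 1];
        for (int i = 0; i <= m; i++) c[i][0] = i; // delete all i characters
        for (int j = 0; j <= n; j++) c[0][j] = j; // insert all j characters
        for (int i = 1; i <= m; i++) {
            for (int j = 1; j <= n; j++) {
                if (x.charAt(i - 1) == y.charAt(j - 1)) {
                    c[i][j] = c[i - 1][j - 1];              // characters match: no cost
                } else {
                    c[i][j] = 1 + Math.min(c[i - 1][j - 1], // substitute x_i with y_j
                                  Math.min(c[i - 1][j],     // delete x_i
                                           c[i][j - 1]));   // insert y_j
                }
            }
        }
        return c[m][n];
    }

    public static void main(String[] args) {
        System.out.println(editDistance("kitten", "sitting")); // prints 3
    }
}
```

Turning "kitten" into "sitting" takes three operations: substitute k with s, substitute e with i, and insert g. As with LCS, the table can be reduced to two rows when only the distance is needed.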



