A simple algorithm for calculating the similarity of strings

Source: Internet
Author: User

Algorithm design background:

Recently, while designing the resource import function of a knowledge management system, I tried to make it as component-based as possible, so that it is easy to extend and convenient for other modules to use. I simplified the interface the component exposes, and designed and implemented the import framework on top of a mapping mechanism. One of its functions needs an algorithm that computes the similarity of two strings. A simple design follows:

Design idea:

The basic operations for turning two strings into the same string are defined as follows:

1. Modify one character (e.g. change A to B)

2. Add one character (e.g. change Abed to Abedd)

3. Delete one character (e.g. change Jackbllog to Jackblog)
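
As a small illustration of the three operations, the sketch below applies each one with `StringBuilder`. The class name `EditOperationsSketch` and its helper methods are made up for this example and are not part of the original component.

```java
public class EditOperationsSketch {

    // Operation 1: modify (replace) the character at the given index.
    static String modify(String s, int index, char replacement) {
        StringBuilder sb = new StringBuilder(s);
        sb.setCharAt(index, replacement);
        return sb.toString();
    }

    // Operation 2: add one character at the given index
    // (index == s.length() appends at the end).
    static String add(String s, int index, char c) {
        return new StringBuilder(s).insert(index, c).toString();
    }

    // Operation 3: delete the character at the given index.
    static String delete(String s, int index) {
        return new StringBuilder(s).deleteCharAt(index).toString();
    }

    public static void main(String[] args) {
        System.out.println(modify("A", 0, 'B'));      // prints "B"
        System.out.println(add("Abed", 4, 'd'));      // prints "Abedd"
        System.out.println(delete("Jackbllog", 5));   // prints "Jackblog"
    }
}
```

Each call counts as one step, so a pair of strings that can be made equal with a single call has distance 1.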

To make Jackbllog and Jackblog the same, it is enough to delete one l from the first string or, equivalently, to add one l to the second. The minimum number of such operations needed to make two strings the same is defined as the distance L between them, and the similarity is defined as 1/(L+1), i.e. the reciprocal of the distance plus one. The distance between Jackbllog and Jackblog is 1, so their similarity is 1/(1+1) = 1/2 = 0.5. Since identical strings have a similarity of 1, a value of 0.5 means the two strings are already very close.
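
The similarity formula can be checked with a few lines of Java. This is only a sketch of the definition above. The class `SimilaritySketch` and the method `similarityFromDistance` are names assumed for the example.

```java
public class SimilaritySketch {

    // Similarity is the reciprocal of (distance + 1), as defined in the text.
    static double similarityFromDistance(int distance) {
        if (distance < 0) {
            throw new IllegalArgumentException("distance must be non-negative");
        }
        return 1.0 / (distance + 1);
    }

    public static void main(String[] args) {
        // Distance 0 (identical strings) gives 1.0; larger distances approach 0.
        for (int d = 0; d <= 3; d++) {
            System.out.println(d + " -> " + similarityFromDistance(d));
        }
    }
}
```

With this definition the similarity always lies in the range (0, 1], and a distance of 1 gives exactly 0.5, matching the Jackbllog example.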

The distance between any two strings is finite and never exceeds the sum of their lengths, and we do not care what the resulting identical string looks like after the series of modifications. So the algorithm handles one character position at a time and computes the distance of the remaining parts recursively. The Java implementation is as follows:

package org.blogjava.arithmetic;

import java.util.HashMap;
import java.util.Map;

/**
 * @author Jack.wang
 */
public class StringDistance {

    // Memo of sub-problem results, keyed by the two start indices.
    // It is cleared on every call to similarity() and is not thread-safe.
    public static final Map<String, Integer> DISTANCE_CACHE = new HashMap<String, Integer>();

    // Distance between firstStr[firstBegin..firstEnd] and
    // secondStr[secondBegin..secondEnd]; both end indices are inclusive.
    private static int caculateStringDistance(byte[] firstStr, int firstBegin,
            int firstEnd, byte[] secondStr, int secondBegin, int secondEnd) {
        String key = makeKey(firstBegin, secondBegin);
        Integer cached = DISTANCE_CACHE.get(key);
        if (cached != null) {
            return cached;
        } else {
            // If one range is empty, the distance is the length of the other.
            if (firstBegin > firstEnd) {
                if (secondBegin > secondEnd) {
                    return 0;
                } else {
                    return secondEnd - secondBegin + 1;
                }
            }
            if (secondBegin > secondEnd) {
                return firstEnd - firstBegin + 1;
            }
            if (firstStr[firstBegin] == secondStr[secondBegin]) {
                // Equal leading bytes cost nothing; move on in both strings.
                return caculateStringDistance(firstStr, firstBegin + 1,
                        firstEnd, secondStr, secondBegin + 1, secondEnd);
            } else {
                // Delete the leading byte of the first string.
                int oneValue = caculateStringDistance(firstStr, firstBegin + 1,
                        firstEnd, secondStr, secondBegin, secondEnd);
                // Add the leading byte of the second string to the first.
                int twoValue = caculateStringDistance(firstStr, firstBegin,
                        firstEnd, secondStr, secondBegin + 1, secondEnd);
                // Modify the leading byte so that both match.
                int threeValue = caculateStringDistance(firstStr,
                        firstBegin + 1, firstEnd, secondStr, secondBegin + 1,
                        secondEnd);
                int result = min(oneValue, twoValue, threeValue) + 1;
                DISTANCE_CACHE.put(key, result);
                return result;
            }
        }
    }

    // Note: getBytes() uses the platform default charset, so the distance
    // is counted in bytes, which equals characters only for single-byte text.
    public static float similarity(String stringOne, String stringTwo) {
        DISTANCE_CACHE.clear();
        byte[] first = stringOne.getBytes();
        byte[] second = stringTwo.getBytes();
        return 1f / (caculateStringDistance(first, 0, first.length - 1,
                second, 0, second.length - 1) + 1);
    }

    private static int min(int oneValue, int twoValue, int threeValue) {
        return Math.min(oneValue, Math.min(twoValue, threeValue));
    }

    // Builds the cache key from the two start indices.
    private static String makeKey(int firstBegin, int secondBegin) {
        StringBuilder sb = new StringBuilder();
        return sb.append(firstBegin).append(':').append(secondBegin).toString();
    }

    /**
     * @param args
     */
    public static void main(String[] args) {
        float i = StringDistance.similarity("Jacklovvedyou", "jacklodveyou");
        System.out.println(i);
    }
}
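
The recursive idea described above can also be written bottom-up, which avoids both the cache and deep recursion. The following is a minimal sketch of that alternative, not the original author's code. It swaps in the standard dynamic-programming table for edit distance, compares `char` values instead of bytes, and uses the assumed names `EditDistanceSketch`, `distance` and `similarity`.

```java
public class EditDistanceSketch {

    // Bottom-up edit distance over characters, keeping only two rows
    // of the table: prev holds row i-1 and curr holds row i.
    static int distance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];

        // Turning an empty prefix of a into the first j characters of b
        // takes j additions.
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }

        for (int i = 1; i <= a.length(); i++) {
            // Turning the first i characters of a into an empty string
            // takes i deletions.
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                int delete = prev[j] + 1;        // drop a[i-1]
                int add = curr[j - 1] + 1;       // add b[j-1]
                int modify = prev[j - 1] + cost; // change a[i-1] if needed
                curr[j] = Math.min(Math.min(delete, add), modify);
            }
            // Swap the rows so that prev is the row just computed.
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Similarity as defined in the text: 1 / (distance + 1).
    static float similarity(String a, String b) {
        return 1f / (distance(a, b) + 1);
    }

    public static void main(String[] args) {
        System.out.println(distance("Jackbllog", "Jackblog"));   // prints 1
        System.out.println(similarity("Jackbllog", "Jackblog")); // prints 0.5
    }
}
```

This version runs in time proportional to the product of the two lengths and needs memory proportional to the length of the second string only.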
