1. Traffic Clustering: Edit Distance (Levenshtein Distance) Java Implementation

Source: Internet
Author: User
Tags: first string

1. In recent work I needed to cluster the driving routes of users' vehicles. The data provided contains only the names of the traffic checkpoints that recorded a user's vehicle during one day, for example: Qingdao Road-Weihai Road-Jiyang Road.

To analyze vehicle route patterns through clustering, the first problem to solve is how to measure similarity. Similarity is commonly computed with vector-space measures such as Euclidean distance or cosine similarity. However, these do not suit this requirement, because a route here is an ordered sequence of checkpoint names rather than a numeric vector, so edit distance is used instead.

2. The idea behind edit distance:

The edit distance between two strings is the minimum number of edit operations required to convert one string into the other. The permitted edits are replacing one character with another, inserting a character, and deleting a character.

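For example, to turn kitten into sitting:

1. sitten (k -> s)

2. sittin (e -> i)

3. sitting (insert g at the end)

Three edits are needed, so the edit distance between kitten and sitting is 3.

The concept was proposed by the Russian scientist Vladimir Levenshtein in 1965.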
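The three permitted operations can be sketched as small string helpers. This is only an illustration of the kitten example above; the class name KittenToSitting and the helper names substitute, insert and delete are hypothetical and do not come from the original code.

```java
// Illustrative sketch only: the class and helper names are assumptions,
// not part of the original post.
class KittenToSitting {

    // Replace the character at position index with c.
    static String substitute(String s, int index, char c) {
        return s.substring(0, index) + c + s.substring(index + 1);
    }

    // Insert c so that it ends up at position index.
    static String insert(String s, int index, char c) {
        return s.substring(0, index) + c + s.substring(index);
    }

    // Remove the character at position index.
    static String delete(String s, int index) {
        return s.substring(0, index) + s.substring(index + 1);
    }

    public static void main(String[] args) {
        String s = "kitten";
        s = substitute(s, 0, 's'); // sitten
        s = substitute(s, 4, 'i'); // sittin
        s = insert(s, 6, 'g');     // sitting
        System.out.println(s);     // prints "sitting"
    }
}
```

Each call counts as one edit, so the three calls in main correspond to an edit distance of 3.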

3. Problem: find the edit distance between two strings, that is, the minimum number of steps needed to turn a string S1 into a string S2. Three operations are allowed: adding a character, deleting a character, and modifying a character.

Analysis: first define a function edit(i, j) that represents the edit distance between the prefix of length i of the first string and the prefix of length j of the second string.

The following dynamic programming formulas then hold:

1. if i == 0 and j == 0, edit(i, j) = 0

2. if i == 0 and j > 0, edit(i, j) = j

3. if i > 0 and j == 0, edit(i, j) = i

4. if i >= 1 and j >= 1, edit(i, j) = min{edit(i-1, j) + 1, edit(i, j-1) + 1, edit(i-1, j-1) + f(i, j)}, where f(i, j) = 1 when the i-th character of the first string is not equal to the j-th character of the second string, and f(i, j) = 0 otherwise.
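The four cases above translate almost literally into a recursive function. The following is a minimal sketch, assuming a memo table to avoid recomputing subproblems; the class name EditRecurrence and the memoization are illustrative choices, not the author's implementation.

```java
import java.util.Arrays;

// Illustrative sketch: a direct, memoized transcription of edit(i, j).
// The class name and the memo table are assumptions for this example.
class EditRecurrence {

    // Edit distance between the first i characters of s1 and the first j of s2.
    static int edit(String s1, String s2, int i, int j, int[][] memo) {
        if (i == 0) {
            return j; // cases 1 and 2: insert the j remaining characters
        }
        if (j == 0) {
            return i; // case 3: delete the i remaining characters
        }
        if (memo[i][j] >= 0) {
            return memo[i][j]; // already computed
        }
        // f(i, j): 0 if the i-th and j-th characters match, otherwise 1
        int f = (s1.charAt(i - 1) == s2.charAt(j - 1)) ? 0 : 1;
        int best = Math.min(
                Math.min(edit(s1, s2, i - 1, j, memo) + 1,
                         edit(s1, s2, i, j - 1, memo) + 1),
                edit(s1, s2, i - 1, j - 1, memo) + f);
        memo[i][j] = best;
        return best;
    }

    static int distance(String s1, String s2) {
        int[][] memo = new int[s1.length() + 1][s2.length() + 1];
        for (int[] row : memo) {
            Arrays.fill(row, -1); // -1 marks "not computed yet"
        }
        return edit(s1, s2, s1.length(), s2.length(), memo);
    }

    public static void main(String[] args) {
        System.out.println(distance("kitten", "sitting")); // prints 3
    }
}
```

The recursion depth grows with the combined length of the two strings, so for long inputs an iterative table is usually the safer choice.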

4. The code:

    package com.dk.route;

    /**
     * Edit distance, used to calculate the similarity of two texts.
     *
     * @author Zzy
     */
    public class LevenshteinDistance {

        /**
         * Route is not shown in the original post. This minimal stand-in is an
         * assumption that carries only the routeName field used by sim().
         */
        static class Route {
            String routeName;
        }

        public static void main(String[] args) {
            String str1 = "Donghai Road and Yan Er Island Road intersection Shandong Road Sea Park Bridge Shandong Road and Fushun Road intersection Liaoyang West Road and Jinsong Four road junction Chongqing Road and Zhenhua Road intersection";
            String str2 = "Qingdao East toll station Xia Zhuang main station toll station S217 Zhu Lu-Zhang Qingda George Zhu Plastic Ning elevated road 300 Hui Bridge";
            // String str2 = "S217 Zhu Lu-Zhang Qingda George Zhu";
            // String str1 = "Tsing Lan High Speed (double port-housekeeper building) k23+800 station increase direction";
            // String str2 = "Qingdao Road";
            // String str1 = "Shandong";
            System.out.println("ld=" + levenshteinDistance(str1, str2));
            // System.out.println("sim=" + sim(str1, str2));
        }

        /** Returns the smallest of three values. */
        private static int min(int one, int two, int three) {
            int min = one;
            if (two < min) {
                min = two;
            }
            if (three < min) {
                min = three;
            }
            return min;
        }

        public static int levenshteinDistance(String str1, String str2) {
            int[][] d; // matrix
            int n = str1.length();
            int m = str2.length();
            int i; // index into str1
            int j; // index into str2
            char ch1;
            char ch2;
            int temp; // substitution cost: 0 if the two characters match, otherwise 1

            if (n == 0) {
                return m;
            }
            if (m == 0) {
                return n;
            }
            d = new int[n + 1][m + 1];
            for (i = 0; i <= n; i++) { // initialize the first column (including row n)
                d[i][0] = i;
            }
            for (j = 0; j <= m; j++) { // initialize the first row
                d[0][j] = j;
            }
            for (i = 1; i <= n; i++) {
                ch1 = str1.charAt(i - 1);
                for (j = 1; j <= m; j++) {
                    ch2 = str2.charAt(j - 1);
                    if (ch1 == ch2) {
                        temp = 0;
                    } else {
                        temp = 1;
                    }
                    // deletion, insertion, substitution (or match)
                    d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + temp);
                }
            }
            return d[n][m];
        }

        public static double sim(Route initR, Route r) {
            if (initR.routeName == null || r.routeName == null) {
                return 0;
            }
            String str1 = initR.routeName;
            String str2 = r.routeName;
            double ld = levenshteinDistance(str1, str2);
            double strMax = Math.max(str1.length(), str2.length());
            if (strMax == 0) {
                return 1; // both names are empty: identical, and avoids 0/0
            }
            double sim = 1 - ld / strMax;
            // if (ld < strMax) {
            //     sim = 1 - ld / Math.min(str1.length(), str2.length());
            // }
            // System.out.println(initR.routeName + " ------- similarity to ------- " + r.routeName + " = " + sim);
            return sim;
        }
    }
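The similarity used above is 1 - ld / max(length of str1, length of str2), which maps the distance into the range 0 to 1. The sketch below shows the same normalization on plain strings. It is a variation, not the author's code: the class name RouteSimilarity is hypothetical, and the distance is computed with two rolling rows instead of the full matrix, which gives the same result with less memory.

```java
// Illustrative sketch: normalized similarity on plain strings.
// The class name and the two-row variant are assumptions for this example.
class RouteSimilarity {

    // Levenshtein distance keeping only the previous and current DP rows.
    static int distance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // distance from the empty prefix of a
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // distance to the empty prefix of b
            for (int j = 1; j <= b.length(); j++) {
                int cost = (a.charAt(i - 1) == b.charAt(j - 1)) ? 0 : 1;
                curr[j] = Math.min(Math.min(prev[j] + 1, curr[j - 1] + 1),
                                   prev[j - 1] + cost);
            }
            int[] tmp = prev; // swap rows for the next iteration
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // 1.0 means identical strings, 0.0 means nothing in common.
    static double similarity(String a, String b) {
        int maxLen = Math.max(a.length(), b.length());
        if (maxLen == 0) {
            return 1.0; // two empty strings are treated as identical
        }
        return 1.0 - (double) distance(a, b) / maxLen;
    }

    public static void main(String[] args) {
        System.out.println("ld=" + distance("kitten", "sitting"));
        System.out.println("sim=" + similarity("kitten", "sitting"));
    }
}
```

For clustering, a pairwise similarity of this kind can be computed for every pair of routes and handed to whichever clustering method is chosen.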
