1. In recent work I needed to cluster users' vehicle driving routes. The data provided contains only the names of the traffic checkpoints that recorded a user's vehicle during one day, for example: Qingdao Road-Weihai Road-Jiyang Road.
To analyze route patterns through clustering, the first problem to solve is how to measure similarity. Similarity is commonly computed with vector-space measures (Euclidean distance, cosine similarity) and other algorithms. However, these do not fit this requirement, because a route here is an ordered sequence of checkpoint names rather than a numeric vector, so edit distance is used to solve the problem.
2. The idea of edit distance:
A. The edit distance between two strings is the minimum number of edit operations required to convert one string into the other. The permitted edits are replacing one character with another, inserting a character, and deleting a character.
For example, to turn kitten into sitting:
1. sitten (k -> s)
2. sittin (e -> i)
3. sitting (insert g at the end)
Three edits are needed, so the edit distance between kitten and sitting is 3.
The concept was proposed by the Russian scientist Vladimir Levenshtein in 1965.
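The three edits in the kitten example can be replayed in a few lines of Java. This is only an illustrative sketch: the class name EditStepsDemo and the method kittenToSitting are made up for this example and are not part of the original code.

```java
// Hypothetical demo class; the names here are illustrative, not from the original article.
class EditStepsDemo {

    /** Applies the three edits from the example to "kitten" and returns each intermediate string. */
    static String[] kittenToSitting() {
        StringBuilder sb = new StringBuilder("kitten");
        String[] steps = new String[3];

        sb.setCharAt(0, 's');      // 1. replace k with s
        steps[0] = sb.toString();

        sb.setCharAt(4, 'i');      // 2. replace e with i
        steps[1] = sb.toString();

        sb.append('g');            // 3. insert g at the end
        steps[2] = sb.toString();

        return steps;
    }

    public static void main(String[] args) {
        // Print the string after each edit operation.
        for (String step : kittenToSitting()) {
            System.out.println(step);
        }
    }
}
```

Each call corresponds to one permitted operation (two replacements and one insertion), which is why the distance is 3.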
3. Problem: find the edit distance between two strings, that is, the minimum number of steps needed to turn a string S1 into a string S2. Three operations are allowed: insert a character, delete a character, and modify a character.
Analysis: first define a function edit(i, j) that represents the edit distance between the prefix of length i of the first string and the prefix of length j of the second string.
The following dynamic programming formulas then hold:
1. if i == 0 and j == 0, edit(i, j) = 0
2. if i == 0 and j > 0, edit(i, j) = j
3. if i > 0 and j == 0, edit(i, j) = i
4. if i >= 1 and j >= 1, edit(i, j) = min{edit(i-1, j) + 1, edit(i, j-1) + 1, edit(i-1, j-1) + f(i, j)}, where f(i, j) = 1 when the i-th character of the first string is not equal to the j-th character of the second string, and f(i, j) = 0 otherwise.
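The four cases can be transcribed almost literally into a recursive function with memoization. The sketch below is one possible reading of the formulas, not the author's implementation; the class name EditRecurrenceDemo and the helper distance are hypothetical names chosen for the example.

```java
import java.util.Arrays;

// Hypothetical demo class mirroring the recurrence edit(i, j); names are illustrative.
class EditRecurrenceDemo {
    private final String s1;
    private final String s2;
    private final int[][] memo; // memo[i][j] caches edit(i, j); -1 means "not computed yet"

    EditRecurrenceDemo(String s1, String s2) {
        this.s1 = s1;
        this.s2 = s2;
        this.memo = new int[s1.length() + 1][s2.length() + 1];
        for (int[] row : memo) {
            Arrays.fill(row, -1);
        }
    }

    /** Edit distance between the first i characters of s1 and the first j characters of s2. */
    int edit(int i, int j) {
        if (i == 0) return j;                 // cases 1 and 2: empty prefix of s1
        if (j == 0) return i;                 // case 3: empty prefix of s2
        if (memo[i][j] != -1) return memo[i][j];

        // f(i, j): 0 if the i-th and j-th characters match, 1 otherwise
        int f = (s1.charAt(i - 1) == s2.charAt(j - 1)) ? 0 : 1;

        // case 4: delete, insert, or replace/keep
        int best = Math.min(
                Math.min(edit(i - 1, j) + 1, edit(i, j - 1) + 1),
                edit(i - 1, j - 1) + f);
        memo[i][j] = best;
        return best;
    }

    static int distance(String a, String b) {
        return new EditRecurrenceDemo(a, b).edit(a.length(), b.length());
    }

    public static void main(String[] args) {
        System.out.println(distance("kitten", "sitting")); // prints 3
    }
}
```

The recursion depth grows with the combined length of the two strings, so for long inputs a bottom-up table is usually the safer choice.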
4. The code:
package com.dk.route;

/**
 * Edit distance, used to calculate the similarity of two texts.
 *
 * @author Zzy
 */
public class LevenshteinDistance {

    public static void main(String[] args) {
        String str1 = "Donghai Road and Yan Er Island Road intersection Shandong Road Sea Park Bridge Shandong Road and Fushun Road intersection Liaoyang West Road and Jinsong Four road junction Chongqing Road and Zhenhua Road intersection";
        String str2 = "Qingdao East toll station Xia Zhuang main station toll station S217 Zhu Lu-Zhang Qingda George Zhu Plastic Ning elevated road 300 Hui Bridge";
        // String str2 = "S217 Zhu Lu-Zhang Qingda George Zhu";
        // String str1 = "Tsing Lan High Speed (double port-housekeeper building) k23+800 station increase direction";
        // String str2 = "Qingdao Road";
        // String str1 = "Shandong";
        System.out.println("ld=" + levenshteinDistance(str1, str2));
        // System.out.println("sim=" + sim(str1, str2));
    }

    /** Returns the smallest of three values. */
    private static int min(int one, int two, int three) {
        int min = one;
        if (two < min) {
            min = two;
        }
        if (three < min) {
            min = three;
        }
        return min;
    }

    /** Computes the Levenshtein distance between str1 and str2. */
    public static int levenshteinDistance(String str1, String str2) {
        int n = str1.length();
        int m = str2.length();
        if (n == 0) {
            return m;
        }
        if (m == 0) {
            return n;
        }
        // d[i][j] = edit distance between the first i chars of str1 and the first j chars of str2
        int[][] d = new int[n + 1][m + 1];
        for (int i = 0; i <= n; i++) { // initialize first column (must include i == n)
            d[i][0] = i;
        }
        for (int j = 0; j <= m; j++) { // initialize first row
            d[0][j] = j;
        }
        for (int i = 1; i <= n; i++) {
            char ch1 = str1.charAt(i - 1);
            for (int j = 1; j <= m; j++) {
                char ch2 = str2.charAt(j - 1);
                // substitution cost: 0 if the characters are the same, otherwise 1
                int temp = (ch1 == ch2) ? 0 : 1;
                d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + temp);
            }
        }
        return d[n][m];
    }

    /**
     * Similarity of two routes in [0, 1], based on the edit distance of their names.
     * Route is a class from the same project that is not shown here; only its
     * String field routeName is used.
     */
    public static double sim(Route initR, Route r) {
        if (initR.routeName == null || r.routeName == null) {
            return 0;
        }
        String str1 = initR.routeName;
        String str2 = r.routeName;
        double ld = levenshteinDistance(str1, str2);
        double strMax = Math.max(str1.length(), str2.length());
        if (strMax == 0) {
            return 1; // both names are empty: avoid 0/0 and treat them as identical
        }
        double sim = 1 - ld / strMax;
        // if (ld < strMax) {
        //     sim = 1 - ld / Math.min(str1.length(), str2.length());
        // }
        // System.out.println(initR.routeName + "-------and the similarity of the-------" + r.routeName + "=" + sim);
        return sim;
    }
}
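Since the goal is clustering, the similarity above would typically be computed for every pair of routes. The following is a minimal sketch of that step, assuming routes are plain strings; the class RouteSimilarityMatrixDemo, its method names, and the sample routes other than the one quoted in section 1 are hypothetical. It uses a two-row variant of the same dynamic programming table to save memory, which is a substitution for the full matrix used above, and the same normalization 1 - ld / max length.

```java
// Hypothetical demo class; names and sample data are illustrative assumptions.
class RouteSimilarityMatrixDemo {

    /** Levenshtein distance using two rolling rows instead of the full matrix. */
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // distance from the empty prefix of a
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // distance to the empty prefix of b
            for (int j = 1; j <= b.length(); j++) {
                int cost = (a.charAt(i - 1) == b.charAt(j - 1)) ? 0 : 1;
                curr[j] = Math.min(Math.min(prev[j] + 1, curr[j - 1] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev; // swap rows for the next iteration
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    /** Similarity in [0, 1]: 1 - distance / length of the longer string. */
    static double similarity(String a, String b) {
        int maxLen = Math.max(a.length(), b.length());
        if (maxLen == 0) {
            return 1.0; // assumption: two empty routes count as identical
        }
        return 1.0 - (double) levenshtein(a, b) / maxLen;
    }

    /** Symmetric matrix of pairwise similarities between all routes. */
    static double[][] similarityMatrix(String[] routes) {
        int n = routes.length;
        double[][] sim = new double[n][n];
        for (int i = 0; i < n; i++) {
            sim[i][i] = 1.0;
            for (int j = i + 1; j < n; j++) {
                sim[i][j] = similarity(routes[i], routes[j]);
                sim[j][i] = sim[i][j];
            }
        }
        return sim;
    }

    public static void main(String[] args) {
        String[] routes = {
            "Qingdao Road-Weihai Road-Jiyang Road",
            "Qingdao Road-Weihai Road",   // made-up sample
            "Shandong Road-Fushun Road"   // made-up sample
        };
        for (double[] row : similarityMatrix(routes)) {
            System.out.println(java.util.Arrays.toString(row));
        }
    }
}
```

A matrix like this could then be handed to whichever clustering method is chosen; that choice is outside what the original text specifies.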