Using word segmentation to match two strings and compute their similarity ratio

Last Update:2016-05-03 Source: Internet

Author: User


The business scenario: when a customer handles a piece of business, they must submit a list of materials, and those materials are stored in a material library. The next time the customer comes in, entering their ID card number loads their materials from the library. Each required material name is matched against the stored names by similarity, so the materials do not have to be uploaded manually again. (You first need to download ikanalyzer2012ff_u1.jar as the supporting JAR.)
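The lookup described above can be sketched as follows. This is only an illustration under stated assumptions: the class `MaterialMatcherSketch`, the method `bestMatch`, the cutoff `0.5`, and the whitespace-based `wordOverlap` scorer are all hypothetical stand-ins and are not part of the original code. In the real flow the scorer would be the IK Analyzer based similarity.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Optional;
import java.util.Set;
import java.util.function.ToDoubleBiFunction;

final class MaterialMatcherSketch {

    /** Returns the library name most similar to the requested one, if its score reaches minScore. */
    static Optional<String> bestMatch(String requested, List<String> library,
                                      ToDoubleBiFunction<String, String> scorer,
                                      double minScore) {
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (String candidate : library) {
            double score = scorer.applyAsDouble(requested, candidate);
            if (score > bestScore) {
                best = candidate;
                bestScore = score;
            }
        }
        // An empty library or a best score below the cutoff means "no match".
        return bestScore >= minScore ? Optional.of(best) : Optional.empty();
    }

    /** Hypothetical stand-in scorer: shared words divided by all distinct words. */
    static double wordOverlap(String a, String b) {
        Set<String> wordsA = new HashSet<>(Arrays.asList(a.toLowerCase().split("\\s+")));
        Set<String> wordsB = new HashSet<>(Arrays.asList(b.toLowerCase().split("\\s+")));
        Set<String> common = new HashSet<>(wordsA);
        common.retainAll(wordsB);
        Set<String> all = new HashSet<>(wordsA);
        all.addAll(wordsB);
        return all.isEmpty() ? 0.0 : (double) common.size() / all.size();
    }

    public static void main(String[] args) {
        List<String> library = List.of("ID card copy", "Business license", "Household register");
        // Assumed cutoff of 0.5; the right value depends on the scorer and the data.
        System.out.println(bestMatch("Copy of ID card", library,
                MaterialMatcherSketch::wordOverlap, 0.5));
    }
}
```

When no stored name reaches the cutoff, the result is empty and the customer would have to upload that material manually.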

1. The core algorithm, which compares the two segmented word lists

    package com.ikanalyzer;

    import java.util.HashMap;
    import java.util.Iterator;
    import java.util.Map;
    import java.util.Vector;

    /**
     * Description: similarity percentage
     * @author: Administrator
     * @Date: 2015-1-22 pm 1:20:34
     * @version 1.0
     */
    public class IKAnalyzerUtil {

        // Threshold value: score given to a term that is missing from one side
        public static double YUZHI = 0.2;

        /**
         * Returns the similarity percentage
         * @author: Administrator
         * @Date: January 22, 2015
         * @param T1 word list of the first string
         * @param T2 word list of the second string
         * @return similarity of the two word lists
         */
        public static double getSimilarity(Vector<String> T1, Vector<String> T2) throws Exception {
            int size = 0, size2 = 0;
            if (T1 != null && (size = T1.size()) > 0 && T2 != null && (size2 = T2.size()) > 0) {
                // T is the union of T1 and T2; each term maps to {score in T1, score in T2}
                Map<String, double[]> T = new HashMap<String, double[]>();
                String index = null;
                for (int i = 0; i < size; i++) {
                    index = T1.get(i);
                    if (index != null) {
                        double[] c = new double[2];
                        c[0] = 1;     // semantic score of the term in T1
                        c[1] = YUZHI; // semantic score of the term in T2
                        T.put(index, c);
                    }
                }

                for (int i = 0; i < size2; i++) {
                    index = T2.get(i);
                    if (index != null) {
                        double[] c = T.get(index);
                        if (c != null && c.length == 2) {
                            c[1] = 1; // the term also occurs in T2, so its T2 score is 1
                        } else {
                            c = new double[2];
                            c[0] = YUZHI; // semantic score of the term in T1
                            c[1] = 1;     // semantic score of the term in T2
                            T.put(index, c);
                        }
                    }
                }

                // Start the calculation
                Iterator<String> it = T.keySet().iterator();
                double s1 = 0, s2 = 0, Ssum = 0;
                while (it.hasNext()) {
                    double[] c = T.get(it.next());
                    Ssum += c[0] * c[1];
                    s1 += c[0] * c[0];
                    s2 += c[1] * c[1];
                }
                // Percentage
                return Ssum / Math.sqrt(s1 * s2);
            } else {
                throw new Exception("There is a problem with the incoming parameter!");
            }
        }
    }
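The calculation above is a cosine similarity over the union of both word lists, where a term missing from one side scores 0.2 instead of 0. The following self-contained sketch restates the same arithmetic without the IK Analyzer dependency so the numbers can be checked by hand. The names `SmoothedCosineSketch`, `FLOOR`, and `similarity` are illustrative only and do not appear in the original code.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class SmoothedCosineSketch {

    // Same value as the article's threshold: score of a term missing from one side.
    static final double FLOOR = 0.2;

    static double similarity(List<String> t1, List<String> t2) {
        if (t1 == null || t1.isEmpty() || t2 == null || t2.isEmpty()) {
            throw new IllegalArgumentException("both word lists must be non-empty");
        }
        // term -> {score in t1, score in t2}
        Map<String, double[]> scores = new HashMap<>();
        for (String term : t1) {
            if (term == null) continue;
            scores.put(term, new double[] {1.0, FLOOR});
        }
        for (String term : t2) {
            if (term == null) continue;
            double[] c = scores.get(term);
            if (c != null) {
                c[1] = 1.0; // term occurs on both sides
            } else {
                scores.put(term, new double[] {FLOOR, 1.0});
            }
        }
        double dot = 0, norm1 = 0, norm2 = 0;
        for (double[] c : scores.values()) {
            dot += c[0] * c[1];
            norm1 += c[0] * c[0];
            norm2 += c[1] * c[1];
        }
        return dot / Math.sqrt(norm1 * norm2);
    }

    public static void main(String[] args) {
        // Union {a, b, c}: a = (1, 1), b = (1, 0.2), c = (0.2, 1)
        // dot = 1 + 0.2 + 0.2 = 1.4, both squared norms = 1 + 1 + 0.04 = 2.04
        // similarity = 1.4 / 2.04, roughly 0.686
        System.out.println(similarity(List.of("a", "b"), List.of("a", "c")));
    }
}
```

One consequence of the 0.2 floor is that the score never reaches 0: two lists with no terms in common, such as `[a]` and `[b]`, still score 0.4 / 1.04, roughly 0.385. Any cutoff applied to the result has to take this baseline into account.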

2. The following class calls the word segmenter and returns the similarity of two strings

    package com.ikanalyzer;

    import java.io.IOException;
    import java.io.StringReader;
    import java.util.Vector;

    import org.wltea.analyzer.core.IKSegmenter;
    import org.wltea.analyzer.core.Lexeme;

    public class CheckTheSame {

        /**
         * Word segmentation
         * @author: Administrator
         * @Date: March 5, 2016 15:10:47
         * @param str the string to segment
         * @return the words of the string, or null if no words were produced
         */
        public static Vector<String> participle(String str) {
            Vector<String> str1 = new Vector<String>(); // segmented words of the input
            try {
                StringReader reader = new StringReader(str);
                IKSegmenter ik = new IKSegmenter(reader, false); // when true, the segmenter uses smart segmentation
                Lexeme lexeme = null;
                while ((lexeme = ik.next()) != null) {
                    str1.add(lexeme.getLexemeText());
                }

                if (str1.size() == 0) {
                    return null;
                }

                // After segmentation
                // System.out.println("str after segmentation: " + str1);
            } catch (IOException e1) {
                // System.out.println();
            }
            return str1;
        }

        /**
         * Returns the similarity of the two strings being compared
         * @param strOne the first string
         * @param strTwo the second string
         * @return the similarity as a string
         */
        public String getSemblance(String strOne, String strTwo) {
            String semblanceString = "0.0000";
            // Segment both strings
            Vector<String> strs1 = participle(strOne);
            Vector<String> strs2 = participle(strTwo);
            // Compute the similarity from the segmented words
            double same = 0;
            try {
                same = IKAnalyzerUtil.getSimilarity(strs1, strs2);
            } catch (Exception e) {
                // System.out.println(e.getMessage());
            }
            semblanceString = String.valueOf(same);
            // System.out.println("Similarity: " + same);
            return semblanceString;
        }

        public static void main(String[] args) {
            // Segment both strings
            Vector<String> strs1 = participle("Proof of Identity");
            Vector<String> strs2 = participle("Copy of personal identification certificate");
            // Compute the similarity from the segmented words
            double same = 0;
            try {
                same = IKAnalyzerUtil.getSimilarity(strs1, strs2);
            } catch (Exception e) {
                System.out.println(e.getMessage());
            }
            System.out.println("Similarity: " + same);
        }
    }
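The default value `"0.0000"` in `getSemblance` suggests a four-decimal result, but `String.valueOf(double)` returns the full double representation (and `"0.0"` for zero). If a fixed four-decimal string is what callers expect, which is an assumption here and not something the original states, a minimal sketch of the formatting step could look like this. The class and method names are hypothetical.

```java
import java.util.Locale;

final class ScoreFormatSketch {

    /** Formats a similarity score with exactly four decimal places, e.g. "0.6863". */
    static String toFourDecimals(double score) {
        // Locale.ROOT keeps the decimal separator a dot regardless of the system locale.
        return String.format(Locale.ROOT, "%.4f", score);
    }

    public static void main(String[] args) {
        System.out.println(toFourDecimals(1.4 / 2.04));
        System.out.println(toFourDecimals(0));
    }
}
```

With this helper, the failure case and the success case produce strings of the same shape, which makes the value easier to display or compare downstream.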

The above is how it is implemented in practice.

IKAnalyzer also provides many other algorithms for similarity matching, which I plan to study further later.
