Using word segmentation to match two strings and compute their similarity ratio

Source: Internet
Author: User

  The business scenario: when a customer handles a piece of business, they must submit a list of materials, and those materials are stored in a material library. The next time the customer comes in and their ID card number is entered, the previously submitted materials are loaded from the library and matched against the required materials by the similarity of their names, so they do not have to be uploaded manually again. (You first need to download the supporting JAR, IKAnalyzer2012FF_u1.jar.)

1. The following is the core algorithm for comparing two segmented strings

    package com.ikanalyzer;

    import java.util.HashMap;
    import java.util.Iterator;
    import java.util.Map;
    import java.util.Vector;

    /**
     * Description: computes a similarity percentage.
     * @author Administrator
     * @date 2015-1-22 1:20:34 PM
     * @version 1.0
     */
    public class IKAnalyzerUtil {

        // Threshold: the score given to a term that is missing from one of the two lists
        public static double YUZHI = 0.2;

        /**
         * Returns the similarity percentage.
         * @author Administrator
         * @date January 22, 2015
         * @param T1 terms of the first string
         * @param T2 terms of the second string
         * @return similarity of the two term lists
         */
        public static double getSimilarity(Vector<String> T1, Vector<String> T2) throws Exception {
            int size = 0, size2 = 0;
            if (T1 != null && (size = T1.size()) > 0 && T2 != null && (size2 = T2.size()) > 0) {

                // T is the union of the terms in T1 and T2
                Map<String, double[]> T = new HashMap<String, double[]>();

                String index = null;
                for (int i = 0; i < size; i++) {
                    index = T1.get(i);
                    if (index != null) {
                        double[] c = new double[2];
                        c[0] = 1;      // semantic score of the term in T1
                        c[1] = YUZHI;  // semantic score of the term in T2
                        T.put(index, c);
                    }
                }

                for (int i = 0; i < size2; i++) {
                    index = T2.get(i);
                    if (index != null) {
                        double[] c = T.get(index);
                        if (c != null && c.length == 2) {
                            c[1] = 1;  // the term from T1 also exists in T2, so its T2 score is 1
                        } else {
                            c = new double[2];
                            c[0] = YUZHI;  // semantic score of the term in T1
                            c[1] = 1;      // semantic score of the term in T2
                            T.put(index, c);
                        }
                    }
                }

                // Start the calculation of the percentage
                Iterator<String> it = T.keySet().iterator();
                double s1 = 0, s2 = 0, Ssum = 0;
                while (it.hasNext()) {
                    double[] c = T.get(it.next());
                    Ssum += c[0] * c[1];
                    s1 += c[0] * c[0];
                    s2 += c[1] * c[1];
                }
                // Percentage
                return Ssum / Math.sqrt(s1 * s2);
            } else {
                throw new Exception("There is a problem with the incoming parameters!");
            }
        }
    }
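The calculation above amounts to a cosine similarity over two score vectors: each distinct term gets a score of 1 in a list that contains it and the threshold (0.2) in a list that does not. The following is a minimal sketch of that idea which runs without the IK Analyzer JAR. The class name `CosineSketch`, the constant `THRESHOLD`, and the use of `java.util.List` instead of `Vector` are illustrative assumptions, not part of the original code.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class CosineSketch {

    // Assumed to play the same role as YUZHI: score of a term absent from one list
    static final double THRESHOLD = 0.2;

    public static double similarity(List<String> t1, List<String> t2) {
        if (t1 == null || t1.isEmpty() || t2 == null || t2.isEmpty()) {
            throw new IllegalArgumentException("both term lists must be non-empty");
        }
        // term -> {score in t1, score in t2}
        Map<String, double[]> scores = new HashMap<>();
        for (String term : t1) {
            if (term != null) {
                scores.put(term, new double[] {1.0, THRESHOLD});
            }
        }
        for (String term : t2) {
            if (term == null) {
                continue;
            }
            double[] s = scores.get(term);
            if (s != null) {
                s[1] = 1.0; // term occurs in both lists
            } else {
                scores.put(term, new double[] {THRESHOLD, 1.0});
            }
        }
        // Cosine similarity: dot product divided by the product of the vector norms
        double dot = 0, norm1 = 0, norm2 = 0;
        for (double[] s : scores.values()) {
            dot += s[0] * s[1];
            norm1 += s[0] * s[0];
            norm2 += s[1] * s[1];
        }
        return dot / Math.sqrt(norm1 * norm2);
    }

    public static void main(String[] args) {
        // Hand-made term lists standing in for segmenter output
        List<String> a = Arrays.asList("identity", "proof");
        List<String> b = Arrays.asList("identity", "proof", "copy");
        // Here dot = 1 + 1 + 0.2, norm1 = 1 + 1 + 0.04, norm2 = 3
        System.out.println(similarity(a, b));
        System.out.println(similarity(a, a));
    }
}
```

One consequence of the non-zero threshold is worth knowing when choosing a cutoff: two single-term lists with no term in common score 0.4 / 1.04, about 0.38, rather than 0, so a cutoff for "same material" should sit well above that.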

2. The following method calls the word segmenter and returns the similarity of two strings

    package com.ikanalyzer;

    import java.io.IOException;
    import java.io.StringReader;
    import java.util.Vector;

    import org.wltea.analyzer.core.IKSegmenter;
    import org.wltea.analyzer.core.Lexeme;

    public class CheckTheSame {

        /**
         * Word segmentation.
         * @author Administrator
         * @date March 5, 2016 15:10:47
         * @param str the string to segment
         * @return the list of terms, or null if no term was produced
         */
        public static Vector<String> participle(String str) {
            Vector<String> str1 = new Vector<String>();  // terms produced from the input
            try {
                StringReader reader = new StringReader(str);
                IKSegmenter ik = new IKSegmenter(reader, false);  // when true, the segmenter uses smart segmentation
                Lexeme lexeme = null;
                while ((lexeme = ik.next()) != null) {
                    str1.add(lexeme.getLexemeText());
                }
                if (str1.size() == 0) {
                    return null;
                }
                // After segmentation
                // System.out.println("str after segmentation: " + str1);
            } catch (IOException e1) {
                // System.out.println();
            }
            return str1;
        }

        /**
         * Returns the similarity of the two strings being compared.
         * @param strOne
         * @param strTwo
         * @return the similarity as a string, "0.0" if it could not be computed
         */
        public String getSemblance(String strOne, String strTwo) {
            String semblanceString = "0.0000";
            // Segment both strings
            Vector<String> strs1 = participle(strOne);
            Vector<String> strs2 = participle(strTwo);
            // Compute the similarity from the terms
            double same = 0;
            try {
                same = IKAnalyzerUtil.getSimilarity(strs1, strs2);
            } catch (Exception e) {
                // System.out.println(e.getMessage());
            }
            semblanceString = String.valueOf(same);
            // System.out.println("similarity: " + same);
            return semblanceString;
        }

        public static void main(String[] args) {
            // Segment both strings
            Vector<String> strs1 = participle("Proof of Identity");
            Vector<String> strs2 = participle("Copy of personal identification certificate");
            // Compute the similarity from the terms
            double same = 0;
            try {
                same = IKAnalyzerUtil.getSimilarity(strs1, strs2);
            } catch (Exception e) {
                System.out.println(e.getMessage());
            }
            System.out.println("Similarity: " + same);
        }
    }
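To tie this back to the business scenario, a lookup against the material library could be sketched as below: score the required material name against every stored name and keep the best one that reaches a cutoff. Everything here is hypothetical (the class `MaterialMatcher`, the method `bestMatch`, the cutoff of 0.5). The scorer is passed in as a function, and the demo uses a simple token-overlap stand-in so the block runs without the IK Analyzer JAR; in the article's setup that place would be taken by a call that segments both names and scores them as shown above.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Optional;
import java.util.Set;
import java.util.function.ToDoubleBiFunction;

public class MaterialMatcher {

    /** Returns the library name most similar to the query, if its score reaches minScore. */
    public static Optional<String> bestMatch(String query, List<String> library,
                                             ToDoubleBiFunction<String, String> scorer,
                                             double minScore) {
        String best = null;
        double bestScore = 0;
        for (String candidate : library) {
            double score = scorer.applyAsDouble(query, candidate);
            // Keep the first candidate with the highest score at or above the cutoff
            if (score >= minScore && (best == null || score > bestScore)) {
                best = candidate;
                bestScore = score;
            }
        }
        return Optional.ofNullable(best);
    }

    /** Stand-in scorer (NOT the article's algorithm): shared tokens divided by all tokens. */
    public static double tokenOverlap(String a, String b) {
        Set<String> ta = new HashSet<>(Arrays.asList(a.toLowerCase().split("\\s+")));
        Set<String> tb = new HashSet<>(Arrays.asList(b.toLowerCase().split("\\s+")));
        Set<String> common = new HashSet<>(ta);
        common.retainAll(tb);
        Set<String> union = new HashSet<>(ta);
        union.addAll(tb);
        return union.isEmpty() ? 0.0 : (double) common.size() / union.size();
    }

    public static void main(String[] args) {
        // Hypothetical contents of one customer's material library
        List<String> library = Arrays.asList(
                "Copy of personal identification certificate",
                "Business license",
                "Proof of identity");
        System.out.println(bestMatch("Proof of Identity", library, MaterialMatcher::tokenOverlap, 0.5));
        System.out.println(bestMatch("Tax receipt", library, MaterialMatcher::tokenOverlap, 0.5));
    }
}
```

Returning an `Optional` makes the "no sufficiently similar material" case explicit, which is the case where the customer would still have to upload the material manually.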


There are many other algorithms for similarity matching that can be built on IKAnalyzer; I will look into them further later.

