LIBSVM Java Engineering Practice

Source: Internet
Author: User
Tags: idf

The previous article explained the LIBSVM workflow and walked through a simple Java test. This article briefly describes how to put LIBSVM to use in a project. Corrections are welcome if anything here is inaccurate.

The first step is to adapt LIBSVM's prediction function. I took part of the code from the svm_predict class and wrapped it in a prediction method. The code is as follows:

// Requires java.util.StringTokenizer and libsvm.svm, libsvm.svm_model, libsvm.svm_node.

/**
 * Classify incoming text features with an already-trained classification model.
 *
 * @param model          the already-trained model
 * @param contentFeature the features computed from the segmented text, as "index:value" pairs
 * @return the predicted category label, or -1 if no features were passed in
 */
public static int libSvmPredict(svm_model model, String contentFeature) {
    // The default category is -1.
    int label = -1;
    // Check whether the incoming text feature is null.
    if (contentFeature == null) {
        return label;
    }
    // Split the incoming feature string on whitespace and ':'.
    StringTokenizer st = new StringTokenizer(contentFeature, " \t\n\r\f:");
    // The target (the category label of a test corpus line) is not used here;
    // it is only needed when testing.
    // double target = Double.parseDouble(st.nextToken());
    int m = st.countTokens() / 2;
    svm_node[] x = new svm_node[m];
    for (int j = 0; j < m; j++) {
        x[j] = new svm_node();
        // Integer.parseInt and Double.parseDouble stand in for the private
        // atoi and atof helpers of svm_predict.
        x[j].index = Integer.parseInt(st.nextToken());
        x[j].value = Double.parseDouble(st.nextToken());
    }
    double v = svm.svm_predict(model, x);
    label = (int) v;
    return label;
}
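To make the expected input format concrete, here is a minimal sketch of how an "index:value" feature string is split into indices and values. `FeatureLineParser` is a hypothetical helper that is not part of LIBSVM or of the original code; it only mirrors the tokenizing loop so that it can be run without the LIBSVM jar on the classpath.

```java
import java.util.StringTokenizer;

/** Hypothetical helper (not part of LIBSVM): parses "index:value" pairs. */
final class FeatureLineParser {
    final int[] indices;
    final double[] values;

    private FeatureLineParser(int[] indices, double[] values) {
        this.indices = indices;
        this.values = values;
    }

    /** Parses a string such as "3:0.25 17:0.5" (no leading label). */
    static FeatureLineParser parse(String features) {
        // Split on whitespace and ':' so each pair yields two tokens.
        StringTokenizer st = new StringTokenizer(features, " \t\n\r\f:");
        int tokens = st.countTokens();
        if (tokens % 2 != 0) {
            throw new IllegalArgumentException("Dangling index or value in: " + features);
        }
        int m = tokens / 2;
        int[] idx = new int[m];
        double[] val = new double[m];
        for (int j = 0; j < m; j++) {
            idx[j] = Integer.parseInt(st.nextToken());
            val[j] = Double.parseDouble(st.nextToken());
        }
        return new FeatureLineParser(idx, val);
    }

    public static void main(String[] args) {
        FeatureLineParser p = parse("3:0.25 17:0.5 42:0.25");
        for (int j = 0; j < p.indices.length; j++) {
            System.out.println(p.indices[j] + " -> " + p.values[j]);
        }
    }
}
```

Unlike the prediction method, this sketch rejects an odd number of tokens instead of silently dropping the last one, which may be worth doing in a project where feature strings come from several sources.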

The second step converts the text to be classified into the format LIBSVM requires, using the term vocabulary and the method described in the previous article. Note that, for convenience, I only compute each word's TF here; IDF defaults to 1. The code is as follows:

// Requires java.io.IOException, java.util.Comparator, java.util.HashMap,
// java.util.Map and java.util.TreeMap.

/**
 * Get the term vocabulary used by the model.
 *
 * @param termsPath path of the vocabulary file, one "term<TAB>index" pair per line
 * @return a map from term to feature index
 */
public static Map<String, Integer> getModelTerms(String termsPath) {
    Map<String, Integer> termsMap = new HashMap<String, Integer>();
    try {
        String termsStr = FileOptionUtil.readFile(termsPath, "UTF-8");
        if (termsStr != null) {
            String[] terms = termsStr.split("\r\n");
            if (terms != null && terms.length > 0) {
                for (int i = 0; i < terms.length; i++) {
                    String term = terms[i];
                    String[] termM = term.split("\t");
                    if (termM != null && termM.length == 2) {
                        termsMap.put(termM[0], Integer.parseInt(termM[1]));
                    }
                }
            }
        }
    } catch (IOException e) {
        e.printStackTrace();
    }
    return termsMap;
}

public static String getContentFeature(String content, Map<String, Integer> terms) {
    String contentFeature = "";
    // Segment the incoming text into words.
    Map<String, Integer> contentTermsMap = HanLPAnalyser.segString(content);
    // A sorted map keeps the feature indices in ascending order.
    Map<Integer, Double> contentTfIdf = new TreeMap<Integer, Double>(new Comparator<Integer>() {
        @Override
        public int compare(Integer o1, Integer o2) {
            return o1.compareTo(o2);
        }
    });
    // Calculate TF-IDF. Here only TF is stored in place of TF-IDF; the IDF value is 1.
    for (String word : contentTermsMap.keySet()) {
        if (terms.containsKey(word)) {
            contentTfIdf.put(terms.get(word), getWordTf(word, contentTermsMap));
        }
    }
    // Emit the "index:value" pairs separated by single spaces.
    for (Integer key : contentTfIdf.keySet()) {
        contentFeature += key + ":" + contentTfIdf.get(key) + " ";
    }
    return contentFeature.trim();
}
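The helpers used above (the word segmenter and the TF function) are not shown in the article, so the following is only a sketch of the TF-only feature construction with a toy vocabulary. `TfFeatureSketch`, `termFrequency`, and the sample words are all hypothetical, and TF is assumed here to be a word's count divided by the total number of term occurrences in the text; the original `getWordTf` may be defined differently.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch: builds a LIBSVM feature string from word counts using TF only. */
final class TfFeatureSketch {

    /** Assumed TF definition: count of the word divided by all term occurrences. */
    static double termFrequency(String word, Map<String, Integer> counts) {
        int total = 0;
        for (int c : counts.values()) {
            total += c;
        }
        Integer count = counts.get(word);
        return (total == 0 || count == null) ? 0.0 : (double) count / total;
    }

    /** Maps in-vocabulary words to "index:tf" pairs, sorted by ascending index. */
    static String toFeatureString(Map<String, Integer> counts, Map<String, Integer> vocabulary) {
        Map<Integer, Double> byIndex = new TreeMap<>();
        for (String word : counts.keySet()) {
            Integer index = vocabulary.get(word);
            if (index != null) {
                // IDF is treated as 1, so the stored weight is just TF.
                byIndex.put(index, termFrequency(word, counts));
            }
        }
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<Integer, Double> e : byIndex.entrySet()) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(e.getKey()).append(':').append(e.getValue());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        counts.put("airport", 2);
        counts.put("warrant", 1);
        counts.put("the", 1); // not in the vocabulary, so it is skipped

        Map<String, Integer> vocabulary = new LinkedHashMap<>();
        vocabulary.put("warrant", 5);
        vocabulary.put("airport", 12);

        System.out.println(toFeatureString(counts, vocabulary)); // prints 5:0.25 12:0.5
    }
}
```

The sorted map matters because LIBSVM's sparse format expects feature indices in ascending order, which is the same reason the code above collects the weights in a TreeMap.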

The third step classifies the text using the methods above. The main method is as follows:

public static void main(String[] args) {
    // Sample financial news text. It appears here in English translation;
    // substitute text in the language of your own training corpus.
    String s = "&nbsp&nbsp&nbsp&nbsp In accordance with the Shanghai Stock Exchange's \"Notice on Matters Related to Securities Companies Creating Baiyun Airport Warrants\",\n"
            + "Everbright Securities Co., Ltd. applied to the Shanghai Stock Exchange to write off Baiyun Airport warrants, and the application\n"
            + "has been approved. The Shanghai branch of China Securities Depository and Clearing Co., Ltd. has handled the corresponding registration procedures.\n"
            + "The company was allowed to write off 15 million Baiyun Airport put warrants, whose terms are exactly the same as those of the original\n"
            + "Baiyun Airport put warrant (trading abbreviation Airport JTP1, trading code 580998, exercise code 582998).\n"
            + "&nbsp&nbsp&nbsp&nbsp\n";
    // Load the term vocabulary and convert the text into LIBSVM features.
    Map<String, Integer> terms = LibSvmDataProcess.getModelTerms("/users/zhouyh/work/yanfa/xunlianji/utf8/heji/terms.txt");
    String contentFeature = LibSvmDataProcess.getContentFeature(s, terms);
    // Load the trained model and predict the category.
    svm_model model = GetSvmModel.getSvmModelInstance().getModel("/users/zhouyh/work/yanfa/xunlianji/utf8/heji/model.txt");
    int label = libSvmPredict(model, contentFeature);
    System.out.println(label);
}
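The model-loading helper called in the main method is not shown in the article. One plausible reading is that it loads each model file once and reuses it for later predictions, since reloading a model on every call would be wasteful. The sketch below illustrates that idea with a hypothetical generic `ModelCache`; the loader function is an assumption, and in a real project it would presumably wrap LIBSVM's `svm.svm_load_model(path)` and handle its `IOException`.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Hypothetical cache: loads each model file at most once and reuses the result. */
final class ModelCache<M> {
    private final Map<String, M> models = new ConcurrentHashMap<>();
    private final Function<String, M> loader;

    ModelCache(Function<String, M> loader) {
        this.loader = loader;
    }

    /** Returns the cached model for this path, loading it on first use. */
    M get(String path) {
        return models.computeIfAbsent(path, loader);
    }

    public static void main(String[] args) {
        int[] loads = {0};
        // A stand-in loader; a string plays the role of the model object here.
        ModelCache<String> cache = new ModelCache<>(path -> {
            loads[0]++;
            return "model loaded from " + path;
        });
        cache.get("model.txt");
        cache.get("model.txt");
        System.out.println(loads[0]); // prints 1: the second call hits the cache
    }
}
```

Keeping the loader as a parameter makes the cache easy to test without a model file on disk.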

The test output is the finance and economics category, which is consistent with the category of the corpus sample we selected.

Finally, this code only walks through the process of using LIBSVM in a project. Using it in a real project will still require many adjustments.
