The previous article introduced the LIBSVM workflow and explained it with a simple Java test. This article briefly describes how to put LIBSVM into practice in a project; corrections are welcome wherever something is wrong.
The first step is to adapt LIBSVM's prediction function. I took part of the code from the svm_predict class and wrapped it as a prediction method; the code is as follows:
import java.util.StringTokenizer;
import libsvm.svm;
import libsvm.svm_model;
import libsvm.svm_node;

/**
 * Classify incoming text features with a trained classification model.
 *
 * @param model          the trained model
 * @param contentFeature the features computed from the segmented input text
 * @return the predicted label, or -1 if contentFeature is null
 */
public static int libsvmPredict(svm_model model, String contentFeature) {
    // The default category is -1
    int label = -1;
    // Check whether the incoming text feature is null
    if (contentFeature == null)
        return label;
    // Split the incoming feature string on whitespace and ':'
    StringTokenizer st = new StringTokenizer(contentFeature, " \t\n\r\f:");
    // target is not used here; it is only needed in testing, where it is the
    // category label of the test corpus
    // double target = atof(st.nextToken());
    int m = st.countTokens() / 2;
    svm_node[] x = new svm_node[m];
    // atoi and atof are the small parsing helpers from svm_predict and must be
    // copied along with this method
    for (int j = 0; j < m; j++) {
        x[j] = new svm_node();
        x[j].index = atoi(st.nextToken());
        x[j].value = atof(st.nextToken());
    }
    double v = svm.svm_predict(model, x);
    label = (int) v;
    return label;
}
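To make the tokenizing step above concrete, here is a minimal sketch that parses a feature string of the form "index:value index:value" without touching LIBSVM at all. The class name FeatureLineParser and the sample feature string are assumptions made for illustration; they are not part of the original project.

```java
import java.util.StringTokenizer;

// Hypothetical helper, for illustration only: shows how a LIBSVM-style
// feature string is split into (index, value) pairs.
class FeatureLineParser {

    /** Returns one row per feature: row[0] is the index, row[1] is the value. */
    static double[][] parse(String featureLine) {
        // Whitespace separates features, ':' separates index from value
        StringTokenizer st = new StringTokenizer(featureLine, " \t\n\r\f:");
        int m = st.countTokens() / 2;
        double[][] pairs = new double[m][2];
        for (int j = 0; j < m; j++) {
            pairs[j][0] = Integer.parseInt(st.nextToken());
            pairs[j][1] = Double.parseDouble(st.nextToken());
        }
        return pairs;
    }

    public static void main(String[] args) {
        // Sample feature string (made up for this sketch)
        for (double[] p : parse("3:0.25 17:0.5 42:0.25")) {
            System.out.println((int) p[0] + " -> " + p[1]);
        }
    }
}
```

Because the space character is one of the delimiters, the features can be joined with single spaces, which is how the feature string is built in the second step.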
The second step converts the text to be classified into the format LIBSVM requires, using the term dictionary and the method described in the previous article. Note that, for convenience, I only use the term frequency (TF) here; IDF defaults to 1. The code is as follows:
/**
 * Load the term dictionary used by the model.
 *
 * @param termsPath path to the term dictionary file
 * @return map from term to feature index
 */
public static Map<String, Integer> getModelTerms(String termsPath) {
    Map<String, Integer> termsMap = new HashMap<String, Integer>();
    try {
        String termsStr = FileOptionUtil.readFile(termsPath, "UTF-8");
        if (termsStr != null) {
            // One term per line
            String[] terms = termsStr.split("\r\n");
            if (terms != null && terms.length > 0) {
                for (int i = 0; i < terms.length; i++) {
                    String term = terms[i];
                    // Each line is "term<TAB>index"
                    String[] termM = term.split("\t");
                    if (termM != null && termM.length == 2) {
                        termsMap.put(termM[0], Integer.parseInt(termM[1]));
                    }
                }
            }
        }
    } catch (IOException e) {
        e.printStackTrace();
    }
    return termsMap;
}

public static String getContentFeature(String content, Map<String, Integer> terms) {
    String contentFeature = "";
    // Segment the incoming text into words
    Map<String, Integer> contentTermsMap = HanlpAnalyser.segString(content);
    // Keep the features sorted by index
    Map<Integer, Double> contentTfIdf = new TreeMap<Integer, Double>(new Comparator<Integer>() {
        @Override
        public int compare(Integer o1, Integer o2) {
            return o1.compareTo(o2);
        }
    });
    // Compute TF-IDF; here only TF is stored, since IDF defaults to 1
    for (String word : contentTermsMap.keySet()) {
        if (terms.containsKey(word)) {
            contentTfIdf.put(terms.get(word), getWordTf(word, contentTermsMap));
        }
    }
    for (Integer key : contentTfIdf.keySet()) {
        contentFeature += key + ":" + contentTfIdf.get(key) + " ";
    }
    return contentFeature.trim();
}
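The getWordTf helper is not shown in this article, so the following is only a rough sketch of what the TF step might look like, assuming TF is the word's count divided by the total number of tokens in the text. The class name TfFeatureSketch, both method names, and the sample words are hypothetical. The sketch also shows why a sorted map is useful: LIBSVM's sparse format expects feature indices in ascending order.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch, not the project's actual code.
class TfFeatureSketch {

    /** Assumed TF definition: occurrences of the word / total token count. */
    static double termFrequency(String word, Map<String, Integer> counts) {
        int total = 0;
        for (int c : counts.values()) {
            total += c;
        }
        Integer count = counts.get(word);
        return (count == null || total == 0) ? 0.0 : (double) count / total;
    }

    /** Builds "index:tf index:tf ..." with indices in ascending order. */
    static String toFeatureLine(Map<String, Integer> counts, Map<String, Integer> termIndex) {
        // TreeMap iterates its Integer keys in ascending order
        Map<Integer, Double> features = new TreeMap<Integer, Double>();
        for (String word : counts.keySet()) {
            Integer idx = termIndex.get(word);
            if (idx != null) { // words outside the dictionary are ignored
                features.put(idx, termFrequency(word, counts));
            }
        }
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<Integer, Double> e : features.entrySet()) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(e.getKey()).append(':').append(e.getValue());
        }
        return sb.toString();
    }
}
```

For example, with word counts {warrant=2, airport=1, the=1} and a dictionary {warrant=5, airport=2}, the resulting feature line is "2:0.25 5:0.5": the word outside the dictionary still counts toward the total, but it produces no feature of its own.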
The third step classifies the text using the methods above. The main method is as follows:
public static void main(String[] args) {
    // Sample text: a securities announcement about Baiyun Airport put warrants
    String s = "    In accordance with the Shanghai Stock Exchange's notice on matters related to securities companies creating Baiyun Airport warrants,\n"
            + "Everbright Securities Co., Ltd. applied to the Shanghai Stock Exchange to cancel Baiyun Airport warrants, and the application\n"
            + "has been approved. The Shanghai branch of China Securities Depository and Clearing Co., Ltd. has handled the corresponding registration procedures. This time the\n"
            + "company was approved to cancel 15 million Baiyun Airport put warrants, whose terms are exactly the same as those of the original Baiyun\n"
            + "Airport put warrants (trading abbreviation Airport JTP1, trading code 580998, exercise code 582998).\n"
            + "    \n";
    // Load the term dictionary and build the feature string
    Map<String, Integer> terms = LibsvmDataProcess.getModelTerms("/users/zhouyh/work/yanfa/xunlianji/utf8/heji/terms.txt");
    String contentFeature = LibsvmDataProcess.getContentFeature(s, terms);
    // Load the trained model and predict the category
    svm_model model = GetSvmModel.getSvmModelInstance().getModel("/users/zhouyh/work/yanfa/xunlianji/utf8/heji/model.txt");
    int label = libsvmPredict(model, contentFeature);
    System.out.println(label);
}
The test result is the finance and economics category, which is consistent with the category of the corpus the sample text was selected from.
Finally, this code only walks through the process of using LIBSVM in a project; using it in a real project will still require many adjustments.
LIBSVM Java Engineering Practice