How to Use MALLET in Eclipse

MALLET is an open-source Java toolkit for statistical natural language processing, developed at the University of Massachusetts Amherst (UMass), and it is a very useful tool. It can be used to train topic models, maximum entropy (MaxEnt) classifiers, and other models. For developers, the technical documentation on its official website is very helpful.

MALLET can be downloaded from the download page of its official website. To browse the developer documentation, click the corresponding "Developer's Guide" link there.

The following example builds a simple maximum entropy classification model; for more detail, refer to the developer documentation.
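
It may help to see what a trained maximum entropy classifier computes at prediction time. A MaxEnt classifier is a multinomial (polytomous) logistic regression: each label gets a score equal to a bias plus a weighted sum of the feature values, and a softmax turns those scores into probabilities. The sketch below is plain Java with no MALLET dependency; the class name MaxEntScoringSketch and all weights are hypothetical, chosen by hand rather than learned, and the input reuses the test feature values {0.5, 0.5, 6.0} from the example.

```java
import java.util.Arrays;

/**
 * Library-free sketch of how a maximum entropy (multinomial logistic regression)
 * classifier scores an input. The weights used in main() are invented for
 * illustration; they are NOT what MALLET's MaxEntTrainer would learn.
 */
public class MaxEntScoringSketch {

    // Returns P(label | x) for every label: softmax over (bias + w_label . x).
    static double[] classProbabilities(double[][] weights, double[] biases, double[] x) {
        int numLabels = weights.length;
        double[] scores = new double[numLabels];
        double max = Double.NEGATIVE_INFINITY;
        for (int k = 0; k < numLabels; k++) {
            double s = biases[k];
            for (int j = 0; j < x.length; j++) {
                s += weights[k][j] * x[j];
            }
            scores[k] = s;
            max = Math.max(max, s);
        }
        // Subtract the largest score before exponentiating, for numerical stability.
        double sum = 0.0;
        for (int k = 0; k < numLabels; k++) {
            scores[k] = Math.exp(scores[k] - max);
            sum += scores[k];
        }
        for (int k = 0; k < numLabels; k++) {
            scores[k] /= sum;
        }
        return scores;
    }

    // Index of the most probable label (the "best label").
    static int argMax(double[] p) {
        int best = 0;
        for (int k = 1; k < p.length; k++) {
            if (p[k] > p[best]) {
                best = k;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        String[] labels = {"positive", "negative", "neutral"};
        // Hypothetical weights: one row per label, one column per feature f1, f2, f3.
        double[][] weights = {
            {1.5, -0.5, -1.0},   // positive: driven by f1
            {-1.0, -0.5, 1.5},   // negative: driven by f3
            {-0.5, 1.5, -0.5}    // neutral: driven by f2
        };
        double[] biases = {0.0, 0.0, 0.0};
        double[] x = {0.5, 0.5, 6.0};
        double[] p = classProbabilities(weights, biases, x);
        System.out.println(Arrays.toString(p));
        System.out.println(labels[argMax(p)]);
    }
}
```

With these made-up weights the large third feature dominates, so the last line printed is "negative". In the MALLET example, MaxEntTrainer learns the weights from the training instances instead of taking them by hand.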

First, download the MALLET toolkit, which contains both the source code and the JAR files. For simplicity, we only import mallet.jar and mallet-deps.jar from the mallet-2.0.7\dist directory. To add them in Eclipse, right-click the project and choose Properties -> Java Build Path -> Libraries, click "Add JARs..." (or "Add External JARs..." if the files are outside the workspace), and select the JAR files at that path.
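
Once the JARs are on the build path, a quick way to confirm that Eclipse can see them is to try loading a few MALLET classes by name. The hypothetical helper class MalletClasspathCheck below uses reflection, so it compiles even when the JARs are missing and reports the problem at run time instead; the class names it checks are the ones imported by the Maxent example in this article.

```java
/**
 * Hypothetical sanity check that the MALLET JARs are on the build path.
 * Reflection is used so this file compiles even without MALLET.
 */
public class MalletClasspathCheck {

    // Returns true if the named class can be loaded from the current classpath.
    static boolean isOnClasspath(String className) {
        try {
            // initialize = false: load the class without running its static initializers
            Class.forName(className, false, MalletClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // A LinkageError (e.g. NoClassDefFoundError) can mean a dependency is missing
            return false;
        }
    }

    public static void main(String[] args) {
        String[] required = {
            "cc.mallet.classify.MaxEntTrainer",
            "cc.mallet.types.InstanceList",
            "cc.mallet.pipe.iterator.CsvIterator"
        };
        for (String name : required) {
            System.out.println(name + (isOnClasspath(name) ? ": found" : ": MISSING"));
        }
    }
}
```

Run it as a Java application inside the same project; a MISSING line suggests that the corresponding JAR was not added to the build path.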

Create a new class named Maxent with the following code:

    import java.io.File;
    import java.io.FileInputStream;
    import java.io.FileOutputStream;
    import java.io.FileReader;
    import java.io.IOException;
    import java.io.ObjectInputStream;
    import java.io.ObjectOutputStream;
    import java.io.Serializable;
    import java.util.Arrays;
    import java.util.List;

    import cc.mallet.classify.Classifier;
    import cc.mallet.classify.MaxEntTrainer;
    import cc.mallet.classify.Trial;
    import cc.mallet.pipe.iterator.CsvIterator;
    import cc.mallet.types.Alphabet;
    import cc.mallet.types.FeatureVector;
    import cc.mallet.types.Instance;
    import cc.mallet.types.InstanceList;
    import cc.mallet.types.Label;
    import cc.mallet.types.LabelAlphabet;
    import cc.mallet.types.Labeling;
    import cc.mallet.util.Randoms;

    public class Maxent implements Serializable {

        // Train a classifier
        public static Classifier trainClassifier(InstanceList trainingInstances) {
            // Here we use a maximum entropy (i.e. polytomous logistic regression) classifier.
            MaxEntTrainer trainer = new MaxEntTrainer();
            return trainer.train(trainingInstances);
        }

        // Save a trained classifier (write it to disk with Java serialization)
        public void saveClassifier(Classifier classifier, String savePath) throws IOException {
            try (ObjectOutputStream oos = new ObjectOutputStream(new FileOutputStream(savePath))) {
                oos.writeObject(classifier);
            }
        }

        // Restore a saved classifier from a file
        public Classifier loadClassifier(String savedPath) throws IOException, ClassNotFoundException {
            try (ObjectInputStream ois = new ObjectInputStream(new FileInputStream(new File(savedPath)))) {
                return (Classifier) ois.readObject();
            }
        }

        // Predict: return the most probable label for one instance
        public String predict(Classifier classifier, Instance testInstance) {
            Labeling labeling = classifier.classify(testInstance).getLabeling();
            Label label = labeling.getBestLabel();
            return (String) label.getEntry();
        }

        // Evaluate on a text file whose lines have the format: [name] [label] [data ...]
        // This needs a classifier whose pipe turns raw text into feature vectors; the toy
        // classifier built in main() was trained on hand-made vectors instead.
        public void evaluate(Classifier classifier, String testFilePath) throws IOException {
            InstanceList testInstances = new InstanceList(classifier.getInstancePipe());
            CsvIterator reader = new CsvIterator(new FileReader(new File(testFilePath)),
                    "(\\w+)\\s+(\\w+)\\s+(.*)", 3, 2, 1); // (data, label, name) group indices
            // Add all instances loaded by the iterator to our instance list
            testInstances.addThruPipe(reader);
            Trial trial = new Trial(classifier, testInstances);
            // Evaluation metrics: accuracy, F1, and precision
            System.out.println("Accuracy: " + trial.getAccuracy());
            System.out.println("F1 for class 'good': " + trial.getF1("good"));
            System.out.println("Precision for class '" + classifier.getLabelAlphabet().lookupLabel(1)
                    + "': " + trial.getPrecision(1));
        }

        // Hold-out evaluation with a single random split (not n-fold cross-validation)
        public static Trial testTrainSplit(MaxEntTrainer trainer, InstanceList instances) {
            final int TRAINING = 0;
            final int TESTING = 1;
            // Split the input list into training (90%) and testing (10%) lists.
            InstanceList[] instanceLists = instances.split(new Randoms(), new double[] {0.9, 0.1, 0.0});
            Classifier classifier = trainer.train(instanceLists[TRAINING]);
            return new Trial(classifier, instanceLists[TESTING]);
        }

        public static void main(String[] args) throws IOException {
            // Define the training samples
            Alphabet featureAlphabet = new Alphabet();           // feature dictionary
            LabelAlphabet targetAlphabet = new LabelAlphabet();  // class-label dictionary
            targetAlphabet.lookupIndex("positive");
            targetAlphabet.lookupIndex("negative");
            targetAlphabet.lookupIndex("neutral");
            targetAlphabet.stopGrowth();
            featureAlphabet.lookupIndex("f1");
            featureAlphabet.lookupIndex("f2");
            featureAlphabet.lookupIndex("f3");
            InstanceList trainingInstances = new InstanceList(featureAlphabet, targetAlphabet); // instance set

            // Feature names in index order ("f1", "f2", "f3"), used as the keys of every vector
            String[] featureNames = (String[]) featureAlphabet.toArray(new String[featureAlphabet.size()]);
            double[] featureValues1 = {1.0, 0.0, 0.0};
            double[] featureValues2 = {2.0, 0.0, 0.0};
            double[] featureValues3 = {0.0, 1.0, 0.0};
            double[] featureValues4 = {0.0, 0.0, 1.0};
            double[] featureValues5 = {0.0, 0.0, 3.0};
            String[] targetValues = {"positive", "positive", "neutral", "negative", "negative"};
            List<double[]> featureValues = Arrays.asList(
                    featureValues1, featureValues2, featureValues3, featureValues4, featureValues5);
            for (int i = 0; i < featureValues.size(); i++) {
                FeatureVector featureVector = new FeatureVector(featureAlphabet, featureNames, featureValues.get(i));
                // new Instance(data, target, name, source)
                Instance instance = new Instance(featureVector, targetAlphabet.lookupLabel(targetValues[i]), "train" + i, null);
                trainingInstances.add(instance);
            }
            Classifier maxentClassifier = trainClassifier(trainingInstances);

            // Build a test example
            double[] testFeatureValues = {0.5, 0.5, 6.0};
            FeatureVector testFeatureVector = new FeatureVector(featureAlphabet, featureNames, testFeatureValues);
            Instance testInstance = new Instance(testFeatureVector, targetAlphabet.lookupLabel("negative"), "test", null);
            Maxent maxent = new Maxent();
            System.out.println(maxent.predict(maxentClassifier, testInstance));
            // maxent.evaluate(maxentClassifier, "resource/testdata.txt");
        }
    }
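
The evaluate() method reads its test file with CsvIterator, using the regular expression (\w+)\s+(\w+)\s+(.*) with group indices 3, 2, 1 for data, label, and name. The following plain-Java sketch (hypothetical class TestLineFormat, using only java.util.regex rather than MALLET) approximates how that pattern splits one line of such a file; the sample line is invented.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Sketch of how the "[name] [label] [data ...]" line format is split by the
 * pattern given to CsvIterator: group 1 = name, group 2 = label, group 3 = data.
 */
public class TestLineFormat {

    static final Pattern LINE = Pattern.compile("(\\w+)\\s+(\\w+)\\s+(.*)");

    // Returns {name, label, data}, or null if the whole line does not match.
    static String[] parse(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.matches()) {
            return null;
        }
        return new String[] {m.group(1), m.group(2), m.group(3)};
    }

    public static void main(String[] args) {
        // Hypothetical test-file line: name "doc1", label "good", then free text.
        String[] fields = parse("doc1 good this movie was great fun");
        System.out.println("name  = " + fields[0]);
        System.out.println("label = " + fields[1]);
        System.out.println("data  = " + fields[2]);
    }
}
```

Because \w cannot match whitespace, the first two groups each capture exactly one token, and everything after the second run of whitespace becomes the data.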

Note: trainingInstances holds the training samples and testInstance is the test sample. Running the program prints "negative".
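
The commented-out evaluate() call would report accuracy, precision, and F1 through MALLET's Trial class. As a rough, library-free illustration of the standard definitions behind those numbers, the hypothetical MetricsSketch class below computes them from arrays of gold and predicted labels. The labels are invented, and Trial's handling of edge cases (such as a class that is never predicted) may differ from this sketch.

```java
/**
 * Library-free sketch of accuracy and per-class precision, recall, and F1.
 * Illustrates the textbook definitions, not MALLET's Trial implementation.
 */
public class MetricsSketch {

    // Fraction of instances whose predicted label equals the gold label.
    static double accuracy(String[] gold, String[] predicted) {
        int correct = 0;
        for (int i = 0; i < gold.length; i++) {
            if (gold[i].equals(predicted[i])) correct++;
        }
        return (double) correct / gold.length;
    }

    // precision = TP / (TP + FP): of the instances predicted as label, how many are right.
    static double precision(String[] gold, String[] predicted, String label) {
        int tp = 0, predictedPositive = 0;
        for (int i = 0; i < gold.length; i++) {
            if (predicted[i].equals(label)) {
                predictedPositive++;
                if (gold[i].equals(label)) tp++;
            }
        }
        return predictedPositive == 0 ? 0.0 : (double) tp / predictedPositive;
    }

    // recall = TP / (TP + FN): of the instances whose gold label is label, how many were found.
    static double recall(String[] gold, String[] predicted, String label) {
        int tp = 0, actualPositive = 0;
        for (int i = 0; i < gold.length; i++) {
            if (gold[i].equals(label)) {
                actualPositive++;
                if (predicted[i].equals(label)) tp++;
            }
        }
        return actualPositive == 0 ? 0.0 : (double) tp / actualPositive;
    }

    // F1 = harmonic mean of precision and recall.
    static double f1(String[] gold, String[] predicted, String label) {
        double p = precision(gold, predicted, label);
        double r = recall(gold, predicted, label);
        return (p + r) == 0.0 ? 0.0 : 2 * p * r / (p + r);
    }

    public static void main(String[] args) {
        // Hypothetical gold and predicted labels for five test instances.
        String[] gold      = {"good", "good", "bad", "bad", "good"};
        String[] predicted = {"good", "bad",  "bad", "good", "good"};
        System.out.println("Accuracy: " + accuracy(gold, predicted));
        System.out.println("Precision for 'good': " + precision(gold, predicted, "good"));
        System.out.println("Recall for 'good': " + recall(gold, predicted, "good"));
        System.out.println("F1 for 'good': " + f1(gold, predicted, "good"));
    }
}
```

For the sample arrays, 3 of 5 predictions are correct (accuracy 0.6), and precision, recall, and F1 for "good" all come out to 2/3.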
