Mallet is an open-source library for statistical natural language processing developed by experts at UMass, and it is a good one. It can be used to learn topic models, train ME (maximum entropy) models, and so on. For developers, the technical documentation on the official website is very helpful.
Mallet can be downloaded from its official website; to browse the developer documentation, just click the corresponding "Developer's Guide" link.
The following example develops a simple maximum entropy classification model; it can be checked against the examples in the documentation.
First download the Mallet toolkit, which contains both the source code and the jar packages. For simplicity, we import mallet.jar and mallet-deps.jar from mallet-2.0.7\dist. To import the jars in Eclipse: right-click the project, choose Properties -> Java Build Path -> Libraries, click "Add JARs", and select the jar packages in that path.
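Once the jars are on the build path, a quick check can confirm that the Mallet classes are actually visible to the project. The sketch below is only an illustration: the class MalletClasspathCheck and its helper isOnClasspath are hypothetical names, not part of Mallet, and the only Mallet-specific detail is the class name cc.mallet.classify.MaxEntTrainer, which is the trainer imported in this example.

```java
// Hypothetical helper, not part of Mallet: reports whether a class can be loaded.
final class MalletClasspathCheck {

    // Returns true if the named class is visible on the current classpath.
    static boolean isOnClasspath(String className) {
        try {
            Class.forName(className);
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // Class (or one of its dependencies) is missing from the classpath.
            return false;
        }
    }

    public static void main(String[] args) {
        String trainer = "cc.mallet.classify.MaxEntTrainer";
        System.out.println(trainer + " on classpath: " + isOnClasspath(trainer));
    }
}
```

If this prints false, the jar packages are probably not on the project's build path yet.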
Create a new class named Maxent with the following code:
import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.FileReader;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.Arrays;
import java.util.List;

import cc.mallet.classify.Classifier;
import cc.mallet.classify.ClassifierTrainer;
import cc.mallet.classify.MaxEntTrainer;
import cc.mallet.classify.Trial;
import cc.mallet.pipe.iterator.CsvIterator;
import cc.mallet.types.Alphabet;
import cc.mallet.types.FeatureVector;
import cc.mallet.types.Instance;
import cc.mallet.types.InstanceList;
import cc.mallet.types.Label;
import cc.mallet.types.LabelAlphabet;
import cc.mallet.types.Labeling;
import cc.mallet.util.Randoms;

public class Maxent implements Serializable {

    // Train a classifier
    public static Classifier trainClassifier(InstanceList trainingInstances) {
        // Here we use a maximum entropy (i.e. polytomous logistic regression) classifier.
        ClassifierTrainer trainer = new MaxEntTrainer();
        return trainer.train(trainingInstances);
    }

    // Save a trained classifier: write it to disk
    public void saveClassifier(Classifier classifier, String savePath) throws IOException {
        ObjectOutputStream oos = new ObjectOutputStream(new FileOutputStream(savePath));
        oos.writeObject(classifier);
        oos.flush();
        oos.close();
    }

    // Restore a saved classifier
    public Classifier loadClassifier(String savedPath)
            throws FileNotFoundException, IOException, ClassNotFoundException {
        // Here we load a serialized classifier from a file.
        Classifier classifier;
        ObjectInputStream ois = new ObjectInputStream(new FileInputStream(new File(savedPath)));
        classifier = (Classifier) ois.readObject();
        ois.close();
        return classifier;
    }

    // Predict the best label for a single instance
    public String predict(Classifier classifier, Instance testInstance) {
        Labeling labeling = classifier.classify(testInstance).getLabeling();
        Label label = labeling.getBestLabel();
        return (String) label.getEntry();
    }

    // Evaluate a classifier on a test file
    public void evaluate(Classifier classifier, String testFilePath) throws IOException {
        InstanceList testInstances = new InstanceList(classifier.getInstancePipe());

        // Format of input data: [name] [label] [data ...]
        CsvIterator reader = new CsvIterator(new FileReader(new File(testFilePath)),
                "(\\w+)\\s+(\\w+)\\s+(.*)", 3, 2, 1); // (data, label, name) field indices

        // Add all instances loaded by the iterator to our instance list
        testInstances.addThruPipe(reader);
        Trial trial = new Trial(classifier, testInstances);

        // Evaluation metrics: accuracy, F1 and precision
        System.out.println("Accuracy: " + trial.getAccuracy());
        System.out.println("F1 for class 'good': " + trial.getF1("good"));
        System.out.println("Precision for class '"
                + classifier.getLabelAlphabet().lookupLabel(1) + "': "
                + trial.getPrecision(1));
    }

    // Train on a random 90% of the instances and test on the remaining 10%
    public static Trial testTrainSplit(MaxEntTrainer trainer, InstanceList instances) {
        int TRAINING = 0;
        int TESTING = 1;
        int VALIDATION = 2; // unused here: the validation share below is 0.0

        // Split the input list into training (90%) and testing (10%) lists.
        InstanceList[] instanceLists =
                instances.split(new Randoms(), new double[] {0.9, 0.1, 0.0});
        Classifier classifier = trainer.train(instanceLists[TRAINING]);
        return new Trial(classifier, instanceLists[TESTING]);
    }

    public static void main(String[] args) throws FileNotFoundException, IOException {
        // Define the training samples
        Alphabet featureAlphabet = new Alphabet();          // feature dictionary
        LabelAlphabet targetAlphabet = new LabelAlphabet(); // class label dictionary
        targetAlphabet.lookupIndex("positive");
        targetAlphabet.lookupIndex("negative");
        targetAlphabet.lookupIndex("neutral");
        targetAlphabet.stopGrowth();
        featureAlphabet.lookupIndex("f1");
        featureAlphabet.lookupIndex("f2");
        featureAlphabet.lookupIndex("f3");
        InstanceList trainingInstances = new InstanceList(featureAlphabet, targetAlphabet); // instance set

        // The keys of each feature vector are the feature names f1, f2, f3 registered above
        final int size = featureAlphabet.size();
        String[] featureNames = (String[]) featureAlphabet.toArray(new String[size]);

        double[] featureValues1 = {1.0, 0.0, 0.0};
        double[] featureValues2 = {2.0, 0.0, 0.0};
        double[] featureValues3 = {0.0, 1.0, 0.0};
        double[] featureValues4 = {0.0, 0.0, 1.0};
        double[] featureValues5 = {0.0, 0.0, 3.0};
        String[] targetValue = {"positive", "positive", "neutral", "negative", "negative"};
        List<double[]> featureValues = Arrays.asList(featureValues1, featureValues2,
                featureValues3, featureValues4, featureValues5);

        int i = 0;
        for (double[] featureValue : featureValues) {
            FeatureVector featureVector = new FeatureVector(featureAlphabet, featureNames, featureValue);
            // new Instance(data, target, name, source)
            Instance instance = new Instance(featureVector,
                    targetAlphabet.lookupLabel(targetValue[i]), "xxx", null);
            i++;
            trainingInstances.add(instance);
        }

        Maxent maxent = new Maxent();
        Classifier maxentClassifier = trainClassifier(trainingInstances);

        // Build a test sample
        double[] testFeatureValues = {0.5, 0.5, 6.0};
        FeatureVector testFeatureVector = new FeatureVector(featureAlphabet, featureNames, testFeatureValues);
        Instance testInstance = new Instance(testFeatureVector,
                targetAlphabet.lookupLabel("negative"), "xxx", null);
        System.out.print(maxent.predict(maxentClassifier, testInstance));
        // maxent.evaluate(maxentClassifier, "resource/testdata.txt");
    }
}
Note: trainingInstances holds the training samples, testInstance is the test sample, and the program prints "negative".
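The evaluate method above reads its test file with the pattern (\w+)\s+(\w+)\s+(.*) and the group indices 3, 2, 1 for data, label and name. To see how a single input line is split under that pattern, here is a small sketch that uses only java.util.regex and does not call Mallet. The class LineFormatDemo, the helper parse, and the sample line are all made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class: splits one line of the form [name] [label] [data ...]
final class LineFormatDemo {

    // Same regular expression as the one passed to CsvIterator in evaluate.
    static final Pattern LINE = Pattern.compile("(\\w+)\\s+(\\w+)\\s+(.*)");

    // Returns {data, label, name} (groups 3, 2, 1), or null if the line does not match.
    static String[] parse(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group(3), m.group(2), m.group(1) };
    }

    public static void main(String[] args) {
        // Hypothetical sample line: name "doc1", label "good", the rest is the data.
        String[] fields = parse("doc1 good the quick brown fox");
        System.out.println("data=" + fields[0] + ", label=" + fields[1] + ", name=" + fields[2]);
    }
}
```

A line with fewer than three whitespace-separated fields does not match the pattern, so each line of a test file in this format needs a name, a label, and some data.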
Ways to use Mallet in Eclipse