Recently I did a project that applied a maximum entropy model to binary sentiment classification of movie reviews.
The maximum entropy implementation used is Le Zhang's maximum entropy toolkit: http://homepages.inf.ed.ac.uk/lzhang10/maxent_toolkit.html
The data analyzed is Bo Pang's movie-review dataset: http://www.cs.cornell.edu/people/pabo/movie-review-data/
The movie-review data is not stored in the format the maximum entropy toolkit requires, so it has to be converted first.
The maximum entropy toolkit expects one event per line, where the first word of the line is the category and the remaining words are the features, so each review's .txt file has to be converted into that layout.
The data can be converted with the following C++ program:
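As a minimal sketch of the target layout, assuming a review's features are simply its whitespace-separated words (which is also how the conversion program reads them with >>), one event line can be built from the raw review text like this. The function name make_event_line is hypothetical.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: build one event line for the toolkit from the raw
// text of a review, in the form "label word1 word2 ...".
// Words are split on any whitespace (spaces, tabs, newlines), so line
// breaks inside the review do not end the event early.
std::string make_event_line(const std::string& label, const std::string& text)
{
    std::istringstream in(text);
    std::string line = label;
    std::string word;
    while (in >> word) {
        line += ' ';
        line += word;
    }
    return line;
}
```

For example, make_event_line("pos", "a  gripping\nfilm .\n") returns "pos a gripping film .".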
/*************************************************
 * Author:  Hangyuan Li
 * Created: 2014.12.14
 * Purpose: converts the movie-review files into the input format
 *          of Le Zhang's maximum entropy classifier
 *************************************************/
#include <string>
#include <iostream>
#include <fstream>
#include <set>
#include <cstdlib>   // system
#include <io.h>      // _findfirst / _findnext / _findclose (Windows only)
using namespace std;

// Read the names of all files matching the pattern dir (e.g. "pos\\*.txt")
// and store them in the set a.
void ReadFile(set<string>& a, const char* dir)
{
    _finddata_t filedir;
    intptr_t lfdir = _findfirst(dir, &filedir);
    if (lfdir == -1L) {
        cout << "No file is found" << endl;
        return;
    }
    do {
        a.insert(filedir.name);
    } while (_findnext(lfdir, &filedir) == 0);
    _findclose(lfdir);
}

// Write every file of the given folder to outfile as one event line:
// the class label followed by all words of the review.
void WriteEvents(ofstream& outfile, const string& folder,
                 const set<string>& files, const string& label)
{
    for (set<string>::const_iterator it = files.begin(); it != files.end(); ++it) {
        ifstream infile((folder + "\\" + *it).c_str());
        if (!infile) {
            cout << "Can not open file " << *it << endl;
            continue;
        }
        outfile << label;
        string word;
        while (infile >> word)   // stops cleanly at end of file, last word included
            outfile << ' ' << word;
        outfile << endl;
    }
}

int main()
{
    set<string> posfile;   // names of all .txt files in the pos folder
    set<string> negfile;   // names of all .txt files in the neg folder
    ReadFile(posfile, "pos\\*.txt");
    ReadFile(negfile, "neg\\*.txt");

    ofstream outfile("result.txt");   // output file, one event per line
    WriteEvents(outfile, "pos", posfile, "pos");
    WriteEvents(outfile, "neg", negfile, "neg");
    outfile.close();

    system("pause");
    return 0;
}
We use 90% of the data for training and 10% for testing. The conversion program turns the reviews into result.txt in the format above; the model is trained on the training events in this format and then evaluated on the test events.
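One detail matters when making the 90%/10% split: the conversion program writes all positive reviews first and all negative reviews after them, so taking the last 10% of result.txt as it is would give a test set containing only negative reviews. A minimal sketch that shuffles the event lines before splitting is shown here; the function name and the fixed seed are hypothetical choices (the seed only makes the split reproducible).

```cpp
#include <algorithm>
#include <cstddef>
#include <random>
#include <string>
#include <utility>
#include <vector>

// Shuffle the event lines, then put the first train_fraction of them into
// the training set and the rest into the test set.
std::pair<std::vector<std::string>, std::vector<std::string>>
split_train_test(std::vector<std::string> lines, double train_fraction, unsigned seed)
{
    std::mt19937 rng(seed);
    std::shuffle(lines.begin(), lines.end(), rng);
    // round to the nearest whole event and never exceed the number of lines
    std::size_t n_train = static_cast<std::size_t>(train_fraction * lines.size() + 0.5);
    n_train = std::min(n_train, lines.size());
    std::vector<std::string> train(lines.begin(), lines.begin() + n_train);
    std::vector<std::string> test(lines.begin() + n_train, lines.end());
    return {train, test};
}
```

A stratified split (10% of each class separately) would be an equally reasonable design and guarantees both classes are balanced in the test set.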
1) Training Command:
Here maxent is the command to run, -m gives the name of the model file produced by training (modelname), -i sets the number of training iterations, and train.txt is the input feature file. Run this way, no training progress information is displayed.
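The toolkit's own training algorithms (it provides GIS and L-BFGS) are not reproduced here. As a hedged illustration of what the iteration count controls, this sketch trains a small conditional maximum entropy model with plain gradient ascent instead: each iteration moves every weight by the difference between how often its feature was observed with a class and how often the current model expects it. The names Event, Weights, label_scores, train and predict and the learning rate are hypothetical.

```cpp
#include <algorithm>
#include <cmath>
#include <map>
#include <string>
#include <vector>

// One event: the class label plus its binary word features.
struct Event {
    std::string label;
    std::vector<std::string> features;
};

// w[feature][label] is the weight lambda of the indicator feature
// "this word occurs in the review and the class is label".
using Weights = std::map<std::string, std::map<std::string, double>>;

// Unnormalized score of each label: the sum of the matching weights.
std::map<std::string, double> label_scores(const Weights& w,
                                           const std::vector<std::string>& feats,
                                           const std::vector<std::string>& labels)
{
    std::map<std::string, double> s;
    for (const std::string& y : labels) s[y] = 0.0;
    for (const std::string& f : feats) {
        auto it = w.find(f);
        if (it == w.end()) continue;  // word never seen in training
        for (const auto& [y, lambda] : it->second) s[y] += lambda;
    }
    return s;
}

// Gradient ascent on the conditional log-likelihood (no regularization).
// `iterations` plays the role of the training iteration count.
// labels must be non-empty.
Weights train(const std::vector<Event>& data, const std::vector<std::string>& labels,
              int iterations, double rate)
{
    Weights w;
    for (int iter = 0; iter < iterations; ++iter) {
        Weights grad;
        for (const Event& e : data) {
            std::map<std::string, double> p = label_scores(w, e.features, labels);
            double mx = p.begin()->second;  // subtract the max for stability
            for (const auto& [y, v] : p) mx = std::max(mx, v);
            double z = 0.0;
            for (auto& [y, v] : p) { v = std::exp(v - mx); z += v; }
            for (auto& [y, v] : p) v /= z;  // now p(y | event)
            for (const std::string& f : e.features) {
                grad[f][e.label] += 1.0;                             // observed
                for (const auto& [y, prob] : p) grad[f][y] -= prob;  // expected
            }
        }
        for (const auto& [f, per_label] : grad)
            for (const auto& [y, g] : per_label)
                w[f][y] += rate * g / static_cast<double>(data.size());
    }
    return w;
}

// Predicted class: the label with the highest score.
std::string predict(const Weights& w, const std::vector<std::string>& feats,
                    const std::vector<std::string>& labels)
{
    std::map<std::string, double> s = label_scores(w, feats, labels);
    std::string best = labels.front();
    for (const std::string& y : labels)
        if (s[y] > s[best]) best = y;
    return best;
}
```

With a few events such as {"pos", {"great", "film"}} and {"neg", {"dull", "acting"}}, training for 100 iterations at rate 0.5 is enough for predict to label a review containing only "great" as "pos".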
2) Test:
Outputs the predicted class for each test event.
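To turn the per-event predictions into a single figure, a sketch like the following compares them with the gold labels, which are the first token of each line of the test file. It assumes the predicted labels have already been read into a vector in the same order as the test events; the layout of the toolkit's output file is not shown here, so that reading step is left out. The function name accuracy is hypothetical.

```cpp
#include <cstddef>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Fraction of test events whose predicted label equals the gold label
// (the first token of the corresponding test line).
double accuracy(const std::vector<std::string>& test_lines,
                const std::vector<std::string>& predicted)
{
    if (test_lines.empty() || test_lines.size() != predicted.size())
        throw std::invalid_argument("need one prediction per test event");
    std::size_t correct = 0;
    for (std::size_t i = 0; i < test_lines.size(); ++i) {
        std::istringstream in(test_lines[i]);
        std::string gold;
        in >> gold;  // first token = class label
        if (gold == predicted[i]) ++correct;
    }
    return static_cast<double>(correct) / test_lines.size();
}
```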
Outputs detailed probability information.
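The probabilities in the detailed output are conditional class probabilities. Assuming the standard maximum entropy form p(y | x) = exp(sum_i lambda_i f_i(x, y)) / Z(x), this sketch shows how per-class scores become such a distribution, sorted from most to least probable. The function name class_probabilities is hypothetical, and the toolkit's exact output formatting is not reproduced.

```cpp
#include <algorithm>
#include <cmath>
#include <string>
#include <utility>
#include <vector>

// Convert per-class scores sum_i lambda_i * f_i(x, y) into p(y | x),
// sorted by decreasing probability. scores must be non-empty.
std::vector<std::pair<std::string, double>>
class_probabilities(const std::vector<std::pair<std::string, double>>& scores)
{
    // Subtract the largest score before exponentiating to avoid overflow;
    // the shift cancels out in the normalization by Z(x).
    double mx = scores.front().second;
    for (const auto& s : scores) mx = std::max(mx, s.second);
    double z = 0.0;
    std::vector<std::pair<std::string, double>> probs;
    for (const auto& s : scores) {
        double e = std::exp(s.second - mx);
        probs.emplace_back(s.first, e);
        z += e;
    }
    for (auto& p : probs) p.second /= z;
    std::sort(probs.begin(), probs.end(),
              [](const auto& a, const auto& b) { return a.second > b.second; });
    return probs;
}
```

For instance, scores of 0 for "neg" and ln 3 for "pos" give p(pos | x) = 0.75 and p(neg | x) = 0.25.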
A command-line method for using Le Zhang's C++ maximum entropy model