Using maxent and icwb2-data to predict the pause boundaries of Chinese prosodic words
1. Prosody prediction with maximum entropy at the command line
Prediction uses Dr. Le Zhang's Maximum Entropy toolkit:
[http://homepages.inf.ed.ac.uk/lzhang10/maxent_toolkit.html#intro](http://homepages.inf.ed.ac.uk/lzhang10/maxent_toolkit.html#intro "Toolkit download URL")
// Generate the model file from the training corpus
maxent -m modelname -i 30 -v maxtrain.txt > 2.txt
// Predict results
maxent -p -m modelname -o maxoutput.txt maxtest.txt
// Use the --detail option to generate the n/y accuracy for each entry
maxent -p -m modelname --detail -o maxoutput.txt maxtest.txt > maxent_accuracy.txt
// The help text can be viewed with maxent -h
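The three commands above can also be driven from a script. The following is a minimal Python sketch, assuming the `maxent` executable is on the PATH and that the flags behave as in the commands above. The function names `maxent_commands`, `run_to_file`, and `run_all` are hypothetical helpers introduced here for illustration; they are not part of the toolkit.

```python
import subprocess


def maxent_commands(model="modelname", train="maxtrain.txt",
                    test="maxtest.txt", output="maxoutput.txt",
                    iterations=30):
    """Build the train / predict / predict-with-detail commands as argument lists."""
    # Flags mirror the commands in this section; nothing else is assumed.
    train_cmd = ["maxent", "-m", model, "-i", str(iterations), "-v", train]
    predict_cmd = ["maxent", "-p", "-m", model, "-o", output, test]
    detail_cmd = ["maxent", "-p", "-m", model, "--detail", "-o", output, test]
    return train_cmd, predict_cmd, detail_cmd


def run_to_file(cmd, log_path):
    """Run cmd and write its standard output to log_path (like `cmd > log_path`)."""
    with open(log_path, "w", encoding="utf-8") as log:
        subprocess.run(cmd, stdout=log, check=True)


def run_all():
    """Train first, then predict with --detail; requires the maxent binary."""
    train_cmd, _predict_cmd, detail_cmd = maxent_commands()
    run_to_file(train_cmd, "2.txt")
    run_to_file(detail_cmd, "maxent_accuracy.txt")
```

Calling `run_all()` from the directory that holds maxtrain.txt and maxtest.txt would reproduce the training and detailed prediction steps. Passing the commands as lists avoids shell quoting problems on Windows.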
2. Evaluating the prediction results with the icwb2-data toolkit
2.1 Toolkit download address
http://sighan.cs.uchicago.edu/bakeoff2005/
Usage
The toolkit is meant to run under Linux, so some extra software must be installed before the evaluation can be run from the Windows command line.
First: install ActivePerl so that files with the .pl suffix can be run. It can be found through a Baidu search.
Second: install the DiffUtils package.
http://gnuwin32.sourceforge.net/packages/diffutils.htm
Download the installer package (setup format) directly, not the bin package.
You also need two dynamic link libraries, libintl3.dll and libiconv2.dll. Both can be downloaded from the DiffUtils page, listed below the installer package.
Note that they must be installed in the same directory as DiffUtils.
Finally: modify the icwb2-data/scripts/score script.
Change line 46 to:
$diff = "E:/diffutils/bin/diff";  # use your own DiffUtils installation directory
Change lines 52 and 53 to the following (make sure the D:/tmp directory exists):
$tmp1 = "D:/tmp/comp01$$";
$tmp2 = "D:/tmp/comp02$$";
In addition, add the DiffUtils bin directory and the ActivePerl directory to the PATH environment variable.
2.2 Running the command
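Before running the scorer, it may help to confirm that the setup from section 2.1 is in place. The following is a small Python sketch, assuming the example paths used above (E:/diffutils/bin/diff and D:/tmp). `check_setup` is a hypothetical helper introduced here, not part of icwb2-data.

```python
import os
import shutil


def check_setup(diff_path, tmp_dir, which=shutil.which):
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    # On Windows the binary is usually diff.exe, so accept either spelling.
    if not (os.path.isfile(diff_path) or os.path.isfile(diff_path + ".exe")):
        problems.append(f"diff not found at {diff_path}")
    # The score script writes its temporary files here, so the directory must exist.
    if not os.path.isdir(tmp_dir):
        problems.append(f"temporary directory {tmp_dir} does not exist")
    # Both tools should be reachable through the PATH environment variable.
    for tool in ("perl", "diff"):
        if which(tool) is None:
            problems.append(f"{tool} is not on PATH")
    return problems
```

For example, `check_setup("E:/diffutils/bin/diff", "D:/tmp")` returns an empty list when everything is found, and otherwise one message per missing item.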
Go to the directory that holds the files generated by the earlier prediction and testing steps, then run:
perl score <training file> <test file> <prediction output file> > <result file>
// For example:
perl score maxtrain11.txt maxtest11.txt maxoutput.txt > o.txt
The output file (o.txt) reports precision, recall, and related metrics.
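To make these metrics concrete, here is a minimal Python sketch that computes precision, recall, and F1 over y/n boundary labels. It illustrates the definitions only and is not how the score script works internally. It assumes that `y` marks a boundary and that the gold and predicted label sequences are aligned one to one. The function name `boundary_scores` is hypothetical.

```python
def boundary_scores(gold, predicted, positive="y"):
    """Precision, recall and F1 for the `positive` label over aligned sequences."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    # True positives: positions where both gold and prediction mark a boundary.
    tp = sum(1 for g, p in zip(gold, predicted) if g == positive and p == positive)
    predicted_pos = sum(1 for p in predicted if p == positive)
    gold_pos = sum(1 for g in gold if g == positive)
    precision = tp / predicted_pos if predicted_pos else 0.0
    recall = tp / gold_pos if gold_pos else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1
```

With gold labels `y n y y n` and predictions `y y y n n`, two of the three predicted boundaries are correct and two of the three gold boundaries are found, so precision and recall are both 2/3.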