Word segmenters are usually evaluated with the SIGHAN Bakeoff 2005 scoring script, but that script is meant to run on Linux: it calls a diff command and writes temporary files under /tmp. How can it be used on Windows? The steps below assume you already have the icwb2-data package and have unpacked it to E:\icwb2-data.
First, install a Perl environment (DWIM Perl is used here):
https://dwimperl.googlecode.com/files/dwimperl-5.14.2.1-v7-32bit.exe
Next, install the diff tool (GnuWin32 DiffUtils):
http://superb-dca3.dl.sourceforge.net/project/gnuwin32/diffutils/2.8.7-1/diffutils-2.8.7-1-bin.zip
Unzip the diff tool into the E:\diffutils directory and add E:\diffutils\bin to the system PATH environment variable.
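Before touching the script, it can help to confirm that both tools are actually reachable. The following is a minimal sketch in Python (my own choice of language, not something this setup requires) that uses shutil.which to look up perl and diff on PATH and prints the first line of each tool's --version output. The helper names find_tools, first_line and report are hypothetical. Run it from a newly opened command prompt so the updated PATH is picked up.

```python
import shutil
import subprocess

def find_tools(names=("perl", "diff")):
    """Map each tool name to its full path on PATH, or None if it cannot be found."""
    return {name: shutil.which(name) for name in names}

def first_line(text):
    """Return the first non-empty line of text (perl --version starts with a blank line)."""
    for line in text.splitlines():
        if line.strip():
            return line.strip()
    return ""

def report(tools):
    """Print where each tool was found and the first line of its --version output."""
    for name, path in tools.items():
        if path is None:
            print(f"{name}: NOT FOUND on PATH")
            continue
        result = subprocess.run([path, "--version"], capture_output=True,
                                text=True, errors="replace")
        print(f"{name}: {path} ({first_line(result.stdout or result.stderr)})")

if __name__ == "__main__":
    report(find_tools())
```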
Next, edit the icwb2-data/scripts/score script.
Change the line that assigns $diff to:
$diff = "E:/diffutils/bin/diff";
Change lines 52 and 53 to the following (make sure the d:/tmp directory exists):
$tmp1 = "d:/tmp/comp01$$";
$tmp2 = "d:/tmp/comp02$$";
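If you prefer not to edit the file by hand, these changes can be scripted. The Python sketch below assumes that the original score script assigns $diff, $tmp1 and $tmp2 with simple one-line assignments of the form $name = "...";, which is what the edits above replace. It matches the assignments by variable name rather than by line number, keeps a .orig backup, and creates the temporary directory. patch_score_source and patch_file are hypothetical helper names, and the replacement paths are simply the ones used in this article.

```python
import re
from pathlib import Path

# Values from this article; adjust them to your own directories.
REPLACEMENTS = {
    "diff": "E:/diffutils/bin/diff",
    "tmp1": "d:/tmp/comp01$$",
    "tmp2": "d:/tmp/comp02$$",
}

def patch_score_source(text, replacements=REPLACEMENTS):
    """Rewrite each line of the form  $name = "...";  with the new quoted value."""
    for name, value in replacements.items():
        pattern = re.compile(
            r'^([ \t]*\$' + re.escape(name) + r'[ \t]*=[ \t]*)"[^"]*";', re.MULTILINE
        )
        # A function replacement avoids any escaping issues with "$$" in the value.
        text, count = pattern.subn(lambda m: m.group(1) + '"' + value + '";', text)
        if count == 0:
            raise ValueError(f"no assignment to ${name} found")
    return text

def patch_file(path, tmp_dir="d:/tmp"):
    """Patch the score script in place, keep a .orig backup, and create tmp_dir."""
    Path(tmp_dir).mkdir(parents=True, exist_ok=True)
    src = Path(path)
    # latin-1 maps every byte to one character, so unknown bytes survive the round trip;
    # newline="" keeps the file's original line endings.
    with open(src, encoding="latin-1", newline="") as f:
        original = f.read()
    backup = src.with_name(src.name + ".orig")
    if not backup.exists():
        with open(backup, "w", encoding="latin-1", newline="") as f:
            f.write(original)
    with open(src, "w", encoding="latin-1", newline="") as f:
        f.write(patch_score_source(original))
```

For example, patch_file(r"E:\icwb2-data\scripts\score") applies all three changes in one step.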
Now you can run the test.
Open a command prompt in the E:\icwb2-data directory and run the following command:
E:\icwb2-data>perl scripts/score gold/pku_training_words.utf8 gold/pku_test_gold.utf8 gold/pku_test_gold.utf8 > pku_maxent.score
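The same command can also be launched from Python, which is convenient when scoring several result files in a row. This is a sketch that assumes perl is on PATH; the argument order (training word list, gold standard, segmentation result) follows the command above, and build_score_command and run_score are hypothetical helper names. Standard output goes into the .score file, which is what the > redirection does at the command prompt.

```python
import subprocess
from pathlib import Path

def build_score_command(dictionary, gold, result, script="scripts/score"):
    """Arguments for: perl scripts/score <training words> <gold standard> <segmentation result>."""
    return ["perl", script, dictionary, gold, result]

def run_score(root, dictionary, gold, result, out_name):
    """Run the scorer from the icwb2-data directory and save its output like '> out_name'."""
    out_path = Path(root) / out_name
    with open(out_path, "wb") as out:
        # The child process writes straight into the file; check=True raises on a non-zero exit.
        subprocess.run(build_score_command(dictionary, gold, result),
                       cwd=root, stdout=out, check=True)
    return out_path
```

For example: run_score(r"E:\icwb2-data", "gold/pku_training_words.utf8", "gold/pku_test_gold.utf8", "gold/pku_test_gold.utf8", "pku_maxent.score").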
The command takes a while to finish.
When the command finishes, a pku_maxent.score file is created in the E:\icwb2-data directory. It ends with the following results:
INSERTIONS: 0
DELETIONS: 0
SUBSTITUTIONS: 0
NCHANGE: 0
NTRUTH: 27
NTEST: 27
TRUE WORDS RECALL: 1.000
TEST WORDS PRECISION: 1.000
=== SUMMARY:
=== TOTAL INSERTIONS: 0
=== TOTAL DELETIONS: 0
=== TOTAL SUBSTITUTIONS: 0
=== TOTAL NCHANGE: 0
=== TOTAL TRUE WORD COUNT: 104372
=== TOTAL TEST WORD COUNT: 104372
=== TOTAL TRUE WORDS RECALL: 1.000
=== TOTAL TEST WORDS PRECISION: 1.000
=== F MEASURE: 1.000
=== OOV Rate: 0.058
=== OOV Recall Rate: 1.000
=== IV Recall Rate: 1.000
### gold/pku_test_gold.utf8 0 0 0 0 104372 104372 1.000 1.000 1.000 0.058 1.000 1.000
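Only the lines starting with === are needed for a quick comparison between runs. A small Python sketch, assuming the "=== NAME: value" layout shown above, collects them into a dictionary; parse_summary is a hypothetical name.

```python
import re

# "=== NAME: value" lines; "=== SUMMARY:" has no value and does not match.
SUMMARY_LINE = re.compile(r"^===\s*(.+?):\s*([-+]?\d+(?:\.\d+)?)\s*$")

def parse_summary(text):
    """Collect the '=== NAME: value' lines of a score report into {NAME: float}."""
    summary = {}
    for line in text.splitlines():
        m = SUMMARY_LINE.match(line)
        if m:
            summary[m.group(1).strip()] = float(m.group(2))
    return summary
```

For example, parse_summary(Path(r"E:\icwb2-data\pku_maxent.score").read_text(encoding="utf-8", errors="replace"))["F MEASURE"] (with Path from pathlib) returns the F-measure; errors="replace" is there in case the report contains bytes that are not valid UTF-8.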
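Because the gold standard and the segmentation result are the same file here, precision, recall, F-measure, and the OOV and IV recall rates are all 1.000 (100%). Only the OOV rate (0.058) differs, since it depends on the training word list rather than on the segmentation.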
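To see why identical files score 1.000, it helps to write down how word-level precision and recall are usually defined: a word is correct when the same character span appears in both the gold standard and the result, recall is correct words divided by gold-standard words, precision is correct words divided by result words, and F = 2PR/(P+R). The Python sketch below uses this common span-matching formulation; it is not the diff-based alignment the score script itself performs, and it assumes the two inputs are parallel lines of space-separated words over the same characters. The names spans and prf are hypothetical.

```python
def spans(words):
    """Character spans (start, end) covered by each word of one segmented line."""
    result, pos = set(), 0
    for word in words:
        result.add((pos, pos + len(word)))
        pos += len(word)
    return result

def prf(gold_lines, test_lines):
    """Word-level precision, recall and F-measure for parallel, space-separated lines."""
    if len(gold_lines) != len(test_lines):
        raise ValueError("gold and test must have the same number of lines")
    correct = n_gold = n_test = 0
    for gold, test in zip(gold_lines, test_lines):
        g, t = spans(gold.split()), spans(test.split())
        correct += len(g & t)  # words with identical boundaries in both
        n_gold += len(g)
        n_test += len(t)
    recall = correct / n_gold if n_gold else 0.0
    precision = correct / n_test if n_test else 0.0
    f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f

# Identical files, as in this article: every word matches.
print(prf(["ab c de"], ["ab c de"]))  # (1.0, 1.0, 1.0)
# Gold "ab c de" segmented as "ab cde": 1 of 2 result words and 1 of 3 gold words match.
print(prf(["ab c de"], ["ab cde"]))
```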
This article is from the blog "Little Progress Every Day"; please keep this source: http://sbp810050504.blog.51cto.com/2799422/1600586
Testing segmentation results with the Bakeoff 2005 scoring script on Windows