Using Python to calculate the precision, recall, and F-value of word segmentation results
The test file output.txt is formatted as follows. Each non-empty line holds one character, its train tag, and its test tag, separated by tabs:
    Regiment    B     B
    round       E     E
    is          BE    BE
    spring      B     B
    section     E     E
    thousand    B     B
    ...
The Python code is as follows:
1. Read the output.txt file and build a DataFrame with the columns 'character', 'train', and 'test'
    import pandas

    line = []
    # Raw string so the backslashes in the Windows path are kept literally
    path = r'e:\\Wang Dongbo\CRF related\CRF\crf++ tools\output.txt'
    with open(path, 'r', encoding='utf-8') as file:
        for i in file:
            i = i.rstrip('\n')            # drop the trailing newline
            if len(i) > 1:                # skip empty and one-character lines
                line.append(i.split('\t'))
    df = pandas.DataFrame(line, columns=['character', 'train', 'test'])
Note: Appending rows one at a time with df.loc is too slow, so the rows are collected in a list and converted to a DataFrame in a single step.
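The reading step can also be sketched without pandas, which makes the filtering rule easy to check on a small in-memory sample. This is a minimal sketch under assumptions: `read_rows` and the sample text are hypothetical names and data, not part of the original script, and the sketch keeps only lines with exactly three tab-separated fields, which is slightly stricter than the length check used above.

```python
import io

def read_rows(lines):
    """Collect [character, train, test] rows from an iterable of text lines."""
    rows = []
    for raw in lines:
        text = raw.rstrip('\r\n')        # drop the line ending, if any
        if not text.strip():             # skip blank separator lines
            continue
        fields = text.split('\t')
        if len(fields) == 3:             # keep only well-formed three-column lines
            rows.append(fields)
    return rows

# Hypothetical sample standing in for output.txt
sample = io.StringIO("a\tB\tB\nb\tE\tE\n\nc\tBE\tBE")
rows = read_rows(sample)
```

The resulting list can be passed to pandas.DataFrame(rows, columns=['character', 'train', 'test']) exactly as in step 1.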
2. Build a new DataFrame that holds the correctly tagged rows
    correct = df[df.train == df.test]
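3. Calculate recall, precision, and F-value for each tag
    for i in ('B', 'C', 'E', 'BE'):
        R = sum(correct.test == i) / sum(df.train == i)   # recall: correct / rows tagged i in train
        P = sum(correct.test == i) / sum(df.test == i)    # precision: correct / rows tagged i in test
        F = R * P * 2 / (R + P)                           # harmonic mean of R and P
        print(i, ':\n', 'R=', R, 'P=', P, 'F=', F)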
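To make the three formulas concrete, here is a minimal pure-Python sketch of the same per-tag calculation on a toy pair of tag lists. The function name `tag_scores` and the toy data are assumptions for illustration, not part of the original script. Unlike the loop above, the sketch returns 0.0 when a tag never occurs, rather than dividing by zero.

```python
def tag_scores(gold, pred, tag):
    """Return (recall, precision, f) for one tag, given two parallel tag lists."""
    correct = sum(1 for g, p in zip(gold, pred) if g == p == tag)
    n_gold = sum(1 for g in gold if g == tag)   # rows carrying the tag in the reference
    n_pred = sum(1 for p in pred if p == tag)   # rows carrying the tag in the prediction
    recall = correct / n_gold if n_gold else 0.0
    precision = correct / n_pred if n_pred else 0.0
    total = recall + precision
    f = 2 * recall * precision / total if total else 0.0
    return recall, precision, f

# Toy data (hypothetical): 'train' column as reference, 'test' column as prediction
gold = ['B', 'E', 'BE', 'B', 'C', 'E']
pred = ['B', 'E', 'B', 'B', 'E', 'E']

for tag in ('B', 'C', 'E', 'BE'):
    r, p, f = tag_scores(gold, pred, tag)
    print(tag, 'R=', r, 'P=', p, 'F=', f)
```

In this toy data both reference 'B' rows are predicted correctly, so recall for 'B' is 1.0, while one extra 'B' prediction lowers precision to 2/3.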
The calculation results are as follows:
B: R= 0.915480621852 P= 0.87615255658 F= 0.895384944855
C: R= 0.674981658107 P= 0.757201646091 F= 0.713731574864
E: R= 0.919001751313 P= 0.879715004191 F= 0.898929336188
BE: R= 0.865064695009 P= 0.940703517588 F= 0.901299951854