A small Python program for counting word frequency in a document
Python version 2.7
The program is shown below; the test file and the complete program are on my GitHub.
import re
import collections


# Count the spaces and the numeric words in the file.
# This function returns only the space count; if you need
# more values (e.g. number_counts), return them yourself.
def count_space(path):
    number_counts = 0
    space_counts = 0
    number_list = []

    with open(path, 'r') as f:
        for line in f:
            line = line.strip()
            space_split_list = line.split(' ')
            # n words separated by single spaces contain n - 1 spaces
            space_counts += len(space_split_list) - 1
            for word in space_split_list:
                if word.isdigit():
                    number_list.append(word)
        number_counts = len(number_list)

    return space_counts


# Convert uppercase to lowercase, filter special characters, etc.
def count_word(path):
    result = {}
    with open(path) as fileread:
        alltext = fileread.read()

    alltext = alltext.lower()

    # Remove double quotes, commas and periods
    alltext = re.sub(r'"|,|\.', '', alltext)

    for word in alltext.split():
        if word not in result:
            result[word] = 0
        result[word] += 1

    return result


# Sort the word counts in descending order of frequency
def sort_by_count(d):
    d = collections.OrderedDict(sorted(d.items(), key=lambda t: -t[1]))
    return d


if __name__ == '__main__':
    filename = 'read.txt'
    try:
        dword = count_word(filename)
        dword = sort_by_count(dword)

        countspace = count_space(filename)
        print "space_counts", countspace
        for key, value in dword.items():
            print key + ":%d" % value

    except IOError:
        print 'cannot open file %s for read' % filename
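As a variation on the counting and sorting steps above, the standard library's collections.Counter can do the tally and the descending sort in one call. This is only a sketch, not the program from my GitHub: the function name count_words_in_text and the sample sentence are made up for illustration, and it takes a string instead of a file path so it can be tried without a test file. The print call is written so that it runs on Python 2.7 and on Python 3.

```python
import re
from collections import Counter


def count_words_in_text(text):
    """Return (word, count) pairs, most frequent first.

    Hypothetical helper for illustration; it applies the same
    normalisation as count_word (lowercase, then strip double
    quotes, commas and periods).
    """
    cleaned = re.sub(r'"|,|\.', '', text.lower())
    # Counter tallies the tokens; most_common() sorts by descending count.
    return Counter(cleaned.split()).most_common()


if __name__ == '__main__':
    sample = 'The cat, the dog. "The" end'  # made-up sample text
    for word, count in count_words_in_text(sample):
        print('%s:%d' % (word, count))
```

Counter is a dict subclass, so the result can still be looked up by word if the sorted list is not needed.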