Word frequency statistics with Python
GitHub address: Fightingbob (give me a star, thanks)
- Word frequency statistics
Count how many times each English word appears in a plain-text English file (e.g. "Walden Pond (English edition).txt") and record the results.
- Code implementation
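import string
from os import path

# Open the source text in binary mode and read all of it.
with open('Walden Pond (English version).txt', 'rb') as text1:
    # Decode the bytes (UTF-8 is an assumption), split on whitespace, then
    # strip surrounding punctuation from each word and lowercase it.
    words = [word.strip(string.punctuation).lower()
             for word in text1.read().decode('utf-8').split()]
    # Distinct words
    words_index = set(words)
    # Map each distinct word to its number of occurrences
    count_dict = {index: words.count(index) for index in words_index}

# Append the results to file1.txt in the script's directory.
with open(path.dirname(__file__) + '/file1.txt', 'a+') as text2:
    text2.writelines('The following are the results of Word frequency statistics:' + '\n')
    # Most frequent words first
    for word in sorted(count_dict, key=lambda x: count_dict[x], reverse=True):
        text2.writelines('{}--{} times'.format(word, count_dict[word]) + '\n')

# Redundant: each with block has already closed its file.
text1.close()
text2.close()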
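To try these steps without the Walden text file, the same approach can be sketched on a short in-memory string. The function name `count_words` and the sample sentence are illustrative assumptions, not part of the original script. This sketch also drops tokens that become empty once punctuation is stripped, which the original script does not do.

```python
import string


def count_words(text):
    """Return a {word: count} dict using the same steps as the script above."""
    # Split on whitespace, strip punctuation from both ends, lowercase.
    words = [w.strip(string.punctuation).lower() for w in text.split()]
    # Assumption: drop tokens that were pure punctuation (now empty strings).
    words = [w for w in words if w]
    # One entry per distinct word, counted with list.count as in the original.
    return {w: words.count(w) for w in set(words)}


sample = "The pond, the woods -- and the pond again."
counts = count_words(sample)
# Print words from most to least frequent.
for word in sorted(counts, key=lambda w: counts[w], reverse=True):
    print('{}--{} times'.format(word, counts[word]))
```

Words with equal counts may print in any order, because the sort only compares the counts and the keys start out in set order.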
- Code walkthrough
  - Get the file: open it in binary mode to read its content
  - Get the word list
    - Read the content first
    - Then split it into words (split() slices a string on a delimiter; with no argument it splits on whitespace)
    - Lowercase each word and strip the punctuation before and after it
      - word.strip(string.punctuation).lower()
  - Remove duplicate words
    - words_index = set(words)
  - Build a dictionary mapping each word to its count
    - count_dict = {index: words.count(index) for index in words_index}
  - Write the word frequency statistics
    - Create the output file in the script's directory and open it in append mode
      - with open(path.dirname(__file__) + '/file1.txt', 'a+') as text2:
    - Write the header line, ending with a line break
      - text2.writelines('The following are the results of Word frequency statistics:' + '\n')
    - Sort the words by count, from largest to smallest (key=lambda x: count_dict[x] sorts by value)
      - sorted(count_dict, key=lambda x: count_dict[x], reverse=True)
    - Write each word and its frequency on its own line
      - text2.writelines('{}--{} times'.format(word, count_dict[word]) + '\n')
  - Close the resources (the with statements already close both files, so these calls are redundant but harmless)
    - text1.close()
    - text2.close()
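One cost worth noting in the script: `words.count(index)` rescans the whole word list once per distinct word, which gets slow on a full-length book. A minimal alternative sketch, assuming the standard-library `collections.Counter` is acceptable in place of the set-and-count approach. The helper name `top_words` and the sample text are hypothetical.

```python
import string
from collections import Counter


def top_words(text, n=None):
    """Return (word, count) pairs, most frequent first."""
    words = (w.strip(string.punctuation).lower() for w in text.split())
    # Counter tallies every word in a single pass over the input;
    # tokens that were pure punctuation (now empty) are skipped.
    counter = Counter(w for w in words if w)
    # most_common sorts by count, descending; n=None returns all entries.
    return counter.most_common(n)


for word, count in top_words("Simplify, simplify, simplify! Simplicity.", 2):
    print('{}--{} times'.format(word, count))
```

`most_common` replaces both the `sorted(..., reverse=True)` call and the dictionary lookup inside the loop, since it returns the word and its count together.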