Python Word Frequency Statistics

Source: Internet
Author: User

Using Python to count word frequencies

GitHub address: Fightingbob (stars are welcome, thanks)

  • Word frequency statistics
    Count how many times each English word appears in a plain-text English file (e.g. Walden Pond (English version).txt) and record the results.
  • Code implementation
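      import string
      from os import path
      # Open the source text in binary mode; read() then returns bytes.
      with open('Walden Pond (English version).txt', 'rb') as text1:
          # Decode to text (UTF-8 is an assumption about the file), split on
          # whitespace, strip surrounding punctuation, and lowercase each word.
          words = [word.strip(string.punctuation).lower()
                   for word in text1.read().decode('utf-8').split()]
      words_index = set(words)  # the distinct words
      # Map each distinct word to its number of occurrences.
      count_dict = {index: words.count(index) for index in words_index}
      # Append the results to file1.txt in the script's directory.
      with open(path.dirname(__file__) + '/file1.txt', 'a+') as text2:
          text2.writelines('The following are the results of word frequency statistics:' + '\n')
          # Most frequent words first.
          for word in sorted(count_dict, key=lambda x: count_dict[x], reverse=True):
              text2.writelines('{}--{} times'.format(word, count_dict[word]) + '\n')
      # Redundant: each with block has already closed its file.
      text1.close()
      text2.close()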
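
Because the file is opened in 'rb' mode, read() returns bytes rather than text. In Python 3, calling str() on bytes returns its printable representation (a b'...' wrapper with newlines escaped), so the bytes need to be decoded before they are split into words. The following is a minimal sketch of the difference; the sample bytes and the helper name tokenize are made up for illustration, and UTF-8 is an assumption about the file's encoding.

```python
import string

# Hypothetical sample standing in for the bytes returned by text1.read().
raw = b"Hello, world!\nHello again."

# str() on bytes gives the repr: the newline stays as a backslash followed by 'n'.
as_repr = str(raw)

# decode() turns the bytes into real text (UTF-8 is an assumption here).
as_text = raw.decode("utf-8")


def tokenize(text):
    """Split on whitespace, strip surrounding punctuation, and lowercase."""
    return [word.strip(string.punctuation).lower() for word in text.split()]


print(tokenize(as_repr))  # tokens polluted by the b'...' wrapper and the escaped newline
print(tokenize(as_text))  # clean tokens
```

With the decoded text, the two occurrences of "Hello" end up as the same token, which is what the frequency count relies on.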

  • Code walkthrough
    • Open the file in binary mode to read its contents
      • with open('Walden Pond (English version).txt', 'rb') as text1:

    • Get the word list
      • First read the content (binary mode returns bytes, so decode them to text; UTF-8 is assumed)
        • content = text1.read().decode('utf-8')
      • Then split the content into words (split() with no argument splits on any whitespace)
        • words = content.split()
      • Lowercase each word and strip punctuation from both ends
        • word.strip(string.punctuation).lower()
      • Remove duplicate words
        • words_index = set(words)
    • Build a dictionary that maps each word to its number of occurrences
      • count_dict = {index: words.count(index) for index in words_index}
    • Write the word frequency statistics
      • Open the output file in the script's directory in append mode ('a+' creates the file if it does not exist)
        • with open(path.dirname(__file__) + '/file1.txt', 'a+') as text2:
      • Write the heading, followed by a line break (writelines() accepts any iterable of strings, so a single string works; write() is the more usual call)
        • text2.writelines('The following are the results of word frequency statistics:' + '\n')
      • Sort the words by count, from largest to smallest (key=lambda x: count_dict[x] sorts by value)
        • sorted(count_dict, key=lambda x: count_dict[x], reverse=True)
      • Write each word and its count, one per line
        • text2.writelines('{}--{} times'.format(word, count_dict[word]) + '\n')
    • Close the resources (redundant here, because each with block already closes its file on exit)
      • text1.close()
      • text2.close()
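
The counting and sorting steps can also be sketched with collections.Counter from the standard library. It tallies every word in a single pass, whereas words.count(index) rescans the whole list once per distinct word. This is an alternative sketch, not the author's code: the function names count_words and format_report and the sample sentence are made up for illustration, and unlike the original it skips tokens that become empty once punctuation is stripped.

```python
import string
from collections import Counter


def count_words(text):
    """Return a Counter mapping each lowercased word to its number of occurrences."""
    words = (word.strip(string.punctuation).lower() for word in text.split())
    # Skip tokens that were pure punctuation and are now empty strings.
    return Counter(word for word in words if word)


def format_report(counts):
    """Return report lines in the '{}--{} times' format, most frequent first."""
    # most_common() orders the entries by count, largest first.
    return ['{}--{} times'.format(word, n) for word, n in counts.most_common()]


# Hypothetical sample text standing in for the contents of the book file.
sample = "The pond was deep. The pond was clear -- and the woods were quiet."
counts = count_words(sample)
for line in format_report(counts):
    print(line)
```

The list returned by format_report can be written to file1.txt with the same append-mode open() call used in the walkthrough.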
